EVM Deep Dive: Part 2

Introduction

In Part 1, we looked at how the EVM knows which bytecode to run for the contract function being called, covering the call stack, calldata, function signatures, and EVM opcodes.

In Part 2, we’ll start the memory journey by getting a complete picture of the contract’s memory and how it works on EVM.

Memory Trip

We’ll again use the sample contract from Part 1, as shown in Remix.

// SPDX-License-Identifier: GPL-3.0

pragma solidity >=0.7.0 <0.9.0;

/**
 * @title Storage
 * @dev Store & retrieve value in a variable
 */
contract Storage {

    uint256 number;

    /**
     * @dev Store value in variable
     * @param num value to store
     */
    function store(uint256 num) public {
        number = num;
    }

    /**
     * @dev Return value
     * @return value of 'number'
     */
    function retrieve() public view returns (uint256){
        return number;
    }
}

In Part 1, we looked at the part of the compiled contract’s bytecode that handles function selection. In this article, we focus on the first five bytes of the bytecode.

0.png
1.png

These five bytes initialize the free memory pointer. To fully understand what they do, we first need to understand the data structure that governs contract memory.

Memory data structure

Contract memory is a simple byte array. Data can be written in 32-byte (256-bit) or 1-byte (8-bit) chunks, but it can only be read in fixed-size 32-byte (256-bit) chunks. The following image illustrates this structure and the read/write capabilities of contract memory.

2.png

( takenobu-hs.github.io/downloads/ethereum_evm_illustrated.pdf )

This behaviour is determined by the three opcodes that operate on memory.

  • MSTORE (x, y): stores the 32-byte (256-bit) value “y” starting at memory location “x”.
  • MLOAD (x): loads 32 bytes (256 bits) onto the call stack starting at memory location “x”.
  • MSTORE8 (x, y): stores the 1-byte (8-bit) value “y” (the least significant byte of the 32-byte stack value) at memory location “x”.

You can think of a memory location simply as the array index at which to start writing or reading data. To write or read more than 1 byte of data, simply continue from the next array index.
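The three opcodes above can be modelled with a few lines of Python. This is a minimal sketch, not how any real client implements memory: the `Memory` class and its method names are illustrative, and the 32-byte growth step anticipates the memory expansion behaviour discussed further down.

```python
class Memory:
    """Toy model of EVM memory: a zero-initialised, byte-addressable array."""

    def __init__(self):
        self.data = bytearray()

    def _expand(self, end):
        # Grow to the next multiple of 32 bytes that covers byte index `end - 1`.
        if end > len(self.data):
            new_size = (end + 31) // 32 * 32
            self.data.extend(bytes(new_size - len(self.data)))

    def mstore(self, offset, value):
        """MSTORE: write `value` as a 32-byte big-endian word at `offset`."""
        self._expand(offset + 32)
        self.data[offset:offset + 32] = value.to_bytes(32, "big")

    def mstore8(self, offset, value):
        """MSTORE8: write only the least significant byte of `value`."""
        self._expand(offset + 1)
        self.data[offset] = value & 0xFF

    def mload(self, offset):
        """MLOAD: read 32 bytes starting at `offset` as an integer."""
        self._expand(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")


mem = Memory()
mem.mstore(0, 0xAB)      # a full 32-byte write; 0xAB lands in byte 31
mem.mstore8(32, 0x1122)  # only the low byte 0x22 is stored, at byte 32
print(mem.data.hex())
print(hex(mem.mload(0)))
```

Note that values are stored big-endian, so a small number written with MSTORE ends up at the far end of its 32-byte word, while MSTORE8 places its byte exactly at the given location.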

EVM Playground

The EVM Playground helps solidify our understanding of what the three opcodes do and how memory locations work. Click Run and then the arrow in the upper right to step through the opcodes and see how the stack and memory change. A comment above each opcode describes what it does.

3.png

When stepping through in the debugger, the first thing you may notice is what happens when we use MSTORE8 to write the single byte 0x22 to memory location 32 (0x20):

4.png

Memory changes from

5.png

to

6.png

You may notice something strange. I only added 1 byte. Why are there so many zeros?

Memory expansion

When a contract writes to memory, it pays gas for the number of bytes written. If the write touches an area of memory that has not been used before, there is an additional memory expansion cost for using it for the first time.

Memory expands in increments of 32 bytes (256 bits) when writing to a previously untouched memory space. Memory cost scales linearly for the first 724 bytes and quadratically after that. (This follows from the memory gas cost formula in the Ethereum Yellow Paper, equation 326:

7.png

Here a is the number of 32-byte words of memory in use, determined by the highest memory location touched during the contract call. Expanding memory costs the difference in this value before and after the expansion. For 1024 bytes of memory, a = 32.)

Our memory was 32 bytes before we wrote 1 byte at location 32. At that point we started writing to untouched memory, so memory was expanded by another 32-byte increment to 64 bytes. All locations in memory are initially zero, which is why the second 32-byte word reads 2200000000000000000000000000000000000000000000000000000000000000.
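The expansion cost can be sketched numerically. The snippet below assumes the Yellow Paper form of the memory cost, G_memory * a + floor(a^2 / 512) with G_memory = 3 gas per word; the function names are illustrative, and the result covers only the memory component, not the static cost of the opcode doing the write.

```python
G_MEMORY = 3  # gas per 32-byte word of memory (Yellow Paper fee schedule)


def words(num_bytes):
    """Number of 32-byte words needed to cover `num_bytes` of memory."""
    return (num_bytes + 31) // 32


def memory_cost(num_bytes):
    """Total memory cost for a memory of `num_bytes`: linear plus quadratic term."""
    a = words(num_bytes)
    return G_MEMORY * a + a * a // 512


def expansion_cost(old_bytes, new_bytes):
    """Extra gas charged when memory grows from `old_bytes` to `new_bytes`."""
    return max(0, memory_cost(new_bytes) - memory_cost(old_bytes))


print(memory_cost(1024))       # a = 32 words: 3 * 32 + 1024 // 512 = 98
print(expansion_cost(32, 33))  # growing from 1 word to 2 words: 6 - 3 = 3
print(memory_cost(22 * 32), memory_cost(23 * 32))  # quadratic term first appears at 23 words
```

The last line illustrates why the cost is described as linear at first: the quadratic term only contributes once a^2 reaches 512, which happens at 23 words.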

Memory is a byte array

The second thing you may notice in the debugger happens when we run MLOAD from memory location 33 (0x21). The following value is returned to the call stack.

3300000000000000000000000000000000000000000000000000000000000000

Memory reads can start at a location that is not a multiple of 32.

Memory is a byte array, which means reads (and writes) can start from any memory location. We are not limited to multiples of 32. Memory is linear and can be addressed at the byte level. Memory can only be newly created inside functions. It can hold either a newly instantiated complex type such as an array or struct (for example, via new int[…]) or a copy of a storage-referenced variable.
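A short sketch makes the unaligned read concrete. It assumes the playground wrote 0x22 at location 32 and 0x33 at location 33 (the latter is inferred from the MLOAD result above), and it uses a pre-sized 96-byte buffer so the read from location 33, which spans bytes 33 to 64, stays in range. The `mload` helper is illustrative.

```python
memory = bytearray(96)  # three zeroed 32-byte words
memory[32] = 0x22       # MSTORE8 at location 32 (0x20)
memory[33] = 0x33       # MSTORE8 at location 33 (0x21)


def mload(offset):
    """Return the 32 bytes starting at `offset` as a hex string."""
    return bytes(memory[offset:offset + 32]).hex()


print(mload(32))  # begins with 2233: an aligned read of the second word
print(mload(33))  # begins with 33: the same data, shifted by one byte
```

The two reads overlap by 31 bytes, which is only possible because memory is addressed per byte rather than per word.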

Now that we have some idea of the data structure, let’s look at the free memory pointer.

Free memory pointer

The free memory pointer is simply a pointer to where free memory starts. It lets the smart contract keep track of which memory locations have been written to and which have not, which protects against the contract overwriting memory that has been allocated to another variable. When a variable is written to memory, the contract first reads the free memory pointer to determine where the data should be stored. It then updates the free memory pointer by adding the amount of data to be written. Adding these two values yields the location where the new free memory begins.

current free memory pointer + size of the data in bytes = new free memory pointer
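This scheme is what is often called a bump allocator. The sketch below is a simplified model with illustrative names (`BumpAllocator`, `allocate`); it keeps the pointer in a Python attribute rather than in memory, and starts it at 0x80, the initial value Solidity uses as described in the next section.

```python
class BumpAllocator:
    """Toy model of Solidity's free memory pointer."""

    def __init__(self, start):
        self.free_pointer = start

    def allocate(self, size):
        """Reserve `size` bytes and return the location where they start."""
        location = self.free_pointer
        # free memory pointer + size of the data = new free memory pointer
        self.free_pointer = location + size
        return location


allocator = BumpAllocator(start=0x80)
first = allocator.allocate(96)   # three 32-byte words
second = allocator.allocate(32)  # one 32-byte word
print(hex(first), hex(second), hex(allocator.free_pointer))
# 0x80 0xe0 0x100
```

Because the pointer only ever moves forward, two allocations can never overlap, which is exactly the guarantee described above.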

Bytecode

As mentioned earlier, the free memory pointer is defined at the start of the runtime bytecode by these five bytes, which encode three instructions.

8.png

These opcodes declare that the free memory pointer is located in memory at byte 0x40 (64 in decimal) and has a value of 0x80 (128 in decimal).

Solidity’s memory layout reserves four 32-byte slots:

  • 0x00 - 0x3f (64 bytes): scratch space, which can be used between statements (i.e. within inline assembly) and by hashing methods.
  • 0x40 - 0x5f (32 bytes): free memory pointer, i.e. the currently allocated memory size and the start of free memory, initialized to 0x80.
  • 0x60 - 0x7f (32 bytes): zero slot, which is used as the initial value for dynamic memory arrays and should never be written to.

As you can see, 0x40 is the predefined location of the free memory pointer. The value 0x80 is simply the first memory byte available for writing after the four reserved 32-byte slots.
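To connect the five bytes to the instructions they encode, here is a tiny disassembler sketch. It assumes the bytes are 0x6080604052, which is how the sequence PUSH1 0x80, PUSH1 0x40, MSTORE is encoded, and it only names the three memory opcodes covered in this article; anything else is reported as unknown.

```python
# Opcode values for the three memory instructions discussed in this article.
OPCODES = {0x51: "MLOAD", 0x52: "MSTORE", 0x53: "MSTORE8"}


def disassemble(code):
    """Turn raw bytecode into a list of readable instructions."""
    instructions = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 bytes of immediate data.
            n = op - 0x5F
            immediate = code[i + 1:i + 1 + n]
            instructions.append(f"PUSH{n} 0x{immediate.hex()}")
            i += 1 + n
        else:
            instructions.append(OPCODES.get(op, f"UNKNOWN(0x{op:02x})"))
            i += 1
    return instructions


print(disassemble(bytes.fromhex("6080604052")))
# ['PUSH1 0x80', 'PUSH1 0x40', 'MSTORE']
```

Only three of the five bytes are opcodes; the other two (0x80 and 0x40) are the data carried by the two PUSH1 instructions.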

Memory in the contract

To consolidate what we’ve learned so far, let’s look at how memory and the free memory pointer are updated in Solidity code.

We create a MemoryLane contract for demonstration purposes. Its memoryLane() function defines two arrays of length 5 and 2, and assigns the uint256 value 1 to b[0].

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.3;

contract MemoryLane {

    function memoryLane() public pure {
        bytes32[5] memory a;
        bytes32[2] memory b;
        b[0] = bytes32(uint256(1));
    }
}

To see the details of how the contract code executes in the EVM, copy it into the Remix IDE, then compile and deploy the contract. After calling memoryLane(), enter debug mode to step through the opcodes (see https://remix-ide.readthedocs.io/en/latest/tutorial_debug.html).

A simplified version of the opcodes has been extracted into the EVM Playground, where you can step through each opcode and its comments (https://noxx.substack.com/p/EVM-deep-dives-the-path-to-shadowy-d6b#:~:text=version%20into%20an-,EVM%20playground,-and%20will%20run).

The opcodes are broken into six sections, which we will read in turn. JUMPs and opcodes unrelated to memory have been removed, and comments added to make it easier to see what is going on.

1) free memory pointer initialization (lines 1-15 of the EVM Playground opcode)

////////////////////////////////////////
// Free Memory Pointer Initialisation //
////////////////////////////////////////

// value to store for free memory pointer 0x80 = 128 in decimal
PUSH1 0x80
// location for free memory pointer 0x40 = 64 in decimal
PUSH1 0x40
MSTORE

// jump location (required to prevent stack underflow) 
PUSH2 0xffff

First, 0x80 (128 in decimal), the value specified by Solidity’s memory layout, is pushed onto the stack. There is nothing in memory yet.

9.png

Next, the free memory pointer location 0x40 (64 in decimal), also determined by Solidity’s memory layout, is pushed onto the stack.

10.png

Finally, we call MSTORE, which pops the first item off the stack (0x40) to determine where to write in memory, and the second item (0x80) as the value to write. This leaves an empty stack, but part of memory is now populated. Memory is displayed as hexadecimal characters, where each character represents 4 bits. For example, there are 192 hexadecimal characters in memory, which means we have 96 bytes (1 byte = 8 bits = 2 hexadecimal characters). Looking back at Solidity’s memory layout, the first 64 bytes are allocated as scratch space and the next 32 bytes hold the free memory pointer.

11.png

2) memory allocation for variable “a” and free memory pointer update (lines 16-34 of the EVM Playground)

//////////////////////////////////////////////////////////////////
// Memory Allocation Variable "a" & Free Memory Pointer Update ///
//////////////////////////////////////////////////////////////////

// load free memory pointer
PUSH1 0x40
MLOAD

// duplicate free memory pointer
DUP1
// 0xa0 = 160 in decimal, 32 * 5 = 160 first array is length 5
PUSH1 0xa0
// free memory pointer (0x80) + space for array (0xa0) = new free memory pointer
ADD
// Save this new value 0x120 to the free memory location
PUSH1 0x40
MSTORE

For the remaining sections, we’ll jump to the end state of each section and give a brief overview of what happened.

First, memory is allocated for the variable “a” (bytes32[5]) and the free memory pointer is updated. The compiler determines how much space is required from the array length and the default array element size. Elements in memory arrays in Solidity always occupy multiples of 32 bytes (this is true even for bytes1[], but not for bytes and string). The memory to allocate is therefore 5 * 32 bytes, which is 160 in decimal or 0xa0 in hexadecimal. We can see it pushed onto the stack and added to the current free memory pointer 0x80 (128 in decimal) to get the new free memory pointer value. The result is 0x120 (288 = 128 + 160 in decimal), and we can see that it has been written to the free memory pointer location. The stack still holds the memory location of the variable “a” (0x80) so that it can be referenced later when needed. 0xffff represents a JUMP (unconditional jump) location, which can be ignored because it is not related to memory operations.
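The end state of sections 1 and 2 can be checked with a very small interpreter. This is a sketch that supports only the handful of opcodes used in those two listings, with opcodes written as Python tuples instead of bytecode; the `run` function and its tuple format are illustrative.

```python
def run(program):
    """Execute a list of (opcode, *immediates) tuples; return (stack, memory)."""
    stack, memory = [], bytearray()

    def touch(end):
        # Expand memory in 32-byte words so that byte `end - 1` exists.
        if end > len(memory):
            memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))

    for op, *args in program:
        if op in ("PUSH1", "PUSH2"):
            stack.append(args[0])
        elif op == "DUP1":
            stack.append(stack[-1])
        elif op == "ADD":
            stack.append((stack.pop() + stack.pop()) % 2**256)
        elif op == "MLOAD":
            offset = stack.pop()
            touch(offset + 32)
            stack.append(int.from_bytes(memory[offset:offset + 32], "big"))
        elif op == "MSTORE":
            offset, value = stack.pop(), stack.pop()  # top of stack is the offset
            touch(offset + 32)
            memory[offset:offset + 32] = value.to_bytes(32, "big")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return stack, memory


program = [
    # 1) free memory pointer initialisation
    ("PUSH1", 0x80), ("PUSH1", 0x40), ("MSTORE",),
    ("PUSH2", 0xFFFF),
    # 2) allocation for "a" and free memory pointer update
    ("PUSH1", 0x40), ("MLOAD",), ("DUP1",),
    ("PUSH1", 0xA0), ("ADD",),
    ("PUSH1", 0x40), ("MSTORE",),
]
stack, memory = run(program)
print([hex(item) for item in stack])                  # bottom to top
print(hex(int.from_bytes(memory[0x40:0x60], "big")))  # free memory pointer
```

The stack is printed bottom to top, so the last entry is the item a debugger would show at the top: the location of “a”, sitting above the simulated jump location.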

12.png

3) memory initialization of variable “a” (lines 35-95 of the EVM Playground)

/////////////////////////////////////////
// Memory Initialisation Variable "a" ///
/////////////////////////////////////////
// duplicate 0x80
DUP1
// push 0x05 = 5 in decimal (array length)
PUSH1 0x05
// Swap the top 2 items on the stack in this case 0x05 and 0x80
SWAP1
// push 0x20 = 32 in decimal (array item size)
PUSH1 0x20
// Duplicate the 3rd item on the stack in this case 0x05 to the top of the stack
DUP3
// 0x05 * 0x20 = 5 * 32 in decimal = 160 on top of the stack (size of array in bytes)
MUL
// Duplicate 0xa0 = 160 in decimal
DUP1
// Returns size of calldata in bytes, currently just the function signature = 0x04 or 4 in decimal
CALLDATASIZE
// duplicate 4th item on stack (0x80)
DUP4
// 0x80 (byte offset in memory where the result will be copied), 0x04 (byte offset in the calldata to copy from), 0xa0 (byte size to copy)
// copying 0xa0 bytes from calldata offset 0x04, which is past the end of our 4 bytes of calldata, yields 160 zero bytes stored at the location the free memory pointer previously pointed to (0x80)
// this effectively initialises our array in memory
CALLDATACOPY
// The remaining lines in this section manipulate the stack to ensure we have the memory location of variable "a" and removes any items that are no longer needed
// duplicate 0xa0
DUP1
// duplicate 0x80
DUP3
// new free memory pointer as before
ADD
// swap 1st (0x120) item on the stack and 3rd (0x80)
SWAP2
// pop top item off stack (0x80)
POP
// pop top item off stack (0xa0)
POP
// Swap top 2 items 0x120 & 0x05
SWAP1
// pop top item off stack (0x05)
POP
// pop top item off stack (0x120)
POP
// swap top 2 items 0x80 & 0xb6 (jump location)
SWAP1
// simulating a JUMP remove the top item off stack with POP
POP
// Simulated jump location
PUSH2 0xffff
// Simulated jump location
PUSH2 0xffff
// simulating a JUMP, remove the top item off stack with POP
POP

Now that the memory has been allocated and the free memory pointer has been updated, we need to initialize the memory space for the variable “a”. Since the variable is declared but not assigned, it is initialized to zero values.

The EVM does this with the CALLDATACOPY opcode, which takes three inputs:

  • memoryOffset / destOffset (the memory location to which the data is copied)
  • calldataOffset / offset (the byte offset in the calldata to copy from)
  • size / length (the number of bytes to copy)

Expression:

memory[destOffset : destOffset + length] = msg.data[offset : offset + length]

In this example, memoryOffset (destOffset) is the memory location of the variable “a” (0x80). calldataOffset (offset) is the size of the actual calldata: because the copy starts at the end of the calldata, there is no calldata to copy and the memory is filled with zeros. Finally, the size passed in is 0xa0 (160 in decimal).

We can see that our memory has expanded to 288 bytes (this includes the zero slot), and the stack once again holds the memory location of the variable and the JUMP location.
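The zero-filling trick relies on CALLDATACOPY padding with zeros whenever it reads past the end of the calldata. A minimal sketch of that behaviour follows; the `calldatacopy` function is illustrative, and the 4-byte selector `deadbeef` is a placeholder, not the real selector of memoryLane().

```python
def calldatacopy(memory, calldata, dest_offset, offset, length):
    """Copy `length` bytes of calldata into memory, zero-padding past the end."""
    end = dest_offset + length
    if length and end > len(memory):
        # Expand memory in 32-byte words to fit the destination range.
        memory.extend(bytes((end + 31) // 32 * 32 - len(memory)))
    chunk = calldata[offset:offset + length]
    memory[dest_offset:end] = chunk + bytes(length - len(chunk))


calldata = bytes.fromhex("deadbeef")  # placeholder 4-byte function selector
memory = bytearray(96)                # memory after the free memory pointer is set up

# Initialise "a": copy 0xa0 bytes starting at the end of the calldata.
calldatacopy(memory, calldata, 0x80, len(calldata), 0xA0)
print(len(memory))                          # 288
print(memory[0x80:0x120] == bytes(0xA0))    # True: 160 zero bytes

# A copy that only partly overlaps the calldata is padded with zeros.
scratch = bytearray(32)
calldatacopy(scratch, calldata, 0, 2, 4)
print(scratch[:4].hex())                    # beef0000
```

Using CALLDATASIZE as the source offset guarantees the read range lies entirely outside the calldata, so every copied byte is zero regardless of what the caller sent.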

13.png

4) memory allocation for variable “b” and free memory pointer update (lines 96-112 of the EVM Playground)

/////////////////////////////////////////////////////////////////
// Memory Allocation Variable "b" & Free Memory Pointer Update //
/////////////////////////////////////////////////////////////////

// free memory pointer load in 
PUSH1 0x40
MLOAD
// duplicate free memory pointer (0x120)
DUP1
// 0x40 = 64 in decimal, 32 * 2 = 64 second array is length 2
PUSH1 0x40
// free memory pointer (0x120) + space for array (0x40) = new free memory pointer
ADD
// save new free memory pointer value at free memory location 0x40
PUSH1 0x40
MSTORE

This is the same as the memory allocation and free memory pointer update for variable “a”, only this time for “bytes32[2] memory b”. The free memory pointer is updated to 0x160 (352 in decimal), which equals the previous free memory pointer of 288 plus the size of the new variable, 64 bytes. The free memory pointer has been updated to 0x160 in memory, and the stack now holds the memory location of the variable “b” (0x120).

14.png

5) memory initialization of variable “b” (lines 113-162 of the EVM Playground)

////////////////////////////////////////
// Memory Initialisation Variable "b" //
////////////////////////////////////////
// duplicate 0x120 (memory start location for variable "b")
DUP1
// 0x02 = 2 in decimal = array length
PUSH1 0x02
// swap top 2 items 0x02 & 0x120
SWAP1
// 0x20 = 32 in decimal (array item size in bytes)
PUSH1 0x20
// duplicate 3rd item on the stack 0x02
DUP3
// 0x02 * 0x20 = 0x40 = 64 (amount of bytes in memory to initialise)
MUL
// duplicate 0x40 (size of the array in bytes)
DUP1
// same as before 4 bytes for function signature 0x04
CALLDATASIZE
// duplicate 4th item on the stack = 0x120
DUP4
// 0x120 (byte offset in the memory where the result will be copied.), 0x04 (byte offset in the calldata to copy.), 0x40 (byte size to copy.)
CALLDATACOPY
// The remaining lines in this section manipulate the stack to ensure we have the memory location of variable "b" and remove any items that are no longer needed
//duplicate the top of the stack 0x40
DUP1
// duplicate 3rd item on the stack 0x120
DUP3
// add together yields free memory pointer value
ADD
// swap 0x160 & 0x120
SWAP2
// pop top item off stack (0x120)
POP
// pop top item off stack (0x40)
POP
// swap 0x160 & 0x02
SWAP1
// pop top item off stack (0x02)
POP
// pop top item off stack (0x160)
POP
// jump location to top of the stack 0xbe
SWAP1
// simulate jump pop jump location off stack
POP

This is the same memory initialization as for the variable “a”. Memory has now expanded to 352 bytes, and the memory locations of both variables are still stored on the stack.

15.png

6) b[0] assignment (lines 163-207 of the EVM Playground)

//////////////////////////
// Assign Value to b[0] //
//////////////////////////
// push 0x01, value to add b[0]
PUSH1 0x01
// push 0x00
PUSH1 0x00
// left shift operation no shift, first input is 0 
SHL
// duplicate 2nd item on stack (0x120)
DUP2
// push 0x00 = [0] where in the array should this item go
PUSH1 0x00
// push 0x02 = 2 in decimal, the length of the array
PUSH1 0x02
// duplicate 2nd item on stack (0x00)
DUP2
// 0x00 < 0x02 = true = 0x01 (check the user is not trying to store a value at a location that doesn't exist in the array)
LT
// jump location
PUSH2 0x00d7
// 2 POPs since this is a JUMPI (checking if LT returned true or false)
// simulate JUMPI 
POP
// simulate JUMPI 
POP
// push 0x20 (32 bytes array item size)
PUSH1 0x20
// 0x20 * 0x00 = 0x00 = 0 in decimal (array item size * index to determine byte offset)
MUL
// 0x00 + 0x120
ADD
// duplicate 2nd on stack 0x01 (value for b[0])
DUP2
// duplicate 2nd on stack 0x120 (memory location for b[])
DUP2
// store 0x01 at memory location 0x120
MSTORE
// clean up stack
POP
POP
POP
POP

Finally, we assign a value to index 0 of the array “b”. The code states that b[0] should have a value of 1, so the value 0x01 is pushed onto the stack. A left shift follows, but the shift amount is 0, so our value does not change. Next, the array index to write to (0x00) is pushed onto the stack and checked to confirm it is less than the length of the array (0x02). If it is not, execution jumps to a different part of the bytecode that handles this error state. The MUL (multiplication) and ADD (addition) opcodes are then used to determine where in memory the value must be written to correspond to the correct array index.

0x20(32 in base 10) * 0x00(0 in base 10) = 0x00

Remember that each element of an in-memory array occupies 32 bytes, so this value is the byte offset of the array index from the start of the array. Since we are writing to index 0, there is no offset: we write from 0x00.

0x00 + 0x120 = 0x120 (288 in decimal)

ADD adds this offset to the memory location of the variable “b”. Since the offset is 0, the data is written directly at the start of the allocated memory. Finally, MSTORE stores the value 0x01 at memory location 0x120.
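The bounds check and address arithmetic can be summarised in a few lines. This is a sketch with illustrative names; raising IndexError stands in for the jump to the error-handling part of the bytecode.

```python
WORD = 0x20  # each element of a fixed-size memory array occupies 32 bytes


def element_location(base, index, length):
    """Memory location of array[index] for an array starting at `base`."""
    if not index < length:
        # Mirrors the LT + JUMPI check in the opcodes above.
        raise IndexError(f"index {index} out of bounds for length {length}")
    return base + WORD * index  # the MUL and ADD steps


print(hex(element_location(0x120, 0, 2)))  # 0x120, where b[0] lives
print(hex(element_location(0x120, 1, 2)))  # 0x140, where b[1] would live
```

For b[0] the multiplication contributes nothing, which is why the write lands exactly at the start of the array’s allocation.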

The following figure shows the state of the system at the end of function execution. All stack items have been popped. Note that in Remix a few items are actually left on the stack, a JUMP location and the function signature, but they are unrelated to memory operations and were therefore omitted from the EVM Playground.

Memory has been updated to include b[0] = 1: in the third-to-last row of our memory, a 0 value has become 1. You can verify that the value is in the correct memory location, since b[0] should occupy locations 0x120-0x13f (the 289th to 320th bytes).

16.png

We now have a working understanding of how contract memory works, which will serve us well whenever we need to write or read low-level code. The next time you skim through contract opcodes and see certain memory locations keep popping up (0x40), you will know exactly what they mean.

In Part 3 of this series, we’ll explore how contract storage works in more depth, learn about slot packing, and uncover the secrets of storage slots.
