EVM Deep Dive Part 1

Introduction

In the smart contract world, the Ethereum Virtual Machine (EVM) and its algorithms and data structures are first principles. Every smart contract we write is built on this foundation. Whether you want to be a great Solidity smart contract developer or a security researcher, you need an in-depth understanding of the EVM.

In this series, we present a translation of Noxx’s articles (https://noxx.substack.com/) and explore the basics of the EVM.

Basics: Solidity → bytecode → opcode

Before reading this article, you should know the basics of smart contracts and how to deploy smart contract code to the Ethereum chain. As we all know, Solidity code must be compiled into bytecode before it is deployed to the Ethereum network, and the EVM then executes that bytecode. This article focuses on the compiled bytecode and how the EVM executes it.

Once compiled and deployed, the bytecode represents the entire contract, which can contain multiple callable functions. So how does the EVM know which part of the bytecode corresponds to which function? We will use a Solidity smart contract, together with its bytecode and opcodes, to demonstrate how the EVM selects the right function in the bytecode when executing a call.

We use Remix, the online Solidity IDE, to compile the Storage contract.


// SPDX-License-Identifier: GPL-3.0

pragma solidity >=0.7.0 <0.9.0;

/**
 * @title Storage
 * @dev Store & retrieve value in a variable
 */
contract Storage {

    uint256 number;

    /**
     * @dev Store value in variable
     * @param num value to store
     */
    function store(uint256 num) public {
        number = num;
    }

    /**
     * @dev Return value
     * @return value of 'number'
     */
    function retrieve() public view returns (uint256){
        return number;
    }
}

There are two functions in this contract, store() and retrieve(), and the EVM needs to determine which one is being called when we make a function call. We can view the compiled bytecode of the entire contract in Remix.

The part of the bytecode we need to focus on is the function selection logic, which the EVM uses to determine which function is being called. Each byte in it corresponds to an EVM opcode or an input value.

We can look up the list of EVM opcodes at ethervm.io. An opcode is 1 byte long, which allows for 256 different opcodes, but the EVM uses only 140 of them.

Here we parse the bytecode into its corresponding opcodes. The EVM executes these opcodes sequentially on the call stack.
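
To make the parsing step concrete, here is a minimal Python sketch of a disassembler. It is an illustration only: the byte string is assembled by hand from the opcodes described in the walkthrough later in this article, the opcode table covers just those instructions, and the names OPCODES, DISPATCHER and disassemble are made up for this example. Offsets are relative to the start of this snippet, not to the full contract bytecode.

```python
# Opcode table limited to the instructions used in the walkthrough.
# Each entry is (mnemonic, number of immediate bytes that follow the opcode).
OPCODES = {
    0x14: ("EQ", 0),
    0x1C: ("SHR", 0),
    0x35: ("CALLDATALOAD", 0),
    0x57: ("JUMPI", 0),
    0x5B: ("JUMPDEST", 0),
    0x60: ("PUSH1", 1),
    0x61: ("PUSH2", 2),
    0x63: ("PUSH4", 4),
    0x80: ("DUP1", 0),
}

# Assumption: dispatcher bytes assembled from the opcodes the article walks through.
DISPATCHER = bytes.fromhex(
    "60003560e01c80632e64cec11461003b5780636057361d1461005957"
)


def disassemble(code: bytes) -> list:
    """Return (offset, text) pairs, one per instruction."""
    out = []
    pc = 0
    while pc < len(code):
        name, width = OPCODES.get(code[pc], (f"UNKNOWN_0x{code[pc]:02x}", 0))
        operand = code[pc + 1 : pc + 1 + width]
        text = f"{name} 0x{operand.hex()}" if width else name
        out.append((pc, text))
        pc += 1 + width  # skip the opcode byte and its immediate data
    return out


listing = disassemble(DISPATCHER)
for offset, text in listing:
    print(f"{offset:02d}  {text}")
```

Each PUSH instruction consumes the bytes that follow it as data, which is why the offsets in the listing do not increase by one per instruction.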

Smart contract function call

Before delving into the opcodes, we need a quick look at how functions in a contract are called. A call carries ABI-encoded data, and Solidity provides the following functions for producing it:

  • abi.encode(…) returns (bytes memory): ABI-encodes the given arguments.
  • abi.encodePacked(…) returns (bytes memory): performs tightly packed encoding of the given arguments.
  • abi.encodeWithSelector(bytes4 selector, …) returns (bytes memory): ABI-encodes the given arguments and prepends the given 4-byte function selector.
  • abi.encodeWithSignature(string memory signature, …) returns (bytes memory): equivalent to abi.encodeWithSelector(bytes4(keccak256(bytes(signature))), …).
  • abi.encodeCall(function functionPointer, (…)) returns (bytes memory): ABI-encodes a call to functionPointer with the arguments given in the tuple. It performs a full type check to ensure that the types match the function signature. The result equals abi.encodeWithSelector(functionPointer.selector, (…)).

Here we use the fourth function as an example, calling store() and passing in the argument 10.

Encoding this call with abi.encodeWithSignature("store(uint256)", 10) produces the following data:

0x6057361d000000000000000000000000000000000000000000000000000000000000000a

This data is the encoded function call: the function selector followed by the argument.

We can use an online tool (https://emn178.github.io/online-tools/keccak_256.html) to view the Keccak-256 hashes of the signatures store(uint256) and retrieve(). The first 4 bytes of each hash form the function selector.

You can also do a reverse lookup in the Ethereum function signature database (https://www.4byte.directory/signatures/).

Returning to the encoded data above, the first 4 bytes correspond to store(uint256). The remaining 32 bytes hold the hexadecimal value 0xa, which is the uint256 value 10 that we passed in when calling the function.

From this we can conclude that abi.encodeWithSignature() encodes the call into 36 bytes of data. The first 4 bytes are the function selector, which tells the EVM which function we are calling, and the last 32 bytes are the argument we pass to that function.
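
The 36-byte layout can be reproduced in a few lines of Python. This is a sketch, not how Solidity implements abi.encodeWithSignature: the selector is copied from the article rather than computed, because Python's hashlib.sha3_256 is standardized SHA-3, which is not the same function as Keccak-256. The helper name encode_store_call is hypothetical.

```python
# Selector for store(uint256), copied from the article (not recomputed here).
STORE_SELECTOR = bytes.fromhex("6057361d")


def encode_store_call(num: int) -> bytes:
    """Selector followed by the argument as a 32-byte big-endian word."""
    return STORE_SELECTOR + num.to_bytes(32, "big")


calldata = encode_store_call(10)

# Split the calldata back into its two parts.
selector, argument = calldata[:4], calldata[4:]

print("0x" + calldata.hex())
print(len(calldata), "bytes")
print("selector:", selector.hex())
print("argument:", int.from_bytes(argument, "big"))
```

A uint256 always occupies a full 32-byte word, so even a small value such as 10 is left-padded with zeros.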

Opcode and call stack

Now that you have a general idea of how function calls work in smart contracts, we’ll look at what each opcode does and how it affects the call stack. If you’re not familiar with how the stack data structure works, this video will get you started: https://www.youtube.com/watch?v=fnz5o9s9pru

We decompose the bytecode into corresponding opcodes and analyze them in turn.

  • PUSH1 pushes a 1-byte value onto the stack. Here it tells the EVM to push the next byte of data, 0x00 (decimal 0), onto the stack
  • Next is CALLDATALOAD, which reads a 32-byte value from the message data. It pops the “input” value from the stack and uses it as an offset into the calldata. The value pushed is msg.data[i:i+32], where “i” is the input value. A stack item is 32 bytes, but our calldata is 36 bytes, so this operation ensures that only 32 bytes are pushed onto the stack while still letting us reach any part of the calldata through the offset.

The current input value is 0 (the value popped from the stack is the 0 pushed by the preceding PUSH1), so there is no offset and the first 32 bytes of calldata are pushed onto the call stack.

Remember the 36 bytes of calldata from earlier? Loading 32 bytes from offset 0 means the last 4 bytes, “0000000a”, are cut off. To read the uint256 argument in full, we need an offset of 4 to skip the function selector.

  • The second PUSH1 pushes the hexadecimal value 0xe0, which is decimal 224. As mentioned above, the function selector is 4 bytes, or 32 bits. The calldata word we loaded is 32 bytes, or 256 bits, and 256 - 32 = 224 is exactly the shift we need
  • SHR is the shift-right instruction. It takes the first item from the stack, 224, as the number of bits to shift by, and the second item (0x6057361d0…0) as the value to shift. After this operation, the 4-byte function selector is on the call stack

If you’re not familiar with how bit shifting works, check out this video: https://www.youtube.com/watch?v=fdkuq38h2jk&t=176s

  • The next opcode, DUP1, duplicates the value at the top of the stack
  • PUSH4 pushes the 4-byte function selector of retrieve() (0x2e64cec1) onto the call stack

If you’re curious where this value comes from, remember that the Solidity code was compiled into bytecode. The compiler knows every function name and parameter type from the Solidity source, so it can embed each selector in the bytecode.

  • EQ pops two values off the stack, 0x2e64cec1 and 0x6057361d in this case, and checks whether they are equal. If they are, 1 is pushed back onto the stack, otherwise 0
  • PUSH2 pushes the 2-byte hexadecimal value 0x003b, decimal 59, onto the call stack

The EVM keeps a program counter that specifies where in the bytecode the next instruction to execute is located. Here, 59 is the position at which the bytecode of retrieve() starts.

  • JUMPI stands for “jump if the condition is true” and pops two values from the stack as input. The first, 59, is the jump destination, and the second, 0, is the boolean condition that decides whether to jump, where 1 is true and 0 is false

If the condition is true, the program counter is set to the destination and execution jumps to that location. In our case the condition is false, so no jump happens and execution continues with the next instruction.

  • DUP1 again
  • PUSH4 pushes the 4-byte function selector of store(uint256) (0x6057361d) onto the call stack
  • EQ again, but this time the result is true because the two selectors match
  • PUSH2 pushes the 2-byte hexadecimal value 0x0059, decimal 89, which is the program counter position of the store(uint256) bytecode
  • JUMPI is executed, this time with a true condition, so the jump is performed. The program counter is updated to 89, which moves execution to a different part of the bytecode. There must be a JUMPDEST opcode at this location; without it, the JUMPI operation fails

With the JUMPDEST in place, execution lands at the bytecode of store(uint256) and the function body runs from there. This contract has only two functions, but the same principle applies to contracts with more.
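
The first four steps of the walkthrough (PUSH1 0x00, CALLDATALOAD, PUSH1 0xe0, SHR) can be sketched in Python as follows. The calldataload helper is a hypothetical stand-in for the opcode; it models the EVM's behavior of reading bytes past the end of the calldata as zero.

```python
def calldataload(calldata: bytes, offset: int) -> int:
    """Read 32 bytes starting at offset; bytes past the end read as zero."""
    word = calldata[offset : offset + 32].ljust(32, b"\x00")
    return int.from_bytes(word, "big")


# Calldata for store(10): 4-byte selector plus one 32-byte argument.
calldata = bytes.fromhex("6057361d") + (10).to_bytes(32, "big")

first_word = calldataload(calldata, 0)  # selector plus the first 28 argument bytes
selector = first_word >> 0xE0           # SHR by 224 bits keeps only the top 4 bytes
argument = calldataload(calldata, 4)    # offset 4 skips the selector

print(hex(selector), argument)  # prints: 0x6057361d 10
```

Reading at offset 0 gives the dispatcher what it needs, while reading at offset 4 is what the function body would do to recover the full argument.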

This example shows how the EVM locates the bytecode of the function it needs to execute based on the call data. Put simply, the dispatcher is a series of “if statements”, one per function in the contract, each paired with a jump location.
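
Written out as ordinary code, that chain of comparisons might look like the sketch below. The function name dispatch is hypothetical, and the return values are the jump destinations from the walkthrough, not anything the EVM itself returns.

```python
RETRIEVE_SELECTOR = 0x2E64CEC1
STORE_SELECTOR = 0x6057361D


def dispatch(selector: int):
    """Return the jump destination for a selector, or None if nothing matches."""
    if selector == RETRIEVE_SELECTOR:  # DUP1, PUSH4, EQ, PUSH2 0x003b, JUMPI
        return 0x3B
    if selector == STORE_SELECTOR:     # DUP1, PUSH4, EQ, PUSH2 0x0059, JUMPI
        return 0x59
    return None  # no jump taken: execution continues past the last JUMPI


print(dispatch(0x6057361D), dispatch(0x2E64CEC1))  # prints: 89 59
```

A contract with more functions would simply have more of these comparisons, one per selector.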

EVM playground

The EVM Playground (https://www.EVM.codes/Playground) is a testing platform where we can load the bytecode we just walked through. You can watch the stack change interactively, and add a JUMPDEST to see what happens after the JUMPI.

The EVM Playground also helps us understand how the program counter works. Next to each instruction you can see a comment and the offset that represents its program counter position, and the calldata input appears in the left margin. After clicking Run, you can step through each opcode with the arrow in the upper right corner. For example, change the calldata to the retrieve() selector, 0x2e64cec1, to see how execution changes.
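
For readers who want to experiment offline, the same stepping can be approximated with a tiny interpreter. This sketch is a toy built on stated assumptions and is not the playground's implementation: it supports only the opcodes from the walkthrough, ignores gas, and pads the dispatcher with zero bytes (STOP) so that a JUMPDEST sits at offsets 59 and 89, standing in for the real function bodies. The name run_dispatcher is hypothetical.

```python
# Assumption: dispatcher bytes assembled from the opcodes in the walkthrough.
DISPATCHER = bytes.fromhex(
    "60003560e01c80632e64cec11461003b5780636057361d1461005957"
)


def run_dispatcher(code: bytes, calldata: bytes, max_steps: int = 100) -> list:
    """Execute the few opcodes the dispatcher uses; return the program counter trace."""
    stack, pc, trace = [], 0, []
    for _ in range(max_steps):
        if pc >= len(code):
            break
        op = code[pc]
        trace.append(pc)
        if op == 0x00:                    # STOP
            break
        elif 0x60 <= op <= 0x63:          # PUSH1..PUSH4: push the next n bytes
            n = op - 0x5F
            stack.append(int.from_bytes(code[pc + 1 : pc + 1 + n], "big"))
            pc += n
        elif op == 0x35:                  # CALLDATALOAD: 32 bytes, zero-padded
            off = stack.pop()
            word = calldata[off : off + 32].ljust(32, b"\x00")
            stack.append(int.from_bytes(word, "big"))
        elif op == 0x1C:                  # SHR: shift amount is on top of the stack
            shift, value = stack.pop(), stack.pop()
            stack.append(value >> shift)
        elif op == 0x80:                  # DUP1: duplicate the top item
            stack.append(stack[-1])
        elif op == 0x14:                  # EQ: push 1 if equal, else 0
            stack.append(int(stack.pop() == stack.pop()))
        elif op == 0x57:                  # JUMPI: destination on top, then condition
            dest, cond = stack.pop(), stack.pop()
            if cond:
                if dest >= len(code) or code[dest] != 0x5B:
                    raise ValueError("jump target is not a JUMPDEST")
                pc = dest
                continue
        elif op == 0x5B:                  # JUMPDEST: a marker, nothing to do
            pass
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
        pc += 1
    return trace


# Zero-filled code (0x00 is STOP) with the dispatcher at the start and a
# JUMPDEST placed at each of the two jump targets.
code = bytearray(91)
code[:28] = DISPATCHER
code[0x3B] = 0x5B
code[0x59] = 0x5B
code = bytes(code)

store_trace = run_dispatcher(code, bytes.fromhex("6057361d") + (10).to_bytes(32, "big"))
retrieve_trace = run_dispatcher(code, bytes.fromhex("2e64cec1"))

print("store(10):  ", store_trace)
print("retrieve(): ", retrieve_trace)
```

With the store(10) calldata the trace falls through the first JUMPI and jumps at the second, ending at offset 89; with the retrieve() selector it jumps at the first JUMPI and ends at offset 59.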

Stay tuned for EVM Deep Dive Part 2, which explores what contract memory is and how it works in the EVM.
