The Skyfall team at CertiK recently discovered multiple vulnerabilities in Rust-based RPC nodes in several blockchains, including Aptos, StarCoin, and Sui. As RPC nodes are a critical infrastructure component that connects dApps and underlying blockchains, their robustness is essential for seamless operations. Blockchain designers understand the importance of stable RPC services and therefore adopt memory-safe languages like Rust to mitigate common vulnerabilities that could disrupt RPC nodes.
Using memory-safe languages like Rust helps RPC nodes avoid many attacks based on memory corruption vulnerabilities. However, through recent audits, we have found that even memory-safe Rust implementations can be susceptible to certain security threats if not carefully designed and reviewed, thus compromising the availability of RPC services.
In this article, we will present our findings on a series of vulnerabilities through actual cases.
Role of Blockchain RPC Nodes
Remote Procedure Call (RPC) services in blockchains are core infrastructure components of Layer 1 blockchains. They provide important API frontends to users and serve as gateways to backend blockchain networks. However, blockchain RPC services differ from traditional RPC services in that they facilitate user interactions without requiring authentication. The continuous availability of these services is crucial, as any service interruption severely impacts the availability of the underlying blockchain.
Audit Perspective: Traditional RPC Servers vs. Blockchain RPC Servers
Audits of traditional RPC servers primarily focus on aspects such as input validation, authorization/authentication, Cross-Site Request Forgery/Server-Side Request Forgery (CSRF/SSRF), injection vulnerabilities (such as SQL injection, command injection), and information leakage.
However, the case is different for blockchain RPC servers. As long as transactions are signed, there is no need to authenticate the client initiating the request at the RPC layer. As the frontend of a blockchain, the primary goal of RPC services is to ensure their availability. If they fail, users cannot interact with the blockchain, hindering functionalities like querying on-chain data, submitting transactions, or deploying contracts.
Therefore, the most vulnerable aspect of blockchain RPC servers is “availability.” If the server goes down, users lose the ability to interact with the blockchain. More critically, some attacks can propagate across the chain, affecting a large number of nodes and even causing the entire network to collapse.
Why New Blockchains Adopt Memory-Safe RPC
Some well-known Layer 1 blockchains, such as Aptos and Sui, implement their RPC services in Rust, a memory-safe programming language. Thanks to its strong safety guarantees and strict compile-time checks, safe Rust protects programs from most memory corruption vulnerabilities, such as stack buffer overflows, null pointer dereferences, and use-after-free.
To further secure their codebases, developers need to adhere strictly to best practices, such as avoiding unsafe code. Placing #![forbid(unsafe_code)] at the crate root makes the compiler reject any unsafe block in that crate.
Examples of blockchain developers implementing Rust programming practices
To prevent integer overflow, developers often use functions such as checked_add, checked_sub, saturating_add, and saturating_sub instead of simple addition and subtraction (+, -). Resource exhaustion can be mitigated by setting appropriate timeouts, request size limits, and request item limits.
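The arithmetic and limit practices above can be sketched as follows. This is a minimal illustration, not code from any of the audited projects: the function names and the MAX_ITEMS_PER_REQUEST constant are hypothetical.

```rust
/// Hypothetical upper bound on the number of items one RPC request may ask for.
const MAX_ITEMS_PER_REQUEST: u64 = 100;

/// Adds two gas amounts. Returns `None` on overflow instead of panicking
/// (debug builds) or wrapping silently (release builds), as `+` would.
pub fn add_gas(used: u64, extra: u64) -> Option<u64> {
    used.checked_add(extra)
}

/// Subtracts a fee from a balance, clamping at zero rather than underflowing.
pub fn remaining_balance(balance: u64, fee: u64) -> u64 {
    balance.saturating_sub(fee)
}

/// Caps a client-supplied page size at the configured request item limit.
pub fn clamp_page_size(requested: u64) -> u64 {
    requested.min(MAX_ITEMS_PER_REQUEST)
}
```

Whether to use the checked or the saturating variant depends on the caller: checked_add forces the caller to handle the overflow case, while saturating_sub is suitable only where clamping is an acceptable result.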
Threats to memory-safe RPC in Layer 1 blockchains
Although there are no memory-unsafe vulnerabilities in the traditional sense, RPC nodes are exposed to inputs that are easy for attackers to manipulate. In memory-safe RPC implementations, there are several cases that can lead to denial of service. For example, memory amplification can deplete the memory of a service, and logic issues can introduce infinite loops. In addition, race conditions can pose threats, and concurrent operations can result in unexpected event sequences, leaving the system in an undefined state. Improper management of dependencies and third-party libraries can also introduce unknown vulnerabilities to the system.
In this article, our goal is to draw attention to more direct ways to trigger Rust runtime protections, resulting in self-termination of the service.
Explicit Rust panic: A method to directly terminate RPC services
Developers can intentionally or unintentionally introduce explicit panic code. This code is primarily used to handle unexpected or exceptional situations. Some common examples include:
assert!(): This macro is used when a condition must be satisfied. If the condition of the assertion fails, the program will panic, indicating a serious error in the code.
panic!(): This macro is invoked when the program encounters an unrecoverable error and cannot continue execution.
unreachable!(): This macro is used when a piece of code should not be executed. If this macro is called, it indicates a serious logical error.
unimplemented!() and todo!(): These macros are placeholders for functionality that has not yet been implemented. If reached, the program will crash.
unwrap(): This method is used on Option or Result types and will cause the program to panic if it encounters a None or an Err variant.
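To make the risk concrete, here is a minimal sketch of how unwrap() on client-controlled input becomes a remote crash, next to a variant that returns the error to the caller. Both functions are hypothetical and are not taken from any of the audited codebases.

```rust
use std::num::ParseIntError;

/// Parses a client-supplied limit. `unwrap()` panics on any non-numeric
/// input, so a single malformed request can terminate the thread or process.
pub fn parse_limit_unchecked(raw: &str) -> u32 {
    raw.parse::<u32>().unwrap()
}

/// Same parsing, but the failure is handed back to the caller, which can
/// translate it into an ordinary RPC error response.
pub fn parse_limit(raw: &str) -> Result<u32, ParseIntError> {
    raw.parse::<u32>()
}
```

The same reasoning applies to the other constructs in the list: each is safe only when the condition it guards cannot be influenced by a remote client.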
Vulnerability 1: Triggering assert! in the Move Verifier
The Aptos blockchain adopts the Move bytecode verifier, which performs reference safety analysis through abstract interpretation of the bytecode. The execute() function is part of the TransferFunctions trait implementation and simulates the execution of bytecode instructions in a basic block.
The task of the execute_inner() function is to interpret the current bytecode instruction and update the state accordingly. If we have reached the last instruction in the basic block, as indicated by index == last_index, the function will call assert!(self.stack.is_empty()) to ensure that the stack is empty. The intention behind this behavior is to ensure that all operations are balanced, which also means that there is a corresponding pop for each push.
In the normal execution flow, the stack is always balanced during the abstract interpretation process. The stack balance checker guarantees this by validating the bytecode before interpretation. However, once we expand our perspective to the scope of the abstract interpreter, we find that the stack balance assumption is not always valid.
Patch program for the analyze_function vulnerability in AbstractInterpreter
The core of the abstract interpreter is to simulate bytecode at the basic block level. In its initial implementation, when execute_block encountered an error, the analysis recorded the error and continued with the next block in the control flow graph. An error in one block could therefore leave the stack unbalanced. If execution continued in that state, the assert! check ran on a non-empty stack and caused a panic.
This gives attackers an opening. By crafting bytecode that triggers an error in execute_block(), an attacker can make execute() reach the assert! with a non-empty stack, so the assertion fails. The resulting panic terminates the RPC service and affects its availability.
The fix ensures that the entire analysis stops the first time execute_block returns an error, which avoids the later crash caused by continuing the analysis with an unbalanced stack. This change removes the panic and improves the robustness and security of the abstract interpreter.
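The difference between the two behaviors can be illustrated with a heavily simplified model. This is a sketch under stated assumptions, not the Move verifier's code: the Instr enum, the AnalysisError type, the depth-counter stack, and both analyze_* functions are hypothetical stand-ins.

```rust
/// Hypothetical, heavily simplified instruction set.
pub enum Instr {
    Push,
    Pop,
    /// Stands in for any instruction the transfer function rejects.
    Invalid,
}

#[derive(Debug, PartialEq)]
pub struct AnalysisError;

/// Interprets one basic block against an abstract stack that is shared
/// across blocks (modeled here as a depth counter). On error it returns
/// early and leaves `stack` unbalanced.
fn execute_block(block: &[Instr], stack: &mut usize) -> Result<(), AnalysisError> {
    for instr in block {
        match instr {
            Instr::Push => *stack += 1,
            Instr::Pop => *stack = (*stack).checked_sub(1).ok_or(AnalysisError)?,
            Instr::Invalid => return Err(AnalysisError),
        }
    }
    // Mirrors the invariant check at the last instruction of a block.
    assert!(*stack == 0, "stack must be empty at the end of a block");
    Ok(())
}

/// Pre-fix shape: record the error and keep going. A later block then
/// runs on the leftover stack and trips the assertion.
pub fn analyze_continue_on_error(blocks: &[Vec<Instr>]) -> Vec<AnalysisError> {
    let mut stack = 0;
    let mut errors = Vec::new();
    for block in blocks {
        if let Err(e) = execute_block(block, &mut stack) {
            errors.push(e);
        }
    }
    errors
}

/// Post-fix shape: stop the whole analysis at the first error.
pub fn analyze_stop_on_error(blocks: &[Vec<Instr>]) -> Result<(), AnalysisError> {
    let mut stack = 0;
    for block in blocks {
        execute_block(block, &mut stack)?;
    }
    Ok(())
}
```

With the input [[Push, Invalid], [Push, Pop]], the first block fails with one value left on the stack. The continue-on-error variant then panics at the end of the second block, while the stop-on-error variant returns Err without ever reaching the assertion.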
Vulnerability 2: Triggering panic! in StarCoin
The Starcoin blockchain maintains its own fork of the Move implementation. In this Move repo, there is a panic! in the constructor of the Struct type. If the provided StructDefinition has Native field information, this panic! is explicitly triggered.
Explicit panic! in the normalization routine for initializing structures
This risk arises when modules are re-published. If the module being published already exists in the data store, both the existing module and the attacker-controlled input module undergo module normalization. During this process, the normalized::Module::new function constructs module structures from the attacker-controlled input module, thereby triggering the panic!.
Preconditions of the normalization routine
This panic can be triggered by submitting a specially crafted payload from the client. Malicious actors can therefore disrupt the availability of RPC services.
Patch for the structure initialization panic
The Starcoin patch introduces new behavior for the Native case. Instead of panicking, it now returns an empty vector. This reduces the likelihood that user-submitted data triggers a panic.
Implicit Rust panic: An easily overlooked way to terminate RPC services
Explicit panics are easy to identify in source code, while implicit panics are more likely to be overlooked by developers. Implicit panics often occur when using APIs provided by the standard library or third-party libraries. Developers need to read and understand the API documentation thoroughly, otherwise their Rust programs may stop unexpectedly.
Implicit panic in BTreeMap
Let's take BTreeMap in the Rust standard library as an example. BTreeMap is a commonly used data structure that stores key-value pairs in a sorted B-tree. BTreeMap provides two methods for retrieving values by key: get(&self, key: &Q) and index(&self, key: &Q).
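The shape of this change can be sketched as follows. The types below are simplified stand-ins for the Move types named in the text, and the two functions are hypothetical; the actual Starcoin code differs in detail.

```rust
/// Simplified stand-in for the field information of a struct definition.
pub enum StructFieldInformation {
    Native,
    Declared(Vec<String>),
}

/// Simplified stand-in for a struct definition taken from a submitted module.
pub struct StructDefinition {
    pub field_information: StructFieldInformation,
}

/// Pre-patch shape: an attacker-supplied `Native` definition hits `panic!`.
pub fn fields_panicking(def: &StructDefinition) -> Vec<String> {
    match &def.field_information {
        StructFieldInformation::Native => panic!("native struct has no declared fields"),
        StructFieldInformation::Declared(fields) => fields.clone(),
    }
}

/// Post-patch shape: `Native` yields an empty field list instead of panicking.
pub fn fields_patched(def: &StructDefinition) -> Vec<String> {
    match &def.field_information {
        StructFieldInformation::Native => Vec::new(),
        StructFieldInformation::Declared(fields) => fields.clone(),
    }
}
```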
The method get(&self, key: &Q) retrieves the value by key and returns an Option. Option can be Some(&V), which returns a reference to the value if the key exists, or None if the key is not found in the BTreeMap.
On the other hand, index(&self, key: &Q) directly returns a reference to the value corresponding to the key. However, it carries a significant risk: if the key does not exist in the BTreeMap, it triggers an implicit panic. If not handled properly, the program may crash unexpectedly, posing a potential vulnerability.
In fact, the index(&self, key: &Q) method is the underlying implementation of the std::ops::Index trait. This trait provides convenient syntax for indexing operations in an immutable context (i.e., container[index]). Developers can directly use btree_map[key] to call the index(&self, key: &Q) method. However, they may overlook the fact that this usage panics if the key is not found, posing an implicit threat to the stability of the program.
Vulnerability 3: Triggering an Implicit panic in Sui RPC
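The contrast between the two lookups can be shown in a few lines. The function names and the key and value types are chosen for illustration only.

```rust
use std::collections::BTreeMap;

/// Looks up a value with `get`, which reports a missing key as `None`.
pub fn lookup_safe(map: &BTreeMap<u16, String>, key: u16) -> Option<&String> {
    map.get(&key)
}

/// Looks up a value with index syntax. `map[&key]` calls `Index::index`
/// and panics if the key is absent.
pub fn lookup_indexing(map: &BTreeMap<u16, String>, key: u16) -> &String {
    &map[&key]
}
```

Both functions behave identically for keys that exist. They differ only for a missing key, which is exactly the case an attacker will try to reach.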
Sui's module publishing routine allows users to submit module payloads via RPC. Before forwarding the request to the backend validator network for bytecode verification, the RPC handler directly disassembles the received module using the SuiCommand::Publish function.
During disassembly, the code_unit part of the submitted module is used to construct a VMControlFlowGraph. This construction includes creating basic blocks, which are stored in a BTreeMap named blocks. Under certain conditions, this step can trigger an implicit panic.
Here is a simplified code snippet:
Implicit panic when creating VMControlFlowGraph
In this code, a new VMControlFlowGraph is created by iterating through the code and creating a new basic block for each code unit. The basic blocks are stored in a BTreeMap named blocks.
In the loop that iterates over the stack, blocks[&block] is used to index the block map, and the stack has been initialized with ENTRY_BLOCK_ID. The assumption here is that an entry for ENTRY_BLOCK_ID always exists in the block map.
However, this assumption does not always hold. For example, if the submitted code is empty, the block map is still empty after the "create basic blocks" step. When the code later traverses the block map using for succ in &blocks[&block].successors, the missing key causes an implicit panic. This is because the expression blocks[&block] calls the index() method, which, as mentioned earlier, panics if the key does not exist in the BTreeMap.
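The failure mode described above can be reproduced with a small model of the traversal. This is a hypothetical simplification, not the Move or Sui source: here the "code" is a list in which entry i holds the successor ids of block i, and all function names are invented for the sketch.

```rust
use std::collections::BTreeMap;

type BlockId = u16;
const ENTRY_BLOCK_ID: BlockId = 0;

pub struct BasicBlock {
    pub successors: Vec<BlockId>,
}

/// Builds one basic block per entry of `code`. Empty code yields an empty map.
fn create_basic_blocks(code: &[Vec<BlockId>]) -> BTreeMap<BlockId, BasicBlock> {
    code.iter()
        .enumerate()
        .map(|(id, succ)| (id as BlockId, BasicBlock { successors: succ.clone() }))
        .collect()
}

/// Vulnerable shape: assumes ENTRY_BLOCK_ID is always present and indexes
/// with `blocks[&block]`, which panics on a missing key.
pub fn reachable_unchecked(code: &[Vec<BlockId>]) -> Vec<BlockId> {
    let blocks = create_basic_blocks(code);
    let mut visited = Vec::new();
    let mut stack = vec![ENTRY_BLOCK_ID];
    while let Some(block) = stack.pop() {
        if visited.contains(&block) {
            continue;
        }
        visited.push(block);
        for succ in &blocks[&block].successors {
            stack.push(*succ);
        }
    }
    visited
}

/// Defensive shape: uses `get` and reports a missing block as an error.
pub fn reachable_checked(code: &[Vec<BlockId>]) -> Result<Vec<BlockId>, String> {
    let blocks = create_basic_blocks(code);
    let mut visited = Vec::new();
    let mut stack = vec![ENTRY_BLOCK_ID];
    while let Some(block) = stack.pop() {
        if visited.contains(&block) {
            continue;
        }
        visited.push(block);
        let bb = blocks
            .get(&block)
            .ok_or_else(|| format!("missing basic block {}", block))?;
        stack.extend(bb.successors.iter().copied());
    }
    Ok(visited)
}
```

Calling reachable_unchecked with an empty slice panics on the first iteration, because the stack starts with ENTRY_BLOCK_ID while the map has no entries. reachable_checked returns an error for the same input.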
An attacker with remote access permissions can exploit this vulnerability by submitting a malformed payload with an empty code_unit field. This simple RPC request will cause the entire JSON-RPC process to crash. If the attacker continues to send such malformed payloads at minimal cost, it will result in a continuous interruption of service. In a blockchain network, this means that the network may be unable to confirm new transactions, resulting in a denial of service (DoS) situation. Network functionality and user trust in the system will be severely impacted.
Sui’s fix: Remove disassembly functionality from the RPC publishing routine
It is worth noting that the CodeUnitVerifier in the Move Bytecode Verifier is responsible for ensuring that the code_unit section is never empty. However, the order of operations exposes the RPC handler to potential vulnerabilities. This is because the validation process takes place on the Validator node, which is a stage after the RPC processes the input modules.
To address this issue, Sui resolves the vulnerability by removing the disassembly functionality from the module publishing RPC routine. This is an effective way to prevent the RPC service from handling potentially dangerous and unverified bytecode.
Additionally, it is worth noting that other RPC methods related to object queries also include disassembly functionality, but they are not easily susceptible to attacks using empty code units. This is because they always query and disassemble existing published modules. Published modules must have already been verified, so the assumption of non-empty code units always holds when building the VMControlFlowGraph.
Recommendations for Developers
Having seen how explicit and implicit panics threaten the stability of blockchain RPC services, developers need strategies to prevent or mitigate these risks. Such strategies reduce the likelihood of service interruptions and improve system resilience. CertiK's expert team therefore offers the following recommendations as best practices for Rust programming.
Rust panic abstraction: Where possible, consider using Rust's catch_unwind function to catch panics and convert them into error messages. This prevents the entire program from crashing and allows developers to handle errors in a controlled manner.
Use APIs with caution: Implicit panics often result from misuse of APIs provided by the standard library or third-party libraries. It is therefore crucial to understand an API fully and to handle its potential errors properly. Developers should always assume that an API call may fail and be prepared for that case.
Proper error handling: Use Result and Option types for error handling instead of relying on panics. These types provide a more controlled way to handle errors and special cases.
Add documentation and comments: Ensure that the code is well documented and add comments to critical sections, including sections where a panic may occur. This helps other developers understand potential risks and handle them effectively.
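A minimal sketch of this recommendation, assuming a hypothetical run_guarded wrapper placed around each request handler:

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs a request handler and converts a panic into an error string, so
/// one bad request does not take the whole service down.
/// `AssertUnwindSafe` shifts responsibility to the caller: any state the
/// handler shares must remain valid if the handler unwinds midway.
pub fn run_guarded<F, T>(handler: F) -> Result<T, String>
where
    F: FnOnce() -> T,
{
    panic::catch_unwind(AssertUnwindSafe(handler)).map_err(|payload| {
        // Panic payloads are usually a `&'static str` or a `String`.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            format!("handler panicked: {}", msg)
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            format!("handler panicked: {}", msg)
        } else {
            "handler panicked".to_string()
        }
    })
}
```

Note that catch_unwind only catches unwinding panics. It has no effect when the binary is built with panic = "abort", and it is not a substitute for removing the panic at its source.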
Summary
Rust-based RPC nodes play a crucial role in blockchain systems such as Aptos, StarCoin, and Sui. As they are used to connect DApps and the underlying blockchain, their reliability is essential for the smooth operation of blockchain systems. Despite using the memory-safe language Rust, there are still risks of improper design. CertiK’s research team explores these risks through real-world examples, proving the need for caution and meticulous design in memory-safe programming.