Vitalik’s Latest Long Article: Should the Ethereum Protocol Encapsulate More Features?

Author: Vitalik Buterin | Translation: Odaily Star Daily (Nian Yin Si Tang)

Special thanks to Justin Drake, Tina Zhen, and Yoav Weiss for their feedback and review.

From the beginning of the Ethereum project, there has been a strong philosophy of trying to make core Ethereum as simple as possible, and to achieve this by building protocols on top of it wherever possible. In the blockchain space, the “build on L1” vs. “focus on L2” debate is usually framed as being mainly about scalability, but in fact similar questions arise for meeting many kinds of Ethereum user needs: digital asset exchange, privacy, usernames, advanced cryptography, account security, censorship resistance, front-running protection, and so on. Recently, however, there has been some cautious interest in enshrining more of these features into the core Ethereum protocol.

This article will delve into the philosophical reasoning behind the original minimalist philosophy, as well as some more recent ways of thinking about these ideas. The goal is to begin building a framework for better identifying possible targets where encapsulating certain features might be worth considering.

Early Philosophy of Protocol Minimalism

In the early history of what was then called “Ethereum 2.0”, there was a strong desire to create a clean, simple, and elegant protocol that tried to do as little as possible by itself and left almost all such work to the users. Ideally, the protocol would just be a virtual machine and validating a block would just be a virtual machine call.

This is an approximate reconstruction of a whiteboard drawing Gavin Wood and I made in early 2015 when I was talking about what Ethereum 2.0 might look like.

The “state transition function” (the function that processes blocks) would simply be a single VM call and all other logic would happen through contracts: some system-level contracts, but mostly contracts provided by users. One very nice feature of this model is that even a whole hard fork can be described as a single transaction for the block processor contract, which would be approved through off-chain or on-chain governance and then executed with upgrade permissions.
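To make this concrete, here is a minimal runnable Python sketch of that idea (a toy model with invented names, not actual client code): the protocol’s entire state transition is one call into a block-processor “contract” stored in the state, and a hard fork is just a governance-authorized replacement of that code.

```python
# Toy model (hypothetical, not real client code): the entire state transition
# is a single "VM call" into a block-processor contract whose code itself
# lives in the state.

def vm_call(state, target, data):
    """Toy VM: a 'contract' here is just a Python function stored in state."""
    contract_code = state["code"][target]
    return contract_code(state, data)

def block_processor(state, block):
    """System-level contract: all block-processing logic lives here, not in
    the protocol. A hard fork is then just a governance-approved transaction
    that replaces this code."""
    for sender, recipient, value in block["txs"]:
        assert state["balances"].get(sender, 0) >= value
        state["balances"][sender] -= value
        state["balances"][recipient] = state["balances"].get(recipient, 0) + value
    return state

def state_transition(state, block):
    # The protocol itself does almost nothing: one VM call per block.
    return vm_call(state, target="BLOCK_PROCESSOR", data=block)

state = {
    "code": {"BLOCK_PROCESSOR": block_processor},
    "balances": {"alice": 100},
}
state = state_transition(state, {"txs": [("alice", "bob", 30)]})
assert state["balances"] == {"alice": 70, "bob": 30}
```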

These 2015 discussions applied particularly to two areas we were considering: account abstraction and scalability. In the case of scalability, our idea was to try to create a maximally abstract form of scaling that felt like a natural extension of the diagram above. Contracts would be able to call data that most Ethereum nodes do not store; the protocol would detect this and resolve the call through some very generic extended-computation facility. From the virtual machine’s point of view, the call would go off into some separate subsystem and then magically return the correct answer some time later.

We explored this idea briefly but quickly abandoned it, because we were too preoccupied with proving that any kind of blockchain scaling was possible at all. Although, as we will see later, the combination of data availability sampling and ZK-EVMs means that one possible future of Ethereum scaling actually looks remarkably close to that vision! For account abstraction, on the other hand, we knew from the start that some form of implementation was possible, so research immediately began on making something as close as possible to the pure starting point of “a transaction is just a call” a reality.

Between a transaction arriving and the actual underlying EVM call being made from the sending address, there is a lot of boilerplate code, with even more to come. How can we reduce this code as close to zero as possible?

One of the key pieces of code here is validate_transaction(state, tx), which is responsible for checking the transaction’s nonce and signature. From the beginning, the real goal of account abstraction has been to allow users to replace the default nonce-incrementing and ECDSA validation with their own validation logic, making it easier for them to use features like social recovery and multisig wallets. Hence, finding a way to restructure apply_transaction as a simple EVM call is not merely a “clean code for its own sake” exercise; rather, it is about moving the logic into users’ account code, to give users the flexibility they need.
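To make the distinction concrete, here is a toy Python sketch (hypothetical names; not the code of any actual EIP): the enshrined rule hard-codes the nonce and signature checks, while the abstracted rule delegates validity to the account’s own code.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tx:
    sender: str
    nonce: int
    sig: str              # stand-in for a real signature

@dataclass
class Account:
    nonce: int
    validate: Callable    # account abstraction: the account's own rule

# Enshrined rule: the protocol hard-codes nonce + ECDSA checking
# (ECDSA replaced by a string match in this toy).
def validate_enshrined(acct: Account, tx: Tx, expected_sig: str) -> bool:
    return tx.nonce == acct.nonce and tx.sig == expected_sig

# Abstracted rule: the protocol just calls into the account's own code,
# which could implement multisig, social recovery, another curve, etc.
def validate_abstracted(acct: Account, tx: Tx) -> bool:
    return tx.nonce == acct.nonce and acct.validate(tx)

# Example: a 2-of-3 "multisig" policy expressed as user-defined validation.
SIGNERS = {"alice", "bob", "carol"}
def two_of_three(tx: Tx) -> bool:
    return len(set(tx.sig.split("+")) & SIGNERS) >= 2

wallet = Account(nonce=0, validate=two_of_three)
assert validate_abstracted(wallet, Tx("wallet", 0, "alice+bob"))
assert not validate_abstracted(wallet, Tx("wallet", 0, "alice"))
```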

However, the practice of keeping apply_transaction containing as little fixed logic as possible eventually brings many challenges. We can take a look at one of the earliest proposals for account abstraction, EIP-86.

If EIP-86 had been included as is, it would have reduced the complexity of the EVM at the cost of significantly increasing complexity in other parts of the Ethereum stack, requiring essentially the same code to be written elsewhere, and it would have introduced whole new categories of weirdness, such as the possibility of the same transaction with the same hash appearing multiple times in the chain, not to mention the multi-invalidation problem.

The multi-invalidation problem in account abstraction: one transaction included on-chain can invalidate thousands of other transactions in the mempool, making the mempool cheap to flood.
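A toy sketch of why this is a denial-of-service vector (illustrative code, not any real mempool implementation): with user-defined validity rules that read shared state, one cheap on-chain transaction can invalidate an entire mempool at once.

```python
# Toy mempool illustrating the multi-invalidation problem. With arbitrary
# validation logic, validity can depend on shared state: one included
# transaction can invalidate every queued transaction at once, and nodes
# must re-validate all of them for free.

state = {"shared_counter": 0}

def is_valid(tx, state):
    # User-defined rule: "valid only while shared_counter equals my snapshot".
    return tx["snapshot"] == state["shared_counter"]

mempool = [{"id": i, "snapshot": 0} for i in range(10_000)]

# All 10,000 pending transactions validate, so nodes accept and store them.
assert all(is_valid(tx, state) for tx in mempool)

# Then ONE cheap transaction that bumps the counter gets included on-chain...
state["shared_counter"] += 1

# ...and the entire mempool is now invalid: all of that validation work and
# bandwidth was spent for free, which is exactly the spam vector.
mempool = [tx for tx in mempool if is_valid(tx, state)]
assert mempool == []
```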

Since then, account abstraction has developed in stages. EIP-86 later became EIP-208, and eventually the practical EIP-2938 emerged.

However, EIP-2938 is far from concise. Its content includes:

  • A new transaction type

  • Three new global variables for transaction scopes

  • Two new opcodes, including the clumsy PAYGAS opcode, which handles gas price and gas limit checks, acts as an EVM execution breakpoint, and temporarily stores ETH for paying fees in one go

  • A set of complex mining and rebroadcasting strategies, including a list of opcodes banned during the validation phase of a transaction

In order to implement account abstraction without involving the Ethereum core developers (who were busy optimizing Ethereum clients and implementing the merge), EIP-2938 was eventually restructured as ERC-4337, which lives entirely outside the protocol.

ERC-4337. It does rely entirely on EVM calls!

Because this is an ERC, it does not require a hard fork and technically lives “outside the Ethereum protocol”. So… problem solved? As it turns out, no. The current medium-term roadmap for ERC-4337 actually involves eventually turning much of it into a series of in-protocol features, and it is a useful guiding example for understanding why this path is worth considering.

Encapsulating ERC-4337

Several key reasons have been discussed for eventually bringing ERC-4337 back into the protocol:

  • Gas efficiency: anything done inside the EVM incurs a certain level of virtual-machine overhead, including inefficient use of gas-expensive features such as storage slots. Currently, these extra inefficiencies add up to at least 20,000 gas, and often more. Bringing these components into the protocol is the easiest way to eliminate them.

  • Code bug risk: If the “entry point contract” of ERC-4337 has a sufficiently severe bug, all wallets compatible with ERC-4337 could potentially see all their funds drained. Replacing the contract with protocol-native functionality introduces an implicit responsibility to fix code errors through hard forks, thereby eliminating the risk of fund depletion for users.

  • Support for opcodes such as tx.origin. ERC-4337 by itself makes tx.origin point to the address of the “bundler” that packaged a set of user operations into a transaction. Native account abstraction can fix this, making tx.origin point to the actual account sending the transaction, so that it works the same way as it does for an EOA.

  • Censorship resistance: one of the challenges of proposer/builder separation is that it becomes easier to censor individual transactions. In a world where individual transactions are legible to the Ethereum protocol, inclusion lists can greatly mitigate this problem by allowing proposers to specify a list of transactions that must be included within the next two slots in almost all cases. However, out-of-protocol ERC-4337 wraps “user operations” inside a single transaction, making them opaque to the Ethereum protocol; an inclusion list provided by the Ethereum protocol would therefore offer no censorship resistance to ERC-4337 user operations. Encapsulating ERC-4337 and making user operations a “proper” transaction type would solve this problem.

It is worth noting that in its current form, ERC-4337 is significantly more expensive than a “basic” Ethereum transaction: a basic transaction costs 21,000 gas, while an ERC-4337 user operation costs approximately 42,000 gas.

In theory, it should be possible to keep adjusting the EVM gas cost system until the costs inside the protocol and the costs of accessing storage outside the protocol match; there is no reason transferring ETH should cost 9,000 gas when other kinds of storage-editing operations are cheaper. In fact, two upcoming EIPs related to the Verkle tree transition attempt to do exactly that. However, even if we do this, there is one obvious reason why encapsulated protocol features will inevitably be much cheaper than EVM code, no matter how efficient the EVM becomes: encapsulated code does not need to pay gas to be pre-loaded.

A fully-featured ERC-4337 wallet is large, and this implementation compiled and placed on-chain takes up approximately 12,800 bytes. Of course, you can deploy this code once and allow each individual wallet to call it using DELEGATECALL, but you still need to access that code in every block where it is used. Under the Verkle tree gas cost EIP, 12,800 bytes would constitute 413 chunks, accessing which would require paying 2 times the witness branch_cost (a total of 3,800 gas) and 413 times the witness chunk_cost (a total of 82,600 gas). And this is without even mentioning the entry point of ERC-4337 itself, which in version 0.6.0 occupies 23,689 bytes on-chain (requiring approximately 158,700 gas to load according to the Verkle tree EIP rules).
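The arithmetic behind these figures, using the constants as the article states them (31-byte code chunks, witness branch_cost = 1,900 gas, witness chunk_cost = 200 gas); the branch count for the entry point is an assumption chosen to reproduce the article’s ~158,700 figure.

```python
import math

# Witness gas constants as given in this article for the draft Verkle gas
# cost EIP (treat as illustrative; the EIP may still change).
CHUNK_SIZE = 31            # bytes of code per chunk
WITNESS_BRANCH_COST = 1900
WITNESS_CHUNK_COST = 200

def code_access_gas(code_bytes: int, branches: int) -> int:
    chunks = math.ceil(code_bytes / CHUNK_SIZE)
    return branches * WITNESS_BRANCH_COST + chunks * WITNESS_CHUNK_COST

# Fully-featured wallet implementation: ~12,800 bytes = 413 chunks,
# 2 witness branches (the article's numbers).
wallet = code_access_gas(12_800, branches=2)

# ERC-4337 entry point v0.6.0: 23,689 bytes on-chain; assuming 3 witness
# branches reproduces the article's figure.
entry_point = code_access_gas(23_689, branches=3)

print(wallet)       # 86400  (= 3,800 + 82,600)
print(entry_point)  # 158700
```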

This leads to a problem: the actual gas cost of accessing this code has to be shared across transactions somehow. The current approach used by ERC-4337 is fairly crude: the first transaction in a bundle bears a one-time storage/code-reading cost, making it much more expensive than the others. Protocol encapsulation would allow these shared libraries to simply be part of the protocol, accessible to everyone for free.

What can we learn from this example, and when is encapsulation a good idea more generally?

In this example, we can see several distinct rationales for encapsulating account abstraction in the protocol.

  • When fixed costs are high, a market-based approach of “pushing complexity to the edge” is most likely to fail. Indeed, the long-term account abstraction roadmap looks like it will involve lots of fixed costs per block. The 244,100 gas for loading standardized wallet code is one thing; but aggregation could add hundreds of thousands of gas per block for ZK-SNARK verification plus the on-chain cost of proof verification. There is no way to charge users for these costs without introducing large market inefficiencies, whereas making some of these features protocol features that everyone can access for free solves that problem nicely.

  • Community-wide response to code bugs. If some code fragments are used by all or a very large number of users, it often makes more sense for the blockchain community to bear the responsibility of hard forking to fix any errors that arise. ERC-4337 introduces a large amount of globally shared code, and in the long run, it is undoubtedly more reasonable to fix errors in the code through hard forks than to cause users to lose a large amount of ETH.

  • Sometimes, a stronger form of a feature can be obtained by using the protocol’s powers directly. The key example here is in-protocol censorship-resistance features such as inclusion lists: inclusion lists inside the protocol can provide better censorship resistance than approaches outside the protocol, and for user-level operations to truly benefit from them, individual user-level operations need to be legible to the protocol. Another lesser-known example is the 2017 Ethereum proof-of-stake design, which abstracted staking keys as accounts; this was abandoned in favor of encapsulating BLS, because BLS supports an “aggregation” mechanism that must be implemented at the protocol and network level, and that can make processing a large number of signatures far more efficient.

But it is important to remember that even encapsulating account abstraction in the protocol is itself a huge “de-encapsulation” relative to the status quo. Today, top-level Ethereum transactions can only be initiated from externally owned accounts (EOAs), which are verified with a single secp256k1 elliptic curve signature. Account abstraction removes this restriction and leaves the verification conditions to be defined by users themselves. And so, in this story about account abstraction, we also see the biggest argument against encapsulation: flexibility in meeting the needs of different users.

Let’s further enrich this story by looking at several other examples of features that have recently been considered for encapsulation. We will specifically look at: ZK-EVMs, proposer-builder separation, private memory pools, liquidity staking, and new precompiles.

Encapsulating ZK-EVM

Let’s shift our attention to another potential encapsulation target of the Ethereum protocol: ZK-EVM. Currently, we have a large number of ZK-rollups, which all have to write fairly similar code to verify the execution of Ethereum-like blocks in ZK-SNARKs. There is a fairly diverse ecosystem of independent implementations: PSE ZK-EVM, Kakarot, Polygon ZK-EVM, Linea, Zeth, and so on.

One recent controversy in the EVM ZK-rollup space concerns how to handle potential bugs in the ZK code. Currently, all these running systems have some form of “security council” mechanism that can control the proof system in case of bugs. Last year, I tried to create a standardized framework to encourage projects to make explicit their trust in the proof system and the security council, and gradually reduce the power of that organization over time.

In the medium term, rollups may depend on multiple proof systems, with the security council having power only in extreme cases of disagreement between two different proof systems.

However, there is a sense in which some of this work is redundant. We already have the Ethereum base layer, which has an EVM, and we already have a working mechanism for dealing with bugs in implementations: if there is a bug, clients update to fix it, and the chain keeps going. From the perspective of a client with the bug, blocks that appeared finalized would end up no longer finalized, but at least we would not see users losing funds. Similarly, if a rollup just wants to stay equivalent to the EVM, it needs to implement its own governance to keep changing its internal ZK-EVM rules to match upgrades to the Ethereum base layer. That feels wrong, because ultimately it is building on top of the Ethereum base layer itself, which knows when to upgrade and to what new rules.

Given that these L2 ZK-EVMs essentially use the same EVM as Ethereum, can we somehow incorporate “verifying EVM execution in ZK” into the protocol’s capabilities and handle exceptional cases like bugs and upgrades with the application of Ethereum’s social consensus, just like we already do for the base layer EVM execution itself?

This is an important and challenging topic.

One possible point of disagreement about data availability in a native ZK-EVM is statefulness. ZK-EVMs would be much more data-efficient if they did not have to carry “witness” data: if a given piece of data was already read or written in an earlier block, we can simply assume that provers have access to it, and it does not need to be made available again. This goes beyond not re-loading storage and code; it turns out that if a rollup compresses its data correctly, stateful compression allows data savings of up to 3x compared to stateless compression.

This means that for ZK-EVM precompiles, we have two options:

1. Precompiles that require all data to be available in the same block. This means provers can be stateless, but it also means that a ZK-rollup using this precompile is much more expensive than a rollup using custom code.

2. Precompiles that allow pointers to data used or generated in previous executions. This brings ZK-rollups much closer to optimal, but it is more complex and introduces new state that provers must store (see the sketch below).
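A sketch of what the two interfaces might look like (hypothetical function names and toy stubs; neither is a real proposal, and the SNARK verifier is a stand-in).

```python
# Hypothetical interface sketches contrasting the two options for an
# enshrined "verify EVM execution in ZK" precompile.

def snark_verify(proof, public_inputs) -> bool:
    return proof == ("valid", public_inputs)   # toy stand-in for a verifier

PRIOR_COMMITMENTS = {"ptr-0": "code-or-data-from-an-earlier-block"}

def lookup_prior(pointer):
    return PRIOR_COMMITMENTS[pointer]          # toy commitment store

# Option 1: stateless prover. Every byte the execution touches must be
# made available in this same block; nothing persists between calls.
def zkevm_verify_stateless(pre_root, post_root, full_witness, proof):
    return snark_verify(proof, (pre_root, post_root, full_witness))

# Option 2: stateful prover. Data used or generated by previous executions
# can be referenced by pointer instead of being re-supplied, cutting data
# costs (the article cites up to ~3x with good compression), at the price
# of new state the prover must keep.
def zkevm_verify_stateful(pre_root, post_root, new_data, pointers, proof):
    referenced = tuple(lookup_prior(p) for p in pointers)
    return snark_verify(proof, (pre_root, post_root, new_data, referenced))

inputs = ("rootA", "rootB", "tx-data", ("code-or-data-from-an-earlier-block",))
assert zkevm_verify_stateful("rootA", "rootB", "tx-data", ["ptr-0"],
                             ("valid", inputs))
```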

What can we learn from this? There is a good reason to encapsulate ZK-EVM verification in some form: rollups have been building their own custom versions of it, and it feels wrong that Ethereum is willing to put the weight of multiple implementations and off-chain social consensus behind executing the EVM on L1, while L2s doing the exact same work must instead implement complicated gadgets involving security councils. But on the other hand, there is a big problem in the details: there are different versions of ZK-EVMs, with different costs and benefits. The stateful-versus-stateless distinction only scratches the surface; trying to also support “almost-EVMs” that other systems have proven with custom code would expose an even larger design space. Hence, encapsulating the ZK-EVM brings both promise and challenges.

Proposer-Builder Separation (ePBS)

The rise of MEV has turned block production into a large-scale economic activity, in which sophisticated participants can produce blocks that generate far more revenue than the default algorithm of simply watching the mempool and including transactions as they arrive. So far, the Ethereum community has tried to address this with proposer-builder separation schemes such as MEV-Boost, which allow regular validators (“proposers”) to outsource block construction to specialized participants (“builders”).

However, MEV-Boost adds a trust assumption in a new category of participant called relays. Over the past two years, there have been many proposals to create “encapsulated PBS”. What is the benefit of this? In this case the answer is quite simple: a PBS built directly with protocol features is simply stronger (in the sense of having weaker trust assumptions) than one built without them. This is similar to the case for encapsulating in-protocol price oracles, although there are strong objections in that case as well.

Encapsulating Private Memory Pools

When a user sends a transaction, the transaction is immediately exposed and visible to everyone, even before it is included on-chain. This makes users of many applications vulnerable to economic attacks such as frontrunning.

Recently, there have been many projects dedicated to creating “private memory pools” (or “encrypted memory pools”) that encrypt users’ transactions until they are irreversibly accepted into a block.

However, the problem is that such a scheme requires a special kind of encryption: to prevent users from flooding the system, the encryption must decrypt automatically once the transaction has actually been irreversibly accepted.

To achieve this form of encryption, there are various technologies with different trade-offs. Jon Charbonneau has described them well:

  • Encryption to a centralized operator, such as Flashbots Protect.

  • Time-lock encryption, which anyone can decrypt after a certain number of sequential computation steps that cannot be parallelized.

  • Threshold encryption, trusting an honest-majority committee to decrypt the data. For a concrete proposal, see the Shutterized beacon chain concept (a toy sketch of this approach follows this list).

  • Trusted hardware, such as SGX.
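As a toy illustration of the threshold approach (a generic Shamir secret sharing demo, not Shutter’s actual protocol): a committee splits a per-slot decryption key so that any majority can reconstruct it, and only does so after the encrypted transaction has been irreversibly included.

```python
import random

# Toy threshold decryption: a committee of 9 holds shares of a decryption
# key; any 5 can reconstruct it, and they only do so AFTER the encrypted
# transaction is irreversibly included in a block.

P = 2**127 - 1  # a Mersenne prime; toy field for the sharing polynomial

def share(secret, k, n):
    """Split `secret` into n shares; any k of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for xj, yj in shares:
        num = den = 1
        for xm, _ in shares:
            if xm != xj:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        secret = (secret + yj * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)
committee_shares = share(key, k=5, n=9)

# ...the block containing the key-encrypted transaction is finalized...
revealed = reconstruct(random.sample(committee_shares, 5))
assert revealed == key   # now anyone can decrypt the queued transaction
```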

Unfortunately, each of these encryption methods has its own weaknesses. Although there are some users willing to trust each of them, no single solution is trusted enough to actually be accepted at Layer 1. Hence, at least until delay encryption is perfected or some other technological breakthrough occurs, encapsulating anti-front-running functionality at L1 seems a difficult proposition, even though it is valuable enough that many application-layer solutions have already emerged.

Encapsulating Liquidity Staking

A common requirement for Ethereum DeFi users is the ability to use their ETH for both staking and as collateral in other applications. Another common requirement is simply for convenience: users want to be able to stake without running a node and keeping it online at all times (and protecting the online staking key).

So far, the simplest “interface” that meets both of these requirements is just an ERC-20 token: convert your ETH into “staked ETH”, hold it, and later convert it back. Indeed, liquidity staking providers such as Lido and Rocket Pool have emerged to do exactly this. However, liquidity staking has some natural centralizing mechanisms at work: people naturally flock to the largest version of staked ETH because it is the most familiar and the most liquid.

Each version of staking ETH needs some mechanism to determine who can become the underlying node operator. It cannot be unrestricted, as attackers would join and exploit users’ funds to amplify attacks. Currently, the top two are Lido and Rocket Pool, with the former having a DAO whitelist of node operators and the latter allowing anyone to run a node with a deposit of 8 ETH. These two approaches have different risks: the Rocket Pool approach allows attackers to perform a 51% attack on the network and force users to pay most of the cost; as for the DAO approach, if a staking token dominates, it would result in a single, potentially vulnerable governance tool controlling a large portion of Ethereum validators. It is worth noting that protocols like Lido have implemented safeguards, but one layer of defense may not be enough.
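Rough arithmetic behind the attack-amplification concern (simplified; it ignores Rocket Pool’s fees, deposit-pool limits, and the safeguards mentioned above).

```python
# Simplified arithmetic for the attack-amplification worry.

VALIDATOR_STAKE = 32        # ETH of stake each validator controls
OPERATOR_BOND = 8           # ETH the node operator must post

# Leverage: each 8 ETH of attacker capital directs 32 ETH of stake;
# rETH holders supply the remaining 24 ETH.
leverage = VALIDATOR_STAKE / OPERATOR_BOND            # 4.0x

total_staked = 26_000_000                             # ETH (article's figure)
attack_stake = total_staked / 3                       # stake for a 1/3 attack

attacker_capital = attack_stake / leverage            # ~2.2M ETH of own funds
user_funds_at_risk = attack_stake - attacker_capital  # ~6.5M ETH from users

print(f"attacker posts {attacker_capital:,.0f} ETH, "
      f"users bear {user_funds_at_risk:,.0f} ETH of the stake at risk")
```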

In the short term, one option is to encourage ecosystem participants to use diverse liquidity providers to reduce the possibility of systemic risks brought by a monopoly. However, in the long run, this is an unstable balance, and it is dangerous to rely too heavily on moral pressure to solve the problem. A natural question arises: does it make sense to encapsulate certain functionality in the protocol to make liquidity staking less centralized?

The key question here is: what kind of functionality within the protocol? Simply creating a protocol-specific “staking ETH” token presents a problem: either it must have an Ethereum-wide governance to choose who runs the nodes, or it is open, but this would turn it into a tool for attackers.

An interesting idea is Dankrad Feist’s writings on liquidity staking maximalism. First, we bite the bullet: if Ethereum is 51% attacked, perhaps only 5% of the attacking ETH gets slashed. This is a reasonable trade-off; with over 26 million ETH currently staked, the cost of attacking one-third of it (about 8 million ETH) is overkill, especially considering how many kinds of “off-model” attacks can be pulled off more cheaply. Indeed, a similar trade-off has been explored in the “super-committee” proposals for implementing single-slot finality.
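Back-of-envelope numbers for this trade-off, using the article’s figures (the 5% slashable fraction is illustrative, not a finalized parameter).

```python
# Back-of-envelope numbers for the liquidity staking maximalism trade-off.

total_staked = 26_000_000            # ETH staked today, per the article
attacking_stake = total_staked / 3   # stake needed for a 1/3 attack

SLASHABLE_FRACTION = 0.05            # only ~5% of attacking ETH is slashable
slashed = attacking_stake * SLASHABLE_FRACTION

# Still roughly 433,000 ETH at risk for an attacker: a large absolute cost,
# while the vast majority of honest stake becomes slashing-immune and fungible.
print(f"{slashed:,.0f} ETH slashed in a 1/3 attack")
```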

If we accept that only 5% of attacking ETH gets slashed, then over 90% of staked ETH would be immune to slashing, and could therefore serve as a protocol-native fungible liquidity staking token that other applications can use.

This path is interesting. But it still leaves one question: what exactly should be encapsulated? Rocket Pool already works very similarly to this: each node operator provides some of the funds, and liquidity stakers provide the rest. We could simply tweak a few constants, limiting the maximum slashing penalty to 2 ETH, and Rocket Pool’s existing rETH would become risk-free.

With simple protocol adjustments, we can also do other clever things. For example, suppose we want a system with two “layers” of staking: validator operators (with high collateral requirements) and depositors (with no minimum collateral requirement, able to join and leave at any time), but we still want to guard against validator operator centralization by granting powers to a randomly sampled committee of depositors, such as proposing lists of transactions that must be included (for anti-censorship reasons), controlling the fork choice during an inactivity leak, or needing to sign off on blocks. This could be done in an almost entirely out-of-protocol way by adjusting the protocol to require each validator to provide (i) a regular staking key, and (ii) an ETH address that can be called between each slot to output a secondary staking key. The protocol would grant powers to both of these keys, but the mechanism for choosing the second key in each slot could be left to staking pool protocols. Directly encapsulating some features may still be better, but it is worth noting that this “encapsulate some things and leave the rest to users” design space exists.
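Here is a sketch of that two-key design (hypothetical data structures, not a concrete EIP; the depositor-sampling policy is just one possible out-of-protocol choice).

```python
import random
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: the protocol recognizes two keys per validator, but
# HOW the secondary key is chosen each slot is left to out-of-protocol
# staking pool logic.

@dataclass
class Validator:
    staking_key: str                     # (i) regular high-collateral key
    hot_address: Callable[[int], str]    # (ii) called each slot, returns
                                         #      that slot's secondary key

def slot_powers(v: Validator, slot: int) -> dict:
    secondary = v.hot_address(slot)
    return {
        # operator duties stay with the bonded key:
        "propose_and_attest": v.staking_key,
        # depositor powers that limit operator centralization:
        "must_include_list": secondary,        # anti-censorship
        "fork_choice_during_inactivity_leak": secondary,
        "block_countersignature": secondary,
    }

# One possible pool policy: sample a depositor each slot, weighted by
# deposit size (the randomness source here is a stand-in).
DEPOSITS = {"depositor-1": 5.0, "depositor-2": 1.0, "depositor-3": 2.0}

def sample_depositor(slot: int) -> str:
    rng = random.Random(slot)   # stand-in for protocol randomness
    names = list(DEPOSITS)
    return rng.choices(names, weights=[DEPOSITS[n] for n in names])[0]

v = Validator(staking_key="operator-key", hot_address=sample_depositor)
print(slot_powers(v, slot=12345))
```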

Encapsulating More Precompiles

Precompiles (or “precompiled contracts”) are Ethereum contracts that implement complex cryptographic operations natively in client code, rather than in EVM smart contract code. Precompiles were a compromise adopted at the beginning of Ethereum’s development: since the overhead of a virtual machine is too high for certain very complex and highly specialized code, we can implement key operations valuable to important applications in native code to make them faster. Today, this basically means a few specific hash functions and elliptic curve operations.

Currently, there are efforts to add a precompile for secp256r1, an elliptic curve only slightly different from the secp256k1 used for basic Ethereum accounts; because it is well supported by trusted hardware modules, widespread use of it could improve wallet security. In recent years, the community has also pushed for adding precompiles for BLS12-377, BW6-761, generalized pairings, and other features.

The counterargument to adding more precompiles is that many precompiles added in the past (such as RIPEMD and BLAKE) ended up being used far less than expected, and we should learn from that. Rather than adding more precompiles for specific operations, we should perhaps focus on a more moderate approach based on ideas like EVM-MAX and the dormant-but-always-revivable SIMD proposal, which would let EVM implementations execute wide classes of code more cheaply. Perhaps even existing, little-used precompiles could be removed and replaced with (inevitably less efficient) EVM-code implementations of the same functions. That said, it is still possible that there are specific cryptographic operations valuable enough to accelerate that adding them as precompiles makes sense.

What Have We Learned From All This?

The desire for minimal encapsulation is understandable and good; it stems from the Unix philosophy tradition of creating minimal software that can easily adapt to different user needs and avoids the curse of software bloat. However, blockchain is not a personal computing operating system but a social system. This means that encapsulating certain functionalities in the protocol makes sense.

In many cases, the lessons from these other examples are similar to what we saw with account abstraction. But we have also learned some new lessons:

  • Encapsulating functionality can help avoid centralization risks in other areas of the stack:

Often, keeping the base protocol minimal and simple pushes complexity to the ecosystem outside the protocol. From the perspective of the Unix philosophy, this is good. However, sometimes that outside-the-protocol ecosystem carries centralization risks of its own, typically (but not exclusively) because of high fixed costs. Encapsulation can sometimes reduce de facto centralization.

  • Encapsulating too much can potentially overextend the trust and governance burden of the protocol:

This is the theme of the earlier article “Don’t Overload Ethereum Consensus”: if encapsulating a particular feature weakens the trust model and makes Ethereum as a whole more “subjective”, it weakens Ethereum’s credible neutrality. In those cases, it is better to treat the feature as a mechanism built on top of Ethereum rather than trying to bring it into Ethereum itself. Encrypted memory pools are the clearest example here: they may be difficult to encapsulate, at least until delay encryption technology improves.

  • Encapsulating too much content can make the protocol too complex:

Protocol complexity is a systemic risk, and adding too many features to the protocol increases this risk. Precompiles are the best example.

  • In the long run, encapsulating functions may backfire because user demands are unpredictable:

A function that many people consider important and will be used by many users may not be frequently used in practice.

In addition, examples such as liquidity staking, ZK-EVM, and precompiles show the possibility of a middle ground: minimal viable enshrinement. The protocol does not need to encapsulate the entire function, but can include specific parts that address key challenges, making the function easy to implement without being overly paranoid or narrow-minded. Examples of this include:

  • Rather than encapsulating a complete liquidity staking system, it is better to change the staking penalty rules to make trustless liquidity staking more feasible;

  • Rather than encapsulating more precompiles, it is better to encapsulate EVM-MAX and/or SIMD, making a wider range of operations easier to implement efficiently;

  • Rather than encapsulating the entire concept of rollups, it is possible to simply encapsulate EVM verification.

We can expand the previous chart as follows:

Sometimes it makes sense to de-encapsulate something, and removing rarely used precompiles is one example. Account abstraction as a whole, as mentioned earlier, is also an important form of de-encapsulation. If we want to support backward compatibility for existing users, the mechanism may actually be surprisingly similar to the one for de-encapsulating precompiles: one proposal is EIP-5003, which would allow an EOA to convert its account in place into a contract with the same (or better) functionality.

Determining which features should be brought into the protocol and which should be left to other layers of the ecosystem is a complicated trade-off, and we should expect this trade-off to keep evolving over time as our understanding of user needs and our suite of available ideas and technologies continues to improve.
