Looking back at the principle of Tornado Cash A thorn in the side of regulators, yet the most sophisticated ZK application.

Author: Faust, Geek Web3

Introduction: Recently, Vitalik and some scholars jointly published a new paper, which mentioned how Tornado Cash implements the anti-xi money scheme (which is actually to let the withdrawer prove that their deposit record belongs to a collection that does not include black money), but the paper lacks a detailed interpretation of the Tornado Cash business logic and principles, making it difficult for people to understand.

In addition, it is worth mentioning that privacy projects represented by Tornado are the true applications of the ZK-SNARK algorithm, while most Rollups that claim to use ZK only use the simplicity of ZK-SNARK. Many times people often confuse the difference between Validity Proof and ZK, and Tornado happens to be an excellent example of understanding ZK applications.

The author of this article happened to have written an article about the principles of Tornado in 2022 at Web3Caff Research, and today selected and expanded some paragraphs from it to organize it into an article in order to help everyone systematically understand Tornado Cash.

Original link:

Tornado Cash 及混币器原理研究报告：结合 ZK 和 Merkle Proof 的鬼斧神工

Principle of “Tornado”

Tornado Cash is a mixer protocol that utilizes zero-knowledge proofs. The old version was put into use in 2019, and the new version started beta testing at the end of 2021. The old version of Tornado basically achieved decentralization, with on-chain contracts being open source and without multi-signature control, and the front-end code being open source and backed up on the IPFS network. Since the overall structure of the old version of Tornado is simpler and easier to understand, this article will focus on the interpretation of the old version.

The main idea of Tornado is: mix a large number of deposit and withdrawal behaviors together. After depositing tokens into Tornado, the depositor presents a ZK proof to prove that they have made a deposit, and then withdraws the tokens using a new address, thereby cutting off the correlation between the deposit and withdrawal addresses.

To be more specific, Tornado is like a glass box, mixed with coins put in by many people. We can see who puts in the coins, but these coins are highly homogeneous. If someone with a new face takes out a coin from the glass box, it is difficult for us to know who originally put in that coin.

This scenario seems to be common: when we SWAP a few ETH coins from the Uniswap pool, we cannot know who provided the withdrawn ETH, because there are too many people who have provided liquidity to Uniswap. However, the difference is that every time we use Uniswap to withdraw tokens, we need to use other tokens as equivalent costs, and we cannot “privately” transfer funds to others; while the mixer only requires the withdrawer to present a deposit certificate.

In order to make the deposit and withdrawal actions appear homogeneous, the deposit addresses of the Tornado pool and the withdrawal addresses keep consistent for each deposit and withdrawal, for example, for a pool with 100 depositors and 100 withdrawers, although they are publicly visible, they seem to have no connection with each other, and the amount deposited and withdrawn by each person is the same. This confuses the perception and makes it impossible to judge the correlation based on the deposit and withdrawal amounts, thereby cutting off the trace of fund transfer. It is evident that this provides natural convenience for xi money behavior.

However, there is a key issue: how can the withdrawer prove that they have made a deposit when making a withdrawal? The address used to initiate the withdrawal from the mixer is not associated with any of the deposit addresses, so how can their eligibility for withdrawal be determined? The most direct method seems to be for the withdrawer to directly disclose which deposit record they are referring to, but this would directly reveal their identity. This is where zero-knowledge proofs come into play.

The withdrawer provides a ZK proof to prove that they have a deposit record in the Tornado contract and that the deposit has not yet been withdrawn, allowing them to initiate the withdrawal successfully. Zero-knowledge proofs themselves provide privacy protection, so the outside world only knows that the withdrawer has indeed made a deposit into the fund pool, but they do not know which depositor they correspond to.

To prove “I have made a deposit in the Tornado fund pool” can be translated to “my deposit record can be found in the Tornado contract”. If we use Cn to represent the deposit records, the problem can be summarized as:

Given that the set of deposit records in Tornado is {C1, C2, … C100…}, the withdrawer Bob proves that he has generated a certain Cn in the deposit record using his private key, but does not disclose which Cn specifically through ZK.

This is where the special properties of Merkle Proof come into play. Because all the deposit records in Tornado are stored in a Merkle Tree constructed on the chain as the leaf nodes at the bottom layer, and the total number of leaf nodes is about 2 to the power of 20 (>1 million), most of them are in a blank state (assigned with initial values). Whenever a new deposit behavior occurs, the contract will write its corresponding characteristic value Commitment into a leaf node, and then update the root of the Merkle Tree.

For example, if Bob’s deposit operation is the 10,000th in the history of Tornado, then a characteristic value Cn associated with this deposit will be written into the 10,000th leaf node of the Merkle Tree, that is, C10000 = Cn. Then the contract will automatically calculate the new root and update it. (Note: To save computational resources, the Tornado contract will cache the data of the previous batch of nodes that have changed, such as Fs1 and Fs2, Fs0 in the following figure)

MerkleProof itself is very concise and lightweight, it takes advantage of the simplicity of tree data structures in the retrieval/tracing process. If you want to prove to the outside world that a certain transaction TD exists in the MerkleTree, you only need to provide the MerkleProof corresponding to the root (the right part in the figure below), which is quite concise. Even if the Merkle Tree is extremely large, with 2 to the power of 20 leaf nodes, which means it contains 1 million deposit records, the Merkle Proof only needs to include the values of 21 nodes, which is very short.

If you want to prove that a transaction H3 is indeed included in the Merkle Tree, you need to prove that by using H3 and other parts of data on the Merkle Tree, you can generate the Root, and the part of data (including Td) required to generate the Root constitutes the Merkle Proof.

When Bob withdraws money, he needs to prove that the voucher he owns corresponds to a certain deposit hash Cn recorded on the Merkle Tree. In other words, he needs to prove two things:

· Cn exists in the Merkle Tree of the Tornado contract on the chain, and a Merkle Proof can be constructed, which includes Cn;

· Cn is associated with the deposit voucher in Bob’s possession.

Tornado Business Logic Explanation

The front-end code of the Tornado user interface has implemented many functions in advance. When a depositor opens the TornadoCash webpage and clicks the deposit button, the program attached to the front-end code will generate two random numbers K and r locally. Then, it will calculate the value of Cn = Hash(K, r), and pass Cn (which is the commitment in the figure below) to the Tornado contract, inserting it into the Merkle Tree recorded by the contract. In plain terms, K and r are equivalent to private keys. They are very important, and the system will prompt users to keep them properly. K and r will still be used for withdrawal later.

It is worth noting that all the above work happens off-chain, that is to say: neither the Tornado contract nor external observers know K and r. If K and r are leaked, it is similar to a wallet’s private key being stolen.

After the Tornado contract receives the user’s deposit and the user’s submitted Cn = Hash(K, r), it will insert Cn into the bottom layer of the Merkle tree as a new leaf node and update the value of Root. Therefore, Cn and the user’s deposit action are one-to-one associated. The outside world can know which user each Cn corresponds to, know who has deposited tokens into the mixer, and know the deposit record Cn corresponding to each depositor.

In the withdrawal step, the withdrawer enters the voucher/private key (random numbers K and r generated during deposit) in the front-end webpage. The program in the TornadoCash front-end code will use K and r, Cn = Hash(K, r), and the Merkle Proof corresponding to Cn as input parameters to generate a ZK Proof, proving that Cn is a deposit record existing on the Merkle Tree and that K and r are the vouchers corresponding to Cn.

This step is equivalent to proving: I know the key corresponding to a deposit record on the Merkle Tree. When the ZK Proof is submitted to the Tornado contract, the above four parameters are all hidden, and the outside world (including the Tornado contract) cannot know them, thereby ensuring privacy.

Other parameters involved in generating ZKProof include: the root of the Merkle Tree in the Tornado contract when withdrawing, the custom receiving address A, and the identifier nf (which will be explained later) to prevent replay attacks. These three parameters will be publicly released on the chain and can be known by the outside world, but they do not affect privacy.

There is a detail here. When generating Cn during the deposit operation, two random numbers K and r are used to generate Cn, instead of a single random number. This is because a single random number is not secure enough and there is a certain probability of collision. For example, using a single random number may cause two different depositors to coincidentally use the same random number, resulting in a collision in the generated Cn.

As for A in the above image, it represents the address for receiving withdrawals, which is filled in by the withdrawer themselves. nf is an identifier to prevent replay attacks, and its value nf=Hash(K), where K is one of the two random numbers (K and r) used to generate Cn during the deposit. In this way, nf is associated with Cn, in other words, each Cn has a corresponding nf, and they are associated one-to-one.

Why prevent replay attacks? Due to the design characteristics of the mixer, when withdrawing, it is not known which leaf Cn in the Merkle tree corresponds to the coins withdrawn by the user, and it is not known which depositors are associated with the withdrawer, and it is not known how many deposits the withdrawer has made. The withdrawer can take advantage of this characteristic to frequently make withdrawals and launch replay attacks, repeatedly withdrawing tokens from the mixer pool until the fund pool is drained.

Here, the role of the nf identifier is similar to the nonce, which is possessed by each Ethereum address, and is set to prevent a transaction from being replayed. When a withdrawal occurs, the withdrawer needs to submit an nf and check if this nf has been used (recorded): if it has, the withdrawal is invalid. If not, it means that this nf has not been used, and the withdrawal is valid, and the corresponding nf will be recorded. When someone submits this nf again, the corresponding withdrawal action will be directly judged as invalid.

Can someone randomly generate an nf that has not been recorded in the contract? Of course not, because when the withdrawer generates the ZK Proof, it needs to ensure that nf=Hash(K), and the random number K is associated with the deposit record Cn, which means that nf is associated with a recorded deposit Cn. If a nf is randomly fabricated, this nf does not match any of the deposit records, and a valid ZK Proof cannot be generated smoothly, and the subsequent work cannot be completed successfully, and the withdrawal operation will not be successful.

Some people may also ask: Can we do without nf? Since the withdrawer needs to submit a ZK proof when making a withdrawal, proving their association with a certain Cn, can’t we just check if the corresponding ZK proof has been submitted to the chain every time a withdrawal action occurs?

However, the cost of doing so is very high because the Tornado cash contract does not permanently store the ZK Proofs submitted in the past, as this would severely waste storage space. Instead of comparing each new ZK Proof submitted to the chain with the existing Proof, it is more cost-effective to set a small identifier nf and store it permanently.

According to the code example of the withdrawal function, the required parameters and business logic are as follows:

The user submits ZK Proof, nf (NullifierHash) = Hash (K), customizes a recipient address for receiving the withdrawal, and the ZK Proof hides the values of Cn and K, making it impossible for the outside world to determine the user’s identity. The recipient address is usually filled with a clean new address and does not leak personal information.

But there is a small problem here. When users make withdrawals, they often use newly generated addresses to initiate withdrawal transactions in order to remain untraceable. At this time, the new address does not have enough ETH to pay for gas fees. Therefore, when the withdrawal address initiates a withdrawal, it needs to explicitly declare a relayer to pay for the gas fees on its behalf. Afterwards, the mixer contract will deduct a portion of the withdrawal from the user and give it to the relayer as a reward.

In summary, TornadoCash can conceal the association between the withdrawer and the depositor. In the case of a large number of users, it is like a busy market where it is difficult for the police to track down criminals who have blended into the crowd. ZK-SNARK is used in the withdrawal process, and the hidden witness part contains crucial information about the withdrawer. This is the most crucial point of the entire mixer. For now, Tornado may be one of the most ingenious applications related to ZK.

References:

1. https://etherscan.io/address/0xa160cdab225685da1d56aa342ad8841c3b53f291#code Tornado contract source code

2. https://mirror.xyz/mazemax.eth/BTbTOrEKzGkc-XoDcFtLPfJPtQ1Mt96BZYsW83m33IUTornado.cash new and old mechanism comparison

3. https://www.youtube.com/watch?v=Z0s4W3UBxM8 Anonymous Payments

4. https://medium.com/taipei-ethereum-meetup/zkp-study-group-tornado-cash-fdbb84d44b93 [ZKP Study Group] TornadoCash

5. https://medium.com/taipei-ethereum-meetup/tornado-cash-%E5%AF%A6%E4%BE%8B%E8%A7%A3%E6%9E%90-eb84db35de04 TornadoCash practical analysis

Like what you're reading? Subscribe to our top stories.

We will continue to update Gambling Chain; if you have any questions or suggestions, please contact us!

ETHTornado CashUniswap

Gambling Chain

Looking back at the principle of Tornado Cash A thorn in the side of regulators, yet the most sophisticated ZK application.

Principle of “Tornado”

Tornado Business Logic Explanation

Like what you're reading? Subscribe to our top stories.

Was this article helpful?

Vitalik Buterin’s Speech Transcript at Nanyang Technological University The Past and Present of Ethereum

Is EigenLayer’s business model a gold nugget or dross?

Products used

GC Wallet