Data Revolution: Unveiling the Panorama of Decentralized Storage

TL;DR

Decentralized storage lets individuals or groups contribute their idle storage space as nodes of a storage network, bypassing the absolute control over data held by centralized providers such as AWS and Google Cloud.
Low storage costs, data redundancy backups, and token economics are also characteristics of decentralized storage, and a large number of Web3 applications are built on this infrastructure.
As of June 2023, the overall storage capacity of decentralized storage has exceeded 22,000 PB, while the network utilization is only about 20%. This indicates a great potential for future growth.
Of the existing storage capacity, more than 80% is provided by Filecoin, the clear leader in this field. Filecoin has also launched projects such as Filecoin Plus and FVM to incentivize developers and promote ecosystem development.
With the rise of fields such as artificial intelligence and blockchain-based gaming, decentralized computing and storage are expected to have exciting growth opportunities.

1. Why do we need decentralized storage?

Cloud storage services like Dropbox and Google Cloud have changed the way we store and share large files (such as videos and photos) online. They allow anyone to store several TB of data at a much lower cost than buying new hard drives and access files from any device when needed. However, there is a problem: users have to rely on the management systems of centralized entities, which can revoke their access to accounts at any time, share their files with government agencies, or even delete files without reason. This storage model leads to unclear ownership of data assets and effectively allows large internet companies like Amazon and Google to monopolize data. In addition, downtime of centralized services often has disastrous consequences.

The storage field is inherently suitable for decentralized applications. Firstly, it solves problems such as user data privacy and ownership. Files stored on decentralized file services are not influenced by any centralized institutions, such as government agencies that may wish to control and review content. It also prevents private companies from taking actions such as censoring services or sharing files with law enforcement agencies.

Secondly, storing massive amounts of data requires distributed systems. Existing centralized cloud services also use distributed solutions, such as Spanner and TiDB. Distributed does not necessarily mean decentralized, but decentralized always means distributed. Unlike centralized storage architectures, existing decentralized solutions divide data into small chunks and store them encrypted on nodes worldwide, creating multiple copies of the data and improving the ability to recover from data loss.

Thirdly, it addresses the waste of resources in mining. The excessive energy consumption caused by Bitcoin’s PoW mechanism has long been criticized. Decentralized storage lets users become nodes and earn mining rewards by contributing idle storage resources. A large number of storage nodes also means lower costs. Decentralized storage services may even capture a portion of the Web2 cloud service market. With network bandwidth and hardware continually improving, this is a huge market. According to Business Research predictions, the global database market will exceed $120 billion by 2028.

2. Decentralized Storage Architecture

To create truly decentralized applications, decentralized databases should also be included in the Web3 application architecture. This architecture can be divided into four main components: the smart contract layer, file storage, databases, and the general infrastructure layer.

The smart contract layer is equivalent to Layer1, while the general infrastructure layer includes but is not limited to oracles, RPC, access control, identity, off-chain computation, and indexing networks.

Although not easily noticeable to users, the file storage and database layers play a crucial role in the development of Web3 applications. They provide the necessary infrastructure for storing structured and unstructured data, which is a requirement for various applications. Due to the nature of this report, the following sections will further elaborate on these two components.

2.1 Decentralized File Storage Networks (DFSNs)

DFSNs like Filecoin, Arweave, and Crust are mainly used for the persistent storage of unstructured data, which does not follow predefined formats and does not require frequent updates or retrievals. Therefore, DFSNs are usually used to store various types of static data such as text documents, images, audio files, and videos.

One advantage of storing this type of data in a distributed storage architecture is the ability to move the data closer to end users using edge storage devices or edge data centers. This storage approach provides lower network communication costs, lower interaction latency, and lower bandwidth overhead. It also offers greater adaptability and scalability. Taking Storj as an example, the monthly storage cost for 1 TB is $4.00, while the market-leading enterprise cloud storage solution, Amazon S3, charges approximately $23.00 per month for the same amount of data.

Compared to traditional centralized cloud storage solutions, users can benefit from more cost-effective storage options. The decentralized nature of DFSNs also provides higher data security, privacy, and control, as data is distributed among multiple nodes or miners rather than stored in a single centralized server.

2.2 Decentralized Database

The limitations of storing unstructured files in DFSNs are obvious, especially in terms of efficient data retrieval and updates. For data that requires frequent updates, these architectures are not the ideal choice. In this case, traditional databases such as MySQL and Redis are more suitable options for developers, as they have been extensively optimized and tested in the Web 2.0 era.

Structured data storage is an inevitable requirement, especially in applications such as blockchain games and social networks. Traditional databases provide an efficient way to manage large amounts of dynamic data and control access to it. They offer functionalities such as indexing, querying, and data manipulation, which are crucial for applications that rely on structured data. Therefore, whether based on DFSNs or self-developed underlying storage, high-performance and highly available decentralized databases are a very important branch in the storage field.

3. Analysis of DFSNs at the Technical Level

3.1 Overview

Among current Web3 projects, decentralized file storage networks (DFSNs) can be roughly divided into two categories. The first category includes projects based on IPFS, such as Filecoin and Crust. The second category includes projects like Arweave, Sia, and Storj, which have their own underlying protocols or storage systems. Although their implementations differ, they face the same challenge: ensuring efficient data storage and retrieval while keeping storage truly decentralized.

Blockchains themselves are not suitable for storing large amounts of data on-chain: the associated costs and the impact on block space make this approach impractical. Therefore, an ideal decentralized storage network must be able to store, retrieve, and maintain data while ensuring that the work of all participants in the network is incentivized and compliant with the trust mechanism of the decentralized system.

We will evaluate the technical features and strengths and weaknesses of several mainstream projects from the following aspects:

Data Storage Format: The storage protocol layer needs to determine how data should be stored, such as whether data should be encrypted and whether data should be stored as a whole or divided into small hash blocks.

Data Replication and Backup: It is necessary to decide where to store the data, such as how many nodes should keep the data, whether all data should be replicated to all nodes, or whether each node should receive different fragments to further protect data privacy. The data storage format and propagation will determine the probability of data availability on the network, i.e. the persistence when devices fail over time.

Long-term Data Availability: The network needs to ensure when and where data should be available. This means designing incentive mechanisms to prevent storage nodes from deleting old data over time.

Proof of Stored Data: The network not only needs to know the storage location of the data, but also the storage nodes should be able to prove that they have indeed stored the data they want to store in order to determine the share of incentives.

Storage Price Discovery: Nodes expect to be paid for the continuous storage of files, so the network needs a mechanism for determining that price.

3.2 Data Storage and Replication

As mentioned above, Filecoin and Crust use IPFS as the network protocol and communication layer to transfer files between peers and store them on nodes. The difference is that Filecoin uses erasure coding (EC) to achieve scalability of data storage. Erasure coding is a data protection method that divides data into fragments, expands and encodes them with redundant data blocks, and stores them in different locations such as disks, storage nodes, or other geographical locations. EC creates a mathematical function to describe a set of numbers, so that the numbers can be checked for accuracy and recovered if one of them is lost.

Source: usenix

The basic equation is n=k+m, where the total data blocks equal the original data blocks plus the parity blocks.

Calculate m parity blocks from k original data blocks. Store these k+m data blocks on k+m hard disks, and it can tolerate any m hard disk failures. When a hard disk failure occurs, as long as any k surviving data blocks are selected, all the original data blocks can be calculated. Similarly, if the k+m data blocks are dispersed among different storage nodes, it can tolerate m node failures.
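The n=k+m relationship can be illustrated with the simplest possible erasure code: a single XOR parity block (m=1). This is only a sketch under that assumption. Production networks typically use codes such as Reed-Solomon that support m greater than 1, and the function names below are illustrative, not taken from any project.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks):
    """Return the k data blocks plus one parity block (n = k + 1)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover(stored, lost_index):
    """Rebuild the block at lost_index from the k surviving blocks."""
    survivors = [b for i, b in enumerate(stored) if i != lost_index]
    return xor_blocks(survivors)

data = [b"AAAA", b"BBBB", b"CCCC"]   # k = 3 original blocks
stored = encode(data)                # n = 4 blocks, one per disk or node
rebuilt = recover(stored, 1)         # pretend the disk holding block 1 failed
print(rebuilt == data[1])
```

Because the XOR of all four stored blocks is zero, any one missing block equals the XOR of the other three, which is exactly the "any k surviving blocks" property for m=1.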

When new data needs to be stored on the Filecoin network, users must connect to a storage provider through the Filecoin storage market, negotiate storage terms, and then place a storage order. At the same time, users must decide which type of erasure code to use and the replication factor. Through erasure coding, data is broken down into fixed-size fragments, and each fragment is expanded and encoded with redundant data, so only a subset of the fragments is needed to reconstruct the original file. The replication factor refers to how many times the data should be replicated to additional storage sectors of storage miners. Once the storage miners and users reach an agreement on the terms, the data is transmitted to the storage miners and stored in their storage sectors.

Crust’s data storage method is different: it replicates the data to a fixed number of nodes. When a storage order is submitted, the data is encrypted and sent to at least 20 Crust IPFS nodes (the number of nodes can be adjusted). On each node, the data is divided into many smaller fragments, which are hashed into a Merkle tree. Each node retains all the fragments that make up the complete file.

Arweave also uses full-file replication, but Arweave uses some different methods. After a transaction is submitted to the Arweave network, the first single node stores the data as a block on the blockweave (Arweave’s blockchain representation). From there, a very aggressive algorithm called Wildfire ensures that the data is quickly replicated on the network, because in order for any node to mine the next block, they must prove that they can access the previous block.

Sia and Storj also use erasure coding to store files. By comparison, Crust’s approach of storing 20 complete copies of the data on 20 nodes is highly redundant, which makes the data very durable but is very inefficient in terms of bandwidth. Erasure coding provides a more efficient way to implement redundancy, increasing data durability without a significant bandwidth impact. Sia and Storj directly propagate EC shards to a specific number of nodes to meet certain durability requirements.
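The bandwidth argument can be made concrete with a small calculation. The sketch below compares the bytes stored per byte of user data under full replication and under erasure coding. The parameters k=10 and m=19 are hypothetical, chosen only so that both schemes survive the loss of any 19 nodes; they are not the settings of any network discussed here.

```python
def replication_factor(copies: int) -> float:
    """Bytes stored per byte of user data with full-file replication."""
    return float(copies)

def erasure_factor(k: int, m: int) -> float:
    """Bytes stored per byte of user data with a (k + m) erasure code."""
    return (k + m) / k

# 20 full copies survive the loss of any 19 nodes.
full_copies = replication_factor(20)
# A hypothetical code with k = 10, m = 19 also survives any 19 lost nodes.
erasure_coded = erasure_factor(10, 19)
print(f"replication: {full_copies:.1f}x, erasure coding: {erasure_coded:.1f}x")
```

Under these assumed parameters, the erasure-coded layout transfers and stores roughly 2.9 bytes per byte of user data instead of 20.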

3.3 Data Storage Proofs and Incentives

The reason why the data storage format is explained first is because the technical path chosen directly determines the differences in proofs and incentives at the protocol layer. That is, how to verify that the data to be stored on a specific node is indeed stored on that specific node. Only after verification occurs can the network use other mechanisms to ensure that the data remains stored over time (i.e., storage nodes do not delete the data after the initial storage operation).

Such mechanisms include algorithms that prove data was stored during specific time periods, financial incentives for successfully completing storage requests, and penalties for unfulfilled requests. This section introduces the storage proof and incentive mechanisms of each network.

3.3.1 Filecoin

On Filecoin, storage miners must deposit collateral into the network before receiving any storage requests as a commitment to providing storage to the network. Once completed, miners can offer storage on the storage market and set prices for their services. At the same time, Filecoin innovatively introduces PoRep and PoSt for storage validation by miners.

Source: Filecoin

Proof of Replication (PoRep): Miners need to prove that they store unique copies of the data. The unique encoding ensures that two storage transactions of the same data cannot reuse the same disk space.

Proof of Spacetime (PoSt): During the lifecycle of a storage transaction, storage miners need to prove every 24 hours that they continuously allocate dedicated storage space to store the data.

After submitting the proof, storage space providers will receive FIL rewards. If they fail to comply with the commitment, their collateral tokens will be seized (slashed).

Over time, storage miners must keep proving that they still hold the stored data by running the proof algorithm regularly, and such repeated checks would normally require a large amount of bandwidth. The novelty of Filecoin is that, to prove storage over time while reducing bandwidth usage, miners use the output of the previous proof as the input for the current proof and generate replication proofs sequentially. This is done over multiple iterations, which represent the duration for which the data is to be stored.
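The chaining idea, where each proof feeds the next, can be sketched with a plain hash chain. This is not Filecoin's actual construction, which relies on sealed replicas and succinct cryptographic proofs. Here prove_once is a hypothetical stand-in that simply hashes a challenge together with the stored bytes.

```python
import hashlib

def prove_once(replica: bytes, challenge: bytes) -> bytes:
    """Hypothetical stand-in for one storage proof: hash(challenge + replica)."""
    return hashlib.sha256(challenge + replica).digest()

def chained_proofs(replica: bytes, seed: bytes, iterations: int):
    """Generate proofs in sequence; each proof is the next proof's challenge."""
    proofs = []
    challenge = seed
    for _ in range(iterations):
        proof = prove_once(replica, challenge)
        proofs.append(proof)
        challenge = proof  # the previous output becomes the next input
    return proofs

replica = b"bytes held in a storage sector"
proofs = chained_proofs(replica, b"initial challenge", 3)
print(len(proofs))
```

Because every step depends on the previous output, the proofs cannot be computed in parallel or ahead of time without the data, so the length of the chain acts as a rough measure of elapsed storage time.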

3.3.2 Crust Network

Similar to Filecoin, the relationship between Crust and IPFS is also the relationship between the incentive layer and the storage layer. In Crust Network, nodes must also deposit collateral before accepting storage orders on the network. The amount of storage space provided by nodes determines the maximum amount of collateral, which is pledged and allows nodes to participate in block creation on the network. This algorithm is called Guaranteed Proof of Stake (GPoS), which ensures that only nodes with stakes in the network can provide storage space.

Source: Crust Wiki

Unlike Filecoin, Crust’s storage price discovery mechanism relies on DSM. Nodes and users automatically connect to the Decentralized Storage Market (DSM), which automatically selects nodes to store user data. Storage prices are determined based on user demands (such as storage duration, storage space, replication factor) and network factors (such as congestion). When a user submits a storage order, the data will be sent to multiple nodes on the network, which use the Trusted Execution Environment (TEE) of the machine to split and hash the data fragments. Since the TEE is a closed hardware component that even the hardware owner cannot access, node owners cannot reconstruct the file on their own.

After the file is stored on the node, a work report containing the file hash is published to the Crust blockchain along with the remaining storage of the node. From here, data storage is ensured over time, and the network periodically requests random data checks: in TEE, random Merkle tree hashes are retrieved along with the corresponding file fragments, which are decrypted and rehashed. The new hash is then compared with the expected hash. This implementation of storage proof is called Meaningful Proof of Work (MPoW).
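The random check described above can be sketched as follows. The example leaves out the TEE and the decryption step entirely, uses SHA-256 as an assumed hash function, and compares leaf hashes directly instead of walking a Merkle tree; all names are illustrative.

```python
import hashlib
import random

def leaf_hashes(fragments):
    """Hash every fragment once, as recorded when the file is first stored."""
    return [hashlib.sha256(f).hexdigest() for f in fragments]

def spot_check(stored_fragments, expected_hashes, rng):
    """Pick one random fragment, rehash it, and compare with the expected hash."""
    index = rng.randrange(len(expected_hashes))
    actual = hashlib.sha256(stored_fragments[index]).hexdigest()
    return actual == expected_hashes[index]

fragments = [b"fragment-0", b"fragment-1", b"fragment-2"]
expected = leaf_hashes(fragments)
rng = random.Random(7)  # stands in for the network's source of randomness
print(spot_check(fragments, expected, rng))
```

A node that has discarded or altered fragments fails whenever the random index lands on a missing piece, so repeated checks make long-term cheating increasingly unlikely to go unnoticed.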

GPoS is a PoS consensus algorithm that defines quotas based on storage resources. By providing workload reports through the first layer MPoW mechanism, Crust can obtain the storage workload of all nodes on the chain. The second layer GPoS algorithm calculates a Staking quota for each node based on the node’s workload. Based on this quota, PoS consensus is achieved. That is, the block rewards are proportional to the stake of each node, and the upper limit of the stake of each node is limited by the storage capacity provided by the node.
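The relationship between storage, stake quota, and rewards can be sketched in a few lines. The quota_per_tb parameter and the example figures are assumptions for illustration only; the actual quota formula used by Crust is not given in this article.

```python
def effective_stake(staked: float, storage_tb: float, quota_per_tb: float) -> float:
    """Stake that counts towards consensus, capped by the storage-based quota."""
    return min(staked, storage_tb * quota_per_tb)

def split_reward(block_reward: float, nodes: dict, quota_per_tb: float) -> dict:
    """Share a block reward in proportion to each node's effective stake."""
    stakes = {
        name: effective_stake(staked, storage_tb, quota_per_tb)
        for name, (staked, storage_tb) in nodes.items()
    }
    total = sum(stakes.values())
    if total == 0:
        return {name: 0.0 for name in stakes}
    return {name: block_reward * s / total for name, s in stakes.items()}

# Both nodes stake 1000 tokens, but node "a" provides far less storage.
nodes = {"a": (1000.0, 10.0), "b": (1000.0, 50.0)}
rewards = split_reward(120.0, nodes, quota_per_tb=20.0)
print(rewards)
```

With these assumed numbers, node "a" is capped at an effective stake of 200 and node "b" at 1000, so the reward splits 20 to 100 even though both staked the same amount.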

3.3.3 Arweave

Compared to the previous two pricing models, Arweave uses a very different pricing model, which is based on the fact that all stored data on Arweave is permanent, and the storage price depends on the cost of storing data on the network for 200 years.

The underlying layer of Arweave’s data network is based on the block generation mode of the blockweave. Typical blockchains, such as Bitcoin, are single-chain structures, where each block is linked to the previous block in the chain. In the mesh structure of the blockweave, each block is linked not only to the previous block, but also to a randomly recalled block from the earlier history of the blockchain. The recall block is determined by the hash value of the previous block in the block history and the height of the previous block. This selection is deterministic but unpredictable. When a miner wants to mine or verify a new block, the miner needs access to the recall block’s data.
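One way to obtain a selection that is deterministic but unpredictable is to interpret the previous block hash as an integer and reduce it modulo the previous height. The sketch below assumes exactly that formula and uses SHA-256 to produce a sample hash; the precise rule and hash function used by Arweave may differ.

```python
import hashlib

def recall_block_index(prev_block_hash: bytes, prev_height: int) -> int:
    """Map the previous block's hash to an index in [0, prev_height)."""
    if prev_height < 1:
        raise ValueError("need at least one earlier block to recall")
    return int.from_bytes(prev_block_hash, "big") % prev_height

prev_hash = hashlib.sha256(b"previous block contents").digest()
index = recall_block_index(prev_hash, 1000)
print(0 <= index < 1000)
```

Every node computes the same index from the same inputs, yet nobody can know which historical block will be required until the previous block has been produced.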

Arweave’s Proof of Access (PoA) uses the RandomX hash algorithm. The probability of a miner mining a block = the probability of holding the randomly recalled block * the probability of being the first to find the hash. Miners need to find a suitable hash value through the PoW mechanism to generate a new block, but the random number (nonce) depends on the information of the previous block and the randomly recalled block. The randomness of the recall block encourages miners to store more blocks, thereby obtaining a higher success rate and more block rewards. PoA also incentivizes miners to store “scarce blocks”, i.e., blocks that others have not stored, in order to obtain a higher probability of mining and a higher reward.
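The product in that formula shows why storing more of the history pays off. The sketch below assumes the recall block is chosen uniformly, so the chance of holding it equals the fraction of blocks a miner stores; the 2% hash success rate is an arbitrary illustrative figure.

```python
def mining_probability(stored_fraction: float, hash_success: float) -> float:
    """P(mine) = P(holding the recall block) * P(finding the hash first)."""
    return stored_fraction * hash_success

# Two miners with identical hash power but different amounts of stored history.
half_archive = mining_probability(0.5, 0.02)
full_archive = mining_probability(1.0, 0.02)
print(full_archive / half_archive)
```

With equal hash power, the miner that stores the full history is twice as likely to produce the next block as the miner that stores half of it.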

Source: Arweave Yellow Paper

When one-time fees are charged and subsequent data reading is provided as a free service, sustainability means that users can access the data at any time. But how can we incentivize miners to provide data reading services without any income?

Source: Arweave Yellow Paper

In the BitTorrent game theory strategy “optimistic tit-for-tat algorithm”, nodes are optimistic and will cooperate with other nodes. Non-cooperative behavior will be punished. Based on this, Arweave has designed Wildfire, a node rating system with implicit incentives. Each node in the Arweave network will rate its neighboring nodes based on the amount of data received and the response speed. Nodes will prioritize sending requests to higher-ranked peers. The higher the node’s ranking, the higher its credibility, and the greater the probability of mining a block and obtaining scarce blocks.
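A local peer-rating table of this kind could look like the sketch below. The score, bytes received per second of waiting, is an assumption chosen to combine the two factors mentioned above; the exact rating formula is not given in this article, and the names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class PeerStats:
    bytes_received: int = 0
    seconds_waited: float = 0.0

    def score(self) -> float:
        """Assumed rating: bytes received per second spent waiting."""
        if self.seconds_waited == 0:
            return 0.0
        return self.bytes_received / self.seconds_waited

def rank_peers(stats: dict) -> list:
    """Order peer names from most to least responsive."""
    return sorted(stats, key=lambda peer: stats[peer].score(), reverse=True)

stats = {
    "fast": PeerStats(1000, 1.0),
    "slow": PeerStats(1000, 10.0),
    "silent": PeerStats(),
}
print(rank_peers(stats))
```

Each node keeps such a table privately and sends requests to the top entries first, so unresponsive peers gradually receive less traffic without any global agreement on rankings.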

Wildfire is actually a game, a highly scalable game. There is no “ranking” consensus between nodes, and there is no obligation to report the generation and determination of rankings. The “goodness” or “badness” between nodes is regulated by an adaptive mechanism to determine rewards and penalties for new behaviors.

3.3.4 Sia

Like Filecoin and Crust, storage nodes in Sia must pledge collateral to provide storage services. On Sia, nodes must decide how much collateral to post: collateral directly affects the storage price charged to users, but posting low collateral also means that a node has little to lose if it disappears from the network. These forces push nodes towards a balanced level of collateral.

Users connect to storage nodes through an automated storage market, similar to Filecoin: nodes set storage prices, and users set expected prices based on target prices and expected storage duration. Then, users and nodes will automatically connect with each other.

Source: Crypto Exchange

Among these projects, Sia’s consensus protocol uses the simplest method: storing contracts on the chain. Once the user and the node agree on a storage contract, funds are locked in the contract. The data is divided into fragments using erasure coding, and each fragment is individually encrypted with a different encryption key. Then, each fragment is replicated on several different nodes. The storage contract recorded on the Sia blockchain records the contract terms and the Merkle tree hash value of the data. To ensure that the data is stored for the expected storage time, storage proofs are regularly submitted to the network. These storage proofs are created from a randomly selected part of the original stored file and a list of Merkle tree hash values recorded on the blockchain. Each storage proof submitted by a node within a certain period of time is rewarded, and the final reward is paid when the contract is completed.
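A storage proof of this kind can be sketched as a Merkle proof: the node reveals one randomly chosen segment plus the sibling hashes needed to rebuild the root recorded in the contract. The segment size, SHA-256, and the rule of duplicating the last hash on odd levels are assumptions of this sketch, not details of Sia's implementation.

```python
import hashlib
import random

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def pad(level):
    """Duplicate the last hash so the level has an even number of entries."""
    return level + [level[-1]] if len(level) % 2 else level

def build_levels(segments):
    """Return every level of the tree, from leaf hashes up to the root."""
    levels = [[h(s) for s in segments]]
    while len(levels[-1]) > 1:
        cur = pad(levels[-1])
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def make_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
    proof = []
    for level in levels[:-1]:
        level = pad(level)
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(segment, proof, root):
    """Recompute the root from one segment and its sibling hashes."""
    node = h(segment)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

segments = [bytes([i]) * 64 for i in range(5)]
levels = build_levels(segments)
root = levels[-1][0]                       # value recorded in the contract
challenge = random.Random(42).randrange(len(segments))
proof = make_proof(levels, challenge)
print(verify(segments[challenge], proof, root))
```

The verifier only needs the root, one segment, and a logarithmic number of hashes, which is why such proofs are cheap enough to submit on-chain at regular intervals.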

On Sia, storage contracts can last for a maximum of 90 days. To store files for more than 90 days, users must manually connect to the network using the Sia client software to extend the contract for another 90 days. Skynet is another layer on top of Sia, similar to Filecoin’s Web3.Storage or NFT.Storage platforms, which allows Skynet’s own client software to handle contract renewals for users, automating the process for them. While this is a workaround, it is not a solution at the Sia protocol level.

3.3.5 Storj

In the decentralized storage network of Storj, there is no blockchain or similar blockchain structure. The absence of a blockchain also means that the network does not have a global consensus on its state. Instead, data storage location tracking is handled by satellite nodes, and data storage is handled by storage nodes. Satellite nodes can decide which storage nodes to use for storing data, and storage nodes can decide which satellite nodes to accept storage requests from.

In addition to tracking data storage location across storage nodes, satellites also handle billing and payments for storage node storage and bandwidth usage. In this arrangement, storage nodes set their own prices, and as long as users are willing to pay these prices, satellites will connect them together.

Source: Storj GitHub

When a user wants to store data on Storj, the user must choose a satellite node to connect to and share their specific storage requirements. The satellite node then selects storage nodes that meet the storage requirements and connects them to the user. The user then directly transfers the file to the storage nodes while making payment to the satellite. The satellite then pays the storage node fees for the saved files and used bandwidth on a monthly basis.
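The matching step performed by a satellite can be sketched as a simple filter. The selection criteria used here, free capacity and a price ceiling with the cheapest nodes first, are assumptions for illustration; the article does not specify how satellites actually choose storage nodes.

```python
from dataclasses import dataclass

@dataclass
class StorageNode:
    node_id: str
    price_per_tb: float  # price set by the node itself
    free_tb: float

def select_nodes(nodes, needed_tb, max_price, count):
    """Return up to `count` nodes that fit the user's requirements, cheapest first."""
    eligible = [
        n for n in nodes
        if n.free_tb >= needed_tb and n.price_per_tb <= max_price
    ]
    eligible.sort(key=lambda n: n.price_per_tb)
    return eligible[:count]

nodes = [
    StorageNode("n1", 2.0, 5.0),
    StorageNode("n2", 1.5, 0.5),   # not enough free space
    StorageNode("n3", 1.0, 3.0),
    StorageNode("n4", 6.0, 9.0),   # too expensive
]
chosen = select_nodes(nodes, needed_tb=1.0, max_price=4.0, count=2)
print([n.node_id for n in chosen])
```

After this matching step the user uploads directly to the chosen nodes, while the satellite keeps track of where the data lives and handles payment.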

This technical solution is actually very centralized, as the development of satellite nodes is entirely defined by the project team, which also means that the project team has control over pricing. Although a centralized architecture also provides Storj with efficient performance, as mentioned at the beginning, distributed storage does not necessarily mean decentralization. The ERC-20 token Storj released on Ethereum also does not utilize any smart contract functionality, essentially providing an alternative payment method.

This is closely related to Storj’s business model: the company focuses on enterprise-level storage services, competing directly with Amazon’s S3 service and partnering with Microsoft Azure, and aims to provide enterprises with services that match or even surpass the performance metrics of Amazon’s storage. Leaving performance data aside, storing data with Storj is indeed much more cost-effective than with Amazon, which to some extent demonstrates the viability of the decentralized storage business model.

4. Impact of Different Technical Paths

4.1 Economic Model

The choice of technical path also affects the design of the token model to some extent. Each of the five decentralized storage networks discussed above has its own economic model.

Filecoin, Crust, and Sia all use the Stake for Access (SFA) token model. In this model, storage providers must lock in the network’s native assets to accept storage transactions. The amount locked is proportional to the amount of data the storage provider can store. This creates a situation where storage providers must increase their collateral when storing more data, thereby increasing the demand for native assets of the network. In theory, the price of assets should increase as the amount of data stored on the network increases.

Arweave, on the other hand, utilizes a unique donation token model, where a significant portion of the one-time storage fee from each transaction is added to a donation pool. Over time, the tokens in the donation pool accumulate interest in the form of storage purchasing power. As time goes on, the donations are allocated to miners to ensure the persistence of data on the network. This donation model effectively locks tokens for the long term: as storage demand on Arweave increases, more tokens are removed from circulation.

Compared to the other networks, Storj has the simplest token model. Its token, $STORJ, is used as a means of payment for storage services on the network, both for end users and storage providers. Therefore, the price of $STORJ is a direct function of the demand for Storj’s storage services.

4.2 Target Users

It is difficult to objectively say that one storage network is better than another. When designing decentralized storage networks, there is no single best solution. Depending on the purpose of the network and the problems it aims to solve, trade-offs must be made in terms of technical design, token economics, community building, etc.

Filecoin primarily targets enterprises and application development, providing cold storage solutions. Its competitive pricing and accessibility make it an attractive alternative for Web2 entities seeking cost-effective storage for a large amount of archived data.

Crust provides high redundancy and fast retrieval, making it suitable for high-traffic dApps and for efficient retrieval of popular NFT data. However, it lacks persistent redundancy, which severely limits its ability to provide permanent storage.

Arweave stands out from other decentralized storage networks with its concept of permanent storage, which is particularly popular for storing Web3 data such as blockchain state data and NFTs. Other networks are mainly optimized for hot storage or cold storage.

Sia targets the hot storage market and primarily focuses on developers seeking fully decentralized and private storage solutions with fast retrieval times. Although it currently lacks native AWS S3 compatibility, access layers like Filebase provide such services.

Storj seems more comprehensive but sacrifices some decentralization. Storj significantly lowers the entry barrier for AWS users and caters to the key target audience of enterprise hot storage optimization. It provides cloud storage compatible with Amazon S3.

5. Ecological Construction of Decentralized Storage

In terms of ecosystem construction, there are two main types: first, upper-layer dApps built entirely on the storage network, which aim to enhance the network’s functionality and ecosystem; second, existing decentralized applications and protocols such as OpenSea and AAVE that choose to integrate with specific storage networks to become more decentralized. In this section, we focus on Filecoin, Arweave, and Crust, as Sia and Storj do not have a prominent ecosystem presence.

5.1 Filecoin Ecosystem

Source: Filecoin

In the ecosystem demonstrated by Filecoin, there are already 115 projects belonging to the aforementioned first type, which are all built entirely on the underlying structure of Filecoin. It can be observed that most projects are concentrated in general storage, NFTs, and consumer storage. Another important milestone in the Filecoin ecosystem is the Filecoin Virtual Machine (FVM), which is similar to the Ethereum Virtual Machine (EVM) and provides the environment needed to deploy and execute code in smart contracts.

Source: Filecoin

With FVM, the Filecoin network gains the ability to execute smart contracts on top of the existing storage network. In FVM, developers do not program the user’s stored data directly but define how these data will automatically or conditionally operate after being stored in the network through smart contracts (in a trustless manner). Imaginable scenarios include:

Distributed Computing based on Filecoin Storage (performing computations on the location where data is stored, without the need to move it first)

Crowdfunded Data Preservation Plan – where anyone can fund the storage of important data for society, such as crime data or climate change-related data

Intelligent Storage Market – dynamically adjusting storage rates based on different time periods, replication levels, and availability within a specific region

Centuries-long Storage and Perpetual Custody – storing data that can be accessed by future generations

Data DAO or Tokenized Dataset – modeling the value of data as tokens and forming a DAO to coordinate and trade computations performed on it

NFTs Stored Locally – co-locating NFT content with the registration records that track them

Time-Locked Data Retrieval – unlocking relevant datasets only after certain company records have been made public

Mortgage Loans (providing loans for specific purposes to storage providers, such as accepting FIL+ transaction proposals from specific users or increasing capacity within a defined time window)

Source: Filecoin

At its core, the FVM virtual machine is based on WebAssembly (WASM). This choice allows developers to write native upper-layer applications using any programming language that can be compiled to WASM. This feature makes it easier for Web3 developers to get started as they can leverage their existing knowledge and bypass the learning curve associated with specific languages.

Developers can also port existing Ethereum smart contracts with minimal (or even no) modifications to the source code. The ability to reuse audited and battle-tested smart contracts from the Ethereum network allows developers to save on development costs and time, while users can enjoy their utility in a less risky manner.

Another noteworthy feature is Filecoin Plus, a program designed to subsidize users to store large and valuable datasets at a discounted price. Customers who want to upload data to the network can apply to a selected group of community members called notaries, who review and allocate resources called DataCaps to the customers. The customers can then use DataCaps to subsidize their transactions with storage providers.

Filecoin Plus brings many benefits: it makes the Filecoin network more active, as the storage of valuable data continues to generate block demand; customers get better services at competitive prices; and block rewards increase. Compared with 2021, stored data increased 18-fold in 2022 after the launch of Filecoin Plus.

5.2 Crust Network Ecosystem

Compared with Filecoin and Arweave, Crust has a different approach in ecosystem construction. It tends to directly cooperate with existing Web3 applications and provide services, rather than incentivize third-party developers to build their own ecosystem applications on Crust. The main reason is that Crust is built on Polkadot. Although Ethereum and Cosmos ecosystems were considered choices in the early stages of the Crust project, their technical paths are not sufficiently compatible with it. Crust prefers Polkadot’s Substrate framework to provide a highly customizable development space, on-chain upgrades, and on-chain governance.

Source: Crust Network

Crust performs well in developer support. It has introduced the Crust development toolkit, which includes a JS SDK, GitHub Actions, shell scripts, and IPFS Scan to meet the integration preferences of different Web3 projects. Currently, the development toolkit has been integrated into various Web3 projects such as Uniswap, AAVE, Polkadot Apps, Liquity, XX Messenger, and RMRK.

According to the data provided on the official website, there are currently more than 150 projects integrated with Crust Network. A large portion of these applications (over 34%) are DeFi projects. This is because DeFi projects usually have high-performance requirements for data retrieval.

As mentioned earlier, on Crust Network, data is replicated to at least 20 nodes, and in many cases, to over 100 nodes. Although this does require larger initial bandwidth, the ability to retrieve data from multiple nodes simultaneously speeds up file retrieval and provides strong redundancy in the event of failures or nodes leaving the network. Crust Network relies on this high level of redundancy as it does not have data supplementation or repair mechanisms like other chains. Among these decentralized storage networks, Crust Network is the youngest.
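The redundancy argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that each replica is lost independently with the same probability over some period. The 30% loss probability is purely illustrative and is not a figure from Crust; only the replica counts of 20 and 100 come from the text.

```python
def loss_probability(replicas: int, node_failure_prob: float) -> float:
    """Probability that every replica is lost, assuming independent failures."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0.0 <= node_failure_prob <= 1.0:
        raise ValueError("node_failure_prob must be between 0 and 1")
    return node_failure_prob ** replicas


# Hypothetical per-node loss probability, chosen only for illustration.
ASSUMED_NODE_FAILURE_PROB = 0.3

for replicas in (1, 20, 100):
    p = loss_probability(replicas, ASSUMED_NODE_FAILURE_PROB)
    print(f"{replicas:>3} replicas -> probability of losing the file: {p:.3e}")
```

Real-world failures are often correlated (for example, nodes that share a data center or operator), so actual durability will be lower than this independent model suggests.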

5.3 Arweave Ecosystem

Source: Arweave, the newest ecosystem landscape

As shown in the above figure, Arweave also has a strong ecosystem. About 30 applications are highlighted, all of which are built entirely on Arweave. Although this is fewer than Filecoin’s 115 applications, these applications still meet users’ basic needs and cover a wide range of fields, including infrastructure, exchanges, social, and NFTs.

Of particular note is the decentralized database built on Arweave. Arweave primarily uses its block organization for data storage, while executing off-chain computations on the user side. Therefore, the cost of using Arweave is determined solely by the amount of data stored on the chain.

This separation of computation from the chain is known as the Storage-based Consensus Paradigm (SCP), which addresses the scalability challenges of blockchain. SCP is feasible on Arweave because the input data is stored on-chain, so off-chain computations reliably produce the same state that on-chain computations would.
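The idea behind SCP can be illustrated with a small sketch. It is not Arweave's or any project's actual implementation: the transaction format, the `apply_transaction` function, and the in-memory `stored_log` standing in for data stored on-chain are all assumptions made for illustration. The point is that when inputs are stored in a fixed order and the state transition is deterministic, any client replaying them off-chain arrives at the same state.

```python
import hashlib
import json

# Stand-in for an ordered log of inputs stored on-chain (hypothetical format).
stored_log = [
    {"op": "set", "key": "alice", "value": 10},
    {"op": "set", "key": "bob", "value": 5},
    {"op": "add", "key": "alice", "value": 3},
    {"op": "delete", "key": "bob"},
]


def apply_transaction(state: dict, tx: dict) -> dict:
    """Deterministic state transition: same state and input give the same output."""
    new_state = dict(state)
    if tx["op"] == "set":
        new_state[tx["key"]] = tx["value"]
    elif tx["op"] == "add":
        new_state[tx["key"]] = new_state.get(tx["key"], 0) + tx["value"]
    elif tx["op"] == "delete":
        new_state.pop(tx["key"], None)
    else:
        raise ValueError(f"unknown operation: {tx['op']}")
    return new_state


def replay(log: list) -> dict:
    """Rebuild the current state off-chain by replaying every stored input in order."""
    state = {}
    for tx in log:
        state = apply_transaction(state, tx)
    return state


def state_digest(state: dict) -> str:
    """Hash a canonical JSON encoding so two clients can compare their results."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Two independent clients replay the same log and compare digests.
client_a = replay(stored_log)
client_b = replay(stored_log)
print(client_a)
print(state_digest(client_a) == state_digest(client_b))
```

Because only the inputs need to be stored, the cost in this model scales with the size of the log rather than with the amount of computation performed, which matches the cost argument made above.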

The successful implementation of SCP has opened the door to the development of numerous databases on Arweave. Four different databases built on Arweave are:

WeaveDB: A key-value database built as a smart contract on Arweave, which uses whitelist addresses for access control logic.
HollowDB: A key-value database built as a smart contract on Arweave, which uses whitelist addresses for access control logic and ZK proofs to ensure data verifiability.
Kwil: An SQL database that runs its own P2P node network but uses Arweave as the storage layer. It uses public/private key pairs for access control logic and its own consensus mechanism for data validation.
Glacier: A NoSQL database architected as ZK-Rollup, using Arweave as its data availability layer. It uses public/private key pairs as access control logic and ZK proofs for data verifiability.

6. Growth Drivers

The growth of decentralized storage depends on several core factors, which fall into three major categories: overall market prospects, technology, and public awareness. These factors are interrelated and complementary, and each can be divided into finer subcategories. The following paragraphs provide a more detailed breakdown of each factor.

6.1 Market Prospects

6.1.1 Potential of the Cloud Storage Market

With the penetration of the internet into contemporary life, cloud storage services are crucial for almost everyone. In 2022, the global cloud storage market reached an astonishing $78.6 billion, with no sign of slowing down. A market study suggests that by 2027, the industry’s valuation could reach $183.75 billion.

Meanwhile, IDC predicts that by 2029, the valuation of the cloud storage market will reach $376 billion. IDC’s forecast further illustrates the growing demand for data storage, estimating that by 2025, the global datasphere will expand to 175 zettabytes. Given these prospects, it is reasonable to expect that decentralized storage, as an alternative to its Web2 counterparts, will both benefit from and contribute to overall market growth.
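As a rough check on these projections, the growth rate implied by the first pair of figures can be computed directly. The sketch below uses the standard compound annual growth rate formula and only the two market sizes quoted in this section ($78.6 billion in 2022 and $183.75 billion in 2027); the function name is illustrative.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate implied by a start value, end value, and period."""
    if start_value <= 0 or end_value <= 0 or years <= 0:
        raise ValueError("values and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the text, in billions of US dollars.
market_2022 = 78.6
market_2027 = 183.75

cagr = implied_cagr(market_2022, market_2027, years=2027 - 2022)
print(f"Implied CAGR 2022-2027: {cagr:.1%}")
```

Under these figures the implied rate falls between 18% and 19% per year. The IDC figure for 2029 comes from a different source and may use a different market definition, so it is not combined with the other two here.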

6.1.2 Digital Asset Drive

As one of the key infrastructures of Web3, decentralized storage is inherently linked to the expansion of the entire cryptocurrency market. Even without considering the surge in storage demand, if the adoption rate of digital assets continues to rise, the market size of decentralized storage may also steadily increase. Without decentralized infrastructure, true decentralization cannot be achieved. Rising cryptocurrency adoption may signal that the public better understands the importance of decentralization, which in turn would drive the use of decentralized storage.

6.2 Technological Drivers

6.2.1 Compute-based Products and Computing Resources

The value of data often lies in the insights that analysis provides, which requires computation over the data. However, the existing decentralized storage market faces a significant obstacle to large-scale data applications: the lack of mature compute-based products. Projects like Bacalhau and Shale are addressing this challenge and focusing their work on Filecoin. Other notable projects include Fluence and Space and Time, which are respectively developing AI query systems and computing markets. As compute-based products thrive, the demand for computing resources will also increase. This demand can be glimpsed in the price trajectory of $RNDR, the token of a peer-to-peer GPU computing network for users who need additional computing power. Its price has risen roughly 500% year-to-date, reflecting investors’ expectations of demand growth. As these industries mature and the ecosystem becomes more comprehensive, the adoption of decentralized storage should increase significantly with the influx of users.

6.2.2 Decentralized Physical Infrastructure Network (DePIN)

A Decentralized Physical Infrastructure Network (DePIN) is a blockchain-based network that integrates real-world digital infrastructure into the Web3 ecosystem. The key areas of DePIN include storage, computing, content delivery networks (CDNs), and virtual private networks (VPNs). These networks seek to improve efficiency and scalability through crypto-economic incentives and blockchain technology.

The advantage of DePIN lies in its potential for a virtuous cycle, which consists of three components. First, the protocol adopts token economics to incentivize participants, often by using tokens to reward real-world applications and network usage. Second, as the economic model solidifies, rising token prices and protocol usage attract attention, driving an influx of users and capital. Third, this growing pool of capital and expanding user base attracts more ecosystem builders and developers, perpetuating the cycle. As a core segment of DePIN, storage will also be one of the major beneficiaries of DePIN’s expansion.

6.2.3 Artificial Intelligence (AI)

The rapid development of artificial intelligence is expected to catalyze the growth of the crypto ecosystem and accelerate the development of various areas of digital assets. Artificial intelligence benefits decentralized storage in two main ways: by stimulating storage demand and by enhancing the importance of the decentralized physical infrastructure network (DePIN).

As the number of products based on generative AI grows exponentially, the data they generate also increases exponentially. The surge in data stimulates the demand for storage solutions, thus driving the growth of the decentralized storage market.

Although generative AI has already experienced significant growth, it is expected to continue this momentum in the long term. According to statistics from EnterpriseAppsToday, generative AI will account for 10% of all generated data globally by 2025. In addition, the generative AI market is projected to grow at a compound annual growth rate (CAGR) of 36.10% and reach $188.62 billion by 2032, indicating its enormous potential.

In the past year, the popularity of generative AI has increased significantly, as evidenced by Google Trends and YouTube search data. This growth further highlights the positive impact of artificial intelligence on the demand for decentralized storage solutions.

The surge in storage and computing resources required by artificial intelligence technology highlights the value of DePIN. As the Web 2.0 infrastructure market is monopolized by centralized entities, DePIN becomes an attractive alternative for users seeking cost-effective infrastructure and services. By democratizing access to resources, DePIN can offer significantly lower costs, thereby increasing adoption rates. As artificial intelligence continues to evolve, its demand will further stimulate the growth of DePIN, which in turn contributes to the expansion of the decentralized storage industry.

6.2.4 Filecoin Virtual Machine (FVM)

The Filecoin Virtual Machine (FVM) not only unleashes the potential of Filecoin itself but also completely transforms the entire decentralized storage market. As Filecoin is the largest decentralized storage provider, occupying a significant market share, its growth is essentially parallel to the expansion of the entire industry. The emergence of FVM transforms Filecoin from a data storage network into a comprehensive decentralized data economy. In addition to achieving permanent storage, FVM also integrates DeFi into the ecosystem, generating more revenue opportunities and attracting a larger user base and capital inflows into the industry.

As of June 22nd, 100 days after FVM went live, more than 1,100 unique smart contracts supporting dApps had been deployed on the Filecoin network. In addition, over 80,000 wallets had been created to interact with these FVM-driven dApps, and the total balance of FVM accounts and contracts had exceeded 2.8 million FIL. Currently, the protocols within the FVM ecosystem are all related to DeFi, enhancing the utility of $FIL. If this upward trend continues, we expect to see a large number of applications that could trigger another wave of growth in the storage market. Furthermore, we anticipate that other storage networks will introduce virtual machine mechanisms similar to FVM, sparking a wave of ecosystem development. For example, Crust Network officially launched its EVM storage on July 17th, combining Crust Mainnet, Polkadot, and EVM contracts to build a new Crust protocol that seamlessly provides storage services for any EVM-based public chain.

6.2.5 Social and Gaming based on Decentralized Databases

Whether it is gaming or social applications, a decentralized database service is needed that can resist censorship and achieve high-speed read and write capabilities. Decentralized databases can enhance current Web3 applications and support the development of new applications and experiences in various fields.

Decentralized Social – By storing a large amount of social data in decentralized databases, users will have greater control over their data, the ability to migrate between platforms, and the opportunity to monetize content.
Gaming – Managing and storing player data, in-game assets, user settings, and other game-related information is an important aspect of blockchain-based gaming. Decentralized databases can ensure that this data can be seamlessly exchanged and combined by other applications and games. A hot topic in the current GameFi field is full-chain gaming, which means deploying all core modules, including static resource storage, game logic computation, and asset management, on the blockchain. A decentralized database with high-speed read and write capabilities is essential infrastructure for realizing this vision.

Games and social applications are the industries with the most Internet users, and also the industries most likely to produce killer applications, such as Damus, which exploded in popularity in February this year. We believe that the explosion of Web3 games and social applications will also bring about huge demand for decentralized databases.

6.3 Public Awareness

In addition to market prospects and technology, public awareness is a key component driving the growth of the decentralized storage market. The comparison between centralized storage and decentralized storage clearly highlights the many advantages of the latter. However, the ability to attract more users depends on more and more people realizing these benefits. This may be a long process that requires joint efforts from the entire industry. From content output to brand exposure marketing, industry practitioners must strive to convey how decentralized storage is fundamentally changing the cloud storage industry. This effort complements other growth factors and amplifies the impact of market expansion and technological evolution.

7. Conclusion and Prospects

Overall, decentralized storage is an infrastructure industry with major technological challenges and a long investment cycle, but enormous growth potential.

The long investment cycle is mainly due to the long iteration cycle of distributed technology itself, and project developers need to find a delicate balance between decentralization and efficiency. Providing efficient and highly available data storage and retrieval services while ensuring data privacy and ownership undoubtedly requires extensive exploration. Even IPFS often experiences unstable access, and other projects like Storj are not decentralized enough.

The growth potential of this market is also highly anticipated. As early as 2012, AWS S3 already stored 1 trillion objects. Assuming an average object size of 10 to 100 MB, this means that AWS S3 alone used 10,000 to 100,000 PB of storage space.

According to Messari’s data, as of the end of 2022, the largest provider, Filecoin, had a storage utilization rate of only about 3%. This means that only about 600 PB of storage space on Filecoin is actively utilized. Clearly, there is still a lot of room for development in the decentralized storage market.
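The arithmetic behind this comparison is easy to reproduce. The sketch below uses only numbers stated in the text (1 trillion objects, 10 to 100 MB per object, about 600 PB in use at about 3% utilization) and assumes decimal units, where 1 PB equals 10^9 MB. The implied Filecoin capacity is derived from those two figures rather than quoted from Messari.

```python
MB_PER_PB = 10**9  # decimal units: 1 PB = 1,000 TB = 10^9 MB


def total_petabytes(object_count: int, object_size_mb: float) -> float:
    """Total storage in PB for a number of objects of a given average size."""
    return object_count * object_size_mb / MB_PER_PB


# AWS S3 in 2012, per the text: 1 trillion objects of 10 to 100 MB each.
s3_objects = 10**12
s3_low_pb = total_petabytes(s3_objects, 10)
s3_high_pb = total_petabytes(s3_objects, 100)
print(f"S3 estimate: {s3_low_pb:,.0f} to {s3_high_pb:,.0f} PB")

# Filecoin at the end of 2022, per the text: about 600 PB used at about 3% utilization.
filecoin_used_pb = 600
filecoin_utilization = 0.03
implied_capacity_pb = filecoin_used_pb / filecoin_utilization
print(f"Implied Filecoin capacity: {implied_capacity_pb:,.0f} PB")

# How many times larger the low S3 estimate is than Filecoin's utilized storage.
print(f"S3 low estimate / Filecoin used: {s3_low_pb / filecoin_used_pb:.1f}x")
```

Even the low end of the 2012 S3 estimate is more than an order of magnitude larger than the storage actively used on Filecoin at the end of 2022, which is the gap the paragraph above points to.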

With the rise of artificial intelligence and DePIN, decentralized storage has a bright future, as several key growth drivers will promote market expansion.

References

The Essential Guide to Decentralized Storage Networks
Decentralized Databases: The Missing Piece of Web3
Crust Wiki
Arweave: A Protocol for Economically Sustainable Information Permanence
Blogs from Filecoin
The Most Comprehensive Analysis of Decentralized Storage Technology

Statement: This report is an original work completed by @ChenxiL46898047 and @BC082559, students of @GryphsisAcademy, under the guidance of @Zou_Block and @CryptoScott_ETH. The authors are solely responsible for all content, which does not necessarily reflect the views of Gryphsis Academy or the organization commissioning the report. Editorial content and decisions are not influenced by readers. Please be aware that the authors may own cryptocurrencies mentioned in this report. This document is for informational purposes only and should not be relied upon as the basis for investment decisions. It is strongly recommended that you conduct your own research and consult with an independent financial, tax, or legal advisor before making any investment decisions. Please remember that past performance of any assets does not guarantee future returns.
