How web3 data storage works

The rise of web3 has created demand for decentralized storage solutions that let users store, retrieve, and maintain their own data. According to a collaborative report published by Holon and Filecoin, web3 data storage could solve the global data storage crisis, because the current centralized data storage model won’t be able to handle future data demand. As per the report, global data generation grew to 79 ZB last year, up from just 2 ZB in 2010. Data generated by IoT, electric vehicles, virtual and augmented reality, and 5G will increase the global data footprint by more than 300 times. The cost to reach a 3 ZB global data center footprint is estimated at more than US$300 billion, and the report states that expanding this footprint by more than 300x to meet future demand will likely cost US$100 trillion. This is far too much for existing cloud service providers to bear. Decentralized data storage may be the answer. This article gives an in-depth analysis of decentralized web3 storage.

The evolution of data storage from web1 to web3
Why do we need decentralized web3 storage?
Understanding the potential of decentralized data storage with examples
How to choose web3 storage?
Popular web3 storage networks
Limitations of current web3 storage solutions

The evolution of data storage from web1 to web3

In its early days, the web was largely a static medium, and web1 was the version that first introduced websites. This groundbreaking innovation provided a powerful medium for consuming content. Its main drawback was that it allowed only one-way communication: web1 was limited to content consumption, not user creation or contribution. This made web1 non-collaborative and somewhat boring compared to today’s web. Web1 was also heavily centralized and controlled by its creators, who also had access to user data storage and communications. As a result, users didn’t have a significant role in web1; they were simply consumers of web content with no ownership rights.

Web2 – towards user participation

Web2 allows users not only to read but also to write. They can create blogs, tutorial videos, and other content, although there are restrictions on what they can do. Web2 offers users a wider range of creative possibilities, such as creating custom websites and interacting directly with data, which lets them provide solutions and services that were impossible in web1. Yet the servers that store and host this data are owned and managed by big tech companies. Users can create and submit data to the internet but cannot control it. Web-based communication and storage are, therefore, still highly centralized.

Web3 – towards decentralization

Web3 gives users control over their data through decentralization while they retain full access to storage and communication. This user-centric version of the web runs on blockchain networks, replacing single centralized servers with thousands of distributed computers (nodes) worldwide. Users interact with these networks through decentralized apps, or dApps, instead of traditional channels and processes.

Why do we need decentralized web3 storage?

The need for decentralized web3 storage can be seen from the following perspectives:

Blockchain

The underlying infrastructure of web3 is built on blockchains. From a blockchain perspective, decentralized storage is essential because blockchains are not intended to store large amounts of data. Blockchain consensus relies on small amounts of transaction data being organized in blocks and quickly shared across nodes for validation. First, although storing data in these blocks is possible, it is very expensive. Second, if large amounts of arbitrary information were stored in these blocks, network congestion could increase greatly, driving up the price of using the network through gas bidding wars. This follows from the implicit time value of block space: users who need their transactions included at a particular time have to pay higher gas fees to get priority. It is therefore recommended that NFT metadata and image data, as well as dApp front-ends, be stored off-chain.
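
To make the cost argument concrete, here is a rough back-of-the-envelope sketch of what writing raw data into Ethereum contract storage might cost. The figure of 20,000 gas per fresh 32-byte storage slot is the commonly cited cost of the SSTORE operation; the gas price of 20 gwei is purely an illustrative assumption and not a figure from this article.

```python
import math

GAS_PER_NEW_SLOT = 20_000  # commonly cited SSTORE cost for a fresh 32-byte slot
SLOT_BYTES = 32


def storage_gas(num_bytes: int) -> int:
    """Gas needed to write num_bytes into fresh contract storage slots."""
    slots = math.ceil(num_bytes / SLOT_BYTES)
    return slots * GAS_PER_NEW_SLOT


def storage_cost_eth(num_bytes: int, gas_price_gwei: float) -> float:
    """Cost in ETH at the given gas price (1 gwei = 1e-9 ETH)."""
    return storage_gas(num_bytes) * gas_price_gwei * 1e-9


one_mb = 1024 * 1024
gas = storage_gas(one_mb)
cost = storage_cost_eth(one_mb, gas_price_gwei=20)  # assumed gas price
print(f"{gas:,} gas, about {cost:.1f} ETH for 1 MB")
```

Under these assumptions, a single megabyte needs hundreds of millions of gas, far more than fits in one block, which is why only small pointers and hashes are normally kept on-chain.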

Centralized network

Centralized networks are mutable and can be censored. They require the user to trust the storage provider to keep data safe. However, there is no guarantee that the operator of a centralized network will live up to the trust placed in it. Data could be deleted, on purpose or accidentally, because of policy changes by the storage provider, hardware failures, or attacks by third parties.

NFT

The floor prices of some NFT collections are exceedingly high, with some valued at up to US$70K per KB of image data. However, a high price is not enough to guarantee that the data will always be available. Greater assurance is required to ensure the immutability and permanence of NFT data. NFTs don’t contain image data; instead, they contain a pointer to metadata and image data stored off-chain. This metadata and image data must be protected, because if it disappeared, the NFT would be nothing but an empty container. Beyond art collectibles and profile photos, NFTs can also represent ownership of real-world assets such as financial instruments or real estate. This data holds extrinsic real-world value, so the data behind an NFT is at least as valuable as the on-chain token, and every bit of it must be preserved.
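
The pointer relationship described above can be sketched as follows. The metadata document is hypothetical, and its `name`, `description`, and `image` fields follow the widely used ERC-721 metadata layout; the `<image-cid>` value is a placeholder rather than a real content identifier, and `to_gateway_url` is an illustrative helper, not part of any library.

```python
import json

# Hypothetical metadata document of the kind an NFT's token URI points to.
# "<image-cid>" is a placeholder, not a real content identifier.
metadata_json = """
{
  "name": "Example NFT #1",
  "description": "Illustrative metadata stored off-chain.",
  "image": "ipfs://<image-cid>/1.png"
}
"""


def to_gateway_url(uri: str, gateway: str = "https://ipfs.io/ipfs/") -> str:
    """Turn an ipfs:// URI into an HTTP URL served by a public gateway."""
    prefix = "ipfs://"
    if uri.startswith(prefix):
        return gateway + uri[len(prefix):]
    return uri  # already an HTTP(S) URL or some other scheme


metadata = json.loads(metadata_json)
print(to_gateway_url(metadata["image"]))
```

If either the metadata document or the image it references becomes unavailable, the on-chain token still exists but resolves to nothing.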

dApps

dApps combine a front-end user interface, which lives off-chain, with smart contracts that live on a blockchain network. They may also include a backend that performs certain calculations off-chain to reduce gas consumption and the associated costs. While the core mechanics of a dApp are implemented in smart contracts, end-users reach them through the front-end. In that sense, the availability of the dApp front-end determines the availability of the underlying service. A decentralized storage system reduces the risk of server malfunctions, DNS hacks, or a single entity removing access to the dApp front-end.

Understanding the potential of decentralized data storage with examples

To understand the potential of decentralized storage, it helps to describe what decentralized systems offer and how they differ from centralized platforms. Today’s internet is centralized: most of the data behind the websites we use every day is stored in data centers primarily controlled by three providers, Amazon Web Services, Microsoft Azure, and Google Cloud. These providers have been known to suffer outages that take large swathes of the web down for hours. That is the problem with a single point of failure, and decentralized data storage is designed to remove it. Below are some examples of networks playing key roles in web3 data storage.

Filecoin is a decentralized storage network that aims to create a better web. It pools the storage capacity and computing power of many devices into a supercomputer-like network that can keep multiple copies of data, so websites can still be accessed even when some nodes go down. Filecoin’s programmable money system supports this decentralized model: Filecoin tokens can be used to rent spare storage space on other participants’ computers. We see this as a foundation technology for the next-generation web. Filecoin works on an incentive model in which storage providers get paid for storing data on the network. AWS and other centralized storage providers, by contrast, depend on specific servers or companies to store and deliver information. Filecoin is built on top of the InterPlanetary File System (IPFS), which does not retrieve content by its location. Instead, it uses content addressing, retrieving content by its cryptographic hash. This means content availability does not depend on one company or server, and content can be fetched from any node that holds it, which can reduce network latency.
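
Content addressing can be illustrated with a minimal sketch in which the address of a piece of data is simply the SHA-256 hash of its bytes. The `ContentStore` class is hypothetical and greatly simplified; real IPFS CIDs use a multihash-based format rather than a bare hex digest.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the hash of the value."""

    def __init__(self):
        self._blocks = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Anyone can check that the bytes really match the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


# Two independent nodes derive the same address for the same content,
# so a reader can fetch it from whichever node happens to hold it.
node_a, node_b = ContentStore(), ContentStore()
addr = node_a.put(b"hello web3")
node_b.put(b"hello web3")
print(addr[:16], node_b.get(addr))
```

Because the address depends only on the content, it does not matter which server answers the request, and a tampered response is detectable by rehashing.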

Similarly, the Storj network consists of storage nodes that store data for others, and contributors get paid for allocating storage and bandwidth. All data stored on storage nodes is client-side encrypted and erasure-coded. Developers use uplink clients to store data on Storj’s decentralized cloud storage. Files are divided into 80 pieces and distributed over the network, with each of the 80 pieces stored on a different storage node with its own power supply and operator. This provides substantial security, performance, and durability benefits.
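
The idea behind erasure coding can be sketched with the simplest possible scheme: split the data into k pieces and add one XOR parity piece, so that any single lost piece can be rebuilt. This is a deliberately simplified stand-in and not Storj’s actual scheme; production systems use stronger codes (Reed-Solomon codes are a common choice) that tolerate the loss of many pieces. The function names are illustrative.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal pieces and append one XOR parity piece."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    pieces = [padded[i * size:(i + 1) * size] for i in range(k)]
    return pieces + [reduce(xor_bytes, pieces)]


def recover(pieces: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing piece (marked None) from the others."""
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one lost piece")
    if missing:
        present = [p for p in pieces if p is not None]
        pieces[missing[0]] = reduce(xor_bytes, present)
    return pieces


original = b"decentralized storage"
pieces = encode(original, k=4)
pieces[2] = None  # simulate one storage node going offline
restored = recover(pieces)
# Stripping the zero padding is fine here because the sample data
# does not itself end in zero bytes.
print(b"".join(restored[:4]).rstrip(b"\0"))
```

Spreading the pieces over independently operated nodes is what turns this redundancy into durability: losing one node does not lose the file.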

Filecoin and Storj are among the decentralized web3 storage networks whose features differ markedly from those of centralized systems, and web3 storage depends on such solutions.

How to choose web3 storage?

Developers must make some key decisions when choosing web3 data storage. First, they need to look at their data and determine whether it is structured or unstructured. Structured data is the kind they would store in a spreadsheet, a JSON or XML file, or a Notion database. Images, movies, and other multimedia are examples of unstructured data.

Second, they need to check whether the data is private or public. Public means it is accessible without any access control mechanism. Note that encryption is not an access control mechanism: once a key leaks, access cannot be revoked.

Finally, do they need to delete or update data? Some storage systems offer deletion that only removes references to the data, while the data itself remains stored elsewhere. Such cases are not uncommon, and the data can still be accessed by anyone who has recorded its address. If updates and deletions matter, developers should make sure the system handles them in a way that leaves no stale copies of the data accessible.

No network is objectively superior to another. There are many trade-offs to consider when designing a decentralized web3 storage system. For example, Arweave can store data permanently, but that makes it a poor fit for moving web2 industry players to web3, because not all data must be permanent. There is, however, a subsector that does require permanence: NFTs and dApps. Design decisions are therefore ultimately based on the intended purpose of the network.

Below are summary profiles of the different storage networks. Note that the various strategies for overcoming decentralized storage challenges are not inherently better or worse than one another. Instead, they reflect design decisions along the following dimensions:

Storage parameter flexibility – The extent to which users can control file storage parameters.
Storage permanence – The extent to which file storage can attain theoretical permanence via the network.
Redundancy persistence – The network’s ability to maintain data redundancy by replenishment or repair.
Data transmission incentive – The extent to which nodes are able to transmit data freely.
Storage tracking – The extent to which nodes agree on the data storage location.
Assured data access – The ability of the network to ensure that no single actor can remove files from the network.

Popular web3 storage networks

Here are some of the popular web3 storage networks:

IPFS

IPFS is a peer-to-peer hypermedia protocol that aims to make the internet more accessible, faster, safer, and more open. IPFS allows users to store and share content. Each user runs their own node (server), and these nodes communicate with each other and share files. IPFS is decentralized because it loads content from thousands of peers rather than from one central server. Each piece of data is cryptographically hashed to create a unique content identifier (CID). You can store a website in IPFS to avoid censorship and a single point of failure. What happens if your personal IPFS node goes offline? The website will still load as long as other nodes around the world hold and serve it. The integrity of IPFS content can be verified cryptographically. Finally, IPFS content is de-duplicated: if you add two identical 1 MB files to the same IPFS node, they are stored only once, because their hashes yield the same CID.

How does IPFS work?

When you add a file to IPFS, it is broken down into smaller chunks, cryptographically hashed, and given a unique identifier known as a content ID (CID). The CID is a permanent record of the file as it exists at that point in time.
When other nodes look up your file, they ask their peers who is storing the content referenced by the file’s CID. When they view or download the file, they cache a copy, which lets them serve your content to others until their cache is cleared.
A node can pin content to keep it indefinitely, or discard content it hasn’t used for a while to save space. Each node in the network stores only the content it is interested in, plus indexing information that helps determine which node is storing which content.
If you add a new version of a file to IPFS, its cryptographic hash is different, so it gets a new CID. This makes files saved to IPFS resistant to censorship and tampering: changes to a file don’t overwrite the original. Chunks that are common to several files can be reused to reduce storage costs.
This doesn’t mean you must remember long CID strings. IPFS can locate the most recent version of your file using the IPNS decentralized naming system, and DNSLink can map human-readable DNS names to CIDs.
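
The steps above can be sketched in a few lines. This is a toy model under stated assumptions, not how IPFS is implemented: the 4-byte chunk size is chosen only to keep the example readable, the root identifier is a plain SHA-256 over the chunk hashes rather than a real CID, and `ToyNode` with its `publish` method is a hypothetical stand-in for a node and an IPNS-style mutable name.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def sha(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyNode:
    def __init__(self):
        self.chunks = {}  # chunk hash -> chunk bytes (each stored once)
        self.files = {}   # root id -> ordered list of chunk hashes
        self.names = {}   # mutable name -> latest root id

    def add(self, data: bytes) -> str:
        hashes = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            h = sha(chunk)
            self.chunks[h] = chunk  # an identical chunk reuses the same key
            hashes.append(h)
        root = sha("".join(hashes).encode())
        self.files[root] = hashes
        return root

    def cat(self, root: str) -> bytes:
        return b"".join(self.chunks[h] for h in self.files[root])

    def publish(self, name: str, root: str) -> None:
        self.names[name] = root  # the name moves, the content never changes


node = ToyNode()
v1 = node.add(b"AAAABBBBCCCC")
v2 = node.add(b"AAAABBBBDDDD")  # only the last chunk differs
node.publish("my-site", v1)
node.publish("my-site", v2)

print(v1 != v2)                  # the new version gets a new identifier
print(len(node.chunks))          # shared chunks are stored only once
print(node.cat(v1))              # the old version is still retrievable
print(node.names["my-site"] == v2)
```

The two versions share their first two chunks, so the node holds four distinct chunks rather than six, and the stable name always resolves to the latest root.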

Filecoin

Developed by Protocol Labs, the same company that developed IPFS, Filecoin is a peer-to-peer network for storing files. It has built-in economic incentives so that files are stored reliably over time. Users pay storage providers, which are computers that store files and can prove they have maintained those files correctly over time. Filecoin is open to anyone who would like to store files and get paid for doing so. The price and availability of storage are not set by a single company; instead, Filecoin facilitates an open market for storing and retrieving files in which anyone can participate. Filecoin includes a blockchain and a native cryptocurrency (FIL). Storage providers earn FIL for storing files. Filecoin’s blockchain records FIL transactions as well as the proofs from storage providers that files are being stored correctly.
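
To give a feel for how a provider can prove it still holds a file, here is a generic Merkle-tree challenge-response sketch. It is an illustration of the general idea only: Filecoin’s actual proofs (Proof-of-Replication and Proof-of-Spacetime) are considerably more involved, and every function name below is hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks):
    """Return all Merkle tree levels, leaves first, root last."""
    level = [h(c) for c in chunks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(chunks, index):
    """Provider side: sibling hashes on the path from chunk to root."""
    proof = []
    for level in build_levels(chunks)[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(root, chunk, proof):
    """Client side: recompute the root from one chunk and its proof."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [b"chunk-%d" % i for i in range(5)]
root = build_levels(chunks)[-1][0]  # the client keeps only this value
challenge = 3                       # in practice chosen at random
proof = prove(chunks, challenge)    # computed from the stored data
print(verify(root, chunks[challenge], proof))
print(verify(root, b"tampered", proof))
```

The client stores a single hash, yet a provider that has discarded or altered the challenged chunk cannot produce a response that verifies.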

Why Filecoin?

The following features make Filecoin an attractive system for storing files:

Verifiable storage
Open market
Competitive price
Reliable storage
Reputed
Choice of trade-offs
Censorship resistance
Provide storage to other blockchains
Content addressing
Content distribution network
Single protocol
No lock-in
Open source code
Active community

Swarm

Swarm was designed to interoperate with the Ethereum smart contract ecosystem. It is similar to Filecoin and piggybacks on Ethereum’s consensus process to offer a decentralized alternative to existing client/server infrastructure. Swarm is a decentralized infrastructure that stores and transfers data across nodes around the globe.

Why Swarm?

Completely decentralized
Permissionless and private
Truly leakproof messaging for full nodes
Scalable with zero hosting costs

How does it work?

Swarm is a native data storage platform that works alongside the blockchain. This allows EVM-based smart contracts to verify and parse data structures held in its DISC (Distributed Immutable Store of Chunks).

Kademlia routing – Nodes use Kademlia routing to organize themselves into a regular network. Each node takes responsibility for forwarding and storing chunks according to the addressing scheme. All nodes have access to all data in the Swarm.
Push – When data is uploaded to the Swarm, it is split into 4 KB chunks, which the Kademlia routing protocol forwards to the neighborhood nearest to each chunk’s address. This distributes data evenly across the Swarm network.
Pull – Each full node becomes part of the Swarm when it joins. It syncs the chunks within its radius of responsibility with the other nodes in that radius. This allows each node to serve chunks close to its address when the retrieval protocol requests them.
Retrieval – A node requests a chunk through the retrieval protocol, which routes the request to the nearest node in that chunk’s neighborhood. This allows every node within the Swarm to access chunks held by the nodes responsible for them.
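
The push and retrieval steps both rest on the same idea: a chunk belongs with the nodes whose addresses are closest to the chunk’s own address, where closeness is measured by XOR distance as in Kademlia. The sketch below assumes SHA-256 as a stand-in for Swarm’s own chunk hashing, and the node identifiers and helper names are hypothetical.

```python
import hashlib
from typing import List

CHUNK_SIZE = 4096  # 4 KB chunks, as described above


def address(data: bytes) -> int:
    """Derive a 256-bit address; SHA-256 stands in for Swarm's chunk hash."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def split(data: bytes) -> List[bytes]:
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def closest_nodes(chunk_addr: int, node_addrs: List[int], count: int = 1) -> List[int]:
    """Node addresses sorted by XOR distance to the chunk, nearest first."""
    return sorted(node_addrs, key=lambda n: n ^ chunk_addr)[:count]


# Hypothetical network of eight nodes and 10,240 bytes of sample data.
nodes = [address(f"node-{i}".encode()) for i in range(8)]
data = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(320))

placement = {}
for chunk in split(data):
    chunk_addr = address(chunk)
    placement[chunk_addr] = closest_nodes(chunk_addr, nodes)[0]  # push

# Retrieval repeats the same calculation, so any node can work out
# where to route a request for a given chunk address.
for chunk_addr, holder in placement.items():
    assert closest_nodes(chunk_addr, nodes)[0] == holder
print(len(placement), "chunks placed")
```

Because chunk addresses are effectively uniformly distributed, this rule tends to spread chunks evenly across the nodes without any central index.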

Limitations of current web3 storage solutions

The first storage type that web3 developers tend to use is blockchain storage. It doesn’t take long to realize that this storage suits only very specific data types, such as an NFT’s links to other data. Storing anything more than small amounts of data on a blockchain is expensive and inefficient. Additionally, the immutable and public nature of on-chain data makes it unsuitable for many applications. Developers who run into these limitations often switch to IPFS, a globally accessible, content-addressable file storage system in which files are referenced by the hash of their content. It is well suited to flat file storage and has been used extensively to store NFT images. IPFS plays much the same role in web3 that AWS S3 plays in web2 development: making files accessible to users and applications.

The next problem web3 developers face is managing access permissions and private data. Web3 users want self-sovereignty and control over their personal data. How can information such as users’ health records or interests be kept private, out of reach of identity theft and data mining?

Many developers resort to encrypting data and storing it on-chain. This poses serious problems:

Most developers are not cryptographers. They often make mistakes when implementing encryption, which can expose user data.
Both IPFS and blockchains are immutable storage systems that can be accessed from anywhere, so encrypted data stays publicly retrievable and the exposure caused by a leaked key cannot be undone.
Apps that lack a good query mechanism are less likely to succeed. Applications either load large files into memory and search for data within them, or keep many smaller files that must be requested every time they are needed.
Data can be updated or deleted only by copying the entire file to a new version and saving it, which can be both costly and slow.
There is no control over where data is stored, which can be crucial for policy or legal reasons.
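
The update problem in the list above can be made concrete with a small sketch. It assumes a hypothetical append-only, content-addressed store (a plain dictionary here) and a made-up user profile; changing one field forces the whole document to be written again, and the old version remains retrievable.

```python
import hashlib
import json

store = {}  # address -> bytes; entries are never changed or removed


def save(records: dict) -> str:
    """Serialize the whole document and store it under its content hash."""
    blob = json.dumps(records, sort_keys=True).encode()
    address = hashlib.sha256(blob).hexdigest()
    store[address] = blob
    return address


profile = {f"field_{i}": "x" * 100 for i in range(50)}  # hypothetical user data
v1 = save(profile)

profile["field_0"] = "y" * 100  # change a single field
v2 = save(profile)              # the whole document is written again

total = sum(len(blob) for blob in store.values())
print(len(store[v1]), len(store[v2]), total)
```

Storage use roughly doubles after a one-field change, and the earlier version, including any data the user wanted removed, is still addressable by anyone who kept `v1`.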

Conclusion

We have given a brief introduction to one part of the web3 storage vision. Distributed storage and networking are developing rapidly and could provide solutions worth trillions of dollars annually to the global economy. Today’s web requires a new security model and an architecture adapted to modern use cases. Swarm and IPFS are among the most ambitious solutions, and it would be remiss not to mention Sia and Storj, two decentralized storage options that are nearing maturity. Unexpected opportunities will emerge as the global infrastructure adjusts to meet the demands we place on it. New tools will transform the way we live, work, and play, and will also affect the way we organize ourselves.

Author’s Bio

Akash Takyar

CEO LeewayHertz

Akash Takyar is the founder and CEO of LeewayHertz. With a proven track record of conceptualizing and architecting 100+ user-centric and scalable solutions for startups and enterprises, he brings a deep understanding of both technical and user experience aspects.
Akash's ability to build enterprise-grade technology solutions has garnered the trust of over 30 Fortune 500 companies, including Siemens, 3M, P&G, and Hershey's. Akash is an early adopter of new technology, a passionate technology enthusiast, and an investor in AI and IoT startups.
