If web3 wants to scale like web2, it needs infrastructure that matches web2’s speed and efficiency. One of the key pain points is retrieving specific blockchain data, and blockchain data indexers are changing that. Let’s dissect this key component of scalable blockchain designs and see how it is evolving to suit today’s blockchains, such as rollups and appchains.
What is an indexer? What does it do?
The main job of a blockchain indexer is to take raw blockchain data and store it in an efficient format, so that appchains, DApps, or anyone else interested can easily retrieve the specific information they need. The data is exposed to users through API endpoints or through rich user interfaces built on top of those endpoints.
How does an indexer work?
Indexers go through a multi-step procedure: they access the raw blockchain data, parse it into a shape that fits a traditional database (relational or non-relational), and then expose it through a query language or a similar interface so that anyone can retrieve the data the way they need it. This single component is what makes high-speed applications possible on blockchains, rollups, and appchains, the latter two being variations of the blockchain itself.
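The access, parse, store, and query steps can be sketched in a few lines of Python. This is a minimal illustration, not how any particular indexer is implemented: the block layout in RAW_BLOCKS, the transfers table, and the function names are all assumptions made for the example, and a real indexer would read blocks from a node rather than from a hard-coded list.

```python
import sqlite3

# Hypothetical raw blocks, shaped like heavily simplified node responses.
RAW_BLOCKS = [
    {"number": 1, "transactions": [
        {"hash": "0xa1", "from": "alice", "to": "bob", "value": 5},
    ]},
    {"number": 2, "transactions": [
        {"hash": "0xb1", "from": "bob", "to": "carol", "value": 2},
        {"hash": "0xb2", "from": "alice", "to": "carol", "value": 7},
    ]},
]


def build_index(blocks):
    """Parse raw blocks into a relational table with an index on the recipient."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE transfers ("
        "tx_hash TEXT PRIMARY KEY, block INTEGER, "
        "sender TEXT, recipient TEXT, value INTEGER)"
    )
    db.execute("CREATE INDEX idx_recipient ON transfers(recipient)")
    for block in blocks:
        for tx in block["transactions"]:
            db.execute(
                "INSERT INTO transfers VALUES (?, ?, ?, ?, ?)",
                (tx["hash"], block["number"], tx["from"], tx["to"], tx["value"]),
            )
    db.commit()
    return db


def received_by(db, address):
    """Return (block, tx_hash, value) rows for transfers received by `address`."""
    return db.execute(
        "SELECT block, tx_hash, value FROM transfers "
        "WHERE recipient = ? ORDER BY block, tx_hash",
        (address,),
    ).fetchall()


db = build_index(RAW_BLOCKS)
print(received_by(db, "carol"))  # [(2, '0xb1', 2), (2, '0xb2', 7)]
```

Once the data sits in a table like this, the application asks a database question ("what did carol receive?") instead of walking the chain block by block.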
Indexers & rollups/appchains
Blockchains like Ethereum and Bitcoin Mainnet, Arbitrum One, Polygon PoS, and Optimism Mainnet have decentralized indexer setups because of their popularity. Things get interesting, however, when we talk about rollups or custom L1 appchains. Data on rollups is more complicated to work with: rollups and appchains process significantly more transactions and have a layered architecture, so you may need to trace L1 → L2 → L3 transactions (and the reverse), withdrawals, swaps, and data availability on another chain. This modular architecture is a hurdle to building simple indexing solutions.
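To make the cross-layer problem concrete, here is a small Python sketch of one thing such an indexer has to do: tie together events that belong to the same deposit or withdrawal but were emitted on different layers. It assumes that both sides of a bridge message carry a shared message_id; the BridgeEvent type, the "initiated"/"finalized" labels, and the sample events are hypothetical and do not describe any specific rollup's bridge.

```python
from dataclasses import dataclass


@dataclass
class BridgeEvent:
    layer: str       # e.g. "L1" or "L2" (hypothetical labels)
    kind: str        # "initiated" or "finalized"
    message_id: str  # assumed to be shared by both sides of one message
    block: int


def link_messages(events):
    """Group events by message_id and report each message's route and status."""
    linked = {}
    for ev in events:
        entry = linked.setdefault(ev.message_id, {"source": None, "destination": None})
        if ev.kind == "initiated":
            entry["source"] = ev.layer
        elif ev.kind == "finalized":
            entry["destination"] = ev.layer
    for entry in linked.values():
        done = entry["source"] is not None and entry["destination"] is not None
        entry["status"] = "complete" if done else "pending"
    return linked


events = [
    BridgeEvent("L1", "initiated", "m1", block=100),  # deposit starts on L1
    BridgeEvent("L2", "finalized", "m1", block=7),    # ...and lands on L2
    BridgeEvent("L2", "initiated", "m2", block=9),    # withdrawal not yet finalized
]
for message_id, info in link_messages(events).items():
    print(message_id, info)
```

The indexer needs data from every layer involved before it can answer even a simple question such as "has this withdrawal completed?", which is what makes a single-chain indexing design fall short here.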
Another fact worth knowing is that blockchain data is not stored in a format suited to complicated queries. A simple block structure chained together with hashes is the last thing you want when you’re building an application that needs all sorts of specific information. Even though blockchains offer a limited form of lookup built in, such as token ownership and total supply exposed by smart contracts, the system falls apart as soon as your needs get a little more advanced. Search, relationships, non-trivial filtering, and similar operations are next to impossible to pull off without a solution like an indexer. Combined with the complexity of rollup data, the storage format makes the indexer's job especially hard.
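A toy example shows why the raw format is awkward to query. The Python sketch below builds a small hash-linked chain and answers the same question ("which transactions did this address send?") twice: once by scanning every block, and once through a lookup table built in a single pass. The block layout and function names are invented for illustration; real chains and real indexers are far more involved.

```python
import hashlib
import json
from collections import defaultdict


def make_chain(tx_batches):
    """Build a toy chain in which each block stores the hash of the previous block."""
    chain, prev_hash = [], "0" * 64
    for number, txs in enumerate(tx_batches):
        block = {"number": number, "prev_hash": prev_hash, "txs": txs}
        prev_hash = hashlib.sha256(
            json.dumps(block, sort_keys=True).encode()
        ).hexdigest()
        chain.append(block)
    return chain


def scan_for_sender(chain, sender):
    """Without an index: visit every block and every transaction on each query."""
    return [(b["number"], tx) for b in chain for tx in b["txs"] if tx["from"] == sender]


def build_sender_index(chain):
    """With an index: one pass up front, then each query is a key lookup."""
    index = defaultdict(list)
    for b in chain:
        for tx in b["txs"]:
            index[tx["from"]].append((b["number"], tx))
    return index


chain = make_chain([
    [{"from": "alice", "to": "bob", "value": 5}],
    [{"from": "bob", "to": "carol", "value": 2}],
    [{"from": "alice", "to": "carol", "value": 7}],
])
index = build_sender_index(chain)
print(scan_for_sender(chain, "alice") == index["alice"])  # True
```

Both approaches return the same rows, but the scan repeats its work on every query and grows with the length of the chain, while the index pays that cost once.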
The Graph and SubQuery
This is where specialized protocols like The Graph and SubQuery come into play. The Graph organizes data in subgraphs, data structures designed specifically for the kind of data blockchains produce, and provides a decentralized way of accessing the complicated information on rollups and appchains through API endpoints. These endpoints are queried with GraphQL.
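As a rough idea of what querying such an endpoint looks like, the Python sketch below builds a GraphQL request using only the standard library. The endpoint URL is a placeholder, and the transfers entity with its fields is a hypothetical schema, since every subgraph defines its own; the pagination and filter arguments (first, orderBy, orderDirection, where) follow The Graph's documented query conventions, and the variable types would need to match the real schema.

```python
import json
import urllib.request

# Placeholder endpoint; a real subgraph URL comes from its deployment.
SUBGRAPH_URL = "https://example.com/subgraphs/name/my-org/my-subgraph"

# `transfers` and its fields are a hypothetical schema, not a real subgraph's.
QUERY = """
query RecentTransfers($recipient: String!, $limit: Int!) {
  transfers(first: $limit, orderBy: blockNumber, orderDirection: desc,
            where: { to: $recipient }) {
    id
    from
    to
    value
    blockNumber
  }
}
"""


def build_request(recipient, limit=5):
    """Wrap the query and its variables in a JSON POST request."""
    body = json.dumps(
        {"query": QUERY, "variables": {"recipient": recipient, "limit": limit}}
    )
    return urllib.request.Request(
        SUBGRAPH_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(recipient, limit=5):
    """Send the request; only works against a real endpoint with a matching schema."""
    with urllib.request.urlopen(build_request(recipient, limit)) as resp:
        return json.load(resp)["data"]


request = build_request("0xabc", limit=3)
print(request.get_method(), request.full_url)
```

The DApp only describes the data it wants. Filtering, ordering, and pagination happen on the indexer's side.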
The Graph achieves this by constantly listening to the events emitted by the smart contracts that handle the data availability of the specified rollup or appchain. As it listens and stores the subgraph data, it builds up the database that DApps query to give users up-to-date and accurate information. In short, The Graph stores the data in an external database and serves it very quickly on request. It can be costly, though, unless multiple subgraphs run on the same Graph node. For mature projects with a wider variety of data needs, The Graph is a fast and well-suited choice.
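The listen-and-store loop can be illustrated with a small event handler. Real subgraph mappings are written in AssemblyScript and run inside a Graph node; the Python sketch below only mirrors the idea, and the BalanceIndexer class, the event fields, and the cursor logic are assumptions made for the example.

```python
from collections import defaultdict


class BalanceIndexer:
    """Keeps an entity store (address -> balance) current by applying Transfer events."""

    def __init__(self):
        self.balances = defaultdict(int)
        # Position of the last applied event as (block, log_index).
        self.cursor = (-1, -1)

    def handle_transfer(self, event):
        position = (event["block"], event["log_index"])
        if position <= self.cursor:
            return False  # already applied, so replayed events are ignored
        self.balances[event["from"]] -= event["value"]
        self.balances[event["to"]] += event["value"]
        self.cursor = position
        return True


indexer = BalanceIndexer()
stream = [
    {"block": 1, "log_index": 0, "from": "0x0", "to": "alice", "value": 10},
    {"block": 1, "log_index": 1, "from": "alice", "to": "bob", "value": 4},
    {"block": 1, "log_index": 1, "from": "alice", "to": "bob", "value": 4},  # replay
]
applied = [indexer.handle_transfer(ev) for ev in stream]
print(applied)                 # [True, True, False]
print(dict(indexer.balances))  # {'0x0': -10, 'alice': 6, 'bob': 4}
```

Because the handler updates the stored balances as each event arrives, a DApp can read the current state directly instead of replaying the contract's history on every request.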
SubQuery offers similar features for quickly querying blockchain data to speed up your DApp, and it claims to be 3.9 times faster than The Graph. It works a bit differently, though, and doesn’t store all the data in an external database. Using its libraries, it can provide real-time data with more filters. For newer projects, SubQuery is ideal.
Building your own indexer
With both of these solutions, you can also build the custom API from scratch yourself and choose the specifics to control the data flow. Self-management comes at the cost of development time, storage, maintenance, and the other hassles of running a full indexing system on your own. It’s hard, especially if your project is in the early stages of development or handles a lot of data, because efficient storage and pruning are prime tasks for an indexer. The alternative is a hosted or managed service, which lets you keep control of your indexer without needing to care about anything but your project. The choice depends on where you want to allocate your resources.
However, in any case, an indexer is a must-have for any blockchain rollup or Appchain solution.
Conclusion
Indexers speed up access to data from rollups and appchains many times over. An indexer is must-have infrastructure for any DApp that needs to read data from the blockchain. Several types of indexing solutions are available, so you can choose the one that makes the most sense for your project. Some also let you bring your own RPC endpoint to have your chain indexed. Some data indexing platforms also give you the choice to make your indexer public to the community so that others can build on it.