BNB Smart Chain Performance Anatomy Series: Smooth Validation

In the previous BNB Smart Chain Performance Anatomy Series, we covered the Critical 3 Seconds and 99% Cache Hit Rate. In chapter 3, we are going to share a deep analysis of the Smooth Validation that makes the validation faster and more stable during the traffic burst, and what NodeReal has been doing to improve its performance with higher throughput.

1. Block Validation Process

Before we start, let me introduce how a block is validated. There are several stages for a BNB Smart Chain client to mine or sync blocks:

VerifyHeaders verify block headers to check block number, timestamp, parentHash
ValidateBody verify block bodies to check transactions root, receipts root
Process execute transactions and update temporary statedb
ValidateState calculate and verify state root after transactions have been executed
Commit persist state modification into the database as MPT nodeIn the validation stage, the importing module will calculate the state trie root (including storage trie and account trie) which needs to read data from the database on disk.

2. The Validation Cost

As we observed, during burst traffic, the validation stage costs a lot of time and could be quite unstable.

There are features which can help speed up the validation process and make it more smooth among different patterns of transactions or scenarios like burst traffic.

A significant improvement can be achieved in smooth validation by introducing the state prefetcher and redesigning the logic of trie prefetcher to make it work concurrently.

Referring to the picture above, the blue line shows the validation stage performance before the smooth validation during burst traffic while the orange one represents the version with the optimization.

3. The Approaches

Picture below demonstrates the additional features added to the process, and we will talk about them in separate chapters (mainly about triePrefetcher and the details about pipecommit).

3.1. Snapshot

BNB Smart Chain persists world state data of the blockchain in the structure of MPT. Every account state or storage key-value data is stored as an MPT node in the database. It’s a brilliant design of MPT which summarizes the whole world state as a single root node, and verifies all the state nodes in that tree. This made a good compression to minimize disk usage.

On the other hand, it brings bad query performance as a consequence of a long query path when we need to access the state data.

Snapshot provides a compatible approach to improve the query performance which introduces a middle layer with a flattened key-value structure between the execution layer and MPT data layer. With snapshot enabled, hot data could be accessed directly in memory (as difflayer) or of O(1) time complexity on disk (as disklayer).

Still, we need to commit state data as MPT nodes into the database, so a module named triePrefetcher is introduced along with Snapshot to load MPT nodes concurrently to improve the performance of the validation stage.

As for state data not accessed in recent blocks (the number of layers decided by the configuration about difflayer of the snapshot), it will search through all difflayers in memory and reach down diskLayer on disk eventually (it’s still much better than reading MPT nodes on disk). In this case state prefetcher was introduced to reduce the disk IO operations.

3.2. Trie Prefetcher

Trie prefetcher is more like a side effect of Snapshot. As discussed above, although snapshot improves the query performance a lot for MPT nodes stored in the database, we still need to persist data as MPT nodes in the database since we use MPT to summarize the world state for verification and reduce disk usage by data compression.

Trie prefetcher runs concurrently as a background routine while transactions are being executed. It loads MPT nodes from disk when the corresponding state data is accessed by EVM. As a result at the validation stage the MPT nodes data used to calculate state root would have already been loaded into memory instead of in the database on disk.

3.3. State prefetcher

Snapshot only caches hot data in memory which have been modified in recent blocks, and data that have not been modified would still be loaded from diskLayer on disk.

State prefetcher is designed to execute transactions on the current world state concurrently with the main routine without committing the modification. These executions will load state data into memory from disk in advance for the main process routine to reduce time-intensive IO operations.

3.4. PipeCommit

Validation and commit stages are always time sensitive. While most of the time the result of state verification would be right and there is no need for the next importing block to wait for the previous commit stage to be completed to process the transactions.

We redesigned the logic to make the validation and commit stages a background routine so that the main routine could be flowing in pipeline mode without being blocked by the time-insensitive procedure of validation and commit stages.

4. Our Work and Benefits

The original triePrefetcher maintains a hashMap tracking the trie root with a prefetcher routine including one state trie used for loading all account data as well as some storage tries corresponding accounts which persist all key-value data of smart contracts. Each prefetcher routine of a root will run concurrently.

We redesigned the logic of trie prefetcher by adding two key features to improve the performance in dealing with the scenario like burst traffic :

Make prefetcher routine of separate trie working parallelization
Make the trie prefetcher be concurrency safe so that it can benefit from state prefetcher

When a value from a storage trie is accessed, the key of the value would be appended to the task list of the corresponding trie prefetcher routine. And when the size of the pending list exceeds the limit depending on configuration, the prefetcher will create children prefetchers with the same storage trie root hash and schedule the tasks among all the children prefetchers with some certain algorithm deciding the order.

In this way, we maximize the performance of trie prefetcher with the parallel children prefetcher routines under the same trie by reducing the blocking time of the time-insensitive operations on disk.

4.1.Concurrent Trie Prefetcher

Currently, we create a subfetcher for each account address to do trie prefetch. But if the number of changed accounts or updated KV was too large, the subfetcher would not be able to fetch all the trie nodes in time. In this case, we will create child subfetchers to do parallel trie prefetch for it.

Child subfetcher will be created on demand when the pending task size exceeds the threshold of parallelTriePrefetchThreshold, and each child subfetcher has a capacity of parallelTriePrefetchCapacity. The created child subfetchers can be reused, we will try to dispatch tasks to these hungry child first before starting new child subfecther. If the parent subfetcher is stopped, the child subfecthers will be stopped too.

The general workflow of parallel trie prefetch is shown in the following diagram.

4.2.Trie Prefetch on State Prefetch

As the original state prefetch just pre-executes the transactions and discards the results. It is helpful to increase the snapshot cache hit rate. It would be more helpful, if it can do trie prefetch at the same time to preload the trie node and build the trie tree in advance.

Since we redesigned the trie prefetcher as concurrency safe, we enable the trie pfetcher on state prefetcher by reusing the main trie prefetch and doing finalize after the transaction is executed.

With this feature, the peak validation of the BSC 2022.04.18 burst traffic could be reduced by ~39%.

4.3. The Benefits

Besides the improvement demonstrated above, we did a long-run test which shows about 60% reduction of the validation cost.

5. Conclusion

Validation is a time-sensitive procedure and will varify from different patterns of transactions. With state prefetcher we make full use of trie prefetcher with as many MPT nodes loaded into cache in advance as possible and load the state data into memory for fast access in the main routine(snapshot).

About NodeReal

NodeReal is a one-stop blockchain infrastructure and service provider that embraces the high-speed blockchain era and empowers developers by “Make your Web3 Real”. We provide scalable, reliable, and efficient blockchain solutions for everyone, aiming to support the adoption, growth, and long-term success of the Web3 ecosystem.

Join Our Community

Join our community to learn more about NodeReal and stay up to date!

Discord | Twitter | Youtube | LinkedIn