Introduction

Ethereum was launched on 30th July 2015, while BNB Smart Chain (BSC) was launched on 29th August 2020. As the above diagram shows, BSC’s transaction traffic is much larger than Ethereum’s, and BSC’s total blockchain data size overtook Ethereum’s around Q4 of 2021.

BSC’s daily transactions peaked at more than 16 million on 25th Nov 2021, which is ~188 TPS sustained for 24 hours. No other EVM blockchain has faced such heavy live traffic yet.
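
The conversion between a daily transaction count and an average TPS is simple division by the number of seconds in a day. The sketch below is only a sanity check of the figures quoted above; the function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_tps(daily_transactions):
    """Average transactions per second over a full day."""
    return daily_transactions / SECONDS_PER_DAY


def daily_transactions_for(tps):
    """Daily transaction count implied by a sustained TPS."""
    return tps * SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"16,000,000 tx/day is about {average_tps(16_000_000):.1f} TPS")
    print(f"188 TPS is about {daily_transactions_for(188):,} tx/day")
```

A round 16 million per day works out to roughly 185 TPS, so the ~188 TPS figure corresponds to a daily count slightly above 16 million.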

As an EVM-compatible chain, BSC is more accessible to existing Web3 developers, but its performance cannot be directly compared with other non-EVM compatible blockchains, such as Avalanche, Solana, or Near. As the accumulated storage grows, IO efficiency drops: the MPT (Merkle Patricia Trie) gets deeper, which amplifies writes, and disk lookup costs increase once the data reaches several TB or more.

Meanwhile, Ethereum’s TPS has stayed at around 15 for several years, which leads to high gas prices: a simple ERC20 token transfer could cost tens of dollars. Ethereum has put most of its scaling efforts into sharding and layer 2.

BSC’s strategy is a bit different. NodeReal has a strategic collaboration with the BNBChain Core Tech team and will help keep improving its layer 1 performance. At the same time, NodeReal is also improving BSC’s scalability through layer 2 technology. We think both layer 1 and layer 2 need high-performance capabilities.

In this article, I will share a deep analysis of the critical path and bottlenecks of the BSC network. Future articles will show what NodeReal has done to improve its performance. To follow this article more easily, you should have a basic understanding of the EVM.

The Critical 3 Seconds

As you may know, BSC uses PoSA (Proof of Staked Authority) consensus and generates a block every 3 seconds. The interval can fluctuate a bit when the network is unstable, but most of the time it works smoothly.

To keep the BSC network running smoothly, a validator has to finish all of the following tasks within 3 seconds. We divide the 3s period into 4 phases:

Phase 1: Block Download

The validator node first has to download the block that was generated by the previous validator node.

According to https://bscscan.com/chart/blocksize, the average block size is below 100KB most of the time. Nowadays, downloading several hundred KB is very cheap and a few milliseconds can be enough, so the block download cost can be ignored as long as the network is stable.

Phase 2: Block Import

Once the block is downloaded, the validator node needs to process the block by:

  1. Doing a quick check on the header fields to make sure the block is valid. This is very fast; 1ms can be enough.
  2. Executing the transactions one by one to get the block execution results. Depending on the number of transactions and their workload, this takes from ~50ms to hundreds of milliseconds.
  3. Validating the state. Once all the transactions are executed, the node calculates the updated MPT state root to make sure the world state is consistent. Calculating the state root hash requires walking the trie, which can be very expensive because BSC has a very large trie. The cost ranges from ~50ms to hundreds of milliseconds, and can exceed 1 second in extreme cases.
  4. Committing the updated state to the DB if it is valid. The commit cost is relatively stable, from ~10ms to ~100ms, depending on the size of the updated IO and the underlying disk type (SSD, NVMe…).

Phase 3: Mine

A validator node maintains a transaction pool to hold incoming transactions. The pool is maintained by a background thread that is not part of the 3-second critical path, so we can ignore its cost.

After the previous block is imported, the validator node tries to generate its own block, which is quite similar to the import phase.

  1. Commit Transactions: the node takes runnable transactions from the transaction pool and executes them one by one. This is similar to the execution step of block import and can take ~50ms to hundreds of milliseconds.
  2. Finalize: similar to the validation step of block import, the node calculates the MPT state root and puts it into the block header. The cost is unstable, from ~50ms to hundreds of milliseconds, and can exceed 1 second in extreme cases.
  3. Sign: the block is sealed with a signature from the miner's key. The cryptographic operations are fast; several milliseconds should be enough.
  4. Time.Wait: although the block is ready after the sign phase, it is not broadcast until the 3-second interval is reached.
  5. Commit: similar to the commit step of block import, the node writes the state changes into the DB. This takes ~10ms to ~100ms, depending on the size of the changed IO and the underlying disk type (SSD, NVMe…).
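
The Time.Wait step can be pictured as computing how much of the 3-second slot is left once the block is signed. The sketch below is a simplified illustration under the assumption that the target is always the parent block's timestamp plus the block interval; the function name is hypothetical and the real client's timing logic handles more cases.

```python
BLOCK_INTERVAL_S = 3.0  # BSC block interval described above


def seal_delay(parent_timestamp, now):
    """Seconds a signed block still has to wait before it may be broadcast.

    Assumes the target time is simply parent_timestamp + BLOCK_INTERVAL_S.
    Returns 0.0 when the slot has already been reached or missed.
    """
    target = parent_timestamp + BLOCK_INTERVAL_S
    return max(0.0, target - now)


if __name__ == "__main__":
    # Block ready 0.8s after the parent: most of the slot is spent waiting.
    print(seal_delay(parent_timestamp=100.0, now=100.8))
    # Block ready 3.5s after the parent: no wait left, the slot was overrun.
    print(seal_delay(parent_timestamp=100.0, now=103.5))
```

When the earlier steps are fast, the wait absorbs the spare time; when they are slow, the delay drops to zero and the block is late.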

Phase 4: Block Broadcast

Like block download, block broadcast is very fast; several milliseconds should be enough.
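
To see how the four phases add up against the 3-second interval, the ranges quoted above can be summed. This is a rough sketch, not a measurement: "several milliseconds" is modelled as 5ms and "hundreds of milliseconds" as 500ms, both of which are assumptions chosen for illustration, and the extreme column uses 1000ms for validation and finalize.

```python
BLOCK_INTERVAL_MS = 3000

# (step, low, high, extreme) in milliseconds.
# 5 stands in for "several ms" and 500 for "hundreds of ms" (assumptions).
STEPS = [
    ("download",          5,   5,    5),
    ("import: header",    1,   1,    1),
    ("import: execute",  50, 500,  500),
    ("import: validate", 50, 500, 1000),
    ("import: commit",   10, 100,  100),
    ("mine: execute",    50, 500,  500),
    ("mine: finalize",   50, 500, 1000),
    ("mine: sign",        5,   5,    5),
    ("mine: commit",     10, 100,  100),
    ("broadcast",         5,   5,    5),
]


def total_ms(column):
    """Sum one scenario column (1 = low, 2 = high, 3 = extreme)."""
    return sum(step[column] for step in STEPS)


def slack_ms(column):
    """Time left in the block interval; negative means the slot is overrun."""
    return BLOCK_INTERVAL_MS - total_ms(column)


if __name__ == "__main__":
    for name, column in (("low", 1), ("high", 2), ("extreme", 3)):
        print(f"{name:8s} total={total_ms(column):5d} ms  slack={slack_ms(column):5d} ms")
```

Under these assumed numbers the low and high scenarios fit inside the interval, while the extreme scenario overruns it, and almost all of the variation comes from execution and state root calculation.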

The Bottlenecks

1. Storage Access

Like Ethereum, BSC uses KV storage to keep all of the world state, and each contract storage key and value has a fixed size of 32 bytes.

Smart contracts access the world state frequently. We profiled the IO access data of several hot smart contracts, as the below table shows:

The below storage access hierarchical view shows the multi-level cache of BSC. Most IO operations can be served from cache, but on a cache miss the request reaches the physical disk. That IO cost is very high and can reach 1 millisecond or more for a single IO operation.
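
The effect of a cache miss can be illustrated with a toy two-level store: an in-memory dictionary in front of a slower "disk". This is only a sketch; the class name, the single cache level, and the cost constants (1ms per disk read as mentioned above, and an assumed 0.001ms per memory hit) are illustrative and do not describe the actual BSC cache layers.

```python
class TieredStore:
    """Toy key/value store with one memory cache level in front of a disk."""

    def __init__(self, disk, memory_cost_ms=0.001, disk_cost_ms=1.0):
        self.disk = disk              # dict standing in for the on-disk KV store
        self.cache = {}               # in-memory cache
        self.memory_cost_ms = memory_cost_ms
        self.disk_cost_ms = disk_cost_ms
        self.hits = 0
        self.misses = 0
        self.elapsed_ms = 0.0         # simulated time spent on reads

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            self.elapsed_ms += self.memory_cost_ms
            return self.cache[key]
        # Cache miss: pay the disk cost, then keep the value in memory.
        self.misses += 1
        self.elapsed_ms += self.disk_cost_ms
        value = self.disk[key]
        self.cache[key] = value
        return value


if __name__ == "__main__":
    # 32-byte keys and values, as in the world state.
    disk = {i.to_bytes(32, "big"): (i * 7).to_bytes(32, "big") for i in range(1000)}
    store = TieredStore(disk)
    for _ in range(2):                # the first pass misses, the second pass hits
        for key in disk:
            store.get(key)
    print(store.hits, store.misses, round(store.elapsed_ms, 3))
```

In this model the cold pass over 1,000 slots costs about a second of simulated time, while the warm pass costs about a millisecond, which is why cache misses on the critical path matter so much.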

2. EVM Execution

Most on-chain smart contracts do not perform complex computation because it is very expensive, so the pure opcode execution cost of the EVM is a bit lower compared to other VMs, like the JVM, V8 or WASM. We profiled the opcode records of several contracts, as shown by the table:

Most interpreter-based VMs can process 10K opcodes very quickly, in less than 1ms. But the accumulated execution cost cannot be ignored, since there can be hundreds of transactions within a block.
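
A back-of-the-envelope calculation shows how small per-transaction costs accumulate. The sketch below takes the "10K opcodes in 1ms" figure as an upper bound on interpreter cost; the transaction count and opcodes per transaction are hypothetical inputs, not profiled values.

```python
def block_opcode_cost_ms(tx_count, opcodes_per_tx, opcodes_per_ms=10_000):
    """Upper-bound pure opcode execution time for one block, in milliseconds.

    opcodes_per_ms=10_000 reflects the "10K opcodes in under 1ms" bound;
    tx_count and opcodes_per_tx are illustrative assumptions.
    """
    return tx_count * opcodes_per_tx / opcodes_per_ms


if __name__ == "__main__":
    # Hypothetical block: 300 transactions of 10K opcodes each.
    print(block_opcode_cost_ms(tx_count=300, opcodes_per_tx=10_000), "ms")
```

Even when each transaction stays under a millisecond, a block with a few hundred of them can spend a noticeable share of the 3-second interval on opcode execution alone, before any storage access is counted.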

3. Validation

Once the transactions of a block are executed, the node recalculates the new MPT root and checks that it matches the expected value. This process is called validation. The validation cost of a BSC node is quite unstable: it has to walk the MPT, so its cost depends on the pattern of the block data. We profiled a burst of traffic on a BSC node on the 18th of April 2022. The burst made the BSC network quite unstable, and the validation cost even exceeded 3 seconds. The execution cost for the same burst of traffic, on the other hand, was more stable.
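
Why the cost depends on the data pattern can be shown with a toy model of a 16-way trie. Updating a key dirties every node on its root-to-leaf path, and each dirty node must be rehashed, so keys that share long prefixes cost less than keys spread across the trie. The sketch below assumes a plain trie with one node per hex digit and no path compression, which is a simplification of the real MPT; the function name and sample paths are made up.

```python
def dirty_nodes(paths):
    """Count distinct trie nodes on the root-to-leaf paths of updated keys.

    Each path is a string of hex digits. A node is identified by the prefix
    that leads to it, with the empty prefix being the root.
    """
    touched = set()
    for path in paths:
        for depth in range(len(path) + 1):
            touched.add(path[:depth])
    return len(touched)


if __name__ == "__main__":
    clustered = ["ab00", "ab01", "ab02", "ab03"]   # updates share a long prefix
    scattered = ["1a00", "2b01", "3c02", "4d03"]   # updates share only the root
    print("clustered:", dirty_nodes(clustered))
    print("scattered:", dirty_nodes(scattered))
```

Both batches contain four updates, but the scattered batch touches roughly twice as many nodes in this model. On a trie as large as BSC's, many of those nodes may also have to be loaded from disk, which amplifies the difference.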

4. System Level

As the BSC network keeps evolving, more features are added to it, so it is necessary to profile at the system level from time to time. The main BSC client is a single process implemented in Go. NodeReal profiled it for locks, memory, IO, goroutines, address layout, GC, and more.

We found some interesting points to consider:

1. Memory cost is high: usage can easily reach 20GB.

2. The number of goroutines is very large. Hundreds of goroutines run within the process, and many more temporary goroutines are created to process a task and then exit quickly.

3. GC cost is significant. Because memory allocation is very frequent, GC can occur less than 1 minute apart and a cycle can last for several seconds. Although Go's GC has improved a lot to reduce the impact of STW (Stop The World), it still has a great impact on performance.

4. Locks are used widely, mostly for chain insert, reorg and trie commit.

Potential Improvements

The bottlenecks explained above also point to optimization opportunities.

As a short conclusion, here are some of the optimizations NodeReal carried out to address these bottlenecks. More details will be revealed later.

  • Storage Cache: preload data from disk into memory to avoid IO access on the critical path.
  • Smooth Validation: fix the unstable validation cost under bursts of traffic.
  • Pipeline Block Process: separate block processing into several steps and run these steps asynchronously.
  • Erigon Protocol: a newer protocol from the community with flattened storage and faster block data sync.

About NodeReal

NodeReal is a one-stop infrastructure and solution provider that embraces the high-speed blockchain era. We provide scalable, reliable, and efficient blockchain solutions for everyone, aiming to support the adoption, growth, and long-term success of the Web3 ecosystem.
