State Sync

Learn about Tendermint Core State Sync and Support Offered by the Cosmos SDK

Tip: Only interested in how to sync a node with the network? Skip to State Syncing a Node.

Tendermint Core State Sync

State sync enables a new node to join a network by fetching a snapshot of the network state at a recent height, rather than retrieving and replaying all historical blocks. Because the application state is significantly smaller than the complete block history, and restoring state is faster than replaying blocks, this method drastically reduces the time required to synchronize with the network from days to minutes.

This section provides a concise overview of the Tendermint state sync protocol and how to sync a node. For more details, refer to the ABCI Application Guide and the ABCI Reference Documentation.

State Sync Snapshots

A fundamental design principle of Tendermint state sync is to provide applications with maximum flexibility. Consequently, Tendermint does not prescribe what snapshots contain, how they are created, or how they are restored. It is concerned only with discovering existing snapshots within the network, retrieving them, and forwarding them to applications via ABCI. Tendermint uses light client verification to validate the final application hash against the chain’s app hash, but any additional verification during restoration is left to the application itself.

Snapshots consist of binary chunks in an arbitrary format. While chunks cannot exceed 16 MB, there are no additional restrictions. Snapshot metadata exchanged via ABCI and P2P includes:

height (uint64): The block height at which the snapshot was taken.
format (uint32): Application-specific format identifier (e.g., version number).
chunks (uint32): The number of binary chunks in the snapshot.
hash (bytes): A snapshot hash for comparing snapshots across nodes.
metadata (bytes): Arbitrary binary snapshot metadata used by applications.

The format field allows applications to maintain backward compatibility by supporting multiple snapshot formats. This is useful when changing serialization or compression methods, enabling nodes to share snapshots with older peers or reuse legacy snapshots when starting a newer version.

The hash field is arbitrary and is used only to compare snapshots across nodes, which helps detect nondeterminism in snapshot generation. It should not be considered secure, since Tendermint does not verify it. Applications that rely on it must perform their own verification.

The metadata field contains optional information such as chunk checksums to discard damaged chunks or Merkle proofs for individual chunk verification. In Protobuf-encoded form, snapshot metadata messages are limited to 4 MB.
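As an illustration of how the metadata field might carry chunk checksums, the following Go sketch defines a Snapshot type with the fields listed above and checks a chunk against a per-chunk SHA-256 checksum. The type and function names are hypothetical and are not the ABCI-generated ones. Packing one checksum per chunk into Metadata, and reading 16 MB as 16 * 1024 * 1024 bytes, are assumptions made for this example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
)

// MaxChunkSize is the 16 MB chunk limit, read here as 16 MiB (an assumption).
const MaxChunkSize = 16 * 1024 * 1024

// Snapshot mirrors the metadata fields described in the text.
// It is an illustrative type, not the ABCI-generated one.
type Snapshot struct {
	Height   uint64
	Format   uint32
	Chunks   uint32
	Hash     []byte
	Metadata []byte // assumption: concatenated SHA-256 checksums, one per chunk
}

// BuildMetadata concatenates the SHA-256 checksum of every chunk.
func BuildMetadata(chunks [][]byte) []byte {
	meta := make([]byte, 0, len(chunks)*sha256.Size)
	for _, c := range chunks {
		sum := sha256.Sum256(c)
		meta = append(meta, sum[:]...)
	}
	return meta
}

// VerifyChunk checks a chunk against the size limit and against the
// checksum stored for its index in Metadata.
func VerifyChunk(s Snapshot, index uint32, chunk []byte) error {
	if index >= s.Chunks {
		return errors.New("chunk index out of range")
	}
	if len(chunk) > MaxChunkSize {
		return errors.New("chunk exceeds size limit")
	}
	if uint64(len(s.Metadata)) != uint64(s.Chunks)*sha256.Size {
		return errors.New("metadata does not hold one checksum per chunk")
	}
	start := int(index) * sha256.Size
	want := s.Metadata[start : start+sha256.Size]
	got := sha256.Sum256(chunk)
	if !bytes.Equal(got[:], want) {
		return errors.New("chunk checksum mismatch")
	}
	return nil
}
```

A restoring node could call VerifyChunk before applying each chunk and discard any chunk that fails. The checksums only detect damage in transit; they are not a substitute for the final app hash check.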

Taking and Serving Snapshots

To enable state sync, some network nodes must take and serve snapshots. When a peer attempts to state sync, an existing Tendermint node will call the following ABCI methods to provide snapshot data:

ListSnapshots: Returns a list of available snapshots along with their metadata.
LoadSnapshotChunk: Returns binary chunk data.

Snapshots should be generated at regular intervals rather than on-demand to optimize performance and prevent denial-of-service attacks caused by excessive snapshot requests. Typically, older snapshots can be discarded, but retaining at least the two most recent snapshots ensures that nodes in the process of restoration can complete the sync before a snapshot is deleted.
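The interval and retention rules above can be sketched in a few lines of Go. The function names are hypothetical, and enforcing a floor of two retained snapshots is a design choice taken from the recommendation in the text, not a Tendermint requirement.

```go
package main

import "sort"

// ShouldSnapshot reports whether a snapshot is due at the given height.
// An interval of 0 disables snapshots.
func ShouldSnapshot(height, interval uint64) bool {
	return interval > 0 && height > 0 && height%interval == 0
}

// Prune splits snapshot heights into those to keep and those to remove,
// retaining the keepRecent most recent ones and never fewer than two
// (when that many exist).
func Prune(heights []uint64, keepRecent int) (keep, remove []uint64) {
	sorted := append([]uint64(nil), heights...) // copy, leave input untouched
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] }) // newest first
	if keepRecent < 2 {
		keepRecent = 2
	}
	if keepRecent > len(sorted) {
		keepRecent = len(sorted)
	}
	return sorted[:keepRecent], sorted[keepRecent:]
}
```

For example, with snapshots at heights 100, 200, 300, and 400 and keepRecent set to 2, the sketch keeps 400 and 300 and marks 200 and 100 for removal.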

Snapshot creation should adhere to the following best practices:

Asynchronous Execution: Snapshot generation should not halt block processing and should run in a separate thread.
Consistency: Snapshots should be taken at isolated heights, free from concurrent writes due to block processing.
Determinism: Snapshot chunks and metadata should be identical across all nodes at a given height and format.

A recommended implementation approach includes:

Using a database that supports transactions with snapshot isolation, such as RocksDB or BadgerDB.
Initiating a read-only database transaction after committing a block.
Spawning a new thread to handle snapshot creation.
Iterating over all data items in a deterministic order.
Serializing and writing data to a byte stream.
Hashing the byte stream and splitting it into fixed-size chunks (e.g., 10 MB).
Storing chunks as separate files.
Recording snapshot metadata, including the byte stream hash.
Closing the transaction and exiting the thread.

Additional optimizations include compressing data, implementing chunk checksums, generating incremental verification proofs, and removing outdated snapshots.
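The serialization, hashing, and chunking steps of the recommended approach can be sketched as follows. This is a minimal in-memory illustration, assuming the state fits in a Go map and using a hypothetical length-prefixed encoding. A real implementation would iterate a database snapshot and stream to disk instead.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"sort"
)

// Serialize writes key/value pairs in sorted key order, prefixing each key
// and value with its 4-byte big-endian length. Go map iteration order is
// randomized, so sorting the keys is what makes the output deterministic.
func Serialize(state map[string][]byte) []byte {
	keys := make([]string, 0, len(state))
	for k := range state {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var out []byte
	var lenBuf [4]byte
	for _, k := range keys {
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(k)))
		out = append(out, lenBuf[:]...)
		out = append(out, k...)
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(state[k])))
		out = append(out, lenBuf[:]...)
		out = append(out, state[k]...)
	}
	return out
}

// Split hashes the whole byte stream and cuts it into chunks of at most
// chunkSize bytes. The last chunk may be shorter.
func Split(stream []byte, chunkSize int) (chunks [][]byte, hash [32]byte) {
	if chunkSize <= 0 {
		panic("chunkSize must be positive")
	}
	hash = sha256.Sum256(stream)
	for start := 0; start < len(stream); start += chunkSize {
		end := start + chunkSize
		if end > len(stream) {
			end = len(stream)
		}
		chunks = append(chunks, stream[start:end])
	}
	return chunks, hash
}
```

Two nodes holding the same state produce the same stream, the same hash, and the same chunk boundaries, which is the determinism property described above.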

Restoring Snapshots

Upon startup, Tendermint checks whether the local node is without existing state (that is, whether LastBlockHeight == 0). If so, it begins discovering snapshots via the P2P network. These snapshots are offered to the local application through the following ABCI calls:

OfferSnapshot(snapshot, apphash): Proposes a discovered snapshot to the application.
ApplySnapshotChunk(index, chunk, sender): Applies a snapshot chunk.

Once a snapshot is accepted, Tendermint retrieves chunks from available peers and sequentially applies them to the application, which can accept or reject chunks, reject the snapshot, reject specific senders, or abort state sync altogether.

After all chunks have been applied, Tendermint calls the Info ABCI method and verifies that the app hash and height match the trusted values from the blockchain. If fast sync is enabled, it fetches any remaining blocks before transitioning to normal consensus operation.
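The restoration sequence described above (offer, apply chunks in order, then verify via Info) can be sketched as follows. The App interface is a hypothetical, heavily simplified stand-in for the ABCI application: the real ABCI methods use request and response messages with richer result codes. memApp is a toy application included only to make the sketch self-contained.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"errors"
)

// App is a simplified, hypothetical view of the ABCI calls used in restoration.
type App interface {
	OfferSnapshot(height uint64, format uint32, appHash []byte) bool
	ApplySnapshotChunk(index uint32, chunk []byte, sender string) bool
	Info() (height uint64, appHash []byte)
}

// Restore offers the snapshot, applies chunks sequentially, and then checks
// that the application reports the trusted height and app hash.
func Restore(app App, height uint64, format uint32, chunks [][]byte, sender string, trustedHash []byte) error {
	if !app.OfferSnapshot(height, format, trustedHash) {
		return errors.New("snapshot rejected by application")
	}
	for i, c := range chunks {
		if !app.ApplySnapshotChunk(uint32(i), c, sender) {
			return errors.New("chunk rejected by application")
		}
	}
	gotHeight, gotHash := app.Info()
	if gotHeight != height || !bytes.Equal(gotHash, trustedHash) {
		return errors.New("restored state does not match trusted height and app hash")
	}
	return nil
}

// memApp is a toy application: its state is the concatenation of all applied
// chunks, and its app hash is the SHA-256 of that state.
type memApp struct {
	height uint64
	state  []byte
}

func (a *memApp) OfferSnapshot(height uint64, format uint32, appHash []byte) bool {
	if format != 1 { // this toy app only understands format 1
		return false
	}
	a.height, a.state = height, nil
	return true
}

func (a *memApp) ApplySnapshotChunk(index uint32, chunk []byte, sender string) bool {
	a.state = append(a.state, chunk...)
	return true
}

func (a *memApp) Info() (uint64, []byte) {
	sum := sha256.Sum256(a.state)
	return a.height, sum[:]
}
```

The final comparison is what prevents a bad snapshot from being accepted: even if every chunk is applied without complaint, a mismatch between the restored app hash and the trusted one causes the restore to fail.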

Key Considerations:

Snapshot restoration is application-specific and generally mirrors snapshot generation.
Tendermint verifies a snapshot only after all chunks have been restored, and it does not reject malicious peers on its own. Rejecting chunks or senders earlier is up to the application.
A properly verified app hash ensures that an adversary cannot manipulate a state-synced node into an incorrect state.
State-synced nodes will have a truncated block history starting at the snapshot height, without backfilled block data.
Networks should ensure that archival nodes retain full block history for auditing and backup purposes.

Cosmos SDK State Sync

Cosmos SDK v0.40+ includes built-in support for state sync, eliminating the need for application developers to implement the Tendermint state sync protocol manually. However, applications must still generate and serve snapshots.

The Cosmos SDK stores application state in IAVL, a versioned data store used by modules. At configurable height intervals, the SDK exports each store's contents, encodes them in Protobuf, compresses them, and saves them locally as snapshots. These snapshots are fetched via ABCI when a new node state-syncs.

If an application stores additional data outside of IAVL, that data must be manually incorporated into the state sync mechanism. Otherwise, automatic state sync via the SDK is not possible.

Enabling State Sync Snapshots

Applications using Cosmos SDK's BaseApp must configure a snapshot store and specify snapshot intervals and retention policies. Example setup:

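// Assumes appOpts, logger, db, and txDecoder are provided by the
// surrounding application constructor.

// Snapshots are stored under <home>/data/snapshots.
snapshotDir := filepath.Join(
  cast.ToString(appOpts.Get(flags.FlagHome)), "data", "snapshots")
// Snapshot metadata is kept in a LevelDB database in the same directory.
snapshotDB, err := sdk.NewLevelDB("metadata", snapshotDir)
if err != nil {
  panic(err)
}
snapshotStore, err := snapshots.NewStore(snapshotDB, snapshotDir)
if err != nil {
  panic(err)
}
// Pass the store, snapshot interval, and retention count to BaseApp.
app := baseapp.NewBaseApp(
  "app", logger, db, txDecoder,
  baseapp.SetSnapshotStore(snapshotStore),
  baseapp.SetSnapshotInterval(cast.ToUint64(appOpts.Get(
    server.FlagStateSyncSnapshotInterval))),
  baseapp.SetSnapshotKeepRecent(cast.ToUint32(appOpts.Get(
    server.FlagStateSyncSnapshotKeepRecent))),
)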

State Syncing a Node

Once state sync snapshots are enabled on the network, new nodes can join using state sync. To do so, a node operator must:

Identify at least two available RPC servers, which are used for light client verification.
Obtain a trusted block height and the corresponding block ID hash.
Configure the node's state sync settings accordingly.

Once started, the node automatically discovers and restores a state sync snapshot, allowing it to join the network within minutes instead of days.
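Before starting the node, it can help to sanity-check the inputs gathered in the steps above. The following Go sketch validates a hypothetical StateSyncSettings struct. The struct, its field names, and the specific checks are assumptions made for illustration and do not reproduce Tendermint's own configuration validation.

```go
package main

import (
	"encoding/hex"
	"errors"
	"strings"
)

// StateSyncSettings is a hypothetical container for the operator-supplied inputs.
type StateSyncSettings struct {
	RPCServers  string // comma-separated list of RPC server addresses
	TrustHeight int64  // trusted block height
	TrustHash   string // hex-encoded block ID hash at TrustHeight
}

// Validate checks that at least two RPC servers are listed, that the trusted
// height is positive, and that the trusted hash is non-empty hex.
func (s StateSyncSettings) Validate() error {
	servers := 0
	for _, addr := range strings.Split(s.RPCServers, ",") {
		if strings.TrimSpace(addr) != "" {
			servers++
		}
	}
	if servers < 2 {
		return errors.New("at least two RPC servers are required")
	}
	if s.TrustHeight <= 0 {
		return errors.New("trust height must be positive")
	}
	hash, err := hex.DecodeString(s.TrustHash)
	if err != nil {
		return errors.New("trust hash must be hex-encoded")
	}
	if len(hash) == 0 {
		return errors.New("trust hash must not be empty")
	}
	return nil
}
```

The trusted height and hash should come from a source the operator actually trusts, since light client verification is only as reliable as that starting point.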


