State synchronization

Learn about Tendermint Core state synchronization and support provided by the Cosmos SDK.

📣 Tip: Only interested in how to sync a node with the network? Skip to the "State synchronization node" section.

Tendermint Core state synchronization

State synchronization allows a new node to join the network by fetching a snapshot of the application state at a recent height, rather than fetching and replaying all historical blocks. Since the application state is generally much smaller than all blocks combined, and restoring state is faster than replaying blocks, this can reduce the time to sync with the network from days to minutes.

This part of the document provides a brief overview of the Tendermint state synchronization protocol, and how nodes are synchronized. For more details, see the ABCI Application Guide and the ABCI Reference Documentation.

State synchronization snapshot

A guiding principle when designing Tendermint state synchronization is to provide as much flexibility as possible for applications. Therefore, Tendermint doesn't care what the snapshots contain, how they were taken or how they were restored. It is only concerned with discovering existing snapshots in the network, fetching them and passing them to the application via ABCI.

Tendermint uses light client verification to check the final application hash of the restored application against the chain application hash, but any further validation must be done by the application itself during restoration.

Snapshots consist of binary chunks in an arbitrary format. Chunks cannot be larger than 16 MB, but there are no other restrictions. Snapshot metadata, exchanged via ABCI and P2P, contains the following fields:

height (uint64): The height at which the snapshot was taken
format (uint32): Arbitrary application-specific format identifier (e.g. version)
chunks (uint32): Number of binary chunks in the snapshot
hash (bytes): Arbitrary snapshot hash for comparing snapshots across nodes
metadata (bytes): Arbitrary binary snapshot metadata for use by the application

The format field allows an application to change its snapshot format in a backwards-compatible manner by providing snapshots in multiple formats and choosing which formats to accept during restore.

This is useful, for example, when changing the serialization or compression format: a node may be able to provide a snapshot to a peer running an older version, or make use of an old snapshot when starting up with a newer version.

The hash field contains an arbitrary snapshot hash. Snapshots with the same metadata fields (including hash) across nodes are considered to be the same, and chunks will be fetched from any of those nodes.
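As a rough illustration of the metadata fields and of the "same snapshot" rule, the following Go sketch models the five fields and compares them field by field. The type and function names (Snapshot, sameSnapshot) are hypothetical and chosen for this example; they are not Tendermint's own API.

```go
package main

import (
	"bytes"
	"fmt"
)

// Snapshot mirrors the metadata fields listed above.
// Hypothetical type for illustration, not the Tendermint/ABCI type.
type Snapshot struct {
	Height   uint64 // height at which the snapshot was taken
	Format   uint32 // application-specific format identifier
	Chunks   uint32 // number of binary chunks in the snapshot
	Hash     []byte // arbitrary snapshot hash, not verified by Tendermint
	Metadata []byte // arbitrary application metadata
}

// sameSnapshot reports whether two peers advertise the same snapshot,
// meaning every metadata field (including the hash) is identical.
func sameSnapshot(a, b Snapshot) bool {
	return a.Height == b.Height &&
		a.Format == b.Format &&
		a.Chunks == b.Chunks &&
		bytes.Equal(a.Hash, b.Hash) &&
		bytes.Equal(a.Metadata, b.Metadata)
}

func main() {
	fromPeerA := Snapshot{Height: 3000, Format: 1, Chunks: 3, Hash: []byte{0x0F, 0x14}}
	fromPeerB := fromPeerA
	fromPeerC := fromPeerA
	fromPeerC.Hash = []byte{0xC6, 0x20}

	fmt.Println(sameSnapshot(fromPeerA, fromPeerB)) // true: chunks may come from either peer
	fmt.Println(sameSnapshot(fromPeerA, fromPeerC)) // false: treated as a different snapshot
}
```

Under this rule, a difference in any single field, even with the same height, makes the two advertisements distinct snapshots.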

The hash cannot be trusted and is not verified by Tendermint itself. Its purpose is to guard against inadvertent non-determinism in snapshot generation.

The hash can be verified by the application instead.

The metadata field can contain any arbitrary metadata required by the application. For example, applications may wish to include chunk checksums to discard corrupt chunks, or Merkle proofs to verify each chunk individually against the chain application hash. Snapshot metadata messages cannot exceed 4 MB in Protobuf-encoded form.

Take and provide snapshots

To enable state synchronization, certain nodes in the network must take and provide snapshots. When a peer attempts a state sync, an existing Tendermint node will call the following ABCI methods on the application to provide snapshot data to the peer:

ListSnapshots: Returns a list of available snapshots, along with metadata
LoadSnapshotChunk: Returns binary chunk data

Snapshots should generally be generated periodically, rather than on-demand: this improves state synchronization performance, since snapshot generation can be slow, and avoids a denial-of-service vector for attackers flooding nodes with such requests.

Older snapshots can usually be deleted, but it may be useful to keep at least the two most recent ones, so that the previous snapshot is not deleted while a node is still restoring it.
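A minimal sketch of such a retention rule, assuming snapshots are identified by their height. The function name snapshotsToPrune and its parameters are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshotsToPrune returns the snapshot heights that may be deleted,
// keeping the keepRecent highest heights. Hypothetical helper.
func snapshotsToPrune(heights []uint64, keepRecent int) []uint64 {
	if keepRecent < 0 {
		keepRecent = 0
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]uint64(nil), heights...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	if len(sorted) <= keepRecent {
		return nil
	}
	return sorted[keepRecent:]
}

func main() {
	// Keep the two most recent snapshots (4000 and 3000).
	fmt.Println(snapshotsToPrune([]uint64{1000, 3000, 2000, 4000}, 2)) // [2000 1000]
}
```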

It is entirely up to the application to decide how to take a snapshot, but it should strive to meet the following guarantees:

Asynchronous: Snapshotting should not halt block processing, so it should happen asynchronously, e.g. in a separate thread
Consistent: Snapshots should be taken at isolated heights and should not be affected by concurrent writes, e.g. due to block processing in the main thread
Deterministic: For a given height and format, snapshot chunks and metadata should be the same (at byte level) for all nodes to ensure good availability of chunks

For example, this can be achieved as follows:

Use a data store that supports transactions with snapshot isolation, such as RocksDB or BadgerDB.
Start a read-only database transaction in the main thread after the block is committed.
Pass the database transaction handle to the newly spawned thread.
Iterate over all data items in a deterministic order (for example, sorted by key).
Serialize the data items (e.g. using Protobuf) and write them to a byte stream.
Hash the byte stream and split it into fixed size chunks (say 10 MB)
Store the chunks as separate files in the file system.
Write snapshot metadata to database or file, including byte stream hash.
Close the database transaction and exit the thread.
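The serialization, hashing, and chunking steps above can be sketched in Go as follows. This is an illustration under stated assumptions, not the method of any particular application: an in-memory map stands in for the database transaction, a simple length-prefixed encoding stands in for Protobuf, and the function name buildSnapshot is hypothetical.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// buildSnapshot serializes items in key order, hashes the resulting byte
// stream, and splits it into chunks of at most chunkSize bytes.
func buildSnapshot(items map[string][]byte, chunkSize int) ([][]byte, [32]byte, error) {
	if chunkSize <= 0 {
		return nil, [32]byte{}, errors.New("chunk size must be positive")
	}
	// Sort keys so every node produces byte-identical output.
	keys := make([]string, 0, len(items))
	for k := range items {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Length-prefixed encoding (assumption: stand-in for Protobuf).
	var stream bytes.Buffer
	var lenBuf [4]byte
	for _, k := range keys {
		v := items[k]
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(k)))
		stream.Write(lenBuf[:])
		stream.WriteString(k)
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(v)))
		stream.Write(lenBuf[:])
		stream.Write(v)
	}

	data := stream.Bytes()
	hash := sha256.Sum256(data)

	// Split the stream into fixed-size chunks; the last one may be shorter.
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, hash, nil
}

func main() {
	items := map[string][]byte{"a": []byte("1"), "b": []byte("22"), "c": []byte("333")}
	chunks, hash, err := buildSnapshot(items, 16)
	if err != nil {
		panic(err)
	}
	fmt.Printf("chunks=%d hash=%X\n", len(chunks), hash[:4])
}
```

Because Go map iteration order is random, sorting the keys is what makes the output deterministic here. A real implementation would stream from a read-only database transaction in a separate goroutine and write each chunk to its own file.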

Applications may also want to take additional steps, such as compressing the data, checksumming chunks, generating proofs for incremental verification, and deleting old snapshots.

Restore snapshots

When Tendermint starts, it checks whether the local node has any state (i.e. whether LastBlockHeight == 0), and if not, it starts discovering snapshots over the P2P network. These snapshots are made available to the local application via the following ABCI calls:

OfferSnapshot(snapshot, apphash): Offer a discovered snapshot to the application
ApplySnapshotChunk(index, chunk, sender): Apply a snapshot chunk

The discovered snapshot is provided to the application, which can respond by accepting the snapshot, rejecting the snapshot, rejecting the format, rejecting the sender, aborting the state synchronization, etc.
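To make the possible responses concrete, here is a small Go sketch of how an application might decide on an offered snapshot. The result constants are defined locally for illustration and only mirror the responses listed above. The decideOffer function and its rules (supported formats, non-zero chunk count) are assumptions for this example, not the ABCI types or any application's actual policy.

```go
package main

import "fmt"

// OfferResult names the possible responses to an offered snapshot.
// Local illustrative constants, not the ABCI enum itself.
type OfferResult string

const (
	Accept       OfferResult = "ACCEPT"
	Reject       OfferResult = "REJECT"
	RejectFormat OfferResult = "REJECT_FORMAT"
	RejectSender OfferResult = "REJECT_SENDER"
	Abort        OfferResult = "ABORT"
)

// decideOffer applies two simple checks (hypothetical policy):
// the format must be one the application can restore, and the
// snapshot must contain at least one chunk.
func decideOffer(format, chunks uint32, supported map[uint32]bool) OfferResult {
	if !supported[format] {
		return RejectFormat
	}
	if chunks == 0 {
		return Reject
	}
	return Accept
}

func main() {
	supported := map[uint32]bool{1: true}
	fmt.Println(decideOffer(1, 3, supported)) // ACCEPT
	fmt.Println(decideOffer(2, 3, supported)) // REJECT_FORMAT
	fmt.Println(decideOffer(1, 0, supported)) // REJECT
}
```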

Once the snapshot is accepted, Tendermint will fetch chunks from available peers and apply them in order to the application, which can choose to accept the chunk, refetch chunks, reject the snapshot, reject the sender, abort state sync, etc.

After all chunks are applied, Tendermint will call the Info ABCI method on the application and check that the application hash and height correspond to the trusted values from the chain. It will then switch to fast sync for any remaining blocks (if enabled), before finally joining normal consensus operations.
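The final check can be pictured as a comparison of two pairs of values. The sketch below is only an illustration of that comparison: verifyRestored and its parameters are hypothetical names, and the real check is performed inside Tendermint using light client verification.

```go
package main

import (
	"bytes"
	"fmt"
)

// verifyRestored compares the height and app hash reported by the
// application after restoration with the trusted values from the chain.
// Hypothetical helper for illustration.
func verifyRestored(gotHeight uint64, gotAppHash []byte, wantHeight uint64, wantAppHash []byte) error {
	if gotHeight != wantHeight {
		return fmt.Errorf("height mismatch: got %d, want %d", gotHeight, wantHeight)
	}
	if !bytes.Equal(gotAppHash, wantAppHash) {
		return fmt.Errorf("app hash mismatch: got %X, want %X", gotAppHash, wantAppHash)
	}
	return nil
}

func main() {
	trusted := []byte{0xF7, 0xD6, 0x6B, 0xC9}
	fmt.Println(verifyRestored(3000, trusted, 3000, trusted))            // <nil>
	fmt.Println(verifyRestored(3000, []byte{0x00}, 3000, trusted) != nil) // true
}
```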

How snapshots are actually restored is entirely up to the application, but it is usually the reverse of how they were generated.

Note, however, that Tendermint only verifies snapshots after all chunks have been restored, and will not reject any P2P peers on its own.

As long as the trusted hash and application code are correct, it is not possible for an adversary to cause a state-synced node to have an incorrect state when joining consensus, but it is up to the application to counteract state sync denial of service (e.g. by implementing incremental verification or rejecting invalid peers).
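One simple form of incremental verification is to carry a checksum per chunk in the snapshot metadata and check every chunk as it arrives, so a corrupt chunk can be refetched or its sender rejected before the whole snapshot has been restored. The Go sketch below assumes SHA-256 checksums and hypothetical helper names (chunkChecksums, verifyChunk). Note that checksums taken from untrusted metadata only detect chunks that differ from what was advertised; the final application hash check is still what establishes trust.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// chunkChecksums computes one SHA-256 checksum per chunk. A snapshot
// provider could place these in the metadata field (assumption).
func chunkChecksums(chunks [][]byte) [][32]byte {
	sums := make([][32]byte, len(chunks))
	for i, c := range chunks {
		sums[i] = sha256.Sum256(c)
	}
	return sums
}

// verifyChunk reports whether the chunk at index matches the advertised
// checksum. Out-of-range indexes are treated as invalid.
func verifyChunk(sums [][32]byte, index uint32, chunk []byte) bool {
	if int(index) >= len(sums) {
		return false
	}
	return sha256.Sum256(chunk) == sums[index]
}

func main() {
	chunks := [][]byte{[]byte("chunk-0"), []byte("chunk-1")}
	sums := chunkChecksums(chunks)

	fmt.Println(verifyChunk(sums, 0, chunks[0]))          // true
	fmt.Println(verifyChunk(sums, 1, []byte("tampered"))) // false: refetch or reject the sender
	fmt.Println(verifyChunk(sums, 2, chunks[0]))          // false: index out of range
}
```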

Note that state-synced nodes will have a truncated block history starting at the height of the restored snapshot, and there is currently no backfill of block data. Networks should consider the wider implications and may wish to ensure that at least a few archive nodes retain the full block history for auditability and backup purposes.

Cosmos SDK state synchronization

Cosmos SDK v0.40+ includes automatic support for state synchronization, so application developers only need to enable it. They do not need to implement the state sync protocol described in the Tendermint section above themselves.

State synchronization snapshot

Tendermint Core handles most of the heavy lifting of discovering, exchanging, and verifying state data for state synchronization, but the application must periodically take snapshots of its state, provide these snapshots to Tendermint via ABCI calls, and be able to restore them when syncing a new node.

The Cosmos SDK stores application state in a data store called IAVL, and each module can set up its own IAVL stores. At a fixed (configurable) height interval, the Cosmos SDK exports the contents of each store at that height, Protobuf-encodes and compresses it, and saves it to a snapshot store on the local file system. Since IAVL preserves historical versions of the data, these snapshots can be generated concurrently with the execution of new blocks.

Tendermint will fetch these snapshots via ABCI when a new node does a state sync.

Note that only IAVL storage managed by the Cosmos SDK can be snapshotted. If an app stores additional data in an external data store, there is currently no mechanism to include these in the state sync snapshot, so apps cannot use automatic state sync via the SDK.

However, as described in the ABCI documentation, one is free to implement the state-sync protocol itself.

When a new node state syncs, Tendermint will fetch a snapshot from peers in the network and provide it to the local (empty) application, which will import it into its IAVL stores.

Tendermint then verifies the application's hash against the main blockchain using light client verification, and proceeds to execute blocks as usual.

Note that a state sync node will only restore the application state at the snapshot height and will not include historical data or historical blocks.

Enable state sync snapshot

To enable state sync snapshots, an application using the Cosmos SDK BaseApp needs to set up a snapshot store (with a database and a filesystem directory) and configure the snapshot interval and the number of historical snapshots to keep. A minimal example is as follows:

// Snapshots and their metadata database live under <home>/data/snapshots.
snapshotDir := filepath.Join(
  cast.ToString(appOpts.Get(flags.FlagHome)), "data", "snapshots")
snapshotDB, err := sdk.NewLevelDB("metadata", snapshotDir)
if err != nil {
  panic(err)
}
snapshotStore, err := snapshots.NewStore(snapshotDB, snapshotDir)
if err != nil {
  panic(err)
}
// Pass the store, the snapshot interval, and the number of snapshots
// to keep to BaseApp; the values are read from the node's flags.
app := baseapp.NewBaseApp(
  "app", logger, db, txDecoder,
  baseapp.SetSnapshotStore(snapshotStore),
  baseapp.SetSnapshotInterval(cast.ToUint64(appOpts.Get(
    server.FlagStateSyncSnapshotInterval))),
  baseapp.SetSnapshotKeepRecent(cast.ToUint32(appOpts.Get(
    server.FlagStateSyncSnapshotKeepRecent))),
)

When the application is started with the appropriate flags (e.g. --state-sync.snapshot-interval 1000 --state-sync.snapshot-keep-recent 2), it should take snapshots and output log messages such as:

Creating state snapshot module=main height=3000
Completed state snapshot module=main height=3000 format=1

Note that the snapshot interval must currently be a multiple of pruning-keep-every (default 100) to prevent heights from being pruned while snapshots are being taken. It is also usually a good idea to keep at least the 2 most recent snapshots, so that the previous snapshot is not deleted while a node is trying to state sync from it.
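The interval constraint can be checked before starting a node. The following Go sketch shows the arithmetic only. validateSnapshotInterval is a hypothetical helper, and it assumes that an interval of 0 means snapshots are disabled and that a keep-every value of 0 means there is nothing to check against.

```go
package main

import "fmt"

// validateSnapshotInterval checks that the snapshot interval is a
// multiple of pruning-keep-every. Hypothetical helper for illustration.
func validateSnapshotInterval(interval, pruningKeepEvery uint64) error {
	if interval == 0 {
		return nil // assumption: 0 disables snapshots
	}
	if pruningKeepEvery > 0 && interval%pruningKeepEvery != 0 {
		return fmt.Errorf("snapshot interval %d must be a multiple of pruning-keep-every %d",
			interval, pruningKeepEvery)
	}
	return nil
}

func main() {
	fmt.Println(validateSnapshotInterval(1000, 100)) // <nil>
	fmt.Println(validateSnapshotInterval(1050, 100) != nil) // true: 1050 is not a multiple of 100
}
```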

State synchronization node

📣 Tip: Looking for a snapshot or archive node to sync your nodes? Check out [this page].

Once several nodes in the network have taken state sync snapshots, new nodes can join the network using state sync. To do this, the node should first be configured as usual, and the following information must be obtained for light client verification:

At least two available RPC servers
A trusted height
The block ID hash at the trusted height

The trusted hash must be obtained from a trusted source (such as a block explorer), but the RPC servers do not need to be trusted. Tendermint will use the hash to obtain the trusted application hash from the blockchain and verify the restored application snapshot against it. The application hash and corresponding height are the only information that can be trusted when restoring a snapshot. Everything else can be forged by an adversary.

This guide uses Ubuntu 20.04.

Prepare the system

Update the system

sudo apt update -y

Upgrade the system

sudo apt upgrade -y

Install dependencies

sudo apt-get install ca-certificates curl gnupg lsb-release make gcc git jq wget -y

Install Go

wget -q -O - https://raw.githubusercontent.com/canha/golang-tools-install-script/master/goinstall.sh | bash
source ~/.bashrc

Set the node name

moniker="NODE_NAME"

Use the following commands for mainnet settings

SNAP_RPC1="http://xxx1:26657"
SNAP_RPC="http://xxx:26657"
CHAIN_ID="daodst_7777-1"
PEER="96557e26aabf3b23e8ff5282d03196892a7776fc@xxx,dec587d55ff38827ebc6312cedda6085c59683b6@xxx"
wget -O $HOME/genesis.json https://raw.githubusercontent.com/daodst/mainnet/genesis.json

Install stcd

git clone https://github.com/daodst/blockchain.git && \
cd blockchain/cmd/stcd && \
go build

Configuration

Initialize the node

stcd init $moniker --chain-id $CHAIN_ID

📣 Tip: $install_path is used to indicate the path where you installed the stcd binary

Move the genesis file to the $install_path/.stcd/config folder

mv $HOME/genesis.json $install_path/.stcd/config/

Reset the node

stcd tendermint unsafe-reset-all --home $install_path/.stcd

Change config file (set node name, add persistent peer, set indexer="null")

sed -i -e "s%^moniker *=.*%moniker = \"$moniker\"%; " $install_path/.stcd/config/config.toml
sed -i -e "s%^indexer *=.*%indexer = \"null\"%; " $install_path/.stcd/config/config.toml
sed -i -e "s%^persistent_peers *=.*%persistent_peers = \"$PEER\"%; " $install_path/.stcd/config/config.toml

Set the variables for starting from a snapshot

LATEST_HEIGHT=$(curl -s $SNAP_RPC/block | jq -r .result.block.header.height); \
BLOCK_HEIGHT=$((LATEST_HEIGHT - 2000)); \
TRUST_HASH=$(curl -s "$SNAP_RPC/block?height=$BLOCK_HEIGHT" | jq -r .result.block_id.hash)
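For readers who prefer to script this step in Go, the sketch below parses the same fields that the jq expressions above read from a /block response (result.block.header.height and result.block_id.hash) and derives a trust height the same way. It works on a sample response body rather than calling a live RPC server. The names parseBlock and trustHeight, and the hash in the sample JSON, are illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// blockResponse covers only the fields read by the jq expressions above.
type blockResponse struct {
	Result struct {
		BlockID struct {
			Hash string `json:"hash"`
		} `json:"block_id"`
		Block struct {
			Header struct {
				Height string `json:"height"` // encoded as a string in the JSON
			} `json:"header"`
		} `json:"block"`
	} `json:"result"`
}

// parseBlock extracts the block height and block ID hash from a response body.
func parseBlock(body []byte) (int64, string, error) {
	var resp blockResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, "", err
	}
	height, err := strconv.ParseInt(resp.Result.Block.Header.Height, 10, 64)
	if err != nil {
		return 0, "", err
	}
	return height, resp.Result.BlockID.Hash, nil
}

// trustHeight steps back offset blocks from the latest height,
// never going below height 1.
func trustHeight(latest, offset int64) int64 {
	if latest <= offset {
		return 1
	}
	return latest - offset
}

func main() {
	// Sample body with a made-up hash (illustrative data only).
	body := []byte(`{"result":{"block_id":{"hash":"ABCDEF"},"block":{"header":{"height":"376080"}}}}`)
	latest, hash, err := parseBlock(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(latest, trustHeight(latest, 2000), hash) // 376080 374080 ABCDEF
}
```

In practice the trust hash must come from the block at the trust height, so a second request with ?height= set to that value is needed, as in the shell commands above.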

Check the values

echo $LATEST_HEIGHT $BLOCK_HEIGHT $TRUST_HASH

Example output (numbers will vary):

376080 374080 F0C78FD4AE4DB5E76A298206AE3C602FF30668C521D753BB7C435771AEA47189

If the output looks correct, write the state sync settings to config.toml (this also clears the seeds setting):

sed -i.bak -E "s|^(enable[[:space:]]+=[[:space:]]+).*$|\1true| ; \
s|^(rpc_servers[[:space:]]+=[[:space:]]+).*$|\1\"$SNAP_RPC,$SNAP_RPC1\"| ; \
s|^(trust_height[[:space:]]+=[[:space:]]+).*$|\1$BLOCK_HEIGHT| ; \
s|^(trust_hash[[:space:]]+=[[:space:]]+).*$|\1\"$TRUST_HASH\"| ; \
s|^(seeds[[:space:]]+=[[:space:]]+).*$|\1\"\"|" $install_path/.stcd/config/config.toml

Create stcd service

echo "[Unit]
Description=Daodst Chain Node
After=network.target
#
[Service]
User=$USER
Type=simple
ExecStart=$(which stcd) daemon
Restart=on-failure
LimitNOFILE=65535
#
[Install]
WantedBy=multi-user.target" > $HOME/stcd.service; sudo mv $HOME/stcd.service /etc/systemd/system/

sudo systemctl enable stcd.service && sudo systemctl daemon-reload

Run stcd

sudo systemctl start stcd

Check logs

journalctl -u stcd -f

When a node starts up, it will try to find a state sync snapshot in the network, and restore it:

Started node module=main nodeInfo="..."
Discovering snapshots for 20s
Discovered new snapshot height=3000 format=1 hash=0F14A473
Discovered new snapshot height=2000 format=1 hash=C6209AF7
Offering snapshot to ABCI app height=3000 format=1 hash=0F14A473
Snapshot accepted, restoring height=3000 format=1 hash=0F14A473
Fetching snapshot chunk height=3000 format=1 chunk=0 total=3
Fetching snapshot chunk height=3000 format=1 chunk=1 total=3
Fetching snapshot chunk height=3000 format=1 chunk=2 total=3
Applied snapshot chunk height=3000 format=1 chunk=0 total=3
Applied snapshot chunk height=3000 format=1 chunk=1 total=3
Applied snapshot chunk height=3000 format=1 chunk=2 total=3
Verified ABCI app height=3000 appHash=F7D66BC9
Snapshot restored height=3000 format=1 hash=0F14A473
Executed block height=3001 validTxs=16 invalidTxs=0
Committed state height=3001 txs=16 appHash=0FDBB0D5F
Executed block height=3002 validTxs=25 invalidTxs=0
Committed state height=3002 txs=25 appHash=40D12E4B3

The node is now state synced, having joined the network within seconds.

Turn off state synchronization mode

After the node is fully synced, use this command to turn off state sync mode to avoid problems with future node restarts!

sed -i.bak -E "s|^(enable[[:space:]]+=[[:space:]]+).*$|\1false|" $install_path/.stcd/config/config.toml

⚠️ NOTE: The information contained in this document comes from Erik Grinaker, in particular his guide to state synchronization in Tendermint Core and the Cosmos SDK.