State synchronization
Learn about Tendermint Core state synchronization and support provided by the Cosmos SDK.
📣 Tip: Just concerned about how to sync nodes with the network? Skip to this section.
Tendermint Core state synchronization
State synchronization allows new nodes to join the network by taking a snapshot of the most recent state of the network, rather than fetching and replaying all historical blocks. Since the application state is smaller than all blocks combined, and restoring state is faster than replaying blocks, this reduces the time to sync with the network from days to minutes.
This part of the document provides a brief overview of the Tendermint state synchronization protocol, and how nodes are synchronized. For more details, see the ABCI Application Guide and the ABCI Reference Documentation.
State synchronization snapshot
A guiding principle when designing Tendermint state synchronization is to provide as much flexibility as possible for applications. Therefore, Tendermint doesn't care what the snapshots contain, how they were taken or how they were restored. It is only concerned with discovering existing snapshots in the network, fetching them and passing them to the application via ABCI.
Tendermint uses light client verification to check the final application hash of a restored application against the chain's application hash, but any further verification must be done by the application itself during restoration.
Snapshots consist of binary chunks in an arbitrary format. A chunk cannot be larger than 16 MB; otherwise there are no restrictions. Snapshot metadata, exchanged via ABCI and P2P, contains the following fields:
- height (uint64): The height at which the snapshot was taken
- format (uint32): An arbitrary application-specific format identifier (e.g. version)
- chunks (uint32): The number of binary chunks in the snapshot
- hash (bytes): An arbitrary snapshot hash for comparing snapshots across nodes
- metadata (bytes): Arbitrary binary snapshot metadata for use by the application
The format field allows an application to change its snapshot format in a backwards-compatible manner, by providing snapshots in multiple formats and choosing which formats to accept during restore.
This is useful, for example, when changing the serialization or compression format: a node may be able to provide a snapshot to a peer running an older version, or use an old snapshot when starting up with a newer version.
The hash field contains an arbitrary snapshot hash. Snapshots that have identical metadata fields (including hash) across nodes are considered to be the same, and chunks will be fetched from any of those nodes.
The hash cannot be trusted and is not verified by Tendermint itself; it serves only to guard against unintentional non-determinism in snapshot generation.
The application may verify the hash instead.
The metadata field can contain any arbitrary metadata required by the application.
For example, an application may wish to include chunk checksums to discard corrupt chunks, or Merkle proofs to verify each chunk individually against the chain's application hash.
Snapshot metadata messages cannot exceed 4 MB in Protobuf-encoded form.
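The metadata fields and the "same snapshot" rule described above can be sketched in Go as follows. This is a minimal illustration, not the ABCI-generated type: the names `Snapshot`, `SameSnapshot`, `CheckChunkSize` and `maxChunkBytes` are hypothetical, and reading "16 MB" as decimal megabytes is an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// maxChunkBytes is the 16 MB chunk limit from the text, read here as decimal
// megabytes (an assumption of this sketch).
const maxChunkBytes = 16 * 1000 * 1000

// Snapshot mirrors the metadata fields listed above. It is an illustrative
// local type, not the type generated from the ABCI Protobuf definitions.
type Snapshot struct {
	Height   uint64 // height at which the snapshot was taken
	Format   uint32 // application-specific format identifier
	Chunks   uint32 // number of chunks in the snapshot
	Hash     []byte // arbitrary snapshot hash
	Metadata []byte // arbitrary application metadata
}

// SameSnapshot reports whether two advertised snapshots have identical
// metadata fields (including Hash), so chunks may come from either peer.
func SameSnapshot(a, b Snapshot) bool {
	return a.Height == b.Height &&
		a.Format == b.Format &&
		a.Chunks == b.Chunks &&
		bytes.Equal(a.Hash, b.Hash) &&
		bytes.Equal(a.Metadata, b.Metadata)
}

// CheckChunkSize rejects a chunk that exceeds the size limit.
func CheckChunkSize(chunk []byte) error {
	if len(chunk) > maxChunkBytes {
		return errors.New("chunk exceeds the 16 MB limit")
	}
	return nil
}

func main() {
	a := Snapshot{Height: 3000, Format: 1, Chunks: 3, Hash: []byte{0x0f, 0x14}}
	b := Snapshot{Height: 3000, Format: 1, Chunks: 3, Hash: []byte{0x0f, 0x14}}
	fmt.Println(SameSnapshot(a, b))               // true
	fmt.Println(CheckChunkSize(make([]byte, 10))) // <nil>
}
```

The 4 MB limit on the encoded metadata message is not checked here, because it depends on the Protobuf encoding of the real message.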
Take and provide snapshots
To enable state synchronization, some nodes in the network must take and provide snapshots. When a peer attempts a state sync, an existing Tendermint node will call the following ABCI methods on the application to provide snapshot data to the peer:
- ListSnapshots: Returns a list of available snapshots, along with their metadata
- LoadSnapshotChunk: Returns binary chunk data
Snapshots should generally be generated periodically, rather than on-demand: this improves state synchronization performance, since snapshot generation can be slow, and avoids a denial-of-service vector for attackers flooding nodes with such requests.
Older snapshots can usually be deleted, but it may be useful to keep at least the two most recent ones, so that the previous snapshot is not deleted while a node is still restoring from it.
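As a rough illustration of that retention rule, the sketch below picks which snapshot heights can be deleted while keeping the most recent ones. The helper `snapshotsToDelete` and its `keepRecent` parameter are hypothetical names for this example and do not describe how any particular application implements pruning.

```go
package main

import (
	"fmt"
	"sort"
)

// snapshotsToDelete returns the snapshot heights that can be removed while
// keeping the keepRecent most recent snapshots. The input slice is not
// modified. In this sketch a keepRecent of zero or less keeps nothing.
func snapshotsToDelete(heights []uint64, keepRecent int) []uint64 {
	sorted := append([]uint64(nil), heights...)
	// Newest (highest) snapshot first.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	if keepRecent < 0 {
		keepRecent = 0
	}
	if len(sorted) <= keepRecent {
		return nil
	}
	return sorted[keepRecent:]
}

func main() {
	// Keep the two most recent snapshots (4000 and 3000).
	fmt.Println(snapshotsToDelete([]uint64{1000, 3000, 2000, 4000}, 2)) // [2000 1000]
}
```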
It is entirely up to the application to decide how to take a snapshot, but it should strive to meet the following guarantees:
- Asynchronous: Snapshotting should not halt block processing, so it should happen asynchronously, e.g. in a separate thread
- Consistent: Snapshots should be taken at isolated heights and should not be affected by concurrent writes, e.g. due to block processing in the main thread
- Deterministic: For a given height and format, the snapshot chunks and metadata should be identical (at the byte level) across all nodes, to ensure good availability of chunks
For example, this can be achieved as follows:
- Use a data store that supports transactions with snapshot isolation, such as RocksDB or BadgerDB.
- Start a read-only database transaction in the main thread after the block is committed.
- Pass the database transaction handle to the newly spawned thread.
- Iterate over all data items in a deterministic order (for example, sorted by key).
- Serialize the data items (e.g. using Protobuf) and write them to a byte stream.
- Hash the byte stream and split it into fixed-size chunks (say 10 MB).
- Store the chunks as separate files in the file system.
- Write snapshot metadata to database or file, including byte stream hash.
- Close the database transaction and exit the thread.
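The serialization, hashing, and chunking steps above can be sketched as follows. This is only an illustration under stated assumptions: an in-memory map stands in for the read-only database transaction, a simple length-prefixed encoding stands in for Protobuf, SHA-256 is an arbitrary choice of hash, and `buildSnapshot` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// buildSnapshot serializes items in sorted key order, hashes the resulting
// byte stream, and splits it into chunks of at most chunkSize bytes.
func buildSnapshot(items map[string][]byte, chunkSize int) ([][]byte, [32]byte, error) {
	if chunkSize <= 0 {
		return nil, [32]byte{}, errors.New("chunkSize must be positive")
	}
	// Deterministic order: Go map iteration is random, so sort the keys.
	keys := make([]string, 0, len(items))
	for k := range items {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Length-prefixed key/value records (a stand-in for Protobuf).
	var stream bytes.Buffer
	var lenBuf [4]byte
	for _, k := range keys {
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(k)))
		stream.Write(lenBuf[:])
		stream.WriteString(k)
		binary.BigEndian.PutUint32(lenBuf[:], uint32(len(items[k])))
		stream.Write(lenBuf[:])
		stream.Write(items[k])
	}

	data := stream.Bytes()
	hash := sha256.Sum256(data)

	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks, hash, nil
}

func main() {
	items := map[string][]byte{"b": []byte("2"), "a": []byte("1")}
	chunks, hash, err := buildSnapshot(items, 4)
	if err != nil {
		panic(err)
	}
	fmt.Printf("chunks=%d hash=%x\n", len(chunks), hash[:4])
}
```

Because the keys are sorted before serialization, two nodes holding the same data produce the same byte stream and hash, which is the determinism property listed earlier.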
Applications may also want to take additional steps, such as compressing the data, checksumming chunks, generating proofs for incremental verification, and deleting old snapshots.
Restore snapshots
When Tendermint starts, it checks whether the local node has any state (i.e. whether LastBlockHeight == 0), and if not, it begins discovering snapshots over the P2P network.
These snapshots are offered to the local application via the following ABCI calls:
- OfferSnapshot(snapshot, apphash): Offers a discovered snapshot to the application
- ApplySnapshotChunk(index, chunk, sender): Applies a snapshot chunk
The discovered snapshot is provided to the application, which can respond by accepting the snapshot, rejecting the snapshot, rejecting the format, rejecting the sender, aborting the state synchronization, etc.
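To make the possible responses concrete, here is a minimal sketch of how an application might decide on an offered snapshot. The `OfferResult` type, its constants, `supportedFormat`, and `decideOffer` are all hypothetical names for illustration; they only mirror the outcomes named in the text and are not the ABCI types themselves.

```go
package main

import "fmt"

// OfferResult enumerates the outcomes named in the text for an offered
// snapshot. These are local illustrative constants.
type OfferResult int

const (
	OfferAccept       OfferResult = iota // accept and start restoring
	OfferReject                          // reject this snapshot
	OfferRejectFormat                    // reject all snapshots of this format
	OfferRejectSender                    // reject all snapshots from the sender
	OfferAbort                           // abort state sync entirely
)

// supportedFormat is the only snapshot format this example can restore
// (an assumption of the sketch).
const supportedFormat uint32 = 1

// decideOffer applies two simple checks: the format must be one the
// application understands, and the snapshot must not be empty.
func decideOffer(height uint64, format uint32, chunks uint32) OfferResult {
	if format != supportedFormat {
		return OfferRejectFormat
	}
	if height == 0 || chunks == 0 {
		return OfferReject
	}
	return OfferAccept
}

func main() {
	fmt.Println(decideOffer(3000, 1, 3) == OfferAccept)       // true
	fmt.Println(decideOffer(3000, 2, 3) == OfferRejectFormat) // true
}
```

A real application would likely apply further checks of its own before accepting; this sketch leaves the sender and abort outcomes unused.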
Once a snapshot is accepted, Tendermint fetches chunks from available peers and applies them to the application in order. For each chunk, the application can choose to accept it, refetch chunks, reject the snapshot, reject the sender, abort state sync, etc.
After all chunks have been applied, Tendermint calls the Info ABCI method on the application and checks that the application hash and height correspond to the trusted values from the chain.
It will then switch to fast sync for any remaining blocks (if enabled), before finally joining normal consensus operations.
How snapshots are actually restored is entirely up to the application, but it is usually the reverse of how they were generated.
Note, however, that Tendermint only verifies a snapshot after all chunks have been restored, and does not reject any P2P peers on its own.
As long as the trusted hash and application code are correct, it is impossible for an adversary to cause a state-synced node to have incorrect state when joining consensus. It is, however, up to the application to counteract state sync denial of service (e.g. by implementing incremental verification and rejecting invalid peers).
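One possible form of incremental verification is to compare each received chunk with a per-chunk checksum that the application stored in the snapshot metadata field. The sketch below assumes SHA-256 checksums and uses hypothetical names (`ChunkResult`, `checksumsFor`, `verifyChunk`); it illustrates the idea only and is not the ABCI API.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// ChunkResult enumerates a few of the outcomes named in the text for an
// applied chunk. These are local illustrative constants.
type ChunkResult int

const (
	ChunkAccept         ChunkResult = iota // chunk is fine
	ChunkRetry                             // refetch this chunk
	ChunkRejectSnapshot                    // give up on this snapshot
)

// checksumsFor computes one SHA-256 checksum per chunk, as the snapshotting
// node would when writing the metadata.
func checksumsFor(chunks [][]byte) [][32]byte {
	sums := make([][32]byte, len(chunks))
	for i, c := range chunks {
		sums[i] = sha256.Sum256(c)
	}
	return sums
}

// verifyChunk checks a received chunk against the expected checksum. The
// second return value says whether the sending peer should be rejected.
func verifyChunk(checksums [][32]byte, index uint32, chunk []byte) (ChunkResult, bool) {
	if int(index) >= len(checksums) {
		// The index does not exist in this snapshot.
		return ChunkRejectSnapshot, false
	}
	if sha256.Sum256(chunk) != checksums[index] {
		// Corrupt chunk: refetch it and stop trusting the sender.
		return ChunkRetry, true
	}
	return ChunkAccept, false
}

func main() {
	chunks := [][]byte{[]byte("chunk-0"), []byte("chunk-1")}
	sums := checksumsFor(chunks)

	result, reject := verifyChunk(sums, 1, chunks[1])
	fmt.Println(result == ChunkAccept, reject) // true false

	result, reject = verifyChunk(sums, 1, []byte("corrupt"))
	fmt.Println(result == ChunkRetry, reject) // true true
}
```

Checksums of this kind only catch corrupt or mismatched chunks early. The final application hash check performed by Tendermint remains the authoritative verification.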
Note that state-synced nodes will have a truncated block history starting at the height of the restored snapshot; there is currently no backfill of block data. Networks should consider the broader implications of this, and may want to ensure that at least a few archive nodes retain the complete block history for auditability and backup purposes.
Cosmos SDK state synchronization
Cosmos SDK v0.40+ includes automatic support for state sync, so application developers only need to enable it. They do not need to implement the state sync protocol described in the Tendermint section above themselves.
State synchronization snapshot
Tendermint Core handles most of the heavy lifting of discovering, exchanging, and verifying state data for state sync, but applications must periodically take snapshots of their state, provide these snapshots to Tendermint via ABCI calls, and be able to restore them when a new node is syncing.
The Cosmos SDK stores application state in a data store called IAVL, and each module can set up its own IAVL stores. At a fixed, configurable height interval, the Cosmos SDK exports the contents of each store at that height, Protobuf-encodes and compresses it, and saves it to a snapshot store on the local file system. Since IAVL retains historical versions of the data, these snapshots can be generated concurrently with the execution of new blocks.
Tendermint will fetch these snapshots via ABCI when a new node does a state sync.
Note that only IAVL stores managed by the Cosmos SDK can be snapshotted. If an app stores additional data in an external data store, there is currently no mechanism to include it in state sync snapshots, so the app cannot use automatic state sync via the SDK.
However, it is free to implement the state sync protocol itself, as described in the ABCI documentation.
When a new node is state synced, Tendermint fetches a snapshot from peers in the network and provides it to the local (empty) application, which imports it into its IAVL stores.
Tendermint then verifies the application's app hash against the main blockchain using light client verification, and proceeds to execute blocks as usual.
Note that a state sync node will only restore the application state at the snapshot height and will not include historical data or historical blocks.
Enable state sync snapshot
To enable state sync snapshots, an application using the Cosmos SDK BaseApp needs to set up a snapshot store (with a database and a filesystem directory) and configure the snapshot interval and the number of historical snapshots to keep. A minimal example is as follows:
// Directory that holds the snapshot chunks and the snapshot metadata database.
snapshotDir := filepath.Join(
	cast.ToString(appOpts.Get(flags.FlagHome)), "data", "snapshots")
snapshotDB, err := sdk.NewLevelDB("metadata", snapshotDir)
if err != nil {
	panic(err)
}
snapshotStore, err := snapshots.NewStore(snapshotDB, snapshotDir)
if err != nil {
	panic(err)
}
// Pass the snapshot store, interval and retention settings to BaseApp.
app := baseapp.NewBaseApp(
	"app", logger, db, txDecoder,
	baseapp.SetSnapshotStore(snapshotStore),
	baseapp.SetSnapshotInterval(cast.ToUint64(appOpts.Get(
		server.FlagStateSyncSnapshotInterval))),
	baseapp.SetSnapshotKeepRecent(cast.ToUint32(appOpts.Get(
		server.FlagStateSyncSnapshotKeepRecent))),
)
When the application is started with the appropriate flags (e.g. --state-sync.snapshot-interval 1000 --state-sync.snapshot-keep-recent 2), it should take snapshots and output log messages such as:
Creating state snapshot module=main height=3000
Completed state snapshot module=main height=3000 format=1
Note that the snapshot interval must currently be a multiple of pruning-keep-every (default 100), to prevent the snapshot height from being pruned while the snapshot is being taken. It is also usually a good idea to keep at least the 2 most recent snapshots, so that the previous snapshot is not deleted while a node is trying to use it for state sync.
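The relationship between the two settings can be checked with a few lines of Go. This is a sketch only: `validateSnapshotInterval` and `shouldSnapshot` are hypothetical helpers, and treating a value of 0 as "nothing to check" is an assumption made for this example.

```go
package main

import (
	"fmt"
)

// validateSnapshotInterval checks that the snapshot interval is a multiple
// of pruning-keep-every. A zero value for either setting is treated as
// "nothing to check" (an assumption of this sketch).
func validateSnapshotInterval(snapshotInterval, pruningKeepEvery uint64) error {
	if snapshotInterval == 0 || pruningKeepEvery == 0 {
		return nil
	}
	if snapshotInterval%pruningKeepEvery != 0 {
		return fmt.Errorf("snapshot interval %d is not a multiple of pruning-keep-every %d",
			snapshotInterval, pruningKeepEvery)
	}
	return nil
}

// shouldSnapshot reports whether a snapshot is due at the given height for
// a fixed height interval.
func shouldSnapshot(height, interval uint64) bool {
	return interval > 0 && height%interval == 0
}

func main() {
	fmt.Println(validateSnapshotInterval(1000, 100)) // <nil>
	fmt.Println(validateSnapshotInterval(1050, 100) != nil) // true
	fmt.Println(shouldSnapshot(3000, 1000))          // true
}
```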
State synchronization node
📣 Tip: Looking for a snapshot or archive node to sync your nodes? Check out [this page].
Once a few nodes in the network have taken state sync snapshots, new nodes can join the network using state sync. To do this, the node should first be configured as usual, and the following information must be obtained for light client verification:
- At least two available RPC servers
- A trusted height
- The block ID hash at the trusted height
The trusted hash must be obtained from a trusted source (such as a block explorer), but the RPC servers do not need to be trusted. Tendermint uses the hash to obtain trusted application hashes from the blockchain in order to verify restored application snapshots. The application hash and the corresponding height are the only information that can be trusted when restoring a snapshot. Everything else can be forged by an adversary.
This guide uses Ubuntu 20.04.
Prepare the system
Update the system
sudo apt update -y
Upgrade the system
sudo apt upgrade -y
Install dependencies
sudo apt-get install ca-certificates curl gnupg lsb-release make gcc git jq wget -y
Install Go
wget -q -O - https://raw.githubusercontent.com/canha/golang-tools-install-script/master/goinstall.sh | bash
source ~/.bashrc
Set the node name
moniker="NODE_NAME"
Use the following commands for mainnet settings
SNAP_RPC1="http://xxx1:26657"
SNAP_RPC="http://xxx:26657"
CHAIN_ID="daodst_7777-1"
PEER="96557e26aabf3b23e8ff5282d03196892a7776fc@xxx,dec587d55ff38827ebc6312cedda6085c59683b6@xxx"
wget -O $HOME/genesis.json https://raw.githubusercontent.com/daodst/mainnet/genesis.json
Install stcd
git clone https://github.com/daodst/blockchain.git && \
cd blockchain/cmd/stcd
go build
Configuration
Initialize the node
stcd init $moniker --chain-id $CHAIN_ID
📣 Tip: $install_path indicates the path where you installed the stcd binary.
Move the genesis file to the $install_path/.stcd/config folder
mv $HOME/genesis.json $install_path/.stcd/config/
Reset the node
stcd tendermint unsafe-reset-all --home $install_path/.stcd
Change config file (set node name, add persistent peer, set indexer="null")
sed -i -e "s%^moniker *=.*%moniker = \"$moniker\"%; " $install_path/.stcd/config/config.toml
sed -i -e "s%^indexer *=.*%indexer = \"null\"%; " $install_path/.stcd/config/config.toml
sed -i -e "s%^persistent_peers *=.*%persistent_peers = \"$PEER\"%; " $install_path/.stcd/config/config.toml
Set the variables needed to start from a snapshot
LATEST_HEIGHT=$(curl -s $SNAP_RPC/block | jq -r .result.block.header.height); \
BLOCK_HEIGHT=$((LATEST_HEIGHT - 2000)); \
TRUST_HASH=$(curl -s "$SNAP_RPC/block?height=$BLOCK_HEIGHT" | jq -r .result.block_id.hash)
Check the values
echo $LATEST_HEIGHT $BLOCK_HEIGHT $TRUST_HASH
Example output (numbers will vary):
376080 374080 F0C78FD4AE4DB5E76A298206AE3C602FF30668C521D753BB7C435771AEA47189
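For readers who prefer not to rely on curl and jq, the same values can be derived in Go. The sketch below parses a `/block` response using the JSON paths from the commands above (`result.block.header.height` and `result.block_id.hash`) and subtracts the 2000-block offset. It assumes the height is encoded as a JSON string; `parseBlock`, `trustHeight`, and the sample body are hypothetical and only illustrate the arithmetic, with no network access involved.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// blockResponse models only the fields of the RPC /block response that the
// shell commands above read with jq.
type blockResponse struct {
	Result struct {
		BlockID struct {
			Hash string `json:"hash"`
		} `json:"block_id"`
		Block struct {
			Header struct {
				Height string `json:"height"` // assumed to be a JSON string
			} `json:"header"`
		} `json:"block"`
	} `json:"result"`
}

// parseBlock extracts the block height and block ID hash from a response body.
func parseBlock(body []byte) (int64, string, error) {
	var resp blockResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return 0, "", err
	}
	height, err := strconv.ParseInt(resp.Result.Block.Header.Height, 10, 64)
	if err != nil {
		return 0, "", fmt.Errorf("parse height: %w", err)
	}
	return height, resp.Result.BlockID.Hash, nil
}

// trustHeight goes back offset blocks from the latest height.
func trustHeight(latest, offset int64) (int64, error) {
	if offset < 0 || latest-offset < 1 {
		return 0, fmt.Errorf("cannot go back %d blocks from height %d", offset, latest)
	}
	return latest - offset, nil
}

func main() {
	// Sample body for illustration only; the hash is a placeholder.
	body := []byte(`{"result":{"block_id":{"hash":"ABCD"},"block":{"header":{"height":"376080"}}}}`)
	latest, _, err := parseBlock(body)
	if err != nil {
		panic(err)
	}
	trusted, err := trustHeight(latest, 2000)
	if err != nil {
		panic(err)
	}
	fmt.Println(latest, trusted) // 376080 374080
}
```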
If the output looks correct, continue with the next step and update the state sync settings:
sed -i.bak -E "s|^(enable[[:space:]]+=[[:space:]]+).*$|\1true| ; \
s|^(rpc_servers[[:space:]]+=[[:space:]]+).*$|\1\"$SNAP_RPC,$SNAP_RPC1\"| ; \
s|^(trust_height[[:space:]]+=[[:space:]]+).*$|\1$BLOCK_HEIGHT| ; \
s|^(trust_hash[[:space:]]+=[[:space:]]+).*$|\1\"$TRUST_HASH\"| ; \
s|^(seeds[[:space:]]+=[[:space:]]+).*$|\1\"\"|" $install_path/.stcd/config/config.toml
Create stcd service
echo "[Unit]
Description=Daodst Chain Node
After=network.target
#
[Service]
User=$USER
Type=simple
ExecStart=$(which stcd) daemon
Restart=on-failure
LimitNOFILE=65535
#
[Install]
WantedBy=multi-user.target" > $HOME/stcd.service; sudo mv $HOME/stcd.service /etc/systemd/system/
sudo systemctl daemon-reload && sudo systemctl enable stcd.service
Run stcd
sudo systemctl start stcd
Check logs
journalctl -u stcd -f
When a node starts up, it will try to find a state sync snapshot in the network, and restore it:
Started node module=main nodeInfo="..."
Discovering snapshots for 20s
Discovered new snapshot height=3000 format=1 hash=0F14A473
Discovered new snapshot height=2000 format=1 hash=C6209AF7
Offering snapshot to ABCI app height=3000 format=1 hash=0F14A473
Snapshot accepted, restoring height=3000 format=1 hash=0F14A473
Fetching snapshot chunk height=3000 format=1 chunk=0 total=3
Fetching snapshot chunk height=3000 format=1 chunk=1 total=3
Fetching snapshot chunk height=3000 format=1 chunk=2 total=3
Applied snapshot chunk height=3000 format=1 chunk=0 total=3
Applied snapshot chunk height=3000 format=1 chunk=1 total=3
Applied snapshot chunk height=3000 format=1 chunk=2 total=3
Verified ABCI app height=3000 appHash=F7D66BC9
Snapshot restored height=3000 format=1 hash=0F14A473
Executed block height=3001 validTxs=16 invalidTxs=0
Committed state height=3001 txs=16 appHash=0FDBB0D5F
Executed block height=3002 validTxs=25 invalidTxs=0
Committed state height=3002 txs=25 appHash=40D12E4B3
The node is now state synced, having joined the network in seconds.
Turn off state synchronization mode
After the node is fully synced, turn off state sync mode with the following command to avoid problems on future node restarts:
sed -i.bak -E "s|^(enable[[:space:]]+=[[:space:]]+).*$|\1false|" $install_path/.stcd/config/config.toml
⚠️ NOTE: The information in this document is taken from Erik Grinaker, in particular his state sync guide for Tendermint Core and the Cosmos SDK.