ADR 0002: Config Store Consensus HA
Status
Accepted
Date
2026-06-08
Context
Single-node SQLite persistence is not acceptable for carrier HA configuration claims. The SDK needed a production HA configuration persistence path with leader fencing, majority commit, restart recovery, and authenticated transport. It also needed to make clear that standalone SQLite remains appropriate only for development, lab, and conformance use, or for an explicitly accepted edge/single-replica profile.
Decision
High-availability configuration persistence is provided by ConsensusConfigStore.
The consensus backend uses:
- Durable cluster membership and node identity checks.
- Leader election, current-term no-op gating, and majority write commitment.
- Linearizable read verification instead of follower-local reads.
- Authenticated mTLS/SPIFFE transport using shared identity/TLS substrates.
- Controlled TCP server lifecycle with bounded concurrency, read timeouts, and explicit shutdown.
- Snapshot persistence and HMAC verification.
- Non-voter catch-up and promotion guards for membership changes.
- Metrics and chaos/failover tests for partitions, restarts, rejoins, and stale-leader behavior.
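Current-term no-op gating and majority write commitment interact through the Raft-style commit rule: a leader treats an entry as committed only when a majority of voters store it and it belongs to the leader's own term. The sketch below illustrates that rule under stated assumptions. LeaderView and next_commit_index are hypothetical names for this example, not the ConsensusConfigStore API, and the real backend may structure this logic differently.

```rust
// Hypothetical sketch: not the ConsensusConfigStore implementation.
// Illustrates the Raft-style commit rule behind "current-term no-op gating".

/// Hypothetical leader-side view used only for this illustration.
pub struct LeaderView {
    /// Term the leader was elected in.
    pub current_term: u64,
    /// Term of each log entry; `log_terms[i]` is the entry at log index `i + 1`.
    pub log_terms: Vec<u64>,
    /// Highest log index known to be stored on each voter, leader included.
    pub match_index: Vec<u64>,
}

impl LeaderView {
    /// Highest log index stored on at least a majority of voters.
    fn majority_match(&self) -> u64 {
        if self.match_index.is_empty() {
            return 0;
        }
        let mut sorted = self.match_index.clone();
        sorted.sort_unstable_by(|a, b| b.cmp(a)); // descending order
        let quorum = sorted.len() / 2 + 1;
        // The quorum-th largest value is held by at least `quorum` voters.
        sorted[quorum - 1]
    }

    /// Returns the new commit index. A majority-replicated entry is committed
    /// only if it was written in the leader's current term; earlier-term entries
    /// become committed indirectly once a current-term entry (such as the
    /// post-election no-op) commits after them.
    pub fn next_commit_index(&self, commit_index: u64) -> u64 {
        let candidate = self.majority_match();
        if candidate <= commit_index {
            return commit_index;
        }
        // Log terms never decrease, so if the highest majority-replicated entry
        // is from an older term, no lower uncommitted entry is from the current term.
        match self.log_terms.get((candidate - 1) as usize) {
            Some(&term) if term == self.current_term => candidate,
            _ => commit_index,
        }
    }
}
```

With five voters in term 3, an entry from term 2 that already sits on three voters is not committed by counting alone. Once the leader's term-3 no-op reaches a majority, the no-op and every entry before it commit together, which is why a newly elected leader gates client writes on its no-op.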
Consequences
Config HA is a quorum system, not a property of SQLite. Any production claim must use the consensus backend or an equivalent adapter that satisfies the same contract.
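Because availability depends on reaching a majority of voters, the arithmetic behind that statement is worth spelling out. The helpers below are hypothetical illustrations, not SDK functions; they assume a cluster of equally weighted voters and ignore non-voters, which do not count toward quorum.

```rust
// Hypothetical helpers illustrating quorum arithmetic; not SDK APIs.

/// Number of voters that must acknowledge a write (strict majority).
pub fn quorum_size(voters: usize) -> usize {
    voters / 2 + 1
}

/// Voter failures the cluster can absorb while a majority remains reachable.
pub fn tolerated_failures(voters: usize) -> usize {
    voters.saturating_sub(quorum_size(voters))
}

/// Whether a write can commit when only `reachable` voters respond.
pub fn can_commit(voters: usize, reachable: usize) -> bool {
    reachable >= quorum_size(voters)
}
```

A standalone SQLite replica is the voters = 1 case: its quorum is itself, so it tolerates zero failures. Growing from three to four voters raises the quorum from two to three without raising fault tolerance above one, which is why odd voter counts are the usual choice.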
The SDK accepts additional operational complexity so that correctness is explicit: membership, certificates, node identity, quorum availability, and recovery state all become deployment responsibilities.
Evidence
- crates/opc-persist/src/consensus.rs
- crates/opc-persist/tests/consensus_tests.rs
- crates/opc-persist/tests/tcp_consensus_tests.rs
- docs/ha-design.md
- docs/consensus-operator-runbook.md