High Availability
ArcadeDB supports a High Availability (HA) mode where multiple servers share the same databases via Raft-based replication. All servers of a cluster serve the same set of databases.
| Starting with v26.4.1, HA is powered by Apache Ratis, a production-grade implementation of the Raft consensus protocol. The old custom replication protocol has been removed. For the underlying concepts, see High Availability Concepts. |
Quick Start
To start an ArcadeDB server with HA enabled, at minimum you need to:
-
Set
arcadedb.ha.enabledtotrue. -
Define the list of peers in the cluster via
arcadedb.ha.serverList(if you are deploying on Kubernetes, see Kubernetes). The value is a comma-separated list of entries in the format[name@]host:raftPort:httpPort[:priority]. The optionalname@prefix is described in Named Peers. -
Optionally set the local server name via
arcadedb.server.name. Each node must have a unique name. If not specified, the default isArcadeDB_0.
Example starting a 3-node cluster on a single host (different ports):
$ bin/server.sh -Darcadedb.ha.enabled=true \
-Darcadedb.server.name=node1 \
-Darcadedb.ha.serverList=localhost:2434:2480,localhost:2435:2481,localhost:2436:2482
The Raft gRPC port is 2434 by default and is configurable via arcadedb.ha.raftPort.
The HTTP port in each entry is required so that replicas can forward non-idempotent requests to the leader over HTTP.
The cluster name is arcadedb by default; set arcadedb.ha.clusterName=<name> to run multiple independent clusters on the same network.
|
Named Peers (from v26.5.1)
By default, each node identifies its slot in arcadedb.ha.serverList from the numeric suffix of arcadedb.server.name (for example, ArcadeDB_0 maps to the first entry, ArcadeDB_1 to the second, and so on).
Display names shown in logs and Studio are then synthesized from the local node’s prefix.
This positional convention works well for Kubernetes StatefulSets but is awkward when nodes have human-readable names such as frankfurt, london, nyc.
You can add an optional name@ prefix to any entry in arcadedb.ha.serverList to give each peer a stable, human-readable name:
$ bin/server.sh -Darcadedb.ha.enabled=true \
-Darcadedb.server.name=frankfurt \
[email protected]:2434:2480,[email protected]:2434:2480,[email protected]:2434:2480
When peer names are configured, each node identifies its slot by matching arcadedb.server.name against the configured peer names first; if no match is found, it falls back to the legacy prefix_N / prefix-N suffix resolution.
This means:
-
Server names no longer require a numeric suffix when peer names are used.
-
Display names shown in logs and Studio reflect the configured peer name (e.g.
frankfurt (10.0.0.1:2480)instead ofArcadeDB_0 (10.0.0.1:2480)). -
Mixed clusters work: entries with
name@get their explicit name; entries without it fall back to the existing positional synthesis. -
Peer names must be unique within the cluster.
The Raft peer ID is still derived from the address (e.g. 10.0.0.1_2434), so changing or omitting peer names does not affect Raft identity stability.
Architecture
ArcadeDB uses a leader/replica model with Raft consensus. At any time one server holds leadership and accepts writes; the others are replicas that serve reads and stand by for failover.
Each server persists its own Raft log segments under <rootPath>/raft-storage/.
The log is used for recovery after restart and to replicate state to peers that fall behind.
Any read (query) can execute on any server in the cluster. All writes must go through the leader: a replica that receives a write transparently forwards it to the current leader via HTTP.
Internally, every committed write follows a replicate-first, commit-after three-phase commit:
-
Phase 1 (read lock): capture the WAL pages produced by the transaction.
-
Phase 2 (no lock): append the payload to the Raft log and wait for quorum acknowledgment.
-
Phase 3 (read lock): apply the pages to the local database and return to the client.
If Phase 2 times out or fails, Phase 3 never runs: no local writes, no divergence.
Multiple concurrent transactions are batched into fewer Raft round-trips via group commit (HA_RAFT_GROUP_COMMIT_BATCH_SIZE).
Write Quorum
A quorum is the number of peers that must acknowledge a write before it commits.
Configure it via arcadedb.ha.quorum:
-
majority(default) — a majority of peers must acknowledge (standard Raft). -
all— every configured peer must acknowledge.
If the configured quorum is not met within arcadedb.ha.quorumTimeout milliseconds, the transaction is rolled back and an error is returned to the client.
|
Starting with v26.4.1, the legacy quorum values |
Read Consistency (from v26.4.1)
When a read runs on a replica, ArcadeDB offers three consistency levels, configurable per-server via arcadedb.ha.readConsistency:
| Level | Behavior |
|---|---|
|
Read locally without waiting. Fastest, but may return data that was committed on the leader after the replica’s last apply. |
|
The replica waits until the Raft log index corresponding to the client’s most recent write has been applied locally before serving the read. |
|
The replica issues a Raft |
Clients communicate the consistency contract through three HTTP headers:
| Header | Direction | Purpose |
|---|---|---|
|
request |
Overrides the server default for this request. Accepts |
|
request |
Client-supplied bookmark: the Raft commit index the replica must have applied before serving the read. Used to implement read-your-writes across connections. |
|
response |
Current last-applied commit index, echoed on every response. Clients capture this value and send it back as |
Older clients that sent X-ArcadeDB-Commit-Index on requests are still accepted as a backward-compatible fallback.
|
Load Shedding and Backpressure (from v26.4.1)
The Raft group-commit queue is bounded (arcadedb.ha.groupCommitQueueSize, default 10000 pending transactions).
When a write arrives but the queue is full, the server waits up to arcadedb.ha.groupCommitOfferTimeout ms (default 100) for a slot, and if still full throws ReplicationQueueFullException — a NeedRetryException that clients automatically retry with backoff.
This protects the server from OOM under sustained write overload and lets clients back off gracefully.
Witness / Read-Scale Nodes (from v26.4.1)
Set arcadedb.ha.serverRole=replica to pin a node as a permanent replica.
The node’s Raft priority is set to 0 at startup so it is never elected leader, while still receiving all writes and serving reads.
Useful for read-scale deployments or for cross-DC witnesses that should not take over as leader.
Automatic Failover
If the leader becomes unreachable, replicas start a new Raft election. A replica with an up-to-date log is elected as the new leader and the cluster resumes serving writes. Pre-vote prevents partitioned nodes from triggering disruptive elections.
Common causes of leader unavailability include:
-
The ArcadeDB server process has been terminated.
-
The physical or virtual host has been shut down or rebooted.
-
Network issues prevent the leader from reaching a majority of peers.
When a replica rejoins after being offline, it catches up via the Raft log automatically; if it has fallen behind past the log purge boundary, the leader streams a full database snapshot over HTTP and the replica installs it atomically.
Offline Cluster Bootstrap (from v26.5.1)
By default, when an HA cluster forms for the first time and one or more peers already hold a non-empty database on local disk, the peers do not download the database over HTTP from a single source.
Instead, every peer reports a (fingerprint, lastTxId) tuple per database, the cluster elects the peer with the highest lastTxId as the source via leadership transfer, and any peer whose fingerprint matches the source bootstraps locally in seconds with zero bytes transferred.
This is the recommended path for scaling a single-instance ArcadeDB deployment up to a multi-node HA cluster after a large dataset has already been imported. Operator workflow:
-
Import the dataset into a single ArcadeDB instance with HA disabled.
-
Tar the database directory, or take a regular full backup.
-
Distribute the archive out-of-band to every peer’s filesystem (init container from S3, NFS read-only mount, baked image layer,
kubectl cp). -
Start every peer with HA enabled (
arcadedb.ha.enabled=true) and the usualarcadedb.ha.serverList. No special flag is required; the bootstrap path is on by default viaarcadedb.ha.bootstrapFromLocalDatabase=true. -
The peers form the Raft group with everyone already at the same state.
Time-to-cluster goes from "minutes-to-hours of HTTP snapshot transfer per replica" to "seconds, because everyone already has the bytes". For the Kubernetes recipe, see Pre-staging the database on every pod.
The bootstrap protocol handles the following cases automatically:
-
Identical pre-staged databases: every peer’s fingerprint matches the elected source, all peers bootstrap locally with zero bytes transferred.
-
Different-age peers: the peer with the highest
lastTxIdis elected as the source. Older peers reinstall the leader-shipped full snapshot; subsequent transactions are picked up by native Raft replication. -
Late newer joiner: a peer with a strictly newer
lastTxIdthan the cluster’s chosen baseline refuses to start and logs a SEVERE with the recovery procedure. This prevents a misconfigured rolling deploy from silently overwriting newer data with older data. -
Cluster restart: the bootstrap path is gated on every peer’s Raft log being empty. After the cluster has committed at least one Raft entry, restarts use the regular Raft log replay or leader-shipped snapshot path; the bootstrap path does not engage, and stale local data on a single pod cannot cause divergence.
The committed bootstrap baseline (fingerprint and lastTxId per database) is visible in the response to GET /api/v1/cluster and in the Studio cluster dashboard.
Cluster Management REST API (from v26.4.1)
Cluster membership and leadership can be changed at runtime through dedicated REST endpoints.
All endpoints require authentication as the root user.
| Method | Path | Description |
|---|---|---|
GET |
|
Return the current cluster status: current leader, peers and their roles, current term, commit and applied indices, and per-follower replication lag. |
POST |
|
Add a peer to the running cluster. Body: |
DELETE |
|
Remove a peer from the cluster. |
POST |
|
Transfer leadership. Body: |
POST |
|
Make the current leader step down. The cluster elects a new leader. |
POST |
|
Gracefully remove this server from the cluster. If the local server is the leader, leadership is transferred first. This endpoint is used by the Kubernetes |
POST |
|
Compare component file checksums for the given database across all peers. Useful to confirm that all nodes have converged. |
GET |
|
Leader-only: stream a ZIP of the database for follower catch-up. Requires the cluster token and is used internally by snapshot recovery. |
Example: add a new peer at runtime.
$ curl -u root:<password> \
-X POST http://leader:2480/api/v1/cluster/peer \
-H 'Content-Type: application/json' \
-d '{"peerId":"node4","address":"10.0.0.4:2434:2480"}'
Adding a peer with a human-readable name (from v26.5.1):
$ curl -u root:<password> \
-X POST http://leader:2480/api/v1/cluster/peer \
-H 'Content-Type: application/json' \
-d '{"peerId":"10.0.0.4_2434","address":"10.0.0.4:2434:2480","name":"frankfurt"}'
Studio Cluster Dashboard (from v26.4.1)
ArcadeDB Studio exposes a Cluster tab (visible when HA is enabled) that displays the current leader, peer roles, Raft term and commit index, per-follower replication lag, and provides buttons for leadership transfer, peer add/remove, and database verification. See Studio.
Security
- Cluster token
-
Inter-node HTTP forwarding (replica → leader proxy, snapshot downloads) is authenticated with a shared cluster token sent as the
X-ArcadeDB-Cluster-TokenHTTP header. Ifarcadedb.ha.clusterTokenis not set explicitly, the token is auto-generated on first startup and persisted underraft-storage/; all subsequent nodes must start with the same token (or withHA_CLUSTER_TOKENunset so they can read it from the same storage). Set it explicitly for production deployments to coordinate the token across nodes. - Peer allowlist
-
Inbound Raft gRPC connections are filtered against the DNS-resolved hosts in
arcadedb.ha.serverList; connections from other addresses are rejected. This closes the "any host that knows the port can inject log entries" vector. It is not a substitute for mTLS: deploy HA behind a private network, NetworkPolicy, or service mesh on untrusted networks.
Kubernetes
For Kubernetes deployments using the official Helm chart — including the StatefulSet + headless Service pattern, auto-join on scale-up, and the preStop auto-leave hook — see High Availability on Kubernetes in the Kubernetes guide.
Troubleshooting
Performance: insertion is slow
ArcadeDB uses an optimistic concurrency model: if two threads try to update the same page, the first wins, the second throws a ConcurrentModificationException and the client retries (configurable number of times).
In HA mode the retry window is wider because file locks are held during the Raft round-trip.
If you are inserting many records in parallel, allocate one bucket per thread to eliminate contention.
Example for the vertex type User:
ALTER TYPE User BucketSelectionStrategy `thread`
With enough buckets, parallel insertions avoid page contention entirely and do not hit the retry path.
HA Settings
The following settings control HA behavior. A complete list of all HA parameters is available in Server Settings.
| Setting | Description | Default Value |
|---|---|---|
|
Enables HA for this server |
false |
|
Cluster name. Useful when running multiple clusters in the same network |
arcadedb |
|
Comma-separated list of peers in the format |
(empty) |
|
Default Raft gRPC port used when an entry in |
2434 |
|
Write quorum: |
majority |
|
Timeout in ms waiting for the quorum acknowledgment |
10000 |
|
Default read consistency for follower reads: |
read_your_writes |
|
Minimum and maximum Raft election timeout in ms. Increase for WAN clusters |
2000 / 5000 |
|
Node role: |
any |
|
Number of Raft log entries after which the leader takes a snapshot |
100000 |
|
When |
true |
|
Maximum time in ms the bootstrap leader waits for every peer in |
120000 |
|
Bounded queue size and offer timeout (ms) for Raft group-commit backpressure |
10000 / 100 |
|
If true, the Raft storage directory is preserved across restarts, enabling rejoin without a full snapshot resync |
false |
|
If true, the JVM exits after exhausting step-down retries on a phase-2 replication failure. Default false keeps the server up and logs CRITICAL |
false |
|
Shared secret for inter-node authentication. If empty, auto-generated at first startup and persisted under |
(empty) |
|
Verbose HA logging: 0=off, 1=basic, 2=detailed, 3=trace |
0 |
|
The server is running inside Kubernetes (enables auto-join) |
false |
|
DNS suffix used to reach the other servers in Kubernetes, e.g. |
(empty) |