Schema
ArcadeDB supports schema-less, schema-full and hybrid operation. A type (document, vertex or edge) must be declared in a database before records can be inserted into it; everything else — properties, indexes, additional buckets, triggers, materialized views — is layered on top of that type. For declared properties, all writes are validated against the property’s type and constraints, while undeclared properties are accepted as long as no declared constraint is violated.
This section introduces the building blocks of an ArcadeDB schema: Types, Properties, Buckets, Indexes, Triggers, and Materialized Views. For the low-level on-disk layout (pages, record IDs, bucket files, page versioning), see Storage Internals.
Types
A type is the named "shape" of a record. ArcadeDB has three flavours, all backed by the same storage engine:
- Document type — schemaless or schema-full documents (JSON-like records), the analog of a table.
- Vertex type — documents that participate in the graph as nodes; they carry edges-in / edges-out adjacency lists.
- Edge type — documents that connect two vertices, bidirectional by default and traversable in both directions.
Types support single inheritance via the EXTENDS keyword and are queryable polymorphically (a query on a parent type sees all subtypes' records). See Inheritance for the details. The SQL command surface is CREATE TYPE, ALTER TYPE, and DROP TYPE.
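Polymorphic queries can be pictured with a small model. Each type records at most one parent, and a query on a type matches every record whose type reaches it by walking up that chain. This is a sketch under assumed names: the PARENT map, is_subtype, select_from and the type names are hypothetical, and the code does not reflect how ArcadeDB resolves types internally.

```python
# Hypothetical schema: each type has at most one parent (single inheritance).
PARENT = {"Vehicle": None, "Car": "Vehicle", "Truck": "Vehicle", "Person": None}

def is_subtype(type_name, ancestor: str) -> bool:
    """Walk the EXTENDS chain upwards until the ancestor or the root is reached."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = PARENT[type_name]
    return False

records = [
    {"type": "Car", "plate": "AB123"},
    {"type": "Truck", "plate": "XY999"},
    {"type": "Person", "name": "Ada"},
]

def select_from(type_name: str) -> list:
    """Polymorphic scan: the queried type plus all of its subtypes."""
    return [r for r in records if is_subtype(r["type"], type_name)]

print([r["plate"] for r in select_from("Vehicle")])  # ['AB123', 'XY999']
print(len(select_from("Car")))                       # 1
```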
Properties
A property is a declared field on a type. Declaring a property unlocks per-field validation (data type, NOT NULL, regex, min/max, default), polymorphic queries against the field, and — since v26.5.1 — the EXTERNAL storage option for heavy values (see External Property Storage below).
Properties are managed via CREATE PROPERTY, ALTER PROPERTY, and DROP PROPERTY.
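To show what declaring a property buys, the sketch below validates declared fields (type, NOT NULL, regex, numeric min/max, default) and passes undeclared fields through untouched, mirroring the hybrid behaviour described at the top of this section. PropertyDef and validate are hypothetical names. The exact rules ArcadeDB applies (for example, whether bounds are inclusive, or how min/max treat strings) are assumptions here.

```python
import re
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class PropertyDef:
    """Hypothetical declared-property constraints (not ArcadeDB's API)."""
    py_type: type
    not_null: bool = False
    regexp: Optional[str] = None
    min: Optional[float] = None   # assumed inclusive, numeric values only
    max: Optional[float] = None   # assumed inclusive, numeric values only
    default: Any = None

def validate(record: dict, schema: dict) -> dict:
    """Check declared properties; undeclared ones pass through unchanged."""
    out = dict(record)
    for name, prop in schema.items():
        if out.get(name) is None and prop.default is not None:
            out[name] = prop.default                       # apply the default
        value = out.get(name)
        if value is None:
            if prop.not_null:
                raise ValueError(f"{name}: NOT NULL violated")
            continue
        if not isinstance(value, prop.py_type):
            raise TypeError(f"{name}: expected {prop.py_type.__name__}")
        if prop.regexp is not None and not re.fullmatch(prop.regexp, value):
            raise ValueError(f"{name}: does not match {prop.regexp}")
        if prop.min is not None and value < prop.min:
            raise ValueError(f"{name}: below min {prop.min}")
        if prop.max is not None and value > prop.max:
            raise ValueError(f"{name}: above max {prop.max}")
    return out

person = {
    "name": PropertyDef(str, not_null=True, regexp=r"[A-Z][a-z]+"),
    "age": PropertyDef(int, min=0, max=150),
    "active": PropertyDef(bool, default=True),
}
print(validate({"name": "Ada", "age": 36, "nickname": "countess"}, person))
```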
External Property Storage
By default, every property of a record is serialized inline, in the same page as the record's topology (vertex/edge identity and the edges-out/edges-in lists). That keeps reads simple, but traversal-only queries then pay for every heavy property in cache misses and I/O, even when the projection never touches it. With a 4 KB embedding on every vertex, each page holds far fewer vertices, so the page cache fills with embedding bytes and evicts the topology that traversals actually need.
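To put a number on that effect, the arithmetic below counts how many vertices fit in one page with the embedding stored inline versus behind a pointer. The 64 KB page, the 200-byte topology and the 16-byte pointer are assumed round numbers chosen for illustration, not ArcadeDB's actual on-disk sizes, and page headers are ignored.

```python
PAGE_SIZE = 64 * 1024        # assumed page size, bytes
TOPOLOGY_BYTES = 200         # assumed identity + edge lists per vertex
EMBEDDING_BYTES = 4 * 1024   # the 4 KB embedding from the text
POINTER_BYTES = 16           # assumed size of an external pointer

def records_per_page(record_bytes: int, page_size: int = PAGE_SIZE) -> int:
    """Whole records that fit in one page (page headers ignored)."""
    return page_size // record_bytes

inline = records_per_page(TOPOLOGY_BYTES + EMBEDDING_BYTES)
external = records_per_page(TOPOLOGY_BYTES + POINTER_BYTES)
print(f"inline: {inline} vertices/page, external: {external} vertices/page")
# prints: inline: 15 vertices/page, external: 303 vertices/page
```

Under these assumptions, each page read during a traversal yields about 20 times as many vertices once the embedding moves out of line.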
Properties marked EXTERNAL true are stored in a paired external bucket instead.
The primary record carries only a small pointer ([bucketId][position]); the value lives in a separate file that traversal-only queries never read.
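The split between the primary record and the paired bucket can be modelled as a pointer that only some reads follow. In this sketch the names ExternalBucket, Vertex, neighbours and load_embedding are hypothetical, and the read counter exists only to make the effect visible; it does not reproduce ArcadeDB's record format.

```python
from dataclasses import dataclass, field

@dataclass
class ExternalBucket:
    """Stand-in for a paired _ext bucket; counts how often it is read."""
    bucket_id: int
    values: list = field(default_factory=list)
    reads: int = 0

    def append(self, value) -> tuple:
        self.values.append(value)
        return (self.bucket_id, len(self.values) - 1)   # the pointer

    def read(self, position: int):
        self.reads += 1
        return self.values[position]

@dataclass
class Vertex:
    name: str
    out_edges: list
    embedding_ptr: tuple   # (bucketId, position), not the value itself

def neighbours(v: Vertex) -> list:
    """Traversal-only access: never follows the external pointer."""
    return v.out_edges

def load_embedding(v: Vertex, ext: ExternalBucket):
    """A projection that needs the value pays one extra lookup."""
    _bucket_id, position = v.embedding_ptr
    return ext.read(position)

ext = ExternalBucket(bucket_id=7)
alice = Vertex("alice", ["bob"], ext.append([0.1] * 1024))
bob = Vertex("bob", [], ext.append([0.2] * 1024))

for v in (alice, bob):
    neighbours(v)
print(ext.reads)   # 0: the traversal never opened the external bucket
load_embedding(alice, ext)
print(ext.reads)   # 1
```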
Use it when:
- the property is large compared to the rest of the record (vector embeddings, long text, embedded JSON, full-text payloads), and
- most queries do not project the property.
Skip it for small, hot properties — the extra pointer indirection adds a page lookup on every read that does need the value.
Per-bucket pairing
Each primary bucket of a type that has at least one EXTERNAL property gets a paired bucket suffixed with _ext.
The mapping is persisted in the schema and visible from SELECT FROM schema:types under the externalBuckets field, e.g. { "Person_0": "Person_0_ext" }.
External buckets are internal: they are filtered out of schema:buckets by default and rejected by user-facing DML (for example, INSERT INTO BUCKET:Person_0_ext fails).
To inspect them explicitly:
SELECT name, purpose FROM schema:buckets WHERE purpose <> 'PRIMARY'
External buckets show purpose: 'EXTERNAL_PROPERTY'.
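The pairing rule and the default filtering can be summarised in a few lines. The functions are hypothetical helpers written for this illustration; only the _ext suffix, the PRIMARY and EXTERNAL_PROPERTY purposes, and the example bucket names come from the text.

```python
def pair_external_buckets(primary_buckets: list) -> dict:
    """Build an externalBuckets-style mapping, e.g. {"Person_0": "Person_0_ext"}."""
    return {name: f"{name}_ext" for name in primary_buckets}

def list_buckets(catalog: dict, include_internal: bool = False) -> list:
    """Mimic the default schema:buckets view, which hides non-PRIMARY buckets."""
    return [name for name, purpose in catalog.items()
            if include_internal or purpose == "PRIMARY"]

mapping = pair_external_buckets(["Person_0", "Person_1"])
catalog = ({p: "PRIMARY" for p in mapping}
           | {e: "EXTERNAL_PROPERTY" for e in mapping.values()})
print(mapping)
print(list_buckets(catalog))                          # primary buckets only
print(list_buckets(catalog, include_internal=True))   # all four buckets
```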
Compression
The payload stored in the paired bucket can be LZ4-compressed via the per-property COMPRESSION attribute:
- none (default) — raw bytes; no compression overhead, no compression savings.
- lz4 — always LZ4-compressed.
- auto — try LZ4 and keep the compressed bytes only when they save more than 10% of the raw size; otherwise fall back to raw. The decision is made per record, so a single property can mix compressed and uncompressed records (text gets compressed, vector embeddings typically fall back to raw).
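The auto policy is a per-record decision with a fixed 10% threshold. The sketch below implements that rule. Because LZ4 is not in the Python standard library, zlib stands in for it here, which changes the compression ratios but not the decision logic; encode_auto is a hypothetical name.

```python
import os
import zlib

def encode_auto(raw: bytes) -> tuple:
    """Return ("compressed", bytes) if compression saves more than 10%, else ("raw", raw)."""
    compressed = zlib.compress(raw)   # ASSUMPTION: zlib stands in for LZ4
    saved = len(raw) - len(compressed)
    # "More than 10% of the raw size", in integer arithmetic.
    if saved * 10 > len(raw):
        return "compressed", compressed
    return "raw", raw

text = ("ArcadeDB stores heavy values in a paired bucket. " * 40).encode()
embedding = os.urandom(4096)   # high-entropy stand-in for a float embedding

print(encode_auto(text)[0])        # compressed
print(encode_auto(embedding)[0])   # raw
```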
The compression mode is recorded in the main record’s pointer byte, not inside the external blob, so reads dispatch to the right decoder without an extra header lookup.
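Keeping the mode next to the pointer lets the reader choose a decoder before it fetches the blob. The 13-byte layout below (mode byte, bucket id, position) is an assumption chosen for illustration, and zlib again stands in for LZ4; ArcadeDB's real pointer encoding may differ.

```python
import struct
import zlib

# Hypothetical layout: 1-byte mode, 4-byte bucket id, 8-byte position (big-endian).
POINTER = struct.Struct(">BiQ")
MODE_NONE, MODE_COMPRESSED = 0, 1

DECODERS = {
    MODE_NONE: lambda blob: blob,
    MODE_COMPRESSED: zlib.decompress,   # stand-in for the LZ4 decoder
}

def make_pointer(mode: int, bucket_id: int, position: int) -> bytes:
    return POINTER.pack(mode, bucket_id, position)

def read_external(pointer: bytes, buckets: dict) -> bytes:
    mode, bucket_id, position = POINTER.unpack(pointer)
    blob = buckets[bucket_id][position]   # the blob itself carries no header
    return DECODERS[mode](blob)           # decoder chosen from the pointer alone

payload = b"long text " * 100
buckets = {3: [payload, zlib.compress(payload)]}
raw_ptr = make_pointer(MODE_NONE, 3, 0)
lz_ptr = make_pointer(MODE_COMPRESSED, 3, 1)
print(len(raw_ptr), read_external(raw_ptr, buckets) == read_external(lz_ptr, buckets))
# prints: 13 True
```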
Migrating existing records
Toggling EXTERNAL does not rewrite existing records eagerly.
Each record is converted on its next write (insert, update, or upsert), and orphaned external values are reclaimed in the same transaction.
To migrate a populated type at once, use REBUILD TYPE:
REBUILD TYPE Person
REBUILD TYPE Parent POLYMORPHIC
REBUILD TYPE Person WITH batchSize = 1000
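The lazy conversion and the batch rebuild can be modelled together: toggling the flag changes nothing until a record is written, and a rebuild is a rewrite of every record. ToyType, write, rebuild and the one-commit-per-batch meaning given to batch_size are assumptions for illustration, not ArcadeDB internals.

```python
from dataclasses import dataclass, field

@dataclass
class ToyType:
    """Hypothetical type with one property that can be toggled EXTERNAL."""
    external: bool = False
    records: dict = field(default_factory=dict)     # rid -> {"inline": ..., "ptr": ...}
    ext_values: dict = field(default_factory=dict)  # ptr -> value
    commits: int = 0

    def write(self, rid: int, value: str) -> None:
        """Insert/update: the record is re-encoded with the current setting."""
        old = self.records.get(rid)
        if old is not None and old["ptr"] is not None:
            del self.ext_values[old["ptr"]]          # reclaim the previous external value
        if self.external:
            self.ext_values[rid] = value             # toy pointer: reuse the rid
            self.records[rid] = {"inline": None, "ptr": rid}
        else:
            self.records[rid] = {"inline": value, "ptr": None}

    def read(self, rid: int) -> str:
        rec = self.records[rid]
        return rec["inline"] if rec["ptr"] is None else self.ext_values[rec["ptr"]]

    def externalized(self) -> int:
        return sum(rec["ptr"] is not None for rec in self.records.values())

    def rebuild(self, batch_size: int) -> None:
        """Like REBUILD TYPE: rewrite every record, one commit per batch (assumed)."""
        rids = sorted(self.records)
        for start in range(0, len(rids), batch_size):
            for rid in rids[start:start + batch_size]:
                self.write(rid, self.read(rid))
            self.commits += 1

person = ToyType()
for rid in range(5):
    person.write(rid, f"bio-{rid}")
person.external = True             # schema change only: no record is rewritten
person.write(0, "bio-0, updated")  # converted on its next write
print(person.externalized())                  # 1
person.rebuild(batch_size=2)
print(person.externalized(), person.commits)  # 5 3
```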
Tiered storage
External buckets can be placed on a different volume from the primary buckets — for example, topology on fast NVMe and heavy payloads on cheaper bulk storage.
Set the database-scope configuration arcadedb.externalPropertyBucketPath to the target directory before creating the EXTERNAL property; new paired buckets are created there instead of in the database directory.
Existing external buckets are not relocated when this setting changes; move the files manually if you need to migrate after the fact.
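The placement rule amounts to resolving the target directory once, when a paired bucket is created. In the sketch below BucketLocations and its method are hypothetical, external_path stands for the arcadedb.externalPropertyBucketPath setting, and the paths are made up; the point is that changing the setting later affects only buckets created afterwards.

```python
from pathlib import Path

class BucketLocations:
    """Hypothetical record of where each external bucket was created."""

    def __init__(self, database_dir: str):
        self.database_dir = Path(database_dir)
        self.external_path = None   # models arcadedb.externalPropertyBucketPath
        self.locations = {}         # bucket name -> directory fixed at creation

    def create_external_bucket(self, name: str) -> Path:
        base = Path(self.external_path) if self.external_path else self.database_dir
        self.locations[name] = base   # never revisited when the setting changes
        return base / name

locs = BucketLocations("/nvme/arcadedb/social")
locs.create_external_bucket("Person_0_ext")        # setting unset: database dir
locs.external_path = "/bulk/arcadedb-ext/social"
locs.create_external_bucket("Person_1_ext")        # new bucket goes to bulk storage
print(locs.locations)
```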
Transactional semantics
- The primary record and its external value are written in the same transaction and WAL group, so commit, rollback, and crash recovery are atomic across both.
- Updating an EXTERNAL property updates the external record in place and does not rewrite the main record bytes.
- Deleting the primary record cascades the delete to every linked external record in the same transaction.
- If a property is toggled to EXTERNAL false, renamed, or dropped, the orphaned external value is reclaimed on the next write of that record (or by REBUILD TYPE).
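These guarantees can be pictured with a transaction that stages changes to both stores and publishes them together. ToyDatabase and ToyTransaction are hypothetical, the copy-on-begin staging is a deliberate simplification, and nothing here models the actual WAL; it only shows which operations touch which store and that they commit or roll back as a unit.

```python
import copy

class ToyDatabase:
    """Primary bucket plus its paired external bucket, as plain dicts."""

    def __init__(self):
        self.primary = {}    # rid -> {"name": str, "embedding_ptr": ptr}
        self.external = {}   # ptr -> embedding

    def begin(self):
        return ToyTransaction(self)

class ToyTransaction:
    def __init__(self, db: ToyDatabase):
        self.db = db
        # Private copies: nothing is visible to the database until commit().
        self.primary = copy.deepcopy(db.primary)
        self.external = copy.deepcopy(db.external)

    def create(self, rid: int, name: str, embedding: list) -> None:
        self.external[rid] = embedding                    # toy pointer = rid
        self.primary[rid] = {"name": name, "embedding_ptr": rid}

    def update_embedding(self, rid: int, embedding: list) -> None:
        ptr = self.primary[rid]["embedding_ptr"]
        self.external[ptr] = embedding                    # main record untouched

    def delete(self, rid: int) -> None:
        record = self.primary.pop(rid)
        self.external.pop(record["embedding_ptr"], None)  # cascade, same transaction

    def commit(self) -> None:
        # Publish both sides at once, standing in for a single WAL group.
        self.db.primary, self.db.external = self.primary, self.external

    def rollback(self) -> None:
        # Drop the staged copies; the database never saw them.
        self.primary = copy.deepcopy(self.db.primary)
        self.external = copy.deepcopy(self.db.external)

db = ToyDatabase()
tx = db.begin()
tx.create(1, "alice", [0.1, 0.2])
tx.rollback()
print(len(db.primary), len(db.external))   # 0 0

tx = db.begin()
tx.create(1, "alice", [0.1, 0.2])
tx.commit()
print(len(db.primary), len(db.external))   # 1 1

tx = db.begin()
tx.delete(1)
tx.commit()
print(len(db.primary), len(db.external))   # 0 0
```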
Buckets
A bucket is the physical storage unit for records of a type. Every type maps to one or more buckets, and each record lives in exactly one bucket. Buckets are paged files on disk; partitioning a type across multiple buckets improves page locality within each bucket and lets concurrent writers avoid contending on the same file.
The mapping between records and buckets is controlled by the type’s Bucket Selection Strategy. Buckets are managed via CREATE BUCKET, DROP BUCKET, and ALTER TYPE … BUCKET (see ALTER TYPE). For the on-disk page layout and how records and RIDs are stored inside a bucket, see Storage Internals in the reference.
Bucket Selection Strategies
See Bucket Selection Strategy for an animated explanation of how each strategy dispatches new records.
When you create a type, it inherits the bucket selection strategy defined at the database level. By default this is round-robin. Change the database default with the ALTER DATABASE command, or override it per type with ALTER TYPE.
Supported strategies:
| Strategy | Description |
|---|---|
| round-robin | Selects the next bucket in circular order, starting again from the first bucket after the last one. |
| thread | Selects the bucket from the current thread id modulo the number of buckets. This strategy gives the best performance when multiple threads write into multiple buckets. |
| partitioned | Uses the primary key to assign a partition to the record. This speeds up index lookups, because a key is searched in one sub-index instead of all of them. Use it when you have multiple buckets and want fast lookups, accepting slower insertions. |
Picking the right strategy at type-creation time is one of the highest-leverage decisions you can make for query performance. See Schema design 101: choosing a bucket strategy for a 3-question decision tree, the anti-patterns to avoid, and the WITH repartition = true workflow for changing strategy on a populated type.
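The dispatch rules of the three strategies can be sketched in a few lines each. The class names, the use of CRC32 as the partitioning hash and the integer bucket indexes are assumptions made for illustration; ArcadeDB's implementation, including how it hashes the primary key, may differ.

```python
import threading
import zlib
from itertools import count

class RoundRobin:
    """Circular order: 0, 1, ..., n-1, then back to 0."""
    def __init__(self, buckets: int):
        self.buckets = buckets
        self._next = count()
        self._lock = threading.Lock()

    def select(self, record: dict) -> int:
        with self._lock:
            return next(self._next) % self.buckets

class ThreadBased:
    """Thread id modulo bucket count: one thread keeps writing to one bucket."""
    def __init__(self, buckets: int):
        self.buckets = buckets

    def select(self, record: dict) -> int:
        return threading.get_ident() % self.buckets

class Partitioned:
    """Stable hash of the key: a lookup by key only needs that bucket's sub-index."""
    def __init__(self, buckets: int, key: str):
        self.buckets = buckets
        self.key = key

    def select(self, record: dict) -> int:
        return zlib.crc32(str(record[self.key]).encode()) % self.buckets

rr = RoundRobin(3)
print([rr.select({}) for _ in range(5)])   # [0, 1, 2, 0, 1]
p = Partitioned(4, key="id")
print(p.select({"id": 42}) == p.select({"id": 42, "name": "x"}))   # True
```

The partitioned rule trades write spreading for read locality: the same key always lands in the same bucket, so a lookup can skip the other sub-indexes.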
Indexes
An index is a secondary data structure that accelerates lookups, range scans, or similarity queries against a property (or composite of properties) on a type. ArcadeDB indexes are first-class schema objects: they live in their own files, are kept in sync transactionally, and survive restart and replication.
Several index implementations are available; choose one according to the use case:
- LSM_TREE (default) — ordered key index for equality and range queries.
- HASH_INDEX — key-only equality lookups; smaller and faster than LSM when range scans are not needed.
- LSM_VECTOR — HNSW-based dense vector similarity (cosine, dot-product, Euclidean).
- LSM_SPARSE_VECTOR — learned-sparse / BM25-style retrieval (v26.5.1+).
- FULL_TEXT — Lucene-backed text search.
See Indexes in the concepts section for the algorithms, trade-offs, and the case-insensitive (COLLATE CI) variant. The SQL command surface is CREATE INDEX, REBUILD INDEX, and DROP INDEX.
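The trade-off between LSM_TREE and HASH_INDEX comes down to key order. In the sketch below a sorted list stands in for an ordered index and a dict for a hash index; both answer equality lookups, but only the ordered one can serve a range scan without visiting every entry. The structures and the RID strings are illustrative assumptions, not ArcadeDB's on-disk formats.

```python
import bisect

class OrderedIndex:
    """Keys kept sorted, so equal and neighbouring keys sit together."""
    def __init__(self):
        self.keys, self.rids = [], []

    def put(self, key, rid) -> None:
        i = bisect.bisect_right(self.keys, key)   # keeps insertion order for duplicates
        self.keys.insert(i, key)
        self.rids.insert(i, rid)

    def get(self, key) -> list:
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return self.rids[lo:hi]

    def range(self, low, high) -> list:
        """All rids with low <= key <= high, in key order."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return self.rids[lo:hi]

class HashIndex:
    """Equality only: hashing discards key order, so there is no range()."""
    def __init__(self):
        self.entries = {}

    def put(self, key, rid) -> None:
        self.entries.setdefault(key, []).append(rid)

    def get(self, key) -> list:
        return self.entries.get(key, [])

ordered, hashed = OrderedIndex(), HashIndex()
for age, rid in [(41, "#1:0"), (23, "#1:1"), (35, "#1:2"), (29, "#1:3")]:
    ordered.put(age, rid)
    hashed.put(age, rid)
print(ordered.range(25, 40))                  # ['#1:3', '#1:2']
print(hashed.get(35), ordered.get(35))        # ['#1:2'] ['#1:2']
```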
Triggers
A trigger is a Java callback invoked by the engine on record-level events (before/after create, read, update, delete) for a given type. Triggers run inside the originating transaction, so they can validate, enrich, or reject writes, and propagate changes to external systems (web-socket fan-out, audit logs, derived caches).
The SQL command surface is CREATE TRIGGER / DROP TRIGGER; see Triggers for the registered-class contract and event-type table.
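ArcadeDB triggers are Java classes, so the Python sketch below only models the contract described above: hooks are registered per type and event, a before-hook can enrich the record or veto the write by raising, and an after-hook observes the completed write. TriggerRegistry, TriggerRejected, create and the event names are hypothetical, and the transaction rollback a rejection would cause in the engine is not modelled.

```python
from collections import defaultdict

class TriggerRejected(Exception):
    """Raised by a before-hook to veto the write (hypothetical)."""

class TriggerRegistry:
    def __init__(self):
        self._hooks = defaultdict(list)   # (type name, event) -> callbacks

    def register(self, type_name: str, event: str, callback) -> None:
        self._hooks[(type_name, event)].append(callback)

    def fire(self, type_name: str, event: str, record: dict) -> None:
        for callback in self._hooks[(type_name, event)]:
            callback(record)              # hooks run in registration order

def create(registry: TriggerRegistry, store: list, type_name: str, record: dict) -> None:
    registry.fire(type_name, "before_create", record)   # may raise: nothing is stored
    store.append(record)
    registry.fire(type_name, "after_create", record)

def require_email(record: dict) -> None:
    if "@" not in record.get("email", ""):
        raise TriggerRejected("email is required")

def normalize_email(record: dict) -> None:
    record["email"] = record["email"].lower()           # enrich before the write

registry, people, audit_log = TriggerRegistry(), [], []
registry.register("Person", "before_create", require_email)
registry.register("Person", "before_create", normalize_email)
registry.register("Person", "after_create", lambda r: audit_log.append(r["email"]))

create(registry, people, "Person", {"email": "Ada@Example.com"})
try:
    create(registry, people, "Person", {"email": "not-an-email"})
except TriggerRejected as err:
    print("rejected:", err)
print(people, audit_log)
```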
Materialized Views
A materialized view is a query result persisted as its own type, refreshed either manually, on a schedule, or incrementally as the underlying data changes. Once defined, a view is queried like any other type — the optimizer can rewrite queries to hit the view instead of the base type when the projection and filter match.
See Materialized Views for the refresh modes (manual, incremental, periodic) and the design trade-offs. SQL: CREATE MATERIALIZED VIEW, ALTER MATERIALIZED VIEW, REFRESH MATERIALIZED VIEW, DROP MATERIALIZED VIEW.
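The difference between the refresh modes is easiest to see on a tiny view. The sketch below maintains a hypothetical count-per-city view both ways: full_refresh recomputes from the base records, as a manual or periodic refresh would, while on_insert and on_delete apply only the delta of one change. It shows the general technique under assumed names; how ArcadeDB tracks changes and decides which views can be maintained incrementally is not modelled.

```python
from collections import Counter

class CityCountView:
    """Materialized 'SELECT city, count(*) GROUP BY city' (hypothetical)."""

    def __init__(self):
        self.rows = Counter()

    def full_refresh(self, base: list) -> None:
        """Recompute from scratch: cost grows with the size of the base type."""
        self.rows = Counter(r["city"] for r in base)

    def on_insert(self, record: dict) -> None:
        """Incremental maintenance: cost depends only on the change."""
        self.rows[record["city"]] += 1

    def on_delete(self, record: dict) -> None:
        self.rows[record["city"]] -= 1
        if self.rows[record["city"]] == 0:
            del self.rows[record["city"]]

base = [{"city": "Rome"}, {"city": "Oslo"}]
view = CityCountView()
view.full_refresh(base)
new = {"city": "Rome"}
base.append(new)
view.on_insert(new)
print(dict(view.rows))   # {'Rome': 2, 'Oslo': 1}
```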