Vector Embeddings
This guide covers practical decisions for working with vector embeddings in ArcadeDB: choosing dimensions, creating indexes, tuning parameters, and combining vector search with other query types.
Choosing an Embedding Model
Your embedding model determines the dimensions parameter for the index:
| Model | Dimensions | Notes |
|---|---|---|
| OpenAI | 1536 | General purpose, high quality |
| OpenAI | 3072 | Highest quality, largest memory footprint |
| Sentence Transformers | 384 | Fast, open source, good quality |
| Sentence Transformers | 768 | Better quality, slower |
| Cohere | 1024 | Good balance of quality and size |
| CLIP (image + text) | 512 | Multi-modal image/text |

Start with 384 dimensions (MiniLM) for prototyping. Move to 768+ for production quality. Use quantization to manage memory at higher dimensions.
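To make the memory impact of the dimension choice concrete, the following sketch estimates raw vector storage at 4 bytes per component (float32) versus 1 byte per component (int8). It counts vector components only; graph links, record overhead, and any ArcadeDB-internal structures are deliberately ignored, and the function names are illustrative rather than part of any ArcadeDB API.

```python
FLOAT32_BYTES = 4  # bytes per component for unquantized vectors
INT8_BYTES = 1     # bytes per component after INT8 quantization


def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_component: int) -> int:
    """Bytes needed for the vector components alone (no graph, no record overhead)."""
    return num_vectors * dimensions * bytes_per_component


def report(num_vectors: int, dimensions: int) -> dict:
    """Compare float32 and int8 storage for one dataset size."""
    full = raw_vector_bytes(num_vectors, dimensions, FLOAT32_BYTES)
    quantized = raw_vector_bytes(num_vectors, dimensions, INT8_BYTES)
    return {
        "float32_mib": full / 2**20,
        "int8_mib": quantized / 2**20,
        "ratio": full / quantized,
    }


if __name__ == "__main__":
    for dims in (384, 768, 1536, 3072):
        r = report(1_000_000, dims)
        print(f"{dims:>5} dims: float32 {r['float32_mib']:.0f} MiB, int8 {r['int8_mib']:.0f} MiB")
```

Running it for one million vectors shows how quickly the float32 footprint grows with dimensions, which is why quantization matters more for the larger models in the table.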
Creating a Vector Index
Recommended index creation with INT8 quantization:
CREATE VERTEX TYPE Document
CREATE PROPERTY Document.content STRING
CREATE PROPERTY Document.embedding LIST OF FLOAT
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8'
}
INT8 quantization is recommended for all production workloads. It provides 2.5x faster search and 4x lower memory usage with negligible accuracy loss (see the quantization performance section of Vector Search Concepts). Only omit quantization for very small datasets (< 10K vectors) where maximum precision matters.
Production-ready index with additional tuning:
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8',
maxConnections: 16,
beamWidth: 100
}
Choosing a Similarity Function
| Function | Choose When | Avoid When |
|---|---|---|
| COSINE | Using text embedding models (most common). Vectors may have varying magnitudes. | Vectors represent absolute quantities (distances, counts). |
| DOT_PRODUCT | Vectors are already L2-normalized. You need maximum query speed. | Vectors are not normalized (results will be incorrect). |
| EUCLIDEAN | Working with spatial data, sensor readings, or continuous measurements. | Comparing text embeddings of different lengths. |
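The warning about DOT_PRODUCT on unnormalized vectors can be checked with a few lines of arithmetic. The sketch below is plain Python with illustrative helper names and makes no ArcadeDB calls. It shows that a raw dot product rewards magnitude, while the dot product of L2-normalized vectors equals cosine similarity.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the magnitudes."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [1.0, 0.0]
same_direction = [1.0, 0.0]   # identical direction to the query
long_off_axis = [5.0, 5.0]    # 45 degrees away, but much larger magnitude

# Raw dot product prefers the longer vector even though it points elsewhere.
raw_prefers_long = dot(query, long_off_axis) > dot(query, same_direction)

# After normalization the ranking matches cosine similarity.
normalized_score = dot(l2_normalize(query), l2_normalize(long_off_axis))
```

If your model does not guarantee unit-length output, normalizing once at ingest time (and normalizing the query vector) is the usual way to make DOT_PRODUCT safe.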
Quantization Trade-offs
Use INT8 quantization for most use cases. It provides 4x memory savings with minimal accuracy loss and significantly faster ingestion and search:
- < 10K vectors: NONE is fine, but INT8 works well too
- 10K - 1M vectors: Use INT8 (4x memory savings, < 2% accuracy loss), recommended
- > 1M vectors: Use INT8 for general use, or PRODUCT for zero-disk-I/O graph construction on very large datasets
- Extreme compression: Use BINARY for first-pass filtering, then rerank with full vectors
-- INT8: recommended for most workloads
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 768,
similarity: 'COSINE',
quantization: 'INT8'
}
-- PRODUCT: for very large datasets, enables in-memory graph build
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 1024,
similarity: 'COSINE',
quantization: 'PRODUCT'
}
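The "first-pass filtering, then rerank" pattern mentioned for BINARY can be sketched outside the database. The example below assumes sign-based binarization (one bit per component) and Hamming distance for the coarse pass; how ArcadeDB's BINARY quantization actually encodes vectors is not stated here, so treat this as an illustration of the pattern rather than the engine's implementation. All function names are hypothetical.

```python
import math


def to_bits(vector):
    """Pack the sign of each component into an int (bit i is set when vector[i] > 0)."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(query, vectors, k, shortlist_size):
    """Coarse pass on 1-bit codes, then exact rerank of the shortlist with full vectors."""
    query_bits = to_bits(query)
    # In a real system the codes would be computed once at ingest time.
    codes = {vid: to_bits(v) for vid, v in vectors.items()}
    shortlist = sorted(vectors, key=lambda vid: hamming(query_bits, codes[vid]))[:shortlist_size]
    reranked = sorted(shortlist, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return reranked[:k]


vectors = {
    "a": [1.0, 0.9, -0.2, 0.1],
    "b": [0.9, 1.0, -0.1, 0.3],
    "c": [-1.0, -0.8, 0.3, -0.2],
    "d": [1.0, -1.0, 1.0, -1.0],
}
top = search([1.0, 1.0, -0.1, 0.2], vectors, k=1, shortlist_size=2)
```

The shortlist size controls the trade-off: a larger shortlist costs more full-precision comparisons but is less likely to drop a true neighbour in the coarse pass.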
INT8 Pre-Quantized Ingest
Available since ArcadeDB v26.5.1. See Vector Encoding for the underlying concept.
When your embedding provider already emits signed-int8 vectors (Cohere int8 endpoints, OpenAI text-embedding-3-large reduced precision, Sentence Transformers with int8 quantization), set encoding: 'INT8' so the vectors stay at one byte per dimension end to end, from the wire to the stored property:
CREATE PROPERTY Doc.embedding BINARY;
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
"dimensions": 1024,
"similarity": "COSINE",
"encoding": "INT8"
};
What this saves:
- HTTP payload: 4x smaller (1 byte/dim vs 4 bytes/dim) when sending vectors with the typed-marker convention (see HTTP wire convention).
- Document bucket storage: 4x smaller, since the property column is BINARY instead of ARRAY_OF_FLOATS.
- No client-side int8 → float32 → server round trip; the precision the provider already discarded does not get padded back out for the wire.
What this does not change today:
- HNSW graph and search internally still run on float32 (JVector 4.0.0-rc.8 contract). The engine dequantizes once on the read path. Native int8 HNSW is tracked upstream at datastax/jvector#665.
- encoding is independent of quantization. Combining encoding: 'INT8' with quantization: 'INT8' is rejected at index creation, so pick one, not both.
When not to use INT8 encoding:
- Your provider emits float32 (or float64) and you do not have an int8 quantizer client-side. The default FLOAT32 encoding skips the dequantize hop on every search and keeps full precision in the documents.
- You want the index-internal compression benefit (search-time memory footprint). That is quantization: 'INT8', not encoding: 'INT8'. They are orthogonal.
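For readers wondering what a client-side int8 quantizer looks like, here is a minimal sketch. It assumes symmetric per-vector scaling (the largest absolute component maps to 127); providers that emit int8 directly may use a different scheme, and the HTTP typed-marker convention referenced above is not reproduced here. Function names are hypothetical.

```python
import struct


def quantize_int8(vector):
    """Symmetric per-vector quantization: the largest |component| maps to 127."""
    peak = max(abs(x) for x in vector)
    if peak == 0.0:
        return [0] * len(vector)
    scale = 127.0 / peak
    # Clamp defensively so rounding can never leave the signed 8-bit range.
    return [max(-127, min(127, round(x * scale))) for x in vector]


def pack_int8(values):
    """One signed byte per dimension."""
    return struct.pack(f"{len(values)}b", *values)


def pack_float32(vector):
    """Four bytes per dimension (little-endian float32), for comparison."""
    return struct.pack(f"<{len(vector)}f", *vector)


embedding = [0.2, -1.0, 0.0]
as_int8 = quantize_int8(embedding)
int8_payload = pack_int8(as_int8)
float_payload = pack_float32(embedding)
```

Because cosine similarity is unaffected by a positive per-vector scale factor, this kind of scaling does not change COSINE rankings beyond the rounding error; for other similarity functions the scale would matter.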
Tuning for Recall vs Speed
Adjust maxConnections and beamWidth based on your priorities:
| Profile | maxConnections | beamWidth | Trade-off |
|---|---|---|---|
| Default | 16 | 100 | Balanced for most workloads |
| High recall | 32 | 200 | Better accuracy, 2-3x slower builds, 50% more memory |
| Fast indexing | 12 | 80 | 2x faster builds, 5-10% lower recall |
| Memory constrained | 8 | 60 | Minimal memory footprint |
For datasets over 100K vectors or with 1024+ dimensions, enable hierarchical mode:
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 1536,
similarity: 'COSINE',
quantization: 'INT8',
addHierarchy: true,
maxConnections: 32,
beamWidth: 200
}
Tuning efSearch
The efSearch parameter controls how many candidates the search explores at query time. By default, ArcadeDB uses an adaptive strategy that works well for most workloads. You only need to tune efSearch if you have specific recall or latency requirements.
| Profile | efSearch | Trade-off |
|---|---|---|
| Adaptive (default) | auto | Two-pass: fast first pass (…) |
| High recall | 200-500 | Consistent high accuracy, higher latency |
| Low latency | 20-50 | Fast responses, lower recall on hard queries |
You can override efSearch per-query without changing the index:
-- High recall for a critical search
SELECT expand(vectorNeighbors('Doc[embedding]', $queryVector, 10, 500))
-- Low latency for autocomplete/typeahead
SELECT expand(vectorNeighbors('Doc[embedding]', $queryVector, 5, 30))
Or set a default on the index:
CREATE INDEX ON Doc (embedding) LSM_VECTOR METADATA {
dimensions: 768,
similarity: 'COSINE',
quantization: 'INT8',
efSearch: 200
}
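Before settling on an efSearch value, it helps to measure recall against exact results on a sample of queries. The sketch below computes recall@k by comparing approximate result IDs with a brute-force ground truth. It is plain Python; the approximate IDs would come from your own vectorNeighbors queries at a given efSearch, which this example does not perform. Function names are illustrative.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_top_k(query, vectors, k):
    """Exact nearest neighbours by cosine similarity; serves as ground truth."""
    ranked = sorted(vectors, key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return ranked[:k]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids)
    if not exact:
        raise ValueError("exact_ids must not be empty")
    return len(exact.intersection(approx_ids)) / len(exact)


sample = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
truth = brute_force_top_k([1.0, 0.0], sample, k=2)
```

Averaging recall_at_k over a few hundred representative queries at several efSearch values gives a recall/latency curve from which to pick the setting.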
Multi-Modal Embeddings
Store multiple embeddings per record for different search modalities:
CREATE VERTEX TYPE Product
CREATE PROPERTY Product.imageEmbedding ARRAY_OF_FLOATS
CREATE PROPERTY Product.textEmbedding ARRAY_OF_FLOATS
CREATE INDEX ON Product (imageEmbedding) LSM_VECTOR METADATA {dimensions: 512, similarity: 'COSINE'}
CREATE INDEX ON Product (textEmbedding) LSM_VECTOR METADATA {dimensions: 768, similarity: 'COSINE'}
Query each index independently:
-- Search by image similarity
SELECT name, distance FROM (
SELECT expand(vectorNeighbors('Product[imageEmbedding]', $imageVector, 10))
)
-- Search by text similarity
SELECT name, distance FROM (
SELECT expand(vectorNeighbors('Product[textEmbedding]', $textVector, 10))
)
Hybrid Search: Dense + Sparse + Full-Text
The vector.fuse operator described below is available since ArcadeDB v26.5.1. Earlier versions required two queries plus client-side fusion via vector.rrfScore.
Combine dense vector similarity with sparse retrieval and/or keyword matching in a single server-side query. vector.fuse accepts any number of ranked sub-pipelines plus a fusion strategy (RRF, DBSF, LINEAR) and returns one ranked top-K:
-- Schema: dense + sparse properties + indexes (sparse needs LSM_SPARSE_VECTOR).
CREATE PROPERTY Document.dense ARRAY_OF_FLOATS;
CREATE PROPERTY Document.tokens ARRAY_OF_INTEGERS;
CREATE PROPERTY Document.weights ARRAY_OF_FLOATS;
CREATE INDEX ON Document (dense) LSM_VECTOR
METADATA { dimensions: 384, similarity: 'COSINE' }
CREATE INDEX ON Document (tokens, weights) LSM_SPARSE_VECTOR
METADATA { dimensions: 30000, modifier: 'IDF' }
-- Hybrid retrieval in one statement.
SELECT expand(`vector.fuse`(
`vector.neighbors`('Document[dense]', :denseVec, 50),
`vector.sparseNeighbors`('Document[tokens,weights]', :qIdx, :qVal, 50),
{ fusion: 'RRF', groupBy: 'source_file', groupSize: 1 }
)) LIMIT 10
- vector.neighbors exposes distance (lower = better); vector.fuse auto-flips it so dense and sparse sources compose without manual rescaling.
- groupBy + groupSize collapse same-source duplicates server-side. Drop the option to return chunk-level results.
- Pre-fusion grouping is also possible by attaching { groupBy: 'source_file', groupSize: 1 } to each individual source.
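The effect of groupBy and groupSize can be pictured as a single pass over an already-ranked list that keeps at most groupSize hits per group. The following sketch illustrates that behaviour in client-side Python; it is an assumption about the observable result, not ArcadeDB's server-side implementation, and the record shape is hypothetical.

```python
def collapse_by_group(ranked_hits, group_key, group_size):
    """Keep at most group_size hits per group value, preserving rank order."""
    kept = []
    seen = {}
    for hit in ranked_hits:
        group = hit[group_key]
        if seen.get(group, 0) < group_size:
            kept.append(hit)
            seen[group] = seen.get(group, 0) + 1
    return kept


# Chunk-level hits, best first; two chunks come from the same source file.
hits = [
    {"rid": "#1:0", "source_file": "guide.pdf"},
    {"rid": "#1:1", "source_file": "guide.pdf"},
    {"rid": "#1:7", "source_file": "faq.md"},
]
collapsed = collapse_by_group(hits, "source_file", group_size=1)
```

With group_size=1 the second guide.pdf chunk is dropped, so each source file appears once in the final list.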
To include full-text alongside dense/sparse, add a third source built from SEARCH_INDEX:
SELECT expand(`vector.fuse`(
`vector.neighbors`('Document[dense]', :denseVec, 100),
`vector.sparseNeighbors`('Document[tokens,weights]', :qIdx, :qVal, 100),
(SELECT @rid, $score FROM Document
WHERE SEARCH_INDEX('Document[content]', 'machine learning') = true),
{ fusion: 'RRF' }
)) LIMIT 10
Pick the strategy that matches your scoring shape:
- RRF: rank-only, indifferent to score scales. Default, safest with mixed source types.
- DBSF: mean ± 3σ normalisation per source, then weighted sum. Use when scores are roughly Gaussian on each side.
- LINEAR: per-source min-max normalisation, then weighted sum. Use with already-tuned offline weights.
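To see how the three strategies differ, the sketch below implements each one in a few lines of Python. These are the textbook formulations under stated assumptions: RRF uses the commonly cited constant k = 60, DBSF maps each source's scores onto [0, 1] using mean ± 3 standard deviations with clamping, and LINEAR uses min-max scaling. ArcadeDB's exact constants, clamping, and tie-breaking may differ, and all function names are hypothetical.

```python
import statistics


def rrf(ranked_lists, k=60):
    """Reciprocal rank fusion: each source contributes 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def min_max(scores):
    """LINEAR-style normalisation: lowest score maps to 0, highest to 1."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}


def dbsf(scores):
    """Distribution-based normalisation: map [mean - 3σ, mean + 3σ] onto [0, 1]."""
    values = list(scores.values())
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0.0:
        return {d: 0.5 for d in scores}
    lo, hi = mean - 3 * sigma, mean + 3 * sigma
    return {d: min(1.0, max(0.0, (s - lo) / (hi - lo))) for d, s in scores.items()}


def weighted_sum(normalized_sources, weights):
    """Combine already-normalised per-source scores (higher = better)."""
    fused = {}
    for source, weight in zip(normalized_sources, weights):
        for doc_id, score in source.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    return sorted(fused, key=fused.get, reverse=True)


dense_ranking = ["a", "b", "c"]
sparse_ranking = ["b", "d", "a"]
fused_rrf = rrf([dense_ranking, sparse_ranking])
```

RRF only needs the order of each list, which is why it tolerates sources with incompatible score scales; the two normalising strategies need actual scores and are sensitive to outliers in different ways.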
For the legacy two-query workaround (still supported via vector.rrfScore and vector.hybridScore on already-computed scores), see the SQL Vector Functions reference.
Batch Ingestion
For bulk loading vectors, batch your inserts within transactions:
BEGIN
CREATE VERTEX Document SET content = 'First document', embedding = [0.1, 0.2, ...]
CREATE VERTEX Document SET content = 'Second document', embedding = [0.3, 0.4, ...]
-- ... more inserts ...
COMMIT
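On the client side, batching usually means slicing the input into fixed-size chunks and committing one transaction per chunk. The sketch below shows only that slicing logic. `run_in_transaction` is a hypothetical callback standing in for whatever driver or HTTP call you use to execute a BEGIN ... COMMIT block; no ArcadeDB client API is assumed.

```python
from itertools import islice


def batched(records, batch_size):
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load(records, batch_size, run_in_transaction):
    """Send each batch through the (hypothetical) transaction callback; return the record count."""
    total = 0
    for batch in batched(records, batch_size):
        run_in_transaction(batch)  # assumed to insert the batch and commit
        total += len(batch)
    return total


# Usage with a stand-in callback that just records the batch sizes.
sizes = []
loaded = load(range(5), 2, lambda batch: sizes.append(len(batch)))
```

A suitable batch size depends on vector dimensions and available memory, so it is worth measuring a few values rather than assuming one.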
For large bulk loads, increase mutationsBeforeRebuild to delay index rebuilds until after the load completes, then trigger a rebuild.
When vectors are inserted below the rebuild threshold, an inactivity timer ensures the graph is still rebuilt after a period of no new mutations (default: 15 seconds). This prevents buffered vectors from remaining in the brute-force delta buffer indefinitely during low-volume ingestion. Configure via inactivityRebuildTimeoutMs (per-index metadata or arcadedb.vectorIndex.inactivityRebuildTimeoutMs globally). Set to 0 to disable.
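The interaction between the mutation threshold and the inactivity timer can be modelled as a small decision function. This is a simplified reading of the behaviour described above, not the engine's code: the exact comparison operators are assumptions, the 15000 ms default comes from the text, and the threshold of 100 is taken from the global configuration example rather than a documented default.

```python
def should_rebuild(pending_mutations, ms_since_last_mutation,
                   mutations_before_rebuild=100, inactivity_timeout_ms=15_000):
    """Return True when the buffered vectors should be folded into the graph."""
    if pending_mutations <= 0:
        return False  # nothing buffered, nothing to rebuild
    if pending_mutations >= mutations_before_rebuild:
        return True   # threshold reached: rebuild regardless of timing
    if inactivity_timeout_ms == 0:
        return False  # 0 disables the inactivity timer
    # Below the threshold: rebuild only after a quiet period.
    return ms_since_last_mutation >= inactivity_timeout_ms
```

For example, 10 buffered vectors trigger no rebuild while inserts keep arriving, but do trigger one once 15 seconds pass without a new mutation.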
If you create the index before inserting data (e.g., during schema setup), set buildGraphNow: false to skip the initial (empty) graph build. The graph will be built lazily on the first search:
-- Schema setup phase: defer graph build since no data exists yet
CREATE INDEX ON Document (embedding) LSM_VECTOR METADATA {
dimensions: 384,
similarity: 'COSINE',
quantization: 'INT8',
buildGraphNow: false
}
-- Bulk load data...
-- Graph is built automatically on first vectorNeighbors() query
If you create the index after data is already loaded, leave buildGraphNow at its default (true) so the index is immediately ready to query.
Global Configuration
Set database-wide defaults for vector index parameters:
ALTER DATABASE `arcadedb.vectorIndex.locationCacheSize` 100000
ALTER DATABASE `arcadedb.vectorIndex.graphBuildCacheSize` 10000
ALTER DATABASE `arcadedb.vectorIndex.mutationsBeforeRebuild` 100
ALTER DATABASE `arcadedb.vectorIndex.inactivityRebuildTimeoutMs` 15000
ALTER DATABASE `arcadedb.vectorIndex.storeVectorsInGraph` false
Per-index metadata overrides these global settings.
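The override rule amounts to a dictionary merge in which per-index values win. The sketch below assumes that a global key maps to a per-index key by dropping the arcadedb.vectorIndex. prefix, as the text describes for inactivityRebuildTimeoutMs; whether every global setting follows that naming is an assumption, and the function is purely illustrative.

```python
GLOBAL_PREFIX = "arcadedb.vectorIndex."


def effective_settings(global_settings, index_metadata):
    """Start from the global defaults, then let per-index metadata override them."""
    merged = {
        key[len(GLOBAL_PREFIX):]: value
        for key, value in global_settings.items()
        if key.startswith(GLOBAL_PREFIX)
    }
    merged.update(index_metadata)  # per-index values take precedence
    return merged


global_settings = {
    "arcadedb.vectorIndex.mutationsBeforeRebuild": 100,
    "arcadedb.vectorIndex.inactivityRebuildTimeoutMs": 15000,
}
index_metadata = {"dimensions": 384, "inactivityRebuildTimeoutMs": 0}
resolved = effective_settings(global_settings, index_metadata)
```

Here the index keeps the global mutation threshold but disables the inactivity timer for itself.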
Further Reading
- Vector Search Concepts: architecture and algorithm details
- Vector Search Tutorial: step-by-step hands-on guide
- Java Vector API: programmatic index management
- SQL Vector Functions: complete function reference