Graph RAG
Implement retrieval-augmented generation (RAG) that retrieves richer, more connected context for LLM queries, all within a single database. Several retrieval modes work together: graph traversal enables multi-hop entity bridging across knowledge graph relationships; vector similarity powers semantic chunk retrieval using vectorNeighbors() with LSM_VECTOR indexes; sparse vectors add SPLADE / BM25-style learned sparse retrieval for exact-term and out-of-vocabulary (OOV) matches that dense embeddings miss; server-side hybrid fusion combines all of the above in one query; and full-text search provides keyword-based chunk lookup. Neo4j Bolt protocol compatibility on port 7687 supports LangChain4j integration.
Architecture Overview
Vertices | Chunk, Entity
Edges | MENTIONS (Chunk to Entity), RELATES_TO (Entity to Entity)
Document chunks carry both a dense embedding (embedding) and a sparse vector (tokens + weights) so the same record participates in both retrieval modalities. Chunks link to extracted entities through MENTIONS edges, and entities connect via RELATES_TO, enabling multi-hop discovery that bridges chunks from different documents through shared entity mentions.
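To make the dual representation concrete, here is a minimal Python sketch of one chunk record and of how a sparse query could be scored against it. The field names follow the text (embedding, tokens, weights, content, source); the `Chunk` class, the `sparse_dot` helper, and the sample values are hypothetical, and the plain dot product is an assumption about sparse scoring in general, not a statement of how the server computes it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    content: str
    source: str
    embedding: List[float]  # dense vector
    tokens: List[int]       # sparse vector: vocabulary indices
    weights: List[float]    # sparse vector: one weight per entry in `tokens`


def sparse_dot(q_idx: List[int], q_val: List[float], chunk: Chunk) -> float:
    """Dot product over the indices shared by the query and the chunk.

    Assumption: sparse relevance is a plain dot product, as is usual for
    SPLADE-style vectors. Indices missing from the chunk contribute 0.
    """
    chunk_weights = dict(zip(chunk.tokens, chunk.weights))
    return sum(v * chunk_weights.get(i, 0.0) for i, v in zip(q_idx, q_val))


# Hypothetical sample record: one chunk, usable by both retrieval modalities.
chunk = Chunk(
    content="Qubits can be entangled.",
    source="quantum_computing.txt",
    embedding=[0.9, 0.1, 0.8, 0.2],
    tokens=[17, 42, 256],
    weights=[0.5, 1.5, 0.25],
)

# Query indices 42 and 256 overlap with the chunk; 999 does not.
score = sparse_dot([42, 256, 999], [2.0, 4.0, 1.0], chunk)
```

Only overlapping indices contribute, which is why sparse retrieval rewards exact-term matches: here the score is 2.0 * 1.5 + 4.0 * 0.25, and the unmatched index 999 adds nothing.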
Key Queries
Vector Search — Find semantically similar chunks by dense embedding:
SELECT content, source, distance FROM (
  SELECT expand(vectorNeighbors('Chunk[embedding]', [0.9, 0.1, 0.8, 0.2], 5))
)
Hybrid Retrieval (Dense + Sparse), one query — Available since v26.5.1. The canonical modern-RAG retrieval shape. Dense + sparse hybrid with Reciprocal Rank Fusion, deduped to one chunk per source document via groupBy:
SELECT expand(`vector.fuse`(
  `vector.neighbors`('Chunk[embedding]', :denseQuery, 50),
  `vector.sparseNeighbors`('Chunk[tokens,weights]', :qIdx, :qVal, 50),
  { fusion: 'RRF', groupBy: 'source', groupSize: 1 }
)) LIMIT 10
Sparse retrieval fixes the case where dense embeddings collapse semantically distinct queries onto nearly the same point: "freeze card after theft" and "unfreeze card after travel" can receive almost identical dense vectors but separate cleanly under sparse term matching. Setting groupBy: 'source' with groupSize: 1 ensures each source document contributes only its single best-matching chunk to the LLM context window, rather than filling the prompt with near-duplicate chunks from one document.
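As an illustration of what Reciprocal Rank Fusion with per-source grouping does, the following Python sketch fuses two ranked lists client-side. It is not the server's implementation: the `rrf_fuse` function, the constant k = 60 (a commonly used default for RRF), the alphabetical tie-break, and the choice to group before applying the limit are all assumptions made for this example.

```python
def rrf_fuse(ranked_lists, k=60, group_by=None, group_size=1, limit=10):
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in
    (rank starts at 1). If `group_by` (a dict id -> group key) is given,
    at most `group_size` ids per group are kept, best first.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, item_id in enumerate(ranked, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + rank)

    # Highest fused score first; ties broken by id for determinism (assumption).
    ordered = sorted(scores, key=lambda i: (-scores[i], i))
    if group_by is None:
        return ordered[:limit]

    kept, per_group = [], {}
    for item_id in ordered:
        group = group_by[item_id]
        if per_group.get(group, 0) < group_size:
            kept.append(item_id)
            per_group[group] = per_group.get(group, 0) + 1
    return kept[:limit]


# Hypothetical result lists from the dense and sparse retrievers.
dense = ["c1", "c2", "c3"]
sparse = ["c3", "c1", "c4"]
source_of = {"c1": "a.txt", "c2": "a.txt", "c3": "b.txt", "c4": "c.txt"}

fused = rrf_fuse([dense, sparse])
deduped = rrf_fuse([dense, sparse], group_by=source_of, group_size=1)
```

In this sample, c1 and c2 both come from a.txt, so grouping keeps only the higher-ranked c1 and frees a slot for a chunk from a different document.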
Multi-Hop Entity Bridge — Discover related entities across documents:
MATCH (c:Chunk)-[:MENTIONS]->(e:Entity)-[:RELATES_TO*1..2]-(related:Entity)<-[:MENTIONS]-(other:Chunk)
WHERE c.source = 'quantum_computing.txt'
RETURN related.name, other.content, other.source
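The traversal pattern above can be mimicked on a small in-memory graph. The Python sketch below follows MENTIONS from the chunks of a starting source, walks RELATES_TO in either direction for up to two hops, and collects the chunks that mention the entities it reaches. All data and the `entity_bridge` function are hypothetical, and it simplifies the query's semantics: it returns distinct (entity, chunk, source) triples rather than one row per matched path, and never returns the starting entity itself.

```python
def entity_bridge(mentions, relates_to, source_of, start_source, max_hops=2):
    """Find chunks reachable through related entities.

    mentions:   list of (chunk_id, entity) pairs
    relates_to: list of (entity, entity) pairs, treated as undirected
    source_of:  dict chunk_id -> source document
    """
    neighbours = {}
    for a, b in relates_to:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    results = set()
    for chunk, entity in mentions:
        if source_of[chunk] != start_source:
            continue
        frontier, seen = {entity}, {entity}
        for _ in range(max_hops):
            # Expand one hop, skipping entities already visited.
            frontier = {n for e in frontier for n in neighbours.get(e, ())} - seen
            seen |= frontier
            for related in frontier:
                for other, mentioned in mentions:
                    if mentioned == related:
                        results.add((related, other, source_of[other]))
    return sorted(results)


# Hypothetical sample graph: three chunks from three documents.
mentions = [("c1", "qubit"), ("c2", "superconductor"), ("c3", "cryogenics")]
relates_to = [("qubit", "superconductor"), ("superconductor", "cryogenics")]
source_of = {
    "c1": "quantum_computing.txt",
    "c2": "materials.txt",
    "c3": "cooling.txt",
}

bridged = entity_bridge(mentions, relates_to, source_of, "quantum_computing.txt")
```

With two hops the chunk from cooling.txt is found even though it shares no entity with the starting document; it is linked only through the intermediate superconductor entity.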
Composite Scoring — Combine vector distance with graph connectivity for ranked retrieval:
SELECT content, source,
  (1.0 / (1.0 + distance)) * 0.7 + (entityCount / 5.0) * 0.3 AS compositeScore
FROM ChunkScores ORDER BY compositeScore DESC
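The scoring expression is easy to check by hand. This Python sketch reproduces the same arithmetic, with the 0.7 / 0.3 weights and the divisor 5.0 taken from the query; the `composite_score` function name and the sample values are hypothetical.

```python
def composite_score(distance: float, entity_count: int) -> float:
    """Same arithmetic as the compositeScore expression in the query."""
    similarity = 1.0 / (1.0 + distance)  # distance 0 maps to 1.0, larger is lower
    connectivity = entity_count / 5.0    # reaches 1.0 at five entities
    return similarity * 0.7 + connectivity * 0.3


# A close match with few entities versus a weaker match with many entities.
close_but_isolated = composite_score(distance=0.25, entity_count=1)
far_but_connected = composite_score(distance=1.0, entity_count=5)
```

With these sample values the well-connected chunk scores about 0.65 and the closer but isolated chunk about 0.62, so graph connectivity can outweigh a moderate difference in vector distance. Note that the connectivity term is not capped: a chunk with more than five entities contributes more than 0.3.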
Try It Yourself
git clone https://github.com/ArcadeData/arcadedb-usecases.git
cd arcadedb-usecases/graph-rag
docker compose up -d
./setup.sh
./queries/queries.sh
Full source: graph-rag on GitHub