Neo4j Importer
Migrating from Neo4j? The ArcadeDB Academy offers a free course on Neo4j-to-ArcadeDB migration with hands-on exercises and a certification at the end.
ArcadeDB can import a database exported from Neo4j in JSONL format (one JSON object per line).
To export a Neo4j database, follow the instructions in Export in JSON.
Performance
The Neo4j importer uses the high-performance GraphBatch API internally:
- Vertices are created with pre-allocated edge segments, eliminating lazy allocation during edge creation.
- Edges are buffered in flat primitive arrays and flushed sorted by source vertex, converting random I/O into sequential I/O. Light edges (no edge record on disk) are used when an edge has no properties.
- WAL is disabled during import for maximum throughput.
- ID mapping uses a primitive long[]-based hash map when Neo4j IDs are numeric (the common case with APOC exports), using only ~24 bytes per vertex. For non-numeric IDs, the importer automatically falls back to a standard HashMap. This makes it possible to import databases with hundreds of millions of vertices within a few gigabytes of heap.
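The edge-buffering point in the list above can be illustrated with a small sketch. This is not the GraphBatch implementation: the class `EdgeBufferSketch`, its method names, and the choice to pack each (source, target) pair of non-negative int positions into a single long are assumptions made for illustration. Sorting the packed array orders edges by source vertex, so all writes that touch one vertex end up adjacent.

```java
import java.util.Arrays;

public class EdgeBufferSketch {
    // One long per edge: source position in the high 32 bits, target in the low 32 bits.
    private long[] packed = new long[1024];
    private int count = 0;

    // Buffers an edge; both positions are assumed to be non-negative.
    public void add(int source, int target) {
        if (count == packed.length)
            packed = Arrays.copyOf(packed, count * 2);
        packed[count++] = ((long) source << 32) | (target & 0xFFFFFFFFL);
    }

    // Returns the buffered edges sorted by source (then target) and empties the buffer.
    public long[] flush() {
        long[] sorted = Arrays.copyOf(packed, count);
        Arrays.sort(sorted);
        count = 0;
        return sorted;
    }

    public static int source(long edge) { return (int) (edge >>> 32); }

    public static int target(long edge) { return (int) edge; }

    public static void main(String[] args) {
        EdgeBufferSketch buffer = new EdgeBufferSketch();
        buffer.add(5, 1);
        buffer.add(2, 9);
        buffer.add(5, 0);
        buffer.add(0, 7);
        for (long edge : buffer.flush())
            System.out.println(source(edge) + " -> " + target(edge));
    }
}
```

A real importer would write each run of same-source edges to that vertex's edge segment in one pass; the sketch only shows the ordering step.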
Multi-label handling
Neo4j supports multiple labels per node, while in ArcadeDB a vertex has exactly one type.
The Neo4j importer simulates multiple labels by creating new types named <label1>[~<labelN>]*.
Example:
{"type":"node","id":"1","labels":["User", "Administrator"],"properties":{"name":"Jim","age":42}}
This vertex will be created in ArcadeDB with type "Administrator~User" (the labels are always sorted alphabetically) that extends both "Administrator" and "User" types.
This lets you rely on ArcadeDB's polymorphism: a query on type "User" returns the records of User and of all its subtypes, including "Administrator~User".
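The naming rule can be sketched in a few lines. The class and method names below are hypothetical, and the sketch assumes plain case-sensitive string ordering for the alphabetical sort; the importer's exact comparison rules are not stated here.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MultiLabelTypeName {
    // Sorts the labels alphabetically and joins them with '~'.
    public static String typeNameFor(List<String> labels) {
        return labels.stream().sorted().collect(Collectors.joining("~"));
    }

    public static void main(String[] args) {
        System.out.println(typeNameFor(List.of("User", "Administrator"))); // Administrator~User
        System.out.println(typeNameFor(List.of("User")));                  // User
    }
}
```

A node with a single label keeps that label as its type name, so only multi-label nodes produce synthetic types.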
Importing via SQL
To import a database, use the Import Database command from the API, Studio, or Console. The example below imports Neo4j’s PanamaPapers database using the ArcadeDB Console.
> CREATE DATABASE PanamaPapers
{PanamaPapers}> IMPORT DATABASE file:///temp/panama-papers-neo4j.jsonl
ArcadeDB 26.5.1 - Neo4j Importer
Importing Neo4j database from file 'panama-papers-neo4j.jsonl' to 'databases/PanamaPapers'
- Creation of the schema: types, properties and indexes
- Creation of vertices started
- Creation of vertices completed: created 3 vertices, skipped 1 edges (0 vertices/sec elapsed=0 secs)
- ID mapping mode: numeric (primitive long[])
- Creation of edges started: creating edges between vertices
- Creation of edges completed: created 1 edges, (0 edges/sec elapsed=0 secs)
***************************************************************************************************
Import of Neo4j database completed in 0 secs with 0 errors and 0 warnings.
Importing via command line
The Neo4j importer can also be used directly from the command line:
java -cp lib/* com.arcadedb.integration.importer.Neo4jImporter -i <input-file> -d <database-path> [options]
Options:
| Option | Default | Description |
|---|---|---|
| -i | | Path to the Neo4j JSONL export file (required) |
| -d | | Path where the ArcadeDB database will be created (required) |
| -o | false | Overwrite the database if it already exists |
| | 10,000 | Number of records per transaction batch |
| -decimalType | DECIMAL | Type for decimal values: FLOAT, DOUBLE, or DECIMAL |
| | 10 | Bits allocated for bucket IDs in the internal RID packing. The default supports up to 1,023 buckets, which is sufficient for most databases. Increase this value only if you have a very large number of types and buckets (e.g. |
Example:
java -cp lib/* com.arcadedb.integration.importer.Neo4jImporter \
-i /data/neo4j-export.jsonl -d /data/arcadedb/mydb -o -decimalType double
Memory considerations
For large imports (hundreds of millions of vertices), the main memory consumer is the ID mapping table that translates Neo4j node IDs to ArcadeDB record IDs. The table size depends on the ID format:
| ID format | Memory per vertex | Example: 100M vertices |
|---|---|---|
| Numeric (e.g. "0", "12345") | ~24 bytes | ~2.2 GB |
| String (e.g. "node-abc") | ~140 bytes | ~13 GB |
Neo4j APOC exports use numeric IDs by default, so most imports will use the compact primitive map. If the importer encounters a non-numeric ID, it automatically migrates to the string-based map and logs a message:
- Non-numeric Neo4j ID detected, switching to string-based ID mapping
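The numeric-first mapping with automatic fallback can be sketched as follows. This is an illustrative stand-in, not the importer's code: the class `IdMappingSketch`, the use of a plain long in place of an ArcadeDB record ID, the open-addressing layout, and the rule that only canonical decimal strings count as numeric are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

public class IdMappingSketch {
    // Numeric mode: open-addressing table held in parallel primitive arrays.
    private long[] keys = new long[16];
    private long[] values = new long[16];
    private boolean[] used = new boolean[16];
    private int size = 0;
    // String mode: created only when the first non-numeric ID is seen.
    private Map<String, Long> stringMap = null;

    public boolean isNumericMode() { return stringMap == null; }

    public void put(String neo4jId, long rid) {
        if (stringMap == null && isCanonicalLong(neo4jId)) {
            putNumeric(Long.parseLong(neo4jId), rid);
            return;
        }
        if (stringMap == null)
            migrateToStrings();
        stringMap.put(neo4jId, rid);
    }

    // Returns the mapped value, or -1 when the ID is unknown.
    public long get(String neo4jId) {
        if (stringMap != null) {
            Long v = stringMap.get(neo4jId);
            return v == null ? -1 : v;
        }
        if (!isCanonicalLong(neo4jId))
            return -1;
        int slot = findSlot(keys, used, Long.parseLong(neo4jId));
        return used[slot] ? values[slot] : -1;
    }

    // "7" is canonical, "007" and "+7" are not, so string keys stay unambiguous after migration.
    private static boolean isCanonicalLong(String s) {
        try {
            return Long.toString(Long.parseLong(s)).equals(s);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    private void putNumeric(long key, long value) {
        if ((size + 1) * 2 > keys.length)
            grow(); // keep the load factor at or below 0.5
        int slot = findSlot(keys, used, key);
        if (!used[slot]) {
            used[slot] = true;
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
    }

    // Linear probing; the capacity is always a power of two.
    private static int findSlot(long[] keys, boolean[] used, long key) {
        int mask = keys.length - 1;
        int slot = Long.hashCode(key * 0x9E3779B97F4A7C15L) & mask;
        while (used[slot] && keys[slot] != key)
            slot = (slot + 1) & mask;
        return slot;
    }

    private void grow() {
        long[] oldKeys = keys, oldValues = values;
        boolean[] oldUsed = used;
        keys = new long[oldKeys.length * 2];
        values = new long[keys.length];
        used = new boolean[keys.length];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                int slot = findSlot(keys, used, oldKeys[i]);
                used[slot] = true;
                keys[slot] = oldKeys[i];
                values[slot] = oldValues[i];
            }
        }
    }

    // Copies every numeric entry into the string map and releases the primitive arrays.
    private void migrateToStrings() {
        stringMap = new HashMap<>();
        for (int i = 0; i < keys.length; i++)
            if (used[i])
                stringMap.put(Long.toString(keys[i]), values[i]);
        keys = null;
        values = null;
        used = null;
    }

    public static void main(String[] args) {
        IdMappingSketch ids = new IdMappingSketch();
        ids.put("0", 100L);
        ids.put("12345", 101L);
        System.out.println(ids.isNumericMode()); // true
        ids.put("node-abc", 102L);
        System.out.println(ids.isNumericMode()); // false
        System.out.println(ids.get("12345"));    // 101
    }
}
```

The sketch's per-entry footprint will not match the ~24 bytes quoted above; it only shows why a primitive table avoids per-entry objects and how a one-time migration keeps earlier entries reachable.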
For very large imports, allocate enough heap memory. For example, to import a database with 500M vertices using numeric IDs, you would need approximately 12 GB for the ID mapping table alone, plus memory for ArcadeDB’s internal buffers. A setting of -Xmx24G or more is recommended.
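The sizing arithmetic is easy to reproduce for your own vertex count. The sketch below uses only the per-vertex figures quoted in this section (about 24 bytes for numeric IDs, about 140 bytes for string IDs); the class and method names are hypothetical, and the result covers the ID mapping table alone, not ArcadeDB's internal buffers.

```java
import java.util.Locale;

public class IdMapHeapEstimate {
    static final long NUMERIC_BYTES_PER_VERTEX = 24;  // approximate figure quoted above
    static final long STRING_BYTES_PER_VERTEX = 140;  // approximate figure quoted above

    // Estimated size of the ID mapping table in bytes.
    public static long idMapBytes(long vertices, boolean numericIds) {
        return vertices * (numericIds ? NUMERIC_BYTES_PER_VERTEX : STRING_BYTES_PER_VERTEX);
    }

    public static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long[] vertexCounts = {100_000_000L, 500_000_000L};
        for (long n : vertexCounts) {
            System.out.printf(Locale.ROOT, "%d vertices: numeric %.1f GiB, string %.1f GiB%n",
                n, toGiB(idMapBytes(n, true)), toGiB(idMapBytes(n, false)));
        }
    }
}
```

For 500M numeric IDs the estimate is 12,000,000,000 bytes, which is the 12 GB figure mentioned above when counted in decimal gigabytes.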
Differences with Neo4j
ArcadeDB is fully compatible with Neo4j at the wire-protocol and query-language level — your existing applications can connect to ArcadeDB by changing the connection URL alone — but the engine underneath is multi-model, embeddable, and Apache 2.0 licensed. This page summarises what carries over, what is new, and how to migrate.
What’s compatible
ArcadeDB ships several drop-in compatibility layers so existing Neo4j codebases keep working with minimal changes:
- OpenCypher: ArcadeDB implements OpenCypher (97.8% TCK pass rate) on every supported data model. Most Cypher queries written against Neo4j run unmodified.
- Bolt protocol: ArcadeDB exposes a Bolt server compatible with Bolt v3.0, v4.0, and v4.4. Official Neo4j drivers (Java, Python, JavaScript, .NET, Go, Ruby) connect by switching the URL and credentials; existing application code stays untouched.
- APOC: a subset of Neo4j’s APOC procedures is available through ArcadeDB’s Extended Functions.
Where ArcadeDB goes further
Beyond Neo4j compatibility, ArcadeDB adds capabilities that typically require multiple databases on the Neo4j stack:
- Multi-model: graph, document, key/value, search, time-series, vector, and geospatial data live in a single engine and a single transaction. Neo4j is graph-only.
- Multi-language: beyond Cypher, ArcadeDB speaks SQL, Gremlin, GraphQL, MongoDB Query Language, and Redis commands.
- Native vector search: built-in JVector (DiskANN + HNSW with SIMD acceleration). No external vector database required.
- Built-in full-text search with fuzzy matching, integrated with the query languages.
- High Availability and replication are part of the open-source distribution. Neo4j requires the (paid) Enterprise edition for HA.
- Embeddable: ArcadeDB runs inside a JVM application with a few-megabyte footprint (as low as 16 MB heap). Neo4j Embedded is Enterprise-only.
- Apache 2.0 licence: free for commercial use, no copyleft. Neo4j Community is GPL, which forces distributed applications to publish their source.
Performance comparison
The LDBC Graphalytics benchmark shows the following timings on identical workloads (lower is better):
| Algorithm | ArcadeDB | Neo4j |
|---|---|---|
| PageRank | 0.48 s | 11.15 s |
| Weakly Connected Components (WCC) | 0.30 s | 0.75 s |
| Breadth-First Search (BFS) | 0.13 s | 1.91 s |
| Local Clustering Coefficient (LCC) | 27.41 s | 45.78 s |
| Single-Source Shortest Path (SSSP) | 3.53 s | not available |
| Community Detection (CDLP) | 3.67 s | 6.43 s |
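ArcadeDB’s Graph OLAP engine delivers up to 400× faster analytics than Neo4j on the same hardware. That headline figure is not drawn from the table above, where the largest gap is PageRank at roughly 23× (11.15 s versus 0.48 s).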
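The per-algorithm ratios can be derived directly from the timings in the table. The sketch below is plain arithmetic on those published numbers; the class name is hypothetical, and SSSP is left out because no Neo4j timing is available for it.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class GraphalyticsSpeedup {
    // Ratio of Neo4j time to ArcadeDB time; values above 1 mean ArcadeDB was faster.
    public static double speedup(double arcadeSeconds, double neo4jSeconds) {
        return neo4jSeconds / arcadeSeconds;
    }

    public static void main(String[] args) {
        // {ArcadeDB seconds, Neo4j seconds}, copied from the table above.
        Map<String, double[]> timings = new LinkedHashMap<>();
        timings.put("PageRank", new double[] {0.48, 11.15});
        timings.put("WCC", new double[] {0.30, 0.75});
        timings.put("BFS", new double[] {0.13, 1.91});
        timings.put("LCC", new double[] {27.41, 45.78});
        timings.put("CDLP", new double[] {3.67, 6.43});

        for (Map.Entry<String, double[]> row : timings.entrySet()) {
            double[] t = row.getValue();
            System.out.printf(Locale.ROOT, "%-8s %.1fx%n", row.getKey(), speedup(t[0], t[1]));
        }
    }
}
```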
Connecting an existing Neo4j application
The quickest path is the Bolt server: keep using Neo4j’s official drivers and only change the connection URL and credentials.
import java.util.Map;
import org.neo4j.driver.*;

// Existing Neo4j code works unchanged when pointed at ArcadeDB's Bolt port.
Driver driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    AuthTokens.basic("root", "arcadedb_password"),
    Config.builder().withoutEncryption().build());

try (Session session = driver.session(SessionConfig.forDatabase("mydb"))) {
  Result rs = session.run(
      "MATCH (p:Person)-[:KNOWS]->(f) WHERE p.name = $n RETURN f.name AS name",
      Map.of("n", "Alice"));
  rs.list().forEach(r -> System.out.println(r.get("name").asString()));
}

// Release the driver's connections once the application is done with it.
driver.close();
See Neo4j Bolt Protocol Plugin for server configuration and supported clients.
Importing a Neo4j database
ArcadeDB ships a Neo4j importer that reads APOC’s JSONL export format and rebuilds the graph on top of ArcadeDB’s storage.
- Export the source database from Neo4j using APOC:
CALL apoc.export.json.all("neo4j-export.jsonl", {})
APOC writes one JSON object per line, vertices first, then relationships:
{"type":"node","id":"0","labels":["User"],"properties":{"name":"Adam","age":42}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"relationship","label":"KNOWS","properties":{"since":1993},"start":{"id":"0","labels":["User"]},"end":{"id":"1","labels":["User"]}}
- Import into ArcadeDB through the console:
> CREATE DATABASE MyDatabase
{MyDatabase}> IMPORT DATABASE file:///path/to/neo4j-export.jsonl
or programmatically through the Java API:
Neo4jImporter importer = new Neo4jImporter(
    "-i", "/path/to/neo4j-export.jsonl",
    "-d", "./databases/MyDatabase",
    "-o"); // overwrite if exists
importer.run();
The importer runs three passes: schema reconciliation, vertex creation, and edge creation.
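Before a large import it can help to know how many nodes and relationships the export contains, for example to size the heap for the ID mapping. The sketch below is a standalone helper, not part of ArcadeDB: the class `JsonlRecordCounter` is hypothetical, and the regular expression assumes that "type" is the first key on every line, as in the sample export above. A general-purpose tool would use a real JSON parser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class JsonlRecordCounter {
    // Matches lines that start with {"type":"node" or {"type":"relationship".
    private static final Pattern TYPE =
        Pattern.compile("^\\s*\\{\\s*\"type\"\\s*:\\s*\"(node|relationship)\"");

    // Returns {nodeCount, relationshipCount}; lines of any other shape are ignored.
    public static long[] count(Iterable<String> lines) {
        long nodes = 0, relationships = 0;
        for (String line : lines) {
            Matcher m = TYPE.matcher(line);
            if (!m.find())
                continue;
            if (m.group(1).equals("node"))
                nodes++;
            else
                relationships++;
        }
        return new long[] {nodes, relationships};
    }

    public static void main(String[] args) throws IOException {
        long[] c;
        if (args.length == 1) {
            // Stream the file so that large exports are never loaded fully into memory.
            try (Stream<String> lines = Files.lines(Path.of(args[0]))) {
                c = count(lines::iterator);
            }
        } else {
            // No file given: count the three sample lines shown above.
            c = count(List.of(
                "{\"type\":\"node\",\"id\":\"0\",\"labels\":[\"User\"],\"properties\":{\"name\":\"Adam\",\"age\":42}}",
                "{\"type\":\"node\",\"id\":\"1\",\"labels\":[\"User\"],\"properties\":{\"name\":\"Jim\",\"age\":42}}",
                "{\"type\":\"relationship\",\"label\":\"KNOWS\",\"properties\":{\"since\":1993},"
                    + "\"start\":{\"id\":\"0\",\"labels\":[\"User\"]},\"end\":{\"id\":\"1\",\"labels\":[\"User\"]}}"));
        }
        System.out.println(c[0] + " nodes, " + c[1] + " relationships");
    }
}
```

Run it with the export path as the only argument; without arguments it counts the sample lines.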
Multi-label nodes
Neo4j allows a single node to carry several labels (for example [User, Administrator]). ArcadeDB models the same idea with type inheritance: the importer creates a synthetic type Administrator~User (labels sorted alphabetically and joined with ~) that extends both Administrator and User. Polymorphic queries work as expected — SELECT FROM User returns ordinary User vertices and every Administrator~User vertex.
Schema and constraints
Neo4j’s CREATE CONSTRAINT … REQUIRE … IS UNIQUE translates to a small block of ArcadeDB SQL DDL:
-- Neo4j
-- CREATE CONSTRAINT FOR (p:Person) REQUIRE p.id IS UNIQUE
CREATE VERTEX TYPE Person
CREATE EDGE TYPE KNOWS
CREATE PROPERTY Person.id LONG
CREATE INDEX ON Person (id) UNIQUE
Embedding ArcadeDB
The native Java API skips the Bolt round-trip entirely and gives you full multi-model access in-process. Embedded mode is part of the Apache 2.0 distribution; no Enterprise licence is required.
Database database = new DatabaseFactory("./databases/mydb").open();

database.transaction(() -> {
  MutableVertex alice = database.newVertex("Person");
  alice.set("name", "Alice");
  alice.set("born", 1985);
  alice.save();

  MutableVertex bob = database.newVertex("Person");
  bob.set("name", "Bob");
  bob.set("born", 1990);
  bob.save();

  alice.newEdge("KNOWS", bob, "since", 2015);
});

// Native graph traversal: no string queries, no parser overhead.
// The key lookup assumes the Person and KNOWS types and an index on Person.name already exist.
Vertex alice = database.lookupByKey("Person", "name", "Alice").next().asVertex();
for (Vertex friend : alice.getVertices(Vertex.DIRECTION.OUT, "KNOWS"))
  System.out.println(friend.getString("name"));
For sustained inserts, use the asynchronous API to fan operations across all available threads:
database.async().onError(Throwable::printStackTrace);

for (int i = 0; i < 1_000_000; i++) {
  MutableVertex v = database.newVertex("Person");
  v.set("id", i);
  v.set("name", "Person_" + i);
  database.async().createRecord(v, null);
}

// Block until all queued asynchronous operations have been executed.
database.async().waitCompletion();
Further reading
- Cypher in ArcadeDB: language overview and compatibility notes.
- Cypher Compatibility: the exact subset of OpenCypher TCK ArcadeDB passes.
- Bolt Protocol: wire-protocol setup for Neo4j drivers.
- Neo4j Importer: full reference for the JSONL importer used above.
- Graph OLAP Engine: the analytics engine behind the benchmark numbers.