Neo4j Importer

Migrating from Neo4j? The ArcadeDB Academy offers a free course on Neo4j-to-ArcadeDB migration with hands-on exercises and a certification at the end.

ArcadeDB can import a database exported from Neo4j in JSONL format (one JSON object per line).

To export a Neo4j database, follow the instructions in Export in JSON. The resulting file contains one JSON object per line.

Performance

The Neo4j importer uses the high-performance GraphBatch API internally:

Vertices are created with pre-allocated edge segments, eliminating lazy allocation during edge creation.
Edges are buffered in flat primitive arrays and flushed sorted by source vertex, converting random I/O into sequential I/O. Light edges (no edge record on disk) are used when an edge has no properties.
WAL is disabled during import for maximum throughput.
ID mapping uses a primitive long[]-based hash map when Neo4j IDs are numeric (the common case with APOC exports), using only ~24 bytes per vertex. For non-numeric IDs, the importer automatically falls back to a standard HashMap. This makes it possible to import databases with hundreds of millions of vertices within a few gigabytes of heap.
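
The edge-buffering point above can be illustrated with a small standalone sketch. This is not the GraphBatch implementation: the class EdgeBuffer and its methods are hypothetical names chosen for illustration, and vertex positions are modelled as plain long values.

```java
import java.util.Arrays;

/** Illustrative only: buffers edges in primitive arrays and emits them ordered by source. */
final class EdgeBuffer {
  private long[] sources = new long[16];
  private long[] targets = new long[16];
  private int size = 0;

  /** Appends one edge, doubling the arrays when they are full. */
  void add(long source, long target) {
    if (size == sources.length) {
      sources = Arrays.copyOf(sources, size * 2);
      targets = Arrays.copyOf(targets, size * 2);
    }
    sources[size] = source;
    targets[size] = target;
    size++;
  }

  /** Returns the buffered edges as {source, target} pairs ordered by source, then empties the buffer. */
  long[][] flushSortedBySource() {
    // Sort an index permutation so the two arrays stay aligned.
    Integer[] order = new Integer[size];
    for (int i = 0; i < size; i++)
      order[i] = i;
    // Arrays.sort on objects is stable: edges sharing a source keep insertion order.
    Arrays.sort(order, (a, b) -> Long.compare(sources[a], sources[b]));

    long[][] out = new long[size][];
    for (int i = 0; i < size; i++)
      out[i] = new long[] { sources[order[i]], targets[order[i]] };
    size = 0;
    return out;
  }
}
```

Writing edges grouped by source means all edges of one vertex are appended together, which is the property that turns scattered writes into sequential ones.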

Multi-label handling

Neo4j supports multiple labels per node, while in ArcadeDB a vertex has exactly one type. The Neo4j importer simulates multiple labels by creating a new type named <label1>[~<labelN>]*. Example:

{"type":"node","id":"1","labels":["User", "Administrator"],"properties":{"name":"Jim","age":42}}

This vertex will be created in ArcadeDB with type "Administrator~User" (the labels are always sorted alphabetically) that extends both "Administrator" and "User" types.

This lets you rely on ArcadeDB’s polymorphism: a query on type "User" returns the records of User and of all its subtypes.
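
The naming rule can be sketched in a few lines of Java. The class and method names are hypothetical, and the sketch assumes plain String ordering; the importer's exact comparison rules for mixed-case or non-ASCII labels are not stated here.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: derives the composite type name for a multi-label node. */
final class MultiLabel {
  /** Sorts the labels alphabetically and joins them with '~'. */
  static String compositeTypeName(List<String> labels) {
    return labels.stream().sorted().collect(Collectors.joining("~"));
  }

  public static void main(String[] args) {
    // Labels from the example node above.
    System.out.println(compositeTypeName(List.of("User", "Administrator"))); // Administrator~User
  }
}
```

A node with a single label keeps that label as its type name, since there is nothing to join.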

Importing via SQL

To import a database, use the Import Database command from the API, Studio, or Console. The example below imports Neo4j’s PanamaPapers database using the ArcadeDB Console.

> CREATE DATABASE PanamaPapers
{PanamaPapers}> IMPORT DATABASE file:///temp/panama-papers-neo4j.jsonl

ArcadeDB 26.5.1 - Neo4j Importer
Importing Neo4j database from file 'panama-papers-neo4j.jsonl' to 'databases/PanamaPapers'
- Creation of the schema: types, properties and indexes
- Creation of vertices started
- Creation of vertices completed: created 3 vertices, skipped 1 edges (0 vertices/sec elapsed=0 secs)
- ID mapping mode: numeric (primitive long[])
- Creation of edges started: creating edges between vertices
- Creation of edges completed: created 1 edges, (0 edges/sec elapsed=0 secs)
***************************************************************************************************
Import of Neo4j database completed in 0 secs with 0 errors and 0 warnings.

Importing via command line

The Neo4j importer can also be used directly from the command line:

java -cp "lib/*" com.arcadedb.integration.importer.Neo4jImporter -i <input-file> -d <database-path> [options]

Options:

Option	Default	Description
`-i <file>`		Path to the Neo4j JSONL export file (required)
`-d <path>`		Path where the ArcadeDB database will be created (required)
`-o`	false	Overwrite the database if it already exists
`-b <size>`	10,000	Number of records per transaction batch
`-decimalType <type>`	DECIMAL	Type for decimal values: FLOAT, DOUBLE, or DECIMAL
`-bucketBits <n>`	10	Bits allocated for bucket IDs in the internal RID packing. The default supports up to 1,023 buckets, which is sufficient for most databases. Increase this value only if you have a very large number of types and buckets (e.g. `-bucketBits 16` supports up to 65,535 buckets)
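
The two bucket limits quoted for -bucketBits are consistent with a limit of 2^n - 1. The sketch below assumes that formula, which is inferred from the figures in the table rather than taken from the importer's source; the class name is hypothetical.

```java
/** Illustrative only: bucket limit implied by a given -bucketBits value. */
final class BucketBits {
  /** Assumes the limit is 2^bits - 1, matching the 1,023 and 65,535 figures in the table. */
  static long maxBuckets(int bits) {
    return (1L << bits) - 1;
  }

  public static void main(String[] args) {
    System.out.println(maxBuckets(10)); // 1023
    System.out.println(maxBuckets(16)); // 65535
  }
}
```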

Example:

java -cp "lib/*" com.arcadedb.integration.importer.Neo4jImporter \
  -i /data/neo4j-export.jsonl -d /data/arcadedb/mydb -o -decimalType double

Memory considerations

For large imports (hundreds of millions of vertices), the main memory consumer is the ID mapping table that translates Neo4j node IDs to ArcadeDB record IDs. The table size depends on the ID format:

ID format	Memory per vertex	Example: 100M vertices
Numeric (e.g. "0", "12345")	~24 bytes	~2.2 GB
String (e.g. "node-abc")	~140 bytes	~13 GB

Neo4j APOC exports use numeric IDs by default, so most imports will use the compact primitive map. If the importer encounters a non-numeric ID, it automatically migrates to the string-based map and logs a message:

- Non-numeric Neo4j ID detected, switching to string-based ID mapping
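
The fallback behaviour can be pictured with the following sketch. It is an assumption-laden illustration, not the importer's code: IdMapping is a hypothetical class, a HashMap<Long, Long> stands in for the primitive long[] map, and IDs are assumed to be in canonical decimal form (an ID such as "007" would not survive the migration unchanged).

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: maps source IDs to positions, switching to string keys on demand. */
final class IdMapping {
  private Map<Long, Long> numeric = new HashMap<>(); // stand-in for the primitive map
  private Map<String, Long> strings = null;          // created on the first non-numeric ID

  void put(String id, long position) {
    if (strings == null) {
      try {
        numeric.put(Long.parseLong(id), position);
        return;
      } catch (NumberFormatException e) {
        migrateToStrings(); // first non-numeric ID: switch modes once
      }
    }
    strings.put(id, position);
  }

  /** Returns the stored position, or null when the ID is unknown. */
  Long get(String id) {
    if (strings != null)
      return strings.get(id);
    try {
      return numeric.get(Long.parseLong(id));
    } catch (NumberFormatException e) {
      return null;
    }
  }

  boolean isNumericMode() {
    return strings == null;
  }

  /** Copies every numeric entry into the string map and drops the numeric map. */
  private void migrateToStrings() {
    strings = new HashMap<>();
    for (Map.Entry<Long, Long> e : numeric.entrySet())
      strings.put(Long.toString(e.getKey()), e.getValue());
    numeric = null;
  }
}
```

The migration happens at most once per import, so its cost is a single pass over the entries seen so far.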

For very large imports, allocate enough heap memory. For example, to import a database with 500M vertices using numeric IDs, you would need approximately 12 GB for the ID mapping table alone, plus memory for ArcadeDB’s internal buffers. A setting of -Xmx24G or more is recommended.
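
The sizing arithmetic is a simple multiplication and can be reproduced as follows. The per-vertex constants are the approximate figures from the table above; the class and method names are hypothetical. The table's 100M-vertex figures match binary gigabytes (GiB), while the 12 GB figure for 500M vertices matches decimal gigabytes, so the sketch exposes both.

```java
/** Illustrative only: rough size of the ID mapping table for a given vertex count. */
final class IdMapMemory {
  static final long NUMERIC_BYTES_PER_VERTEX = 24;  // approximate, from the table above
  static final long STRING_BYTES_PER_VERTEX = 140;  // approximate, from the table above

  /** Total bytes for the mapping table; throws ArithmeticException on overflow. */
  static long estimateBytes(long vertices, long bytesPerVertex) {
    return Math.multiplyExact(vertices, bytesPerVertex);
  }

  static double toGiB(long bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
  }

  static double toGB(long bytes) {
    return bytes / 1e9;
  }

  public static void main(String[] args) {
    long bytes = estimateBytes(500_000_000L, NUMERIC_BYTES_PER_VERTEX);
    System.out.println(bytes + " bytes = " + toGB(bytes) + " GB = " + toGiB(bytes) + " GiB");
  }
}
```

The result covers the mapping table only; heap for ArcadeDB's internal buffers has to be added on top, which is why the recommended -Xmx value is well above the estimate.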

Differences with Neo4j

ArcadeDB is fully compatible with Neo4j at the wire-protocol and query-language level — your existing applications can connect to ArcadeDB by changing the connection URL alone — but the engine underneath is multi-model, embeddable, and Apache 2.0 licensed. This page summarises what carries over, what is new, and how to migrate.

What’s compatible

ArcadeDB ships several drop-in compatibility layers so existing Neo4j codebases keep working with minimal changes:

OpenCypher — ArcadeDB implements OpenCypher (97.8% TCK pass rate) on every supported data model. Most Cypher queries written against Neo4j run unmodified.
Bolt protocol — ArcadeDB exposes a Bolt server compatible with Bolt v3.0, v4.0, and v4.4. Official Neo4j drivers (Java, Python, JavaScript, .NET, Go, Ruby) connect by switching the URL and credentials; existing application code stays untouched.
APOC — A subset of Neo4j’s APOC procedures is available through ArcadeDB’s Extended Functions.

Where ArcadeDB goes further

Beyond Neo4j compatibility, ArcadeDB adds capabilities that typically require multiple databases on the Neo4j stack:

Multi-model — graph, document, key/value, search, time-series, vector, and geospatial data live in a single engine and a single transaction. Neo4j is graph-only.
Multi-language — beyond Cypher, ArcadeDB speaks SQL, Gremlin, GraphQL, MongoDB Query Language, and Redis commands.
Native vector search — built-in JVector (DiskANN + HNSW with SIMD acceleration). No external vector database required.
Built-in full-text search with fuzzy matching, integrated with the query languages.
High Availability and replication are part of the open-source distribution. Neo4j requires the (paid) Enterprise edition for HA.
Embeddable — ArcadeDB runs inside a JVM application with a few-megabyte footprint (as low as 16 MB heap). Neo4j Embedded is Enterprise-only.
Apache 2.0 licence — free for commercial use, no copyleft. Neo4j Community is GPL, which forces distributed applications to publish their source.

Performance comparison

The LDBC Graphalytics benchmark shows the following timings on identical workloads (lower is better):

Algorithm	ArcadeDB	Neo4j
PageRank	0.48 s	11.15 s
Weakly Connected Components (WCC)	0.30 s	0.75 s
Breadth-First Search (BFS)	0.13 s	1.91 s
Local Clustering Coefficient (LCC)	27.41 s	45.78 s
Single-Source Shortest Path (SSSP)	3.53 s	not available
Community Detection (CDLP)	3.67 s	6.43 s

ArcadeDB’s Graph OLAP engine delivers up to 400× faster analytics than Neo4j on the same hardware.

Connecting an existing Neo4j application

The quickest path is the Bolt server: keep using Neo4j’s official drivers and only change the connection URL and credentials.

// Existing Neo4j code works unchanged when pointed at ArcadeDB's Bolt port.
Driver driver = GraphDatabase.driver(
    "bolt://localhost:7687",
    AuthTokens.basic("root", "arcadedb_password"),
    Config.builder().withoutEncryption().build());

try (Session session = driver.session(SessionConfig.forDatabase("mydb"))) {
  Result rs = session.run(
      "MATCH (p:Person)-[:KNOWS]->(f) WHERE p.name = $n RETURN f.name AS name",
      Map.of("n", "Alice"));
  rs.list().forEach(r -> System.out.println(r.get("name").asString()));
}

See Neo4j Bolt Protocol Plugin for server configuration and supported clients.

Importing a Neo4j database

ArcadeDB ships a Neo4j importer that reads APOC’s JSONL export format and rebuilds the graph on top of ArcadeDB’s storage.

Export the source database from Neo4j using APOC:

CALL apoc.export.json.all("neo4j-export.jsonl", {})

APOC writes one JSON object per line — vertices first, then relationships:

{"type":"node","id":"0","labels":["User"],"properties":{"name":"Adam","age":42}}
{"type":"node","id":"1","labels":["User"],"properties":{"name":"Jim","age":42}}
{"type":"relationship","label":"KNOWS","properties":{"since":1993},"start":{"id":"0","labels":["User"]},"end":{"id":"1","labels":["User"]}}
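
Before importing a large export, it can be useful to check how many nodes and relationships the file contains. The sketch below is a rough pre-flight count and not part of ArcadeDB: ExportScan is a hypothetical class, and it assumes each record sits on one line with "type" as the first key, as in the sample records. A real validator would use a JSON parser instead of a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: counts record kinds in an APOC-style JSONL export. */
final class ExportScan {
  // Matches a leading "type" field; assumes it is the first key of each object.
  private static final Pattern TYPE =
      Pattern.compile("^\\s*\\{\\s*\"type\"\\s*:\\s*\"(node|relationship)\"");

  /** Returns {nodes, relationships, unrecognised}; blank lines are ignored. */
  static int[] count(Iterable<String> lines) {
    int[] counts = new int[3];
    for (String line : lines) {
      if (line.isBlank())
        continue;
      Matcher m = TYPE.matcher(line);
      if (!m.find())
        counts[2]++;
      else if (m.group(1).equals("node"))
        counts[0]++;
      else
        counts[1]++;
    }
    return counts;
  }
}
```

To scan a file, the lines can be supplied with java.nio.file.Files.readAllLines for small exports, or read through a BufferedReader for large ones.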

Import into ArcadeDB through the console:

> CREATE DATABASE MyDatabase
{MyDatabase}> IMPORT DATABASE file:///path/to/neo4j-export.jsonl

or programmatically through the Java API:

Neo4jImporter importer = new Neo4jImporter(
    "-i", "/path/to/neo4j-export.jsonl",
    "-d", "./databases/MyDatabase",
    "-o");                              // overwrite if exists
importer.run();

The importer runs three passes: schema reconciliation, vertex creation, and edge creation.

Multi-label nodes

Neo4j allows a single node to carry several labels (for example [User, Administrator]). ArcadeDB models the same idea with type inheritance: the importer creates a synthetic type Administrator~User (labels sorted alphabetically and joined with ~) that extends both Administrator and User. Polymorphic queries work as expected — SELECT FROM User returns ordinary User vertices and every Administrator~User vertex.

Schema and constraints

Neo4j’s CREATE CONSTRAINT … REQUIRE … IS UNIQUE translates to a small block of ArcadeDB SQL DDL:

-- Neo4j
-- CREATE CONSTRAINT FOR (p:Person) REQUIRE p.id IS UNIQUE

CREATE VERTEX TYPE Person
CREATE EDGE TYPE KNOWS
CREATE PROPERTY Person.id LONG
CREATE INDEX ON Person (id) UNIQUE

Embedding ArcadeDB

The native Java API skips the Bolt round-trip entirely and gives you full multi-model access in-process. Embedded mode is part of the Apache 2.0 distribution; no Enterprise licence is required.

Database database = new DatabaseFactory("./databases/mydb").open();

database.transaction(() -> {
  MutableVertex alice = database.newVertex("Person");
  alice.set("name", "Alice");
  alice.set("born", 1985);
  alice.save();

  MutableVertex bob = database.newVertex("Person");
  bob.set("name", "Bob");
  bob.set("born", 1990);
  bob.save();

  alice.newEdge("KNOWS", bob, "since", 2015);
});

// Native graph traversal — no string queries, no parser overhead.
Vertex alice = database.lookupByKey("Person", "name", "Alice").next().asVertex();
for (Vertex friend : alice.getVertices(Vertex.DIRECTION.OUT, "KNOWS"))
  System.out.println(friend.getString("name"));

For sustained inserts, use the asynchronous API to fan operations across all available threads:

database.async().onError(Throwable::printStackTrace);
for (int i = 0; i < 1_000_000; i++) {
  MutableVertex v = database.newVertex("Person");
  v.set("id", i);
  v.set("name", "Person_" + i);
  database.async().createRecord(v, null);
}
