Neo4j crashes on batch import

Submitted by 魔方 西西 on 2021-02-10 17:50:25

Question


I'm importing nodes that are all part of one MERGE-and-relationship-creation statement, but Neo4j is crashing with a StackOverflowError or with: "ERROR (-v for expanded information): Error unmarshaling return header; nested exception is: java.net.SocketException: Software caused connection abort: recv failed"

I admit my approach may be faulty, but I have some (A) nodes with ~8000 relationships to nodes of type (B), and (B) nodes have ~7000 relationships to other (A) nodes.

I basically have one big MERGE statement that creates the (A) and (B) nodes, with a CREATE UNIQUE at the end that does all the relationship creation. I store all this Cypher in a file and import it through the Neo4j shell.

Example:

MERGE (foo:A { id:'blah'})
MERGE (bar:B {id:'blah2'})
MERGE (bar2:B1 {id:'blah3'})
MERGE (bar3:B3 {id:'blah3'})
MERGE (foo2:A1 {id:'blah4'})
... // thousands more of these
CREATE UNIQUE foo-[:x]->bar,  bar-[:y]->foo2, // hundreds more of these

Is there a better way to do this? I was trying to avoid creating all the MERGE statements and then matching each one to create the relationships in a separate query. I get really slow import performance either way: splitting each MERGE into its own transaction is slow (a 2-hour import for 60K nodes/relationships), and the current approach crashes Neo4j.

The current one-big-MERGE/CREATE UNIQUE approach works for the first big insert, but fails after that, when the next big insert uses 5000 nodes and 8000 relationships. Here is the result of the first big merge:

Nodes created: 756
Relationships created: 933
Properties set: 5633
Labels added: 756
15101 ms

I'm using a Windows 7 machine with 8GB RAM. In my neo4j-wrapper.conf I use:

wrapper.java.initmemory=512
wrapper.java.maxmemory=2048

Answer 1:


There are 3 things that might help:

  1. If you don't really need MERGE, use a plain CREATE instead. CREATE is more efficient because it doesn't have to check for existing nodes and relationships.

  2. Make sure your indexes are correct: you need an index (or unique constraint) on the property each MERGE matches on, otherwise every MERGE scans the whole label.

  3. You now have everything in 1 big transaction. You mention the alternative of putting every statement in its own transaction. Neither works for you. Instead, batch, say, 100 statements per transaction. That should be quicker than 1 statement per transaction, while still using far less memory than 1 big transaction.
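Points 2 and 3 together might look like the sketch below for the Neo4j shell (Neo4j 2.x syntax; the labels, id values, and batch size of 100 are illustrative, taken from the question and the answer's suggestion):

```cypher
// Point 2: schema indexes on the property that MERGE matches on,
// so each MERGE is an index lookup instead of a full label scan.
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);

// Point 3: in the import file, group roughly 100 statements per
// transaction using the shell's begin/commit commands, instead of
// one giant statement or one transaction per statement.
begin
MERGE (:A {id:'blah'});
MERGE (:B {id:'blah2'});
// ... ~100 statements ...
commit
begin
// ... next batch of ~100 statements ...
commit
```

The relationship-creating CREATE UNIQUE statements can be batched the same way, each batch first MATCHing the nodes it connects by their indexed id property.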



Source: https://stackoverflow.com/questions/27439424/neo4j-crashes-on-batch-import
