Neo4j crashes on batch import

Submitted by 魔方 西西 on 2021-02-10 17:50:25

Question


I'm importing nodes that are all part of one MERGE-and-relationship-creation statement, but Neo4j is crashing with a StackOverflowError or with: "ERROR (-v for expanded information): Error unmarshaling return header; nested exception is: java.net.SocketException: Software caused connection abort: recv failed"

I admit my approach may be faulty, but I have some (A) nodes with ~8000 relationships to nodes of type (B), and (B) nodes have ~7000 relationships to other (A) nodes.

I basically have one big MERGE statement that creates the (A) and (B) nodes, with a CREATE UNIQUE at the end that does all the relationship creation. I store all this Cypher in a file and import it through the Neo4j shell.

Example:

MERGE (foo:A { id:'blah'})
MERGE (bar:B {id:'blah2'})
MERGE (bar2:B1 {id:'blah3'})
MERGE (bar3:B3 {id:'blah3'})
MERGE (foo2:A1 {id:'blah4'})
... // thousands more of these
CREATE UNIQUE foo-[:x]->bar,  bar-[:y]->foo2, // hundreds more of these

Is there a better way to do this? I was trying to avoid creating all the MERGE statements and then matching each one to create the relationships in a separate query. I get really slow import performance either way: splitting each MERGE into its own transaction is slow (a 2-hour import for 60K nodes/relationships), and the current approach crashes Neo4j.

The current one-big-MERGE/CREATE UNIQUE approach works for the first big insert, but fails after that, when the next big insert uses 5000 nodes and 8000 relationships. Here is the result of the first big merge:

Nodes created: 756
Relationships created: 933
Properties set: 5633
Labels added: 756
15101 ms

I'm using a Windows 7 machine with 8GB RAM. In my neo4j-wrapper.conf I use:

wrapper.java.initmemory=512
wrapper.java.maxmemory=2048

Answer 1:


There are 3 things that might help:

  1. If you don't really need MERGE, use a plain CREATE instead. CREATE is more efficient because it doesn't have to check for existing nodes and relationships.

  2. Make sure your indexes are correct: you need an index (or unique constraint) on the property each MERGE matches on, otherwise every MERGE scans the whole label.

  3. You now have everything in 1 big transaction. You mention the alternative of putting every statement in its own transaction. Neither works for you. Instead, batch, say, 100 statements per transaction. That should be quicker than 1 statement per transaction, while still using far less memory than 1 big transaction.
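Points 2 and 3 together might look like the sketch below for the Neo4j shell (Neo4j 2.x syntax; the labels, id values, and batch size of 100 are illustrative, taken from the question and the answer's suggestion):

```cypher
// Point 2: schema indexes on the property that MERGE matches on,
// so each MERGE is an index lookup instead of a full label scan.
CREATE INDEX ON :A(id);
CREATE INDEX ON :B(id);

// Point 3: in the import file, group roughly 100 statements per
// transaction using the shell's begin/commit commands, instead of
// one giant statement or one transaction per statement.
begin
MERGE (:A {id:'blah'});
MERGE (:B {id:'blah2'});
// ... ~100 statements ...
commit
begin
// ... next batch of ~100 statements ...
commit
```

The relationship-creating CREATE UNIQUE statements can be batched the same way, each batch first MATCHing the nodes it connects by their indexed id property.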



Source: https://stackoverflow.com/questions/27439424/neo4j-crashes-on-batch-import
