Import of large dataset in Neo4j taking really long (>12 hours) with Neo4j import tool


I have a large dataset (about 1B nodes and a few billion relationships) that I am trying to import into Neo4j. I am using the Neo4j import tool. The nodes finished importing

2 Answers

臣服心动

2021-01-20 09:44

I think I found the issue. I was following some of the tips here:
http://neo4j.com/developer/guide-import-csv/#_super_fast_batch_importer_for_huge_datasets where it says you can reuse the same CSV file with different headers, once for nodes and once for relationships. I underestimated how one-to-many my data was, so the node pass saw a lot of duplicate IDs. Almost all of the time in that stage went into sorting and then deduplicating them. Reworking my queries to extract the data into separate node and relationship files fixed the problem. Thanks for looking into this!
So ideally, always having a separate file for each type of node and relationship gives the fastest results (at least in my tests).
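
To illustrate the split described above, here is a minimal Python sketch, not the code I actually ran. It assumes a hypothetical denormalized extract with one row per (person, company) pair, so a person with several companies appears several times. The column names, the `WORKS_AT` type and the `split_nodes_and_rels` helper are made up for the example; only the `:ID`, `:START_ID`, `:END_ID` and `:TYPE` header fields come from the import tool's CSV format. Company nodes would need their own file, produced the same way.

```python
import csv
import io


def split_nodes_and_rels(rows, node_out, rel_out):
    """Write one deduplicated node file and one relationship file.

    rows: iterable of (person_id, name, company_id) tuples
          (hypothetical layout; one row per relationship).
    Returns the number of distinct person nodes written.
    """
    node_writer = csv.writer(node_out, lineterminator="\n")
    rel_writer = csv.writer(rel_out, lineterminator="\n")

    # Header rows in the import tool's CSV header format.
    node_writer.writerow(["personId:ID", "name"])
    rel_writer.writerow([":START_ID", ":END_ID", ":TYPE"])

    seen = set()  # person IDs already written to the node file
    for person_id, name, company_id in rows:
        if person_id not in seen:
            seen.add(person_id)
            node_writer.writerow([person_id, name])
        # Every input row is one relationship, so nothing is dropped here.
        rel_writer.writerow([person_id, company_id, "WORKS_AT"])
    return len(seen)


# Tiny demo: "p1" appears twice in the input because it has two companies.
rows = [("p1", "Ann", "c1"), ("p1", "Ann", "c2"), ("p2", "Bob", "c1")]
node_buf, rel_buf = io.StringIO(), io.StringIO()
node_count = split_nodes_and_rels(rows, node_buf, rel_buf)
nodes_csv = node_buf.getvalue()
rels_csv = rel_buf.getvalue()
```

The in-memory `set` is only workable for small inputs. At a billion nodes, the deduplication is better done in the extract query itself (for example with a `DISTINCT` on the node columns), which is effectively what I ended up doing.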

感动是毒

2021-01-20 09:52

Have a look at the batch importer I wrote for a stress test:

https://github.com/graphaware/neo4j-stress-test

I used both a Neo4j index and an in-memory map between two commits. It is really fast and works with both versions of Neo4j.

Ignore the tests and get the batch importer.
