Importing a large dataset into Neo4j with the Neo4j import tool takes very long (>12 hours)

臣服心动 2021-01-20 09:28

I have a large dataset (about 1B nodes and a few billion relationships) that I am trying to import into Neo4j. I am using the Neo4j import tool. The nodes finished importing

2 Answers

醉话见心 (OP)

2021-01-20 09:51

I think I found the issue. I was following some of the tips at http://neo4j.com/developer/guide-import-csv/#_super_fast_batch_importer_for_huge_datasets, which say you can reuse the same CSV file with different headers: once for nodes and once for relationships. I underestimated how one-to-many my data was, which produced a lot of duplicate IDs, and almost all of that import stage was spent sorting and then deduplicating them. Reworking my queries to extract the data into separate node and relationship files fixed the problem. Thanks for looking into this!
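Before reusing one extract as both a node file and a relationship file, it can help to check how one-to-many it is on the ID column. The sketch below is a minimal illustration, assuming a CSV extract with a header row; the column names (`orderId`, `customerId`) and the helper `duplicate_id_stats` are hypothetical. It keeps every distinct ID in memory, so it is meant for a sample of the data, not the full billion rows.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def duplicate_id_stats(f: TextIO, id_column: str) -> tuple[int, int]:
    """Return (total_rows, distinct_ids) for one column of a CSV file.

    Hypothetical helper: a large gap between the two numbers means the file is
    one-to-many on that column, so using it as a node file would give the
    importer many duplicate IDs to sort and drop.
    """
    seen: set[str] = set()  # every distinct ID is held in memory
    total = 0
    for row in csv.DictReader(f):
        seen.add(row[id_column])
        total += 1
    return total, len(seen)


if __name__ == "__main__":
    # Hypothetical extract: one row per order, so customerId repeats.
    sample = io.StringIO("orderId,customerId\no1,c1\no2,c1\no3,c2\n")
    total, distinct = duplicate_id_stats(sample, "customerId")
    print(total, distinct)  # 3 2
```

Here 3 rows but only 2 distinct `customerId` values means one duplicate node ID per repeated customer; on real data the ratio gives a rough sense of how much sorting and deduplication the importer would face.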
So, ideally, always use separate files for each node type and each relationship type; that gave the fastest results (at least in my tests).
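As a rough illustration of "separate files for each node type and relationship type", the sketch below splits a denormalized extract into a `Customer` node file, an `Order` node file and a `PLACED` relationship file, writing each node only once. The input columns, labels and relationship type are hypothetical; the header fields (`:ID(...)`, `:LABEL`, `:START_ID(...)`, `:END_ID(...)`, `:TYPE`) follow the Neo4j import CSV header format. The `seen` sets grow with the number of distinct IDs, so at the scale in the question the deduplication is better done in the extraction queries themselves, as the poster ended up doing.

```python
from __future__ import annotations

import csv
import io
from typing import TextIO


def split_rows(source: TextIO, customers_out: TextIO,
               orders_out: TextIO, placed_out: TextIO) -> None:
    """Split a hypothetical customer/order extract into import files.

    Assumed input columns: customerId, customerName, orderId.
    Each node is written once, so the importer gets no duplicate IDs.
    """
    customers = csv.writer(customers_out)
    orders = csv.writer(orders_out)
    placed = csv.writer(placed_out)

    # Import header rows; the ID spaces keep customer and order IDs apart.
    customers.writerow(["customerId:ID(Customer)", "name", ":LABEL"])
    orders.writerow(["orderId:ID(Order)", ":LABEL"])
    placed.writerow([":START_ID(Customer)", ":END_ID(Order)", ":TYPE"])

    seen_customers: set[str] = set()
    seen_orders: set[str] = set()
    for row in csv.DictReader(source):
        cid, oid = row["customerId"], row["orderId"]
        if cid not in seen_customers:  # first occurrence only
            seen_customers.add(cid)
            customers.writerow([cid, row["customerName"], "Customer"])
        if oid not in seen_orders:
            seen_orders.add(oid)
            orders.writerow([oid, "Order"])
        # One relationship per input row.
        placed.writerow([cid, oid, "PLACED"])


if __name__ == "__main__":
    src = io.StringIO("customerId,customerName,orderId\n"
                      "c1,Ann,o1\nc1,Ann,o2\nc2,Bob,o3\n")
    c, o, p = io.StringIO(), io.StringIO(), io.StringIO()
    split_rows(src, c, o, p)
    print(c.getvalue())
```

The resulting files would then be passed to the import tool as separate node and relationship inputs (in Neo4j 4.x, something along the lines of `neo4j-admin import --nodes=customers.csv --nodes=orders.csv --relationships=placed.csv`). The option syntax has changed between major versions, so check the import documentation for the release you are using.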
