Import of large dataset in Neo4j taking really long (>12 hours) with Neo4j import tool

臣服心动 2021-01-20 09:28

I have a large dataset (about 1B nodes and a few billion relationships) that I am trying to import into Neo4j. I am using the Neo4j import tool. The nodes finished importing

2 answers
  •  醉话见心
    2021-01-20 09:51

    I think I found the issue. I was following some of the tips at
    http://neo4j.com/developer/guide-import-csv/#_super_fast_batch_importer_for_huge_datasets where it says you can reuse the same CSV file with different headers, once for nodes and once for relationships. I underestimated how one-to-many my data was, so the node IDs contained a lot of duplicates. That import stage was spent almost entirely on sorting and then deduplicating the IDs. Reworking my queries to extract the data into separate node and relationship files fixed the problem. Thanks for looking into this!
    So ideally, always use a separate file for each type of node and relationship. That gave the fastest results, at least in my tests.
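
As a rough illustration of the split described above, here is a minimal Python sketch. It assumes a hypothetical denormalized export with rows of `(person_id, name, order_id)`, where a person repeats once per order. The column names, the `Person` and `Order` ID spaces, and the `split_export` helper are made up for this example and do not come from the original post. Only the header conventions (`:ID`, `:START_ID`, `:END_ID`) follow the import tool's CSV header format.

```python
import csv
import io


def split_export(rows, nodes_file, rels_file):
    """Write one deduplicated node file and one relationship file.

    rows: iterable of (person_id, name, order_id) tuples (hypothetical schema).
    Returns the number of distinct person nodes written.
    """
    node_writer = csv.writer(nodes_file, lineterminator="\n")
    rel_writer = csv.writer(rels_file, lineterminator="\n")

    # Header rows in the import tool's CSV header format.
    node_writer.writerow(["personId:ID(Person)", "name"])
    rel_writer.writerow([":START_ID(Person)", ":END_ID(Order)"])

    seen = set()  # person IDs already written to the node file
    for person_id, name, order_id in rows:
        if person_id not in seen:
            seen.add(person_id)
            node_writer.writerow([person_id, name])
        # Every input row still yields exactly one relationship.
        rel_writer.writerow([person_id, order_id])
    return len(seen)


# Usage with in-memory files; "p1" appears twice in the input.
rows = [("p1", "Ann", "o1"), ("p1", "Ann", "o2"), ("p2", "Bob", "o3")]
nodes_csv, rels_csv = io.StringIO(), io.StringIO()
distinct = split_export(rows, nodes_csv, rels_csv)
```

The in-memory `seen` set is only practical for small samples. At the scale in the question (about 1B nodes), the deduplication would more plausibly happen in the extraction query itself, which is what the answer describes doing.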
