Import of large dataset in Neo4j taking really long (>12 hours) with Neo4j import tool

臣服心动 2021-01-20 09:28

I have a large dataset (about 1B nodes and a few billion relationships) that I am trying to import into Neo4j. I am using the Neo4j import tool. The nodes finished importing

2 answers
  • 2021-01-20 09:48

    Have a look at the batch importer I wrote for a stress test:

    https://github.com/graphaware/neo4j-stress-test

    I used both a Neo4j index and an in-memory map between two commits. It is really fast and works with both versions of Neo4j.

    Ignore the tests and get the batch importer.

  • 2021-01-20 09:51

    I think I found the issue. I was using some of the tips here
    http://neo4j.com/developer/guide-import-csv/#_super_fast_batch_importer_for_huge_datasets where it says you can reuse the same CSV file with different headers: once for nodes and once for relationships. I underestimated how one-to-many my data was, so the node input contained a lot of duplicate IDs. Almost all of the time in that stage was spent sorting and then deduplicating them. Reworking my queries so that the extracted data is split into separate node and relationship files fixed the problem. Thanks for looking into this!
    So, ideally, always use separate files for each type of node and relationship. That gave the fastest results, at least in my tests.
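To illustrate the split described above, here is a minimal sketch in Python. It is not the code I actually ran: the column names (`person_id`, `person_name`, `company_id`, `company_name`), the `Person`/`Company` ID spaces, and the `WORKS_AT` type are all made up for the example. It takes one denormalized extract in which a node repeats once per relationship and produces node files with each ID exactly once, plus a separate relationship file, using the `:ID`, `:START_ID`, `:END_ID` and `:TYPE` header fields the import tool reads.

```python
import csv
import io


def split_nodes_and_rels(rows):
    """Split denormalized (person, company) rows into unique nodes and relationships."""
    people = {}     # person_id -> name; the first occurrence wins
    companies = {}  # company_id -> name; the first occurrence wins
    rels = []       # one relationship per input row
    for row in rows:
        people.setdefault(row["person_id"], row["person_name"])
        companies.setdefault(row["company_id"], row["company_name"])
        rels.append((row["person_id"], row["company_id"], "WORKS_AT"))
    return people, companies, rels


def to_csv(header, records):
    """Render a header row and data rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical combined extract: p1 and c1 each appear twice (one-to-many data).
combined = """person_id,person_name,company_id,company_name
p1,Ann,c1,Acme
p1,Ann,c2,Globex
p2,Bob,c1,Acme
"""

rows = list(csv.DictReader(io.StringIO(combined)))
people, companies, rels = split_nodes_and_rels(rows)

# Node files carry each ID once; the relationship file keeps every row.
people_csv = to_csv(["personId:ID(Person)", "name"], people.items())
companies_csv = to_csv(["companyId:ID(Company)", "name"], companies.items())
rels_csv = to_csv([":START_ID(Person)", ":END_ID(Company)", ":TYPE"], rels)
```

An in-memory dict like this only works for small extracts. At the scale in the question (about 1B nodes) the deduplication would more realistically happen in the query that produces the extract, which is what I ended up doing.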
