Neo4j: Best way to batch relate nodes using Cypher?

不思量自难忘° 2021-01-25 21:22

When I run a script that tries to batch merge all nodes of certain types, I get some weird performance results.

When merging 2 collections of nodes (~42k) and (~26

2 Answers
  • 2021-01-25 21:48

    So far, the best I could come up with is the following (and it's a hack, specific to my environment):

    If / Else condition:

    If childrenNodes.count() < 200 -> assume the child nodes are type identifiers for the parent (e.g. ContactPrefixType).

    Else assume the child label is a matrix relating multiple item types together (e.g. ContactAddress).

    If childNodes < 200

    MATCH (parent:{parentLabel}),
    (child:{childLabel} {{childLabelIdProperty}:parent.{parentRelationProperty}})
    CREATE (child)-[r:{relationshipLabel}]->(parent)
    

    This takes about 3-5 seconds to complete per relationship type.

    Else

    MATCH (child:{childLabel}),
    (parent:{parentLabel} {{parentPropertyField}: child.{childLabelIdProperty}})
    WITH collect(parent) as parentCollection, child
    WITH parentCollection[{batchStart}..{batchEnd}] as coll, child
    FOREACH (parent in coll |
    CREATE (child)-[r:{relationshipLabel}]->(parent) )
    

    I'm not sure this is the most efficient way of doing this, but after trying MANY different options, this seems to be the fastest.
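
    As a rough illustration of the if/else strategy above, here is a minimal Python sketch. The names `choose_strategy` and `batch_windows` are hypothetical, and how the `{batchStart}`/`{batchEnd}` values were actually generated is not stated in this answer; the sketch only assumes consecutive, end-exclusive windows, which is how Cypher list slices (`coll[a..b]`) behave.

```python
# Cut-off of 200 child nodes is taken from the answer; everything else is illustrative.
SMALL_LOOKUP_THRESHOLD = 200


def choose_strategy(child_count):
    """Pick the single-pass query for small lookup types, the batched one otherwise."""
    return "single_pass" if child_count < SMALL_LOOKUP_THRESHOLD else "batched"


def batch_windows(total, batch_size):
    """Yield (batch_start, batch_end) pairs covering range(total).

    batch_end is exclusive, matching Cypher's coll[start..end] slicing.
    """
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


if __name__ == "__main__":
    print(choose_strategy(150))
    for batch_start, batch_end in batch_windows(450, 200):
        print(batch_start, batch_end)
```

    Each `(batch_start, batch_end)` pair would be substituted into the batched query template before it is sent, one query per window.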

    Stats:

    1. insert 225,018 nodes with 2,070,977 properties
    2. create 464,606 relationships

    Total: 331 seconds.

    Because this is a straight import and I'm not dealing with updates yet, I assume that all the relationships are correct and that I don't need to worry about invalid data. However, I will try to set properties on the relationships so that I can perform cleanup later (e.g. store the parent and child IDs as relationship properties for later reference).

    If anyone can improve on this, I would love it.

  • 2021-01-25 21:55

    Can you pass the ids in as parameters rather than fetch them from the graph? The query could look like

    MATCH (s:ContactPlayer {ContactPrefixTypeId:{cptid}})
    MERGE (c:ContactPrefixType {ContactPrefixTypeId:{cptid}})
    MERGE (c)-[:CONTACT_PLAYER]->(s)
    

    If you use the REST API Cypher resource, I think the entity should look something like

    {
        "query":...,
        "params": {
            "cptid":id1
        }
    }
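
    A minimal Python sketch of building that request body, assuming the legacy `{cptid}` parameter syntax used in the query above. The helper name `cypher_payload` is hypothetical, and the sketch only serializes the entity; sending it to the server is left out.

```python
import json

# Query from this answer, kept as a plain string so the braces are not interpreted.
QUERY = (
    "MATCH (s:ContactPlayer {ContactPrefixTypeId:{cptid}}) "
    "MERGE (c:ContactPrefixType {ContactPrefixTypeId:{cptid}}) "
    "MERGE (c)-[:CONTACT_PLAYER]->(s)"
)


def cypher_payload(query, cptid):
    """Serialize one query plus its parameter map as a JSON request body."""
    return json.dumps({"query": query, "params": {"cptid": cptid}})


if __name__ == "__main__":
    print(cypher_payload(QUERY, 42))
```

    Passing the id as a parameter keeps the query text identical across calls, so only the `params` map changes per request.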
    

    If you use the transactional endpoint, it should look something like this. You control transaction size by the number of statements in each call, and also by the number of calls you make before you commit.

    {
        "statements":[
            {
                "statement":...,
                "parameters": {
                    "cptid":id1
                }
            },
            {
                "statement":...,
                "parameters": {
                    "cptid":id2
                }
            }
        ]
    }
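
    To make the "number of statements in each call" knob concrete, here is a small Python sketch that groups a list of ids into transactional request bodies. The function name `statement_batches` and the `statements_per_call` parameter are hypothetical; the sketch only builds the payloads, with each statement wrapped in its own object inside the `statements` array, and does not send them or commit anything.

```python
import json


def statement_batches(statement, ids, statements_per_call):
    """Yield one request body per call, each holding up to statements_per_call statements."""
    for i in range(0, len(ids), statements_per_call):
        chunk = ids[i:i + statements_per_call]
        yield {
            "statements": [
                {"statement": statement, "parameters": {"cptid": cptid}}
                for cptid in chunk
            ]
        }


if __name__ == "__main__":
    # "..." stands in for the Cypher text, as in the payload above.
    for body in statement_batches("...", [1, 2, 3], 2):
        print(json.dumps(body))
```

    A larger `statements_per_call` means fewer round trips but more work per request; how many calls to make before committing is a separate choice made by the caller.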
    