Neo4j: How do I delete all duplicate relationships in the database through Cypher?

被撕碎了的回忆 2021-01-03 00:46

I have a huge database with a ton of nodes (10mil+). There is only one type of relationship in the whole database. However, there are a ton of nodes that have duplicated rel

2 answers
  • 2021-01-03 00:55

    This is a version of the accepted answer that has been fixed (by inserting the WITH rr clause) to work with more recent Neo4j versions. It should also be faster, since it only builds the TAIL list for node pairs that actually have duplicates:

    MATCH (a)-[r]->(b)
    WITH a, b, COLLECT(r) AS rr
    WHERE SIZE(rr) > 1
    WITH rr
    LIMIT 100000
    FOREACH (r IN TAIL(rr) | DELETE r);
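
    Because LIMIT caps each run at 100000 node pairs, the query has to be re-run until nothing is left to delete. A read-only check along these lines can show how much work remains between runs. This is only a sketch, and the aliases pairsWithDuplicates and relationshipsToDelete are illustrative names, not anything from the original query:

    ```cypher
    // Count node pairs that still have more than one relationship
    // between them in the same direction, and how many extras remain.
    MATCH (a)-[r]->(b)
    WITH a, b, COUNT(r) AS n
    WHERE n > 1
    RETURN COUNT(*) AS pairsWithDuplicates,
           SUM(n - 1) AS relationshipsToDelete;
    ```

    Once pairsWithDuplicates reaches 0, the cleanup is done. Note that this check scans every relationship, so it may itself be slow on a graph of this size.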
    

    [UPDATE]

    If you only want to treat relationships as duplicates when they also have the same type, group by type as well:

    MATCH (a)-[r]->(b)
    WITH a, b, TYPE(r) AS t, COLLECT(r) AS rr
    WHERE SIZE(rr) > 1
    WITH rr
    LIMIT 100000
    FOREACH (r IN TAIL(rr) | DELETE r);
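
    To see the difference between the two queries, consider a small throwaway example. The Demo label, the name property, and the KNOWS and LIKES types are hypothetical, chosen only for illustration:

    ```cypher
    // Two nodes joined by two KNOWS relationships and one LIKES relationship
    CREATE (a:Demo {name: 'a'}), (b:Demo {name: 'b'})
    CREATE (a)-[:KNOWS]->(b),
           (a)-[:KNOWS]->(b),
           (a)-[:LIKES]->(b);
    ```

    The first query groups all three relationships under the pair (a, b) and deletes two of them, leaving a single relationship. The type-aware query groups them as (a, b, KNOWS) and (a, b, LIKES), so it deletes only one KNOWS and leaves one relationship of each type.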
    
  • 2021-01-03 01:03

    What error do you get with the db global query in the linked SO question? Try substituting | for : in the FOREACH; that's the only breaking syntax difference that I can see. The 2.x way to say the same thing, adapted to your having only one relationship type in the db, might be

    MATCH (a)-[r]->(b)
    WITH a, b, TAIL(COLLECT(r)) AS rr
    FOREACH (r IN rr | DELETE r)
    

    I think the WITH pipe will carry the empty tails when there is no duplicate, and I don't know how expensive it is to loop through an empty collection. My sense is that the place to introduce the limit is with a filter after the WITH, something like

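    MATCH (a)-[r]->(b)
    WITH a, b, TAIL(COLLECT(r)) AS rr
    // skip node pairs whose tail is empty (no duplicates)
    WHERE SIZE(rr) > 0
    // LIMIT gets its own WITH: within one WITH clause, WHERE must come last
    WITH rr
    LIMIT 100000
    FOREACH (r IN rr | DELETE r)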
    

    Since this query doesn't touch properties at all (as opposed to yours, which returns properties for (a) and (b)), I don't think it should be very memory heavy for a medium graph like yours, but you will have to experiment with the limit.

    If memory is still a problem and there is any way for you to limit the nodes to work with (without touching properties), that's also a good idea. If your nodes are distinguishable by label, try running the query for one label at a time:

    MATCH (a:A)-[r]->(b) //etc..
    MATCH (a:B)-[r]->(b) //etc..
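
    Spelled out for a single label, and borrowing the batching pattern from the other answer, one pass might look like this. It is a sketch; A stands for whichever label exists in your data:

    ```cypher
    // Delete duplicate relationships that start at nodes labelled A,
    // handling at most 100000 node pairs per run.
    MATCH (a:A)-[r]->(b)
    WITH a, b, COLLECT(r) AS rr
    WHERE SIZE(rr) > 1
    WITH rr
    LIMIT 100000
    FOREACH (r IN TAIL(rr) | DELETE r);
    ```

    Re-run it until no relationships are deleted, then move on to the next label.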
    