I have a huge database with a ton of nodes (10mil+). There is only one type of relationship in the whole database. However, there are a ton of nodes that have duplicated rel
What error do you get with the db global query in the linked SO question? Try substituting | for : in the FOREACH; that's the only breaking syntax difference that I can see. The 2.x way to say the same thing, except adapted to your having only one relationship type in the db, might be:
MATCH (a)-[r]->(b)
WITH a, b, TAIL (COLLECT (r)) as rr
FOREACH (r IN rr | DELETE r)
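To make it concrete what the query above selects for deletion, here is a small in-memory model in Python. It is only a sketch of the grouping logic, not Neo4j code: the function name `duplicate_relationships` and the `(rel_id, start, end)` tuple layout are made up for illustration. The WITH groups relationships by the directed pair (a, b), and TAIL drops the first collected relationship, so everything after it is a duplicate.

```python
from collections import defaultdict


def duplicate_relationships(rels):
    """Model of WITH a, b, TAIL(COLLECT(r)): return the ids to delete.

    rels is an iterable of (rel_id, start_node, end_node) tuples
    (a hypothetical layout chosen for this sketch).
    """
    groups = defaultdict(list)
    for rel_id, start, end in rels:
        # Grouping key is the directed pair, like (a)-[r]->(b).
        groups[(start, end)].append(rel_id)

    to_delete = []
    for ids in groups.values():
        # TAIL keeps everything except the first element; a group with a
        # single relationship contributes an empty tail.
        to_delete.extend(ids[1:])
    return to_delete


rels = [
    (1, "a", "b"),
    (2, "a", "b"),  # duplicate of 1
    (3, "a", "b"),  # duplicate of 1
    (4, "b", "a"),  # opposite direction, so a separate group
    (5, "a", "c"),
]
print(sorted(duplicate_relationships(rels)))  # [2, 3]
```

Which relationship of a group survives depends on the order in which they are collected, which Cypher does not guarantee here; that should not matter if the duplicates really are interchangeable.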
I think the WITH pipe will carry the empty tails when there is no duplicate, and I don't know how expensive it is to loop through an empty collection. My sense is that the place to introduce the limit is with a filter after the WITH, something like:
MATCH (a)-[r]->(b)
WITH a, b, TAIL (COLLECT (r)) as rr
WHERE length(rr) > 0
WITH rr LIMIT 100000
FOREACH (r IN rr | DELETE r)
Since this query doesn't touch properties at all (as opposed to yours, which returns properties for (a) and (b)), I don't think it should be very memory heavy for a medium graph like yours, but you will have to experiment with the limit.
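With a limit in place, a single run only cleans up a bounded number of node pairs, so presumably the query has to be rerun until it no longer deletes anything. The Python sketch below models that loop in memory; it is an illustration under that assumption, not driver code, and `delete_duplicates_in_batches` and the dict layout are hypothetical. Note that the limit counts node pairs with duplicates, not individual relationships.

```python
from collections import defaultdict


def delete_duplicates_in_batches(rels, limit):
    """Repeat the limited delete until no duplicates remain.

    rels maps rel_id -> (start_node, end_node) and is mutated in place
    (a hypothetical stand-in for the graph). Returns the number of
    passes that actually deleted something.
    """
    passes = 0
    while True:
        groups = defaultdict(list)
        for rel_id, pair in rels.items():
            groups[pair].append(rel_id)

        # Same filter as WHERE length(rr) > 0: keep non-empty tails only.
        tails = [ids[1:] for ids in groups.values() if len(ids) > 1]
        batch = tails[:limit]  # at most `limit` node pairs per pass
        if not batch:
            return passes

        for tail in batch:
            for rel_id in tail:
                del rels[rel_id]
        passes += 1


graph = {
    1: ("a", "b"),
    2: ("a", "b"),
    3: ("c", "d"),
    4: ("c", "d"),
    5: ("e", "f"),
}
n_passes = delete_duplicates_in_batches(graph, limit=1)
print(n_passes, sorted(graph))  # 2 [1, 3, 5]
```

A smaller limit means more passes but less work held in memory per pass, which is the trade-off to experiment with.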
If memory is still a problem, then if there is any way for you to limit the nodes to work with (without touching properties), that's also a good idea. If your nodes are distinguishable by label, try running the query for one label at a time:
MATCH (a:A)-[r]->(b) //etc..
MATCH (a:B)-[r]->(b) //etc..