I need to group the data from a Neo4j database and then filter out everything except the top n
records of every group.
Example:
I have two node
Try
MATCH (o:Order)-[r:ADDED]->(a:Article)
WITH o, r, a
ORDER BY o.oid, r.t
WITH o, COLLECT(a)[..2] AS topArticlesByOrder
UNWIND topArticlesByOrder AS a
RETURN a.aid AS articleId, COUNT(*) AS count
Results look like
articleId count
8 6
2 2
4 5
7 2
3 3
6 5
0 7
on this sample graph created with
FOREACH(opar IN RANGE(1,15) |
  MERGE (o:Order {oid:opar})
  FOREACH(apar IN RANGE(1,5) |
    MERGE (a:Article {aid:toInteger(RAND()*10)})
    CREATE (o)-[:ADDED {t:timestamp() - toInteger(RAND()*1000)}]->(a)
  )
)
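To make the steps of the query above easier to follow, here is a rough Python analogue run on a handful of hand-made rows. It is only a sketch of the idea, not of how Neo4j evaluates the query: the `rows` data and the `count_top_articles` name are made up for illustration, and each row stands for one `(o.oid, r.t, a.aid)` match.

```python
from collections import Counter
from itertools import groupby

# Hypothetical rows: (order id, relationship timestamp, article id).
rows = [
    (1, 30, 8), (1, 10, 2), (1, 20, 4),
    (2, 5, 2), (2, 15, 8), (2, 1, 7),
]

def count_top_articles(rows, n=2):
    # ORDER BY o.oid, r.t
    ordered = sorted(rows, key=lambda row: (row[0], row[1]))
    counts = Counter()
    # WITH o, COLLECT(a)[..n]: group by order, keep the first n articles
    for _oid, group in groupby(ordered, key=lambda row: row[0]):
        top = [aid for _, _, aid in group][:n]
        # UNWIND ... RETURN a.aid, COUNT(*): count every kept article
        counts.update(top)
    return dict(counts)

result = count_top_articles(rows)
print(result)
```

Sorting first and slicing afterwards is what turns "collect everything per order" into "the first n per order"; without the sort, the slice would keep an arbitrary pair.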
Use LIMIT combined with ORDER BY to get the top N of anything. For example, the top 5 scores would be:
MATCH (node:MyScoreNode)
RETURN node
ORDER BY node.score DESC
LIMIT 5;
The ORDER BY part ensures the highest scores show up first. The LIMIT gives you only the first 5, which, since they're sorted, are always the highest.
I tried to achieve your desired result and failed.
So my guess is that this is not possible with pure Cypher.
What is the problem? Cypher treats everything as paths, and what it actually does is traverse them.
Grouping the results and then applying a filter to each group would require Cypher to branch its traversal at certain points. Instead, Cypher applies the filter to all results at once, because it treats them as one collection of distinct paths.
My suggestion: create several queries that together achieve the desired functionality, and implement some client-side logic.
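As one possible shape for that client-side logic, the sketch below assumes the rows have already been fetched with a plain query and are available as dictionaries. The helper name `top_n_per_group`, the field names, and the sample rows are all hypothetical; fetching the rows through a driver is left out.

```python
import heapq
from collections import defaultdict

def top_n_per_group(rows, group_key, sort_key, n):
    """Return {group: the n rows of that group with the smallest sort_key}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)
    return {
        group: heapq.nsmallest(n, members, key=lambda r: r[sort_key])
        for group, members in groups.items()
    }

# Hypothetical rows, shaped like the result of a flat query such as
# MATCH (o:Order)-[r:ADDED]->(a:Article)
# RETURN o.oid AS oid, r.t AS t, a.aid AS aid
rows = [
    {"oid": 1, "t": 30, "aid": 8},
    {"oid": 1, "t": 10, "aid": 2},
    {"oid": 1, "t": 20, "aid": 4},
    {"oid": 2, "t": 5, "aid": 2},
    {"oid": 2, "t": 1, "aid": 7},
]

top = top_n_per_group(rows, "oid", "t", 2)
for oid, members in top.items():
    print(oid, [m["aid"] for m in members])
```

For the largest values per group instead of the smallest, `heapq.nlargest` can be swapped in.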