Tuning Neo4j for Performance

Asked by 感情败类 on 2021-01-31 22:06

I have imported data using Michael Hunger's Batch Import, through which I created:

- 4,612,893 nodes
- 14,495,063 properties (node properties are indexed)
- 5,300


        
3 Answers
  •  Answered by 一整个雨季 on 2021-01-31 22:21

    I ran this against your dataset on my MacBook Air, which has little RAM and a modest CPU.

    You will get much faster results than mine with more memory mapping, the GCR cache, and more heap for the caches. Also make sure to use parameters in your queries.
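    For example, a parameterized version of the first query below could be sketched like this (`{userId}` is the parameter syntax of that Cypher era, supplied by the driver or REST call; current Cypher uses `$userId` instead):

```cypher
// the start node id is passed in as a parameter,
// so the query plan can be cached and reused
START u=node({userId})
MATCH u-[:LIKED|COMMENTED]->a
WITH distinct a
MATCH a<-[:LIKED|COMMENTED]-lu
RETURN count(*), count(distinct a), count(distinct lu);
```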

    You are running into combinatorial explosion.

    Every step of the path multiplies the current number of matched rows by the number of relationships traversed in that step.

    See for instance here: you end up at 269268 matches but have only 81674 distinct lu's.

    The problem is that the next MATCH is expanded for every row. If you insert a DISTINCT in between to limit the intermediate result size, the next step works on some orders of magnitude less data. The same applies at each subsequent level.
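    For comparison, a naive single-pattern version without the intermediate DISTINCT would look like this (a sketch using the same start node); here every hop multiplies the row count, including duplicate `a` rows:

```cypher
// no intermediate DISTINCT: duplicate a-rows from the first hop
// are each expanded again in the second hop
START u=node(467242)
MATCH u-[:LIKED|COMMENTED]->a<-[:LIKED|COMMENTED]-lu
RETURN count(*), count(distinct lu);
```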

    START u=node(467242)
    MATCH u-[:LIKED|COMMENTED]->a
    WITH distinct a
    MATCH a<-[r2:LIKED|COMMENTED]-lu
    RETURN count(*),count(distinct a),count(distinct lu);
    
    +---------------------------------------------------+
    | count(*) | count(distinct a) | count(distinct lu) |
    +---------------------------------------------------+
    | 269268   | 1952              | 81674              |
    +---------------------------------------------------+
    1 row
    
    895 ms
    
    START u=node(467242)
    MATCH u-[:LIKED|COMMENTED]->a
    WITH distinct a
    MATCH a<-[:LIKED|COMMENTED]-lu
    WITH distinct lu
    MATCH lu-[:LIKED]-b
    RETURN count(*),count(distinct lu), count(distinct b)
    ;
    +---------------------------------------------------+
    | count(*) | count(distinct lu) | count(distinct b) |
    +---------------------------------------------------+
    | 2311694  | 62705              | 91294             |
    +---------------------------------------------------+
    

    Here you have 2.3M total matches but only 91k distinct elements, so almost two orders of magnitude more rows than distinct results.

    This is a huge aggregation, which is rather a BI/statistics query than an OLTP query. Usually you would store the result (e.g. on the user node) and only re-compute it in the background.
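    Storing such an aggregate on the user node could be sketched like this (the `liked_count` property name is made up for illustration):

```cypher
// pre-compute the aggregate once and cache it as a property,
// so the OLTP read path just returns u.liked_count
START u=node(467242)
MATCH u-[:LIKED|COMMENTED]->a
WITH u, count(distinct a) as cnt
SET u.liked_count = cnt;
```

    A periodic background job would re-run this per user; reads then never touch the expensive traversal.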

    These kinds of queries are again global graph queries (statistics/BI), in this case computing the top 10 users.

    Usually you would run these in the background (e.g. once per day or hour) and connect the top 10 user nodes to a special node or index, which can then be queried in a few milliseconds.

    START a=node:nodes(kind="user") RETURN count(*);
    +----------+
    | count(*) |
    +----------+
    | 3889031  |
    +----------+
    1 row
    
    27329 ms
    

    After all, you are running a match across the whole graph, i.e. 4M users: that's a graph-global, not a graph-local, query.
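    For reference, on current Neo4j versions the same graph-global count would be written with a label instead of the legacy node index (a sketch, assuming users carry a `User` label); a label-based `count(*)` is answered from Neo4j's count store rather than by scanning, so it is much cheaper:

```cypher
// answered from the count store on labeled graphs
MATCH (a:User)
RETURN count(*);
```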

    START n=node:nodes(kind="top-user")
    MATCH n-[r?:TOP_USER]-()
    DELETE r
    WITH distinct n
    START a=node:nodes(kind="user")
    MATCH a-[:CREATED|LIKED|COMMENTED|FOLLOWS]-()
    WITH n, a,count(*) as cnt
    ORDER BY cnt DESC
    LIMIT 10
    CREATE a-[:TOP_USER {count:cnt} ]->n;
    
    +-------------------+
    | No data returned. |
    +-------------------+
    Relationships created: 10
    Properties set: 10
    Relationships deleted: 10
    
    70316 ms
    

    The querying would then be:

    START n=node:nodes(kind="top-user")
    MATCH n-[r:TOP_USER]-a
    RETURN a, r.count
    ORDER BY r.count DESC;
    
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------+
    | a                                                                                                                                                  | r.count |
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------+
    ….
    +--------------------------------------------------------------------------------------------------------------------------------------------------------------+
    10 rows
    
    4 ms
    
