Neo4j Cypher return most consecutive “passes”

深忆病人 2020-12-22 07:02

I am trying to return, from a graph database, the students with the most consecutive passes in a series of exams.

Below is my current code but not sure where I can ta

2 Answers
  • 2020-12-22 07:40

    You can do it with plain Cypher, but I don't think it's very practical - you essentially need to write a program with reduce.

    Basically, the "split" works as follows: initialize an empty accumulator list, then iterate through the list of passes/fails and check whether the current element is the same as the previous one. For example, ['pass', 'pass'] keeps the streak and ['pass', 'fail'] breaks it. If the streak breaks (or at the start of the list), append a new single-element list to the accumulator. If it continues, append the element to the last list in the accumulator, e.g. with a new 'fail', [['pass', 'pass'], ['fail']] becomes [['pass', 'pass'], ['fail', 'fail']].

    UNWIND
      [
        ['joe',  'pass'],
        ['matt', 'pass'],
        ['joe',  'fail'],
        ['matt', 'pass'],
        ['joe',  'pass'],
        ['matt', 'pass'],
        ['joe',  'pass'],
        ['matt', 'fail']
      ] AS row
    WITH row[0] AS s, row[1] AS passed
    WITH s, collect(passed) AS p
    WITH s, reduce(acc = [], i IN range(0, size(p) - 1) |
        // The i > 0 guard matters: at i = 0, p[i-1] is p[-1], which in Cypher
        // wraps around to the last element instead of meaning "no previous".
        CASE i > 0 AND p[i] = p[i-1]
          WHEN true THEN [j IN range(0, size(acc) - 1) |
              CASE j = size(acc) - 1
                WHEN true THEN acc[j] + [p[i]]
                ELSE acc[j]
              END
            ]
          ELSE acc + [[p[i]]]
        END
      ) AS streaks // (1)
    UNWIND streaks AS streak
    WITH s, streak
    WHERE streak[0] <> 'fail'
    RETURN s, max(size(streak)) AS consecutivePasses // (2)
    

    In step (1), this calculates streaks such as:

    ╒══════╤═══════════════════════════════════╕
    │"s"   │"streaks"                          │
    ╞══════╪═══════════════════════════════════╡
    │"matt"│[["pass","pass","pass"],["fail"]]  │
    ├──────┼───────────────────────────────────┤
    │"joe" │[["pass"],["fail"],["pass","pass"]]│
    └──────┴───────────────────────────────────┘
    

    And in (2), it gives:

    ╒══════╤═══════════════════╕
    │"s"   │"consecutivePasses"│
    ╞══════╪═══════════════════╡
    │"matt"│3                  │
    ├──────┼───────────────────┤
    │"joe" │2                  │
    └──────┴───────────────────┘
    

    Of course, in this particular case it's not necessary to do the splitting: simply counting would be enough. But in 99% of practical situations, APOC is the way to go, so I did not bother optimising this solution.
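
    The "simply counting" variant mentioned above could look roughly like the sketch below. It is not part of the original answer: it reuses the same inline sample data and assumes a map accumulator with two hypothetical keys, cur (length of the current run of passes) and best (longest run seen so far).

    ```cypher
    UNWIND
      [
        ['joe',  'pass'],
        ['matt', 'pass'],
        ['joe',  'fail'],
        ['matt', 'pass'],
        ['joe',  'pass'],
        ['matt', 'pass'],
        ['joe',  'pass'],
        ['matt', 'fail']
      ] AS row
    WITH row[0] AS s, row[1] AS passed
    WITH s, collect(passed) AS p
    // Walk the results once; a pass extends the current run, a fail resets it.
    WITH s, reduce(acc = {cur: 0, best: 0}, x IN p |
        CASE x
          WHEN 'pass' THEN {
              cur: acc.cur + 1,
              best: CASE WHEN acc.cur + 1 > acc.best THEN acc.cur + 1 ELSE acc.best END
            }
          ELSE {cur: 0, best: acc.best}
        END
      ) AS acc
    RETURN s, acc.best AS consecutivePasses
    ```

    On the sample data this should give the same result as (2), i.e. 3 for matt and 2 for joe, without building the intermediate list of streaks.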

  • 2020-12-22 07:42

    This is a tricky one, and as far as I know can't be done with just Cypher, but there is a procedure in APOC Procedures that can help.

    apoc.coll.split() takes a collection and a value to split around, and will yield records for each resulting sub-collection. Basically, we collect the ordered results per student, split around failures to get collections of consecutive passes, then get the max consecutive passes from the sizes of those collections:

    MATCH (s:Student)-[r:TAKEN]->(e:Exam)
    WITH s, r.score >= e.pass_mark as passed
    ORDER BY e.date
    WITH s, collect(passed) as resultsColl
    CALL apoc.coll.split(resultsColl, false) YIELD value
    WITH s, max(size(value)) as consecutivePasses
    RETURN s.name as student, consecutivePasses
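
    To see what the split step does in isolation, it may help to run it on a literal list first. The input below is made up for illustration, and the sketch assumes APOC is installed:

    ```cypher
    // Made-up input: two passes, a fail, then one pass
    CALL apoc.coll.split([true, true, false, true], false) YIELD value
    RETURN value, size(value) AS streakLength
    ```

    This should yield one row per run of passes, here [true, true] and [true]. Taking max(size(value)) over those rows, as the query above does per student, gives the longest streak.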
    