Can a graph DB solve my graph problems?

Submitted by 纵饮孤独 on 2019-12-13 00:04:54

Question


I have transaction data (pastebin.com/ZswbyVHM) like this:

{"accountId":3,"recordedAt":"2013-12-01T00:00:00.000Z","region":"South","status":"H"}
{"accountId":3,"recordedAt":"2014-01-01T00:00:00.000Z","region":"South","status":"A"}
{"accountId":3,"recordedAt":"2014-02-01T00:00:00.000Z","region":"South","status":"B"}
{"accountId":3,"recordedAt":"2014-03-01T00:00:00.000Z","region":"South","status":"E"}
{"accountId":3,"recordedAt":"2014-04-01T00:00:00.000Z","region":"South","status":"C"}

When the transactions are grouped by accountId, they can be viewed as:

{"accountId": 3, "region":"South", "transactions": [
    {"recordedAt":"2013-12-01T00:00:00.000Z", "status": "H"},
    {"recordedAt":"2014-01-01T00:00:00.000Z", "status": "A"},
    {"recordedAt":"2014-02-01T00:00:00.000Z", "status": "B"}]
}
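The grouping step can be sketched in plain Python, outside the database. This is only an illustration under two assumptions: the input is one JSON document per line, as in the sample, and each account has a single region. The function name group_by_account is made up for this example.

```python
import json
from itertools import groupby
from operator import itemgetter

# Sample records copied from the question (one JSON document per line).
RAW = """\
{"accountId":3,"recordedAt":"2013-12-01T00:00:00.000Z","region":"South","status":"H"}
{"accountId":3,"recordedAt":"2014-01-01T00:00:00.000Z","region":"South","status":"A"}
{"accountId":3,"recordedAt":"2014-02-01T00:00:00.000Z","region":"South","status":"B"}
{"accountId":3,"recordedAt":"2014-03-01T00:00:00.000Z","region":"South","status":"E"}
{"accountId":3,"recordedAt":"2014-04-01T00:00:00.000Z","region":"South","status":"C"}
"""


def group_by_account(lines):
    """Group transaction documents by accountId, ordered by recordedAt."""
    docs = [json.loads(line) for line in lines if line.strip()]
    # ISO-8601 UTC timestamps in one fixed format sort correctly as strings.
    docs.sort(key=itemgetter("accountId", "recordedAt"))
    grouped = []
    for account_id, items in groupby(docs, key=itemgetter("accountId")):
        items = list(items)
        grouped.append({
            "accountId": account_id,
            "region": items[0]["region"],  # assumption: one region per account
            "transactions": [
                {"recordedAt": t["recordedAt"], "status": t["status"]}
                for t in items
            ],
        })
    return grouped


accounts = group_by_account(RAW.splitlines())
print(json.dumps(accounts[0], indent=2))
```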

From the view above, I would like to create a graph of status relations from the transactions collection that preserves the order of status, recordedAt and accountId, with the following properties:

  • Nodes of the graph represent a status and, implicitly, its position in the order of statuses.
  • Each edge has accounts, an array of the accountIds whose status changes from one status to the other at the same position in the order.

  • A path in the graph is the intersection of accountId, status and level (the position in the order of statuses).

Example: Status Relation Graph

  1. There are two statuses branching from the root: "A" and "H".

  2. At level 0,

    AccountIds 1, 2 and 4 have the same status "A"; AccountId 3 does not (it has "H").

    The path of AccountId 4 ends at this level.

  3. At level 1,

    AccountIds 1 and 2 have the same status "A".

    AccountId 3 also has status "A", but reaches it via a different path.

    The path of AccountId 1 ends at this level.

  4. At level 2,

    AccountId 2 has the status "B".

    AccountId 3 also has status "B", but reaches it via a different path.

  5. At level 3,

    AccountId 2 has the status "A" and AccountId 3 has the status "E".
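To make the level-by-level description concrete, here is a minimal Python sketch that treats every status path as a prefix and records which accounts follow it. The sequences are read off the example above (they are not the full pastebin data), and the name build_paths is invented for illustration.

```python
from collections import defaultdict

# Status sequences read off the level-by-level example
# (illustrative only; not the full pastebin data).
SEQUENCES = {
    1: ["A", "A"],
    2: ["A", "A", "B", "A"],
    3: ["H", "A", "B", "E"],
    4: ["A"],
}


def build_paths(sequences):
    """Map each status path prefix (a tuple) to the accounts that follow it."""
    accounts_by_path = defaultdict(list)
    for account_id, statuses in sorted(sequences.items()):
        for level in range(len(statuses)):
            prefix = tuple(statuses[: level + 1])
            accounts_by_path[prefix].append(account_id)
    return dict(accounts_by_path)


paths = build_paths(SEQUENCES)
# Print shortest prefixes first, so the output follows the levels.
for prefix, accounts in sorted(paths.items(), key=lambda kv: (len(kv[0]), kv[0])):
    level = len(prefix) - 1
    print(f"level {level}: {' -> '.join(prefix)} accounts={accounts}")
```

Accounts 2 and 3 both have "B" at level 2, but they end up under different prefixes ("A", "A", "B") and ("H", "A", "B"), which is the "same status but from a different path" case.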

So I designed 3 edge collections:

Design 1: Each edge is a relation between status vertices and uses level to keep the order of the transactions.

{"_from": "status/root", "_to": "status/A", "level": 0, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-05-01"}]}
{"_from": "status/A"   , "_to": "status/A", "level": 1, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-06-01"}]}
{"_from": "status/A"   , "_to": "status/B", "level": 2, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-07-01"}]}
{"_from": "status/B"   , "_to": "status/A", "level": 3, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-08-01"}]}
{"_from": "status/A"   , "_to": "status/G", "level": 4, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-09-01"}]}
{"_from": "status/G"   , "_to": "status/G", "level": 5, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-10-01"}]}
{"_from": "status/G"   , "_to": "status/H", "level": 6, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-11-01"}]}
{"_from": "status/H"   , "_to": "status/D", "level": 7, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-12-01"}]}
{"_from": "status/D"   , "_to": "status/A", "level": 8, "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-01-01"}]}
{"_from": "status/A"   , "_to": "status/A", "level": 9, "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-02-01"}]}

Design 2: Each edge is a relation between level vertices and carries the status as an edge attribute.

{"_from": "level/root", "_to": "level/0", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-05-01"}]}
{"_from": "level/0"   , "_to": "level/1", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-06-01"}]}
{"_from": "level/1"   , "_to": "level/2", "status": "B", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-07-01"}]}
{"_from": "level/2"   , "_to": "level/3", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-08-01"}]}
{"_from": "level/3"   , "_to": "level/4", "status": "G", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-09-01"}]}
{"_from": "level/4"   , "_to": "level/5", "status": "G", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-10-01"}]}
{"_from": "level/5"   , "_to": "level/6", "status": "H", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-11-01"}]}
{"_from": "level/6"   , "_to": "level/7", "status": "D", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-12-01"}]}
{"_from": "level/7"   , "_to": "level/8", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-01-01"}]}
{"_from": "level/8"   , "_to": "level/9", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-02-01"}]}

Design 3: Add virtual vertices (region, year).

This design tries to eliminate the accounts array so that queries do not have to filter an array of objects; filtering by an edge attribute is easier.

{"_from": "level/root" , "_to": "level/South"}
{"_from": "level/South", "_to": "level/2012"}
{"_from": "level/2016" , "_to": "level/0", "status": "A", "accountId": 1}
{"_from": "level/2012" , "_to": "level/0", "status": "A", "accountId": 2}
{"_from": "level/2017" , "_to": "level/0", "status": "A", "accountId": 4}
{"_from": "level/0"    , "_to": "level/1", "status": "A", "accountId": 1}
{"_from": "level/0"    , "_to": "level/1", "status": "A", "accountId": 2}
{"_from": "level/1"    , "_to": "level/2", "status": "B", "accountId": 2}
{"_from": "level/2"    , "_to": "level/3", "status": "A", "accountId": 2}
{"_from": "level/3"    , "_to": "level/4", "status": "G", "accountId": 2}
{"_from": "level/4"    , "_to": "level/5", "status": "G", "accountId": 2}
{"_from": "level/5"    , "_to": "level/6", "status": "H", "accountId": 2}
{"_from": "level/6"    , "_to": "level/7", "status": "D", "accountId": 2}
{"_from": "level/7"    , "_to": "level/8", "status": "A", "accountId": 2}
{"_from": "level/8"    , "_to": "level/9", "status": "A", "accountId": 2}
  • This graph is used to answer the following queries:
    1. Recall the status pattern for a specific accountId by keeping only the paths in which every edge has the specified id.

Pseudo AQL:

FOR v, e, p IN 1..20 OUTBOUND "status/root" statusRelations
  FILTER p.edges[*].accountId ALL == @id
  RETURN p.edges[*].status
    2. Which statuses lead to a status of interest (by region or year)?

Pseudo AQL:

FILTER e._to == "status/D" (AND e.region == "North" AND e.year == 2013)
RETURN SLICE(p.edges[*], -3) // return the 3 previous statuses that are followed by "D"
    3. Which status occurred most often in a given year?

Pseudo AQL:

FILTER e.year == 2013
COLLECT status = e.status WITH COUNT INTO length
SORT length DESC
LIMIT 1
RETURN status
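To pin down what the three queries are expected to return, here is an in-memory Python sketch over per-account edges. It is not AQL and says nothing about ArangoDB performance. The level and year attributes on each edge are assumptions added for illustration, and all function names are made up.

```python
from collections import Counter

# Account 2's statuses and years, taken from the Design 1 example.
STATUSES = ["A", "A", "B", "A", "G", "G", "H", "D", "A", "A"]
YEARS = [2012] * 8 + [2013] * 2
# One edge per account and level; `level` and `year` are assumed attributes.
EDGES = [
    {"accountId": 2, "level": i, "status": s, "year": y}
    for i, (s, y) in enumerate(zip(STATUSES, YEARS))
]


def status_pattern(edges, account_id):
    """Query 1: ordered statuses for one account."""
    own = sorted(
        (e for e in edges if e["accountId"] == account_id),
        key=lambda e: e["level"],
    )
    return [e["status"] for e in own]


def statuses_leading_to(edges, account_id, target, n=3):
    """Query 2: for each occurrence of `target`, the up-to-n statuses just before it."""
    pattern = status_pattern(edges, account_id)
    return [pattern[max(0, i - n):i] for i, s in enumerate(pattern) if s == target]


def most_common_status(edges, year):
    """Query 3: (status, count) of the most frequent status in `year`, or None."""
    counts = Counter(e["status"] for e in edges if e["year"] == year)
    return counts.most_common(1)[0] if counts else None


print(status_pattern(EDGES, 2))
print(statuses_leading_to(EDGES, 2, "D"))
print(most_common_status(EDGES, 2012))
```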

Which of my designs is the best fit for ArangoDB and can answer my queries?

Source: https://stackoverflow.com/questions/46214182/can-graph-db-solve-my-graph-problems
