Question
I have transaction data (pastebin.com/ZswbyVHM) like this:
{"accountId":3,"recordedAt":"2013-12-01T00:00:00.000Z","region":"South","status":"H"}
{"accountId":3,"recordedAt":"2014-01-01T00:00:00.000Z","region":"South","status":"A"}
{"accountId":3,"recordedAt":"2014-02-01T00:00:00.000Z","region":"South","status":"B"}
{"accountId":3,"recordedAt":"2014-03-01T00:00:00.000Z","region":"South","status":"E"}
{"accountId":3,"recordedAt":"2014-04-01T00:00:00.000Z","region":"South","status":"C"}
When the transactions are grouped by accountId, they can be viewed as:
{"accountId": 3, "region":"South", "transactions": [
{"recordedAt":"2013-12-01T00:00:00.000Z", "status": "H"},
{"recordedAt":"2014-01-01T00:00:00.000Z", "status": "A"},
{"recordedAt":"2014-02-01T00:00:00.000Z", "status": "B"}]
}
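The grouping step can be sketched in Python. This is only an illustration under assumptions: the records are held in memory as dicts shaped like the sample, region is assumed to be constant per account, and `group_by_account` is a hypothetical helper name, not part of any ArangoDB API.

```python
# Sample records copied from the question (accountId 3 only).
transactions = [
    {"accountId": 3, "recordedAt": "2013-12-01T00:00:00.000Z", "region": "South", "status": "H"},
    {"accountId": 3, "recordedAt": "2014-01-01T00:00:00.000Z", "region": "South", "status": "A"},
    {"accountId": 3, "recordedAt": "2014-02-01T00:00:00.000Z", "region": "South", "status": "B"},
    {"accountId": 3, "recordedAt": "2014-03-01T00:00:00.000Z", "region": "South", "status": "E"},
    {"accountId": 3, "recordedAt": "2014-04-01T00:00:00.000Z", "region": "South", "status": "C"},
]


def group_by_account(records):
    """Group flat transaction records into one document per accountId."""
    grouped = {}
    # ISO-8601 timestamps in one format sort chronologically as plain strings.
    for rec in sorted(records, key=lambda r: r["recordedAt"]):
        doc = grouped.setdefault(
            rec["accountId"],
            # Assumption: region does not change per account, so the first one is kept.
            {"accountId": rec["accountId"], "region": rec["region"], "transactions": []},
        )
        doc["transactions"].append(
            {"recordedAt": rec["recordedAt"], "status": rec["status"]}
        )
    return list(grouped.values())


grouped = group_by_account(transactions)
```

The position of each entry in `transactions` is what the rest of the question calls the level.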
From the view above, I would like to create a status relation graph from the transactions collection that preserves the order of status, recordedAt, and accountId, with the following properties:
- Nodes of the graph represent the statuses and, implicitly, the order of the statuses.
- Edges have accounts, an array of the accountIds whose status changes from one to another at the same position in the sequence.
- A path in the graph is the intersection of accountId, status, and level (the order of the statuses).
Example: Status Relation Graph
There are two statuses reachable from the root: "A" and "H".
At level 0,
AccountIds 1, 2, and 4 have the same status "A"; AccountId 3 does not.
The path of AccountId 4 ends at this level.
At level 1,
AccountIds 1 and 2 have the same status "A".
AccountId 3 also has the status "A", but reaches it by a different path.
The path of AccountId 1 ends at this level.
At level 2,
AccountId 2 has the status "B".
AccountId 3 also has the status "B", but reaches it by a different path.
At level 3,
AccountId 2 has the status "A" and AccountId 3 has the status "E".
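The level-by-level reading of the example can be reproduced with a small Python sketch. The status sequences are read off the description (truncated at level 3), and `accounts_by_path` is a hypothetical helper name. Treating the path from the root as a tuple of statuses is an assumption about how "same status, different path" should be told apart.

```python
from collections import defaultdict

# Status sequences per accountId, read off the example (levels 0..3 only).
sequences = {
    1: ["A", "A"],
    2: ["A", "A", "B", "A"],
    3: ["H", "A", "B", "E"],
    4: ["A"],
}


def accounts_by_path(seqs):
    """Map each (level, path-from-root) to the accountIds that follow that path."""
    groups = defaultdict(list)
    for account_id in sorted(seqs):
        statuses = seqs[account_id]
        for level in range(len(statuses)):
            # Two accounts share a group only if their whole prefix matches.
            path = tuple(statuses[: level + 1])
            groups[(level, path)].append(account_id)
    return dict(groups)


groups = accounts_by_path(sequences)
for (level, path), accounts in sorted(groups.items()):
    print(level, "->".join(path), accounts)
```

With this keying, AccountId 3 at level 1 lands in a different group from AccountIds 1 and 2 even though all three have status "A" there.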
So I designed 3 edge collections:
Design 1: Each edge is a relation between status vertices and uses level to keep the order of the transactions.
{"_from": "status/root", "_to": "status/A", "level": 0, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-05-01"}]}
{"_from": "status/A" , "_to": "status/A", "level": 1, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-06-01"}]}
{"_from": "status/A" , "_to": "status/B", "level": 2, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-07-01"}]}
{"_from": "status/B" , "_to": "status/A", "level": 3, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-08-01"}]}
{"_from": "status/A" , "_to": "status/G", "level": 4, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-09-01"}]}
{"_from": "status/G" , "_to": "status/G", "level": 5, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-10-01"}]}
{"_from": "status/G" , "_to": "status/H", "level": 6, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-11-01"}]}
{"_from": "status/H" , "_to": "status/D", "level": 7, "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-12-01"}]}
{"_from": "status/D" , "_to": "status/A", "level": 8, "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-01-01"}]}
{"_from": "status/A" , "_to": "status/A", "level": 9, "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-02-01"}]}
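For illustration, Design 1 edges for a single account could be generated from its ordered records along these lines. This is a sketch, not the loader I actually use: `design1_edges` is a hypothetical helper, and it does not merge several accounts into the accounts array of a shared edge.

```python
def design1_edges(account, records):
    """Build Design 1 edges: status -> status, with `level` keeping the order."""
    edges = []
    previous = "root"  # every path starts at status/root
    for level, rec in enumerate(records):
        edges.append({
            "_from": f"status/{previous}",
            "_to": f"status/{rec['status']}",
            "level": level,
            "accounts": [{
                "id": account["id"],
                "region": account["region"],
                "recordedAt": rec["recordedAt"],
            }],
        })
        previous = rec["status"]
    return edges


# First four records of accountId 2, taken from the listing above.
account = {"id": 2, "region": "South"}
records = [
    {"recordedAt": "2012-05-01", "status": "A"},
    {"recordedAt": "2012-06-01", "status": "A"},
    {"recordedAt": "2012-07-01", "status": "B"},
    {"recordedAt": "2012-08-01", "status": "A"},
]
edges = design1_edges(account, records)
```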
Design 2: Each edge is a relation between level vertices and carries status as an edge attribute.
{"_from": "level/root", "_to": "level/0", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-05-01"}]}
{"_from": "level/0" , "_to": "level/1", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-06-01"}]}
{"_from": "level/1" , "_to": "level/2", "status": "B", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-07-01"}]}
{"_from": "level/2" , "_to": "level/3", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-08-01"}]}
{"_from": "level/3" , "_to": "level/4", "status": "G", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-09-01"}]}
{"_from": "level/4" , "_to": "level/5", "status": "G", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-10-01"}]}
{"_from": "level/5" , "_to": "level/6", "status": "H", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-11-01"}]}
{"_from": "level/6" , "_to": "level/7", "status": "D", "accounts": [{"id": 2, "region": "South", "recordedAt": "2012-12-01"}]}
{"_from": "level/7" , "_to": "level/8", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-01-01"}]}
{"_from": "level/8" , "_to": "level/9", "status": "A", "accounts": [{"id": 2, "region": "South", "recordedAt": "2013-02-01"}]}
Design 3: Adds virtual vertices (region, year).
This design tries to eliminate accounts, to avoid filtering an array of objects. It is easier to filter by an edge attribute.
{"_from": "level/root" , "_to": "level/South"}
{"_from": "level/South", "_to": "level/2012"}
{"_from": "level/2016" , "_to": "level/0", "status": "A", "accountId": 1}
{"_from": "level/2012" , "_to": "level/0", "status": "A", "accountId": 2}
{"_from": "level/2017" , "_to": "level/0", "status": "A", "accountId": 4}
{"_from": "level/0" , "_to": "level/1", "status": "A", "accountId": 1}
{"_from": "level/0" , "_to": "level/1", "status": "A", "accountId": 2}
{"_from": "level/1" , "_to": "level/2", "status": "B", "accountId": 2}
{"_from": "level/2" , "_to": "level/3", "status": "A", "accountId": 2}
{"_from": "level/3" , "_to": "level/4", "status": "G", "accountId": 2}
{"_from": "level/4" , "_to": "level/5", "status": "G", "accountId": 2}
{"_from": "level/5" , "_to": "level/6", "status": "H", "accountId": 2}
{"_from": "level/6" , "_to": "level/7", "status": "D", "accountId": 2}
{"_from": "level/7" , "_to": "level/8", "status": "A", "accountId": 2}
{"_from": "level/8" , "_to": "level/9", "status": "A", "accountId": 2}
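To show why a plain accountId attribute is easier to filter than an accounts array, here is a rough in-memory Python sketch over a subset of the Design 3 edges. `status_pattern` is a hypothetical helper, and ordering by the numeric suffix of `_to` is an assumption that only holds for the level/<n> edges.

```python
# A subset of the Design 3 edges from the listing above (level edges only).
edges = [
    {"_from": "level/2016", "_to": "level/0", "status": "A", "accountId": 1},
    {"_from": "level/2012", "_to": "level/0", "status": "A", "accountId": 2},
    {"_from": "level/2017", "_to": "level/0", "status": "A", "accountId": 4},
    {"_from": "level/0", "_to": "level/1", "status": "A", "accountId": 1},
    {"_from": "level/0", "_to": "level/1", "status": "A", "accountId": 2},
    {"_from": "level/1", "_to": "level/2", "status": "B", "accountId": 2},
    {"_from": "level/2", "_to": "level/3", "status": "A", "accountId": 2},
]


def status_pattern(edges, account_id):
    """Return the ordered statuses of one account from per-account edges."""
    # A plain equality test on the edge attribute; no array of objects to scan.
    own = [e for e in edges if e.get("accountId") == account_id]
    # `_to` is "level/<n>", so the numeric suffix gives the order.
    own.sort(key=lambda e: int(e["_to"].split("/")[1]))
    return [e["status"] for e in own]


print(status_pattern(edges, 2))
```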
This graph is used to answer the following queries:
- Recall the status pattern for a specific accountId by keeping only the paths in which every edge has the specified id.
Pseudo AQL:
FOR v, e, p IN 1..20 OUTBOUND "status/root" statusRelations
FILTER p.edges[*].accountId ALL == id
RETURN p.edges[*].status
- Which statuses lead to a status of interest (optionally by region or year)?
Pseudo AQL:
FOR v, e, p IN 1..20 OUTBOUND "status/root" statusRelations
FILTER e._to == "status/D" // optionally: AND e.region == "North" AND e.year == 2013
RETURN SLICE(p.edges, -3) // last 3 edges of the path, i.e. the transitions leading up to "D"
- Which status occurred most often in a given year?
Pseudo AQL:
FOR v, e, p IN 1..20 OUTBOUND "status/root" statusRelations
FILTER e.year == 2013
COLLECT status = e.status WITH COUNT INTO length
SORT length DESC
LIMIT 1
RETURN status
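To pin down what the last two queries should return, here is a minimal in-memory Python sketch over the history of accountId 2 from the Design 1 listing. It is not AQL and says nothing about how ArangoDB would execute the traversals; `statuses_before` and `most_common_status` are hypothetical helper names.

```python
from collections import Counter

# (recordedAt, status) for accountId 2, read off the Design 1 edges.
history = [
    ("2012-05-01", "A"), ("2012-06-01", "A"), ("2012-07-01", "B"),
    ("2012-08-01", "A"), ("2012-09-01", "G"), ("2012-10-01", "G"),
    ("2012-11-01", "H"), ("2012-12-01", "D"), ("2013-01-01", "A"),
    ("2013-02-01", "A"),
]


def statuses_before(history, target, n=3):
    """For each occurrence of `target`, return up to `n` statuses preceding it."""
    statuses = [status for _, status in history]
    return [
        statuses[max(0, i - n):i]
        for i, status in enumerate(statuses)
        if status == target
    ]


def most_common_status(history, year):
    """Return (status, count) for the most frequent status in `year`, or None."""
    counts = Counter(
        status for date, status in history if date.startswith(f"{year}-")
    )
    return counts.most_common(1)[0] if counts else None


print(statuses_before(history, "D"))
print(most_common_status(history, 2012))
```

For this one account, the three statuses leading up to "D" are "G", "G", "H", and "A" is the most frequent status in 2012.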
Which of my designs is the best fit for ArangoDB and can answer my queries?
Source: https://stackoverflow.com/questions/46214182/can-graph-db-solve-my-graph-problems