Neo4j: how to model a time-versioned graph

南方客 2020-12-29 11:48

Part of my graph has the following schema:

Main part of the graph is the domain, that has some persons linked to it. Person has a unique constraint on the e

2 answers
  • 2020-12-29 11:54

    Resolving a GUID

    The first thing you need is to reliably resolve user ids so that they are consistent and globally unique. Now, you said:

    user id is specific only for production database and is not globally used for other sources

    From this, I can infer two things:

    1. Users exist from multiple sources.
    2. For each source, users have a unique id.

    That means source + user.id will be a GUID (you can hash the main connection URL, or name each source externally). I will assume you aren't merging users across multiple sources, because duplicating and merging data over any network creates an update-order paradox that should be avoided as much as possible (if two sources list different new contact numbers, which one is correct?).
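
    As a minimal sketch of the source + user.id idea, the two values could be concatenated into one key when the node is written. The parameters $source and $userId and the property names guid, source and user_id are illustrative assumptions, not part of the original schema:

    ```cypher
    // Assumed parameters: $source (a name or hash identifying the source)
    // and $userId (the id of the user inside that source).
    // guid, source and user_id are hypothetical property names.
    MERGE (p:Person {guid: $source + ':' + toString($userId)})
    ON CREATE SET p.source = $source, p.user_id = $userId
    RETURN p.guid
    ```

    Because MERGE matches on guid, running the statement again for the same source and id should find the existing node instead of creating a duplicate.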

    Querying current data

    The querying logic should be agnostic to any version tracking you may be doing. If your versioning causes problems with the logic, add a meta label like :Versioned with an indexed property isLatest, and tack on a WHERE n.isLatest to filter the old "garbage" data out of your results.

    So now that you don't need to worry about versions, queries 1 and 2 can be handled normally.

    1. For finding people who are admins, I would recommend just adding the label :Admin to the person and removing it when it no longer applies. That way the lookup is indexed by the label Admin. You can also just use an isAdmin property (which is probably how you are already storing it in the db, so it is more consistent). The final query would then be MATCH (p:Person:Admin) or MATCH (p:Person {isAdmin:true}).

    2. With the old version information filtered out, the query for who has a device would simply be MATCH (p:Person:Versioned {isLatest:true})-[:HasDevice {isConnected:true}]->(d:Device:Versioned {isLatest:true})

    This bit really just boils down to "What is your schema?"
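
    The two queries could be sketched as follows, as separate statements. This assumes the :Versioned label and isLatest property suggested above, plus a hypothetical guid property to pick out one person; adapt the names to your schema:

    ```cypher
    // Grant the admin role by adding the label (guid is a hypothetical key).
    MATCH (p:Person {guid: $guid})
    SET p:Admin;

    // Revoke the role again when it no longer applies.
    MATCH (p:Person {guid: $guid})
    REMOVE p:Admin;

    // Who currently has a device, ignoring old versions.
    MATCH (p:Person:Versioned)-[:HasDevice {isConnected: true}]->(d:Device:Versioned)
    WHERE p.isLatest AND d.isLatest
    RETURN p, d;
    ```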

    Data History

    This bit is where it really gets tricky. Depending on how you version the data, you can easily end up blowing up your data size and killing your DB performance. You REALLY need to ask yourself "Why am I versioning this?", "How often will this update/be read?", and "Who will use it and what will they do with it?". If at any point you answer "I don't know/care", you either shouldn't do this, or should back up your data in a store that has versioning tooling built for it, such as a relational database used with SQLAlchemy-Continuum. (Related answer)

    If you must do this in Neo4j, then I would recommend using a delta chain. For example, if you changed {a:1, b:2} to {a:1, b:null, c:3}, you would have (:Thing {a:1, b:null, c:3})-[:_DELTA {timestamp:<value>}]->(:_ThingDelta {b:2, c:null}). (Neo4j does not actually store null-valued properties, so read null here as "property absent".) To get a past value, you chain-apply the properties of the delta chain onto a map. Roughly: MATCH (a:Thing) OPTIONAL MATCH p = (a)-[:_DELTA*]->(:_ThingDelta) WHERE all(d IN relationships(p) WHERE d.timestamp >= <value>), and then fold PROPERTIES(n) for each n in nodes(p) into a single map, with nodes further along the chain overriding earlier ones. This can get very tedious and eat up your DB space, so I would highly recommend using an existing DB versioning tool if you possibly can.
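
    A possible way to write that fold as a runnable query is sketched below. It assumes an id property on :Thing, that each :_DELTA relationship carries the time of the change, that each :_ThingDelta holds the previous values of the changed properties, and that the APOC library is installed (plain Cypher has no map-merge operator, so apoc.map.merge is used here). $id and $asOf are assumed parameters:

    ```cypher
    MATCH (a:Thing {id: $id})
    // Follow the chain only across changes made after the requested time.
    OPTIONAL MATCH p = (a)-[:_DELTA*]->(:_ThingDelta)
    WHERE all(d IN relationships(p) WHERE d.timestamp > $asOf)
    WITH a, p
    ORDER BY length(p) DESC   // keep the longest qualifying chain
    LIMIT 1
    RETURN reduce(
             v = properties(a),
             n IN coalesce(tail(nodes(p)), []) |
               apoc.map.merge(v, properties(n))   // older values overwrite newer ones
           ) AS oldVersion
    ```

    The sketch uses a strict > so that a change made exactly at $asOf counts as already applied. It does not undo properties that were added after $asOf: since absent properties cannot be stored as null, the delta would need some extra marker for them (for example a list of key names to drop).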

  • 2020-12-29 12:17

    This answer is based on Ian Robinson's post about time-based versioned graphs.

    I don't know if this answer covers ALL the requirements of the question, but I believe it can provide some insights.

    Also, I'm assuming you are only interested in structural versioning (that is, you are not interested in queries about how the domain user's name changes over time). Finally, I'm using a partial representation of your graph model, but I believe the concepts shown here can be applied to the whole graph.

    The initial graph state:

    Consider this Cypher, which creates an initial graph state:

    CREATE (admin:Admin)
    
    CREATE (person1:Person {person_id : 1})
    CREATE (person2:Person {person_id : 2})
    CREATE (person3:Person {person_id : 3})
    
    CREATE (domain1:Domain {domain_id : 1})
    
    CREATE (device1:Device {device_id : 1})
    
    CREATE (person1)-[:ADMIN {from : 0, to : 1000}]->(admin)
    
    CREATE (person1)-[:CONNECTED_DEVICE {from : 0, to : 1000}]->(device1)
    
    CREATE (domain1)-[:MEMBER]->(person1)
    CREATE (domain1)-[:MEMBER]->(person2)
    CREATE (domain1)-[:MEMBER]->(person3)
    

    Result:

    The above graph has 3 person nodes, all members of one domain node. The person with person_id = 1 is connected to the device with device_id = 1 and is also the current administrator. The from and to properties on the :ADMIN and :CONNECTED_DEVICE relationships manage the history of the graph structure: from is the start of a period and to is its end. For simplicity, I'm using 0 as the initial time of the graph and 1000 as the end-of-time (EOT) constant. In a real-world graph, the current time in milliseconds can be used for time points, and Long.MAX_VALUE can be used as the EOT constant. A relationship with to = 1000 has no upper bound on its period yet, i.e. it is still current.
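
    For illustration only, here is how such a relationship might be written with real time points instead of 0 and 1000. It uses Cypher's timestamp() function (milliseconds since the epoch) and the literal value of Long.MAX_VALUE; the rest of this answer keeps the simplified constants:

    ```cypher
    // 9223372036854775807 is Long.MAX_VALUE, used here as the end-of-time constant.
    MATCH (person:Person {person_id : 1}), (device:Device {device_id : 1})
    CREATE (person)-[:CONNECTED_DEVICE {from : timestamp(), to : 9223372036854775807}]->(device)
    ```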

    Queries:

    With this graph, to get the current administrator I can do:

    MATCH (person:Person)-[:ADMIN {to:1000}]->(:Admin)
    RETURN person
    

    The result will be:

    ╒═══════════════╕
    │"person"       │
    ╞═══════════════╡
    │{"person_id":1}│
    └───────────────┘
    

    Given a device, to get the current connected user:

    MATCH (:Device {device_id : 1})<-[:CONNECTED_DEVICE {to : 1000}]-(person:Person)
    RETURN person
    

    Resulting:

    ╒═══════════════╕
    │"person"       │
    ╞═══════════════╡
    │{"person_id":1}│
    └───────────────┘
    

    Both queries use the end-of-time constant to select the current administrator and the person currently connected to a device.

    Query the device connect / disconnect events:

    MATCH (device:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
    RETURN person AS person, device AS device, r.from AS from, r.to AS to
    ORDER BY r.from
    

    Resulting:

    ╒═══════════════╤═══════════════╤══════╤════╕
    │"person"       │"device"       │"from"│"to"│
    ╞═══════════════╪═══════════════╪══════╪════╡
    │{"person_id":1}│{"device_id":1}│0     │1000│
    └───────────────┴───────────────┴──────┴────┘
    

    The above result shows that person_id = 1 has been connected to device_id = 1 from the beginning until now.

    Changing the graph structure

    Suppose the current time point is 30. Now person_id = 1 is disconnecting from device_id = 1, and person_id = 2 will connect to it. To represent this structural change, I will run the query below:

    // Get the currently connected person
    MATCH (person1:Person)-[old:CONNECTED_DEVICE {to : 1000}]->(device:Device {device_id : 1})
    // Get person_id = 2
    MATCH (person2:Person {person_id : 2})
    // Set 30 as the end time of the connection between person_id = 1 and device_id = 1
    SET old.to = 30
    // Set person_id = 2 as the user currently connected to device_id = 1
    // (from time point 31 to now)
    CREATE (person2)-[:CONNECTED_DEVICE {from : 31, to: 1000}]->(device)
    

    The resultant graph will be:

    After this structural change, the connection history of device_id = 1 will be:

    MATCH (device:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
    RETURN person AS person, device AS device, r.from AS from, r.to AS to
    ORDER BY r.from
    
    ╒═══════════════╤═══════════════╤══════╤════╕
    │"person"       │"device"       │"from"│"to"│
    ╞═══════════════╪═══════════════╪══════╪════╡
    │{"person_id":1}│{"device_id":1}│0     │30  │
    ├───────────────┼───────────────┼──────┼────┤
    │{"person_id":2}│{"device_id":1}│31    │1000│
    └───────────────┴───────────────┴──────┴────┘
    

    The above result shows that person_id = 1 was connected to device_id = 1 from time 0 to time 30, and that person_id = 2 is currently connected to device_id = 1.
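
    The same from / to properties should also allow a point-in-time lookup. A minimal sketch, assuming a parameter $t holding the time point of interest and treating both interval ends as inclusive (as in the 0 to 30 and 31 to 1000 periods above):

    ```cypher
    // Who was connected to device_id = 1 at time $t?
    MATCH (:Device {device_id : 1})<-[r:CONNECTED_DEVICE]-(person:Person)
    WHERE r.from <= $t AND $t <= r.to
    RETURN person
    ```

    With the history shown above, $t = 15 falls in the first period (person_id = 1) and $t = 31 falls in the second (person_id = 2).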

    Now the current person connected to device_id = 1 is person_id = 2:

    MATCH (:Device {device_id : 1})<-[:CONNECTED_DEVICE {to : 1000}]-(person:Person)
    RETURN person
    
    ╒═══════════════╕
    │"person"       │
    ╞═══════════════╡
    │{"person_id":2}│
    └───────────────┘
    

    The same approach can be applied to manage the admin history.
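
    As a sketch of what that could look like, assume the admin role passes from the current administrator to person_id = 2 at time point 50 (the time points are made up for the example):

    ```cypher
    // Get the current administrator and the shared :Admin node
    MATCH (:Person)-[old:ADMIN {to : 1000}]->(admin:Admin)
    // Get the person who takes over
    MATCH (newAdmin:Person {person_id : 2})
    // Close the old period and open a new one
    SET old.to = 50
    CREATE (newAdmin)-[:ADMIN {from : 51, to : 1000}]->(admin)
    ```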

    Obviously this approach has some downsides:

    • Need to manage a set of extra relationships
    • More expensive queries
    • More complex queries

    But if you really need a versioning schema, I believe this approach is a good option, or at least a good starting point.
