Database versioning

抹茶落季 2021-01-06 20:30

I have built a few projects (a CMS and an e-commerce system) that required some of the data to be versioned.

I usually come up with this kind of schema:

+--------------+
+          


        
7 answers
  • 2021-01-06 20:38

    I prefer to keep historical data in a separate table. I would create foobar_history (or something similar) with a foreign key to foobar_id. That removes the need for a subquery altogether. It has the added advantage of not polluting your primary data table with the large volume of historical data that you probably don't want to see 99% of the time you access it.

    You will likely want a trigger to maintain this table, though, since every change requires copying the current row into foobar_history before performing the update.
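
    A minimal sketch of that trigger, using SQLite through Python's sqlite3 module only so that it runs anywhere. The name, body and changed_at columns and the trigger name are assumptions made for illustration, and trigger syntax differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foobar (
    foobar_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,   -- hypothetical payload columns
    body      TEXT NOT NULL
);

-- One row per superseded version, linked back to the live row.
CREATE TABLE foobar_history (
    history_id INTEGER PRIMARY KEY,
    foobar_id  INTEGER NOT NULL REFERENCES foobar(foobar_id),
    name       TEXT NOT NULL,
    body       TEXT NOT NULL,
    changed_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Copy the old values into the history table before each update.
CREATE TRIGGER foobar_keep_history
BEFORE UPDATE ON foobar
FOR EACH ROW
BEGIN
    INSERT INTO foobar_history (foobar_id, name, body)
    VALUES (OLD.foobar_id, OLD.name, OLD.body);
END;
""")

conn.execute("INSERT INTO foobar (foobar_id, name, body) VALUES (1, 'page', 'v1')")
conn.execute("UPDATE foobar SET body = 'v2' WHERE foobar_id = 1")
conn.execute("UPDATE foobar SET body = 'v3' WHERE foobar_id = 1")

# The main table holds only the latest version; older ones are in the history.
current = conn.execute("SELECT body FROM foobar WHERE foobar_id = 1").fetchone()[0]
history = [row[0] for row in conn.execute(
    "SELECT body FROM foobar_history WHERE foobar_id = 1 ORDER BY history_id")]
```

    The application only ever issues a plain UPDATE; the copy into foobar_history happens inside the database.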

  • 2021-01-06 20:41

    A common technique is to add a version_status column with the values current/expired. Also note that if you keep new and old records in the same table, you need a business (natural) key for your entity, something like name + pin, because the primary key changes (increments) with each new row.

    TABLE foobar(foobar_id PK, business_key, version, version_status, .....)
    
    SELECT * 
    FROM foobar 
    WHERE business_key = 'myFoobar3' AND version_status = 'current'
    

    When deciding whether to keep the record history in the same table or move it to a separate one, consider the other tables that have foobar_id as a foreign key. When a new version is issued, should existing foreign keys point to the new PK or to the old PK? If you want to keep a history of relationships, you will probably want to keep everything in the same table. If only the newest version matters, you can consider moving expired rows to another table, though it is not necessary.
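
    As a rough sketch of how a new version could be issued with the single-table layout, the helper below expires the current row and inserts its successor in one transaction. It uses SQLite through Python's sqlite3 module; save_new_version and the body column are hypothetical names, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE foobar (
    foobar_id      INTEGER PRIMARY KEY,  -- surrogate key, new value per version
    business_key   TEXT NOT NULL,        -- stable identity of the entity
    version        INTEGER NOT NULL,
    version_status TEXT NOT NULL CHECK (version_status IN ('current', 'expired')),
    body           TEXT NOT NULL,        -- hypothetical payload column
    UNIQUE (business_key, version)
)""")


def save_new_version(conn, business_key, body):
    """Expire the current row (if any) and insert the next version."""
    with conn:  # commits on success, rolls back if an exception is raised
        last = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM foobar WHERE business_key = ?",
            (business_key,)).fetchone()[0]
        conn.execute(
            "UPDATE foobar SET version_status = 'expired' "
            "WHERE business_key = ? AND version_status = 'current'",
            (business_key,))
        conn.execute(
            "INSERT INTO foobar (business_key, version, version_status, body) "
            "VALUES (?, ?, 'current', ?)",
            (business_key, last + 1, body))


save_new_version(conn, "myFoobar3", "first draft")
save_new_version(conn, "myFoobar3", "second draft")

current = conn.execute(
    "SELECT version, body FROM foobar "
    "WHERE business_key = 'myFoobar3' AND version_status = 'current'").fetchall()
total_rows = conn.execute("SELECT COUNT(*) FROM foobar").fetchone()[0]
```

    Each version gets its own foobar_id here, so a referencing table can point either at a specific version or, through business_key, at whichever row is current.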

  • 2021-01-06 20:49

    The cleanest solution, in my opinion, is to have a history table for each table that requires versioning. In other words, have a foobar table and a foobar_History table, with a trigger on foobar that writes the existing data to the history table along with a timestamp and the user who changed it. Older data is easily queryable, sorted by timestamp descending, and you know that the data in the main table is always the latest version.

  • 2021-01-06 20:50

    I used to work on a system with historical data, and we had a boolean flag to indicate which row was the latest version of the data. Of course, you need to maintain the consistency of the flag at the application level. You can then create indexes that include the flag, and queries that filter on it in the WHERE clause are fast.
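
    One possible way to index the flag is a partial unique index, sketched below with SQLite through Python's sqlite3 module. The document table and its columns are invented for the example, and partial indexes are an assumption about the engine: SQLite and PostgreSQL support them, but not every database does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    document_id INTEGER PRIMARY KEY,
    doc_key     TEXT NOT NULL,
    version     INTEGER NOT NULL,
    is_latest   INTEGER NOT NULL DEFAULT 0 CHECK (is_latest IN (0, 1)),
    content     TEXT NOT NULL
);

-- Partial unique index: it contains only the flagged rows, so it stays small,
-- and it rejects a second "latest" row for the same doc_key.
CREATE UNIQUE INDEX document_latest ON document (doc_key) WHERE is_latest = 1;
""")

insert = ("INSERT INTO document (doc_key, version, is_latest, content) "
          "VALUES (?, ?, ?, ?)")
conn.execute(insert, ("spec", 1, 0, "old"))
conn.execute(insert, ("spec", 2, 1, "new"))

# Filtering on the flag returns the latest version directly.
latest = conn.execute(
    "SELECT version, content FROM document WHERE doc_key = ? AND is_latest = 1",
    ("spec",)).fetchone()

# Inserting version 3 without clearing the flag on version 2 is refused.
try:
    conn.execute(insert, ("spec", 3, 1, "newer"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

    The application still has to clear the old flag itself; the index only turns a forgotten update into an error instead of silent inconsistency.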

    Pro:

    • easy to understand
    • does not require major change to your (existing) database schema
    • no need to copy old data to another table; only the flag is updated

    Cons:

    • the flag needs to be maintained at the application level

    Otherwise, you can rely on a separate history table, as suggested in several answers.

    Pro:

    • clean separation of history from current data
    • possible to have a db-level cascade delete between actual data and its history, in case the entity is removed

    Cons:

    • need 2 queries (or a union) if you want the complete history (that is, old data + current data)
    • the row that holds the latest version of the data is updated in place. I have heard that updates can be slower than inserts, depending on the "size" of the data that changed.
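
    The union query and the cascade delete mentioned above could look roughly like this. It is a sketch using SQLite through Python's sqlite3 module; the version and body columns and the is_current marker are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE foobar (
    foobar_id INTEGER PRIMARY KEY,
    version   INTEGER NOT NULL,
    body      TEXT NOT NULL
);
CREATE TABLE foobar_history (
    foobar_id INTEGER NOT NULL REFERENCES foobar(foobar_id) ON DELETE CASCADE,
    version   INTEGER NOT NULL,
    body      TEXT NOT NULL,
    PRIMARY KEY (foobar_id, version)
);
INSERT INTO foobar VALUES (2, 3, 'third');
INSERT INTO foobar_history VALUES (2, 1, 'first'), (2, 2, 'second');
""")

# Complete history = current row + archived rows, newest first.
full_history = conn.execute("""
    SELECT version, body, 1 AS is_current FROM foobar WHERE foobar_id = :id
    UNION ALL
    SELECT version, body, 0 AS is_current FROM foobar_history WHERE foobar_id = :id
    ORDER BY version DESC
""", {"id": 2}).fetchall()

# Removing the entity also removes its history through the cascade.
conn.execute("DELETE FROM foobar WHERE foobar_id = 2")
remaining_history = conn.execute("SELECT COUNT(*) FROM foobar_history").fetchone()[0]
```

    Wrapping the UNION ALL in a view would let callers read the full history as if it were a single table.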

    Which is best depends on your use case. I had to deal with a document management system where we wanted to version documents, but we also had features such as reverting to an old version. It was easier to use a boolean to speed up only the operations that needed the latest version. If you have real historical data (which never changes), a dedicated history table is probably better.

    Does the concept of history fit in your domain model? If not, you end up with a db schema that differs from your conceptual domain model. If, at the domain level, current and old data need to be handled the same way, having two tables complicates the design. Consider the case where you need to return the complete history (old + new). The easiest solution would be to have one class per table, but then you can't return a single list as easily as you could with only one table. If they are two distinct concepts, though, it's fine to make history first-class in your design.

    I would also recommend this article by M. Fowler, which is interesting when it comes to dealing with temporal data: Patterns for things that change with time

  • 2021-01-06 20:52

    It depends on how many of your tables require versioning, and on whether you have a transactional or a reporting system.

    If it's just a few transactional tables, the way you're doing it is fine as long as the performance issues aren't too significant. You can make querying easier by adding a current_row column and a trigger that updates the prior row to make it non-current.
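
    A sketch of such a trigger, again using SQLite through Python's sqlite3 module. The business_key and body columns and the trigger name are assumptions, since the original schema is not fully shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foobar (
    foobar_id    INTEGER PRIMARY KEY,
    business_key TEXT NOT NULL,            -- hypothetical stable identifier
    version      INTEGER NOT NULL,
    current_row  INTEGER NOT NULL DEFAULT 1,
    body         TEXT NOT NULL
);

-- After a new version is inserted, clear the flag on every older row
-- that shares its business key.
CREATE TRIGGER foobar_mark_previous_non_current
AFTER INSERT ON foobar
FOR EACH ROW
BEGIN
    UPDATE foobar
    SET current_row = 0
    WHERE business_key = NEW.business_key
      AND foobar_id <> NEW.foobar_id;
END;
""")

for version, body in [(1, "draft"), (2, "reviewed"), (3, "published")]:
    conn.execute(
        "INSERT INTO foobar (business_key, version, body) VALUES ('page-1', ?, ?)",
        (version, body))

rows = conn.execute(
    "SELECT version, current_row FROM foobar ORDER BY version").fetchall()
current_body = conn.execute(
    "SELECT body FROM foobar WHERE business_key = 'page-1' AND current_row = 1"
).fetchone()[0]
```

    Reading the latest version then becomes a simple filter on current_row instead of a subquery on the maximum version.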

    But if you've got a lot of tables, or the extra rows are slowing down some of your queries, then I'd do as others suggest and use history tables along with history triggers. Note that you can generate that code to make it easier to develop and maintain.

    If you're in the reporting world, there are a lot of other options that I won't address here. They are covered in detail in books on data warehouse data modeling.

  • 2021-01-06 20:55

    If you are using Oracle, you can use analytic functions:

    SELECT *
    FROM (
        SELECT a.*,
               ROW_NUMBER() OVER (PARTITION BY foobar_id ORDER BY version DESC) rn
        FROM foobar a
        WHERE foobar_id = 2
    )
    WHERE rn = 1
