How to efficiently version records in an SQL database

日久生厌 2021-02-10 14:41

In at least one application, I have the need to keep old versions of records in a relational database. When something should be updated, instead a new copy would be added and th

4 Answers
  • 2021-02-10 15:20

    If the old data is part of your business logic:

    • Keep the latest version in a master table (inserts and updates go there; a delete only changes a status column).
    • Take a snapshot into a detail table whenever an update happens (the snapshot is created before the update is applied).

    [Image: revision history]

    • Another alternative is the Event Sourcing pattern.

    If the old data is just a trace log of changes:

    • An Entity–attribute–value approach may come in handy. An implementation sample can be found here.
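
    The master-table-plus-snapshot option from the first list could look roughly like the sketch below. It uses Python's built-in sqlite3 module purely so that it runs anywhere; the table names item and item_snapshot, their columns, and the helper functions are hypothetical, and treating a soft delete as one more snapshotted change is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: 'item' is the master table, 'item_snapshot' the detail table.
    CREATE TABLE item (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'active'
    );
    CREATE TABLE item_snapshot (
        snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
        item_id     INTEGER NOT NULL REFERENCES item(id),
        name        TEXT NOT NULL,
        status      TEXT NOT NULL,
        taken_at    TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );
""")

SNAPSHOT = (
    "INSERT INTO item_snapshot (item_id, name, status) "
    "SELECT id, name, status FROM item WHERE id = ?"
)


def update_item(conn, item_id, new_name):
    """Snapshot the current row, then overwrite it in the master table."""
    with conn:  # one transaction: snapshot and update succeed or fail together
        conn.execute(SNAPSHOT, (item_id,))
        conn.execute("UPDATE item SET name = ? WHERE id = ?", (new_name, item_id))


def delete_item(conn, item_id):
    """Soft delete: snapshot the row, then only flip the status column."""
    with conn:
        conn.execute(SNAPSHOT, (item_id,))
        conn.execute("UPDATE item SET status = 'deleted' WHERE id = ?", (item_id,))


conn.execute("INSERT INTO item (id, name) VALUES (1, 'draft')")
conn.commit()
update_item(conn, 1, "final")
delete_item(conn, 1)

current = conn.execute("SELECT name, status FROM item WHERE id = 1").fetchone()
history = conn.execute(
    "SELECT name, status FROM item_snapshot WHERE item_id = 1 ORDER BY snapshot_id"
).fetchall()
```

    After the calls above, current is ('final', 'deleted') and history holds the two earlier states, ('draft', 'active') and ('final', 'active'), so the master table stays small while the detail table carries the revisions.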
  • 2021-02-10 15:20

    I am working with SQL within Oracle products (Database 11g). We have a huge project, and versioning is an essential part of it. Both approaches you mentioned are useful.
    If your database supports triggers and you can use PL/SQL, you can separate out the old data with little effort. Create BEFORE UPDATE and BEFORE DELETE triggers that store the old data in a dedicated history table (together with the date of the change and its type: delete or update).

    Assumption: every table you want to version must have a primary key.

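    Pseudocode:

    CREATE OR REPLACE TRIGGER TRIGGER_ON_VERSIONED_TABLE
    BEFORE UPDATE
      ON VERSIONED_TABLE
      FOR EACH ROW  -- row-level trigger, required to reference :OLD
    BEGIN
      -- copy the outgoing value, the user and the time of the change
      INSERT INTO VERSIONED_TABLE_HISTORY_PART
      VALUES (:OLD.COLUMN_A, USER, SYSTIMESTAMP);
    END;
    /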
    

    If you want all historical data for one primary key, select from both the "production" table and the history table, filter on the key you want, and sort by timestamp (for the active record, use SYSTIMESTAMP as its timestamp). Each history row holds the values that were in effect until its change date, so to see the state of a record at a given date, take the first row whose timestamp is later than that date (from the history table or, failing that, the production table).
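
    A minimal sketch of that history lookup, using Python's built-in sqlite3 module as a stand-in for Oracle. The table and column names are hypothetical, and so is the modified_at column: the trigger copies the caller-supplied modified_at instead of reading the system clock only to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE versioned_table (
        id          INTEGER PRIMARY KEY,
        column_a    TEXT,
        modified_at TEXT   -- hypothetical: set by the caller on every write
    );
    CREATE TABLE versioned_table_history (
        id         INTEGER,
        column_a   TEXT,
        changed_at TEXT    -- moment at which these values stopped being current
    );
    -- Row-level trigger: copy the outgoing values before they are overwritten.
    CREATE TRIGGER versioned_table_bu
    BEFORE UPDATE ON versioned_table
    FOR EACH ROW
    BEGIN
        INSERT INTO versioned_table_history
        VALUES (OLD.id, OLD.column_a, NEW.modified_at);
    END;
""")

conn.execute("INSERT INTO versioned_table VALUES (1, 'v1', '2014-01-01')")
conn.execute(
    "UPDATE versioned_table SET column_a = 'v2', modified_at = '2015-01-01' WHERE id = 1"
)
conn.execute(
    "UPDATE versioned_table SET column_a = 'v3', modified_at = '2016-01-01' WHERE id = 1"
)

# Every version of one key, oldest first. The live row gets a far-future stamp
# so that it sorts last (the answer uses SYSTIMESTAMP for the same purpose).
ALL_VERSIONS = """
    SELECT column_a, changed_at AS valid_until
      FROM versioned_table_history WHERE id = :id
    UNION ALL
    SELECT column_a, '9999-12-31' AS valid_until
      FROM versioned_table WHERE id = :id
    ORDER BY valid_until
"""
versions = conn.execute(ALL_VERSIONS, {"id": 1}).fetchall()


def state_at(conn, key, as_of):
    """Return column_a as it was at `as_of`: the first version that ended later."""
    row = conn.execute(
        f"SELECT column_a FROM ({ALL_VERSIONS}) "
        "WHERE valid_until > :as_of ORDER BY valid_until LIMIT 1",
        {"id": key, "as_of": as_of},
    ).fetchone()
    return row[0] if row else None
```

    Here state_at(conn, 1, '2015-06-01') returns 'v2'. Note that the history table alone does not record when a key was first created, so a date before the first insert still resolves to the oldest version.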

    For BEFORE UPDATE triggers, look here.

    If you have an existing solution (so your original DB model does not contain versioning parts) and you want to create a versioned table, or you cannot use PL/SQL, use your approach 2. Our project at work (on Oracle Database) also uses this approach. Let's say we have a table of documents (in real life you would have a version identifier as the primary key of this table; this is only to show the principle):

    CREATE TABLE DOC(
        DOC_NAME    VARCHAR(10)
      , DOC_NOTE    VARCHAR(10)
      , VALID_FROM  TIMESTAMP
      , VALID_TO    TIMESTAMP
      , CONSTRAINT DOC_PK PRIMARY KEY(DOC_NAME, VALID_FROM)
    );
    
    INSERT INTO doc VALUES ('A', 'FIRST VER', systimestamp, date'2999-12-31');
    INSERT INTO doc VALUES ('B', 'FIRST VER', systimestamp, date'2999-12-31');
    

    You don't need a WHERE clause like this:

    WHERE VALID_FROM <= :time AND VALID_TO > :time
    ORDER BY VALID_FROM LIMIT 1
    

    Because in a versioned table only one version of a record is valid at any given time, you only need this:

    SELECT * FROM DOC 
    WHERE SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    

    This always returns only one row per document, and you can use any other date instead of SYSTIMESTAMP. But you cannot update records directly: first you must update the end timestamp (but this is not a problem for you, as I see). So if I update a record such as XK-04 (document 'A' here), I do it this way:

    UPDATE doc SET VALID_TO = systimestamp 
    WHERE DOC_NAME='A' AND SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    INSERT INTO doc VALUES ('A', 'SECOND VER', systimestamp, date'2999-12-31');
    

    And you can use the same SELECT as above again, with any date:

    SELECT * FROM DOC WHERE :CUSTOM_DATE BETWEEN VALID_FROM AND VALID_TO;
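
    The close-then-insert update and the point-in-time SELECT can be tried end to end with the sketch below. It uses Python's built-in sqlite3 module in place of Oracle, passes the timestamp in explicitly instead of calling SYSTIMESTAMP, and the helper names are hypothetical. One thing worth checking in your own data: BETWEEN includes both endpoints, so a query at the exact instant of a change matches the closed row and the new row; the half-open variant is one possible way around that, not something the original answer prescribes.

```python
import sqlite3

FOREVER = "2999-12-31 00:00:00"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE doc (
        doc_name   TEXT NOT NULL,
        doc_note   TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT NOT NULL,
        PRIMARY KEY (doc_name, valid_from)
    )
""")


def save_version(conn, name, note, now):
    """Close the version of `name` that is valid at `now` (if any), then insert the new one."""
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE doc SET valid_to = :now "
            "WHERE doc_name = :name AND :now BETWEEN valid_from AND valid_to",
            {"now": now, "name": name},
        )
        conn.execute("INSERT INTO doc VALUES (?, ?, ?, ?)", (name, note, now, FOREVER))


def as_of_between(conn, name, when):
    """The answer's filter: BETWEEN includes both endpoints."""
    return conn.execute(
        "SELECT doc_note FROM doc WHERE doc_name = ? "
        "AND ? BETWEEN valid_from AND valid_to ORDER BY valid_from",
        (name, when),
    ).fetchall()


def as_of_half_open(conn, name, when):
    """Variant: valid_from is inclusive, valid_to is exclusive."""
    return conn.execute(
        "SELECT doc_note FROM doc WHERE doc_name = ? "
        "AND valid_from <= ? AND ? < valid_to ORDER BY valid_from",
        (name, when, when),
    ).fetchall()


save_version(conn, "A", "FIRST VER", "2014-01-01 00:00:00")
save_version(conn, "A", "SECOND VER", "2016-01-01 00:00:00")
```

    For any date strictly between two changes, both functions return the same single row. At exactly '2016-01-01 00:00:00', as_of_between returns both versions while as_of_half_open returns only 'SECOND VER'.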
    

    Best practice is to also create ACTIVE and HISTORICAL views for a versioned table. The base table holds all the data, and any time you want the current record you must write BETWEEN VALID_FROM AND VALID_TO. A better way is to create views:

    CREATE VIEW DOC_ACTIVE 
    AS SELECT * FROM DOC WHERE SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    

    Or, if you need the old data:

    CREATE VIEW DOC_INACTIVE 
    AS SELECT * FROM DOC WHERE NOT SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    

    Now, instead of your original SQL:

    SELECT a, b, c FROM t1
    

    you don't need a complicated structure; just change the table to the "active" view (like DOC_ACTIVE):

    SELECT a, b, c FROM t1_VIEW
    

    Please also look at this answer: Versioning in SQL Tables - how to handle it?

    I do not know whether you see the difference between a valid record and the valid "object" inside it. In our work project we don't have any overlapping valid ranges. Take, for example, the documents table, with a primary key composed of document name and version number. We have document A (valid for the years 2010-2050), and it has 2 versions.

    Document A, version 1 (2010-2020), record valid 2014-9999: VALID   (NEW)
    Document A, version 2 (2021-2050), record valid 2014-9999: VALID   (NEW)
    

    In version 1 the document is valid from 2010 to 2020 (object version, not record version) and is in some state P. This record is valid for 2014-9999.

    In version 2 the document is valid from 2021 to 2050 (object version, not record version). This record is again valid for 2014-9999, and the document is in state Q.

    Let's say it is 2016. You find a clerical error in both versions of the document. You create a new record version, dated to the current year (2016), for both document versions. After all changes you have these document versions:

    Document A, version 1 (2010-2020), record valid 2014-2015: INVALID   (UPDATED)
    Document A, version 2 (2021-2050), record valid 2014-2015: INVALID   (UPDATED)
    Document A, version 1 (2010-2020), record valid 2016-9999: VALID NOW (NEW)
    Document A, version 2 (2021-2050), record valid 2016-9999: VALID NOW (NEW)
    

    After this, in 2018, someone creates a new version of the document, valid only for the years 2021-2030. (The document is valid in the future, but its version record is valid today.) Now you must update the VALID version 2 and create version 3. Current state:

    Document A, version 1 (2010-2020), record valid 2014-2015: INVALID   (NO CHANGE)
    Document A, version 2 (2021-2050), record valid 2014-2015: INVALID   (NO CHANGE)
    Document A, version 1 (2010-2020), record valid 2016-9999: VALID NOW (NO CHANGE)
    Document A, version 2 (2021-2050), record valid 2016-2018: INVALID   (UPDATED)
    Document A, version 2 (2031-2050), record valid 2018-9999: VALID NOW (NEW)
    Document A, version 3 (2021-2030), record valid 2018-9999: VALID NOW (NEW)
    

    In our work project, PL/SQL code performs all these operations.
    In 2018, if you select the document's valid records, you get 3 rows: A1, A2, A3.
    If you select the versions valid in 2015, you get only A1 (INVALID) and A2 (INVALID).

    So you have the full history, even though the document has 3 versions that are valid at the same point in time (record validity), and object validity is kept separate. This is a really good approach and should cover all your requirements.
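
    The final state of the walkthrough above can be loaded into a table to check both kinds of validity. This is only a sketch using Python's built-in sqlite3 module; the table doc_version and its column names are hypothetical. It also assumes whole years and an exclusive record end, so "record valid 2014-2015" is stored as rec_from = 2014, rec_to = 2016, which keeps a row that was closed in 2018 from also matching a query for 2018.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE doc_version (
        doc_name TEXT,
        version  INTEGER,
        obj_from INTEGER,  -- object validity: years the document version applies to
        obj_to   INTEGER,
        rec_from INTEGER,  -- record validity: first year this row was the truth
        rec_to   INTEGER   -- exclusive end; 9999 means "still current"
    )
""")
rows = [
    ("A", 1, 2010, 2020, 2014, 2016),
    ("A", 2, 2021, 2050, 2014, 2016),
    ("A", 1, 2010, 2020, 2016, 9999),
    ("A", 2, 2021, 2050, 2016, 2018),
    ("A", 2, 2031, 2050, 2018, 9999),
    ("A", 3, 2021, 2030, 2018, 9999),
]
conn.executemany("INSERT INTO doc_version VALUES (?, ?, ?, ?, ?, ?)", rows)


def known_in(conn, year):
    """Rows whose record validity covers `year` (what the database held as true then)."""
    return conn.execute(
        "SELECT version, obj_from, obj_to FROM doc_version "
        "WHERE rec_from <= ? AND ? < rec_to ORDER BY version",
        (year, year),
    ).fetchall()


def in_force(conn, doc_year, known_year):
    """Version applying to `doc_year`, according to the records valid in `known_year`."""
    return conn.execute(
        "SELECT version FROM doc_version "
        "WHERE rec_from <= ? AND ? < rec_to AND ? BETWEEN obj_from AND obj_to",
        (known_year, known_year, doc_year),
    ).fetchall()
```

    known_in(conn, 2018) returns the three rows A1, A2, A3, and known_in(conn, 2015) returns only the two original rows, matching the counts given above. Combining both axes, in_force(conn, 2025, 2018) yields version 3, while in_force(conn, 2025, 2017) still yields version 2.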

    You can also easily use BETWEEN in views for columns where NULL indicates a minimum or maximum value, like this:

    CREATE VIEW DOC_ACTIVE AS
    SELECT * FROM DOC 
     WHERE SYSTIMESTAMP BETWEEN NVL(VALID_FROM, SYSTIMESTAMP) 
                            AND NVL(VALID_TO, SYSTIMESTAMP);
    
  • 2021-02-10 15:27

    I have worked with tracking versions of records but never with overlapping ranges. However, I have experience selecting records under similar criteria. Here's a query that should do what you want.

    select  *
    from    t1
    where   VersionId = (select top 1 VersionId
                         from   t1 as MostRecentlyValid
                         where  MostRecentlyValid.ValidFrom <= @AsOfDate
                                and (MostRecentlyValid.ValidTo >= @AsOfDate
                                     or MostRecentlyValid.ValidTo is null)
                                and t1.Id = MostRecentlyValid.Id
                         order by MostRecentlyValid.ValidFrom desc)
    

    This assumes that ValidTo can also be null to indicate no end date. If ValidTo can't be null, you can remove the or condition. This also assumes a record is valid through the end of the day of the ValidTo date. If the record becomes old at the beginning of the day of the ValidTo date, change >= to just >.
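
    To see how the query behaves with overlapping ranges, here is a small runnable sketch using Python's built-in sqlite3 module. SQLite has no TOP, so LIMIT 1 takes its place; the Payload column and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t1 (
        VersionId INTEGER PRIMARY KEY,
        Id        INTEGER NOT NULL,
        Payload   TEXT,            -- hypothetical data column
        ValidFrom TEXT NOT NULL,
        ValidTo   TEXT             -- NULL means "no end date"
    )
""")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "old", "2014-01-01", "2014-12-31"),
        (2, 10, "overlap", "2014-06-01", None),  # overlaps version 1
        (3, 20, "only", "2014-03-01", None),
    ],
)

# Same shape as the query above: per Id, keep the row whose VersionId is the
# most recently started version that is valid at :as_of.
AS_OF_QUERY = """
    SELECT Id, Payload
      FROM t1
     WHERE VersionId = (SELECT MostRecentlyValid.VersionId
                          FROM t1 AS MostRecentlyValid
                         WHERE MostRecentlyValid.ValidFrom <= :as_of
                           AND (MostRecentlyValid.ValidTo >= :as_of
                                OR MostRecentlyValid.ValidTo IS NULL)
                           AND MostRecentlyValid.Id = t1.Id
                         ORDER BY MostRecentlyValid.ValidFrom DESC
                         LIMIT 1)
     ORDER BY Id
"""


def as_of(conn, date):
    """Return (Id, Payload) of the winning version of every record at `date`."""
    return conn.execute(AS_OF_QUERY, {"as_of": date}).fetchall()
```

    With this data, as_of(conn, '2014-07-01') picks 'overlap' for Id 10 because it started later than 'old', and as_of(conn, '2014-02-01') returns only Id 10 with 'old', since Id 20 did not exist yet.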

    I only tried this on a handful of test data, but I'm fairly certain it'll work for all cases.

    As for efficiency, I'm not a SQL expert, so I really don't know if this is the most efficient solution.

    To join to another table you could do something like this:

    select  *
    from    (select *
             from   t1
             where  VersionId = (select top 1 VersionId
                                 from   t1 as MostRecentlyValid
                                 where  MostRecentlyValid.ValidFrom <= '2014/2/11'
                                        and (MostRecentlyValid.ValidTo >= '2014/2/11'
                                             or MostRecentlyValid.ValidTo is null)
                                        and t1.Id = MostRecentlyValid.Id
                                 order by MostRecentlyValid.ValidFrom desc)) as SelectedRecords
            inner join t2
                on SelectedRecords.Id = t2.Id
    
  • 2021-02-10 15:35

    I know that this is an old post, but I wanted to reply not only to provide a solution but also to exchange ideas with you and to discuss the most efficient solution for this important issue of versioning.

    My idea is,

    Create a table that contains 5 main versioning fields:

    • Serial (incremental number) is the real identifier and is used for joins
    • ID (self-referencing foreign key) equals the Serial value at the time the record is first created
    • ValidFrom (date from which the record became active)
    • ValidTo (date at which the record became inactive) => null for the current version
    • IsCurrent (flag indicating that the record is active)

    When updating a record

    • Update the existing record: set ValidTo to the current datetime (NOW) and IsCurrent to false.

    • Insert a new record with the next Serial value and the very same ID as the updated record; ValidFrom is NOW, ValidTo is null, and IsCurrent is true.

    When deleting a record

    ValidTo is set to NOW and IsCurrent is set to false.

    This way you will not have problems with joins, since joining tables on the ID field will show you the whole history of a record.
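
    The create, update and delete steps described above could be sketched as follows, again with Python's built-in sqlite3 module. The table name record, the Payload column and the function names are hypothetical, and the caller supplies the NOW value so the example stays deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE record (
        Serial    INTEGER PRIMARY KEY AUTOINCREMENT,  -- real identifier of a version
        ID        INTEGER REFERENCES record(Serial),  -- Serial of the first version
        Payload   TEXT,                               -- hypothetical data column
        ValidFrom TEXT NOT NULL,
        ValidTo   TEXT,                               -- NULL for the current version
        IsCurrent INTEGER NOT NULL                    -- 1 = active, 0 = inactive
    )
""")


def create(conn, payload, now):
    """Insert the first version; its ID is set to its own Serial."""
    with conn:
        cur = conn.execute(
            "INSERT INTO record (Payload, ValidFrom, IsCurrent) VALUES (?, ?, 1)",
            (payload, now),
        )
        conn.execute("UPDATE record SET ID = Serial WHERE Serial = ?", (cur.lastrowid,))
    return cur.lastrowid


def close_current(conn, record_id, now):
    """Shared by update and delete: end the current version of `record_id`."""
    conn.execute(
        "UPDATE record SET ValidTo = ?, IsCurrent = 0 WHERE ID = ? AND IsCurrent = 1",
        (now, record_id),
    )


def update(conn, record_id, payload, now):
    """Close the current version and insert its successor under the same ID."""
    with conn:
        close_current(conn, record_id, now)
        conn.execute(
            "INSERT INTO record (ID, Payload, ValidFrom, IsCurrent) VALUES (?, ?, ?, 1)",
            (record_id, payload, now),
        )


def delete(conn, record_id, now):
    """Logical delete: no row is removed, the current version is just closed."""
    with conn:
        close_current(conn, record_id, now)


rid = create(conn, "v1", "2021-01-01")
update(conn, rid, "v2", "2021-02-01")
after_update = conn.execute(
    "SELECT Serial, ID, Payload, ValidTo, IsCurrent FROM record ORDER BY Serial"
).fetchall()
delete(conn, rid, "2021-03-01")
current_left = conn.execute("SELECT COUNT(*) FROM record WHERE IsCurrent = 1").fetchone()[0]
```

    After the update, both versions share ID 1 and only the second has IsCurrent = 1; after the delete, no current row remains but both versions are still there for history queries on ID.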

    If you have FKs to a parent table, you probably want to remove the value of the FK field.
