How to efficiently version records in an SQL database

日久生厌 2021-02-10 14:41

In at least one application, I have the need to keep old versions of records in a relational database. When something should be updated, instead a new copy would be added and th

4 answers
  •  北恋 (original poster)
    2021-02-10 15:20

    I work with SQL in Oracle products (Database 11g). We have a huge project, and versioning is an essential part of it. Both approaches you mentioned are useful.
    If your database supports triggers and you can use PL/SQL, you can separate out old data with little effort. Create BEFORE UPDATE and BEFORE DELETE triggers that store the old data in a dedicated history table (together with the date of the change and its type, delete or update).

    Assumption: every table you want to version must have a primary key.

    Pseudocode:

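    CREATE TRIGGER TRIGGER_ON_VERSIONED_TABLE
    BEFORE UPDATE
      ON VERSIONED_TABLE
      FOR EACH ROW  -- row-level trigger, required to reference :OLD
    BEGIN
      -- copy the old value, the user who changed it, and the time of the change
      INSERT INTO VERSIONED_TABLE_HISTORY_PART
      VALUES (:OLD.COLUMN_A, USER, SYSTIMESTAMP);
    END;
    /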
    

    If you want all historical data for one primary key, select from both the "production" table and the history table, filter on the key you want, and sort by timestamp (for the active record, use SYSTIMESTAMP as its timestamp). To see the state of a record at a given date, take the earliest row whose timestamp is later than your date: each history row is stamped with the moment it was replaced, so that row holds the values that were in effect at your date. If no history row qualifies, the row in the production table applies.
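
The trigger-plus-history-table idea can be tried without an Oracle instance. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in, so the trigger syntax is SQLite's rather than PL/SQL. The table names, the `clock` table (a deterministic replacement for SYSTIMESTAMP) and the helpers `update_at` and `state_at` are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE versioned_table (id INTEGER PRIMARY KEY, column_a TEXT);
CREATE TABLE versioned_table_history (
    id INTEGER, column_a TEXT, change_type TEXT, changed_at TEXT);
-- Hypothetical one-row clock table so the example is deterministic;
-- on Oracle the trigger would read SYSTIMESTAMP instead.
CREATE TABLE clock (now TEXT);
INSERT INTO clock VALUES ('2021-01-01');

CREATE TRIGGER versioned_table_bu BEFORE UPDATE ON versioned_table
FOR EACH ROW BEGIN
    INSERT INTO versioned_table_history
    VALUES (OLD.id, OLD.column_a, 'UPDATE', (SELECT now FROM clock));
END;
""")


def update_at(conn, key, value, now):
    """Move the test clock to `now`, then update the row (fires the trigger)."""
    conn.execute("UPDATE clock SET now = ?", (now,))
    conn.execute(
        "UPDATE versioned_table SET column_a = ? WHERE id = ?", (value, key))


def state_at(conn, key, when):
    """Return column_a as it was at `when`.

    History rows carry the time they were replaced, so the wanted row is
    the earliest one replaced after `when`; the production row acts as a
    row that is replaced in the far future.
    """
    row = conn.execute(
        """
        SELECT column_a FROM (
            SELECT column_a, changed_at
              FROM versioned_table_history WHERE id = ?
            UNION ALL
            SELECT column_a, '9999-12-31'
              FROM versioned_table WHERE id = ?
        )
        WHERE changed_at > ?
        ORDER BY changed_at
        LIMIT 1
        """,
        (key, key, when),
    ).fetchone()
    return row[0] if row else None


conn.execute("INSERT INTO versioned_table VALUES (1, 'v1')")
update_at(conn, 1, "v2", "2021-02-01")
update_at(conn, 1, "v3", "2021-03-01")
print(state_at(conn, 1, "2021-02-15"))
```

Note that this sketch does not record when a row was first inserted, so a date before the first insert still returns the oldest known values.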

    For details on the BEFORE UPDATE trigger, look here.

    If you have an existing solution (so your original DB model does not contain versioning parts) and you want to create a versioned table, or you cannot use PL/SQL, use your approach 2. Our project at work (on Oracle Database) also uses this approach. Let's say we have a table of documents (in real life you would have a version identifier as the primary key of this table; this is only to show the principle):

    CREATE TABLE DOC(
        DOC_NAME    VARCHAR(10)
      , DOC_NOTE    VARCHAR(10)
      , VALID_FROM  TIMESTAMP
      , VALID_TO    TIMESTAMP
      , CONSTRAINT DOC_PK PRIMARY KEY(DOC_NAME, VALID_FROM)
    );
    
    INSERT INTO doc VALUES ('A', 'FIRST VER', systimestamp, date'2999-12-31');
    INSERT INTO doc VALUES ('B', 'FIRST VER', systimestamp, date'2999-12-31');
    

    You don't need a WHERE clause like this:

    WHERE VALID_FROM <= :time AND VALID_TO > :time
    ORDER BY VALID_FROM LIMIT 1
    

    Because in a versioned table only one version of a record is valid at any time. So you only need this:

    SELECT * FROM DOC 
    WHERE SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    

    This always returns exactly one row per document, and you can use any other date instead of SYSTIMESTAMP. But you cannot update records in place: first you must set the end timestamp (this is not a problem for you, as far as I can see). So if I update a record such as XK-04, I do it this way (shown here for document A):

    UPDATE doc SET VALID_TO = systimestamp 
    WHERE DOC_NAME='A' AND SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    INSERT INTO doc VALUES ('A', 'SECOND VER', systimestamp, date'2999-12-31');
    

    The same SELECT as above still works, for any date you choose:

    SELECT * FROM DOC WHERE :CUSTOM_DATE BETWEEN VALID_FROM AND VALID_TO;
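
The close-then-insert update and the point-in-time lookup can be sketched end to end as follows. This is an illustration in Python with the built-in sqlite3 module, not Oracle code. The helper names (`create_schema`, `save_version`, `version_at`) are hypothetical, timestamps are passed in as ISO strings instead of being read from SYSTIMESTAMP, and the sketch deliberately uses a half-open interval (`valid_from <= t < valid_to`) instead of BETWEEN so that one timestamp can close the old row and open the new one without both matching at that instant.

```python
import sqlite3

FAR_FUTURE = "2999-12-31 00:00:00"  # same "open end" marker as in the answer


def create_schema(conn):
    conn.execute(
        """
        CREATE TABLE doc (
            doc_name   TEXT,
            doc_note   TEXT,
            valid_from TEXT,
            valid_to   TEXT,
            PRIMARY KEY (doc_name, valid_from)
        )
        """
    )


def save_version(conn, name, note, now):
    """Close the current version of `name` (if any) and insert a new one."""
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "UPDATE doc SET valid_to = ? "
            "WHERE doc_name = ? AND valid_from <= ? AND ? < valid_to",
            (now, name, now, now),
        )
        conn.execute(
            "INSERT INTO doc VALUES (?, ?, ?, ?)",
            (name, note, now, FAR_FUTURE),
        )


def version_at(conn, name, when):
    """Return the note of the version of `name` that was valid at `when`."""
    row = conn.execute(
        "SELECT doc_note FROM doc "
        "WHERE doc_name = ? AND valid_from <= ? AND ? < valid_to",
        (name, when, when),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_schema(conn)
save_version(conn, "A", "FIRST VER", "2021-01-01 00:00:00")
save_version(conn, "A", "SECOND VER", "2021-06-01 00:00:00")
print(version_at(conn, "A", "2021-03-01 00:00:00"))
```

Wrapping both statements in one transaction matters here: if the INSERT fails, the old version should not be left closed with no successor.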
    

    Best practice is to also create ACTIVE and HISTORICAL views for a versioned table. The base table holds all the data, so every time you want the current record you must write BETWEEN VALID_FROM AND VALID_TO. A better way is to create views:

    CREATE VIEW DOC_ACTIVE 
    AS SELECT * FROM DOC WHERE SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    

    Or, if you need the old data:

    CREATE VIEW DOC_INACTIVE 
    AS SELECT * FROM DOC WHERE NOT SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
    

    Now, instead of your original SQL:

    SELECT a, b, c FROM t1
    

    you don't need any complicated structure; just change the table to the "active" view (like DOC_ACTIVE):

    SELECT a, b, c FROM t1_VIEW
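
To see the two kinds of view side by side, here is a small sketch in Python with the built-in sqlite3 module. It is an approximation of the Oracle setup: SQLite's `datetime('now')` stands in for SYSTIMESTAMP, and the sample rows and their dates are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (
    doc_name TEXT, doc_note TEXT, valid_from TEXT, valid_to TEXT,
    PRIMARY KEY (doc_name, valid_from));

-- Made-up sample data: document A has one closed and one open version.
INSERT INTO doc VALUES
    ('A', 'FIRST VER',  '2020-01-01 00:00:00', '2020-06-01 00:00:00');
INSERT INTO doc VALUES
    ('A', 'SECOND VER', '2020-06-01 00:00:01', '2999-12-31 00:00:00');
INSERT INTO doc VALUES
    ('B', 'FIRST VER',  '2020-01-01 00:00:00', '2999-12-31 00:00:00');

-- datetime('now') plays the role of SYSTIMESTAMP in this sketch.
CREATE VIEW doc_active AS
    SELECT * FROM doc
     WHERE datetime('now') BETWEEN valid_from AND valid_to;
CREATE VIEW doc_inactive AS
    SELECT * FROM doc
     WHERE datetime('now') NOT BETWEEN valid_from AND valid_to;
""")

# Callers query the view exactly as they would query a plain table.
active = conn.execute(
    "SELECT doc_name, doc_note FROM doc_active ORDER BY doc_name").fetchall()
inactive = conn.execute(
    "SELECT doc_name, doc_note FROM doc_inactive ORDER BY doc_name").fetchall()
print(active)
print(inactive)
```

One thing to keep in mind with this definition: the "inactive" view returns every row that is not valid right now, which would also include rows whose validity only starts in the future.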
    

    Please also look at this answer: Versioning in SQL Tables - how to handle it?

    I do not know whether you see the difference between a valid record and the valid "object" inside it. In our work project we do not have any overlapping valid ranges. Take, for example, the table of documents, with a composite primary key made of document name and version number. We have document A (valid for the years 2010-2050), and it has 2 versions.

    Document A, version 1 (2010-2020), record valid 2014-9999: VALID   (NEW)
    Document A, version 2 (2021-2050), record valid 2014-9999: VALID   (NEW)
    

    In version 1, the document is valid from 2010 to 2020 (object validity, not record validity), and the document is in some state P. This record is valid for 2014-9999.

    In version 2, the document is valid from 2021 to 2050 (again object validity, not record validity), and the document is in state Q. This record is also valid for 2014-9999.

    Let's say it is 2016. You find a clerical error in both versions of the document. In the current year (2016) you create a new record version for each document version. After all changes you have these document versions:

    Document A, version 1 (2010-2020), record valid 2014-2015: INVALID   (UPDATED)
    Document A, version 2 (2021-2050), record valid 2014-2015: INVALID   (UPDATED)
    Document A, version 1 (2010-2020), record valid 2016-9999: VALID NOW (NEW)
    Document A, version 2 (2021-2050), record valid 2016-9999: VALID NOW (NEW)
    

    After this, in 2018, someone creates a new version of the document, valid only for the years 2021-2030. (The document is valid in the future, but its record is valid today.) Now you must update the VALID version 2 and create version 3. Current state:

    Document A, version 1 (2010-2020), record valid 2014-2015: INVALID   (NO CHANGE)
    Document A, version 2 (2021-2050), record valid 2014-2015: INVALID   (NO CHANGE)
    Document A, version 1 (2010-2020), record valid 2016-9999: VALID NOW (NO CHANGE)
    Document A, version 2 (2021-2050), record valid 2016-2018: INVALID   (UPDATED)
    Document A, version 2 (2031-2050), record valid 2018-9999: VALID NOW (NEW)
    Document A, version 3 (2021-2030), record valid 2018-9999: VALID NOW (NEW)
    

    In our work project, PL/SQL code performs all these operations for us.
    In 2018, if you select the valid records of the document, you get 3 rows: A1, A2, A3.
    If you select the versions valid in 2015, you get only A1 (INVALID) and A2 (INVALID).

    So you have the full history, even though the document has 3 valid versions at the same point in time (record validity), and object validity is tracked separately. This is a really good approach and should cover all your requirements.
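
The document A walkthrough can be reproduced as data to check the two kinds of validity against each other. The sketch below uses Python with the built-in sqlite3 module. The table and column names (`doc_version`, `obj_from`, `rec_from`, and so on) and both helper functions are hypothetical. The text only gives years for record validity, so the exact dates used here (for example, the 2018 change taking effect on 2018-06-01) are assumptions chosen so that record periods do not overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE doc_version (
        doc TEXT, version INTEGER,
        obj_from INTEGER, obj_to INTEGER,  -- years the content applies to
        rec_from TEXT, rec_to TEXT         -- when this row was the live record
    )
    """
)
# Final state from the walkthrough; exact rec_* dates are assumed.
rows = [
    ("A", 1, 2010, 2020, "2014-01-01", "2015-12-31"),
    ("A", 2, 2021, 2050, "2014-01-01", "2015-12-31"),
    ("A", 1, 2010, 2020, "2016-01-01", "9999-12-31"),
    ("A", 2, 2021, 2050, "2016-01-01", "2018-05-31"),
    ("A", 2, 2031, 2050, "2018-06-01", "9999-12-31"),
    ("A", 3, 2021, 2030, "2018-06-01", "9999-12-31"),
]
conn.executemany("INSERT INTO doc_version VALUES (?, ?, ?, ?, ?, ?)", rows)


def records_as_of(conn, as_of):
    """All document versions whose record was live on the date `as_of`."""
    return conn.execute(
        "SELECT version, obj_from, obj_to FROM doc_version "
        "WHERE ? BETWEEN rec_from AND rec_to ORDER BY version",
        (as_of,),
    ).fetchall()


def version_for_year(conn, year, as_of):
    """Which version covers `year`, according to what was known on `as_of`."""
    row = conn.execute(
        "SELECT version FROM doc_version "
        "WHERE ? BETWEEN rec_from AND rec_to "
        "  AND ? BETWEEN obj_from AND obj_to",
        (as_of, year),
    ).fetchone()
    return row[0] if row else None


print(records_as_of(conn, "2018-12-31"))
print(version_for_year(conn, 2025, "2017-01-01"))
print(version_for_year(conn, 2025, "2018-12-31"))
```

The last two calls ask the same question about the year 2025 from two different points of knowledge, which is exactly the separation between object validity and record validity described above.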

    You can also easily use BETWEEN in views when the columns may be NULL (NULL indicating an open minimum or maximum value), like this:

    CREATE OR REPLACE VIEW DOC_ACTIVE AS
    SELECT * FROM DOC 
     WHERE SYSTIMESTAMP BETWEEN NVL(VALID_FROM, SYSTIMESTAMP) 
                            AND NVL(VALID_TO, SYSTIMESTAMP);
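
The NULL-as-open-bound trick works because a missing bound is replaced by the very value being tested, so that side of the BETWEEN is always satisfied. A small sketch in Python with the built-in sqlite3 module illustrates this. SQLite has no NVL, so the standard COALESCE is used in its place; the helper `notes_at` and the sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (doc_name TEXT, doc_note TEXT, valid_from TEXT, valid_to TEXT);
-- NULL valid_from: valid since forever; NULL valid_to: still valid.
INSERT INTO doc VALUES ('A', 'FIRST VER',  NULL,         '2020-06-01');
INSERT INTO doc VALUES ('A', 'SECOND VER', '2020-06-02', NULL);
""")


def notes_at(conn, when):
    """Notes of all rows valid at `when`, treating NULL bounds as open."""
    cursor = conn.execute(
        """
        SELECT doc_note FROM doc
         WHERE ? BETWEEN COALESCE(valid_from, ?)
                     AND COALESCE(valid_to, ?)
         ORDER BY doc_note
        """,
        (when, when, when),
    )
    return [row[0] for row in cursor]


print(notes_at(conn, "2019-01-01"))
print(notes_at(conn, "2021-01-01"))
```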
    
