In at least one application, I have the need to keep old versions of records in a relational database. When something should be updated, instead a new copy would be added and th
If you need old data to be part of your business logic, then:
If old data is just a trace log of changes, then:
I work with SQL on Oracle products (Database 11g). We have a huge project, and versioning is an essential part of it. Both approaches you mentioned are useful.
If your database supports triggers and you can use PL/SQL, you can separate out the old data with little effort. Create BEFORE UPDATE and BEFORE DELETE triggers, then store the old data in a dedicated history table (together with the date of the change and its type, delete or update).
Assumption: every table you want to version must have a primary key.
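Example (table and column names are placeholders):
CREATE OR REPLACE TRIGGER TRIGGER_ON_VERSIONED_TABLE
BEFORE UPDATE
ON VERSIONED_TABLE
FOR EACH ROW -- row-level trigger, required so that :OLD is available
BEGIN
  -- Save the old value together with who changed it and when.
  -- A BEFORE DELETE trigger would look the same.
  INSERT INTO VERSIONED_TABLE_HISTORY_PART VALUES (:OLD.COLUMN_A, USER, SYSTIMESTAMP);
END;
/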
If you want all historical data for one primary key, select from both the "production" table and the history table, filter on the key you want, and sort by timestamp (for the active record, use SYSTIMESTAMP as its timestamp). And if you want to see which state a record was in at a given date, take the first row for which your date is later than the date in the history (or production) table.
For before update trigger look here.
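The trigger idea above can be tried out without an Oracle instance. The following is only a minimal sketch that uses SQLite through Python's standard sqlite3 module as a stand-in; the table names, the `seq` tie-breaker column, and the `full_history` helper are assumptions made for illustration, not part of the original setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE versioned_table (
    id       INTEGER PRIMARY KEY,
    column_a TEXT
);
CREATE TABLE versioned_table_history (
    seq         INTEGER PRIMARY KEY AUTOINCREMENT,  -- tie-breaker for equal timestamps
    id          INTEGER,
    column_a    TEXT,
    changed_at  TEXT,
    change_type TEXT
);
-- Copy the old row into the history table before it is overwritten.
CREATE TRIGGER versioned_table_bu BEFORE UPDATE ON versioned_table
FOR EACH ROW
BEGIN
    INSERT INTO versioned_table_history (id, column_a, changed_at, change_type)
    VALUES (OLD.id, OLD.column_a, strftime('%Y-%m-%d %H:%M:%f', 'now'), 'UPDATE');
END;
-- Do the same before a row is deleted.
CREATE TRIGGER versioned_table_bd BEFORE DELETE ON versioned_table
FOR EACH ROW
BEGIN
    INSERT INTO versioned_table_history (id, column_a, changed_at, change_type)
    VALUES (OLD.id, OLD.column_a, strftime('%Y-%m-%d %H:%M:%f', 'now'), 'DELETE');
END;
""")


def full_history(conn, key):
    """Return (value, change_type) pairs for one key, oldest first.

    Past states come from the history table; the live row, if it still
    exists, is appended last and labelled 'CURRENT'.
    """
    past = conn.execute(
        "SELECT column_a, change_type FROM versioned_table_history "
        "WHERE id = ? ORDER BY changed_at, seq",
        (key,),
    ).fetchall()
    current = conn.execute(
        "SELECT column_a, 'CURRENT' FROM versioned_table WHERE id = ?",
        (key,),
    ).fetchall()
    return past + current


conn.execute("INSERT INTO versioned_table VALUES (1, 'v1')")
conn.execute("UPDATE versioned_table SET column_a = 'v2' WHERE id = 1")
conn.execute("UPDATE versioned_table SET column_a = 'v3' WHERE id = 1")
print(full_history(conn, 1))
```

The application code only ever touches versioned_table; the history rows are produced by the triggers.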
If you have an existing solution (so your original DB model does not contain versioning parts) and you want to create a versioned table, or you cannot use PL/SQL, use your approach 2. Our project at work (on Oracle Database) uses this approach too. Let's say we have a table of documents (in real life you would have a version identifier as the primary key for this table; this is only to show the principle):
CREATE TABLE DOC(
DOC_NAME VARCHAR(10)
, DOC_NOTE VARCHAR(10)
, VALID_FROM TIMESTAMP
, VALID_TO TIMESTAMP
, CONSTRAINT DOC_PK PRIMARY KEY(DOC_NAME, VALID_FROM)
);
INSERT INTO doc VALUES ('A', 'FIRST VER', systimestamp, date'2999-12-31');
INSERT INTO doc VALUES ('B', 'FIRST VER', systimestamp, date'2999-12-31');
You don't need a WHERE clause like this:
WHERE VALID_FROM <= :time AND VALID_TO > :time
ORDER BY VALID_FROM LIMIT 1
Because in a versioned table only one version of a record is valid at any given time, you only need this:
SELECT * FROM DOC
WHERE SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
This returns at most one row per document, and you can use any other date in place of SYSTIMESTAMP. However, you cannot update records directly; you must first set the end timestamp of the current version (but that is not a problem for you, as I see). So to update a document (A in this example), I do it this way:
UPDATE doc SET VALID_TO = systimestamp
WHERE DOC_NAME='A' AND SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
INSERT INTO doc VALUES ('A', 'SECOND VER', systimestamp, date'2999-12-31');
And you can use the same select as above again.
SELECT * FROM DOC WHERE :CUSTOM_DATE BETWEEN VALID_FROM AND VALID_TO;
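As a rough illustration of the close-then-insert update described above, here is a sketch using SQLite via Python's sqlite3 module instead of Oracle. Two things are assumptions of the sketch rather than part of the answer: the helper names `save_version` and `as_of`, and the half-open comparison (`valid_from <= t < valid_to`) used in place of BETWEEN so that a timestamp exactly on a version boundary matches only the newer version.

```python
import sqlite3

FAR_FUTURE = "2999-12-31 00:00:00"  # same "open end" marker as in the answer

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE doc (
        doc_name   TEXT NOT NULL,
        doc_note   TEXT,
        valid_from TEXT NOT NULL,
        valid_to   TEXT NOT NULL,
        PRIMARY KEY (doc_name, valid_from)
    )
""")


def save_version(conn, name, note, now):
    """Close the version of `name` that is valid at `now`, then add a new one."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE doc SET valid_to = ? "
            "WHERE doc_name = ? AND valid_from <= ? AND ? < valid_to",
            (now, name, now, now),
        )
        conn.execute(
            "INSERT INTO doc VALUES (?, ?, ?, ?)",
            (name, note, now, FAR_FUTURE),
        )


def as_of(conn, name, when):
    """Return the note of the version valid at `when`, or None."""
    row = conn.execute(
        "SELECT doc_note FROM doc "
        "WHERE doc_name = ? AND valid_from <= ? AND ? < valid_to",
        (name, when, when),
    ).fetchone()
    return row[0] if row else None


save_version(conn, "A", "FIRST VER", "2014-01-01 00:00:00")
save_version(conn, "A", "SECOND VER", "2016-01-01 00:00:00")
print(as_of(conn, "A", "2015-06-01 00:00:00"))
print(as_of(conn, "A", "2016-01-01 00:00:00"))
```

Passing the timestamp in as a parameter, rather than reading the clock twice, keeps the end of the old version and the start of the new one identical.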
Best practice is to also create ACTIVE and HISTORICAL views for a versioned table.
The base table holds all the data, so any time you want the current record you must write BETWEEN VALID_FROM AND VALID_TO. A better way is to create views:
CREATE VIEW DOC_ACTIVE
AS SELECT * FROM DOC WHERE SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
Or, if you need the old data:
CREATE VIEW DOC_INACTIVE
AS SELECT * FROM DOC WHERE NOT SYSTIMESTAMP BETWEEN VALID_FROM AND VALID_TO;
Now, instead of your original SQL:
SELECT a, b, c FROM t1
you don't need any complicated structure; just change the table to the "active" view (like DOC_ACTIVE):
SELECT a, b, c FROM t1_VIEW
Please also see this answer: Versioning in SQL Tables - how to handle it?
I do not know whether you see the difference between a valid record and a valid "object" inside it. In our work project we don't have any overlapping valid ranges. Take, for example, the documents table mentioned above, with a composite primary key made of document name and version number. We have document A (valid for the years 2010-2050), and it has 2 versions.
Document A, version 1 (2010-2020), record valid 2014-9999: VALID (NEW)
Document A, version 2 (2021-2050), record valid 2014-9999: VALID (NEW)
In version 1 the document is valid from 2010 to 2020 (object version, not record version), and the document is in some state P. This record is valid 2014-9999.
In version 2 the document is valid from 2021 to 2050 (object version, not record version), and the document is in state Q. This record is again valid 2014-9999.
Let's say it is 2016 and you find a clerical error in both versions of the document. You create a new record version, starting in the current year (2016), for both document versions. After all the changes you have these document versions:
Document A, version 1 (2010-2020), record valid 2014-2015: INVALID (UPDATED)
Document A, version 2 (2021-2050), record valid 2014-2015: INVALID (UPDATED)
Document A, version 1 (2010-2020), record valid 2016-9999: VALID NOW (NEW)
Document A, version 2 (2021-2050), record valid 2016-9999: VALID NOW (NEW)
After this, in 2018, someone creates a new version of the document, valid only for the years 2021-2030. (The document is valid in the future, but its record version is valid today.) Now you must update the VALID version 2 and create version 3. Current state:
Document A, version 1 (2010-2020), record valid 2014-2015: INVALID (NO CHANGE)
Document A, version 2 (2021-2050), record valid 2014-2015: INVALID (NO CHANGE)
Document A, version 1 (2010-2020), record valid 2016-9999: VALID NOW (NO CHANGE)
Document A, version 2 (2021-2050), record valid 2016-2018: INVALID (UPDATED)
Document A, version 2 (2031-2050), record valid 2018-9999: VALID NOW (NEW)
Document A, version 3 (2021-2030), record valid 2018-9999: VALID NOW (NEW)
In our work project, PL/SQL code does all of these operations for us.
In 2018, if you select the valid records for the document, you get 3 rows: A1, A2, A3.
If you select the versions valid in 2015, you get only A1 (INVALID) and A2 (INVALID).
So you have the full history, even though the document has 3 versions valid at the same point in time (record validity), and object validity is kept separate. This is a really good approach and should cover all your requirements.
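To make the two kinds of validity concrete, here is a small sketch that loads the final state listed above into SQLite (via Python's sqlite3) and queries it. The column names and the two helper functions are hypothetical. The sketch also assumes that record validity is half-open, so "2014-2015" is stored as [2014, 2016) and "2016-2018" as [2016, 2018); the listing above leaves the boundary year ambiguous, and this choice is what makes the 2018 query return exactly three rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE doc_version (
        doc      TEXT    NOT NULL,
        version  INTEGER NOT NULL,
        obj_from INTEGER NOT NULL,  -- object validity, inclusive years
        obj_to   INTEGER NOT NULL,
        rec_from INTEGER NOT NULL,  -- record validity, half-open [rec_from, rec_to)
        rec_to   INTEGER NOT NULL,
        PRIMARY KEY (doc, version, rec_from)
    )
""")
conn.executemany(
    "INSERT INTO doc_version VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("A", 1, 2010, 2020, 2014, 2016),  # superseded in 2016
        ("A", 2, 2021, 2050, 2014, 2016),  # superseded in 2016
        ("A", 1, 2010, 2020, 2016, 9999),  # valid now
        ("A", 2, 2021, 2050, 2016, 2018),  # superseded in 2018
        ("A", 2, 2031, 2050, 2018, 9999),  # valid now
        ("A", 3, 2021, 2030, 2018, 9999),  # valid now
    ],
)


def records_known_in(conn, year):
    """Record versions that were the valid records in the given year."""
    return conn.execute(
        "SELECT version, obj_from, obj_to FROM doc_version "
        "WHERE rec_from <= ? AND ? < rec_to ORDER BY version",
        (year, year),
    ).fetchall()


def version_for(conn, object_year, known_in):
    """Document version covering `object_year`, as recorded in `known_in`."""
    row = conn.execute(
        "SELECT version FROM doc_version "
        "WHERE rec_from <= ? AND ? < rec_to AND ? BETWEEN obj_from AND obj_to",
        (known_in, known_in, object_year),
    ).fetchone()
    return row[0] if row else None


print(records_known_in(conn, 2018))
print(records_known_in(conn, 2015))
print(version_for(conn, 2025, known_in=2018))
```

The first filter answers "what did the database say at that time", the second answers "which version of the document applies to that year"; combining them gives both views of history.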
You can also easily use BETWEEN in views for columns that use NULL to indicate the minimum or maximum value, like this:
CREATE VIEW DOC_ACTIVE AS
SELECT * FROM DOC
WHERE SYSTIMESTAMP BETWEEN NVL(VALID_FROM, SYSTIMESTAMP)
AND NVL(VALID_TO, SYSTIMESTAMP);
I have worked with tracking versions of records but never with overlapping ranges. However, I have experience selecting records under similar criteria. Here's a query that should do what you want.
select *
from t1
where VersionId = (select top 1 VersionId
                   from t1 as MostRecentlyValid
                   where MostRecentlyValid.ValidFrom <= @AsOfDate
                     and (MostRecentlyValid.ValidTo >= @AsOfDate
                          or MostRecentlyValid.ValidTo is null)
                     and t1.Id = MostRecentlyValid.Id
                   order by MostRecentlyValid.ValidFrom desc)
This assumes that ValidTo can also be null to indicate no end date. If ValidTo can't be null then you can remove the or condition. This also assumes a record is valid through the end of the day of the ValidTo date. If the record becomes old at the beginning of the day of the ValidTo date change >= to just >.
This worked for the handful of test data I tried it on, and I'm fairly certain it'll work for all cases.
As for efficiency, I'm not a SQL expert, so I really don't know if this is the most efficient solution.
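For anyone who wants to try the query on sample data, here is a sketch that runs the same correlated-subquery idea in SQLite through Python's sqlite3 module. SQLite has no `top 1`, so `LIMIT 1` is swapped in; the sample rows, the ISO date strings, and the `valid_versions` helper are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE t1 (
        VersionId INTEGER PRIMARY KEY,
        Id        INTEGER NOT NULL,
        ValidFrom TEXT    NOT NULL,
        ValidTo   TEXT              -- NULL means "no end date"
    )
""")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2014-01-01", None),          # open-ended base version of Id 1
        (2, 1, "2014-02-01", "2014-03-01"),  # overlaps version 1 for a month
        (3, 2, "2014-01-15", None),
    ],
)


def valid_versions(conn, as_of):
    """Return (Id, VersionId) of the most recently started valid version per Id."""
    return conn.execute(
        """
        SELECT t1.Id, t1.VersionId
        FROM t1
        WHERE t1.VersionId = (SELECT m.VersionId
                              FROM t1 AS m
                              WHERE m.ValidFrom <= :as_of
                                AND (m.ValidTo >= :as_of OR m.ValidTo IS NULL)
                                AND m.Id = t1.Id
                              ORDER BY m.ValidFrom DESC
                              LIMIT 1)
        ORDER BY t1.Id
        """,
        {"as_of": as_of},
    ).fetchall()


print(valid_versions(conn, "2014-02-11"))
print(valid_versions(conn, "2014-04-01"))
```

While versions 1 and 2 of Id 1 overlap, the one with the later ValidFrom wins; once version 2 has ended, the open-ended version 1 is selected again.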
To join to another table, you could do something like this:
select *
from (select *
      from t1
      where VersionId = (select top 1 VersionId
                         from t1 as MostRecentlyValid
                         -- use the same as-of date in both comparisons
                         where MostRecentlyValid.ValidFrom <= @AsOfDate
                           and (MostRecentlyValid.ValidTo >= @AsOfDate
                                or MostRecentlyValid.ValidTo is null)
                           and t1.Id = MostRecentlyValid.Id
                         order by MostRecentlyValid.ValidFrom desc)) as SelectedRecords
inner join t2
  on SelectedRecords.Id = t2.Id
I know this is an old post, but I wanted to reply not only to offer a solution but also to exchange ideas with you and discuss the most efficient solution to this important versioning issue.
My idea is:
Create a table that contains 5 main versioning fields (ID, Serial, ValidFrom, ValidTo, IsCurrent).
When updating a record:
Update the existing record to set ValidTo to the current datetime and IsCurrent to false.
Insert a new record, incrementing the Serial field and keeping the same ID as the updated record; ValidFrom will be the current datetime, ValidTo will be null, and IsCurrent will be true.
When deleting a record:
Set ValidTo to the current datetime and IsCurrent to false.
This way you will not have problems with joins, as joining tables on the ID field will show you the full record history.
If you have FKs to a parent table, you will probably want to remove the value of the FK field.
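The update and delete steps of this scheme can be sketched as follows, again with SQLite through Python's sqlite3 module. The table name `item`, the `Payload` column, and the helper functions are hypothetical, and the sketch assumes that the newly inserted row is the one flagged as current.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        ID        INTEGER NOT NULL,  -- stays the same across all versions
        Serial    INTEGER NOT NULL,  -- incremented for every new version
        ValidFrom TEXT    NOT NULL,
        ValidTo   TEXT,              -- NULL while the version is open
        IsCurrent INTEGER NOT NULL,  -- 1 for the live version, else 0
        Payload   TEXT,
        PRIMARY KEY (ID, Serial)
    )
""")


def close_current(conn, item_id, now):
    """End the live version of an item, if there is one."""
    conn.execute(
        "UPDATE item SET ValidTo = ?, IsCurrent = 0 WHERE ID = ? AND IsCurrent = 1",
        (now, item_id),
    )


def save(conn, item_id, payload, now):
    """Insert or update: close the live version, then add the next Serial."""
    with conn:  # run both steps in one transaction
        close_current(conn, item_id, now)
        (next_serial,) = conn.execute(
            "SELECT COALESCE(MAX(Serial), 0) + 1 FROM item WHERE ID = ?",
            (item_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO item VALUES (?, ?, ?, NULL, 1, ?)",
            (item_id, next_serial, now, payload),
        )


def delete(conn, item_id, now):
    """Logical delete: the history stays, but no version is current."""
    with conn:
        close_current(conn, item_id, now)


save(conn, 7, "a", "2020-01-01")
save(conn, 7, "b", "2021-01-01")
for row in conn.execute("SELECT * FROM item WHERE ID = 7 ORDER BY Serial"):
    print(row)
```

Filtering on IsCurrent = 1 gives the live rows without any date comparison, while joining on ID alone returns every version of a record.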