I have made few projects (CMS and EC system) that required to have some data versioned.
Usually I come with that kind of schema
+--------------+
+
I prefer to have historical data in another table. I would make foobar_history or something similar and make a FK to foobar_id. This will stop you from having to use a subquery all together. This has the added advantage of not polluting your primary data table with the tons of historical data you probably don't want to see 99% of the time you're accessing it.
You will likely want to make a trigger for updating this data though, as it would require you to copy the current data in to _history and then do the update.
Common technique is to add a column version_status
for current/expired. Also a note, if you keep new and old records in the same table, you should have a business (natural) key for your entity, something like name + pin
, because the primary key will change (increment) with each row.
TABLE foobar(foobar_id PK, business_key, version, version_status, .....)
SELECT *
FROM foobar
WHERE business_key = 'myFoobar3' AND version_status = 'current'
When deciding to keep the record history in the same table -- or move it to a separate one -- consider other tables which have the foobar_id
as a foreign key. When issuing a new version, should existing foreign keys point to the new PK or to the old PK? If you want to keep history of relationships, you would probably want to keep everything in the same table. If only the new version is important, you may consider to move expired rows to another table -- though it is not necessary.
The cleanest solution in my opinion would be to have a History table for each table that requires versioned. In other words, have a foobar table, and then a foobar_History table, with a trigger on foobar that will write existing data to the History table with a timestamp and user that changed the data. Older data is easily queryably, sorted by timestamp descending, and you know that the data in the main table is always the latest version.
I used to work on a system with historical data, and we had a boolean to indicate which one was the latest version of the data. Of course you need to maintain the consitency of the flag at the applicative level. Then you can create indexes that use the flag and if you provide it in the where clause it's fast.
Pro:
Cons:
Otherwise, you can rely on a separate history table, as suggested in several answers.
Pro:
Cons:
What is best will depend from your use case. I had to deal with a document management system where we wanted to be able to version document. But we also had feature like reverting to old version. It was easier to use a boolean to speed up just the operation that required the last one. If you have real historical data (which never change) probably a dedicated history table is better.
Does the concept of history fit in your domain model? If no, then you have a db schema that differs from your conceptual domain model. If at the domain level, the actual data and the old data need to be handled the same way, having two tables complicates the design. Just consider the case you need to return the complete history (old + new). The easiest solution would be to have one class for each table, but then you can't return a list as easily as if you have only one table. But if these are two distinct concepts, then it's fine to have history be first-class in your design.
I would also recommend this article by M. Fowler also interesting when it comes to dealing with temporal data: Patterns for things that change with time
It depends on how many of your tables require versioning, and if you've got a transactional ore reporting system.
If just a few transactional tables - the way that you're doing it is fine as long as the performance issues aren't too significant. You can make the querying easier by adding a column for current_row and a trigger that updates the prior row to make it non-current.
But if you've got a lot of tables or the extra rows are slowing down some of your queries then I'd do as others suggest and use history tables as well as history triggers. Note that you can generate that code to make it easier to develop & maintain.
If you're in the reporting world then there's a lot other options I won't address here. You can find the options given in detail in data warehousing data modeling books.
If you had used Oracle you could use analytic functions
select * from ( SELECT a.* , row_number() over (partition by foobar_id order by version desc) rn FROM foobar a WHERE foobar_id = 2 ) where rn = 1