As with many databases, I am designing a database that should keep a record of previous versions of the rows changed in each table, keeping the history rows in the same table as the current ones.
The standard solution to this problem is to keep the history in a separate history (audit) table rather than in the live table.
The main limitation of your single-table approach that comes to my mind is that a substantial portion of your table will be history data, which raises indexing concerns and potentially introduces additional complexity into your CRUD queries.
Is there some particular reason you don't want to use what seems to be the usual solution to this situation?
I would partition the table on the flag: an IS_LAST=1 partition for the current rows and an IS_LAST=0 partition for the history. Because the table is partitioned, queries will be fast (partition pruning) and you will never have to query a union of the normal table and a history table.
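For illustration, a list-partitioned table along those lines might look like this in Oracle; the table and column names are my own, not from the question:

    -- Current rows and history rows live in separate partitions,
    -- keyed by the IS_LAST flag.
    CREATE TABLE customer_version (
      customer_id  NUMBER        NOT NULL,
      version_no   NUMBER        NOT NULL,
      name         VARCHAR2(100),
      is_last      NUMBER(1)     NOT NULL CHECK (is_last IN (0, 1))
    )
    PARTITION BY LIST (is_last) (
      PARTITION p_current VALUES (1),  -- latest version of each customer
      PARTITION p_history VALUES (0)   -- all previous versions
    );

    -- Partition pruning lets this query touch only p_current:
    SELECT * FROM customer_version WHERE is_last = 1;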
I would use IS_LAST='Y'/'N' and not 1/0. 1 and 0 are meaningless.
There is a special trick that can help guarantee that there is at most one row with IS_LAST='Y' per entity: you can create a unique function-based index on a function that returns NULL when IS_LAST='N' and returns the id when IS_LAST='Y'. It is explained here: http://www.akadia.com/services/ora_function_based_index_1.html
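A minimal sketch of that trick, assuming a hypothetical table t with a key column id and a CHAR(1) is_last flag: Oracle does not index entries whose key is entirely NULL, so uniqueness is enforced only among the rows where is_last = 'Y'.

    -- At most one 'Y' row per id; 'N' rows map to NULL and are
    -- therefore simply absent from the index.
    CREATE UNIQUE INDEX uix_one_current_row
      ON t (CASE WHEN is_last = 'Y' THEN id END);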
It all depends on what you actually need.
The first question should be: what would you do with that data? If you don't have a clear business requirement, don't do it.
I did something similar, and after 3 years of running, about 20% of the data is "valid" and the rest is "previous versions"; that is 10 million plus 40 million records. In the last three years we had 2 (two) requests to investigate the history of changes, and both times the requests were silly: we record the timestamp of each record change, and we were asked to check whether people had worked overtime (after 5 pm).
Now we are stuck with an oversized database, 80% of which is data that nobody needs.
EDIT:
Since you asked for possible solutions, I'll describe what we did. It's a bit different from the solution you are considering.
We record all changes in a single, generic archive table that identifies the source table, the changed row and column, and the old value, along with who changed it and when.
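A generic archive table along these lines might look like the following sketch; all of the names and types here are my assumptions, not the author's actual schema:

    -- One row per changed column of a changed record.
    CREATE TABLE change_archive (
      archive_id   NUMBER          NOT NULL PRIMARY KEY,
      table_name   VARCHAR2(30)    NOT NULL,  -- table the change occurred in
      row_pk       NUMBER          NOT NULL,  -- primary key of the changed row
      column_name  VARCHAR2(30)    NOT NULL,  -- column that changed
      old_value    VARCHAR2(4000),            -- previous value, kept as a string
      changed_by   VARCHAR2(30)    NOT NULL,  -- user who made the change
      changed_at   TIMESTAMP       NOT NULL   -- when the change was made
    );

Keeping every old value in a single string column is what later forces string searches, as noted below.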
It works by writing each change as rows in that archive table. The approach has its pros and cons; the biggest con we ran into is that archived values are stored as strings, so searching them means using the LIKE operator on string columns.
So, again, check the requirements for an archive. It is not a trivial task, but the gains and actual use can be minimal.
Like others here, I use an ORM (Propel) with a base object containing custom save and delete methods. These methods override the standard save and delete that come with the ORM: they check which columns have changed and create one row in the change table for each changed column.
Schema for the change table:

    change_pk, user_fk, user_name, session_id, ip_address, method, table_name, row_fk, field_name, field_value, most_recent, date_time
Example: 1, 4232, 'Gnarls Barkley', 'f2ff3f8822ff23', '234.432.324.694', 'UPDATE', 'User', 4232, 'first_name', 'Gnarles', 'Y', '2009-08-20 10:10:10';
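Rendered as DDL, that schema might look like the sketch below; only the column names come from the answer, the types and comments are my guesses:

    CREATE TABLE change (
      change_pk    INTEGER        NOT NULL PRIMARY KEY,
      user_fk      INTEGER        NOT NULL,   -- who made the change
      user_name    VARCHAR(100),
      session_id   VARCHAR(64),
      ip_address   VARCHAR(45),
      method       VARCHAR(10),               -- e.g. 'UPDATE' in the example
      table_name   VARCHAR(64),               -- table the changed row lives in
      row_fk       INTEGER,                   -- primary key of the changed row
      field_name   VARCHAR(64),               -- column that changed
      field_value  VARCHAR(4000),             -- recorded value ('Gnarles' in the example)
      most_recent  CHAR(1),                   -- presumably flags the latest change ('Y')
      date_time    TIMESTAMP
    );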
How will you define primary keys? There will be many rows with the same primary key, since you keep the history rows in the same table.
Also, you don't seem to have a way to know the order of your history rows when a single "real" row gets changed more than once.
(On one project I worked on, we generated all the history tables and triggers using CodeSmith; this worked very well.)
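For a flavor of what such generated objects look like, here is a hand-written sketch of a history table and trigger in Oracle syntax; the employee table and its columns are invented for the example:

    CREATE TABLE employee (
      employee_id  NUMBER        NOT NULL PRIMARY KEY,
      name         VARCHAR2(100) NOT NULL,
      salary       NUMBER(10,2)
    );

    -- Same columns as the base table, plus audit metadata.
    CREATE TABLE employee_hist (
      employee_id  NUMBER        NOT NULL,
      name         VARCHAR2(100) NOT NULL,
      salary       NUMBER(10,2),
      operation    VARCHAR2(6)   NOT NULL,  -- 'UPDATE' or 'DELETE'
      changed_at   TIMESTAMP     NOT NULL
    );

    -- Copy the old image of each row into the history table before
    -- it is modified or removed.
    CREATE OR REPLACE TRIGGER trg_employee_hist
      BEFORE UPDATE OR DELETE ON employee
      FOR EACH ROW
    DECLARE
      v_op VARCHAR2(6);
    BEGIN
      IF UPDATING THEN
        v_op := 'UPDATE';
      ELSE
        v_op := 'DELETE';
      END IF;
      INSERT INTO employee_hist
        (employee_id, name, salary, operation, changed_at)
      VALUES
        (:OLD.employee_id, :OLD.name, :OLD.salary, v_op, SYSTIMESTAMP);
    END;
    /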