The first question should be: what would you do with that data? If you don't have a clear business requirement, don't do it.
I did something similar, and after 3 years of running, about 20% of the data is "valid data" and the rest is "previous versions". That is 10 million + 40 million records. In the last three years we had 2 (two) requests to investigate the history of changes, and both times the requests were silly: we record the time stamp of each record change, and we were asked to check whether people worked overtime (after 5pm).
Now we are stuck with an oversized database in which 80% of the data is something nobody needs.
EDIT:
Since you asked for possible solutions, I'll describe what we did. It's a bit different from the solution you are considering.
- All tables have a surrogate primary key.
- All primary keys are generated from a single sequence. This works fine because Oracle can generate and cache sequence numbers, so there are no performance problems here. We wanted each object in memory (and the corresponding record in the database) to have a unique identifier.
- We use an ORM, and the mapping information between database table and class is in the form of attributes.
We record all changes in a single archive table with the following columns:
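A minimal sketch of the single-sequence idea, in Python rather than the C#/Oracle stack described here. The `itertools.count` counter stands in for the database sequence, and the class names are made up for illustration:

```python
import itertools

# Stand-in for one shared database sequence (hypothetical; the real system
# used an Oracle sequence with caching).
_shared_ids = itertools.count(1)


class Entity:
    """Base class: every object draws its id from the same counter."""

    def __init__(self):
        self.id = next(_shared_ids)


class Customer(Entity):
    pass


class Invoice(Entity):
    pass


objects = [Customer(), Invoice(), Customer()]
ids = [obj.id for obj in objects]
```

Because all classes share one counter, an id identifies a record without also needing to know its table.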
- id (surrogate primary key)
- time stamp
- original table
- id of original record
- user id
- transaction type (insert, update, delete)
- record data as a varchar2 field (the actual data in the form of fieldname/value pairs)
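The column list above could look roughly like this as a table definition. This is only a sketch: it uses SQLite through Python's `sqlite3` module instead of Oracle (so `TEXT` replaces `varchar2`), and all column names are assumptions, not the names used in the real system.

```python
import sqlite3

# Hypothetical column names; SQLite TEXT stands in for Oracle's varchar2.
ARCHIVE_DDL = """
CREATE TABLE archive (
    id               INTEGER PRIMARY KEY,  -- surrogate primary key
    changed_at       TEXT    NOT NULL,     -- time stamp of the change
    original_table   TEXT    NOT NULL,     -- table the record belongs to
    original_id      INTEGER NOT NULL,     -- id of the original record
    user_id          INTEGER NOT NULL,     -- who made the change
    transaction_type TEXT    NOT NULL
        CHECK (transaction_type IN ('INSERT', 'UPDATE', 'DELETE')),
    record_data      TEXT    NOT NULL      -- fieldname/value pairs
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(ARCHIVE_DDL)

# List the column names back from the schema.
columns = [row[1] for row in conn.execute("PRAGMA table_info(archive)")]

# Store one archived version of a record.
conn.execute(
    "INSERT INTO archive (changed_at, original_table, original_id, "
    "user_id, transaction_type, record_data) VALUES (?, ?, ?, ?, ?, ?)",
    ("2010-01-01T17:30:00", "invoice", 42, 7, "UPDATE", '{"total": 120.0}'),
)
row_count = conn.execute("SELECT COUNT(*) FROM archive").fetchone()[0]
```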
It works this way:
- The ORM has insert/update and delete commands.
- We created one base class for all our business objects that overrides the insert/update and delete commands.
- The insert/update/delete commands build a string of fieldname/value pairs using reflection. The code looks up the mapping information and reads the field name, associated value and field type. Then we create something similar to JSON (with some modifications of our own). Once the string representing the current state of the object is built, it is inserted into the archive table.
- When a new or updated object is saved, it is written to its target table and at the same time we insert one record with the current values into the archive table.
- When an object is deleted, we delete it from its target table and at the same time we insert one record into the archive table with transaction type = "DELETE".
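The flow above can be sketched as follows. This is a Python illustration, not the original C# code: `dataclasses.fields` stands in for reflection over the ORM mapping attributes, plain JSON stands in for the modified JSON-like format, and `Store`, `BusinessObject` and `Invoice` are hypothetical names.

```python
import json
from dataclasses import dataclass, fields
from datetime import datetime, timezone


class Store:
    """In-memory stand-in for the database: target tables plus one archive."""

    def __init__(self):
        self.tables = {}   # table name -> {record id -> object}
        self.archive = []  # archive rows, one dict per change


@dataclass
class BusinessObject:
    """Hypothetical base class: every save or delete also archives a snapshot."""

    id: int

    def to_pairs(self):
        # Enumerate the declared fields by introspection and emit
        # fieldname/value pairs as a single string.
        pairs = {f.name: getattr(self, f.name) for f in fields(self)}
        return json.dumps(pairs, sort_keys=True, default=str)

    def _archive(self, store, user_id, transaction_type):
        store.archive.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "original_table": type(self).__name__,
            "original_id": self.id,
            "user_id": user_id,
            "transaction_type": transaction_type,
            "record_data": self.to_pairs(),
        })

    def save(self, store, user_id):
        table = store.tables.setdefault(type(self).__name__, {})
        transaction_type = "UPDATE" if self.id in table else "INSERT"
        table[self.id] = self
        self._archive(store, user_id, transaction_type)

    def delete(self, store, user_id):
        store.tables.get(type(self).__name__, {}).pop(self.id, None)
        self._archive(store, user_id, "DELETE")


@dataclass
class Invoice(BusinessObject):
    customer: str
    total: float


store = Store()
invoice = Invoice(id=1, customer="ACME", total=100.0)
invoice.save(store, user_id=7)    # archived as INSERT
invoice.total = 120.0
invoice.save(store, user_id=7)    # archived as UPDATE
invoice.delete(store, user_id=9)  # archived as DELETE
```

Since the snapshot is taken as a string at save time, later changes to the object do not alter earlier archive rows.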
Pros:
- We don't have an archive table for each table in the database. We also don't need to worry about updating archive tables when the schema changes.
- The complete archive is separated from the "current data", so the archive does not impose any performance hit on the database. We put it in a separate tablespace on a separate disk and it works fine.
- We created 2 forms for viewing the archive:
  - A general viewer that lists the archive table according to a filter the user enters on the form (time span, user, ...). We show each record as fieldname/value pairs, and each change is color coded. Users can see all versions of each record, and who made each change and when.
  - An invoice viewer. This one was complex: we created a form that shows the invoice much like the original invoice entry form, but with some additional buttons that show the different generations. It took considerable effort to create this form. It was used a few times and then forgotten because it was not needed in the current workflow.
- The code for creating archive records is located in a single C# class. There is no need for triggers on every table in the database.
- Performance is very good. At peak times the system is used by around 700-800 users. This is an ASP.Net application. Both ASP.Net and Oracle run on one dual Xeon with 8 GB RAM.
Cons:
- The single-table archive format is harder to read than a solution with one archive table for each data table.
- Searching on a non-id field in the archive table is hard: we can only use the LIKE operator on the string.
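To show what the LIKE limitation means in practice, here is a small sketch using SQLite through Python's `sqlite3` module (not Oracle). The table layout, the JSON-style data format and the `find_by_field` helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE archive ("
    "id INTEGER PRIMARY KEY, original_table TEXT, record_data TEXT)"
)
conn.executemany(
    "INSERT INTO archive (original_table, record_data) VALUES (?, ?)",
    [
        ("Invoice", '{"customer": "ACME", "total": 100.0}'),
        ("Invoice", '{"customer": "Initech", "total": 250.0}'),
        ("Customer", '{"name": "ACME", "city": "Oslo"}'),
    ],
)


def find_by_field(conn, table, field, value):
    """Find archive rows whose serialized data contains field = value.

    The only tool available is a substring match on the whole string, so
    the pattern has to reproduce the serialization format exactly.
    """
    pattern = f'%"{field}": "{value}"%'
    cursor = conn.execute(
        "SELECT id FROM archive "
        "WHERE original_table = ? AND record_data LIKE ? ORDER BY id",
        (table, pattern),
    )
    return [row[0] for row in cursor]


matches = find_by_field(conn, "Invoice", "customer", "ACME")
```

A pattern with a leading wildcard like this generally forces a scan of the candidate rows, and it breaks as soon as the serialization format changes (for example, different spacing or quoting).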
So, again, check the requirements for the archive. It is not a trivial task, and the gains and actual use can be minimal.