History rows management in a database

灰色年华 2021-02-03 11:48

As with many databases, I am designing a database that should keep a record of the previous versions of the rows changed in each table.

The standard solution to this problem is

10 Answers
  • 2021-02-03 12:26

    The main limitation that comes to my mind is that a substantial portion of your table will be history data, which means indexing concerns and potentially introducing additional complexity into your CRUD queries.

    Is there some particular reason you don't want to use what seems to be the usual solution to this situation?

  • 2021-02-03 12:26

    I would use a partitioning scheme with an IS_LAST=1 partition and an IS_LAST=0 partition. Because the table is partitioned, access will be fast (partition pruning) and you will never have to query a union of the normal table and the history table.

    I would use IS_LAST='Y'/'N' and not 1/0. 1 and 0 are meaningless.
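
    A minimal sketch of such a table, using Oracle list partitioning (the table and column names are hypothetical):

      CREATE TABLE orders (
        id        NUMBER   NOT NULL,
        entity_id NUMBER   NOT NULL,  -- identifies the logical row across versions
        is_last   CHAR(1)  NOT NULL CHECK (is_last IN ('Y', 'N')),
        customer  VARCHAR2(50),
        amount    NUMBER
      )
      PARTITION BY LIST (is_last) (
        PARTITION p_current VALUES ('Y'),
        PARTITION p_history VALUES ('N')
      );

      -- Queries that filter on is_last touch only one partition (pruning):
      SELECT * FROM orders WHERE is_last = 'Y' AND customer = 'ACME';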

    There is a special trick that can help guarantee that there is at most one row with IS_LAST='Y' per entity: you can create a unique function-based index with a function that returns NULL when IS_LAST='N' and returns the id when IS_LAST='Y'. It is explained here: http://www.akadia.com/services/ora_function_based_index_1.html
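
    A sketch of that trick against the hypothetical ORDERS table above: Oracle does not store index entries whose entire key is NULL, so only the IS_LAST='Y' rows take part in the uniqueness check, and each entity can have at most one current version.

      CREATE UNIQUE INDEX orders_one_current_uk
        ON orders (CASE WHEN is_last = 'Y' THEN entity_id END);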

  • 2021-02-03 12:27

    It all depends on what you have:

    • Are you running Standard or Enterprise Edition? Partitioning is only included as an option on top of Enterprise Edition. More info on that here.
    • You might consider going with Workspace Manager if you are looking for an easy solution where you don't have to maintain your own code. However, there are some limitations I have found (e.g. Oracle Text index maintenance appears to be difficult, if not impossible, although I have only looked at it on 10gR2).
    • Otherwise, I would go with either zvolkov's solution (a live table with a trigger writing to a history table; see the sketch after this list) or Mark Brady's solution (a change log). I have used both patterns and each has its pros and cons.
    • @zendar: Flashback query only works for as far back as you have undo. It isn't a long-term solution, only a solution to look back at most a few hours (depending on how much undo retention you specified).
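
    A minimal sketch of the live-table-plus-trigger pattern, assuming a hypothetical EMPLOYEES table (all names here are made up):

      CREATE TABLE employees_hist (
        id          NUMBER,
        name        VARCHAR2(100),
        salary      NUMBER,
        changed_at  DATE,
        change_type VARCHAR2(6)
      );

      CREATE OR REPLACE TRIGGER employees_hist_trg
      AFTER UPDATE OR DELETE ON employees
      FOR EACH ROW
      BEGIN
        -- Archive the OLD image of the row before the change takes effect.
        INSERT INTO employees_hist (id, name, salary, changed_at, change_type)
        VALUES (:OLD.id, :OLD.name, :OLD.salary, SYSDATE,
                CASE WHEN UPDATING THEN 'UPDATE' ELSE 'DELETE' END);
      END;
      /

    Flashback query, by contrast, needs no extra objects at all (SELECT * FROM employees AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '1' HOUR)), but as noted above it reaches back only as far as your undo retention allows.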
  • 2021-02-03 12:29

    The first question should be: what would you do with that data? If you don't have a clear business requirement, don't do it.

    I did something similar and, after 3 years of running, about 20% of the data is "valid data" and the rest is "previous versions" - 10 million plus 40 million records. In the last three years we had 2 (two) requests to investigate the history of changes, and both requests were silly: we record a timestamp for each record change, and we were asked to check whether people worked overtime (after 5 pm).

    Now we are stuck with an oversized database in which 80% of the data is something nobody needs.

    EDIT:

    Since you asked for possible solutions, I'll describe what we did. It's a bit different from the solution you are considering.

    1. All tables have a surrogate primary key.
    2. All primary keys are generated from a single sequence (see the sketch after this list). This works fine because Oracle can generate and cache sequence numbers, so there are no performance problems here. We use an ORM and we wanted each object in memory (and the corresponding record in the database) to have a unique identifier.
    3. We use an ORM, and the mapping between a database table and its class is expressed as attributes.
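
    A sketch of the shared sequence (the name is hypothetical); the CACHE clause is what keeps id generation cheap:

      CREATE SEQUENCE global_id_seq CACHE 1000;

      -- Every table draws its surrogate primary key from this one sequence,
      -- so ids are unique across all tables:
      SELECT global_id_seq.NEXTVAL FROM dual;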

    We record all changes in a single archive table with the following columns (a DDL sketch follows the list):

    • id (surrogate primary key)
    • time stamp
    • original table
    • id of original record
    • user id
    • transaction type (insert, update, delete)
    • record data as a VARCHAR2 field
      • this is the actual data, in the form of field name/value pairs.
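
    A possible DDL for such a table (the names and sizes are assumptions, not the poster's actual schema):

      CREATE TABLE archive (
        id          NUMBER         PRIMARY KEY,  -- from the shared sequence
        change_ts   TIMESTAMP      DEFAULT SYSTIMESTAMP NOT NULL,
        table_name  VARCHAR2(30)   NOT NULL,     -- original table
        record_id   NUMBER         NOT NULL,     -- id of the original record
        user_id     NUMBER         NOT NULL,
        tx_type     VARCHAR2(6)    NOT NULL
                    CHECK (tx_type IN ('INSERT', 'UPDATE', 'DELETE')),
        record_data VARCHAR2(4000)               -- field name/value pairs, JSON-like
      );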

    It works this way:

    • The ORM has insert, update and delete commands.
    • We created one base class for all our business objects that overrides the insert, update and delete commands.
      • The insert/update/delete commands create a string of field name/value pairs using reflection. The code looks up the mapping information and reads the field name, the associated value and the field type, then builds something similar to JSON (we added some modifications). When the string representing the current state of the object is ready, it is inserted into the archive table.
    • When a new or updated object is saved, it is written to its target table and, at the same time, one record with the current values is inserted into the archive table (the SQL-level effect is sketched after this list).
    • When an object is deleted, we delete it from its target table and, at the same time, insert one record with transaction type = "DELETE" into the archive table.
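
    At the SQL level, one logical save then boils down to two inserts in the same transaction (a hypothetical ORDERS table, the archive table as sketched above, all values made up):

      INSERT INTO orders (id, customer, amount)
      VALUES (1001, 'ACME', 250);

      INSERT INTO archive (id, change_ts, table_name, record_id,
                           user_id, tx_type, record_data)
      VALUES (1002, SYSTIMESTAMP, 'ORDERS', 1001,
              42, 'INSERT', '{"customer":"ACME","amount":"250"}');

      COMMIT;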

    Pro:

    • we don't need an archive table for each table in the database, and we don't have to worry about updating archive tables when the schema changes.
    • the complete archive is separated from the "current data", so the archive does not impose any performance hit on the database. We put it in a separate tablespace on a separate disk and it works fine.
    • we created 2 forms for viewing the archive:
      • a general viewer that lists the archive table according to a filter the user enters on the form (time span, user, ...). We show each record as field name/value pairs, each change is color coded, and users can see all versions of each record and who made changes and when.
      • an invoice viewer - this one was complex, but we created a form that shows an invoice very much like the original invoice entry form, with some additional buttons to show the different generations. It took considerable effort to create this form; it was used a few times and then forgotten because it was not needed in the current workflow.
    • the code for creating archive records is located in a single C# class. There is no need for triggers on every table in the database.
    • performance is very good. At peak times the system is used by around 700-800 users. This is an ASP.NET application, and both ASP.NET and Oracle run on one dual-Xeon server with 8 GB RAM.

    Cons:

    • the single-table archive format is harder to read than a solution with one archive table for each of the data tables.
    • searching on a non-id field in the archive table is hard - we can only use the LIKE operator on the string (an example follows this list).
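
    For example, finding every archived version that mentions a particular value comes down to a full scan with LIKE (the value is hypothetical):

      SELECT id, change_ts, record_id, tx_type
        FROM archive
       WHERE table_name = 'ORDERS'
         AND record_data LIKE '%"customer":"ACME"%';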

    So, again, check the requirements for an archive. It is not a trivial task, but the gains and actual use can be minimal.

  • 2021-02-03 12:36

    As with others, I use an ORM (Propel) with a Base Object containing custom save & delete methods. These methods override the standard save & delete that come with the ORM. They check to see which columns have changed, and create 1 row in the change table for each changed column.

    Schema for change table: change_pk, user_fk, user_name, session_id, ip_address, method, table_name, row_fk, field_name, field_value, most_recent, date_time

    Example: 1, 4232, 'Gnarls Barkley', 'f2ff3f8822ff23', '234.432.324.694', 'UPDATE', 'User', 4232, 'first_name', 'Gnarles', 'Y', '2009-08-20 10:10:10';
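
    A DDL sketch of that change table, derived from the column list above (the table name, types and sizes are guesses):

      CREATE TABLE change_log (
        change_pk   NUMBER        PRIMARY KEY,
        user_fk     NUMBER,
        user_name   VARCHAR2(100),
        session_id  VARCHAR2(64),
        ip_address  VARCHAR2(45),
        method      VARCHAR2(10),    -- INSERT / UPDATE / DELETE
        table_name  VARCHAR2(30),
        row_fk      NUMBER,
        field_name  VARCHAR2(30),
        field_value VARCHAR2(4000),
        most_recent CHAR(1),         -- 'Y' on the latest row per (table_name, row_fk, field_name)
        date_time   DATE
      );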

  • 2021-02-03 12:37

    How will you define primary keys? There will be many rows with the same primary key, since the history rows are kept in the same table.

    Also, you don't seem to have a way to know the order of your history rows when a single "real" row gets changed more than once.

    (On one project I worked on, we generated all the history tables and triggers using CodeSmith; that worked very well.)
