“Diffing” objects from a relational database

遇见更好的自我 2021-02-04 15:05

Our Win32 application assembles objects from data held in a number of tables in a MySQL relational database. Multiple revisions of each such object are stored in the database.

12 answers
  • 2021-02-04 15:15

    My application dbscript compares hierarchical data (database schemas) in a stored procedure, which of course has to compare each field/property of every object with its counterpart. I guess you won't get around that step (unless you have a generic object description model).
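    As a rough illustration of that field-by-field step (this is not how dbscript itself is implemented), here is a minimal Python sketch. It assumes each revision has already been loaded into nested dictionaries; the function name `diff_tree` and the sample schema data are hypothetical.

```python
def diff_tree(old, new, path=""):
    """Return a list of (path, old_value, new_value) for every differing leaf."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(set(old) | set(new)):
            child = f"{path}.{key}" if path else str(key)
            # A key missing on one side is compared against None.
            changes.extend(diff_tree(old.get(key), new.get(key), child))
        return changes
    return [] if old == new else [(path, old, new)]


# Hypothetical sample data: two revisions of a table description.
rev1 = {"name": "orders", "columns": {"id": "int", "total": "float"}}
rev2 = {"name": "orders", "columns": {"id": "bigint", "total": "float", "note": "text"}}

changes = diff_tree(rev1, rev2)
for path, before, after in changes:
    print(f"{path}: {before!r} -> {after!r}")
```

    The dotted path identifies where in the hierarchy a difference sits, which is the information a UI needs in order to highlight it.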

    As for the UI part of your question, have a look at the dbscript screenshots to see how differences can be viewed and selected.

  • 2021-02-04 15:18

    In a similar situation in PostgreSQL, I used difference tables with this schema:

    create table history_columns (
        column_id smallint primary key,
        column_name text not null,
        table_name text not null,
        unique (table_name, column_name)
    );
    create temporary sequence column_id_seq;
    insert into history_columns
    select nextval('column_id_seq'), column_name, table_name
        from information_schema.columns
        where
            table_name in ('table1','table2','table3')
            and table_schema=current_schema() and table_catalog=current_database();
    
    create table history (
        column_id smallint not null references history_columns,
        id int not null,
        change_time timestamp with time zone not null
            constraint change_time_full_second -- only one change allowed per second
                check (date_trunc('second',change_time)=change_time),
        primary key (column_id,id,change_time),
        value text
    );
    

    And on the tables I used a trigger like this:

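    create or replace function save_history() returns trigger as
    $$
    begin
        -- Template: each "[for each column_name] { ... }" block has to be
        -- expanded (by hand or by a generator) once per tracked column.
        -- find_column_id is a helper, not shown here, that returns the
        -- column_id for a given column name and table name.
        if (tg_op = 'DELETE') then
            -- record the deletion itself against the 'id' column
            insert into history values (
                find_column_id('id',tg_relname), OLD.id,
                date_trunc('second',current_timestamp),
                OLD.id );
            [for each column_name] {
                if (char_length(OLD.column_name)>0) then
                    insert into history values (
                        find_column_id(column_name,tg_relname), OLD.id,
                        OLD.change_time, OLD.column_name
                    );
                end if;
            }
            return OLD;
        elsif (tg_op = 'UPDATE') then
            [for each column_name] {
                if (OLD.column_name is distinct from NEW.column_name) then
                    insert into history values (
                        find_column_id(column_name,tg_relname), OLD.id,
                        OLD.change_time, OLD.column_name
                    );
                end if;
            }
            return NEW;
        end if;
    end;
    $$ language plpgsql volatile;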
    
    create trigger save_history_table1
        before update or delete on table1
        for each row execute procedure save_history();
    
  • 2021-02-04 15:18

    Have you looked at the open-source DiffKit?

    www.diffkit.org

    I think it does what you want.

  • 2021-02-04 15:22

    Assume that a class has 5 known properties: date, time, subject, outline, location. When I look at my schedule, I'm most interested in the most recent (i.e. current/accurate) version of these properties. It would also be useful for me to know what, if anything, has changed. (As a side note, if the date, time or location changed, I'd also expect to get an email/SMS advising me, in case I don't check for an updated schedule :-))

    I would suggest performing the 'diff' at the time the schedule is amended. When version 2 of the class is created, record which values have changed and store this in two 'changelog' fields on the version 2 object (there must already be one parent table that sits atop all your tables; use that one). The first changelog field is human-readable text, e.g. 'Date changed from Mon 1 May to Tues 2 May, Time changed from 10:00am to 10:30am'. The second is a delimited list of changed fields, e.g. 'date,time'. To do this, before saving, loop over the values submitted by the user, compare them to the current database values, and build two strings: one human-readable, one a list of field names. Then update the data and set these strings as the 'changelog' values.
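    The compare-before-save loop could look roughly like the following Python sketch. It assumes the current and submitted values are available as plain dictionaries; the function name `build_changelog` and the sample location value are made up for illustration.

```python
def build_changelog(current, submitted):
    """Compare submitted values with the current ones.

    Returns (human_readable_text, delimited_field_list).
    """
    sentences, changed = [], []
    for field, new_value in submitted.items():
        old_value = current.get(field)
        if old_value != new_value:
            sentences.append(f"{field.capitalize()} changed from {old_value} to {new_value}")
            changed.append(field)
    return ", ".join(sentences), ",".join(changed)


# Hypothetical sample data: what is stored now vs. what the user submitted.
current = {"date": "Mon 1 May", "time": "10:00am", "location": "Room 4"}
submitted = {"date": "Tues 2 May", "time": "10:30am", "location": "Room 4"}

text, fields = build_changelog(current, submitted)
print(text)    # human-readable changelog field
print(fields)  # delimited list of changed field names
```

    Both strings would then be written to the new version's row in the same transaction as the data itself, so the changelog can never drift from what was actually saved.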

    When displaying the schedule, load the current version by default. Loop through the fields in the changelog field list and annotate the display to show that each value has changed (a * or a highlight, etc.). Then, in a separate panel, display the human-readable change log.

    If a schedule is amended more than once, you would probably want to combine the changelogs between versions 1 and 2, and 2 and 3. Say only the course outline changed in version 3: if that were the only changelog you had when displaying the schedule, the changes to date and time wouldn't be displayed.
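    Combining the field lists of several versions amounts to taking their union. A minimal Python sketch, assuming each version stores its changed fields as a comma-delimited string (the function name `combined_changed_fields` is hypothetical):

```python
def combined_changed_fields(field_lists):
    """Union the per-version field lists, preserving first-seen order."""
    seen = []
    for field_list in field_lists:
        # filter(None, ...) drops the empty string produced by an empty list.
        for field in filter(None, field_list.split(",")):
            if field not in seen:
                seen.append(field)
    return seen


# Field lists stored on version 2 and version 3 respectively.
all_fields = combined_changed_fields(["date,time", "outline"])
print(all_fields)
```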

    Note that this denormalised approach won't be great for analysis, e.g. working out which specific location always has classes moved out of it, but you could extend it by storing the change log in an EAV (entity-attribute-value) model.

  • 2021-02-04 15:22

    Given you want to create a UI for this and need to indicate where the differences are, it seems to me you can either go custom or create a generic object comparer - the latter being dependent on the language you are using.

    For the custom method, you need to create a class that takes two instances of the class to be compared. It then reports the differences:

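     public class Person
     {
         public string Name { get; set; }
     }
    
     public class PersonComparer
     {
         private readonly Person oldPerson;
         private readonly Person newPerson;
    
         // "new" is a reserved word in C#, so the parameters are renamed.
         public PersonComparer(Person oldPerson, Person newPerson)
         {
             this.oldPerson = oldPerson;
             this.newPerson = newPerson;
         }
    
         public bool NameIsDifferent() { return oldPerson.Name != newPerson.Name; }
         public string NameDifferentText() { return NameIsDifferent() ? "Name changed from " + oldPerson.Name + " to " + newPerson.Name : ""; }
     }
    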
    

    This way you can use the PersonComparer object to create your GUI.

    The generic approach would be much the same, except that you generalize the calls and use object inspection (the getObjectProperty call below) to find differences:

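     public class ObjectComparer
     {
        private readonly object oldObject;
        private readonly object newObject;
    
        // "new" is a reserved word in C#, so the parameters are renamed.
        public ObjectComparer(object oldObject, object newObject)
        {
            this.oldObject = oldObject;
            this.newObject = newObject;
        }
    
        // One possible reflection helper; assumes propertyName names a public property.
        private static object getObjectProperty(object obj, string propertyName)
        {
            return obj.GetType().GetProperty(propertyName).GetValue(obj, null);
        }
    
        // Equals() compares values; != on object would only compare references.
        public bool PropertyIsDifferent(string propertyName) { return !Equals(getObjectProperty(oldObject, propertyName), getObjectProperty(newObject, propertyName)); }
    
        public string PropertyDifferentText(string propertyName) { return PropertyIsDifferent(propertyName) ? propertyName + " changed from " + getObjectProperty(oldObject, propertyName) + " to " + getObjectProperty(newObject, propertyName) : ""; }
     }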
    

    I would go for the second, as it makes it really easy to change the GUI as needs evolve. In the GUI, I would try 'yellowing' the differences to make them easy to see, but that depends on how you want to show them.

    To get the objects to compare, load your object once with the initial revision and once with the latest revision.
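    For a GUI it is often handy to ask for all differences at once rather than one property at a time. A minimal Python sketch of that variant of the generic comparer, assuming both revisions are plain objects whose attributes can be listed with `vars()`; the `Person` fields and sample values are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    city: str


class ObjectComparer:
    def __init__(self, old, new):
        self.old, self.new = old, new

    def differences(self):
        """Map each differing attribute name to (old_value, new_value)."""
        names = sorted(set(vars(self.old)) | set(vars(self.new)))
        result = {}
        for name in names:
            before = getattr(self.old, name, None)
            after = getattr(self.new, name, None)
            if before != after:
                result[name] = (before, after)
        return result

    def describe(self):
        """One human-readable sentence per differing attribute."""
        return [f"{name} changed from {before} to {after}"
                for name, (before, after) in self.differences().items()]


initial = Person(name="Harry Lime", city="Wien")
latest = Person(name="Harry Lime", city="Vienna")
comparer = ObjectComparer(initial, latest)
print(comparer.describe())
```

    The keys of `differences()` tell the GUI which fields to highlight, and `describe()` supplies the accompanying text.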

    My 2 cents... Not as techy as the database compare stuff already here.

  • 2021-02-04 15:23

    Just an idea, but would it be worthwhile for you to convert the two object versions being compared to some text format and then compare the resulting text using an existing diff program, such as diff? There are lots of diff programs out there that offer nice visual representations.

    So for example

    Text version of Object 1:

    first_name: Harry
    last_name: Lime
    address: Wien
    version: 0.1
    

    Text version of Object 2:

    first_name: Harry
    last_name: Lime
    address: Vienna
    version: 0.2
    

    The diff would be something like:

    3,4c3,4
    < address: Wien
    < version: 0.1
    ---
    > address: Vienna
    > version: 0.2
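    If shelling out to an external diff program is inconvenient, the same idea can be sketched with Python's standard `difflib` module. This assumes the objects are available as dictionaries; the helper name `to_text` is hypothetical, and the output is in unified diff format rather than the classic format shown above.

```python
import difflib


def to_text(obj):
    """Serialise a dict as one 'key: value' line per field, in insertion order."""
    return [f"{key}: {value}" for key, value in obj.items()]


object1 = {"first_name": "Harry", "last_name": "Lime", "address": "Wien", "version": "0.1"}
object2 = {"first_name": "Harry", "last_name": "Lime", "address": "Vienna", "version": "0.2"}

# lineterm="" because the input lines carry no trailing newlines.
diff = list(difflib.unified_diff(to_text(object1), to_text(object2),
                                 fromfile="object1", tofile="object2", lineterm=""))
print("\n".join(diff))
```

    Lines starting with `-` come from the first object and lines starting with `+` from the second. Serialising the fields in a fixed order matters, otherwise reordered fields show up as spurious differences.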
    