“Diffing” objects from a relational database

遇见更好的自我 2021-02-04 15:05

Our Win32 application assembles objects from the data in a number of tables in a MySQL relational database. Multiple revisions of each such object are stored in the database.

12 answers
  • 2021-02-04 15:08

    I'm riffing off of what Harry Lime suggested: output your properties to a text format, then hash the results. That way you can compare the hash values and easily flag the data that has been altered. You get the best of both worlds: you can see the differences visually and still detect them programmatically. With the hash you'll also have a good source for an index should you want to store and retrieve the deltas.
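
    A minimal Python sketch of that idea, assuming each object can be flattened into a dictionary of properties. The names `canonical_text` and `fingerprint`, the sample data, and the choice of SHA-256 are illustrative assumptions, not part of the original suggestion.

```python
import hashlib


def canonical_text(obj: dict) -> str:
    """Render properties as sorted 'name=value' lines so equal objects give equal text."""
    return "\n".join(f"{name}={obj[name]!r}" for name in sorted(obj))


def fingerprint(obj: dict) -> str:
    """SHA-256 of the canonical text; usable as a change flag or as an index key."""
    return hashlib.sha256(canonical_text(obj).encode("utf-8")).hexdigest()


# Hypothetical revisions of one object.
rev1 = {"id": 7, "name": "Pump A", "pressure": 4.5}
rev2 = {"id": 7, "name": "Pump A", "pressure": 5.0}

changed = fingerprint(rev1) != fingerprint(rev2)
print("changed:", changed)
```

    Sorting the property names keeps the hash independent of the order in which properties were loaded. The text itself can still be handed to a diff tool when a human needs to see what changed.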

  • 2021-02-04 15:09

    Doing a comparison at the database level would be good if what you cared about was changes to the database. That makes the most sense if you're trying to design a layer of generic functionality on top of the database itself.

    Doing a comparison at the object level would be good if you care about changes to the data. For example, if the data was the input to a program and you were interested in looking at changes in the input to verify that changes to the output were correct.

    Your use case doesn't appear to be either of these. You appear to care about the output and want differences from that perspective. If that's the case, I would do differences on the output report (or a pure-text version of it) instead of on the underlying data. You can do that with any off-the-shelf diff tool. To make things easier for your end-users you could parse the diff results and render them as HTML. There are lots of options here: side-by-side with color coding to indicate changes, one document with markup for changes (e.g. red strikethrough for deletions and green for additions), maybe just highlight areas that have changed and use balloons to show the previous/current values on demand.
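
    As a rough illustration of the "diff the text, render as HTML" route, here is a sketch that uses Python's standard `difflib.HtmlDiff` in place of an external diff tool. The report lines and the helper name `render_html_diff` are made up for the example.

```python
import difflib


def render_html_diff(old_lines: list, new_lines: list) -> str:
    """Return a complete HTML page with a side-by-side, color-coded comparison."""
    return difflib.HtmlDiff(wrapcolumn=80).make_file(
        old_lines, new_lines, fromdesc="Revision 1", todesc="Revision 2"
    )


# Hypothetical pure-text versions of two output reports.
old_report = ["Name: Pump A", "Pressure: 4.5 bar", "Status: active"]
new_report = ["Name: Pump A", "Pressure: 5.0 bar", "Status: active", "Note: serviced"]

html = render_html_diff(old_report, new_report)
print(len(html), "characters of HTML")
```

    The returned string can be written to a file and shown in an embedded browser control. Other presentations, such as a single document with strikethrough markup, would need custom rendering on top of the raw diff.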

    I've thought about doing database comparisons but never tried to implement it. As you noted, any such attempts are intimately intertwined with the schema.

    I have done object-level comparisons. The general algorithm was this:

    1. Do a set comparison on the lists of object IDs. This creates three result groupings: added objects, deleted objects, and objects that live in both sets.
    2. Report the deletions.
    3. Report the additions.
    4. For the things in both sets, do an attribute-by-attribute comparison.
    5. If any differences are found, report the object ID, the attributes that differ, and the respective values. If appropriate, highlight the portion of the attribute value that has changed.

    In my case, the comparison algorithms were hand-written to match the object attributes. This gave me control over which attributes were compared and how. A generic comparator might be possible for some cases but would depend on the situation and at least partially on the implementation language.
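
    The five steps can be sketched in Python as follows. This version uses a generic attribute comparison over dictionaries rather than hand-written comparators, and the function name and sample objects are assumptions made for illustration.

```python
def diff_object_sets(old: dict, new: dict) -> dict:
    """Compare two {object_id: {attribute: value}} mappings."""
    old_ids, new_ids = set(old), set(new)
    report = {
        "deleted": sorted(old_ids - new_ids),   # only in the old set
        "added": sorted(new_ids - old_ids),     # only in the new set
        "changed": {},
    }
    # Objects present in both sets get an attribute-by-attribute comparison.
    for obj_id in sorted(old_ids & new_ids):
        before, after = old[obj_id], new[obj_id]
        diffs = {
            attr: (before.get(attr), after.get(attr))
            for attr in sorted(set(before) | set(after))
            if before.get(attr) != after.get(attr)
        }
        if diffs:
            report["changed"][obj_id] = diffs
    return report


# Hypothetical revisions keyed by object ID.
old = {1: {"name": "Pump A", "pressure": 4.5}, 2: {"name": "Valve B", "open": True}}
new = {1: {"name": "Pump A", "pressure": 5.0}, 3: {"name": "Tank C", "volume": 200}}

report = diff_object_sets(old, new)
print(report)
# {'deleted': [2], 'added': [3], 'changed': {1: {'pressure': (4.5, 5.0)}}}
```

    A hand-written comparator would replace the dictionary comprehension with per-attribute rules, for example ignoring timestamps or comparing floating-point values within a tolerance.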

  • 2021-02-04 15:10

    This isn't really an answer to the question you asked, but rather an attempt to re-imagine the problem. Would you consider altering your database and object model to store the aggregate root and a series of deltas? That is, model and store RevisionSets that are collections of Revisions; a Revision is an entity property paired with a value. In a sense, this internalizes into your architecture the revision structure that the other posters are suggesting you bolt on to what you already have via "logs".

    It's trivial to display the aggregate from the deltas, and even easier to display the deltas as a change history. The fact that you are using a rich client with state and local memory makes this even more compelling. You could very easily display "all the changes since date xxxx" without revisiting the database.
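
    One possible shape for this model, sketched in Python. Only the names RevisionSet and Revision come from the description above; the fields, the helper functions, and the sample data are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Revision:
    """One property change: an entity property paired with a value."""
    prop: str
    value: object


@dataclass(frozen=True)
class RevisionSet:
    """All property changes saved together on one date (hypothetical layout)."""
    saved_on: date
    revisions: tuple


def rebuild(root: dict, history: list, as_of: date) -> dict:
    """Apply every RevisionSet up to and including `as_of` to a copy of the root."""
    state = dict(root)
    for rev_set in sorted(history, key=lambda rs: rs.saved_on):
        if rev_set.saved_on > as_of:
            break
        for rev in rev_set.revisions:
            state[rev.prop] = rev.value
    return state


def changes_since(history: list, since: date) -> list:
    """List (date, property, value) for every change saved after `since`."""
    return [
        (rs.saved_on, rev.prop, rev.value)
        for rs in sorted(history, key=lambda rs: rs.saved_on)
        if rs.saved_on > since
        for rev in rs.revisions
    ]


root = {"name": "Pump A", "pressure": 4.0}
history = [
    RevisionSet(date(2021, 1, 10), (Revision("pressure", 4.5),)),
    RevisionSet(date(2021, 2, 1), (Revision("pressure", 5.0), Revision("status", "serviced"))),
]

print(rebuild(root, history, date(2021, 1, 15)))   # state as of mid-January
print(changes_since(history, date(2021, 1, 15)))   # the change history after that date
```

    Once the root and its history are loaded into client memory, both functions run without another database round trip.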

    Credit for the basic idea goes to Greg Young and his work with financial data streams, but it is eminently applicable to your problem.

  • 2021-02-04 15:11

    I would think about some sort of common text representation of the objects and compare the texts with an existing diff tool such as WinMerge.

    I see no need to reinvent diffing myself when there are already plenty of nice tools I can use.

  • 2021-02-04 15:13

    I've looked into MySQL diffing a number of times. Unfortunately, there aren't any really good solutions available.

    One tool I've tried was mysqldiff (www.mysqldiff.org). mysqldiff is a tool written in PHP that is capable of diffing MySQL schemas. Unfortunately, it often doesn't do a great job.

    MySQL Workbench, MySQL's own SQL IDE, provides the option to generate an alter script, and I would imagine it does this by performing some kind of diff operation internally.

    Aqua Data Studio is another tool that is capable of comparing schemas and outputting a diff of the two. While the ADS diff is quite nice, it does not provide a tool to create an alter script.

    If I were writing my own, I guess I would write code capable of comparing the structure of two tables. Such code could be tuned to be highly sensitive (e.g. if column order differs from one version to the next, it's a difference) or more moderately sensitive (e.g. column order is not a major issue, but data types and lengths are important, as are indices and constraints).
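
    A small Python sketch of such a tunable comparison, limited to column names, types and order. The `Column` type, the `strict_order` flag, and the sample tables are hypothetical; indices and constraints are left out, and loading the column lists from the database (for instance from MySQL's `information_schema.columns`) is not shown.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: str
    data_type: str  # e.g. "varchar(40)", so the length is part of the type


def compare_tables(old: list, new: list, strict_order: bool = False) -> list:
    """Return human-readable differences between two lists of Column."""
    old_by_name = {c.name: c for c in old}
    new_by_name = {c.name: c for c in new}
    diffs = []
    for name in sorted(old_by_name.keys() - new_by_name.keys()):
        diffs.append(f"column dropped: {name}")
    for name in sorted(new_by_name.keys() - old_by_name.keys()):
        diffs.append(f"column added: {name}")
    for name in sorted(old_by_name.keys() & new_by_name.keys()):
        before, after = old_by_name[name].data_type, new_by_name[name].data_type
        if before != after:
            diffs.append(f"type changed: {name} {before} -> {after}")
    if strict_order:
        # Highly sensitive mode: the relative order of shared columns must match.
        shared_old = [c.name for c in old if c.name in new_by_name]
        shared_new = [c.name for c in new if c.name in old_by_name]
        if shared_old != shared_new:
            diffs.append("column order changed")
    return diffs


v1 = [Column("id", "int"), Column("name", "varchar(40)")]
v2 = [Column("name", "varchar(80)"), Column("id", "int")]

print(compare_tables(v1, v2))                      # moderate: order is ignored
print(compare_tables(v1, v2, strict_order=True))   # strict: order counts as well
```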

    As for storage, I'm not too sure. I would look into how a version control system such as Mercurial stores its diff information for revisions and use that to work out a method appropriate for the DB.

    Finally, for visual output I recommend you take a look at the Aqua Data Studio compare feature (you can use the trial version to test this). Its diff output is pretty good.

  • 2021-02-04 15:13

    Example with Oracle.

    • Export ordered objects to text with dbms_metadata
    • Export ordered tables data into CSV or query format
    • Make big text file
    • Diff
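
    The data half of these steps might look like the Python sketch below. It does not touch Oracle or `dbms_metadata`: the table contents are hard-coded hypothetical rows, the helper `dump_tables` is made up for the example, and the standard `difflib` module stands in for an external diff tool. Rows are assumed to be sortable tuples without NULLs.

```python
import csv
import difflib
import io


def dump_tables(tables: dict) -> list:
    """Render {table_name: [row tuples]} as one ordered text, one line per row."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for table_name in sorted(tables):        # fixed table order
        out.write(f"-- {table_name}\n")
        for row in sorted(tables[table_name]):  # fixed row order
            writer.writerow(row)
    return out.getvalue().splitlines()


# Hypothetical snapshots of the same table at two points in time.
snapshot_a = {"parts": [(2, "Valve B"), (1, "Pump A")]}
snapshot_b = {"parts": [(1, "Pump A"), (2, "Valve B2"), (3, "Tank C")]}

diff = list(
    difflib.unified_diff(
        dump_tables(snapshot_a),
        dump_tables(snapshot_b),
        "snapshot_a",
        "snapshot_b",
        lineterm="",
    )
)
print("\n".join(diff))
```

    Ordering both tables and rows before writing is what makes the final diff meaningful. Without it, two identical snapshots could differ only in row order.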