How to keep edit history of large string field in relational database

囚心锁ツ 2021-02-20 06:10

N.B. I think answers are likely to be design-focused and therefore basically implementation-agnostic, but I'm using Java+Hibernate with Postgres if there's some particular

2 answers
  •  抹茶落季
    2021-02-20 06:40

    A solution I'm working on right now, which is working well so far, implements the design I proposed in the question:

    I'm thinking something along the lines of storing only deltas between the current and previous version when an edit is made, and then reconstructing the version history from these deltas programmatically when it's requested, perhaps on the client so the data sent over the wire is minimised.

    I would most likely store the latest version as full-text, as I'd want to optimise for requesting this most frequently, then store a chain of deltas going backwards from the current version to reconstruct historical versions, as and when they are requested.
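
    To make the shape of that design concrete (live full text plus a chain of backward deltas), here is a minimal, self-contained sketch. The class BackwardDeltaHistory and its prefix/suffix Delta are hypothetical names invented for illustration, and the crude "replace the middle" diff is only a stand-in for a real diff library, not the one used later in this answer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: keep the live text in full, plus a chain of backward deltas. */
class BackwardDeltaHistory {

    /** Crude stand-in for a real diff: keep a shared prefix and suffix, replace the middle. */
    static final class Delta {
        final int prefixLen;
        final int suffixLen;
        final String replacement;

        Delta(int prefixLen, int suffixLen, String replacement) {
            this.prefixLen = prefixLen;
            this.suffixLen = suffixLen;
            this.replacement = replacement;
        }
    }

    /** Builds the delta that turns {@code from} into {@code to}. */
    static Delta diff(String from, String to) {
        int max = Math.min(from.length(), to.length());
        int p = 0;
        while (p < max && from.charAt(p) == to.charAt(p)) {
            p++;
        }
        int s = 0;
        while (s < max - p
                && from.charAt(from.length() - 1 - s) == to.charAt(to.length() - 1 - s)) {
            s++;
        }
        return new Delta(p, s, to.substring(p, to.length() - s));
    }

    /** Applies a delta produced by {@link #diff} to the text it was computed from. */
    static String apply(String from, Delta d) {
        return from.substring(0, d.prefixLen)
                + d.replacement
                + from.substring(from.length() - d.suffixLen);
    }

    private String liveText;
    // Newest delta first: the head turns the live text into the version just before it.
    private final Deque<Delta> backwardDeltas = new ArrayDeque<>();

    BackwardDeltaHistory(String initialText) {
        this.liveText = initialText;
    }

    /** Stores the delta from the new text back to the old one, then replaces the live text. */
    void edit(String newText) {
        backwardDeltas.addFirst(diff(newText, liveText));
        liveText = newText;
    }

    /** Returns every version, live text first, by walking the chain backwards in time. */
    List<String> allVersions() {
        List<String> versions = new ArrayList<>();
        String text = liveText;
        versions.add(text);
        for (Delta d : backwardDeltas) {
            text = apply(text, d);
            versions.add(text);
        }
        return versions;
    }
}
```

    Reading the latest version costs nothing extra because it is stored whole; only historical versions pay the cost of replaying deltas.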

    I'll share the specifics of my implementation here.

    For creating deltas and using them to reconstruct the full text, I am using the fantastic google-diff-match-patch library. You can read the implementation-agnostic API documentation to better understand the code examples below, though it's pretty readable anyway.

    google-diff-match-patch has Java and JS implementations so I can use it to compute the deltas with Java on the server. I chose to convert each delta to a String both so it can be easily stored in the database, and easily consumed by the JS library on the client. More on this below.

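    // Assumes java.util.LinkedList and the library's diff_match_patch class are imported.
    public String getBackwardsDelta(String editedBlogPost, String existingBlogPost) {
        diff_match_patch dmp = new diff_match_patch();
        // Patches that turn the edited text back into the existing (older) text.
        LinkedList<diff_match_patch.Patch> patches =
            dmp.patch_make(editedBlogPost, existingBlogPost);
        // Serialise the patches so they can be stored and later parsed by the JS library.
        return dmp.patch_toText(patches);
    }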
    

    N.B. something it took me a while to figure out was how to pull down the official build of google-diff-match-patch using Maven. It's not in the Maven Central repo, but in their own repo on googlecode.com. Just to note, some people have forked it and put their forked versions in Maven Central, but if you really want the official version you can get it by adding the repo and dependency to your pom.xml as follows:

    
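    <repository>
      <id>google-diff-patch-match</id>
      <name>google-diff-patch-match</name>
      <url>https://google-diff-match-patch.googlecode.com/svn/trunk/maven/</url>
    </repository>

    <dependency>
      <groupId>diff_match_patch</groupId>
      <artifactId>diff_match_patch</artifactId>
      <version>current</version>
    </dependency>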
    

    For the front end, I pass the latest blog post full-text, along with a chain of deltas going backwards in time representing each edit, and then reconstruct the full text of each version in the browser in JS.

    To get the library, I'm using npm + browserify. The library is available on npm as diff-match-patch. Version 1.0.0 is the only version.

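    getTextFromDelta: function(originalText, delta) {
      var DMP = require('diff-match-patch'); // get the constructor function
      var dmp = new DMP();
      var patches = dmp.patch_fromText(delta); // parse the stored delta string
      // patch_apply returns [patchedText, results]; results flags whether each patch applied
      return dmp.patch_apply(patches, originalText)[0];
    }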
    

    And that's it; it works fantastically.

    To store the edits, I just use a table, BLOG_POST_EDITS, with three things per row: the blog post id; a timestamp of when the edit was made (which I later use to order the edits into a chain when reconstructing the full-text versions on the client); and the backwards delta between the current live blog post in the BLOG_POST table and the incoming edited version of the blog post.
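
    As a rough illustration of how rows from that table could be turned into a chain, the sketch below filters one post's edits and sorts them newest first by timestamp. The class BlogPostEdit, its field names and the chainFor helper are hypothetical; the answer only names the table and its three pieces of data, so treat the rest as assumptions rather than the actual Hibernate mapping.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory mirror of one BLOG_POST_EDITS row. */
class BlogPostEdit {
    final long blogPostId;
    final Instant editedAt;
    final String backwardsDelta; // serialised delta text, as stored in the database

    BlogPostEdit(long blogPostId, Instant editedAt, String backwardsDelta) {
        this.blogPostId = blogPostId;
        this.editedAt = editedAt;
        this.backwardsDelta = backwardsDelta;
    }

    /** Selects one post's edits and orders them newest first, ready to apply to the live text. */
    static List<BlogPostEdit> chainFor(long blogPostId, List<BlogPostEdit> rows) {
        List<BlogPostEdit> chain = new ArrayList<>();
        for (BlogPostEdit row : rows) {
            if (row.blogPostId == blogPostId) {
                chain.add(row);
            }
        }
        // Newest edit first, because its delta is the one computed against the live text.
        chain.sort(Comparator.comparing((BlogPostEdit e) -> e.editedAt).reversed());
        return chain;
    }
}
```

    In practice the filtering and ordering would more likely be pushed into the query itself; the in-memory version is only meant to show the ordering the chain relies on.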

    I chose to store a 'chain' of deltas because it suits my use case well, and is simpler on the server code end of things. It does mean in order to reconstruct version M of N, I have to send the client a chain of N-(M-1) deltas back from the live blog post full-text to version M. But in my use case I happen to want to send the whole chain each time, anyway, so this is fine.
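
    The N-(M-1) count can be checked with a small sketch. It assumes one particular reading of the numbering: there are N stored deltas ordered newest first, historical versions are numbered 1..N from oldest to newest, and the live text is held separately. The class ChainSlice and the method deltasNeededFor are hypothetical names.

```java
import java.util.List;

/** Hypothetical helper: picks the part of the chain needed to rebuild one version. */
class ChainSlice {

    /**
     * Assumes the chain holds N deltas, newest first, and that historical versions
     * are numbered 1..N (oldest to newest) with the live text stored separately.
     * Rebuilding version m then needs the first N-(m-1) deltas of the chain.
     */
    static <T> List<T> deltasNeededFor(int m, List<T> chainNewestFirst) {
        int n = chainNewestFirst.size();
        if (m < 1 || m > n) {
            throw new IllegalArgumentException("version out of range: " + m);
        }
        return chainNewestFirst.subList(0, n - (m - 1));
    }
}
```

    With five stored deltas, the newest historical version (M = 5) needs one delta, while the original version (M = 1) needs all five.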

    For slightly better over-the-wire efficiency for requesting specific versions, all deltas could be recalculated from the new edited blog post version back to each (restored) version each time an edit is made, but this would mean more work and complexity on the server.
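
    A sketch of that alternative, under stated assumptions: every stored delta goes directly from the live text to one historical version, so on each edit all of them are restored and re-diffed against the new live text. The class DirectDeltas and the diff and patch parameters are hypothetical stand-ins for whatever diff library is in use; they are not the google-diff-match-patch API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BinaryOperator;

/** Hypothetical sketch of the "recalculate every delta on each edit" alternative. */
class DirectDeltas {

    /**
     * @param newLive         the incoming edited text
     * @param oldLive         the text that was live until now
     * @param oldDirectDeltas deltas that each turn oldLive into one historical version, newest first
     * @param diff            stand-in: diff.apply(from, to) returns a serialised delta from -> to
     * @param patch           stand-in: patch.apply(text, delta) returns the patched text
     * @return deltas that each turn newLive into one historical version, newest first
     */
    static List<String> recompute(String newLive, String oldLive, List<String> oldDirectDeltas,
                                  BinaryOperator<String> diff, BinaryOperator<String> patch) {
        List<String> updated = new ArrayList<>();
        // The previously live text becomes the newest historical version.
        updated.add(diff.apply(newLive, oldLive));
        for (String delta : oldDirectDeltas) {
            // Restore the historical version, then diff it against the new live text.
            String restored = patch.apply(oldLive, delta);
            updated.add(diff.apply(newLive, restored));
        }
        return updated;
    }
}
```

    The trade-off is visible in the loop: each edit costs one restore and one diff per stored version, in exchange for any single version being reachable with exactly one delta.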
