What is an Algorithm to Diff the Two Strings in the Same Way that SO Does on the Version Page?

名媛妹妹 2021-02-06 07:26

I'm trying to diff two strings by phrase, similar to the way that StackOverflow diffs two strings on the version edits page. What would be an algorithm to do this? Are ther

1 answer
  •  礼貌的吻别
    2021-02-06 08:09

    The algorithm you are looking for is Longest Common Subsequence (LCS); it does most of the work for you.
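For reference, this is the standard LCS dynamic-programming recurrence, written over an input word sequence a_1, ..., a_m and an output word sequence b_1, ..., b_n (these symbols are illustrative and not taken from the answer). L_{i,j} is the LCS length of the first i words of a and the first j words of b:

```latex
L_{i,j} =
\begin{cases}
0 & \text{if } i = 0 \text{ or } j = 0,\\
L_{i-1,j-1} + 1 & \text{if } a_i = b_j,\\
\max\left(L_{i-1,j},\, L_{i,j-1}\right) & \text{otherwise.}
\end{cases}
```

Filling the table takes O(mn) time and space. For the example strings used in this answer, L_{6,3} = 2, the common subsequence being "hello world".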

    The outline is something along these lines.

    1. Split both strings (input and output) into words.
    2. Calculate the LCS of the input and output word arrays.
    3. Walk through the result and join up adjacent areas intelligently.
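A minimal Python sketch of steps 1 and 2 plus the per-word walk that begins step 3, assuming plain whitespace splitting. The function name `word_diff` and the `"+"`/`"="`/`"-"` op strings are hypothetical, chosen to mirror the notation used in this answer. The table is built over suffixes so that it can be walked from the front of both word lists.

```python
def word_diff(old: str, new: str) -> list:
    """Diff two strings word by word using an LCS table.

    Returns (word, op) pairs: "=" kept, "-" only in old, "+" only in new.
    """
    # Step 1: split both strings into words (plain whitespace split).
    a, b = old.split(), new.split()
    m, n = len(a), len(b)

    # Step 2: lcs[i][j] = LCS length of the suffixes a[i:] and b[j:].
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Step 3 (first half): walk the table, emitting one op per word.
    ops = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            ops.append((a[i], "="))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            # Dropping a[i] loses nothing; on ties, deletions come first.
            ops.append((a[i], "-"))
            i += 1
        else:
            ops.append((b[j], "+"))
            j += 1
    # Whatever remains on either side is purely removed or purely added.
    ops.extend((w, "-") for w in a[i:])
    ops.extend((w, "+") for w in b[j:])
    return ops
```

On the two example strings in this answer it returns mister as `+`, hello and world as `=`, and this, is, a, test as `-`. The tie-break (deletions before insertions) is an arbitrary choice; other tie-breaks give different but equally minimal diffs.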

    For example, say you have:

    "hello world this is a test"

    compared with:

    "mister hello world"

    The result from the LCS is (+ added, = unchanged, - removed):

    • "mister" +
    • "hello" =
    • "world" =
    • "this" -
    • "is" -
    • "a" -
    • "test" -

    Now you add the special sauce when building up the output: join the words back together while staying mindful of the previous action. The naive algorithm simply joins adjacent sections that have the same action.

    • "mister" +
    • "hello world" =
    • "this is a test" -
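The naive joining step can be sketched in Python with `itertools.groupby`, which groups consecutive items sharing a key. The helper name `merge_runs` is hypothetical; it takes the per-word (word, op) pairs from the LCS step, hard-coded here from the example.

```python
from itertools import groupby


def merge_runs(ops):
    """Join consecutive words that share the same action ("+", "=", "-").

    This is the naive strategy: it only checks whether neighbouring
    words carry the same op, with no smarter heuristics.
    """
    merged = []
    for op, group in groupby(ops, key=lambda pair: pair[1]):
        merged.append((" ".join(word for word, _ in group), op))
    return merged


# Per-word result from the LCS step, copied from the example.
ops = [("mister", "+"), ("hello", "="), ("world", "="),
       ("this", "-"), ("is", "-"), ("a", "-"), ("test", "-")]
print(merge_runs(ops))
# [('mister', '+'), ('hello world', '='), ('this is a test', '-')]
```

A smarter version would go here, for example absorbing a single unchanged word sandwiched between two changed runs so the output reads as one replaced phrase.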

    Finally, you transform it to HTML:

    mister hello world this is a test  
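One way to render the merged runs as HTML, sketched in Python. This assumes the standard `<ins>` and `<del>` elements for added and removed text; that markup choice and the helper name `to_html` are assumptions, not necessarily what Stack Overflow emits. Text is passed through `html.escape` so that diffed user content cannot inject markup.

```python
import html


def to_html(runs):
    """Render merged (text, op) runs as HTML.

    Assumes <ins> for added text and <del> for removed text;
    unchanged text is emitted without a wrapper. All text is escaped.
    """
    tags = {"+": "ins", "-": "del"}
    parts = []
    for text, op in runs:
        escaped = html.escape(text)
        tag = tags.get(op)
        parts.append(f"<{tag}>{escaped}</{tag}>" if tag else escaped)
    return " ".join(parts)


runs = [("mister", "+"), ("hello world", "="), ("this is a test", "-")]
print(to_html(runs))
# <ins>mister</ins> hello world <del>this is a test</del>
```

Browsers typically underline `<ins>` and strike through `<del>` by default, and either can be restyled with CSS (for example green and red backgrounds).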
    

    Of course, the devil is in the details:

    • You need to consider how you handle tags.
    • Do you compare Markdown or HTML?
    • Are there any edge cases where the UI stops making sense?
    • Do you need special handling for punctuation?
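On the punctuation question, one possible approach (an assumption, not something the answer prescribes) is to tokenize with a regular expression so that each punctuation mark becomes its own token. A changed comma then shows up as a one-character edit instead of marking the whole neighbouring word as changed. The names `TOKEN_RE` and `tokenize` are hypothetical.

```python
import re

# Words are runs of word characters; every other non-space character
# (punctuation) becomes a token of its own. Whitespace is dropped.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


print(tokenize("Hello, world! This is a test."))
# ['Hello', ',', 'world', '!', 'This', 'is', 'a', 'test', '.']
```

The trade-off is that rejoining these tokens with single spaces would put a space before every comma. If the output must reproduce the original spacing, a variant that also keeps whitespace runs as tokens (for example the pattern `\w+|[^\w\s]|\s+`) lets you join with the empty string instead.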
