Python difflib: highlighting differences inline?

南笙 2020-12-04 12:50

When comparing similar lines, I want to highlight the differences on the same line:

a) lorem ipsum dolor sit amet
b) lorem foo ipsum dolor amet

lorem <ins>foo</ins> ipsum dolor <del>sit</del> amet

        
3 Answers
  • 2020-12-04 13:03

    Here's an inline differ inspired by @tzot's answer (it also works on Python 3):

    import difflib

    def inline_diff(a, b):
        matcher = difflib.SequenceMatcher(None, a, b)

        def process_tag(tag, i1, i2, j1, j2):
            if tag == 'replace':
                # Show the old text and the text that replaced it.
                return '{' + matcher.a[i1:i2] + ' -> ' + matcher.b[j1:j2] + '}'
            if tag == 'delete':
                return '{- ' + matcher.a[i1:i2] + '}'
            if tag == 'equal':
                return matcher.a[i1:i2]
            if tag == 'insert':
                return '{+ ' + matcher.b[j1:j2] + '}'
            # Raise instead of "assert False", which is skipped under python -O.
            raise ValueError("Unknown tag %r" % tag)

        return ''.join(process_tag(*t) for t in matcher.get_opcodes())
    

    It's not perfect. For example, it would be nice to expand 'replace' opcodes to recognize the whole word that was replaced instead of just the few differing letters, but it's a good place to start.

    Sample output:

    >>> a='Lorem ipsum dolor sit amet consectetur adipiscing'
    >>> b='Lorem bananas ipsum cabbage sit amet adipiscing'
    >>> print(inline_diff(a, b))
    Lorem{+  bananas} ipsum {dolor -> cabbage} sit amet{-  consectetur} adipiscing
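    One possible way to get whole-word replacements (a sketch, not part of the original answer) is to run SequenceMatcher on lists of words instead of on characters, since it accepts any sequences of hashable items. The name inline_word_diff is hypothetical, and splitting with str.split() is an assumption: the original spacing is lost and words are re-joined with single spaces.

```python
import difflib


def inline_word_diff(a, b):
    """Hypothetical word-level variant of inline_diff.

    Assumes words are separated by whitespace; the original spacing is
    not preserved because tokens are re-joined with single spaces.
    """
    a_words, b_words = a.split(), b.split()
    # SequenceMatcher works on any sequences of hashable items,
    # so lists of words are compared just like strings of characters.
    matcher = difflib.SequenceMatcher(None, a_words, b_words)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old = ' '.join(a_words[i1:i2])
        new = ' '.join(b_words[j1:j2])
        if tag == 'equal':
            parts.append(old)
        elif tag == 'replace':
            parts.append('{' + old + ' -> ' + new + '}')
        elif tag == 'delete':
            parts.append('{- ' + old + '}')
        elif tag == 'insert':
            parts.append('{+ ' + new + '}')
    return ' '.join(parts)


a = 'Lorem ipsum dolor sit amet consectetur adipiscing'
b = 'Lorem bananas ipsum cabbage sit amet adipiscing'
print(inline_word_diff(a, b))
# prints: Lorem {+ bananas} ipsum {dolor -> cabbage} sit amet {- consectetur} adipiscing
```

    With word tokens, a 'replace' opcode always covers complete words, so partially similar words are reported as whole-word replacements rather than letter-level edits.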
    
  • 2020-12-04 13:04

    difflib.SequenceMatcher works on single lines too, because a string is just a sequence of characters. Use its opcodes (from get_opcodes()) to determine how to change the first line into the second one.
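    A minimal sketch (not from the original answer) that prints the opcodes for the two lines from the question. Each opcode is a (tag, i1, i2, j1, j2) tuple saying how a[i1:i2] becomes b[j1:j2]:

```python
import difflib

a = "lorem ipsum dolor sit amet"
b = "lorem foo ipsum dolor amet"

# With two strings, SequenceMatcher compares them character by character.
sm = difflib.SequenceMatcher(None, a, b)

# tag is one of 'equal', 'insert', 'delete' or 'replace'.
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, repr(a[i1:i2]), repr(b[j1:j2]))

# Expected output:
# equal 'lorem' 'lorem'
# insert '' ' foo'
# equal ' ipsum dolor ' ' ipsum dolor '
# delete 'sit ' ''
# equal 'amet' 'amet'
```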

  • 2020-12-04 13:16

    For your simple example:

    import difflib

    def show_diff(seqm):
        """Unify operations between two compared strings.

        seqm is a difflib.SequenceMatcher instance whose a & b are strings.
        """
        output = []
        for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
            if opcode == 'equal':
                # Unchanged text is copied through as-is.
                output.append(seqm.a[a0:a1])
            elif opcode == 'insert':
                # Text present only in b is wrapped in <ins>.
                output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
            elif opcode == 'delete':
                # Text present only in a is wrapped in <del>.
                output.append("<del>" + seqm.a[a0:a1] + "</del>")
            elif opcode == 'replace':
                raise NotImplementedError("what to do with 'replace' opcode?")
            else:
                raise RuntimeError("unexpected opcode")
        return ''.join(output)

    >>> sm = difflib.SequenceMatcher(None, "lorem ipsum dolor sit amet", "lorem foo ipsum dolor amet")
    >>> show_diff(sm)
    'lorem<ins> foo</ins> ipsum dolor <del>sit </del>amet'
    

    This works with strings. You still need to decide how to render "replace" opcodes; this version raises NotImplementedError for them.
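    One common choice (an assumption, not something the original answer prescribes) is to render a "replace" as a deletion immediately followed by an insertion. A sketch; show_diff_html is a hypothetical name, and it builds the SequenceMatcher itself instead of taking one as an argument:

```python
import difflib


def show_diff_html(a, b):
    """Hypothetical variant of show_diff that also handles 'replace'.

    A 'replace' opcode is rendered as <del>old</del><ins>new</ins>.
    The text is not HTML-escaped.
    """
    sm = difflib.SequenceMatcher(None, a, b)
    output = []
    for opcode, a0, a1, b0, b1 in sm.get_opcodes():
        if opcode == 'equal':
            output.append(a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + a[a0:a1] + "</del>")
        elif opcode == 'replace':
            # Show what was removed, then what took its place.
            output.append("<del>" + a[a0:a1] + "</del><ins>" + b[b0:b1] + "</ins>")
        else:
            raise RuntimeError("unexpected opcode: %r" % opcode)
    return ''.join(output)


print(show_diff_html("lorem ipsum dolor sit amet", "lorem ipsum cabbage sit amet"))
# prints: lorem ipsum <del>dolor</del><ins>cabbage</ins> sit amet
```

    Because the comparison is still character-based, words that share some letters can end up as several small del/ins pairs; comparing lists of words instead avoids that.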
