Python - difference between two strings

余生分开走 2020-11-27 13:41

I'd like to store a lot of words in a list. Many of these words are very similar. For example, I have the word afrykanerskojęzyczny and many words like afr

5 Answers
  • 2020-11-27 14:16

    Based on the reply to my comment on the original question, I think this is all that is needed:

    word = 'afrykanerskojęzyczny'
    wordlist = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni', 'nieafrykanerskojęzyczni']
    # Overwrite every entry of wordlist with the base word.
    for i in range(len(wordlist)):
        wordlist[i] = word
    

    This will do the following:

    For every value in wordlist, set that value to the original word.

    Put this piece of code wherever you need to change wordlist. Make sure wordlist holds the words you want to change and that the original word is correct.

    Hope this helps!

  • 2020-11-27 14:28

    I like the ndiff answer, but if you want to collect only the changes into a list, you could do something like:

    import difflib
    
    case_a = 'afrykbnerskojęzyczny'
    case_b = 'afrykanerskojęzycznym'
    
    output_list = [li for li in difflib.ndiff(case_a, case_b) if li[0] != ' ']
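
    The filtered list above keeps the changed characters but not where they occur, so on its own it cannot rebuild the second string. As a minimal sketch of one alternative, assuming the goal is to store a base word plus a small delta per variant, the changed spans can be kept together with their positions. The helper names make_delta and apply_delta are hypothetical, not part of difflib; only SequenceMatcher and get_opcodes come from the standard library.

```python
from difflib import SequenceMatcher


def make_delta(base, target):
    """Return the edits (start, end, replacement) that turn base into target.

    Hypothetical helper: only the non-equal opcodes are kept.
    """
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    return [(i1, i2, target[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != 'equal']


def apply_delta(base, delta):
    """Rebuild the target string from base and a delta from make_delta."""
    pieces = []
    pos = 0
    for i1, i2, replacement in delta:
        pieces.append(base[pos:i1])   # unchanged text before this edit
        pieces.append(replacement)    # new text ('' for a pure deletion)
        pos = i2                      # skip the replaced span of base
    pieces.append(base[pos:])         # unchanged tail
    return ''.join(pieces)


base = 'afrykanerskojęzyczny'
words = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni',
         'nieafrykanerskojęzyczni']
deltas = [make_delta(base, w) for w in words]
rebuilt = [apply_delta(base, d) for d in deltas]
print(rebuilt == words)
```

    Here the positions refer to the base word, so each delta can be applied independently of the others.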
    
  • 2020-11-27 14:28

    You can look into the regex module (the fuzzy matching section). I don't know if you can get the actual differences, but at least you can specify the allowed number of each type of change: insertions, deletions, and substitutions.

    import regex
    sequence = 'afrykanerskojezyczny'
    queries = [ 'afrykanerskojezycznym', 'afrykanerskojezyczni', 
                'nieafrykanerskojezyczni' ]
    for q in queries:
        # {e<=2} allows at most two errors (insertions, deletions, or substitutions)
        m = regex.search(r'(%s){e<=2}' % q, sequence)
        print('match' if m else 'nomatch')
    
  • 2020-11-27 14:31

    You can use ndiff in the difflib module to do this. It has all the information necessary to convert one string into another string.

    A simple example (note that the reported positions are indices into the ndiff output, not into either string):

    import difflib
    
    cases = [('afrykanerskojęzyczny', 'afrykanerskojęzycznym'),
             ('afrykanerskojęzyczni', 'nieafrykanerskojęzyczni'),
             ('afrykanerskojęzycznym', 'afrykanerskojęzyczny'),
             ('nieafrykanerskojęzyczni', 'afrykanerskojęzyczni'),
             ('nieafrynerskojęzyczni', 'afrykanerskojzyczni'),
             ('abcdefg', 'xac')]
    
    for a, b in cases:
        print('{} => {}'.format(a, b))
        # Each ndiff item is a two-character prefix ('- ', '+ ', '  ') plus one character.
        for i, s in enumerate(difflib.ndiff(a, b)):
            if s[0] == ' ':
                continue
            elif s[0] == '-':
                print('Delete "{}" from position {}'.format(s[-1], i))
            elif s[0] == '+':
                print('Add "{}" to position {}'.format(s[-1], i))
        print()
    

    prints:

    afrykanerskojęzyczny => afrykanerskojęzycznym
    Add "m" to position 20
    
    afrykanerskojęzyczni => nieafrykanerskojęzyczni
    Add "n" to position 0
    Add "i" to position 1
    Add "e" to position 2
    
    afrykanerskojęzycznym => afrykanerskojęzyczny
    Delete "m" from position 20
    
    nieafrykanerskojęzyczni => afrykanerskojęzyczni
    Delete "n" from position 0
    Delete "i" from position 1
    Delete "e" from position 2
    
    nieafrynerskojęzyczni => afrykanerskojzyczni
    Delete "n" from position 0
    Delete "i" from position 1
    Delete "e" from position 2
    Add "k" to position 7
    Add "a" to position 8
    Delete "ę" from position 16
    
    abcdefg => xac
    Add "x" to position 0
    Delete "b" from position 2
    Delete "d" from position 4
    Delete "e" from position 5
    Delete "f" from position 6
    Delete "g" from position 7
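
    To illustrate the claim that the ndiff output carries everything needed to get from one string to the other, here is a small sketch using difflib.restore, which extracts either original sequence from a complete delta. It assumes the full delta is kept, including the unchanged (' ') entries; the variable names are only illustrative.

```python
import difflib

a = 'nieafrykanerskojęzyczni'
b = 'afrykanerskojęzyczni'

# Keep the complete delta, unchanged characters included.
delta = list(difflib.ndiff(a, b))

# restore(delta, 1) yields the first sequence, restore(delta, 2) the second.
first = ''.join(difflib.restore(delta, 1))
second = ''.join(difflib.restore(delta, 2))

print(first == a, second == b)
```

    Because the delta has one entry per character, it is larger than either string, so this is better suited to inspecting differences than to compact storage.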
    
  • 2020-11-27 14:33

    What you are asking for is a specialized form of compression. xdelta3 was designed for this particular kind of compression, and there's a Python binding for it, but you could probably get away with using zlib directly. You'd want to use zlib.compressobj and zlib.decompressobj with the zdict parameter set to your "base word", e.g. afrykanerskojęzyczny.

    There are two caveats: zdict is only supported in Python 3.3 and higher, and the code is easiest to write if you use the same "base word" for all your diffs, which may or may not be what you want.
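
    A minimal sketch of the zlib approach described above, assuming a single shared base word and UTF-8 encoding. The names BASE, pack, and unpack are hypothetical; zlib.compressobj and zlib.decompressobj with the zdict parameter are the standard-library calls mentioned in the answer.

```python
import zlib

# Assumption: every word is compressed against the same base word.
BASE = 'afrykanerskojęzyczny'.encode('utf-8')


def pack(word, base=BASE):
    """Compress word using base as a preset dictionary (Python 3.3+)."""
    compressor = zlib.compressobj(level=9, zdict=base)
    return compressor.compress(word.encode('utf-8')) + compressor.flush()


def unpack(blob, base=BASE):
    """Invert pack(); the same base dictionary must be supplied."""
    decompressor = zlib.decompressobj(zdict=base)
    data = decompressor.decompress(blob) + decompressor.flush()
    return data.decode('utf-8')


words = ['afrykanerskojęzycznym', 'afrykanerskojęzyczni',
         'nieafrykanerskojęzyczni']
packed = [pack(w) for w in words]
restored = [unpack(p) for p in packed]
print(restored == words)
```

    Each zlib stream carries a few bytes of header and checksum, so it is worth measuring whether this actually saves space for words this short.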
