Elegant way to compare sequences

前端 未结 8 1923
死守一世寂寞
死守一世寂寞 2021-02-05 20:10

Does python provide an elegant way to check for \"equality\" of sequences of different types? The following work, but they seem rather ugly and verbose for python code:

相关标签:
8条回答
  • 2021-02-05 20:33

    You can determine the equality of any two iterables (strings, tuples, lists, even custom sequences) without creating and storing duplicate lists by using the following:

    all(x == y for x, y in itertools.izip_longest(a, b))
    

    Note that if the two iterables are not the same length, the shorter one will be padded with Nones. In other words, it will consider [1, 2, None] to be equal to (1, 2).

    Edit: As Kamil points out in the comments, izip_longest is only available in Python 2.6. However, the docs for the function also provide an alternate implementation which should work all the way back to 2.3.

    Edit 2: After testing on a few different machines, it looks like this is only faster than list(a) == list(b) in certain circumstances, which I can't isolate. Most of the time, it takes about seven times as long. However, I also found tuple(a) == tuple(b) to be consistently at least twice as fast as the list version.

    0 讨论(0)
  • 2021-02-05 20:36

    Apart from the extra memory used by creating temporary lists/tuples, those answers will lose out to short circuiting generator solutions for large sequences when the inequality occurs early in the sequences

    from itertools import starmap, izip
    from operator import eq
    all(starmap(eq, izip(x, y)))
    

    or more concisely

    from itertools import imap
    from operator import eq
    all(imap(eq, x, y))
    

    some benchmarks from ipython

    x=range(1000)
    y=range(1000); y[10]=0
    
    timeit tuple(x) == tuple(y)
    100000 loops, best of 3: 16.9 us per loop
    
    timeit all(imap(eq, x, y))
    100000 loops, best of 3: 2.86 us per loop
    
    0 讨论(0)
  • 2021-02-05 20:36

    It looks like tuple(a) == tuple(b) is the best overall choice. Or perhaps tuple comparison with a preceding len check if they'll often be different lengths. This does create extra lists, but hopefully not an issue except for really huge lists. Here is my comparison of the various alternatives suggested:

    import timeit
    
    tests = (
    '''
    a=b=[5]*100
    ''',
    
    '''
    a=[5]*100
    b=[5]*3
    ''',
    
    '''
    a=b=(5,)*100
    ''',
    
    '''
    a=b="This on is a string" * 5
    ''',
    
    '''
    import array
    a=b=array.array('B', "This on is a string" * 5)
    '''
    )
    
    common = '''import itertools
    def comp1(a, b):
        if len(a) != len(b):
            return False
        for i, v in enumerate(a):
            if v != b[i]:
                return False
        return True'''
    
    for i, setup in enumerate(tests):
        t1 = timeit.Timer("comp1(a, b)", setup + common)
        t2 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", setup + common)
        t3 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", setup + common)
        t4 = timeit.Timer("list(a) == list(b)", setup + common)
        t5 = timeit.Timer("tuple(a) == tuple(b)", setup + common)
    
        print '==test %d==' % i
        print '   comp1: %g' % t1.timeit()
        print ' all gen: %g' % t2.timeit()
        print 'all list: %g' % t3.timeit()
        print '    list: %g' % t4.timeit()
        print '   tuple: %g\n' % t5.timeit()
    

    Here are the results:

    ==test 0==
       comp1: 27.8089
     all gen: 31.1406
    all list: 29.4887
        list: 3.58438
       tuple: 3.25859
    
    ==test 1==
       comp1: 0.833313
     all gen: 3.8026
    all list: 33.5288
        list: 1.90453
       tuple: 1.74985
    
    ==test 2==
       comp1: 30.606
     all gen: 31.4755
    all list: 29.5637
        list: 3.56635
       tuple: 1.60032
    
    ==test 3==
       comp1: 33.3725
     all gen: 35.3699
    all list: 34.2619
        list: 10.2443
       tuple: 10.1124
    
    ==test 4==
       comp1: 31.7014
     all gen: 32.0051
    all list: 31.0664
        list: 8.35031
       tuple: 8.16301
    

    Edit: Added a few more tests. This was run on an AMD 939 3800+ with 2GB of ram. Linux 32bit, Python 2.6.2

    0 讨论(0)
  • 2021-02-05 20:36

    Since you put the word "equality" in quotes, I assume that you would like to know how the lists are the same and how the are different. Check out difflib which has a SequenceMatcher class:

        sm = difflib.SequenceMatcher(None, a, b)
        for opcode in sm.get_opcodes():
            print "    (%s %d:%d %d:%d)" % opcode
    

    You will get back a sequences of descriptions of the differences. It's fairly simple to turn that into diff-like output.

    0 讨论(0)
  • 2021-02-05 20:43

    Convert both sequences to lists, and use builtin list comparison. It should be sufficient, unless your sequences are really large.

    list(a) == list(b)
    

    Edit:

    Testing done by schickb shows that using tuples is slightly faster:

    tuple(a) == tuple(b)
    
    0 讨论(0)
  • 2021-02-05 20:55

    I think it's a good idea to special case when both sequences are type list. Comparing two lists is faster (and more memory efficient) than converting both to tuples.

    In the case that either a or b are not lists, they are both converted to tuple. There is no overhead if one or both are already tuples, as tuple() just returns a reference to the original object in that case.

    def comp(a, b):
        if len(a) != len(b):
            return False
        if type(a) == type(b) == list:
            return a == b
        a = tuple(a)
        b = tuple(b)
        return a == b
    
    0 讨论(0)
提交回复
热议问题