Elegant way to compare sequences

前端未结


Does Python provide an elegant way to check for "equality" of sequences of different types? The following work, but they seem rather ugly and verbose for Python code:


8 answers

再見小時候

2021-02-05 20:33
You can determine the equality of any two iterables (strings, tuples, lists, even custom sequences) without creating and storing duplicate lists by using the following:
```
import itertools
all(x == y for x, y in itertools.izip_longest(a, b))  # Python 3: itertools.zip_longest
```
Note that if the two iterables are not the same length, the shorter one will be padded with Nones. In other words, it will consider [1, 2, None] to be equal to (1, 2).
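One way to avoid the None-padding pitfall is to pass a private sentinel object as the `fillvalue`, so that padding can never equal a real element. This is a sketch for Python 3, where `izip_longest` is called `itertools.zip_longest`; the helper name `iter_equal` is made up for this example.

```python
from itertools import zip_longest  # Python 3 name of izip_longest


def iter_equal(a, b):
    """Element-wise equality for any two iterables, without copying them.

    A fresh sentinel object is used as the fill value, so a shorter input
    can never match a real element (unlike the default fill value None).
    """
    sentinel = object()
    return all(x == y for x, y in zip_longest(a, b, fillvalue=sentinel))


print(iter_equal([1, 2, None], (1, 2)))  # False
print(iter_equal([1, 2, 3], (1, 2, 3)))  # True
```

Because `all()` consumes the generator lazily, it still stops at the first mismatch.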

Edit: As Kamil points out in the comments, izip_longest is only available in Python 2.6 and later. However, the docs for the function also provide an alternate implementation which should work all the way back to 2.3.

Edit 2: After testing on a few different machines, it looks like this is only faster than list(a) == list(b) in certain circumstances, which I can't isolate. Most of the time, it takes about seven times as long. However, I also found tuple(a) == tuple(b) to be consistently at least twice as fast as the list version.

礼貌的吻别

2021-02-05 20:36

Apart from the extra memory used by creating temporary lists/tuples, those answers will lose out to short-circuiting generator solutions on large sequences when the first difference occurs early:

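```
from itertools import starmap, izip
from operator import eq
# izip stops at the shorter input, so compare lengths first
len(x) == len(y) and all(starmap(eq, izip(x, y)))
```

Or, more concisely:

```
from itertools import imap
from operator import eq
# imap also stops at the shorter input
len(x) == len(y) and all(imap(eq, x, y))
```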
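In Python 3, `itertools.imap` and `itertools.izip` are gone because the built-in `map` and `zip` are already lazy, so the same short-circuiting check can use `map` directly. Like `imap`, `map` stops at the shortest input, so this sketch (the name `seq_equal` is hypothetical) compares lengths first, which assumes both arguments support `len()`.

```python
from operator import eq


def seq_equal(x, y):
    """Short-circuiting element-wise comparison of two sized sequences.

    map() stops at the shorter input, so the length check is what keeps
    (1, 2) from comparing equal to [1, 2, 3].
    """
    return len(x) == len(y) and all(map(eq, x, y))


print(seq_equal([1, 2, 3], (1, 2, 3)))  # True
print(seq_equal((1, 2), [1, 2, 3]))     # False
```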

Some benchmarks from IPython (Python 2, where range returns a list; the sequences differ at index 10):

```
x=range(1000)
y=range(1000); y[10]=0

timeit tuple(x) == tuple(y)
100000 loops, best of 3: 16.9 us per loop

timeit all(imap(eq, x, y))
100000 loops, best of 3: 2.86 us per loop
```
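To repeat this kind of measurement on Python 3, here is a rough sketch using `timeit.timeit` with callables. The `bench` helper, the set of candidates and the iteration count are arbitrary choices for illustration, and timings depend heavily on machine and interpreter, so no figures are claimed here.

```python
import itertools
import timeit
from operator import eq


def bench(a, b, number=1000):
    """Time several ways of comparing a and b; returns {name: total seconds}."""
    candidates = {
        "all gen": lambda: all(x == y for x, y in itertools.zip_longest(a, b)),
        "all map": lambda: len(a) == len(b) and all(map(eq, a, b)),
        "list": lambda: list(a) == list(b),
        "tuple": lambda: tuple(a) == tuple(b),
    }
    # timeit.timeit accepts a zero-argument callable as the statement
    return {name: timeit.timeit(fn, number=number) for name, fn in candidates.items()}


if __name__ == "__main__":
    x = list(range(1000))
    y = list(range(1000))
    y[10] = 0  # early mismatch, as in the IPython session
    for name, seconds in bench(x, y).items():
        print("%8s: %.4f s" % (name, seconds))
```

With an early mismatch, the two generator-based variants should stop after a few elements, while the list and tuple variants always copy both inputs in full before comparing.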


夕颜

2021-02-05 20:36

It looks like tuple(a) == tuple(b) is the best overall choice, or perhaps a tuple comparison preceded by a len check if the sequences will often differ in length. This does create temporary copies, but that is hopefully not an issue except for really huge sequences. Here is my comparison of the various alternatives suggested:

```
import timeit

tests = (
'''
a=b=[5]*100
''',

'''
a=[5]*100
b=[5]*3
''',

'''
a=b=(5,)*100
''',

'''
a=b="This on is a string" * 5
''',

'''
import array
a=b=array.array('B', "This on is a string" * 5)
'''
)

common = '''import itertools
def comp1(a, b):
    if len(a) != len(b):
        return False
    for i, v in enumerate(a):
        if v != b[i]:
            return False
    return True'''

for i, setup in enumerate(tests):
    t1 = timeit.Timer("comp1(a, b)", setup + common)
    t2 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", setup + common)
    t3 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", setup + common)
    t4 = timeit.Timer("list(a) == list(b)", setup + common)
    t5 = timeit.Timer("tuple(a) == tuple(b)", setup + common)

    print '==test %d==' % i
    print '   comp1: %g' % t1.timeit()
    print ' all gen: %g' % t2.timeit()
    print 'all list: %g' % t3.timeit()
    print '    list: %g' % t4.timeit()
    print '   tuple: %g\n' % t5.timeit()
```

Here are the results (total seconds for timeit's default of 1,000,000 executions of each statement):

```
==test 0==
   comp1: 27.8089
 all gen: 31.1406
all list: 29.4887
    list: 3.58438
   tuple: 3.25859

==test 1==
   comp1: 0.833313
 all gen: 3.8026
all list: 33.5288
    list: 1.90453
   tuple: 1.74985

==test 2==
   comp1: 30.606
 all gen: 31.4755
all list: 29.5637
    list: 3.56635
   tuple: 1.60032

==test 3==
   comp1: 33.3725
 all gen: 35.3699
all list: 34.2619
    list: 10.2443
   tuple: 10.1124

==test 4==
   comp1: 31.7014
 all gen: 32.0051
all list: 31.0664
    list: 8.35031
   tuple: 8.16301
```

Edit: Added a few more tests. This was run on an AMD 939 3800+ with 2 GB of RAM, 32-bit Linux, Python 2.6.2.


甜味超标

2021-02-05 20:36
Since you put the word "equality" in quotes, I assume that you would like to know how the lists are the same and how they are different. Check out difflib, which has a SequenceMatcher class:
```
import difflib

sm = difflib.SequenceMatcher(None, a, b)
for opcode in sm.get_opcodes():
    # each opcode is a tuple (tag, i1, i2, j1, j2)
    print("    (%s %d:%d %d:%d)" % opcode)
```
You will get back a sequence of descriptions of the differences. It's fairly simple to turn that into diff-like output.
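As a sketch of that last step, the helper below (the name `describe_differences` and its output format are invented for this example) skips the `'equal'` opcodes and reports each `'replace'`, `'delete'` or `'insert'` region with the slices of both inputs. It works on any two sequences of hashable elements, even of different types.

```python
import difflib


def describe_differences(a, b):
    """Return diff-like lines describing how sequence a differs from b."""
    sm = difflib.SequenceMatcher(None, a, b)
    lines = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            continue  # only report the regions that differ
        lines.append("%s a[%d:%d]=%r -> b[%d:%d]=%r"
                     % (tag, i1, i2, list(a[i1:i2]), j1, j2, list(b[j1:j2])))
    return lines


for line in describe_differences([1, 2, 3, 4], (1, 2, 5, 4)):
    print(line)  # replace a[2:3]=[3] -> b[2:3]=[5]
```

An empty result means the two sequences hold the same elements in the same order.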
陌清茗

2021-02-05 20:43
Convert both sequences to lists and use the built-in list comparison. It should be sufficient unless your sequences are really large.
```
list(a) == list(b)
```
Edit:

Testing done by schickb shows that using tuples is slightly faster:
```
tuple(a) == tuple(b)
```
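A minimal Python 3 illustration of why the conversion is needed (the values are made up): `==` between two built-in sequences of different types, such as a list and a tuple, is simply `False` regardless of their contents, so both sides have to be brought to a common type first.

```python
# Built-in sequence types don't compare equal across types,
# even when they hold the same elements in the same order.
a = [1, 2, 3]
b = (1, 2, 3)

print(a == b)                    # False: list vs tuple
print(list(a) == list(b))        # True: same type after conversion
print("ab" == ["a", "b"])        # False: str vs list
print(list("ab") == ["a", "b"])  # True
```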
一整个雨季

2021-02-05 20:55
I think it's a good idea to special-case the situation where both sequences are lists. Comparing two lists directly is faster (and more memory efficient) than converting both to tuples.

If either a or b is not a list, both are converted to tuples. There is no copying overhead for an argument that is already a tuple, since in CPython tuple() just returns a reference to the original object in that case.
```
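def comp(a, b):
    # sequences of different lengths can never be equal
    if len(a) != len(b):
        return False
    # two lists can be compared directly, with no copying
    if type(a) is list and type(b) is list:
        return a == b
    # otherwise convert both to tuples; CPython does not copy an exact tuple
    return tuple(a) == tuple(b)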
```
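A quick Python 3 check of the claim about tuple(), plus a mixed-type comparison. Returning the same object for an exact tuple is a CPython implementation detail rather than a language guarantee; the variable names and values here are just for illustration.

```python
import array

t = (1, 2, 3)
lst = [1, 2, 3]

# In CPython, tuple() on an exact tuple returns the very same object...
print(tuple(t) is t)      # True (CPython implementation detail)
# ...whereas list() and tuple() on a list always build a new object.
print(list(lst) is lst)   # False
print(tuple(lst) is lst)  # False

# Converting differently typed sequences to tuples makes them comparable.
arr = array.array("i", [1, 2, 3])
print(tuple(arr) == tuple(lst))  # True
```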
