Does python provide an elegant way to check for \"equality\" of sequences of different types? The following work, but they seem rather ugly and verbose for python code:
You can determine the equality of any two iterables (strings, tuples, lists, even custom sequences) without creating and storing duplicate lists by using the following:
all(x == y for x, y in itertools.izip_longest(a, b))
Note that if the two iterables are not the same length, the shorter one will be padded with None
s. In other words, it will consider [1, 2, None]
to be equal to (1, 2)
.
Edit: As Kamil points out in the comments, izip_longest
is only available in Python 2.6. However, the docs for the function also provide an alternate implementation which should work all the way back to 2.3.
Edit 2: After testing on a few different machines, it looks like this is only faster than list(a) == list(b)
in certain circumstances, which I can't isolate. Most of the time, it takes about seven times as long. However, I also found tuple(a) == tuple(b)
to be consistently at least twice as fast as the list
version.
Apart from the extra memory used by creating temporary lists/tuples, those answers will lose out to short circuiting generator solutions for large sequences when the inequality occurs early in the sequences
from itertools import starmap, izip
from operator import eq
all(starmap(eq, izip(x, y)))
or more concisely
from itertools import imap
from operator import eq
all(imap(eq, x, y))
some benchmarks from ipython
x=range(1000)
y=range(1000); y[10]=0
timeit tuple(x) == tuple(y)
100000 loops, best of 3: 16.9 us per loop
timeit all(imap(eq, x, y))
100000 loops, best of 3: 2.86 us per loop
It looks like tuple(a) == tuple(b) is the best overall choice. Or perhaps tuple comparison with a preceding len check if they'll often be different lengths. This does create extra lists, but hopefully not an issue except for really huge lists. Here is my comparison of the various alternatives suggested:
import timeit
tests = (
'''
a=b=[5]*100
''',
'''
a=[5]*100
b=[5]*3
''',
'''
a=b=(5,)*100
''',
'''
a=b="This on is a string" * 5
''',
'''
import array
a=b=array.array('B', "This on is a string" * 5)
'''
)
common = '''import itertools
def comp1(a, b):
if len(a) != len(b):
return False
for i, v in enumerate(a):
if v != b[i]:
return False
return True'''
for i, setup in enumerate(tests):
t1 = timeit.Timer("comp1(a, b)", setup + common)
t2 = timeit.Timer("all(x == y for x, y in itertools.izip_longest(a, b))", setup + common)
t3 = timeit.Timer("all([x == y for x, y in itertools.izip_longest(a, b)])", setup + common)
t4 = timeit.Timer("list(a) == list(b)", setup + common)
t5 = timeit.Timer("tuple(a) == tuple(b)", setup + common)
print '==test %d==' % i
print ' comp1: %g' % t1.timeit()
print ' all gen: %g' % t2.timeit()
print 'all list: %g' % t3.timeit()
print ' list: %g' % t4.timeit()
print ' tuple: %g\n' % t5.timeit()
Here are the results:
==test 0==
comp1: 27.8089
all gen: 31.1406
all list: 29.4887
list: 3.58438
tuple: 3.25859
==test 1==
comp1: 0.833313
all gen: 3.8026
all list: 33.5288
list: 1.90453
tuple: 1.74985
==test 2==
comp1: 30.606
all gen: 31.4755
all list: 29.5637
list: 3.56635
tuple: 1.60032
==test 3==
comp1: 33.3725
all gen: 35.3699
all list: 34.2619
list: 10.2443
tuple: 10.1124
==test 4==
comp1: 31.7014
all gen: 32.0051
all list: 31.0664
list: 8.35031
tuple: 8.16301
Edit: Added a few more tests. This was run on an AMD 939 3800+ with 2GB of ram. Linux 32bit, Python 2.6.2
Since you put the word "equality" in quotes, I assume that you would like to know how the lists are the same and how the are different. Check out difflib which has a SequenceMatcher class:
sm = difflib.SequenceMatcher(None, a, b)
for opcode in sm.get_opcodes():
print " (%s %d:%d %d:%d)" % opcode
You will get back a sequences of descriptions of the differences. It's fairly simple to turn that into diff-like output.
Convert both sequences to lists, and use builtin list comparison. It should be sufficient, unless your sequences are really large.
list(a) == list(b)
Edit:
Testing done by schickb shows that using tuples is slightly faster:
tuple(a) == tuple(b)
I think it's a good idea to special case when both sequences are type list
. Comparing two lists is faster (and more memory efficient) than converting both to tuples.
In the case that either a
or b
are not lists, they are both converted to tuple
. There is no overhead if one or both are already tuples, as tuple()
just returns a reference to the original object in that case.
def comp(a, b):
if len(a) != len(b):
return False
if type(a) == type(b) == list:
return a == b
a = tuple(a)
b = tuple(b)
return a == b