I am working on a Python library that performs a lot of bitwise operations on long bit strings, and I want to find a bit string type that will maximize its speed. I have tri
What are you trying to test? Are these vector operations at all? You are simply comparing the speed of a single operation, and there plain Python is going to win because it doesn't have to set up NumPy arrays or bitarrays.
How about trying the following?
import random
import numpy as np
x = np.array([random.randrange(2**31)]*1000)
y = np.array([random.randrange(2**31)]*1000)
%timeit x & y  # in IPython
# Even though x and y are NumPy arrays, this iterates over them element by element and does no vector operations.
%timeit [a & b for (a, b) in zip(x, y)]
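Interestingly, if you build plain Python lists instead:
xxx = [random.randrange(2**31)] * 1000
yyy = [random.randrange(2**31)] * 1000
and then time the same list comprehension:
%timeit [a & b for (a,b) in zip(xxx,yyy)]
the pure Python lists turn out to be faster to iterate over than the NumPy arrays, which is a bit counterintuitive. I'm not sure why, though a likely cause is that each element pulled out of a NumPy array has to be wrapped in a NumPy scalar object first.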
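For anyone not working in IPython, the same three-way comparison can be reproduced with the standard timeit module. This is only a sketch: the helper name time_call, the operand length of 1000, and the repeat count of 200 are my own choices, and the absolute numbers will differ between machines.

```python
import random
import timeit

import numpy as np

N = 1000  # number of 31-bit integers per operand (arbitrary choice)

# Distinct random values, stored both as lists and as NumPy arrays.
xs = [random.randrange(2**31) for _ in range(N)]
ys = [random.randrange(2**31) for _ in range(N)]
x = np.array(xs, dtype=np.int64)
y = np.array(ys, dtype=np.int64)


def time_call(func, number=200):
    """Return the total time in seconds for `number` calls of func."""
    return timeit.timeit(func, number=number)


def vectorized():
    return x & y  # one NumPy call over the whole array


def iterate_arrays():
    return [a & b for a, b in zip(x, y)]  # element by element over arrays


def iterate_lists():
    return [a & b for a, b in zip(xs, ys)]  # element by element over lists


if __name__ == "__main__":
    for f in (vectorized, iterate_arrays, iterate_lists):
        print(f"{f.__name__:>15}: {time_call(f):.4f} s")
```

All three functions compute the same values; only the way the elements are visited differs, so any timing gap comes from the iteration overhead rather than from the & itself.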
Similarly, you can try the same thing with bitstrings and bitarrays.
Is this what you are looking for?
As far as I can tell, the built-in Python 3 int is the only one of the options you tested that computes the & in chunks of more than one byte at a time. (I haven't fully figured out what everything in the NumPy source for this operation does, but it doesn't look like it has an optimization to compute this in chunks bigger than the dtype.)
bitarray goes byte-by-byte. In contrast, the int operation goes by either 15-bit or 30-bit digits, depending on the value of the compile-time parameter PYLONG_BITS_IN_DIGIT. I don't know which setting is the default.
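If you want to see which digit size your own interpreter was built with, CPython exposes it at runtime through sys.int_info. A minimal sketch, assuming CPython (other implementations may report different values or store ints differently); the variable names are mine:

```python
import sys

# sys.int_info describes how this interpreter stores arbitrary-precision ints.
bits = sys.int_info.bits_per_digit  # 15 or 30 on CPython
size = sys.int_info.sizeof_digit    # bytes used to store one digit

print(f"bits per digit: {bits}, bytes per digit: {size}")

# Number of internal digits needed for a 10240-bit value (ceiling division).
digits_needed = -(-10240 // bits)
print(f"a 10240-bit int uses {digits_needed} digits")
```

The digit count gives a rough idea of how many chunk-wise operations a single & on two 10240-bit ints has to perform.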
You can speed up the NumPy attempt by using a packed representation and a larger dtype. It looks like on my machine, a 32-bit dtype works fastest, beating Python ints; I don't know what it's like on your setup. Testing with 10240-bit values in each format, I get
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160, dtype=numpy.uint64)')
1.3918750826524047
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*8, dtype=numpy.uint8)')
1.9460716604953632
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*2, dtype=numpy.uint32)')
1.1728465435917315
>>> timeit.timeit('a & b', 'a = b = 2**10240-1')
1.5999407862400403
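The timings above start from arrays that are already packed. If your data begins as one bit per element, one way to get into the packed form is numpy.packbits followed by a view with a wider dtype. This is a sketch rather than a tuned implementation: pack_bits and unpack_bits are hypothetical helper names, and uint32 is used only because it was fastest on my machine.

```python
import numpy as np


def pack_bits(bits, word_dtype=np.uint32):
    """Pack a sequence of 0/1 values into an array of word_dtype words.

    The number of bits must be a multiple of the word size in bits.
    """
    bits = np.asarray(bits, dtype=np.uint8)
    word_bits = np.dtype(word_dtype).itemsize * 8
    if bits.size % word_bits:
        raise ValueError(f"bit count must be a multiple of {word_bits}")
    # packbits stores 8 bits per byte; view() reinterprets those bytes as words
    return np.packbits(bits).view(word_dtype)


def unpack_bits(words):
    """Inverse of pack_bits: return a uint8 array of 0/1 values."""
    return np.unpackbits(words.view(np.uint8))


rng = np.random.default_rng(0)
a_bits = rng.integers(0, 2, size=10240, dtype=np.uint8)
b_bits = rng.integers(0, 2, size=10240, dtype=np.uint8)

a = pack_bits(a_bits)  # 320 uint32 words for 10240 bits
b = pack_bits(b_bits)
result = unpack_bits(a & b)  # AND on packed words, then back to single bits
```

Because & works on each bit independently, the byte order inside each word does not matter here; it would matter for shifts or arithmetic on the packed words.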