Slow bitwise operations

Posted by 泄露秘密 on 2019-11-30 20:13:00

As far as I can tell, the built-in Python 3 int is the only one of the options you tested that computes the & in chunks of more than one byte at a time. (I haven't fully figured out what everything in the NumPy source for this operation does, but it doesn't look like it has an optimization to compute this in chunks bigger than the dtype.)

  • bitarray goes byte-by-byte,
  • the bool and 1-bit-per-int NumPy attempts go bit by bit,
  • the packed NumPy attempt goes byte-by-byte, and
  • the bitstring source goes byte-by-byte, as well as doing some things that screw up its attempts to gain speed through Cython, making it by far the slowest.

In contrast, the int operation goes by either 15-bit or 30-bit digits, depending on the value of the compile-time parameter PYLONG_BITS_IN_DIGIT. I don't know which setting is the default.
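If you want to see which digit size your own interpreter was built with, a minimal sketch is to read it from sys.int_info at runtime. This only reports the value for the interpreter you run it on; it does not say what the default is for other builds.

```python
import sys

# sys.int_info describes CPython's internal integer representation.
# bits_per_digit reflects the build-time digit size (15 or 30 on CPython).
bits = sys.int_info.bits_per_digit
digit_bytes = sys.int_info.sizeof_digit

print(f"int digits hold {bits} bits, stored in {digit_bytes} bytes each")
```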

You can speed up the NumPy attempt by using a packed representation and a larger dtype. It looks like on my machine, a 32-bit dtype works fastest, beating Python ints; I don't know what it's like on your setup. Testing with 10240-bit values in each format, I get

>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160, dtype=numpy.uint64)')
1.3918750826524047
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*8, dtype=numpy.uint8)')
1.9460716604953632
>>> timeit.timeit('a & b', 'import numpy; a = b = numpy.array([0]*160*2, dtype=numpy.uint32)')
1.1728465435917315
>>> timeit.timeit('a & b', 'a = b = 2**10240-1')
1.5999407862400403
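For reference, here is a rough sketch of how one might get from a one-bit-per-element array to the packed, wider-dtype representation timed above. The helper names pack_bits and unpack_bits are made up for this illustration, and padding the tail with zeros is an assumption about how you would want to handle lengths that are not a multiple of the word size.

```python
import numpy as np

def pack_bits(bits, dtype=np.uint32):
    """Hypothetical helper: pack a 0/1 array into words of `dtype`."""
    bits = np.asarray(bits, dtype=np.uint8)
    word_bits = np.dtype(dtype).itemsize * 8
    pad = (-bits.size) % word_bits  # zero-pad so the bytes fill whole words
    if pad:
        bits = np.concatenate([bits, np.zeros(pad, dtype=np.uint8)])
    # packbits produces uint8; view() reinterprets the same bytes as wider words
    return np.packbits(bits).view(dtype)

def unpack_bits(words, n_bits):
    """Hypothetical helper: inverse of pack_bits, dropping the padding."""
    return np.unpackbits(words.view(np.uint8))[:n_bits]

rng = np.random.default_rng(0)
n_bits = 10240
a_bits = rng.integers(0, 2, size=n_bits, dtype=np.uint8)
b_bits = rng.integers(0, 2, size=n_bits, dtype=np.uint8)

a_words = pack_bits(a_bits)      # 320 uint32 words for 10240 bits
b_words = pack_bits(b_bits)
and_words = a_words & b_words    # one AND per 32-bit word
and_bits = unpack_bits(and_words, n_bits)
```

Because & works on each bit independently, the byte order inside each word does not matter as long as packing and unpacking go through the same view.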

What are you trying to test? Are these vector operations at all? You are simply comparing the speed of a single operation, and there plain Python is going to win because it doesn't have to set up NumPy arrays or bitarrays.

How about trying the following?

import random
import numpy as np

x = np.array([random.randrange(2**31)] * 1000)
y = np.array([random.randrange(2**31)] * 1000)

%timeit x & y # in ipython

%timeit [ a & b for (a,b) in zip(x,y)] # even though x and y are numpy arrays, we are iterating over them - and not doing any vector operations
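If you are not in IPython, roughly the same comparison can be sketched with the standard timeit module. The function names vectorized and elementwise and the repeat count of 1000 are arbitrary choices for this illustration.

```python
import random
import timeit

import numpy as np

x = np.array([random.randrange(2**31)] * 1000)
y = np.array([random.randrange(2**31)] * 1000)

def vectorized():
    # a single NumPy call that ANDs all 1000 elements
    return x & y

def elementwise():
    # a Python-level loop over the arrays, one & per element pair
    return [a & b for a, b in zip(x, y)]

t_vec = timeit.timeit(vectorized, number=1000)
t_loop = timeit.timeit(elementwise, number=1000)
print(f"vectorized:  {t_vec:.4f} s for 1000 runs")
print(f"elementwise: {t_loop:.4f} s for 1000 runs")
```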

Interestingly, if

xxx = [random.randrange(2**31)] * 1000
yyy = [random.randrange(2**31)] * 1000 

and then

%timeit [a & b for (a,b) in zip(xxx,yyy)]

the pure Python lists turn out to be faster to iterate over than the NumPy arrays, which is a bit counterintuitive. I'm not sure why.
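One possible contributor (a guess on my part, not something measured here) is that iterating over a NumPy array hands back NumPy scalar objects rather than plain Python ints, so each & in the loop goes through NumPy's scalar machinery. A quick sketch to see the element types:

```python
import random

import numpy as np

value = random.randrange(2**31)
np_items = np.array([value] * 1000)
py_items = [value] * 1000

# Take the first element the way a for-loop or zip() would see it.
np_elem = next(iter(np_items))
py_elem = next(iter(py_items))

print(type(np_elem))  # a NumPy integer scalar type
print(type(py_elem))  # the built-in int
```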

Similarly, you can try the same for bitstrings and bitarrays.

Is this what you are looking for?
