Convert binary (0|1) numpy to integer or binary-string?

 ̄綄美尐妖づ submitted on 2019-12-22 03:48:36

Question


Is there a shortcut to convert a binary (0|1) numpy array to an integer or a binary string? For example:

b = np.array([0,0,0,0,0,1,0,1])   
  => b is 5

np.packbits(b)

works, but only for 8-bit values. If the numpy array has 9 or more elements, it generates 2 or more 8-bit values. Another option would be to return a string of 0|1 ...
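To make the np.packbits behaviour concrete, here is a small sketch (the names b8 and b11 are made up for illustration). With more than 8 elements the bits are split into bytes, and the last byte is zero-padded on the right, so the result is not the integer value of the whole array:

```python
import numpy as np

b8 = np.array([0, 0, 0, 0, 0, 1, 0, 1], dtype=np.uint8)
b11 = np.array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1], dtype=np.uint8)

packed8 = np.packbits(b8)    # one byte: 0b00000101
packed11 = np.packbits(b11)  # two bytes: 0b10100000, then 0b101 padded to 0b10100000

print(packed8.tolist())   # [5]
print(packed11.tolist())  # [160, 160]
```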

What I currently do is:

    from bitarray import bitarray
    ba = bitarray()
    # one byte per element, each byte is 0 or 1
    ba.pack(b.astype(bool).tobytes())
    # convert from bitarray 0|1 to integer
    result = int(ba.to01(), 2)

which is ugly!


Answer 1:


One way would be to use a dot product with an array of powers of two -

b.dot(2**np.arange(b.size)[::-1])

Sample run -

In [95]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])

In [96]: b.dot(2**np.arange(b.size)[::-1])
Out[96]: 1285

Alternatively, we could use the bitwise left-shift operator to create the powers-of-two array and get the same output, like so -

b.dot(1 << np.arange(b.size)[::-1])
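As a sanity check on the idea, the sketch below (variable names are illustrative) prints the weight vector and compares the dot product with Python's own int(..., 2) parsing of the same bits:

```python
import numpy as np

b = np.array([0, 0, 0, 0, 0, 1, 0, 1])

# Weights are the powers of two, most significant bit first.
weights = 1 << np.arange(b.size)[::-1]
print(weights.tolist())            # [128, 64, 32, 16, 8, 4, 2, 1]

value = int(b.dot(weights))
as_string = "".join(map(str, b))   # the same bits as a 0|1 string
print(value, as_string)            # 5 00000101
```

The string form also covers the "binary string" half of the question, at the cost of a Python-level loop over the elements.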

If timings are of interest -

In [148]: b = np.random.randint(0,2,(50))

In [149]: %timeit b.dot(2**np.arange(b.size)[::-1])
100000 loops, best of 3: 13.1 µs per loop

In [150]: %timeit b.dot(1 << np.arange(b.size)[::-1])
100000 loops, best of 3: 7.92 µs per loop

Reverse process

To get the binary array back, use np.binary_repr and convert each character of the resulting string to an integer -

In [96]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])

In [97]: num = b.dot(2**np.arange(b.size)[::-1]) # integer

In [98]: np.array(list(np.binary_repr(num))).astype(int)
Out[98]: array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])
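One caveat with the round trip: np.binary_repr drops leading zeros unless a width is given, so an array that starts with 0 comes back shorter. A minimal sketch, assuming the original length is known (the helper name int_to_bits is hypothetical):

```python
import numpy as np

def int_to_bits(num, width):
    """Return `num` as a 0/1 array of exactly `width` bits, MSB first."""
    return np.array([int(c) for c in np.binary_repr(num, width=width)])

b = np.array([0, 0, 0, 0, 0, 1, 0, 1])
num = int(b.dot(1 << np.arange(b.size)[::-1]))   # 5

print(np.binary_repr(num))                # 101 (leading zeros lost)
print(int_to_bits(num, b.size).tolist())  # [0, 0, 0, 0, 0, 1, 0, 1]
```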



Answer 2:


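Using numpy for the conversion limits you to 64-bit results. If you really want to use numpy and the 64-bit limit works for you, an implementation based on np.packbits (left-pad the bits to 64, pack them into 8 bytes, and read those bytes as one big-endian unsigned integer) is:

import numpy as np
def bin2int(bits):
    # left-pad to 64 bits so the value sits in the low-order bits
    padded = np.zeros(64, dtype=np.uint8)
    padded[64 - bits.size:] = bits
    # 8 packed bytes, read as a single big-endian uint64
    return int(np.packbits(padded).view('>u8')[0])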

Since you normally care about speed if you are using numpy, the fastest implementation for > 64-bit results is:

import gmpy2
def bin2int(bits):
    return gmpy2.pack(list(bits[::-1]), 1)

If you don't want to take a dependency on gmpy2, this is a little slower but has no dependencies and supports > 64-bit results:

def bin2int(bits):
    total = 0
    for shift, j in enumerate(bits[::-1]):
        if j:
            total += 1 << shift
    return total
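To see the pure-Python version handle more than 64 bits, here is a small check, assuming the input is a 1-D array of 0/1 values. The 100-bit test array is made up for illustration, and int(..., 2) serves as the reference:

```python
import numpy as np

def bin2int(bits):
    total = 0
    for shift, j in enumerate(bits[::-1]):
        if j:
            total += 1 << shift   # Python ints never overflow
    return total

# 100 bits: a leading 1 followed by 99 zeros, i.e. 2**99.
bits = np.zeros(100, dtype=np.uint8)
bits[0] = 1

result = bin2int(bits)
reference = int("".join(map(str, bits)), 2)
print(result == reference == 2**99)   # True
```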

The observant will note some similarities between the last version and other answers to this question, the main difference being the use of the << operator instead of **. In my testing this led to a significant improvement in speed.




Answer 3:


I extended the good dot-product solution of @Divikar to run ~180x faster on my host by using vectorized matrix multiplication. The original code, run one row at a time, took ~3 minutes for 100K rows of 18 columns in my pandas dataframe. Next week I need to go from 100K rows to 20M rows, so ~10 hours of running time was not going to be fast enough for me.

The new code is vectorized, first of all. That's the real change in the Python code. Secondly, matrix multiplication often runs in parallel without you seeing it on many-core processors, depending on your host configuration, especially when OpenBLAS or another BLAS is available for numpy to use on matrix algebra like this. So it can use a lot of processors and cores, if you have them.

The new, quite simple code runs 100K rows x 18 binary columns in ~1 sec elapsed time on my host, which is "mission accomplished" for me:

'''
Fast way is vectorized matmult. Pass in all rows and cols in one shot.
'''
def BitsToIntAFast(bits):
  m,n = bits.shape # number of columns is needed, not bits.size
  a = 2**np.arange(n)[::-1]  # -1 reverses array of powers of 2 of same length as bits
  return bits @ a  # this matmult is the key line of code

'''I use it like this:'''
bits = d.iloc[:,4:(4+18)] # read bits from my pandas dataframe
gs = BitsToIntAFast(bits)
print(gs[:5])
gs.shape
...
d['genre'] = np.array(gs)  # add the newly computed column to pandas
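For readers without the dataframe at hand, the same idea can be tried on a plain 2-D NumPy array. This is a sketch with made-up data; the function name bits_to_int_rows is hypothetical and not part of the code above:

```python
import numpy as np

def bits_to_int_rows(bits):
    """Convert each row of a 2-D 0/1 array to an integer, MSB first."""
    n = bits.shape[1]                      # number of bit columns
    weights = 1 << np.arange(n)[::-1]      # powers of two, largest first
    return bits @ weights                  # one matrix-vector product for all rows

rows = np.array([
    [0, 0, 0, 0, 0, 1, 0, 1],   # 5
    [1, 1, 1, 1, 1, 1, 1, 1],   # 255
    [1, 0, 0, 0, 0, 0, 0, 0],   # 128
])
print(bits_to_int_rows(rows).tolist())   # [5, 255, 128]
```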

Hope this helps.




Answer 4:


def binary_converter(arr):
    total = 0
    for index, val in enumerate(reversed(arr)):
        total += (val * 2**index)
    print(total)


In [14]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])
In [15]: binary_converter(b)
1285
In [9]: b = np.array([0,0,0,0,0,1,0,1])
In [10]: binary_converter(b)
5

or

b = np.array([1,0,1,0,0,0,0,0,1,0,1])
sum(val * 2**index for index, val in enumerate(reversed(b)))


Source: https://stackoverflow.com/questions/41069825/convert-binary-01-numpy-to-integer-or-binary-string
