Convert binary (0|1) numpy to integer or binary-string?

渐次进展 2021-02-19 20:29

Is there a shortcut to convert a binary (0|1) NumPy array to an integer or a binary string? For example:

b = np.array([0,0,0,0,0,1,0,1])   
  => b is 5

np.packbits(b)


        
5 Answers
  • 2021-02-19 20:43

    One way is to take the dot product of the bits with an array of descending powers of 2 -

    b.dot(2**np.arange(b.size)[::-1])
    

    Sample run -

    In [95]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])
    
    In [96]: b.dot(2**np.arange(b.size)[::-1])
    Out[96]: 1285
    

    Alternatively, we could use the bitwise left-shift operator to build the powers-of-2 array and get the same result, like so -

    b.dot(1 << np.arange(b.size)[::-1])
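
The question also asks about a binary string, which the dot product alone does not produce. The following is a minimal sketch covering both outputs; the helper names `bits_to_int` and `bits_to_str` are hypothetical, and the integer result is only exact while it fits in a signed 64-bit value (up to 63 bits).

```python
import numpy as np

def bits_to_int(bits):
    """Interpret a 1-D array of 0/1 values as an unsigned integer (first element = most significant bit)."""
    bits = np.asarray(bits, dtype=np.int64)
    # Weight of each position: 2**(n-1) for the first element down to 2**0 for the last.
    weights = 1 << np.arange(bits.size - 1, -1, -1, dtype=np.int64)
    return int(bits.dot(weights))

def bits_to_str(bits):
    """Join the 0/1 values into a binary string, keeping leading zeros."""
    return "".join(str(int(x)) for x in bits)

b = np.array([0, 0, 0, 0, 0, 1, 0, 1])
print(bits_to_int(b))  # 5
print(bits_to_str(b))  # 00000101
```

Parsing the string with `int(bits_to_str(b), 2)` gives the same integer and is not limited to 64 bits, at the cost of going through Python strings.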
    

    If timings are of interest -

    In [148]: b = np.random.randint(0,2,(50))
    
    In [149]: %timeit b.dot(2**np.arange(b.size)[::-1])
    100000 loops, best of 3: 13.1 µs per loop
    
    In [150]: %timeit b.dot(1 << np.arange(b.size)[::-1])
    100000 loops, best of 3: 7.92 µs per loop
    

    Reverse process

    To recover the binary array, use np.binary_repr along with np.fromstring -

    In [96]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])
    
    In [97]: num = b.dot(2**np.arange(b.size)[::-1]) # integer
    
    In [98]: np.fromstring(np.binary_repr(num), dtype='S1').astype(int)
    Out[98]: array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])
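
The binary mode of `np.fromstring` is deprecated in newer NumPy releases, and `np.binary_repr` drops leading zeros unless a width is passed. As a hedged alternative, here is a small sketch that avoids `np.fromstring`; the function name `int_to_bits` and its `width` parameter are assumptions made for illustration.

```python
import numpy as np

def int_to_bits(num, width=None):
    """Return the binary digits of a non-negative integer as an integer array."""
    # Pass width to keep leading zeros, e.g. width=8 turns 5 into 00000101.
    digits = np.binary_repr(num, width=width)
    return np.array([int(c) for c in digits])

print(int_to_bits(1285))        # same digits as the array b above
print(int_to_bits(5, width=8))
```

Passing the original array length as `width` makes the round trip return an array of the same size as the input.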
    
  • 2021-02-19 20:50

    I extended the good dot-product solution of @Divikar to run ~180x faster on my host by using vectorized matrix multiplication. The original code, which runs one row at a time, took ~3 minutes to process 100K rows of 18 columns in my pandas dataframe. Next week I need to scale from 100K rows to 20M rows, and ~10 hours of running time was not going to be fast enough for me.

    The new code differs in two ways. First, it is vectorized; that is the real change in the Python code. Second, matrix multiplication often runs in parallel without you seeing it on many-core processors, depending on your host configuration, especially when OpenBLAS or another BLAS is available for NumPy to use for matrix algebra. So it can use many processors and cores, if you have them.

    The new, quite simple code processes 100K rows x 18 binary columns in about 1 second of elapsed time on my host, which is "mission accomplished" for me:

    import numpy as np

    def BitsToIntAFast(bits):
        """Fast way is a vectorized matmult. Pass in all rows and cols in one shot."""
        m, n = bits.shape  # the number of columns is needed, not bits.size
        a = 2**np.arange(n)[::-1]  # powers of 2, reversed so the first column is the most significant bit
        return bits @ a  # this matmult is the key line of code

    # I use it like this (d is my pandas dataframe):
    bits = d.iloc[:, 4:(4+18)]  # read the 18 bit columns from the dataframe
    gs = BitsToIntAFast(bits)
    print(gs[:5])
    gs.shape
    ...
    d['genre'] = np.array(gs)  # add the newly computed column to pandas
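
For readers without the dataframe above, the same idea can be tried on a plain 2-D array. This is only a sketch under stated assumptions: the name `rows_to_ints` and the sample rows are made up for illustration, and each row is treated as one number with the most significant bit in the first column.

```python
import numpy as np

def rows_to_ints(bit_rows):
    """Convert every row of a 2-D 0/1 array to an integer with one matrix-vector product."""
    bit_rows = np.asarray(bit_rows, dtype=np.int64)
    n_cols = bit_rows.shape[1]
    # One weight per column: 2**(n_cols-1) down to 2**0.
    weights = 1 << np.arange(n_cols - 1, -1, -1, dtype=np.int64)
    return bit_rows @ weights

rows = np.array([[0, 0, 0, 0, 0, 1, 0, 1],
                 [1, 0, 0, 0, 0, 0, 0, 1],
                 [1, 1, 1, 1, 1, 1, 1, 1]])
print(rows_to_ints(rows).tolist())  # [5, 129, 255]
```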
    

    Hope this helps.

  • 2021-02-19 20:52
    def binary_converter(arr):
        total = 0
        for index, val in enumerate(reversed(arr)):
            total += (val * 2**index)
        print(total)
    
    
    In [14]: b = np.array([1,0,1,0,0,0,0,0,1,0,1])
    In [15]: binary_converter(b)
    1285
    In [9]: b = np.array([0,0,0,0,0,1,0,1])
    In [10]: binary_converter(b)
    5
    

    or

    b = np.array([1,0,1,0,0,0,0,0,1,0,1])
    sum(val * 2**index for index, val in enumerate(reversed(b)))
    
  • 2021-02-19 20:58

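    Using NumPy for the conversion limits you to 64-bit results. If you really want to use NumPy and the 64-bit limit works for you, a packbits-based implementation is:

    import numpy as np
    def bin2int(bits):
        # np.packbits pads on the right, so left-pad to 64 bits first,
        # then read the 8 packed bytes as one big-endian unsigned integer.
        padded = np.zeros(64, dtype=np.uint8)
        padded[64 - bits.size:] = bits
        return np.packbits(padded).view('>u8')[0]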
    

    If you are using NumPy you normally care about speed, so the fastest implementation for results wider than 64 bits is:

    import gmpy2
    def bin2int(bits):
        return gmpy2.pack(list(bits[::-1]), 1)
    

    If you don't want to take a dependency on gmpy2, this version is a little slower but has no dependencies and supports results wider than 64 bits:

    def bin2int(bits):
        total = 0
        for shift, j in enumerate(bits[::-1]):
            if j:
                total += 1 << shift
        return total
    

    The observant will note some similarities between the last version and other answers to this question. The main difference is the use of the << operator instead of **; in my testing this led to a significant improvement in speed.
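
Another dependency-free route for results wider than 64 bits is to let NumPy only pack the bits into bytes and let Python's arbitrary-precision `int` do the rest. This is a sketch, not something taken from the answers here; the name `bits_to_bigint` is hypothetical and no timing claim is made for it.

```python
import numpy as np

def bits_to_bigint(bits):
    """Convert a 1-D 0/1 array of any length to a Python int (first element = most significant bit)."""
    bits = np.asarray(bits, dtype=np.uint8)
    # np.packbits pads on the right, so left-pad with zeros to a multiple of 8 bits.
    pad = -bits.size % 8
    padded = np.concatenate([np.zeros(pad, dtype=np.uint8), bits])
    # Interpret the packed bytes as one big-endian integer.
    return int.from_bytes(np.packbits(padded).tobytes(), "big")

wide = np.ones(70, dtype=np.uint8)          # 70 one-bits, too wide for 64-bit arithmetic
print(bits_to_bigint(wide) == 2**70 - 1)    # True
```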

  • 2021-02-19 20:59

    My timeit results:

    b.dot(2**np.arange(b.size)[::-1])
    100000 loops, best of 3: 2.48 usec per loop
    
    b.dot(1 << np.arange(b.size)[::-1])
    100000 loops, best of 3: 2.24 usec per loop
    
    # Precompute powers-of-2 array with a = 1 << np.arange(b.size)[::-1]
    b.dot(a)
    100000 loops, best of 3: 0.553 usec per loop
    
    # using gmpy2 is slower
    gmpy2.pack(list(map(int,b[::-1])), 1)
    100000 loops, best of 3: 10.6 usec per loop
    

    So if you know the size ahead of time, it's significantly faster to precompute the powers-of-2 array. But if possible, you should do all computations simultaneously using matrix multiplication like in Geoffrey Anderson's answer.
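
The precomputation can be wrapped so the powers-of-2 array is built once and reused on every call. The sketch below assumes a fixed bit width known in advance; `make_converter` is a hypothetical helper name, and the returned function accepts either a single bit vector or a 2-D batch of rows.

```python
import numpy as np

def make_converter(n_bits):
    """Build the weights once and return a function that reuses them."""
    weights = 1 << np.arange(n_bits - 1, -1, -1, dtype=np.int64)

    def convert(bits):
        # Works for shape (n_bits,) and for a batch of shape (rows, n_bits).
        return np.asarray(bits).dot(weights)

    return convert

to_int = make_converter(11)
print(to_int(np.array([1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1])))  # 1285
```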
