Python: reading 12-bit binary files

不问归期 提交于 2019-11-28 11:40:45

I have a slightly different implementation from the one proposed by @max9111 that doesn't require a call to unpackbits.

It creates two uint12 values from three consecutive uint8 directly by cutting the middle byte in half and using numpy's binary operations. In the following, data_chunks is assumed to be a binary string containing the information for an arbitrary number number of 12-bit integers (hence its length must be a multiple of 3).

def read_uint12(data_chunk):
    data = np.frombuffer(data_chunk, dtype=np.uint8)
    fst_uint8, mid_uint8, lst_uint8 = np.reshape(data, (data.shape[0] // 3, 3)).astype(np.uint16).T
    fst_uint12 = (fst_uint8 << 4) + (mid_uint8 >> 4)
    snd_uint12 = ((mid_uint8 % 16) << 8) + lst_uint8
    return np.reshape(np.concatenate((fst_uint12[:, None], snd_uint12[:, None]), axis=1), 2 * fst_uint12.shape[0])

I benchmarked with the other implementation and this approach proved to be ~4x faster on a ~5 Mb input:
read_uint12_unpackbits 65.5 ms ± 1.11 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) read_uint12 14 ms ± 513 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Edit

With the help of @Cyril Gaudefroy answer, I created a compiled solution using Numba.

import numba as nb
import numpy as np
@nb.njit(nb.uint16[::1](nb.uint8[::1]),fastmath=True,parallel=True)
def nb_read_uint12(data_chunk):
  """data_chunk is a contigous 1D array of uint8 data)
  eg.data_chunk = np.frombuffer(data_chunk, dtype=np.uint8)"""

  #ensure that the data_chunk has the right length
  assert np.mod(data_chunk.shape[0],3)==0

  out=np.empty(data_chunk.shape[0]//3*2,dtype=np.uint16)

  for i in nb.prange(data_chunk.shape[0]//3):
    fst_uint8=np.uint16(data_chunk[i*3])
    mid_uint8=np.uint16(data_chunk[i*3+1])
    lst_uint8=np.uint16(data_chunk[i*3+2])

    out[i*2] =   (fst_uint8 << 4) + (mid_uint8 >> 4)
    out[i*2+1] = ((mid_uint8 % 16) << 8) + lst_uint8

  return out

Timings

num_Frames=10
data_chunk=np.random.randint(low=0,high=255,size=np.int(640*256*1.5*num_Frames),dtype=np.uint8)

Cyril Gaudefroy(numpy only): 11ms  ->225MB/s
Numba version:               1.1ms ->2.25GB/s

Previous version (not recommended)

If Numba is not an option see @Cyril Gaudefroys answer.

When i read the question, i thought there must be a simple answer, but I failed. Nevertheless i wrote a simple (but ugly) code that is about 300 times faster than your example and achieves about 25 MB/s on my Notebook (Core i5 3210M).

def read_uint12(filename_video,nb_frames,height,width):

    data=np.fromfile(filename_video, dtype=np.uint8)
    data=np.unpackbits(data)
    data=data.reshape((data.shape[0]/12,12))

    images=np.zeros(data_2.shape[0],dtype=np.uint16)
    for i in xrange(0,12):
        images+=2**i*data[:,11-i]

    images = np.reshape(images,(nb_frames,height,width))
    return images

Found @cyrilgaudefroy answer useful. However, initially, it did not work on my 12-bit packed binary image data. Found out that the packing is a bit different in this particular case. The "middle" byte contained the least significant nibbles. Bytes 1 and 3 of the triplet are the most significant 8 bits of the twelve. Hence modified @cyrilgaudefroy answer to:

def read_uint12(data_chunk):
    data = np.frombuffer(data_chunk, dtype=np.uint8)
    fst_uint8, mid_uint8, lst_uint8 = np.reshape(data, (data.shape[0] // 3, 3)).astype(np.uint16).T
    fst_uint12 = (fst_uint8 << 4) + (mid_uint8 >> 4)
    snd_uint12 = (lst_uint8 << 4) + (np.bitwise_and(15, mid_uint8))
    return np.reshape(np.concatenate((fst_uint12[:, None], snd_uint12[:, None]), axis=1), 2 * fst_uint12.shape[0])

Here's yet another variation. My data format is:

first uint12: most significant 4 bits from least significant 4 bits of second uint8 + least significant 8 bits from first uint8

second uint12: most significant 8 bits from third uint8 + least significant 4 bits from most significant 4 bits from second uint8

The corresponding code is:

def read_uint12(data_chunk):
    data = np.frombuffer(data_chunk, dtype=np.uint8)
    fst_uint8, mid_uint8, lst_uint8 = numpy.reshape(data, (data.shape[0] // 3, 3)).astype(numpy.uint16).T
    fst_uint12 = ((mid_uint8 & 0x0F) << 8) | fst_uint8
    snd_uint12 = (lst_uint8 << 4) | ((mid_uint8 & 0xF0) >> 4)
    return numpy.reshape(numpy.concatenate((fst_uint12[:, None], snd_uint12[:, None]), axis=1), 2 * fst_uint12.shape[0])
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!