Performant read uint12 binary from file in JavaScript

我与影子孤独终老i submitted on 2021-02-10 18:14:44


I need to read a binary blob from a file into a JavaScript array. The blob holds little-endian, packed unsigned 12-bit (uint12) values, i.e.

| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
|         data1[7:0]            |
| data2[3:0]    | data1[11:8]   |
|           data2[11:4]         |

It seems like TypedArrays and bit shifting might be the best way (that's how I solved it in Python), but I'm trying to make this very performant (tens of MB on a sub-second time scale).

For now I'm just testing app performance using the browser tools.
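For comparison, the typed-array and bit-shifting approach mentioned in the question could look roughly like the sketch below. It is not from the original post: `unpack12` is a hypothetical helper name, and the sketch assumes the byte layout shown in the table above (every 3 bytes hold 2 values) and silently ignores a trailing partial group.

```typescript
// Unpack little-endian packed 12-bit samples: every 3 input bytes hold 2 values.
// `unpack12` is a hypothetical helper name, not part of the original post.
function unpack12(bytes: Uint8Array): Uint16Array {
  const groups = Math.floor(bytes.length / 3); // ignore any trailing partial group
  const out = new Uint16Array(groups * 2);
  for (let g = 0, i = 0, j = 0; g < groups; g++, i += 3, j += 2) {
    const b0 = bytes[i];
    const b1 = bytes[i + 1];
    const b2 = bytes[i + 2];
    out[j] = b0 | ((b1 & 0x0f) << 8); // data1[7:0] plus data1[11:8]
    out[j + 1] = (b1 >> 4) | (b2 << 4); // data2[3:0] plus data2[11:4]
  }
  return out;
}

// Example: the values 0xABC and 0x123 pack into the bytes [0xBC, 0x3A, 0x12].
const sample = unpack12(new Uint8Array([0xbc, 0x3a, 0x12]));
console.log(sample[0].toString(16), sample[1].toString(16)); // abc 123
```

Whether a plain loop like this is fast enough for tens of MB depends on the engine and the data, so it is worth measuring before reaching for anything heavier.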


WOW, took me some time to figure this one out... I ended up having to use WebAssembly. Here's the code in case someone else wants to do it:


#include <stdint.h>

__attribute__((used)) void parse12bit(unsigned char *buffer, int num_bytes, uint16_t *data)
{
  // Every 3 input bytes hold two 12-bit values.
  // num_bytes is assumed to be a non-zero multiple of 3.
  int i = 0, j = 0;
  do
  {
    data[j] = buffer[i] + ((buffer[i + 1] & 0xF) << 8);
    data[j + 1] = ((buffer[i + 1] & 0xF0) >> 4) + (buffer[i + 2] << 4);
    i = i + 3;
    j = j + 2;
  } while (i < num_bytes);
}

(Installed emcc on macOS using brew install emscripten)

emcc ./parse12bit.c \
  --target=wasm32-unknown-unknown-wasm \
  --optimize=3 \
  -nostdlib \
  -Wl,--export-all \
  -Wl,--no-entry \
  -Wl,--allow-undefined \
  -o parse12bit.wasm


import fs from 'fs';
import path from 'path';

interface Parse12bit extends Function {
  // pointers are passed as byte offsets into the wasm memory (address of the first element), following the C convention
  (buffer: number, length: number, data: number): void;
}

export const parseDataC = async (bufferArray: Uint8Array) => {
  // Load the wasm into a buffer.
  const wasmBuf = fs.readFileSync(
    path.join(__dirname, `/wasm/parse12bit.wasm`)
  );

  // Make an instance.
  const res = await WebAssembly.instantiate(wasmBuf, {});

  // Get function.
  const parse12bit = res.instance.exports.parse12bit as Parse12bit;
  const memory = res.instance.exports.memory as WebAssembly.Memory;

  // calculate total size of shared memory
  const totalBytes =
    bufferArray.length * Uint8Array.BYTES_PER_ELEMENT +
    1 * Int32Array.BYTES_PER_ELEMENT +
    (bufferArray.length * Uint16Array.BYTES_PER_ELEMENT * 8) / 12;

  // grow memory if necessary; the default memory was not large enough in the original implementation
  // memory is grown in 64 KiB pages
  if (memory.buffer.byteLength < totalBytes) {
    memory.grow(Math.ceil((totalBytes - memory.buffer.byteLength) / 65536));
  }

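  // Create the input array inside the wasm memory and copy the packed bytes into it.
  let offset = 0;
  const buffer = new Uint8Array(memory.buffer, offset, bufferArray.length);
  buffer.set(bufferArray);

  // num_bytes is passed to C by value, so it needs no slot in memory.
  // Round the offset up to an even number: Uint16Array views must be 2-byte aligned.
  offset += buffer.length * Uint8Array.BYTES_PER_ELEMENT;
  offset += offset % 2;

  const data = new Uint16Array(
    memory.buffer,
    offset,
    Math.floor((bufferArray.length * 8) / 12)
  ); // unpacked 12-bit values, placed after the input

  // Call the function.
  parse12bit(buffer.byteOffset, bufferArray.length, data.byteOffset);

  // Return the results (a view into the wasm memory).
  return data;
};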

Note, the final was from this very helpful blog post
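
The memory sizing in the code above can be isolated into a small sketch. This is an illustration under assumptions rather than part of the original answer: `planLayout` and `pagesToGrow` are hypothetical helper names, the input is assumed to sit at offset 0, and the output is assumed to follow it at the next even byte offset.

```typescript
const WASM_PAGE_BYTES = 65536; // WebAssembly memory grows in 64 KiB pages

// Hypothetical helper: byte offsets for the packed input and the unpacked output.
function planLayout(inputBytes: number) {
  const samples = Math.floor(inputBytes / 3) * 2; // 2 samples per 3 input bytes
  const inputOffset = 0;
  // Uint16Array views need an even byte offset, so round up to a multiple of 2.
  const outputOffset = Math.ceil((inputOffset + inputBytes) / 2) * 2;
  const totalBytes = outputOffset + samples * Uint16Array.BYTES_PER_ELEMENT;
  return { samples, inputOffset, outputOffset, totalBytes };
}

// Hypothetical helper: number of pages to pass to memory.grow().
function pagesToGrow(currentBytes: number, totalBytes: number): number {
  return Math.max(0, Math.ceil((totalBytes - currentBytes) / WASM_PAGE_BYTES));
}

// Usage with a standalone memory (one 64 KiB page to start with).
const demoMemory = new WebAssembly.Memory({ initial: 1 });
const plan = planLayout(300_000);
demoMemory.grow(pagesToGrow(demoMemory.buffer.byteLength, plan.totalBytes));

// Create views only after growing: grow() detaches the previous ArrayBuffer.
const demoOut = new Uint16Array(demoMemory.buffer, plan.outputOffset, plan.samples);
console.log(demoMemory.buffer.byteLength >= plan.totalBytes, demoOut.length); // true 200000
```

Keeping the layout arithmetic in one place makes it easier to check that the grown memory really covers both the input bytes and the 16-bit output.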

