Is there a commonly accepted technique for efficiently converting JavaScript strings to ArrayBuffers and vice-versa? Specifically, I\'d like to be able to write the contents
The following is a working Typescript implementation:
bufferToString(buffer: ArrayBuffer): string {
return String.fromCharCode.apply(null, Array.from(new Uint16Array(buffer)));
}
stringToBuffer(value: string): ArrayBuffer {
let buffer = new ArrayBuffer(value.length * 2); // 2 bytes per char
let view = new Uint16Array(buffer);
for (let i = 0, length = value.length; i < length; i++) {
view[i] = value.charCodeAt(i);
}
return buffer;
}
I've used this for numerous operations while working with crypto.subtle.
In case you have binary data in a string (obtained from nodejs
+ readFile(..., 'binary')
, or cypress
+ cy.fixture(..., 'binary')
, etc), you can't use TextEncoder
. It supports only utf8
. Bytes with values >= 128
are each turned into 2 bytes.
ES2015:
a = Uint8Array.from(s, x => x.charCodeAt(0))
Uint8Array(33) [2, 134, 140, 186, 82, 70, 108, 182, 233, 40, 143, 247, 29, 76, 245, 206, 29, 87, 48, 160, 78, 225, 242, 56, 236, 201, 80, 80, 152, 118, 92, 144, 48
s = String.fromCharCode.apply(null, a)
"ºRFl¶é(÷LõÎW0 Náò8ìÉPPv\0"
Blob is much slower than String.fromCharCode(null,array);
but that fails if the array buffer gets too big. The best solution I have found is to use String.fromCharCode(null,array);
and split it up into operations that won't blow the stack, but are faster than a single char at a time.
The best solution for large array buffer is:
function arrayBufferToString(buffer){
var bufView = new Uint16Array(buffer);
var length = bufView.length;
var result = '';
var addition = Math.pow(2,16)-1;
for(var i = 0;i<length;i+=addition){
if(i + addition > length){
addition = length - i;
}
result += String.fromCharCode.apply(null, bufView.subarray(i,i+addition));
}
return result;
}
I found this to be about 20 times faster than using blob. It also works for large strings of over 100mb.
Based on the answer of gengkev, I created functions for both ways, because BlobBuilder can handle String and ArrayBuffer:
function string2ArrayBuffer(string, callback) {
var bb = new BlobBuilder();
bb.append(string);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result);
}
f.readAsArrayBuffer(bb.getBlob());
}
and
function arrayBuffer2String(buf, callback) {
var bb = new BlobBuilder();
bb.append(buf);
var f = new FileReader();
f.onload = function(e) {
callback(e.target.result)
}
f.readAsText(bb.getBlob());
}
A simple test:
string2ArrayBuffer("abc",
function (buf) {
var uInt8 = new Uint8Array(buf);
console.log(uInt8); // Returns `Uint8Array { 0=97, 1=98, 2=99}`
arrayBuffer2String(buf,
function (string) {
console.log(string); // returns "abc"
}
)
}
)
From emscripten:
function stringToUTF8Array(str, outU8Array, outIdx, maxBytesToWrite) {
if (!(maxBytesToWrite > 0)) return 0;
var startIdx = outIdx;
var endIdx = outIdx + maxBytesToWrite - 1;
for (var i = 0; i < str.length; ++i) {
var u = str.charCodeAt(i);
if (u >= 55296 && u <= 57343) {
var u1 = str.charCodeAt(++i);
u = 65536 + ((u & 1023) << 10) | u1 & 1023
}
if (u <= 127) {
if (outIdx >= endIdx) break;
outU8Array[outIdx++] = u
} else if (u <= 2047) {
if (outIdx + 1 >= endIdx) break;
outU8Array[outIdx++] = 192 | u >> 6;
outU8Array[outIdx++] = 128 | u & 63
} else if (u <= 65535) {
if (outIdx + 2 >= endIdx) break;
outU8Array[outIdx++] = 224 | u >> 12;
outU8Array[outIdx++] = 128 | u >> 6 & 63;
outU8Array[outIdx++] = 128 | u & 63
} else {
if (outIdx + 3 >= endIdx) break;
outU8Array[outIdx++] = 240 | u >> 18;
outU8Array[outIdx++] = 128 | u >> 12 & 63;
outU8Array[outIdx++] = 128 | u >> 6 & 63;
outU8Array[outIdx++] = 128 | u & 63
}
}
outU8Array[outIdx] = 0;
return outIdx - startIdx
}
Use like:
stringToUTF8Array('abs', new Uint8Array(3), 0, 4);
Unlike the solutions here, I needed to convert to/from UTF-8 data. For this purpose, I coded the following two functions, using the (un)escape/(en)decodeURIComponent trick. They're pretty wasteful of memory, allocating 9 times the length of the encoded utf8-string, though those should be recovered by gc. Just don't use them for 100mb text.
function utf8AbFromStr(str) {
var strUtf8 = unescape(encodeURIComponent(str));
var ab = new Uint8Array(strUtf8.length);
for (var i = 0; i < strUtf8.length; i++) {
ab[i] = strUtf8.charCodeAt(i);
}
return ab;
}
function strFromUtf8Ab(ab) {
return decodeURIComponent(escape(String.fromCharCode.apply(null, ab)));
}
Checking that it works:
strFromUtf8Ab(utf8AbFromStr('latinкирилицаαβγδεζηあいうえお'))
-> "latinкирилицаαβγδεζηあいうえお"