Is there a commonly accepted technique for efficiently converting JavaScript strings to ArrayBuffers and vice-versa? Specifically, I\'d like to be able to write the contents
I found I had problems with this approach, basically because I was trying to write the output to a file and it was non encoded properly. Since JS seems to use UCS-2 encoding (source, source), we need to stretch this solution a step further, here's my enhanced solution that works to me.
I had no difficulties with generic text, but when it was down to Arab or Korean, the output file didn't have all the chars but instead was showing error characters
File output:
","10k unit":"",Follow:"Õ©íüY‹","Follow %{screen_name}":"%{screen_name}U“’Õ©íü",Tweet:"ĤüÈ","Tweet %{hashtag}":"%{hashtag} ’ĤüÈY‹","Tweet to %{name}":"%{name}U“xĤüÈY‹"},ko:{"%{followers_count} followers":"%{followers_count}…X \Ì","100K+":"100Ì tÁ","10k unit":"Ì è",Follow:"\°","Follow %{screen_name}":"%{screen_name} Ø \°X0",K:"œ",M:"1Ì",Tweet:"¸","Tweet %{hashtag}":"%{hashtag}
Original:
","10k unit":"万",Follow:"フォローする","Follow %{screen_name}":"%{screen_name}さんをフォロー",Tweet:"ツイート","Tweet %{hashtag}":"%{hashtag} をツイートする","Tweet to %{name}":"%{name}さんへツイートする"},ko:{"%{followers_count} followers":"%{followers_count}명의 팔로워","100K+":"100만 이상","10k unit":"만 단위",Follow:"팔로우","Follow %{screen_name}":"%{screen_name} 님 팔로우하기",K:"천",M:"백만",Tweet:"트윗","Tweet %{hashtag}":"%{hashtag}
I took the information from dennis' solution and this post I found.
Here's my code:
function encode_utf8(s) {
return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
return decodeURIComponent(escape(s));
}
function ab2str(buf) {
var s = String.fromCharCode.apply(null, new Uint8Array(buf));
return decode_utf8(decode_utf8(s))
}
function str2ab(str) {
var s = encode_utf8(str)
var buf = new ArrayBuffer(s.length);
var bufView = new Uint8Array(buf);
for (var i=0, strLen=s.length; i<strLen; i++) {
bufView[i] = s.charCodeAt(i);
}
return bufView;
}
This allows me to save the content to a file without encoding problems.
How it works: It basically takes the single 8-byte chunks composing a UTF-8 character and saves them as single characters (therefore an UTF-8 character built in this way, could be composed by 1-4 of these characters). UTF-8 encodes characters in a format that variates from 1 to 4 bytes in length. What we do here is encoding the sting in an URI component and then take this component and translate it in the corresponding 8 byte character. In this way we don't lose the information given by UTF8 characters that are more than 1 byte long.
BlobBuilder has long been deprecated by the Blob object. Compare the code in Dennis' answer — where BlobBuilder is used — with the code below:
function arrayBufferGen(str, cb) {
var b = new Blob([str]);
var f = new FileReader();
f.onload = function(e) {
cb(e.target.result);
}
f.readAsArrayBuffer(b);
}
Note how much cleaner and less bloated this is compared to the deprecated method... Yeah, this is definitely something to consider here.
Well, here's a somewhat convoluted way of doing the same thing:
var string = "Blah blah blah", output;
var bb = new (window.BlobBuilder||window.WebKitBlobBuilder||window.MozBlobBuilder)();
bb.append(string);
var f = new FileReader();
f.onload = function(e) {
// do whatever
output = e.target.result;
}
f.readAsArrayBuffer(bb.getBlob());
Edit: BlobBuilder has long been deprecated in favor of the Blob constructor, which did not exist when I first wrote this post. Here's an updated version. (And yes, this has always been a very silly way to do the conversion, but it was just for fun!)
var string = "Blah blah blah", output;
var f = new FileReader();
f.onload = function(e) {
// do whatever
output = e.target.result;
};
f.readAsArrayBuffer(new Blob([string]));
You can use TextEncoder
and TextDecoder
from the Encoding standard, which is polyfilled by the stringencoding library, to convert string to and from ArrayBuffers:
var uint8array = new TextEncoder().encode(string);
var string = new TextDecoder(encoding).decode(uint8array);
stringToArrayBuffer(byteString) {
var byteArray = new Uint8Array(byteString.length);
for (var i = 0; i < byteString.length; i++) {
byteArray[i] = byteString.codePointAt(i);
}
return byteArray;
}
arrayBufferToString(buffer) {
var byteArray = new Uint8Array(buffer);
var byteString = '';
for (var i = 0; i < byteArray.byteLength; i++) {
byteString += String.fromCodePoint(byteArray[i]);
}
return byteString;
}
The "native" binary string that atob() returns is a 1-byte-per-character Array.
So we shouldn't store 2 byte into a character.
var arrayBufferToString = function(buffer) {
return String.fromCharCode.apply(null, new Uint8Array(buffer));
}
var stringToArrayBuffer = function(str) {
return (new Uint8Array([].map.call(str,function(x){return x.charCodeAt(0)}))).buffer;
}