Binary Data in JSON String. Something better than Base64

一向 2020-11-21 23:03

The JSON format natively doesn't support binary data. The binary data has to be escaped so that it can be placed into a string element (i.e. zero or more Unicode chars in d

15 Answers
  • 2020-11-21 23:41

    Just to add the resource and complexity standpoint to the discussion: since PUT, POST and PATCH are used to store new resources and alter them, one should remember that the content transferred is an exact representation of the content that is stored and that is returned by a GET operation.

    A multi-part message is often used as a savior, but for simplicity's sake, and for more complex tasks, I prefer the idea of giving the content as a whole. It is self-explanatory and it is simple.

    And yes, JSON is somewhat crippling here, but in the end JSON itself is verbose, and the overhead of mapping to Base64 is way too small to matter.

    To use multi-part messages correctly, one has to either dismantle the object to be sent, use a property path as the parameter name so the parts can be recombined automatically, or create yet another protocol/format just to express the payload.
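    To make the "dismantle the object" option concrete, here is a minimal sketch. The field names "meta" and "payload", the file name, and the shape of the resource object are hypothetical, and it assumes a runtime with global FormData and Blob (browsers, or Node 18+):

```javascript
// Sketch: split one resource into a JSON part and a raw binary part.
// "meta", "payload" and "payload.bin" are made-up names for illustration.
function buildMultipart(resourceWithoutBinary, binaryBytes) {
  const form = new FormData();
  // The JSON part carries every property except the binary one.
  form.append(
    "meta",
    new Blob([JSON.stringify(resourceWithoutBinary)], { type: "application/json" })
  );
  // The binary part is sent as-is, with no Base64 expansion.
  form.append(
    "payload",
    new Blob([binaryBytes], { type: "application/octet-stream" }),
    "payload.bin"
  );
  return form;
}

const multipartBytes = new Uint8Array([0, 255, 128, 10, 13]);
const multipartForm = buildMultipart({ name: "example" }, multipartBytes);
console.log(multipartForm.get("payload").size); // 5
```

    The receiver then has to put the two parts back together, which is exactly the extra convention this answer is warning about.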

    I also like the BSON approach, but it is not as widely and easily supported as one would like it to be.

    Basically, we are just missing something here, but embedding binary data as Base64 is well established and the way to go unless you have really identified the need for a true binary transfer (which is rarely the case).
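    For reference, a minimal sketch of the Base64-in-JSON approach and its size cost. It assumes Node's Buffer (in a browser, btoa/atob would play the same role), and the property name "data" is just a placeholder:

```javascript
// Encode bytes as Base64 so they fit into a JSON string, and decode them back.
// The "data" property name is hypothetical.
function toJsonWithBase64(bytes) {
  return JSON.stringify({ data: Buffer.from(bytes).toString("base64") });
}

function fromJsonWithBase64(json) {
  return new Uint8Array(Buffer.from(JSON.parse(json).data, "base64"));
}

// 300 sample bytes covering all byte values.
const rawBytes = new Uint8Array(300).map((_, i) => i % 256);
const jsonText = toJsonWithBase64(rawBytes);
const encodedLength = JSON.parse(jsonText).data.length;

console.log(encodedLength);                   // 400
console.log(encodedLength / rawBytes.length); // about 1.33
```

    Every 3 input bytes become 4 ASCII characters, so the payload grows by roughly one third before any transport compression.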

  • 2020-11-21 23:44

    I dug a little deeper (while implementing base128) and found that when we send characters whose codes are above 127, the browser (Chrome) in fact sends TWO bytes per character instead of one :(. The reason is that JSON uses UTF-8 by default, and in UTF-8 characters with codes above 127 take at least two bytes (exactly two for codes up to 2047), as mentioned in chmike's answer. I tested it this way: type chrome://net-export/ in the Chrome URL bar, select "Include raw bytes", start capturing, send the POST requests (using the snippet at the bottom), stop capturing, and save the JSON file with the raw request data. Then look inside that JSON file:

    • We can find our base64 request by searching for the string 4142434445464748494a4b4c4d4e, which is the hex encoding of ABCDEFGHIJKLMN, and we will see "byte_count": 639 for it.
    • We can find our above127 request by searching for the string C2BCC2BDC380C381C382C383C384C385C386C387C388C389C38AC38B, which is the UTF-8 hex encoding, as sent in the request, of the characters ¼½ÀÁÂÃÄÅÆÇÈÉÊË (whereas the single-byte codes of these characters are bcbdc0c1c2c3c4c5c6c7c8c9cacb). Here "byte_count": 703, so it is 64 bytes longer than the base64 request, because characters with codes above 127 are encoded as 2 bytes in the request :(

    So in fact there is nothing to gain from sending characters with codes >127 :( . For base64 strings we do not observe this negative behaviour (probably not for base85 either, though I didn't check it). One possible solution might be to send the data in the binary part of a POST multipart/form-data request, as described in Ælex's answer (although in that case we usually don't need any base encoding at all...).

    An alternative approach might rely on mapping each two-byte portion of the data to one valid UTF-8 character, using something like base65280 / base65k, but it would probably be less efficient than base64 because of how UTF-8 encodes such characters ...

    function postBase64() {
      let formData = new FormData();
      let req = new XMLHttpRequest();
    
      formData.append("base64ch", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/");
      req.open("POST", '/testBase64ch');
      req.send(formData);
    }
    
    
    function postAbove127() {
      let formData = new FormData();
      let req = new XMLHttpRequest();
    
      formData.append("above127", "¼½ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüý");
      req.open("POST", '/testAbove127');
      req.send(formData);
    }
    <button onclick=postBase64()>POST base64 chars</button>
    <button onclick=postAbove127()>POST chars with codes>127</button>
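    The 64-byte difference can also be checked without a network capture. This is only a sketch: it measures the UTF-8 size of the two test strings with TextEncoder, not the full request, and the helper names are made up for illustration:

```javascript
// Count the bytes a string occupies once it is encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Hex dump of the UTF-8 encoding, to compare with the net-export capture.
function utf8Hex(text) {
  return Array.from(new TextEncoder().encode(text), (b) =>
    b.toString(16).padStart(2, "0")
  ).join("");
}

const base64Chars =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
// Same 64 characters as in postAbove127(): U+00BC, U+00BD, then U+00C0..U+00FD.
const above127Chars = String.fromCharCode(
  0xbc, 0xbd, ...Array.from({ length: 62 }, (_, i) => 0xc0 + i)
);

console.log(base64Chars.length, utf8ByteLength(base64Chars));     // 64 64
console.log(above127Chars.length, utf8ByteLength(above127Chars)); // 64 128
console.log(utf8Hex(above127Chars.slice(0, 3)));                  // c2bcc2bdc380
```

    Both strings have 64 characters, but the second one needs 128 bytes, which matches the 703 - 639 = 64 extra bytes seen in the capture.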

  • 2020-11-21 23:45

    BSON (Binary JSON) may work for you. http://en.wikipedia.org/wiki/BSON

    Edit: FYI, the .NET library Json.NET supports reading and writing BSON, if you are looking for some C# server-side love.

  • 2020-11-21 23:47

    Since you're looking for the ability to shoehorn binary data into a strictly text-based and very limited format, I think Base64's overhead is minimal compared to the convenience you're expecting to maintain with JSON. If processing power and throughput are a concern, then you'd probably need to reconsider your file formats.

  • 2020-11-21 23:49

    (Edit 7 years later: Google Gears is gone. Ignore this answer.)


    The Google Gears team ran into the lack-of-binary-data-types problem and has attempted to address it:

    Blob API

    JavaScript has a built-in data type for text strings, but nothing for binary data. The Blob object attempts to address this limitation.

    Maybe you can weave that in somehow.

  • 2020-11-21 23:49

    See: http://snia.org/sites/default/files/Multi-part%20MIME%20Extension%20v1.0g.pdf

    It describes a way to transfer binary data between a CDMI client and server using 'CDMI content type' operations without requiring Base64 conversion of the binary data.

    If you can use a 'Non-CDMI content type' operation, that is the ideal way to transfer 'data' to/from an object. Metadata can then be added to or retrieved from the object in a subsequent 'CDMI content type' operation.
