Are there any compact binary representations of JSON out there? I know there is BSON, but even that webpage says "in many cases is not much more efficient than JSON."
Try using js-inflate to make and unmake blobs.
https://github.com/augustl/js-inflate
It works perfectly and I use it a lot.
Gzipping JSON data will get you good compression ratios with very little effort because of gzip's universal support. Also, if you're in a browser environment, you may end up paying more bytes for the size of a new library dependency than you would save on the actual payload.
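As a minimal sketch of that first point, here is how gzipping a JSON payload looks with the JDK's built-in java.util.zip support (the payload string is just a placeholder):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipJsonExample {
    // Compress a JSON string using the JDK's built-in gzip support.
    static byte[] gzip(String json) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOut = new GZIPOutputStream(buffer)) {
            gzipOut.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String json = "{\"name\":\"example\",\"values\":[1,2,3]}"; // placeholder payload
        byte[] compressed = gzip(json);
        System.out.println(json.length() + " chars -> " + compressed.length + " bytes gzipped");
    }
}

Keep in mind that for very small payloads the gzip header and trailer overhead can outweigh the savings, so compression pays off mostly on larger responses.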
If your data has additional constraints (such as lots of redundant field values), you may be able to optimize by looking at a different serialization protocol rather than sticking to JSON. Example: a column-based serialization such as Avro's upcoming columnar store may get you better ratios (for on-disk storage). If your payloads contain lots of constant values (such as columns that represent enums), a dictionary compression approach may be useful too.
Yes: the Smile data format (see the Wikipedia entry). It has a public Java implementation, and a C version is in the works on GitHub (libsmile). It has the benefit of being reliably more compact than JSON while keeping a 100% compatible logical data model, so it is easy to convert back and forth with textual JSON.
For performance, you can look at the jvm-serializers benchmark, where Smile competes well with other binary formats (Thrift, Avro, protobuf); size-wise it is not the most compact (since it retains field names), but it does much better with data streams where names are repeated.
It is used by projects like Elastic Search and (optionally) Solr, and Protostuff-rpc supports it, although it is not as widely used as, say, Thrift or protobuf.
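To give a sense of usage, here is a minimal sketch assuming the Jackson jackson-dataformat-smile module is on the classpath (the record contents are made up); Smile reuses the same ObjectMapper API as textual JSON:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.smile.SmileFactory;

import java.util.LinkedHashMap;
import java.util.Map;

public class SmileRoundTrip {
    public static void main(String[] args) throws Exception {
        // Same ObjectMapper API as plain JSON, just backed by a Smile factory.
        ObjectMapper smileMapper = new ObjectMapper(new SmileFactory());
        ObjectMapper jsonMapper = new ObjectMapper();

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("id", 42);
        record.put("name", "example");

        byte[] smileBytes = smileMapper.writeValueAsBytes(record); // binary Smile
        byte[] jsonBytes = jsonMapper.writeValueAsBytes(record);   // textual JSON
        System.out.println("JSON: " + jsonBytes.length + " bytes, Smile: " + smileBytes.length + " bytes");

        // Round-trip back into the same logical data model.
        Map<?, ?> restored = smileMapper.readValue(smileBytes, Map.class);
        System.out.println(restored);
    }
}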
EDIT (Dec 2011) -- there are now also libsmile bindings for PHP, Ruby and Python, so language support is improving. In addition, there are measurements of data size; although alternatives (Avro, protobuf) are more compact for single-record data, Smile is often more compact for data streams thanks to its key and String-value back-reference option.
Have you tried BJSON?
Another alternative that should be considered these days is CBOR (RFC 7049), which has an explicitly JSON-compatible model with a lot of flexibility. It is stable, meets your open-standard qualification, and has obviously had a lot of thought put into it.
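As a hedged sketch, assuming Jackson's jackson-dataformat-cbor module is available (the payload map is just an illustration), encoding and decoding looks like this:

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.cbor.CBORFactory;

import java.util.Map;

public class CborExample {
    public static void main(String[] args) throws Exception {
        // ObjectMapper backed by a CBOR factory writes/reads binary CBOR.
        ObjectMapper cborMapper = new ObjectMapper(new CBORFactory());
        byte[] encoded = cborMapper.writeValueAsBytes(Map.of("id", 42, "name", "example"));
        Map<?, ?> decoded = cborMapper.readValue(encoded, Map.class);
        System.out.println(encoded.length + " bytes -> " + decoded);
    }
}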
You could take a look at the Universal Binary JSON specification. It won't be as compact as Smile because it doesn't do name references, but it is 100% compatible with JSON (whereas BSON and BJSON define data structures that don't exist in JSON, so there is no standard conversion to/from).
It is also (intentionally) criminally simple to read and write with a standard format of:
[type, 1-byte char]([length, 4-byte int32])([data])
So simple data types begin with an ASCII marker code like 'I' for a 32-bit int, 'T' for true, 'Z' for null, 'S' for string and so on.
The format is engineered to be fast to read: all data structures are prefixed with their size, so there is no scanning for null-terminated sequences.
For example, a string might be demarcated like this (the []-chars are just for illustration purposes; they are not written in the format):
[S][512][this is a really long 512-byte UTF-8 string....]
You would see the 'S', switch to string processing, read the 4-byte integer "512" that follows it, and know you can grab the next 512 bytes in one chunk and decode them back to a string.
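A rough sketch in Java of that layout (marker byte, 4-byte big-endian length, then the raw UTF-8 bytes); the class and method names here are made up for illustration and this is not a full implementation of the spec:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class UbjsonStringSketch {
    // Write: 'S' marker, 4-byte big-endian length, then the raw UTF-8 bytes.
    static byte[] writeString(String value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer); // DataOutputStream writes big-endian
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        out.writeByte('S');
        out.writeInt(utf8.length);
        out.write(utf8);
        return buffer.toByteArray();
    }

    // Read: check the marker, read the length, then grab that many bytes in one chunk.
    static String readString(byte[] encoded) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
        int marker = in.readUnsignedByte();
        if (marker != 'S') {
            throw new IOException("Expected 'S' marker, got " + (char) marker);
        }
        int length = in.readInt();
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] encoded = writeString("this is a really long UTF-8 string....");
        System.out.println(readString(encoded));
    }
}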
Similarly, numeric values are written out without a length value to be more compact, because each type (byte, int32, int64, double) defines its own length in bytes (1, 4, 8 and 8 respectively). There is also support for arbitrarily long numbers that is extremely portable, even on platforms that don't support them natively.
On average you should see a size reduction of roughly 30% with a well-balanced JSON object (lots of mixed types). If you want to know exactly how certain structures compress or don't, you can check the Size Requirements section to get an idea.
On the bright side, regardless of compression, the data will be written in a more optimized format and be faster to work with.
I checked in the core Input/OutputStream implementations for reading/writing the format to GitHub today. I'll check in general reflection-based object mapping later this week.
You can just look at those two classes to see how to read and write the format; I think the core logic is something like 20 lines of code. The classes are longer because of method abstractions and some structure around checking the marker bytes to make sure the data file is in a valid format, things like that.
If you have really specific questions, like the endianness of the spec (big) or the numeric format for doubles (IEEE 754), all of that is covered in the spec doc, or just ask me.
Hope that helps!