I have written code to serialize objects to JSON and BSON. According to my output, the BSON produced is greater in size than the JSON. Is this expected?
From the BSON FAQ:
BSON is designed to be efficient in space, but in many cases is not much more efficient than JSON. In some cases BSON uses even more space than JSON. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, like length prefixes, that make it easy and fast to traverse.
BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
For a string field, the overhead in JSON is 6 bytes -- 4 quotes, a colon and a comma. In BSON it's 7 -- the entry type byte, the null terminator for the field name, the 4-byte string length, and the null terminator for the value.
For an integer field, the JSON length depends on the size of the number. "1" is just one byte. "1000000" is 7 bytes. In BSON both of these would be stored as a 4-byte (32-bit) integer. The situation with floating point numbers is similar.
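The arithmetic above can be checked with a short Python sketch. The helper names (`utf8_len`, `json_field_size`, `bson_string_field_size`, `bson_int32_field_size`) are made up for this illustration and are not part of any BSON library; the JSON figure counts a trailing comma, as the text does.

```python
import json

def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def json_field_size(name, value):
    # "name":value,  ->  quoted name + colon + encoded value + comma
    encoded_name = json.dumps(name, ensure_ascii=False)
    encoded_value = json.dumps(value, ensure_ascii=False)
    return utf8_len(encoded_name) + 1 + utf8_len(encoded_value) + 1

def bson_string_field_size(name, value):
    # type byte + name + NUL + int32 length + value + NUL
    return 1 + utf8_len(name) + 1 + 4 + utf8_len(value) + 1

def bson_int32_field_size(name):
    # type byte + name + NUL + 4-byte integer, whatever the value is
    return 1 + utf8_len(name) + 1 + 4

# Overhead for a string field: subtract the 6 payload bytes of "foo" and "bar".
print(json_field_size("foo", "bar") - 6, bson_string_field_size("foo", "bar") - 6)  # 6 7

# Integer field: JSON grows with the number of digits, BSON int32 does not.
for n in (1, 1000000):
    print(n, json_field_size("n", n), bson_int32_field_size("n"))
```

With a one-character field name, JSON is one byte smaller for the value 1 and five bytes larger for 1000000, which matches the trade-off described above.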
BSON is not intended to be smaller. It is intended to be closer to the structures that computers work with natively, so that it can be worked with more efficiently -- that is one meaning of "light".
If you're not chasing extreme levels of performance (as the MongoDB developers who designed BSON are), then I would advise using JSON -- its human-readability is a great benefit to the developer. As long as you use a library like Jackson, migrating to BSON later should not be hard -- as you can see from how nearly identical your own BSON and JSON classes are.
Bear in mind that if size is an issue, both JSON and BSON should compress well.
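As a rough way to try this, the sketch below compresses the same records in both forms with zlib. It is only an illustration: `encode_bson_flat` is a hypothetical hand-rolled encoder that handles flat documents with string and int32 values, not a real BSON library, and the sample records are invented. Actual ratios depend entirely on your data.

```python
import json
import struct
import zlib

def encode_bson_flat(doc):
    """Encode a flat dict of str / int32 values as one BSON document (sketch only)."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError("this sketch only supports str and int32 values")
    # The document length counts its own 4 bytes and the trailing NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"

# Invented sample data: 1000 small, repetitive records.
records = [{"id": i, "name": "user%d" % i, "city": "Springfield"} for i in range(1000)]

json_bytes = "\n".join(json.dumps(r, separators=(",", ":")) for r in records).encode("utf-8")
bson_bytes = b"".join(encode_bson_flat(r) for r in records)

for label, raw in (("JSON", json_bytes), ("BSON", bson_bytes)):
    print(label, len(raw), "->", len(zlib.compress(raw)))
```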
The property "foo":"bar" consumes 11 bytes in UTF-8 encoded JSON. In BSON it consumes 13:
bytes  description
============================================
    1  entry type value \x02
    3  "foo"
    1  NUL \x00
    4  int32 string length (4 -- includes the NUL)
    3  "bar"
    1  NUL \x00
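The layout above can be reproduced byte by byte with Python's `struct` module. This is a minimal sketch of a single string element, not a full BSON encoder, and `bson_string_element` is a name made up for the example.

```python
import struct

def bson_string_element(name, value):
    """Build the bytes of one BSON string element, piece by piece."""
    data = value.encode("utf-8") + b"\x00"
    parts = [
        b"\x02",                       # entry type: UTF-8 string
        name.encode("utf-8"),          # field name
        b"\x00",                       # NUL terminating the field name
        struct.pack("<i", len(data)),  # little-endian int32 length, counting the value's NUL
        data,                          # value bytes followed by NUL
    ]
    return b"".join(parts)

element = bson_string_element("foo", "bar")
json_property = '"foo":"bar"'.encode("utf-8")

print(element)                            # b'\x02foo\x00\x04\x00\x00\x00bar\x00'
print(len(json_property), len(element))   # 11 13
```

A complete BSON document adds a further 4-byte length prefix and a trailing NUL around its elements, so the gap for a one-property document is wider still.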
There are many cases in which JSON will be more compact.