What is BSON and exactly how is it different from JSON?

-上瘾入骨i 2021-01-29 19:03

I am just starting out with MongoDB and one of the things that I have noticed is that it uses BSON to store data internally. However the documentation is not exactly clear on wh

7 answers
  •  再見小時候
    2021-01-29 19:08

    To stay strictly within the boundaries of the OP question:

    1. What is BSON?

    BSON is a specification for a rich set of scalar types (int32, int64, decimal, date, etc.) plus containers (object, a.k.a. map, and array) as they appear in a byte stream. There is no "native" string form of BSON; it is a byte[] spec. To work with this byte stream, implementations are available in many languages that turn the bytes into types appropriate for the language. These are called codecs. For example, the Java BSON codec for MongoDB's Document class turns objects into something that implements java.util.Map, and dates are decoded into java.util.Date. Transmitting BSON looks like this in, for example, Java and Python:

    Java:
    import org.bson.*;
    MyObject  -->  get() from MyObject, set() into org.bson.Document --> org.bson.standardCodec.encode(Document) to byte[]
    
    XMIT byte[]
    
    Python:
    import bson
    byte[] --> bson.decode(byte[]) to dict --> get from dict --> do something
    

    There are no to-string and from-string calls involved. There is no parser. There is nothing about whitespace, double quotes, or escaped characters. Dates, BigDecimal values, and arrays of Long captured on the Java side reappear in Python as datetime.datetime, a decimal type (bson's Decimal128, which converts to decimal.Decimal), and a list of int.
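
To make the "byte[] spec" point concrete, here is a minimal Python sketch that hand-encodes a tiny subset of BSON (int32, int64, and UTC datetime only) using just the standard library. The helper names encode_doc and decode_doc are invented for this example, and it assumes timezone-aware datetimes; real code would use a driver's codec (for instance the bson package that ships with PyMongo) instead.

```python
import struct
from datetime import datetime, timedelta, timezone

# Element type tags as defined in the BSON spec.
T_DATETIME, T_INT32, T_INT64 = 0x09, 0x10, 0x12
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def encode_doc(fields):
    """Hypothetical helper: encode a dict of int/datetime values as BSON bytes."""
    body = b""
    for name, value in fields.items():
        key = name.encode("utf-8") + b"\x00"  # field names are NUL-terminated
        if isinstance(value, datetime):
            # BSON dates are int64 milliseconds since the Unix epoch (UTC).
            millis = (value - EPOCH) // timedelta(milliseconds=1)
            body += bytes([T_DATETIME]) + key + struct.pack("<q", millis)
        elif -2**31 <= value < 2**31:
            body += bytes([T_INT32]) + key + struct.pack("<i", value)
        else:
            body += bytes([T_INT64]) + key + struct.pack("<q", value)
    # Document = int32 total length + elements + trailing NUL byte.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

def decode_doc(data):
    """Hypothetical helper: decode bytes produced by encode_doc back to a dict."""
    out, pos = {}, 4  # skip the 4-byte length prefix
    while data[pos] != 0:
        tag = data[pos]
        end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:end].decode("utf-8")
        pos = end + 1
        if tag == T_INT32:
            out[name] = struct.unpack_from("<i", data, pos)[0]
            pos += 4
        elif tag == T_INT64:
            out[name] = struct.unpack_from("<q", data, pos)[0]
            pos += 8
        elif tag == T_DATETIME:
            millis = struct.unpack_from("<q", data, pos)[0]
            out[name] = EPOCH + timedelta(milliseconds=millis)
            pos += 8
        else:
            raise ValueError(f"type tag {tag:#x} not handled in this sketch")
    return out

original = {
    "count": 7,
    "big": 2**40,
    "when": datetime(2021, 1, 29, 19, 3, tzinfo=timezone.utc),
}
wire = encode_doc(original)   # bytes, ready to transmit as-is
received = decode_doc(wire)   # typed values again, no string parsing
```

Each value travels with a one-byte type tag, so the receiver gets an int or a datetime back without guessing from text.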

    In comparison, JSON is a string. There is no codec for JSON. Transmitting JSON looks like this:

    MyObject --> convert to JSON (now you have a big string with quotes and braces and commas)
    
    XMIT string
    
    parse string to dict (or possibly a class via a framework) 
    

    Superficially this looks the same, but for scalars the JSON specification has only string and "number" (setting aside booleans and null). There is no direct way to send a long or a BigDecimal from sender to receiver in JSON; they are both just "number". Furthermore, JSON has no type for a plain byte array. Binary data must be base64-encoded, or otherwise encoded in a way that protects it, and sent as a string. BSON has a byte array type: the producer sets it and the consumer gets it. There is no secondary processing of strings to turn the data back into the desired type.
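
The JSON limitations above can be seen with Python's standard json module. This is only a sketch: the record, the to_json helper, and the convention of sending the decimal as a string and the bytes as base64 are assumptions made up for the example, not part of any standard.

```python
import base64
import json
from decimal import Decimal

record = {"count": 2**40, "price": Decimal("19.99"), "blob": b"\x00\xff\x10"}

# Serialising directly fails: Decimal and bytes have no JSON representation.
try:
    json.dumps(record)
    direct_ok = True
except TypeError:
    direct_ok = False

def to_json(rec):
    """Hypothetical helper: push unsupported types through strings."""
    safe = {
        "count": rec["count"],               # becomes a plain JSON "number"
        "price": str(rec["price"]),          # Decimal -> string
        "blob": base64.b64encode(rec["blob"]).decode("ascii"),  # bytes -> base64
    }
    return json.dumps(safe)

text = to_json(record)        # one big string with quotes, braces, and commas
received = json.loads(text)   # the parser only knows strings and numbers

# The receiver has to know, out of band, which strings to convert back.
restored_price = Decimal(received["price"])
restored_blob = base64.b64decode(received["blob"])
```

Nothing in the JSON text itself says that "price" is a decimal or that "blob" is binary; that knowledge lives in the application code on both ends.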

    2. How does MongoDB use BSON?

    To start, it is the format for content in the wire protocol. It is also the on-disk format of data. Because variable-length types (most notably string) carry length information in the BSON spec, MongoDB can traverse an object efficiently, hopping from field to field. Finding the object in a collection involves more than just BSON, including the use of indexes.
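
The field-hopping idea can be sketched in Python with the standard struct module. This is an illustration of the length-prefix technique only, not MongoDB's actual implementation; the helper names are invented, and the sketch handles just strings and a few fixed-width types.

```python
import struct

def string_field(name, text):
    """Hypothetical helper: BSON string element (tag 0x02, length-prefixed)."""
    raw = text.encode("utf-8") + b"\x00"
    return b"\x02" + name.encode("utf-8") + b"\x00" + struct.pack("<i", len(raw)) + raw

def int32_field(name, value):
    """Hypothetical helper: BSON int32 element (tag 0x10)."""
    return b"\x10" + name.encode("utf-8") + b"\x00" + struct.pack("<i", value)

def wrap(body):
    """Add the int32 total length and the trailing NUL to form a document."""
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

# Byte widths of some fixed-size BSON types: double, bool, datetime, int32, int64.
FIXED_SIZES = {0x01: 8, 0x08: 1, 0x09: 8, 0x10: 4, 0x12: 8}

def find_field(data, wanted):
    """Return (type tag, value offset) for `wanted`, or None if absent.

    Values of other fields are skipped using their length, never decoded.
    """
    pos = 4  # skip the document's own length prefix
    while data[pos] != 0:
        tag = data[pos]
        end = data.index(b"\x00", pos + 1)
        name = data[pos + 1:end].decode("utf-8")
        pos = end + 1
        if name == wanted:
            return tag, pos
        if tag == 0x02:
            # String: read the 4-byte length, then jump over the whole value.
            length = struct.unpack_from("<i", data, pos)[0]
            pos += 4 + length
        else:
            pos += FIXED_SIZES[tag]
    return None

doc = wrap(string_field("bio", "x" * 1000) + int32_field("age", 42))
tag, offset = find_field(doc, "age")
age = struct.unpack_from("<i", doc, offset)[0]
```

The 1000-character "bio" value is jumped over in one step because its length is stored in front of it; a JSON parser would have to scan every character to find the closing quote.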
