Why is binary serialization faster than xml serialization?

别那么骄傲 2021-02-19 20:25

Why is binary serialization considered faster than xml serialization?

5 answers
  • 2021-02-19 21:05

    I had assumed binary serialization would be faster than xml (given how verbose xml can be). However, I observed the opposite! I was investigating a performance issue in one of my applications and found that serialization times were similar for xml and binary, but the difference in deserialization time was huge: xml deserialization takes less than 10 seconds, while binary deserialization takes over 10 minutes!

    So I guess that in theory xml serialization/deserialization is slower than binary, but in your application, it depends!

    I can't share the actual data, but here are the results (in milliseconds):

     Serialization (ms)    Deserialization (ms)
     XML       Binary      XML       Binary
     7,956     9,535       9,112     668,918
     7,608     9,105       8,386     670,445
     7,583     9,398       8,372     676,190
     7,656     9,299       9,783     679,117
     7,454     9,458       8,219     669,626
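
    Since the two directions can behave so differently, it pays to time serialization and deserialization separately on your own data. This is a hypothetical Python harness, not the one behind the numbers above (the answer does not say which serializers or data it used): `pickle` stands in for a binary serializer and `xml.etree.ElementTree` for an XML one, and `make_records`, `to_xml`, `from_xml`, `best_time_ms` and `benchmark` are illustrative names. Timings will vary with machine and data.

```python
import pickle
import time
import xml.etree.ElementTree as ET


def make_records(n):
    # Hypothetical sample data: n small records with an int, a string and a float.
    return [{"id": i, "name": f"item{i}", "price": i * 0.5} for i in range(n)]


def to_xml(records):
    # One <record> element per record, one child element per field.
    root = ET.Element("records")
    for rec in records:
        elem = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(elem, key).text = str(value)
    return ET.tostring(root)


def from_xml(data):
    # Parse the document and convert each field's text back to its type.
    root = ET.fromstring(data)
    return [
        {
            "id": int(elem.findtext("id")),
            "name": elem.findtext("name"),
            "price": float(elem.findtext("price")),
        }
        for elem in root
    ]


def best_time_ms(fn, arg, repeat=5):
    # Run fn(arg) `repeat` times; return the fastest run in ms and the last result.
    best, result = float("inf"), None
    for _ in range(repeat):
        start = time.perf_counter()
        result = fn(arg)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best, result


def benchmark(n=10_000):
    records = make_records(n)
    results = {}
    for name, dump, load in [
        ("binary (pickle)", pickle.dumps, pickle.loads),
        ("xml (ElementTree)", to_xml, from_xml),
    ]:
        ser_ms, payload = best_time_ms(dump, records)
        de_ms, restored = best_time_ms(load, payload)
        assert restored == records  # the round trip must be lossless
        results[name] = (ser_ms, de_ms, len(payload))
    return results


if __name__ == "__main__":
    for name, (ser_ms, de_ms, size) in benchmark().items():
        print(f"{name:18} {size:>8} bytes  serialize {ser_ms:8.2f} ms  deserialize {de_ms:8.2f} ms")
```
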
  • 2021-02-19 21:12

    Consider serializing a double, for example:

    • binary serialization: write the 8 bytes at the value's memory address straight to the stream

    • binary deserialization: read the same 8 bytes back

    • xml serialization: write the opening tag, convert the number to text, write the closing tag; nearly three times the I/O and roughly 1000x the CPU work

    • xml deserialization: read and validate the opening tag, read the string and parse it into a number, read and validate the closing tag; a little more I/O overhead and noticeably more CPU work
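
    The size side of this is easy to check. A minimal Python sketch, assuming a little-endian IEEE 754 double and a hypothetical `<d>` element name:

```python
import math
import struct

value = math.pi

# Binary serialization: the 8 raw IEEE 754 bytes (little-endian).
binary = struct.pack("<d", value)

# XML serialization: opening tag, the number as text, closing tag.
# repr() gives the shortest text that round-trips exactly.
xml = f"<d>{value!r}</d>".encode("utf-8")

print(len(binary), len(xml))  # prints: 8 24

# Binary deserialization: copy 8 bytes back into a double.
(from_binary,) = struct.unpack("<d", binary)

# XML deserialization: validate both tags, then parse the text into a number.
text = xml.decode("utf-8")
if not (text.startswith("<d>") and text.endswith("</d>")):
    raise ValueError("malformed element")
from_xml = float(text[len("<d>"):-len("</d>")])

assert from_binary == from_xml == value
```

    The XML form is exactly three times the size here, and reading it back means string checks plus a text-to-number conversion instead of a plain 8-byte copy.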

  • 2021-02-19 21:15

    Actually, like all things, it depends on the data and the serializer.

    Commonly (although perhaps unwisely) people mean BinaryFormatter when they say "binary", but it has a number of foibles:

    • it adds lots of type metadata (which all takes space)
    • by default it includes field names (which can be verbose, especially for automatically implemented properties)

    Conversely, xml generally has overheads such as:

    • tags adding space and IO
    • the need to parse tags (which is remarkably expensive)
    • lots of text encoding/decoding
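
    Python's `pickle` gives a rough feel for the metadata cost: much like BinaryFormatter, it is self-describing and records the type and the field names alongside the values. The sketch compares it with a schema-based layout written with `struct`; it is an analogy under those assumptions, not a measurement of BinaryFormatter itself, and `point`, `x` and `y` are illustrative names.

```python
import pickle
import struct
from types import SimpleNamespace

# A simple object with two fields (x and y are illustrative names).
point = SimpleNamespace(x=1.0, y=2.0)

# Self-describing: pickle stores the type's module and name plus each field
# name next to its value, so the payload can be decoded without a schema.
self_describing = pickle.dumps(point)

# Schema-based: both sides agree on "two little-endian doubles" in advance,
# so only the values themselves are written.
schema_based = struct.pack("<2d", point.x, point.y)

print(len(self_describing), len(schema_based))
print(b"SimpleNamespace" in self_describing)  # prints: True
```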

    Of course, xml compresses well, which costs some CPU but hugely reduces bandwidth.

    But that doesn't mean one is always faster; I would refer you to some sample stats from here (with full source included), which I've annotated with the underlying format of each serializer (binary, xml, text, etc.). Look in particular at the first two results: XmlSerializer beat BinaryFormatter on every value, while retaining the cross-platform advantages. Of course, protobuf then trumps XmlSerializer ;p

    These numbers tie in quite well with ServiceStack's benchmarks, here.

    BinaryFormatter *** binary
    Length: 1314
    Serialize: 6746
    Deserialize: 6268
    
    XmlSerializer *** xml
    Length: 1049
    Serialize: 3282
    Deserialize: 5132
    
    DataContractSerializer *** xml
    Length: 911
    Serialize: 1411
    Deserialize: 4380
    
    NetDataContractSerializer *** binary
    Length: 1139
    Serialize: 2014
    Deserialize: 5645
    
    JavaScriptSerializer *** text (json)
    Length: 528
    Serialize: 12050
    Deserialize: 30558
    
    (protobuf-net v2) *** binary
    Length: 112
    Serialize: 217
    Deserialize: 250
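
    Part of the reason for protobuf's tiny payload is its wire format: fields are identified by small numeric tags instead of names, and integers are written as base-128 "varints", so small values take a single byte. A minimal Python sketch of varint encoding only (not protobuf-net or a full protobuf implementation; `encode_varint` and `decode_varint` are hypothetical names):

```python
def encode_varint(n: int) -> bytes:
    # Base-128 varint: 7 data bits per byte, high bit set if more bytes follow.
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F  # low 7 bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes) -> int:
    # Reassemble 7-bit groups, least significant first, until a byte without the high bit.
    result, shift = 0, 0
    for b in data:
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


print(encode_varint(1), encode_varint(300))  # prints: b'\x01' b'\xac\x02'
```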
    
  • 2021-02-19 21:21

    Binary serialization is more efficient because it writes the raw data directly, whereas XML has to format the data (and, on the way back, parse it) to produce a valid XML structure. Additionally, depending on what sort of data your objects hold, the XML may contain a lot of redundant data.

  • 2021-02-19 21:26

    Well, first of all, XML is a bloated format. Every byte you send in binary form typically takes at least 2 or 3 bytes in XML. For example, to send the number 44 in binary you need just one byte. In XML you need an element tag plus two characters for the number: <N>44</N>, which is nine bytes instead of one.
    Another difference is the encoding/decoding time required to handle the message. Since binary data is so compact, it doesn't take many clock cycles. If the binary data has a fixed structure, you can often load it directly into memory and access every element without having to parse or format the data.
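
    A fixed structure means the position of any element can be computed rather than searched for. A minimal Python sketch, assuming a hypothetical record layout of a 32-bit id followed by a 64-bit float (`RECORD`, `pack_records` and `read_record` are illustrative names):

```python
import struct

# Hypothetical fixed layout: uint32 id followed by float64 price,
# little-endian with no padding, so every record is exactly 12 bytes.
RECORD = struct.Struct("<Id")


def pack_records(records):
    # Write each (id, price) pair at its fixed offset in one buffer.
    buf = bytearray(RECORD.size * len(records))
    for i, (rec_id, price) in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, rec_id, price)
    return bytes(buf)


def read_record(buf, index):
    # Jump straight to the record's offset: no scanning, no text parsing.
    return RECORD.unpack_from(buf, index * RECORD.size)


data = pack_records([(1, 0.5), (2, 1.5), (3, 2.5)])
print(len(data), read_record(data, 2))  # prints: 36 (3, 2.5)
```

    An XML reader cannot do this: to find the third record it has to scan past the tags and text of the first two.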
    XML is a text-based format that needs a few more steps to be processed. First, the format is bloated, so it uses more memory. Furthermore, all data is text and you may need it in binary form, so the XML has to be parsed. Parsing takes time, no matter how fast your code is. ASN.1 (with a binary encoding such as BER) is a kind of "binary XML" that provides a good alternative to XML, but it still has to be parsed, just like XML. Plus, if most of your data is text rather than numbers, a binary format won't make a big difference.
    Another speed factor is the total size of your data. When you just load and save a 1 KB binary file or a 3 KB XML file, you probably won't notice any speed difference. This is because disks store data in blocks of a fixed size, commonly 4 KB, so both files fit in a single block and the disk reads the whole 4 KB block either way. But when the binary file is 1 MB and the XML is 3 MB, the disk has to read (or write) roughly three times as many blocks for the XML. At that scale every extra block costs time, so it even matters whether your XML is 2.99 MB, 3 MB or 3.01 MB.
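
    The block arithmetic is just a rounded-up division. A minimal Python sketch, assuming a 4 KB block size (the real value depends on the file system) and a hypothetical `blocks_needed` helper:

```python
BLOCK_SIZE = 4096  # assumed 4 KB block size; the real value depends on the file system


def blocks_needed(size_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    # Files occupy whole blocks, so round the division up.
    return (size_bytes + block_size - 1) // block_size


KB, MB = 1024, 1024 * 1024
for label, size in [
    ("1 KB binary", 1 * KB),
    ("3 KB XML", 3 * KB),
    ("1 MB binary", 1 * MB),
    ("3 MB XML", 3 * MB),
]:
    print(f"{label:12} -> {blocks_needed(size)} block(s)")
# prints 1, 1, 256 and 768 blocks respectively
```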
    Binary data sent over a channel that only accepts text (e-mail, or binary embedded inside an XML or JSON document) is usually Base64- or UU-encoded; TCP/IP itself carries raw bytes without any such encoding. These encodings grow the data by 1 byte for every 3 bytes of input (about 33%), while XML text needs no extra encoding, so the size gap shrinks and with it the speed gap. Still, the binary data will usually be faster, since the encoding/decoding routines can be very fast.
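
    The 4-for-3 growth is easy to confirm. A minimal Python sketch using Base64 (the modern standard text-safe encoding; classic UU-encoding has the same 4:3 ratio before its line headers) on an arbitrary, deterministic 3,000-byte payload:

```python
import base64

# Arbitrary (deterministic) binary payload of 3,000 bytes.
raw = bytes(i % 256 for i in range(3000))

# Text-safe form, e.g. for e-mail or for embedding in an XML/JSON document.
encoded = base64.b64encode(raw)

print(len(raw), len(encoded))  # prints: 3000 4000  (4 output bytes per 3 input bytes)
assert base64.b64decode(encoded) == raw
```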
    Basically, size matters. :-)

    But with XML you have an additional option: you can send and store the XML in a ZIP file. Microsoft Office does this in its newer versions. A Word document is written as XML, yet stored as part of a larger ZIP file. This combines the best of both worlds, since Word documents are mostly text, so a binary format would not add much speed. Zipping the XML makes storing and sending the data a lot faster simply by compressing it into a much smaller binary form. Even more interesting, a compressed XML file can end up smaller than an uncompressed binary file, in which case the zipped XML becomes the faster one. (But that's cheating, since the XML is now binary...)
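
    Whether zipped XML beats raw binary depends heavily on how repetitive the data is. A minimal Python sketch of a deliberately favourable case, assuming 1,000 identical readings and using `zlib` (DEFLATE, the compression method most ZIP files use, without the ZIP container); with less repetitive data the outcome can easily flip:

```python
import struct
import zlib

# Hypothetical best case: 1,000 identical sensor readings.
values = [20.5] * 1000

# Uncompressed binary: 8 bytes per double.
raw_binary = struct.pack(f"<{len(values)}d", *values)

# Verbose XML, then DEFLATE-compressed at the highest level.
xml = ("<readings>" + "".join(f"<r>{v}</r>" for v in values) + "</readings>").encode("utf-8")
zipped_xml = zlib.compress(xml, 9)

print(len(raw_binary), len(xml), len(zipped_xml))
assert zlib.decompress(zipped_xml) == xml
```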
