XML vs Binary performance for Serialization/Deserialization

◇◆丶佛笑我妖孽 posted on 2019-12-17 19:44:16

Question


I'm working on a .NET Compact Framework application and need to boost its performance. The app currently works offline by serializing objects to XML and storing them in a database. A profiling tool showed that this is a significant overhead that slows the app down. I expected binary serialization to be faster, but because it is not supported in the Compact Framework I looked at protobuf-net. With it, serialization seems quicker but deserialization is much slower, and the app deserializes more often than it serializes.

Should binary serialization be faster, and if so, what can I do to speed it up? Here's a snippet of how I'm using both XML and binary:

XML serialization:

public string Serialize(T obj)
{
  UTF8Encoding encoding = new UTF8Encoding();
  // A new XmlSerializer is constructed on every call.
  XmlSerializer serializer = new XmlSerializer(typeof(T));
  MemoryStream stream = new MemoryStream();
  // The writer wraps the stream but is never written to;
  // Serialize below targets the stream directly.
  XmlTextWriter writer = new XmlTextWriter(stream, Encoding.UTF8);
  serializer.Serialize(stream, obj);
  stream = (MemoryStream)writer.BaseStream;
  return encoding.GetString(stream.ToArray(), 0, Convert.ToInt32(stream.Length));
}

public T Deserialize(string xml)
{
  UTF8Encoding encoding = new UTF8Encoding();
  // A new XmlSerializer is constructed on every call here as well.
  XmlSerializer serializer = new XmlSerializer(typeof(T));
  MemoryStream stream = new MemoryStream(encoding.GetBytes(xml));
  return (T)serializer.Deserialize(stream);
}

Protobuf-net Binary serialization:

public byte[] Serialize(T obj)
{
  byte[] raw;
  using (MemoryStream memoryStream = new MemoryStream())
  {
    Serializer.Serialize(memoryStream, obj);
    raw = memoryStream.ToArray();
  }

  return raw;            
}

public T Deserialize(byte[] serializedType)
{
  T obj;
  using (MemoryStream memoryStream = new MemoryStream(serializedType))
  {
    obj = Serializer.Deserialize<T>(memoryStream);
  }
  return obj;
}

Answer 1:


I'm going to correct myself on this. Marc Gravell pointed out that the first iteration carries the overhead of building the model, so I ran some tests taking the average of 1000 iterations of serialization and deserialization for both XML and binary. I ran the tests with the v2 Compact Framework DLL first, and then with the v3.5 DLL. Here's what I got; times are in ms:

.NET 2.0
================================ XML ====== Binary ===
Serialization 1st Iteration      3236       5508
Deserialization 1st Iteration    1501       318
Serialization Average            9.826      5.525
Deserialization Average          5.525      0.771

.NET 3.5
================================ XML ====== Binary ===
Serialization 1st Iteration      3307       5598
Deserialization 1st Iteration    1386       200
Serialization Average            10.923     5.605
Deserialization Average          5.605      0.279
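
A measurement of this kind can be sketched roughly as follows. This is an illustration, not the harness that produced the numbers above: `SerializationTimer`, `Work`, and `Measure` are hypothetical names, `Stopwatch` is assumed to be available on the target runtime, and the average here deliberately excludes the first call, which the figures above may or may not do.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing helper: reports the first call separately from
// the average of the following calls, so one-off setup cost is visible.
public static class SerializationTimer
{
    // Own delegate type, to avoid depending on System.Action.
    public delegate void Work();

    // Returns { firstCallMs, averageMs }.
    public static double[] Measure(Work work, int iterations)
    {
        if (iterations < 1)
            throw new ArgumentOutOfRangeException("iterations");

        // First call: includes any model or serializer construction.
        Stopwatch first = Stopwatch.StartNew();
        work();
        first.Stop();

        // Remaining calls: steady-state cost only.
        Stopwatch rest = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        rest.Stop();

        return new double[]
        {
            first.Elapsed.TotalMilliseconds,
            rest.Elapsed.TotalMilliseconds / iterations
        };
    }
}
```

Passing a delegate that wraps a single `Serialize` or `Deserialize` call, with `iterations` set to 1000, would give one row pair of the tables above.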



Answer 2:


The main expense in your method is the actual generation of the XmlSerializer class. Creating the serialiser is a time-consuming process which you should only do once for each object type. Try caching the serialisers and see if that improves performance at all.

Following this advice I saw a large performance improvement in my app, which allowed me to continue to use XML serialisation.
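
A minimal sketch of such a cache, assuming one serialiser per type is enough for the app. `XmlSerializerCache` and `Get` are hypothetical names, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Serialization;

// Hypothetical helper: builds each XmlSerializer once and reuses it.
public static class XmlSerializerCache
{
    private static readonly Dictionary<Type, XmlSerializer> cache =
        new Dictionary<Type, XmlSerializer>();
    private static readonly object sync = new object();

    public static XmlSerializer Get(Type type)
    {
        // The lock guards the dictionary against concurrent writers.
        lock (sync)
        {
            XmlSerializer serializer;
            if (!cache.TryGetValue(type, out serializer))
            {
                // Expensive step: only reached once per type.
                serializer = new XmlSerializer(type);
                cache[type] = serializer;
            }
            return serializer;
        }
    }
}
```

In the question's `Serialize` and `Deserialize` methods, `new XmlSerializer(typeof(T))` would then become `XmlSerializerCache.Get(typeof(T))`.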

Hope this helps.




Answer 3:


Interesting... thoughts:

  • what version of CF is this; 2.0? 3.5? In particular, CF 3.5 has Delegate.CreateDelegate, which allows protobuf-net to access properties much faster than it can in CF 2.0
  • are you annotating fields or properties? Again, in CF the reflection optimisations are limited; you can get better performance in CF 3.5 with properties, as with a field the only option I have available is FieldInfo.SetValue
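
To illustrate the second point, here is a sketch of a type whose properties, rather than its fields, carry the protobuf-net attributes. `Customer` and its members are hypothetical; `[ProtoContract]` and `[ProtoMember]` are protobuf-net's own attributes, and the protobuf-net assembly is assumed to be referenced.

```csharp
using ProtoBuf;

// Hypothetical model type: the attributes sit on the properties,
// so the serializer goes through the property accessors instead
// of setting the private fields by reflection.
[ProtoContract]
public class Customer
{
    private int id;
    private string name;

    [ProtoMember(1)]
    public int Id
    {
        get { return id; }
        set { id = value; }
    }

    [ProtoMember(2)]
    public string Name
    {
        get { return name; }
        set { name = value; }
    }
}
```

Such a type can be passed to the `Serializer.Serialize` and `Serializer.Deserialize<T>` calls from the question unchanged.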

There are a number of other things that simply don't exist in CF, so it has to make compromises in a few places. For overly complex models there is also a known issue with the generics limitations of CF. A fix is underway, but it is a big change, and is taking "a while".

For info, some metrics on regular (full) .NET comparing various formats (including XmlSerializer and protobuf-net) are here.




Answer 4:


Have you tried creating custom serialization classes for your classes, instead of using XmlSerializer? XmlSerializer is a general-purpose serializer that creates a bunch of classes at runtime. There's a tool for generating them ahead of time (sgen). You run it during your build process and it generates a custom assembly that can be used in place of XmlSerializer.

If you have Visual Studio, the option is available under the Build tab of your project's properties.




Answer 5:


Is the performance hit in serializing the objects, or writing them to the database? Since writing them is likely hitting some kind of slow storage, I'd imagine it to be a much bigger perf hit than the serialization step.

Keep in mind that the perf measurements posted by Marc Gravell are testing the performance over 1,000,000 iterations.

What kind of database are you storing them in? Are the objects serialized in memory or straight to storage? How are they being sent to the db? How big are the objects? When one is updated, do you send all of the objects to the database, or just the one that has changed? Are you caching anything in memory at all, or re-reading from storage each time?




Answer 6:


XML is often slow to process and takes up a lot of space. There have been a number of different attempts to tackle this, and the most popular today seems to be to just drop the lot in a gzip file, like with the Open Packaging Convention.
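
As a rough sketch of the gzip approach, assuming `System.IO.Compression.GZipStream` is available on the target runtime. `XmlGzip`, `Compress`, and `Decompress` are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: gzips an XML string to bytes and back.
public static class XmlGzip
{
    public static byte[] Compress(string xml)
    {
        byte[] raw = Encoding.UTF8.GetBytes(xml);
        using (MemoryStream output = new MemoryStream())
        {
            // Disposing the GZipStream flushes the final gzip block.
            using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            }
            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static string Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            // Copy the decompressed bytes out in chunks.
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
            byte[] raw = output.ToArray();
            return Encoding.UTF8.GetString(raw, 0, raw.Length);
        }
    }
}
```

This trades CPU time for storage space, so on a constrained device it would need to be measured before being adopted.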

The W3C has shown the gzip approach to be less than optimal, and they and various other groups have been working on a better binary serialisation suitable for fast processing and compression, for transmission.



Source: https://stackoverflow.com/questions/1092020/xml-vs-binary-performance-for-serialization-deserialization
