High performance serialization: Java vs Google Protocol Buffers vs …?

借酒劲吻你 2020-12-04 08:05

For some caching I'm planning for an upcoming project, I've been considering Java serialization. Namely, should it be used?

Now I've previously writt

  • 2020-12-04 08:16

    Here is the off-the-wall suggestion of the day :-) (you just tweaked something in my head that I now want to try)...

    If you can go for a whole caching solution built on this, it might work: Project Darkstar. It is designed as a very high-performance game server, specifically so that reads are fast (which makes it good for a cache). It has Java and C APIs, so I believe (though it has been a long time since I looked at it, and I wasn't thinking of this use then) that you could save objects with Java and read them back in C, and vice versa.

    If nothing else it'll give you something to read up on today :-)

  • 2020-12-04 08:19

    For wire-friendly serialisation, consider using the Externalizable interface. Used cleverly, it gives you the intimate knowledge needed to decide how to optimally marshal and unmarshal specific fields. That said, you'll need to manage the versioning of each object correctly: unmarshalling is easy, but re-marshalling a V2 object when your code only supports V1 will either break, lose information, or, worse, corrupt data in a way your apps can't correctly process. If you're looking for an optimal path, beware: no library will solve your problem without some compromises. Libraries generally fit most use cases, and if you've opted for an active open-source project, they come with the added benefit of adapting and improving over time without your input. They might also add performance problems, introduce bugs, and even fix bugs that haven't affected you yet!
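    A minimal sketch of the versioning advice, assuming a hypothetical `CachedUser` cache entry (the class, its fields, and the version numbers are illustrative, not from the answer). The first byte written is a format version, and `readExternal` branches on it, so a version-1 record still loads with a default for the field it lacks, while a record from a newer writer is rejected instead of being misread.

```java
import java.io.*;

// Hypothetical cache entry; field layout and version numbers are illustrative only.
class CachedUser implements Externalizable {
    // Bump this whenever the wire layout changes.
    private static final byte FORMAT_VERSION = 2;

    private long id;
    private String name;
    private String email; // added in format version 2

    // Externalizable requires a public no-arg constructor.
    public CachedUser() { }

    public CachedUser(long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(FORMAT_VERSION); // version tag first, so readers can branch on it
        out.writeLong(id);
        out.writeUTF(name);
        out.writeUTF(email);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        byte version = in.readByte();
        if (version > FORMAT_VERSION) {
            // Refuse data from a newer writer rather than silently dropping fields.
            throw new InvalidObjectException("Unsupported CachedUser format version " + version);
        }
        id = in.readLong();
        name = in.readUTF();
        // Version 1 had no email field: use a default instead of misreading bytes.
        email = version >= 2 ? in.readUTF() : "";
    }

    long id() { return id; }
    String name() { return name; }
    String email() { return email; }

    static byte[] toBytes(CachedUser user) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(user); // calls writeExternal
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static CachedUser fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CachedUser) in.readObject(); // calls the no-arg constructor, then readExternal
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    This only covers reading older data. The harder case the answer warns about, a V1 reader re-marshalling a V2 record, is not handled here; avoiding loss there would mean also carrying the unknown trailing bytes along.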

  • 2020-12-04 08:26

    What do you mean by high performance? If you need millisecond serialization, I suggest you use the simplest serialization approach. If you need sub-millisecond, you are likely to need a binary format. If you need much below 10 microseconds, you are likely to need custom serialization.

    I haven't seen many benchmarks for serialization/deserialization, but few of them achieve less than 200 microseconds for serialization/deserialization.
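    To see where your own objects fall relative to thresholds like these, you could time round trips yourself. This is a crude sketch, assuming JDK serialization and a hypothetical payload of nested collections with non-ASCII strings (sizes and iteration counts are arbitrary). It has no forking or statistics; for numbers you would act on, a dedicated harness such as JMH is the usual choice.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.HashMap;

final class SerBench {
    // Hypothetical payload: a map of lists, including non-ASCII strings,
    // closer to a real cache entry than a single flat class.
    static HashMap<String, ArrayList<String>> samplePayload(int entries) {
        HashMap<String, ArrayList<String>> map = new HashMap<>();
        for (int i = 0; i < entries; i++) {
            ArrayList<String> values = new ArrayList<>();
            values.add("value-" + i);
            values.add("Gr\u00fc\u00dfe-" + i);       // Latin-1 accents
            values.add("\u65e5\u672c\u8a9e-" + i);    // CJK characters
            map.put("key-" + i, values);
        }
        return map;
    }

    static byte[] jdkWrite(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Object jdkRead(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Average nanoseconds per serialize+deserialize round trip (iterations must be > 0).
    // The warm-up loop gives the JIT a chance to compile the hot paths first.
    static double nanosPerRoundTrip(Object payload, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) {
            jdkRead(jdkWrite(payload));
        }
        Object last = null;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            last = jdkRead(jdkWrite(payload));
        }
        long elapsed = System.nanoTime() - start;
        // Sanity-check the last result; this also keeps it from being dead code.
        if (!payload.equals(last)) {
            throw new IllegalStateException("round trip changed the payload");
        }
        return (double) elapsed / iterations;
    }
}
```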

    Platform-independent formats come at a cost (in effort on your part and in latency), so you may have to decide whether you want performance or platform independence. However, there is no reason you cannot have both as a configuration option that you switch between as required.
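    Keeping more than one format behind a configuration switch might look like this minimal sketch. `Quote`, `QuoteCodec`, and `QuoteCodecs.fromConfig` are hypothetical names. The hand-written `DataOutputStream` codec stands in for the "custom serialization" option; a portable format such as Protocol Buffers would be a third implementation of the same interface (not shown, to avoid assuming generated classes).

```java
import java.io.*;

// Hypothetical value type used only for this sketch.
final class Quote implements Serializable {
    private static final long serialVersionUID = 1L;
    final String symbol;
    final double price;

    Quote(String symbol, double price) {
        this.symbol = symbol;
        this.price = price;
    }
}

// One interface, several wire formats; callers never see which one is active.
interface QuoteCodec {
    byte[] encode(Quote q);
    Quote decode(byte[] data);
}

// Simplest option: built-in JDK serialization.
final class JdkQuoteCodec implements QuoteCodec {
    @Override
    public byte[] encode(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(q);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public Quote decode(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Quote) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Hand-written binary layout: compact and fast, but Java-specific and unversioned.
final class BinaryQuoteCodec implements QuoteCodec {
    @Override
    public byte[] encode(Quote q) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(q.symbol);   // 2-byte length + modified UTF-8 bytes
            out.writeDouble(q.price); // 8 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @Override
    public Quote decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return new Quote(in.readUTF(), in.readDouble());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class QuoteCodecs {
    // The codec names are hypothetical configuration values.
    static QuoteCodec fromConfig(String name) {
        switch (name) {
            case "jdk":
                return new JdkQuoteCodec();
            case "binary":
                return new BinaryQuoteCodec();
            default:
                throw new IllegalArgumentException("Unknown codec: " + name);
        }
    }
}
```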

  • 2020-12-04 08:28

    One more data point: this project:

    http://code.google.com/p/thrift-protobuf-compare/

    gives some idea of the expected performance for small objects, including Java serialization and PB.

    Results vary a lot depending on your platform, but there are some general trends.

  • 2020-12-04 08:32

    If you are torn between PB and native Java serialization on speed and efficiency, just go for PB.

    • PB was designed to achieve exactly those goals. See http://code.google.com/apis/protocolbuffers/docs/overview.html
    • PB data is very small, while Java serialization tends to replicate a whole object, including its signature. Why should my class name, field names... always be serialized, even though the receiver already knows them inside out?
    • Think about cross-language development. It gets hard if one side uses Java and the other uses C++...
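    To see the overhead the second bullet describes, this sketch serializes a hypothetical `GeoPoint` with built-in JDK serialization and checks the bytes for the class and field names, then compares it with a layout where both sides already know the schema. The schema-only encoder is plain `DataOutputStream`, used to illustrate the idea behind PB; it is not the protobuf wire format, which uses numbered, tagged fields.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

// Hypothetical message type for illustration.
final class GeoPoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final double latitude;
    final double longitude;

    GeoPoint(double latitude, double longitude) {
        this.latitude = latitude;
        this.longitude = longitude;
    }
}

final class WireSizes {
    // Built-in JDK serialization: writes a class descriptor (class name,
    // serialVersionUID, field type codes and names) before the field values.
    static byte[] jdk(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Schema-only encoding: both sides agree on the layout, so only values travel.
    static byte[] schemaOnly(GeoPoint p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeDouble(p.latitude);
            out.writeDouble(p.longitude);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if the ASCII text occurs anywhere in the byte array (naive search).
    static boolean containsAscii(byte[] haystack, String text) {
        byte[] needle = text.getBytes(StandardCharsets.US_ASCII);
        search:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue search;
                }
            }
            return true;
        }
        return false;
    }
}
```

    The schema-only form is exactly two 8-byte doubles; the JDK form is several times larger because of the descriptor.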

    Some developers suggest Thrift, but I would use Google PB because "I believe in Google" :-). Anyway, this is worth a look: http://stuartsierra.com/2008/07/10/thrift-vs-protocol-buffers

  • 2020-12-04 08:33

    You might also have a look at FST, a drop-in replacement for built-in JDK serialization that should be faster and have smaller output.

    Rough estimates from the frequent benchmarking I have done in recent years (relative speed, fastest = 100%):

    100% = binary/struct-based approaches (e.g. SBE, fst-structs)

    • inconvenient
    • postprocessing (building up "real" objects on the receiver side) may eat up the performance advantage and is never included in benchmarks

    ~10%-35% protobuf & derivates

    ~10%-30% fast serializers such as FST and KRYO

    • convenient: deserialized objects can usually be used directly, without additional manual translation code
    • can be tuned for performance (annotations, class registration)
    • preserve links in the object graph (no object is serialized twice)
    • can handle cyclic structures
    • generic solution; FST is fully compatible with JDK serialization

    ~2%-15% JDK serialization

    ~1%-15% fast JSON (e.g. Jackson)

    • cannot handle arbitrary object graphs, only a small subset of Java data structures
    • no reference restoring

    0.001%-1% full-graph JSON/XML (e.g. JSON.io)

    These numbers are meant to give only a very rough order-of-magnitude impression. Note that performance depends A LOT on the data structures being serialized/benchmarked, so single-simple-class benchmarks are mostly useless (but popular: e.g. they ignore Unicode, use no collections, ...).
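    The "preserve links" and "cyclic structures" points can be checked without any third-party library, because built-in JDK serialization (which FST aims to be compatible with) has the same property: each object is written once, and later occurrences become back-references. This sketch uses a hypothetical `Node` class and plain JDK serialization; it does not use FST or Kryo.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Hypothetical graph node; 'peers' may point back at this node, forming a cycle.
final class Node implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<Node> peers = new ArrayList<>();

    Node(String name) {
        this.name = name;
    }
}

final class GraphRoundTrip {
    // Deep copy via JDK serialization: within one stream, each object is written
    // once and repeated references are encoded as handles to the first copy.
    @SuppressWarnings("unchecked")
    static <T> T copy(T value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

    A tree-shaped JSON mapping has no notion of object identity, which is what the "no reference restoring" bullet above refers to: the shared node would come back as two separate objects, and the cycle cannot be expressed without extra configuration.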

    See also:

    http://java-is-the-new-c.blogspot.de/2014/12/a-persistent-keyvalue-server-in-40.html

    http://java-is-the-new-c.blogspot.de/2013/10/still-using-externalizable-to-get.html
