I need to serialize a huge tree of objects (7,000) to disk. Originally we kept this tree in a database with Kodo, but it would make thousands upon thousands of queries to load t
You can use Colfer to generate the beans, and Java's standard serialization will get a 10-1000x performance boost. Unless the data exceeds 1 GB, chances are the whole thing will take well under a second.
One optimization is customizing the class descriptors, so that you store the class descriptors in a different database and in the object stream you only refer to them by ID. This reduces the space needed by the serialized data. See for example how in one project the classes SerialUtil and ClassesTable do it.
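As a rough sketch of the idea (not the code from that project), you can override writeClassDescriptor() and readClassDescriptor() so that only an int ID goes into the stream. DescriptorTable, CompactObjectOutputStream, CompactObjectInputStream and CompactDemo are hypothetical names, and the table here lives in memory; in practice it would have to be persisted somewhere separate and be available again when reading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory table mapping class names to small integer IDs.
// Assumption: a real one would be stored outside the object stream.
class DescriptorTable {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    int idFor(String className) {
        Integer id = ids.get(className);
        if (id == null) {
            id = names.size();
            ids.put(className, id);
            names.add(className);
        }
        return id;
    }

    String nameFor(int id) {
        return names.get(id);
    }
}

class CompactObjectOutputStream extends ObjectOutputStream {
    private final DescriptorTable table;

    CompactObjectOutputStream(OutputStream out, DescriptorTable table) throws IOException {
        super(out);
        this.table = table;
    }

    // Write a 4-byte ID instead of the full class descriptor.
    @Override
    protected void writeClassDescriptor(ObjectStreamClass desc) throws IOException {
        writeInt(table.idFor(desc.getName()));
    }
}

class CompactObjectInputStream extends ObjectInputStream {
    private final DescriptorTable table;

    CompactObjectInputStream(InputStream in, DescriptorTable table) throws IOException {
        super(in);
        this.table = table;
    }

    // Rebuild the descriptor from the local class named by the ID.
    @Override
    protected ObjectStreamClass readClassDescriptor() throws IOException, ClassNotFoundException {
        Class<?> cls = Class.forName(table.nameFor(readInt()));
        return ObjectStreamClass.lookup(cls);
    }
}

class CompactDemo {
    static byte[] write(Object o, DescriptorTable table) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new CompactObjectOutputStream(buf, table)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Object read(byte[] data, DescriptorTable table) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new CompactObjectInputStream(new ByteArrayInputStream(data), table)) {
            return in.readObject();
        }
    }

    // Plain java.io serialization, for size comparison only.
    static byte[] writeStandard(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }
}
```

Note that this trades away the stream's built-in version check: the reader trusts whatever class the ID resolves to locally, so the table and the class definitions have to stay in sync.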
Making classes Externalizable instead of Serializable can give some performance benefits. The downside is that it requires lots of manual work.
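To give an idea of the manual work involved, here is a minimal sketch. PointRecord and its two fields are made up for illustration; the point is that you write and read every field yourself, in the same order, and that the class needs a public no-arg constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical payload class; the fields are assumptions for illustration.
class PointRecord implements Externalizable {
    int id;
    String label; // assumed non-null, since writeUTF rejects null

    // Externalizable requires a public no-arg constructor.
    public PointRecord() {}

    PointRecord(int id, String label) {
        this.id = id;
        this.label = label;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(label);
    }

    // Fields must be read back in exactly the order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException {
        id = in.readInt();
        label = in.readUTF();
    }
}

class ExternalizableDemo {
    static byte[] toBytes(PointRecord r) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(r);
        }
        return buf.toByteArray();
    }

    static PointRecord fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (PointRecord) in.readObject();
        }
    }
}
```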
Then there are other serialization libraries, for example jserial, which can give better performance than Java's default serialization. Also, if the object graph does not include cycles, then it can be serialized a little bit faster, because the serializer does not need to keep track of objects it has seen (see "How does it work?" in jserial's FAQ).
Don't forget to use the 'transient' keyword for instance variables that don't have to be serialized. This gives you a performance boost because you are no longer reading/writing unnecessary data.
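A small illustration, with a made-up CachedNode class: the derived cachedPath field is skipped on write and comes back as null on read, so you have to recompute it (or tolerate null) after loading.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class; the field names are assumptions for illustration.
class CachedNode implements Serializable {
    private static final long serialVersionUID = 1L;

    String name;
    transient String cachedPath; // derived value, not written to the stream

    CachedNode(String name) {
        this.name = name;
        this.cachedPath = "/root/" + name;
    }
}

class TransientDemo {
    static CachedNode roundTrip(CachedNode node) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(node);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (CachedNode) in.readObject();
        }
    }
}
```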
This is how I would do it, off the top of my head:
Serialization
Unserialization
Edit: you might need to use two-pass serialization and unserialization if you have circular references in there. That complicates things a bit, but not that much.
For performance, I'd suggest not using java.io serialisation at all. Instead, get down to the bytes yourself.
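One way to read "get down to the bytes" is a hand-rolled format on top of DataOutputStream. The sketch below is only an assumption about what that could look like: RawNode and RawTreeCodec are hypothetical names, and the format is simply the node's name, then its child count, then the children in order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical tree node; a real node would carry more fields.
class RawNode {
    final String name;
    final List<RawNode> children = new ArrayList<>();

    RawNode(String name) {
        this.name = name;
    }
}

class RawTreeCodec {
    // Pre-order layout: name, child count, then each child.
    static void write(RawNode node, DataOutputStream out) throws IOException {
        out.writeUTF(node.name);
        out.writeInt(node.children.size());
        for (RawNode child : node.children) {
            write(child, out);
        }
    }

    static RawNode read(DataInputStream in) throws IOException {
        RawNode node = new RawNode(in.readUTF());
        int count = in.readInt();
        for (int i = 0; i < count; i++) {
            node.children.add(read(in));
        }
        return node;
    }

    static byte[] toBytes(RawNode root) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            write(root, out);
        }
        return buf.toByteArray();
    }

    static RawNode fromBytes(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return read(in);
        }
    }
}
```

The recursion here is only as deep as the tree itself, and there are no class descriptors or object handles in the output. The price is that shared or circular references are not handled, and any change to the node fields means changing the format by hand.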
If you are going to java.io serialise the tree you might need to make sure your recursion doesn't get too deep, either by flattening (as say TreeSet does) or arranging to serialise the deepest nodes first (so you have back references rather than nested readObject calls).
I would be surprised if there wasn't a way in Kodo to load the entire tree in one go (or a few).
I would recommend implementing custom writeObject() and readObject() methods. That way you can avoid writing the child nodes for each node in the tree. With default serialization, each node is serialized together with all its children.
For example, writeObject() of a Tree class should iterate through all the nodes of the tree and write only each node's data (not the Node objects themselves), along with markers that identify the tree level.
You can look at LinkedList to see how these methods are implemented there. It uses the same approach to avoid writing the prev and next entries for every single entry.
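Here is a minimal sketch of that approach, under some assumptions of mine: FlatTree and its nested Node are hypothetical classes, node data is a non-null String, the level marker is the node's depth written as an int, and -1 marks the end of the node list. The root field is transient so that default serialization never walks the node graph.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical tree class; only node data and depth markers are written.
class FlatTree implements Serializable {
    private static final long serialVersionUID = 1L;

    // Deliberately not Serializable: nodes never go through writeObject().
    static class Node {
        final String data;
        final List<Node> children = new ArrayList<>();

        Node(String data) {
            this.data = data;
        }
    }

    transient Node root;

    FlatTree(Node root) {
        this.root = root;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        // Iterative pre-order walk, so deep trees do not deepen the call stack.
        Deque<Node> nodes = new ArrayDeque<>();
        Deque<Integer> depths = new ArrayDeque<>();
        if (root != null) {
            nodes.push(root);
            depths.push(0);
        }
        while (!nodes.isEmpty()) {
            Node node = nodes.pop();
            int depth = depths.pop();
            out.writeInt(depth);
            out.writeUTF(node.data);
            // Push children in reverse so they are written in original order.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                nodes.push(node.children.get(i));
                depths.push(depth + 1);
            }
        }
        out.writeInt(-1); // end-of-tree marker
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // path.get(d) is the most recently read node at depth d.
        List<Node> path = new ArrayList<>();
        int depth;
        while ((depth = in.readInt()) != -1) {
            Node node = new Node(in.readUTF());
            if (depth == 0) {
                root = node;
            } else {
                path.get(depth - 1).children.add(node);
            }
            while (path.size() > depth) {
                path.remove(path.size() - 1);
            }
            path.add(node);
        }
    }

    static byte[] toBytes(FlatTree tree) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(tree);
        }
        return buf.toByteArray();
    }

    static FlatTree fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (FlatTree) in.readObject();
        }
    }
}
```

Because the nodes are written as a flat sequence, reading them back needs no nested readObject() calls, however deep the tree is.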