In short, I\'d like to learn/develop an elegant method to save a binary tree to disk (a general tree, not necessarily a BST). Here is the description of my problem:
I\'
I would do a Level-order traversal. That is to say you are basically doing a Breadth-first search algorithm.
You have:
Level-order traversal sequence: F, B, G, A, D, I, C, E, H
What you will store on disk: F, B, G, A, D, NullNode, I, NullNode, NullNode, C, E, H, NullNode
Loading it back from disk is even easier. Simply read from left to right the nodes you stored to disk. This will give you each level's left and right nodes. I.e. the tree will fill in from top to bottom left to right.
Step 1 reading in:
F
Step 2 reading in:
F
B
Step 3 reading in:
F
B G
Step 4 reading in:
F
B G
A
And so on ...
Note: Once you have a NULL node representation, you no longer need to list its children to disk. When loading back you will know to skip to the next node. So for very deep trees, this solution will still be efficient.
The most arbitrary simple way is just a basic format that can be used to represent any graph.
<parent>,<relation>,<child>
Ie:
"Is it Red", "yes", "does it have wings"
"Is it Red", "no" , "does it swim"
There isn't much redundancy here, and the formats mostly human readable, the only data duplication is that there must be a copy of a parent for every direct child it has.
The only thing you really have to watch is that you don't accidentally generate a cycle ;)
Unless that's what you want.
The problem here is rebuilding the tree afterwards. If I create the "does it have wings" object upon reading the first line, I have to somehow locate it when I later encounter the line reading "does it have wings","yes","Has it got a beak?"
This is why I traditionally just use graph structures in memory for such a thing with pointers going everywhere.
[0x1111111 "Is It Red" => [ 'yes' => 0xF752347 , 'no' => 0xFF6F664 ],
0xF752347 "does it have wings" => [ 'yes' => 0xFFFFFFF , 'no' => 0x2222222 ],
0xFF6F664 "does it swim" => [ 'yes' => "I Dont KNOW :( " , ... etc etc ]
Then the "child/parent" connectivity is merely metadata.