How would you get tree-structured data from a database with the best performance? For example, say you have a folder-hierarchy in a database. Where the folder-database-row h
This article is interesting as it shows some retrieval methods as well as a way to store the lineage as a derived column. The lineage provides a shortcut method to retrieve the hierarchy without too many joins.
It really depends on how you are going to access the tree.
One clever technique is to give every node a string id, where the parent's id is a predictable substring of the child. For example, the parent could be '01', and the children would be '0100', '0101', '0102', etc. This way you can select an entire subtree from the database at once with:
SELECT * FROM treedata WHERE id LIKE '0101%';
Because the criterion is an initial substring, an index on the ID column would speed the query.
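As a rough illustration of the prefix-ID idea above, here is a minimal sketch using Python's built-in sqlite3 module. The `treedata` table and `id` column come from the query above; the `name` column, the sample rows, the two-characters-per-level width, and the `subtree` helper are assumptions made up for the example. Whether the `LIKE` prefix actually uses an index depends on the database and its collation settings, so treat that part as something to verify on your own system.

```python
import sqlite3

# In-memory database with made-up sample data (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE treedata (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO treedata (id, name) VALUES (?, ?)",
    [
        ("01", "root"),
        ("0100", "docs"),
        ("0101", "src"),
        ("010100", "src/lib"),
        ("010101", "src/bin"),
        ("0102", "tests"),
    ],
)


def subtree(conn, node_id):
    """Return the node and all of its descendants, ordered by id."""
    # The ids here contain only digits, so no LIKE wildcard escaping is needed.
    rows = conn.execute(
        "SELECT id, name FROM treedata WHERE id LIKE ? ORDER BY id",
        (node_id + "%",),
    )
    return rows.fetchall()


src_subtree = subtree(conn, "0101")
for row in src_subtree:
    print(row)
```

Each level adds a fixed two characters to the id in this sketch. With variable-width segments a prefix such as '01' could also match an unrelated sibling like '011', so a fixed width or a separator character is probably worth having.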
This won't work for every situation, but take a comment structure as an example:
ID | ParentCommentID
You could also store a TopCommentID, which represents the topmost comment:
ID | ParentCommentID | TopCommentID
Here TopCommentID and ParentCommentID are null or 0 when the row is the topmost comment. For child comments, ParentCommentID points to the comment above it, and TopCommentID points to the topmost parent.
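The TopCommentID layout described above could be used roughly like this: one query pulls a whole thread, and the nesting is rebuilt in memory from ParentCommentID. This is only a sketch under assumptions: the `comments` table name, the `body` column, the sample rows, and the `load_thread` helper are invented for illustration, and null (rather than 0) marks a topmost comment.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    " ID INTEGER PRIMARY KEY,"
    " ParentCommentID INTEGER,"
    " TopCommentID INTEGER,"
    " body TEXT)"
)
# Made-up sample data: two threads, rooted at comments 1 and 5.
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?, ?)",
    [
        (1, None, None, "first thread"),
        (2, 1, 1, "reply to 1"),
        (3, 2, 1, "reply to 2"),
        (4, 1, 1, "another reply to 1"),
        (5, None, None, "second thread"),
        (6, 5, 5, "reply to 5"),
    ],
)


def load_thread(conn, top_id):
    """Fetch one whole thread in a single query and group replies by parent."""
    rows = conn.execute(
        "SELECT ID, ParentCommentID, body FROM comments"
        " WHERE ID = ? OR TopCommentID = ? ORDER BY ID",
        (top_id, top_id),
    ).fetchall()
    # Map each parent id to the ids of its direct replies.
    children = defaultdict(list)
    for comment_id, parent_id, _body in rows:
        if parent_id is not None:
            children[parent_id].append(comment_id)
    return rows, dict(children)


rows, children = load_thread(conn, 1)
print(children)
```

This only helps when you want an entire thread at once; fetching the subtree under an arbitrary reply still needs one of the other techniques in this thread.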
Google for "Materialized Path" or "Genetic Trees"...
If you have many trees in the database, and you will only ever get the whole tree out, I would store a tree ID (or root node ID) and a parent node ID for each node in the database, get all the nodes for a particular tree ID, and process in memory.
However, if you will be getting subtrees out, a parent node ID only lets you select the direct children of a given node. You either need to store all ancestor nodes of each node to use the above method, or perform multiple SQL queries as you descend into the tree (hope there are no cycles in your tree!). You can reuse the same prepared statement for each query (assuming that the nodes are of the same type and are all stored in a single table) to avoid recompiling the SQL, so it might not be slower; with database optimisations applied to the query, it could even be preferable. You might want to run some tests to find out.
If you are only storing one tree, your question becomes one of querying subtrees only, and the second answer applies.
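The "multiple queries as you descend" approach might look something like the sketch below, which walks the tree level by level with one parameterized query and keeps a set of visited ids so a cycle cannot cause an endless loop. The `nodes` table, its columns, the sample rows, and the `descendants` helper are assumptions for illustration. Python's sqlite3 module caches compiled statements per connection, so reusing the same SQL string plays roughly the role of the prepared statement here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, parent_id INTEGER)")
# Made-up sample tree: 1 -> (2, 3), 2 -> 4, 4 -> 5, 3 -> 6.
conn.executemany(
    "INSERT INTO nodes VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2), (5, 4), (6, 3)],
)

# The same SQL text is reused for every lookup.
CHILDREN_SQL = "SELECT id FROM nodes WHERE parent_id = ? ORDER BY id"


def descendants(conn, root_id):
    """Collect all descendant ids of root_id, one tree level at a time."""
    seen = {root_id}  # guards against cycles in bad data
    result = []
    frontier = [root_id]
    while frontier:
        next_frontier = []
        for node_id in frontier:
            for (child_id,) in conn.execute(CHILDREN_SQL, (node_id,)):
                if child_id in seen:
                    continue  # already visited: skip instead of looping forever
                seen.add(child_id)
                result.append(child_id)
                next_frontier.append(child_id)
        frontier = next_frontier
    return result


print(descendants(conn, 2))
```

This issues one query per visited node, so the cost grows with the size of the subtree rather than its depth. Measuring it against the store-all-ancestors approach on real data is probably the only way to know which is faster.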
Celko wrote about this (2000):
http://www.dbmsmag.com/9603d06.html
http://www.intelligententerprise.com/001020/celko1_1.jhtml;jsessionid=3DFR02341QLDEQSNDLRSKHSCJUNN2JVN?_requestid=32818
and other people asked:
Joining other tables in oracle tree queries
How to calculate the sum of values in a tree using SQL
How to store directory / hierarchy / tree structure in the database?
Performance of recursive stored procedures in MYSQL to get hierarchical data
What is the most efficient/elegant way to parse a flat table into a tree?
Finally, you could look at the Rails "acts_as_tree" (cheap writes, costlier subtree reads) and "acts_as_nested_set" (cheap subtree reads, costlier writes) plugins. I don't have a good link comparing them.