I\'m writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data. I\'m using a UI library to
I'm suprised that nobody mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.
In this approach, every node in the tree has a column path, where the full path from the root to the node is stored. This involves very simple and fast queries.
Have a look at the example table node:
+---------+-------+
| node_id | path |
+---------+-------+
| 0 | |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 1.4 |
| 5 | 2.5 |
| 6 | 2.6 |
| 7 | 2.6.7 |
| 8 | 2.6.8 |
| 9 | 2.6.9 |
+---------+-------+
In order to get the children of node x, you can write the following query:
SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
Keep in mind, that the column path should be indexed, in order to perform fast with the LIKE clause.
The best way, I think indeed is to give each node an id and a parent_id, where the parent id is the id of the parent node. This has a couple of benefits
The easiest implementation is adjacency list structure:
id parent_id data
However, some databases, particularly MySQL
, have some issues in handling this model, because it requires an ability to run recursive queries which MySQL
lacks.
Another model is nested sets:
id lft rgt data
where lft
and rgt
are arbitrary values that define the hierarchy (any child's lft
, rgt
should be within any parent's lft
, rgt
)
This does not require recursive queries, but it slower and harder to maintain.
However, in MySQL
this can be improved using SPATIAL
abitilies.
See these articles in my blog:
for more detailed explanations.
I've bookmarked this slidshare about SQL-Antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed
The recommendation from there is to use a Closure Table (it's explained in the slides).
Here is the summary (slide 77):
| Query Child | Query Subtree | Modify Tree | Ref. Integrity
Adjacency List | Easy | Hard | Easy | Yes
Path Enumeration | Easy | Easy | Hard | No
Nested Sets | Hard | Easy | Hard | No
Closure Table | Easy | Easy | Easy | Yes
Something like table "nodes" where each node row contains parent id (in addition to the ordinary node data). For root, the parent is NULL.
Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.
As this is the top answer when asking "sql trees" in a google search, I will try to update this from the perspective of today (december 2018).
Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.
Since version 8 (published april 2018) MySQL supports recursive common table expressions (CTE). MySQL is a bit late to the show but this opens up a new option.
There is a tutorial here that explains the use of recursive queries to manage an adjacency list.
As the recursion now runs completely within the database engine, it is way much faster than in the past (when it had to run in the script engine).
The blog here gives some measurements (which are both biased and for postgres instead of MySQL) but nevertheless it shows that adjacency lists do not have to be slow.
So my conclusion today is: