How to represent a data tree in SQL?

星月不相逢 2020-12-02 06:20

I'm writing a data tree structure that is combined from a Tree and a TreeNode. Tree will contain the root and the top level actions on the data. I'm using a UI library to

8 answers
  • 2020-12-02 06:34

    I'm surprised that nobody has mentioned the materialized path solution, which is probably the fastest way of working with trees in standard SQL.

    In this approach, every node in the tree has a column path, which stores the full path from the root to that node. This allows very simple and fast queries.

    Have a look at the example table node:

    +---------+-------+
    | node_id | path  |
    +---------+-------+
    | 0       |       |
    | 1       | 1     |
    | 2       | 2     |
    | 3       | 3     |
    | 4       | 1.4   |
    | 5       | 2.5   |
    | 6       | 2.6   |
    | 7       | 2.6.7 |
    | 8       | 2.6.8 |
    | 9       | 2.6.9 |
    +---------+-------+
    

    In order to get all descendants of node x (its children, grandchildren and so on), you can write the following query:

    SELECT * FROM node WHERE path LIKE CONCAT((SELECT path FROM node WHERE node_id = x), '.%')
    

    Keep in mind that the path column should be indexed so that the LIKE clause performs well.
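
    The example table and query above can be tried out with a small script. This is only a sketch: it assumes SQLite (through Python's built-in sqlite3 module) as a stand-in for your own engine, uses SQLite's `||` operator in place of CONCAT, and the index name node_path_idx is made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a real engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (node_id INTEGER PRIMARY KEY, path TEXT NOT NULL)")
# Hypothetical index name; the point is that path is indexed.
conn.execute("CREATE INDEX node_path_idx ON node (path)")
conn.executemany(
    "INSERT INTO node (node_id, path) VALUES (?, ?)",
    [(0, ""), (1, "1"), (2, "2"), (3, "3"), (4, "1.4"),
     (5, "2.5"), (6, "2.6"), (7, "2.6.7"), (8, "2.6.8"), (9, "2.6.9")],
)


def descendants(node_id):
    """Return the ids of all nodes whose path starts with the path of node_id."""
    rows = conn.execute(
        "SELECT node_id FROM node "
        "WHERE path LIKE (SELECT path FROM node WHERE node_id = ?) || '.%' "
        "ORDER BY node_id",
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(descendants(2))  # [5, 6, 7, 8, 9]
print(descendants(6))  # [7, 8, 9]
```

    Note that with this table layout the root (node 0) has an empty path, so the pattern becomes '.%' and matches nothing; the root would need separate handling.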

  • 2020-12-02 06:41

    The best way, I think, is indeed to give each node an id and a parent_id, where parent_id is the id of the parent node. This has a couple of benefits:

    1. When you want to update a node, you only have to rewrite the data of that node.
    2. When you want to query only a certain node, you can get exactly the information you want, which means less overhead on the database connection.
    3. A lot of programming languages have functionality to transform MySQL data into XML or JSON, which makes it easier to open up your application through an API.
  • 2020-12-02 06:43

    The easiest implementation is adjacency list structure:

    id  parent_id  data
    

    However, some databases, particularly MySQL before version 8.0, have issues handling this model, because it requires the ability to run recursive queries, which those MySQL versions lack.

    Another model is nested sets:

    id lft rgt data
    

    where lft and rgt are arbitrary values that define the hierarchy (any child's lft and rgt must lie within its parent's lft and rgt).

    This does not require recursive queries, but it is slower and harder to maintain.

    However, in MySQL this can be improved using SPATIAL abilities.
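
    To make the lft/rgt containment rule concrete, here is a minimal sketch of a nested sets subtree query. It assumes SQLite via Python's sqlite3 module as a stand-in engine; the table name tree and the sample rows are hypothetical, and it uses a plain range comparison rather than the SPATIAL variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name; columns follow the "id lft rgt data" layout.
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, lft INTEGER, rgt INTEGER, data TEXT)"
)
# Sample tree: root(1..10) contains A(2..7) and B(8..9); A contains A1 and A2.
conn.executemany(
    "INSERT INTO tree (id, lft, rgt, data) VALUES (?, ?, ?, ?)",
    [(1, 1, 10, "root"), (2, 2, 7, "A"), (3, 3, 4, "A1"),
     (4, 5, 6, "A2"), (5, 8, 9, "B")],
)


def subtree(node_id):
    """Return ids of all nodes whose (lft, rgt) lies strictly inside node_id's range."""
    rows = conn.execute(
        "SELECT c.id FROM tree AS c "
        "JOIN tree AS p ON c.lft > p.lft AND c.rgt < p.rgt "
        "WHERE p.id = ? ORDER BY c.lft",
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(subtree(1))  # [2, 3, 4, 5]
print(subtree(2))  # [3, 4]
```

    The maintenance cost shows up on writes: inserting a node means shifting the lft and rgt values of every node to its right.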

    See these articles in my blog:

    • Adjacency list vs. nested sets: PostgreSQL
    • Adjacency list vs. nested sets: SQL Server
    • Adjacency list vs. nested sets: Oracle
    • Adjacency list vs. nested sets: MySQL

    for more detailed explanations.

  • 2020-12-02 06:52

    I've bookmarked this SlideShare deck about SQL antipatterns, which discusses several alternatives: http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back?src=embed

    The recommendation from there is to use a Closure Table (it's explained in the slides).

    Here is the summary (slide 77):

                      | Query Child | Query Subtree | Modify Tree | Ref. Integrity
    Adjacency List    |    Easy     |     Hard      |    Easy     |      Yes
    Path Enumeration  |    Easy     |     Easy      |    Hard     |      No
    Nested Sets       |    Hard     |     Easy      |    Hard     |      No
    Closure Table     |    Easy     |     Easy      |    Easy     |      Yes
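
    For readers who do not want to open the slides, a closure table can be sketched roughly as follows. This is an illustration under assumptions, not the code from the slides: it uses SQLite via Python's sqlite3 module, and the names node, tree_path, ancestor, descendant and depth are hypothetical. The idea is to store one row for every ancestor/descendant pair, including each node paired with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE tree_path ("
    " ancestor INTEGER NOT NULL REFERENCES node(id),"
    " descendant INTEGER NOT NULL REFERENCES node(id),"
    " depth INTEGER NOT NULL,"
    " PRIMARY KEY (ancestor, descendant))"
)


def add_node(node_id, parent_id=None):
    """Insert a node plus one path row per ancestor (and one for itself)."""
    conn.execute("INSERT INTO node (id) VALUES (?)", (node_id,))
    conn.execute("INSERT INTO tree_path VALUES (?, ?, 0)", (node_id, node_id))
    if parent_id is not None:
        # Every ancestor of the parent (the parent included) is also an
        # ancestor of the new node, one level further away.
        conn.execute(
            "INSERT INTO tree_path (ancestor, descendant, depth) "
            "SELECT ancestor, ?, depth + 1 FROM tree_path WHERE descendant = ?",
            (node_id, parent_id),
        )


def query(sql, node_id):
    return [row[0] for row in conn.execute(sql, (node_id,)).fetchall()]


def children(node_id):
    return query("SELECT descendant FROM tree_path "
                 "WHERE ancestor = ? AND depth = 1 ORDER BY descendant", node_id)


def subtree(node_id):
    return query("SELECT descendant FROM tree_path "
                 "WHERE ancestor = ? AND depth > 0 ORDER BY descendant", node_id)


def ancestors(node_id):
    return query("SELECT ancestor FROM tree_path "
                 "WHERE descendant = ? AND depth > 0 ORDER BY depth", node_id)


# Sample tree: 1 is the root, 2 and 3 are its children, 4 is under 2, 5 under 4.
for nid, parent in [(1, None), (2, 1), (3, 1), (4, 2), (5, 4)]:
    add_node(nid, parent)

print(children(1))   # [2, 3]
print(subtree(2))    # [4, 5]
print(ancestors(5))  # [4, 2, 1]
```

    Both the child query and the subtree query are single non-recursive SELECTs, and the foreign keys give the referential integrity noted in the last column of the summary.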
    
  • 2020-12-02 06:52

    Something like a table "nodes" where each node row contains the parent id (in addition to the ordinary node data). For the root, the parent is NULL.

    Of course, this makes finding children a bit more time consuming, but this way the actual database will be quite simple.
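
    A rough sketch of such a table, assuming SQLite via Python's sqlite3 module as a stand-in engine; the column names id, parent_id and data and the index name are assumptions. Direct children are a single indexed lookup, while deeper levels are collected here with one query per node in application code, which is where the extra time goes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES nodes(id),"  # NULL for the root
    " data TEXT)"
)
# Hypothetical index name; it keeps the parent_id lookup cheap.
conn.execute("CREATE INDEX nodes_parent_idx ON nodes (parent_id)")
conn.executemany(
    "INSERT INTO nodes (id, parent_id, data) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "c"), (5, 4, "d")],
)


def roots():
    rows = conn.execute("SELECT id FROM nodes WHERE parent_id IS NULL").fetchall()
    return [row[0] for row in rows]


def children(node_id):
    rows = conn.execute(
        "SELECT id FROM nodes WHERE parent_id = ? ORDER BY id", (node_id,)
    ).fetchall()
    return [row[0] for row in rows]


def descendants(node_id):
    """Walk the tree level by level, issuing one children() query per node."""
    found, frontier = [], [node_id]
    while frontier:
        next_frontier = []
        for current in frontier:
            next_frontier.extend(children(current))
        found.extend(next_frontier)
        frontier = next_frontier
    return found


print(roots())         # [1]
print(children(1))     # [2, 3]
print(descendants(1))  # [2, 3, 4, 5]
```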

  • 2020-12-02 06:53

    As this is the top result when searching Google for "sql trees", I will try to update this from today's perspective (December 2018).

    Most answers imply that using an adjacency list is both simple and slow and therefore recommend other methods.

    Since version 8 (released April 2018), MySQL supports recursive common table expressions (CTEs). MySQL is a bit late to the party, but this opens up a new option.

    There is a tutorial here that explains the use of recursive queries to manage an adjacency list.

    As the recursion now runs entirely within the database engine, it is much faster than in the past (when it had to run in the script engine).

    The blog here gives some measurements (which are both biased and for PostgreSQL instead of MySQL), but it nevertheless shows that adjacency lists do not have to be slow.

    So my conclusion today is:

    • The simple adjacency list may be fast enough if the database engine supports recursion.
    • Do a benchmark with your own data and your own engine.
    • Do not trust outdated recommendations to point out the "best" method.
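
    As an illustration of the recursive approach, here is a minimal sketch of a subtree query over an adjacency list. It assumes SQLite via Python's sqlite3 module so that it runs anywhere; the table nodes, its columns and the CTE name subtree are hypothetical. The WITH RECURSIVE statement itself should carry over to MySQL 8 and PostgreSQL with little or no change, but check it against your own engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    " id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES nodes(id),"  # NULL for the root
    " name TEXT)"
)
conn.executemany(
    "INSERT INTO nodes (id, parent_id, name) VALUES (?, ?, ?)",
    [(1, None, "root"), (2, 1, "a"), (3, 1, "b"), (4, 2, "c"), (5, 4, "d")],
)

SUBTREE_SQL = """
WITH RECURSIVE subtree(id, depth) AS (
    -- anchor member: the node we start from
    SELECT id, 0 FROM nodes WHERE id = ?
    UNION ALL
    -- recursive member: children of rows found so far
    SELECT n.id, s.depth + 1
    FROM nodes AS n
    JOIN subtree AS s ON n.parent_id = s.id
)
SELECT id, depth FROM subtree ORDER BY depth, id
"""


def subtree(node_id):
    """Return (id, depth) for node_id and all of its descendants in one query."""
    return conn.execute(SUBTREE_SQL, (node_id,)).fetchall()


print(subtree(2))  # [(2, 0), (4, 1), (5, 2)]
print(subtree(1))  # [(1, 0), (2, 1), (3, 1), (4, 2), (5, 3)]
```

    The whole traversal is a single statement, so the engine does the recursion and the application makes one round trip.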