Store multidimensional array in database: relational or multidimensional?

栀梦 2021-02-14 02:14

I have read numerous posts along the lines of multidimensional to single dimension, multidimensional database, and so on, but none of the answers helped. I did

6 answers
  • 2021-02-14 02:21

    I have two words for you... "RANGE KEYS"

    You may find this technique to be incredibly powerful and flexible. You'll be able to navigate your hierarchies with ease, and support variable depth aggregation without the need for recursion.

    In the demonstration below, we'll build the hierarchy via a recursive CTE. For larger hierarchies (150K+), I'm willing to share a much faster build if needed.

    Since your hierarchies are slow moving (like mine), I tend to store them in a normalized structure and rebuild as necessary.

    How about some actual code?

    Declare @YourTable table (ID varchar(25),Pt varchar(25))
    Insert into @YourTable values 
    ('A'   ,NULL),
    ('AA'  ,'A'),
    ('AAA' ,'AA'),
    ('AAC' ,'AA'),
    ('AB'  ,'A'),
    ('AE'  ,'A'),
    ('AEA' ,'AE'),
    ('AEE' ,'AE'),
    ('AEEB','AEE')
    
    
    Declare @Top  varchar(25) = null     --<<  Sets top of Hier Try 'AEE'
    Declare @Nest varchar(25) ='|-----'  --<<  Optional: Added for readability
    
    IF OBJECT_ID('TestHier') IS NOT NULL 
    Begin
        Drop Table TestHier
    End
    
    ;with cteHB as (
          Select Seq  = cast(1000+Row_Number() over (Order by ID) as varchar(500))
                ,ID
                ,Pt
                ,Lvl=1
                ,Title = ID
          From   @YourTable 
          Where  IsNull(@Top,'TOP') = case when @Top is null then isnull(Pt,'TOP') else ID end
          Union  All
          Select cast(concat(cteHB.Seq,'.',1000+Row_Number() over (Order by cteCD.ID)) as varchar(500))
                ,cteCD.ID
                ,cteCD.Pt
                ,cteHB.Lvl+1
                ,cteCD.ID
          From   @YourTable cteCD 
          Join   cteHB on cteCD.Pt = cteHB.ID)
         ,cteR1 as (Select Seq,ID,R1=Row_Number() over (Order By Seq) From cteHB)
         ,cteR2 as (Select A.Seq,A.ID,R2=Max(B.R1) From cteR1 A Join cteR1 B on (B.Seq like A.Seq+'%') Group By A.Seq,A.ID )
    Select B.R1  
          ,C.R2
          ,A.ID
          ,A.Pt
          ,A.Lvl
          ,Title = Replicate(@Nest,A.Lvl-1) + A.Title
     Into dbo.TestHier
     From cteHB A
     Join cteR1 B on A.ID=B.ID
     Join cteR2 C on A.ID=C.ID
     Order By B.R1
    

    Show the entire hierarchy (I added the Title and nesting for readability)

    Select * from TestHier Order By R1
    

    Just to state the obvious, the Range Keys are R1 and R2. You may also notice that R1 maintains the presentation sequence. Leaf nodes are where R1=R2 and Parents or rollups define the span of ownership.
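
    To make the R1/R2 idea concrete outside SQL Server, here is a minimal Python sketch that numbers the sample rows depth-first. It is not the T-SQL build above, just an illustration under the assumption that siblings are ordered by ID; `build_range_keys` and `descendants_of` are hypothetical helper names.

```python
from collections import defaultdict

# (ID, parent) pairs copied from the sample table above.
ROWS = [
    ("A", None), ("AA", "A"), ("AAA", "AA"), ("AAC", "AA"), ("AB", "A"),
    ("AE", "A"), ("AEA", "AE"), ("AEE", "AE"), ("AEEB", "AEE"),
]


def build_range_keys(rows):
    """Return {ID: (R1, R2)} from a depth-first walk, siblings sorted by ID."""
    children = defaultdict(list)
    for node_id, parent in rows:
        children[parent].append(node_id)

    keys = {}
    counter = 0

    def visit(node_id):
        nonlocal counter
        counter += 1
        r1 = counter                     # position in presentation order
        for child in sorted(children[node_id]):
            visit(child)
        keys[node_id] = (r1, counter)    # R2 = last R1 handed out in this subtree

    for root in sorted(children[None]):
        visit(root)
    return keys


RANGE_KEYS = build_range_keys(ROWS)


def descendants_of(node_id, keys=RANGE_KEYS):
    """IDs whose R1 falls inside node_id's range (the node itself included)."""
    lo, hi = keys[node_id]
    return sorted(k for k, (r1, _) in keys.items() if lo <= r1 <= hi)
```

    With this numbering a leaf such as 'AB' gets R1 = R2, while 'AE' gets a range that spans its whole subtree.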


    To Show All Descendants

    Declare @GetChildrenOf varchar(25) = 'AE'
    Select A.*
      From TestHier A
      Join TestHier B on B.ID=@GetChildrenOf and A.R1 Between B.R1 and B.R2
      Order By R1
    


    To Show Path

    Declare @GetParentsOf varchar(25) = 'AEEB'
    Select A.*
      From TestHier A
      Join TestHier B on B.ID=@GetParentsOf and B.R1 Between A.R1 and A.R2
      Order By R1
    

    Clearly these are rather simple illustrations. Over time, I have created a series of helper functions, both scalar and table-valued. I should also state that you should NEVER hard-code range keys in your work, because they will change.

    In Summary

    If you have a point (or even a series of points), you'll have its range and therefore you'll immediately know where it resides and what rolls into it.
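
    As a rough illustration of the "what rolls into it" part, the sketch below runs a non-recursive rollup in SQLite from Python. The R1/R2 values are the ones a depth-first numbering of the sample would give, and the `Amounts` table with its figures is purely hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TestHier (R1 INTEGER, R2 INTEGER, ID TEXT PRIMARY KEY);
    INSERT INTO TestHier VALUES
        (1, 9, 'A'), (2, 4, 'AA'), (3, 3, 'AAA'), (4, 4, 'AAC'), (5, 5, 'AB'),
        (6, 9, 'AE'), (7, 7, 'AEA'), (8, 9, 'AEE'), (9, 9, 'AEEB');
    -- Hypothetical fact table: one amount per leaf node.
    CREATE TABLE Amounts (ID TEXT, Amount INTEGER);
    INSERT INTO Amounts VALUES
        ('AAA', 10), ('AAC', 20), ('AB', 5), ('AEA', 7), ('AEEB', 3);
""")

# Every node P collects the amounts of all nodes C inside its R1..R2 range.
ROLLUP_SQL = """
    SELECT P.ID, SUM(F.Amount) AS Total
    FROM   TestHier P
    JOIN   TestHier C ON C.R1 BETWEEN P.R1 AND P.R2
    JOIN   Amounts  F ON F.ID = C.ID
    GROUP  BY P.ID, P.R1
    ORDER  BY P.R1
"""
totals = dict(conn.execute(ROLLUP_SQL).fetchall())
```

    The same join works at any depth, which is why no recursion is needed once the keys exist.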

  • 2021-02-14 02:25

    You absolutely can do that (if I've read your question correctly).

    Depending on your RDBMS you might have to choose a different way.

    Your basic structure of having a parent is correct.

    In SQL Server, use a recursive common table expression (CTE) to anchor the start and work down:

    https://technet.microsoft.com/en-us/library/ms186243(v=sql.105).aspx

    Edit: On Linux, use the same approach in PostgreSQL: https://www.postgresql.org/docs/current/static/queries-with.html

    Oracle has a different approach, though I think you might be able to use the CTE as well.

    https://oracle-base.com/articles/misc/hierarchical-queries

    For 100k rows I don't imagine performance will be an issue, though I'd still index PK & FK because that's the right thing to do. If you're really concerned about speed then reading it into memory and building a hash table of linked lists might work.
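
    One possible reading of the in-memory idea, sketched in Python: a dict that maps each parent to a list of its children, walked breadth-first. The rows are hypothetical stand-ins for whatever the database returns, and `build_children_index` and `descendants` are illustrative names.

```python
from collections import defaultdict, deque

# Hypothetical (id, parent_id) rows as they might come back from the database.
ROWS = [
    ("A", None), ("AA", "A"), ("AAA", "AA"), ("AAC", "AA"), ("AB", "A"),
    ("AE", "A"), ("AEA", "AE"), ("AEE", "AE"), ("AEEB", "AEE"),
]


def build_children_index(rows):
    """Map each parent id to the list of its direct children."""
    index = defaultdict(list)
    for node_id, parent_id in rows:
        index[parent_id].append(node_id)
    return index


def descendants(index, start):
    """Breadth-first walk; returns start followed by all of its descendants."""
    found = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        found.append(node)
        queue.extend(index.get(node, []))
    return found


CHILDREN = build_children_index(ROWS)
```

    Building the index is a single pass over the rows, and each lookup touches only the nodes in the requested subtree.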

    Pros & cons - it pretty much comes down to readability and suitability for your RDBMS.

    It's an already solved problem (again, assuming I've not missed anything) so you'll be fine.

  • 2021-02-14 02:25

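    For your scenario, I would suggest the nested sets approach in PostgreSQL. Each node stores a left and a right bound, nested like the opening and closing tags of an XML document, so a subtree can be fetched with an ordinary relational range query.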

    Performance

    If you index the lft and rgt columns, you don't need recursive queries to get the data. Even though the data seems huge, retrieval will be very fast.

    Sample

    /*
    A:
      AA:
        AAA
        AAC
      AB
      AE:
        AEA
        AEE:
          AEEB
    B:
    */
    
    CREATE TABLE tree(id int, CELL varchar(4), lft int, rgt int);
        
    INSERT INTO tree
        ("id", CELL, "lft", "rgt")
    VALUES
        (1, 'A', 1, 9),
        (2, 'AA', 2, 4),
        (3, 'AAA', 3, 3),
        (4, 'AAC', 4, 4),
        (5, 'AB', 5, 5),
        (6, 'AE', 6, 9),
        (7, 'AEA', 7, 7),
        (8, 'AEE', 8, 9),
        (9, 'AEEB', 9, 9)
    ;
    
    
    SELECT  hc.*
    FROM    tree hp
    JOIN    tree hc
    ON      hc.lft BETWEEN hp.lft AND hp.rgt
    WHERE   hp.id = 2
    

    Demo

    Querying using Nested Sets approach
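
    If you want to try the sample without a PostgreSQL server, here is a small sketch using Python's built-in sqlite3 module. It assumes AEE's rgt is 9 so that its range covers AEEB; the index name and the `subtree` and `ancestors` helpers are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tree (id INTEGER PRIMARY KEY, cell TEXT, lft INTEGER, rgt INTEGER);
    CREATE INDEX tree_lft_rgt ON tree (lft, rgt);
    INSERT INTO tree (id, cell, lft, rgt) VALUES
        (1, 'A', 1, 9), (2, 'AA', 2, 4), (3, 'AAA', 3, 3), (4, 'AAC', 4, 4),
        (5, 'AB', 5, 5), (6, 'AE', 6, 9), (7, 'AEA', 7, 7),
        (8, 'AEE', 8, 9),   -- rgt = 9 so the range covers AEEB
        (9, 'AEEB', 9, 9);
""")


def subtree(cell):
    """Return the named cell and its descendants, ordered by lft."""
    sql = """
        SELECT hc.cell
        FROM   tree hp
        JOIN   tree hc ON hc.lft BETWEEN hp.lft AND hp.rgt
        WHERE  hp.cell = ?
        ORDER  BY hc.lft
    """
    return [row[0] for row in conn.execute(sql, (cell,))]


def ancestors(cell):
    """Return the path from the root down to the named cell."""
    sql = """
        SELECT hp.cell
        FROM   tree hc
        JOIN   tree hp ON hc.lft BETWEEN hp.lft AND hp.rgt
        WHERE  hc.cell = ?
        ORDER  BY hp.lft
    """
    return [row[0] for row in conn.execute(sql, (cell,))]
```

    Both directions are plain range joins: `subtree` looks inside a node's range, `ancestors` looks for every range that contains the node.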

  • 2021-02-14 02:38

    The goal is to retrieve an element with PHP by name and all its descendants.

    If that is all you need, you can use a LIKE search

    SELECT *
    FROM Table1
    WHERE CELL LIKE 'AEE%';
    

    With an index beginning with CELL this is a range check, which is fast.

    If your data doesn't look like that, you can create a path column which looks like a directory path and contains all nodes "on the way/path" from root to the element.

    | id | CELL | parent_id | path     |
    |====|======|===========|==========|
    |  1 | A    |      NULL | 1/       |
    |  2 | AA   |         1 | 1/2/     |
    |  3 | AAA  |         2 | 1/2/3/   |
    |  4 | AAC  |         2 | 1/2/4/   |
    |  5 | AB   |         1 | 1/5/     |
    |  6 | AE   |         1 | 1/6/     | 
    |  7 | AEA  |         6 | 1/6/7/   |
    |  8 | AEE  |         6 | 1/6/8/   |
    |  9 | AEEB |         8 | 1/6/8/9/ |
    

    To retrieve all descendants of 'AE' (including itself) your query would be

    SELECT *
    FROM tree t
    WHERE path LIKE '1/6/%';
    

    or (MySQL specific concatenation)

    SELECT t.*
    FROM tree t
    CROSS JOIN tree r -- root
    WHERE r.CELL = 'AE'
      AND t.path LIKE CONCAT(r.path, '%');
    

    Result:

    | id | CELL | parent_id |     path |
    |====|======|===========|==========|
    |  6 | AE   |         1 | 1/6/     |
    |  7 | AEA  |         6 | 1/6/7/   |
    |  8 | AEE  |         6 | 1/6/8/   |
    |  9 | AEEB |         8 | 1/6/8/9/ |
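
    For a quick experiment without MySQL, the following Python sketch derives the path column from parent_id and runs the same prefix query in SQLite. It assumes parents are listed before their children; `with_paths` and `subtree` are hypothetical helper names.

```python
import sqlite3

# (id, CELL, parent_id) rows from the table above; parents come before children.
ROWS = [
    (1, "A", None), (2, "AA", 1), (3, "AAA", 2), (4, "AAC", 2), (5, "AB", 1),
    (6, "AE", 1), (7, "AEA", 6), (8, "AEE", 6), (9, "AEEB", 8),
]


def with_paths(rows):
    """Append a 'root/.../id/' path to each row, built from its parent's path."""
    paths = {}
    out = []
    for node_id, cell, parent_id in rows:
        prefix = paths[parent_id] if parent_id is not None else ""
        paths[node_id] = f"{prefix}{node_id}/"
        out.append((node_id, cell, parent_id, paths[node_id]))
    return out


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tree (id INTEGER PRIMARY KEY, CELL TEXT UNIQUE, "
    "parent_id INTEGER, path TEXT UNIQUE)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?, ?)", with_paths(ROWS))


def subtree(cell):
    """Cells whose path starts with the path of the named cell."""
    sql = """
        SELECT t.CELL
        FROM   tree t
        JOIN   tree r ON t.path LIKE r.path || '%'
        WHERE  r.CELL = ?
        ORDER  BY t.path
    """
    return [row[0] for row in conn.execute(sql, (cell,))]
```

    The trailing slash on every path matters: without it, the prefix '1/6' would also match a path such as '1/60/'.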
    

    Demo

    Performance

    I have created 100K rows of fake data on MariaDB with the sequence plugin using the following script:

    drop table if exists tree;
    CREATE TABLE tree (
      `id` int primary key,
      `CELL` varchar(50),
      `parent_id` int,
      `path` varchar(255),
      unique index (`CELL`),
      unique index (`path`)
    );
    
    DROP TRIGGER IF EXISTS `tree_after_insert`;
    DELIMITER //
    CREATE TRIGGER `tree_after_insert` BEFORE INSERT ON `tree` FOR EACH ROW BEGIN
        if new.id = 1 then
            set new.path := '1/';
        else    
            set new.path := concat((
                select path from tree where id = new.parent_id
            ), new.id, '/');
        end if;
    END//
    DELIMITER ;
    
    insert into tree
        select seq as id
            , conv(seq, 10, 36) as CELL
            , case 
                when seq = 1 then null
                else floor(rand(1) * (seq-1)) + 1 
            end as parent_id
            , null as path
        from seq_1_to_100000
    ;
    DROP TRIGGER IF EXISTS `tree_after_insert`;
    -- runtime ~ 4 sec.
    

    Tests

    Count all elements under the root:

    SELECT count(*)
    FROM tree t
    CROSS JOIN tree r -- root
    WHERE r.CELL = '1'
      AND t.path LIKE CONCAT(r.path, '%');
    -- result: 100000
    -- runtime: ~ 30 ms
    

    Get subtree elements under a specific node:

    SELECT t.*
    FROM tree t
    CROSS JOIN tree r -- root
    WHERE r.CELL = '3B0'
      AND t.path LIKE CONCAT(r.path, '%');
    -- runtime: ~ 30 ms
    

    Result:

    | id    | CELL | parent_id | path                                |
    |=======|======|===========|=====================================|
    |  4284 | 3B0  |       614 | 1/4/11/14/614/4284/                 |
    |  6560 | 528  |      4284 | 1/4/11/14/614/4284/6560/            |
    |  8054 | 67Q  |      6560 | 1/4/11/14/614/4284/6560/8054/       |
    | 14358 | B2U  |      6560 | 1/4/11/14/614/4284/6560/14358/      |
    | 51911 | 141Z |      4284 | 1/4/11/14/614/4284/51911/           |
    | 55695 | 16Z3 |      4284 | 1/4/11/14/614/4284/55695/           |
    | 80172 | 1PV0 |      8054 | 1/4/11/14/614/4284/6560/8054/80172/ |
    | 87101 | 1V7H |     51911 | 1/4/11/14/614/4284/51911/87101/     |
    

    PostgreSQL

    This also works for PostgreSQL. Only the string concatenation syntax has to be changed:

    SELECT t.*
    FROM tree t
    CROSS JOIN tree r -- root
    WHERE r.CELL = 'AE'
      AND t.path LIKE r.path || '%';
    

    Demo: sqlfiddle - rextester

    How does the search work

    If you look at the test example, you'll see that all paths in the result begin with '1/4/11/14/614/4284/'. That is the path of the subtree root with CELL='3B0'. If the path column is indexed, the engine will find them all efficiently, because the index is sorted by path. It's like finding all the words that begin with 'pol' in a dictionary of 100K words: you don't need to read the entire dictionary.
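
    The dictionary analogy can be sketched in a few lines of Python. A sorted list stands in for the index here; a real B-tree index is more involved, so treat this as an illustration of the idea only. `prefix_range` is a hypothetical helper.

```python
from bisect import bisect_left

# Paths from the small example table, kept sorted like index entries.
INDEX = sorted([
    "1/", "1/2/", "1/2/3/", "1/2/4/", "1/5/",
    "1/6/", "1/6/7/", "1/6/8/", "1/6/8/9/",
])


def prefix_range(sorted_keys, prefix):
    """Return the entries of sorted_keys that start with prefix.

    Two binary searches find the boundaries, so only matching entries are read.
    """
    if not prefix:
        return list(sorted_keys)
    lo = bisect_left(sorted_keys, prefix)
    # Smallest string greater than every string starting with prefix:
    # bump the last character by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    hi = bisect_left(sorted_keys, upper)
    return sorted_keys[lo:hi]
```

    For the prefix '1/6/' the two searches land on the first matching entry and just past the last one, and everything in between is the subtree.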

  • 2021-02-14 02:38

    This approach does not depend on the existence of a path or parent column. It is relational, not recursive.

    Since the table is static, create a materialized view containing just the leaves to make searching faster:

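    create materialized view leave as
    select cell
    from (
        select cell,
            -- a leaf is a cell that the next larger cell does not start with;
            -- the default '' makes the largest cell count as a leaf too
            lag(cell,1,'') over (order by cell desc) not like cell || '%' as leave
        from t
    ) s
    where leave;
    
    table leave;
     cell 
    ------
     CE
     CCCE
     CCCA
     CCBE
     CCBC
     BEDA
     BDDA
     BDCE
     BDCB
     BAA
     AEEB
     AEA
     AB
     AAC
     AAA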
    

    A materialized view is computed once, at creation, not at each query like a plain view. Create an index to speed it up:

    create index cell_index on leave(cell);
    

    If the source table is eventually altered, just refresh the view:

    refresh materialized view leave;
    

    The search function receives text and returns a text array:

    create or replace function get_descendants(c text)
    returns text[] as $$
        select array_agg(distinct l order by l)
        from (
            select left(cell, generate_series(length(c), length(cell))) as l
            from leave
            where cell like c || '%'
        ) s;
    $$ language sql stable strict;
    

    Pass the desired match to the function:

    select get_descendants('A');
              get_descendants          
    -----------------------------------
     {A,AA,AAA,AAC,AB,AE,AEA,AEE,AEEB}
    
    select get_descendants('AEE');
     get_descendants 
    -----------------
     {AEE,AEEB}
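
    For readers who find the window function and generate_series hard to follow, here is the same logic as a plain Python sketch. Like the SQL, it assumes every cell name is its parent's name plus one character; the last cell in sorted order is treated as a leaf, and the function names simply mirror the SQL ones.

```python
CELLS = ["A", "AA", "AAA", "AAC", "AB", "AE", "AEA", "AEE", "AEEB", "B", "BA", "BAA"]


def leaves(cells):
    """Cells that are not a prefix of the next larger cell in sorted order."""
    ordered = sorted(cells)
    result = []
    for i, cell in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else ""
        if not nxt.startswith(cell):
            result.append(cell)
    return result


def get_descendants(prefix, leaf_cells):
    """Expand matching leaves into every prefix of length >= len(prefix)."""
    found = set()
    for leaf in leaf_cells:
        if leaf.startswith(prefix):
            for n in range(len(prefix), len(leaf) + 1):
                found.add(leaf[:n])
    return sorted(found)


LEAVES = leaves(CELLS)
```

    Because descendants of a cell sit directly after it in sorted order, checking only the next entry is enough to decide whether a cell is a leaf.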
    

    Test data:

    create table t (cell text);
    insert into t (cell) values
    ('A'),
    ('AA'),
    ('AAA'),
    ('AAC'),
    ('AB'),
    ('AE'),
    ('AEA'),
    ('AEE'),
    ('AEEB'),
    ('B'),
    ('BA'),
    ('BAA'),
    ('BD'),
    ('BDC'),
    ('BDCB'),
    ('BDCE'),
    ('BDD'),
    ('BDDA'),
    ('BE'),
    ('BED'),
    ('BEDA'),
    ('C'),
    ('CC'),
    ('CCB'),
    ('CCBC'),
    ('CCBE'),
    ('CCC'),
    ('CCCA'),
    ('CCCE'),
    ('CE');
    
  • 2021-02-14 02:40

    Performance

    As others have already mentioned, performance shouldn't be an issue as long as you use a suitably indexed primary key and ensure that relations use foreign keys. In general, an RDBMS is highly optimised to perform joins on indexed columns efficiently, and referential integrity also prevents orphans. 100,000 may sound like a lot of rows, but it isn't going to stretch an RDBMS as long as the table structure and queries are well designed.
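
    To illustrate the orphan point, here is a small sketch using SQLite from Python (chosen only because it runs anywhere). The column names follow the query further down in this answer; the rest of the schema, the index name and the `try_insert` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE cell (
        id             INTEGER PRIMARY KEY,
        letters        TEXT NOT NULL UNIQUE,
        parent_cell_id INTEGER REFERENCES cell (id)
    );
    CREATE INDEX cell_parent_idx ON cell (parent_cell_id);
    INSERT INTO cell VALUES (1, 'A', NULL), (2, 'AA', 1);
""")


def try_insert(row):
    """Insert a row; return False if the database rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO cell VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```

    A row pointing at an existing parent goes in, while a row whose parent_cell_id matches no id is rejected instead of becoming an orphan.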

    Choice of RDBMS

    One factor in answering this question lies in choosing a database with the ability to perform a recursive query via a Common Table Expression (CTE), which can be very useful to keep the queries compact or essential if there are queries that do not limit the number of descendants being traversed.

    Since you've indicated that you are free to choose the RDBMS but it must run under Linux, I'm going to throw PostgreSQL out there as a suggestion since it has this feature and is freely available. (This choice is of course very subjective, and each option has advantages and disadvantages. When this answer was written, the contenders I'd have been tempted to rule out were MySQL, which didn't yet support CTEs; MariaDB, which didn't yet support *recursive* CTEs; and SQL Server, which didn't yet run on Linux. MySQL 8.0, MariaDB 10.2.2 and SQL Server 2017 have since closed those gaps. Other possibilities such as Oracle may depend on budget and existing resources.)

    SQL

    Here's an example of the SQL you'd write to perform your first example of finding all the descendants of 'A':

    WITH RECURSIVE rcte AS (
       SELECT id, letters
       FROM cell 
       WHERE letters = 'A'
       UNION ALL
       SELECT c.id, c.letters
       FROM cell c
       INNER JOIN rcte r
       ON c.parent_cell_id = r.id
    )
    SELECT letters
    FROM rcte
    ORDER BY letters;
    

    Explanation

    The above SQL sets up a "Common Table Expression", i.e. a SELECT to run whenever its alias (in this case rcte) is referenced. The recursion happens because this is referenced within itself. The first part of the UNION picks the cell at the top of the hierarchy. Its descendants are all found by carrying on joining on children in the second part of the UNION until no further records are found.
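
    SQLite also supports WITH RECURSIVE, so the query can be tried locally with nothing but Python's standard library. The table definition and sample rows below are assumptions; only the column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed schema: only the column names appear in the query above.
    CREATE TABLE cell (
        id             INTEGER PRIMARY KEY,
        letters        TEXT,
        parent_cell_id INTEGER REFERENCES cell (id)
    );
    INSERT INTO cell VALUES
        (1, 'A', NULL), (2, 'AA', 1), (3, 'AAA', 2), (4, 'AAC', 2), (5, 'AB', 1),
        (6, 'AE', 1), (7, 'AEA', 6), (8, 'AEE', 6), (9, 'AEEB', 8),
        (10, 'B', NULL), (11, 'BA', 10);
""")

DESCENDANTS_SQL = """
    WITH RECURSIVE rcte AS (
        SELECT id, letters FROM cell WHERE letters = ?   -- anchor: the start cell
        UNION ALL
        SELECT c.id, c.letters                           -- step: children of rows found so far
        FROM   cell c
        JOIN   rcte r ON c.parent_cell_id = r.id
    )
    SELECT letters FROM rcte ORDER BY letters
"""


def descendants(letters):
    """Return the start cell and all of its descendants, sorted by name."""
    return [row[0] for row in conn.execute(DESCENDANTS_SQL, (letters,))]
```

    Starting from 'A' the recursion never reaches the 'B' branch, because each step only follows parent_cell_id links from rows already found.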

    Demo

    The above query can be seen in action on the sample data here: http://rextester.com/HVY63888
