Visiting a directed graph as if it were an undirected one, using a recursive query

谎友^ 2020-12-14 22:42

I need help traversing a directed graph stored in a database.

Consider the following directed graph

1->2 
2->1,3 
3->1


        
1 answer
  • 2020-12-14 23:44

    Could work like this:

    WITH RECURSIVE graph AS (
        -- Anchor: every edge leaving the start node (1)
        SELECT parent
              ,child
              ,',' || parent::text || ',' || child::text || ',' AS path
              ,0 AS depth
        FROM   ownership
        WHERE  parent = 1

        UNION ALL
        -- Step: follow edges in their defined direction only
        SELECT o.parent
              ,o.child
              ,g.path || o.child || ','
              ,g.depth + 1
        FROM   graph g
        JOIN   ownership o ON o.parent = g.child
        -- Skip any edge that is already on this path (!~~ means NOT LIKE)
        WHERE  g.path !~~ ('%,' || o.parent::text || ',' || o.child::text || ',%')
        )
    SELECT  *
    FROM    graph
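
To see what the directed query returns on the graph from the question, here is a minimal sketch that runs it from Python against an in-memory SQLite database. SQLite standing in for PostgreSQL is an assumption made only so the example is self-contained: the `::text` casts are dropped, `!~~` is spelled `NOT LIKE`, and the integer column types are a guess. The name `DIRECTED_WALK` is hypothetical.

```python
import sqlite3

# Assumption: SQLite replaces PostgreSQL here, so the SQL is lightly adapted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ownership (parent INTEGER, child INTEGER)")
# Edges of the example graph: 1->2, 2->1, 2->3, 3->1
conn.executemany(
    "INSERT INTO ownership VALUES (?, ?)",
    [(1, 2), (2, 1), (2, 3), (3, 1)],
)

DIRECTED_WALK = """
WITH RECURSIVE graph(parent, child, path, depth) AS (
    SELECT parent, child, ',' || parent || ',' || child || ',', 0
    FROM   ownership
    WHERE  parent = 1

    UNION ALL
    SELECT o.parent, o.child, g.path || o.child || ',', g.depth + 1
    FROM   graph g
    JOIN   ownership o ON o.parent = g.child
    WHERE  g.path NOT LIKE ('%,' || o.parent || ',' || o.child || ',%')
)
SELECT parent, child, path, depth
FROM   graph
ORDER  BY depth, path
"""

rows = conn.execute(DIRECTED_WALK).fetchall()
for row in rows:
    print(row)
```

On this data the walk produces four rows and stops by itself: the edge (1,2) is never taken a second time, so the paths `,1,2,1,` and `,1,2,3,1,` are not extended any further.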
    

    You mentioned performance, so I optimized in that direction.

    Major points:

    • Traverse the graph only in the defined direction.

    • No need for a cycle column; make it an exclusion condition instead. That saves one step and is also the direct answer to:

    How can I do to stop cycles one step before the node that closes the cycle?

    • Use a string to record the path. Smaller and faster than an array of rows. Still contains all necessary information. Might change with very big bigint numbers, though.

    • Check for cycles with the LIKE operator (~~; the query uses its negation, !~~), which should be much faster.

    • If you don't expect more than 2147483647 rows over the course of time, use plain integer columns instead of bigint. Smaller and faster.

    • Be sure to have an index on parent. An index on child is irrelevant for my query. (Unlike your original, which traverses edges in both directions.)

    • For huge graphs I would switch to a plpgsql procedure, where you can maintain the path as a temp table with one row per step and a matching index. That adds some overhead, but it pays off with huge graphs.
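
The last point, keeping the path in an indexed structure rather than scanning a string, can be sketched outside the database. The following is not plpgsql and not the procedure meant above. It is only a Python analogue under stated assumptions: a `set` plays the role of the indexed temp table, a dictionary plays the role of the index on parent, and the function name `walk_directed` is hypothetical.

```python
from collections import defaultdict


def walk_directed(edges, start):
    """List every directed walk from `start` that uses each edge at most once."""
    children = defaultdict(list)       # stands in for the index on parent
    for parent, child in edges:
        children[parent].append(child)

    rows = []
    used = set()                       # edges on the current path, O(1) lookup
    path = [start]                     # nodes on the current path
    stack = [(start, iter(children[start]))]

    while stack:
        node, remaining = stack[-1]
        for child in remaining:
            edge = (node, child)
            if edge in used:           # same exclusion as the LIKE check
                continue
            used.add(edge)
            path.append(child)
            rows.append((node, child, tuple(path), len(path) - 2))
            stack.append((child, iter(children[child])))
            break
        else:
            # No unused edge left at this node: backtrack one step.
            stack.pop()
            if len(path) > 1:
                last = path.pop()
                used.discard((path[-1], last))
    return rows


walk_rows = walk_directed([(1, 2), (2, 1), (2, 3), (3, 1)], start=1)
for row in walk_rows:
    print(row)
```

The explicit stack avoids deep recursion on long paths. The membership test stays cheap no matter how long the path gets, which is the trade-off the temp table with a matching index is meant to buy.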


    Problems in your original query:

    WHERE (g.parent = o.child or g.child = o.parent) 
    

    There is only one endpoint of your traversal at any point in the process. As you walk the directed graph in both directions, the endpoint can be either parent or child, but not both of them. You have to save the endpoint of every step, and then:

    WHERE g.child IN (o.parent, o.child) 
    

    The violation of the direction also makes your starting condition questionable:

    WHERE parent = 1
    

    Would have to be

    WHERE 1 IN (parent, child)
    

    And the two rows (1,2) and (2,1) are effectively duplicates this way ...


    Additional solution after comment

    • Ignore direction
    • Still walk any edge only once per path.
    • Use ARRAY for path
    • Save original direction in path, not actual direction.

    Note that this way (2,1) and (1,2) are effectively duplicates, but both can be used in the same path.

    I introduce the column leaf, which stores the actual endpoint of every step.

    WITH RECURSIVE graph AS (
        SELECT CASE WHEN parent = 1 THEN child ELSE parent END AS leaf
              ,ARRAY[ROW(parent, child)] AS path
              ,0 AS depth
        FROM   ownership
        WHERE  1 in (child, parent)
    
        UNION ALL
        SELECT CASE WHEN o.parent = g.leaf THEN o.child ELSE o.parent END -- AS leaf
              ,path || ROW(o.parent, o.child) -- AS path
              ,depth + 1 -- AS depth
        FROM   graph g
        JOIN   ownership o ON g.leaf in (o.parent, o.child) 
        AND    ROW(o.parent, o.child) <> ALL(path)
        )
    SELECT *
    FROM   graph
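
To check how the direction-agnostic walk behaves on the example graph, here is a minimal sketch that runs an adapted version from Python against in-memory SQLite. This rests on assumptions: SQLite has no `ARRAY` or `ROW` types, so the path is recorded as a string of `|parent>child|` tokens (original direction, as in the query above) and tested with `NOT LIKE` instead of `<> ALL(path)`. The name `UNDIRECTED_WALK` and the token format are hypothetical.

```python
import sqlite3

# Assumption: SQLite replaces PostgreSQL; a delimited string replaces the array.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ownership (parent INTEGER, child INTEGER)")
conn.executemany(
    "INSERT INTO ownership VALUES (?, ?)",
    [(1, 2), (2, 1), (2, 3), (3, 1)],
)

UNDIRECTED_WALK = """
WITH RECURSIVE graph(leaf, path, depth) AS (
    SELECT CASE WHEN parent = 1 THEN child ELSE parent END,
           '|' || parent || '>' || child || '|',
           0
    FROM   ownership
    WHERE  1 IN (parent, child)

    UNION ALL
    SELECT CASE WHEN o.parent = g.leaf THEN o.child ELSE o.parent END,
           g.path || o.parent || '>' || o.child || '|',
           g.depth + 1
    FROM   graph g
    JOIN   ownership o ON g.leaf IN (o.parent, o.child)
    WHERE  g.path NOT LIKE ('%|' || o.parent || '>' || o.child || '|%')
)
SELECT leaf, path, depth
FROM   graph
ORDER  BY depth, path
"""

undirected_rows = conn.execute(UNDIRECTED_WALK).fetchall()
for row in undirected_rows:
    print(row)
```

With only four edges this already yields 20 rows, because (1,2) and (2,1) count as separate edges and every ordering that uses each edge once is its own path. The row count grows quickly on denser graphs.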
    