问题
I'm using PostgreSQL's Ltree module for storing hierarchical data. I'm looking to retrieve the full hierarchy sorted by a particular column.
Consider the following table:
votes | path | ...
-------+-------+-----
1 | 1 | ...
2 | 1.1 | ...
4 | 1.2 | ...
1 | 1.2.1 | ...
3 | 2 | ...
1 | 2.1 | ...
2 | 2.1.1 | ...
4 | 2.1.2 | ...
... | ... | ...
In my current implementation, I'd query the database with SELECT * FROM comments ORDER BY path
, which would return the whole tree:
Node 1
-- Node 1.1
-- Node 1.2
---- Node 1.2.1
Node 2
-- Node 2.1
---- Node 2.1.1
---- Node 2.1.2
However, I want to sort by votes
(not by id
, which is what sorting by path
amounts to). Each depth level needs to be independently sorted, with the correct tree structure kept intact. Something that would return the following:
Node 2
-- Node 2.1
---- Node 2.1.2
---- Node 2.1.1
Node 1
-- Node 1.2
---- Node 1.2.1
-- Node 1.1
Postgres' WITH RECURSIVE
might be appropriate, but I'm not sure. Any ideas?
回答1:
You were on the right track with WITH RECURSIVE
.
Solution with recursive CTE
WITH RECURSIVE t AS (
SELECT t.votes
, t.path
, 1::int AS lvl
, to_char(t2.votes, 'FM0000000') AS sort
FROM tbl t
JOIN tbl t2 ON t2.path = subltree(t.path, 0, 1)
UNION ALL
SELECT t.votes
, t.path
, t.lvl + 1
, t.sort || to_char(t2.votes, 'FM0000000')
FROM t
JOIN tbl t2 ON t2.path = subltree(t.path, 0, t.lvl + 1)
WHERE nlevel(t.path) > t.lvl
)
SELECT votes, path, max(sort) AS sort
FROM t
GROUP BY 1, 2
ORDER BY max(sort), path;
Major points
The crucial part is to replace every level of the path with the value of
votes
. Thereby we assemble one column we canORDER BY
at the end. This is necessary, because the path has an unknown depth and we cannot order by an unknown number of expressions in static SQL.In order to get a stable sort, I convert
votes
to a string with leading zeroes using to_char(). I use seven digits in the demo, which works for vote values below 10.000.000. Adjust according to your maximum vote count.In the final
SELECT
I exclude all intermediary states to eliminate duplicates. Only the last step withmax(sort)
remains.This works in standard SQL with a recursive CTE, but is not very efficient for large trees. A plpgsql function that recursively updates the sort path in a temporary table without creating temporary dupes might perform better.
Only works with the ltree module installed. The functions subltree(...) and nlevel(.) as well as the ltree date type are not part of standard PostgreSQL.
My test setup, for your review convenience:
CREATE TEMP TABLE tbl(votes int, path ltree);
INSERT INTO tbl VALUES
(1, '1')
, (2, '1.1')
, (4, '1.2')
, (1, '1.2.1')
, (3, '2')
, (1, '2.1')
, (2, '2.1.1')
, (4, '2.1.2')
, (1, '2.1.3')
, (2, '3')
, (17, '3.3')
, (99, '3.2')
, (10, '3.1.1')
, (2345, '3.1.2')
, (1, '3.1.3');
PL/pgSQL table function doing the same
Should be faster with huge trees.
CREATE OR REPLACE FUNCTION f_sorted_ltree()
RETURNS TABLE(votes int, path ltree) AS
$func$
DECLARE
lvl integer := 0;
BEGIN
CREATE TEMP TABLE t ON COMMIT DROP AS
SELECT tbl.votes
, tbl.path
, ''::text AS sort
, nlevel(tbl.path) AS depth
FROM tbl;
-- CREATE INDEX t_path_idx ON t (path); -- beneficial for huge trees
-- CREATE INDEX t_path_idx ON t (depth);
LOOP
lvl := lvl + 1;
UPDATE t SET sort = t.sort || to_char(v.votes, 'FM0000000')
FROM (
SELECT t2.votes, t2.path
FROM t t2
WHERE t2.depth = lvl
) v
WHERE v.path = subltree(t.path, 0 ,lvl);
EXIT WHEN NOT FOUND;
END LOOP;
-- Return sorted rows
RETURN QUERY
SELECT t.votes, t.path
FROM t
ORDER BY t.sort;
END
$func$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT * FROM f_sorted_ltree();
Read in the manual about setting temp_buffers.
I would be interested which performs faster with your real life data.
回答2:
create table comments (
id serial,
parent_id int,
msg text,
primary key (id)
);
insert into comments (id, parent_id, msg) values (1, null, 'msg 1');
insert into comments (id, parent_id, msg) values (2, null, 'msg 2');
insert into comments (id, parent_id, msg) values (3, 1, 'msg 1 / ans 1');
insert into comments (id, parent_id, msg) values (4, null, 'msg 3');
insert into comments (id, parent_id, msg) values (5, 2, 'msg 2 / ans 1');
insert into comments (id, parent_id, msg) values (6, 2, 'msg 2 / ans 2');
insert into comments (id, parent_id, msg) values (7, 2, 'msg 2 / ans 3');
desc
WITH RECURSIVE q AS
(
SELECT id, msg, 1 as level, ARRAY[id] as path
FROM comments c
WHERE parent_id is null
UNION ALL
SELECT sub.id, sub.msg, level + 1, path || sub.id
FROM q
JOIN comments sub
ON sub.parent_id = q.id
)
SELECT id, msg, level
FROM q
order by path || array_fill(100500, ARRAY[8 - level]) desc;
results in
4,"msg 3",1
2,"msg 2",1
7,"msg 2 / ans 3",2
6,"msg 2 / ans 2",2
5,"msg 2 / ans 1",2
1,"msg 1",1
3,"msg 1 / ans 1",2
asc
WITH RECURSIVE q AS
(
SELECT id, msg, 1 as level, ARRAY[id] as path
FROM comments c
WHERE parent_id is null
UNION ALL
SELECT sub.id, sub.msg, level + 1, path || sub.id
FROM q
JOIN comments sub
ON sub.parent_id = q.id
)
SELECT id, msg, level
FROM q
--order by path || array_fill(100500, ARRAY[8 - level]) desc;
order by path;
results in
1,"msg 1",1
3,"msg 1 / ans 1",2
2,"msg 2",1
5,"msg 2 / ans 1",2
6,"msg 2 / ans 2",2
7,"msg 2 / ans 3",2
4,"msg 3",1
来源:https://stackoverflow.com/questions/8790023/retrieving-full-hierarchy-sorted-by-a-column-under-postgresqls-ltree-module