问题
I have a table of identities
(i.e. aliases) for an arbitrary number of people. Each row has a previous name and a new name. In production, there are about 1 M rows. For example:
id, old, new
---
1, 'Albert', 'Bob'
2, 'Bob', 'Charles'
3, 'Mary', 'Nancy'
4, 'Charles', 'Albert'
5, 'Lydia', 'Nancy'
6, 'Zoe', 'Zoe'
What I want is to generate the list of users
and reference all of their respective identities. This is analogous to finding all of the nodes in each graph of connected identities, or finding the spanning forest:
User 1: Albert, Bob, Charles (identities: 1,2,4)
User 2: Mary, Nancy, Lydia (identities: 3,5)
User 3: Zoe (identities: 6)
I've been tinkering with PostgreSQL's WITH RECURSIVE
, but it yields both sets and subsets for each. For example:
1,2,4 <-- spanning tree: good
2 <-- subset: discard
3,5 <-- spanning tree: good
4 <-- subset: discard
5 <-- subset: discard
6 <-- spanning tree: good
What do I need to do to only produce the full sets of identities (i.e. the spanning tree) for each user?
SQLFiddle: http://sqlfiddle.com/#!15/9eaed/4 with my latest attempt. Here's the code:
WITH RECURSIVE search_graph AS (
SELECT id
, id AS min_id
, ARRAY[id] AS path
, ARRAY[old,new] AS emails
FROM identities
UNION
SELECT identities.id
, LEAST(identities.id, sg.min_id)
, (sg.path || identities.id)
, (sg.emails || identities.old || identities.new)
FROM search_graph sg
JOIN identities ON (identities.old = ANY(sg.emails) OR identities.new = ANY(sg.emails))
WHERE identities.id <> ALL(sg.path)
)
SELECT array_agg(DISTINCT(p)) from search_graph, unnest(path) p GROUP BY min_id;
And the results:
1,2,4
2
3,5
4
5
6
回答1:
I wrote an answer to a similar question a while ago: How to find all connected subgraphs of an undirected graph. In that question I used SQL Server. See that answer for detailed explanation of intermediate CTEs. I adapted that query to Postgres.
It may be written more efficiently using Postgres array feature instead of concatenating the path into a text
column.
WITH RECURSIVE
CTE_Idents
AS
(
SELECT old AS Ident
FROM identities
UNION
SELECT new AS Ident
FROM identities
)
,CTE_Pairs
AS
(
SELECT old AS Ident1, new AS Ident2
FROM identities
WHERE old <> new
UNION
SELECT new AS Ident1, old AS Ident2
FROM identities
WHERE old <> new
)
,CTE_Recursive
AS
(
SELECT
CTE_Idents.Ident AS AnchorIdent
, Ident1
, Ident2
, ',' || Ident1 || ',' || Ident2 || ',' AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
UNION ALL
SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CTE_Recursive.IdentPath || CTE_Pairs.Ident2 || ',' AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE ('%,' || CTE_Pairs.Ident2 || ',%')
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult
UNION
SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
,CTE_Groups
AS
(
SELECT
CTE_Idents.Ident
,array_agg(COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)
ORDER BY COALESCE(CTE_CleanResult.Ident, CTE_Idents.Ident)) AS AllIdents
FROM
CTE_Idents
LEFT JOIN CTE_CleanResult ON CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
GROUP BY CTE_Idents.Ident
)
SELECT AllIdents
FROM CTE_Groups
GROUP BY AllIdents
;
I added one row (7,X,Y)
to your sample data.
SQL Fiddle
Result
| allidents |
|--------------------|
| Lydia,Mary,Nancy |
| Albert,Bob,Charles |
| X,Y |
| Zoe |
来源:https://stackoverflow.com/questions/36685015/finding-the-spanning-forest-with-recursive-postgresql-9-5