Removing duplicates from multiple self left joins

主宰稳场 提交于 2019-12-13 12:14:28

问题


I am dynamically generating a query like below that creates different combinations of rules by left joining (any number of times) on itself and avoiding rules with some of the same attributes as part of the joins conditions e.g.

SELECT count(*) 
FROM rules AS t1 
LEFT JOIN rules AS t2
 ON t1.id != t2.id
 AND ...
LEFT JOIN rules AS t3
 ON t1.id != t2.id AND t1.id != t3.id AND t2.id != t3.id
 AND ...

I am currently removing duplicates by creating an array of ids from the joined rows then sorting and grouping by them:

SELECT sort(array[t1.id, t2.id, t3.id]) AS ids
...
GROUP BY ids

I would like to know if there is a better way of removing duplicate rows e.g.

t1.ID | t2.ID | t3.ID
---------------------
  A   |   B   |   C
  C   |   B   |   A

Should be

t1.ID | t2.ID | t3.ID
---------------------
  A   |   B   |   C

Or

t1.ID | t2.ID | t3.ID
---------------------
  C   |   B   |   A

But not both.

EDIT: I would like to go from a permutation of rows to a combination rows.


回答1:


I'd suggest rather than joining on !=, try joining on <=.

You will then have all combinations with t1.id > t2.id, t2.id > t3.id, and so on.

Rows will not be 'duplicates' because they are ordered sets, and any set containing equivalent members would necessarily result in the identical ordered set.




回答2:


I think you mean you want to go from a permutation of rows to a combination rows?

If so, the select distinct answers are wrong. Select distinct will select distinct permutations. I think you have a pretty good way of doing it. The only thing I can think of, would be to concatenate the rules into a string and the sort it in place. It looks like you are using Postgresql and there is no function that does it in the built-in string functions.

If the amount of symbols were small you might be able to insert them into the array pre-sorted by inserting 'A' in index 1, 'B' into index 2, etc. Which might might the sort quicker ...




回答3:


You need to get an order into your results in order to filter all duplicates out. This can be achieved by making sure that a<b<c. And once you have an order in your results, you can apply a distinct to the resultset.

` SELECT count(*) FROM rules AS t1

LEFT JOIN rules AS t2 ON t1.id != t2.id AND

LEFT JOIN rules AS t3 ON t1.id != t2.id AND t1.id != t3.id AND t2.id != t3.id ...

t1.id < t2.id and t2.id < t3.id ...

AND ...`




回答4:


Difficult to understand exactly what you're trying to achieve, but to avoid the A-B-C C-B-A duplication, try this:

SELECT count(*) 
FROM rules AS t1 
LEFT JOIN rules AS t2
 ON t1.id **<** t2.id
 AND ...
LEFT JOIN rules AS t3
 ON t1.id **<** t2.id AND t1.id **<** t3.id AND t2.id **<** t3.id
 AND ...

That way, the answers are always ordered



来源:https://stackoverflow.com/questions/4242715/removing-duplicates-from-multiple-self-left-joins

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!