Select duplicate and keep the oldest (not based on ID)

Submitted by 主宰稳场 on 2020-01-07 06:34:18

Question


Thanks in advance for your help; I'm stuck on this problem.

Let me explain. I have a table like this:

| domain |     creationdate    | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc    | 2013-05-28 15:35:01 | value 1 | value 2 |
| abc    | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa    | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb    | 2012-02-12 10:48:10 | value 1 | value 2 |
| bbb    | 2013-04-15 07:15:23 | value 1 | value 2 |

I want to select this (using subqueries):

| domain |     creationdate    | value 1 | value 2 |
|--------|---------------------|---------|---------|
| abc    | 2013-04-30 12:10:10 | value 1 | value 2 |
| aaa    | 2011-04-02 13:10:10 | value 1 | value 2 |
| bbb    | 2012-02-12 10:48:10 | value 1 | value 2 |

I tried combining subqueries using IN/NOT IN in the WHERE clause with GROUP BY/HAVING, but I can't get the right result.

I also have another question. If anyone has already faced this kind of problem, I would be glad to hear how they solved it.

The records in the first table above are deleted and re-inserted frequently (every ten minutes). My aim is to make a copy (or maybe a view) of the result, without the duplicate entries, which will be used 24/7 by a Postfix mail server. I have heard that big views (with many subqueries) hurt performance, which would make a table the preferable option. The problem is that if I have to rebuild the table every ten minutes, there will be a short downtime during which Postfix cannot read it.

Any advice is welcome. Thanks in advance.

EDIT:

Following @Ed Gibbs's answer, here is a better sample:

Source table:

| domain     |     creationdate    | value 1 | value 2 |
|------------|---------------------|---------|---------|
| google.com | 2013-05-28 15:35:01 | john    | mary    |
| google.com | 2013-04-30 12:10:10 | patrick | edward  |
| yahoo.fr   | 2011-04-02 13:10:10 | britney | garry   |
| ebay.com   | 2012-02-12 10:48:10 | harry   | mickael |
| ebay.com   | 2013-04-15 07:15:23 | bill    | alice   |

With your query, the result is identical to the source table.

Desired result:

| domain     | value 1 | value 2 |
|------------|---------|---------|
| google.com | patrick | edward  |
| yahoo.fr   | britney | garry   |
| ebay.com   | harry   | mickael |

I want to keep the oldest row for each domain (the one with the minimum creation date), along with its own value 1 and value 2.


New question!

I made a view of the desired result based on your answer.

The result looks like this:

| domain     | value 1 | foreign_key |
|------------|---------|-------------|
| google.com | patrick | X           |
| yahoo.fr   | britney | Y           |
| ebay.com   | harry   | Z           |

I also have a table with entries like these:

| email              | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com    | patrick | X           |
| john@google.com    | britney | Y           |
| harry@google.com   | mary    | X           |
| mickael@google.com | jack    | X           |
| david@ebay.com     | walter  | Z           |
| alice@yahoo.com    | brian   | Y           |

Assume that, in this sample, %@google.com emails with foreign_key Y are not valid records: only %@google.com emails with foreign_key X are valid, because X is the foreign key of the domain row I selected by creation date. How can I select only the emails whose domain/foreign_key pair is referenced in my new view?

Desired result:

| email              | value 1 | foreign_key |
|--------------------|---------|-------------|
| john@google.com    | patrick | X           |
| harry@google.com   | mary    | X           |
| mickael@google.com | jack    | X           |
| david@ebay.com     | walter  | Z           |
| alice@yahoo.com    | brian   | Y           |

I tried a join using CONCAT('%', '@', domain) together with foreign_key = foreign_key, but it doesn't give me what I want.


Answer 1:


Based on your sample data and results, a GROUP BY will give you the results you're after:

SELECT
  domain,
  MIN(creationdate) AS creationdate,
  value1,
  value2
FROM mytable
GROUP BY domain, value1, value2
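
To see when this query collapses duplicates and when it does not, here is a minimal sketch that runs it against both sample data sets from the question. It uses Python's built-in sqlite3 module as a stand-in for the asker's database (an assumption; the thread does not name the engine), and the helper name run_group_by is made up for illustration.

```python
import sqlite3

# The GROUP BY query from the answer, unchanged.
QUERY = """
    SELECT domain, MIN(creationdate) AS creationdate, value1, value2
    FROM mytable
    GROUP BY domain, value1, value2
"""

def run_group_by(rows):
    """Load rows into a throwaway in-memory table and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result

# First sample: value 1 / value 2 are identical on every row.
original = [
    ("abc", "2013-05-28 15:35:01", "value 1", "value 2"),
    ("abc", "2013-04-30 12:10:10", "value 1", "value 2"),
    ("aaa", "2011-04-02 13:10:10", "value 1", "value 2"),
    ("bbb", "2012-02-12 10:48:10", "value 1", "value 2"),
    ("bbb", "2013-04-15 07:15:23", "value 1", "value 2"),
]

# Second sample (from the EDIT): every row has its own values.
updated = [
    ("google.com", "2013-05-28 15:35:01", "john", "mary"),
    ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
    ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
    ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
    ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
]

print(len(run_group_by(original)))  # 3: one group per domain
print(len(run_group_by(updated)))   # 5: every row is its own group
```

Because value1 and value2 are part of the GROUP BY key, rows only merge when those columns match as well, which is why the second sample comes back unchanged.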

Addendum: @Arka provided updated sample data where the value 1 and value 2 columns have different values (in the original they were the same). That changes the query to this:

SELECT domain, creationdate, value1, value2
FROM mytable
WHERE (domain, creationdate) IN (
  SELECT domain, MIN(creationdate)
  FROM mytable
  GROUP BY domain)

The subquery gets a list of the earliest creationdate for each domain, and the outer query only selects rows where the domain and creationdate match the subquery values.
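
As a quick check of the subquery version, the sketch below runs it on the updated sample data. Again it assumes Python's built-in sqlite3 module as a stand-in engine (row-value IN requires SQLite 3.15 or later), stores creationdate as ISO-formatted text so that MIN compares chronologically, and adds an ORDER BY only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (domain TEXT, creationdate TEXT, value1 TEXT, value2 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("google.com", "2013-05-28 15:35:01", "john", "mary"),
        ("google.com", "2013-04-30 12:10:10", "patrick", "edward"),
        ("yahoo.fr", "2011-04-02 13:10:10", "britney", "garry"),
        ("ebay.com", "2012-02-12 10:48:10", "harry", "mickael"),
        ("ebay.com", "2013-04-15 07:15:23", "bill", "alice"),
    ],
)

# Keep only rows whose (domain, creationdate) pair is the earliest for that domain.
oldest = conn.execute(
    """
    SELECT domain, value1, value2
    FROM mytable
    WHERE (domain, creationdate) IN (
        SELECT domain, MIN(creationdate)
        FROM mytable
        GROUP BY domain)
    ORDER BY domain
    """
).fetchall()
conn.close()

for row in oldest:
    print(row)
# ('ebay.com', 'harry', 'mickael')
# ('google.com', 'patrick', 'edward')
# ('yahoo.fr', 'britney', 'garry')
```

Note that if two rows for the same domain share the same minimum creationdate, both are returned, so a tie-breaking column would be needed in that case.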



Source: https://stackoverflow.com/questions/16799799/select-duplicate-and-keep-the-oldest-not-based-on-id
