问题
The following query:
SELECT
year, id, rate
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC
yields:
year id rate
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2009 p01 4.4
2002 p01 3.9
2004 p01 3.5
2005 p01 2.1
2000 p01 0.8
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
2006 p02 4.6
2007 p02 3.3
What I'd like is only the top 5 results for each id:
2006 p01 8
2003 p01 7.4
2008 p01 6.8
2001 p01 5.9
2007 p01 5.3
2001 p02 12.5
2004 p02 12.4
2002 p02 12.2
2003 p02 10.3
2000 p02 8.7
Is there a way to do this using some kind of LIMIT like modifier that works within the GROUP BY?
回答1:
You could use GROUP_CONCAT aggregated function to get all years into a single column, grouped by id
and ordered by rate
:
SELECT id, GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM yourtable
GROUP BY id
Result:
-----------------------------------------------------------
| ID | GROUPED_YEAR |
-----------------------------------------------------------
| p01 | 2006,2003,2008,2001,2007,2009,2002,2004,2005,2000 |
| p02 | 2001,2004,2002,2003,2000,2006,2007 |
-----------------------------------------------------------
And then you could use FIND_IN_SET, that returns the position of the first argument inside the second one, eg.
SELECT FIND_IN_SET('2006', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
1
SELECT FIND_IN_SET('2009', '2006,2003,2008,2001,2007,2009,2002,2004,2005,2000');
6
Using a combination of GROUP_CONCAT
and FIND_IN_SET
, and filtering by the position returned by find_in_set, you could then use this query that returns only the first 5 years for every id:
SELECT
yourtable.*
FROM
yourtable INNER JOIN (
SELECT
id,
GROUP_CONCAT(year ORDER BY rate DESC) grouped_year
FROM
yourtable
GROUP BY id) group_max
ON yourtable.id = group_max.id
AND FIND_IN_SET(year, grouped_year) BETWEEN 1 AND 5
ORDER BY
yourtable.id, yourtable.year DESC;
Please see fiddle here.
Please note that if more than one row can have the same rate, you should consider using GROUP_CONCAT(DISTINCT rate ORDER BY rate) on the rate column instead of the year column.
The maximum length of the string returned by GROUP_CONCAT is limited, so this works well if you need to select a few records for every group.
回答2:
The original query used user variables and ORDER BY
on derived tables; the behavior of both quirks is not guaranteed. Revised answer as follows.
In MySQL 5.x you can use poor man's rank over partition to achieve desired result. Just outer join the table with itself and for each row, count the number of rows lesser than it. In the above case, lesser row is the one with higher rate:
SELECT t.id, t.rate, t.year, COUNT(l.rate) AS rank
FROM t
LEFT JOIN t AS l ON t.id = l.id AND t.rate < l.rate
GROUP BY t.id, t.rate, t.year
HAVING COUNT(l.rate) < 5
ORDER BY t.id, t.rate DESC, t.year
Demo and Result:
| id | rate | year | rank |
|-----|------|------|------|
| p01 | 8.0 | 2006 | 0 |
| p01 | 7.4 | 2003 | 1 |
| p01 | 6.8 | 2008 | 2 |
| p01 | 5.9 | 2001 | 3 |
| p01 | 5.3 | 2007 | 4 |
| p02 | 12.5 | 2001 | 0 |
| p02 | 12.4 | 2004 | 1 |
| p02 | 12.2 | 2002 | 2 |
| p02 | 10.3 | 2003 | 3 |
| p02 | 8.7 | 2000 | 4 |
Note that if the rates had ties, for example:
100, 90, 90, 80, 80, 80, 70, 60, 50, 40, ...
The above query will return 6 rows:
100, 90, 90, 80, 80, 80
Change to HAVING COUNT(DISTINCT l.rate) < 5
to get 8 rows:
100, 90, 90, 80, 80, 80, 70, 60
Or change to ON t.id = l.id AND (t.rate < l.rate OR (t.rate = l.rate AND t.pri_key > l.pri_key))
to get 5 rows:
100, 90, 90, 80, 80
In MySQL 8 or later just use the RANK, DENSE_RANK or ROW_NUMBER functions:
SELECT *
FROM (
SELECT *, RANK() OVER (PARTITION BY id ORDER BY rate DESC) AS rnk
FROM t
) AS x
WHERE rnk <= 5
回答3:
For me something like
SUBSTRING_INDEX(group_concat(col_name order by desired_col_order_name), ',', N)
works perfectly. No complicated query.
for example: get top 1 for each group
SELECT
*
FROM
yourtable
WHERE
id IN (SELECT
SUBSTRING_INDEX(GROUP_CONCAT(id
ORDER BY rate DESC),
',',
1) id
FROM
yourtable
GROUP BY year)
ORDER BY rate DESC;
回答4:
No, you can't LIMIT subqueries arbitrarily (you can do it to a limited extent in newer MySQLs, but not for 5 results per group).
This is a groupwise-maximum type query, which is not trivial to do in SQL. There are various ways to tackle that which can be more efficient for some cases, but for top-n in general you'll want to look at Bill's answer to a similar previous question.
As with most solutions to this problem, it can return more than five rows if there are multiple rows with the same rate
value, so you may still need a quantity of post-processing to check for that.
回答5:
This requires a series of subqueries to rank the values, limit them, then perform the sum while grouping
@Rnk:=0;
@N:=2;
select
c.id,
sum(c.val)
from (
select
b.id,
b.bal
from (
select
if(@last_id=id,@Rnk+1,1) as Rnk,
a.id,
a.val,
@last_id=id,
from (
select
id,
val
from list
order by id,val desc) as a) as b
where b.rnk < @N) as c
group by c.id;
回答6:
Try this:
SELECT h.year, h.id, h.rate
FROM (SELECT h.year, h.id, h.rate, IF(@lastid = (@lastid:=h.id), @index:=@index+1, @index:=0) indx
FROM (SELECT h.year, h.id, h.rate
FROM h
WHERE h.year BETWEEN 2000 AND 2009 AND id IN (SELECT rid FROM table2)
GROUP BY id, h.year
ORDER BY id, rate DESC
) h, (SELECT @lastid:='', @index:=0) AS a
) h
WHERE h.indx <= 5;
回答7:
SELECT year, id, rate
FROM (SELECT
year, id, rate, row_number() over (partition by id order by rate DESC)
FROM h
WHERE year BETWEEN 2000 AND 2009
AND id IN (SELECT rid FROM table2)
GROUP BY id, year
ORDER BY id, rate DESC) as subquery
WHERE row_number <= 5
The subquery is almost identical to your query. Only change is adding
row_number() over (partition by id order by rate DESC)
回答8:
Build the virtual columns(like RowID in Oracle)
Table:
CREATE TABLE `stack`
(`year` int(11) DEFAULT NULL,
`id` varchar(10) DEFAULT NULL,
`rate` float DEFAULT NULL)
ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
Data:
insert into stack values(2006,'p01',8);
insert into stack values(2001,'p01',5.9);
insert into stack values(2007,'p01',5.3);
insert into stack values(2009,'p01',4.4);
insert into stack values(2001,'p02',12.5);
insert into stack values(2004,'p02',12.4);
insert into stack values(2005,'p01',2.1);
insert into stack values(2000,'p01',0.8);
insert into stack values(2002,'p02',12.2);
insert into stack values(2002,'p01',3.9);
insert into stack values(2004,'p01',3.5);
insert into stack values(2003,'p02',10.3);
insert into stack values(2000,'p02',8.7);
insert into stack values(2006,'p02',4.6);
insert into stack values(2007,'p02',3.3);
insert into stack values(2003,'p01',7.4);
insert into stack values(2008,'p01',6.8);
SQL like this:
select t3.year,t3.id,t3.rate
from (select t1.*, (select count(*) from stack t2 where t1.rate<=t2.rate and t1.id=t2.id) as rownum from stack t1) t3
where rownum <=3 order by id,rate DESC;
If delete the where clause in t3, it shows like this:
GET "TOP N Record" --> add the rownum <=3
in where
clause (the where-clause of t3);
CHOOSE "the year" --> add the BETWEEN 2000 AND 2009
in where
clause (the where-clause of t3);
回答9:
Took some working, but I thougth my solution would be something to share as it is seems elegant as well as quite fast.
SELECT h.year, h.id, h.rate
FROM (
SELECT id,
SUBSTRING_INDEX(GROUP_CONCAT(CONCAT(id, '-', year) ORDER BY rate DESC), ',' , 5) AS l
FROM h
WHERE year BETWEEN 2000 AND 2009
GROUP BY id
ORDER BY id
) AS h_temp
LEFT JOIN h ON h.id = h_temp.id
AND SUBSTRING_INDEX(h_temp.l, CONCAT(h.id, '-', h.year), 1) != h_temp.l
Note that this example is specified for the purpose of the question and can be modified quite easily for other similar purposes.
回答10:
The following post: sql: selcting top N record per group describes the complicated way of achieving this without subqueries.
It improves on other solutions offered here by:
- Doing everything in a single query
- Being able to properly utilize indexes
- Avoiding subqueries, notoriously known to produce bad execution plans in MySQL
It is however not pretty. A good solution would be achievable were Window Functions (aka Analytic Functions) enabled in MySQL -- but they are not. The trick used in said post utilizes GROUP_CONCAT, which is sometimes described as "poor man's Window Functions for MySQL".
回答11:
for those like me that had queries time out. I made the below to use limits and anything else by a specific group.
DELIMITER $$
CREATE PROCEDURE count_limit200()
BEGIN
DECLARE a INT Default 0;
DECLARE stop_loop INT Default 0;
DECLARE domain_val VARCHAR(250);
DECLARE domain_list CURSOR FOR SELECT DISTINCT domain FROM db.one;
OPEN domain_list;
SELECT COUNT(DISTINCT(domain)) INTO stop_loop
FROM db.one;
-- BEGIN LOOP
loop_thru_domains: LOOP
FETCH domain_list INTO domain_val;
SET a=a+1;
INSERT INTO db.two(book,artist,title,title_count,last_updated)
SELECT * FROM
(
SELECT book,artist,title,COUNT(ObjectKey) AS titleCount, NOW()
FROM db.one
WHERE book = domain_val
GROUP BY artist,title
ORDER BY book,titleCount DESC
LIMIT 200
) a ON DUPLICATE KEY UPDATE title_count = titleCount, last_updated = NOW();
IF a = stop_loop THEN
LEAVE loop_thru_domain;
END IF;
END LOOP loop_thru_domain;
END $$
it loops through a list of domains and then inserts only a limit of 200 each
回答12:
Try this:
SET @num := 0, @type := '';
SELECT `year`, `id`, `rate`,
@num := if(@type = `id`, @num + 1, 1) AS `row_number`,
@type := `id` AS `dummy`
FROM (
SELECT *
FROM `h`
WHERE (
`year` BETWEEN '2000' AND '2009'
AND `id` IN (SELECT `rid` FROM `table2`) AS `temp_rid`
)
ORDER BY `id`
) AS `temph`
GROUP BY `year`, `id`, `rate`
HAVING `row_number`<='5'
ORDER BY `id`, `rate DESC;
回答13:
Please try below stored procedure. I have already verified. I am getting proper result but without using groupby
.
CREATE DEFINER=`ks_root`@`%` PROCEDURE `first_five_record_per_id`()
BEGIN
DECLARE query_string text;
DECLARE datasource1 varchar(24);
DECLARE done INT DEFAULT 0;
DECLARE tenants varchar(50);
DECLARE cur1 CURSOR FOR SELECT rid FROM demo1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
SET @query_string='';
OPEN cur1;
read_loop: LOOP
FETCH cur1 INTO tenants ;
IF done THEN
LEAVE read_loop;
END IF;
SET @datasource1 = tenants;
SET @query_string = concat(@query_string,'(select * from demo where `id` = ''',@datasource1,''' order by rate desc LIMIT 5) UNION ALL ');
END LOOP;
close cur1;
SET @query_string = TRIM(TRAILING 'UNION ALL' FROM TRIM(@query_string));
select @query_string;
PREPARE stmt FROM @query_string;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
来源:https://stackoverflow.com/questions/64780798/mysql-efficient-max-n-values-per-group