Question
I have a table with the columns user_id, item_id and interaction_type. interaction_type can be 0, 1, 2, 3, 4 or 5. However, some (user_id, item_id) pairs have multiple interaction_type values. For example, we might have:
user_id item_id interaction_type
2 3 1
2 3 0
2 3 5
4 1 0
5 4 4
5 4 2
What I want is to only keep the maximum interaction_type if there are multiples. So I want this:
user_id item_id interaction_type
2 3 5
4 1 0
5 4 4
Here is the query I wrote for this purpose:
select user_id, item_id, max(interaction_type) as max_type
from mytable
group by user_id, item_id;
But the result is weird. For example, the original table has 100000 rows with interaction_type=5, but the result table has only 2000. How is this possible? max will pick 5 for every group that contains a 5, so I shouldn't have fewer 5s in the result table.
Answer 1:
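Your query is fine. The reason you are getting 2000 rows is that you get one row for every unique pair of values user_id, item_id. If many of the rows with interaction_type=5 share the same pair, they all collapse into a single row in the result.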
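One way to check this explanation against the real data is to compare the number of rows with interaction_type=5 to the number of distinct (user_id, item_id) pairs that contain a 5. This is a minimal sketch, assuming the table is named mytable as in the question. The aliases rows_with_5 and pairs_with_5 are made up for illustration, and the multi-column count(distinct ...) form is MySQL syntax that may not work in other databases:

```sql
-- Compare rows carrying interaction_type = 5 with the distinct pairs they belong to.
-- If rows_with_5 is much larger than pairs_with_5, many rows repeat the same
-- (user_id, item_id) pair, and GROUP BY collapses each pair into one result row.
-- Note: count(distinct a, b) skips rows where either column is NULL.
select count(*) as rows_with_5,
       count(distinct user_id, item_id) as pairs_with_5
from mytable
where interaction_type = 5;
```

If pairs_with_5 comes back as roughly 2000, the grouped result is consistent with the data.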
If you want to see the interaction types going into each row then use:
select user_id, item_id, max(interaction_type) as max_type,
group_concat(distinct interaction_type) as interaction_types,
count(*) as cnt
from mytable
group by user_id, item_id;
It occurs to me that you want all rows with the maximum interaction type. If so, calculate the maximum and then find all rows that match that value:
select t.*
from mytable t join
(select max(interaction_type) as maxit from mytable) x
on x.maxit = t.interaction_type;
No group by is needed for this query.
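If the goal is instead to keep every original row that carries the maximum for its own (user_id, item_id) pair, duplicates included, the same join idea can be applied per pair. This is only a sketch under that assumption about the intent; the alias m and the column name max_type are illustrative:

```sql
-- Compute the maximum per (user_id, item_id) pair in a derived table,
-- then keep only the rows of mytable that match their pair's maximum.
select t.*
from mytable t
join (select user_id, item_id, max(interaction_type) as max_type
      from mytable
      group by user_id, item_id) m
  on m.user_id = t.user_id
 and m.item_id = t.item_id
 and m.max_type = t.interaction_type;
```

On the six sample rows from the question this keeps (2, 3, 5), (4, 1, 0) and (5, 4, 4). Unlike the group by query, a pair whose maximum appears in several rows would return all of those rows.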
Source: https://stackoverflow.com/questions/43438257/mysql-group-by-two-columns-and-pick-the-maximum-value-of-third-column