Select most recent row with GROUP BY in MySQL

后端未结

关注

 6  2010

I\'m trying to select each user with their most recent payment. The query I have now selects the users first payment. I.e. if a user has made two payments and the paym


                      
              相关标签:


      
      
        
          6条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2020-12-09 03:39
              
            
            
                                                                       
I read the following solution on SO long ago, but I can't find the link to credit, but here goes:

SELECT users.*, payments.method, payments.id AS payment_id, payments2.id
FROM users
JOIN payments
    ON users.id = payments.user_id 
LEFT JOIN payments2
    ON payments.user_id = payments2.user_id
    AND payments.id < payments2.id
WHERE payments2.id IS NULL


To understand how this works, just drop the WHERE payments2.id IS NULL and you'll see what is happening, for instance it could produce the following output (I haven't build the schema to test this, so it's pseudo-output). Assume there are the following records in payments:

id | user_id | method
1  | 1       | VISA
2  | 1       | VISA
3  | 1       | VISA
4  | 1       | VISA


And the above SQL (without the WHERE payments2.id IS NULL clause) should produce:

users.id | payments.method | payments.id | payments2.id
1        | VISA            | 1           | 2
1        | VISA            | 1           | 3
1        | VISA            | 1           | 4
1        | VISA            | 2           | 3
1        | VISA            | 2           | 4
1        | VISA            | 3           | 4
1        | VISA            | 4           | NULL


As you can see the the last line produces the desired result, and since there's no payments2.id > 4, the LEFT JOIN results in a payments2.id = NULL.

I've found this solution to be much faster (from my early tests) than the accepted answer.

Using a different schema but a similar query, of 16095 records:

select as1.*, as2.id
from allocation_status as1
left join allocation_status as2 
    on as1.allocation_id = as2.allocation_id
    and as1.id < as2.id
where as2.id is null;

16095 rows affected, taking 4.1ms


Compared to the accepted answer of MAX / subquery:

SELECT as1.* 
FROM allocation_status as1
JOIN (
    SELECT max(id) as id
    FROM allocation_status
    group by allocation_id
) as_max on as1.id = as_max.id 

16095 rows affected, taking 14.8ms

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2020-12-09 03:40
              
            
            
                                                                       
I have come across this before.  Group by's are more intended for aggregate expressions or identical records. My research found it is best practice to do something like this:  

    SELECT  u.*, p.method, p.id AS payment_id
    FROM    (
        SELECT  DISTINCT users.id
        FROM    users
        ) ur
    JOIN    payments p
    ON      p.id =
        (
        SELECT  pt.id
        FROM    payments pt
        WHERE   pt.user_id = ur.id
        ORDER BY
                pt.id DESC
        LIMIT 1
        )

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  半阙折子戏        
                
              
                            
                2020-12-09 03:45
              
            
            
                                                                       
You want the groupwise maximum; in essence, group the payments table to identify the maximal records, then join the result back with itself to fetch the other columns:

SELECT users.*, payments.method, payments.id AS payment_id
FROM   payments NATURAL JOIN (
  SELECT   user_id, MAX(id) AS id 
  FROM     payments
  GROUP BY user_id
) t RIGHT JOIN users ON users.id = t.user_id


Note that MAX(id) may not be the "most recent payment", depending on your application and schema: it's usually better to determine "most recent" based off TIMESTAMP than based off synthetic identifiers such as an AUTO_INCREMENT primary key column.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  自闭症患者        
                
              
                            
                2020-12-09 03:46
              
            
            
                                                                       
My solution:

SELECT

u.codigo, 
u.nome,  
max(r.latitude),  
max(r.longitude),  
max(r.data_criacao) 

from TAB_REGISTRO_COORDENADAS  r

inner join TAB_USUARIO u

on u.codigo = r.cd_usuario

group by u.codigo

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  执笔经年        
                
              
                            
                2020-12-09 03:56
              
            
            
                                                                       
I've just been dealing with pretty much exactly the same problem and found these answers helpful. My testing seems to suggest you can make it slightly simpler than the accepted answer, viz.:

SELECT u.*, p.method, p.id AS payment_id 
FROM `users` u, `payments` p
WHERE u.id = p.user_id 
    AND p.id = (SELECT MAX(p2.id) FROM payments p2
                    WHERE p2.user_id = u.id);


I've not performance tested the differences but the db I'm working on has over 50,000 Users and over 60,000 payments and the query runs in 0.024 seconds.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  清酒与你        
                
              
                            
                2020-12-09 04:03
              
            
            
                                                                       
Taking this one step further, we can also use:

select payment_id, cust_id, amount, payment_method 
from my_table where payment_id in 
(
    select max(payment_id) from my_table group by cust_id
);


...but this query is also taking way too long in my context.  The inner select is smoking fast, but the outer takes a while, and with only 124 results from the inner.  Ideas?
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复