Two left joins give me untrue data (double data?) with MySQL


This is my query:

SELECT `products`.*, SUM(orders.total_count) AS revenue,
    SUM(orders.quantity) AS qty, ROUND(AVG(product_reviews.stars)) as avg_stars 
FROM


                      


      
      
        
3 answers

        
                         				            
            
           
            
                              
                
              
              
                
没有蜡笔的小新, 2021-01-26 07:33
              
            
            
                                                                       
One approach to avoid that problem is to use a correlated subquery in the SELECT list, rather than a left join.

SELECT p.*
     , SUM(o.total_count) AS revenue
     , SUM(o.quantity) AS qty
     , ( SELECT ROUND(AVG(r.stars))
           FROM `product_reviews` r
          WHERE r.product_id = p.id 
       ) AS avg_stars
  FROM `products` p
  LEFT
  JOIN `orders` o
    ON o.product_id = p.id
   AND o.status IN ('delivered','new')
 GROUP BY p.id
 ORDER BY p.id DESC
 LIMIT 10
 OFFSET 0


This isn't the only approach, and it's not necessarily the best one, especially with large sets. But given that the subquery will run at most 10 times (because of the LIMIT clause), performance should be reasonable, provided there is an appropriate index such as product_reviews(product_id, stars).
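To try the correlated-subquery form on a small data set, here is a minimal sketch. It runs the same SQL through Python's built-in sqlite3 module as a stand-in for MySQL. The table layouts, the sample rows, and the index name product_reviews_ix are assumptions made up for illustration, since the question doesn't show the real schema.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema; the real tables surely have more columns.
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (product_id INTEGER, total_count INTEGER,
                         quantity INTEGER, status TEXT);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);

    -- The index suggested above; the name is made up.
    CREATE INDEX product_reviews_ix ON product_reviews (product_id, stars);

    INSERT INTO products VALUES (101, 'widget');
    INSERT INTO orders VALUES (101, 500, 100, 'delivered'), (101, 35, 7, 'new');
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Reviews are averaged in a correlated subquery, so they never multiply
# the order rows that feed the SUM aggregates.
row = conn.execute("""
    SELECT p.id
         , SUM(o.total_count) AS revenue
         , SUM(o.quantity) AS qty
         , ( SELECT ROUND(AVG(r.stars))
               FROM product_reviews r
              WHERE r.product_id = p.id
           ) AS avg_stars
      FROM products p
      LEFT JOIN orders o
        ON o.product_id = p.id
       AND o.status IN ('delivered', 'new')
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10 OFFSET 0
""").fetchone()

print(row)  # (101, 535, 107, 3.0)
```

The sums come out as 535 and 107 because only the two order rows are aggregated, no matter how many reviews the product has.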

If you were returning all product ids, or a significant percentage of them, then using an inline view might give better performance, since it avoids the nested-loops execution of the correlated subquery in the SELECT list:

SELECT p.*
     , SUM(o.total_count) AS revenue
     , SUM(o.quantity) AS qty
     , s.avg_stars
  FROM `products` p
  LEFT
  JOIN `orders` o
    ON o.product_id = p.id
   AND o.status IN ('delivered','new')
  LEFT
  JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
              , r.product_id
           FROM `product_reviews` r
          GROUP BY r.product_id 
       ) s
    ON s.product_id = p.id
 GROUP BY p.id
 ORDER BY p.id DESC
 LIMIT 10
 OFFSET 0




Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.

I apologize if my use of the term "semi-cartesian" was misleading or confusing. 

The idea that I meant to convey by that was that you had two distinct sets (the set of orders for a product, and the set of reviews for a product), and that your query was generating a "cross product" of those two distinct sets, basically "matching" every order to every review (for a particular product).

For example, given three rows in reviews for product_id 101, and two rows in orders for product_id 101, e.g.:

REVIEWS
pid  stars text
---  ----- --------------
101  4.5   woo hoo perfect
101  3     ehh
101  1     totally sucked


ORDERS
pid  date   qty 
---  -----  ---
101  1/13   100
101  1/22   7


Your original query is essentially forming a result set with six rows in it, each row from orders being matched to all three rows from reviews:

pid  date   qty   stars text
---  ----   ----  ----  ------------
101  1/13   100   4.5   woo hoo perfect
101  1/13   100   3     ehh
101  1/13   100   1     totally sucked
101  1/22   7     4.5   woo hoo perfect
101  1/22   7     3     ehh
101  1/22   7     1     totally sucked


Then, when the SUM aggregate on qty gets applied, each order's quantity is counted once per review, so the total comes out as 321 instead of the 107 you expect.
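The six-row result above can be reproduced with a short sketch. It loads the same REVIEWS and ORDERS rows into an in-memory SQLite database through Python's sqlite3 module (SQLite standing in for MySQL is an assumption, as are the column types) and compares the SUM over the joined rows with the SUM over orders alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (pid INTEGER, stars REAL, text TEXT);
    CREATE TABLE orders  (pid INTEGER, date TEXT, qty INTEGER);

    INSERT INTO reviews VALUES
        (101, 4.5, 'woo hoo perfect'),
        (101, 3,   'ehh'),
        (101, 1,   'totally sucked');
    INSERT INTO orders VALUES
        (101, '1/13', 100),
        (101, '1/22', 7);
""")

# Joining both child tables on the product id pairs every order
# with every review of that product: 2 orders x 3 reviews = 6 rows.
joined_rows, inflated_qty = conn.execute("""
    SELECT COUNT(*), SUM(o.qty)
      FROM orders o
      LEFT JOIN reviews r ON r.pid = o.pid
""").fetchone()

# Summing the orders on their own gives the figure that was wanted.
(true_qty,) = conn.execute(
    "SELECT SUM(qty) FROM orders WHERE pid = 101"
).fetchone()

print(joined_rows, inflated_qty, true_qty)  # 6 321 107
```

Each quantity is counted three times (once per review), which is where 321 = 3 * (100 + 7) comes from.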
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
一个人的身影, 2021-01-26 07:36
              
            
            
                                                                       
The problem is that the product_reviews and orders tables can each have more than one row per product id. One way you can fix this is to aggregate each table in its own subquery before joining:

SELECT `products`.*, 
  o.revenue,
  o.qty, 
  ROUND(avg_stars) as avg_stars 
FROM `products` 
LEFT JOIN
(
  select `product_id`, 
    sum(total_count) revenue,
    sum(quantity) qty
  from `orders`
  where `status` in ('delivered', 'new')
  group by `product_id`
) o
  ON `products`.`id` = o.`product_id`
LEFT JOIN
(
  select product_id, avg(stars) avg_stars
  from product_reviews
  group by product_id
) pr
    ON (products.id = pr.product_id)
ORDER BY products.ID DESC
LIMIT 10
OFFSET 0
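As a quick check of this pre-aggregation approach, here is a minimal sketch that runs essentially the same query through Python's sqlite3 module. SQLite stands in for MySQL, and the schema and sample rows (including the 'cancelled' order and the product with no orders) are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical minimal schema and data.
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (product_id INTEGER, total_count INTEGER,
                         quantity INTEGER, status TEXT);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);

    INSERT INTO products VALUES (101, 'widget'), (102, 'gadget');
    INSERT INTO orders VALUES
        (101, 500, 100, 'delivered'),
        (101, 35,  7,   'new'),
        (101, 250, 50,  'cancelled');   -- excluded by the status filter
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Each child table is collapsed to one row per product before the join,
# so the outer query never sees more than one row per product.
rows = conn.execute("""
    SELECT products.id, o.revenue, o.qty, ROUND(pr.avg_stars) AS avg_stars
      FROM products
      LEFT JOIN ( SELECT product_id,
                         SUM(total_count) AS revenue,
                         SUM(quantity)    AS qty
                    FROM orders
                   WHERE status IN ('delivered', 'new')
                   GROUP BY product_id ) o
        ON products.id = o.product_id
      LEFT JOIN ( SELECT product_id, AVG(stars) AS avg_stars
                    FROM product_reviews
                   GROUP BY product_id ) pr
        ON products.id = pr.product_id
     ORDER BY products.id DESC
     LIMIT 10 OFFSET 0
""").fetchall()

for row in rows:
    print(row)
```

Product 102 has no orders or reviews, so the left joins return NULL (None in Python) for its aggregates, while product 101 gets a qty of 107 rather than an inflated figure.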

                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
耶瑟儿～, 2021-01-26 07:43
              
            
            
                                                                       
It's not easy to solve this without seeing your table schemas.
I would suggest you look at your aggregations and GROUP BY clauses first, then at your column default values and how you are handling empty values. Also look at using DISTINCT inside the aggregate functions.

If all else fails, an "optimized" solution is not vital, and your data volumes are low, do a sub-select on only the tables whose values you require. A sub-select against a single table has a much narrower row scope, so it will yield the correct result.

I would suggest that you supply your table schemas here.
                                                                        
                                                        
            
            
              
                