Two left joins give me untrue data (double data?) with MySQL

梦谈多话 2021-01-26 06:56

This is my query:

SELECT `products`.*, SUM(orders.total_count) AS revenue,
    SUM(orders.quantity) AS qty, ROUND(AVG(product_reviews.stars)) as avg_stars 
FROM          


        
3 Answers
  • 2021-01-26 07:33

    One approach to avoid that problem is to use a correlated subquery in the SELECT list, rather than a left join.

    SELECT p.*
         , SUM(o.total_count) AS revenue
         , SUM(o.quantity) AS qty
         , ( SELECT ROUND(AVG(r.stars))
               FROM `product_reviews` r
              WHERE r.product_id = p.id 
           ) AS avg_stars
      FROM `products` p
      LEFT
      JOIN `orders` o
        ON o.product_id = p.id
       AND o.status IN ('delivered','new')
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10
     OFFSET 0
    

    This isn't the only approach, and it's not necessarily the best approach, especially with large sets. But given that the subquery will run a maximum of 10 times (because of the LIMIT clause), performance should be reasonable, given an appropriate index on product_reviews(product_id,stars).
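
    To check the correlated-subquery form on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL, and the table definitions, the index name product_reviews_ix1 and the sample rows are assumptions inferred from the column names in the query, not the asker's real schema.

```python
import sqlite3

# Assumed schema and sample data; SQLite is used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (product_id INTEGER, status TEXT, quantity INTEGER, total_count REAL);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);
    -- Index on (product_id, stars), as suggested in the answer (index name is hypothetical).
    CREATE INDEX product_reviews_ix1 ON product_reviews (product_id, stars);

    INSERT INTO products VALUES (101, 'widget');
    INSERT INTO orders VALUES (101, 'delivered', 100, 500.0), (101, 'new', 7, 35.0);
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Only orders are joined; reviews are aggregated per product in the SELECT list,
# so review rows cannot multiply order rows.
row = conn.execute("""
    SELECT p.id
         , SUM(o.total_count) AS revenue
         , SUM(o.quantity)    AS qty
         , ( SELECT ROUND(AVG(r.stars))
               FROM product_reviews r
              WHERE r.product_id = p.id
           ) AS avg_stars
      FROM products p
      LEFT JOIN orders o
        ON o.product_id = p.id
       AND o.status IN ('delivered','new')
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10 OFFSET 0
""").fetchone()

print(row)  # (101, 535.0, 107, 3.0)
```

    The quantity comes back as 107 (100 + 7), not a multiple of it, because the three review rows never take part in the join.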

    If you were returning all product ids, or a significant percentage of them, then using an inline view might give better performance (avoiding the nested-loops execution of the correlated subquery in the SELECT list):

    SELECT p.*
         , SUM(o.total_count) AS revenue
         , SUM(o.quantity) AS qty
         , s.avg_stars
      FROM `products` p
      LEFT
      JOIN `orders` o
        ON o.product_id = p.id
       AND o.status IN ('delivered','new')
      LEFT
      JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
                  , r.product_id
               FROM `product_reviews` r
              GROUP BY r.product_id 
           ) s
        ON s.product_id = p.id
     GROUP BY p.id
     ORDER BY p.id DESC
     LIMIT 10
     OFFSET 0
    

    Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.

    I apologize if my use of the term "semi-cartesian" was misleading or confusing.

    The idea that I meant to convey by that was that you had two distinct sets (the set of orders for a product, and the set of reviews for a product), and that your query was generating a "cross product" of those two distinct sets, basically "matching" every order to every review (for a particular product).

    For example, given three rows in reviews for product_id 101, and two rows in orders for product_id 101, e.g.:

    REVIEWS
    pid  stars text
    ---  ----- --------------
    101  4.5   woo hoo perfect
    101  3     ehh
    101  1     totally sucked
    
    
    ORDERS
    pid  date   qty 
    ---  -----  ---
    101  1/13   100
    101  1/22   7
    

    Your original query is essentially forming a result set with six rows in it, each row from orders being matched to all three rows from reviews:

    id   date   qty   stars text
    ---  ----   ----  ----  ------------
    101  1/13   100   4.5   woo hoo perfect
    101  1/13   100   3     ehh
    101  1/13   100   1     totally sucked
    101  1/22   7     4.5   woo hoo perfect
    101  1/22   7     3     ehh
    101  1/22   7     1     totally sucked
    

    Then, when the SUM aggregate on qty gets applied, each order has been counted once per review, so the total for this product comes out as 321 (3 × 107) rather than the expected 107.
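
    The fan-out described above can be reproduced with a short script. This is a sketch using Python's built-in sqlite3 module as a stand-in for MySQL; the table layout mirrors the REVIEWS and ORDERS examples, with the column names review_text and order_date chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reviews (pid INTEGER, stars REAL, review_text TEXT);
    CREATE TABLE orders  (pid INTEGER, order_date TEXT, qty INTEGER);
    INSERT INTO reviews VALUES (101, 4.5, 'woo hoo perfect'),
                               (101, 3,   'ehh'),
                               (101, 1,   'totally sucked');
    INSERT INTO orders  VALUES (101, '1/13', 100), (101, '1/22', 7);
""")

# Joining both child tables on the product id pairs every order with every review.
joined_rows, inflated_qty = conn.execute("""
    SELECT COUNT(*), SUM(o.qty)
      FROM orders o
      JOIN reviews r ON r.pid = o.pid
""").fetchone()

# The total taken from orders alone, for comparison.
true_qty = conn.execute("SELECT SUM(qty) FROM orders").fetchone()[0]

print(joined_rows, inflated_qty, true_qty)  # 6 321 107
```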

  • 2021-01-26 07:36

    The problem is that the product_reviews and orders tables can each have more than one row per product id. One way you can fix this is to aggregate each table in its own subquery before joining:

    SELECT `products`.*, 
      o.revenue,
      o.qty, 
      ROUND(avg_stars) as avg_stars 
    FROM `products` 
    LEFT JOIN
    (
      select `product_id`, 
        sum(total_count) revenue,
        sum(quantity) qty
      from `orders`
      where `status` in ('delivered', 'new')
      group by `product_id`
    ) o
      ON `products`.`id` = o.`product_id`
    LEFT JOIN
    (
      select product_id, avg(stars) avg_stars
      from product_reviews
      group by product_id
    ) pr
        ON (products.id = pr.product_id)
    ORDER BY products.ID DESC
    LIMIT 10
    OFFSET 0
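
    As a quick check of the pre-aggregation approach, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MySQL, and the schema and sample rows (including a cancelled order and a product with no orders or reviews) are assumptions made up for the test.

```python
import sqlite3

# Assumed schema and sample data; not the asker's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (product_id INTEGER, status TEXT, quantity INTEGER, total_count REAL);
    CREATE TABLE product_reviews (product_id INTEGER, stars REAL);

    INSERT INTO products VALUES (101, 'widget'), (102, 'gadget');
    INSERT INTO orders VALUES (101, 'delivered', 100, 500.0),
                              (101, 'new',         7,  35.0),
                              (101, 'cancelled',  50, 250.0);
    INSERT INTO product_reviews VALUES (101, 4.5), (101, 3), (101, 1);
""")

# Each derived table returns at most one row per product, so the joins cannot fan out.
rows = conn.execute("""
    SELECT p.id, o.revenue, o.qty, ROUND(pr.avg_stars) AS avg_stars
      FROM products p
      LEFT JOIN ( SELECT product_id, SUM(total_count) AS revenue, SUM(quantity) AS qty
                    FROM orders
                   WHERE status IN ('delivered', 'new')
                   GROUP BY product_id ) o
        ON o.product_id = p.id
      LEFT JOIN ( SELECT product_id, AVG(stars) AS avg_stars
                    FROM product_reviews
                   GROUP BY product_id ) pr
        ON pr.product_id = p.id
     ORDER BY p.id DESC
     LIMIT 10 OFFSET 0
""").fetchall()

print(rows)  # [(102, None, None, None), (101, 535.0, 107, 3.0)]
```

    Product 102 still appears, with NULLs, because both joins are LEFT joins, and the cancelled order is excluded by the status filter inside the orders subquery.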
    
  • 2021-01-26 07:43

    It's not easy to solve this without seeing your table schemas. I would suggest you look at your aggregations and GROUP BY clauses first, then at your column default values and how you are handling empty values. Also look at DISTINCT in the aggregate functions.
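
    One caveat if you try DISTINCT inside the aggregate: it removes duplicate values, not duplicate rows, so two genuine orders with the same quantity collapse into one. A minimal sketch with Python's built-in sqlite3 module (SQLite as a stand-in for MySQL; the tables and rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (pid INTEGER, qty INTEGER);
    CREATE TABLE reviews (pid INTEGER, stars REAL);
    -- Two separate orders that happen to have the same quantity.
    INSERT INTO orders  VALUES (101, 5), (101, 5);
    INSERT INTO reviews VALUES (101, 4), (101, 2);
""")

plain_sum, distinct_sum = conn.execute("""
    SELECT SUM(o.qty), SUM(DISTINCT o.qty)
      FROM orders o
      JOIN reviews r ON r.pid = o.pid
""").fetchone()

true_sum = conn.execute("SELECT SUM(qty) FROM orders").fetchone()[0]

print(plain_sum, distinct_sum, true_sum)  # 20 5 10
```

    The plain SUM over-counts (20) and SUM(DISTINCT ...) under-counts (5); neither matches the real total of 10, so DISTINCT is only safe when the summed values are known to be unique per product.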

    If all else fails, an "optimized" solution is not vital, and your data volumes are low, do a sub-select only on the tables from which you require values. A sub-select on a single table has a much narrower row scope and will yield the correct result.

    I would suggest that you supply your table schemas here.
