This is my query:
SELECT `products`.*, SUM(orders.total_count) AS revenue,
SUM(orders.quantity) AS qty, ROUND(AVG(product_reviews.stars)) as avg_stars
FROM
One approach to avoid that problem is to use correlated subquery in the SELECT list, rather than a left join.
SELECT p.*
, SUM(o.total_count) AS revenue
, SUM(o.quantity) AS qty
, ( SELECT ROUND(AVG(r.stars))
FROM `product_reviews` r
WHERE r.product_id = p.id
) AS avg_stars
FROM `products` p
LEFT
JOIN `orders` o
ON o.product_id = p.id
AND o.status IN ('delivered','new')
GROUP BY p.id
ORDER BY p.id DESC
LIMIT 10
OFFSET 0
This isn't the only approach, and it's not necessarily the best approach, especially with large sets But given that the subquery will run a maximum of 10 times (given the LIMIT clause), performance should be reasonable (given an appropriate index on product_reviews(product_id,stars)
.
If you were returning all product ids, or a significant percentage of them, then using an inline view might give better performance (avoiding the nested loops execution of the correlated subquery in the select list)
SELECT p.*
, SUM(o.total_count) AS revenue
, SUM(o.quantity) AS qty
, s.avg_stars
FROM `products` p
LEFT
JOIN `orders` o
ON o.product_id = p.id
AND o.status IN ('delivered','new')
LEFT
JOIN ( SELECT ROUND(AVG(r.stars)) AS avg_stars
, r.product_id
FROM `product_reviews` r
GROUP BY r.product_id
) s
ON s.product_id = p.id
GROUP BY p.id
ORDER BY p.id DESC
LIMIT 10
OFFSET 0
Just to be clear: the issue with the original query is that every order for a product is getting matched to every review for the product.
I apologize if my use of the term "semi-cartesian" was misleading or confusing.
The idea that I meant to convey by that was that you had two distinct sets (the set of orders for a product, and the set of reviews for a product), and that your query was generating a "cross product" of those two distinct sets, basically "matching" every order to every review (for a particular product).
For example, given three rows in reviews
for product_id 101, and two rows in orders
for product_id 101, e.g.:
REVIEWS
pid stars text
--- ----- --------------
101 4.5 woo hoo perfect
101 3 ehh
101 1 totally sucked
ORDERS
pid date qty
--- ----- ---
101 1/13 100
101 1/22 7
Your original query is essentially forming a result set with six rows in it, each row from order being matched to all three rows from reviews:
id date qty stars text
--- ---- ---- ---- ------------
101 1/13 100 4.5 woo hoo perfect
101 1/13 100 3 ehh
101 1/13 100 1 totally sucked
101 1/22 7 4.5 woo hoo perfect
101 1/22 7 3 ehh
101 1/22 7 1 totally sucked
Then, when the SUM aggregate on qty gets applied, the values returned are way bigger than you expect.
The problem is that the product_reviews
and orders table can have more that one row per product id. One way you can fix this is to use a subquery:
SELECT `products`.*,
o.revenue,
o.qty,
ROUND(avg_stars) as avg_stars
FROM `products`
LEFT JOIN
(
select `product_id`,
sum(total_count) revenue,
sum(quantity) qty
from `orders`
where `status` in ('delivered', 'new')
group by `product_id`
) o
ON `products`.`id` = o.`product_id`
LEFT JOIN
(
select product_id, avg(stars) avg_stars
from product_reviews
group by product_id
) pr
ON (products.id = pr.product_id)
ORDER BY products.ID DESC
LIMIT 10
OFFSET 0
Its not easy to solve this without seeing your table schemas, I would suggest you look at your Aggregations and Group By statements first, then look at your column default values, how are you handling empty values, also look at DISTINCT in the Aggregation functions.
If all else fails and a "optimized" solution is not vital and your data volumes are low do a Sub Select only on the tables for which you require the values, within the Sub Select on 1 table you have a much narrower row scope and it will yield the correct result.
I would suggest that you supply your table schemas here.