问题
I am aggregating sales for a set of products per day using Postgres and need to know not just when sales do happen, but also when they do not for further processing.
SELECT
sd.date,
COUNT(sd.sale_id) AS sales,
sd.product
FROM sales_data sd
-- sales per product, per day
GROUP BY sd.product, sd.date
ORDER BY sd.product, sd.date
This produces the following:
date | sales | product
------------+-------+-------------------
2017-08-17 | 10 | soap
2017-08-19 | 2 | soap
2017-08-20 | 5 | soap
2017-08-17 | 2 | shower gel
2017-08-21 | 1 | shower gel
As you can see - the date ranges per product are not continuous as sales_data
just didn't contain any info for these products on some days.
What I'm aiming to do is to add a sales = 0
row for each product that is not sold on any day in a range - for example here, between 2017-08-17
and 2017-08-21
to give something like the the following:
date | sales | product
------------+-------+-------------------
2017-08-17 | 10 | soap
2017-08-18 | 0 | soap
2017-08-19 | 2 | soap
2017-08-20 | 5 | soap
2017-08-21 | 0 | soap
2017-08-17 | 2 | shower gel
2017-08-18 | 0 | shower gel
2017-08-19 | 0 | shower gel
2017-08-20 | 0 | shower gel
2017-08-21 | 1 | shower gel
In a simpler case where there was only a single product, it seems like the solution would be to use generate_series()
i.e.:
- create a full range of dates using generate_series
LEFT JOIN
the already aggregated sales data onto the date seriesCOALESCE
anyNULL
counts to 0 in the missing rows
The problem I have is that this approach does not seem to work dates repeat in the aggregated data as I'm grouping over not just multiple dates, but multiple products also.
It feels like I should be able to do something cunning with window functions here to solve this e.g. joining onto the full date range over partitions defined by the product name - but I can't see a way of actually getting this to work.
回答1:
You could use:
WITH cte AS (
SELECT date, s.product
FROM ... -- some way to generate date series
CROSS JOIN (SELECT DISTINCT product FROM sales_data) s
)
SELECT
c.date,
c.product,
COUNT(sd.sale_id) AS sales
FROM cte c
LEFT JOIN sales_data sd
ON c.date = sd.date AND c.product= sd.product
GROUP BY c.date, c.product
ORDER BY c.date, c.product;
First create Cartesian product of dates and products, then LEFT JOIN
to actual data and do calculations.
Oracle has great feature for this scenarios called Partitioned Outer Joins:
SELECT times.time_id, product, quantity
FROM inventory PARTITION BY (product)
RIGHT OUTER JOIN times ON (times.time_id = inventory.time_id)
WHERE times.time_id BETWEEN TO_DATE('01/04/01', 'DD/MM/YY')
AND TO_DATE('06/04/01', 'DD/MM/YY')
ORDER BY 2,1;
回答2:
select
date,
count(sale_id) as sales,
product
from
sales_data
right join (
(
select d::date as date
from generate_series (
(select min(date) from sales_data),
(select max(date) from sales_data),
'1 day'
) gs (d)
) gs
cross join
(select distinct product from sales_data) p
) cj using (product, date)
group by product, date
order by product, date
来源:https://stackoverflow.com/questions/45738055/fill-in-missing-rows-when-aggregating-over-multiple-fields-in-postgres