Question
I have unevenly distributed data (with respect to date) covering a few years (2003-2008). I want to query data for a given start and end date, grouping it by any of the intervals supported by date_trunc (day, week, month, quarter, year) in PostgreSQL 8.3 (http://www.postgresql.org/docs/8.3/static/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC).
Some queries return a row for every interval in the requested period, like this one:
select to_char(date_trunc('month', date), 'YYYY-MM-DD'), count(distinct post_id)
from some_table
where category_id = 1 and entity_id = 77 and entity2_id = 115
and date <= '2008-12-06' and date >= '2007-12-01'
group by date_trunc('month', date)
order by date_trunc('month', date);
to_char | count
------------+-------
2007-12-01 | 64
2008-01-01 | 31
2008-02-01 | 14
2008-03-01 | 21
2008-04-01 | 28
2008-05-01 | 44
2008-06-01 | 100
2008-07-01 | 72
2008-08-01 | 91
2008-09-01 | 92
2008-10-01 | 79
2008-11-01 | 65
(12 rows)
Others skip intervals for which no data is present, like this one:
select to_char(date_trunc('month', date), 'YYYY-MM-DD'), count(distinct post_id)
from some_table
where category_id = 1 and entity_id = 75 and entity2_id = 115
and date <= '2008-12-06' and date >= '2007-12-01'
group by date_trunc('month', date)
order by date_trunc('month', date);
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-03-01 | 1
2008-04-01 | 2
2008-06-01 | 1
2008-08-01 | 3
2008-10-01 | 2
(7 rows)
The required result set is:
to_char | count
------------+-------
2007-12-01 | 2
2008-01-01 | 2
2008-02-01 | 0
2008-03-01 | 1
2008-04-01 | 2
2008-05-01 | 0
2008-06-01 | 1
2008-07-01 | 0
2008-08-01 | 3
2008-09-01 | 0
2008-10-01 | 2
2008-11-01 | 0
(12 rows)
That is, a count of 0 for every interval that has no entries.
I have seen earlier discussions on Stack Overflow, but they don't seem to solve my problem, since my grouping period is one of (day, week, month, quarter, year) and is decided at runtime by the application. So I guess an approach like a left join with a calendar table or sequence table will not help.
My current solution is to fill in these gaps in Python (in a TurboGears app) using the calendar module.
Is there a better way to do this?
Answer 1:
You can create a list of the first day of each month over the last year (say) with
select distinct date_trunc('month', (current_date - offs)) as date
from generate_series(0,365,28) as offs;
date
------------------------
2007-12-01 00:00:00+01
2008-01-01 00:00:00+01
2008-02-01 00:00:00+01
2008-03-01 00:00:00+01
2008-04-01 00:00:00+02
2008-05-01 00:00:00+02
2008-06-01 00:00:00+02
2008-07-01 00:00:00+02
2008-08-01 00:00:00+02
2008-09-01 00:00:00+02
2008-10-01 00:00:00+02
2008-11-01 00:00:00+01
2008-12-01 00:00:00+01
Then you can join with that series.
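A minimal sketch of that join, assuming the table and column names from the question (some_table, date, post_id and the three id filters). The aliases m and month are made up for illustration. The filters sit in the ON clause rather than in WHERE so that months without matching rows survive the left join, and count() over the unmatched side then yields 0:

```sql
-- Sketch only: table and column names are taken from the question,
-- the aliases "m" and "month" are hypothetical.
SELECT m.month
     , count(DISTINCT t.post_id) AS count   -- 0 when no row matched
FROM  (
   -- one row per month of the last year, built with the integer
   -- form of generate_series()
   SELECT DISTINCT date_trunc('month', current_date - offs)::date AS month
   FROM   generate_series(0, 365, 28) AS offs
   ) m
LEFT   JOIN some_table t
       ON  date_trunc('month', t.date)::date = m.month
       AND t.category_id = 1
       AND t.entity_id   = 75
       AND t.entity2_id  = 115
GROUP  BY m.month
ORDER  BY m.month;
```

Moving any of the t.* conditions into a WHERE clause would filter out the NULL-extended rows again and bring the gaps back.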
Answer 2:
This question is old, but since fellow users chose it as the canonical target for a new duplicate, I am adding a proper answer.
Proper solution
SELECT *
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
-- AND    ... more conditions
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
- Use LEFT JOIN, of course.
- generate_series() can produce a table of timestamps on the fly, and very fast.
- It's generally faster to aggregate before you join. I recently provided a test case on sqlfiddle.com in this related answer:
  - PostgreSQL - order by an array
- Cast the timestamp to date (::date) for a basic format. For more, use to_char().
- GROUP BY 1 is syntax shorthand to reference the first output column. Could be GROUP BY day as well, but that might conflict with an existing column of the same name. Or GROUP BY date_trunc('month', date_col)::date, but that's too long for my taste.
- Works with the available interval arguments for date_trunc().
- count() never produces NULL (0 for no rows), but the LEFT JOIN does. To return 0 instead of NULL in the outer SELECT, use COALESCE(some_count, 0) AS some_count. See the manual.
- For a more generic solution or arbitrary time intervals consider this closely related answer:
  - Best way to count records by arbitrary time intervals in Rails+Postgres
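Putting the COALESCE note together with the query above gives something like the following sketch. It keeps the answer's placeholder names (tbl, date_col, some_count). One assumption to be aware of: the timestamp form of generate_series() was added in PostgreSQL 8.4, so on the 8.3 server mentioned in the question the series would have to be built from the integer form instead.

```sql
-- Sketch only: tbl, date_col and some_count are the answer's placeholders.
-- Requires generate_series(timestamp, timestamp, interval), i.e. 8.4 or later.
SELECT day
     , COALESCE(t.some_count, 0) AS some_count   -- NULL from the LEFT JOIN becomes 0
FROM  (
   SELECT day::date
   FROM   generate_series(timestamp '2007-12-01'
                        , timestamp '2008-12-01'
                        , interval  '1 month') day
   ) d
LEFT   JOIN (
   -- aggregate first, then join
   SELECT date_trunc('month', date_col)::date AS day
        , count(*) AS some_count
   FROM   tbl
   WHERE  date_col >= date '2007-12-01'
   AND    date_col <= date '2008-12-06'
   GROUP  BY 1
   ) t USING (day)
ORDER  BY day;
```

For another grouping unit, the date_trunc() argument and the series step would change together, for example 'quarter' with interval '3 months'. The series start has to be a truncated value for that unit as well, otherwise the generated rows will not line up with the truncated dates.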
Answer 3:
You could create a temporary table at runtime and left join on that. That seems to make the most sense.
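One way this might look, as a rough sketch: the application creates a temporary table holding one row per interval and then left joins the data onto it. The names tmp_intervals and day are made up for illustration; some_table and its columns come from the question. Only the integer form of generate_series() is used, so this should not depend on anything newer than 8.3.

```sql
-- Sketch only: tmp_intervals and day are hypothetical names.
-- One row per month from 2007-12 through 2008-12.
CREATE TEMP TABLE tmp_intervals AS
SELECT (date '2007-12-01' + n * interval '1 month')::date AS day
FROM   generate_series(0, 12) AS n;

SELECT i.day
     , count(DISTINCT t.post_id) AS count   -- 0 for months without rows
FROM   tmp_intervals i
LEFT   JOIN some_table t
       ON  date_trunc('month', t.date)::date = i.day
       AND t.category_id = 1
       AND t.entity_id   = 75
       AND t.entity2_id  = 115
       AND t.date >= '2007-12-01'
       AND t.date <= '2008-12-06'
GROUP  BY i.day
ORDER  BY i.day;
```

Because the table is rebuilt per request, the application can pick the step (day, week, month, quarter, year) at runtime, as long as the date_trunc() unit in the join matches it.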
Source: https://stackoverflow.com/questions/346132/postgres-how-to-return-rows-with-0-count-for-missing-data