问题
I have a Postgres 9.1 database. I am trying to generate the number of records per week (for a given date range) and compare it to the previous year.
I have the following code used to generate the series:
select generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
However, I am not sure how to join the counted records to the dates generated.
So, using the following records as an example:
Pt_ID exam_date
====== =========
1 2012-01-02
2 2012-01-02
3 2012-01-08
4 2012-01-08
1 2013-01-02
2 2013-01-02
3 2013-01-03
4 2013-01-04
1 2013-01-08
2 2013-01-10
3 2013-01-15
4 2013-01-24
I wanted to have the records return as:
series thisyr lastyr
=========== ===== =====
2013-01-01 4 2
2013-01-08 3 2
2013-01-15 1 0
2013-01-22 1 0
2013-01-29 0 0
Not sure how to reference the date range in the subsearch. Thanks for any assistance.
回答1:
Using across join
should work, I'm just going to paste the markdown output from SQL Fiddle below. It would seem that your sample output is incorrect for series 2013-01-08: the thisyr should be 2, not 3. This might not be the best way to do this though, my Postgresql knowledge leaves a lot to be desired.
SQL Fiddle
PostgreSQL 9.2.4 Schema Setup:
CREATE TABLE Table1
("Pt_ID" varchar(6), "exam_date" date);
INSERT INTO Table1
("Pt_ID", "exam_date")
VALUES
('1', '2012-01-02'),('2', '2012-01-02'),
('3', '2012-01-08'),('4', '2012-01-08'),
('1', '2013-01-02'),('2', '2013-01-02'),
('3', '2013-01-03'),('4', '2013-01-04'),
('1', '2013-01-08'),('2', '2013-01-10'),
('3', '2013-01-15'),('4', '2013-01-24');
Query 1:
select
series,
sum (
case
when exam_date
between series and series + '6 day'::interval
then 1
else 0
end
) as thisyr,
sum (
case
when exam_date + '1 year'::interval
between series and series + '6 day'::interval
then 1 else 0
end
) as lastyr
from table1
cross join generate_series('2013-01-01', '2013-01-31', '7 day'::interval) as series
group by series
order by series
Results:
| SERIES | THISYR | LASTYR |
|--------------------------------|--------|--------|
| January, 01 2013 00:00:00+0000 | 4 | 2 |
| January, 08 2013 00:00:00+0000 | 2 | 2 |
| January, 15 2013 00:00:00+0000 | 1 | 0 |
| January, 22 2013 00:00:00+0000 | 1 | 0 |
| January, 29 2013 00:00:00+0000 | 0 | 0 |
回答2:
The simple approach would be to solve this with a CROSS JOIN like demonstrated by @jpw. However, there are some hidden problems:
The performance of an unconditional
CROSS JOIN
deteriorates quickly with growing number of rows. The total number of rows is multiplied by the number of weeks you are testing for, before this huge derived table can be processed in the aggregation. Indexes can't help.Starting weeks with January 1st leads to inconsistencies. ISO weeks might be an alternative. See below.
All of the following queries make heavy use of an index on exam_date
. Be sure to have one.
Only join to relevant rows
Should be much faster:
SELECT d.day, d.thisyr
, count(t.exam_date) AS lastyr
FROM (
SELECT d.day::date, (d.day - '1 year'::interval)::date AS day0 -- for 2nd join
, count(t.exam_date) AS thisyr
FROM generate_series('2013-01-01'::date
, '2013-01-31'::date -- last week overlaps with Feb.
, '7 days'::interval) d(day) -- returns timestamp
LEFT JOIN tbl t ON t.exam_date >= d.day::date
AND t.exam_date < d.day::date + 7
GROUP BY d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.day, d.thisyr
ORDER BY d.day;
This is with weeks starting from Jan. 1st like in your original. As commented, this produces a couple of inconsistencies: Weeks start on a different day each year and since we cut off at the end of the year, the last week of the year consists of just 1 or 2 days (leap year).
The same with ISO weeks
Depending on requirements, consider ISO weeks instead, which start on Mondays and always span 7 days. But they cross the border between years. Per documentation on EXTRACT():
week
The number of the week of the year that the day is in. By definition (ISO 8601), weeks start on Mondays and the first week of a year contains January 4 of that year. In other words, the first Thursday of a year is in week 1 of that year.
In the ISO definition, it is possible for early-January dates to be part of the 52nd or 53rd week of the previous year, and for late-December dates to be part of the first week of the next year. For example,
2005-01-01
is part of the 53rd week of year 2004, and2006-01-01
is part of the 52nd week of year 2005, while2012-12-31
is part of the first week of 2013. It's recommended to use theisoyear
field together withweek
to get consistent results.
Above query rewritten with ISO weeks:
SELECT w AS isoweek
, day::text AS thisyr_monday, thisyr_ct
, day0::text AS lastyr_monday, count(t.exam_date) AS lastyr_ct
FROM (
SELECT w, day
, date_trunc('week', '2012-01-04'::date)::date + 7 * w AS day0
, count(t.exam_date) AS thisyr_ct
FROM (
SELECT w
, date_trunc('week', '2013-01-04'::date)::date + 7 * w AS day
FROM generate_series(0, 4) w
) d
LEFT JOIN tbl t ON t.exam_date >= d.day
AND t.exam_date < d.day + 7
GROUP BY d.w, d.day
) d
LEFT JOIN tbl t ON t.exam_date >= d.day0 -- repeat with last year
AND t.exam_date < d.day0 + 7
GROUP BY d.w, d.day, d.day0, d.thisyr_ct
ORDER BY d.w, d.day;
January 4th is always in the first ISO week of the year. So this expression gets the date of Monday of the first ISO week of the given year:
date_trunc('week', '2012-01-04'::date)::date
Simplify with EXTRACT()
Since ISO weeks coincide with the week numbers returned by EXTRACT()
, we can simplify the query. First, a short and simple form:
SELECT w AS isoweek
, COALESCE(thisyr_ct, 0) AS thisyr_ct
, COALESCE(lastyr_ct, 0) AS lastyr_ct
FROM generate_series(1, 5) w
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS thisyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2013
GROUP BY 1
) t13 USING (w)
LEFT JOIN (
SELECT EXTRACT(week FROM exam_date)::int AS w, count(*) AS lastyr_ct
FROM tbl
WHERE EXTRACT(isoyear FROM exam_date)::int = 2012
GROUP BY 1
) t12 USING (w);
Optimized query
The same with more details and optimized for performance
WITH params AS ( -- enter parameters here, once
SELECT date_trunc('week', '2012-01-04'::date)::date AS last_start
, date_trunc('week', '2013-01-04'::date)::date AS this_start
, date_trunc('week', '2014-01-04'::date)::date AS next_start
, 1 AS week_1
, 5 AS week_n -- show weeks 1 - 5
)
SELECT w.w AS isoweek
, p.this_start + 7 * (w - 1) AS thisyr_monday
, COALESCE(t13.ct, 0) AS thisyr_ct
, p.last_start + 7 * (w - 1) AS lastyr_monday
, COALESCE(t12.ct, 0) AS lastyr_ct
FROM params p
, generate_series(p.week_1, p.week_n) w(w)
LEFT JOIN (
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.this_start -- only relevant dates
AND t.exam_date < p.this_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.next_start -- don't cross over into next year
GROUP BY 1
) t13 USING (w)
LEFT JOIN ( -- same for last year
SELECT EXTRACT(week FROM t.exam_date)::int AS w, count(*) AS ct
FROM tbl t, params p
WHERE t.exam_date >= p.last_start
AND t.exam_date < p.last_start + 7 * (p.week_n - p.week_1 + 1)::int
-- AND t.exam_date < p.this_start
GROUP BY 1
) t12 USING (w);
This should be very fast with index support and can easily be adapted to intervals of choice.
The implicit JOIN LATERAL
for generate_series()
in the last query requires Postgres 9.3.
SQL Fiddle.
来源:https://stackoverflow.com/questions/26834062/total-number-of-records-per-week