问题
I'm using PostgreSQL 9.6. I have a table like this:
mac sn loc time date vin1 vin2 vin3
1a34 4as11111111 aaaa 7:06:18 1/1/2018 447.42 472.32 682.59
1a34 4as11111111 aaaa 7:06:43 1/1/2018 455.97 476.25 682.59
1a34 4as11111111 aaaa 7:07:35 1/1/2018 470.88 484.2 682.5
I need to calculate the average of the vin1
, vin2
, vin3
within time intervals of 300 sec (5 min). For example, starting from the first time (7:06:18 - 7:11:18), for the dates in range. I can select the data I need with this query:
select * from table
where sn='4as11111111' and date between '2018-01-01' and '2018-01-02';
But I don't know how to group it by 300 sec time intervals and calculate average for vin1
, vin2
, vin3
columns for those 5 min interval, to get something like this:
mac sn loc time date vin1_av vin2_av vin3_av
1a34 4as11111111 aaaa 7:06:18 1/1/2018 450.0 480.32 600.59
1a34 4as11111111 aaaa 7:11:18 1/1/2018 460.0 490.25 782.59
1a34 4as11111111 aaaa 7:16:18 1/1/2018 470.88 500.2 600.5
Any help would be greatly appreciated.
回答1:
DB design
While you can work with separate date
and time
columns, there is really no advantage over a single timestamp column. I would adapt:
ALTER TABLE tbl ADD column ts timestamp;
UPDATE tbl SET ts = date + time; -- assuming actual date and time types
ALTER TABLE tbl DROP column date, DROP column time;
If date and time are not actual date
and time
data types, use to_timestamp()
. Related:
- Calculating Cumulative Sum in PostgreSQL
- How to convert "string" to "timestamp without time zone"
Query
Then the query is a bit simpler:
SELECT *
FROM (
SELECT sn, generate_series(min(ts), max(ts), interval '5 min') AS ts
FROM tbl
WHERE sn = '4as11111111'
AND ts >= '2018-01-01'
AND ts < '2018-01-02'
GROUP BY 1
) grid
CROSS JOIN LATERAL (
SELECT round(avg(vin1), 2) AS vin1_av
, round(avg(vin2), 2) AS vin2_av
, round(avg(vin3), 2) AS vin3_av
FROM tbl
WHERE sn = grid.sn
AND ts >= grid.ts
AND ts < grid.ts + interval '5 min'
) avg;
db<>fiddle here
Generate a grid of start times in the first subquery grid
, running from the first to the last qualifying row in the given time frame.
Join to rows that fall in each partition with a LATERAL
join and immediately aggregate averages in the subquery avg
. Due to the aggregates, it always returns a row even if no entries are found. Averages default to NULL
in this case.
The result includes all time slots between the first and last qualifying row in the given time frame. Various other result compositions would make sense, too. Like including all times slots in the given time frame or just time slots with actual values. All possible, I had to pick one interpretation.
Index
At least have this multicolumn index:
CRATE INDEX foo_idx ON tbl (sn, ts);
Or on (sn, ts, vin1, vin2, vin3)
to allow index-only scans - if some preconditions are met and especially if table rows are much wider than in the demo.
Closely related:
- Slow LEFT JOIN on CTE with time intervals
- Best way to count records by arbitrary time intervals in Rails+Postgres
Based on your original table
As requested and clarified in the comment, and later updated again in the question to include the columns mac
and loc
. I assume you want separate averages per (mac, loc)
.
date
and time
are still separate columns, vin* columns are type float
, and exclude time slots without rows:
The updated query also moves the set-returning function generate_series()
to the FROM
list, which is cleaner before Postgres 10:
SELECT t.mac, sn.sn, t.loc, ts.ts::time AS time, ts.ts::date AS date
, t.vin1_av, t.vin2_av, t.vin3_av
FROM (SELECT text '4as11111111') sn(sn) -- provide sn here once
CROSS JOIN LATERAL (
SELECT min(date+time) AS min_ts, max(date+time) AS max_ts
FROM tbl
WHERE sn = sn.sn
AND date+time >= '2018-01-01 0:0' -- provide time frame here
AND date+time < '2018-01-02 0:0'
) grid
CROSS JOIN LATERAL generate_series(min_ts, max_ts, interval '5 min') ts(ts)
CROSS JOIN LATERAL (
SELECT mac, loc
, round(avg(vin1)::numeric, 2) AS vin1_av -- cast to numeric for round()
, round(avg(vin2)::numeric, 2) AS vin2_av -- but rounding is optional
, round(avg(vin3)::numeric, 2) AS vin3_av
FROM tbl
WHERE sn = sn.sn
AND date+time >= ts.ts
AND date+time < ts.ts + interval '5 min'
GROUP BY mac, loc
HAVING count(*) > 0 -- exclude empty slots
) t;
Create a multicolumn expression index to support this:
CRATE INDEX bar_idx ON tbl (sn, (date+time));
db<>fiddle here
But I would much rather use timestamp
all along.
来源:https://stackoverflow.com/questions/50261637/how-to-get-average-values-for-time-intervals-in-postgres