Question
Assume I have a table called "Diary" like this:
| id | user_id | recorded_at | record |
|----|---------|--------------------------|--------|
| 20 | 50245 |2017-10-01 23:00:14.765366| 89 |
| 21 | 50245 |2017-12-05 10:00:33.135331| 97 |
| 22 | 50245 |2017-12-31 11:50:23.965134| 80 |
| 23 | 76766 |2015-10-06 11:00:14.902452| 70 |
| 24 | 76766 |2015-10-07 22:40:59.124553| 81 |
For each user I want to retrieve the latest row and all rows within one month prior to that.
In other words, for user_id 50245, I want his/her data from "2017-12-01 11:50:23.965134" to "2017-12-31 11:50:23.965134"; for user_id 76766, I want his/her data from "2015-09-07 22:40:59.124553" to "2015-10-07 22:40:59.124553".
Hence the desired result looks like this:
| id | user_id | recorded_at | record |
|----|---------|--------------------------|--------|
| 21 | 50245 |2017-12-05 10:00:33.135331| 97 |
| 22 | 50245 |2017-12-31 11:50:23.965134| 80 |
| 23 | 76766 |2015-10-06 11:00:14.902452| 70 |
| 24 | 76766 |2015-10-07 22:40:59.124553| 81 |
Please note that the record of id 20 is not included because it is more than one month prior to user_id 50245's last record.
Is there any way I can write an SQL query to achieve this?
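To make the example reproducible, here is a minimal setup sketch for PostgreSQL. The column types are assumptions (they are not stated above), and the table is created unquoted, so it is addressed as diary in lowercase.

```sql
-- Assumed column types; adjust to match the real schema.
CREATE TABLE diary (
    id          integer PRIMARY KEY,
    user_id     integer   NOT NULL,
    recorded_at timestamp NOT NULL,
    record      integer
);

-- Sample rows copied from the table in the question.
INSERT INTO diary (id, user_id, recorded_at, record) VALUES
    (20, 50245, '2017-10-01 23:00:14.765366', 89),
    (21, 50245, '2017-12-05 10:00:33.135331', 97),
    (22, 50245, '2017-12-31 11:50:23.965134', 80),
    (23, 76766, '2015-10-06 11:00:14.902452', 70),
    (24, 76766, '2015-10-07 22:40:59.124553', 81);
```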
Answer 1:
For small tables, any (valid) query technique is good.
For big tables, details matter. Assuming:
- There is also a users table with user_id as PK containing all relevant users (or possibly a few more). This is the typical setup.
- You have (or can create) an index on diary (user_id, recorded_at DESC NULLS LAST). NULLS LAST is optional if recorded_at is defined NOT NULL. But make sure the query matches the index.
- More than a few rows per user - the typical use case.
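The index assumed above could be created roughly like this. The index name is hypothetical; what matters is that the column list and sort order match the ORDER BY used when looking up each user's latest row.

```sql
-- Hypothetical index name; pick whatever fits your naming scheme.
-- The sort order mirrors "ORDER BY recorded_at DESC NULLS LAST".
CREATE INDEX diary_user_id_recorded_at_idx
    ON diary (user_id, recorded_at DESC NULLS LAST);
```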
This should be among the fastest options:
SELECT d.*
FROM users u
CROSS JOIN LATERAL (
SELECT recorded_at
FROM diary
WHERE user_id = u.user_id
ORDER BY recorded_at DESC NULLS LAST
LIMIT 1
) d1
JOIN diary d ON d.user_id = u.user_id
AND d.recorded_at >= d1.recorded_at - interval '1 month'
ORDER BY d.user_id, d.recorded_at;
Produces your desired result exactly.
For only few rows per user, max() or DISTINCT ON () in a subquery are typically faster.
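As a rough sketch of the DISTINCT ON () variant for the few-rows-per-user case (the aliases l and last_recorded_at are made up for illustration, and no users table is needed here):

```sql
SELECT d.*
FROM  (
   -- One row per user: the latest recorded_at.
   SELECT DISTINCT ON (user_id)
          user_id, recorded_at AS last_recorded_at
   FROM   diary
   ORDER  BY user_id, recorded_at DESC NULLS LAST
   ) l
JOIN   diary d USING (user_id)
-- Keep rows within one month before each user's latest row.
WHERE  d.recorded_at >= l.last_recorded_at - interval '1 month'
ORDER  BY d.user_id, d.recorded_at;
```

Whether this beats the LATERAL version depends on data distribution, so it is worth comparing both on real data.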
Related (with detailed explanation):
- Optimize GROUP BY query to retrieve latest record per user
- Select first row in each GROUP BY group?
- What is the difference between LATERAL and a subquery in PostgreSQL?
About the FROM clause:
- Start with the manual
- Why does this implicit join get planned differently than an explicit join?
- What does [FROM x, y] mean in Postgres?
Answer 2:
I would be inclined to use window functions:
select d.*
from (select d.*, max(d.recorded_at) over (partition by d.user_id) as max_recorded_at
from diary d
) d
where recorded_at >= max_recorded_at - interval '1 month';
Answer 3:
The straightforward way is to use a subquery to get the max recorded_at for each user_id and then join:
select d.*
from diary d
join ( select user_id, max(recorded_at) mra
from diary
group by user_id ) m on d.user_id = m.user_id
where m.mra <= d.recorded_at + interval '1 month'
This has the drawback of accessing the table twice (this may differ between RDBMSs; use explain to see the execution plan).
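In PostgreSQL, for instance, checking the plan might look like the sketch below. Note that the ANALYZE option actually executes the query, so plain EXPLAIN is the safer choice on a busy system.

```sql
-- Shows the chosen plan, real timings, and buffer usage.
-- ANALYZE runs the query; drop it to see only the estimated plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT d.*
FROM   diary d
JOIN  (SELECT user_id, max(recorded_at) AS mra
       FROM   diary
       GROUP  BY user_id) m ON d.user_id = m.user_id
WHERE  m.mra <= d.recorded_at + interval '1 month';
```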
A better alternative is to use window functions to do everything in one pass:
select id, user_id, recorded_at
from ( select *, max(recorded_at) over (partition by user_id) as mra
from diary ) x
where mra <= recorded_at + interval '1 month'
Disclaimer: I did not test the queries above, but you should get the idea anyway. See http://sqlfiddle.com/#!17/e90000/9 for a working example with a similar schema.
Answer 4:
Not tested but something like this should work.
I would use a subquery to get each user's last record, then keep the rows at that date and in the previous month, for example:
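select d.* from diary d,
(select user_id, max(recorded_at) l from diary group by user_id) as last_record
-- match each row with its own user's last record (otherwise users get cross-joined)
where d.user_id = last_record.user_id
and (
d.recorded_at = last_record.l
or
(
-- note: date_trunc widens the window to the start of the previous calendar month
d.recorded_at >= date_trunc('month', last_record.l - interval '1' month)
and d.recorded_at < last_record.l
)
)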
Source: https://stackoverflow.com/questions/48345520/select-data-within-one-month-prior-to-each-users-last-record