Let\'s say I have a Meeting entity. Each meeting has a single attendee and a meeting date. Within my meeting table I may have multiple meetings for each attendee, with dif
In SQL the solution is very simple - join the table with a subquery, which gives you the most recent meeting for each attendee:
select * from Meeting ALL
join ( select max(meetingDate) as newest, attendee
from Meeting group by attendee ) LATEST
on ALL.meetingDate = LATEST.newest AND ALL.attendee = LATEST.attendee
This works, and works fast!
The problem with JPA is that it (or most implementations) won't allow a subquery for a join. After spending several hours trying what will compile in the first place, and then, how slow it is, I decided that I hate JPA. Solutions like the ones above - like EXISTS (SELECT .. ) or IN ( SELECT .. ) - take ages to execute, orders of magnitude slower than they should.
Having a solution that works meant that I just needed to access that solution from JPA. There are two magic words in SQL that help you do just that:
CREATE VIEW
and the life becomes so much simpler... Just define such entity and use it. Caution: it's read-only.
Of course, any JPA purists will look down on you when you do that, so if anyone has a pure JPA solution, please let us both know!
Well in SQL that would be quite simple I think, so I assume that can be mapped to JPA:
SELECT m.AttendeeId, MAX(m.MeetingDate) from Meeting m GROUP BY m.AttendeeId
Edit: If you need the messageId itself as well you can do that with a simple subquery that returns the messageId for a message where the other two values are equal. Just make sure you handle the case where there are several messageIds for the same Attendee and Date (eg pick the first result since they should all be equally good - although I'd doubt that such data even makes sense for meetings)
As Bulba has said appropriate way is to join a subquery with group by.
The problem is that you can't join a subquery.
Here is a workaround.
Lets see what you get in the subquery with group by. You get a list of pairs (attendee_id, max(meeting_date))
.
This pair is like a new unique id for row with max date you want to join on.
Then note that each row in the table forms a pair (attendee_id, meeting_date)
.
So every row has an id as a pair (attendee_id, meeting_date)
.
Lets take a row if only it forms an id that belongs to list received in the subquery.
For simplicity lets represent this id-pair as a concatenation of attendee_id
and meeting_date
: concat(attendee_id, meeting_date)
.
Then the query in SQL(similarly for JPQL and JPA CriteriaBuilder) would be as follows:
SELECT * FROM meetings
WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) FROM meetings GROUP BY attendee_id)
Note that there is only one subquery per query, not one subquery for each row like in some answers.
We have a special offer for you!
Lets encode that id-pair to number.
It will be a sum of attendee_id
and meeting_date
but with modifications to ensure uniqueness of code. We can take number representation of date as Unix time.
We will fix the value of max date that our code can capture because final code has max value limit (e.g. bigint(int8)<263). Lets take for convenience max date as 2149-06-07 03:00:00. It equals 5662310400 in seconds and 65536 in days.
I will assume here that we need precision for date in days(so we ignore hours and below).
To construct unique code we can interpret it as a number in a numerical system with base of 65536. The last symbol(number from 0 to 216-1) in or code in such numerical system is number of days. Other symbols will capture attendee_id
. In such interpretation code looks like XXXX
, where each X is in range [0,216-1] (to be more accurate, first X is in range [0,215-1] because of 1 bit for sign), first three X represents attendee_id
and last X represents meeting_date
.
So the max value of attendee_id
our code can capture is 247-1.
The code can be computed as attendee_id
*65536 + "date in days".
In postgresql it will be:
attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
Where date_part returns date in seconds which we convert to days by dividing on constant.
And final query to get the latest meetings for all attendees:
SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch', meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch', max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
I have created a table with stucture as in the question and populated it with 100000 rows randomly selecting attendee_id
from [1, 10000] and random date from range [1970-01-01, 2017-09-16]. I have benchmarked (with EXPLAIN ANALYZE) queries with the following techniques:
Correlated subquery
SELECT * FROM meetings m1 WHERE m1.meeting_date=
(SELECT max(m2.meeting_date) FROM meetings m2 WHERE m2.attendee_id=m1.attendee_id);
Execution time: 873260.878 ms
Join subquery with group by
SELECT * FROM meetings m
JOIN (SELECT attendee_id, max(meeting_date) from meetings GROUP BY attendee_id) attendee_max_date
ON attendee_max_date.attendee_id = m.attendee_id;</code>
Execution time: 103.427 ms
Use pair (attendee_id, date)
as a key
Concat attendee_id
and meeting_date
as strings
SELECT * FROM meetings WHERE concat(attendee_id, meeting_date) IN
(SELECT concat(attendee_id, max(meeting_date)) from meetings GROUP BY attendee_id);
Execution time: 207.720 ms
Encode attendee_id
and meeting_date
to a single number(code)
SELECT * FROM meetings
WHERE attendee_id*65536 + date_part('epoch',meeting_date)/(60*60*24)
IN (SELECT attendee_id*65536 + date_part('epoch',max(meeting_date))/(60*60*24) from meetings GROUP BY attendee_id);
Execution time: 127.595 ms
Here is a git with table scheme, table data (as csv), code for populating table, and queries.
Try this
SELECT MAX(m.MeetingDate) FROM Meeting m
I think I've got it with this query.
select m from Meeting m
where m.meetingDate =
(select max(m1.meetingDate)
from Meeting m1
where m1.attendee = m.attendee )
and not exists
(select m2 from Meeting m2
where m2.attendee = m.attendee
and m2.meetingDate > m.meetingDate)