Question
I have a table I need to do some data conversion on. It is a simple tracking table as outlined below:
- SSN 9,0 KEY (ex. 123456789) NON-NULL
- DATE 8,0 KEY (ex. 20131202) NON-NULL
- TIME 6,0 KEY (ex. 133000) NON-NULL
- PRINT_NEW Z (ex. 2013-12-02-11.23.47.965000) (CURRENT_TIMESTAMP used) NON-NULL
- PRINT_OLD Z (ex. 2013-12-02-11.23.47.965000) (CURRENT_TIMESTAMP used) NULLABLE
Previously I was inserting the current system time into the [TIME] field. What I should have been doing is inserting the [TIME] value from the changelog I was joined to in processing.
As a start to this conversion, I am trying to select the [SSN], [DATE], [TIME] from my tracking table, and the [TIME] from the changelog (the value that [TIME] in my tracking table should actually contain).
The issue I'm having, however, is that the changelog can have multiple entries, even on one particular date. For instance, my attempt below returns the following:
SELECT DISTINCT a.SSN, a.DATE, a.TIME, b.TIME AS CORRECT_TIME
FROM trackTable a, changeLog b
WHERE (a.SSN = b.SSAN) AND (a.DATE = b.DATE)
Results:
SSN | DATE | TIME | CORRECT_TIME
123456789 | 20140117 | 94738 | 91541
123456789 | 20140117 | 94738 | 91542
678912345 | 20140123 | 124542 | 144557
678912345 | 20140123 | 124542 | 144558
678912345 | 20140123 | 124542 | 144559
678912345 | 20140123 | 124542 | 144600
My question is, how can I select only the MOST RECENT value for field [CORRECT_TIME]? I've been trying a few variations of joins and where clauses, but I'm still pretty new to SQL.
Answer 1:
Try a Common Table Expression. The 'with xxx as (...)' part will create a temporary table in memory so to speak. The table will contain the latest time via MAX(TIME) for each unique combination of SSN and DATE via GROUP BY SSN, DATE.
Once you have the latest time for each SSN/DATE, you can JOIN back to it in your main query.
with latest as (select ssan, date, max(time) as latest_time from changelog group by ssan, date)
select t.ssn, t.date, t.time, l.latest_time
from tracktable t join latest l on t.ssn = l.ssan and t.date = l.date
order by t.ssn, t.date, t.time;
Answer 2:
with tbl as (select ssan, date, max(time) as correct_time
from changelog group by ssan, date
)
select a.SSN, a.DATE, a.TIME, b.CORRECT_TIME
from tracktable a
join tbl b on (a.SSN = b.SSAN) AND (a.DATE = b.DATE)
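Answer 3:
Try something like this. Note that FETCH FIRST 1 ROW ONLY limits the whole result set to a single row, so this only gives the most recent time when the query is restricted to one SSN and DATE. The sort also has to be descending to return the latest value rather than the earliest:
SELECT DISTINCT a.SSN, a.DATE, a.TIME, b.TIME AS CORRECT_TIME
FROM trackTable a, changeLog b
WHERE (a.SSN = b.SSAN) AND (a.DATE = b.DATE)
ORDER BY CORRECT_TIME DESC
FETCH FIRST 1 ROW ONLY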
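Because FETCH FIRST applies to the result set as a whole rather than to each SSN/DATE group, one per-row alternative is a correlated subquery. The following is only a sketch, assuming the table and column names from the question (trackTable, changeLog, SSAN); it has not been run against the asker's system.

```sql
-- One row per tracking record, paired with the latest
-- change-log TIME for the same SSN and DATE.
SELECT a.SSN, a.DATE, a.TIME,
       (SELECT MAX(b.TIME)
          FROM changeLog b
         WHERE b.SSAN = a.SSN
           AND b.DATE = a.DATE) AS CORRECT_TIME
FROM trackTable a
ORDER BY a.SSN, a.DATE, a.TIME;
```

Unlike the inner-join versions, this keeps tracking rows that have no matching change-log entry; their CORRECT_TIME comes back as NULL.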
Answer 4:
You can do this using window/analytic functions:
SELECT SSN, DATE, TIME, CORRECT_TIME
FROM (SELECT a.SSN, a.DATE, a.TIME, b.TIME AS CORRECT_TIME,
max(b.TIME) over (partition by a.SSN) as MAX_CORRECT_TIME
FROM trackTable a join
changeLog b
on a.SSN = b.SSAN AND a.DATE = b.DATE
) ab
WHERE CORRECT_TIME = MAX_CORRECT_TIME;
First, note that I changed the join to use explicit join syntax with an on clause. This is much better than the implicit joins in the where clause.
Second, this assumes that you want the single largest time for each SSN across all of its dates. If you want the latest on each date, then change the partition by clause to a.SSN, a.DATE.
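For reference, the per-date variant described above might look like the following. It is a sketch that reuses the query from this answer and changes only the PARTITION BY clause; the table and column names are taken from the question.

```sql
-- Keep only the change-log row with the latest TIME
-- for each SSN and DATE combination.
SELECT SSN, DATE, TIME, CORRECT_TIME
FROM (SELECT a.SSN, a.DATE, a.TIME, b.TIME AS CORRECT_TIME,
             MAX(b.TIME) OVER (PARTITION BY a.SSN, a.DATE) AS MAX_CORRECT_TIME
      FROM trackTable a
      JOIN changeLog b
        ON a.SSN = b.SSAN AND a.DATE = b.DATE
     ) ab
WHERE CORRECT_TIME = MAX_CORRECT_TIME;
```

If the changelog can hold several rows with the same SSAN, DATE, and TIME, the result will repeat those rows, and adding DISTINCT to the outer SELECT is one way to collapse them.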
Source: https://stackoverflow.com/questions/22045416/return-records-with-only-most-recent-time-value