Question
I am trying to use Oracle 11g (11.1 in dev, 11.2 in production) for numeric analysis, specifically linear interpolation on a table which has three columns of interest: a timestamp, a deviceid, and a value.
The value column holds data from the device (with id deviceid), taken at the time given in the timestamp. For example, this is bogus data, but it gives the idea:
time         | deviceid | value
-------------|----------|--------
01:00:00.000 | 001      | 1.000
01:00:01.000 | 001      | 1.030
01:00:02.000 | 001      | 1.063
01:00:00.050 | 002      | 553.10
01:00:01.355 | 002      | 552.30
01:00:02.155 | 002      | 552.43
The timestamps from device 001 do not match the timestamps of device 002, but I need to have the values from both device 001 and 002 in one row, with one timestamp, matching the timestamp for device 001. What I want to end up with is something like this:
time         | device 001 | device 002
-------------|------------|-----------
01:00:00.000 | 1.000      | null
01:00:01.000 | 1.030      | 552.520
01:00:02.000 | 1.063      | 552.405
Here the value for device 002 is linearly interpolated from the two device 002 readings closest to, and on either side of, each device 001 timestamp. The null occurs because I don't have device 002 readings on both sides of 01:00:00.000, and I don't want to extrapolate the value.
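For reference, the interpolation described above can be written out for the sample data. This is only a worked sketch: the symbols t_0, t_1, v_0 and v_1 are introduced here for illustration and do not appear in the original question. They stand for the device 002 timestamps and values immediately before and after a device 001 timestamp t, with times given in seconds past 01:00:00.

```latex
\begin{align*}
v(t) &= v_0 + \frac{t - t_0}{t_1 - t_0}\,(v_1 - v_0), \qquad t_0 \le t \le t_1 \\[6pt]
v(1.000) &= 553.10 + \frac{1.000 - 0.050}{1.355 - 0.050}\,(552.30 - 553.10) \approx 552.5176 \\[6pt]
v(2.000) &= 552.30 + \frac{2.000 - 1.355}{2.155 - 1.355}\,(552.43 - 552.30) = 552.4048125
\end{align*}
```

These agree with the illustrative figures in the table above to within about 0.003. For 01:00:00.000 there is no device 002 reading at or before that time, so no bracketing pair exists and the result is null.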
From what I understand I can use percentile_cont to do this, but I don't understand the examples I have seen online. For example, where would the percentile used by percentile_cont come from?
Thanks in advance for your help!
Answer 1:
I'm not sure how you'd use PERCENTILE_CONT to do the interpolation you ask for, but with the help of a different analytic function, LEAD, you can achieve what you want.
Firstly, we'll create the following function, which converts INTERVAL DAY TO SECOND values into seconds:
CREATE OR REPLACE FUNCTION intvl_to_seconds(
    p_interval INTERVAL DAY TO SECOND
) RETURN NUMBER DETERMINISTIC
AS
BEGIN
    -- Sum each field of the interval, expressed in seconds.
    RETURN EXTRACT(DAY FROM p_interval) * 24*60*60
         + EXTRACT(HOUR FROM p_interval) * 60*60
         + EXTRACT(MINUTE FROM p_interval) * 60
         + EXTRACT(SECOND FROM p_interval);
END;
/
With this function we can use a query such as the following:
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + intvl_to_seconds(d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           / intvl_to_seconds(q2.next_time - q2.prev_time) AS value2
  FROM devices d1
  LEFT OUTER JOIN (SELECT d2.time AS prev_time,
                          d2.value AS prev_value,
                          LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
                          LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
                     FROM devices d2
                    WHERE d2.deviceid = 2) q2
    ON d1.time BETWEEN q2.prev_time AND q2.next_time
 WHERE d1.deviceid = 1;
I took your data above, set the date component of the timestamps to today, and I got the following results when I ran the query above:
TO_CHAR(D1.TIME)                          VALUE1     VALUE2
------------------------------------- ---------- ----------
09-SEP-11 01.00.00.000000                      1
09-SEP-11 01.00.01.000000                   1.03 552.517625
09-SEP-11 01.00.02.000000                  1.063 552.404813
(I added a TO_CHAR around d1.time to cut down on excessive spacing in SQL*Plus.)
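One edge case in the join condition above is worth noting: BETWEEN is inclusive at both ends, so a device 1 timestamp that exactly equals an interior device 2 timestamp falls into two adjacent intervals and is returned twice (with the same interpolated value both times). A possible variant, offered as an untested sketch rather than as part of the original answer, treats each interval as half-open. It assumes the intvl_to_seconds function defined above.

```sql
-- Sketch only: each device 2 interval is treated as half-open,
-- [prev_time, next_time), so a device 1 timestamp that coincides with a
-- device 2 reading matches exactly one interval instead of two.
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + intvl_to_seconds(d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           / intvl_to_seconds(q2.next_time - q2.prev_time) AS value2
  FROM devices d1
  LEFT OUTER JOIN (SELECT d2.time  AS prev_time,
                          d2.value AS prev_value,
                          LEAD(d2.time, 1)  OVER (ORDER BY d2.time) AS next_time,
                          LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
                     FROM devices d2
                    WHERE d2.deviceid = 2) q2
    ON d1.time >= q2.prev_time
   AND d1.time <  q2.next_time
 WHERE d1.deviceid = 1;
```

With this condition, a device 1 timestamp equal to the very last device 2 reading matches no interval and yields a null value2.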
If you're using DATEs instead of TIMESTAMPs, you don't need the function: you can just subtract the dates.
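As a rough illustration of the DATE case, the query might look like the sketch below. It assumes that devices.time is a DATE column, which is not the case for the sample data in the question: DATE stores whole seconds only, so the fractional-second readings shown there would need a TIMESTAMP.

```sql
-- Sketch only: assumes devices.time is a DATE, not a TIMESTAMP.
-- Subtracting two DATEs yields a NUMBER of days, so the ratio of the two
-- differences is already unitless and no conversion function is needed.
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + (d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           / (q2.next_time - q2.prev_time) AS value2
  FROM devices d1
  LEFT OUTER JOIN (SELECT d2.time  AS prev_time,
                          d2.value AS prev_value,
                          LEAD(d2.time, 1)  OVER (ORDER BY d2.time) AS next_time,
                          LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
                     FROM devices d2
                    WHERE d2.deviceid = 2) q2
    ON d1.time BETWEEN q2.prev_time AND q2.next_time
 WHERE d1.deviceid = 1;
```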
Answer 2:
I am using a modified version of @Luke Woodward's query:
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value +
         (EXTRACT(SECOND FROM (d1.time - q2.prev_time)) +
          EXTRACT(MINUTE FROM (d1.time - q2.prev_time)) * 60)
         * (q2.next_value - q2.prev_value) /
         (EXTRACT(SECOND FROM (q2.next_time - q2.prev_time)) +
          EXTRACT(MINUTE FROM (q2.next_time - q2.prev_time)) * 60) AS value2
  FROM devices d1
  LEFT OUTER JOIN (SELECT d2.time AS prev_time,
                          d2.value AS prev_value,
                          LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
                          LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
                     FROM devices d2
                    WHERE d2.deviceid = 2
                      AND time BETWEEN '20100914 000000' AND '20100915 000000'
                  ) q2
    ON d1.time BETWEEN q2.prev_time AND q2.next_time
 WHERE d1.deviceid = 1;
but the interpolated values are always coming up as null, even though there is data for device 2 in the date range.
Note, I had to add a date range for the query in q2 which is perhaps why the normal join loses the outer data.
I don't get null values for the interpolated data if I use a normal join, but in using a normal join, I lose the data for the device 1 outside the endpoints for device 2 (the interpolated device in q2). Suggestions?
Answer 3:
The final solution with the date range:
SELECT
    d1.time,
    d1.value AS value1,
    q2.prev_value +
      (EXTRACT(SECOND FROM (d1.time - q2.prev_time)) +
       EXTRACT(MINUTE FROM (d1.time - q2.prev_time)) * 60)
      * (q2.next_value - q2.prev_value) /
      (EXTRACT(SECOND FROM (q2.next_time - q2.prev_time)) +
       EXTRACT(MINUTE FROM (q2.next_time - q2.prev_time)) * 60
      ) AS value2
FROM devices d1
LEFT OUTER JOIN (
    SELECT d2.time AS prev_time,
           d2.value AS prev_value,
           LEAD(d2.time, 1) OVER (ORDER BY d2.time) AS next_time,
           LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
    FROM devices d2
    WHERE d2.deviceid = 2
      AND d2.time BETWEEN '20100914 000000' AND '20100915 000000'
) q2
    ON d1.time BETWEEN q2.prev_time AND q2.next_time
WHERE d1.deviceid = 1
  AND d1.time BETWEEN '20100914 000000' AND '20100915 000000';
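Two details of the query above may matter on other data. The interval-to-seconds arithmetic uses only the MINUTE and SECOND fields, so a gap of an hour or more between consecutive device 2 readings would be measured incorrectly, and the date bounds are string literals that depend on the session's NLS settings for implicit conversion. A possible variant is sketched below. It assumes that the intvl_to_seconds function from Answer 1 has been created and that the literals were meant to be read as YYYYMMDD HH24MISS; it has not been run against the original data.

```sql
-- Sketch only. Assumptions (not from the original answers):
--   * intvl_to_seconds from Answer 1 exists in the schema;
--   * the date bounds use the format mask 'YYYYMMDD HH24MISS'.
SELECT d1.time,
       d1.value AS value1,
       q2.prev_value
         + intvl_to_seconds(d1.time - q2.prev_time)
           * (q2.next_value - q2.prev_value)
           / intvl_to_seconds(q2.next_time - q2.prev_time) AS value2
  FROM devices d1
  LEFT OUTER JOIN (
         SELECT d2.time  AS prev_time,
                d2.value AS prev_value,
                LEAD(d2.time, 1)  OVER (ORDER BY d2.time) AS next_time,
                LEAD(d2.value, 1) OVER (ORDER BY d2.time) AS next_value
           FROM devices d2
          WHERE d2.deviceid = 2
            -- Explicit conversion avoids relying on session NLS settings.
            AND d2.time BETWEEN TO_TIMESTAMP('20100914 000000', 'YYYYMMDD HH24MISS')
                            AND TO_TIMESTAMP('20100915 000000', 'YYYYMMDD HH24MISS')
       ) q2
    ON d1.time BETWEEN q2.prev_time AND q2.next_time
 WHERE d1.deviceid = 1
   AND d1.time BETWEEN TO_TIMESTAMP('20100914 000000', 'YYYYMMDD HH24MISS')
                   AND TO_TIMESTAMP('20100915 000000', 'YYYYMMDD HH24MISS');
```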
Source: https://stackoverflow.com/questions/7366308/how-can-i-perform-linear-interpolation-using-oracle-sql