How to efficiently determine changes between rows using SQL

I have a very large MySQL table containing data read from a number of sensors. Essentially, there's a timestamp and a value column. I'll omit the sensor id, indexes other

2 Answers

不要未来只要你来

2021-01-05 19:01

I suppose it's not an option for you to switch DB engine. In case it might be, then window functions would allow you to write things like this:

SELECT d.*
FROM (
    SELECT d.*, lag(d.value) OVER (ORDER BY d.time) as previous_value 
    FROM data d
  ) as d
WHERE d.value IS DISTINCT FROM d.previous_value;
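
To see what the window-function query returns, here is a minimal sketch that runs it against a few made-up readings. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than the asker's MySQL server, the sample rows are hypothetical, and SQLite's `IS NOT` stands in for `IS DISTINCT FROM` (both treat two NULLs as equal). It also assumes an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample readings: (time, value). None models a missing reading.
lag_rows = [(1, 10), (2, 10), (3, 12), (4, 12), (5, 10), (6, None), (7, None), (8, 11)]

lag_conn = sqlite3.connect(":memory:")
lag_conn.execute("CREATE TABLE data (time INTEGER PRIMARY KEY, value INTEGER)")
lag_conn.executemany("INSERT INTO data (time, value) VALUES (?, ?)", lag_rows)

# LAG() fetches the value of the preceding row in time order.
# IS NOT is SQLite's null-safe "differs from" comparison.
changes_lag = lag_conn.execute(
    """
    SELECT d.time, d.value
    FROM (
        SELECT time, value,
               LAG(value) OVER (ORDER BY time) AS previous_value
        FROM data
    ) AS d
    WHERE d.value IS NOT d.previous_value
    ORDER BY d.time
    """
).fetchall()

for time, value in changes_lag:
    print(time, value)
```

Only rows whose value differs from the row before them are kept, and a run of consecutive NULLs counts as one change. On MySQL 8.0 or later, which also provides `LAG`, the comparison would probably be written as `NOT (d.value <=> d.previous_value)`.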

If not, you could try to rewrite the query like so:

select data.*
from data
left join (
    -- for each row, find the time of the previous reading of the same measure
    select data.measure_id,
           data.time,
           max(prev_data.time) as prev_time
    from data
    left join data as prev_data
    on prev_data.measure_id = data.measure_id
    and prev_data.time < data.time
    group by data.measure_id, data.time, data.value
    ) as prev_data_time
on prev_data_time.measure_id = data.measure_id
and prev_data_time.time = data.time
-- fetch the previous row itself to compare values
left join data as prev_data_value
on prev_data_value.measure_id = data.measure_id
and prev_data_value.time = prev_data_time.prev_time
where data.value <> prev_data_value.value or prev_data_value.value is null
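
A small check of the join-based idea on made-up data may help here. This is a sketch, not the answerer's tested code: it assumes a table with `measure_id`, `time` and `value` columns, takes the previous time with `max(prev_data.time)` per measure, joins the previous row back in under the alias `prev_data_value`, and runs on SQLite through Python's sqlite3 module instead of MySQL. The sample rows are hypothetical.

```python
import sqlite3

# Hypothetical readings for two sensors: (measure_id, time, value).
join_rows = [
    (1, 1, 10), (1, 2, 10), (1, 3, 12),
    (2, 1, 5), (2, 2, 7), (2, 3, 7),
]

join_conn = sqlite3.connect(":memory:")
join_conn.execute("CREATE TABLE data (measure_id INTEGER, time INTEGER, value INTEGER)")
join_conn.executemany("INSERT INTO data VALUES (?, ?, ?)", join_rows)

changes_prev_time = join_conn.execute(
    """
    SELECT data.measure_id, data.time, data.value
    FROM data
    LEFT JOIN (
        -- time of the previous reading of the same measure, NULL for the first one
        SELECT data.measure_id, data.time, MAX(prev_data.time) AS prev_time
        FROM data
        LEFT JOIN data AS prev_data
          ON prev_data.measure_id = data.measure_id
         AND prev_data.time < data.time
        GROUP BY data.measure_id, data.time
    ) AS prev_data_time
      ON prev_data_time.measure_id = data.measure_id
     AND prev_data_time.time = data.time
    LEFT JOIN data AS prev_data_value
      ON prev_data_value.measure_id = data.measure_id
     AND prev_data_value.time = prev_data_time.prev_time
    WHERE data.value <> prev_data_value.value
       OR prev_data_value.value IS NULL
    ORDER BY data.measure_id, data.time
    """
).fetchall()

for row in changes_prev_time:
    print(row)
```

The first reading of each sensor has no previous row, so it is reported as a change along with every reading whose value differs from its predecessor.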


伪装坚强ぢ

2021-01-05 19:04
You might try this - I'm not going to guarantee that it will perform better, but it's my usual way of correlating a row with a "previous" row:
```
SELECT
    * -- TODO: list columns
FROM
    data d
       left join
    data d_prev
       on
           d_prev.time < d.time -- TODO: other key columns?
       left join
    data d_inter
       on
           d_inter.time < d.time and
           d_prev.time < d_inter.time -- TODO: other key columns?
WHERE
    d_inter.time is null AND
    (d_prev.value is null OR d_prev.value <> d.value)
```
(I think this is right - could do with some sample data to validate it).

Basically, the idea is to join the table to itself, and for each row (in d), find candidate rows (in d_prev) for the "previous" row. Then do a further join, to try to find a row (in d_inter) that exists between the current row (in d) and the candidate row (in d_prev). If we cannot find such a row (d_inter.time is null), then that candidate was indeed the previous row.
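
Since the answer asks for sample data to validate the query, here is a minimal sketch that runs the same three-way self-join on a handful of made-up rows. The assumptions are mine, not the answerer's: SQLite (via Python's sqlite3 module) stands in for MySQL, the table holds a single sensor, `time` is unique, and `value` is never NULL.

```python
import sqlite3

# Hypothetical readings from one sensor: (time, value), no NULL values.
inter_rows = [(1, 10), (2, 10), (3, 12), (4, 12), (5, 10), (6, 11)]

inter_conn = sqlite3.connect(":memory:")
inter_conn.execute("CREATE TABLE data (time INTEGER PRIMARY KEY, value INTEGER)")
inter_conn.executemany("INSERT INTO data (time, value) VALUES (?, ?)", inter_rows)

changes_join = inter_conn.execute(
    """
    SELECT d.time, d.value
    FROM data d
    LEFT JOIN data d_prev
           ON d_prev.time < d.time           -- candidate "previous" rows
    LEFT JOIN data d_inter
           ON d_inter.time < d.time
          AND d_prev.time < d_inter.time     -- a row between candidate and current
    WHERE d_inter.time IS NULL               -- no such row: candidate is the previous row
      AND (d_prev.value IS NULL OR d_prev.value <> d.value)
    ORDER BY d.time
    """
).fetchall()

for time, value in changes_join:
    print(time, value)
```

On this data the query keeps the first row (it has no previous row) and each row whose value differs from the one immediately before it. The inequality joins compare many row pairs, so on a very large table this may be slow unless `time` is well indexed.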