How to efficiently determine changes between rows using SQL

鱼传尺愫 2021-01-05 18:27

I have a very large MySQL table containing data read from a number of sensors. Essentially, there's a time stamp and a value column. I'll omit the sensor id, indexes other

2 answers
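  • I suppose switching DB engines isn't an option for you. If it is, window functions would let you write something like this (IS DISTINCT FROM is standard SQL, supported by e.g. PostgreSQL; MySQL 8.0 and later also support LAG(), but there you would write NOT (d.value <=> d.previous_value) instead):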

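    SELECT d.*
    FROM (
        -- attach the value of the chronologically previous row to every row
        SELECT d.*, lag(d.value) OVER (ORDER BY d.time) AS previous_value
        FROM data d
      ) AS d
    -- IS DISTINCT FROM is NULL-safe, so the first row (previous_value IS NULL) is kept
    WHERE d.value IS DISTINCT FROM d.previous_value;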
    

    If not, you could rewrite the query without window functions, like so:

    select data.*
    from data
    left join (
        -- for each row, find the latest earlier time of the same measure
        select data.measure_id,
               data.time,
               max(prev_data.time) as prev_time
        from data
        left join data as prev_data
        on prev_data.measure_id = data.measure_id
        and prev_data.time < data.time
        group by data.measure_id, data.time
        ) as prev_data_time
    on prev_data_time.measure_id = data.measure_id
    and prev_data_time.time = data.time
    -- fetch the value stored at that previous time
    left join data as prev_data_value
    on prev_data_value.measure_id = data.measure_id
    and prev_data_value.time = prev_data_time.prev_time
    -- keep the first row of each measure and every row whose value changed
    where data.value <> prev_data_value.value or prev_data_value.value is null;
    
  • 2021-01-05 19:04

    You might try this - I'm not going to guarantee that it will perform better, but it's my usual way of correlating a row with a "previous" row:

    SELECT
        d.* -- TODO: list the columns you need
    FROM
        data d
           left join
        data d_prev
           on
               d_prev.time < d.time -- TODO: other key columns (e.g. sensor id)?
           left join
        data d_inter
           on
               d_inter.time < d.time and
               d_prev.time < d_inter.time -- TODO: other key columns?
    WHERE
        d_inter.time is null AND
        (d_prev.value is null OR d_prev.value <> d.value);
    

    (I think this is right - could do with some sample data to validate it).

    Basically, the idea is to join the table to itself, and for each row (in d), find candidate rows (in d_prev) for the "previous" row. Then do a further join, to try to find a row (in d_inter) that exists between the current row (in d) and the candidate row (in d_prev). If we cannot find such a row (d_inter.time is null), then that candidate was indeed the previous row.

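The second answer says it could do with sample data to validate it. As a minimal sketch, the Python script below loads a few hypothetical readings into an in-memory SQLite database (standard-library sqlite3 module) and runs three queries: the window-function version and a derived-table version of the first answer's rewrite, plus the second answer's triple self-join. SQLite stands in for MySQL only because it needs no server. Its dialect differs: it spells the NULL-safe IS DISTINCT FROM as IS NOT, and LAG() needs SQLite 3.25 or later. The sample rows, the single measure_id, and the changed_rows helper are invented for illustration, and times are assumed unique per sensor.

```python
import sqlite3

# Hypothetical sample readings for a single sensor: (measure_id, time, value).
# Integer times stand in for real timestamps and are assumed unique.
SAMPLE_ROWS = [
    (1, 1, 10.0),
    (1, 2, 10.0),
    (1, 3, 12.5),
    (1, 4, 12.5),
    (1, 5, 12.5),
    (1, 6, 10.0),
    (1, 7, 11.0),
]

# First answer, window-function version. SQLite writes the NULL-safe
# "IS DISTINCT FROM" as IS NOT; LAG() needs SQLite 3.25 or later.
WINDOW_SQL = """
SELECT d.time, d.value
FROM (
    SELECT d.*, LAG(d.value) OVER (ORDER BY d.time) AS previous_value
    FROM data d
) AS d
WHERE d.value IS NOT d.previous_value
ORDER BY d.time
"""

# First answer, derived-table rewrite: find each row's previous time with
# MAX(), then join back to fetch the value stored at that time.
DERIVED_SQL = """
SELECT data.time, data.value
FROM data
LEFT JOIN (
    SELECT data.measure_id, data.time, MAX(prev_data.time) AS prev_time
    FROM data
    LEFT JOIN data AS prev_data
        ON prev_data.measure_id = data.measure_id
       AND prev_data.time < data.time
    GROUP BY data.measure_id, data.time
) AS prev_data_time
    ON prev_data_time.measure_id = data.measure_id
   AND prev_data_time.time = data.time
LEFT JOIN data AS prev_data_value
    ON prev_data_value.measure_id = data.measure_id
   AND prev_data_value.time = prev_data_time.prev_time
WHERE data.value <> prev_data_value.value OR prev_data_value.value IS NULL
ORDER BY data.time
"""

# Second answer: a candidate d_prev is the previous row only if no d_inter
# row lies strictly between it and d.
SELF_JOIN_SQL = """
SELECT d.time, d.value
FROM data d
LEFT JOIN data d_prev
    ON d_prev.time < d.time
LEFT JOIN data d_inter
    ON d_inter.time < d.time
   AND d_prev.time < d_inter.time
WHERE d_inter.time IS NULL
  AND (d_prev.value IS NULL OR d_prev.value <> d.value)
ORDER BY d.time
"""


def changed_rows(sql, rows=SAMPLE_ROWS):
    """Run sql against a fresh in-memory `data` table; return (time, value) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE data (measure_id INTEGER, time INTEGER, value REAL)"
        )
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for name, sql in [
        ("window", WINDOW_SQL),
        ("derived", DERIVED_SQL),
        ("self-join", SELF_JOIN_SQL),
    ]:
        print(name, changed_rows(sql))
```

On this sample, all three queries return the rows at times 1, 3, 6 and 7: the first reading plus every reading whose value differs from the one before it. With several sensors in one table, the window version would also need PARTITION BY on the sensor column, and the self-join would need the extra key conditions flagged in its TODO comments.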