Select only rows where a column changed from the row before it, given a unique ID

Submitted by ぐ巨炮叔叔 on 2020-06-27 16:57:50

Question


I have a PostgreSQL database where I want to record how a specific column changes for each ID over time. Table1:

personID | status | unixtime | column d | column e | column f
    1        2       213214      x            y        z
    1        2       213325      x            y        z
    1        2       213326      x            y        z
    1        2       213327      x            y        z
    1        2       213328      x            y        z
    1        3       214330      x            y        z
    1        3       214331      x            y        z
    1        3       214332      x            y        z
    1        2       324543      x            y        z

I want to track every change of status over time. Based on this, I want a new table, table2, with the following data:

personID | status | unixtime | column d | column e | column f
    1        2       213214      x            y        z
    1        3       214330      x            y        z
    1        2       324543      x            y        z

x, y, z are values that can and will vary from row to row. The table has thousands of other personIDs whose status also changes, and I would like to capture those as well. A single GROUP BY status, personID is not enough (as I see it), because table2 must be able to hold several rows with the same status and personID, as long as there has been a status change in between.

I currently do this in Python, but it's pretty slow (I guess it's a lot of I/O):

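for person in personid:
    status = -1  # -1 is assumed not to be a valid status, so the first record is always kept
    records = getPersonRecords(person)  # sorted by unixtime in the query
    newrecords = []
    for record in records:
        # keep a record only when its status differs from the previous one
        if record.status != status:
            status = record.status
            newrecords.append(record)
    appendtoDB(newrecords)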

Answer 1:


This is a gaps-and-islands problem. You want the start of each island, which you can identify by comparing the status on the current row to the status on the "previous" row for the same personID.

Window functions come in handy for this:

select t.*
from (
    select t.*, lag(status) over(partition by personID order by unixtime) lag_status
    from mytable t
) t
where lag_status is null or status <> lag_status
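
To check the query against the sample data from the question, here is a minimal sketch that runs it from Python. It rests on several assumptions: an in-memory SQLite database stands in for PostgreSQL (the bundled SQLite must be 3.25 or newer for window functions), the d/e/f columns are omitted for brevity, and the names ROWS, FILL_TABLE2 and build_table2 are made up for this illustration. Wrapping the select in an insert fills table2 in a single statement, so no rows have to travel through Python one person at a time.

```python
import sqlite3

# Sample rows from the question: (personID, status, unixtime).
# The d/e/f columns are left out to keep the sketch short.
ROWS = [
    (1, 2, 213214),
    (1, 2, 213325),
    (1, 2, 213326),
    (1, 2, 213327),
    (1, 2, 213328),
    (1, 3, 214330),
    (1, 3, 214331),
    (1, 3, 214332),
    (1, 2, 324543),
]

# The query from the answer, wrapped in an INSERT so that table2
# is filled by the database in one statement.
FILL_TABLE2 = """
insert into table2 (personID, status, unixtime)
select personID, status, unixtime
from (
    select t.*, lag(status) over(partition by personID order by unixtime) lag_status
    from mytable t
) t
where lag_status is null or status <> lag_status
"""


def build_table2(rows):
    """Load rows into an in-memory database and return the island starts."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for PostgreSQL
    try:
        conn.execute(
            "create table mytable (personID integer, status integer, unixtime integer)"
        )
        conn.execute(
            "create table table2 (personID integer, status integer, unixtime integer)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        conn.execute(FILL_TABLE2)
        return conn.execute(
            "select personID, status, unixtime from table2 order by personID, unixtime"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in build_table2(ROWS):
        print(row)
```

On the sample data this prints (1, 2, 213214), (1, 3, 214330) and (1, 2, 324543), one row per status change. Against a real PostgreSQL table the same insert ... select statement should work unchanged apart from the column list, but that has not been verified here.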


Source: https://stackoverflow.com/questions/62214728/select-only-rows-that-has-a-column-changed-from-the-rows-before-it-given-an-uni
