Question
I have a PostgreSQL database in which I want to record how a specific column changes over time for each ID. Table1:
personID | status | unixtime | column d | column e | column f
1        | 2      | 213214   | x        | y        | z
1        | 2      | 213325   | x        | y        | z
1        | 2      | 213326   | x        | y        | z
1        | 2      | 213327   | x        | y        | z
1        | 2      | 213328   | x        | y        | z
1        | 3      | 214330   | x        | y        | z
1        | 3      | 214331   | x        | y        | z
1        | 3      | 214332   | x        | y        | z
1        | 2      | 324543   | x        | y        | z
I want to track every change of status over time. Based on the data above, I want a new table, table2, with the following data:
personID | status | unixtime | column d | column e | column f
1        | 2      | 213214   | x        | y        | z
1        | 3      | 214330   | x        | y        | z
1        | 2      | 324543   | x        | y        | z
x, y and z are values that can and will vary from row to row. The table holds thousands of other personIDs whose status changes I would like to capture as well. A plain GROUP BY status, personID is not enough (as I see it), because the result may need several rows with the same status and personID, as long as a status change happened in between.
I currently do this in Python, but it's pretty slow (and I guess it's a lot of I/O):
for person in personid:
    status = -1  # sentinel, assumed never to be a real status
    records = getPersonRecords(person)  # sorted by unixtime in the query
    newrecords = []
    for record in records:
        # keep only the first record of each run of identical statuses
        if record.status != status:
            status = record.status
            newrecords.append(record)
    appendtoDB(newrecords)
Answer 1:
This is a gaps-and-islands problem. You want the start of each island, which you can identify by comparing the status on the current row with the status on the "previous" row for the same person.
Window functions come in handy for this:
select t.*
from (
    select t.*, lag(status) over(partition by personID order by unixtime) lag_status
    from mytable t
) t
where lag_status is null or status <> lag_status
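To check the query against the sample data from the question, here is a minimal sketch in Python. It makes several assumptions that are not in the original: it uses an in-memory SQLite database as a stand-in for PostgreSQL (SQLite 3.25 or newer is needed for window functions), it names the three unnamed columns d, e and f, and the helper status_changes is hypothetical. It also wraps the query in create table table2 as ..., which is one possible way to materialize the result the question asks for.

```python
import sqlite3

# Sample rows from Table1: (personID, status, unixtime, d, e, f).
# The column names d, e, f are assumptions made for this sketch.
ROWS = [
    (1, 2, 213214, "x", "y", "z"),
    (1, 2, 213325, "x", "y", "z"),
    (1, 2, 213326, "x", "y", "z"),
    (1, 2, 213327, "x", "y", "z"),
    (1, 2, 213328, "x", "y", "z"),
    (1, 3, 214330, "x", "y", "z"),
    (1, 3, 214331, "x", "y", "z"),
    (1, 3, 214332, "x", "y", "z"),
    (1, 2, 324543, "x", "y", "z"),
]

# Same logic as the query above. The outer select lists the columns
# explicitly so the helper column lag_status is not copied into table2.
CHANGES_SQL = """
create table table2 as
select personID, status, unixtime, d, e, f
from (
    select t.*,
           lag(status) over (partition by personID order by unixtime) as lag_status
    from mytable t
) t
where lag_status is null or status <> lag_status
"""


def status_changes(rows):
    """Load rows into an in-memory database, build table2, and return its rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable ("
            "personID integer, status integer, unixtime integer, "
            "d text, e text, f text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?, ?, ?, ?)", rows)
        conn.execute(CHANGES_SQL)
        return conn.execute(
            "select personID, status, unixtime from table2 "
            "order by personID, unixtime"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in status_changes(ROWS):
        print(row)
    # (1, 2, 213214)
    # (1, 3, 214330)
    # (1, 2, 324543)
```

Because the comparison happens inside the database, a single statement covers every personID at once, which should avoid the per-person round-trips of the Python loop in the question. If status can be null, status is distinct from lag_status may be a safer comparison in PostgreSQL, but then the first row of each person needs separate handling.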
Source: https://stackoverflow.com/questions/62214728/select-only-rows-that-has-a-column-changed-from-the-rows-before-it-given-an-uni