How to check that an id present in the data on a particular date stays until an exit date

Question

I have a data set that looks something like the one below. If a particular id is present at the beginning of a year (here, Jan 1, 2003), I want to check that it is present every day until the end of that year (Dec 31, 2003), and then start the check over at the beginning of the next year: people may change from year to year but should not change within a year. If an id is missing on a certain day, I would like to know which day and which id.

I started with a for loop that checked every two days, but this is very inefficient: my data set spans roughly 50 years and will grow as new data arrives.

dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates))) 
df <- data.frame( dates = dates,id = id)

Edit: The chunk above contains every id on every date. If I delete, for example, id = 1 on the second day, the code should tell me that it is missing, so the counts should no longer be the same. The line below deletes id = 1 on the second day.

df <- df[-4,]

The code below builds the same data set but deletes id = 1 for Jan 2, 2003 and Jan 3, 2003. I am looking for something that returns the missing id and the date.

dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates))) 
df <- data.frame( dates = dates,id = id)
df <- df[-4,]
df <- df[-6,]

Answer 1:

This code chunk counts the number of days on which each person appears in each year. If the count is 365 (366 in a leap year), the person was there every day of the year.

library(dplyr)
library(tidyr)

dates <- rep(seq(as.Date("2003/01/01"), as.Date("2004/12/31"), "days"),each = 3)
id <- rep(1:3,times = length(unique(dates))) 
df <- data.frame( dates = dates,id = id)

dfx <- df %>%
  mutate(yrs = lubridate::year(dates)) %>%
  group_by(id, dates) %>%
  filter(row_number() == 1) %>%
  group_by(id, yrs) %>%
  tally()
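
The tally only answers the question once it is compared with the length of each year. A minimal sketch of that comparison, assuming the same toy data with two rows removed; the names `incomplete`, `n_days`, and `expected` are illustrative and not part of the original answer:

```r
library(dplyr)

# Toy data from the question, with two rows removed so the counts differ
dates <- rep(seq(as.Date("2003-01-01"), as.Date("2004-12-31"), by = "day"), each = 3)
df <- data.frame(dates = dates, id = rep(1:3, times = length(unique(dates))))
df <- df[c(-4, -6), ]

incomplete <- df %>%
  distinct(id, dates) %>%                              # one row per id and day
  mutate(yrs = as.integer(format(dates, "%Y"))) %>%
  count(id, yrs, name = "n_days") %>%                  # days present per id and year
  # days in the calendar year: Dec 31 minus Jan 1, plus one
  mutate(expected = as.integer(as.Date(paste0(yrs, "-12-31")) -
                               as.Date(paste0(yrs, "-01-01"))) + 1L) %>%
  filter(n_days < expected)                            # keep only id-years with gaps

incomplete
```

This shows which id and year have a gap, but not the exact day on which the id went missing.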



#remove values
dfa <- df[c(-4,-6),]

To find the dates of the missing values, add an indicator column to the data set, then fill in the missing dates for each id. After this, the val column is NA for the rows that were filled in. Filter on those rows to get the dates on which an id went missing.

dfx <- dfa %>%
  mutate(val = 1) %>%
  complete(nesting(id),
           dates = seq(min(dates), max(dates), by = "day")) %>%
  filter(is.na(val))
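
The `complete()` call fills every id across the whole date range, so an id that legitimately joins or leaves at a year boundary would also be reported. The question asks for a check that restarts each year, which could be sketched as follows. This assumes that "present at the beginning of the year" means present on the first date recorded for that year; the names `obs`, `roster`, `days`, and `gaps` are illustrative:

```r
library(dplyr)

# Toy data from the question, with two rows removed
dates <- rep(seq(as.Date("2003-01-01"), as.Date("2004-12-31"), by = "day"), each = 3)
df <- data.frame(dates = dates, id = rep(1:3, times = length(unique(dates))))
dfa <- df[c(-4, -6), ]

# One row per observed id and day, tagged with its year
obs <- dfa %>%
  distinct(id, dates) %>%
  mutate(yrs = as.integer(format(dates, "%Y")))

# ids present on the first recorded day of each year (assumed start-of-year roster)
roster <- obs %>%
  group_by(yrs) %>%
  filter(dates == min(dates)) %>%
  ungroup() %>%
  select(yrs, id)

# Every calendar day in the observed range, tagged with its year
days <- data.frame(dates = seq(min(obs$dates), max(obs$dates), by = "day"))
days$yrs <- as.integer(format(days$dates, "%Y"))

# Expected (id, day) pairs per year, minus the pairs actually observed
gaps <- merge(roster, days, by = "yrs") %>%
  anti_join(obs, by = c("id", "dates")) %>%
  arrange(dates, id)

gaps
```

Because the roster is rebuilt for each year, an id that first appears on Jan 1, 2004 is only checked against the days of 2004. The work is done by joins rather than a loop, which should scale better over a 50-year span.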

Source: https://stackoverflow.com/questions/53674579/how-to-check-if-an-id-comes-into-data-on-a-particular-date-that-it-stays-until-a

Tags

performance

data-cleaning