Calculate difference between two values in grouped sequences

挽巷 2021-01-28 04:28

This is a follow-up question to this post: Loop through dataframe in R and measure time difference between two values

I already got excellent help with the following co

1 Answer
  •  南方客 (original poster)
    2021-01-28 05:22

    Try this.

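    library(dplyr)

    # Parse the timestamps first if Date is still a character column:
    # df$Date <- as.POSIXct(strptime(df$Date, "%d.%m.%Y %H:%M"))
    df %>%
      arrange(User, Date) %>%
      group_by(User) %>%
      mutate(
        # Date of the most recent StimuliA == 1 row at or before each row
        # (rows before a user's first stimulus get that first stimulus date)
        last.date = Date[which(StimuliA == 1L)[c(1, 1:sum(StimuliA == 1L))][cumsum(StimuliA == 1L) + 1]]
      ) %>%
      mutate(
        # ifelse() drops the difftime class, so timesince is a plain number
        # in whatever units `-` chose automatically
        timesince = ifelse(Responses == 1L, Date - last.date, NA)
      )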
    

    This works by first creating a column that records the date of the most recent stimulus, and then using ifelse to compute, for rows with a response, the difference between the current date and that stimulus date. You can then filter to extract only the last response.
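
    To see what the index expression behind last.date does, here is a minimal base R sketch for a single user. The timestamps and flags are made up for illustration, the intermediate names (is.stim, stim.rows, padded, idx) are mine, and seq_along stands in for 1:sum(...):

```r
# Toy data for one user, already sorted by time (hypothetical values).
Date <- as.POSIXct("2021-01-28 10:00:00", tz = "UTC") + 60 * c(0, 1, 2, 5, 7, 8)
StimuliA <- c(0L, 1L, 0L, 0L, 1L, 0L)

is.stim <- StimuliA == 1L
stim.rows <- which(is.stim)                      # rows that hold a stimulus
padded <- stim.rows[c(1, seq_along(stim.rows))]  # first stimulus row repeated once up front
idx <- padded[cumsum(is.stim) + 1]               # most recent stimulus row for every row
last.date <- Date[idx]

# Explicit units, so the number does not depend on what `-` would pick.
mins.since <- as.numeric(difftime(Date, last.date, units = "mins"))
print(idx)
print(mins.since)
```

    Here idx comes out as 2 2 2 2 5 5 and mins.since as -1 0 1 4 0 1. The first row precedes any stimulus, so it is measured against the first stimulus and goes negative; in the chain above that only matters if such a row has Responses == 1.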

    There is a cleaner way to do the "last.date" operation with zoo::na.locf, but I didn't want to assume you were OK with another package dependency.
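
    For reference, a rough sketch of what the zoo variant might look like on one user's vectors. It assumes the zoo package is installed, the toy values are made up, and stim.date is a helper name of mine. Inside the chain, the same two steps would go in the mutate call after group_by(User):

```r
library(zoo)

# Toy data for one user, already sorted by time (hypothetical values).
Date <- as.POSIXct("2021-01-28 10:00:00", tz = "UTC") + 60 * c(0, 1, 2, 5, 7, 8)
StimuliA <- c(0L, 1L, 0L, 0L, 1L, 0L)

# Keep the timestamp on stimulus rows and blank out every other row.
stim.date <- replace(Date, StimuliA != 1L, NA)

# Carry the last non-missing timestamp forward; na.rm = FALSE keeps the
# leading NA so the result has the same length as the input.
last.date <- na.locf(stim.date, na.rm = FALSE)

mins.since <- as.numeric(difftime(Date, last.date, units = "mins"))
print(mins.since)
```

    One difference from the index expression: a row that comes before the user's first stimulus gets NA here instead of being measured against the first stimulus.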

    EDIT: To identify the sequence (if I correctly understand what you mean by "sequence"), continue the chain with

    %>% mutate(sequence = cumsum(StimuliA))
    

    to number the sequences, where a sequence is the run of observations that starts at a positive stimulus. To keep only the last response of each sequence, continue the chain with

    %>% group_by(User, sequence) %>%
      filter(timesince == max(timesince, na.rm = TRUE))
    

    to group by sequence (and user) and then keep the row with the maximum time difference in each sequence. Because the time since the stimulus only grows within a sequence, that row is the last positive response of the sequence.
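
    Putting the pieces together, here is a small end-to-end sketch on invented data (two hypothetical users, each starting with a stimulus). It assumes dplyr is installed and uses explicit minutes for timesince, which is my choice rather than something the question specifies:

```r
library(dplyr)

# Invented example data: column names follow the question, values are made up.
df <- data.frame(
  User = c("a", "a", "a", "a", "a", "b", "b", "b"),
  Date = as.POSIXct("2021-01-28 10:00:00", tz = "UTC") +
    60 * c(0, 1, 4, 6, 7, 0, 3, 4),
  StimuliA = c(1L, 0L, 0L, 1L, 0L, 1L, 0L, 0L),
  Responses = c(0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L),
  stringsAsFactors = FALSE
)

result <- df %>%
  arrange(User, Date) %>%
  group_by(User) %>%
  mutate(
    # date of the most recent stimulus at or before each row
    last.date = Date[which(StimuliA == 1L)[c(1, 1:sum(StimuliA == 1L))][cumsum(StimuliA == 1L) + 1]],
    # minutes since that stimulus, only for rows with a response
    timesince = ifelse(Responses == 1L,
                       as.numeric(difftime(Date, last.date, units = "mins")),
                       NA),
    # running count of stimuli = sequence number within the user
    sequence = cumsum(StimuliA)
  ) %>%
  group_by(User, sequence) %>%
  # keep the response with the largest time difference in each sequence
  filter(timesince == max(timesince, na.rm = TRUE)) %>%
  ungroup()

print(result)
```

    With this data three rows remain: user a has two sequences whose last responses come 4 and 1 minutes after the stimulus, and user b has one sequence with a last response after 4 minutes. If a sequence has no response at all, max() returns -Inf with a warning and the filter drops that sequence.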
