Calculate differences between rows faster than a for loop?

情书的邮戳 2020-12-18 08:32

I have a data set that looks like this:

ID   |   DATE    | SCORE
-------------------------
123  |  1/15/10  |  10
123  |  1/1/10   |  15
124  |  3/5/10   |           


        
3 Answers
  • 2020-12-18 09:10

    This should work if your dates are in order within id.

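    # Example data: two ids with two dated scores each
    id <- c(123, 123, 124, 124)
    date <- as.Date(c('2010-01-15', '2010-01-01', '2010-03-05', '2010-01-05'))
    score <- c(10, 15, 20, 30)
    data <- data.frame(id, date, score)

    # Sort by id, then by date within id
    data <- data[order(data$id, data$date), ]
    # Per id: NA for the first row, then the gap in days to the previous row
    data$dayssincelast <- do.call(c, by(data$date, data$id, function(x) c(NA, diff(x))))
    # Or, even more concisely
    data$dayssincelast <- unlist(by(data$date, data$id, function(x) c(NA, diff(x))))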
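
As a possible alternative to the by() approach above, base R's ave() can compute the same per-id gaps and return them already aligned with the rows. This is only a sketch on the same sample data, not part of the original answer; converting the dates with as.numeric() is an assumption made here so that the result is a plain number of days.

```r
# Same sample data as in the answer above
id    <- c(123, 123, 124, 124)
date  <- as.Date(c("2010-01-15", "2010-01-01", "2010-03-05", "2010-01-05"))
score <- c(10, 15, 20, 30)
data  <- data.frame(id, date, score)

# Sort by id, then by date, so diff() sees each id's dates in chronological order
data <- data[order(data$id, data$date), ]

# ave() applies FUN within each id group and returns a vector
# in the same row order as the input
data$dayssincelast <- ave(as.numeric(data$date), data$id,
                          FUN = function(x) c(NA, diff(x)))

print(data)
```

An id with a single row simply gets NA, because diff() of a length-one vector is empty.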
    
  • 2020-12-18 09:24

    How does the following work for you?

     indx <- which(data$id == c(data$id[-1], NA))
     data$date[indx] - data$date[indx+1]
    



    This just shifts the ids by one position and compares them to the original id column to find rows whose next row has the same id.
    Then, for the date subtraction, subtract the date of the subsequent row from the date of each matching row.
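
A small worked sketch of the shift-and-compare idea, using sample data laid out like the question (newest row first within each id). Storing the result in a dayssincelast column is an assumption borrowed from the other answers; the original snippet only prints the differences.

```r
# Sample data, newest date first within each id (as in the question)
data <- data.frame(
  id   = c(123, 123, 124, 124),
  date = as.Date(c("2010-01-15", "2010-01-01", "2010-03-05", "2010-01-05"))
)

# Rows whose NEXT row has the same id; the trailing NA pads the shifted vector
indx <- which(data$id == c(data$id[-1], NA))

# Start with all NA, then fill in the gap (in days) for the matching rows
data$dayssincelast <- NA_real_
data$dayssincelast[indx] <- as.numeric(data$date[indx] - data$date[indx + 1])

print(data)
```

The last row of each id has no earlier row to compare against, so it stays NA.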

  • 2020-12-18 09:25

    In the case where you need a more complex formula, you can use aggregate:

    a <- aggregate(date ~ id, data=data, FUN=function(x) c(NA,diff(x)))
    data$dayssincelast <- c(t(a[-1]), recursive=TRUE) # Remove 'id' column
    

    The same sort order applies here as in @nograpes' answer.
