dplyr group by, carry forward value from previous group to next

小蘑菇 2020-12-31 16:50

OK, this is the overall view of what I'm trying to achieve with dplyr:

Using dplyr, I am making calculations to form new columns.

initial.         


        
6 Answers
  •  借酒劲吻你
    2020-12-31 16:59

    You're using data.table in the question and have tagged the question data.table, so here is a data.table answer. When j is evaluated for each group, it runs in a static scope: local variables assigned inside j retain their values from the previous group.

    Using dummy data to demonstrate:

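    library(data.table)
    # sample.kind = "Rounding" (R >= 3.6.0) reproduces the pre-3.6.0 sample() output shown below
    set.seed(1, sample.kind = "Rounding")
    DT = data.table( long = rep(c(0,1,0,1),each=3),
                     val = sample(5,12,replace=TRUE))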
    DT
        long val
     1:    0   2
     2:    0   2
     3:    0   3
     4:    1   5
     5:    1   2
     6:    1   5
     7:    0   5
     8:    0   4
     9:    0   4
    10:    1   1
    11:    1   2
    12:    1   1
    
    # rleid(long) gives each run of consecutive equal 'long' values its own group id
    DT[, v1:=sum(val), by=rleid(long)][]
        long val v1
     1:    0   2  7
     2:    0   2  7
     3:    0   3  7
     4:    1   5 12
     5:    1   2 12
     6:    1   5 12
     7:    0   5 13
     8:    0   4 13
     9:    0   4 13
    10:    1   1  4
    11:    1   2  4
    12:    1   1  4
    

    So far, simple enough.
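    The by=rleid(long) part matters because the same long value appears in two separate runs. Below is a small sketch comparing by=long with by=rleid(long) on the same dummy data. The val values are typed in directly from the table above so the result does not depend on the random number generator, and the names run_id, by_value and by_run are illustrative only.

```r
library(data.table)

long <- rep(c(0, 1, 0, 1), each = 3)

# rleid() numbers each run of consecutive equal values: 1 1 1 2 2 2 3 3 3 4 4 4
run_id <- rleid(long)
print(run_id)

# Same dummy 'val' as the printed table above, typed in directly
DT <- data.table(long = long, val = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1))

# Grouping by 'long' itself pools rows 1-3 with rows 7-9 (both long == 0): 2 groups
by_value <- DT[, .(total = sum(val)), by = long]

# Grouping by run id keeps the four runs apart: 4 groups
by_run <- DT[, .(total = sum(val)), by = .(run = rleid(long))]

print(by_value)
print(by_run)
```

    The by_run totals are the same values as the v1 column above, while by_value collapses them into one total per distinct long value.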

    prev = NA  # initialize previous group value
    # ans uses prev from the previous group; prev is then set to this group's sum (local to j's scope)
    DT[, v2:={ans<-last(val)/prev; prev<-sum(val); ans}, by=rleid(long)][]
        long val v1         v2
     1:    0   2  7         NA
     2:    0   2  7         NA
     3:    0   3  7         NA
     4:    1   5 12 0.71428571
     5:    1   2 12 0.71428571
     6:    1   5 12 0.71428571
     7:    0   5 13 0.33333333
     8:    0   4 13 0.33333333
     9:    0   4 13 0.33333333
    10:    1   1  4 0.07692308
    11:    1   2  4 0.07692308
    12:    1   1  4 0.07692308
    
    > 3/NA
    [1] NA
    > 5/7
    [1] 0.7142857
    > 4/12
    [1] 0.3333333
    > 1/13
    [1] 0.07692308
    > prev
    [1] NA
    

    Notice that the global prev was not updated (it is still NA): prev and ans are local variables inside j's scope, and it was those local copies that were updated as each group ran. Just to illustrate, the global prev can be updated from within each group using R's <<- operator:

    # <<- assigns to the global prev rather than creating a local copy inside j
    DT[, v2:={ans<-last(val)/prev; prev<<-sum(val); ans}, by=rleid(long)]
    prev
    [1] 4
    

    But there's no need to use <<- in data.table, because local variables in j are static (they retain their values from the previous group). The exception is when you need the final group's value after the query has finished.
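    If relying on j's static scope feels too implicit, the same v2 column can also be computed without carrying any state between groups: summarise each run once, lag the per-run totals with data.table's shift(), and join the result back. This is a different technique from the one in this answer, offered only as a sketch; the column and table names grp and per_grp are illustrative, and val is typed in directly from the table above.

```r
library(data.table)

# Same dummy data as above, with 'val' typed in directly
DT <- data.table(long = rep(c(0, 1, 0, 1), each = 3),
                 val  = c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1))

# Label each run of consecutive equal 'long' values
DT[, grp := rleid(long)]

# One row per run: its last value and its total
per_grp <- DT[, .(last_val = last(val), grp_sum = sum(val)), by = grp]

# shift() lags grp_sum by one row, bringing in the previous run's total (NA for the first run)
per_grp[, v2 := last_val / shift(grp_sum)]

# Update join: copy each run's v2 onto every row of that run
DT[per_grp, v2 := i.v2, on = "grp"]
print(DT)
```

    Because every step works on whole columns, the result does not depend on the order in which data.table evaluates j for each group, and it should match the v2 column shown above.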
