Efficient functional programming (using mapply) in R for a “naturally” procedural problem

独厮守ぢ 2021-01-07 11:43

A common use case in R (at least for me) is identifying observations in a data frame that have some characteristic that depends on the values in some subset of other observa

5 answers
  • 2021-01-07 12:09

    This situation is tailor-made for using the plyr package.

    library(plyr)
    ddply(raw, .(WorkerId), function(df) df[-NROW(df),])
    

    It produces the output

      WorkerId Iteration
    1        1         1
    2        1         2
    3        1         3
    4        2         1
    5        2         2
    6        2         3
    7        3         1
    8        3         2
    9        3         3
    
  • 2021-01-07 12:09
    subset(raw, Iteration != ave(Iteration, WorkerId, FUN=max))
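
    To see why this works, here is a minimal sketch. The question's raw data frame is not shown in this excerpt, so the one below is a hypothetical stand-in reconstructed from the printed row names (three workers, four iterations each). ave() returns each worker's maximum repeated on every row of that worker, so the comparison drops exactly the rows that hold the maximum.

    ```r
    # Hypothetical stand-in for `raw` (an assumption, not the asker's data):
    # three workers with four iterations each.
    raw <- data.frame(WorkerId  = rep(1:3, each = 4),
                      Iteration = rep(1:4, times = 3))

    # ave() returns a vector as long as its input, holding each group's
    # maximum on every row that belongs to the group.
    group_max <- ave(raw$Iteration, raw$WorkerId, FUN = max)

    # Keep only the rows whose Iteration differs from the group maximum.
    trimmed <- subset(raw, Iteration != group_max)
    print(trimmed)
    ```

    Note that this relies on each worker's last row carrying the largest Iteration. If the maximum is tied within a worker, every tied row is dropped.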
    
  • 2021-01-07 12:15

    The "most natural way" IMO is the split-lapply-rbind method. You start by split()-ting the data frame into a list of groups, then lapply() the processing rule to each group (in this case, removing the last row), and then rbind() the pieces back together. It's all doable as a nested set of function calls. The inner two steps are illustrated here, and the final one-liner is presented at the bottom:

    > lapply( split(raw, raw$WorkerId), function(x) x[-NROW(x),] )
    $`1`
      WorkerId Iteration
    1        1         1
    2        1         2
    3        1         3
    
    $`2`
      WorkerId Iteration
    5        2         1
    6        2         2
    7        2         3
    
    $`3`
       WorkerId Iteration
    9         3         1
    10        3         2
    11        3         3
    
    do.call(rbind,  lapply( split(raw, raw$WorkerId), function(x) x[-NROW(x),] ) ) 
    

    Hadley Wickham has developed a wide set of tools, the plyr package, that extend this strategy to a wider variety of tasks.
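
    Spelled out step by step, the one-liner might look like the sketch below. Here raw is a hypothetical stand-in (three workers, four iterations each), since the original data frame is not shown in this excerpt. One side effect worth knowing: rbind() on a named list prefixes the row names with the list names (for example 1.1), so the sketch resets them at the end.

    ```r
    # Hypothetical stand-in for `raw` (an assumption, not the asker's data).
    raw <- data.frame(WorkerId  = rep(1:3, each = 4),
                      Iteration = rep(1:4, times = 3))

    # Step 1: split into a named list with one data frame per worker.
    groups <- split(raw, raw$WorkerId)

    # Step 2: apply the rule to each piece (drop its last row).
    trimmed_groups <- lapply(groups, function(x) x[-NROW(x), ])

    # Step 3: stack the pieces back into one data frame.
    result <- do.call(rbind, trimmed_groups)

    # Replace the prefixed row names with plain 1..n.
    rownames(result) <- NULL
    print(result)
    ```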

  • 2021-01-07 12:20
    keep <- with(raw, as.logical(ave(Iteration, WorkerId, FUN=function(x) c(rep(TRUE, length(x)-1), FALSE))))
    raw[keep, ]
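
    A short sketch of how the one-liner above behaves, using a hypothetical stand-in for raw (three workers, four iterations each) and a helper named mask that is introduced here only for illustration. ave() writes the function's result back into a copy of Iteration, so the TRUE/FALSE values come back as 1/0 numbers; that is why the as.logical() wrapper is needed before subsetting.

    ```r
    # Hypothetical stand-in for `raw` (an assumption, not the asker's data).
    raw <- data.frame(WorkerId  = rep(1:3, each = 4),
                      Iteration = rep(1:4, times = 3))

    # Illustrative helper: TRUE for every element except the last one.
    mask <- function(x) c(rep(TRUE, length(x) - 1), FALSE)

    # ave() applies mask() within each worker and returns the values as
    # numbers, so convert them back to logical.
    keep <- as.logical(ave(raw$Iteration, raw$WorkerId, FUN = mask))

    # Logical row indexing keeps everything but each worker's last row.
    trimmed <- raw[keep, ]
    print(trimmed)
    ```

    Unlike the max-based version, this depends only on row order within each worker, not on the Iteration values themselves.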
    
  • 2021-01-07 12:26

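    For the specific problem posed, !rev(duplicated(rev(raw$WorkerId))) or better, following Charles' advice, !duplicated(raw$WorkerId, fromLast=TRUE) is TRUE exactly on the last row of each WorkerId, so dropping the leading ! gives the rows to keep.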
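
    As a quick check that the two expressions agree, here is a small sketch on a hypothetical stand-in for raw (three workers, four iterations each; the asker's actual data is not shown in this excerpt).

    ```r
    # Hypothetical stand-in for `raw` (an assumption, not the asker's data).
    raw <- data.frame(WorkerId  = rep(1:3, each = 4),
                      Iteration = rep(1:4, times = 3))

    # Reverse, mark repeats, reverse back: TRUE on each worker's last row.
    is_last_a <- !rev(duplicated(rev(raw$WorkerId)))

    # Same result in one call: scan for duplicates from the end.
    is_last_b <- !duplicated(raw$WorkerId, fromLast = TRUE)

    stopifnot(identical(is_last_a, is_last_b))  # both forms agree

    # Drop each worker's last row by negating the flag.
    trimmed <- raw[!is_last_b, ]
    print(trimmed)
    ```

    Because duplicated() looks at the whole vector, this flags the last occurrence of each WorkerId even when a worker's rows are not contiguous.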
