A common use case in R (at least for me) is identifying observations in a data frame that have some characteristic that depends on the values in some subset of other observations.
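This excerpt does not include the data frame itself. The sketch below assumes a hypothetical `raw` with three workers and four iterations each. That layout is inferred from the row names 1 to 12 and the Iteration values in the printed outputs. The sketch treats "last iteration per worker" as the characteristic to identify. It flags each worker's last row by comparing every WorkerId with the next one, which assumes the rows are grouped by worker and sorted by Iteration:

```r
# Hypothetical example data (assumed, reconstructed from the printed output):
# 3 workers, 4 iterations each, rows grouped by WorkerId.
raw <- data.frame(
  WorkerId  = rep(1:3, each = 4),
  Iteration = rep(1:4, times = 3)
)

# A row is its worker's last when the next row has a different WorkerId,
# or when there is no next row at all (the final TRUE).
is_last <- c(raw$WorkerId[-1] != raw$WorkerId[-nrow(raw)], TRUE)

# Keep everything except each worker's last observation.
trimmed <- raw[!is_last, ]
print(trimmed)
```

This neighbour comparison is position-based. It gives wrong answers if one worker's rows are scattered through the data frame.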
This situation is tailor-made for the plyr package:
library(plyr)
ddply(raw, .(WorkerId), function(df) df[-NROW(df),])  # drop the last row of each worker
This drops the last row for each WorkerId and produces:
  WorkerId Iteration
1        1         1
2        1         2
3        1         3
4        2         1
5        2         2
6        2         3
7        3         1
8        3         2
9        3         3
A base-R alternative that needs no package drops each worker's highest Iteration:
subset(raw, Iteration != ave(Iteration, WorkerId, FUN=max))
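For a self-contained check, the sketch below uses a hypothetical `raw` (assumed: three workers, four iterations each). It shows that `ave()` with `FUN=max` returns each row's group maximum, so the comparison is done row by row. Because the test compares values rather than positions, shuffling the rows does not change which observations are dropped:

```r
# Hypothetical example data (assumed): 3 workers x 4 iterations.
raw <- data.frame(
  WorkerId  = rep(1:3, each = 4),
  Iteration = rep(1:4, times = 3)
)

# For every row, the maximum Iteration within that row's WorkerId group.
group_max <- ave(raw$Iteration, raw$WorkerId, FUN = max)
trimmed <- subset(raw, Iteration != group_max)

# Shuffle the rows; the same observations (Iteration 4) are still dropped.
set.seed(1)
shuffled <- raw[sample(nrow(raw)), ]
trimmed_shuffled <- subset(shuffled,
                           Iteration != ave(Iteration, WorkerId, FUN = max))
```

One difference from the position-based approaches: if a worker had two rows tied at the maximum Iteration, both would be removed.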
The "most natural way" IMO is the split-lapply-rbind method: split() the data frame into a list of groups, lapply() the processing rule to each group (in this case, removing the last row), and then rbind() the pieces back together. The whole thing can be written as one set of nested function calls. The inner two steps are illustrated here, and the final one-liner is at the bottom:
> lapply( split(raw, raw$WorkerId), function(x) x[-NROW(x),] )
$`1`
  WorkerId Iteration
1        1         1
2        1         2
3        1         3

$`2`
  WorkerId Iteration
5        2         1
6        2         2
7        2         3

$`3`
   WorkerId Iteration
9         3         1
10        3         2
11        3         3
do.call(rbind, lapply( split(raw, raw$WorkerId), function(x) x[-NROW(x),] ) )
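The same three steps can be spelled out one at a time in a sketch with a hypothetical `raw` (assumed: three workers, four iterations each). It adds two small touches not in the one-liner. `drop = FALSE` keeps each piece a data frame even if it had only one column. Resetting the row names removes the "1.1", "1.2", ... labels that `rbind()` builds from the list names:

```r
# Hypothetical example data (assumed): 3 workers x 4 iterations.
raw <- data.frame(
  WorkerId  = rep(1:3, each = 4),
  Iteration = rep(1:4, times = 3)
)

# split: a named list with one data frame per WorkerId
pieces <- split(raw, raw$WorkerId)

# apply: drop the last row of each piece (position-based, so rows are
# assumed to be in Iteration order within each worker)
trimmed_pieces <- lapply(pieces, function(x) x[-NROW(x), , drop = FALSE])

# combine: stack the pieces back into a single data frame
result <- do.call(rbind, trimmed_pieces)

# rbind() derives row names from the list names; reset them to 1..n
rownames(result) <- NULL
print(result)
```

Because `split()` orders groups by the levels of WorkerId, the combined result comes back sorted by worker even if the input was not.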
Hadley Wickham's plyr package provides a broad set of tools that extend this split-apply-combine strategy to many more kinds of tasks.
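Another base-R option uses ave() to flag every row except the last one for each worker, then subsets with that flag:
keep <- with(raw, as.logical(ave(Iteration, WorkerId, FUN=function(x) c(rep(TRUE, length(x)-1), FALSE))))
raw[keep, ]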
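To see why the `as.logical()` wrapper is needed, the sketch below runs this `ave()` flag on a hypothetical `raw`. The data are assumed: three workers with four iterations, plus a made-up worker 4 with a single row. `ave()` writes the logical results back into a copy of the numeric Iteration column, so they arrive as 1/0 and must be converted back. A one-row group gets `c(logical(0), FALSE)`, i.e. just `FALSE`, so that worker disappears entirely:

```r
# Hypothetical example data (assumed): workers 1-3 with 4 iterations each,
# plus worker 4 with a single row to show the one-row edge case.
raw <- data.frame(
  WorkerId  = c(rep(1:3, each = 4), 4L),
  Iteration = c(rep(1:4, times = 3), 1L)
)

# TRUE for every row except the last one within each WorkerId (by position).
# ave() returns these as 1/0 in the type of Iteration, hence as.logical().
keep <- with(raw, as.logical(ave(Iteration, WorkerId,
  FUN = function(x) c(rep(TRUE, length(x) - 1), FALSE))))

trimmed <- raw[keep, ]
print(trimmed)
```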
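For the specific problem posed, !rev(duplicated(rev(raw$WorkerId))) flags the last row of each worker,
or better, following Charles' advice, !duplicated(raw$WorkerId, fromLast=TRUE) does the same without the double reversal. Dropping the negation gives the rows to keep: raw[duplicated(raw$WorkerId, fromLast=TRUE), ]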
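A short sketch with a hypothetical `raw` (assumed: three workers, four iterations each) checks that both `duplicated()` forms agree. It uses the flag both to mark and to drop the last rows. `duplicated(..., fromLast = TRUE)` marks every row whose WorkerId appears again further down. The last occurrence of each worker is therefore the only row left unmarked, whether or not a worker's rows are contiguous:

```r
# Hypothetical example data (assumed): 3 workers x 4 iterations.
raw <- data.frame(
  WorkerId  = rep(1:3, each = 4),
  Iteration = rep(1:4, times = 3)
)

# TRUE for every row whose WorkerId occurs again later in the data.
not_last <- duplicated(raw$WorkerId, fromLast = TRUE)

is_last <- !not_last          # flags each worker's last row
trimmed <- raw[not_last, ]    # drops each worker's last row

# The double-rev() form yields the identical flag.
stopifnot(identical(is_last, !rev(duplicated(rev(raw$WorkerId)))))
```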