I am attempting to create new variables using a function and lapply
rather than working right in the data with loops. I used to use Stata and would have solved this
The idiomatic way to do this kind of thing in R would be to use a combination of split and lapply. You're halfway there with your use of lapply; you just need to use split as well.
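# Split the data by v1, then process each group's data frame separately.
lapply(split(data, data$v1), function(df) {
  # 80th and 90th percentiles of v2 within this group
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  # 10 = above the 90th percentile, 20 = above the 80th, NA = everything else
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  # Keep only the customers that fall into one of the two buckets
  na.omit(data.frame(id = df$custID, top_pct))
})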
Finding quantiles is done with quantile, which here returns the 80th and 90th percentiles of v2 within each group.
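
To see how the pieces fit together, here is a minimal sketch on made-up data. The column names custID, v1 and v2 are taken from the code above; the values, the group labels "a" and "b", and the names result and combined are assumptions for illustration only. The last step, stacking the per-group results with do.call(rbind, ...), is one common way to get back to a single data frame and may or may not be what you need.

```r
# Toy data: 20 customers in two groups (values are invented for illustration).
data <- data.frame(
  custID = 1:20,
  v1     = rep(c("a", "b"), each = 10),
  v2     = c(1:10, seq(10, 100, by = 10))
)

# split() returns a named list with one data frame per level of v1.
groups <- split(data, data$v1)
names(groups)

# quantile() returns a named numeric vector, one entry per probability.
quantile(groups$a$v2, c(0.8, 0.9))

# Same per-group function as in the answer, applied to every group.
result <- lapply(groups, function(df) {
  cutoff <- quantile(df$v2, c(0.8, 0.9))
  top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
  na.omit(data.frame(id = df$custID, top_pct))
})

# Optional: stack the per-group data frames into one.
combined <- do.call(rbind, result)
combined
```

With this toy data, customers 9 and 19 end up with top_pct 20 and customers 10 and 20 with top_pct 10; everyone else is dropped by na.omit.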