Recoding is a common practice for survey data, but the most obvious routes take more time than they should.
The fastest code that accomplishes the same task with the pr
Making factors is expensive; doing it only once is comparable in speed to the commands using structure, and in my opinion preferable, as you don't have to depend on how factors happen to be constructed.
# Build the factor once, then index it by the integer codes in each column
rc <- factor(re.codes, levels = re.codes)
dat5 <- as.data.frame(lapply(dat, function(d) rc[d]))
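As a small self-contained illustration of the idea above: the labels in re.codes and the toy data frame below are made up for the example, not taken from the question, so treat this as a sketch rather than the original setup.

```r
# Hypothetical recode labels; position k is the label for code k
re.codes <- c("This", "That", "And", "The", "Other")

# Toy survey data with integer codes 1..5 (assumed layout)
dat <- data.frame(a = rep(1:5, 2), b = rep(5:1, 2))

# Create the factor a single time...
rc <- factor(re.codes, levels = re.codes)

# ...then recode every column by plain integer indexing into it
dat5 <- as.data.frame(lapply(dat, function(d) rc[d]))

str(dat5)
```

Indexing a factor keeps its levels, so every recoded column ends up with the same levels in the same order without calling factor() again.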
EDIT 2: Interestingly, this seems to be a case where lapply does speed things up. This for loop is substantially slower.
for (i in seq_along(dat)) {
  dat[[i]] <- rc[dat[[i]]]
}
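If you want to check this on your own machine, here is a rough timing sketch. The labels and the data frame layout are assumptions for the example, and the timings will vary by machine and R version, so no numbers are shown.

```r
re.codes <- c("This", "That", "And", "The", "Other")  # hypothetical labels
rc <- factor(re.codes, levels = re.codes)

n <- 50000
dat <- data.frame(a = rep(1:5, n),
                  b = rep(5:1, n),
                  c = rep(c(1L, 2L, 4L, 5L, 3L), n))

# lapply version: builds all recoded columns, then one data frame
t.lapply <- system.time(
  dat.lapply <- as.data.frame(lapply(dat, function(d) rc[d]))
)

# for-loop version: assigns into the data frame one column at a time
dat.loop <- dat
t.loop <- system.time(
  for (i in seq_along(dat.loop)) {
    dat.loop[[i]] <- rc[dat.loop[[i]]]
  }
)

# Elapsed seconds for each approach
print(c(lapply = t.lapply[["elapsed"]], loop = t.loop[["elapsed"]]))
```

Both versions should produce the same recoded columns; only the time taken differs.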
EDIT 1: You can also speed things up by being more precise with your types. Try any of the solutions (but especially your original one) with your data created as integers, as follows. For details, see a previous answer of mine here.
dat <- cbind(rep(1:5,50000),rep(5:1,50000),rep(c(1L,2L,4L,5L,3L),50000))
This is also a good idea because converting from floating point to integer, as all of the faster solutions here do, can give unexpected behavior; see this question.
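To make both points concrete, here is a short sketch with made-up values: it shows how a single non-integer column turns the whole matrix into doubles, and how a double that is only approximately a whole number gets truncated when used as an index.

```r
# All columns written as integers: the matrix stays integer
ints <- cbind(rep(1:5, 2), rep(c(1L, 2L, 4L, 5L, 3L), 2))
# One column written without the L suffix: the whole matrix becomes double
dbls <- cbind(rep(1:5, 2), rep(c(1, 2, 4, 5, 3), 2))
typeof(ints)   # "integer"
typeof(dbls)   # "double"

# A double that prints as 8 but is stored slightly below it
x <- (0.1 + 0.7) * 10
print(x, digits = 17)
x == 8          # FALSE
as.integer(x)   # 7, because conversion truncates toward zero
letters[x]      # "g", the 7th letter, not the 8th
```

Numeric indices are truncated toward zero when subsetting, so a code that was computed rather than typed in can silently select the wrong level.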