Challenge: recoding a data.frame() — make it faster


借酒劲吻你 2021-02-04 06:43

Recoding is a common practice for survey data, but the most obvious routes take more time than they should.

The fastest code that accomplishes the same task with the pr

6 answers

不知归路 (original poster)

2021-02-04 06:45
Creating factors is expensive. Building the factor once and then indexing into it is comparable in speed to the solutions that use structure, and in my opinion preferable, since you don't have to depend on how factors happen to be constructed internally.
```
rc <- factor(re.codes, levels=re.codes)
dat5 <- as.data.frame(lapply(dat, function(d) rc[d]))
```
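To make the snippet above easy to try on its own, here is a minimal self-contained sketch. The tiny `dat` and the five labels in `re.codes` are made up for illustration (the question's own setup is not shown in this post), so treat them as assumptions.

```r
# Hypothetical small data set: survey codes 1..5 in three columns
dat <- data.frame(a = c(1L, 2L, 3L),
                  b = c(5L, 4L, 3L),
                  c = c(2L, 2L, 1L))

# Assumed labels; position i holds the label for code i
re.codes <- c("This", "That", "And", "The", "Other")

# Build the factor exactly once
rc <- factor(re.codes, levels = re.codes)

# Indexing a factor by integer codes returns a factor with the same levels,
# so no new factor has to be constructed per column
recoded <- as.data.frame(lapply(dat, function(d) rc[d]))
str(recoded)
```

Every column of `recoded` is a factor that shares the levels of `rc`, which is the whole point: the expensive `factor()` call happens once rather than once per column.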
EDIT 2: Interestingly, this seems to be a case where lapply does speed things up. This for loop is substantially slower.
```
for(i in seq_along(dat)) {
  dat[[i]] <- rc[dat[[i]]]
}
```
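If you want to check the lapply versus for loop difference on your own machine, a rough timing sketch could look like this. The data size, the column count, and the labels are assumptions picked so it runs quickly, and the timings will vary with R version and hardware, so no expected numbers are given.

```r
set.seed(1)
n <- 1e5  # assumed size, smaller than the data in the question

# Nine integer columns of codes 1..5 (hypothetical stand-in data)
dat <- as.data.frame(matrix(sample(1:5, n * 9, replace = TRUE), ncol = 9))

re.codes <- c("This", "That", "And", "The", "Other")  # assumed labels
rc <- factor(re.codes, levels = re.codes)

# lapply version: builds a new data frame
t_lapply <- system.time(
  dat_lapply <- as.data.frame(lapply(dat, function(d) rc[d]))
)

# for loop version: replaces the columns of a copy one by one
dat_loop <- dat
t_loop <- system.time(
  for (i in seq_along(dat_loop)) {
    dat_loop[[i]] <- rc[dat_loop[[i]]]
  }
)

# Elapsed seconds for each approach
print(c(lapply = t_lapply[["elapsed"]], loop = t_loop[["elapsed"]]))
```

Both versions should produce the same columns, so whichever is faster for you is a safe swap.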
EDIT 1: You can also speed things up by being more precise with your types. Try any of the solutions (but especially your original one) creating your data as integers, as follows. For details, see a previous answer of mine here.
```
dat <- cbind(rep(1:5,50000),rep(5:1,50000),rep(c(1L,2L,4L,5L,3L),50000))
```
This is also a good idea because converting floating-point values to integers, as all of the faster solutions here do, can give unexpected behavior; see this question.
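A small sketch of both points, using made-up values rather than the question's data: `cbind` promotes the whole matrix to double as soon as one column is double, and a double used as an index is truncated toward zero rather than rounded.

```r
# One double column is enough to make the whole matrix double
m_double  <- cbind(rep(1:5, 2), rep(c(1, 2, 4, 5, 3), 2))
m_integer <- cbind(rep(1:5, 2), rep(c(1L, 2L, 4L, 5L, 3L), 2))
typeof(m_double)   # "double"
typeof(m_integer)  # "integer"

# Double indices are truncated toward zero, not rounded
labels <- c("This", "That", "And")  # hypothetical labels
idx <- 0.3 / 0.1     # prints as 3 but is stored slightly below 3
labels[idx]          # selects the 2nd element, "That"
labels[round(idx)]   # rounding first selects the 3rd, "And"
```

With integer data from the start, the truncation problem cannot arise, because no conversion happens when the codes are used as indices.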