data.table efficient recycling V2

Posted by 这一生的挚爱 on 2019-12-11 15:09:43

Question


This is a follow-up to an earlier question: data.table efficient recycling

The difference here is that the number of future years is not necessarily the same for each row.

I frequently use recycling in data.table, for example when I need to project future years: I repeat my original data for each future year.

This can lead to something like this:

library(data.table)
dt <- data.table(1:500000, 500000:1, rpois(500000, 240))
dt2 <- dt[, c(.SD, .(year = 1:V3)), by = 1:nrow(dt) ]
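To make the intended result concrete, here is a minimal sketch on a three-row table. The name `small` and its values are made up for illustration; only the expression itself comes from the code above. Each row is repeated `V3` times and `year` counts from 1 up to `V3` within each original row.

```r
library(data.table)

# Hypothetical tiny stand-in for dt; V3 is the number of future years per row
small <- data.table(V1 = 1:3, V2 = 3:1, V3 = c(2L, 1L, 3L))

# Same expression as above, evaluated once per original row
small2 <- small[, c(.SD, .(year = 1:V3)), by = 1:nrow(small)]

# 2 + 1 + 3 = 6 rows; year restarts at 1 for every original row
print(small2)
```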

But I often have to deal with millions of rows, and far more columns than in this toy example, and the run time grows accordingly. Try this:

library(data.table)
dt <- data.table(1:5000000, 5000000:1, rpois(5000000, 240))
dt2 <- dt[, c(.SD, .(year = 1:V3)), by = 1:nrow(dt) ]

My question is: is there a more efficient way to achieve this?

Thanks for any help!


Answer 1:


This is a faster implementation, but it still takes a while because of the lapply loop inside the data.table() call:

dt2 <- data.table(
  rep(dt$V1, dt$V3),
  rep(dt$V2, dt$V3),
  rep(dt$V3, dt$V3),
  unlist(lapply(dt$V3, function(x){1:x}))
)
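A possible variant that avoids the `lapply` loop, sketched under the assumption that base R's `sequence()` is acceptable here: `sequence(c(2, 3))` returns `1 2 1 2 3`, which is the per-row counter the loop builds. The table size `n` and the seed are arbitrary choices to keep the sketch quick, and no timing is claimed for the 5-million-row case.

```r
library(data.table)

# Smaller version of the question's data so the sketch runs quickly
set.seed(1)
n  <- 1000L
dt <- data.table(1:n, n:1, rpois(n, 240))

# Repeat each row index V3 times and subset once; this carries every column
# along without naming them individually
dt2 <- dt[rep(seq_len(nrow(dt)), dt$V3)]

# Add the counter 1..V3 for each original row by reference
dt2[, year := sequence(dt$V3)]
```

Subsetting by a repeated row index keeps the original column names and scales to wide tables, since columns do not have to be listed one by one as in the `rep()` calls above.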

I hope this helps!



Source: https://stackoverflow.com/questions/59197298/data-table-efficient-recycling-v2
