Improving performance of split() function in R?

血红的双手。 提交于 2019-12-01 21:39:43

Try

 library(data.table)
 DT <- as.data.table(df)
 DT1 <- DT[, list(Y=list(Y)), by=X]
 DT1$Y
 #[[1]]
 #[1] 1 2

 #[[2]]
 #[1] 3

 #[[3]]
 #[1] 1 3

Or using dplyr

 library(dplyr)
 df1 <-  df %>% 
             group_by(X) %>%
              do(Y=c(.$Y))

 df1$Y
 #[[1]]
 #[1] 1 2

 #[[2]]
 #[1] 3

 #[[3]]
 #[1] 1 3

data

 df <- structure(list(X = c("A", "A", "B", "C", "C"), Y = c(1L, 2L, 
 3L, 1L, 3L)), .Names = c("X", "Y"), class = "data.frame", row.names = c(NA, 
 -5L))
Daniel Schultz

I found an elegant solution using similar code from dplyr and/or data.table. I looked for concatenate groups in R and I found this post:

Efficiently concate character content within one column, by group in R

And actually, it works quite nicely with

dt = data.table(content = sample(letters, 26e6, T), groups = LETTERS)
df = as.data.frame(dt)

system.time(dt[, paste(content, collapse = " "), by = groups])
#   user  system elapsed 
#   5.37    0.06    5.65 

system.time(df %>% group_by(groups) %>% summarise(paste(content, collapse = " ")))
#   user  system elapsed 
#   7.10    0.13    7.67 

Thanks for all your help

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!