Improving performance of split() function in R?

礼貌的吻别 2021-01-20 09:12

I have a data frame in a very simple form:

    X Y
    ---
    A 1
    A 2
    B 3
    C 1
    C 3

My end result should be a list like this

2 answers
  • 2021-01-20 09:38

    Try data.table:

     library(data.table)
     DT <- as.data.table(df)
     DT1 <- DT[, list(Y=list(Y)), by=X]
     DT1$Y
     #[[1]]
     #[1] 1 2
    
     #[[2]]
     #[1] 3
    
     #[[3]]
     #[1] 1 3
    

    Or using dplyr

     library(dplyr)
     df1 <- df %>%
       group_by(X) %>%
       do(Y = c(.$Y))
    
     df1$Y
     #[[1]]
     #[1] 1 2
    
     #[[2]]
     #[1] 3
    
     #[[3]]
     #[1] 1 3
    

    Data:

     df <- structure(list(X = c("A", "A", "B", "C", "C"), Y = c(1L, 2L, 
     3L, 1L, 3L)), .Names = c("X", "Y"), class = "data.frame", row.names = c(NA, 
     -5L))
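
    For comparison, here is a minimal base R sketch of the `split()` call that the question title refers to, run on the same `df`. The names `by_x`, `unnamed` and `named` are introduced here for illustration only. The list columns printed above show no group names, whereas `split()` returns a list named by group.

    ```r
    # Same data as in the answer's data section
    df <- data.frame(X = c("A", "A", "B", "C", "C"),
                     Y = c(1L, 2L, 3L, 1L, 3L),
                     stringsAsFactors = FALSE)

    # Base R baseline: one vector of Y per value of X, named by group
    by_x <- split(df$Y, df$X)
    str(by_x)

    # An unnamed list of the same shape (such as a list column from a
    # grouped aggregation) can be given group names afterwards.
    # Assumption: the groups come back in the same order as unique(df$X),
    # which holds here because df is already sorted by X.
    unnamed <- unname(by_x)
    named <- setNames(unnamed, unique(df$X))
    identical(named, by_x)
    ```

    This is only a reference point for checking that the grouped results match what `split()` gives; it says nothing about speed on large data.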
    
  • 2021-01-20 09:39

    I found an elegant solution using similar dplyr and/or data.table code. Searching for how to concatenate groups in R led me to this post:

    Efficiently concate character content within one column, by group in R

    It works quite nicely with both packages:

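    library(data.table)
    library(dplyr)

    # 26e6 random letters; the 26 group labels are recycled across all rows
    dt = data.table(content = sample(letters, 26e6, TRUE), groups = LETTERS)
    df = as.data.frame(dt)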
    
    system.time(dt[, paste(content, collapse = " "), by = groups])
    #   user  system elapsed 
    #   5.37    0.06    5.65 
    
    system.time(df %>% group_by(groups) %>% summarise(paste(content, collapse = " ")))
    #   user  system elapsed 
    #   7.10    0.13    7.67 
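
    One thing to keep in mind: `paste(..., collapse = " ")` gives one string per group, not a list of vectors as in the question. A small base R sketch of the difference, using made-up vectors (`content`, `groups` and the result names here are hypothetical stand-ins, not the benchmark data):

    ```r
    # Small illustrative vectors, not the 26e6-row benchmark
    content <- c("a", "b", "c", "d", "e")
    groups  <- c("A", "A", "B", "C", "C")

    # Splitting keeps each group's values as a separate vector
    as_list <- split(content, groups)

    # Collapsing each group gives a single string per group
    collapsed <- vapply(as_list, paste, character(1), collapse = " ")
    print(collapsed)

    # A collapsed string can be split back into a vector,
    # assuming the values themselves contain no spaces
    restored <- strsplit(collapsed, " ", fixed = TRUE)
    str(restored)
    ```

    If a list of vectors is what is needed in the end, the collapsed form has to be split again, so it may be simpler to build the list directly.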
    

    Thanks for all your help.
