Improving performance of split() function in R?

礼貌的吻别 2021-01-20 09:12

I have a data frame in a very simple form:

    X Y
    ---
    A 1
    A 2
    B 3
    C 1
    C 3
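
A minimal sketch of the setup, in case it helps to reproduce the problem. The name `df` is an assumption (the question does not name the data frame), and the `split()` call is only a guess at the baseline the title refers to, since the desired list itself is not shown in the question.

```r
# Hypothetical name `df`; the values are copied from the table above.
df <- data.frame(
  X = c("A", "A", "B", "C", "C"),
  Y = c(1, 2, 3, 1, 3)
)

# Assumed baseline: split the Y values into a named list keyed by X.
res <- split(df$Y, df$X)
str(res)
```

On a data frame this small `split()` is effectively instant, so any performance concern would only show up on much larger inputs.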

My end result should be a list like this

2 Answers
  •  小蘑菇 (original poster)
    2021-01-20 09:39

    I found an elegant solution using dplyr or data.table. I searched for how to concatenate groups in R and found this post:

    Efficiently concate character content within one column, by group in R

    It actually works quite nicely with both packages:

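    library(data.table)
    library(dplyr)
    
    # 26 million random letters; groups recycles LETTERS, giving 26 groups
    dt <- data.table(content = sample(letters, 26e6, replace = TRUE), groups = LETTERS)
    df <- as.data.frame(dt)
    
    # data.table: concatenate content within each group
    system.time(dt[, paste(content, collapse = " "), by = groups])
    #   user  system elapsed 
    #   5.37    0.06    5.65 
    
    # dplyr: the same aggregation
    system.time(df %>% group_by(groups) %>% summarise(paste(content, collapse = " ")))
    #   user  system elapsed 
    #   7.10    0.13    7.67 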
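
To make the shape of the result concrete, here is a small sketch of the same group-wise `paste(..., collapse = " ")` idea applied to the X/Y table from the question. It is a base R stand-in, not the data.table or dplyr code timed above, so that it runs without extra packages. The names `df` and `collapsed` are assumptions for illustration.

```r
# Hypothetical data frame `df`, copied from the question's X/Y table.
df <- data.frame(
  X = c("A", "A", "B", "C", "C"),
  Y = c(1, 2, 3, 1, 3)
)

# Base R stand-in for the grouped paste() calls above:
# one space-separated string of Y values per distinct value of X.
collapsed <- vapply(split(df$Y, df$X), paste, character(1), collapse = " ")
print(collapsed)
```

This only illustrates the output on toy data. For large inputs, the grouped data.table or dplyr calls are the ones the timings above refer to.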
    

    Thanks for all your help
