Split dataframe by levels of a factor and name dataframes by those levels


I want to split an existing dataframe by the levels of one of the factor variables so that the names of the split dataframes would correspond to the levels of the factor.

3 Answers
  • 2020-11-28 15:22
    sapply( levels( df$Z ), function( x ) list( subset( df, Z == x ) ) )
    

    This returns a list whose elements are named after the levels of df$Z; each element holds the rows of df for that level.

    Oops, a better answer was posted but has since been deleted, so I will put its solution here:

    split(df, df$Z)
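
    To make the naming behaviour concrete, here is a minimal sketch on a toy data frame. The column `value`, the object `pieces`, and the environment `env` are illustrative names that do not come from the question; only `df` and `Z` do.

    ```r
    # Toy data frame; `value` is an illustrative column.
    df <- data.frame(
      Z = factor(c("a", "b", "a", "c", "b")),
      value = 1:5
    )

    # One data frame per level of Z; the list names are the level labels.
    pieces <- split(df, df$Z)
    names(pieces)   # "a" "b" "c"
    pieces$a        # rows of df where Z == "a"

    # Optionally copy each piece into an environment as its own named object.
    env <- new.env()
    list2env(pieces, envir = env)
    ls(env)         # "a" "b" "c"
    ```

    Keeping the pieces in a list is usually easier to work with; list2env() is only needed if you really want separate objects, in which case passing `envir = .GlobalEnv` should create them in the workspace.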
    
  • 2020-11-28 15:28

    You can do it with the plyr package:

    library(plyr)
    # returns a list with one data frame per level of Z
    dlply(df, .(Z))
    
  • 2020-11-28 15:34

    In base R, use the function split. It is a generic with a default method and a method for data frames. However, I find that split.data.frame becomes very slow when the number of levels to split on is huge. That is,

    # inefficient in my opinion
    split(df, df$Z)
    

    The solution above gives you the names you asked for directly, but it slows down badly when the factor has a very large number of levels.
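
    If you want to check this on your own machine, a rough timing sketch follows. The data are made up (2000 levels with 5 rows each, names `n_levels`, `x`, `timing`, and `pieces` are hypothetical), and the elapsed time will depend on your hardware and R version, so no result is claimed here.

    ```r
    # Hypothetical test data: 2000 levels with 5 rows each.
    n_levels <- 2000
    df <- data.frame(
      Z = factor(rep(sprintf("g%04d", seq_len(n_levels)), each = 5)),
      x = seq_len(n_levels * 5)
    )

    # Time the base R split; the figure varies by machine and R version.
    timing <- system.time(pieces <- split(df, df$Z))
    print(timing["elapsed"])
    length(pieces)  # one element per level
    ```

    Increasing `n_levels` should show how the cost grows with the number of levels.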

    If you're willing to depend on an external package in exchange for speed and efficiency, I'd suggest the data.table package:

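    library(data.table)
    dt <- as.data.table(df)
    # one list element per group; .SD holds every column except Z
    oo <- dt[, list(list(.SD)), by = Z]$V1
    # groups come back in order of first appearance, matching unique()
    names(oo) <- as.character(unique(dt$Z))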
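
    As an alternative sketch, data.table also provides its own split() method that takes the grouping column name in `by` and names the result for you. This assumes data.table 1.9.8 or later is installed; the toy data, the `value` column, and the `pieces` object are illustrative.

    ```r
    library(data.table)  # assumes data.table >= 1.9.8 is installed

    # Toy data; Z is kept as character here to keep the example simple.
    df <- data.frame(Z = c("b", "a", "b", "c"), value = 1:4,
                     stringsAsFactors = FALSE)
    dt <- as.data.table(df)

    # split() on a data.table accepts the grouping column name via `by`.
    pieces <- split(dt, by = "Z")
    names(pieces)   # one name per distinct value of Z
    pieces[["b"]]   # rows where Z == "b"
    ```

    Unlike the `.SD` approach, each piece here should still contain the Z column, since `keep.by` defaults to TRUE.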
    