How to flatten R data frame that contains lists?

后端 未结 3 1965
梦谈多话
梦谈多话 2020-12-17 15:08

I want to find the best \"R way\" to flatten a dataframe that looks like this:

  CAT    COUNT     TREAT
   A     1,2,3     Treat-a, Treat-b
   B     4,5              


        
相关标签:
3条回答
  • 2020-12-17 15:47

    Here is a solution using base R, accepting vectors of any length inside your list and no need to specify which columns of the dataframe you want to collapse. Part of the solution was generated using this answer.

    df2 <- do.call(cbind,lapply(df,function(x){
      #check if it is a list, otherwise just return as is
      if(is.list(x)){
        return(data.frame(t(sapply(x,'[',seq(max(sapply(x,length)))))))
      } else{
      return(x)
      }
    }))
    

    As of R 3.2 there is lengths to replace sapply(x, length) as well,

    df3 <- do.call(cbind.data.frame, lapply(df, function(x) {
      # check if it is a list, otherwise just return as is
      if (is.list(x)) {
        data.frame(t(sapply(x,'[', seq(max(lengths(x))))))
      } else {
       x
     }
    }))
    

    data used:

    df <- structure(list(CAT = structure(1:2, .Label = c("A", "B"), class = "factor"), 
        COUNT = list(1:3, 4:5), TREAT = list(c("Treat-a", "Treat-b"
        ), c("Treat-c", "Treat-d", "Treat-e"))), .Names = c("CAT", 
    "COUNT", "TREAT"), row.names = c(NA, -2L), class = "data.frame")
    
    0 讨论(0)
  • 2020-12-17 15:49

    There's a deleted answer here that indicates that "splitstackshape" could be used for this. It can, but the deleted answer used the wrong function. Instead, it should use the listCol_w function. Unfortunately, in its present form, this function is not vectorized across columns, so you would need to nest the calls to listCol_w for each column that needs to be flattened.

    Here's the approach:

    library(splitstackshape)
    listCol_w(listCol_w(df, "COUNT", fill = NA), "TREAT", fill = NA)
    ##    CAT COUNT_fl_1 COUNT_fl_2 COUNT_fl_3 TREAT_fl_1 TREAT_fl_2 TREAT_fl_3
    ## 1:   A          1          2          3    Treat-a    Treat-b         NA
    ## 2:   B          4          5         NA    Treat-c    Treat-d    Treat-e
    

    Note that fill = NA has been specified because it defaults to fill = NA_character_, which would otherwise coerce all the values to character.


    Another alternative would be to use transpose from "data.table". Here's a possible implementation (looks scary, but using the function is easy). Benefits are that (1) you can specify the columns to flatten, (2) you can decide whether you want to drop the original column or not, and (3) it's fast.

    flatten <- function(indt, cols, drop = FALSE) {
      require(data.table)
      if (!is.data.table(indt)) indt <- as.data.table(indt)
      x <- unlist(indt[, lapply(.SD, function(x) max(lengths(x))), .SDcols = cols])
      nams <- paste(rep(cols, x), sequence(x), sep = "_")
      indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), .SDcols = cols]
      if (isTRUE(drop)) {
        indt[, (nams) := unlist(lapply(.SD, transpose), recursive = FALSE), 
             .SDcols = cols][, (cols) := NULL]
      }
      indt[]
    }
    

    Usage would be...

    Keeping original columns:

    flatten(df, c("COUNT", "TREAT"))
    #    CAT COUNT                   TREAT COUNT_1 COUNT_2 COUNT_3 TREAT_1 TREAT_2 TREAT_3
    # 1:   A 1,2,3         Treat-a,Treat-b       1       2       3 Treat-a Treat-b      NA
    # 2:   B   4,5 Treat-c,Treat-d,Treat-e       4       5      NA Treat-c Treat-d Treat-e
    

    Dropping original columns:

    flatten(df, c("COUNT", "TREAT"), TRUE)
    #    CAT COUNT_1 COUNT_2 COUNT_3 TREAT_1 TREAT_2 TREAT_3
    # 1:   A       1       2       3 Treat-a Treat-b      NA
    # 2:   B       4       5      NA Treat-c Treat-d Treat-e
    

    See this gist for a comparison with the other solutions proposed.

    0 讨论(0)
  • 2020-12-17 15:57

    Here is another way in base r

    df<-data.frame(CAT=c("A","B"))
    df$COUNT <-list(1:3,4:5)
    df$TREAT <-list(paste("Treat-", letters[1:2],sep=""),paste("Treat-", letters[3:5],sep=""))
    

    Create a helper function to do the work

    f <- function(l) {
      if (!is.list(l)) return(l)
      do.call('rbind', lapply(l, function(x) `length<-`(x, max(lengths(l)))))
    }
    

    Always test your code

    f(df$TREAT)
    
    #           [,1]      [,2]      [,3]     
    # [1,] "Treat-a" "Treat-b" NA       
    # [2,] "Treat-c" "Treat-d" "Treat-e"
    

    Apply it

    df[] <- lapply(df, f)
    df
    
    #     CAT COUNT.1 COUNT.2 COUNT.3 TREAT.1 TREAT.2 TREAT.3
    #   1   A       1       2       3 Treat-a Treat-b    <NA>
    #   2   B       4       5      NA Treat-c Treat-d Treat-e
    
    0 讨论(0)
提交回复
热议问题