R: separate comma-separated cells into rows as a Cartesian product

青春惊慌失措 2021-01-24 17:20

I have the data frame mydf below. I want to split every cell that contains comma-separated values and put each value in its own row. I am looking for a data frame similar to y

2 answers
  • 2021-01-24 17:59

    There are times when a for loop is totally fine to work with in R. This is one of those times. Try:

    library(splitstackshape)
    cols <- c("name", "new")
    for (i in cols) {
      mydf <- cSplit(mydf, i, ",", "long")
    }
    
    mydf
    ##     name AB new
    ##  1:   AB  A   1
    ##  2:   AB  A   2
    ##  3:   AB  A   3
    ##  4:   BW  A   1
    ##  5:   BW  A   2
    ##  6:   BW  A   3
    ##  7:    x  B   4
    ##  8:    x  B   5
    ##  9:    x  B   6
    ## 10:    x  B   7
    ## 11:    y  B   4
    ## 12:    y  B   5
    ## 13:    y  B   6
    ## 14:    y  B   7
    ## 15:    z  B   4
    ## 16:    z  B   5
    ## 17:    z  B   6
    ## 18:    z  B   7
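
    To see why splitting one column at a time ends up as the Cartesian product, here is a minimal base R sketch of the same loop. The mydf below is only my guess at the input, reconstructed from the printed output, and split_long is a hypothetical helper, not part of splitstackshape. It also leaves the split values as character, which cSplit may not do.

    ```r
    # Assumed input, inferred from the output shown above
    mydf <- data.frame(
      name = c("AB,BW", "x,y,z"),
      AB   = c("A", "B"),
      new  = c("1,2,3", "4,5,6,7"),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: split one column into long form.
    # Each row is repeated once per piece found in that column.
    split_long <- function(df, col, sep = ",") {
      pieces <- strsplit(as.character(df[[col]]), sep, fixed = TRUE)
      out <- df[rep(seq_len(nrow(df)), lengths(pieces)), , drop = FALSE]
      out[[col]] <- trimws(unlist(pieces))
      rownames(out) <- NULL
      out
    }

    # Splitting "name" first and then "new" crosses the two sets of pieces,
    # because every row produced by the first split still carries the full
    # comma-separated "new" string.
    result <- mydf
    for (col in c("name", "new")) {
      result <- split_long(result, col)
    }
    nrow(result)  # 2 * 3 + 3 * 4 rows
    ```

    The order of the columns in the loop only changes the row order, not the set of rows.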
    

    Here's a small test using slightly bigger data:

    library(tidyr)  # for separate_rows() and %>%, used in fun2() below

    # concat.test = sample data from "splitstackshape"
    test <- do.call(rbind, replicate(5000, concat.test, FALSE))
    
    fun1 <- function() {
      cols <- c("Likes", "Siblings")
      for (i in cols) {
        test <- cSplit(test, i, ",", "long")
      }
      test
    }
    
    fun2 <- function() {
      test %>%
        separate_rows("Likes") %>%
        separate_rows("Siblings")
    }
    
    system.time(fun1())
    #   user  system elapsed 
    #  3.205   0.056   3.261 
    system.time(fun2())
    #   user  system elapsed 
    # 11.598   0.066  11.662
    
  • 2021-01-24 18:13

    We can use the separate_rows function from the tidyr package.

    library(tidyr)
    
    mydf2 <- mydf %>%
      separate_rows("name") %>%
      separate_rows("new")
    mydf2
    
    #    AB name new
    # 1   A   AB   1
    # 2   A   AB   2
    # 3   A   AB   3
    # 4   A   BW   1
    # 5   A   BW   2
    # 6   A   BW   3
    # 7   B    x   4
    # 8   B    x   5
    # 9   B    x   6
    # 10  B    x   7
    # 11  B    y   4
    # 12  B    y   5
    # 13  B    y   6
    # 14  B    y   7
    # 15  B    z   4
    # 16  B    z   5
    # 17  B    z   6
    # 18  B    z   7 
    

    If you don't want to call separate_rows more than once, you can write a function that applies it to each column in turn.

    expand_fun <- function(df, vars){
      while (length(vars) > 0){
        df <- df %>% separate_rows(vars[1])
        vars <- vars[-1]
      }
      return(df)
    }
    

    expand_fun takes two arguments. The first, df, is the original data frame. The second, vars, is a character vector with the names of the columns we want to expand. Here is an example using the function.

    mydf3 <- expand_fun(mydf, vars = c("name", "new"))
    mydf3
    #    AB name new
    # 1   A   AB   1
    # 2   A   AB   2
    # 3   A   AB   3
    # 4   A   BW   1
    # 5   A   BW   2
    # 6   A   BW   3
    # 7   B    x   4
    # 8   B    x   5
    # 9   B    x   6
    # 10  B    x   7
    # 11  B    y   4
    # 12  B    y   5
    # 13  B    y   6
    # 14  B    y   7
    # 15  B    z   4
    # 16  B    z   5
    # 17  B    z   6
    # 18  B    z   7
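
    The same fold over the column names can also be written with Reduce. This is only a sketch: expand_reduce is a made-up name, the mydf is my reconstruction of the input from the printed output, and tidyselect::all_of() is there so the column name is passed explicitly as a string. Note that handing both columns to a single separate_rows call is not equivalent, because that pairs the pieces position by position instead of crossing them.

    ```r
    library(tidyr)

    # Assumed input, inferred from the output shown above
    mydf <- data.frame(
      name = c("AB,BW", "x,y,z"),
      AB   = c("A", "B"),
      new  = c("1,2,3", "4,5,6,7"),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: fold separate_rows() over the column names,
    # starting from df and splitting one column per step
    expand_reduce <- function(df, vars) {
      Reduce(
        function(acc, v) separate_rows(acc, tidyselect::all_of(v), sep = ","),
        vars,
        df
      )
    }

    mydf4 <- expand_reduce(mydf, c("name", "new"))
    nrow(mydf4)  # 2 * 3 + 3 * 4 rows
    ```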
    