tidyr use separate_rows over multiple columns

隐瞒了意图╮ 2021-01-04 10:44

I have a data.frame where some cells contain strings of comma-separated values:

d <- data.frame(a=c(1:3),
       b=c("name1, name2, name3", "name4",
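The snippet above is cut off in this copy. A data frame consistent with the output shown in the answers would look like the sketch below. The remaining values of b and all of c are inferred from that output, not taken from the original post, and stringsAsFactors = FALSE is added here as an assumption so the columns stay character on older R versions.

```r
# Reconstructed sample data (assumption: values inferred from the answers' output)
d <- data.frame(
  a = 1:3,
  b = c("name1, name2, name3", "name4", "name5, name6"),
  c = c("name7", "name8, name9", "name10"),
  stringsAsFactors = FALSE
)
```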


        
2 Answers
  • 2021-01-04 11:29

    You can use a pipe. Note that the default separator, sep = "[^[:alnum:].]+", already matches ", ", so it does not need to be specified.

    library(tidyr)
    d %>% separate_rows(b) %>% separate_rows(c)
    #   a     b      c
    # 1 1 name1  name7
    # 2 1 name2  name7
    # 3 1 name3  name7
    # 4 2 name4  name8
    # 5 2 name4  name9
    # 6 3 name5 name10
    # 7 3 name6 name10
    

    Note: This uses tidyr version 0.6.0, where the %>% operator is included in the package.
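
    Because the default separator splits on any run of non-alphanumeric characters, values that contain spaces would be broken apart as well. A minimal sketch of passing sep explicitly, using a made-up data frame d2 with hypothetical columns id and city:

    ```r
    library(tidyr)

    # Hypothetical data: the values themselves contain spaces
    d2 <- data.frame(
      id = 1:2,
      city = c("New York, Los Angeles", "Chicago"),
      stringsAsFactors = FALSE
    )

    # Split only on a comma followed by optional whitespace
    out <- separate_rows(d2, city, sep = ",\\s*")
    out$city
    # "New York" "Los Angeles" "Chicago"
    ```

    With the default sep, "New York" would be split into "New" and "York".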


    Update: Following @doscendodiscimus's comment, we can use a for() loop and reassign d on each iteration, which handles as many columns as we like. Because the column names are held in a character vector, we need to switch to the standard evaluation version, separate_rows_.

    cols <- c("b", "c")
    for(col in cols) {
        d <- separate_rows_(d, col)
    }
    

    which gives the updated d

      a     b      c
    1 1 name1  name7
    2 1 name2  name7
    3 1 name3  name7
    4 2 name4  name8
    5 2 name4  name9
    6 3 name5 name10
    7 3 name6 name10
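
    In more recent tidyr releases separate_rows_() is marked as deprecated. A minimal sketch of the same loop with separate_rows() and all_of(), assuming a tidyr version that re-exports all_of(). The sample data is reconstructed from the output above and is an assumption.

    ```r
    library(tidyr)

    # Sample data reconstructed from the output shown above (assumption)
    d <- data.frame(
      a = 1:3,
      b = c("name1, name2, name3", "name4", "name5, name6"),
      c = c("name7", "name8, name9", "name10"),
      stringsAsFactors = FALSE
    )

    cols <- c("b", "c")
    for (col in cols) {
      # all_of() selects the column whose name is stored in `col`
      d <- separate_rows(d, all_of(col))
    }
    ```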
    
  • 2021-01-04 11:42

    Here's an alternative approach using splitstackshape::cSplit and zoo::na.locf.

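    library(splitstackshape)
    library(zoo)
    
    # Split every column on "," and stack the pieces in long form
    df <- cSplit(d, 1:ncol(d), "long", sep = ",")
    # Drop rows that are NA in every column, then carry the last value forward into the remaining NAs
    na.locf(df[rowSums(is.na(df)) != ncol(df),])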
    #    a     b      c
    #1:  1 name1  name7
    #2:  1 name2  name7
    #3:  1 name3  name7
    #4:  2 name4  name8
    #5:  2 name4  name9
    #6:  3 name5 name10
    #7:  3 name6 name10
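
    The role of na.locf ("last observation carried forward") in the last step can be seen on a small vector. This is only an illustration with made-up values, not part of the answer's code:

    ```r
    library(zoo)

    # Each NA is replaced by the most recent non-NA value before it
    x <- c(1, NA, NA, 2, NA)
    filled <- na.locf(x)
    filled
    # 1 1 1 2 2
    ```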
    