melt + strsplit, or opposite to aggregate

忘了有多久 2021-01-22 07:32

I have a little question that seems to be so easy in concept, but I cannot find the way to do it...

Say I have a data.frame df2 with a column listing car brands and anot

4 Answers
  • 2021-01-22 08:03

    Here are two alternatives:

    Use data.table and unlist as follows:

    library(data.table)
    DT <- data.table(df2)
    DT[, list(model = unlist(strsplit(as.character(models), ","))), 
       by = brand]
    #     brand model
    #  1:     a    a1
    #  2:     a    a2
    #  3:     a    a3
    #  4:     b    b1
    #  5:     b    b2
    #  6:     c    c1
    #  7:     d    d1
    #  8:     d    d2
    #  9:     d    d3
    # 10:     d    d4
    

    Use concat.split.multiple from my "splitstackshape" package. One nice thing with this approach is being able to split multiple columns with one simple command.

    library(splitstackshape)
    out <- concat.split.multiple(df2, "models", ",", "long")
    out[complete.cases(out), ]
    #    brand time models
    # 1      a    1     a1
    # 2      b    1     b1
    # 3      c    1     c1
    # 4      d    1     d1
    # 5      a    2     a2
    # 6      b    2     b2
    # 8      d    2     d2
    # 9      a    3     a3
    # 12     d    3     d3
    # 16     d    4     d4
    
  • 2021-01-22 08:03

    Here is how I would do it using the plyr package

    library("plyr")
    ddply(df2, .(brand), function(DF) {
      # as.character() guards against models being a factor, which strsplit() rejects
      data.frame(model = strsplit(as.character(DF$models), ",")[[1]])
    })
    

    As a point of comparison, this is how to use the same package to go from df1 to df2:

    ddply(df1, .(brand), 
          summarize, models=paste(sort(unique(model)), collapse=","))
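
    For comparison, the same df1 to df2 direction can be sketched in base R with aggregate(). The df1 below is a hypothetical reconstruction from the brand/model output shown in the answers, not the asker's actual data, so treat this as a sketch under that assumption.

    ```r
    # Hypothetical df1, rebuilt from the brand/model output shown in the answers
    df1 <- data.frame(
      brand = c("a", "a", "a", "b", "b", "c", "d", "d", "d", "d"),
      model = c("a1", "a2", "a3", "b1", "b2", "c1", "d1", "d2", "d3", "d4"),
      stringsAsFactors = FALSE
    )

    # Collapse the models of each brand into one comma-separated string
    df2 <- aggregate(model ~ brand, data = df1,
                     FUN = function(x) paste(sort(unique(x)), collapse = ","))

    # Rename the collapsed column to match the df2 layout used in the question
    names(df2)[names(df2) == "model"] <- "models"
    ```

    This yields one row per brand, with the grouping column first and the collapsed string second.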
    
  • 2021-01-22 08:09

    These days I would use tidytext::unnest_tokens for this task:

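    library(dplyr)     # provides the %>% pipe
    library(tidytext)
    # Note: unnest_tokens() lowercases tokens by default (to_lower = TRUE)
    df2 %>% 
      unnest_tokens(model, models, token = "regex", pattern = ",")
    
    # # A tibble: 10 x 2
    #     brand model
    #    <fctr> <chr>
    #  1      a    a1
    #  2      a    a2
    #  3      a    a3
    #  4      b    b1
    #  5      b    b2
    #  6      c    c1
    #  7      d    d1
    #  8      d    d2
    #  9      d    d3
    # 10      d    d4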
    
  • 2021-01-22 08:18

    Playing around, I have found a way to do the trick, even though it may be quite dirty:

    library(reshape2)  # provides melt()
    parts <- melt(strsplit(as.character(df2$models), ","))  # columns: value, L1 (row index in df2)
    df1 <- data.frame(model = as.character(parts$value), brand = as.character(df2$brand[parts$L1]))
    

    It is not the best solution, since the data.frames actually have many more columns, and I would not want to go one by one... If someone knows a prettier way to solve this, I would appreciate it!
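
    One possible base R sketch that avoids naming the other columns one by one: split the strings, then repeat each row index once per piece so every remaining column is carried along. The df2 here is a hypothetical reconstruction inferred from the outputs in the other answers; the real data has more columns, which this approach should keep without changes.

    ```r
    # Hypothetical df2, inferred from the outputs shown in the other answers
    df2 <- data.frame(
      brand  = c("a", "b", "c", "d"),
      models = c("a1,a2,a3", "b1,b2", "c1", "d1,d2,d3,d4"),
      stringsAsFactors = FALSE
    )

    # Split each models string into its pieces (as.character() in case of factors)
    parts <- strsplit(as.character(df2$models), ",", fixed = TRUE)

    # Repeat each row index once per piece, so all other columns come along
    idx <- rep(seq_len(nrow(df2)), lengths(parts))
    df1 <- df2[idx, setdiff(names(df2), "models"), drop = FALSE]

    # Attach the individual models and reset the row names
    df1$model <- unlist(parts)
    rownames(df1) <- NULL
    ```

    Because the rows are selected by index, any extra columns in df2 are duplicated alongside brand automatically.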
