melt + strsplit, or opposite to aggregate

忘了有多久

I have a little question that seems to be so easy in concept, but I cannot find the way to do it...

Say I have a data.frame df2 with a column listing car brands and anot

4 Answers

谎友^

2021-01-22 08:03

Here are two alternatives:

Use data.table and unlist as follows:

library(data.table)
DT <- data.table(df2)
DT[, list(model = unlist(strsplit(as.character(models), ","))), 
   by = brand]
#     brand model
#  1:     a    a1
#  2:     a    a2
#  3:     a    a3
#  4:     b    b1
#  5:     b    b2
#  6:     c    c1
#  7:     d    d1
#  8:     d    d2
#  9:     d    d3
# 10:     d    d4

Use concat.split.multiple from my "splitstackshape" package. One nice thing with this approach is being able to split multiple columns with one simple command.

library(splitstackshape)
out <- concat.split.multiple(df2, "models", ",", "long")
out[complete.cases(out), ]
#    brand time models
# 1      a    1     a1
# 2      b    1     b1
# 3      c    1     c1
# 4      d    1     d1
# 5      a    2     a2
# 6      b    2     b2
# 8      d    2     d2
# 9      a    3     a3
# 12     d    3     d3
# 16     d    4     d4


傲寒

2021-01-22 08:03

Here is how I would do it using the plyr package:

library("plyr")
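ddply(df2, .(brand), function(DF) {
  # as.character() guards against `models` being a factor, which strsplit() rejects;
  # unlist() also covers a brand that appears on more than one row
  data.frame(model = unlist(strsplit(as.character(DF$models), ",")))
})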

As a point of comparison, this is how to use the same package to go from df1 to df2:

ddply(df1, .(brand), 
      summarize, models=paste(sort(unique(model)), collapse=","))
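For readers without plyr, the same df1-to-df2 direction can be sketched in base R with aggregate. The sample df1 below is an assumption, reconstructed from the outputs shown elsewhere in this thread, and the rename step exists only to match the `models` column name used here.

```r
# Long-format sample data, reconstructed from the outputs in this thread (an assumption).
df1 <- data.frame(
  brand = c("a", "a", "a", "b", "b", "c", "d", "d", "d", "d"),
  model = c("a1", "a2", "a3", "b1", "b2", "c1", "d1", "d2", "d3", "d4"),
  stringsAsFactors = FALSE
)

# Collapse the models of each brand into one comma-separated string.
df2 <- aggregate(model ~ brand, data = df1,
                 FUN = function(x) paste(sort(unique(x)), collapse = ","))

# aggregate() keeps the original column name, so rename it to `models`.
names(df2)[names(df2) == "model"] <- "models"
```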


一向

2021-01-22 08:09

These days I would use tidytext::unnest_tokens for this task:

library(dplyr)     # provides the %>% pipe used below
library(tidytext)
df2 %>% 
  unnest_tokens(model, models, token = "regex", pattern = ",")

# A tibble: 10 x 2
    brand model
   <fctr> <chr>
 1      a    a1
 2      a    a2
 3      a    a3
 4      b    b1
 5      b    b2
 6      c    c1
 7      d    d1
 8      d    d2
 9      d    d3
10      d    d4


独厮守ぢ

2021-01-22 08:18
Playing around, I have found a way to do the trick, even though it may be quite dirty:
```
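library(reshape2)  # provides melt()
# melt() on the list from strsplit() returns columns `value` and `L1` (the list index);
# as.character() guards against `models` being a factor
parts <- melt(strsplit(as.character(df2$models), ","))
df1 <- data.frame(model = as.character(parts$value),
                  brand = as.character(df2$brand[parts$L1]))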
```
It is not the best solution, since the data.frames actually have many more columns, and I would not want to go one by one... If someone knows a prettier way to solve this, I would appreciate it!
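One way to avoid handling the remaining columns one by one is to repeat each row of df2 once per model, using only base R. This is a minimal sketch, not a definitive solution: the sample df2 is reconstructed from the outputs in this thread, and the `country` column is hypothetical, added only to show that extra columns carry over.

```r
# Sample wide-format data; `country` is a hypothetical extra column.
df2 <- data.frame(
  brand   = c("a", "b", "c", "d"),
  country = c("w", "x", "y", "z"),
  models  = c("a1,a2,a3", "b1,b2", "c1", "d1,d2,d3,d4"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector.
parts <- strsplit(as.character(df2$models), ",", fixed = TRUE)

# Repeat every row once per model, keeping all columns except `models`.
row_index <- rep(seq_len(nrow(df2)), lengths(parts))
df1 <- df2[row_index, names(df2) != "models", drop = FALSE]

# Attach the flattened models and reset the row names.
df1$model <- unlist(parts)
rownames(df1) <- NULL
```

Because the rows are selected by index, any number of additional columns in df2 is copied along without being named explicitly.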