I have a little question that seems to be so easy in concept, but I cannot find the way to do it...
Say I have a data.frame df2 with a column listing car brands and anot
Here are two alternatives:
Use data.table and unlist as follows:
library(data.table)
DT <- data.table(df2)
DT[, list(model = unlist(strsplit(as.character(models), ","))),
by = brand]
# brand model
# 1: a a1
# 2: a a2
# 3: a a3
# 4: b b1
# 5: b b2
# 6: c c1
# 7: d d1
# 8: d d2
# 9: d d3
# 10: d d4
Use concat.split.multiple from my "splitstackshape" package. One nice thing about this approach is that it can split multiple columns with a single command.
library(splitstackshape)
out <- concat.split.multiple(df2, "models", ",", "long")
out[complete.cases(out), ]
# brand time models
# 1 a 1 a1
# 2 b 1 b1
# 3 c 1 c1
# 4 d 1 d1
# 5 a 2 a2
# 6 b 2 b2
# 8 d 2 d2
# 9 a 3 a3
# 12 d 3 d3
# 16 d 4 d4
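Here is how I would do it using the plyr package:
library("plyr")
ddply(df2, .(brand), function(DF) {
  # as.character() guards against a factor column;
  # unlist() keeps every row in the group, not only the first
  data.frame(model = unlist(strsplit(as.character(DF$models), ",")))
})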
As a point of comparison, this is how to use the same package to go from df1 to df2:
ddply(df1, .(brand),
summarize, models=paste(sort(unique(model)), collapse=","))
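The same df1 to df2 collapse can also be sketched in base R with aggregate, in case loading plyr is not an option. The df1 below is an assumption, reconstructed from the outputs printed in this thread; the name collapsed is only for illustration.

```r
# Assumed example data, reconstructed from the outputs shown in this thread.
df1 <- data.frame(
  brand = c("a", "a", "a", "b", "b", "c", "d", "d", "d", "d"),
  model = c("a1", "a2", "a3", "b1", "b2", "c1", "d1", "d2", "d3", "d4"),
  stringsAsFactors = FALSE
)

# One row per brand; the models are de-duplicated, sorted and comma-joined.
collapsed <- aggregate(
  model ~ brand, data = df1,
  FUN = function(x) paste(sort(unique(x)), collapse = ",")
)

# Rename the column to match the df2 layout used in the question.
names(collapsed)[names(collapsed) == "model"] <- "models"
print(collapsed)
```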
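These days I would use tidytext::unnest_tokens for this task:
library(dplyr)     # provides the %>% pipe
library(tidytext)
# Note: unnest_tokens() lower-cases tokens by default (to_lower = TRUE)
df2 %>%
  unnest_tokens(model, models, token = "regex", pattern = ",")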
# A tibble: 10 x 2
#    brand model
#    <fctr> <chr>
#  1 a     a1
#  2 a     a2
#  3 a     a3
#  4 b     b1
#  5 b     b2
#  6 c     c1
#  7 d     d1
#  8 d     d2
#  9 d     d3
# 10 d     d4
Playing around, I have found a way to do the trick, even though it may be quite dirty:
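library(reshape2)  # melt() on a list returns the columns value and L1
# Split once and reuse the result; as.character() guards against a factor column
pieces <- melt(strsplit(as.character(df2$models), ","))
df1 <- data.frame(model = as.character(pieces$value),
                  brand = as.character(df2$brand[match(pieces$L1, rownames(df2))]))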
It is not the best solution, since the data.frames actually have many more columns, and I would not want to go one by one... If someone knows a prettier way to solve this, I would appreciate it!
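On the concern about many more columns: one possible base R sketch repeats each whole row once per split piece, so every other column is carried along without being named. The df2 below is an assumption reconstructed from the outputs in this thread, and the origin column is purely hypothetical, added only to show that extra columns survive.

```r
# Assumed example data; 'origin' is a hypothetical extra column.
df2 <- data.frame(
  brand  = c("a", "b", "c", "d"),
  models = c("a1,a2,a3", "b1,b2", "c1", "d1,d2,d3,d4"),
  origin = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into its pieces (one list element per row).
parts <- strsplit(as.character(df2$models), ",", fixed = TRUE)

# Repeat every row index once per piece and keep all columns except 'models'.
keep <- setdiff(names(df2), "models")
df1 <- df2[rep(seq_len(nrow(df2)), lengths(parts)), keep, drop = FALSE]

# Attach the flattened pieces and reset the duplicated row names.
df1$model <- unlist(parts)
rownames(df1) <- NULL
print(df1)
```

Because the rows are selected by index, this does not depend on how many other columns the data.frame has.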