问题
I have a data.frame
where one of the variables is a vector (or a list), like this:
MyColumn <- c("A, B,C", "D,E", "F","G")
MyDF <- data.frame(group_id=1:4, val=11:14, cat=MyColumn)
# group_id val cat
# 1 1 11 A, B,C
# 2 2 12 D,E
# 3 3 13 F
# 4 4 14 G
I'd like to have a new data frame with as many rows as the vector
FlatColumn <- unlist(strsplit(MyColumn,split=","))
which looks like this:
MyNewDF <- data.frame(group_id=c(rep(1,3),rep(2,2),3,4), val=c(rep(11,3),rep(12,2),13,14), cat=FlatColumn)
# group_id val cat
# 1 1 11 A
# 2 1 11 B
# 3 1 11 C
# 4 2 12 D
# 5 2 12 E
# 6 3 13 F
# 7 4 14 G
In essence, for every factor which is an element of the list of MyColumn
(the letters A to G), I want to assign the corresponding values of the list. Every factor appears only once in MyColumn
.
Is there a neat way for this kind of reshaping/unlisting/merging? I've come up with a very cumbersome for
-loop over the rows of MyDF
and the length of the corresponding element of strsplit(MyColumn,split=",")
. I'm very sure that there has to be a more elegant way.
回答1:
How about
lst <- strsplit(MyColumn, split = ",")
k <- lengths(lst) ## expansion size
FlatColumn <- unlist(lst, use.names = FALSE)
MyNewDF <- data.frame(group_id = rep.int(MyDF$group_id, k),
val = rep.int(MyDF$val, k),
cat = FlatColumn)
# group_id val cat
#1 1 11 A
#2 1 11 B
#3 1 11 C
#4 2 12 D
#5 2 12 E
#6 3 13 F
#7 4 14 G
回答2:
You can use separate_rows
from tidyr
:
tidyr::separate_rows(MyDF, cat)
# group_id val cat
# 1 1 11 A
# 2 1 11 B
# 3 1 11 C
# 4 2 12 D
# 5 2 12 E
# 6 3 13 F
# 7 4 14 G
回答3:
We can use cSplit
from splitstackshape
library(splitstackshape)
cSplit(MyDF, "cat", ",", "long")
# group_id val cat
#1: 1 11 A
#2: 1 11 B
#3: 1 11 C
#4: 2 12 D
#5: 2 12 E
#6: 3 13 F
#7: 4 14 G
We can also use do with base R
with strsplit
to split the 'cat' column into a list
, replicate the sequence of rows of 'MyDF' with the lengths
of 'lst', and create the 'cat' column by unlist
ing the 'lst'.
lst <- strsplit(as.character(MyDF$cat), ",")
transform(MyDF[rep(1:nrow(MyDF), lengths(lst)),-3], cat = unlist(lst))
来源:https://stackoverflow.com/questions/39028819/strsplit-on-a-column-of-a-data-frame