问题
I currently have wide data which looks similar to this:
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
etc
And I wish into a long data frame which looks similar to this:
cid dyad f op ed junk id
1 2 0 2 5 0.876 1
1 2 0 4 7 0.876 2
1 5 0 2 4 0.765 1
1 5 1 4 3 0.765 2
I have tried using the gather() function as well as the reshape() function but cannot figure out how to create multiple columns instead of collapsing all of the columns into a long style
All help is appreciated
回答1:
You can use the base reshape()
function to (roughly) simultaneously melt over multiple sets of variables, by using the varying
parameter and setting direction
to "long"
.
For example here, you are supplying a list of three "sets" (vectors) of variable names to the varying
argument:
dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765
", header=TRUE)
reshape(dat, direction="long",
varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
v.names=c("f","op","ed"))
You'll end up with this:
cid dyad junk time f op ed id
1.1 1 2 0.876 1 0 2 5 1
2.1 1 5 0.765 1 0 2 4 2
1.2 1 2 0.876 2 0 4 7 1
2.2 1 5 0.765 2 1 4 3 2
Notice that two variables get created, in addition to the three sets getting collapsed: an $id
variable -- which tracks the row number in the original table (dat
), and a $time
variable -- which corresponds to the order of the original variables that were collapsed. There are also now nested row numbers -- 1.1, 2.1, 1.2, 2.2
, which here are just the values of $id
and $time
at that row, respectively.
Without knowing exactly what you're trying to track, hard to say whether $id
or $time
is what you want to use as the row identifier, but they're both there.
Might also be useful to play with the parameters timevar
and idvar
(you can set timevar
to NULL
, for example).
reshape(dat, direction="long",
varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")),
v.names=c("f","op","ed"),
timevar="id1", idvar="id2")
回答2:
The tidyr package can solve this problem using the function gather, separate and spread:
df<-read.table(header=TRUE, text="cid dyad f1 f2 op1 op2 ed1 ed2 junk
1 2 0 0 2 4 5 7 0.876
1 5 0 1 2 4 4 3 0.765")
library(tidyr)
print(df %>%gather( name, value, -c(cid, dyad, junk)) %>%
separate( name, into=c("name", "id"), sep= -2 ) %>%
spread( key=c(name), value)
)
#step by step:
#collect the columns f, op, ed to the common cid, dyad and junk
df<-gather(df, name, value, -c(cid, dyad, junk))
#separate the number id from the names
df<-separate(df, name, into=c("name", "id"), sep= -2 )
#made wide again.
df<-spread(df, key=c(name), value)
来源:https://stackoverflow.com/questions/47928328/converting-data-from-wide-to-long-using-multiple-columns