Converting data from wide to long (using multiple columns) [duplicate]

问题

I currently have wide data which looks similar to this:

cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765

etc

And I wish into a long data frame which looks similar to this:

cid dyad f op ed junk  id
1   2    0 2  5  0.876 1
1   2    0 4  7  0.876 2
1   5    0 2  4  0.765 1
1   5    1 4  3  0.765 2

I have tried using the gather() function as well as the reshape() function but cannot figure out how to create multiple columns instead of collapsing all of the columns into a long style

All help is appreciated

回答1:

You can use the base reshape() function to (roughly) simultaneously melt over multiple sets of variables, by using the varying parameter and setting direction to "long".

For example here, you are supplying a list of three "sets" (vectors) of variable names to the varying argument:

dat <- read.table(text="
cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
1   5    0  1  2   4   4   3   0.765
", header=TRUE)

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"))

You'll end up with this:

    cid dyad  junk time f op ed id
1.1   1    2 0.876    1 0  2  5  1
2.1   1    5 0.765    1 0  2  4  2
1.2   1    2 0.876    2 0  4  7  1
2.2   1    5 0.765    2 1  4  3  2

Notice that two variables get created, in addition to the three sets getting collapsed: an $id variable -- which tracks the row number in the original table (dat), and a $time variable -- which corresponds to the order of the original variables that were collapsed. There are also now nested row numbers -- 1.1, 2.1, 1.2, 2.2, which here are just the values of $id and $time at that row, respectively.

Without knowing exactly what you're trying to track, hard to say whether $id or $time is what you want to use as the row identifier, but they're both there.

Might also be useful to play with the parameters timevar and idvar (you can set timevar to NULL, for example).

reshape(dat, direction="long", 
        varying=list(c("f1","f2"), c("op1","op2"), c("ed1","ed2")), 
        v.names=c("f","op","ed"), 
        timevar="id1", idvar="id2")

回答2:

The tidyr package can solve this problem using the function gather, separate and spread:

df<-read.table(header=TRUE, text="cid dyad f1 f2 op1 op2 ed1 ed2 junk 
1   2    0  0  2   4   5   7   0.876
               1   5    0  1  2   4   4   3   0.765")

library(tidyr)

print(df %>%gather( name, value, -c(cid, dyad, junk)) %>% 
  separate( name, into=c("name", "id"), sep= -2 ) %>%
  spread( key=c(name), value)
)


#step by step:
  #collect the columns f, op, ed to the common cid, dyad and junk
df<-gather(df, name, value, -c(cid, dyad, junk))
  #separate the number id from the names
df<-separate(df, name, into=c("name", "id"), sep= -2 )
  #made wide again.
df<-spread(df, key=c(name), value)

来源：https://stackoverflow.com/questions/47928328/converting-data-from-wide-to-long-using-multiple-columns

标签

transform

reshape