问题
I am struggling with transformation of a Panel Dataset from wide to long format. The Dataset looks like this:
ID | KP1_430a | KP1_430b | KP1_430c | KP2_430a | KP2_430b | KP2_430c | KP1_1500a | ...
1 ....
2 ....
KP1; KP2 up to KP7 describe the Waves. a,b up to f describe a specific Item. (E.g. left to right right placement of Party a)
I would like to have this data in long format. Like this:
ID | Party | Wave | 430 | 1500
1 1 1 .. ..
1 2 1 .. ..
. . .
1 1 2 .. ..
. . .
2 1 1 .. ..
I tried to use the reshape function. But I had problems reshaping it over time and over the parties simultaneously.
Here is a small data.frame example.
data <- data.frame(matrix(rnorm(10),2,10))
data[,1] <- 1:2
names(data) <- c("ID","KP1_430a" , "KP1_430b" , "KP1_430c" , "KP2_430a" , "KP2_430b ", "KP2_430c ", "KP1_1500a" ,"KP1_1500b", "KP1_1500c")
And this is how far I got.
data_long <- reshape(data,varying=list(names(data)[2:4],names(data)[5:7], names(data[8:10]),
v.names=c("KP1_430","KP2_430","KP1_1500"),
direction="long", timevar="Party")
The question remains: how I can get the time varying variables in long format as well? And is there a more elegant way to reshape this data? In the code above I would have to enter the names (names(data)[2:4]) for each wave and variable. With this small data.frame it is Ok, but the Dataset is a lot larger.
EDIT: How this transformation could be done by hand: I actually have done this, which leaves me with a page-long code file.
First, Bind KP1_430a and KP1_1500a with IDs, Time=1 and Party=1 column wise. Second create the same object for all parties [b-f], changing the party index respectively, and append it row-wise. Do step one and two for the rest of the waves [2-7], respectively changing party and time var, and append them row-wise.
回答1:
It is usually easier to proceed in two steps: first use melt
to put your data into a "tall" format (unless it is already the case) and then use dcast
to convert ti to a wider format.
library(reshape2)
library(stringr)
# Tall format
d <- melt(data, id.vars="ID")
# Process the column containing wave and party
d1 <- str_match_all(
as.character( d$variable ),
"KP([0-9])_([0-9]+)([a-z])"
)
d1 <- do.call( rbind, d1 )
d1 <- d1[,-1]
colnames(d1) <- c("wave", "number", "party")
d1 <- as.data.frame( d1)
d <- cbind( d, d1 )
# Convert to the desired format
d <- dcast( d, ID + wave + party ~ number )
回答2:
At the moment your Wave data is in your variable names and you need to extract it with some string processing. I had no trouble with melt
mdat <- melt(data, id.vars="ID")
mdat$wave=sub("KP", "", sub("_.+$", "", mdat$variable)) # remove the other stuff
mdat
Your description is too sketchy (so far) for me to figure out the rule for deriving a "Party" variable, so perhaps you can edit you question to show how that might be done by a human being .... and then we can show the computer how to to do it.
EDIT: If the last lower-case letter in the original column names is Party as Vincent thinks, then you could trim the trailing spaces in those names and extract:
mdat$var <- sub("\\s", "", (as.character(mdat$variable)))
mdat$party=substr( mdat$var, nchar(mdat$var), nchar(mdat$var))
#--------------
> mdat
ID variable value wave party var
1 1 KP1_430a 0.7220627 1 a KP1_430a
2 2 KP1_430a 0.9585243 1 a KP1_430a
3 1 KP1_430b -1.2954671 1 b KP1_430b
4 2 KP1_430b 0.3393617 1 b KP1_430b
5 1 KP1_430c -1.1477627 1 c KP1_430c
6 2 KP1_430c -1.0909179 1 c KP1_430c
<snipped output>
来源:https://stackoverflow.com/questions/9004399/reshape-panel-data-wide-format-to-long-format