Question
I have a data frame that looks like this:
Code ID X1 X2
1 1000 2 1.6 250.6
2 1000 3 0.15 340.9
3 1001 2 0.53 441.7
4 1001 3 1.8 499.0
5 1002 2 4.4 516.6
6 1003 3 4.9 616.6
What I would like to do is create a new data frame with one row per unique Code and, for each unique ID (there are two: 2 and 3), a pair of columns holding the corresponding X1 and X2 values. The result should look like this:
Code ID2X1 ID2X2 ID3X1 ID3X2
1 1000 1.6 250.6 0.15 340.9
2 1001 0.53 441.7 1.8 499.0
5 1002 4.4 516.6 NA NA
6 1003 NA NA 4.9 616.6
I used the "unique" function to extract the unique codes, so I have the first column, but I couldn't think of an efficient way to extract the rest of the data. Please note that some codes have values for only one of the two IDs (ID 2 or ID 3), so the missing cells should be NA.
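For reference, the asker's unique()-based idea can be completed in plain base R. This is a minimal sketch, assuming each Code/ID pair occurs at most once (match() only returns the first hit) and typing mydf in from the printout above. For every ID it looks up each unique code with match(), and indexing with the resulting NA yields NA for codes that have no row for that ID.

```r
# mydf typed in from the question's printout
mydf <- data.frame(
  Code = c(1000, 1000, 1001, 1001, 1002, 1003),
  ID   = c(2, 3, 2, 3, 2, 3),
  X1   = c(1.6, 0.15, 0.53, 1.8, 4.4, 4.9),
  X2   = c(250.6, 340.9, 441.7, 499.0, 516.6, 616.6)
)

codes <- unique(mydf$Code)          # one output row per unique Code
out <- data.frame(Code = codes)

for (id in unique(mydf$ID)) {       # IDs 2 and 3, in order of appearance
  rows <- mydf[mydf$ID == id, ]
  # Position of each code among this ID's rows; NA if the code has no row here
  idx <- match(codes, rows$Code)
  out[[paste0("ID", id, "X1")]] <- rows$X1[idx]
  out[[paste0("ID", id, "X2")]] <- rows$X2[idx]
}

out
```

Because every X column is named by hand, this loop gets tedious with many value columns; the reshape-based answers handle that generically.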
Answer 1:
This is a basic reshape problem, reshaping from "long" to "wide". Try the following (assuming your data.frame is called "mydf"):
reshape(mydf, idvar="Code", timevar="ID", direction = "wide")
# Code X1.2 X2.2 X1.3 X2.3
# 1 1000 1.60 250.6 0.15 340.9
# 3 1001 0.53 441.7 1.80 499.0
# 5 1002 4.40 516.6 NA NA
# 6 1003 NA NA 4.90 616.6
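reshape() names the new columns <variable>.<ID> (X1.2, X2.2, ...) and keeps the original row names. A minimal sketch of post-processing that output into the exact layout asked for, assuming the value columns are called X1 and X2 as in the question (the regular expression is specific to those names):

```r
mydf <- data.frame(
  Code = c(1000, 1000, 1001, 1001, 1002, 1003),
  ID   = c(2, 3, 2, 3, 2, 3),
  X1   = c(1.6, 0.15, 0.53, 1.8, 4.4, 4.9),
  X2   = c(250.6, 340.9, 441.7, 499.0, 516.6, 616.6)
)

wide <- reshape(mydf, idvar = "Code", timevar = "ID", direction = "wide")

# Turn "X1.2" into "ID2X1" etc.; "Code" does not match and is left alone
names(wide) <- sub("^(X[12])\\.([0-9]+)$", "ID\\2\\1", names(wide))

# Drop the leftover row names (1, 3, 5, 6) so rows are numbered 1..4
rownames(wide) <- NULL

wide
```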
Answer 2:
Using dplyr and tidyr:
library(dplyr)
library(tidyr)
mydf %>%
  gather(Var, Val, X1:X2) %>%
  mutate(IDVar = paste0("ID", ID, Var)) %>%
  select(-ID, -Var) %>%
  spread(IDVar, Val)
# Code ID2X1 ID2X2 ID3X1 ID3X2
#1 1000 1.60 250.6 0.15 340.9
#2 1001 0.53 441.7 1.80 499.0
#3 1002 4.40 516.6 NA NA
#4 1003 NA NA 4.90 616.6
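In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(), and pivot_wider() can spread several value columns in one call. A minimal sketch of the same reshape, assuming a tidyr version that supports the names_glue argument (I believe 1.1.0 or later). By default the columns come out grouped by value (ID2X1, ID3X1, ...), so they are reordered at the end.

```r
library(tidyr)

mydf <- data.frame(
  Code = c(1000, 1000, 1001, 1001, 1002, 1003),
  ID   = c(2, 3, 2, 3, 2, 3),
  X1   = c(1.6, 0.15, 0.53, 1.8, 4.4, 4.9),
  X2   = c(250.6, 340.9, 441.7, 499.0, 516.6, 616.6)
)

wide <- pivot_wider(
  mydf,
  id_cols     = Code,               # one row per Code
  names_from  = ID,                 # one set of columns per ID
  values_from = c(X1, X2),          # spread both value columns at once
  names_glue  = "ID{ID}{.value}"    # e.g. ID = 2, value X1 -> "ID2X1"
)

# Put each ID's pair of columns next to each other
wide <- wide[, c("Code", "ID2X1", "ID2X2", "ID3X1", "ID3X2")]
wide
```

Code/ID combinations that do not occur in mydf are filled with NA.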
Answer 3:
Here's another option using the reshape2 package:
dat = read.table(text=" Code ID X1 X2
1 1000 2 1.6 250.6
2 1000 3 0.15 340.9
3 1001 2 0.53 441.7
4 1001 3 1.8 499.0
5 1002 2 4.4 516.6
6 1003 3 4.9 616.6",header=TRUE)
library(reshape2)
# Melt to long format.
dat.m = melt(dat, id.var=c("Code","ID"))
# Combine "ID" and "variable" into a single column
dat.m$IDvar = paste0("ID", dat.m$ID, dat.m$variable)
# Remove unneeded columns
dat.m = dat.m[ , c("Code","IDvar", "value")]
# Cast to wide format
dat.w = dcast(dat.m, Code ~ IDvar, value.var="value")
dat.w
# Code ID2X1 ID2X2 ID3X1 ID3X2
# 1 1000 1.60 250.6 0.15 340.9
# 2 1001 0.53 441.7 1.80 499.0
# 3 1002 4.40 516.6 NA NA
# 4 1003 NA NA 4.90 616.6
Source: https://stackoverflow.com/questions/24616094/extracting-data-from-data-frame