Extracting data from data frame

Question

I have a data frame that looks like this:

    Code           ID          X1         X2
1  1000             2         1.6       250.6
2  1000             3         0.15      340.9
3  1001             2         0.53      441.7
4  1001             3         1.8       499.0
5  1002             2         4.4       516.6
6  1003             3         4.9       616.6

What I would like to do is create a new data frame with one row per unique code and, for each unique ID (there are two: 2 and 3), a pair of columns holding the corresponding X1 and X2 values. The result should look like this:

    Code           ID2X1       ID2X2      ID3X1       ID3X2
1  1000             1.6        250.6        0.15      340.9
2  1001            0.53        441.7         1.8      499.0
5  1002             4.4        516.6          NA         NA
6  1003             NA            NA         4.9      616.6

I used the "unique" function to extract the unique codes, so I have the first column, but I couldn't think of an efficient way to extract the rest of the data. Please note that some codes have no values for ID 2 or for ID 3.

Answer 1:

This is a basic reshape problem, reshaping from "long" to "wide".

Try the following (assuming your data.frame is called "mydf"):

reshape(mydf, idvar="Code", timevar="ID", direction = "wide")
#   Code X1.2  X2.2 X1.3  X2.3
# 1 1000 1.60 250.6 0.15 340.9
# 3 1001 0.53 441.7 1.80 499.0
# 5 1002 4.40 516.6   NA    NA
# 6 1003   NA    NA 4.90 616.6
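The column names produced by reshape() ("X1.2", "X2.2", ...) differ from the "ID2X1" style requested in the question. A minimal sketch of one way to rename them afterwards, assuming the value columns always follow the pattern "X<number>.<ID>" (the data frame below is reconstructed from the question, and the name "wide" is only illustrative):

```r
# Sample data reconstructed from the question
mydf <- data.frame(
  Code = c(1000, 1000, 1001, 1001, 1002, 1003),
  ID   = c(2, 3, 2, 3, 2, 3),
  X1   = c(1.6, 0.15, 0.53, 1.8, 4.4, 4.9),
  X2   = c(250.6, 340.9, 441.7, 499.0, 516.6, 616.6)
)

wide <- reshape(mydf, idvar = "Code", timevar = "ID", direction = "wide")

# Turn "X1.2" into "ID2X1": capture the variable name and the ID, then swap them.
# "Code" does not match the pattern and is left unchanged.
names(wide) <- sub("^(X[0-9]+)\\.([0-9]+)$", "ID\\2\\1", names(wide))

# Reset the row names, which reshape() carries over from the long data
rownames(wide) <- NULL
wide
```

If the real columns are named differently, the regular expression would need to be adjusted to match.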

Answer 2:

Using dplyr and tidyr:

library(dplyr)
library(tidyr)

mydf %>%
  gather(Var, Val, X1:X2) %>%
  mutate(IDVar = paste0("ID", ID, Var)) %>%
  select(-ID, -Var) %>%
  spread(IDVar, Val)
#   Code ID2X1 ID2X2 ID3X1 ID3X2
# 1 1000  1.60 250.6  0.15 340.9
# 2 1001  0.53 441.7  1.80 499.0
# 3 1002  4.40 516.6    NA    NA
# 4 1003    NA    NA  4.90 616.6
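In current tidyr releases, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A rough equivalent of the pipeline above in one step might look like the sketch below. It assumes tidyr 1.2.0 or later (for the names_vary argument); the data frame is reconstructed from the question and the name "wide" is only illustrative.

```r
library(tidyr)

# Sample data reconstructed from the question
mydf <- data.frame(
  Code = c(1000, 1000, 1001, 1001, 1002, 1003),
  ID   = c(2, 3, 2, 3, 2, 3),
  X1   = c(1.6, 0.15, 0.53, 1.8, 4.4, 4.9),
  X2   = c(250.6, 340.9, 441.7, 499.0, 516.6, 616.6)
)

wide <- pivot_wider(
  mydf,
  id_cols     = Code,
  names_from  = ID,
  values_from = c(X1, X2),
  names_glue  = "ID{ID}{.value}",  # builds names such as "ID2X1"
  names_vary  = "slowest"          # keep the X1/X2 columns of each ID together
)
wide
```

Codes that lack one of the IDs get NA in the corresponding columns, as in the spread() version.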

Answer 3:

Here's another option using the reshape2 package:

dat = read.table(text=" Code ID X1 X2 
           1 1000 2 1.6 250.6 
           2 1000 3 0.15 340.9 
           3 1001 2 0.53 441.7 
           4 1001 3 1.8 499.0 
           5 1002 2 4.4 516.6 
           6 1003 3 4.9 616.6",header=TRUE)

library(reshape2)

# Melt to long format. 
dat.m = melt(dat, id.vars=c("Code","ID"))

# Combine "ID" and "variable" into a single column
dat.m$IDvar = paste0("ID", dat.m$ID, dat.m$variable)

# Remove unneeded columns
dat.m = dat.m[ , c("Code","IDvar", "value")]

# Cast to wide format
dat.w = dcast(dat.m, Code ~ IDvar, value.var="value")

dat.w
#   Code ID2X1 ID2X2 ID3X1 ID3X2
# 1 1000  1.60 250.6  0.15 340.9
# 2 1001  0.53 441.7  1.80 499.0
# 3 1002  4.40 516.6    NA    NA
# 4 1003    NA    NA  4.90 616.6

Source: https://stackoverflow.com/questions/24616094/extracting-data-from-data-frame

Tags

dataframe

reshape