R: Having trouble with reshape() function in stats package

Question

When several sets of variables in a data.frame need to be melted at the same time, I'm confused about how to make that work. Here's an example:

Data <- data.frame(SampleID = rep(1:10, each = 3), 
               TimePoint = rep(LETTERS[1:3], 10))
Data$File.ESIpos <- paste("20141031 Subject", Data$SampleID, "Point",
                     Data$TimePoint, "ESIpos")

Data$Date.ESIpos <- "20141031"

Data$File.ESIneg <- paste("20141030 Subject", Data$SampleID, "Point", 
                     Data$TimePoint, "ESIneg")
Data$Date.ESIneg <- "20141030"

Data$File.APCIpos <- paste("20141029 Subject", Data$SampleID, "Point", 
                     Data$TimePoint, "APCIpos")
Data$Date.APCIpos <- "20141029"

I would like to melt this by both Date and File, so that the new data.frame has the columns "SampleID", "TimePoint", a new column "Mode" (whose values are ESIpos, ESIneg, and APCIpos), "File", and "Date". Here's the closest I've gotten with the reshape() function:

Data.long <- reshape(Data, 
                     varying = c("File.ESIpos", "Date.ESIpos",
                                 "File.ESIneg", "Date.ESIneg", 
                                 "File.APCIpos", "Date.APCIpos"),
                     idvar = c("SampleID", "TimePoint"),
                     ids = c("ESIpos", "ESIneg", "APCIpos"),
                     v.names = c("Date", "File"),
                     sep = ".",
                     direction = "long")

The output is a data.frame with the columns "SampleID", "TimePoint", "time" (which is "1", "2", or "3" for "ESIpos", "ESIneg", or "APCIpos"), "Date" and "File".

The first problem is that I don't see how to define a new "Mode" column. I can change the column "time" to be named "Mode", of course, but isn't there some way to tell it that the levels should be "ESIpos", "ESIneg", and "APCIpos" rather than 1, 2, 3? I thought I was doing that with ids = c("ESIpos"..., but clearly I'm not. Plus, I get the same output regardless of whether I include the ids = c("ESIpos"... line.
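For reference, the reshape() help page describes ids as the values for a newly created id variable, so it has no effect once idvar names existing columns. The labels for the time column come from times, and the column's name comes from timevar. Below is a minimal sketch under those assumptions. It rebuilds the question's data (dates is a helper name introduced only for that) and passes varying as a list, one vector per output column in the same order as v.names, so that each File column is paired with its Date column explicitly.

```r
# Rebuild the question's example data; 'dates' is an illustrative helper
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
dates <- c(ESIpos = "20141031", ESIneg = "20141030", APCIpos = "20141029")
for (m in names(dates)) {
  Data[[paste0("File.", m)]] <- paste(dates[[m]], "Subject", Data$SampleID,
                                      "Point", Data$TimePoint, m)
  Data[[paste0("Date.", m)]] <- dates[[m]]
}

modes <- names(dates)
Data.long <- reshape(Data, direction = "long",
                     # one list element per output column, same order as v.names
                     varying = list(paste0("File.", modes), paste0("Date.", modes)),
                     v.names = c("File", "Date"),
                     idvar = c("SampleID", "TimePoint"),
                     timevar = "Mode",  # name of the new column
                     times = modes)     # its values (ids does not set these)
head(Data.long)
```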

A second, smaller issue is that regardless of whether I write v.names = c("Date", "File") or v.names = c("File", "Date"), the columns are always swapped, i.e. I get file names in the Date column and vice versa.
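One plausible source of the swap, assuming stats::reshape() works internally as described here (not verified against every R release): when varying is a plain character vector and v.names is given, reshape() groups the names with split(varying, rep(v.names, ntimes)) and then pairs the resulting groups with v.names by position. split() returns its groups in sorted level order, so "Date" always comes first. The grouping step can be reproduced on its own:

```r
varying <- c("File.ESIpos", "Date.ESIpos", "File.ESIneg", "Date.ESIneg",
             "File.APCIpos", "Date.APCIpos")

# v.names = c("Date", "File"): the labels cycle Date, File, Date, ... over
# varying, so every File.* column ends up in the "Date" group
g1 <- split(varying, rep(c("Date", "File"), 3))
g1$Date   # "File.ESIpos" "File.ESIneg" "File.APCIpos"

# v.names = c("File", "Date"): the groups now hold the right columns, but
# split() returns them in sorted order (Date, File), opposite to v.names,
# so pairing them with v.names by position swaps them again
g2 <- split(varying, rep(c("File", "Date"), 3))
names(g2) # "Date" "File"
```

Passing varying as a list, one vector per entry of v.names and in the same order, skips this split() step.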

Answer 1:

I think this is the reshape() call you're after:

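# varying = 3:8 (without v.names) lets reshape() split each name at "." into
# a variable (File, Date) and a time label; times sets the "time" values
reshaped <- reshape(Data, direction = "long", varying = 3:8,
                    times = c("ESIpos", "ESIneg", "APCIpos"))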
head(reshaped)
#          SampleID TimePoint   time                              File     Date id
# 1.ESIpos        1         A ESIpos 20141031 Subject 1 Point A ESIpos 20141031  1
# 2.ESIpos        1         B ESIpos 20141031 Subject 1 Point B ESIpos 20141031  2
# 3.ESIpos        1         C ESIpos 20141031 Subject 1 Point C ESIpos 20141031  3
# 4.ESIpos        2         A ESIpos 20141031 Subject 2 Point A ESIpos 20141031  4
# 5.ESIpos        2         B ESIpos 20141031 Subject 2 Point B ESIpos 20141031  5
# 6.ESIpos        2         C ESIpos 20141031 Subject 2 Point C ESIpos 20141031  6
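A variant of this call, assuming you also want the question's Mode column name and SampleID/TimePoint as the id: because the column names follow a name.mode pattern and sep defaults to ".", reshape() can guess both v.names and times from varying = 3:8 alone, so times can be dropped. timevar names the time column, and idvar replaces the generated id column. The first lines rebuild the question's data (dates is an illustrative helper).

```r
# Rebuild the question's example data; 'dates' is an illustrative helper
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
dates <- c(ESIpos = "20141031", ESIneg = "20141030", APCIpos = "20141029")
for (m in names(dates)) {
  Data[[paste0("File.", m)]] <- paste(dates[[m]], "Subject", Data$SampleID,
                                      "Point", Data$TimePoint, m)
  Data[[paste0("Date.", m)]] <- dates[[m]]
}

# v.names ("File", "Date") and times ("ESIpos", ...) are guessed from the names
Data.long <- reshape(Data, direction = "long", varying = 3:8,
                     timevar = "Mode", idvar = c("SampleID", "TimePoint"))
head(Data.long[order(Data.long$SampleID, Data.long$TimePoint), ])
```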

Answer 2:

I always give up on reshape due to migraines, but I'm always amazed when someone gets it to work, so I'd like to see a solution that uses it. That said, you could use reshape2::melt twice and combine the results:

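library(reshape2)
vars <- c('SampleID','TimePoint','Mode')
# Melt the Date.* columns, keeping the File.* columns as id variables
m1 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('File', names(Data))]),
           variable.name = 'Mode', value.name = 'Date')[c(vars, 'Date')]
# Melt the File.* columns, keeping the Date.* columns as id variables
m2 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('Date', names(Data))]),
           variable.name = 'Mode', value.name = 'File')[c(vars, 'File')]

# Drop the "Date." / "File." prefix so Mode holds ESIpos, ESIneg, APCIpos
m1$Mode <- sub('^Date\\.', '', m1$Mode)
m2$Mode <- sub('^File\\.', '', m2$Mode)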

identical(m1[1:3], m2[1:3])
# [1] TRUE

Data.long <- cbind(m1, m2['File'])

head(Data.long[with(Data.long, order(SampleID, TimePoint)), ])

#    SampleID TimePoint    Mode     Date                               File
# 1         1         A  ESIpos 20141031  20141031 Subject 1 Point A ESIpos
# 31        1         A  ESIneg 20141030  20141030 Subject 1 Point A ESIneg
# 61        1         A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
# 2         1         B  ESIpos 20141031  20141031 Subject 1 Point B ESIpos
# 32        1         B  ESIneg 20141030  20141030 Subject 1 Point B ESIneg
# 62        1         B APCIpos 20141029 20141029 Subject 1 Point B APCIpos

Or do something similar with stats::reshape.
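For comparison, the same "melt twice and combine" idea can be sketched in base R with utils::stack(), which needs no extra packages. This is an illustrative alternative rather than part of the answer above; dates, files, dts, and ids are names introduced here.

```r
# Rebuild the question's example data; 'dates' is an illustrative helper
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
dates <- c(ESIpos = "20141031", ESIneg = "20141030", APCIpos = "20141029")
for (m in names(dates)) {
  Data[[paste0("File.", m)]] <- paste(dates[[m]], "Subject", Data$SampleID,
                                      "Point", Data$TimePoint, m)
  Data[[paste0("Date.", m)]] <- dates[[m]]
}

# stack() returns 'values' plus 'ind' (the source column name), column by column
files <- stack(Data[grep("^File\\.", names(Data))])
dts   <- stack(Data[grep("^Date\\.", names(Data))])

# Repeat the id columns once per mode, in the same column-by-column order
ids <- Data[rep(seq_len(nrow(Data)), times = nlevels(files$ind)),
            c("SampleID", "TimePoint")]
Data.long <- data.frame(ids,
                        Mode = sub("^File\\.", "", files$ind),
                        Date = dts$values,
                        File = files$values,
                        stringsAsFactors = FALSE)
rownames(Data.long) <- NULL
head(Data.long)
```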

Answer 3:

Here's how I'd tackle the problem with tidyr:

library(tidyr)

Data %>%
  # Gather all columns except SampleID and TimePoint 
  # (since they're already variables)
  gather(key, value, -SampleID, -TimePoint) %>% 
  # Separate the key into components type and mode
  separate(key, c("type", "mode"), "\\.") %>%
  # Spread the type back into the columns
  spread(type, value)
#>    SampleID TimePoint    mode     Date                                File
#> 1         1         A APCIpos 20141029  20141029 Subject 1 Point A APCIpos
#> 2         1         A  ESIneg 20141030   20141030 Subject 1 Point A ESIneg
#> 3         1         A  ESIpos 20141031   20141031 Subject 1 Point A ESIpos
#> 4         1         B APCIpos 20141029  20141029 Subject 1 Point B APCIpos
#> 5         1         B  ESIneg 20141030   20141030 Subject 1 Point B ESIneg
#> 6         1         B  ESIpos 20141031   20141031 Subject 1 Point B ESIpos
#> 7         1         C APCIpos 20141029  20141029 Subject 1 Point C APCIpos
#...

To figure out how to come up with these steps yourself, I'd recommend reading the paper "Tidy Data", which lays out a framework that should help you understand this kind of problem better.
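In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer(). The ".value" sentinel in names_to can do the gather/separate/spread round trip in a single call. A sketch, assuming a tidyr version that provides pivot_longer(); names_sep is a regular expression here, hence "\\.".

```r
library(tidyr)

# Rebuild the question's example data; 'dates' is an illustrative helper
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
dates <- c(ESIpos = "20141031", ESIneg = "20141030", APCIpos = "20141029")
for (m in names(dates)) {
  Data[[paste0("File.", m)]] <- paste(dates[[m]], "Subject", Data$SampleID,
                                      "Point", Data$TimePoint, m)
  Data[[paste0("Date.", m)]] <- dates[[m]]
}

Data.long <- pivot_longer(Data, cols = -c(SampleID, TimePoint),
                          # ".value": the part before "." (File or Date) becomes
                          # a column name; the part after it goes into 'mode'
                          names_to = c(".value", "mode"),
                          names_sep = "\\.")
Data.long
```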

Answer 4:

In data.table v1.9.5, melt.data.table can melt into multiple columns at once. With that, we can do:

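library(data.table) ## v1.9.5 or later
ans = melt(setDT(Data), id.vars = c("SampleID", "TimePoint"),
           measure.vars = list(c(3,5,7), c(4,6,8)), value.name = c("File", "Date"))
# 'variable' comes back as a factor with levels "1", "2", "3";
# overwrite those levels in place with the mode names taken from the column names
setattr(ans$variable, 'levels', 
        unique(gsub(".*[.]", "", names(Data)[-(1:2)])))
ans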
#   SampleID TimePoint variable                                File     Date
# 1:        1         A   ESIpos   20141031 Subject 1 Point A ESIpos 20141031
# 2:        1         B   ESIpos   20141031 Subject 1 Point B ESIpos 20141031
# 3:        1         C   ESIpos   20141031 Subject 1 Point C ESIpos 20141031
# 4:        2         A   ESIpos   20141031 Subject 2 Point A ESIpos 20141031
# 5:        2         B   ESIpos   20141031 Subject 2 Point B ESIpos 20141031
# 6:        2         C   ESIpos   20141031 Subject 2 Point C ESIpos 20141031
# ...

v1.9.5 was the development version at the time of writing; later CRAN releases of data.table also include this feature.
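In current data.table releases, the column groups can also be selected with patterns() instead of hard-coded positions. A sketch under that assumption; it works on a copy via as.data.table() so Data itself is left unchanged, and dates and modes are names introduced here.

```r
library(data.table)

# Rebuild the question's example data; 'dates' is an illustrative helper
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
dates <- c(ESIpos = "20141031", ESIneg = "20141030", APCIpos = "20141029")
for (m in names(dates)) {
  Data[[paste0("File.", m)]] <- paste(dates[[m]], "Subject", Data$SampleID,
                                      "Point", Data$TimePoint, m)
  Data[[paste0("Date.", m)]] <- dates[[m]]
}

DT <- as.data.table(Data)  # a copy, unlike setDT()
modes <- sub("^File\\.", "", grep("^File\\.", names(DT), value = TRUE))
ans <- melt(DT, id.vars = c("SampleID", "TimePoint"),
            # one pattern per output column: the File.* group and the Date.* group
            measure.vars = patterns("^File\\.", "^Date\\."),
            variable.name = "Mode", value.name = c("File", "Date"))
# Mode holds the group index (1, 2, 3); map it to the mode names
ans[, Mode := modes[as.integer(Mode)]]
ans
```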

Source: https://stackoverflow.com/questions/26692582/r-having-trouble-with-reshape-function-in-stats-package

Tags

reshape