Question
When a data.frame has several sets of variables that all need to be melted together, I'm confused about how to make that work. Here's an example:
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10))
Data$File.ESIpos <- paste("20141031 Subject", Data$SampleID, "Point",
                          Data$TimePoint, "ESIpos")
Data$Date.ESIpos <- "20141031"
Data$File.ESIneg <- paste("20141030 Subject", Data$SampleID, "Point",
                          Data$TimePoint, "ESIneg")
Data$Date.ESIneg <- "20141030"
Data$File.APCIpos <- paste("20141029 Subject", Data$SampleID, "Point",
                           Data$TimePoint, "APCIpos")
Data$Date.APCIpos <- "20141029"
I would like this to be melted by both Date and File, so that the new data.frame has the columns "SampleID", "TimePoint", a new column "Mode" (whose values are ESIpos, ESIneg, and APCIpos), "File", and "Date". Here's the closest I've gotten with the reshape() function:
Data.long <- reshape(Data,
                     varying = c("File.ESIpos", "Date.ESIpos",
                                 "File.ESIneg", "Date.ESIneg",
                                 "File.APCIpos", "Date.APCIpos"),
                     idvar = c("SampleID", "TimePoint"),
                     ids = c("ESIpos", "ESIneg", "APCIpos"),
                     v.names = c("Date", "File"),
                     sep = ".",
                     direction = "long")
The output is a data.frame with the columns "SampleID", "TimePoint", "time" (which is "1", "2", or "3" for "ESIpos", "ESIneg", or "APCIpos"), "Date" and "File".
The first problem is that I don't see how to define a new "Mode" column. I can rename the column "time" to "Mode", of course, but isn't there some way to tell it that the levels should be "ESIpos", "ESIneg", and "APCIpos" rather than 1, 2, 3? I thought I was doing that with ids = c("ESIpos"..., but clearly I'm not. Plus, I get the same output regardless of whether I include the ids = c("ESIpos"... line.
A second, smaller issue is that regardless of whether I say v.names = c("Date", "File") or v.names = c("File", "Date"), the columns are always swapped, i.e. I get file names in the Date column and vice versa.
Answer 1:
I think this is the reshape() command you're after:
reshaped <- reshape(Data, direction = "long", varying = 3:8,
                    times = c("ESIpos", "ESIneg", "APCIpos"))
head(reshaped)
# SampleID TimePoint time File Date id
# 1.ESIpos 1 A ESIpos 20141031 Subject 1 Point A ESIpos 20141031 1
# 2.ESIpos 1 B ESIpos 20141031 Subject 1 Point B ESIpos 20141031 2
# 3.ESIpos 1 C ESIpos 20141031 Subject 1 Point C ESIpos 20141031 3
# 4.ESIpos 2 A ESIpos 20141031 Subject 2 Point A ESIpos 20141031 4
# 5.ESIpos 2 B ESIpos 20141031 Subject 2 Point B ESIpos 20141031 5
# 6.ESIpos 2 C ESIpos 20141031 Subject 2 Point C ESIpos 20141031 6
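To tie this back to the two problems in the question, here is a sketch that uses stats::reshape with `varying` given as a list: one vector of wide column names per long-format variable, listed in the same order as `v.names`. ?reshape describes a list as the canonical form of `varying`, and pairing each element with a `v.names` entry explicitly sidesteps the File/Date swap. The same help page says `times` (not `ids`) supplies the values of the new time variable and `timevar` names it; `ids` only fills a newly created id column, which, as far as I can tell, is why it had no effect when `idvar` names two existing columns. The loop is just a compact rebuild of the question's `Data`; `modes` and `dates` are helper names introduced here.

```r
# Rebuild the question's example data
# (`modes` and `dates` are helper vectors introduced for brevity)
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10),
                   stringsAsFactors = FALSE)
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# One vector of wide columns per long variable; v.names follows the same order
Data.long <- reshape(Data, direction = "long",
                     varying = list(paste0("File.", modes),
                                    paste0("Date.", modes)),
                     v.names = c("File", "Date"),
                     idvar = c("SampleID", "TimePoint"),
                     timevar = "Mode",  # name of the new column
                     times = modes)     # its values, matching each varying vector's order
head(Data.long)
```

Swapping the two list elements and the two `v.names` entries together gives the same data with the File and Date columns in the other order.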
Answer 2:
I always give up on reshape due to migraines, but I am always amazed when someone uses it and it works, so I'd like to see a solution using it. That said, you could use reshape2::melt twice and combine the results:
library(reshape2)
vars <- c('SampleID','TimePoint','Mode')
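# Melt the Date.* columns (File.* are kept as id columns, then dropped)
m1 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('File', names(Data))]),
           variable.name = 'Mode', value.name = 'Date')[c(vars, 'Date')]
# Melt the File.* columns (Date.* are kept as id columns, then dropped)
m2 <- melt(Data, id.vars = c(vars[1:2], names(Data)[grep('Date', names(Data))]),
           variable.name = 'Mode', value.name = 'File')[c(vars, 'File')]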
m1$Mode <- gsub('Date.', '', m1$Mode)
m2$Mode <- gsub('File.', '', m2$Mode)
identical(m1[1:3], m2[1:3])
# [1] TRUE
Data.long <- cbind(m1, m2['File'])
head(Data.long[with(Data.long, order(SampleID, TimePoint)), ])
# SampleID TimePoint Mode Date File
# 1 1 A ESIpos 20141031 20141031 Subject 1 Point A ESIpos
# 31 1 A ESIneg 20141030 20141030 Subject 1 Point A ESIneg
# 61 1 A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
# 2 1 B ESIpos 20141031 20141031 Subject 1 Point B ESIpos
# 32 1 B ESIneg 20141030 20141030 Subject 1 Point B ESIneg
# 62 1 B APCIpos 20141029 20141029 Subject 1 Point B APCIpos
Or you could do something similar with stats::reshape.
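A possible single-pass variant with reshape2, sketched here rather than taken from the answer above: melt all six wide columns at once, split each old column name into a measurement type (File or Date) and the Mode, then dcast the type back out into columns. The names `m` and `type` are introduced for illustration, and splitting at the first "." assumes the column names contain no other dots.

```r
library(reshape2)

# Rebuild the question's example data
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10),
                   stringsAsFactors = FALSE)
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

# 1. Melt every wide column into a single "value" column
m <- melt(Data, id.vars = c("SampleID", "TimePoint"))

# 2. Split names like "File.ESIpos" into type = "File" and Mode = "ESIpos"
m$type <- sub("\\..*$", "", m$variable)
m$Mode <- sub("^[^.]*\\.", "", m$variable)

# 3. Cast type back into separate Date and File columns
Data.long <- dcast(m, SampleID + TimePoint + Mode ~ type, value.var = "value")
head(Data.long)
```

This avoids the `identical()` check and `cbind()` step, at the cost of relying on a naming convention to recover the type and the Mode.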
Answer 3:
Here's how I'd tackle the problem with tidyr:
library(tidyr)
Data %>%
  # Gather all columns except SampleID and TimePoint
  # (since they're already variables)
  gather(key, value, -SampleID, -TimePoint) %>%
  # Separate the key into components type and mode
  separate(key, c("type", "mode"), "\\.") %>%
  # Spread each type back into its own column
  spread(type, value)
#> SampleID TimePoint mode Date File
#> 1 1 A APCIpos 20141029 20141029 Subject 1 Point A APCIpos
#> 2 1 A ESIneg 20141030 20141030 Subject 1 Point A ESIneg
#> 3 1 A ESIpos 20141031 20141031 Subject 1 Point A ESIpos
#> 4 1 B APCIpos 20141029 20141029 Subject 1 Point B APCIpos
#> 5 1 B ESIneg 20141030 20141030 Subject 1 Point B ESIneg
#> 6 1 B ESIpos 20141031 20141031 Subject 1 Point B ESIpos
#> 7 1 C APCIpos 20141029 20141029 Subject 1 Point C APCIpos
#...
To figure out how to come up with these steps yourself, I'd recommend reading Tidy Data, which lays out a framework that should help you understand the problem better.
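In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A sketch of the same reshape with pivot_longer(): the ".value" entry in `names_to` keeps the part of each column name before the dot (File or Date) as an output column, and the part after the dot goes into a new Mode column, so the gather/separate/spread sequence collapses into one call.

```r
library(tidyr)

# Rebuild the question's example data
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10),
                   stringsAsFactors = FALSE)
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

Data.long <- pivot_longer(Data,
                          cols = -c(SampleID, TimePoint),  # the six wide columns
                          names_to = c(".value", "Mode"),  # "File"/"Date" become columns
                          names_sep = "\\.")               # regex: split at the dot
Data.long
```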
Answer 4:
melt.data.table in v1.9.5 can now melt into multiple columns. With that, we can do:
require(data.table) ## v1.9.5
ans = melt(setDT(Data), id=c("SampleID", "TimePoint"),
           measure=list(c(3,5,7), c(4,6,8)), value.name=c("File", "Date"))
# Replace the 1/2/3 levels of "variable" with the mode names from the column names
setattr(ans$variable, 'levels',
        unique(gsub(".*[.]", "", names(Data)[-(1:2)])))
ans
# SampleID TimePoint variable File Date
# 1: 1 A ESIpos 20141031 Subject 1 Point A ESIpos 20141031
# 2: 1 B ESIpos 20141031 Subject 1 Point B ESIpos 20141031
# 3: 1 C ESIpos 20141031 Subject 1 Point C ESIpos 20141031
# 4: 2 A ESIpos 20141031 Subject 2 Point A ESIpos 20141031
# 5: 2 B ESIpos 20141031 Subject 2 Point B ESIpos 20141031
# 6: 2 C ESIpos 20141031 Subject 2 Point C ESIpos 20141031
# ...
You can get the development version from the data.table project repository.
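Later data.table releases also accept patterns() in `measure.vars`, which selects each column group by regular expression instead of by position and is less fragile if the column order changes. A sketch under that assumption; `mode_labels` is a helper name introduced here, and the relabelling step assumes the Mode values come back as 1, 2, 3 in the order of the matched File.* columns.

```r
library(data.table)

# Rebuild the question's example data
modes <- c("ESIpos", "ESIneg", "APCIpos")
dates <- c("20141031", "20141030", "20141029")
Data <- data.frame(SampleID = rep(1:10, each = 3),
                   TimePoint = rep(LETTERS[1:3], 10),
                   stringsAsFactors = FALSE)
for (i in seq_along(modes)) {
  Data[[paste0("File.", modes[i])]] <- paste(dates[i], "Subject", Data$SampleID,
                                             "Point", Data$TimePoint, modes[i])
  Data[[paste0("Date.", modes[i])]] <- dates[i]
}

DT <- as.data.table(Data)
ans <- melt(DT, id.vars = c("SampleID", "TimePoint"),
            measure.vars = patterns("^File\\.", "^Date\\."),  # one group per value.name
            value.name = c("File", "Date"),
            variable.name = "Mode")

# Mode holds the position (1, 2, 3) within each group; relabel it using
# the File.* column names, which appear in that same order
mode_labels <- sub("^File\\.", "", grep("^File\\.", names(DT), value = TRUE))
ans[, Mode := factor(mode_labels[as.integer(Mode)], levels = mode_labels)]
ans
```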
Source: https://stackoverflow.com/questions/26692582/r-having-trouble-with-reshape-function-in-stats-package