问题
I hope that I'm not recreating the wheel, and do not think that the following can be answered using reshape
.
I have messy longitudinal survey data, that I want to convert from wide to long format. By messy I mean:
- I have a mixture of variable types (numeric, factor, logical)
- Not all variables have been collected at every timepoint.
For example:
data <- read.table(header=T, text='
id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random
1 TRUE FALSE 87717.76 82281.25 happy happy filler
2 TRUE TRUE 70795.53 54995.19 so-so happy filler
3 FALSE FALSE 48012.77 47650.47 sad so-so filler
')
I could not work out how to reshape the data using reshape
, and keep getting the error message 'times' is wrong length
. Which I assume is because not every variable is recorded on every occasion. Also I don't think melt
and cast
from reshape2
will work as it requires all measured variables to be of the same type.
I came up with the following solution which may help others. It selects variables by timepoint, renames them, and then uses rbind.fill
from plyr
to concatenate them together. But I wonder if I'm missing something with reshape
or if this can be done easier using tidyr
or another package?
reshapeLong2 <- function(data, varying = NULL, timevar = "time", idvar = "id", sep = ".", patterns = NULL) {
require(plyr)
substrRight <- function(x, n){
substr(x, nchar(x)-n+1, nchar(x))
}
if (is.null(varying))
varying <- names(data)[! names(data) %in% idvar]
# Create pattern if not specified, guesses by taking numbers given at end of variable names.
if (is.null(patterns)) {
times <- unique(na.omit(as.numeric(substrRight(varying, 1))))
times <- times[order = times]
patterns <- paste0(sep, times)
}
# Create list of datasets by study time
ls.df <- lapply(patterns, function(pattern) {
var.old <- grep(pattern, x = varying, value = TRUE)
var.new <- gsub(pattern, "", x = var.old)
df <- data[, c(idvar, var.old)]
names(df) <- c(idvar, var.new)
df[, timevar] <- match(pattern, patterns)
return(df)
})
# Concatenate datasets together
dfs <- rbind.fill(ls.df)
return(dfs)
}
> reshapeLong2(df.test)
id inlove mood time income
1 1 FALSE sad 1 NA
2 2 TRUE so-so 1 NA
3 3 TRUE sad 1 NA
4 1 TRUE <NA> 2 27766.13
5 2 FALSE <NA> 2 74395.30
6 3 TRUE <NA> 2 89004.95
7 1 NA sad 3 27270.07
8 2 NA so-so 3 36971.64
9 3 NA so-so 3 85986.96
Warning message:
In na.omit(as.numeric(substrRight(varying, 1))) :
NAs introduced by coercion
Note, warning message indicates that there are some variables that are dropped (in this case "random"). Warning not shown if all variables are listed as either idvar or varying.
回答1:
If you fill in varname.TIME
columns as NA
for all the missing times, you can then just reshape
like:
uniqnames <- c("inlove","income","mood")
allnames <- make.unique(rep(uniqnames,4))[-(seq_along(uniqnames))]
#[1] "inlove.1" "income.1" "mood.1" "inlove.2" "income.2" "mood.2" ...
data[setdiff(allnames, names(data)[-1])] <- NA
# id inlove.1 inlove.2 income.2 income.3 mood.1 mood.3 random income.1 mood.2 inlove.3
#1 1 TRUE FALSE 87717.76 82281.25 happy happy filler NA NA NA
#2 2 TRUE TRUE 70795.53 54995.19 so-so happy filler NA NA NA
#3 3 FALSE FALSE 48012.77 47650.47 sad so-so filler NA NA NA
reshape(data, idvar="id", direction="long", sep=".", varying=allnames)
# id random time inlove income mood
#1.1 1 filler 1 TRUE NA happy
#2.1 2 filler 1 TRUE NA so-so
#3.1 3 filler 1 FALSE NA sad
#1.2 1 filler 2 FALSE 87717.76 <NA>
#2.2 2 filler 2 TRUE 70795.53 <NA>
#3.2 3 filler 2 FALSE 48012.77 <NA>
#1.3 1 filler 3 NA 82281.25 happy
#2.3 2 filler 3 NA 54995.19 happy
#3.3 3 filler 3 NA 47650.47 so-so
来源:https://stackoverflow.com/questions/34713846/reshape-messy-longitudinal-survey-data-containing-multiple-different-variables