问题
My input file:
Treat1 Treat2 Batch gene1 gene2 High Low 1 92.73 4.00 Low Low 1 101.85 6.00 High High 1 136.00 4.00 Low High 1 104.00 3.00 High Low 2 308.32 10.00 Low Low 2 118.93 3.00 High High 2 144.47 3.00 Low High 2 189.66 4.00 High Low 3 95.12 2.00 Low Low 3 72.08 6.00 High High 3 108.65 2.00 Low High 3 75.00 3.00 High Low 4 111.39 5.00 Low Low 4 119.80 4.00 High High 4 466.55 11.00 Low High 4 125.00 3.00
There are tens of thousands of additional columns, each with a header and a list of numbers, same length as "gene1" column.
My code:
library(lme4)
library(lmerTest)
# Import the data.
mydata <- read.table("input_file", header=TRUE, sep="\t")
# Make batch into a factor
mydata$Batch <- as.factor(mydata$Batch)
# Check structure
str(mydata)
# Get file without the factors, so that names(df) gives gene names.
genefile <- mydata[c(4:2524)]
# Loop through all gene names and run the model once per gene and print to file.
for (i in names(genefile)){
lmer_results <- lmer(i ~ Treat1*Treat2 + (1|Batch), data=mydata)
lmer_summary <- summary(lmer_results)
write(lmer_summary,file="results_file",append=TRUE, sep="\t", quote=FALSE)
}
Structure:
'data.frame': 16 obs. of 2524 variables:
$ Treat1 : Factor w/ 2 levels "High","Low": 1 2 1 2 1 2 1 2 1 2 ...
$ Treat2 : Factor w/ 2 levels "High","Low": 2 2 1 1 2 2 1 1 2 2 ...
$ Batch : Factor w/ 4 levels "1","2","3","4": 1 1 1 1 2 2 2 2 3 3 ...
$ gene1 : num 92.7 101.8 136 104 308.3 ...
$ gene2 : num 4 6 4 3 10 3 3 4 2 6 ...
My error message:
Error in model.frame.default(data = mydata, drop.unused.levels = TRUE, formula = i ~ : variable lengths differ (found for 'Treat1') Calls: lmer ... -> eval -> eval -> -> model.frame.default Execution halted
I have tried to examine all objects involved and cannot see any differences in variable lengths and I have also made sure there are no missing data. Running it with na.exclude doesn't change anything.
Any idea of what is going on?
回答1:
@Roland's diagnosis (lmer
is looking for a variable called i, not a variable whose name is i
: obligatory Lewis Carroll reference) is correct, I think. The most immediate way to handle this would be with reformulate()
, something like:
for (i in names(genefile)){
form <- reformulate(c("Treat1*Treat2","(1|Batch)"),response=i)
lmer_results <- lmer(form, data=mydata)
lmer_summary <- summary(lmer_results)
write(lmer_summary,file="results_file",
append=TRUE, sep="\t", quote=FALSE)
}
On second thought, you should be able to speed up your computations significantly using the built-in refit()
method, which refits a model for a new response variable: suppose for simplicity that the first gene is called geneAAA
:
wfun <- function(x) write(summary(x),
file="results_file", append=TRUE, sep="\t",quote=FALSE)
mod0 <- lmer(geneAAA ~ Treat1*Treat2 + (1|Batch), data=mydata)
wfun(mod0)
for (i in names(genefile)[-1]) {
mod1 <- refit(mod0,mydata[[i]])
wfun(mod1)
}
(By the way, I'm not sure your write()
command does anything sensible ...)
来源:https://stackoverflow.com/questions/37461725/variable-lengths-differ-in-r-linear-modelling-with-lme4