Question
I'm using dlply() with a custom function that averages the slopes of lm() fits on data containing some NA values, and I get this error: "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases".
This error only happens when I call dlply with two key variables - separating by one variable works fine.
Annoyingly, I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my Dropbox.
Here's the code, as minimized as possible while still producing an error:
library(plyr) # provides dlply() and .()
masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na.strings="#N/A")
workingData <- data.frame(sample = masterData$sample,
                          substrate = masterData$substrate,
                          el1 = masterData$elapsedHr1,
                          F1 = masterData$r1 - masterData$rK)
#This function is trivial as written; in reality it takes the average of many slopes
meanSlope <- function(df) {
  lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
  slope1 <- lm1$coefficients[2]
  meanSlope <- mean(c(slope1))
}
lsGOOD <- dlply(workingData, .(sample), meanSlope) #works fine
lsBAD <- dlply(workingData, .(sample, substrate), meanSlope) #throws error
Thanks in advance for any insight.
Answer 1:
For several of your cross-classifications you have missing covariates:
with(masterData, table(sample, substrate, r1mis = is.na(r1) ) )
# snipped the nonmissing reports
, , r1mis = TRUE
      substrate
sample 1 2 3 4 5 6 7 8
    3  0 0 0 0 0 0 0 0
    4  0 0 0 0 0 0 0 0
    5  0 0 0 0 0 0 0 0
    6  0 0 0 0 0 0 0 0
    7  0 0 0 0 0 0 3 3
    8  0 0 0 0 0 0 0 3
    9  0 0 0 0 0 0 0 3
    10 0 0 0 0 0 0 0 3
    11 0 0 0 0 0 0 0 3
    12 0 0 0 0 0 0 0 3
    13 0 0 0 0 0 0 0 3
    14 0 0 0 0 0 0 0 3
This version skips the subsets with insufficient data, at least for this particular dataset:
meanSlope <- function(df) {
  # Skip subsets with fewer than two non-missing el1 values
  if (sum(!is.na(df$el1)) < 2) {
    return(NA)
  }
  lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
  slope1 <- lm1$coefficients[2]
  mean(c(slope1))
}
This depends on the missingness being in one particular covariate, though. A more robust solution would be to use try() to capture the errors and convert them to NAs. See:
?try
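A minimal sketch of that idea, using tryCatch() (a close relative of try()) rather than try() itself. The name safeMeanSlope and the two toy data frames are made up for illustration; only the column names el1 and F1 come from the question.

```r
# Hypothetical wrapper: returns NA instead of stopping when lm() cannot be fit
safeMeanSlope <- function(df) {
  tryCatch({
    lm1 <- lm(F1 ~ el1, data = df, na.action = na.omit)
    unname(coef(lm1)[2])          # slope of F1 on el1
  }, error = function(e) NA_real_) # e.g. "0 (non-NA) cases"
}

# Toy data (not the asker's dataset): one usable subset, one all-NA subset
good <- data.frame(el1 = c(0, 1, 2, 3), F1 = c(1, 3, 5, 7))
bad  <- data.frame(el1 = rep(NA_real_, 3), F1 = rep(NA_real_, 3))

safeMeanSlope(good) # slope of the exact line F1 = 1 + 2 * el1
safeMeanSlope(bad)  # NA, because lm() fails and the error is caught
```

With plyr loaded and workingData built as in the question, the same wrapper could presumably be passed straight to dlply(workingData, .(sample, substrate), safeMeanSlope). Note that it swallows every error, not just the missing-data one, so it is worth inspecting the NA groups afterwards.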
Answer 2:
As per my comment:
my.func <- function(df) {
  data.frame(el1=all(is.na(df$el1)), F1=all(is.na(df$F1)))
}
ddply(workingData, .(sample, substrate), my.func)
This shows that you have many subsets where both F1 and el1 are entirely NA. (In fact, every time one is all NA, the other is as well!)
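A related check, sketched in base R on made-up data, counts how many rows per group are actually usable by lm(), that is, rows where both el1 and F1 are present. The toy data frame and the usable column are illustrative assumptions, not part of the original dataset.

```r
# Toy data (not the asker's dataset): group (2, 8) has no complete rows
toy <- data.frame(
  sample    = c(1, 1, 1, 2, 2, 2),
  substrate = c(1, 1, 1, 8, 8, 8),
  el1       = c(0, 1, 2, NA, NA, NA),
  F1        = c(0.5, 1.0, 1.6, NA, NA, NA)
)

# TRUE where both regression variables are non-missing
toy$usable <- complete.cases(toy[, c("el1", "F1")])

# Number of usable rows per (sample, substrate) combination
counts <- aggregate(usable ~ sample + substrate, data = toy, FUN = sum)
counts
```

Any group with a count of 0 here would trigger the "0 (non-NA) cases" error if passed to lm() unguarded.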
Source: https://stackoverflow.com/questions/9520134/lm-called-from-inside-dlply-throws-0-non-na-cases-error-r