lm called from inside dlply throws “0 (non-NA) cases” error [r]

Anonymous (unverified), submitted 2019-12-03 08:46:08

Question:

I'm using dlply() with a custom function that averages slopes of lm() fits on data that contain some NA values, and I get the error "Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 0 (non-NA) cases"

This error only happens when I call dlply with two key variables - separating by one variable works fine.

Annoyingly I can't reproduce the error with a simple dataset, so I've posted the problem dataset in my dropbox.

Here's the code, as minimized as possible while still producing an error:

    library(plyr)  # provides dlply() and .()

    masterData <- read.csv("http://dl.dropbox.com/u/48901983/SOquestionData.csv", na.strings="#N/A")

    workingData <- data.frame(sample = masterData$sample,
                              substrate = masterData$substrate,
                              el1 = masterData$elapsedHr1,
                              F1 = masterData$r1 - masterData$rK)

    #This function is trivial as written; in reality it takes the average of many slopes
    meanSlope <- function(df) {
         lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
         slope1 <- lm1$coefficients[2]
         meanSlope <- mean(c(slope1))
    }

    lsGOOD <- dlply(workingData, .(sample), meanSlope) #works fine
    lsBAD <- dlply(workingData, .(sample, substrate), meanSlope) #throws error

Thanks in advance for any insight.

Answer 1:

For several of your cross-classifications you have missing covariates:

    with(masterData, table(sample, substrate, r1mis = is.na(r1) ) )
    # snipped the nonmissing reports
    , , r1mis = TRUE

          substrate
    sample 1 2 3 4 5 6 7 8
        3  0 0 0 0 0 0 0 0
        4  0 0 0 0 0 0 0 0
        5  0 0 0 0 0 0 0 0
        6  0 0 0 0 0 0 0 0
        7  0 0 0 0 0 0 3 3
        8  0 0 0 0 0 0 0 3
        9  0 0 0 0 0 0 0 3
        10 0 0 0 0 0 0 0 3
        11 0 0 0 0 0 0 0 3
        12 0 0 0 0 0 0 0 3
        13 0 0 0 0 0 0 0 3
        14 0 0 0 0 0 0 0 3

This version of the function skips the subsets with insufficient data in this particular dataset:

    meanSlope <- function(df) {
      if ( sum(!is.na(df$el1)) < 2 ) {
        return(NA)
      } else {
        lm1 <- lm(df$F1 ~ df$el1, na.action=na.omit) #changing to na.exclude doesn't help
        slope1 <- lm1$coefficients[2]
        meanSlope <- mean(c(slope1))
      }
    }

This depends, however, on the missingness being in one particular covariate. A more robust solution would be to use try() to capture errors and convert them to NA.

?try 
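As a rough illustration of the try() idea, here is a minimal sketch. The function name safeSlope and the two toy data frames are hypothetical and not part of the original data. It assumes the columns are named el1 and F1 as in workingData, and it passes the data frame through the data argument rather than using df$ inside the formula.

```r
# Sketch only: safeSlope and the toy data frames below are hypothetical.
safeSlope <- function(df) {
  # try() returns an object of class "try-error" instead of stopping
  fit <- try(lm(F1 ~ el1, data = df, na.action = na.omit), silent = TRUE)
  if (inherits(fit, "try-error")) {
    return(NA_real_)  # e.g. the "0 (non-NA) cases" failure
  }
  unname(coef(fit)[2])  # slope on el1
}

# A subset with usable data: points on the line F1 = 1 + 2 * el1
ok <- data.frame(el1 = c(0, 1, 2, 3), F1 = c(1, 3, 5, 7))

# A subset where every value is missing, like the failing groups
bad <- data.frame(el1 = c(NA_real_, NA, NA), F1 = c(NA_real_, NA, NA))

slopeOk  <- safeSlope(ok)
slopeBad <- safeSlope(bad)
```

With plyr loaded, the same function could presumably be passed to the original call, as in dlply(workingData, .(sample, substrate), safeSlope), so that the failing subsets come back as NA instead of aborting the whole run.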


Answer 2:

As per my comment:

    my.func <- function(df) {
      data.frame(el1=all(is.na(df$el1)), F1=all(is.na(df$F1)))
    }

    ddply(workingData, .(sample, substrate), my.func)

This shows that you have many subsets where both F1 and el1 are entirely NA. (In fact, every time one is all NA, the other is as well!)
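A related check, sketched here in base R, counts how many rows in each sample/substrate subset have both el1 and F1 present. The toy data frame is hypothetical and only stands in for workingData; any subset with a count of 0 would trigger the "0 (non-NA) cases" error in lm().

```r
# Sketch only: toy is a hypothetical stand-in for workingData.
toy <- data.frame(
  sample    = c(1, 1, 1, 1, 2, 2),
  substrate = c(1, 1, 2, 2, 1, 1),
  el1       = c(0, 1, NA, NA, 0, 2),
  F1        = c(1, 2, NA, NA, 3, 5)
)

# TRUE for rows where both the predictor and the response are present
toy$complete <- complete.cases(toy[, c("el1", "F1")])

# Number of usable rows per (sample, substrate) subset
counts <- aggregate(complete ~ sample + substrate, data = toy, FUN = sum)

# Subsets that lm() cannot fit at all
emptyGroups <- counts[counts$complete == 0, ]
```

In this toy data, the only subset with no usable rows is sample 1 with substrate 2.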


