Question
I am running a zero-inflated negative binomial regression model using the function zeroinfl from the pscl package.
I need to exclude NAs from the model so that I can plot the residuals against the dependent variable later in the analysis.
Therefore, I want to set na.action="na.exclude". I can do this without any problem for a non-zero-inflated negative binomial regression model (using glm.nb from the MASS package), e.g.
fm_nbin <- glm.nb(DV ~ factor(IDV) + contr1 + contr2 + contr3,
                  data = df, subset = (df$var < 500),
                  na.action = "na.exclude")
fm_nbin.res <- resid(fm_nbin)
plot(fm_nbin.res ~ df$var)
works fine. However, when I do the same for a zero-inflated model, it does not work:
zinfl <- zeroinfl(DV ~ factor(IDV) + contr1 + contr2 + contr3 |
                    factor(IDV) + contr1 + contr2 + contr3,
                  data = df, subset = (df$var < 500),
                  na.action = "na.exclude")
zinfl.res <- resid(zinfl)
plot(zinfl.res ~ df$var)
gives the error
Error in function (formula, data = NULL, subset = NULL, na.action = na.fail, :
variable lengths differ (found for 'df$var')
Is there any other command I should use to exclude NA's from my regression?
Edit: This is the closest thing to an answer I could find. Can it be applied to my problem in some way?
Also, can naresid be applied in some way?
Answer 1:
As one finds by following the trail of documentation from zeroinfl to glm.fit: "The ‘factory-fresh’ default is na.omit." Note that I have not put quotes around it, since it is supposed to be a function, but the function will also accept it as a name, so it doesn't matter whether it is quoted. I admit that I don't really know how na.omit and na.exclude differ (something to do with residuals, as I read), but I would definitely try the default setting first, since it generally delivers what I want from regression functions. So try just leaving it out:
zinfl <- zeroinfl(DV ~ factor(IDV) + contr1 + contr2 + contr3 |
                    factor(IDV) + contr1 + contr2 + contr3,
                  data = df, subset = (df$var < 500))
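For reference, the difference between na.omit and na.exclude that this answer alludes to can be seen with a small base R model. This is only a sketch: lm stands in for zeroinfl, and the toy data frame and the names fit_omit and fit_excl are made up for illustration. With na.omit the residuals cover only the rows that were actually fitted; with na.exclude, residuals() pads them with NA back to the length of the data (via naresid), which is what allows plotting against a column of the original data frame.

```r
# Toy data with one missing response and one missing predictor (illustrative only).
df <- data.frame(y = c(1.0, 2.1, NA, 3.9, 5.2),
                 x = c(1, 2, 3, NA, 5))

# lm is used as a stand-in; the same na.action argument exists there.
fit_omit <- lm(y ~ x, data = df, na.action = na.omit)
fit_excl <- lm(y ~ x, data = df, na.action = na.exclude)

length(residuals(fit_omit))  # 3: rows with NA are dropped from the residuals
length(residuals(fit_excl))  # 5: residuals padded with NA for the dropped rows
residuals(fit_excl)          # NA in positions 3 and 4
```

Whether a given modelling function honours na.exclude in the same way depends on how its residuals method is written, so this behaviour should not be assumed for zeroinfl without checking.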
Answer 2:
Since neither na.omit(df) nor na.action="na.exclude" seems to work in a zeroinfl regression model, I found another (indirect) way of excluding NAs from the regression.
First, since my original dataset contains far more variables than just the regressors and the outcome variable, I created a new dataset containing only the variables I use in the regression model, and also set a condition on the value of var to select the observations to include in the regression:
df1 <- subset(df, var<500, select=c("DV", "IDV", "contr1", "contr2", "contr3"))
df1 <- na.omit(df1)
I then ran the same code as above using the new dataset df1, which works perfectly:
zinfl <- zeroinfl(DV ~ factor(IDV) + contr1 + contr2 + contr3 |
                    factor(IDV) + contr1 + contr2 + contr3,
                  data = df1)
zinfl.res <- resid(zinfl)
plot(zinfl.res ~ df1$DV)
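If the residuals need to line up with the original, unfiltered data frame instead, one possible alternative is to pad them back to full length by row name. The sketch below is an assumption-laden illustration and not something tested with zeroinfl: it uses lm as a stand-in, a made-up toy data frame, and relies on the residuals being named by the row names of the rows that were fitted (true for lm; I have not verified it for zeroinfl).

```r
# Toy data with missing values; all values here are illustrative.
df <- data.frame(
  DV     = c(1, 3, NA, 2, 5, 4, 7, 6),
  contr1 = c(0.5, NA, 1.2, 0.7, 1.9, 1.1, 2.4, 2.0),
  var    = c(10, 20, 30, 40, 600, 60, 70, 80)
)

# Stand-in model: default NA handling plus a subset condition.
fit <- lm(DV ~ contr1, data = df, subset = var < 500)

res <- residuals(fit)               # named by the row names of the fitted rows
padded <- rep(NA_real_, nrow(df))   # one slot per row of the original data
names(padded) <- rownames(df)
padded[names(res)] <- res           # fill in only the rows that were fitted

# padded now has the same length as df$var, so this no longer fails
# with "variable lengths differ":
# plot(padded ~ df$var)
```

Rows dropped because of NAs or the subset condition simply stay NA and are skipped when plotting.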
Source: https://stackoverflow.com/questions/16376544/how-to-change-na-action-for-zero-inflated-regression-model