Predict.glm not predicting missing values in response

核能气质少年 submitted on 2019-12-04 11:23:03

Question


For some reason, when I specify glms (and lms too, it turns out), R is not predicting the missing values of the response. Here is an example:

y = round(runif(50))
y = c(y,rep(NA,50))
x = rnorm(100)
m = glm(y~x, family=binomial(link="logit"))
p = predict(m,na.action=na.pass)
length(p)

y = round(runif(50))
y = c(y,rep(NA,50))
x = rnorm(100)
m = lm(y~x)
p = predict(m)
length(p)

The length of p should be 100, but it's 50. The weird thing is that I have other predicts in the same script that do predict from missing data.

EDIT: It turns out that those other predicts were quite wrong -- I was doing imputed.value = rnorm(N,mean.from.predict,var.of.prediction.interval). This recycled the mean and sd vectors from the lm predict or glm predict functions when length(predict)<N, which was quite different from what I was seeking.
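A small sketch of the recycling pitfall described in the EDIT. The vector mu is a hypothetical stand-in for a predict() result that came back shorter than N; rnorm() recycles it to length N without any warning. Note also that rnorm(n, mean, sd) takes a standard deviation, so if var.of.prediction.interval really holds a variance it would need sqrt() first.

```r
# rnorm() silently recycles a mean vector that is shorter than n.
set.seed(1)                  # arbitrary seed, for reproducibility
N <- 6
mu <- c(0, 100)              # hypothetical: a too-short vector of predicted means
draws <- rnorm(N, mean = mu, sd = 1)
length(draws)                # 6: no error, no warning
round(draws)                 # odd positions near 0, even positions near 100

# A cheap guard to run before drawing imputations:
if (length(mu) != N) {
  message("predict() returned ", length(mu), " values, expected ", N)
}
```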

So my question is: what about my example code is stopping glm and lm from predicting missing values?

Thanks!


Answer 1:


When glm fits the model, it uses only the cases where there are no missing values. You can still get predictions for the cases where your y values are missing, by constructing a data frame and passing that to predict.glm.

predict(m, newdata=data.frame(y, x))
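A minimal sketch of this approach on the question's simulated data (the seed is an arbitrary addition for reproducibility). Only the predictor columns need to be in newdata, since the response is dropped when building the prediction matrix. predict.glm returns the linear predictor (log-odds here) by default; type = "response" gives fitted probabilities.

```r
set.seed(42)                 # arbitrary seed, for reproducibility
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)
m <- glm(y ~ x, family = binomial(link = "logit"))

nd <- data.frame(x = x)                                # predictors for all 100 rows
p_link <- predict(m, newdata = nd)                     # log-odds (default type = "link")
p_prob <- predict(m, newdata = nd, type = "response")  # probabilities

length(p_prob)               # 100, including the rows whose y is NA
```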



Answer 2:


The issue is with your call to glm, which has an na.action argument that, by default, is set to na.omit.

Therefore the rows with a missing response are omitted from the fit (and when predict.glm is called without newdata, they are still omitted).
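A sketch of why na.action = na.pass in the question's predict() call had no effect: predict.glm's na.action argument only governs missing values in newdata. Without newdata, predict() returns the fitted linear predictors for the rows used in the fit. The data frame nd below is made up for illustration.

```r
set.seed(42)                 # arbitrary seed, for reproducibility
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)
m <- glm(y ~ x, family = binomial)

# No newdata: na.action is not consulted; only the 50 fitted rows come back.
p1 <- predict(m, na.action = na.pass)
length(p1)                   # 50

# With newdata, na.action decides what happens to NAs in the new predictors.
nd <- data.frame(x = c(0.5, NA, -1))  # hypothetical new data
p2 <- predict(m, newdata = nd, na.action = na.pass)
length(p2)                   # 3, with NA in position 2
```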

From ?glm

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.
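The session-wide default mentioned in this help text can be inspected and changed with getOption() and options(). A sketch, with a smaller simulated data set than the question's; because the option affects every model fit in the session, the example saves and restores the previous value.

```r
getOption("na.action")       # "na.omit" in a factory-fresh session

old <- options(na.action = "na.exclude")  # change the session-wide default
set.seed(1)                  # arbitrary seed, for reproducibility
y <- c(round(runif(20)), rep(NA, 20))
x <- rnorm(40)
m <- glm(y ~ x, family = binomial)        # picks up na.exclude from options
options(old)                 # restore the previous setting

p <- predict(m)
length(p)                    # 40: padded with NA for the 20 incomplete rows
```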

From ?na.exclude (the general help page for NA actions):

na.exclude differs from na.omit only in the class of the "na.action" attribute of the result, which is "exclude". This gives different behaviour in functions making use of naresid and napredict: when na.exclude is used the residuals and predictions are padded to the correct length by inserting NAs for cases omitted by na.exclude.
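A sketch contrasting the two NA actions on the question's data. Note that na.exclude only pads the output back to the original length with NA; it does not produce predictions for the rows whose response is missing. For those you still need to pass newdata, as in Answer 1.

```r
set.seed(42)                 # arbitrary seed, for reproducibility
y <- c(round(runif(50)), rep(NA, 50))
x <- rnorm(100)

m_omit <- glm(y ~ x, family = binomial, na.action = na.omit)
m_excl <- glm(y ~ x, family = binomial, na.action = na.exclude)

p_omit <- predict(m_omit)    # incomplete rows dropped
p_excl <- predict(m_excl)    # NA inserted where rows were dropped

length(p_omit)               # 50
length(p_excl)               # 100
sum(is.na(p_excl))           # 50

# The fit itself is identical; only the shape of the output differs.
all.equal(coef(m_omit), coef(m_excl))
```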




Answer 3:


I'm not sure where you got the idea that R's regression functions should be expected to automatically impute missing values. That's just not a correct reading of the glm help page. If you have predictions for things that you "think" are missing values in data you have not provided, my guess is that they are not actually missing but are perhaps levels with a label of "NA". That is not a missing value in R. Show us str(chr.imp) for the dataset you are working with. The "imp" part of that name makes me think you (or someone before you) has constructed some imputations.
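To illustrate the distinction drawn above: a string or factor level spelled "NA" is an ordinary value, whereas NA is R's missing-value marker. The vectors f and g are made up for illustration, and the replace()/droplevels() step is just one possible way to turn such a label into a real NA.

```r
f <- factor(c("a", "NA", "b"))   # "NA" here is just a label
is.na(f)                         # FALSE FALSE FALSE
str(f)                           # "NA" shows up as an ordinary level

g <- factor(c("a", NA, "b"))     # a genuinely missing value
is.na(g)                         # FALSE TRUE FALSE

# One way to convert the "NA" label into a real missing value:
f2 <- droplevels(replace(f, f == "NA", NA))
is.na(f2)                        # FALSE TRUE FALSE
```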

If you want to impute data, then you first need to read up on the issues involved and then pick a package to do it. To search for such packages, try this:

install.packages("sos")
require(sos)
findFn("impute")
#---------
found 834 matches;  retrieving 20 pages, 400 matches.
2 3 4 5 6 7 8 9 10 
11 12 13 14 15 16 17 18 19 20 

Downloaded 383 links in 118 packages.


Source: https://stackoverflow.com/questions/16265798/predict-glm-not-predicting-missing-values-in-response
