Question
I'm using a data frame with many NA values. Although I can fit a linear model, I can't then line the model's fitted values up with the original data, because rows with missing values are dropped and there is no indicator column recording which rows those were.
Here's a reproducible example:
library(MASS)
dat <- Aids2
# Add NA's
dat[floor(runif(100, min = 1, max = nrow(dat))),3] <- NA
# Create a model
model <- lm(death ~ diag + age, data = dat)
# Different Values
length(fitted.values(model))
# 2745
nrow(dat)
# 2843
Answer 1:
There are actually three solutions here:
- pad NA to fitted values ourselves;
- use predict() to compute fitted values;
- drop incomplete cases ourselves and pass only complete cases to lm().
Option 1
## row indices that lm() dropped because of `NA`
## (na.omit(dat) would also drop rows with NA in columns the model does not use)
id <- na.action(model)
fitted <- rep(NA, nrow(dat))
fitted[-id] <- model$fitted.values
nrow(dat)
# 2843
length(fitted)
# 2843
sum(!is.na(fitted))
# 2745
Option 2
## the default NA action for "predict.lm" is "na.pass"
pred <- predict(model, newdata = dat) ## has to use "newdata = dat" here!
nrow(dat)
# 2843
length(pred)
# 2843
sum(!is.na(pred))
# 2745
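To see why newdata matters in Option 2, here is a minimal sketch on a small made-up data frame. The names toy, toy_model and toy_pred are hypothetical and not part of the original example.

```r
# Hypothetical toy data: 6 rows, one missing predictor value in row 3
toy <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))
toy_model <- lm(y ~ x, data = toy)

# Without newdata, predict() only covers the rows that lm() kept
length(predict(toy_model))
# 5

# With newdata, every row is evaluated and the incomplete row stays NA
toy_pred <- predict(toy_model, newdata = toy)
length(toy_pred)
# 6
sum(is.na(toy_pred))
# 1
```

The result has one entry per row of the data frame, so it can be attached directly as a new column.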
Option 3
Alternatively, you might simply pass a data frame without any NA to lm():
complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)
nrow(complete.dat)
# 2745
length(fit$fitted)
# 2745
sum(!is.na(fit$fitted))
# 2745
In summary,
- Option 1 does the "alignment" in a straightforward manner by padding NA, but I think people seldom take this approach;
- Option 2 is really simple, but it is more computationally costly;
- Option 3 is my favourite as it keeps all things simple.
Answer 2:
I use a simple for loop. The fitted values have an attribute (name) of the original row they belonged to. Therefore:
for (i in 1:nrow(data)) {
  data$fitted.values[i] <- fit$fitted.values[paste(i)]
}
"data" is your original data frame, and "fit" is the model object (i.e. fit <- lm(y ~ x, data = data)). This relies on the data frame having the default row names 1, 2, ..., n.
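The same name-based lookup can probably be done without a loop. The following is only a sketch on made-up data (toy and toy_fit are hypothetical names), assuming the fitted values carry the original row names, as described above.

```r
# Hypothetical toy data: 6 rows, one missing predictor value in row 3
toy <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))
toy_fit <- lm(y ~ x, data = toy)

# The fitted values are named after the rows that lm() kept
names(toy_fit$fitted.values)
# "1" "2" "4" "5" "6"

# Index by row name in one step; a row name with no fitted value gives NA
toy$fitted.values <- unname(toy_fit$fitted.values[rownames(toy)])
sum(is.na(toy$fitted.values))
# 1
```

Because the lookup uses rownames(toy) rather than the row position, it should also work when the row names are not simply 1 to n.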
Answer 3:
My answer is an extension to @ithomps's solution:
for (i in 1:nrow(data)) {
  # test the sex of row i only, and use a real NA rather than the string "NA"
  data$fitted.values.men[i] <- ifelse(data$sex[i] == 1,
    fit.males$fitted.values[paste(i)], NA)
  data$fitted.values.women[i] <- ifelse(data$sex[i] == 0,
    fit.females$fitted.values[paste(i)], NA)
  data$fitted.values.combined[i] <- fit.combo$fitted.values[paste(i)]
}
I did this because in my case I ran three models: one for males, one for females, and one for the combined data. And to make things "more" convenient, males and females are randomly distributed in my data. Also, I'll have missing data as input for lm(), so I did fit <- lm(y~x, data = data, na.action = na.exclude) so that extractors such as fitted(fit) and residuals(fit) return NA for the dropped rows.
Hope this helps others.
(I found it pretty hard to formulate my issue/question, glad I found this post!)
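For readers who want to try the na.action = na.exclude setting mentioned in this answer, here is a minimal sketch on made-up data (toy and toy_fit are hypothetical names). It suggests that the padding happens in the extractor functions, not in the stored component.

```r
# Hypothetical toy data: 6 rows, one missing predictor value in row 3
toy <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                  y = c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2))
toy_fit <- lm(y ~ x, data = toy, na.action = na.exclude)

# The stored component still holds only the rows used in the fit
length(toy_fit$fitted.values)
# 5

# The extractors pad with NA, so the results line up with the data frame
length(fitted(toy_fit))
# 6
toy$fitted <- fitted(toy_fit)
toy$resid <- residuals(toy_fit)
```

With this setting no loop or manual padding is needed, as long as fitted() and residuals() are used instead of reading the components with $.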
Source: https://stackoverflow.com/questions/38253295/aligning-data-frame-with-missing-values