I'm using a data frame with many NA values. While I'm able to create a linear model, I am subsequently unable to line the fitted values of the model up with the rows of the original data.
There are actually three solutions here:
1. pad NA to fitted values ourselves;
2. use predict() to compute fitted values;
3. drop incomplete cases ourselves and pass a complete data frame to lm().
Option 1
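## indices of the rows with `NA` (assumes dat holds only the model's variables;
## otherwise model$na.action gives the rows lm() actually dropped)
id <- attr(na.omit(dat), "na.action")
fitted <- rep(NA, nrow(dat))
fitted[-id] <- model$fitted.values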
nrow(dat)
# 2843
length(fitted)
# 2843
sum(!is.na(fitted))
# 2745
Option 2
## the default NA action for "predict.lm" is "na.pass"
pred <- predict(model, newdata = dat) ## has to use "newdata = dat" here!
nrow(dat)
# 2843
length(pred)
# 2843
sum(!is.na(pred))
# 2745
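As a quick check that Options 1 and 2 line up, here is a minimal sketch on a hypothetical toy data frame (toy, m, dropped and padded are names made up for illustration, not the asker's data). It assumes the default na.action of na.omit and has missing values only in the predictor:

```r
## Toy data (hypothetical): 10 rows, 2 of them with a missing predictor
set.seed(1)
toy <- data.frame(y = rnorm(10), x = rnorm(10))
toy$x[c(2, 7)] <- NA

m <- lm(y ~ x, data = toy)             # rows 2 and 7 are dropped by na.omit

## Option 1: pad by hand, using the dropped-row indices stored in the model
dropped <- as.integer(m$na.action)
padded <- rep(NA_real_, nrow(toy))
padded[-dropped] <- m$fitted.values

## Option 2: predict on the full data frame (na.pass keeps the NA rows)
pred <- predict(m, newdata = toy)

length(padded) == nrow(toy)            # one value per original row
all.equal(unname(pred), padded)        # same values, same NA positions
```

One difference to keep in mind: predict() only needs the predictors, so a row where just the response is missing would still get a prediction under Option 2, whereas Option 1 would leave it as NA.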
Option 3
Alternatively, you might simply pass a data frame without any NA to lm():
complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)
nrow(complete.dat)
# 2745
length(fit$fitted)
# 2745
sum(!is.na(fit$fitted))
# 2745
In summary, Option 1 pads NA by hand, but I think people seldom take this approach.
My answer is an extension to @ithomps' solution:
for (i in seq_len(nrow(data))) {
  ## test the sex of row i only, and use a real NA rather than the string "NA"
  data$fitted.values.men[i] <- ifelse(data$sex[i] == 1,
                                      fit.males$fitted.values[paste(i)], NA)
  data$fitted.values.women[i] <- ifelse(data$sex[i] == 0,
                                        fit.females$fitted.values[paste(i)], NA)
  data$fitted.values.combined[i] <- fit.combo$fitted.values[paste(i)]
}
I did this because in my case I ran three models: one for males, one for females, and one for both combined. And to make things "more" convenient, males and females are randomly distributed in my data. I also have missing data in the input to lm(), so I fitted with fit <- lm(y ~ x, data = data, na.action = na.exclude) so that the fitted values extracted from the model (fit) with fitted() include NA for the incomplete rows.
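For anyone wanting to avoid the loop, here is a minimal sketch of the same idea on hypothetical toy data (the data frame toy and the column names fitted.combined, fitted.men and fitted.women are made up for illustration). It assumes sex is coded 1 for males and 0 for females, as above, and relies on fitted() padding NA under na.exclude and on subset fits keeping the original row names:

```r
## Toy data (hypothetical): sexes interleaved, two rows with a missing predictor
set.seed(42)
toy <- data.frame(sex = rep(c(1, 0), 6), x = rnorm(12))
toy$y <- 1 + 2 * toy$x + rnorm(12)
toy$x[c(3, 8)] <- NA

## Combined model: with na.exclude, fitted() returns one value per row of toy
fit.combo <- lm(y ~ x, data = toy, na.action = na.exclude)
toy$fitted.combined <- unname(fitted(fit.combo))

## Per-sex models: fitted values are named by original row, so look them up
## by row name; rows that were not used in a model have no match and become NA
fit.males   <- lm(y ~ x, data = toy, subset = sex == 1)
fit.females <- lm(y ~ x, data = toy, subset = sex == 0)
toy$fitted.men   <- unname(fit.males$fitted.values[rownames(toy)])
toy$fitted.women <- unname(fit.females$fitted.values[rownames(toy)])

head(toy)
```

Looking up by rownames(toy) rather than paste(i) also keeps working if the row names are not simply 1 to n.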
Hope this helps others.
(I found it pretty hard to formulate my issue/question, glad I found this post!)
I use a simple for loop. The fitted values have an attribute (name) of the original row they belonged to. Therefore:
for (i in seq_len(nrow(data))) {
  ## paste(i) matches the default row names "1", "2", ...; unmatched rows give NA
  data$fitted.values[i] <- fit$fitted.values[paste(i)]
}
"data" is your original data frame and "fit" is the model object (i.e. fit <- lm(y ~ x, data = data)).
If you do not want to change the raw data, try this way; it's really simple. names(fitted.values(model)) gives the row names of the observations the model actually used, and we can use this to add a new column:
dat[names(fitted.values(model)), "fitted.values"] <- fitted.values(model)
sum(!is.na(dat[, "fitted.values"]))
# [1] 2745
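The name-based approaches above all rely on the fitted values carrying the row names of the observations that were used. A minimal sketch of the reverse lookup, done in one vectorised step on a hypothetical toy data frame with non-default row names (toy and fit are made-up names for illustration):

```r
## Toy data (hypothetical): row "c" lacks x and row "d" lacks y
toy <- data.frame(x = c(1, 2, NA, 4, 5, 6),
                  y = c(2.1, 3.9, 6.2, NA, 9.8, 12.1),
                  row.names = letters[1:6])
fit <- lm(y ~ x, data = toy)

names(fit$fitted.values)               # row names of the rows actually used

## Index the fitted values by every row name of toy;
## rows the model dropped have no match and come back as NA
toy$fitted.values <- unname(fit$fitted.values[rownames(toy)])
toy
```

Because the lookup goes through rownames(toy), it does not depend on the rows being labelled 1 to n.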