I want to run a stepwise regression in R to choose the best-fitting model. My code is attached here:
full.modelfixed <- glm(died_ed ~ age_1 + gender + race + insu
This should probably be on a stats forum (stats.stackexchange), but briefly, there are a number of considerations.
The main one is that when you compare two models, they need to be fitted on the same dataset, i.e. the same rows (and you need to be able to nest the models within each other).
For example:
glm1 <- glm(Dependent~indep1+indep2+indep3, family = binomial, data = data)
glm2 <- glm(Dependent~indep1+indep2, family = binomial, data = data)
Now imagine that we are missing values of indep3 but not indep1 or indep2. When we run glm1 we are running it on a smaller dataset: the rows for which we have the dependent variable and all three independent ones (i.e. any rows where indep3 is missing are excluded).
When we run glm2, the rows missing a value for indep3 are included, because those rows do contain Dependent, indep1 and indep2, which are the variables in the model.
We can no longer directly compare models as they are fitted on different datasets.
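To make this concrete, here is a small sketch on simulated data. The data frame, the variable names and the 30 missing values are all made up for illustration (they are not the asker's data); the point is only that `nobs()` reports a different number of rows for the two fits.

```r
# Simulated data; all names and values here are hypothetical.
set.seed(1)
n <- 200
data <- data.frame(
  indep1 = rnorm(n),
  indep2 = rnorm(n),
  indep3 = rnorm(n)
)
data$Dependent <- rbinom(n, 1, plogis(0.5 * data$indep1 - 0.5 * data$indep2))

# Make indep3 missing in the first 30 rows.
data$indep3[1:30] <- NA

glm1 <- glm(Dependent ~ indep1 + indep2 + indep3, family = binomial, data = data)
glm2 <- glm(Dependent ~ indep1 + indep2, family = binomial, data = data)

# glm() drops rows with missing values by default (na.action = na.omit),
# so the two models end up fitted on different numbers of rows.
n_glm1 <- nobs(glm1)
n_glm2 <- nobs(glm2)
print(c(glm1 = n_glm1, glm2 = n_glm2))
```

Because the row counts differ, AIC or deviance comparisons between glm1 and glm2 are not like-for-like.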
I think broadly you can either 1) limit the analysis to rows that are complete for every candidate variable, or 2) if appropriate, consider multiple imputation.
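A rough sketch of option 1, again on simulated data with made-up variable names: subset to complete cases once, up front, so that every model `step()` tries is fitted on the same rows. Whether dropping incomplete rows is acceptable depends on why the values are missing, so treat this as an illustration rather than a recommendation.

```r
# Simulated data; all names and values here are hypothetical.
set.seed(1)
n <- 200
data <- data.frame(
  indep1 = rnorm(n),
  indep2 = rnorm(n),
  indep3 = rnorm(n)
)
data$Dependent <- rbinom(n, 1, plogis(0.5 * data$indep1 - 0.5 * data$indep2))
data$indep3[1:30] <- NA

# Keep only rows that are complete for every variable in the full model.
vars <- c("Dependent", "indep1", "indep2", "indep3")
complete_data <- data[complete.cases(data[, vars]), ]

full_model <- glm(Dependent ~ indep1 + indep2 + indep3,
                  family = binomial, data = complete_data)

# Backward stepwise selection by AIC; every candidate now uses the same rows.
selected <- step(full_model, direction = "backward", trace = 0)
print(formula(selected))
```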
Hope that helps.
You can use the mice package to do the imputation. The completed datasets contain no missing values, so fitting models on them should no longer give you errors caused by missing data.
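A minimal sketch of what that could look like, assuming the mice package is installed. The data are simulated and the variable names, the number of imputations (m = 5) and the choice of predictive mean matching are all just assumptions for the example.

```r
library(mice)

# Simulated data; all names and values here are hypothetical.
set.seed(1)
n <- 200
data <- data.frame(
  indep1 = rnorm(n),
  indep2 = rnorm(n),
  indep3 = rnorm(n)
)
data$Dependent <- rbinom(n, 1, plogis(0.5 * data$indep1 - 0.5 * data$indep2))
data$indep3[1:30] <- NA

# Create 5 imputed datasets using predictive mean matching.
imp <- mice(data, m = 5, method = "pmm", seed = 1, printFlag = FALSE)

# Fit the same model on each imputed dataset and pool the estimates.
fit <- with(imp, glm(Dependent ~ indep1 + indep2 + indep3, family = binomial))
pooled <- pool(fit)
print(summary(pooled))

# A single completed dataset, e.g. for inspection.
completed1 <- complete(imp, 1)
```

Note that running a stepwise procedure on one completed dataset ignores the uncertainty in the imputed values, so pooling across imputations as above is usually the safer starting point.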