Aligning Data frame with missing values

盖世英雄少女心 2020-12-04 00:31

I'm using a data frame with many NA values. While I'm able to create a linear model, I am subsequently unable to line the fitted values of the model up with

4 answers
  • 2020-12-04 00:38

    There are actually three solutions here:

    1. pad NA to fitted values ourselves;
    2. use predict() to compute fitted values;
    3. drop incomplete cases ourselves and pass only complete cases to lm().

    Option 1

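    ## indices of the rows that contain `NA`
    ## (assumes `dat` holds only the model variables; otherwise use model$na.action)
    id <- attr(na.omit(dat), "na.action")
    fitted <- rep(NA, nrow(dat))
    fitted[-id] <- model$fitted.values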
    nrow(dat)
    # 2843
    length(fitted)
    # 2843
    sum(!is.na(fitted))
    # 2745
    

    Option 2

    ## the default NA action for "predict.lm" is "na.pass", so rows with NA predictors yield NA
    ## "newdata = dat" is required; without it only the rows used in the fit are returned
    pred <- predict(model, newdata = dat)
    nrow(dat)
    # 2843
    length(pred)
    # 2843
    sum(!is.na(pred))
    # 2745
    

    Option 3

    Alternatively, you might simply pass a data frame without any NA to lm():

    complete.dat <- na.omit(dat)
    fit <- lm(death ~ diag + age, data = complete.dat)
    nrow(complete.dat)
    # 2745
    length(fit$fitted)
    # 2745
    sum(!is.na(fit$fitted))
    # 2745
    

    In summary,

    • Option 1 does the "alignment" in a straightforward manner by padding NA, but I think people seldom take this approach;
    • Option 2 is really simple, but it is more computationally costly;
    • Option 3 is my favourite as it keeps all things simple.
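    The three options can be tried side by side on a small example. This is only a sketch: the data frame below is made up for illustration (it reuses the column names death, diag and age from the code above), and the missing values are placed in the predictors only, so that all three options agree on which rows are NA.

```r
## Toy data (hypothetical values): rows 4 and 5 each miss one predictor
dat <- data.frame(
  death = c(1, 0, 1, 0, 0, 1),
  diag  = c(2, 1, 3, 2, NA, 1),
  age   = c(50, 61, 47, NA, 58, 66)
)
model <- lm(death ~ diag + age, data = dat)  # default na.action drops rows 4 and 5

## Option 1: pad NA ourselves
id <- attr(na.omit(dat), "na.action")
opt1 <- rep(NA_real_, nrow(dat))
opt1[-id] <- model$fitted.values

## Option 2: predict on the full data frame
opt2 <- unname(predict(model, newdata = dat))

## Option 3: fit on complete cases only (result has 4 values, not 6)
complete.dat <- na.omit(dat)
fit <- lm(death ~ diag + age, data = complete.dat)
opt3 <- unname(fit$fitted.values)

length(opt1)          # 6
which(is.na(opt1))    # 4 5
all.equal(opt1, opt2) # TRUE
```

    One caveat with Option 2: predict() only needs the predictors, so a row whose response alone is missing would still get a prediction rather than NA.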
  • 2020-12-04 00:54

    My answer is an extension of @ithomps's solution:

    for (i in seq_len(nrow(data))) {
      ## test the sex of row i only, and use a real NA rather than the string "NA"
      data$fitted.values.men[i] <- ifelse(data$sex[i] == 1,
        fit.males$fitted.values[paste(i)], NA)
      data$fitted.values.women[i] <- ifelse(data$sex[i] == 0,
        fit.females$fitted.values[paste(i)], NA)
      data$fitted.values.combined[i] <- fit.combo$fitted.values[paste(i)]
    }
    

    In my case I ran three models: one for males, one for females, and one for both combined. To make things "more" convenient, males and females are randomly distributed in my data. I also have missing data in the input to lm(), so I fitted with fit <- lm(y ~ x, data = data, na.action = na.exclude), which makes fitted(fit) and residuals(fit) return NA for the dropped rows.
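    As a rough illustration of what na.action = na.exclude changes, here is a minimal sketch on made-up data (the values of y and x are hypothetical). The stored component fit$fitted.values still holds only the rows used in the fit; it is the extractor functions that pad the result back to the original length.

```r
## Made-up data: x is missing in row 3
data <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1),
                   x = c(1, 2, NA, 4, 5))
fit <- lm(y ~ x, data = data, na.action = na.exclude)

length(fit$fitted.values)  # 4: the stored component omits row 3
length(fitted(fit))        # 5: the extractor pads row 3 with NA

## Padded vectors line up with the data frame, so no loop is needed
data$fitted.values <- fitted(fit)
data$residuals     <- residuals(fit)
```

    With this approach the new columns can be assigned directly, since fitted() and residuals() already have one element per row of the original data.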

    Hope this helps others.

    (I found it pretty hard to formulate my issue/question, glad I found this post!)

  • 2020-12-04 00:56

    I use a simple for loop. Each fitted value is named after the original row it belongs to. Therefore:

    for(i in 1:nrow(data)){
      data$fitted.values[i]<-
        fit$fitted.values[paste(i)]
    }
    

    "data" is your original data frame and "fit" is the model object (i.e. fit <- lm(y ~ x, data = data)). Note that paste(i) matches the row names only when they are the default "1", "2", ..., n.
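    The same name-based lookup can be written without a loop by indexing with rownames(data). The sketch below uses made-up data whose row names are deliberately not 1..n, a case where looking up paste(i) would misalign the values.

```r
## Made-up data; dropping the first row leaves row names "2".."6"
data <- data.frame(y = c(1.0, 2.1, 2.9, 4.2, 5.1, 6.0),
                   x = c(1, 2, NA, 4, 5, 6))
data <- data[-1, ]
fit <- lm(y ~ x, data = data)   # row "3" is dropped because x is NA

## Look every row up by its name; names absent from the fit yield NA
data$fitted.values <- unname(fit$fitted.values[rownames(data)])

sum(!is.na(data$fitted.values))  # 4
```

    Indexing a named vector by a name it does not contain returns NA, which is what fills in the rows that lm() dropped.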

  • 2020-12-04 00:56

    If you do not want to change the raw data, try this; it's really simple.

    names(fitted.values(model)) holds the row names of the observations used in the fit, and we can use this to add a new column:

    dat[names(fitted.values(model)), "fitted.values"] <- fitted.values(model)
    sum(!is.na(dat[, "fitted.values"]))
    # [1] 2745
    