predict.lm() with an unknown factor level in test data

我在风中等你 2020-11-28 06:22

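I am fitting a model to factor data and predicting. If the newdata passed to predict.lm() contains even a single factor level that is unknown to the model, the whole predict.lm() call fails with an error instead of returning predictions for the rows whose levels the model does know.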
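For reference, here is a minimal sketch that reproduces the failure on hypothetical toy data (the names train, test, fit and grp are made up for illustration). The error is caught with tryCatch() so its message can be inspected instead of aborting the script; it should report the new level.

```r
# Hypothetical toy data: grp has only levels "a" and "b" when the model is fit
train <- data.frame(y = c(1, 2, 3, 4), grp = factor(c("a", "a", "b", "b")))
fit <- lm(y ~ grp, data = train)

# New data: one row at a known level, one row at the unseen level "c"
test <- data.frame(grp = factor(c("a", "c")))

# predict() stops for the whole data frame rather than scoring the known row;
# capture the error message instead of letting it abort
msg <- tryCatch(predict(fit, newdata = test),
                error = function(e) conditionMessage(e))
print(msg)
```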

7 answers
  • 2020-11-28 07:28

    If you want to handle factor levels in your new data that the model never saw, after fitting your lm model but before calling predict() (since we don't know in advance which levels might be unseen), here is a function I've built that sets every level not in the model to NA. The prediction for those rows will then also be NA, and you can use an alternative method to predict those values.

    object: the fitted model returned by lm(...,data=trainData)

    data: the data frame you want to create predictions for

    missingLevelsToNA <- function(object, data) {
    
      # Factor predictors in the model and the levels seen when it was fit.
      # object$xlevels is a named list with one element (the level vector) per factor.
      modelLevels <- object$xlevels
    
      # If the formula used as.factor(x), the stored name is "as.factor(x)";
      # strip the wrapper so the name matches the column "x" in `data`
      names(modelLevels) <- sub("^as\\.factor\\((.*)\\)$", "\\1", names(modelLevels))
    
      # Factor predictors that are also columns of `data`
      predictors <- intersect(names(modelLevels), names(data))
    
      # For each such predictor, set values whose level the model never saw to NA
      # (seq-safe: the loop simply does nothing if there are no such predictors)
      for (p in predictors) {
        found <- as.character(data[[p]]) %in% modelLevels[[p]]
        if (any(!found)) data[!found, p] <- NA
      }
    
      data
    }
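To see why setting unseen levels to NA is enough, here is a minimal self-contained sketch on hypothetical toy data (train, test, fit and grp are illustrative names) that applies the same idea by hand to a single factor column. Filling the NA predictions with the training mean is only an assumed placeholder for the "alternative method" mentioned above, not part of the original answer.

```r
# Hypothetical toy data: the model sees only levels "a" and "b" of grp
train <- data.frame(y = c(1, 2, 3, 4), grp = factor(c("a", "a", "b", "b")))
fit <- lm(y ~ grp, data = train)

# New data with a level ("c") that was absent during fitting
test <- data.frame(grp = factor(c("a", "c", "b")))

# Same idea as missingLevelsToNA(), done by hand for one column:
# values whose level is not in fit$xlevels become NA
unseen <- !(as.character(test$grp) %in% fit$xlevels$grp)
test$grp[unseen] <- NA
test$grp <- droplevels(test$grp)  # drop the now-unused level "c"

# predict.lm() defaults to na.action = na.pass, so the NA row gets an NA prediction
pred <- predict(fit, newdata = test)

# Assumed fallback (not from the answer): use the mean training response
# for rows the model cannot score
pred_filled <- ifelse(is.na(pred), mean(train$y), pred)
print(pred_filled)
```

With these toy values the "a" and "b" rows get their group means (1.5 and 3.5), the "c" row is NA in pred, and the fallback replaces it with the training mean of 2.5.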