“Factor has new levels” error for variable I'm not using

隐瞒了意图╮ 2020-12-14 01:03

Consider a simple dataset, split into a training and testing set:

dat <- data.frame(x=1:5, y=c("a", "b", "c", "d", "e"), z=c(0, 0, 1, 0, 1))
tra         


        
2 Answers
  • 2020-12-14 01:06

    You could try updating mod2$xlevels[["y"]] in the model object so that it also contains the levels found in the test data:

    mod2 <- glm(z~.-y, data=train, family="binomial")
    mod2$xlevels[["y"]] <- union(mod2$xlevels[["y"]], levels(test$y))
    
    predict(mod2, newdata=test, type="response")
    #        5 
    #0.5546394 
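
    For context, here is a minimal self-contained sketch of the same idea. The train/test split is an assumption (rows 1 to 4 for training, row 5 for testing), since the question's code is cut off, and stringsAsFactors = TRUE is set explicitly so that y is a factor. Using as.character() instead of levels() is my own variation: from R 4.0 on, data.frame() keeps strings as character by default, and levels(test$y) would then be NULL.

    ```r
    # Assumed data and split: the split in the question is truncated,
    # so "rows 1-4 train, row 5 test" is an assumption for illustration.
    dat <- data.frame(x = 1:5,
                      y = c("a", "b", "c", "d", "e"),
                      z = c(0, 0, 1, 0, 1),
                      stringsAsFactors = TRUE)
    train <- dat[1:4, ]
    test  <- dat[5, ]

    mod2 <- glm(z ~ . - y, data = train, family = "binomial")

    # y is excluded from the fit, but its training levels are still recorded.
    print(mod2$xlevels)

    # Before the fix, predict() stops because test$y contains an unseen level.
    err <- tryCatch(predict(mod2, newdata = test, type = "response"),
                    error = function(e) conditionMessage(e))
    print(err)

    # Add the test levels; as.character() handles factor and character columns.
    mod2$xlevels[["y"]] <- union(mod2$xlevels[["y"]],
                                 unique(as.character(test$y)))
    pred <- predict(mod2, newdata = test, type = "response")
    print(pred)
    ```

    Since y contributes no coefficient to the model, widening its recorded levels should only silence the level check and not change the fitted probabilities.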
    

    Another option would be to leave "y" out of the data passed to glm(), without removing it from train itself:

    mod2 <- glm(z~., data=train[,!colnames(train) %in% c("y")], family="binomial")
    predict(mod2, newdata=test, type="response")
    #        5 
    #0.5546394 
    
  • 2020-12-14 01:15

    I was confused by this issue for a long time, but the solution turned out to be simple. One of my variables, "traffic type", had 20 levels, and one of those levels (17) appeared in only a single row. That row could therefore end up in either the training data or the test data. In my case it ended up in the test data, hence the error: factor "traffic type" has a new level 17, because no row in the training data has level 17. I deleted this row from the data set and the model now runs fine.
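
    A rough sketch of how such a row could be found before fitting, rather than by trial and error. The data frame, the column name traffic_type, and the split below are all hypothetical stand-ins for the data described above.

    ```r
    # Hypothetical data: level "17" of traffic_type occurs in exactly one row.
    dat <- data.frame(
      traffic_type = factor(c("1", "2", "1", "2", "1", "2", "17", "1")),
      x = c(0.5, 1.2, 0.7, 1.9, 0.3, 2.2, 1.0, 0.8),
      z = c(0, 1, 0, 1, 1, 0, 1, 0)
    )
    train <- dat[1:6, ]   # assumed split for illustration
    test  <- dat[7:8, ]

    # Compare the values that actually occur in each split,
    # not the declared levels, which survive subsetting.
    train_levels <- unique(as.character(train$traffic_type))
    test_levels  <- unique(as.character(test$traffic_type))
    unseen <- setdiff(test_levels, train_levels)
    print(unseen)

    # Keep only the test rows whose level was seen during training.
    test_ok <- test[!(test$traffic_type %in% unseen), , drop = FALSE]
    print(nrow(test_ok))
    ```

    Dropping the rows is the approach taken in this answer; whether losing those observations is acceptable depends on the data.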
