Question
I am trying to make sure that every factor-type feature is fully represented (that is, carries all possible factor levels) both in my tree object and in the test set I use for prediction.
for (j in 1:length(predictors)) {
  if (is.factor(Test[, j])) {
    ct[[names(predictors)[j]]] <- union(ct$xlevels[[names(predictors)[j]]], levels(Test[, c(names(predictors)[j])]))
  }
}
However, for the object ct (a ctree from the party package) I cannot work out how to access the features' factor levels, because I get this error:
Error in ct$xlevels : $ operator not defined for this S4 class
Answer 1:
I have run into this problem countless times, and today I came up with a little hack that should make it unnecessary to fix discrepancies in factor levels.
Just fit the model on the whole dataset (train + test), giving zero weight to the test observations. This way the ctree model will not drop any factor levels.
library(party); library(magrittr)  # ctree() and the %>% pipe
# Fit on the training rows only; predict() would trigger an error if the factor levels in newdata did not match the training data
a <- ctree(Y ~ ., data = DF[train.IDs, ]) %>% predict(newdata = DF)
# Pass training membership as 0/1 weights instead of subsetting the data, so no factor levels are dropped
b <- ctree(Y ~ ., data = DF, weights = as.numeric(1:nrow(DF) %in% train.IDs)) %>% predict(newdata = DF)
mean(a == b)  # check that the predictions are equal; should be 1
Tell me if it works as expected!
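If fitting on the combined data is not an option, another common way to avoid mismatched levels is to rebuild each factor column of the test set with the levels seen in training. The sketch below uses only base R. The data frames `train` and `test` and their columns are made-up toy data, not taken from the question. Be aware that any test value absent from the training levels would become NA under this approach.

```r
# Hypothetical toy data; `train`, `test`, `f`, and `y` are illustrative names only.
train <- data.frame(f = factor(c("a", "b", "c", "a")), y = c(1, 2, 3, 1))
test  <- data.frame(f = factor(c("a", "b")), y = c(1, 2))

# Rebuild every factor column in `test` using the level set from `train`,
# so both data frames share identical levels in the same order.
for (nm in names(train)) {
  if (is.factor(train[[nm]])) {
    test[[nm]] <- factor(test[[nm]], levels = levels(train[[nm]]))
  }
}

levels(test$f)
```

After the loop, `test$f` carries the same levels as `train$f` even though the level "c" never occurs in the test rows.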
Source: https://stackoverflow.com/questions/33583391/r-update-ctree-package-party-features-factors-levels