问题
I am currently working with spark mllib.
I have created a text classifier using the Gradient Boosting algorithm with the class GradientBoostedTrees:
Gradient Boosted Trees
Currently I obtain the predictions to know the class of new elements but I would like to obtain the class probabilities (the value of the output before the hard decision).
In other mllib algorithms like logistic regression you can remove the threshold from the classifier to obtain the class probabilities but I can not find a way to do the same procedure with GradientBosstedTrees.
回答1:
As far as I know, it's not currently possible but it is possible with random forest.
回答2:
You can see this link...I have explained a procedure here Predicting probabilities of classes in case of Gradient Boosting Trees in Spark using the tree output
In order to implement the predicted probabilities and thresholds one need to write program using the trees from
print(model.toDebugString)
output. I tried to understand how the tree works to predict which is fairly simple outside Spark.
回答3:
It seems that in Spark MLLIB it is not possible to obtain the class probabilities.
You can only obtain the final classification decision.
That's a pity because that information would be very useful (If you classify a sample as positive with 99.99% of posibilities is not the same than 51%) and it is not difficult to obtain that information once the model has been trained.
An alternative is using a different software like xgboost: https://github.com/dmlc/xgboost
来源:https://stackoverflow.com/questions/34208496/is-it-possible-to-obtain-class-probabilities-using-gradientboostedtrees-with-spa