Is it possible to obtain class probabilities using GradientBoostedTrees with spark mllib?

Submitted by 十年热恋 on 2020-01-16 01:10:55

Question


I am currently working with spark mllib.

I have created a text classifier using the Gradient Boosting algorithm with the class GradientBoostedTrees (see "Gradient Boosted Trees").

Currently I can obtain predictions that tell me the class of new elements, but I would like to obtain the class probabilities (the output value before the hard decision).

In other MLlib algorithms, such as logistic regression, you can remove the threshold from the classifier to obtain the class probabilities, but I cannot find a way to do the same with GradientBoostedTrees.
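To make "removing the threshold" concrete, here is a plain-Python sketch (not MLlib's source) of a binary logistic model's prediction with and without a threshold. In MLlib's RDD-based API, calling clearThreshold() on a LogisticRegressionModel makes predict return the score instead of a 0/1 label. The helper names and parameters below are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function: maps a raw margin to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, intercept, features, threshold=0.5):
    """Hypothetical helper: hard 0/1 label, or the probability if threshold is None."""
    margin = intercept + sum(w * x for w, x in zip(weights, features))
    prob = sigmoid(margin)
    if threshold is None:
        # No threshold: return the score before the hard decision.
        return prob
    # With a threshold: collapse the score into a class label.
    return 1.0 if prob > threshold else 0.0

print(predict([2.0], 0.0, [1.0]))                  # 1.0
print(predict([2.0], 0.0, [1.0], threshold=None))  # about 0.8808
```

The question is whether GradientBoostedTrees exposes an equivalent of the second call.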


Answer 1:


As far as I know, it's not currently possible with GradientBoostedTrees, but it is possible with random forests.
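For a random forest, one simple way to get class probabilities is the share of trees voting for each class. The sketch below assumes you already have each tree's hard prediction for a single sample; the function name is hypothetical, and Spark's own random forest implementation may aggregate per-tree class distributions rather than raw votes.

```python
from collections import Counter

def vote_probabilities(tree_predictions, num_classes):
    """Hypothetical helper: fraction of trees that voted for each class 0..num_classes-1."""
    counts = Counter(int(p) for p in tree_predictions)
    n = len(tree_predictions)
    return [counts.get(c, 0) / n for c in range(num_classes)]

# Four trees, three of which predict class 1.
print(vote_probabilities([1.0, 1.0, 0.0, 1.0], 2))  # [0.25, 0.75]
```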




Answer 2:


See this link, where I have explained a procedure: "Predicting probabilities of classes in case of Gradient Boosting Trees in Spark using the tree output".

To implement predicted probabilities and thresholds, one needs to write a program that uses the trees printed by

print(model.toDebugString)

I studied how the trees make their predictions, and this is fairly simple to reproduce outside Spark.
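A minimal sketch of evaluating the trees outside Spark, in plain Python. It assumes the trees have been transcribed by hand from the toDebugString output and that the model was trained with MLlib's log loss (labels mapped to -1/+1). Under that assumption the weighted sum of tree outputs F(x) maps to P(label = 1) = 1 / (1 + exp(-2F(x))). The nested-tuple tree format, the helper names, and the example trees and weights are all hypothetical. In the Scala API the per-tree weights should be available from the model's treeWeights field, since toDebugString does not print them.

```python
import math

# Hypothetical trees transcribed from toDebugString.
# A leaf is a number; a split is (feature_index, threshold, left, right),
# mirroring "If (feature i <= t)" (left) / "Else (feature i > t)" (right).
TREES = [
    (0, 0.5, -1.0, 1.0),
    (1, 2.0, -0.5, 0.5),
]
TREE_WEIGHTS = [1.0, 0.1]  # hypothetical values

def eval_tree(node, features):
    """Walk one regression tree down to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if features[feature] <= threshold else right
    return node

def gbt_margin(trees, weights, features):
    """Weighted sum of tree outputs: the raw score before the hard decision."""
    return sum(w * eval_tree(t, features) for t, w in zip(trees, weights))

def gbt_probability(trees, weights, features):
    """P(label == 1), assuming log loss with labels mapped to -1/+1."""
    return 1.0 / (1.0 + math.exp(-2.0 * gbt_margin(trees, weights, features)))

x = [1.0, 3.0]
print(gbt_margin(TREES, TREE_WEIGHTS, x))       # about 1.05
print(gbt_probability(TREES, TREE_WEIGHTS, x))  # about 0.891
```

As far as I can tell, MLlib's predict for a GBT classifier returns 1.0 exactly when this margin is positive, which is the same as the probability above exceeding 0.5. Any other threshold can then be applied to the probability directly.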




Answer 3:


It seems that in Spark MLlib it is not possible to obtain the class probabilities.

You can only obtain the final classification decision.

That's a pity, because that information would be very useful (classifying a sample as positive with 99.99% probability is not the same as classifying it as positive with 51%), and it is not difficult to obtain once the model has been trained.
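Assuming the model was trained with MLlib's log loss (labels mapped to -1/+1), the raw margin and the probability are related as below. The worked numbers show why the 99.99% versus 51% distinction matters: both samples get the same hard label (F(x) > 0), yet their margins differ by a factor of more than 200. Here h_m are the individual trees and w_m their weights; this notation is introduced for illustration.

```latex
\begin{aligned}
F(x) &= \sum_{m=1}^{M} w_m \, h_m(x) \\
p(x) = P(y = 1 \mid x) &= \frac{1}{1 + e^{-2F(x)}}
  \quad\Longleftrightarrow\quad F(x) = \tfrac{1}{2}\ln\frac{p(x)}{1 - p(x)} \\
p(x) = 0.51 &\;\Rightarrow\; F(x) = \tfrac{1}{2}\ln\tfrac{0.51}{0.49} \approx 0.020 \\
p(x) = 0.9999 &\;\Rightarrow\; F(x) = \tfrac{1}{2}\ln 9999 \approx 4.605
\end{aligned}
```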

An alternative is to use different software, such as xgboost: https://github.com/dmlc/xgboost



Source: https://stackoverflow.com/questions/34208496/is-it-possible-to-obtain-class-probabilities-using-gradientboostedtrees-with-spa
