I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do not see an example of it anywhere in the documentation, nor is it a method of RandomForestModel.
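For the probability part of the question: if the DataFrame-based pyspark.ml API is an option (rather than mllib), its RandomForestClassifier attaches a per-class probability vector to every prediction. A minimal sketch, where the column names, numTrees, and the data DataFrame are assumptions about your setup rather than something from the question:

from pyspark.ml.classification import RandomForestClassifier

# assumes `data` is a DataFrame with a "features" vector column and a numeric "label" column
train, test = data.randomSplit([0.7, 0.3])

rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=20)
model = rf.fit(train)

# transform() appends rawPrediction, probability and prediction columns;
# "probability" holds the per-class probability vector for each row
model.transform(test).select("probability", "prediction").show(5, truncate=False)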
People have probably moved on from this post, but I hit the same problem today while trying to compute the accuracy of a multi-class classifier against a training set, so I thought I'd share my experience in case someone else is trying this with mllib.
The accuracy can be computed fairly easily as follows:
# say you have a test set against which you want to run your classifier
(trainingData, testData) = data.randomSplit([0.7, 0.3])

# run the trained mllib model over the test feature vectors
# (this assumes testData has a 'features' column of mllib Vectors)
predictions = model.predict(testData.rdd.map(lambda row: row.features)).collect()

# convert the Spark DataFrame holding the test data to pandas
ptd = testData.toPandas()

# now count how many labels match the predictions; the labels were shifted
# from 1-10 down to 0-9 beforehand, since mllib expects class labels
# in the range 0 .. numClasses-1
correct = ((ptd.label - 1) == predictions).sum()

# accuracy as a percentage of the test set size
m = ptd.shape[0]
print(100.0 * correct / m)
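If you'd rather let Spark do the counting, mllib's MulticlassMetrics can produce the same number from an RDD of (prediction, label) pairs. A rough sketch built on the variables above, assuming sc is your SparkContext (the .accuracy property needs Spark 2.0+; older versions expose precision() instead):

from pyspark.mllib.evaluation import MulticlassMetrics

# pair each prediction with its (already 0-based) true label as plain floats
pairs = [(float(p), float(l)) for p, l in zip(predictions, ptd.label - 1)]
metrics = MulticlassMetrics(sc.parallelize(pairs))
print(metrics.accuracy * 100)  # should match the manual percentage above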