PySpark & MLLib: Class Probabilities of Random Forest Predictions

忘掉有多难 2021-02-03 12:19

I'm trying to extract the class probabilities of a random forest object I have trained using PySpark. However, I do not see an example of it anywhere in the documentation, nor

4 Answers
  •  攒了一身酷
    2021-02-03 13:00

    People have probably moved on from this post, but I hit the same problem today while trying to compute the accuracy of a multi-class classifier against a training set. So I thought I'd share my experience in case someone else is trying this with MLlib ...

    The accuracy (the percentage of correct predictions, not the per-class probabilities) can be computed fairly easily as follows:

        # Say you have a test set against which you want to run your classifier
        (trainingset, testset) = data.randomSplit([0.7, 0.3])
        # I converted the Spark dataset containing the test data to pandas
        ptd = testset.toPandas()

        # `predictions` is not defined in this snippet; it is assumed to hold the
        # predicted class index for each row of ptd, in the same row order

        # Now count the labels that match the predictions.
        # The labels had to be shifted from 1-10 to 0-9, since predicted
        # labels take the values 0 .. numClasses-1
        correct = ((ptd.label - 1) == predictions).sum()

        m = ptd.shape[0]  # number of rows in the test set
        print((correct / m) * 100)  # accuracy as a percentage
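
For anyone who wants to check the arithmetic without a Spark session, here is a minimal sketch of the same accuracy calculation in plain Python. The function name `accuracy_percent`, the `label_offset` parameter, and the sample data are made up for illustration and are not part of PySpark or MLlib. It only assumes what the snippet above assumes: labels that start at 1 and predicted class indices that start at 0.

```python
def accuracy_percent(labels, predictions, label_offset=1):
    """Percentage of rows whose shifted label equals the predicted class index.

    label_offset is subtracted from each label before comparing, mirroring
    the `ptd.label - 1` shift used when labels run 1..N but predictions
    run 0..N-1.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty test set")
    # Count the rows where the shifted label matches the prediction
    correct = sum(1 for y, p in zip(labels, predictions) if y - label_offset == p)
    return 100.0 * correct / len(labels)


# Hypothetical data: labels are 1-based, predictions are 0-based class indices
labels = [1, 2, 3, 10, 5]
predictions = [0, 1, 4, 9, 4]
print(accuracy_percent(labels, predictions))  # 4 of 5 rows match -> 80.0
```

Note that this measures how often the predicted class is right; it says nothing about the per-class probabilities asked about in the question.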
    
