What does the score of the Spark MLlib SVM output mean?

刺人心 2021-01-15 09:33

I do not understand the output of the SVM classifier in Spark MLlib. I want to convert the score to a probability, so that I get a probability for a data point…

2 Answers
  • 2021-01-15 10:01
    import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
    import org.apache.spark.mllib.util.MLUtils
    
    // Load training data in LIBSVM format.
    val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    
    // Split data into training (60%) and test (40%).
    val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
    val training = splits(0).cache()
    val test = splits(1)
    
    // Run training algorithm to build the model
    val numIterations = 100
    val model = SVMWithSGD.train(training, numIterations)
    
    // Clear the default threshold (0.0) so that predict() returns the raw
    // margin instead of a 0/1 class label.
    model.clearThreshold()
    
    // Compute raw scores on the test set.
    val scoreAndLabels = test.map { point =>
      val score = model.predict(point.features)
      (score, point.label)
    }
    
    // Get evaluation metrics.
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    val auROC = metrics.areaUnderROC()
    
    println("Area under ROC = " + auROC)
    
    // Save and load model
    model.save(sc, "myModelPath")
    val sameModel = SVMModel.load(sc, "myModelPath")
    

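    If you are using the SVM module in MLlib, this is how the example evaluates the model: it clears the threshold so that predict() returns raw scores, then computes the AUC (area under the ROC curve) from them. Note that AUC is not the same as accuracy: AUC measures how well the scores rank positive points above negative ones across all thresholds, whereas accuracy counts correct labels at one fixed threshold. Neither of them turns a score into a probability. Hope it helps.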
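
    To make the AUC-versus-accuracy distinction concrete, here is a minimal plain-Scala sketch (no Spark; `AucSketch`, `auc` and `accuracy` are hypothetical names, not MLlib API). It computes AUC as the fraction of (positive, negative) pairs that the scores rank correctly, and accuracy at a fixed threshold of 0.0, which is the default threshold of MLlib's `SVMModel`. The scores are made up for illustration.

    ```scala
    // Sketch only: plain Scala, no Spark. Labels are 1.0 (positive) or 0.0 (negative).
    object AucSketch {
      // AUC as a ranking statistic: P(score of random positive > score of random negative).
      def auc(scoreAndLabels: Seq[(Double, Double)]): Double = {
        val pos = scoreAndLabels.filter(_._2 == 1.0).map(_._1)
        val neg = scoreAndLabels.filter(_._2 == 0.0).map(_._1)
        require(pos.nonEmpty && neg.nonEmpty, "need both classes")
        // Count correctly ordered pairs; ties count as half a pair.
        val wins = for (p <- pos; n <- neg) yield {
          if (p > n) 1.0 else if (p == n) 0.5 else 0.0
        }
        wins.sum / (pos.size.toDouble * neg.size)
      }

      // Accuracy at one fixed threshold: predict 1.0 when score > threshold.
      def accuracy(scoreAndLabels: Seq[(Double, Double)], threshold: Double = 0.0): Double = {
        val correct = scoreAndLabels.count { case (s, y) =>
          val predicted = if (s > threshold) 1.0 else 0.0
          predicted == y
        }
        correct.toDouble / scoreAndLabels.size
      }

      def main(args: Array[String]): Unit = {
        // Made-up scores: every positive outranks every negative, but all are above 0.
        val data = Seq((2.0, 1.0), (1.5, 1.0), (1.0, 0.0), (0.5, 0.0))
        println(s"AUC = ${auc(data)}")           // AUC = 1.0
        println(s"Accuracy = ${accuracy(data)}") // Accuracy = 0.5
      }
    }
    ```

    With these scores the ranking is perfect (AUC 1.0), yet every point lies above the 0.0 threshold, so accuracy is only 0.5.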

  • 2021-01-15 10:02

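    The value is the raw margin w·x + b, which is proportional to the signed distance from the point to the separating hyperplane (dividing it by ||w|| gives the actual distance). It is not a probability, and SVMs do not in general give you one. However, as the comments by @cfh note, you can learn a mapping from this margin to a probability, for example with Platt scaling (fitting a logistic function to margins on held-out data). But that is a separate calibration step, not part of the SVM itself.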
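
    A minimal plain-Scala sketch of what the margin is and one way to map it to a probability, assuming a linear SVM with weight vector `w` and intercept `b`. The object and method names are hypothetical, not MLlib API, and the data is made up. The calibration is a simplified Platt-style fit: a logistic function trained by plain batch gradient descent on (margin, label) pairs; Platt's original method additionally smooths the targets and uses a Newton-type optimizer.

    ```scala
    // Sketch only: plain Scala, no Spark. Names are illustrative, not MLlib API.
    object MarginToProbability {
      def dot(a: Array[Double], b: Array[Double]): Double =
        a.indices.map(i => a(i) * b(i)).sum

      // Raw score w·x + b: the kind of value predict() returns once the threshold is cleared.
      def margin(w: Array[Double], b: Double, x: Array[Double]): Double = dot(w, x) + b

      // Signed geometric distance to the hyperplane: the raw margin divided by ||w||.
      def signedDistance(w: Array[Double], b: Double, x: Array[Double]): Double =
        margin(w, b, x) / math.sqrt(dot(w, w))

      // Logistic mapping from a margin f to P(y = 1 | f).
      def prob(a: Double, c: Double, f: Double): Double =
        1.0 / (1.0 + math.exp(-(a * f + c)))

      // Simplified Platt-style fit: gradient descent on the mean log loss
      // over (margin, label) pairs with labels in {0.0, 1.0}.
      def fit(data: Seq[(Double, Double)], lr: Double = 0.1, iters: Int = 5000): (Double, Double) = {
        var a = 0.0
        var c = 0.0
        val n = data.size.toDouble
        for (_ <- 0 until iters) {
          var gradA = 0.0
          var gradC = 0.0
          for ((f, y) <- data) {
            val err = prob(a, c, f) - y // derivative of the log loss w.r.t. (a * f + c)
            gradA += err * f
            gradC += err
          }
          a -= lr * gradA / n
          c -= lr * gradC / n
        }
        (a, c)
      }

      def main(args: Array[String]): Unit = {
        val w = Array(3.0, 4.0)
        val b = 1.0
        val x = Array(1.0, 1.0)
        println(margin(w, b, x))         // 8.0
        println(signedDistance(w, b, x)) // 1.6

        // Made-up held-out margins with overlapping classes, so the fit stays finite.
        val heldOut = Seq((2.0, 1.0), (1.0, 1.0), (0.5, 0.0), (-0.5, 1.0), (-1.0, 0.0), (-2.0, 0.0))
        val (a, c) = fit(heldOut)
        println(f"P(y=1 | margin=1.5) = ${prob(a, c, 1.5)}%.3f")
      }
    }
    ```

    In a Spark setting you would presumably collect the (margin, label) pairs from `model.predict` on a held-out set after `clearThreshold()` and fit the mapping on those rather than on the training data, since margins on training points tend to be overconfident.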
