How to fix the false positives rate of a linear SVM?

盖世英雄少女心 2021-02-20 17:44

I am an SVM newbie, and this is my use case: I have a lot of unbalanced data that needs to be binary-classified using a linear SVM. I need to fix the false positive rate at certain values.

2 Answers
  •  醉梦人生
    2021-02-20 18:01

    The predict method of LinearSVC in sklearn looks like this:

    def predict(self, X):
        """Predict class labels for samples in X.
    
        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
            Samples.
    
        Returns
        -------
        C : array, shape = [n_samples]
            Predicted class label per sample.
        """
        scores = self.decision_function(X)
        if len(scores.shape) == 1:
            indices = (scores > 0).astype(np.int)
        else:
            indices = scores.argmax(axis=1)
        return self.classes_[indices]
    

    So, in addition to what mbatchkarov suggested, you can change the decisions made by the classifier (any classifier, really) by changing the boundary at which it assigns a sample to one class or the other. For example:

    from collections import Counter
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.svm import LinearSVC
    
    data = load_iris()
    
    # keep only the first feature to make the problem harder
    # keep only the first two classes to make it a binary problem
    X = data.data[:100, 0:1]
    y = data.target[:100]
    # shuffle the data
    indices = np.arange(y.shape[0])
    np.random.shuffle(indices)
    X = X[indices, :]
    y = y[indices]
    
    # train on the first half, evaluate on the second half
    clf = LinearSVC().fit(X[:50], y[:50])
    
    decision_boundary = 0
    print(Counter((clf.decision_function(X[50:]) > decision_boundary).astype(np.int8)))
    # Counter({1: 27, 0: 23})
    
    decision_boundary = 0.5
    print(Counter((clf.decision_function(X[50:]) > decision_boundary).astype(np.int8)))
    # Counter({0: 39, 1: 11})
    

    You can tune the decision boundary to whatever value fits your needs.
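
    If the goal is to fix the false positive rate at a specific value, one way to pick the boundary automatically is to sweep candidate thresholds with sklearn.metrics.roc_curve on held-out scores and keep the lowest threshold whose FPR stays at or below the target. Below is a minimal sketch of that idea (not part of the original answer); the target_fpr value and the 50/50 train/test split are just illustrative.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.metrics import roc_curve
    from sklearn.svm import LinearSVC
    
    data = load_iris()
    # same setup as above: first feature only, first two classes only
    X = data.data[:100, 0:1]
    y = data.target[:100]
    rng = np.random.RandomState(0)
    indices = rng.permutation(y.shape[0])
    X, y = X[indices, :], y[indices]
    
    clf = LinearSVC().fit(X[:50], y[:50])
    
    # scores on held-out data and the ROC curve over all candidate boundaries
    scores = clf.decision_function(X[50:])
    fpr, tpr, thresholds = roc_curve(y[50:], scores)
    
    # lowest boundary whose false positive rate is still at or below the target
    # (ideally measured on a separate validation set, not the final test set)
    target_fpr = 0.05  # illustrative target
    decision_boundary = thresholds[fpr <= target_fpr][-1]
    
    # roc_curve counts scores >= threshold as positive
    predictions = (scores >= decision_boundary).astype(np.int8)

    The same idea works with any classifier that exposes a continuous score (decision_function or predict_proba); only the threshold values change.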
