How to create/customize your own scorer function in scikit-learn?

太阳男子 2021-01-30 08:47

I am using Support Vector Regression as an estimator in GridSearchCV. But I want to change the error function: instead of using the default (R-squared: coefficient of determinat

2 Answers
  • 2021-01-30 09:30

    Jamie has a fleshed out example, but here's an example using make_scorer straight from scikit-learn documentation:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import make_scorer

    def my_custom_loss_func(ground_truth, predictions):
        diff = np.abs(ground_truth - predictions).max()
        return np.log(1 + diff)

    # make_scorer negates the return value of my_custom_loss_func when
    #  greater_is_better=False. The raw value is np.log(2), about 0.693,
    #  for the X and y defined below.
    loss  = make_scorer(my_custom_loss_func, greater_is_better=False)
    score = make_scorer(my_custom_loss_func, greater_is_better=True)
    X = np.array([[1], [1]])  # two samples, one feature each
    y = np.array([0, 1])      # true labels, one per sample
    clf = DummyClassifier(strategy='most_frequent', random_state=0)
    clf = clf.fit(X, y)
    loss(clf, X, y)   # about -0.693
    score(clf, X, y)  # about 0.693
    
    

    When defining a custom scorer via sklearn.metrics.make_scorer, the convention is that custom functions ending in _score return a value to maximize. And for scorers ending in _loss or _error, a value is returned to be minimized. You can use this functionality by setting the greater_is_better parameter inside make_scorer. That is, this parameter would be True for scorers where higher values are better, and False for scorers where lower values are better. GridSearchCV can then optimize in the appropriate direction.
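    As a quick check of the sign convention described above, here is a minimal sketch that wraps a built-in loss with greater_is_better=False and compares it with the raw metric. The synthetic data and the choice of LinearRegression are assumptions made only for illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer, make_scorer, mean_absolute_error

# Tiny synthetic regression problem (made up for illustration).
rng = np.random.RandomState(0)
X = rng.rand(50, 1)
y = 3.0 * X.ravel() + 0.1 * rng.randn(50)

model = LinearRegression().fit(X, y)

# Wrapping a loss: the scorer negates it so that "higher is better" holds.
mae_scorer = make_scorer(mean_absolute_error, greater_is_better=False)

raw_mae = mean_absolute_error(y, model.predict(X))            # plain loss, >= 0
wrapped = mae_scorer(model, X, y)                             # negated loss
builtin = get_scorer("neg_mean_absolute_error")(model, X, y)  # predefined scorer

print(raw_mae, wrapped, builtin)
```

    The wrapped scorer returns the negated loss, which is the same value the predefined "neg_mean_absolute_error" scorer gives. If a built-in metric already matches what you need, passing its name as the scoring argument avoids writing a custom function at all.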

    You can then convert your function into a scorer as follows:

    import numpy as np
    from sklearn.metrics import make_scorer
    
    # make_scorer calls this as func(y_true, y_pred), so despite the names,
    # X_train_scaled receives the true targets and Y_train_scaled the predictions.
    def custom_loss_func(X_train_scaled, Y_train_scaled):
        error, M = 0, 0
        for i in range(0, len(Y_train_scaled)):
            z = (Y_train_scaled[i] - M)
            # Assumption: samples matching none of the cases below contribute 0.
            error_i = 0
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
                error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
                error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
            if X_train_scaled[i] > M and Y_train_scaled[i] < M:
                error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
            # Accumulate inside the loop so every sample is counted.
            error += error_i
        return error
    
    
    custom_scorer = make_scorer(custom_loss_func, greater_is_better=True)
    

    Then pass custom_scorer to GridSearchCV as you would any other scoring function, alongside your estimator and parameter grid: clf = GridSearchCV(estimator, param_grid, scoring=custom_scorer).
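    For a concrete picture of that last step, the following is a minimal sketch of a grid search over an SVR with a loss-style scorer. The helper name max_abs_log_loss, the synthetic data, and the parameter grid are all assumptions chosen for illustration, and the loss itself is the one from the documentation example, not the asker's function.

```python
import numpy as np
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def max_abs_log_loss(y_true, y_pred):
    """Hypothetical loss: log of one plus the largest absolute error."""
    diff = np.abs(np.asarray(y_true) - np.asarray(y_pred)).max()
    return np.log1p(diff)

# Synthetic data, made up for this sketch.
rng = np.random.RandomState(1)
X = 5 * rng.rand(200, 1)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# Lower loss is better, so ask make_scorer to negate it.
loss_scorer = make_scorer(max_abs_log_loss, greater_is_better=False)

search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0]},
    scoring=loss_scorer,
    cv=5,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)  # negated loss, so zero or below
```

    Because the scorer negates the loss, best_score_ is reported as a non-positive number, and the parameter combination closest to zero is the one selected.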

  • 2021-01-30 09:31

    As you saw, this is done by using make_scorer (docs).

    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import make_scorer
    from sklearn.svm import SVR
    
    import numpy as np
    
    rng = np.random.RandomState(1)
    
    # make_scorer calls this as func(y_true, y_pred), so despite the names,
    # X_train_scaled receives the true targets and Y_train_scaled the predictions.
    def my_custom_loss_func(X_train_scaled, Y_train_scaled):
        error, M = 0, 0
        for i in range(0, len(Y_train_scaled)):
            z = (Y_train_scaled[i] - M)
            # Assumption: samples matching none of the cases below contribute 0.
            error_i = 0
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) > 0:
                error_i = (abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z))
            if X_train_scaled[i] > M and Y_train_scaled[i] > M and (X_train_scaled[i] - Y_train_scaled[i]) < 0:
                error_i = -(abs((Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(z)))
            if X_train_scaled[i] > M and Y_train_scaled[i] < M:
                error_i = -(abs(Y_train_scaled[i] - X_train_scaled[i]))**(2*np.exp(-z))
            # Accumulate inside the loop so every sample is counted.
            error += error_i
        return error
    
    # Generate sample data
    X = 5 * rng.rand(10000, 1)
    y = np.sin(X).ravel()
    
    # Add noise to targets
    y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))
    
    train_size = 100
    
    my_scorer = make_scorer(my_custom_loss_func, greater_is_better=True)
    
    svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1),
                       scoring=my_scorer,
                       cv=5,
                       param_grid={"C": [1e0, 1e1, 1e2, 1e3],
                                   "gamma": np.logspace(-2, 2, 5)})
    
    svr.fit(X[:train_size], y[:train_size])
    
    print(svr.best_params_)
    print(svr.score(X[train_size:], y[train_size:]))
    