How to graph grid scores from GridSearchCV?

旧时难觅i 2021-01-30 03:19

I am looking for a way to graph grid_scores_ from GridSearchCV in sklearn. In this example I am trying to grid search for best gamma and C parameters for an SVR algorithm. My c

10 answers
  • 2021-01-30 04:13

    The order in which the parameter grid is traversed is deterministic, so the scores can be reshaped and plotted directly. Something like this:

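    import numpy as np
    import matplotlib.pyplot as plt
    
    # grid, C_range and gamma_range come from the question's setup. grid_scores_ was removed in
    # scikit-learn 0.20; on newer versions use scores = grid.cv_results_['mean_test_score'] instead.
    scores = [entry.mean_validation_score for entry in grid.grid_scores_]
    # parameter names are traversed in sorted order, last name fastest: rows are C, columns are gamma
    scores = np.array(scores).reshape(len(C_range), len(gamma_range))
    for c_scores in scores:
        plt.plot(gamma_range, c_scores, '-')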
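
    If relying on traversal order feels fragile, the scores can be arranged by parameter value instead. The sketch below is a minimal pure-Python illustration; `pivot_scores` is a hypothetical helper (not part of scikit-learn), and the lists are hand-made stand-ins for `cv_results_["params"]` and `cv_results_["mean_test_score"]`. It assumes the parameter values are sortable.

```python
def pivot_scores(params, scores, row_key, col_key):
    """Arrange scores in a rows x cols table keyed by two parameter names."""
    rows = sorted({p[row_key] for p in params})
    cols = sorted({p[col_key] for p in params})
    # Look scores up by (row value, column value), so input order is irrelevant.
    lookup = {(p[row_key], p[col_key]): s for p, s in zip(params, scores)}
    table = [[lookup[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Made-up data, deliberately listed out of order.
params = [
    {"C": 10, "gamma": 0.01},
    {"C": 1, "gamma": 0.1},
    {"C": 1, "gamma": 0.01},
    {"C": 10, "gamma": 0.1},
]
scores = [0.80, 0.70, 0.60, 0.90]

rows, cols, table = pivot_scores(params, scores, "C", "gamma")
```

    Each row of `table` can then be drawn with `plt.plot(cols, row, label=...)`, one line per value of `C`.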
    
  • 2021-01-30 04:13

    I used grid search on xgboost with different learning rates, max depths and number of estimators.

    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBRegressor
    
    # X_train and y_train are assumed to be defined already.
    gs_param_grid = {'max_depth': [3, 4, 5],
                     'n_estimators': list(range(3000, 5000, 250)),
                     'learning_rate': [0.01, 0.03, 0.1]
                    }
    gbm = XGBRegressor()
    grid_gbm = GridSearchCV(estimator=gbm,
                            param_grid=gs_param_grid,
                            scoring='neg_mean_squared_error',
                            cv=4,
                            verbose=1
                           )
    grid_gbm.fit(X_train, y_train)
    

    To create the graph for error vs number of estimators with different learning rates, I used the following approach:

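    import numpy as np
    import matplotlib.pyplot as plt
    
    y = []
    cvres = grid_gbm.cv_results_
    best_md = grid_gbm.best_params_['max_depth']
    la = gs_param_grid['learning_rate']
    n_estimators = gs_param_grid['n_estimators']
    
    # Keep only the runs at the best max_depth. Scores are negative MSE,
    # so negate them before taking the square root.
    for mean_score, params in zip(cvres["mean_test_score"], cvres["params"]):
        if params["max_depth"] == best_md:
            y.append(np.sqrt(-mean_score))
    
    # 'learning_rate' sorts before 'n_estimators', so it is the slower-varying (row) axis.
    y = np.array(y).reshape(len(la), len(n_estimators))
    
    # In a Jupyter notebook, run %matplotlib inline before plotting.
    plt.figure(figsize=(8, 8))
    for y_arr, label in zip(y, la):
        plt.plot(n_estimators, y_arr, label=label)
    
    plt.title('Error for different learning rates (max_depth=%d, the best value)' % best_md)
    plt.legend()
    plt.xlabel('n_estimators')
    plt.ylabel('Error')
    plt.show()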
    

    The plot can be viewed here: Result

    Note that the same kind of graph can be created for error vs. number of estimators at different max depths (or for any other pair of parameters, depending on your case).
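
    One way to generalise the filtering step to any pair of parameters is sketched below. It uses only the standard library; `curves_for` and `rmse_from_neg_mse` are hypothetical helper names, and the numbers are made up to stand in for `cv_results_["params"]` and `cv_results_["mean_test_score"]`, not real results.

```python
import math
from collections import defaultdict


def curves_for(params, scores, x_key, line_key, fixed):
    """Group scores into {line value: [(x value, score), ...]} for runs matching `fixed`."""
    curves = defaultdict(list)
    for p, s in zip(params, scores):
        if all(p[k] == v for k, v in fixed.items()):
            curves[p[line_key]].append((p[x_key], s))
    # Sort each curve by x so it can be plotted left to right.
    return {label: sorted(points) for label, points in curves.items()}


def rmse_from_neg_mse(score):
    """Turn a 'neg_mean_squared_error' score into a root-mean-squared error."""
    return math.sqrt(-score)


# Made-up grid: 2 learning rates x 2 depths x 2 estimator counts.
params = [
    {"learning_rate": lr, "max_depth": md, "n_estimators": n}
    for lr in (0.01, 0.1)
    for md in (3, 4)
    for n in (100, 200)
]
scores = [-16.0, -9.0, -25.0, -4.0, -9.0, -4.0, -1.0, -1.0]

curves = curves_for(
    params,
    [rmse_from_neg_mse(s) for s in scores],
    x_key="n_estimators",
    line_key="max_depth",
    fixed={"learning_rate": 0.1},
)
```

    Each entry of `curves` is one line on the plot: unzip the `(x, y)` pairs and pass them to `plt.plot` with the key as the label.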

  • 2021-01-30 04:15
    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in 0.20
    from sklearn import datasets
    import matplotlib.pyplot as plt
    import numpy as np
    
    digits = datasets.load_digits()
    X = digits.data
    y = digits.target
    
    clf_ = SVC(kernel='rbf')
    Cs = [1, 10, 100, 1000]
    Gammas = [1e-3, 1e-4]
    clf = GridSearchCV(clf_,
                       dict(C=Cs, gamma=Gammas),
                       cv=2,
                       pre_dispatch='1*n_jobs',
                       n_jobs=1)
    
    clf.fit(X, y)
    
    # grid_scores_ was removed in 0.20; cv_results_ holds the mean score per combination.
    scores = clf.cv_results_['mean_test_score']
    # Rows follow C and columns follow gamma (see the ordering note below).
    scores = np.array(scores).reshape(len(Cs), len(Gammas))
    
    for ind, i in enumerate(Cs):
        plt.plot(Gammas, scores[ind], label='C: ' + str(i))
    plt.legend()
    plt.xlabel('Gamma')
    plt.ylabel('Mean score')
    plt.show()
    
    • Code is based on this.
    • The only puzzling part: will sklearn always respect the order of C and gamma? The official example relies on this ordering.

    Output:
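
    On the ordering question above: as far as I know, scikit-learn's ParameterGrid sorts the parameter names and then iterates over the Cartesian product with the last name varying fastest. The sketch below imitates that behaviour with `itertools.product`; it is not scikit-learn's code, and `grid_order` is a hypothetical name. Note that 'C' sorts before 'gamma' because uppercase letters come before lowercase ones.

```python
from itertools import product


def grid_order(param_grid):
    """List parameter combinations in sorted-key, last-key-fastest order."""
    keys = sorted(param_grid)
    value_lists = [param_grid[k] for k in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# The dict is written with gamma first on purpose; the sort puts C first anyway.
order = grid_order({"gamma": [1e-3, 1e-4], "C": [1, 10]})
for combo in order:
    print(combo)
```

    Under that assumption, reshaping to `(len(Cs), len(Gammas))` puts one C value per row. To avoid depending on it, compare against `clf.cv_results_['params']`, which lists the actual combination for each score.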

  • 2021-01-30 04:16

    To plot the results when tuning several hyperparameters, I fixed every parameter except one at its best value and plotted the mean score against each value of the remaining parameter.

    import numpy as np
    import matplotlib.pyplot as plt
    
    def plot_search_results(grid):
        """
        Params: 
            grid: A fitted GridSearchCV object, created with return_train_score=True
                  and a single dict as param_grid.
        """
        ## Results from grid search
        results = grid.cv_results_
        means_test = results['mean_test_score']
        stds_test = results['std_test_score']
        means_train = results['mean_train_score']
        stds_train = results['std_train_score']
    
        ## Boolean mask per hyper-parameter: True where it equals its best value
        masks = []
        masks_names = list(grid.best_params_.keys())
        for p_k, p_v in grid.best_params_.items():
            masks.append(list(results['param_' + p_k].data == p_v))
    
        params = grid.param_grid
    
        ## Plotting results (squeeze=False keeps ax indexable even with one parameter)
        fig, ax = plt.subplots(1, len(params), sharex='none', sharey='all', figsize=(20, 5), squeeze=False)
        ax = ax.ravel()
        fig.suptitle('Score per parameter')
        fig.text(0.04, 0.5, 'MEAN SCORE', va='center', rotation='vertical')
        for i, p in enumerate(masks_names):
            # Select the runs where every other parameter is at its best value
            others = masks[:i] + masks[i+1:]
            best_parms_mask = np.all(others, axis=0) if others else np.ones(len(means_test), dtype=bool)
            best_index = np.where(best_parms_mask)[0]
            x = np.array(params[p])
            y_1 = np.array(means_test[best_index])
            e_1 = np.array(stds_test[best_index])
            y_2 = np.array(means_train[best_index])
            e_2 = np.array(stds_train[best_index])
            ax[i].errorbar(x, y_1, e_1, linestyle='--', marker='o', label='test')
            ax[i].errorbar(x, y_2, e_2, linestyle='-', marker='^', label='train')
            ax[i].set_xlabel(p.upper())
    
        plt.legend()
        plt.show()
    

    Result
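
    The selection step in `plot_search_results` (hold every parameter but one at its best value) can be checked on a small example without NumPy or a fitted search. This is only a sketch: `one_at_a_time` is a hypothetical helper, and the data below is made up to stand in for `cv_results_["params"]`, `cv_results_["mean_test_score"]` and `best_params_`.

```python
def one_at_a_time(params, scores, best):
    """For each parameter, return (value, score) pairs with all others held at `best`."""
    slices = {}
    for name in best:
        others = {k: v for k, v in best.items() if k != name}
        slices[name] = [
            (p[name], s)
            for p, s in zip(params, scores)
            if all(p[k] == v for k, v in others.items())
        ]
    return slices


# Made-up results for a 2 x 2 grid.
params = [
    {"C": 1, "gamma": 0.01},
    {"C": 1, "gamma": 0.1},
    {"C": 10, "gamma": 0.01},
    {"C": 10, "gamma": 0.1},
]
scores = [0.60, 0.70, 0.80, 0.90]
best = {"C": 10, "gamma": 0.1}  # plays the role of grid.best_params_

slices = one_at_a_time(params, scores, best)
```

    Each entry of `slices` corresponds to one subplot: the x values are the candidate settings of that parameter and the y values are the matching mean scores.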
