Alternate different models in Pipeline for GridSearchCV

隐瞒了意图╮  2021-02-09 03:11

I want to build a Pipeline in sklearn and test different models using GridSearchCV.

Just an example (please do not pay attention to which particular models are chosen):
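Something along these lines (a minimal sketch of what I have in mind; SVC is an arbitrary placeholder, and the grid uses the usual step__parameter naming convention):

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

# a pipeline whose final step ('clf') is the estimator I would like to swap out
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('clf', SVC())
])

# pipeline parameters are addressed as <step name>__<parameter name>
param_grid = {
    'clf__C': [0.1, 1, 10],
    'clf__kernel': ['linear', 'rbf']
}

grid = GridSearchCV(pipe, param_grid, cv=5)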

2 Answers
  •  栀梦 (OP)  2021-02-09 03:18

    An alternative solution that does not require prefixing the estimator names in the parameter grid is the following:

    import numpy as np
    
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.linear_model import LogisticRegression
    
    # the models that you want to compare
    models = {
        'RandomForestClassifier': RandomForestClassifier(),
        'KNeighboursClassifier': KNeighborsClassifier(),
        'LogisticRegression': LogisticRegression()
    }
    
    # the optimisation parameters for each of the above models
    params = {
        'RandomForestClassifier': {
            'n_estimators': [100, 200, 500, 1000],
            'max_features': ['auto', 'sqrt', 'log2'],
            'bootstrap': [True],
            'criterion': ['gini', 'entropy'],
            'oob_score': [True, False]
        },
        'KNeighboursClassifier': {
            'n_neighbors': np.arange(3, 15),
            'weights': ['uniform', 'distance'],
            'algorithm': ['ball_tree', 'kd_tree', 'brute']
        },
        'LogisticRegression': {
            'solver': ['newton-cg', 'sag', 'lbfgs'],
            'multi_class': ['ovr', 'multinomial']
        }
    }
    

    and you can define:

    from sklearn.model_selection import GridSearchCV
    
    def fit(train_features, train_actuals):
        """
        Fits each model to the training data, obtaining in each case
        the best parameter combination via GridSearchCV cross-validation.
        """
        for name, est in models.items():
            est_params = params[name]
            gscv = GridSearchCV(estimator=est, param_grid=est_params, cv=5)
            gscv.fit(train_features, train_actuals)
            print("{} best parameters: {}".format(name, gscv.best_params_))
    

    This basically iterates through the different models, each one looking up its own set of optimisation parameters in the dictionary. Of course, do not forget to pass the models and params dictionaries to the fit function in case you do not have them as global variables. Have a look at this GitHub project for a more complete overview.
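    For example (a quick sketch, assuming the models and params dictionaries above are in scope; the iris data is used purely as a stand-in for your own features and labels):

    from sklearn.datasets import load_iris
    
    X, y = load_iris(return_X_y=True)  # stand-in feature matrix and labels
    fit(X, y)                          # runs one grid search per model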
