GridSearch over MultiOutputRegressor?

小鲜肉 2021-02-01 06:31

Let's consider a multivariate regression problem (2 response variables: Latitude and Longitude). Currently, a few machine learning model implementations like Support Vector Reg

3 answers
  • 2021-02-01 06:52

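    I just found a working solution. With nested estimators, the parameters of the inner estimator are addressed as estimator__<parameter>. Inside a pipeline the step name goes in front as well, so the key for the SVR's C in the step named 'reg' is reg__estimator__C.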

    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.svm import SVR
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    
    pipe_svr = Pipeline([('scl', StandardScaler()),
            ('reg', MultiOutputRegressor(SVR()))])
    
    grid_param_svr = {
        'reg__estimator__C': [0.1,1,10]
    }
    
    gs_svr = (GridSearchCV(estimator=pipe_svr, 
                          param_grid=grid_param_svr, 
                          cv=2,
                          scoring = 'neg_mean_squared_error',
                          n_jobs = -1))
    
    # X_train, y_train: training features and the two-column (Latitude, Longitude) target
    gs_svr = gs_svr.fit(X_train, y_train)
    gs_svr.best_estimator_

    # Output:
    Pipeline(steps=[('scl', StandardScaler(copy=True, with_mean=True, with_std=True)), 
    ('reg', MultiOutputRegressor(estimator=SVR(C=10, cache_size=200,
     coef0=0.0, degree=3, epsilon=0.1, gamma='auto', kernel='rbf', max_iter=-1,    
     shrinking=True, tol=0.001, verbose=False), n_jobs=1))])
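
To find the exact key names for a nested estimator, one option is to list them with get_params(). The following is a minimal sketch that assumes the same pipeline layout as above (steps named 'scl' and 'reg'); the variable names pipe and nested_keys are only illustrative.

```python
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Same layout as the pipeline above: a scaler followed by a wrapped SVR.
pipe = Pipeline([("scl", StandardScaler()),
                 ("reg", MultiOutputRegressor(SVR()))])

# get_params() returns every tunable key, including the nested ones.
# Keys of the inner SVR start with "<step name>__estimator__".
nested_keys = sorted(k for k in pipe.get_params()
                     if k.startswith("reg__estimator__"))
print(nested_keys)
```

Any key printed here can be used directly in param_grid.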
    
  • 2021-02-01 06:55

    Thank you, Marco.

    Adding to your answer, here is a short illustrative example of a randomized search applied to a multi-output GradientBoostingRegressor.

    from sklearn.datasets import load_linnerud
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.multioutput import MultiOutputRegressor
    from sklearn.model_selection import RandomizedSearchCV
    
    x, y = load_linnerud(return_X_y=True)
    
    # Note: scikit-learn 1.0 renamed loss 'ls'/'lad' to 'squared_error'/'absolute_error' and
    # criterion 'mse' to 'squared_error', and removed min_impurity_split; the new names are used here.
    model = MultiOutputRegressor(GradientBoostingRegressor(loss='squared_error', learning_rate=0.1,
                                                           n_estimators=100, subsample=1.0,
                                                           criterion='friedman_mse', min_samples_split=2,
                                                           min_samples_leaf=1,
                                                           min_weight_fraction_leaf=0.0, max_depth=3,
                                                           min_impurity_decrease=0.0,
                                                           init=None, random_state=None,
                                                           max_features=None,
                                                           alpha=0.9, verbose=0, max_leaf_nodes=None, warm_start=False,
                                                           validation_fraction=0.1, n_iter_no_change=None, tol=0.0001,
                                                           ccp_alpha=0.0))
    
    # Every key carries the estimator__ prefix so that it reaches the wrapped regressor.
    hyperparameters = dict(estimator__learning_rate=[0.05, 0.1, 0.2, 0.5, 0.9],
                         estimator__loss=['squared_error', 'absolute_error', 'huber'],
                         estimator__n_estimators=[20, 50, 100, 200, 300, 500, 700, 1000],
                         estimator__criterion=['friedman_mse', 'squared_error'], estimator__min_samples_split=[2, 4, 7, 10],
                         estimator__max_depth=[3, 5, 10, 15, 20, 30], estimator__min_samples_leaf=[1, 2, 3, 5, 8, 10],
                         estimator__min_impurity_decrease=[0, 0.2, 0.4, 0.6, 0.8],
                         estimator__max_leaf_nodes=[5, 10, 20, 30, 50, 100, 300])
    
    randomized_search = RandomizedSearchCV(model, hyperparameters, random_state=0, n_iter=5, scoring=None,
                                           n_jobs=2, refit=True, cv=5, verbose=True,
                                           pre_dispatch='2*n_jobs', error_score='raise', return_train_score=True)
    
    hyperparameters_tuning = randomized_search.fit(x, y)
    print('Best Parameters = {}'.format(hyperparameters_tuning.best_params_))
    
    tuned_model = hyperparameters_tuning.best_estimator_
    
    print(tuned_model.predict(x))
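
One thing worth keeping in mind with this approach: MultiOutputRegressor fits a separate copy of the base regressor for each target, but the search selects a single hyperparameter set that all copies share. The sketch below illustrates this on the same Linnerud data with a deliberately tiny grid; the grid values and the names search, best and depths are assumptions chosen only to keep the example fast.

```python
from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputRegressor

X, y = load_linnerud(return_X_y=True)  # 20 samples, 3 target columns

search = GridSearchCV(
    MultiOutputRegressor(GradientBoostingRegressor(n_estimators=20, random_state=0)),
    param_grid={"estimator__max_depth": [1, 2, 3]},  # illustrative grid
    cv=5,
)
search.fit(X, y)

best = search.best_estimator_
# estimators_ holds one fitted regressor per target column;
# all of them were built with the single selected max_depth.
depths = [est.max_depth for est in best.estimators_]
print(len(best.estimators_), depths, search.best_params_)
```

If each target needs its own hyperparameters, a separate search per target column would be required instead.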
    
  • 2021-02-01 07:07

    Without a pipeline, prefix each parameter name of the wrapped regressor with estimator__:

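    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import GridSearchCV
    from sklearn.multioutput import MultiOutputRegressor
    
    param_grid = {'estimator__min_samples_split': [10, 50],
                  'estimator__min_samples_leaf': [50, 150]}
    
    gb = GradientBoostingRegressor()
    gs = GridSearchCV(MultiOutputRegressor(gb), param_grid=param_grid)
    
    # X: feature matrix, y: 2-D target array with one column per output
    gs.fit(X, y)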
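
To see why the prefix is needed, here is a small sketch using set_params(), the same mechanism the search relies on to apply each candidate. The names model and unprefixed_ok are illustrative only.

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

model = MultiOutputRegressor(GradientBoostingRegressor())

# With the prefix, the value is forwarded to the wrapped regressor.
model.set_params(estimator__min_samples_leaf=50)
print(model.estimator.min_samples_leaf)

# Without the prefix, the key is looked up on the wrapper itself,
# which has no such parameter, so scikit-learn raises ValueError.
try:
    model.set_params(min_samples_leaf=50)
    unprefixed_ok = True
except ValueError as exc:
    unprefixed_ok = False
    print("rejected:", exc)
```

The same error is what a grid with unprefixed keys runs into during fitting.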
    