Return coefficients from a Pipeline object in sklearn

I've fit a Pipeline object with RandomizedSearchCV:

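import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe_sgd = Pipeline([('scl', StandardScaler()),
                    ('clf', SGDClassifier(n_jobs=-1))])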

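# Note: newer scikit-learn releases spell these 'log_loss' (for loss) and
# 'clf__max_iter' (for the iteration count); the names below are the older ones.
param_dist_sgd = {'clf__loss': ['log'],
                 'clf__penalty': [None, 'l1', 'l2', 'elasticnet'],
                 'clf__alpha': np.linspace(0.15, 0.35),
                 'clf__n_iter': [3, 5, 7]}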

sgd_randomized_pipe = RandomizedSearchCV(estimator=pipe_sgd,
                                         param_distributions=param_dist_sgd,
                                         cv=3, n_iter=30, n_jobs=-1)

sgd_randomized_pipe.fit(X_train, y_train)

I want to access the coef_ attribute of best_estimator_, but I'm unable to do that. I've tried accessing coef_ with the code below:

sgd_randomized_pipe.best_estimator_.coef_

However, I get the following AttributeError:

AttributeError: 'Pipeline' object has no attribute 'coef_'

The scikit-learn docs say that coef_ is an attribute of SGDClassifier, which is the class of the final step in my best_estimator_.

What am I doing wrong?

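best_estimator_ is the whole fitted Pipeline, not the SGDClassifier inside it, so coef_ lives on the 'clf' step rather than on the pipeline itself. You can always reach the individual steps by the names you assigned when building the pipeline, using the named_steps dict: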

scaler = sgd_randomized_pipe.best_estimator_.named_steps['scl']
classifier = sgd_randomized_pipe.best_estimator_.named_steps['clf']

You can then access all the attributes, such as coef_ and intercept_, that are available on the corresponding fitted estimator.
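As a minimal end-to-end sketch of that lookup: the synthetic data from make_classification, the variable names, and the reduced parameter grid below are assumptions for illustration, not the asker's actual setup.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy binary classification data (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', SGDClassifier(max_iter=1000, tol=1e-3, random_state=0))])

search = RandomizedSearchCV(
    pipe,
    param_distributions={'clf__alpha': np.linspace(0.15, 0.35, 10)},
    n_iter=5, cv=3, random_state=0)
search.fit(X, y)

# best_estimator_ is the refit Pipeline; its steps are reachable by name.
best_pipe = search.best_estimator_
scaler = best_pipe.named_steps['scl']
classifier = best_pipe.named_steps['clf']

print(classifier.coef_.shape)       # weights of the fitted SGDClassifier
print(classifier.intercept_.shape)  # its intercept term(s)
print(scaler.mean_.shape)           # per-feature means learned by the scaler
```

Note that the coefficients are expressed in the scaled feature space, since the classifier only ever sees the output of the 'scl' step.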

This is the formal attribute exposed by the Pipeline as specified in the documentation:

named_steps : dict

Read-only attribute to access any step parameter by user given name. Keys are step names and values are steps parameters.

I've found that one way to do this is chained indexing with the steps attribute:

sgd_randomized_pipe.best_estimator_.steps[1][1].coef_

Is this best practice, or is there another way?

I think this should work:

sgd_randomized_pipe.best_estimator_.named_steps['clf'].coef_
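For comparison, here is a small sketch suggesting that positional and name-based lookups on a fitted Pipeline reach the same estimator object, so the choice is mostly about readability. The toy data and variable names are assumptions, and the pipe['clf'] form assumes a scikit-learn version in which Pipeline supports indexing (0.21 or later, as far as I know).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative only).
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

pipe = Pipeline([('scl', StandardScaler()),
                 ('clf', SGDClassifier(max_iter=1000, tol=1e-3, random_state=0))])
pipe.fit(X, y)

by_position = pipe.steps[1][1]      # chained indexing: (name, estimator) tuple, then estimator
by_last = pipe.steps[-1][1]         # the final step, regardless of pipeline length
by_name = pipe.named_steps['clf']   # lookup by the name given at construction
by_item = pipe['clf']               # assumes Pipeline indexing is available

# All four expressions refer to the very same fitted SGDClassifier.
same = (by_position is by_name) and (by_last is by_name) and (by_item is by_name)
print(same)
```

Name-based access keeps working if a step is later inserted before the classifier, whereas steps[1][1] would then silently point at a different step.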

Source: https://stackoverflow.com/questions/43856280/return-coefficients-from-pipeline-object-in-sklearn

Tags: python, scikit-learn, pipeline, cross-validation