sklearn - How to retrieve PCA components and explained variance from inside a Pipeline passed to GridSearchCV

Question

I am using GridSearchCV with a pipeline as follows:

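from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

grid = GridSearchCV(
    Pipeline([
        ('reduce_dim', PCA()),
        ('classify', RandomForestClassifier(n_jobs=-1))
    ]),
    param_grid=[
        {
            # range() only accepts integers, so the variance fractions are listed explicitly
            'reduce_dim__n_components': [0.7, 0.8],
            'classify__n_estimators': range(10, 50, 5),
            # 'auto' was an alias for 'sqrt' here and is gone from recent scikit-learn releases
            'classify__max_features': ['sqrt', 0.2],
            'classify__min_samples_leaf': [40, 50, 60],
            'classify__criterion': ['gini', 'entropy']
        }
    ],
    cv=5, scoring='f1')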

grid.fit(X,y)

How do I now retrieve PCA details like components and explained_variance from the grid.best_estimator_ model?

Furthermore, I also want to save the best_estimator_ to a file using pickle and later load it. How do I retrieve the PCA details from this loaded estimator? I suspect it will be the same as above.

Answer 1:

grid.best_estimator_ gives you the pipeline refitted with the best parameters found by the search.

Use the pipeline's named_steps attribute to access its internal estimators by the names you gave them.

So grid.best_estimator_.named_steps['reduce_dim'] will give you the fitted PCA object. You can then read the components_ and explained_variance_ attributes from that object like this:

grid.best_estimator_.named_steps['reduce_dim'].components_
grid.best_estimator_.named_steps['reduce_dim'].explained_variance_
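As a rough end-to-end sketch of the above, including the pickle part of the question: the snippet below fits a deliberately tiny grid, reads the PCA attributes from the best pipeline, then saves and reloads that pipeline and reads them again. The synthetic data from make_classification, the reduced parameter grid, and the file name best_pipeline.pkl are assumptions made only to keep the example self-contained; they are not from the original question.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic binary data standing in for the question's X and y (assumption).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ('reduce_dim', PCA(svd_solver='full')),
    ('classify', RandomForestClassifier(n_estimators=10, random_state=0)),
])

# Much smaller grid than in the question, so the sketch runs quickly.
grid = GridSearchCV(
    pipe,
    param_grid={'reduce_dim__n_components': [0.7, 0.8]},
    cv=3,
    scoring='f1',
)
grid.fit(X, y)

# The fitted PCA step of the best pipeline.
pca = grid.best_estimator_.named_steps['reduce_dim']
print(pca.components_.shape)          # (n_components_kept, n_features)
print(pca.explained_variance_)        # variance captured by each component
print(pca.explained_variance_ratio_)  # same, as a fraction of total variance

# Save the best pipeline to a file and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'best_pipeline.pkl')  # hypothetical file name
    with open(path, 'wb') as f:
        pickle.dump(grid.best_estimator_, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# The loaded object is an ordinary fitted Pipeline, so access is identical.
restored_pca = restored.named_steps['reduce_dim']
print(np.allclose(restored_pca.components_, pca.components_))
```

The loaded pipeline exposes the same named_steps, so the suspicion in the question holds: the PCA details are retrieved in the same way after unpickling. Pickles are generally only safe to load with the same scikit-learn version that wrote them.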

Source: https://stackoverflow.com/questions/46800147/sklearn-how-to-retrieve-pca-components-and-explained-variance-from-inside-a-pi

Tags

python

scikit-learn

pipeline

grid-search
