Invalid parameter clf for estimator Pipeline in sklearn

半城伤御伤魂 submitted on 2019-12-06 15:33:06

You are using make_pipeline incorrectly. From the documentation:

This is a shorthand for the Pipeline constructor; it does not require, and does not permit, naming the estimators. Instead, their names will be set to the lowercase of their types automatically.

So when you pass a PCA object, its step is named 'pca' (lowercase), and when you pass a RandomForestClassifier object, its step is named 'randomforestclassifier', not 'clf' as you expected.

Your parameter grid is therefore invalid: its keys start with clf__, but the pipeline has no step named 'clf'.
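To see which names a pipeline actually accepts, you can list its steps and its parameter keys. This is a minimal sketch; the estimators are created with default arguments here as an assumption, since the original pca and clf definitions are not shown.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline

# Assumed stand-ins for the pca and clf objects from the question.
pca_clf = make_pipeline(PCA(), RandomForestClassifier())

# Step names are generated from the lowercased class names.
step_names = [name for name, _ in pca_clf.steps]
print(step_names)  # ['pca', 'randomforestclassifier']

# Every key usable in a parameter grid is listed by get_params().
valid_keys = sorted(pca_clf.get_params().keys())
print("clf__n_estimators" in valid_keys)
print("randomforestclassifier__n_estimators" in valid_keys)
```

Any key in the grid that does not appear in get_params() triggers the "Invalid parameter" error.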

Solution 1:

Replace this line:

pca_clf = make_pipeline(pca, clf) 

with:

from sklearn.pipeline import Pipeline
pca_clf = Pipeline([('pca', pca), ('clf', clf)])
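As a rough end-to-end check of Solution 1, the sketch below runs a grid search with clf__ keys against an explicitly named pipeline. The iris dataset, the reduced grid, n_components=2, cv=3 and random_state=0 are all assumptions chosen to keep the example small; they are not from the original question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Small built-in dataset, used only for illustration.
X, y = load_iris(return_X_y=True)

# Naming the step 'clf' explicitly makes the clf__ prefix valid.
pca_clf = Pipeline([
    ("pca", PCA(n_components=2)),
    ("clf", RandomForestClassifier(random_state=0)),
])

# Reduced grid (assumed values) so the search finishes quickly.
parameters = {
    "clf__n_estimators": [4, 6],
    "clf__max_depth": [2, 3],
}

grid_RF = GridSearchCV(pca_clf, param_grid=parameters, cv=3)
grid_RF.fit(X, y)
print(grid_RF.best_params_)
```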

Solution 2:

If you don't want to change the pca_clf = make_pipeline(pca, clf) line, then replace every occurrence of clf in your parameter keys with 'randomforestclassifier', like this:

# Note: 'auto' (an alias for 'sqrt' in classifiers) was removed from
# max_features in scikit-learn 1.3, so it is left out of the grid here.
parameters = {'randomforestclassifier__n_estimators': [4, 6, 9],
              'randomforestclassifier__max_features': ['log2', 'sqrt'],
              'randomforestclassifier__criterion': ['entropy', 'gini'],
              'randomforestclassifier__max_depth': [2, 3, 5, 10],
              'randomforestclassifier__min_samples_split': [2, 3, 5],
              'randomforestclassifier__min_samples_leaf': [1, 5, 8]}

Sidenote: No need to do this in your code:

clf = grid_RF.best_estimator_
clf.fit(X_train, y_train)

With the default refit=True, GridSearchCV refits best_estimator_ on all the data passed to grid_RF.fit() using the best parameters found, so calling clf.fit() again is redundant.
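The sketch below illustrates this: after the search, best_estimator_ can predict immediately without another fit call. It uses make_pipeline with the auto-generated step name from Solution 2; the iris data, the train/test split, the small grid and random_state=0 are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

# Illustrative data and split, not taken from the question.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pca_clf = make_pipeline(PCA(n_components=2),
                        RandomForestClassifier(random_state=0))

# Keys use the auto-generated step name.
parameters = {"randomforestclassifier__n_estimators": [4, 6]}

grid_RF = GridSearchCV(pca_clf, param_grid=parameters, cv=3)
grid_RF.fit(X_train, y_train)

# refit defaults to True, so best_estimator_ is already trained on X_train.
clf = grid_RF.best_estimator_
predictions = clf.predict(X_test)  # no extra clf.fit() needed
print(predictions.shape)
```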
