pipeline

Perform feature selection using pipeline and gridsearch

断了今生、忘了曾经 submitted on 2020-12-12 11:46:15
Question: As part of a research project, I want to select the best combination of preprocessing techniques and textual features to optimize the results of a text classification task. For this, I am using Python 3.6. There are a number of ways to combine features and algorithms, but I want to take full advantage of sklearn's pipelines and use grid search to test all the different (valid) possibilities and find the best feature combination. My first step was to build a pipeline that looks like the following
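A minimal sketch of that idea, assuming scikit-learn 0.22 or later (where a FeatureUnion branch can be set to "drop"). The feature branches sit in a FeatureUnion inside the Pipeline, and the grid treats a whole branch as a parameter, so GridSearchCV tries each combination of branches and hyperparameters. The toy corpus, the step names (features, word, char, clf) and the LogisticRegression classifier are illustrative assumptions, not the asker's actual setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy corpus with two classes, only for illustration
texts = [
    "great movie, loved it", "wonderful acting and plot", "a joy to watch",
    "brilliant and moving", "really enjoyable film", "superb direction",
    "terrible movie, hated it", "awful acting and plot", "a chore to watch",
    "boring and dull", "really disappointing film", "poor direction",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline([
    ("features", FeatureUnion([
        ("word", TfidfVectorizer(analyzer="word")),
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ])),
    ("clf", LogisticRegression(max_iter=1000)),
])

param_grid = {
    # Switch the character n-gram branch off ("drop") or on
    "features__char": ["drop", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))],
    # A preprocessing choice for the word branch
    "features__word__lowercase": [True, False],
    # A classifier hyperparameter
    "clf__C": [0.1, 1.0, 10.0],
}

# 2 * 2 * 3 = 12 candidate configurations, each scored with 3-fold CV
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(texts, labels)
print(search.best_params_)
```

Keeping every option in param_grid means the cross-validation refits the vectorizers on each training fold, so no vocabulary leaks from the held-out fold.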

sklearn pipeline + keras sequential model - how to get history?

久未见 submitted on 2020-12-12 10:38:07
Question: Keras models return a History object when .fit is called. Is it possible to retrieve it if I use the model as one step of a sklearn pipeline? By the way, I'm using Python 3.6. Thanks in advance!
Answer 1: The History callback records training metrics for each epoch. These include the loss and the accuracy (for classification problems), as well as the loss and accuracy on the validation dataset, if one is set. The History object is returned from calls to the fit() function used to train the model.
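The catch inside a pipeline is that Pipeline.fit returns the pipeline itself, so whatever the last step's fit() returns (such as a Keras History object) is discarded; the per-epoch record has to be stored on the step and read back through named_steps. The sketch below shows that pattern without Keras: EpochHistoryClassifier is a hypothetical wrapper around scikit-learn's SGDClassifier.partial_fit that keeps a Keras-style history_ dict. With a real Keras wrapper, the attribute to read depends on the wrapper (SciKeras, for example, exposes history_), so check its documentation.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class EpochHistoryClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: trains epoch by epoch and keeps a per-epoch history."""

    def __init__(self, epochs=5, random_state=0):
        self.epochs = epochs
        self.random_state = random_state

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.clf_ = SGDClassifier(random_state=self.random_state)
        self.history_ = {"accuracy": []}  # mimics Keras' History.history dict
        for _ in range(self.epochs):
            self.clf_.partial_fit(X, y, classes=self.classes_)
            self.history_["accuracy"].append(self.clf_.score(X, y))
        return self  # Pipeline.fit would ignore any other return value anyway

    def predict(self, X):
        return self.clf_.predict(X)


rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = (X[:, 0] > 0).astype(int)

pipe = Pipeline([("scale", StandardScaler()), ("clf", EpochHistoryClassifier(epochs=5))])
pipe.fit(X, y)

# Read the per-epoch record back from the fitted step, not from pipe.fit()
history = pipe.named_steps["clf"].history_
print(history["accuracy"])
```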

Luigi: how to pass different arguments to leaf tasks?

核能气质少年 submitted on 2020-12-10 08:02:31
Question: This is my second attempt at understanding how to pass arguments to dependencies in Luigi. The first one was here. The idea is: I have TaskC, which depends on TaskB, which depends on TaskA, which depends on Task0. I want this whole sequence to always be exactly the same, except that I want to be able to control which file Task0 reads from; let's call it path. Luigi's philosophy is normally that each task should only know about the tasks it depends on and their parameters. The problem with this

Sklearn pass fit() parameters to xgboost in pipeline

本小妞迷上赌 submitted on 2020-12-02 07:29:48
Question: Similar to "How to pass a parameter to only one part of a pipeline object in scikit learn?", I want to pass parameters to only one part of a pipeline. Usually, it should work fine like this:
estimator = XGBClassifier()
pipeline = Pipeline([('clf', estimator)])
and be executed like this:
pipeline.fit(X_train, y_train, clf__early_stopping_rounds=20)
but it fails with:
/usr/local/lib/python3.5/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
114 """
115 Xt, yt, fit_params = self._pre
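For reference, a minimal sketch of how Pipeline routes fit keyword arguments: a key of the form step-name__argument is split at the double underscore, and the argument is handed only to that step's fit(). To run without XGBoost it uses scikit-learn's LogisticRegression with sample_weight; the data and weights are made up. One caveat relevant to early stopping: data passed straight to the final step (such as XGBoost's eval_set) does not go through the earlier pipeline steps, so it has to be transformed beforehand. Recent XGBoost releases also expect early_stopping_rounds in the constructor rather than in fit(), so check the version in use.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X_train = rng.randn(60, 4)
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
X_val = rng.randn(20, 4)
weights = np.where(y_train == 1, 2.0, 1.0)  # illustrative per-sample weights

pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Data handed directly to the final step is NOT passed through the earlier
# steps, so transform it with the preprocessing part of the pipeline first
# (pipeline slicing needs scikit-learn >= 0.21).
X_val_t = pipeline[:-1].fit(X_train, y_train).transform(X_val)

# "clf__sample_weight" is split at "__": "clf" selects the step and
# sample_weight=weights is forwarded to that step's fit() only.
# With XGBClassifier as "clf", an eval set would be routed the same way,
# e.g. clf__eval_set=[(X_val_t, y_val)].
pipeline.fit(X_train, y_train, clf__sample_weight=weights)
print(X_val_t.shape, pipeline.score(X_train, y_train))
```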

Adaboost in Pipeline with Gridsearch SKLEARN

倾然丶 夕夏残阳落幕 submitted on 2020-11-29 21:10:47
Question: I would like to use AdaBoostClassifier with LinearSVC as the base estimator. I want to do a grid search over some of the LinearSVC parameters. I also have to scale my features.
p_grid = {'base_estimator__C': np.logspace(-5, 3, 10)}
n_splits = 5
inner_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=5)
SVC_Kernel = LinearSVC(multi_class='crammer_singer', tol=10e-3, max_iter=10000, class_weight='balanced')
ABC = AdaBoostClassifier(base_estimator=SVC_Kernel, n_estimators=600
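A minimal sketch, assuming a recent scikit-learn: put the scaler and the AdaBoost model in one Pipeline so scaling is refit inside every CV fold, and prefix the grid key with the pipeline step name, giving ada__base_estimator__C (or ada__estimator__C from scikit-learn 1.2 on, where base_estimator was renamed to estimator). LinearSVC has no predict_proba, so AdaBoost needs the discrete SAMME algorithm. The synthetic data, the step names, the shorter C range and n_estimators=10 are assumptions made to keep the example fast, and multi_class='crammer_singer' is left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=120, n_features=6, random_state=5)

# The base-learner argument was renamed base_estimator -> estimator in
# scikit-learn 1.2; use whichever name this installation supports.
ada_params = AdaBoostClassifier().get_params()
base_name = "estimator" if "estimator" in ada_params else "base_estimator"
ada_kwargs = {
    base_name: LinearSVC(max_iter=10000, class_weight="balanced"),
    "n_estimators": 10,
}
# LinearSVC has no predict_proba, so request discrete SAMME where the
# option still exists (newer releases use SAMME only).
if "algorithm" in ada_params:
    ada_kwargs["algorithm"] = "SAMME"

pipe = Pipeline([
    ("scale", StandardScaler()),  # refit on the training part of each fold
    ("ada", AdaBoostClassifier(**ada_kwargs)),
])

# Key = <pipeline step>__<AdaBoost parameter>__<LinearSVC parameter>
p_grid = {f"ada__{base_name}__C": np.logspace(-2, 2, 5)}
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=5)
search = GridSearchCV(pipe, p_grid, cv=inner_cv)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```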
