Python - LightGBM with GridSearchCV, is running forever

人走茶凉 提交于 2020-07-17 11:15:42

问题


Recently, I am doing multiple experiments to compare Python XgBoost and LightGBM. It seems that this LightGBM is a new algorithm that people say it works better than XGBoost in both speed and accuracy.

This is LightGBM GitHub. This is LightGBM python API documents, here you will find python functions you can call. It can be directly called from LightGBM model and also can be called by LightGBM scikit-learn.

This is the XGBoost Python API I use. As you can see, it has very similar data structure as LightGBM python API above.

Here are what I tried:

  1. If you use train() method in both XGBoost and LightGBM, yes lightGBM works faster and has higher accuracy. But this method, doesn't have cross validation.
  2. If you try cv() method in both algorithms, it is for cross validation. However, I didn't find a way to use it return a set of optimum parameters.
  3. if you try scikit-learn GridSearchCV() with LGBMClassifier and XGBClassifer. It works for XGBClassifer, but for LGBClassifier, it is running forever.

Here are my code examples when using GridSearchCV() with both classifiers:

XGBClassifier with GridSearchCV

param_set = {
 'n_estimators':[50, 100, 500, 1000]
}
gsearch = GridSearchCV(estimator = XGBClassifier( learning_rate =0.1, 
n_estimators=100, max_depth=5,
min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8, 
nthread=7,
objective= 'binary:logistic', scale_pos_weight=1, seed=410), 
param_grid = param_set, scoring='roc_auc',n_jobs=7,iid=False, cv=10)

xgb_model2 = gsearch.fit(features_train, label_train)
xgb_model2.grid_scores_, xgb_model2.best_params_, xgb_model2.best_score_

This works very well for XGBoost, and only tool a few seconds.

LightGBM with GridSearchCV

param_set = {
 'n_estimators':[20, 50]
}

gsearch = GridSearchCV(estimator = LGBMClassifier( boosting_type='gbdt', num_leaves=30, max_depth=5, learning_rate=0.1, n_estimators=50, max_bin=225, 
 subsample_for_bin=0.8, objective=None, min_split_gain=0, 
 min_child_weight=5, 
 min_child_samples=10, subsample=1, subsample_freq=1, 
colsample_bytree=1, 
reg_alpha=1, reg_lambda=0, seed=410, nthread=7, silent=True), 
param_grid = param_set, scoring='roc_auc',n_jobs=7,iid=False, cv=10)

lgb_model2 = gsearch.fit(features_train, label_train)
lgb_model2.grid_scores_, lgb_model2.best_params_, lgb_model2.best_score_

However, by using this method for LightGBM, it has been running the whole morning today still nothing generated.

I am using the same dataset, a dataset contains 30000 records.

I have 2 questions:

  1. If we just use cv() method, is there anyway to tune optimum set of parameters?
  2. Do you know why GridSearchCV() does not work well with LightGBM? I'm wondering whether this only happens on me all it happened on others to?

回答1:


Try to use n_jobs = 1 and see if it works.

In general, if you use n_jobs = -1 or n_jobs > 1 then you should protect your script by using if __name__=='__main__': :

Simple Example:

import ...

if __name__=='__main__':

    data= pd.read_csv('Prior Decompo2.csv', header=None)
    X, y = data.iloc[0:, 0:26].values, data.iloc[0:,26].values
    param_grid = {'C' : [0.01, 0.1, 1, 10], 'kernel': ('rbf', 'linear')}
    classifier = SVC()
    grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid, scoring='accuracy', n_jobs=-1, verbose=42)
    grid_search.fit(X,y)

Finally, can you try to run your code using n_jobs = -1 and including if __name__=='__main__': as I explained and see if it works?




回答2:


The original problem is due to lightgbm and GridSearchCV starting too many threads (i.e. more than available on the machine). If the product (or a sum? it depends on how GridSearchCV is implemented) of those is still within machine capabilities, then it will run. It there are too many threads they clash and lightgbm stops execution for some unclear to me, but known to developers reason.



来源:https://stackoverflow.com/questions/45045938/python-lightgbm-with-gridsearchcv-is-running-forever

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!