Bayesian Optimisation applied in CatBoost


Question


This is my attempt at applying BayesSearch in CatBoost:

import numpy as np
import pandas as pd

from catboost import CatBoostClassifier
from skopt import BayesSearchCV
from sklearn.model_selection import StratifiedKFold


# Classifier
bayes_cv_tuner = BayesSearchCV(
    estimator=CatBoostClassifier(
        silent=True
    ),
    search_spaces={
        'depth': (2, 16),
        'l2_leaf_reg': (1, 500),
        'bagging_temperature': (1e-9, 1000, 'log-uniform'),
        'border_count': (1, 255),
        'rsm': (0.01, 1.0, 'uniform'),
        'random_strength': (1e-9, 10, 'log-uniform'),
        'scale_pos_weight': (0.01, 1.0, 'uniform'),
    },
    scoring='roc_auc',
    cv=StratifiedKFold(
        n_splits=2,
        shuffle=True,
        random_state=72
    ),
    n_jobs=1,
    n_iter=100,
    verbose=1,
    refit=True,
    random_state=72
)

Keep track of results:

def status_print(optim_result):
    """Status callback during Bayesian hyperparameter search."""

    # Get all the models tested so far in DataFrame format
    all_models = pd.DataFrame(bayes_cv_tuner.cv_results_)

    # Get the best parameters found so far
    best_params = pd.Series(bayes_cv_tuner.best_params_)
    print('Model #{}\nBest ROC-AUC: {}\nBest params: {}\n'.format(
        len(all_models),
        np.round(bayes_cv_tuner.best_score_, 4),
        bayes_cv_tuner.best_params_
    ))

Fit BayesSearchCV:

resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=status_print)

Results

The first 3 iterations work fine, but then I get a nonstop string of:

Iteration with suspicious time 7.55 sec ignored in overall statistics.

Iteration with suspicious time 739 sec ignored in overall statistics.

(...)

Any ideas where I went wrong, or how I can improve this?

Salut,


Answer 1:


One of the iterations in the set of experiments skopt is running is taking much longer than expected, based on the timings CatBoost has recorded so far.

If you increase the verbosity of the classifier and use a callback to see which combination of parameters skopt is exploring when this happens, you will most likely find that the culprit is the depth parameter: skopt slows down whenever CatBoost is asked to test deeper trees.
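As a minimal sketch (assuming the tuner and training data from the question are still in scope), one way to do this is to replace silent=True with an explicit verbose setting, so CatBoost reports its per-iteration timings while the search runs:

# Sketch: make CatBoost log every 100 boosting iterations instead of staying
# silent, so slow (typically high-depth) fits become visible in the output.
bayes_cv_tuner.set_params(estimator=CatBoostClassifier(verbose=100))
resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=status_print)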

You can also debug using this custom callback:

import joblib

counter = 0

def onstep(res):
    """Print the latest and best evaluations and checkpoint the search state."""
    global counter
    args = res.x            # best parameters found so far
    x0 = res.x_iters        # all parameter combinations evaluated
    y0 = res.func_vals      # corresponding objective values
    print('Last eval: ', x0[-1],
          ' - Score ', y0[-1])
    print('Current iter: ', counter,
          ' - Score ', res.fun,
          ' - Args: ', args)
    joblib.dump((x0, y0), 'checkpoint.pkl')  # save progress so it can be inspected later
    counter = counter + 1

You can hook it in like this:

resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=[onstep, status_print])

I've actually noticed the same problem in my own experiments: complexity grows non-linearly as the depth increases, so CatBoost takes longer and longer to complete its iterations. A simple solution is to search a smaller space:

'depth':(2, 8)

Usually a depth of 8 is enough; in any case, you can first run skopt with a maximum depth of 8 and then re-run the search with a larger maximum if needed.
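As a sketch (assuming the bayes_cv_tuner object from the question is still in scope), the only change needed is the depth entry of search_spaces; set_params replaces the whole dict, so copy it and override just that key:

# Sketch: reuse the existing tuner but cap the tree depth at 8, leaving all
# other search dimensions unchanged.
narrower_spaces = dict(bayes_cv_tuner.search_spaces, depth=(2, 8))
bayes_cv_tuner.set_params(search_spaces=narrower_spaces)
resultCAT = bayes_cv_tuner.fit(X_train, y_train, callback=[onstep, status_print])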



Source: https://stackoverflow.com/questions/52989242/bayesian-optimisation-applied-in-catboost
