Implementing a decision tree using h2o

Submitted by 試著忘記壹切 on 2019-12-12 15:34:59

Question


I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of the random forest? We can do that in scikit-learn (a popular Python library for machine learning).

Ref link : Why is Random Forest with a single tree much better than a Decision Tree classifier?

In scikit-learn the code looks something like this:

RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)

Do we have an equivalent of this code in h2o?


Answer 1:


You can use H2O's random forest (H2ORandomForestEstimator): set ntrees=1 so that it builds only one tree, set mtries to the number of features (i.e. predictor columns) in your dataset, and set sample_rate=1. With mtries equal to the number of features, every feature is a candidate at each split, so the column sampling no longer has any random effect; sample_rate=1 likewise makes the tree use every row.

Here is more information about mtries: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/mtries.html
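The settings described in this answer can be sketched in Python roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names single_tree_drf_params and train_single_tree_drf are hypothetical, the max_depth and seed values are illustrative choices, and the training helper assumes h2o.init() has already been called and that the frame is an H2OFrame whose response column is a factor if classification is wanted.

```python
def single_tree_drf_params(n_features, max_depth=20, seed=1):
    """Return DRF settings that reduce the forest to one non-random tree.

    Hypothetical helper; the parameter names are H2ORandomForestEstimator
    arguments, the values follow the answer above.
    """
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    return {
        "ntrees": 1,             # build a single tree
        "mtries": n_features,    # every predictor is a candidate at each split
        "sample_rate": 1.0,      # use every row, no row subsampling
        "max_depth": max_depth,  # illustrative depth cap (assumption)
        "seed": seed,            # illustrative seed (assumption)
    }


def train_single_tree_drf(frame, predictors, response, **overrides):
    """Train the one-tree forest on an existing H2OFrame.

    Assumes a running H2O cluster (h2o.init() called by the caller).
    """
    # Imported lazily so the parameter helper works without h2o installed.
    from h2o.estimators import H2ORandomForestEstimator

    params = single_tree_drf_params(len(predictors))
    params.update(overrides)  # e.g. max_depth=5 to keep the tree small
    model = H2ORandomForestEstimator(**params)
    model.train(x=predictors, y=response, training_frame=frame)
    return model
```

Calling train_single_tree_drf(frame, predictors, response, max_depth=5) would then return a fitted model holding a single tree. Computing mtries from len(predictors) keeps the setting in step with whichever columns are actually passed as x.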




Answer 2:


To add to Lauren's answer: based on PUBDEV-4324 (Expose Decision Tree as a stand-alone algo in H2O), both DRF and GBM can do the job, with GBM being marginally easier:

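# Single-tree GBM: one tree, every row and column used, depth capped at 5.
# predictors, response and titanicHex are assumed to be defined beforehand.
titanic_1tree = h2o.gbm(x = predictors, y = response,
                        training_frame = titanicHex,
                        ntrees = 1, min_rows = 1, sample_rate = 1,
                        col_sample_rate = 1,
                        max_depth = 5,
                        seed = 1)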

This creates a decision tree at most 5 splits deep (max_depth = 5) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv).

Starting with release 3.22.0.1 (Xia) it's possible to extract tree structures from H2O models:

titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)
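Since the question was asked from Python, here is a rough Python counterpart of the R calls in this answer. Treat it as a sketch under assumptions: the helper names are hypothetical, a running cluster (h2o.init()) and an H2OFrame are assumed, and tree_class=None is assumed to be adequate for a regression or binomial response (a multinomial model would need a class name). Note that the Python H2OTree class numbers trees from 0, whereas the R call uses tree_number = 1.

```python
def single_tree_gbm_params(max_depth=5, seed=1):
    """GBM settings mirroring the R call in this answer (hypothetical helper)."""
    return {
        "ntrees": 1,             # a single tree
        "min_rows": 1,           # allow leaves with one observation
        "sample_rate": 1.0,      # use every row
        "col_sample_rate": 1.0,  # use every column
        "max_depth": max_depth,  # at most this many splits deep
        "seed": seed,
    }


def train_and_extract_tree(frame, predictors, response):
    """Train the one-tree GBM and return the model with its tree structure.

    Assumes a running H2O cluster and an H2OFrame as `frame`.
    """
    # Imported lazily so the parameter helper works without h2o installed.
    from h2o.estimators import H2OGradientBoostingEstimator
    from h2o.tree import H2OTree

    model = H2OGradientBoostingEstimator(**single_tree_gbm_params())
    model.train(x=predictors, y=response, training_frame=frame)
    # First (and only) tree; index 0 in Python corresponds to 1 in R.
    tree = H2OTree(model=model, tree_number=0, tree_class=None)
    return model, tree
```

The returned tree object can then be inspected, for example through its root_node attribute, to walk the splits of the fitted tree.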


Source: https://stackoverflow.com/questions/50740316/implementing-a-decision-tree-using-h2o
