Question
I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o. But h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of the random forest? We can do that in scikit-learn (a popular Python library for machine learning).
Ref link : Why is Random Forest with a single tree much better than a Decision Tree classifier?
In scikit-learn the code looks something like this:

from sklearn.ensemble import RandomForestClassifier

# a single un-bootstrapped tree that considers every feature at each split
RandomForestClassifier(n_estimators=1, max_features=None, bootstrap=False)
Do we have an equivalent of this code in h2o?
Answer 1:
You can use H2O's random forest (H2ORandomForestEstimator): set ntrees=1 so that it only builds one tree, set mtries to the number of features (i.e. columns) in your dataset, and set sample_rate=1. Setting mtries to the number of features means the algorithm will sample from all of your features at each level in the decision tree.

Here is more information about mtries: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/mtries.html
Answer 2:
To add to Lauren's answer: based on PUBDEV-4324 - Expose Decision Tree as a stand-alone algo in H2O, both DRF and GBM can do the job, with GBM being marginally easier:
titanic_1tree = h2o.gbm(x = predictors, y = response,
                        training_frame = titanicHex,
                        ntrees = 1, min_rows = 1, sample_rate = 1,
                        col_sample_rate = 1,
                        max_depth = 5,
                        seed = 1)
which creates a decision tree at most 5 splits deep (max_depth = 5) on the Titanic dataset (available here: https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv).
Starting with release 3.22.0.1 (Xia), it is possible to extract tree structures from H2O models:
titanicH2oTree = h2o.getModelTree(model = titanic_1tree, tree_number = 1)
Source: https://stackoverflow.com/questions/50740316/implementing-a-decision-tree-using-h2o