Decision Tree in sklearn: depth of tree and accuracy

Question

I am applying a decision tree to a data set, using sklearn.

In sklearn there is a parameter to set the depth of the tree: dtree = DecisionTreeClassifier(max_depth=10).

My question is how the max_depth parameter affects the model. How does a high or low max_depth help in predicting the test data more accurately?

Answer 1:

max_depth is what the name suggests: the maximum depth that you allow the tree to grow to. The deeper you allow it to grow, the more complex your model becomes.

For training error, it is easy to see what will happen. If you increase max_depth, training error will go down (or at least not go up), since a deeper tree can keep the splits of a shallower one and refine them further.

For testing error, it is less obvious. If you set max_depth too high, the decision tree might simply overfit the training data, memorizing noise instead of capturing the useful patterns we want; this will cause testing error to increase. But setting it too low is not good either: you might be giving the decision tree too little flexibility to capture the patterns and interactions in the training data (underfitting). This will also cause the testing error to increase.
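To see both effects side by side, here is a minimal sketch that fits trees of several depths and records training and test accuracy. The synthetic data from make_classification, the 10% label noise, and the list of depths are assumptions chosen for illustration, not part of the original question; substitute your own X and y.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy data (an assumption for illustration; use your own X, y).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
# max_depth=None lets the tree grow until every leaf is pure.
for max_depth in [1, 2, 4, 8, 16, None]:
    dtree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    dtree.fit(X_train, y_train)
    results[max_depth] = {
        "fitted_depth": dtree.get_depth(),    # depth the tree actually reached
        "n_leaves": dtree.get_n_leaves(),     # rough measure of model complexity
        "train_acc": dtree.score(X_train, y_train),
        "test_acc": dtree.score(X_test, y_test),
    }
    print(max_depth, results[max_depth])
```

On noisy data like this, training accuracy typically keeps climbing with depth while test accuracy levels off or drops, but the exact numbers depend on the data set and the split.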

There is a sweet spot between the extremes of too high and too low. Usually, the modeller treats max_depth as a hyperparameter and uses some sort of grid or random search with cross-validation to find a good value for it.
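One way to run such a search is sklearn's GridSearchCV. The sketch below is only an illustration: the synthetic data, the candidate depths in param_grid, and the choice of 5-fold cross-validation are all assumptions, and a different grid or a RandomizedSearchCV may suit your data better.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (an assumption for illustration; use your own X, y).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Candidate depths to try; None means "no depth limit".
param_grid = {"max_depth": [1, 2, 3, 4, 6, 8, 12, 16, None]}

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,                 # 5-fold cross-validation on the training set only
    scoring="accuracy",
)
search.fit(X_train, y_train)

best_depth = search.best_params_["max_depth"]
print("best max_depth:", best_depth)
print("mean CV accuracy at best depth:", search.best_score_)

# The test set is touched once, after the depth has been chosen.
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

Keeping the test set out of the search matters here: if max_depth were picked by looking at test accuracy, that accuracy would no longer be an honest estimate of performance on unseen data.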

Source: https://stackoverflow.com/questions/49289187/decision-tree-sklearn-depth-of-tree-and-accuracy

Tags

python

scikit-learn

decision-tree
