I have some questions about the H2O Distributed Random Forest model

Question

In the DRF section of the H2O docs, the FAQ entry "How does the algorithm handle missing values during training?" includes this note:

Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM.

I am using the DRF algorithm to solve a regression problem, and this note struck me as strange. Converting every numerical value to a categorical value in order to solve a regression problem seems like nonsense to me.

Here is my question:

Do I need to convert all numerical values to categorical values to use the DRF algorithm, or not?

Thank you for reading my question.

Answer 1:

No, H2O does not require you to convert all numerical values to categorical values.

If you want to see how a trained H2O DRF model treats the different input columns, follow the instructions at the link below for viewing a MOJO.

http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo

Note in the picture of the tree that numerical columns are split with a "less than" comparison against a value, while categorical columns are split by sending some of the levels to the left child and the others to the right child.
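To make the two kinds of split concrete, here is a minimal sketch in plain Python. It is not H2O's implementation and does not use the H2O API; the class names, column names, and threshold are all hypothetical and chosen only for illustration. It shows a numerical split routing a row with a "less than" comparison, and a categorical split routing a row by checking whether its level belongs to the set sent to the left child.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class NumericSplit:
    """Hypothetical tree node that splits a numerical column on a threshold."""
    column: str
    threshold: float

    def goes_left(self, row: Dict[str, Any]) -> bool:
        # Numerical columns: "less than" comparison against the threshold.
        return row[self.column] < self.threshold


@dataclass(frozen=True)
class CategoricalSplit:
    """Hypothetical tree node that splits a categorical column by level set."""
    column: str
    left_levels: FrozenSet[str]

    def goes_left(self, row: Dict[str, Any]) -> bool:
        # Categorical columns: some levels go left, all other levels go right.
        return row[self.column] in self.left_levels


# Example row with one numerical and one categorical feature (made-up data).
row = {"age": 42.0, "color": "red"}

age_split = NumericSplit(column="age", threshold=50.0)
color_split = CategoricalSplit(column="color", left_levels=frozenset({"blue", "green"}))

print(age_split.goes_left(row))    # True: 42.0 < 50.0
print(color_split.goes_left(row))  # False: "red" is not among the left levels
```

The point of the sketch is that the numerical column stays numerical: the split compares its value directly, so no conversion to categorical is needed before training.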

Source: https://stackoverflow.com/questions/49895721/i-have-some-questions-about-h2o-distributed-random-forest-model

Tags

python

machine-learning

random-forest

h2o
