Question
The FAQ in the DRF section of the H2O documentation includes this note under "How does the algorithm handle missing values during training?":
Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM.
I use the DRF algorithm to solve a regression problem, and this note confused me. Converting every numerical value to a categorical value in order to solve a regression problem does not make sense to me.
Here is my question:
- Do I need to convert all numerical values to categorical values to use the DRF algorithm?
or
- Can I use the DRF algorithm without converting numerical values to categorical values?
Thank you for reading my question.
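Answer 1:
No, H2O does not require you to convert all numerical values to categorical values. The note in the FAQ is about how missing values are handled (they are not imputed with the mean), not about column types.
If you want to see how a trained H2O DRF model treats the different input columns, follow the instructions at the link below for viewing a MOJO.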
- http://docs.h2o.ai/h2o/latest-stable/h2o-genmodel/javadoc/overview-summary.html#viewing-a-mojo
In the resulting tree visualization, numerical columns are split with a "less than" comparison against a threshold value, while categorical columns are split by sending some of the levels to the left child and the rest to the right child.
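To make the two kinds of split concrete, here is a minimal sketch in plain Python. It is not H2O's implementation, and it does not use the H2O API. The names `NumericSplit`, `CategoricalSplit`, `goes_left`, and the `na_goes_left` field are hypothetical and exist only for this illustration. The sketch also assumes that a missing value is sent down one fixed branch of the node rather than being replaced by the column mean, which is one way to read the FAQ note.

```python
import math
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NumericSplit:
    """Hypothetical tree node: rows with value < threshold go left."""
    column: str
    threshold: float
    na_goes_left: bool  # assumption: missing values are routed, not imputed


@dataclass(frozen=True)
class CategoricalSplit:
    """Hypothetical tree node: rows whose level is in left_levels go left."""
    column: str
    left_levels: FrozenSet[str]
    na_goes_left: bool  # assumption: same missing-value rule as numeric nodes


def is_missing(value) -> bool:
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def goes_left(split, row: dict) -> bool:
    """Return True if the row is routed to the left child of this node."""
    value = row.get(split.column)
    if is_missing(value):
        # Both split types handle a missing value the same way:
        # it follows the branch stored on the node.
        return split.na_goes_left
    if isinstance(split, NumericSplit):
        return value < split.threshold      # "less than" comparison
    return value in split.left_levels       # subset-of-levels membership


# Illustrative nodes; the column names and values are made up.
age_split = NumericSplit("age", threshold=40.0, na_goes_left=False)
color_split = CategoricalSplit("color", frozenset({"red", "blue"}), na_goes_left=True)
```

For example, `goes_left(age_split, {"age": 35})` is true because 35 is less than 40, while `goes_left(color_split, {"color": "green"})` is false because "green" is not among the left levels. The numeric column stays numeric throughout; only the missing-value rule is shared between the two node types.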
Source: https://stackoverflow.com/questions/49895721/i-have-some-questions-about-h2o-distributed-random-forest-model