Is there a supported way to get the list of features used by an H2O model during its training?

Posted by 扶醉桌前 on 2020-01-03 02:18:05

Question


This is my situation. I have over 400 features, many of which are probably useless and often zero. I would like to be able to:

  • train a model with a subset of those features
  • query that model for the features actually used to build that model
  • build an H2OFrame containing just those features (I get a sparse list of non-zero values for each row I want to predict)
  • pass this newly constructed frame to H2OModel.predict() to get a prediction

I am pretty sure what I found is unsupported, but it works for now (v 3.13.0.341). Is there a more robust/supported way of doing this?

model._model_json['output']['names']

The response variable appears to be the last item in this list.
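The third and fourth steps in the list above (turning a sparse row into a full row in the model's feature order) can be sketched in plain Python. This is only an illustration under stated assumptions: the helper name `densify`, the sample names, and the fill value of 0.0 are hypothetical, and it relies on the observation above that the response is the last entry, which is not a documented guarantee.

```python
def densify(sparse_rows, features, fill=0.0):
    """Expand {feature: value} dicts into dense rows ordered like `features`.

    Features missing from a sparse row get `fill` (assumed to be 0.0 here,
    since the sparse input only lists non-zero values).
    """
    return [[row.get(name, fill) for name in features] for row in sparse_rows]


# Hypothetical stand-in for model._model_json['output']['names'].
names = ["f1", "f7", "f42", "label"]

# Assumption taken from the observation above: the response is the last entry.
features = names[:-1]

# Two sparse rows, each listing only its non-zero features.
rows = densify([{"f7": 2.5}, {"f1": 1.0, "f42": 3.0}], features)
```

With a running cluster, `rows` and `features` could then be handed to `h2o.H2OFrame(rows, column_names=features)` and the result passed to `predict()`; it is worth checking how your H2O version orients a list of lists (rows versus columns) before relying on this.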

In a similar vein, it would be nice to have a supported way of finding out which H2O version the model was built under. I cannot find the version number in the JSON.


Answer 1:


If you want to know which feature columns the model used after you have built it, you can do the following in Python:

my_training_frame = your_model.actual_params['training_frame']

This returns a frame ID.

Then you can do:

col_used = h2o.get_frame(my_training_frame)
col_used

EDITED (after comment was posted)

To get the columns, use: col_used.columns
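Note that a training frame can hold more columns than the model actually used, such as the response column or columns excluded at training time. A minimal sketch of filtering those out in plain Python follows. The helper name `features_used` and the sample values are hypothetical, and it assumes you can obtain the response and ignored column names yourself (for example from the model's parameters, which is worth verifying for your H2O version).

```python
def features_used(frame_columns, response=None, ignored=None):
    """Keep frame columns that are neither the response nor ignored, in frame order."""
    excluded = set(ignored or [])
    if response is not None:
        excluded.add(response)
    return [col for col in frame_columns if col not in excluded]


# Hypothetical values standing in for col_used.columns and the model's parameters.
frame_columns = ["id", "f1", "f7", "f42", "label"]
used = features_used(frame_columns, response="label", ignored=["id"])
```

Keeping the frame's own column order (rather than using a set difference) means the result can be used directly as the column order for a prediction frame.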

A quick way to check the version of a saved binary model is to try to load it into H2O: if it loads, it was saved with the same version of H2O; if it wasn't, you will get a warning.

You can also open the saved model file; the first line lists the version of H2O used to create it.

For a model saved as a MOJO, you can look at the model.ini file, which lists the version of H2O.
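The MOJO check can be scripted with the standard library, since a MOJO is a zip archive. The sketch below is an illustration, not an official API: the function name is hypothetical, and it assumes that model.ini sits at the top level of the archive and records the version on a line of the form `h2o_version = ...`. Inspect a real file to confirm the key name before relying on it.

```python
import io
import zipfile


def mojo_h2o_version(mojo, key="h2o_version"):
    """Return the value of `key` from model.ini inside a MOJO zip, or None.

    `mojo` may be a path or a binary file-like object. The key name is an
    assumption about the model.ini layout.
    """
    with zipfile.ZipFile(mojo) as zf:
        text = zf.read("model.ini").decode("utf-8")
    # Scan line by line rather than using configparser, because the file
    # may contain sections that are not strict key = value pairs.
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


# Tiny in-memory archive standing in for a real MOJO file (contents are made up).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model.ini", "[info]\nh2o_version = 3.13.0.341\n")
buf.seek(0)
demo_version = mojo_h2o_version(buf)
```

For a real model you would pass the path to the MOJO zip instead of the in-memory buffer.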



Source: https://stackoverflow.com/questions/45153176/is-there-a-supported-way-to-get-list-of-features-used-by-a-h2o-model-during-its
