ROC on multiple test sets in h2o (python)

Question

I had a use-case that I thought was really simple but couldn't find a way to do it with h2o. I thought you might know.

I want to train my model once, and then evaluate its ROC on a few different test sets (e.g. a validation set and a test set, though in reality I have more than 2) without having to retrain the model. The way I know to do it now requires retraining the model each time:

train, valid, test = fr.split_frame([0.2, 0.25], seed=1234)
rf_v1 = H2ORandomForestEstimator( ... )
rf_v1.train(features, var_y, training_frame=train, validation_frame=valid)
roc = rf_v1.roc(valid=1)

rf_v1.train(features, var_y, training_frame=train, validation_frame=test) # training again with the same training set - can I avoid this?
roc2 = rf_v1.roc(valid=1)

I can also use model_performance(), which gives me some metrics on an arbitrary test set without retraining, but not the ROC. Is there a way to get the ROC out of the H2OModelMetrics object?

Thanks!

Answer 1:

You can use H2O Flow to inspect the model performance. Go to http://localhost:54321/flow/index.html (adjust the port in the link if you changed the default), type getModel "rf_v1" in a cell, and Flow will show all of the model's metrics across multiple cells. It's quite handy. If you are using Python, you can compute the performance on any frame like this:

rf_perf1 = rf_v1.model_performance(test)

and then print the AUC (the area under the ROC curve, which is a single number, not the curve itself) like this:

print(rf_perf1.auc())

Answer 2:

Yes, indirectly. Get the TPRs and FPRs from the H2OModelMetrics object:

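# Score the already-trained model on any frame; no retraining needed
out = rf_v1.model_performance(test)
fprs = out.fprs  # false positive rates, one per threshold
tprs = out.tprs  # true positive rates, one per threshold
roc = list(zip(fprs, tprs))  # (FPR, TPR) points; in Python 3 a bare zip() is a one-shot iterator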

(By the way, my H2ORandomForestEstimator object does not seem to have an roc() method at all, so I'm not 100% sure that this output is in the exact same format. I'm using h2o version 3.10.4.7.)
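
To cover the "more than 2 test sets" part of the question, the same idea can be wrapped in a small helper. This is only a sketch: roc_points, roc_by_dataset and trapezoid_auc are hypothetical names introduced here, and it assumes a binomial model whose metrics object exposes the fprs and tprs attributes used above.

```python
def roc_points(perf):
    """Return the ROC curve as a list of (fpr, tpr) pairs.

    Assumes `perf` exposes `fprs` and `tprs`, as the metrics object
    in the answer above does for a binomial model.
    """
    return list(zip(perf.fprs, perf.tprs))


def roc_by_dataset(model, frames):
    """Score one trained model on several named frames without retraining.

    `frames` is a dict such as {"valid": valid, "test": test}.
    """
    return {
        name: roc_points(model.model_performance(frame))
        for name, frame in frames.items()
    }


def trapezoid_auc(points):
    """Approximate the area under a ROC curve with the trapezoidal rule."""
    pts = sorted(points)  # order by FPR so the x-axis is increasing
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical usage with the objects from the question:
# rocs = roc_by_dataset(rf_v1, {"valid": valid, "test": test})
# for name, points in rocs.items():
#     print(name, trapezoid_auc(points))
```

The trapezoidal area is only a sanity check on the extracted points; it may differ slightly from the value reported by auc(), depending on how the thresholds are binned.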

Source: https://stackoverflow.com/questions/42981259/roc-on-multiple-test-sets-in-h2o-python

Tags

h2o