H2O GLM model: saved MOJO's prediction is very different when running on the same validation data

Question

I built a GLM model using H2O (version 3.14) in R. Note that the training data contains integers and also many NA values, which I handle with MeanImputation.

glm <- h2o.glm(
    training_frame = train.truth,
    x = getColNames(train.truth),
    y = "isFemale",
    family = "binomial",
    missing_values_handling = "MeanImputation",
    seed = 1000000)

I then used a validation data set to check the performance, and the precision looks good to me:

h2o.performance(glm, newdata = valid.truth) %>% h2o.confusionMatrix()

Confusion Matrix (vertical: actual; across: predicted)  for max f1 @ threshold = 0.529384526696015:
           0     1    Error         Rate
0      41962   300 0.007099   =300/42262
1        863 13460 0.060253   =863/14323
Totals 42825 13760 0.020553  =1163/56585

I then saved the model as a MOJO:

h2o.download_mojo(glm, path="models/mojo", get_genmodel_jar=TRUE)

I exported the validation DF to a CSV file:

dt.valid <- data.table(as.data.frame(valid.truth))
write.table(dt.valid, row.names = F, na="", file="models/test.csv")

I then tried to use the saved MOJO to make the same predictions by running this in my Linux shell:

java -cp h2o-genmodel.jar hex.genmodel.tools.PredictCsv \
    --mojo GLM_model_R_1511161743608_15 \
    --decimal --mojo GLM_model_R_1511161743608_15.zip \
    --input ../test.csv --output output.csv

However, the result is terrible. All the records were predicted as 0, which is very different from what I got when I ran the model in R.

I have been stuck on this for a day and can't figure out what went wrong. Can anyone shed some light on this?
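
One thing that may be worth checking, offered as an assumption rather than a confirmed cause: the write.table call in the export step does not set sep, and write.table defaults to a single space as the separator and quotes the header names. If PredictCsv expects comma-separated input, it would not recognize any column in such a file, every feature would be treated as missing, and every row would receive the same prediction. The sketch below builds a small stand-in file in that default format (the column names age and height are made up for illustration) and counts the commas in its header line:

```shell
# Build a tiny stand-in file in the format write.table produces by default:
# space-separated with a quoted header. Column names here are hypothetical.
sample="$(mktemp)"
printf '"age" "height" "isFemale"\n31 170 0\n' > "$sample"

# Count the commas in the header line; 0 means the file is not comma-separated.
header="$(head -n 1 "$sample")"
commas="$(printf '%s' "$header" | tr -cd ',' | wc -c | tr -d ' ')"
echo "commas in header: $commas"

rm -f "$sample"
```

Running the same head command on the real models/test.csv would show which delimiter the exported file actually uses.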

Source: https://stackoverflow.com/questions/47390133/h2o-glm-model-saved-mojos-prediction-is-very-different-when-running-on-the-sam

Tags

h2o
