Alternative to `as.h2o()` for small data?

这一生的挚爱 submitted on 2019-12-11 14:53:31

Question


I have the opposite issue to most people with as.h2o(), though the resulting problem is the same. I have to convert a series of single-row vectors, each just 19 columns wide, and feed them to an h2o autoencoder. Each vector takes approximately 0.29 seconds to convert using as.h2o(), which is causing a major bottleneck.

Can anyone suggest an alternative approach that might be faster?
(For various reasons I have no alternative to sending single row vectors one by one, so aggregating the data in matrices before calling as.h2o is not an option.)

Many thanks.


Answer 1:


If this is creating a bottleneck, you should use a MOJO (or POJO) model for row-wise scoring instead of a model loaded into memory in the H2O cluster. This is what the MOJO/POJO model formats are designed for: fast scoring that neither converts between an R data.frame and an H2OFrame nor requires a running H2O cluster. You can skip R altogether here.

Alternatively, if your pipeline requires R, you can still use the MOJO/POJO model from R via the h2o.predict_json() function; you only need to convert your 1-row data.frame to a JSON string. That might alleviate the bottleneck somewhat, though scoring the MOJO/POJO model directly in Java (above) will be the fastest.

Here's an example of what this looks like using a GBM MOJO file:

library(h2o)

model_path <- "~/GBM_model_python_1473313897851_6.zip"
json <- '{"V1":1, "V2":3.0, "V3":0}'
pred <- h2o.predict_json(model = model_path, json = json)

Here's how to construct the JSON string from a 1-row data.frame:

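df <- data.frame(V1 = 1, V2 = 3.0, V3 = 0)
# Build one "name":value pair per column (values here are numeric, so they stay unquoted)
dfstr <- sapply(seq_along(df), function(i) paste(paste0('"', names(df)[i], '"'), df[1, i], sep = ':'))
# Join the pairs into a single JSON object string
json <- paste0('{', paste0(dfstr, collapse = ','), '}')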
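
If some of the columns are not numeric, the string-building step above would emit unquoted values. A minimal sketch of a more general helper follows, using base R only. The name `row_to_json` is hypothetical (it is not part of the h2o package), and the sketch assumes that character values contain no double quotes or backslashes, since no escaping is attempted.

```r
# Hypothetical helper, not part of h2o: serialise a 1-row data.frame
# into a flat JSON object string using base R only.
# Assumption: character values contain no double quotes or backslashes.
row_to_json <- function(df) {
  stopifnot(is.data.frame(df), nrow(df) == 1)
  fields <- vapply(names(df), function(nm) {
    value <- df[[nm]][1]
    # Treat factor levels as plain strings
    if (is.factor(value)) value <- as.character(value)
    if (is.na(value)) {
      rendered <- "null"
    } else if (is.character(value)) {
      rendered <- paste0('"', value, '"')
    } else if (is.logical(value)) {
      rendered <- tolower(as.character(value))
    } else {
      # Numeric: avoid scientific notation and padding
      rendered <- format(value, digits = 15, scientific = FALSE, trim = TRUE)
    }
    paste0('"', nm, '":', rendered)
  }, character(1))
  paste0("{", paste(fields, collapse = ","), "}")
}

df <- data.frame(V1 = 1, V2 = 3.5, V3 = 0)
json <- row_to_json(df)
# json is now '{"V1":1,"V2":3.5,"V3":0}'
```

The resulting string can be passed as the json argument of h2o.predict_json() in the same way as the hand-written string shown earlier.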


Source: https://stackoverflow.com/questions/47759418/alternative-to-as-h2o-for-small-data
