Question
I have the opposite issue to most people with as.h2o(), though the resulting problem is the same. I have to convert and feed a series of single-row vectors, just 19 columns wide, to an H2O autoencoder. Each vector takes approximately 0.29 seconds to convert using as.h2o(), which is causing a major bottleneck.
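To make the setup concrete, here is a minimal sketch of the per-row pattern described above (the toy autoencoder and synthetic 19-column data are assumptions for illustration; the point is the as.h2o() call inside the per-row function):
library(h2o)
h2o.init()
# Toy stand-in for the real 19-column data and autoencoder (illustration only)
train <- as.h2o(as.data.frame(matrix(rnorm(1900), ncol = 19)))
ae <- h2o.deeplearning(x = 1:19, training_frame = train, autoencoder = TRUE, hidden = c(5))
# The pattern causing the bottleneck: each 1-row vector is converted with as.h2o()
score_row <- function(vec) {
  hf <- as.h2o(as.data.frame(t(vec)))  # roughly 0.29 s of conversion overhead per call
  h2o.anomaly(ae, hf)                  # reconstruction error for that single row
}
err <- score_row(rnorm(19))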
Can anyone suggest an alternative approach that might be faster?
(For various reasons I have no alternative to sending single row vectors one by one, so aggregating the data in matrices before calling as.h2o is not an option.)
Many thanks.
Answer 1:
If this is creating a bottleneck, you should use a MOJO (or POJO) model for row-wise scoring instead of a model loaded into memory in the H2O cluster. This is what the MOJO/POJO model format is designed for: fast scoring that needs neither a conversion between an R data.frame and an H2OFrame nor a running H2O cluster. You can skip R altogether here.
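If the model was trained from R, one way to obtain the MOJO file (and the h2o-genmodel.jar used for pure-Java scoring) is h2o.download_mojo(); a minimal sketch, assuming a trained model object named model:
library(h2o)
# Assumes `model` is an H2O model trained in the current session (e.g. the autoencoder)
# get_genmodel_jar = TRUE also downloads h2o-genmodel.jar, needed for scoring from Java
mojo_file <- h2o.download_mojo(model, path = getwd(), get_genmodel_jar = TRUE)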
Alternatively, if your pipeline requires R, you can still use the MOJO/POJO model from R via the h2o.predict_json() function; it just requires you to convert your 1-row data.frame to a JSON string. That might alleviate the bottleneck somewhat, though straight Java scoring against the MOJO/POJO (as above) will be the fastest.
Here's an example of what this looks like using a GBM MOJO file:
library(h2o)
# Path to a previously exported MOJO file (a GBM model in this example)
model_path <- "~/GBM_model_python_1473313897851_6.zip"
# A single row of input encoded as a JSON string of "column":value pairs
json <- '{"V1":1, "V2":3.0, "V3":0}'
# Score the row directly against the MOJO, with no H2OFrame conversion
pred <- h2o.predict_json(model = model_path, json = json)
Here's how to construct the JSON string from a 1-row data.frame:
df <- data.frame(V1 = 1, V2 = 3.0, V3 = 0)
# Build one "name":value pair per column, then wrap them all in braces
dfstr <- sapply(1:ncol(df), function(i) paste(paste0('\"', names(df)[i], '\"'), df[1,i], sep = ':'))
json <- paste0('{', paste0(dfstr, collapse = ','), '}')
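Putting those two pieces together, here is a hedged sketch of a small helper that scores one data.frame row at a time through the MOJO (the helper name is illustrative; the paste-based conversion assumes numeric columns, so character columns would need quoting, or a library such as jsonlite):
# Illustrative helper: build the JSON for a 1-row data.frame and score it with the MOJO
predict_one_row <- function(df, model_path) {
  dfstr <- sapply(1:ncol(df), function(i) paste(paste0('\"', names(df)[i], '\"'), df[1,i], sep = ':'))
  json <- paste0('{', paste0(dfstr, collapse = ','), '}')
  h2o.predict_json(model = model_path, json = json)
}
pred <- predict_one_row(data.frame(V1 = 1, V2 = 3.0, V3 = 0), model_path)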
Source: https://stackoverflow.com/questions/47759418/alternative-to-as-h2o-for-small-data