h2o

What's the difference between H2O on multiple nodes and H2O on Hadoop?

让人想犯罪 submitted on 2019-12-11 18:57:19
Question: The H2O site says that H2O's core code is written in Java. Inside H2O, a distributed key/value store is used to access and reference data, models, objects, etc., across all nodes and machines. The algorithms are implemented on top of H2O's distributed Map/Reduce framework and use the Java Fork/Join framework for multi-threading. Does this mean that H2O will not work better than other libraries when it runs on a single-node cluster, but will work well on a multi-node cluster? Is that right? Also…
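
For context, a minimal sketch of the two deployment modes in R (the host and port in the second call are illustrative placeholders, not values from the question):

    library(h2o)

    # Single node: h2o.init() starts one local JVM; the Map/Reduce
    # framework still parallelizes across cores via Java Fork/Join.
    h2o.init(nthreads = -1, max_mem_size = "4g")

    # Multi-node: connect the R client to an already-running cluster
    # whose nodes share the distributed key/value store.
    # h2o.init(ip = "10.0.0.5", port = 54321, startH2O = FALSE)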

Get predictor data types from an H2OModel

ぃ、小莉子 submitted on 2019-12-11 17:51:12
Question: I know that I can access the predictor names of an H2OModel via the @parameters slot, but can I access the predictor data types? I'm trying to generate an input schema for my H2OModel, and right now I have to cross-reference the training_frame and get the data types from there. Obviously, this would be a problem if my training_frame were no longer in memory. Here's my current approach (excerpt truncated):

    getInputSchema <- function(model){
      require(jsonlite)
      require(h2o)
      training_frame <- h2o.getFrame(model…
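
One hedged way to finish this approach, assuming the training frame is still loaded in the cluster under the name stored in the model's parameters (h2o.getTypes() returns one type string per column):

    library(h2o)

    getInputSchema <- function(model) {
      # Look the frame up by the name recorded in the model's parameters;
      # this still fails if the frame has been removed from the cluster.
      frame <- h2o.getFrame(model@parameters$training_frame)
      types <- h2o.getTypes(frame)
      names(types) <- h2o.colnames(frame)
      # Keep only the predictor columns the model was trained on.
      types[model@parameters$x]
    }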

H2O fails on H2OContext.getOrCreate

烈酒焚心 submitted on 2019-12-11 17:41:53
Question: I'm trying to write a sample program in Scala/Spark/H2O. The program compiles, but throws an exception in H2OContext.getOrCreate:

    object App1 extends App {
      val conf = new SparkConf()
      conf.setAppName("AppTest")
      conf.setMaster("local[1]")
      conf.set("spark.executor.memory", "1g")
      val sc = new SparkContext(conf)
      val spark = SparkSession.builder
        .master("local")
        .appName("ApplicationController")
        .getOrCreate()
      import spark.implicits._
      val h2oContext = H2OContext.getOrCreate(sess) // <--- error here
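
A minimal sketch of the usual Sparkling Water pattern, assuming a release where H2OContext.getOrCreate accepts the SparkSession; the original snippet passes an undefined sess, so passing the spark value actually created above is the likely fix:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.h2o.H2OContext

    object App1 extends App {
      val spark = SparkSession.builder
        .master("local[1]")
        .appName("AppTest")
        .config("spark.executor.memory", "1g")
        .getOrCreate()

      // Pass the session defined above, not an undefined `sess`.
      val h2oContext = H2OContext.getOrCreate(spark)
    }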

Issue installing h2o on R

痞子三分冷 submitted on 2019-12-11 17:19:26
Question: I used install.packages() to install h2o. While I could run h2o.init(), the h2o.automl function isn't found: could not find function "h2o.automl". After some searching I installed the 'nightly bleeding edge' version from a tar.gz, but after that install even h2o.init() no longer works and shows this error:

    Error: package or namespace load failed for ‘h2o’ in get(method, envir = home):
     lazy-load database '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/h2o/R/h2o.rdb' is corrupt
    In…
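
A minimal sketch of the clean-reinstall sequence that usually resolves both symptoms, assuming a CRAN release recent enough to include h2o.automl (a corrupt lazy-load database typically comes from mixed or interrupted installs, which removing the package and restarting R clears up):

    # Detach and remove any existing h2o install, then start fresh.
    if ("package:h2o" %in% search()) detach("package:h2o", unload = TRUE)
    if ("h2o" %in% rownames(installed.packages())) remove.packages("h2o")

    install.packages("h2o")   # restart R before loading the new install

    library(h2o)
    h2o.init()
    # h2o.automl(...) should now be found if the installed release includes AutoML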

Finding the contribution of each feature to a particular prediction by an h2o ensemble model

て烟熏妆下的殇ゞ submitted on 2019-12-11 16:41:54
Question: I am trying to explain the decisions made by an h2o GBM model, based on this idea: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211. I want to calculate the contribution of each feature to a particular decision at test time. Is it possible to get each individual tree from the ensemble along with the log-odds at every node? I would also need the path traversed through each tree by the model while making the prediction. Answer 1: H2O doesn't have an equivalent…
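
As a hedged aside: h2o releases from 3.24 onward expose per-prediction feature contributions (SHAP values) for tree models, which covers the per-feature-contribution part without walking the trees by hand; model and test below stand for a trained GBM and an H2OFrame, not objects from the question:

    # One column per feature plus a BiasTerm; each row's values sum to
    # the model's raw (pre-link) prediction for that row.
    contribs <- h2o.predict_contributions(model, test)
    head(contribs)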

H2O.ai h2o-genmodel.jar contains an slf4j binding

最后都变了- submitted on 2019-12-11 16:31:57
Question: When using h2o-genmodel.jar (either from Maven Central or as output when generating a MOJO), SLF4J gives this error:

    SLF4J: Class path contains multiple SLF4J bindings
    SLF4J: Found binding in [jar:file:~/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Using Maven or SBT's transitive dependency exclusion doesn't work, so right now I'm using the jar…
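
For reference, the standard SBT exclusion the asker presumably tried looks like the sketch below (version number illustrative); it drops every org.slf4j artifact pulled in transitively by h2o-genmodel:

    libraryDependencies += "ai.h2o" % "h2o-genmodel" % "3.26.0.3" excludeAll (
      ExclusionRule(organization = "org.slf4j")
    )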

H2ORandomForestEstimator with min_samples_split?

人盡茶涼 submitted on 2019-12-11 15:49:38
Question: What is the analogue of min_samples_split for H2ORandomForestEstimator and H2OGradientBoostingEstimator? (h2o's min_rows == sklearn's min_samples_leaf.) Answer 1: It looks like the closest thing to min_samples_split is min_split_improvement: "Minimum relative improvement in squared error reduction for a split to happen." Source: https://stackoverflow.com/questions/53642304/h2orandomforestestimator-with-min-samples-split
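
A minimal sketch setting both parameters on the estimator the question names (values are illustrative, not tuned recommendations):

    from h2o.estimators import H2ORandomForestEstimator

    # min_rows plays the role of sklearn's min_samples_leaf;
    # min_split_improvement is the closest analogue of min_samples_split.
    drf = H2ORandomForestEstimator(min_rows=10, min_split_improvement=1e-5)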

What are the column definitions for H2O's gains/lift table?

怎甘沉沦 submitted on 2019-12-11 15:40:36
Question: H2O's documentation doesn't provide clear definitions for each column in the gains/lift table output. I'm not sure how the capture rate is being calculated, and there is a score column that is not mentioned in the documentation. Here's what the output looks like. The raw Java file is here -- I tried finding the answer to my question in there but had difficulty making sense of it. Thanks. Answer 1: The capture rate is the proportion of all the events that fall into the group/bin. E.g. if 90 out of…
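
A minimal sketch of pulling the table and checking the capture-rate arithmetic in R, assuming model is a trained binomial model (not an object from the question):

    gl <- h2o.gainsLift(model)
    head(gl)

    # Capture rate = events in the bin / total events, so e.g. a bin
    # holding 90 of 1000 total events has a capture rate of 0.09, and
    # the capture rates across all bins sum to 1.
    sum(gl$capture_rate)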

I have some questions about the h2o distributed random forest model

一个人想着一个人 submitted on 2019-12-11 15:26:32
Question: According to the H2O docs, the FAQ of the DRF section includes this note under "How does the algorithm handle missing values during training?": Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM. I use the DRF algorithm to solve a regression problem, but when I saw this note it seemed strange. If I convert all numerical values to categorical values to solve a regression problem, I…
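
A short sketch contrasting the two behaviors the note describes, assuming preds, target, and train already exist; there is no need to convert numeric columns to categoricals, since DRF keeps them numeric and simply routes missing values down a branch instead of imputing:

    # GLM: numeric NAs are mean-imputed by default.
    glm_fit <- h2o.glm(x = preds, y = target, training_frame = train,
                       missing_values_handling = "MeanImputation")

    # DRF: no imputation; NA is treated as its own split direction,
    # for numeric and categorical predictors alike.
    drf_fit <- h2o.randomForest(x = preds, y = target, training_frame = train)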

Can a docker image use hadoop?

▼魔方 西西 submitted on 2019-12-11 15:09:37
Question: Can a Docker image access Hadoop resources, e.g. submit YARN jobs and access HDFS? Something like MapR's Data Science Refinery, but for Hortonworks HDP 3.1. (You may assume that the image will be launched on a Hadoop cluster node.) I saw the Hadoop docs for launching Docker applications from Hadoop nodes, but I was interested in whether one could go the "other way" (i.e., being able to start a Docker image with the conventional docker -ti ... command and have that application be able to run hadoop jars etc.…
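
One hedged approach for the "other way": run the container on a cluster node with the host's Hadoop client configuration and libraries mounted read-only, so the hadoop/yarn CLIs inside the container talk to the host cluster (all paths and the image name are illustrative):

    docker run -ti \
      -v /usr/hdp:/usr/hdp:ro \
      -v /etc/hadoop/conf:/etc/hadoop/conf:ro \
      -e HADOOP_CONF_DIR=/etc/hadoop/conf \
      my-client-image:latest bash
    # inside the container: hadoop jar ... / yarn ... run against the host cluster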