h2o

What's the difference between H2O on multiple nodes and H2O on Hadoop?

让人想犯罪 submitted on 2019-12-11 18:57:19
Question: The H2O site says that H2O's core code is written in Java. Inside H2O, a distributed key/value store is used to access and reference data, models, objects, etc., across all nodes and machines. The algorithms are implemented on top of H2O's distributed Map/Reduce framework and use the Java Fork/Join framework for multi-threading. Does this mean that H2O will not work better than other libraries when it runs on a single-node cluster, but will work well on a multi-node cluster? Is that right? Also…
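
For context, a minimal sketch of the two deployment modes in R (the host and port in the second call are illustrative placeholders, not values from the question):

    library(h2o)

    # Single node: h2o.init() starts one local JVM; the Map/Reduce
    # framework still parallelizes across cores via Java Fork/Join.
    h2o.init(nthreads = -1, max_mem_size = "4g")

    # Multi-node: connect the R client to an already-running cluster
    # whose nodes share the distributed key/value store.
    # h2o.init(ip = "10.0.0.5", port = 54321, startH2O = FALSE)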

Get predictor data types from an H2OModel

ぃ、小莉子 submitted on 2019-12-11 17:51:12
Question: I know that I can access the predictor names of an H2OModel via the @parameters slot, but can I access the predictor data types? I'm trying to generate an input schema for my H2OModel, and right now I have to cross-reference the training_frame and get the data types from there. Obviously, this would be a problem if my training_frame were no longer in memory. Here's my current approach (excerpt truncated):

    getInputSchema <- function(model){
      require(jsonlite)
      require(h2o)
      training_frame <- h2o.getFrame(model…
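
One hedged way to finish this approach, assuming the training frame is still loaded in the cluster under the name stored in the model's parameters (h2o.getTypes() returns one type string per column):

    library(h2o)

    getInputSchema <- function(model) {
      # Look the frame up by the name recorded in the model's parameters;
      # this still fails if the frame has been removed from the cluster.
      frame <- h2o.getFrame(model@parameters$training_frame)
      types <- h2o.getTypes(frame)
      names(types) <- h2o.colnames(frame)
      # Keep only the predictor columns the model was trained on.
      types[model@parameters$x]
    }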

H2O fails on H2OContext.getOrCreate

烈酒焚心 submitted on 2019-12-11 17:41:53
Question: I'm trying to write a sample program in Scala/Spark/H2O. The program compiles, but throws an exception in H2OContext.getOrCreate:

    object App1 extends App {
      val conf = new SparkConf()
      conf.setAppName("AppTest")
      conf.setMaster("local[1]")
      conf.set("spark.executor.memory", "1g")
      val sc = new SparkContext(conf)
      val spark = SparkSession.builder
        .master("local")
        .appName("ApplicationController")
        .getOrCreate()
      import spark.implicits._
      val h2oContext = H2OContext.getOrCreate(sess) // <--- error here
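
A minimal sketch of the usual Sparkling Water pattern, assuming a release where H2OContext.getOrCreate accepts the SparkSession; the original snippet passes an undefined sess, so passing the spark value actually created above is the likely fix:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.h2o.H2OContext

    object App1 extends App {
      val spark = SparkSession.builder
        .master("local[1]")
        .appName("AppTest")
        .config("spark.executor.memory", "1g")
        .getOrCreate()

      // Pass the session defined above, not an undefined `sess`.
      val h2oContext = H2OContext.getOrCreate(spark)
    }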

Issue installing h2o on R

痞子三分冷 submitted on 2019-12-11 17:19:26
Question: I used install.packages() to install h2o. While I could run h2o.init(), the h2o.automl function isn't found: could not find function "h2o.automl". After some searching I installed the 'nightly bleeding edge' version from a tar.gz, but after that install even h2o.init() no longer works and shows this error:

    Error: package or namespace load failed for ‘h2o’ in get(method, envir = home):
     lazy-load database '/Library/Frameworks/R.framework/Versions/3.4/Resources/library/h2o/R/h2o.rdb' is corrupt
    In…
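
A minimal sketch of the clean-reinstall sequence that usually resolves both symptoms, assuming a CRAN release recent enough to include h2o.automl (a corrupt lazy-load database typically comes from mixed or interrupted installs, which removing the package and restarting R clears up):

    # Detach and remove any existing h2o install, then start fresh.
    if ("package:h2o" %in% search()) detach("package:h2o", unload = TRUE)
    if ("h2o" %in% rownames(installed.packages())) remove.packages("h2o")

    install.packages("h2o")   # restart R before loading the new install

    library(h2o)
    h2o.init()
    # h2o.automl(...) should now be found if the installed release includes AutoML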

Finding the contribution of each feature to a particular prediction by an h2o ensemble model

て烟熏妆下的殇ゞ submitted on 2019-12-11 16:41:54
Question: I am trying to explain the decisions made by an h2o GBM model, based on this idea: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211. I want to calculate the contribution of each feature to a particular decision at test time. Is it possible to get each individual tree from the ensemble along with the log-odds at every node? I would also need the path traversed through each tree by the model while making the prediction. Answer 1: H2O doesn't have an equivalent…
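
As a hedged aside: h2o releases from 3.24 onward expose per-prediction feature contributions (SHAP values) for tree models, which covers the per-feature-contribution part without walking the trees by hand; model and test below stand for a trained GBM and an H2OFrame, not objects from the question:

    # One column per feature plus a BiasTerm; each row's values sum to
    # the model's raw (pre-link) prediction for that row.
    contribs <- h2o.predict_contributions(model, test)
    head(contribs)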

H2O.ai h2o-genmodel.jar contains an slf4j binding

最后都变了- submitted on 2019-12-11 16:31:57
Question: When using h2o-genmodel.jar (either from Maven Central or as output when generating a MOJO), SLF4J gives this error:

    SLF4J: Class path contains multiple SLF4J bindings
    SLF4J: Found binding in [jar:file:~/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.

Using Maven or SBT's transitive dependency exclusion doesn't work, so right now I'm using the jar…
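
For reference, the standard SBT exclusion the asker presumably tried looks like the sketch below (version number illustrative); it drops every org.slf4j artifact pulled in transitively by h2o-genmodel:

    libraryDependencies += "ai.h2o" % "h2o-genmodel" % "3.26.0.3" excludeAll (
      ExclusionRule(organization = "org.slf4j")
    )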

H2ORandomForestEstimator with min_samples_split?

人盡茶涼 submitted on 2019-12-11 15:49:38
Question: What is the analogue of min_samples_split for H2ORandomForestEstimator and H2OGradientBoostingEstimator? (h2o's min_rows == sklearn's min_samples_leaf.) Answer 1: It looks like the closest thing to min_samples_split is min_split_improvement: "Minimum relative improvement in squared error reduction for a split to happen." Source: https://stackoverflow.com/questions/53642304/h2orandomforestestimator-with-min-samples-split
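
A minimal sketch setting both parameters on the estimator the question names (values are illustrative, not tuned recommendations):

    from h2o.estimators import H2ORandomForestEstimator

    # min_rows plays the role of sklearn's min_samples_leaf;
    # min_split_improvement is the closest analogue of min_samples_split.
    drf = H2ORandomForestEstimator(min_rows=10, min_split_improvement=1e-5)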

What are the column definitions for H2O's gains/lift table?

怎甘沉沦 submitted on 2019-12-11 15:40:36
Question: H2O's documentation doesn't provide clear definitions for each column in the gains/lift table output. I'm not sure how the capture rate is being calculated, and there is a score column that is not mentioned in the documentation. Here's what the output looks like. The raw Java file is here -- I tried finding the answer to my question in there but had difficulty making sense of it. Thanks. Answer 1: The capture rate is the proportion of all the events that fall into the group/bin. E.g. if 90 out of…
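
A minimal sketch of pulling the table and checking the capture-rate arithmetic in R, assuming model is a trained binomial model (not an object from the question):

    gl <- h2o.gainsLift(model)
    head(gl)

    # Capture rate = events in the bin / total events, so e.g. a bin
    # holding 90 of 1000 total events has a capture rate of 0.09, and
    # the capture rates across all bins sum to 1.
    sum(gl$capture_rate)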

I have some questions about the h2o distributed random forest model

一个人想着一个人 submitted on 2019-12-11 15:26:32
Question: According to the H2O docs, the FAQ of the DRF section includes this note under "How does the algorithm handle missing values during training?": Note: Unlike in GLM, in DRF numerical values are handled the same way as categorical values. Missing values are not imputed with the mean, as is done by default in GLM. I use the DRF algorithm to solve a regression problem, but when I saw this note it seemed strange. If I convert all numerical values to categorical values to solve a regression problem, I…
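
A short sketch contrasting the two behaviors the note describes, assuming preds, target, and train already exist; there is no need to convert numeric columns to categoricals, since DRF keeps them numeric and simply routes missing values down a branch instead of imputing:

    # GLM: numeric NAs are mean-imputed by default.
    glm_fit <- h2o.glm(x = preds, y = target, training_frame = train,
                       missing_values_handling = "MeanImputation")

    # DRF: no imputation; NA is treated as its own split direction,
    # for numeric and categorical predictors alike.
    drf_fit <- h2o.randomForest(x = preds, y = target, training_frame = train)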

Can a docker image use hadoop?

▼魔方 西西 submitted on 2019-12-11 15:09:37
Question: Can a Docker image access Hadoop resources, e.g. submit YARN jobs and access HDFS? Something like MapR's Data Science Refinery, but for Hortonworks HDP 3.1. (You may assume that the image will be launched on a Hadoop cluster node.) I saw the Hadoop docs for launching Docker applications from Hadoop nodes, but I was interested in whether one could go the "other way" (i.e., being able to start a Docker image with the conventional docker -ti ... command and have that application be able to run hadoop jars etc.…
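
One hedged approach for the "other way": run the container on a cluster node with the host's Hadoop client configuration and libraries mounted read-only, so the hadoop/yarn CLIs inside the container talk to the host cluster (all paths and the image name are illustrative):

    docker run -ti \
      -v /usr/hdp:/usr/hdp:ro \
      -v /etc/hadoop/conf:/etc/hadoop/conf:ro \
      -e HADOOP_CONF_DIR=/etc/hadoop/conf \
      my-client-image:latest bash
    # inside the container: hadoop jar ... / yarn ... run against the host cluster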