h2o

H2O (open source) for K-mean clustering

别来无恙 submitted on 2019-12-24 21:08:11
Question: I am using H2O (H2O Flow, in particular) to do K-means clustering. I selected the "standardize" checkbox, which makes sure that columns are standardized before distances are computed. Training went fine and I inspected the results, which report "within_cluster_sum_of_squares". My question: is "within_cluster_sum_of_squares" the distance BEFORE or AFTER standardization? It appears to display the distance after standardization, but the distance I see is large and it seems before
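To see how much the two candidate values can differ, here is a minimal sketch in plain Python that computes the within-cluster sum of squares on raw and on standardized columns. The data, the cluster labels, and the helper names `standardize` and `wcss` are all hypothetical, and the use of the population standard deviation is an assumption. The sketch does not establish which of the two values H2O reports; it only shows the scale gap to look for.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Scale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def wcss(rows, labels):
    """Sum of squared distances from each row to the centroid of its cluster."""
    total = 0.0
    for k in set(labels):
        members = [r for r, l in zip(rows, labels) if l == k]
        centroid = [mean(c) for c in zip(*members)]
        total += sum((v - c) ** 2 for r in members for v, c in zip(r, centroid))
    return total

# Hypothetical data: column 0 is in the thousands, column 1 is below ten.
rows = [[1000.0, 1.0], [1200.0, 2.0], [5000.0, 8.0], [5200.0, 9.0]]
labels = [0, 0, 1, 1]  # hypothetical cluster assignment

raw_wcss = wcss(rows, labels)               # dominated by the large-scale column
std_wcss = wcss(standardize(rows), labels)  # every column contributes on the same scale
print(raw_wcss, std_wcss)
```

With standardized columns the total sum of squares is bounded by rows times columns, so a reported value far above that bound would suggest it was computed on the original scale.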

YARN job appears to have access to less resources than Ambari YARN manager reports

≯℡__Kan透↙ submitted on 2019-12-24 21:03:41
Question: I am confused by errors I get when trying to run a YARN process. In the YARN section of the Ambari UI I see... (note that it says 60 GB available). Yet when I try to run a YARN process, I get errors indicating that fewer resources are available than Ambari reports, see...

    ➜ h2o-3.26.0.2-hdp3.1 hadoop jar h2odriver.jar -nodes 4 -mapperXmx 5g -output /home/ml1/hdfsOutputDir
    Determining driver host interface for mapper->driver callback...
    [Possible callback IP address: 192
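As a rough aid for comparing the request against what Ambari shows, the sketch below estimates how much memory the command above asks YARN for. It assumes each mapper container needs its `-mapperXmx` heap plus a percentage of extra memory for non-heap JVM use; the 10 percent default and the function name `requested_memory_gb` are assumptions for illustration, not values taken from the question.

```python
def requested_memory_gb(nodes, mapper_xmx_gb, extra_mem_percent=10.0):
    """Estimate per-container and total memory for an H2O-on-YARN launch.

    extra_mem_percent is an assumed overhead on top of the Java heap.
    """
    per_container = mapper_xmx_gb * (1 + extra_mem_percent / 100.0)
    return per_container, nodes * per_container

# Values taken from the command line in the question: -nodes 4 -mapperXmx 5g
per_container, total = requested_memory_gb(nodes=4, mapper_xmx_gb=5)
print(f"per container: {per_container:.1f} GB, total: {total:.1f} GB")
```

The per-container figure is the one to compare against the scheduler's per-container maximum, while the total is the one to compare against the cluster-wide figure shown in Ambari; the two limits are configured separately.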

H2O Sparkling Water Error while creating H2O cloud

我是研究僧i submitted on 2019-12-24 15:24:45
Question: I have set up H2O Sparkling Water and am now following the instructions at http://h2o-release.s3.amazonaws.com/sparkling-water/rel-1.3/6/index.html, where step 3 says:

    import org.apache.spark.h2o._
    val h2oContext = new H2OContext(sc).start()

I get the following error after entering the last line:

    scala> val h2oContext = new H2OContext(sc).start()
    java.lang.NoClassDefFoundError: org/apache/spark/sql/execution/LogicalRDD
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at

Unable to connect to a running H2o Server from Python [H2O.ai]

可紊 submitted on 2019-12-24 09:33:34
Question: I get an error when connecting from the master node to an H2O server running on an EMR core node.

    import h2o
    h2o.connect(url="http://IP:54321")

Error trace:

    Connecting to H2O server at http://IP:54321... successful.
    Traceback (most recent call last):
      File "/home/hadoop/TataCliqEMR/app/__init__.py", line 3, in <module>
        h2o.connect(ip="IP", port=54321)
      File "/usr/local/lib/python3.4/site-packages/h2o/h2o.py", line 86, in connect
        h2oconn.cluster.show_status()
      File "/usr/local/lib/python3.4/site-packages/h2o/backend

Download jar file into inst/java directory for a R package

落爺英雄遲暮 submitted on 2019-12-24 09:25:05
Question: Is it possible to download a JAR into the inst/java directory of an R package at installation time? I want to submit a package to CRAN, but the JAR is too big and they will not accept it. One possible solution would be to download the JAR automatically and place it inside inst/java. I think H2O does this in its build.gradle. Would that be possible? If so, do I need to use Gradle?

Update: It seems that make-dist.sh also downloads the JAR.

Answer 1: This is the key file: https:/

MAPE metric at h2O

谁都会走 submitted on 2019-12-24 09:18:16
Question: What is the correct way to implement MAPE in the H2O framework? I would like to convert the function below to its H2O equivalent:

    import numpy as np

    def mape(a, b):
        mask = a != 0  # the original used the Python 2 operator <>
        return (np.fabs(a - b)/a)[mask].mean()

Answer 1:

    import h2o
    h2o.init()
    df = h2o.create_frame(rows=100, cols=2, missing_fraction=0, integer_fraction=1, integer_range=5)
    print(df)

    def mape(a, b):
        mask = a != 0
        return (abs(a-b)/a)[mask].mean()

    mape(df[0],df[1])

Source: https://stackoverflow.com/questions/43103022/mape-metric-at-h2o
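The masking step can be checked by hand on a small example. The following is a minimal sketch in plain Python, independent of H2O and NumPy, with made-up input values. One deliberate difference from the code in the post: it divides by `abs(a)` so that negative actual values do not produce negative terms, which is an assumption about the intended metric.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, skipping rows whose actual value is 0."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not ratios:
        raise ValueError("no non-zero actual values to average over")
    return sum(ratios) / len(ratios)

# Hypothetical values; the third row is skipped because its actual value is 0.
actual = [100.0, 200.0, 0.0, 50.0]
predicted = [110.0, 180.0, 5.0, 50.0]
score = mape(actual, predicted)
print(score)
```

The three remaining rows contribute 0.1, 0.1 and 0, so the result is their mean as a fraction; multiply by 100 for a percentage.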

H2O-R: Apply custom library function on each row of H2OFrame

☆樱花仙子☆ submitted on 2019-12-24 02:48:09
Question: After importing a relatively big table from MySQL into H2O on my machine, I tried to run a hashing algorithm (murmurhash from the R digest package) on one of its columns and save the result back to H2O. As I found out, using as.data.frame on an H2OFrame object is not always advisable: my H2OFrame originally has ~43k rows, but the coerced data frame usually contains only ~30k rows for some reason (the same goes for using base::apply / base::sapply / etc. on the H2OFrame). I found out there is an apply

How to convert my H2O prediction to a data.frame in a fast way

烂漫一生 submitted on 2019-12-24 02:33:43
Question: I am using H2O on a large dataset with 8 million rows and 10 columns. I trained my random forest using h2o.randomForest. The model trained fine and prediction also worked correctly. Now I would like to convert my predictions to a data.frame. I did this:

    A2=h2o.predict(m1,Tr15_h2o)
    pred2=as.data.frame(A2)

but it is too slow and takes forever. Is there a faster way to convert from H2O to a data.frame or data.table?

Answer 1: Here is some code which demonstrates how to use the data.table package

Issue starting h2o from RStudio, running command 'curl 'http://localhost:54321'' had status 127 [duplicate]

泄露秘密 submitted on 2019-12-24 02:21:05
Question: This question already has answers here: Error with H2O in R - can't connect to local host (2 answers). Closed 2 years ago.

I'm trying to start H2O from within RStudio. I've got the latest versions of R, RStudio, the R H2O package, Java SE SDK and the RCurl package. But when trying to initialize H2O, I get the following output:

    H2O is not running yet, starting it now...
    Note: In case of errors look at the following log files:
    C:\Users\fel069\AppData\Local\Temp\RtmpQZJyS1/h2o_fel069_started

h2oensemble Error in value[[3L]](cond) : argument “training_frame” must be a valid H2O H2OFrame or id

若如初见. submitted on 2019-12-24 01:55:27
Question: While trying to run the H2OEnsemble example found at http://learn.h2o.ai/content/tutorials/ensembles-stacking/index.html from within RStudio, I encounter the following error:

    Error in value[[3L]](cond) : argument "training_frame" must be a valid H2O H2OFrame or id

after defining the ensemble:

    fit <- h2o.ensemble(x = x, y = y, training_frame = train,
                        family = family, learner = learner,
                        metalearner = metalearner,
                        cvControl = list(V = 5, shuffle = TRUE))

I installed the latest version of both h2o and