h2o

R H2O package import csv file with Chinese characters

大憨熊 submitted on 2019-12-12 23:09:02
Question: I have a large dataset in CSV format for building a prediction model. Because of its size, I planned to use the h2o package in R to build the model. However, several columns of the data.frame contain Simplified Chinese characters, and h2o has difficulty receiving the data. I've tried two different approaches. The first reads the file directly with the h2o.importFile() function. However, this approach ends up converting the

How should we interpret the results of the H2O predict function?

天大地大妈咪最大 submitted on 2019-12-12 16:17:18
Question: I have trained and stored a random forest binary classification model. Now I'm trying to simulate processing new (out-of-sample) data with this model. My Python (Anaconda 3.6) code is:

    import h2o
    import pandas as pd
    import sys

    localH2O = h2o.init(ip = "localhost", port = 54321, max_mem_size = "8G", nthreads = -1)
    h2o.remove_all()

    model_path = "C:/sm/BottleRockets/rf_model/DRF_model_python_1501621766843_28117";
    model = h2o.load_model(model_path)
    new_data = h2o.import_file(path="C:/sm
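For a binary classifier, the frame returned by H2O's predict usually has three columns: predict (the label), p0, and p1 (the class probabilities). The sketch below is plain Python with no H2O dependency and only illustrates how a label follows from p1 and a threshold. The helper name, the sample rows, and the threshold value are hypothetical. As I understand it, H2O chooses its own threshold (by default the one that maximizes F1), so the label is not necessarily a 0.5 cut.

```python
def label_from_probability(p1, threshold):
    """Return 1 when the positive-class probability reaches the threshold."""
    return 1 if p1 >= threshold else 0


# Hypothetical (p0, p1) pairs, standing in for rows of a prediction frame
rows = [(0.90, 0.10), (0.65, 0.35), (0.20, 0.80)]

# Hypothetical threshold; in H2O the value comes from the model's metrics
threshold = 0.3

labels = [label_from_probability(p1, threshold) for _, p1 in rows]
print(labels)
```

The second row is labeled 1 even though its p1 is below 0.5, which is the situation that tends to look surprising in the predict output.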

Implementing a decision tree using h2o

試著忘記壹切 submitted on 2019-12-12 15:34:59
Question: I am trying to train a decision tree model using h2o. I am aware that no specific library for decision trees exists in h2o, but h2o has an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of the random forest? We can do that in the scikit module (a popular Python library for machine learning). Reference link: Why is Random Forest with a single tree much better than a Decision Tree classifier? In scikit the code
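One way to approximate a single decision tree with H2ORandomForestEstimator is to turn off the two sources of randomness in a random forest: row sampling and column sampling. The sketch below is an assumption about suitable settings, not a confirmed recipe from the question. The helper names and the max_depth value are hypothetical, while ntrees, sample_rate, mtries, max_depth, and min_rows are parameters of the estimator.

```python
def single_tree_params(max_depth=10, min_rows=1):
    """Settings that make a random forest behave like one decision tree."""
    return {
        "ntrees": 1,         # build a single tree
        "sample_rate": 1.0,  # train on every row (no row subsampling)
        "mtries": -2,        # -2 asks H2O DRF to consider all columns at each split
        "max_depth": max_depth,
        "min_rows": min_rows,
    }


def build_single_tree():
    # Imported lazily so the parameter helper works without h2o installed
    from h2o.estimators import H2ORandomForestEstimator

    return H2ORandomForestEstimator(**single_tree_params())


params = single_tree_params(max_depth=5)
print(params)
```

Calling build_single_tree() requires the h2o package, and training the returned model requires a running H2O cluster started with h2o.init().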

How to use H2o on feature hashed matrix in R

橙三吉。 submitted on 2019-12-12 15:00:21
Question: I am working on a moderately sized data set (train_data). There are more than 124 variables and 50,00,000 (5 million) observations. For the categorical variables, I have used feature hashing through the hashed.model.matrix function in R:

    ## feature hashing
    b <- 2 ^ 22
    f <- ~ .-1
    X_train <- hashed.model.matrix(f, train_data, hash.size=b)

As a result, I have a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O wrapper on this matrix and use the different algorithms available in H2O? Does H2O
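One possible route for getting a sparse matrix into H2O is to write it out in SVMLight text format, which H2O's file parser is documented to accept, and then import the file. The sketch below shows only the file-writing step in plain Python. The function name, the dict-based row representation, and the sample data are hypothetical and stand in for the hashed matrix from the question.

```python
import os
import tempfile


def write_svmlight(rows, labels, path):
    """Write sparse rows ({column_index: value}) in SVMLight text format."""
    with open(path, "w", encoding="utf-8") as out:
        for label, row in zip(labels, rows):
            # SVMLight expects feature indices in ascending order
            features = " ".join(f"{i}:{v}" for i, v in sorted(row.items()))
            out.write(f"{label} {features}".rstrip() + "\n")


# Hypothetical data: two sparse rows with 1-based column indices
rows = [{7: 1.0, 1: 1.0}, {3: 1.0}]
labels = [1, 0]

path = os.path.join(tempfile.mkdtemp(), "train.svmlight")
write_svmlight(rows, labels, path)
```

Only non-zero entries are written, so the file stays small even when the hash size is 2^22 columns.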

Can I use directly H2O library functions from Java or the only option for H2O is R?

删除回忆录丶 submitted on 2019-12-12 06:41:43
Question: I want to use machine learning algorithms in Java. Mahout with Hadoop is too slow, and Weka cannot cope with the large data size. Is it possible to call the H2O library from Java, or is there a better option for Java?
Answer 1: What you can do is implement your machine learning algorithms in R, and then call them via command-line calls to the underlying system. I found this to be my best option when doing my thesis in bioinformatics a few years ago. I remember trying to call the R

Efficient way to maintain a h2o data frame

半世苍凉 submitted on 2019-12-12 04:14:12
Question: Let's say I have a function 'getData()' which returns data (think of it as a data stream). Now I need to form an h2o data frame from this data. I need to insert each item as a new row only if it is not already present in the data frame. One obvious way is:
- There is a global h2o data frame.
- Create an h2o data frame (of 1 row) from the arriving data (I am using as.h2o()).
- Check whether it is already present in the global data frame (using h2o.which() or any other function).
- If it is not present, then add
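An alternative to checking every incoming row against the H2O frame is to de-duplicate on the client first and convert to an H2O frame in bulk, since each one-row conversion involves a round trip to the H2O server. The sketch below shows the client-side step in plain Python. The get_data generator is a hypothetical stand-in for getData(), and the set-based approach assumes rows are made of hashable values.

```python
def collect_unique(stream):
    """Keep the first occurrence of each row, preserving arrival order."""
    seen = set()
    unique_rows = []
    for row in stream:
        key = tuple(row)  # tuples are hashable, so they can go in a set
        if key not in seen:
            seen.add(key)
            unique_rows.append(list(row))
    return unique_rows


def get_data():
    # Hypothetical stand-in for getData(): yields one row at a time
    yield from [[1, "a"], [2, "b"], [1, "a"], [3, "c"]]


rows = collect_unique(get_data())
print(rows)
```

The de-duplicated rows can then be converted in a single batch (with as.h2o() in the R setup from the question) instead of one row at a time.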

H2oApi Java bindings columns endpoint not returning column metadata

我的未来我决定 submitted on 2019-12-12 04:11:53
Question: We are developing a Java program with H2O 3.10.4.7 and we need to retrieve metadata about all columns in a frame, such as column names and data types. Related question (not resolved, different problem) here. We expected the water.bindings.H2oApi client to work just like the REST endpoints, and we wanted to use the H2oApi method frameColumns(FrameKeyV3 frameId), described in the Javadoc as "Return all the columns from a Frame.", but the result does not include any column-related info.

H2O not working in parallel

倖福魔咒の submitted on 2019-12-12 04:02:32
Question: I have created a data frame (DF) and want to convert it to an H2O Frame. To do that, I do:

    library(h2o)
    h2o.init(nthreads=-1)
    df<-data.table(matrix(0,ncol=46,nrow=30000))
    df<-as.h2o(df)

When I run htop on the command line, I see that only one of the 4 available processors is working. Is it not possible to do this another way? Thanks!
Answer 1: There are two factors at work here. 1) The first is that you are using as.h2o(), which is the not-very-efficient "push" method (where the client pushes data to the server) of ingesting

Confused about the result of my null-value check code

故事扮演 submitted on 2019-12-12 02:47:10
Question: I tried this to check whether a row is null or not:

    package org.apache.spark.h2o.utils

    import water.fvec.{NewChunk, Frame, Chunk}
    import water._

    class Miss extends MRTask {
      override def map(c: Chunk, nc: NewChunk): Unit = {
        for (row <- 0 until c.len()) {
          if (c.atd(row) == 0) {
            nc.addNum(0)
          } else nc.addNum(1)
        }
      }
    }

And I cannot understand the result of my code here:

       A    B    C    D    E            check
                                        min 0, mean 0, stddev 0, max 1, missing 0
    0  5.1  3.5  1.4  0.2  Iris-setosa  1
    1  4.9  3    1.4  0.2  Iris-setosa  1
    2  4.7  3.2  1.3
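The map method in the question compares each value to 0, which flags zero values rather than missing ones. The sketch below contrasts the two checks in plain Python, assuming a missing numeric value is represented as NaN. The function names and the sample column are hypothetical. In H2O's Java API the analogous test would be the chunk's isNA(row) method rather than a comparison with 0.

```python
import math


def zero_flags(values):
    """Mirror the `value == 0` test from the question: 0 for zeros, 1 otherwise."""
    return [0 if v == 0 else 1 for v in values]


def missing_flags(values):
    """Flag missing entries: 1 when the value is NaN, 0 otherwise."""
    return [1 if math.isnan(v) else 0 for v in values]


# Hypothetical column holding one zero and one missing value
column = [5.1, 0.0, float("nan"), 4.7]

print(zero_flags(column))
print(missing_flags(column))
```

The zero-based check marks every non-zero entry with 1, the NaN included, so it cannot separate missing values from ordinary ones.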

How to add string to new chunk in H2Oframe

帅比萌擦擦* submitted on 2019-12-11 23:04:07
Question: By using newChunk.addNum(8) I can add a number to a row in my new chunk. How can I add a String to a row in the new chunk? Thanks!
Answer 1: Your question was answered on the H2O Google group: https://groups.google.com/forum/#!topic/h2ostream/nMHmBSMQRRM Thanks! Avni
Source: https://stackoverflow.com/questions/33424324/how-to-add-string-to-new-chunk-in-h2oframe