h2o | 易学教程

R H2O package import csv file with Chinese characters

Question: I have a large dataset in CSV format for building a prediction model. Because of its size, I planned to use the h2o package in R to build the model. However, several columns of the data.frame contain Simplified Chinese characters, and h2o is having difficulty receiving the data. I've tried two different approaches. The first approach read the file directly with the h2o.importFile() function. However, this approach ends up converting the
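One thing worth ruling out with non-ASCII CSV data is the file's encoding. The sketch below uses only the Python standard library to re-encode a CSV file to UTF-8 before it is handed to any importer. The source encoding (GB18030), the helper name `reencode_to_utf8`, and the file names are assumptions for illustration; the question does not say how the file was encoded.

```python
import csv
import os
import tempfile


def reencode_to_utf8(src_path, dst_path, src_encoding="gb18030"):
    """Copy a text file, decoding with src_encoding and writing UTF-8."""
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            dst.write(line)


# Build a small sample file in the assumed legacy encoding.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "sample_gb18030.csv")
dst = os.path.join(tmp_dir, "sample_utf8.csv")
with open(src, "w", encoding="gb18030", newline="") as f:
    csv.writer(f).writerows([["id", "city"], ["1", "北京"], ["2", "上海"]])

reencode_to_utf8(src, dst)

# Read the UTF-8 copy back to confirm the characters survived.
with open(dst, "r", encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
```

If the re-encoded file still shows garbled characters after import, the problem probably lies elsewhere (for example in the console locale) rather than in the file itself.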

How should we interpret the results of the H2O predict function?

Question: I have trained and stored a random forest binary classification model. Now I'm trying to simulate processing new (out-of-sample) data with this model. My Python (Anaconda 3.6) code is:

    import h2o
    import pandas as pd
    import sys

    localH2O = h2o.init(ip = "localhost", port = 54321, max_mem_size = "8G", nthreads = -1)
    h2o.remove_all()
    model_path = "C:/sm/BottleRockets/rf_model/DRF_model_python_1501621766843_28117"
    model = h2o.load_model(model_path)
    new_data = h2o.import_file(path="C:/sm
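For a binary classifier, H2O's prediction frame normally holds a `predict` column plus per-class probabilities (`p0`, `p1`). To my understanding, the label is obtained by comparing `p1` with a threshold chosen from the model's metrics (by default the one that maximizes F1) rather than with a fixed 0.5. The sketch below illustrates that relationship in plain Python. The probabilities, the threshold value, and the `>=` comparison are all assumptions for illustration, not output from the model in the question.

```python
def label_from_probability(p1, threshold):
    """Return the predicted class given P(class 1) and a decision threshold."""
    return 1 if p1 >= threshold else 0


# Hypothetical per-row class probabilities, as they might appear in a prediction frame.
rows = [
    {"p0": 0.80, "p1": 0.20},
    {"p0": 0.65, "p1": 0.35},
    {"p0": 0.30, "p1": 0.70},
]

# Hypothetical threshold; a real one would come from the trained model's metrics.
threshold = 0.30

labels = [label_from_probability(r["p1"], threshold) for r in rows]
```

Note that the second row is labeled 1 even though `p1` is below 0.5, which is the kind of result that tends to look surprising when the threshold is assumed to be 0.5.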

Implementing a decision tree using h2o

Question: I am trying to train a decision tree model using h2o. I am aware that h2o has no dedicated decision tree implementation, but it does have an implementation of random forest, H2ORandomForestEstimator. Can we implement a decision tree in h2o by tuning certain input arguments of the random forest? We can do that in scikit (a popular Python library for machine learning). Ref link: Why is Random Forest with a single tree much better than a Decision Tree classifier? In scikit the code
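A common way to approximate a single decision tree with a random forest is to remove both sources of randomness: grow one tree, consider every feature at each split, and train on every row. The sketch below only builds the corresponding keyword arguments. The helper name `single_tree_drf_params` and the feature count are hypothetical, and the commented-out lines assume a running H2O cluster and an existing training frame.

```python
def single_tree_drf_params(n_features, max_depth=20, seed=42):
    """Keyword arguments that make a random forest behave like one decision tree."""
    return {
        "ntrees": 1,           # a single tree instead of an ensemble
        "mtries": n_features,  # consider all features at every split
        "sample_rate": 1.0,    # train on all rows, no row subsampling
        "max_depth": max_depth,
        "seed": seed,
    }


params = single_tree_drf_params(n_features=10)

# With h2o installed and a cluster running (assumption), the arguments could be used as:
# from h2o.estimators.random_forest import H2ORandomForestEstimator
# model = H2ORandomForestEstimator(**params)
# model.train(x=feature_names, y=target_name, training_frame=train)
```

With `sample_rate` set to 1.0 there are no out-of-bag rows, so a separate validation frame would be needed to evaluate the tree.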

How to use H2o on feature hashed matrix in R

Question: I am working on a moderately sized data set (train_data). There are more than 124 variables and 5,000,000 observations. For the categorical variables, I have used feature hashing through the hashed.model.matrix function in R.

    library(FeatureHashing)

    ## feature hashing
    b <- 2 ^ 22
    f <- ~ .-1
    X_train <- hashed.model.matrix(f, train_data, hash.size=b)

As a result, I have a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O wrapper on this matrix and apply the different algorithms available in H2O? Does H2o

Can I use directly H2O library functions from Java or the only option for H2O is R?

Question: I want to use machine learning algorithms in Java. Mahout with Hadoop is too slow, and Weka cannot cope with the large data size. Is it possible to call the H2O library from Java, or is there a better option available for Java?

Answer 1: What you can do is implement your machine learning algorithms in R, and then call them via command line calls to the underlying system. I found this to be my best option when doing my thesis in Bioinformatics a few years ago. I remember trying to call the R

Efficient way to maintain a h2o data frame

Question: Let's say I have a function 'getData()' which returns data (think of it as a data stream). Now I need to form an h2o data frame from this data. I need to insert each item as a new row only if it is not already present in the data frame. One obvious way is to do:

1. There is a global h2o data frame.
2. Create an h2o data frame (of 1 row) from the arrived data. (I am using as.h2o())
3. Check if it is already present in the global data frame (using h2o.which() or any other function).
4. If it is not present then add
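One alternative to the row-by-row procedure described above is to do the duplicate check on the client, keeping a set of rows already seen, and to send only the new rows to the cluster in batches. The sketch below shows the deduplication step in plain Python. This is a swapped-in technique rather than anything from the question, and the helper name `collect_new_rows` and the sample batches are hypothetical.

```python
def collect_new_rows(stream, seen=None):
    """Return the rows from stream not seen before, plus the updated set of seen keys."""
    seen = set() if seen is None else seen
    fresh = []
    for row in stream:
        key = tuple(row)  # rows must be hashable to be stored in a set
        if key not in seen:
            seen.add(key)
            fresh.append(row)
    return fresh, seen


# Two hypothetical batches arriving from the data stream.
batch1 = [[1, "a"], [2, "b"], [1, "a"]]
fresh1, seen = collect_new_rows(batch1)

batch2 = [[2, "b"], [3, "c"]]
fresh2, seen = collect_new_rows(batch2, seen)

# Only fresh1 and fresh2 would then be converted and appended to the frame on the cluster.
```

Set membership is a constant-time check on average, so this avoids one frame creation and one frame scan per incoming row, at the cost of holding the row keys in client memory.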

H2oApi Java bindings columns endpoint not returning column metadata

Question: We are developing a Java program with H2O 3.10.4.7 and we need to retrieve metadata about all columns in a frame, such as column names and data types. Related question (not resolved, different problem) here. Our expectation was that the water.bindings.H2oApi client works just like the REST endpoints, and we wanted to use the H2oApi method frameColumns(FrameKeyV3 frameId), described in the Javadoc as "Return all the columns from a Frame.", but the result does not include any column-related info.

H2O not working on parallel

Question: I have created a DF and want to convert it to an H2O Frame. To do that, I do:

    library(h2o)
    library(data.table)
    h2o.init(nthreads=-1)
    df <- data.table(matrix(0,ncol=46,nrow=30000))
    df <- as.h2o(df)

When I run htop on the command line, I see that only one of the 4 available processors is working. Is it not possible to do this another way? Thanks!

Answer 1: There are two factors at work here. 1) The first is that you are using as.h2o(), which is the not-very-efficient "push" method (where the client pushes data to the server) of ingesting

Confuse about the result of my check null value code

Question: I tried this to check whether a row is null or not.

    package org.apache.spark.h2o.utils

    import water.fvec.{NewChunk, Frame, Chunk}
    import water._

    class Miss extends MRTask {
      override def map(c: Chunk, nc: NewChunk): Unit = {
        for (row <- 0 until c.len()) {
          if (c.atd(row) == 0) {
            nc.addNum(0)
          } else nc.addNum(1)
        }
      }
    }

And I cannot understand the result of my code here:

       A    B    C    D    E            check
    min 0  mean 0  stddev 0  max 1  missing 0
    0  5.1  3.5  1.4  0.2  Iris-setosa  1
    1  4.9  3    1.4  0.2  Iris-setosa  1
    2  4.7  3.2  1.3
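The check in the code above compares each value with 0, which tests for zero rather than for a missing value. To my understanding, H2O returns NaN for a missing numeric cell when it is read as a double, and the Chunk class also offers an isNA(row) check; both points are stated here as assumptions. The plain-Python sketch below shows why the two tests give different flags on the same column. The sample column and function names are hypothetical.

```python
import math


def zero_flag(value):
    """Mirror the check in the question: 0 when the value equals 0, otherwise 1."""
    return 0 if value == 0 else 1


def missing_flag(value):
    """Flag a value as missing (1) only when it is NaN."""
    return 1 if math.isnan(value) else 0


# Hypothetical numeric column containing a zero and a missing value.
column = [5.1, 0.0, float("nan"), 4.7]

zero_flags = [zero_flag(v) for v in column]
missing_flags = [missing_flag(v) for v in column]
```

Because NaN never compares equal to 0, the equality test marks a missing cell with 1, exactly like any ordinary non-zero value, so it cannot distinguish the two.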

How to add string to new chunk in H2Oframe

Question: By using newChunk.addNum(8) I can add a number to a row in my new chunk. How can I add a String to a row in the new chunk? Thanks!

Answer 1: Your question was answered on the H2O Google group: https://groups.google.com/forum/#!topic/h2ostream/nMHmBSMQRRM Thanks! Avni

Source: https://stackoverflow.com/questions/33424324/how-to-add-string-to-new-chunk-in-h2oframe