How to use H2O on a feature-hashed matrix in R

Posted by 橙三吉。 on 2019-12-12 15:00:21

Question


I am working on a moderately sized data set (train_data) with more than 124 variables and 5,000,000 observations. For the categorical variables, I have applied feature hashing using the hashed.model.matrix function in R.

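## feature hashing
library(FeatureHashing)

b <- 2 ^ 22   # hash size (number of hashed columns)
f <- ~ . - 1  # use all variables, no intercept
X_train <- hashed.model.matrix(f, train_data, hash.size = b)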

As a result, I get a large dgCMatrix (a sparse matrix) as output (X_train). How can I use the H2O wrapper on this matrix and apply the different algorithms available in H2O? Does the H2O wrapper accept a sparse matrix (dgCMatrix)? Any link or example of such usage would be helpful. Thanks in advance.

I would like to import X_train into the H2O environment and then do the following kind of steps:

library(h2o)

# initialize connection to H2O server, using all available cores
h2o.init(nthreads = -1)
train.hex <- h2o.uploadFile('./X_train', destination_frame = 'train')

# list of features for training (every column except the response)
feature.names <- setdiff(names(train.hex), 'outcome')

# train random forest model, use ntrees = 500
drf <- h2o.randomForest(x = feature.names, y = 'outcome', training_frame = train.hex, ntrees = 500)

Answer 1:


You could save your sparse matrix in SVMLight sparse format, then use:

train.hex <- h2o.uploadFile('./X_train', parse_type = "SVMLight", destination_frame='train')
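The answer does not show how to produce the SVMLight file itself. Below is a minimal sketch of one way to do it using only base R and the Matrix package. The helper name write_svmlight, the label vector y, and the 1-based feature indices are assumptions for illustration, not part of the original answer. It builds every line in memory, so for a matrix with millions of rows you would probably want to write it in row chunks instead.

```r
library(Matrix)

# Hypothetical helper: write a sparse matrix X and a label vector y to `path`
# in SVMLight format, one row per line: "<label> <col>:<value> <col>:<value> ..."
write_svmlight <- function(X, y, path) {
  stopifnot(nrow(X) == length(y))
  s <- summary(X)                  # non-zero entries as 1-based (i, j, x) triplets
  s <- s[order(s$i, s$j), ]        # group by row, ascending feature index
  tokens <- paste0(s$j, ":", s$x)
  # Collapse the tokens of each row; rows with no non-zero entries come back as NA
  feats <- tapply(tokens, factor(s$i, levels = seq_len(nrow(X))),
                  paste, collapse = " ")
  feats[is.na(feats)] <- ""
  writeLines(trimws(paste(y, feats)), path)
  invisible(path)
}

# Small demonstration on a 3 x 4 sparse matrix with an empty second row
X_demo <- sparseMatrix(i = c(1, 1, 3), j = c(2, 4, 1),
                       x = c(1, 2.5, 3), dims = c(3, 4))
y_demo <- c(1, 0, 1)
demo_path <- tempfile(fileext = ".svmlight")
write_svmlight(X_demo, y_demo, demo_path)
```

With the real data this would be called as write_svmlight(X_train, y, './X_train'), where y is the outcome column taken from train_data.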

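SVMLight sparse format will also be detected by h2o.importFile(), which is a parallelized reader. Unlike h2o.uploadFile(), which pushes the file from the client to the server, h2o.importFile() has the server pull the data from the location given by the client, so the path must be readable from the H2O server.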

train.hex <- h2o.importFile('./X_train', destination_frame='train')


Source: https://stackoverflow.com/questions/38870109/how-to-use-h2o-on-feature-hashed-matrix-in-r
