gbm

H2O - balance classes - cross validation

Submitted by 醉酒当歌 on 2019-12-06 13:47:25
I would like to build a GBM model with H2O. My data set is imbalanced, so I am using the balance_classes parameter. For grid search (parameter tuning) I would like to use 5-fold cross-validation. I am wondering how H2O deals with class balancing in that case. Will only the training folds be rebalanced? I want to be sure the test fold is not rebalanced. Thank you.

In class-imbalance settings, artificially balancing the test/validation set does not make any sense: these sets must remain realistic, i.e. you want to test your classifier's performance in the real-world setting, where, say, the minority class accounts for only a small fraction of the cases.
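For reference, H2O's balance_classes option is described as over/under-sampling of the training data, and cross-validation metrics are computed from predictions on the holdout folds, which keep their original class distribution. A minimal sketch of the setup being asked about, assuming an H2OFrame train_hex with a binary factor response "y" and predictor names in x_cols (hypothetical names, not from the question):

library(h2o)
h2o.init()

# Hypothetical names: train_hex is an H2OFrame with a factor response "y";
# x_cols holds the predictor column names.
gbm_cv <- h2o.gbm(
  x = x_cols,
  y = "y",
  training_frame  = train_hex,
  nfolds          = 5,
  balance_classes = TRUE,   # resampling applied to the training data
  seed            = 1234
)

# Cross-validated AUC comes from the holdout-fold predictions,
# which retain the original (imbalanced) class distribution.
h2o.auc(gbm_cv, xval = TRUE)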

gbm::interact.gbm vs. dismo::gbm.interactions

Submitted by Anonymous (unverified) on 2019-12-03 01:49:02
Background: The reference manual for the gbm package states that the interact.gbm function computes Friedman's H-statistic to assess the strength of variable interactions, and that the H-statistic is on the scale of [0, 1]. The reference manual for the dismo package does not reference any literature for how the gbm.interactions function detects and models interactions; instead it gives a list of general procedures used to detect and model interactions. The dismo vignette "Boosted Regression Trees for ecological modeling" states that the dismo package
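Both measures are computed from a fitted boosted-regression-tree model. A minimal sketch of calling them side by side, assuming a data frame dat with the response in column 1 and predictors in columns 2:5 (a hypothetical layout), fitted with dismo::gbm.step so the same object works for both packages:

library(gbm)
library(dismo)

# Hypothetical layout: response in column 1, predictors in columns 2:5.
brt <- gbm.step(data = dat, gbm.x = 2:5, gbm.y = 1,
                family = "bernoulli", tree.complexity = 3,
                learning.rate = 0.01, bag.fraction = 0.5)

# gbm: Friedman's H-statistic (on a [0, 1] scale) for the first two predictors
interact.gbm(brt, data = dat, i.var = c(1, 2), n.trees = brt$n.trees)

# dismo: grid-prediction-based interaction sizes for all predictor pairs
gbm.interactions(brt)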

EGLDisplay on GBM

Submitted by Anonymous (unverified) on 2019-12-03 00:48:01
I want to create an OpenGL context through EGL. As I won't actually draw, I want to use Pbuffers in conjunction with the GBM platform. This is the code (C99):

#include <stdlib.h>
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <gbm.h>

int main( void )
{
    assert( eglBindAPI( EGL_OPENGL_API ) == EGL_TRUE );

    int fd = open("/dev/dri/card0", O_RDWR);
    struct gbm_device * gbm = gbm_create_device( fd );

    EGLDisplay dpy = eglGetDisplay( gbm );
    eglInitialize( dpy, NULL, NULL );

    EGLConfig

XGBoost parameter tuning guide

Submitted by Anonymous (unverified) on 2019-12-03 00:39:02
The XGBoost algorithm can lift the performance of a predictive model. As I learned more about how it performs, and about the principles behind its high accuracy, I found that it has many advantages.

A standard GBDT implementation has no regularization step like XGBoost's; regularization also helps reduce overfitting. In fact, XGBoost is known for its "regularized boosting" technique.

XGBoost supports parallel processing, a big jump in speed over GBDT. But boosting is, as everyone knows, a sequential procedure, so how can it be parallel? Each tree's construction depends on the previous tree, so what exactly lets us build a single tree on multiple cores? The parallelism in XGBoost actually refers to a finer granularity: it parallelizes at the feature level. XGBoost also supports a Hadoop implementation.

XGBoost lets users define custom optimization objectives and evaluation criteria. This adds a whole new dimension to the model, so our processing is not constrained in any way.

XGBoost has built-in rules for handling missing values. The user supplies a value different from all other samples and passes it in as a parameter, and that value is treated as missing. XGBoost tries different handling when it encounters a missing value at different nodes, and learns how to handle missing values it will encounter in the future.

When a split produces a negative loss, GBM stops splitting, so GBM is effectively a greedy algorithm. XGBoost instead splits all the way down to the specified maximum depth (max_depth) and then prunes back: if no split below a node contributes a positive gain, that split is removed.
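A minimal sketch of where these knobs sit in the R interface (illustrative values only, not recommendations from the guide):

library(xgboost)

# Toy data: predict the binary column am of mtcars from the remaining columns.
X <- as.matrix(mtcars[, setdiff(names(mtcars), "am")])
y <- mtcars$am

# `missing` is the placeholder value that XGBoost treats as a missing entry.
dtrain <- xgb.DMatrix(data = X, label = y, missing = NA)

params <- list(
  objective = "binary:logistic",
  eta       = 0.1,   # learning rate
  max_depth = 6,     # grow to this depth, then prune splits without positive gain
  lambda    = 1,     # L2 regularization on leaf weights ("regularized boosting")
  alpha     = 0,     # L1 regularization
  nthread   = 2      # feature-level parallelism within each tree
)

bst <- xgb.train(params = params, data = dtrain, nrounds = 100, verbose = 0)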

Caret error using GBM, but not without caret

Submitted by 白昼怎懂夜的黑 on 2019-12-01 17:27:18
I've been using gbm through caret without problems, but when I removed some variables from my data frame it started to fail. I've tried both the GitHub and CRAN versions of the packages mentioned. This is the error:

> fitRF = train(my_data[trainIndex,vars_for_clust], clusterAssignment[trainIndex], method = "gbm", verbose=T)
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :9     NA's   :9
Error in train.default(my_data[trainIndex, vars
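The excerpt cuts off before any answer, so the following is only a sketch of how one might narrow this down, reusing the names from the question (my_data, vars_for_clust, clusterAssignment, trainIndex). caret reports per-resample gbm failures only as missing Accuracy/Kappa values, so inspecting the data and warnings() usually exposes the underlying error:

library(caret)

x <- my_data[trainIndex, vars_for_clust]
y <- clusterAssignment[trainIndex]

str(x)      # look for character, constant, or all-NA columns
anyNA(x)

fit <- train(x = x, y = y, method = "gbm", verbose = FALSE)
warnings()  # the per-fold gbm error messages often surface here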

GBM multinomial distribution, how to use predict() to get predicted class?

Submitted by 拟墨画扇 on 2019-12-01 03:46:22
I am using the multinomial distribution from the gbm package in R. When I use the predict function, I get a series of values:

5.086328 -4.738346 -8.492738 -5.980720 -4.351102 -4.738044 -3.220387 -4.732654

but I want to get the probability of each class occurring. How do I recover the probabilities? Thank you.

Take a look at ?predict.gbm; you'll see that there is a "type" parameter to the function. Try out predict(<gbm object>, <new data>, type="response").

smci: predict.gbm(..., type='response') is not implemented for multinomial, or indeed any distribution other than bernoulli or poisson. So
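For what it's worth, the per-class values returned on the link scale can be turned into probabilities with a softmax. A sketch, assuming a fitted multinomial model fit and scoring data newdat (hypothetical names, not from the excerpt):

library(gbm)

# Link-scale scores: an n x n.classes x 1 array
raw <- predict(fit, newdata = newdat, n.trees = fit$n.trees, type = "link")

# Softmax across classes recovers the class probabilities
p <- exp(raw[, , 1])
p <- p / rowSums(p)

# Predicted class = column with the highest probability for each row
pred_class <- apply(p, 1, which.max)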

Understanding tree structure in R gbm package

Submitted by 夙愿已清 on 2019-11-30 15:21:02
I am having some difficulty understanding how the trees are structured in R's gbm gradient boosted machine package. Specifically, looking at the output of pretty.gbm.tree, which features do the indices in SplitVar point to? I trained a GBM on a dataset; here is the top ~quarter of one of my trees -- the result of a call to pretty.gbm.tree:

  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight   Prediction
0        9  6.250000e+01        1         2          21      0.6634681   5981  0.005000061
1       -1  1.895699e-12       -1        -1          -1      0.0000000   3013  0.018956988
2       31  4.462500e+02        3         4          20      1.0083722   2968 -0.009168477
3       -1  1
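A sketch of mapping those indices back to variable names, assuming fit is the trained gbm object behind this output (the object itself is not shown in the excerpt). SplitVar is a 0-based index into the predictors the model was fitted with, and -1 marks a terminal (leaf) node:

library(gbm)

tree1 <- pretty.gbm.tree(fit, i.tree = 1)

# Translate the 0-based SplitVar indices into predictor names;
# rows with SplitVar == -1 are leaves.
split_name <- rep("<leaf>", nrow(tree1))
internal   <- tree1$SplitVar >= 0
split_name[internal] <- fit$var.names[tree1$SplitVar[internal] + 1]

cbind(tree1, split_name)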

R: Plot trees from h2o.randomForest() and h2o.gbm()

Submitted by 最后都变了- on 2019-11-30 05:05:49
Looking for an efficient way to plot trees in RStudio, H2O's Flow, or a local HTML page from h2o's RF and GBM models, similar to the one in the image in the link below. Specifically, how do you plot trees for the fitted model objects rf1 and gbm2 produced by the code below, perhaps by parsing h2o.download_pojo(rf1) or h2o.download_pojo(gbm1)?

# # The following two commands remove any previously installed H2O packages for R.
# if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
# if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
#
# # Next, we download
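One hedged sketch (not the accepted answer): rather than parsing the POJO, export the model as a MOJO and render a single tree with the genmodel PrintMojo tool, which emits Graphviz that dot can turn into a PNG viewable locally. This assumes Java and Graphviz are on the PATH, and the flag names follow H2O's MOJO documentation and may differ between releases; gbm1 is the fitted model name from the question:

library(h2o)

out_dir  <- tempdir()
mojo_zip <- h2o.download_mojo(gbm1, path = out_dir, get_genmodel_jar = TRUE)
genmodel <- file.path(out_dir, "h2o-genmodel.jar")

# Emit the first tree as Graphviz, then render it to PNG with dot.
system(paste("java -cp", genmodel,
             "hex.genmodel.tools.PrintMojo --tree 0",
             "-i", file.path(out_dir, mojo_zip),
             "-o", file.path(out_dir, "tree0.gv")))
system(paste("dot -Tpng", file.path(out_dir, "tree0.gv"),
             "-o", file.path(out_dir, "tree0.png")))

Newer H2O releases also expose the tree structure directly in R via h2o.getModelTree(), which avoids the external tools.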
