LightGBM

How LightGBM Computes Split Gain in Its Source Code

半世苍凉 posted on 2020-01-09 19:22:38
In the earlier post on XGBoost theory, we derived the split gain of an XGBoost decision tree as

$$\mathcal{L}_{split}={1\over2}\left[{(\sum_{i\in I_L}g_i)^2\over \sum_{i\in I_L}h_i+\lambda}+{(\sum_{i\in I_R}g_i)^2\over \sum_{i\in I_R}h_i+\lambda}-{(\sum_{i\in I}g_i)^2\over \sum_{i\in I}h_i+\lambda}\right]-\gamma$$

where $\lambda$ is the L2 regularization coefficient, $g_i$ and $h_i$ are the first- and second-order gradients of the loss for sample $i$, and $I_L$, $I_R$ are the sample sets of the left and right child after the split.
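As a quick illustration of the formula (not LightGBM's actual C++ code), here is a minimal Python sketch of the gain computation; the gradient/hessian arrays and the regularization constants are hypothetical inputs:

import numpy as np

def split_gain(g, h, left_mask, lam=1.0, gamma=0.0):
    """XGBoost-style split gain for one candidate split.

    g, h      -- first/second-order gradients of the loss, one per sample
    left_mask -- boolean array marking samples that go to the left child
    lam       -- L2 regularization strength (lambda)
    gamma     -- complexity penalty per added leaf
    """
    def score(gs, hs):
        # (sum g)^2 / (sum h + lambda) for one node
        return gs.sum() ** 2 / (hs.sum() + lam)

    parent = score(g, h)
    left = score(g[left_mask], h[left_mask])
    right = score(g[~left_mask], h[~left_mask])
    return 0.5 * (left + right - parent) - gamma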

LightGBM and XGBoost Explained

自闭症网瘾萝莉.ら posted on 2019-12-26 17:15:55
The gradient boosting decision tree (GBDT) is one of the best-performing classes of algorithms in machine learning competitions. One implementation of GBDT, xgboost, is among the most popular algorithms on Kaggle: of the 29 challenge-winning solutions published on Kaggle's blog during 2015, 17 used xgboost. If you look at the kernels in a Kaggle competition, you can clearly see how popular xgboost is. The search results for all kernels that had xgboost in their titles for the Kaggle Quora Duplicate Question

Submitting a LightGBM Job to a Cluster

烈酒焚心 posted on 2019-12-26 17:11:17
Maven artifacts:

## mmlspark
https://mvnrepository.com/artifact/Azure/mmlspark/0.15

## lightgbmlib
https://mvnrepository.com/artifact/com.microsoft.ml.lightgbm/lightgbmlib/2.2.200

Submission script and launch command:

[root@hadoop-1-1 ~]# more lgbm.sh
/app/spark2.3/bin/spark-submit \
--master yarn \
--jars /root/external_pkgs/mmlspark-0.15.jar,/root/external_pkgs/lightgbmlib-2.2.200.jar \
--class com.sf.demo.lgmClassifier /root/lgbm_demo.jar

nohup sh lgbm.sh > lgbm_20191226_001.log 2>&1 &

Driver source:

package com.xx.demo
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.ml

GPU support for XGBoost and LightGBM

白昼怎懂夜的黑 posted on 2019-12-23 00:27:17
GBDT is a powerhouse for tabular data-mining competitions. Its core idea is to train weak learners (decision trees) iteratively to obtain a strong model, which tends to train well and resist overfitting. XGBoost and LightGBM are two frameworks that implement the GBDT algorithm; to speed up model training, this post records how to build both with GPU support. The build environment is CentOS 7.2.

Installation Guide for XGBoost GPU support

Building and installing XGBoost from source involves two steps:

1. Build the shared library from the C++ code (libxgboost.so for Linux/OSX, xgboost.dll for Windows).
2. Install the language package (e.g. Python).

Building the Shared Library

When building the shared library on CentOS, distributed GPU training is disabled by default and only a single GPU will be used. To enable distributed GPU training, set the option USE_NCCL=ON when building with CMake. Distributed GPU training depends on NCCL2, which can be obtained from https://developer.nvidia.com/nccl
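Once the GPU builds succeed, a quick sanity check is to train a tiny model on the GPU from Python. This is a minimal sketch with random data, assuming the standard GPU switches of each library at the time (tree_method='gpu_hist' for XGBoost, device='gpu' for LightGBM); it only runs if both packages were actually compiled with GPU support:

import numpy as np
import xgboost as xgb
import lightgbm as lgb

X = np.random.rand(1000, 20)
y = np.random.randint(2, size=1000)

# XGBoost: gpu_hist runs the histogram algorithm on the GPU
bst = xgb.train({'tree_method': 'gpu_hist', 'objective': 'binary:logistic'},
                xgb.DMatrix(X, label=y), num_boost_round=10)

# LightGBM: device='gpu' requires a build compiled with GPU support
gbm = lgb.train({'device': 'gpu', 'objective': 'binary'},
                lgb.Dataset(X, label=y), num_boost_round=10)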

Installing Lightgbm on Mac with OpenMP dependency

妖精的绣舞 posted on 2019-12-22 18:03:57
Question: I'm new to Python and would like to install lightgbm on my MacBook. I did pip install lightgbm and it reported a successful installation. However, when I try to import it in my notebook I get the following error message:

../anaconda/envs/python3/lib/python3.6/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error)
    342
    343         if handle is None:
--> 344             self._handle = _dlopen(self._name, mode)
    345         else:
    346             self._handle = handle

OSError: dlopen(../anaconda/envs/python3

Multiclass Classification with LightGBM

允我心安 posted on 2019-12-22 04:45:17
Question: I am trying to build a classifier for a multi-class classification problem (3 classes) using LightGBM in Python. I used the following parameters:

params = {'task': 'train',
          'boosting_type': 'gbdt',
          'objective': 'multiclass',
          'num_class': 3,
          'metric': 'multi_logloss',
          'learning_rate': 0.002296,
          'max_depth': 7,
          'num_leaves': 17,
          'feature_fraction': 0.4,
          'bagging_fraction': 0.6,
          'bagging_freq': 17}

All the categorical features of the dataset are label-encoded with LabelEncoder. I trained the
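For context, a minimal sketch of how such a multiclass model is typically trained and used for prediction with the native API (the data here is random and purely illustrative):

import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 10)
y = np.random.randint(3, size=500)          # three classes, matching num_class
train_set = lgb.Dataset(X, label=y)

params = {'objective': 'multiclass', 'num_class': 3, 'metric': 'multi_logloss',
          'num_leaves': 17, 'learning_rate': 0.002296}

model = lgb.train(params, train_set, num_boost_round=100)

# predict() returns one probability per class: shape (n_samples, num_class)
probs = model.predict(X)
labels = probs.argmax(axis=1)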

Python: LightGBM cross validation. How to use lightgbm.cv for regression?

↘锁芯ラ posted on 2019-12-22 03:22:45
Question: I want to do cross-validation for a LightGBM model with lgb.Dataset and use early_stopping_rounds. The following approach works without a problem with XGBoost's xgboost.cv. I prefer not to use scikit-learn's approach with GridSearchCV, because it supports neither early stopping nor lgb.Dataset.

import lightgbm as lgb
from sklearn.metrics import mean_absolute_error

dftrainLGB = lgb.Dataset(data=dftrain, label=ytrain, feature_name=list(dftrain))
params = {'objective': 'regression'}
cv
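A minimal sketch of what the completed call usually looks like; the dftrain/ytrain frames are the asker's, and num_boost_round, nfold, and the patience value are illustrative. LightGBM versions of that era accepted early_stopping_rounds directly in lgb.cv (recent releases pass early stopping via callbacks instead):

cv_results = lgb.cv(params,
                    dftrainLGB,
                    num_boost_round=1000,
                    nfold=5,
                    metrics='mae',
                    early_stopping_rounds=50)

# cv_results is a dict of lists with one entry per surviving boosting round:
# LightGBM reports MAE under the key 'l1', so cv_results['l1-mean'][-1] is
# the mean MAE of the best round, and the list length is the best round count.
print(len(cv_results['l1-mean']), cv_results['l1-mean'][-1])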

light gbm - python API vs Scikit-learn API

隐身守侯 posted on 2019-12-19 04:02:16
Question: I was trying to apply LightGBM to one of my problems, and for that I was going through http://lightgbm.readthedocs.io/en/latest/Python-API.html. However, I have a basic question: is there any difference between the Training API and the scikit-learn API? Can we use both APIs to achieve the same result for the same problem? Thanks, Dipanjan.

Answer 1: The short answer: yes, they will provide identical results if you configure them in identical ways. The reason is that the sklearn API is just a wrapper around the
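A minimal sketch of the two equivalent routes on random data; the parameter mapping shown (e.g. n_estimators versus num_boost_round) is the usual correspondence between the two APIs:

import numpy as np
import lightgbm as lgb

X = np.random.rand(300, 5)
y = np.random.rand(300)

# Training (native) API
native = lgb.train({'objective': 'regression', 'num_leaves': 31, 'learning_rate': 0.1},
                   lgb.Dataset(X, label=y), num_boost_round=50)

# scikit-learn API: the same settings as estimator constructor arguments
sk = lgb.LGBMRegressor(objective='regression', num_leaves=31,
                       learning_rate=0.1, n_estimators=50)
sk.fit(X, y)

# With identical configuration and data, the predictions agree
print(np.allclose(native.predict(X), sk.predict(X)))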

Cross-validation in LightGBM

佐手、 posted on 2019-12-17 19:18:50
Question: After reading through LightGBM's documentation on cross-validation, I'm hoping this community can shed light on cross-validating results and improving our predictions using LightGBM. How are we supposed to use the dictionary output from lightgbm.cv to improve our predictions? Here's an example: we train our CV model using the code below.

cv_mod = lgb.cv(params,
                d_train,
                500,
                nfold=10,
                early_stopping_rounds=25,
                stratified=True)

How can we use the parameters found from the best iteration
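The usual pattern, sketched below under the assumption that the CV metric is binary log-loss (the dict key depends on the metric in params): with early stopping, the length of any metric list in the returned dict is the chosen iteration count, and you retrain one final model on the full data with that many rounds:

cv_mod = lgb.cv(params, d_train, 500, nfold=10,
                early_stopping_rounds=25, stratified=True)

# Each value in cv_mod is a list with one entry per surviving round,
# so its length is the number of rounds chosen by early stopping.
best_rounds = len(cv_mod['binary_logloss-mean'])

# Retrain a final model on all the training data with that round count
final_model = lgb.train(params, d_train, num_boost_round=best_rounds)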

Access trees and nodes from LightGBM model

冷暖自知 posted on 2019-12-14 02:17:27
Question: In scikit-learn it's possible to access the entire tree structure, that is, each node of the tree. This makes it possible to explore the attribute used at each split and the threshold value used for the test:

The binary tree structure has 5 nodes and has the following tree structure:
node=0 test node: go to node 1 if X[:, 3] <= 0.800000011920929 else to node 2.
node=1 leaf node.
node=2 test node: go to node 3 if X[:, 2] <= 4.950000047683716 else to node 4.
node=3 leaf node.
node=4 leaf node.
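LightGBM exposes its trees too, though through a different interface than scikit-learn's tree_ attribute. A minimal sketch using the native Booster methods, assuming model is a trained lgb.Booster (trees_to_dataframe is available in newer LightGBM releases):

# dump_model() returns the whole ensemble as a nested dict: each tree sits
# under 'tree_info', with split_feature, threshold and child subtrees at
# every internal node.
dump = model.dump_model()
root = dump['tree_info'][0]['tree_structure']
print(root['split_feature'], root['threshold'])

# trees_to_dataframe() flattens the same information into a pandas DataFrame
# with one row per node (tree_index, split_feature, threshold, children, ...).
df = model.trees_to_dataframe()
print(df.head())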