Poo Kuan Hoong
Build an effective
Machine Learning
Model with LightGBM
Agenda
• Introduction
• Decision Tree
• Ensemble Method
• Gradient Boosting
• Motivation for Gradient Boosting on Decision Trees
• LightGBM
• Demo
About Me
Poo Kuan Hoong
• Google Developer Expert (GDE) in Machine Learning
• Founder and organiser of the Malaysia R User Group and the TensorFlow & Deep Learning Malaysia User Group
Malaysia R User Group
https://www.facebook.com/groups/MalaysiaRUserGroup/
Questions?
www.sli.do #X490
Introduction
• Everyone is jumping on the Deep Learning hype.
• However, Deep Learning is not always the best model.
• Deep Learning requires a lot of data, hyperparameter tuning and training time.
• Often, the best model is the simplest model.
Decision Tree
Goal
1. Partition input space
2. Pure class distribution in each partition
Decision Trees: Guillotine cuts
Finding The Best Split
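For a classification tree, "best" usually means the split whose partitions are purest. A minimal sketch in Python, assuming Gini impurity as the criterion (function names are illustrative):

```python
def gini(labels):
    # Gini impurity: 1 - sum over classes of p_c^2
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    # Try every threshold between sorted feature values and keep the one
    # with the lowest weighted impurity over the two partitions.
    best = (None, float("inf"))
    pairs = sorted(zip(xs, ys))
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x < thr]
        right = [y for x, y in pairs if x >= thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (thr, score)
    return best
```

Real implementations score candidate splits from pre-sorted (or, in LightGBM's case, histogram-binned) feature values rather than re-partitioning the data for every threshold.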
Greedily Constructing A Decision Tree
Ensemble Methods
1. Weighted combination of
weak learners
2. Prediction is based on
committee votes
3. Boosting:
1. Train the ensemble one weak learner at a time
2. Focus new learners on
wrongly predicted examples
Gradient Boosting
1. Learn a regressor
2. Compute the error residual (the analogue of the gradient in deep learning)
3. Then build a new model to predict that residual
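These three steps can be sketched with one-split regression stumps as the weak learners (a toy illustration, not how any particular library implements it):

```python
def fit_stump(xs, residuals):
    # Weak learner: a one-split regression stump that predicts the mean
    # residual on each side of the best threshold (squared-error criterion).
    best = None
    for thr in xs:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm

def gradient_boost(xs, ys, rounds=10, lr=0.5):
    pred = [sum(ys) / len(ys)] * len(ys)              # start from the mean
    for _ in range(rounds):
        resid = [y - p for y, p in zip(ys, pred)]     # 2. error residuals
        stump = fit_stump(xs, resid)                  # 3. fit a model to them
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return pred
```

Each round shrinks the remaining residual, so the ensemble's predictions converge toward the targets.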
Motivation for Gradient Boosting on Decision Trees
Single decision tree can easily overfit the data
Naïve Gradient Boosting
Gradient boosting on decision trees
• Let’s define our objective function
Gradient boosting on decision trees – regularization
Tricks from XGBoost
• The tree is grown breadth-first (as opposed to depth-first, as in the original C4.5 implementation). This makes it possible to sort and traverse the data only once per level.
• Furthermore, the sorted features can be cached, so the data need not be re-sorted many times.
LightGBM
• LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
• A newer library, developed by Microsoft as part of the Distributed Machine Learning Toolkit.
• Main idea: make training faster. First release: 24 April 2017.
Gradient Boosting Machine (GBM)
Why LightGBM?
• LightGBM grows trees leaf-wise (best-first), whereas most other algorithms grow them level-wise.
• It chooses the leaf with the maximum delta loss to grow. When growing the same leaf, a leaf-wise algorithm can reduce more loss than a level-wise one.
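The leaf-wise strategy can be sketched as repeatedly splitting whichever leaf promises the largest loss reduction; the gain values below are made up purely for illustration:

```python
import heapq

def grow_leaf_wise(gains, max_leaves):
    # 'gains' maps a leaf id to (split_gain, left_child, right_child);
    # leaves absent from the map cannot be split further.
    # Leaf-wise growth: always split the leaf with the largest gain,
    # until the tree has max_leaves leaves.
    leaves = {"root"}
    heap = [(-gains["root"][0], "root")] if "root" in gains else []
    split_order = []
    while heap and len(leaves) < max_leaves:
        _, leaf = heapq.heappop(heap)  # leaf with the max delta loss
        if leaf not in leaves:
            continue
        _, left, right = gains[leaf]
        leaves.remove(leaf)
        leaves.update([left, right])
        split_order.append(leaf)
        for child in (left, right):
            if child in gains:
                heapq.heappush(heap, (-gains[child][0], child))
    return split_order, leaves
```

A level-wise grower would instead split every leaf at the current depth before moving deeper, even when some of those splits reduce the loss very little.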
Features
Speed
• LightGBM is prefixed ‘Light’ because of its high speed. It can handle large datasets while using less memory.
Accuracy
• LightGBM focuses on accuracy of results.
Distributed/Parallel Computing
• LightGBM also supports GPU learning.
Tips to fine tune LightGBM
• The following practices can be used to improve your model’s efficiency.
• num_leaves: the main parameter controlling the complexity of the tree model. Ideally, num_leaves should be less than or equal to 2^(max_depth); larger values will result in overfitting.
• min_data_in_leaf: setting it to a large value can avoid growing too deep a tree, but may cause under-fitting. In practice, hundreds or thousands is enough for a large dataset.
• max_depth: can also be used to limit the tree depth explicitly.
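As a sketch, the num_leaves ≤ 2^(max_depth) rule of thumb can be encoded directly when building a parameter dict (parameter names follow the LightGBM docs; the values are illustrative, not recommendations):

```python
max_depth = 7
params = {
    "objective": "binary",      # illustrative task
    "max_depth": max_depth,
    # keep num_leaves at or below 2^max_depth to curb overfitting
    "num_leaves": min(70, 2 ** max_depth),
    # a large min_data_in_leaf guards against overly deep, overfit trees
    "min_data_in_leaf": 500,
}
```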
Tips to fine tune LightGBM
• For Faster Speed:
• Use bagging by setting bagging_fraction and
bagging_freq
• Use feature sub-sampling by setting
feature_fraction
• Use small max_bin
• Use save_binary to speed up data loading in future runs
• Use parallel learning; refer to the parallel learning guide
Tips to fine tune LightGBM
• For better accuracy:
• Use large max_bin (may be slower)
• Use small learning_rate with large num_iterations
• Use large num_leaves (may cause over-fitting)
• Use bigger training data
• Try dart
• Try using categorical features directly
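Put together as a LightGBM-style parameter dict (parameter names follow the LightGBM docs; the values are illustrative only):

```python
params = {
    "boosting": "dart",        # try dart, as suggested above
    "learning_rate": 0.01,     # small learning rate ...
    "num_iterations": 5000,    # ... paired with many iterations
    "max_bin": 511,            # large max_bin (may be slower)
    "num_leaves": 255,         # large num_leaves (may over-fit)
}
```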
Conclusion
• LightGBM works well on multiple datasets, and its accuracy is as good as or better than that of other boosting algorithms.
• Given its speed and accuracy, LightGBM is well worth trying.
To install LightGBM R Package
• Build and install R-package with the following commands:
git clone --recursive https://github.com/Microsoft/LightGBM
cd LightGBM
Rscript build_r.R
https://github.com/Microsoft/LightGBM/tree/master/R-package
DEMO
Data
• Porto Seguro’s Safe Driver Prediction
• https://www.kaggle.com/c/porto-seguro-safe-driver-prediction
Poo Kuan Hoong
kuanhoong@gmail.com
http://www.linkedin.com/in/kuanhoong
https://twitter.com/kuanhoong