TreeNet Tree Ensembles and CART Decision Trees: A Winning Combination
October 2012
Mikhail Golovnya
Salford Systems

CART® software is a trademark of California Statistical Software, Inc. and is licensed exclusively to Salford Systems.
TreeNet® software is a trademark of Salford Systems
Course Outline
• CART decision tree pros/cons
• TreeNet stochastic gradient boosting: a promising
  way to overcome the shortcomings of a single tree
• Introducing TreeNet, a powerful modern ensemble
  of boosted trees
    o   Methodology
    o   Reporting
    o   Interpretability
    o   Post-processing
    o   Interaction detection
• Advantages of using both CART and TreeNet
    o Contribution from CART
    o Contribution from TreeNet



 © Salford Systems 2012
Demonstration Dataset
108,376 bank customers (commercial and individual)
with 6,564 in bad standing over the past two years
Goal: identify customers in bad standing using the following predictors:
• Revolving utilization of credit
• Age of the primary account holder
• Debt ratio of the primary account holder
• Monthly income
• Number of open credit lines
• Number of mortgages
• Number of dependents
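
The two counts on this slide imply a strongly imbalanced target. The short sketch below only does that arithmetic; the variable names are illustrative and nothing here comes from the Salford software.

```python
# Counts quoted on the slide.
n_customers = 108_376
n_bad = 6_564

# Share of customers in bad standing.
bad_rate = n_bad / n_customers

# Accuracy of a trivial model that labels every customer as "good".
# Any useful classifier has to be judged against this baseline.
majority_accuracy = 1 - bad_rate

print(f"bad rate: {bad_rate:.2%}")
print(f"always-good accuracy: {majority_accuracy:.2%}")
```

Because roughly 6% of customers are in bad standing, raw accuracy is a weak yardstick here; ranking measures are usually more informative for data like this.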

CART Advantages
1. Relatively fast
2. All types of variables
    1.    Numeric, binary, categorical, missing values

3. Invariant under monotone transformations
    1.    Variable scales are irrelevant
    2.    Immunity to outliers
    3.    Most variables can be used “as is”

4. Resistance to many irrelevant variables
5. Few tunable parameters: an “off-the-shelf” procedure
6. Interpretable model representation
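
Point 3 can be checked with a small sketch. It assumes a simplified split search (one predictor, squared-error criterion on a 0/1 target) rather than CART's actual implementation, and the income values are made up for illustration. Because a split depends only on the ordering of the predictor, taking logs leaves the chosen partition unchanged, and the extreme value does not move it either.

```python
import math


def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_partition(x, y):
    """Return the set of row indices sent left by the best single split on x.

    Candidate splits lie between consecutive distinct sorted values of x;
    the winner minimizes the total within-node squared error of y.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_cost, best_left = None, None
    for k in range(1, len(order)):
        if x[order[k - 1]] == x[order[k]]:
            continue  # no threshold separates equal values
        left, right = order[:k], order[k:]
        cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
        if best_cost is None or cost < best_cost:
            best_cost, best_left = cost, frozenset(left)
    return best_left


# Hypothetical monthly incomes with one extreme outlier, and a 0/1 target.
income = [1200, 2500, 3100, 4000, 5200, 900000]
bad = [1, 1, 1, 0, 0, 0]

raw_partition = best_split_partition(income, bad)
log_partition = best_split_partition([math.log(v) for v in income], bad)

# Same rows go left whether the predictor is raw or log-transformed.
print(raw_partition == log_partition)
```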



CART Disadvantages
1. Trade-off: accuracy vs. interpretability
2. Piecewise-constant model
    1.    Big errors near region boundaries
    2.    Impossible to detect fine differences within the segment

3. Instability => high variance
    1.    Small data change => big model change (especially for large trees)

4. Data fragmentation: every split leaves less data for the splits below it
5. High interaction order model: an unreasonably
   complicated way to represent simple additive
   dependencies
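
Point 2 is easy to see on a toy example. The sketch below fits a two-leaf regression tree to a smooth linear target; the `fit_stump` helper and the data are illustrative assumptions, not CART itself. Ten distinct target values collapse into two predictions, and the rows on either side of the split point are among the worst fitted.

```python
def fit_stump(x, y):
    """Fit a two-leaf least-squares tree: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between equal predictor values
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        cost = (sum((t - left_mean) ** 2 for t in left)
                + sum((t - right_mean) ** 2 for t in right))
        if best is None or cost < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (cost, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


x = [float(i) for i in range(10)]
y = list(x)  # smooth linear target, y = x

threshold, left_mean, right_mean = fit_stump(x, y)
pred = [left_mean if v <= threshold else right_mean for v in x]
errors = [abs(p - t) for p, t in zip(pred, y)]

print(threshold, left_mean, right_mean)
print(errors)
```

Rows with x = 0 through x = 4 all receive the same prediction, so differences inside a segment are invisible to the model.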



TreeNet Tree Ensembles
• Complements CART advantages, while
  dramatically increasing accuracy

[Diagram: Tree 1 + Tree 2 + Tree 3]
    o Tree 1: grown on the original target. Intentionally a “weak” model.
    o Tree 2: grown on the residuals from the first tree. Its predictions are made to improve the first tree.
    o Tree 3: grown on the residuals from the model consisting of the first two trees.
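
The residual-fitting idea in the diagram can be sketched in a few lines. This is plain least-squares boosting with two-leaf trees, not TreeNet's stochastic gradient boosting; the `learning_rate` and `n_trees` values, the helper names, and the toy data are all assumptions made for illustration.

```python
def fit_stump(x, target):
    """Least-squares two-leaf tree on one predictor: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, target))
    best = None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue
        left = [t for _, t in pairs[:k]]
        right = [t for _, t in pairs[k:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        cost = (sum((t - left_mean) ** 2 for t in left)
                + sum((t - right_mean) ** 2 for t in right))
        if best is None or cost < best[0]:
            threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best = (cost, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(x, y, n_trees=200, learning_rate=0.1):
    """Grow weak trees one after another, each on the current residuals."""
    prediction = [0.0] * len(y)
    mse_history = []
    for _ in range(n_trees):
        # Residuals of the ensemble built so far become the next tree's target.
        residual = [t - p for t, p in zip(y, prediction)]
        threshold, left_mean, right_mean = fit_stump(x, residual)
        # Add a shrunken copy of the new tree's prediction to the ensemble.
        prediction = [
            p + learning_rate * (left_mean if v <= threshold else right_mean)
            for v, p in zip(x, prediction)
        ]
        mse_history.append(sum((t - p) ** 2 for t, p in zip(y, prediction)) / len(y))
    return prediction, mse_history


x = [float(i) for i in range(10)]
y = list(x)  # same smooth target a single small tree fits poorly

prediction, mse_history = boost(x, y)
print(round(mse_history[0], 3), round(mse_history[-1], 3))
print(len({round(p, 6) for p in prediction}))  # number of distinct fitted values
```

Each tree on its own is still piecewise constant, but the sum of many small steps takes far more than two distinct values, which is how the ensemble gets closer to a smooth response.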


TreeNet Overcomes CART’s Shortcomings

Piecewise-Constant Model
    CART:    Big errors near region boundaries, coarse predictions
    TreeNet: Fine predictions, nearly emulating smooth continuous response surface
Instability and Variance
    CART:    Small data changes induce big model changes (especially for large trees)
    TreeNet: Stable models due to averaging of individual tree responses
Data Fragmentation
    CART:    Relatively few predictors make it into the model
    TreeNet: Each tree works with the entire data; many opportunities for variables to enter
High Interaction Order Model
    CART:    Always enforced
    TreeNet: Allows precise control over the interactions
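
One common way boosted ensembles control interactions is through tree size: a two-leaf tree splits on a single variable, so a sum of such trees is purely additive. The sketch below illustrates that general property with a hand-rolled least-squares booster. How TreeNet itself exposes this control is not shown in the slides, and every name and setting here is an assumption.

```python
def fit_stump(rows, target):
    """Best single split over all predictors: (feature, threshold, left_mean, right_mean)."""
    best = None
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [t for r, t in zip(rows, target) if r[j] <= threshold]
            right = [t for r, t in zip(rows, target) if r[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            cost = (sum((t - left_mean) ** 2 for t in left)
                    + sum((t - right_mean) ** 2 for t in right))
            if best is None or cost < best[0]:
                best = (cost, j, threshold, left_mean, right_mean)
    return best[1:]


def leaf_value(stump, row):
    j, threshold, left_value, right_value = stump
    return left_value if row[j] <= threshold else right_value


def boost_stumps(rows, y, n_trees=100, learning_rate=0.1):
    """Boost two-leaf trees on residuals; leaf values are stored already shrunken."""
    prediction = [0.0] * len(y)
    stumps = []
    for _ in range(n_trees):
        residual = [t - p for t, p in zip(y, prediction)]
        j, threshold, left_mean, right_mean = fit_stump(rows, residual)
        stump = (j, threshold, learning_rate * left_mean, learning_rate * right_mean)
        stumps.append(stump)
        prediction = [p + leaf_value(stump, r) for r, p in zip(rows, prediction)]
    return stumps


def predict(stumps, row):
    return sum(leaf_value(s, row) for s in stumps)


def mixed_difference(f):
    """f(3,3) - f(3,0) - f(0,3) + f(0,0): zero for any additive function."""
    return f((3, 3)) - f((3, 0)) - f((0, 3)) + f((0, 0))


# Toy grid whose target is a pure interaction, y = x1 * x2.
rows = [(a, b) for a in range(4) for b in range(4)]
y = [float(a * b) for a, b in rows]

stumps = boost_stumps(rows, y)
ensemble_mixed = mixed_difference(lambda row: predict(stumps, row))
true_mixed = mixed_difference(lambda row: row[0] * row[1])

print(true_mixed, abs(ensemble_mixed) < 1e-9)
```

The stump ensemble cannot represent the interaction at all, which is the point: allowing larger trees admits higher-order interactions, while two-leaf trees rule them out.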
TreeNet and CART: A Winning Combination


