Decision Trees and Random Forests
Machine Learning 2021
UML book chapter 18
Slides P. Zanuttigh
Decision Trees
Example: Decision Tree
(figure: an example tree whose leaves predict Class 0, Class 1, and Class 0)
Grow a Decision Tree
Consider a binary classification setting and assume we have a gain (performance) measure:
Start
❑ A single leaf assigning the most common of the two labels (i.e., the label of the majority of the samples)
At each iteration
❑ Analyze the effect of splitting a leaf
❑ Among all possible splits, select the one leading to the largest gain and split that leaf (or choose not to split)
• Iterative Dichotomizer 3 (ID3)
Find which split (i.e., splitting over which feature) leads to the maximum gain
Split on the selected feature xj and recursively call the algorithm, considering the remaining features*
* Each feature is split on only once: the features are binary
Stop when there are no more features to use
With real-valued features a threshold must be found, and the same feature can be split again with different thresholds
Gain Measure
Example
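The slides keep the gain measure abstract; one common concrete choice (an assumption here, not the slides' definition) is information gain, i.e., the reduction in label entropy produced by a split:

```python
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of the label distribution, in bits
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, left, right):
    # Parent entropy minus the size-weighted entropy of the two children
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted
```

A perfectly separating split of a balanced binary node yields a gain of 1 bit, while a split that leaves both children as mixed as the parent yields a gain of 0.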
Pruning
❑ Issue of ID3: the tree is typically very large, with a high risk of overfitting
❑ Prune the tree to reduce its size without affecting the performance too much
Random Forests (RF)
❑ Introduced by Leo Breiman in 2001
❑ Instead of using a single large tree, construct an ensemble of simpler trees
❑ A Random Forest (RF) is a classifier consisting of a collection of decision trees
❑ The prediction is obtained by majority voting over the predictions of the individual trees
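The majority-voting rule can be sketched in a few lines; here each "tree" is simply any callable mapping an input to a label (a hypothetical simplification, since the slides do not fix a tree interface):

```python
from collections import Counter

def forest_predict(trees, x):
    # Each tree votes with its own prediction; the forest outputs
    # the label chosen by the majority of the trees
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]
```

Ties are broken arbitrarily by `Counter.most_common`; using an odd number of trees avoids ties in the binary case.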
Random Forest: Example
Random Sampling with Replacement
Idea: randomly sample from a training dataset with replacement
❑ Assume a training set S of size m: we can build new training sets by taking m samples at random from S with replacement (i.e., the same sample can be selected multiple times)
For example, if our training data is [1, 2, 3, 4, 5, 6], then we might sample sets like [1, 2, 2, 3, 6, 6], [1, 2, 4, 4, 5, 6], [1, 1, 1, 1, 1, 1], etc.
I.e., all lists have a length of six, but some values can be repeated in the random selection
❑ Notice that we are not subsetting the training data into smaller chunks
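Sampling with replacement is one line with the standard library; this sketch just makes the "same length, possibly repeated values" property explicit:

```python
import random

def bootstrap_sample(S):
    # Draw m = |S| samples from S with replacement: the result has the
    # same length as S, but elements may repeat and others may be absent
    return [random.choice(S) for _ in range(len(S))]
```

For example, `bootstrap_sample([1, 2, 3, 4, 5, 6])` always returns a list of six elements drawn from the originals, possibly with repetitions.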
Bootstrap Aggregation (Bagging)
❑ Decision trees are very sensitive to the data they are trained on: small changes to the training set can result in significantly different tree structures
❑ Random forests take advantage of this by letting each individual tree randomly sample with replacement from the dataset, so that different training sets produce different trees
❑ This process is known as bagging
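Bagging combines the two previous ideas: train each tree on its own bootstrap resample. In this sketch, `train_tree` is a hypothetical placeholder for any tree-learning routine (such as the ID3-style growing described earlier):

```python
import random

def bag(train_tree, samples, labels, n_trees):
    # Bagging: each tree is trained on its own bootstrap resample of the data
    trees = []
    m = len(samples)
    for _ in range(n_trees):
        idx = [random.randrange(m) for _ in range(m)]  # indices with replacement
        trees.append(train_tree([samples[i] for i in idx],
                                [labels[i] for i in idx]))
    return trees
```

Resampling indices rather than samples keeps features and labels aligned for each resample.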
Bagging: Example
Randomization: Feature Randomness
❑ In a normal decision tree, when it is time to split a node, we consider every possible feature and pick the one that produces the largest gain
❑ In contrast, each tree in a random forest can pick only from a random subset of features ("feature randomness")
❑ I.e., node splitting in a random forest is based on a random subset of the features for each tree
❑ This forces even more variation among the trees in the model, ultimately resulting in lower correlation across trees and more diversification
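The feature restriction can be sketched as drawing a random subset of the feature set; the subset size `k` and the square-root default are illustrative assumptions (square root of the feature count is a widely used rule of thumb), and the same helper can be applied per tree, as the slides describe, or afresh at every split, as in Breiman's formulation:

```python
import math
import random

def candidate_features(features, k=None):
    # Restrict a tree (or a single split) to a random subset of the features;
    # a common rule of thumb is k = sqrt(number of features)
    if k is None:
        k = max(1, int(math.sqrt(len(features))))
    return random.sample(features, k)
```

With 9 features this yields 3 candidates per draw, so each tree (or split) sees a different slice of the feature space, which is what decorrelates the trees.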
