4. Grow a Decision Tree
Consider a binary classification setting and assume we have a gain (performance) measure:
Start
❑ A single leaf assigning the most common of the two labels (i.e., the label of the majority of the samples)
At each iteration
❑ Analyze the effect of splitting a leaf
❑ Among all possible splits, select the one leading to the largest gain and split that leaf (or choose not to split), as sketched below
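A minimal sketch of one such growing iteration, assuming binary features stored in a NumPy array X, binary labels y, and entropy reduction (information gain) as the gain measure; the names and helpers are illustrative, not taken from the slides.

```python
import numpy as np

def entropy(y):
    """Empirical entropy of a binary label vector."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def best_split(X, y):
    """Return (gain, feature) of the split with the largest gain,
    or (0.0, None) if no split improves on keeping a single leaf."""
    best_gain, best_feature = 0.0, None
    for j in range(X.shape[1]):
        left, right = y[X[:, j] == 0], y[X[:, j] == 1]
        gain = entropy(y) - (len(left) / len(y)) * entropy(left) \
                          - (len(right) / len(y)) * entropy(right)
        if gain > best_gain:
            best_gain, best_feature = gain, j
    return best_gain, best_feature
```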
5. Iterative Dichotomizer 3 (ID3)
❑ If there are no more features to use, stop and return a leaf with the majority label
❑ Otherwise, find which split (i.e., splitting over which feature) leads to the maximum gain
❑ Split on the selected feature xj and recursively call the algorithm on the remaining features
❑ Split on a (binary) feature only once; if features are real valued, a threshold must be found, and the same feature can be split again with different thresholds
A sketch of the recursion is given below.
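A hedged sketch of the ID3 recursion for binary features, reusing the illustrative entropy/best_split helpers from the previous sketch; the tree is returned as nested dictionaries, with majority labels at the leaves.

```python
from collections import Counter
import numpy as np

def id3(X, y, features):
    """Return a tree as nested dicts; leaves are the majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or not features:      # pure node or no features left
        return majority
    # pick the remaining feature whose split gives the maximum gain
    gain, j = best_split(X[:, features], y)
    if j is None:                             # no split improves the gain
        return majority
    f = features[j]
    mask = X[:, f] == 1
    rest = [k for k in features if k != f]    # each binary feature used once
    return {f: {0: id3(X[~mask], y[~mask], rest),
                1: id3(X[mask],  y[mask],  rest)}}
```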
8. Pruning
❑ Issue of ID3: the resulting tree is typically very large, with a high risk of overfitting
❑ Prune the tree to reduce its size without affecting performance too much, as illustrated below
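ID3 itself does not prescribe a pruning rule; as one illustration only, cost-complexity pruning is available in scikit-learn through the ccp_alpha parameter of DecisionTreeClassifier (larger values prune more aggressively). The dataset here is just a stand-in.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_tr, y_tr)

print("leaves, unpruned vs pruned:", full_tree.get_n_leaves(), pruned.get_n_leaves())
print("test accuracy, unpruned vs pruned:",
      full_tree.score(X_te, y_te), pruned.score(X_te, y_te))
```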
9. Random Forests (RF)
❑ Introduced by Leo Breiman in 2001
❑ Instead of using a single large tree, construct an ensemble of simpler trees
❑ A Random Forest (RF) is a classifier consisting of a collection of decision trees
❑ The prediction is obtained by majority voting over the predictions of the individual trees, as in the sketch below
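A minimal sketch of training a Random Forest and reproducing the vote over its trees, using scikit-learn (a library choice not made by the slide); the dataset is only a placeholder. Note that scikit-learn averages the trees' class probabilities, which usually coincides with a hard majority vote.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("forest test accuracy:", rf.score(X_te, y_te))

# Hard majority vote over the individual trees, for comparison with rf.predict
per_tree = np.array([tree.predict(X_te) for tree in rf.estimators_])
hard_vote = (per_tree.mean(axis=0) > 0.5).astype(int)
print("agreement with rf.predict:", np.mean(hard_vote == rf.predict(X_te)))
```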
11. Random Sampling with Replacement
Idea: randomly sample from a training dataset with replacement
❑ Assume a training set S of size m: we can build new training sets by taking m samples from S at random with replacement (i.e., the same sample can be selected multiple times)
For example, if our training data is [1, 2, 3, 4, 5, 6], then we might sample sets like [1, 2, 2, 3, 6, 6], [1, 2, 4, 4, 5, 6], [1, 1, 1, 1, 1, 1], etc.
I.e., all lists have a length of six, but some values can be repeated in the random selection
❑ Notice that we are not subsetting the training data into smaller chunks; a small sampling sketch follows
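A small sketch of the sampling step with NumPy; the toy set S mirrors the example above, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
S = np.array([1, 2, 3, 4, 5, 6])            # original training set, m = 6

for _ in range(3):
    bootstrap = rng.choice(S, size=len(S), replace=True)
    print(bootstrap)                         # same length m, values may repeat
```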
12. Bootstrap Aggregation (Bagging)
Bagging (Bootstrap Aggregation):
❑ Decision trees are very sensitive to the data they are trained on: small changes to the training set can result in significantly different tree structures
❑ Random forests take advantage of this by letting each individual tree randomly sample with replacement from the dataset, so that different training sets produce different trees
❑ This process is known as bagging (see the sketch below)
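A hedged sketch of bagging: each tree is fit on its own bootstrap sample of the training set and the ensemble predicts by majority vote. The scikit-learn tree learner and the dataset are illustrative choices, not prescribed by the slide.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(seed=0)
trees = []
for _ in range(25):
    idx = rng.choice(len(X_tr), size=len(X_tr), replace=True)   # bootstrap sample
    trees.append(DecisionTreeClassifier().fit(X_tr[idx], y_tr[idx]))

votes = np.array([t.predict(X_te) for t in trees])   # shape: (trees, samples)
bagged_pred = (votes.mean(axis=0) > 0.5).astype(int) # majority vote
print("bagged test accuracy:", np.mean(bagged_pred == y_te))
```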
14. Randomization: Feature Randomness
❑ In a normal decision tree, when it is time to split a node, we consider every possible feature and pick the one that produces the largest gain
❑ In contrast, each tree in a random forest can pick only from a random subset of features (Feature Randomness)
❑ I.e., node splitting in a random forest model considers only a random subset of the features at each split
❑ This forces even more variation among the trees in the model and ultimately results in lower correlation across trees and more diversification (see the sketch below)
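A minimal sketch of feature randomness at a single split, reusing the hypothetical best_split helper from the earlier gain sketch (binary features assumed): instead of scanning all d features, the split only searches a random subset of about sqrt(d) of them. In scikit-learn the same idea is exposed through the max_features parameter, e.g. RandomForestClassifier(max_features="sqrt").

```python
import numpy as np

def random_subset_split(X, y, rng):
    """Pick the best split among a random subset of ~sqrt(d) features."""
    d = X.shape[1]
    k = max(1, int(np.sqrt(d)))                   # a common choice: ~sqrt(d)
    candidates = rng.choice(d, size=k, replace=False)
    gain, j = best_split(X[:, candidates], y)     # search only the subset
    return gain, (candidates[j] if j is not None else None)
```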