Course : Basic Artificial Intelligence
Effective Period : 2022
Ensemble Learning
Session 11-12
Acknowledgement
These slides have been adapted from:
Artasanchez, A., & Joshi, P. (2020). Artificial Intelligence with Python. Packt Publishing Ltd.
ISBN: 978-1-839-21607-7
Chapter 6
Learning Objectives
LO: Students are able to recognize the fundamental concepts of modern-age
Artificial Intelligence
Sub-Topics
● Decision trees and decision tree classifiers
● Building learning models with ensemble learning
● Random forests and extremely random forests
● Estimating the confidence of predictions
● Dealing with class imbalance
● Finding optimal training parameters using grid search
● Computing relative feature importance
● Traffic prediction using extremely random forest regression
Ensemble Learning
• Ensemble learning involves building multiple models and then combining them
in such a way that the combination produces better results than the models could
produce individually. These individual models can be classifiers, regressors, or
other models.
• Ensemble learning is used extensively across multiple fields, including data
classification, predictive modeling, and anomaly detection.
• Why do we need ensemble learning?
– Instead of relying on a single model, we combine the outputs of several
individual models. Doing this reduces the risk of making a wrong or
suboptimal decision.
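As a minimal sketch of the combining step, the snippet below takes the class labels predicted by several already-trained models and returns the majority label for each sample (hard voting). The three "models" are hypothetical lists of predictions, not real trained classifiers; scikit-learn offers the same idea as `VotingClassifier(voting="hard")`.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model predictions by hard (majority) voting.

    predictions_per_model: list of lists, one inner list of labels per model,
    all of the same length (one label per sample).
    Returns one combined label per sample.
    """
    combined = []
    # zip(*...) regroups the predictions so we see every model's vote for one sample
    for votes in zip(*predictions_per_model):
        # most_common(1) returns [(label, count)]; ties go to the label seen first
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three models on five samples
model_a = ["cat", "dog", "dog", "cat", "bird"]
model_b = ["cat", "cat", "dog", "cat", "dog"]
model_c = ["dog", "dog", "dog", "bird", "dog"]

print(majority_vote([model_a, model_b, model_c]))  # → ['cat', 'dog', 'dog', 'cat', 'dog']
```

Every model makes at least one mistake here, but on each sample the vote overrules whichever single model disagrees with the other two.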
• When selecting a model, a common procedure is to choose the one with the
smallest error on the training dataset. The problem is that this does not always
work: the model might be biased or might overfit the training data. Even when the
model is trained using cross-validation, it can still perform poorly on unseen data.
• One reason ensemble learning is effective is that it reduces the overall risk of
making a poor model selection. Because the ensemble is built from models trained in
diverse ways, it tends to perform well on unseen data. For this to work, the
individual models need to exhibit some diversity. Diversity allows them to capture
different nuances in the data, so the combined model tends to be more accurate.
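A back-of-the-envelope calculation shows why combining diverse models lowers the risk. Assume (a strong, idealized assumption) that each of n classifiers is correct with probability p and that their errors are independent. The majority vote is then correct whenever more than half of the models are correct, which is a binomial probability: the sum over k > n/2 of C(n, k)·p^k·(1 − p)^(n − k). Real models trained on the same data make correlated errors, so the actual gain is smaller; this is exactly why diversity among the individual models matters.

```python
from math import comb


def majority_accuracy(n_models, p_correct):
    """Probability that a majority vote of n_models is correct, assuming each model
    is independently correct with probability p_correct (use an odd n_models)."""
    needed = n_models // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_accuracy(n, 0.7), 3))
```

With p = 0.7, one model is right 70% of the time, while five independent models voting are right about 83.7% of the time. The effect reverses if each model is worse than chance (p < 0.5), so combining only helps when the individual models are reasonably good.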
• Diversity is typically achieved by using different training parameters for each
individual model. This allows the individual models to generate different decision
boundaries for the training data, so each model uses different rules to make an
inference. This provides a way to cross-check the result: if the models agree, our
confidence in the prediction increases.
• A special type of ensemble learning combines decision trees into an ensemble.
These models are usually known as random forests and extremely random forests,
which we'll describe in the coming sections.
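One simple way to turn model agreement into a confidence score is soft voting: average the class probabilities predicted by the individual models and report the highest averaged probability alongside the winning class. The sketch assumes each model outputs a dictionary of class probabilities; the numbers are made up for illustration. In scikit-learn, classifiers that support it expose such probabilities through `predict_proba`.

```python
def soft_vote(prob_dicts):
    """Average per-model class probabilities; return (label, confidence).

    prob_dicts: list of dicts mapping class label -> probability, one per model.
    """
    labels = set().union(*prob_dicts)  # every class that any model mentions
    averaged = {
        label: sum(p.get(label, 0.0) for p in prob_dicts) / len(prob_dicts)
        for label in labels
    }
    best = max(averaged, key=averaged.get)
    return best, averaged[best]


# Models that agree: the winning class gets a high averaged probability
agree = [{"spam": 0.9, "ham": 0.1}, {"spam": 0.8, "ham": 0.2}, {"spam": 0.85, "ham": 0.15}]
# Models that disagree: same winner, but with a much lower averaged probability
disagree = [{"spam": 0.9, "ham": 0.1}, {"spam": 0.2, "ham": 0.8}, {"spam": 0.6, "ham": 0.4}]

print(soft_vote(agree))
print(soft_vote(disagree))
```

Both cases predict "spam", but the agreeing ensemble reports a confidence of about 0.85 versus about 0.57 for the disagreeing one.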
Decision Tree
A decision tree is a way to partition a dataset into distinct branches. The branches
or partitions are then traversed to make simple decisions. Decision trees are
produced by training algorithms, which determine how to split the data so that each
split is as informative as possible.
The decision process starts at the root node at the top of the tree. Each node in the
tree is a decision rule. Algorithms construct these rules based on the relationship
between the input data and the target labels in the training data. The values in the
input data are utilized to estimate the value of the output.
Now that we understand the basic concept behind decision trees, the next question is
how the trees are constructed automatically. We need algorithms that can build a good
tree from the data; in practice, they do this greedily, choosing the best split at
each node. To understand how the best split is chosen, we need the concept of
entropy. In this context, entropy refers to information entropy, not thermodynamic
entropy. Information entropy is a measure of uncertainty.
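For a set of labels, information entropy is H = Σᵢ pᵢ·log₂(1/pᵢ) (equivalently −Σᵢ pᵢ·log₂ pᵢ), where pᵢ is the fraction of points belonging to class i. A minimal Python sketch:

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    # Sum p * log2(1/p) over the classes that actually occur (so p > 0)
    return sum((c / n) * log2(n / c) for c in counts.values())


print(entropy(["yes", "yes", "no", "no"]))    # evenly mixed, 2 classes → 1.0
print(entropy(["yes", "yes", "yes", "yes"]))  # pure node, no uncertainty → 0.0
```

An evenly mixed two-class node has the maximum entropy of 1 bit, while a pure node has entropy 0.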
One of the main goals of a decision tree is to reduce uncertainty as we move from
the root node towards the leaf nodes. When we see an unknown data point, we are
completely uncertain about the output. By the time we reach a leaf node, we should
ideally be certain about the output. This means that the decision tree needs to be
constructed in a way that reduces the uncertainty at each level; in other words, we
need to reduce the entropy as we progress down the tree.
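The drop in entropy produced by a split is called information gain: the parent node's entropy minus the size-weighted entropy of its children. The sketch below is a simplified, single-feature illustration rather than a full tree builder. It tries a threshold halfway between each pair of adjacent values of one numeric feature and keeps the one with the largest gain, which mirrors how a greedy tree-building algorithm picks each split. The data and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the two children."""
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted


def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, gain)."""
    distinct = sorted(set(values))
    best = (None, 0.0)  # stays (None, 0.0) if no split reduces entropy
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical feature (hours of study) and label (exam result)
hours = [1, 2, 3, 6, 7, 8]
result = ["fail", "fail", "fail", "pass", "pass", "pass"]
print(best_threshold(hours, result))  # → (4.5, 1.0)
```

The threshold 4.5 produces two pure children, so the entropy falls from 1 bit to 0 and the gain is the full 1 bit.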
Random Forests and Extremely Random Forests
• A random forest is an instance of ensemble learning where the individual models
are decision trees. This ensemble of decision trees is then used to predict the
output value. Each decision tree is constructed from a random subset of the training
data (typically a bootstrap sample, drawn with replacement).
• One of the advantages of random forests is that they are much less prone to
overfitting than a single decision tree. Overfitting is more likely with
nonparametric and nonlinear models, which have more flexibility when learning a
target function. By constructing a diverse set of decision trees from various random
subsets, we reduce the risk that the model overfits the training data. During the
construction of each tree, the nodes are split successively, and the best thresholds
are chosen to reduce the entropy at each level. However, a split doesn't consider all
the features in the input dataset. Instead, it chooses the best split among a random
subset of the features. Adding this randomness tends to slightly increase the bias of
the random forest, but the variance decreases because of averaging, usually by more
than enough to compensate. Hence, we end up with a robust model.
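The two sources of randomness described above can be sketched in a few lines. The toy forest below is deliberately simplified (an assumption made for brevity): each "tree" is a depth-one decision stump trained on a bootstrap sample of the rows and allowed to split only on a randomly chosen subset of features, and predictions are combined by majority vote. Real implementations such as scikit-learn's `RandomForestClassifier` grow deeper trees and draw a new feature subset at every node. The names `fit_stump`, `fit_forest`, `predict`, and the parameters `n_trees` and `max_features` are illustrative.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_ids):
    """Depth-one tree restricted to feature_ids: (feature, threshold, left, right)."""
    best = None  # (weighted child entropy, feature, threshold, left_label, right_label)
    for f in feature_ids:
        distinct = sorted({r[f] for r in rows})
        for lo, hi in zip(distinct, distinct[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # every chosen feature is constant: fall back to a single leaf
        label = majority(labels)
        return (feature_ids[0], float("inf"), label, label)
    return best[1:]


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        # Randomness 1: bootstrap sample (row indices drawn with replacement)
        idx = [rng.randrange(n_rows) for _ in range(n_rows)]
        # Randomness 2: the subset of features this stump may split on
        feats = rng.sample(range(n_features), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Majority vote over all stumps."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


X = [[1, 1], [2, 2], [1, 2], [2, 1], [8, 8], [9, 9], [8, 9], [9, 8]]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
forest = fit_forest(X, y)
print(predict(forest, [1.5, 1.5]), predict(forest, [8.5, 8.5]))
```

Because every stump sees a different sample and possibly a different feature, the stumps disagree in their details, and the vote averages those differences out.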
• Extremely random forests take randomness to the next level. Along with taking a
random subset of features, the thresholds are also chosen randomly: one threshold is
drawn at random for each candidate feature, and the best of these randomly generated
thresholds is chosen as the splitting rule. This usually reduces the variance of the
model even further, at the cost of a slight increase in bias. Hence, the decision
boundaries obtained using extremely random forests tend to be smoother than those
obtained using random forests. Some implementations of extremely random forest
algorithms also parallelize better and scale better.
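The difference from a random forest lies in how a single node is split. The sketch below follows the description above: for each feature in a random subset, draw one threshold uniformly between that feature's minimum and maximum within the node, then keep whichever random split has the highest information gain. No search over all candidate thresholds takes place. The function name `extra_trees_split` and its parameters are illustrative, not a library API.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return sum((c / n) * log2(n / c) for c in Counter(labels).values())


def extra_trees_split(rows, labels, max_features, rng):
    """Return (gain, feature, threshold) of an extremely randomized split, or None."""
    n_features = len(rows[0])
    parent = entropy(labels)
    best = None
    for f in rng.sample(range(n_features), max_features):
        column = [r[f] for r in rows]
        lo, hi = min(column), max(column)
        if lo == hi:  # feature is constant in this node: cannot split on it
            continue
        t = rng.uniform(lo, hi)  # random threshold instead of the best one
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        if not left or not right:  # guard against a threshold landing on the max
            continue
        gain = parent - (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
        if best is None or gain > best[0]:
            best = (gain, f, t)
    return best


rng = random.Random(42)
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [7.0, 1.0], [8.0, 9.0], [9.0, 2.0]]
labels = ["A", "A", "A", "B", "B", "B"]
print(extra_trees_split(rows, labels, max_features=2, rng=rng))
```

Each random split is usually worse than the best possible one, but many such trees averaged together give the smoother, lower-variance boundaries described above.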
Data Imbalance
A classifier is only as good as the data that is used for training. A common problem
in the real world is poor data quality. Many classifiers perform best when they see a
similar number of points for each class. But when data is collected in the real
world, it's not always possible to ensure that each class has the same number of data
points. If one class has 10 times as many data points as another class, the
classifier tends to become biased towards the more numerous class. Hence, we need to
account for this imbalance algorithmically, for example by giving the rarer class a
larger weight during training.
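One common algorithmic fix is to weight each class inversely to its frequency, so that errors on the rare class cost more during training. A sketch of the "balanced" heuristic, weight = n_samples / (n_classes × count_of_class), which is the formula scikit-learn documents for `class_weight="balanced"`; the dataset is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight per class = n_samples / (n_classes * count_of_that_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced dataset: class 0 has ten times as many points as class 1
labels = [0] * 100 + [1] * 10
weights = balanced_class_weights(labels)
print(weights)  # → {0: 0.55, 1: 5.5}
```

Both classes now carry the same total weight (100 × 0.55 = 10 × 5.5 = 55), so the classifier can no longer lower its loss simply by favoring the majority class.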
Finding Optimal Training Parameters with Grid Search
• When working with classifiers, we do not always know in advance which parameter
values work best. Checking every possible combination manually is tedious and
inefficient.
• Grid search automates this: we specify a range of values for each parameter, and
the search automatically trains and evaluates the classifier with each configuration
to find the best combination of parameters.
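Stripped to its core, grid search is an exhaustive loop over the Cartesian product of the candidate values. In the sketch below, `evaluate` is a hypothetical stand-in for "train the classifier with these parameters and return its validation score"; a made-up scoring function keeps the example self-contained. scikit-learn's `GridSearchCV` wraps the same loop with cross-validation.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. mean cross-validation accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid for a forest-like classifier
grid = {"n_estimators": [25, 50, 100], "max_depth": [2, 4, 8]}


def toy_score(params):
    # Made-up score that peaks at n_estimators=50, max_depth=4 (illustration only)
    return -abs(params["n_estimators"] - 50) / 50 - abs(params["max_depth"] - 4) / 4


print(grid_search(grid, toy_score))
```

The cost grows multiplicatively with every added parameter (here 3 × 3 = 9 training runs), which is why grids are usually kept small.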
Computing Relative Feature Importance
• When working with a dataset that contains N-dimensional data points, keep in mind
that not all N features are equally important. Some are more discriminative than
others. If we have this information, we can use it to reduce the dimensionality,
which reduces the complexity and increases the speed of the algorithm. Sometimes a
few features are completely redundant, so they can easily be removed from the dataset.
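Once a model has produced raw importance scores (for example, the `feature_importances_` attribute of a fitted scikit-learn ensemble), a common step is to express them relative to the most important feature and discard features below a cutoff. The feature names, scores, and the 10% cutoff below are hypothetical.

```python
def relative_importance(scores):
    """Scale importance scores so that the most important feature gets 100."""
    top = max(scores.values())
    return {name: 100.0 * s / top for name, s in scores.items()}


def select_features(scores, cutoff=10.0):
    """Keep features whose relative importance is at least `cutoff` percent."""
    rel = relative_importance(scores)
    ranked = sorted(rel.items(), key=lambda item: item[1], reverse=True)
    return [name for name, r in ranked if r >= cutoff]


# Hypothetical raw importances for four features
raw = {"rooms": 0.40, "distance": 0.30, "age": 0.25, "zip_suffix": 0.02}
print(relative_importance(raw))
print(select_features(raw))
```

Here "zip_suffix" scores only about 5% of the top feature, so it is dropped as nearly redundant.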
• One way to compute feature importance is to use an AdaBoost regressor. AdaBoost,
short for Adaptive Boosting, is an algorithm that's frequently used in conjunction
with other machine learning algorithms to improve their performance. In AdaBoost,
the training data points are drawn from a distribution to train the current
classifier. This distribution is updated iteratively so that subsequent classifiers
focus on the more difficult data points, that is, the ones that were misclassified.
Updating the distribution at each step makes the previously misclassified data points
more likely to appear in the next sample dataset used for training.
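A single round of the reweighting step can be sketched for the classic two-class AdaBoost classifier. The weak learner's weighted error ε gives it a vote weight α = ½·ln((1 − ε)/ε). Misclassified points have their weights multiplied by e^α, correctly classified ones by e^(−α), and the weights are then renormalized. This is the classification form; the regressor variant (AdaBoost.R2, which scikit-learn's `AdaBoostRegressor` implements) applies the same idea using a loss based on prediction error instead of a right/wrong indicator. The function name is illustrative.

```python
from math import exp, log


def adaboost_reweight(weights, misclassified):
    """One AdaBoost round: return (alpha, new_weights).

    weights: current sample weights (summing to 1).
    misclassified: list of booleans, True where the weak learner was wrong.
    Assumes 0 < weighted error < 0.5 (better than chance, but not perfect).
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    alpha = 0.5 * log((1 - error) / error)  # the weak learner's say in the final vote
    updated = [w * exp(alpha if wrong else -alpha) for w, wrong in zip(weights, misclassified)]
    total = sum(updated)
    return alpha, [w / total for w in updated]  # renormalize so weights sum to 1


# Four equally weighted points; the weak learner gets only the last one wrong
alpha, new_w = adaboost_reweight([0.25] * 4, [False, False, False, True])
print(round(alpha, 4), [round(w, 4) for w in new_w])  # → 0.5493 [0.1667, 0.1667, 0.1667, 0.5]
```

After one round, the single misclassified point holds half of the total weight, so it is far more likely to be drawn into the next training sample.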
References
Artasanchez, A., & Joshi, P. (2020). Artificial Intelligence with
Python. Packt Publishing Ltd. ISBN: 978-1-839-21607-7
