20211229120253D6323_PERT 06_ Ensemble Learning.pptx

Course : Basic Artificial Intelligence
Effective Period : 2022
Ensemble Learning
Session 11-12

Acknowledgement
These slides have been adapted from:
Artasanchez, A., & Joshi, P. (2020). Artificial
Intelligence with Python. Packt Publishing Ltd.
ISBN: 978-1-839-21607-7
Chapter 6

Learning Objectives
LO: Students are able to understand the fundamental
concepts of modern-age Artificial Intelligence

Sub-Topics
● Decision trees and decision tree classifiers
● Learning models with ensemble learning
● Random forests and extremely random forests
● Confidence measure estimation of predictions
● Dealing with class imbalance
● Finding optimal training parameters using grid search
● Computing relative feature importance
● Traffic prediction using an extremely random forest regressor

Ensemble Learning
• Ensemble learning involves building multiple models and then combining them
in such a way that it produces better results than what the models could
produce individually. These individual models can be classifiers, regressors, or
other models.
• Ensemble learning is used extensively across multiple fields, including data
classification, predictive modeling, and anomaly detection.
• Why do we need ensemble learning?
– Instead of relying on one model, we can combine the outputs of several
individual models. Doing this reduces the possibility of making a wrong or
suboptimal decision.

Ensemble Learning
• While selecting a model, a commonly used procedure is to choose the one with
the smallest error on the training dataset. The problem with this approach is
that it will not always work. The model might become biased or overfit the
training data. Even when training the model using cross-validation, it can
perform poorly on unknown data.
• One reason ensemble learning models are effective is that they reduce
the overall risk of making a poor model selection. The ensemble is trained in a
diverse manner and can then perform well on unknown data. When a model is built
using ensemble learning, the individual models need to exhibit some diversity.
This allows them to capture various nuances in the data; hence the overall
model tends to become more accurate.
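
The benefit of combining models can be illustrated with a small calculation. The sketch below assumes a binary task, an odd number of models, and models whose errors are fully independent (a strong assumption that real ensembles only approximate; this is why diversity matters). The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent models is correct.

    Assumes every model is right with the same probability p_correct and
    that the models make their mistakes independently of each other.
    """
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    needed = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(needed, n_models + 1)
    )


# Compare a single model with ensembles of 3 and 11 models.
for n in (1, 3, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

Under these assumptions the majority vote is right more often than any single model, and the gain shrinks when the models make correlated errors.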

Ensemble Learning
• The diversity is achieved by using different training parameters for each
individual model. This allows individual models to generate different decision
boundaries for training data. This means that each model will use different rules
to make an inference, which is a powerful way of validating the result. If there is
agreement among the models, this increases the confidence in the prediction.
• A special type of ensemble learning is when you combine decision trees into
an ensemble. These models are usually known as random forests and
extremely random forests, which we'll describe in the coming sections.
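
The idea that agreement among models raises confidence can be sketched as a simple vote. This is a minimal illustration, not a specific library's method: the function name and the example labels are hypothetical, and the agreement fraction is only a rough confidence signal.

```python
from collections import Counter


def vote(predictions):
    """Return (winning label, fraction of models that agree with it).

    Ties are broken in favour of the label that appears first in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


# Five hypothetical models classify the same data point.
label, agreement = vote(["cat", "cat", "dog", "cat", "cat"])
print(label, agreement)
```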

Decision Tree
A decision tree is a way to partition a dataset into distinct branches. The branches
or partitions are then traversed to make simple decisions. Decision trees are
produced by training algorithms, which identify how to split the data in an optimal
way.
The decision process starts at the root node at the top of the tree. Each node in the
tree is a decision rule. Algorithms construct these rules based on the relationship
between the input data and the target labels in the training data. The values in the
input data are utilized to estimate the value of the output.
Now that we understand the basic concept behind decision trees, the next concept
to understand is how the trees are automatically constructed. We need algorithms
that can construct the optimal tree based on the data. In order to understand it, we
need to understand the concept of entropy. In this context, entropy refers to
information entropy and not thermodynamic entropy. Information entropy is basically
a measure of uncertainty.
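
Information entropy for a set of labels is commonly written as H = -sum(p_i * log2(p_i)), where p_i is the fraction of points in class i. A minimal sketch of that formula follows; the function name is chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    # Sum p * log2(p) over the classes that actually occur, then negate.
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


print(entropy(["a", "a", "b", "b"]))  # evenly mixed: maximum uncertainty for 2 classes
print(entropy(["a", "a", "a", "a"]))  # one class only: no uncertainty
```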

Decision Tree
One of the main goals of a decision tree is to reduce uncertainty as we move from
the root node towards the leaf nodes. When we see an unknown data point, we are
completely uncertain about the output. By the time we reach the leaf node, we are
certain about the output. This means that the decision tree needs to be constructed
in a way that will reduce the uncertainty at each level. This implies that we need to
reduce the entropy as we progress down the tree.
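
Reducing entropy at each level can be sketched as picking the threshold with the largest information gain, meaning the drop in entropy from the parent node to the weighted average of its two children. The code below is a simplified sketch for one numeric feature; real tree learners add stopping rules and handle many features. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy of the parent minus the weighted entropy of the two children."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Try the midpoints between sorted distinct values; return the best one.

    Returns None when the feature has a single distinct value (no split).
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda t: information_gain(values, labels, t))


values = [1.0, 2.0, 3.0, 4.0]
labels = [0, 0, 1, 1]
print(best_threshold(values, labels))
```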

Random Forests and Extremely Random Forests
• A random forest is an instance of ensemble learning where individual models
are constructed using decision trees. This ensemble of decision trees is then
used to predict the output value. We use a random subset of training data to
construct each decision tree.
• One of the advantages of random forests is that they are much less prone to
overfitting than a single decision tree. Overfitting is more likely with
nonparametric and nonlinear models that have more
flexibility when learning a target function. By constructing a diverse set of
decision trees using various random subsets, we reduce the risk that the model
overfits the training data. During the construction of each tree, the nodes are
split successively, and the best thresholds are chosen to reduce the entropy at
each level. This split doesn't consider all the features in the input dataset.
Instead, it chooses the best split among the random subset of the features that
is under consideration. Adding this randomness tends to increase the bias of
the random forest, but the variance decreases because of averaging. Hence,
we end up with a robust model.
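
The two sources of randomness described above can be sketched as follows. This is only an illustration of the sampling steps, not a full random forest: it assumes the random subset of training data is a bootstrap sample (drawn with replacement), which is a common choice but is not stated in these slides, and the function names are hypothetical.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_features(n_features, max_features, rng):
    """Pick the random subset of feature indices considered at one split."""
    return sorted(rng.sample(range(n_features), max_features))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Rows used to grow one tree; some rows repeat and some are left out.
rows = bootstrap_indices(10, rng)

# Features examined at one node; a fresh subset is drawn for every split.
features = candidate_features(n_features=6, max_features=2, rng=rng)

print(rows)
print(features)
```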

Random Forests and Extremely Random Forests
• Extremely random forests take randomness to the next level. Along with
taking a random subset of features, the thresholds are chosen randomly as
well. These randomly generated thresholds are chosen as the splitting rules,
which reduce the variance of the model even further. Hence, the decision
boundaries obtained using extremely random forests tend to be smoother than
the ones obtained using random forests. Some implementations of extremely
random forest algorithms also enable better parallelization and can scale better.
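
The random choice of thresholds can be sketched as drawing one threshold per candidate feature, uniformly between that feature's smallest and largest value at the node. This is a simplified illustration under that assumption; the function name and the example columns are hypothetical.

```python
import random


def random_thresholds(columns, rng):
    """Draw one random split threshold for each candidate feature.

    columns maps a feature name to the values it takes at the current node.
    Each threshold is drawn uniformly between that feature's min and max.
    """
    return {
        name: rng.uniform(min(values), max(values))
        for name, values in columns.items()
    }


rng = random.Random(42)  # fixed seed so the sketch is repeatable
columns = {"speed": [10.0, 35.0, 60.0], "hour": [0.0, 12.0, 23.0]}
thresholds = random_thresholds(columns, rng)
print(thresholds)
```

No search over thresholds is needed for a feature here, which is one reason this style of split is cheap to compute.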

Data Imbalance
A classifier is only as good as the data that is used for training. A common problem
faced in the real world is poor data quality. For a classifier to perform well, it
should see a roughly comparable number of points for each class. But when data is collected in
the real world, it's not always possible to ensure that each class has the same
number of data points. If one class has 10 times as many data points as
another class, then the classifier tends to get biased towards the more numerous
class. Hence, we need to make sure that we account for this imbalance
algorithmically.
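
One common way to account for imbalance is to weight each class inversely to its frequency, so that errors on the rare class cost more during training. The sketch below uses the heuristic weight = n_samples / (n_classes * class_count), which is similar in spirit to the "balanced" class-weight option in scikit-learn. The function name and the example labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class inversely to how often it occurs."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        cls: n_samples / (n_classes * count)
        for cls, count in counts.items()
    }


# One class has 10 times as many points as the other.
labels = ["majority"] * 100 + ["minority"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, each class contributes the same total weight to training, regardless of how many points it has.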

Finding Optimal Training Parameters
with Grid Search
• When working with classifiers, it is not always possible to know in advance what the best
parameters are. Checking all possible combinations by hand is not efficient.
• Grid search allows us to specify a set of candidate values for each parameter. The search
then automatically trains and scores the classifier for every combination,
typically using cross-validation, and reports the best combination of
parameters.
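
The mechanics of grid search can be sketched in a few lines. This is a minimal illustration, not a library API: grid_search and toy_score are hypothetical names, and toy_score is a made-up stand-in for the cross-validated score that a real search would compute by training a classifier.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(n_estimators, max_depth):
    """Hypothetical score that peaks at n_estimators=50, max_depth=4."""
    return -abs(n_estimators - 50) - abs(max_depth - 4)


grid = {"n_estimators": [25, 50, 100], "max_depth": [2, 4, 8]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of combinations is the product of the list lengths (9 here), so the cost grows quickly as parameters are added to the grid.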

Computing Relative Feature Importance
• When working with a dataset that contains N-dimensional data points, it must
be understood that not all features are equally important. Some are more
discriminative than others. If we have this information, we can use it to reduce
the dimensionality. This is useful in reducing the complexity and increasing the
speed of the algorithm. Sometimes, a few features are completely redundant.
Hence, they can be easily removed from the dataset.
• One way to compute feature importance is to use the AdaBoost regressor. AdaBoost,
short for Adaptive Boosting, is an algorithm that's frequently used in conjunction
with other machine learning algorithms to improve their performance. In
AdaBoost, the training data points are drawn from a distribution to train the
current classifier. This distribution is updated iteratively so that the subsequent
classifiers get to focus on the more difficult data points. The difficult data points
are the ones that are misclassified. This is done by updating the distribution at
each step. This will make the data points that were previously misclassified
more likely to come up in the next sample dataset that's used for training.
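
The distribution update can be illustrated with one round of reweighting. The sketch below uses the classic binary-classification form of the AdaBoost update as an assumption; the regressor variant mentioned above uses a different loss-based formula. The function name is hypothetical.

```python
from math import exp, log


def adaboost_reweight(weights, misclassified):
    """One round of the classic AdaBoost weight update.

    weights: current distribution over the data points (sums to 1).
    misclassified: True where the current classifier got the point wrong.
    Returns the new distribution and the classifier's vote weight alpha.
    """
    error = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0 < error < 0.5:
        raise ValueError("weighted error must be in (0, 0.5) for this update")
    alpha = 0.5 * log((1 - error) / error)
    # Raise the weight of misclassified points, lower the weight of the rest.
    updated = [
        w * exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(updated)
    return [w / total for w in updated], alpha


# Four points with equal weight; only the last one is misclassified.
new_weights, alpha = adaboost_reweight([0.25] * 4, [False, False, False, True])
print(new_weights)
```

Drawing the next training sample in proportion to the new weights (for example with random.choices and its weights argument) makes the previously misclassified point more likely to appear.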

References
Artasanchez, A., & Joshi, P. (2020). Artificial Intelligence with
Python. Packt Publishing Ltd. ISBN: 978-1-839-21607-7
