Random Forest Classifier
Classification Technique
Overview
Random Forest is a supervised ensemble learning algorithm. Ensemble algorithms combine more than one algorithm, of the same or different kinds, to classify objects. The ‘forest’ that the Random Forest Classifier builds is an ensemble of Decision Trees, most often trained with the ‘bagging’ method. The general idea of bagging is that a combination of learning models improves the overall result.
The random forest classifier creates a set of decision trees from randomly selected subsets of the training set. It then aggregates the votes from the different decision trees to decide the final class of the test object.
Random Forest adds additional randomness to the model while growing the trees. Instead of searching for the most important feature when splitting a node, it searches for the best feature among a random subset of features. This results in wide diversity, which generally yields a better model.
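As a minimal, illustrative sketch of this workflow in scikit-learn (the library whose documentation is linked at the end of these slides), using a synthetic dataset and illustrative parameter values:

```python
# Minimal sketch: fit a Random Forest and let its trees vote on test objects.
# The synthetic dataset and parameter values are illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each of the 100 trees is trained on a bootstrap sample of X_train;
# predict() aggregates the per-tree votes into a final class.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```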
Explanation
Say we have 1,000 observations in the complete population, with 10 variables. Random forest builds multiple CART models, each with a different sample and a different initial set of variables. For instance, it may take a random sample of 100 observations and 5 randomly chosen initial variables to build one CART model. It repeats the process (say) 10 times and then makes a final prediction for each observation. The final prediction is a function of the individual predictions; it can simply be the mean of the individual predictions (or, for classification, the majority vote).
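A hand-rolled sketch of this idea, mirroring the numbers above (this is not the library implementation, only an illustration of the sample-and-vote scheme):

```python
# Hand-rolled sketch: 10 small CART models, each fit on 100 randomly sampled
# observations and 5 randomly chosen variables; the final class comes from a
# majority vote. Illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

models = []
for _ in range(10):
    rows = rng.choice(len(X), size=100, replace=True)     # random sample of 100 observations
    cols = rng.choice(X.shape[1], size=5, replace=False)  # 5 randomly chosen variables
    tree = DecisionTreeClassifier(random_state=0).fit(X[rows][:, cols], y[rows])
    models.append((tree, cols))

# Final prediction per observation: mean of the 0/1 votes, i.e. the majority class.
votes = np.array([tree.predict(X[:, cols]) for tree, cols in models])
final_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("Agreement with true labels:", (final_pred == y).mean())
```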
Each tree in a forest is grown as follows:
• If the number of cases in the training set is N, sample n cases at random (but with replacement) from the original data. This sample will be the training set for growing the tree.
• If there are M input variables, a number m < M is specified such that at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
• Each tree is grown to the largest extent possible; there is no pruning.
(A rough mapping of these choices onto scikit-learn parameters is sketched below.)
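In scikit-learn, the three growing rules above correspond roughly to constructor arguments of RandomForestClassifier. This is a sketch; the parameter values are illustrative, and defaults differ slightly between library versions:

```python
# Rough mapping of the growing rules above onto scikit-learn parameters.
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(
    n_estimators=500,     # number of trees grown in the forest
    bootstrap=True,       # sample cases at random with replacement for each tree
    max_samples=None,     # None -> bootstrap samples as large as the training set (N)
    max_features="sqrt",  # m: variables considered at each split (here sqrt(M))
    max_depth=None,       # grow each tree to the largest extent possible ...
    min_samples_leaf=1,   # ... with no pruning
    random_state=0,
)
```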
The forest error rate depends on two things:
• The correlation between any two trees in the forest. Increasing the correlation increases the forest error rate.
• The strength of each individual tree in the forest. A tree with a low error rate is a strong classifier. Increasing the strength of the individual trees decreases the forest error rate.
Reducing m reduces both the correlation and the strength; increasing it increases both. Somewhere in between is an "optimal" range of m (usually quite wide). Using the OOB error rate (explained in later slides), an optimal value of m can be found quickly, as in the sketch below. This is the only adjustable parameter to which random forests are somewhat sensitive.
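A simple way to search for m in scikit-learn is to compare the OOB error across a few candidate values of max_features (a sketch; the candidate values are illustrative):

```python
# Sketch: pick m (max_features) by comparing OOB error across candidate values.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

for m in (2, 3, 5, 8):  # candidate values of m (illustrative)
    clf = RandomForestClassifier(
        n_estimators=300, max_features=m, oob_score=True, random_state=0
    ).fit(X, y)
    print(f"m={m}: OOB error = {1 - clf.oob_score_:.3f}")
```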
Features
• It is unexcelled in accuracy among current algorithms.
• It runs efficiently on large databases.
• It can handle thousands of input variables without variable deletion.
• It gives estimates of which variables are important in the classification (a sketch follows this list).
• It generates an internal unbiased estimate of the generalization error as the forest building progresses.
• It has an effective method for estimating missing data and maintains accuracy even when a large proportion of the data are missing.
• It has methods for balancing error in class-imbalanced data sets.
• Generated forests can be saved for future use on other data.
• Prototypes are computed that give information about the relation between the variables and the classification.
• The capabilities of the above can be extended to unlabeled data, leading to unsupervised clustering, data views and outlier
detection.
• It offers an experimental method for detecting variable interactions.
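For the variable-importance feature above, scikit-learn exposes impurity-based importances on a fitted forest. A minimal sketch (feature indices stand in for real variable names):

```python
# Sketch: variable importances from a fitted forest, highest first.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

for i, imp in sorted(enumerate(clf.feature_importances_), key=lambda t: -t[1]):
    print(f"variable {i}: importance = {imp:.3f}")
```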
Out-Of-Bag (OOB)
When the training set for the current tree is drawn by sampling with replacement, about one-third of the
observations are left out of the sample.
This OOB (out-of-bag) data is used to get a running unbiased estimate of the classification error as trees are added to
the forest. It is also used to get estimates of variable importance.
Out-Of-Bag (OOB) Error Estimate
Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are
left out of the bootstrap sample and not used in the construction of the kth tree.
Run each case that was left out of the construction of the kth tree down the kth tree to get a classification. In this way, a test-set classification is obtained for each case in about one-third of the trees. At the end of the run, take j to be the class that got the most votes over all the times case n was OOB. The proportion of times that j is not equal to the true class of n, averaged over all cases, is the OOB error estimate. This has proven to be unbiased in many tests.
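scikit-learn exposes this estimate directly when oob_score=True; a minimal sketch:

```python
# Sketch: the OOB error estimate is computed while the forest is built.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = RandomForestClassifier(
    n_estimators=500, oob_score=True, bootstrap=True, random_state=0
).fit(X, y)

# oob_score_ is the accuracy of the OOB votes, so 1 - oob_score_ is the OOB error.
print("OOB error estimate:", 1 - clf.oob_score_)
# oob_decision_function_ holds each case's OOB vote proportions per class.
print(clf.oob_decision_function_[:5])
```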
Overfitting
Random Forest does not overfit as more trees are added: the generalization error converges as the forest grows, so you can run as many trees as you want. It is also fast.
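One way to see this behaviour is to grow a forest incrementally with warm_start and watch the OOB error level off rather than climb as trees are added (a sketch; the forest sizes are illustrative):

```python
# Sketch: OOB error flattens out as trees are added; it does not climb back up.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = RandomForestClassifier(warm_start=True, oob_score=True, random_state=0)

for n in (25, 50, 100, 200, 400):
    clf.set_params(n_estimators=n)  # keep the trees already grown, add more
    clf.fit(X, y)
    print(f"{n:4d} trees -> OOB error {1 - clf.oob_score_:.3f}")
```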
Summary
Random Forest is a great algorithm to train early in the model development process to see how it performs, and because of its simplicity it is hard to build a “bad” Random Forest. It is also a great choice if you need to develop a model in a short period of time. On top of that, it provides a pretty good indicator of the importance it assigns to your features.
Random Forests are also very hard to beat in terms of performance. Of course, you can probably always find a model that performs better, such as a neural network, but such models usually take much more time to develop. And on top of that, Random Forests can handle many different feature types, such as binary, categorical and numerical.
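One caveat when following the scikit-learn documentation below: RandomForestClassifier expects numeric inputs, so categorical columns are usually encoded first. A minimal sketch with a made-up mixed-type table (column names and values are invented for illustration):

```python
# Sketch: binary, categorical and numerical columns in one pipeline.
# The column names and data below are made up for illustration.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder

df = pd.DataFrame({
    "is_member": [0, 1, 1, 0, 1, 0],                           # binary
    "city": ["delhi", "pune", "delhi", "goa", "pune", "goa"],  # categorical
    "age": [23, 35, 41, 29, 52, 37],                           # numerical
    "bought": [0, 1, 1, 0, 1, 0],                              # target
})

pre = ColumnTransformer(
    [("cat", OrdinalEncoder(), ["city"])],
    remainder="passthrough",  # keep the binary and numeric columns as-is
)
model = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=0))])
model.fit(df[["is_member", "city", "age"]], df["bought"])
print(model.predict(df[["is_member", "city", "age"]]))
```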
Python’s sklearn Documentation
http://scikit-learn.org/stable/modules/ensemble.html
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html