Error analysis and variable significance with random forests
Ned Horning
American Museum of Natural History's Center for Biodiversity and Conservation
[email_address]
Error estimate
  • Provides an unbiased estimate of the error, the out-of-bag (OOB) error estimate
  • Each tree is grown on a different bootstrap sample; the samples left out of that bootstrap sample (~1/3 of samples) are used for testing
  • Use the print() function for the OOB error estimate
  • Use plot() to view a plot of the error estimate vs. number of trees
  • [Figure: error rate vs. number of trees]
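
The print() and plot() calls above can be tried with a minimal sketch like the following. It assumes the randomForest package is installed and uses the built-in iris data purely as a stand-in; the seed, ntree value, and variable names are illustrative choices, not part of the original slides.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(42)  # bootstrap sampling is random, so fix the seed for repeatability

# keep.inbag = TRUE stores which samples each tree drew in its bootstrap sample
rf <- randomForest(Species ~ ., data = iris, ntree = 500, keep.inbag = TRUE)

print(rf)  # reports the OOB estimate of error rate and the confusion matrix
plot(rf)   # OOB (and per-class) error rate vs. number of trees

# Fraction of (sample, tree) pairs where the sample was out of bag.
# With sampling with replacement this is close to exp(-1), i.e. roughly 1/3.
oob_fraction <- mean(rf$inbag == 0)
oob_fraction
```

The out-of-bag fraction is what makes the "~1/3 of samples" figure concrete: each tree is tested only on the samples it never saw during training.
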
Calculate OOB error estimate
  • Put the OOB samples down each tree after it is constructed and keep track of the results
  • For each sample, take the majority vote of the trees for which it was out of bag; the proportion of samples predicted incorrectly is the OOB error estimate
  • For regression, “percent variance explained” is reported instead; it is also called pseudo R-squared
  • Example print() output: "OOB estimate of error rate: 0.1%", followed by the confusion matrix
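
As a rough illustration of the calculation described above, the OOB error can be recomputed by hand from the fitted object. This is a sketch under assumptions: it uses the iris data as a stand-in, and the pseudo R-squared formula (one minus OOB mean squared error over the variance of the response, with an n divisor) reflects how the package appears to compute it rather than anything stated in the slides.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)

# rf$predicted holds the OOB prediction for each training sample
oob_error <- mean(rf$predicted != iris$Species)

# The same quantity as tracked by the forest after the last tree
final_err <- rf$err.rate[rf$ntree, "OOB"]

rf$confusion  # observed (rows) vs. OOB-predicted (columns), plus class.error

# Regression: pseudo R-squared from the OOB mean squared error
rf_reg <- randomForest(Sepal.Length ~ ., data = iris, ntree = 500)
n <- nrow(iris)
total_var <- var(iris$Sepal.Length) * (n - 1) / n  # variance with an n divisor
pseudo_r2 <- 1 - rf_reg$mse[rf_reg$ntree] / total_var
```
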
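Variable importance
  • Put the OOB samples down each tree and record the prediction error; then, for each variable, randomly reorder (permute) that variable's values in the OOB samples, put them down the tree again, and record how much the error increases
  • Two types of importance measure can be calculated: mean decrease in accuracy (from the permutation step) and mean decrease in node impurity (from the splits made on the variable)
  • The actual measures depend on whether the forest is used for classification or regression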
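
The permutation idea can be sketched in a simplified form. Note the swap: instead of permuting within each tree's OOB samples as described above, this sketch permutes each variable in a single held-out test set, which is easier to write but is not what randomForest does internally. The dataset, split size, and names such as perm_drop are hypothetical choices for illustration.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
# Simplification: a held-out set replaces the per-tree OOB samples.
library(randomForest)

set.seed(42)
test_idx <- sample(nrow(iris), 50)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

rf <- randomForest(Species ~ ., data = train, ntree = 200)

# Accuracy on the untouched held-out data
baseline <- mean(predict(rf, test) == test$Species)

# For each predictor, shuffle its values and measure the drop in accuracy
predictors <- setdiff(names(test), "Species")
perm_drop <- sapply(predictors, function(v) {
  shuffled <- test
  shuffled[[v]] <- sample(shuffled[[v]])  # break the link between v and the class
  baseline - mean(predict(rf, shuffled) == test$Species)
})

perm_drop  # larger values suggest the forest relies more on that variable
```
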
Variable importance steps
  • In the randomForest() function specify importance=TRUE
  • The importance() function returns the importance measures
  • The varImpPlot() function plots variable importance
  • Specify type = 1 for mean decrease in accuracy and type = 2 for mean decrease in node impurity
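
The steps above might look like this in practice. This is a minimal sketch assuming the randomForest package is installed, with iris as a stand-in dataset and illustrative variable names.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(42)
# importance = TRUE is needed for the permutation-based (accuracy) measure
rf <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

acc_imp  <- importance(rf, type = 1)  # mean decrease in accuracy
gini_imp <- importance(rf, type = 2)  # mean decrease in node impurity (Gini)

acc_imp
gini_imp

varImpPlot(rf, type = 1)  # dot chart of variables ranked by importance
```

Both measures are returned as one-column matrices with a row per predictor, so they can be sorted or merged like any other R matrix.
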
Proximity measure
  • Measures how frequently unique pairs of training samples (in and out of bag) end up in the same terminal node
  • Used to fill in missing data and to calculate outlier measures
  • In the randomForest() function specify proximity=TRUE
  • [Figure: outliers for classification]
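
A short sketch of the proximity matrix and of proximity-based imputation follows. It assumes the randomForest package is installed; the use of rfImpute() for the missing-data step is an assumption about which function the slide has in mind, and the missing values are injected artificially into iris for illustration.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(42)
rf <- randomForest(Species ~ ., data = iris, ntree = 500, proximity = TRUE)

# n x n matrix: share of trees in which samples i and j land in the same terminal node
prox <- rf$proximity
dim(prox)

# Artificially remove some values, then fill them in using proximities
iris_na <- iris
iris_na[sample(nrow(iris_na), 10), "Sepal.Width"] <- NA
imputed <- rfImpute(Species ~ ., data = iris_na, iter = 3, ntree = 100)
```
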
Outlier plots
  • Use the outlier() function to calculate outlier measures
  • Plot the result with the R plot() function
  • The plot shows which samples have unusually large outlier measures
  • [Figure: outliers for classification]
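
An outlier plot along these lines could be produced as follows. This is a sketch assuming the randomForest package is installed, with iris as a stand-in dataset; the plot styling is an illustrative choice.

```r
# Assumption: the randomForest package is installed; iris is only a stand-in dataset.
library(randomForest)

set.seed(42)
# outlier() needs the proximity matrix, so request it when fitting
rf <- randomForest(Species ~ ., data = iris, ntree = 500, proximity = TRUE)

out <- outlier(rf)  # one outlier measure per training sample

# One vertical bar per sample, colored by class; tall bars are candidate outliers
plot(out, type = "h",
     col = c("red", "green", "blue")[as.integer(iris$Species)],
     xlab = "Sample index", ylab = "Outlier measure")
```
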
