Overfitting & Underfitting
ML HUB
By: Soumit Kar
Overfitting is a modeling error that occurs when a function is fit too closely to a limited set of data points. It is
usually the result of an overly complex model trained on too few data points. An overfitted model generalizes poorly
because it has effectively memorized the existing data points, noise included.
Overfitting = Low bias + High variance
Underfitting is a modeling error that occurs when a function does not fit the data points well enough. It is usually
the result of a model that is too simple, or that has not been trained enough, to capture the underlying pattern. An
underfitted model is inaccurate because its trend does not reflect the reality of the data.
Underfitting = High bias + Low variance
Bias: The difference between the expected (average) prediction of the
model and the actual value.
Variance: How much the prediction for a given point varies between
different realisations of the model (for example, models trained on different samples of the data).
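In symbols, the two definitions above can be sketched as follows. The notation is an assumption, not something the slides define: f(x) stands for the actual value at a point x, \hat{f}_D(x) for the prediction of a model trained on a dataset D, and the expectation is taken over different training sets D.

```latex
\mathrm{Bias}\big[\hat{f}(x)\big] = \mathbb{E}_D\big[\hat{f}_D(x)\big] - f(x)
\qquad
\mathrm{Var}\big[\hat{f}(x)\big] = \mathbb{E}_D\Big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\Big]
```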
Regression: If the final “Best Fit” line crosses over every single data point by
forming an unnecessarily complex curve, then the model is likely overfitting.
[Figure: polynomial fits as the degree of the polynomial increases. Panels: appropriate fit (low bias, low variance); overfitting (low bias, high variance).]
                 Overfitting   Appropriate fitting
Train accuracy   Increases     Appropriate
Test accuracy    Decreases     Appropriate
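The train/test pattern above can be reproduced with a small experiment. This is a minimal sketch on assumed toy data (a noisy straight line, y = 2x + 1), not an example from the slides: a least-squares line plays the appropriate fit, and a degree-9 Lagrange polynomial that passes through every one of the 10 training points plays the overfitted curve. The metric is mean squared error, so lower is better.

```python
import random

def linear_fit(xs, ys):
    """Ordinary least-squares straight line; returns a predictor function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def interpolating_polynomial(xs, ys):
    """Degree n-1 Lagrange polynomial that passes through every training point."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            weight = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    weight *= (x - xj) / (xi - xj)
            total += yi * weight
        return total
    return predict

def mse(model, xs, ys):
    """Mean squared error of a predictor on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def noisy_line(x, rng):
    # Assumed ground truth for this sketch: y = 2x + 1 plus Gaussian noise.
    return 2.0 * x + 1.0 + rng.gauss(0.0, 0.3)

rng = random.Random(0)
train_x = [i / 9 for i in range(10)]
train_y = [noisy_line(x, rng) for x in train_x]
test_x = [rng.random() for _ in range(200)]
test_y = [noisy_line(x, rng) for x in test_x]

simple = linear_fit(train_x, train_y)
wiggly = interpolating_polynomial(train_x, train_y)

for name, model in [("straight line", simple), ("degree-9 polynomial", wiggly)]:
    print(f"{name}: train MSE={mse(model, train_x, train_y):.4f}, "
          f"test MSE={mse(model, test_x, test_y):.4f}")
```

The interpolating polynomial reaches zero training error because it reproduces every training point, noise included, while its error on fresh points is worse than that of the plain line.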
Classification: If every single training example is classified correctly by forming a very complex
decision boundary, then there is a good chance that the model is overfitting.
In the figure, the green line shows overfitting.
The black line shows an appropriate fit.
Overfitting: low bias, high variance
Appropriate fitting: low bias, low variance
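The same effect can be sketched for classification with a k-nearest-neighbour classifier. The data here is an assumption made for illustration: points in a square whose true class depends only on the sign of x, with 20% of the labels flipped as noise. With k=1 the decision boundary bends around every single training point; with k=15 it is much smoother.

```python
import random
from collections import Counter

def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point`."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, data, k):
    """Fraction of `data` that a k-NN classifier built on `train` labels correctly."""
    hits = sum(knn_predict(train, point, k) == label for point, label in data)
    return hits / len(data)

def sample(n, rng, flip_prob=0.2):
    """Synthetic data: the true class is 1 when x > 0; some labels are flipped as noise."""
    data = []
    for _ in range(n):
        point = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        label = int(point[0] > 0)
        if rng.random() < flip_prob:
            label = 1 - label
        data.append((point, label))
    return data

rng = random.Random(0)
train = sample(300, rng)
test = sample(1000, rng)

for k in (1, 15):
    print(f"k={k}: train accuracy={accuracy(train, train, k):.2f}, "
          f"test accuracy={accuracy(train, test, k):.2f}")
```

With k=1 every training point is its own nearest neighbour, so training accuracy is perfect even though one label in five is noise; the smoother k=15 model gives up some training accuracy and does better on the test set.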
Underfitting
Regression: As shown in the figure below, the data points are laid out in a given pattern, but the model is unable to “fit”
the given data properly because its complexity is too low.
Classification: As shown in the figure below, the model is trained to separate the circles from the crosses. However, it
cannot do so properly because its decision boundary is a straight line, which fails to classify either of the two classes well.
Fixes for Overfitting
Cross-validation: A technique for assessing how well the results of a statistical analysis generalize to an
independent data set.
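A minimal sketch of k-fold cross-validation, one common form of the technique: the data is split into k folds, and each fold in turn is held out for scoring while the model is trained on the rest. The toy data, the straight-line model, and the helper names (`k_fold_indices`, `cross_val_mse`) are assumptions made for this illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Least-squares straight line, used here only as a stand-in model."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return lambda x: slope * x + (mean_y - slope * mean_x)

def cross_val_mse(xs, ys, fit, k=5):
    """Train on k-1 folds, score on the held-out fold, and repeat for every fold."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        errors = [(model(xs[i]) - ys[i]) ** 2 for i in held_out]
        scores.append(sum(errors) / len(errors))
    return scores

rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x - 2.0 + rng.gauss(0.0, 0.5) for x in xs]  # assumed toy data

scores = cross_val_mse(xs, ys, fit_line, k=5)
print("per-fold MSE:", [round(s, 3) for s in scores])
print("mean MSE:", round(sum(scores) / len(scores), 3))
```

Because every score is computed on points the model did not see during training, the mean over folds is an estimate of performance on independent data rather than of how well the model memorized its training set.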
Early stopping: Its rules provide guidance on how many iterations can be run before the learner begins to
overfit.
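One common early-stopping rule is "patience": stop once the validation loss has not improved for a fixed number of consecutive epochs, and keep the model from the best epoch. The sketch below assumes the per-epoch validation losses are already available; the loss values and the function name `early_stopping` are hypothetical.

```python
def early_stopping(validation_losses, patience=3):
    """Return (best_epoch, best_loss), stopping once the validation loss has
    failed to improve for `patience` consecutive epochs."""
    best_epoch, best_loss = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the learner has likely started to overfit
    return best_epoch, best_loss

# Hypothetical validation curve: it improves, then worsens as overfitting sets in.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.40]
print(early_stopping(losses, patience=3))
```

With a patience of 3, training stops after epoch 6 and epoch 3 (loss 0.50) is kept; the later value of 0.40 is never reached, which is the price of a small patience.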
Pruning: Pruning is used extensively when building tree-based models. It simply removes the nodes that add
little predictive power for the problem at hand.
Regularization: It adds a penalty term to the objective function for bringing in more features. As a result, it
tries to push the coefficients of many variables towards zero and so reduce the penalty term.
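As a worked formula, the idea can be sketched for a linear model with squared error. The slides do not name a specific penalty, so both common choices are shown as assumptions: the L1 (lasso) penalty, which can push coefficients exactly to zero, and the L2 (ridge) penalty, which shrinks them towards zero. Here w holds the coefficients, p is the number of features, and λ ≥ 0 controls the strength of the penalty.

```latex
J_{\text{L1}}(w) = \sum_{i=1}^{n} \big(y_i - w^\top x_i\big)^2 + \lambda \sum_{j=1}^{p} |w_j|
\qquad
J_{\text{L2}}(w) = \sum_{i=1}^{n} \big(y_i - w^\top x_i\big)^2 + \lambda \sum_{j=1}^{p} w_j^2
```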
Remove features: Some algorithms have built-in feature selection. For those that don’t, you can manually
improve their generalizability by removing irrelevant input features. An interesting way to do so is to tell a
story about how each feature fits into the model. This is the data scientist’s spin on the software engineer’s
rubber duck debugging technique, in which engineers debug their code by explaining it, line by line, to a rubber
duck.
Train with more data: It won’t work every time, but training with more data can help algorithms detect
the signal better. For example, when modelling height vs. age in children, it’s clear how sampling
more schools will help your model. Of course, that’s not always the case. If we just add more noisy data,
this technique won’t help. That’s why you should always ensure your data is clean and relevant.
Ensembling: Ensembles are machine learning methods for combining predictions from multiple separate
models.
There are a few different methods for ensembling, but the two most common are:
Bagging attempts to reduce the chance of overfitting complex models.
1. It trains a large number of “strong” learners in parallel.
2. A strong learner is a model that’s relatively unconstrained.
3. Bagging then combines all the strong learners together in order to “smooth out” their predictions.
e.g. Random Forest
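The three bagging steps above can be sketched in a few lines. Everything here is an assumption made for illustration: the toy data, and a 1-nearest-neighbour regressor standing in for the unconstrained "strong" learner (a Random Forest would use deep decision trees instead). The learners are independent of each other, so they could be trained in parallel; a plain loop is used for brevity.

```python
import random

def nearest_neighbor_regressor(xs, ys):
    """An unconstrained ("strong") learner: predict the y of the closest training x."""
    def predict(x):
        closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[closest]
    return predict

def bagging(xs, ys, n_models=50, seed=0):
    """Train each learner on a bootstrap resample, then average their predictions."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]  # sample with replacement
        models.append(
            nearest_neighbor_regressor([xs[i] for i in idx], [ys[i] for i in idx])
        )
    return lambda x: sum(m(x) for m in models) / len(models)

rng = random.Random(1)
xs = [i / 20 for i in range(40)]
ys = [x ** 2 + rng.gauss(0.0, 0.2) for x in xs]  # assumed toy data

single = nearest_neighbor_regressor(xs, ys)
bagged = bagging(xs, ys)
print("single learner at x=1.0:", round(single(1.0), 3))
print("bagged ensemble at x=1.0:", round(bagged(1.0), 3))
```

The single learner simply repeats the noisy training value at x=1.0, while the ensemble averages over learners that saw different resamples, which is the "smoothing out" described in step 3.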
Boosting attempts to improve the predictive flexibility of simple models.
1. It trains a large number of “weak” learners in sequence.
2. A weak learner is a constrained model (e.g. you could limit the max depth of each decision tree).
3. Each one in the sequence focuses on learning from the mistakes of the one before it.
4. Boosting then combines all the weak learners into a single strong learner.
e.g. XGBoost, Gradient Boosting, AdaBoost
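The boosting steps above can be sketched as a small gradient-boosting loop for squared error. This is an illustrative assumption rather than the algorithm of any particular library: the weak learner is a decision stump (a tree of depth 1), each stump is fit to the residuals left by the ensemble so far, and the `learning_rate` parameter and toy target are hypothetical choices.

```python
def fit_stump(xs, residuals):
    """Weak learner: one threshold split predicting the mean residual on each side."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps in sequence, each one to the residuals left by the ensemble so far."""
    stumps = []
    predictions = [0.0] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for x, p in zip(xs, predictions)]
    # The combined "strong" learner is the scaled sum of all weak learners.
    return lambda x: sum(learning_rate * s(x) for s in stumps)

xs = [i / 10 for i in range(30)]
ys = [x ** 2 for x in xs]  # assumed toy target

def training_mse(model):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

for rounds in (1, 10, 100):
    model = boost(xs, ys, n_rounds=rounds)
    print(f"{rounds} rounds: training MSE = {training_mse(model):.4f}")
```

A single stump can only produce two distinct output values, yet the training error keeps falling as more rounds are added, which shows how a sequence of weak learners becomes one flexible model. Because that flexibility keeps growing, the number of rounds is usually chosen with a validation set.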
Handling Underfitting
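1. Get more training data (this helps mainly when the new data adds informative signal; more data alone rarely fixes a model that is too simple).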
2. Increase the size or number of parameters in the model.
3. Increase the complexity of the model.
4. Increase the training time until the cost function is minimised.
Bias-Variance Tradeoff
If the algorithm is too simple (e.g. a hypothesis with a linear equation), it may have high bias and low variance
and is therefore error-prone. If the algorithm is too complex (e.g. a hypothesis with a high-degree equation), it
may have high variance and low bias. In the latter condition, the model will not perform well on new entries.
There is a middle ground between these two conditions, known as the bias-variance trade-off.
To build a good predictive model, you'll need to find a balance between bias and variance that
minimizes the total error.
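For squared error, the "total error" can be written out explicitly. This is the standard decomposition, stated under assumptions the slides do not spell out: the data follows y = f(x) + ε with zero-mean noise of variance σ², \hat{f}(x) is the model's prediction, and the expectations are taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Making a model more complex typically lowers the first term and raises the second, so the total is smallest somewhere in between; the third term cannot be reduced by any choice of model.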
