Overfitting & Underfitting

ML HUB
By: Soumit Kar

Overfitting is a modeling error that occurs when a function is fit too closely to a limited set of data points. It is
usually the result of a model that is overly complex relative to the number of training points available. An overfitted
model is inaccurate on new data because it has effectively memorized the existing data points, noise included.
Overfitting = low bias + high variance
Underfitting is a modeling error that occurs when a function does not fit the data points well enough. It is usually
the result of a model that is too simple to capture the underlying pattern. An underfitted model is inaccurate
because the trend it learns does not reflect the reality of the data.
Underfitting = high bias + low variance
Bias: the difference between the expected (average) prediction of the model and the actual value.
Variance: how much the prediction for a given point varies between different realisations of the model (for
example, models trained on different training sets).
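To make the two definitions concrete, both quantities can be estimated by training many realisations of the same model and comparing their predictions at a single point. This is a minimal NumPy sketch under assumptions not taken from the slides: a hypothetical true function sin(2πx), Gaussian noise with standard deviation 0.3, the same 20 equally spaced inputs for every realisation (only the noise is redrawn), and polynomial models of degree 1 (simple) and 9 (complex). The helper names `true_fn` and `bias_variance_at` are illustrative.

```python
import numpy as np
from numpy.polynomial import Polynomial


def true_fn(x):
    # Hypothetical "actual value" generator used only for this illustration.
    return np.sin(2 * np.pi * x)


def bias_variance_at(x0, degree, n_models=200, n_train=20, noise=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial model's prediction at x0.

    Each realisation of the model is fit on the same inputs with freshly drawn noise.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_train)
    preds = np.empty(n_models)
    for i in range(n_models):
        y = true_fn(x) + rng.normal(0.0, noise, n_train)
        preds[i] = Polynomial.fit(x, y, deg=degree)(x0)
    bias_sq = (preds.mean() - true_fn(x0)) ** 2  # (expected prediction - actual value)^2
    variance = preds.var()                       # spread of predictions across realisations
    return bias_sq, variance


if __name__ == "__main__":
    for degree in (1, 9):
        b, v = bias_variance_at(0.25, degree)
        print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

With these settings the degree-1 model should show the larger bias and the degree-9 model the larger variance; the exact numbers depend on the seed.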

Regression: If the final “best fit” line passes through every single data point by forming an unnecessarily
complex curve, then the model is likely overfitting.
Figure: as the degree of the polynomial increases, the fit moves from an appropriate fit (low bias, low variance) to overfitting (high variance, low bias).
Overfitting: train accuracy increases, test accuracy decreases.
Appropriate fitting: train and test accuracy are both appropriate.
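The degree-of-polynomial behaviour can be reproduced with a small NumPy sketch. It assumes hypothetical data (noisy samples of sin(2πx)), uses mean squared error in place of accuracy (so lower is better), and fits degrees 1, 3 and 14 to 15 training points; degree 14 has as many coefficients as there are points, so it passes through every one of them. `make_data` and `train_test_mse` are illustrative names.

```python
import numpy as np
from numpy.polynomial import Polynomial


def make_data(n, rng, noise=0.3, equispaced=False):
    # Hypothetical toy data: noisy samples of sin(2*pi*x) on [0, 1].
    x = np.linspace(0.0, 1.0, n) if equispaced else rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)


def train_test_mse(degrees, n_train=15, n_test=200, seed=0):
    """Fit one least-squares polynomial per degree; return {degree: (train MSE, test MSE)}."""
    rng = np.random.default_rng(seed)
    x_tr, y_tr = make_data(n_train, rng, equispaced=True)
    x_te, y_te = make_data(n_test, rng)
    scores = {}
    for d in degrees:
        model = Polynomial.fit(x_tr, y_tr, deg=d)
        scores[d] = (np.mean((model(x_tr) - y_tr) ** 2),
                     np.mean((model(x_te) - y_te) ** 2))
    return scores


if __name__ == "__main__":
    for d, (tr, te) in train_test_mse([1, 3, 14]).items():
        print(f"degree {d:2d}: train MSE {tr:.4f}, test MSE {te:.4f}")
```

Expect the training error to keep falling as the degree grows, while the test error of the degree-14 fit ends up well above that of the degree-3 fit.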

Classification: If every single training point is correctly classified by forming a very complex decision
boundary, then there is a good chance that the model is overfitting.
In the figure, the green line shows overfitting and the black line shows appropriate fitting.
Overfitting: low bias, high variance.
Appropriate fitting: low bias, low variance.
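The slide's figure is not reproduced here, but the same effect can be sketched with a k-nearest-neighbour classifier, which is a swapped-in example rather than the model behind the green and black lines. With k = 1 every training point is its own nearest neighbour, so the boundary bends around every point and training accuracy is 100%; a larger k gives a smoother boundary. The data (a straight true boundary with 20% of labels flipped) and the helper names are hypothetical.

```python
import numpy as np


def knn_predict(x_train, y_train, x_query, k):
    """Majority vote among the k nearest training points (binary labels 0/1, odd k)."""
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)


def make_data(n, rng, flip=0.2):
    # Hypothetical data: true boundary is the line x0 + x1 = 0, with a fraction of labels flipped.
    x = rng.uniform(-1.0, 1.0, (n, 2))
    y = (x[:, 0] + x[:, 1] > 0).astype(int)
    flipped = rng.random(n) < flip
    return x, np.where(flipped, 1 - y, y)


def accuracies(ks, seed=0):
    """Return {k: (train accuracy, test accuracy)}."""
    rng = np.random.default_rng(seed)
    x_tr, y_tr = make_data(300, rng)
    x_te, y_te = make_data(1000, rng)
    return {k: (np.mean(knn_predict(x_tr, y_tr, x_tr, k) == y_tr),
                np.mean(knn_predict(x_tr, y_tr, x_te, k) == y_te)) for k in ks}


if __name__ == "__main__":
    for k, (tr, te) in accuracies([1, 15]).items():
        print(f"k={k:2d}: train accuracy {tr:.3f}, test accuracy {te:.3f}")
```

The k = 1 model classifies the training set perfectly, flipped labels included, yet it is expected to do worse on the test set than the smoother k = 15 model.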

Underfitting
Regression: As shown in the figure below, the data points are laid out in a clear pattern, but the model is unable
to fit them properly because its complexity is too low.
Classification: As shown in the figure below, the model is trained to separate the circles from the crosses. However,
because its decision boundary is a straight line, it fails to classify either of the two classes properly.
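A minimal sketch of the classification case, assuming hypothetical data in which the "circles" lie inside a circle and the "crosses" outside it. A least-squares linear classifier (a simple stand-in, not a method from the slides) can only draw a straight boundary and stays close to chance on the training data; adding squared and product features lets the same fitting procedure draw a curved boundary, which is one way of increasing model complexity. All function names are illustrative.

```python
import numpy as np


def make_circle_data(n, rng):
    # Hypothetical data: class 1 inside a circle of radius sqrt(2/pi) (about half the
    # square's area), class 0 outside it.
    x = rng.uniform(-1.0, 1.0, (n, 2))
    y = ((x ** 2).sum(axis=1) < 2 / np.pi).astype(int)
    return x, y


def linear_features(x):
    return np.column_stack([np.ones(len(x)), x[:, 0], x[:, 1]])


def quadratic_features(x):
    return np.column_stack([linear_features(x), x[:, 0] ** 2, x[:, 1] ** 2, x[:, 0] * x[:, 1]])


def train_accuracy(features, y):
    """Least-squares linear classifier: regress +/-1 targets, predict by the sign."""
    w, *_ = np.linalg.lstsq(features, np.where(y == 1, 1.0, -1.0), rcond=None)
    return np.mean((features @ w > 0).astype(int) == y)


if __name__ == "__main__":
    x, y = make_circle_data(500, np.random.default_rng(0))
    print("straight-line boundary:", train_accuracy(linear_features(x), y))
    print("quadratic boundary:    ", train_accuracy(quadratic_features(x), y))
```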

Fixes for Overfitting
Cross Validation: A technique for assessing how the results of a statistical analysis will generalize to an
independent data set.
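A k-fold version of cross validation can be sketched in NumPy: the data are split into k folds, each fold serves once as the validation set, and the average validation error is used to compare models. Here it assumes hypothetical toy data and uses the score to choose a polynomial degree; `kfold_cv_mse` and `select_degree` are illustrative names, not a library API.

```python
import numpy as np
from numpy.polynomial import Polynomial


def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Average validation MSE of a degree-`degree` polynomial over k folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(x)), k)
    errors = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = Polynomial.fit(x[train_idx], y[train_idx], deg=degree)
        errors.append(np.mean((model(x[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(errors))


def select_degree(x, y, degrees, k=5):
    """Pick the degree with the lowest cross-validated MSE."""
    scores = {d: kfold_cv_mse(x, y, d, k) for d in degrees}
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 40)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 40)   # hypothetical toy data
    best, scores = select_degree(x, y, range(1, 11))
    print("CV MSE per degree:", {d: round(s, 3) for d, s in scores.items()})
    print("selected degree:", best)
```

Because every score comes from data the model did not see during fitting, the selected degree reflects how well each model generalizes rather than how closely it fits the training points.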
Early Stopping: Early-stopping rules provide guidance on how many iterations can be run before the learner
begins to overfit.
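A common form of early-stopping rule is "patience": keep the epoch with the lowest validation loss and stop once it has not improved for a set number of epochs. The sketch below assumes the training loop yields one validation loss per epoch; `early_stopping`, `patience` and `min_delta` are illustrative names.

```python
from typing import Iterable, Tuple


def early_stopping(val_losses: Iterable[float], patience: int = 3,
                   min_delta: float = 0.0) -> Tuple[int, int]:
    """Consume one validation loss per epoch; stop after `patience` epochs without improvement.

    Returns (best_epoch, stopped_epoch). Because `val_losses` is consumed lazily, passing a
    generator that trains one epoch per step means training really stops at stopped_epoch.
    """
    best_loss, best_epoch, stopped_epoch = float("inf"), -1, -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss - min_delta:       # counts as an improvement
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, stopped_epoch
```

For example, with validation losses 1.0, 0.8, 0.6, 0.65, 0.7, 0.72, 0.5 and `patience=3`, `early_stopping(iter(losses), patience=3)` returns `(2, 5)`: the best model is from epoch 2, training stops at epoch 5, and the later 0.5 is never seen. A larger patience tolerates noisier validation curves at the cost of extra training.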

Pruning: Pruning is used extensively when building tree-based models such as decision trees. It removes the
nodes that add little predictive power for the problem at hand.
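The slides do not name a specific pruning method; reduced-error pruning is one simple option and is used here as an example. Working bottom-up, each internal node is tentatively replaced by a leaf that predicts its majority class, and the change is kept unless accuracy on held-out validation data drops. The `Node` class and the hand-built tree are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    prediction: int                  # majority class of the training samples at this node
    feature: Optional[int] = None    # None means the node is a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None    # branch for x[feature] <= threshold
    right: Optional["Node"] = None   # branch for x[feature] > threshold

    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: Sequence[float]) -> int:
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def accuracy(root: Node, X: List[Sequence[float]], y: List[int]) -> float:
    return sum(predict(root, xi) == yi for xi, yi in zip(X, y)) / len(y)


def reduced_error_prune(root: Node, node: Node, X_val, y_val) -> None:
    """Bottom-up: turn a subtree into a leaf unless that lowers validation accuracy."""
    if node.is_leaf():
        return
    reduced_error_prune(root, node.left, X_val, y_val)
    reduced_error_prune(root, node.right, X_val, y_val)
    before = accuracy(root, X_val, y_val)
    saved = (node.feature, node.left, node.right)
    node.feature, node.left, node.right = None, None, None   # tentatively make it a leaf
    if accuracy(root, X_val, y_val) < before:
        node.feature, node.left, node.right = saved           # pruning hurt: undo it


if __name__ == "__main__":
    # Hypothetical tree: the split on feature 1 under the right branch is spurious.
    tree = Node(0, feature=0, threshold=0.5,
                left=Node(0),
                right=Node(1, feature=1, threshold=0.5, left=Node(1), right=Node(0)))
    X_val = [[0.2, 0.3], [0.8, 0.2], [0.9, 0.9], [0.1, 0.8]]
    y_val = [0, 1, 1, 0]
    print("before:", accuracy(tree, X_val, y_val))   # 0.75
    reduced_error_prune(tree, tree, X_val, y_val)
    print("after: ", accuracy(tree, X_val, y_val))   # 1.0
```

The spurious split is collapsed into a single leaf, while the useful root split survives because removing it would lower validation accuracy.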
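Regularization: It adds a penalty (cost) term to the objective function that grows with the size of the model's
coefficients, so bringing in more features has a cost. The model therefore tries to push the coefficients of many
variables toward zero (with an L1 penalty, often exactly to zero) to reduce that cost term.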
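The penalty that drives coefficients exactly to zero is the L1 (lasso) penalty; an L2 (ridge) penalty shrinks them without usually reaching zero. As a minimal sketch, assuming a linear model without an intercept and synthetic data where only 2 of 10 features matter, lasso can be fit by cyclic coordinate descent with soft-thresholding (one standard solver, chosen here for brevity; `lasso_cd` and `soft_threshold` are illustrative names).

```python
import numpy as np


def soft_threshold(rho, alpha):
    # Shrinks rho toward zero by alpha; anything inside [-alpha, alpha] becomes exactly 0.
    return np.sign(rho) * max(abs(rho) - alpha, 0.0)


def lasso_cd(X, y, alpha, n_sweeps=200):
    """Minimise (1/(2n))*||y - X w||^2 + alpha*||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    residual = y.copy()                        # y - X @ w with w = 0
    for _ in range(n_sweeps):
        for j in range(p):
            residual += X[:, j] * w[j]         # remove feature j's current contribution
            rho = X[:, j] @ residual / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            residual -= X[:, j] * w[j]         # add back the updated contribution
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    true_w = np.array([3.0, -2.0] + [0.0] * 8)   # hypothetical: only 2 useful features
    y = X @ true_w + rng.normal(0.0, 0.5, 200)
    for alpha in (0.0, 0.2):
        w = lasso_cd(X, y, alpha)
        print(f"alpha={alpha}: non-zero coefficients at {np.flatnonzero(w).tolist()}")
```

With `alpha=0` the fit is ordinary least squares and every coefficient is non-zero; with `alpha=0.2` the eight irrelevant coefficients are expected to be exactly zero.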
Remove features: Some algorithms have built-in feature selection. For those that don’t, you can manually
improve their generalizability by removing irrelevant input features. An interesting way to do so is to tell a
story about how each feature fits into the model. This is the data scientist’s spin on the software engineer’s
rubber duck debugging technique, where engineers debug their code by explaining it, line by line, to a rubber
duck.
Train with more data: It won’t work every time, but training with more data can help algorithms detect
the signal better. For example, when modelling height vs. age in children, sampling more schools is likely to
help the model. Of course, that’s not always the case: if we just add more noisy data, this technique won’t help.
That’s why you should always ensure your data is clean and relevant.
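The effect of more data can be sketched by keeping a flexible model fixed and changing only the training-set size. This assumes a hypothetical sin(2πx) signal with Gaussian noise and a degree-12 polynomial; `holdout_mse` is an illustrative helper, and the same test set is reused for every training size so the comparison is fair.

```python
import numpy as np
from numpy.polynomial import Polynomial


def signal(x):
    return np.sin(2 * np.pi * x)                      # hypothetical underlying signal


def holdout_mse(n_train, degree=12, noise=0.3, seed=0):
    """Test MSE of a fixed, flexible polynomial model trained on n_train points."""
    train_rng = np.random.default_rng(seed)
    test_rng = np.random.default_rng(seed + 1)        # same test set for every n_train
    x_tr = np.linspace(0.0, 1.0, n_train)
    y_tr = signal(x_tr) + train_rng.normal(0.0, noise, n_train)
    x_te = test_rng.uniform(0.0, 1.0, 1000)
    y_te = signal(x_te) + test_rng.normal(0.0, noise, 1000)
    model = Polynomial.fit(x_tr, y_tr, deg=degree)
    return float(np.mean((model(x_te) - y_te) ** 2))


if __name__ == "__main__":
    for n in (20, 50, 200):
        print(f"{n:3d} training points: test MSE {holdout_mse(n):.4f}")
```

With 20 training points the degree-12 fit has room to chase the noise; with 200 points the same model's test error should sit much closer to the noise level. If the extra points were themselves mostly noise or irrelevant, this improvement would not be guaranteed.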
Ensembling: Ensembles are machine learning methods for combining predictions from multiple separate
models.
There are a few different methods for ensembling, but the two most common are:

Bagging attempts to reduce the chance of overfitting complex models.
1. It trains a large number of “strong” learners in parallel.
2. A strong learner is a model that’s relatively unconstrained.
3. Bagging then combines all the strong learners together in order to “smooth out” their predictions.
e.g. Random Forest
Boosting attempts to improve the predictive flexibility of simple models.
1. It trains a large number of “weak” learners in sequence.
2. A weak learner is a constrained model (e.g. you could limit the max depth of each decision tree).
3. Each one in the sequence focuses on learning from the mistakes of the one before it.
4. Boosting then combines all the weak learners into a single strong learner.
e.g. XGBoost, Gradient Boosting, AdaBoost
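Both ideas can be sketched on a one-dimensional regression problem with hypothetical noisy sin(2πx) data. For bagging, the strong learner is 1-nearest-neighbour regression, which is completely unconstrained, and 100 copies are fit on bootstrap samples and averaged. For boosting, the weak learner is a depth-1 regression tree (a stump), and each new stump is fit to the residuals left by the ensemble so far, which is gradient boosting for squared loss. These are simplified stand-ins, not the Random Forest, XGBoost or AdaBoost implementations; all function names are illustrative.

```python
import numpy as np


def signal(x):
    return np.sin(2 * np.pi * x)                       # hypothetical underlying function


def mse(pred, target):
    return float(np.mean((pred - target) ** 2))


def nn_predict(x_train, y_train, x_query):
    """1-nearest-neighbour regression: an unconstrained, high-variance "strong" learner."""
    idx = np.abs(x_query[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[idx]


def bagging_predict(x_train, y_train, x_query, n_models=100, seed=0):
    """Average strong learners fit on bootstrap samples (independent, so parallelisable)."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_models):
        boot = rng.integers(0, len(x_train), len(x_train))   # sample with replacement
        preds.append(nn_predict(x_train[boot], y_train[boot], x_query))
    return np.mean(preds, axis=0)


def fit_stump(x, r):
    """Weak learner: depth-1 regression tree (one threshold, two constant leaves)."""
    best = None
    for t in np.unique(x)[:-1]:                        # keeps both leaves non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda q: np.where(q <= t, lv, rv)


def boosting_predict(x_train, y_train, x_query, n_rounds=200, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump is fit to the current residuals."""
    fitted = np.full(len(x_train), y_train.mean())
    out = np.full(len(x_query), y_train.mean())
    for _ in range(n_rounds):
        stump = fit_stump(x_train, y_train - fitted)   # learn from the ensemble's mistakes
        fitted += learning_rate * stump(x_train)
        out += learning_rate * stump(x_query)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x_tr = rng.uniform(0.0, 1.0, 300)
    y_tr = signal(x_tr) + rng.normal(0.0, 0.3, 300)
    x_te = rng.uniform(0.0, 1.0, 1000)
    y_te = signal(x_te) + rng.normal(0.0, 0.3, 1000)
    print("single 1-NN :", mse(nn_predict(x_tr, y_tr, x_te), y_te))
    print("bagged 1-NN :", mse(bagging_predict(x_tr, y_tr, x_te), y_te))
    print("single stump:", mse(fit_stump(x_tr, y_tr)(x_te), y_te))
    print("boosted     :", mse(boosting_predict(x_tr, y_tr, x_te), y_te))
```

On this data the bagged predictor should have lower test error than a single 1-NN model (variance smoothed out), and the boosted stumps lower test error than a single stump (flexibility added).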

Handling Underfitting
1. Get more training data.
2. Increase the size or number of parameters in the model.
3. Increase the complexity of the model.
4. Increase the training time until the cost function is minimised.
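Item 4 can be illustrated with any model whose cost is minimised iteratively: if training stops too early, the cost is still above its minimum and the model underfits even the training data. The sketch assumes a linear model trained by batch gradient descent on synthetic data; `gradient_descent` is an illustrative name, and the step size 1/L (L being the largest eigenvalue of XᵀX/n) is one safe choice that keeps the cost from increasing.

```python
import numpy as np


def gradient_descent(X, y, n_steps):
    """Batch gradient descent on J(w) = (1/(2n)) * ||X w - y||^2 for a linear model."""
    n = len(y)
    lr = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()   # 1/L step: J never increases
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * X.T @ (X @ w - y) / n                 # gradient of J
    return w, float(np.mean((X @ w - y) ** 2))           # weights and training MSE


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(100), rng.normal(size=(100, 3))])
    y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(0.0, 0.1, 100)   # hypothetical data
    for steps in (5, 50, 1000):
        print(f"{steps:4d} steps: training MSE {gradient_descent(X, y, steps)[1]:.5f}")
```

More steps drive the training MSE down toward the least-squares optimum; once the cost is minimised, extra training time no longer helps, and a model that still underfits needs more capacity instead.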

Bias Variance Tradeoff
If the algorithm is too simple (e.g. a hypothesis given by a linear equation), it may be in a high-bias, low-variance
condition and is therefore error-prone. If the algorithm is too complex (e.g. a hypothesis given by a high-degree
equation), it may be in a high-variance, low-bias condition, and in that case it will not perform well on new entries.
Between these two conditions lies a balance known as the bias-variance trade-off.
To build a good predictive model, you'll need to find a balance between bias and variance that
minimizes the total error.
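For squared-error loss the trade-off can be written down exactly. Assuming data generated as y = f(x) + ε with zero-mean noise ε of variance σ², independent of the training set, and writing \hat f for the model fit on a random training set (notation introduced here, not in the slides), the expected test error at a point x decomposes as:

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[\big(\hat f(x) - \mathbb{E}[\hat f(x)]\big)^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Making the model simpler tends to raise the first term and lower the second; making it more complex does the opposite. The noise term σ² cannot be reduced by any choice of model.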
