What is Underfitting?
Underfitting happens when a model is too simple to capture the patterns in the data, so it performs poorly on both the training set and new, unseen data. It’s like trying to draw a detailed picture with just a few straight lines.
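Here is a minimal sketch of what that looks like in practice, assuming scikit-learn and a small synthetic dataset (the data, model choice, and seed are illustrative, not from a real task): a straight-line model fit to curved data ends up with high error on the training data and on held-out data alike.

```python
# A minimal sketch of underfitting: a straight line fit to curved data.
# The synthetic dataset and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Data that follows a curve: y = x^2 plus a little noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

# Hold out part of the data as "unseen" test data.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# A plain linear model is too simple to represent the curve.
model = LinearRegression().fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, model.predict(X_test)))
# Both errors come out high, and similarly high: the signature of underfitting.
```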
Let's break it down
- Model: a computer program that learns from data to make predictions.
- Too simple: the model has very few parameters or uses a very basic algorithm, so it can’t represent complex relationships.
- Capture the patterns: recognize the underlying trends, shapes, or rules hidden in the data.
- Perform poorly: give inaccurate or low-quality predictions.
- Training set: the data the model learns from.
- Unseen data: new data the model hasn’t seen before, used to test how well it generalizes.
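To make "too simple" concrete, the following sketch (same kind of synthetic curved data as above; the polynomial degrees are illustrative choices) contrasts an underfit model with one that has just enough capacity to capture the pattern. Notice how adding capacity drives down the error on the training set and on the unseen data together.

```python
# Sketch: a too-simple model vs. one with enough capacity.
# Synthetic data and degree choices are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=300)

# Split into a training set and unseen test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
# degree 1 underfits (both errors stay high); degree 2 matches the shape of
# the data, so both errors drop to roughly the noise level.
```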
Why does it matter?
If a model underfits, it won’t be useful for real-world tasks because it can’t make reliable predictions. Knowing about underfitting helps you build models that are accurate enough to solve problems, saving time and resources.
Where is it used?
- Spam email detection: a too-simple classifier may miss many spam messages.
- Stock price forecasting: an underfitted model can’t capture market volatility, leading to bad investment decisions.
- Medical diagnosis: a simplistic model may fail to recognize disease patterns, risking patient safety.
- Recommendation systems: underfitting can result in irrelevant product or content suggestions.
Good things about it
- Easy to spot: high error on the training data quickly signals underfitting (see the diagnostic sketch after this list).
- Fast to train: simple models require less computational power and time.
- Low risk of over-complexity: a simple model is unlikely to overfit by memorizing noise in the data.
- Interpretability: simple models are easier to understand and explain.
- Good baseline: provides a starting point against which to compare more sophisticated models.
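As a rough way to automate the "easy to spot" check above, here is a hedged sketch that compares a model's training error against a trivial mean-predicting baseline; the helper name `looks_underfit` and the 0.8 tolerance are hypothetical choices, not a standard API.

```python
# Sketch of a simple underfitting check: if a model barely beats a trivial
# baseline on its own training data, it has learned little structure.
# `looks_underfit` and `tolerance` are hypothetical, illustrative names.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def looks_underfit(model, X_train, y_train, tolerance=0.8):
    """Flag underfitting when training error stays close to a mean baseline."""
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    baseline_mse = mean_squared_error(y_train, baseline.predict(X_train))
    model_mse = mean_squared_error(y_train, model.predict(X_train))
    # If the model's error is still above `tolerance` times the baseline's,
    # it has removed little of the error a constant prediction would make.
    return model_mse > tolerance * baseline_mse

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

model = LinearRegression().fit(X, y)
print("underfit?", looks_underfit(model, X, y))  # True: a line cannot fit x^2
```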
Not-so-good things
- Low accuracy: predictions are often wrong, limiting practical usefulness.
- Poor generalization: the model fails to adapt to new situations or data variations.
- May miss important features: oversimplification can ignore key variables that drive outcomes.
- Can give false confidence: an underfit model shows a small gap between training and test error, which is easy to misread as good generalization even though both errors are high (see the sketch below).
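The last point is worth a quick demonstration. In this hedged sketch (same synthetic setup as earlier), the train/test gap is small, so the model looks consistent, yet its absolute error is large. Judging models on the gap alone is the trap.

```python
# Sketch: a small train/test gap can coexist with a badly underfit model.
# Synthetic data; the point is to compare the gap with the absolute error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

model = LinearRegression().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print(f"train/test gap: {abs(test_mse - train_mse):.3f}")  # small
print(f"test MSE:       {test_mse:.3f}")                   # large
# The small gap only says the model is consistently wrong. Check the
# absolute error too, not just how close train and test errors are.
```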