What is Underfitting?
Underfitting happens when a model is too simple to capture the patterns in the data, so it performs poorly on both the training set and new, unseen data. It’s like trying to draw a detailed picture with just a few straight lines.
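Here is a minimal sketch of what that looks like in practice, assuming scikit-learn and a small synthetic dataset (the data, model choice, and seed are illustrative, not from a real task): a straight-line model fit to curved data ends up with high error on the training data and on held-out data alike.

```python
# A minimal sketch of underfitting: a straight line fit to curved data.
# The synthetic dataset and model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Data that follows a curve: y = x^2 plus a little noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

# Hold out part of the data as "unseen" test data.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# A plain linear model is too simple to represent the curve.
model = LinearRegression().fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("test MSE: ", mean_squared_error(y_test, model.predict(X_test)))
# Both errors come out high, and similarly high: the signature of underfitting.
```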
Let's break it down
- Model: a computer program that learns from data to make predictions.
- Too simple: the model has very few parameters or uses a very basic algorithm, so it can’t represent complex relationships.
- Capture the patterns: recognize the underlying trends, shapes, or rules hidden in the data.
- Perform poorly: give inaccurate or low-quality predictions.
- Training set: the data the model learns from.
- Unseen data: new data the model hasn’t seen before, used to test how well it generalizes.
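To make "too simple" concrete, the following sketch (same kind of synthetic curved data as above; the polynomial degrees are illustrative choices) contrasts an underfit model with one that has just enough capacity to capture the pattern. Notice how adding capacity drives down the error on the training set and on the unseen data together.

```python
# Sketch: a too-simple model vs. one with enough capacity.
# Synthetic data and degree choices are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=300)

# Split into a training set and unseen test data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree}: train MSE={train_mse:.3f}, test MSE={test_mse:.3f}")
# degree 1 underfits (both errors stay high); degree 2 matches the shape of
# the data, so both errors drop to roughly the noise level.
```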
Why does it matter?
If a model underfits, it won’t be useful for real-world tasks because it can’t make reliable predictions. Knowing about underfitting helps you build models that are accurate enough to solve problems, saving time and resources.
Where is it used?
- Spam email detection: a too-simple classifier may miss many spam messages.
- Stock price forecasting: an underfitted model can’t capture market volatility, leading to bad investment decisions.
- Medical diagnosis: a simplistic model may fail to recognize disease patterns, risking patient safety.
- Recommendation systems: underfitting can result in irrelevant product or content suggestions.
Good things about it
- Easy to spot: high error on the training data quickly signals underfitting (see the diagnostic sketch after this list).
- Fast to train: simple models require less computational power and time.
- Low risk of over-complexity: a simple model is unlikely to overfit by memorizing noise in the data.
- Interpretability: simple models are easier to understand and explain.
- Good baseline: provides a starting point against which to compare more sophisticated models.
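As a rough way to automate the "easy to spot" check above, here is a hedged sketch that compares a model's training error against a trivial mean-predicting baseline; the helper name `looks_underfit` and the 0.8 tolerance are hypothetical choices, not a standard API.

```python
# Sketch of a simple underfitting check: if a model barely beats a trivial
# baseline on its own training data, it has learned little structure.
# `looks_underfit` and `tolerance` are hypothetical, illustrative names.
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def looks_underfit(model, X_train, y_train, tolerance=0.8):
    """Flag underfitting when training error stays close to a mean baseline."""
    baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
    baseline_mse = mean_squared_error(y_train, baseline.predict(X_train))
    model_mse = mean_squared_error(y_train, model.predict(X_train))
    # If the model's error is still above `tolerance` times the baseline's,
    # it has removed little of the error a constant prediction would make.
    return model_mse > tolerance * baseline_mse

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

model = LinearRegression().fit(X, y)
print("underfit?", looks_underfit(model, X, y))  # True: a line cannot fit x^2
```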
Not-so-good things
- Low accuracy: predictions are often wrong, limiting practical usefulness.
- Poor generalization: the model fails to adapt to new situations or data variations.
- May miss important features: oversimplification can ignore key variables that drive outcomes.
- Can give false confidence: an underfit model shows a small gap between training and test error, which is easy to misread as good generalization even though both errors are high (see the sketch below).
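The last point is worth a quick demonstration. In this hedged sketch (same synthetic setup as earlier), the train/test gap is small, so the model looks consistent, yet its absolute error is large. Judging models on the gap alone is the trap.

```python
# Sketch: a small train/test gap can coexist with a badly underfit model.
# Synthetic data; the point is to compare the gap with the absolute error.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(0, 0.3, size=300)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

model = LinearRegression().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, model.predict(X_train))
test_mse = mean_squared_error(y_test, model.predict(X_test))

print(f"train/test gap: {abs(test_mse - train_mse):.3f}")  # small
print(f"test MSE:       {test_mse:.3f}")                   # large
# The small gap only says the model is consistently wrong. Check the
# absolute error too, not just how close train and test errors are.
```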