What is Underfitting?

Underfitting happens when a model is too simple to capture the patterns in the data, so it performs poorly on both the training set and new, unseen data. It’s like trying to draw a detailed picture with just a few straight lines.
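
To make the picture concrete, here is a minimal sketch, assuming NumPy and scikit-learn are installed: a straight line is fitted to curved, sine-shaped data, and because the line cannot follow the curve, its error stays high even on its own training data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# The "detailed picture": a noisy sine wave.
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# The "few straight lines": a plain linear model fitted to the curve.
line = LinearRegression().fit(X, y)

# The error is high on the very data the model learned from,
# which is the hallmark of underfitting.
print("training MSE:", mean_squared_error(y, line.predict(X)))
```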

Let’s break it down

  • Model: a computer program that learns from data to make predictions.
  • Too simple: the model has very few parameters or uses a very basic algorithm, so it can’t represent complex relationships.
  • Capture the patterns: recognize the underlying trends, shapes, or rules hidden in the data.
  • Perform poorly: give inaccurate or low-quality predictions.
  • Training set: the data the model learns from.
  • Unseen data: new data the model hasn’t seen before, used to test how well it generalizes (see the sketch after this list).
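
The last two terms are easiest to see side by side. Here is a hedged sketch, again assuming scikit-learn: the data is split into a training set and a held-out test set, and a too-simple straight line is compared with a more flexible polynomial model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=300)

# Training set: what the model learns from. Test set: stand-in for unseen data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "straight line (too simple)": LinearRegression(),
    "degree-7 polynomial": make_pipeline(PolynomialFeatures(7), LinearRegression()),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")

# The underfit line scores poorly on BOTH sets; the flexible model captures the pattern.
```

An underfit model is recognizable precisely because both numbers come out high and close together.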

Why does it matter?

If a model underfits, it won’t be useful for real-world tasks because it can’t make reliable predictions. Recognizing underfitting early helps you build models that are accurate enough to solve the problem at hand, saving time and resources.

Where is it used?

  • Spam email detection: a too-simple classifier may miss many spam messages.
  • Stock price forecasting: an underfitted model can’t capture market volatility, leading to bad investment decisions.
  • Medical diagnosis: a simplistic model may fail to recognize disease patterns, risking patient safety.
  • Recommendation systems: underfitting can result in irrelevant product or content suggestions.

Good things about it

  • Easy to spot: high error on training data quickly signals underfitting.
  • Fast to train: simple models require less computational power and time.
  • Less risk of over-complexity: fewer chances of memorizing noise in the data.
  • Interpretability: simple models are easier to understand and explain.
  • Good baseline: a deliberately simple model provides a starting point to compare more sophisticated models against (see the sketch below).
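
The first and last points lend themselves to a quick check. As a hedged sketch, again assuming scikit-learn, fit a trivial mean-predicting baseline next to the candidate model; a training error close to the baseline’s is the fast, easy-to-spot signal of underfitting.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(2)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=300)

baseline = DummyRegressor(strategy="mean").fit(X, y)  # always predicts the mean of y
candidate = LinearRegression().fit(X, y)              # the model under scrutiny

print("baseline training MSE: ", round(mean_squared_error(y, baseline.predict(X)), 3))
print("candidate training MSE:", round(mean_squared_error(y, candidate.predict(X)), 3))

# If the candidate barely beats the do-nothing baseline on its own training
# data, it is too simple for the problem and should be replaced or extended.
```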

Not-so-good things

  • Low accuracy: predictions are often wrong, limiting practical usefulness.
  • Poor generalization: the model fails to adapt to new situations or data variations.
  • May miss important features: oversimplification can ignore key variables that drive outcomes.
  • Can give false confidence: an underfit model’s training and test errors are close together (both high), and that small gap can be misread as good generalization; the sketch below shows how to catch this.
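
One way to avoid that trap is to check the absolute training error against a quality bar or a baseline, not just the gap between training and test error. A minimal sketch with hypothetical numbers (both MSE values and the `acceptable_mse` threshold are assumptions for illustration, not standard values):

```python
# Hypothetical errors from an evaluation like the ones sketched above.
train_mse, test_mse = 0.41, 0.43

gap = abs(test_mse - train_mse)
print(f"train/test gap: {gap:.2f}")  # small gap: superficially looks like good generalization

# The gap alone is not enough; also compare the error itself to a quality bar.
acceptable_mse = 0.05  # assumed, problem-specific target
if train_mse > acceptable_mse:
    print("Training error is high despite the small gap: the model is underfitting.")
```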