Bias-Variance Trade-off

Description

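The bias-variance trade-off is a fundamental concept in machine learning that explains how a model's complexity affects its ability to generalize to unseen data.
Bias is the error due to overly simplistic assumptions in the model. High bias leads to underfitting: the model misses real patterns, even in the training data.
Variance is the error due to sensitivity to the particular training set, and it grows with model complexity. High variance leads to overfitting: the model fits noise and performs worse on new data.
The goal is to find the right balance. Reducing one source of error usually increases the other, so the aim is the level of complexity that minimizes total error on unseen data.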
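
For squared-error loss, this balance can be written as a decomposition. The following is a sketch under stated assumptions: the data follow y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fitted on a randomly drawn training set. The symbols f, f̂, ε and σ² are introduced here for illustration, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Only the first two terms depend on the model. The noise term sets a floor on the error that no choice of model can remove.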

Prerequisites

Understanding of supervised learning
Knowledge of training/testing splits
Model evaluation metrics (like MSE, accuracy)

Examples

Here's a simple Python example that fits a rigid model and a flexible model to the same synthetic data and compares their test errors:


from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Generate synthetic data: one feature, linear signal plus Gaussian noise
X, y = make_regression(n_samples=100, n_features=1, noise=20, random_state=42)

# Hold out 30% for testing; fix the seed so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Low variance, higher bias model (Linear Regression).
# Its bias is high only when the true relationship is non-linear;
# on this linear data it is well specified.
lr = LinearRegression()
lr.fit(X_train, y_train)
y_pred_lr = lr.predict(X_test)

# High variance, low bias model (unpruned Decision Tree).
# With no depth limit it can memorize the training set, noise included.
dt = DecisionTreeRegressor(random_state=42)
dt.fit(X_train, y_train)
y_pred_dt = dt.predict(X_test)

# Compare errors on the held-out test set
print("Linear Regression test MSE (rigid model):", mean_squared_error(y_test, y_pred_lr))
print("Decision Tree test MSE (flexible model):", mean_squared_error(y_test, y_pred_dt))

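🔍 This code compares a rigid, higher-bias model (linear regression) with a flexible, high-variance model (unpruned decision tree) using test-set Mean Squared Error (MSE). Because make_regression produces a linear target, the linear model is well specified here, so the tree's test error is typically the higher of the two: the tree has fit the noise in the training set.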
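
A single train/test split shows total error but not how much of it comes from bias and how much from variance. One way to separate them, sketched below, is to refit each model on many freshly drawn training sets and look at how the predictions behave at fixed evaluation points. Everything specific here is an assumption made for illustration: the non-linear target `true_fn`, the helper `estimate_bias_variance`, the noise level, and the sample sizes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def true_fn(x):
    # Hypothetical non-linear ground truth, chosen only for illustration
    return np.sin(2 * x)


def estimate_bias_variance(make_model, n_rounds=200, n_train=50,
                           noise_sd=0.3, seed=0):
    """Estimate squared bias and variance by refitting on resampled data."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0, 3, 100).reshape(-1, 1)  # fixed evaluation points
    preds = np.empty((n_rounds, len(x_grid)))
    for i in range(n_rounds):
        # Draw a fresh training set from the same distribution each round
        x = rng.uniform(0, 3, size=(n_train, 1))
        y = true_fn(x).ravel() + rng.normal(0, noise_sd, size=n_train)
        preds[i] = make_model().fit(x, y).predict(x_grid)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the truth
    bias_sq = np.mean((mean_pred - true_fn(x_grid).ravel()) ** 2)
    # Variance: spread of predictions across training sets
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


lin_bias_sq, lin_var = estimate_bias_variance(LinearRegression)
tree_bias_sq, tree_var = estimate_bias_variance(DecisionTreeRegressor)
print(f"Linear regression: bias^2={lin_bias_sq:.3f}, variance={lin_var:.3f}")
print(f"Decision tree:     bias^2={tree_bias_sq:.3f}, variance={tree_var:.3f}")
```

On a curved target like this one, the straight line should show the larger squared bias and the unpruned tree the larger variance, which is the pattern the definitions describe.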

Real-World Applications

Finance

Fraud detection models balancing over/under predictions

Healthcare

Choosing models for disease prediction (avoid underfitting or overfitting)

Marketing

Campaign response prediction

Where Bias-Variance Is Applied

Model selection and evaluation

Error analysis in supervised learning

Tuning model complexity (regularization, pruning, etc.)
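
Tuning model complexity can be sketched as a depth sweep on a decision tree. The example below is a minimal illustration, assuming the same kind of synthetic data used in the Examples section; the list of depths and the `results` dictionary are choices made for this sketch, not a recommended search grid.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=1, noise=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

results = {}
for depth in [1, 2, 3, 5, 8, None]:  # None means the tree grows without limit
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    results[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE={train_mse:.1f}, test MSE={test_mse:.1f}")
```

Training error keeps falling as depth grows, while test error usually stops improving at some intermediate depth. The widening gap between the two is the practical signature of variance taking over.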

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ Bias is the error from wrong assumptions in the learning algorithm. High bias can cause the model to miss relevant patterns.

➤ Variance is the error from sensitivity to small fluctuations in the training set. High variance can cause overfitting.

➤ The bias-variance trade-off is the tension between the error introduced by bias and the error introduced by variance. The aim is to find a model with the optimal balance between the two.

➤ To reduce high variance (overfitting), use more training data, regularization, or simpler models.

➤ To reduce high bias (underfitting), use more complex models or better features.
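
Regularization, one of the variance remedies listed above, can be illustrated by comparing an unregularized and a ridge-regularized polynomial fit. This is a sketch under assumptions: the synthetic sine data, the degree-10 polynomial, `alpha=1.0`, and the helper `poly_model` are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * x).ravel() + rng.normal(0, 0.2, size=40)


def poly_model(regressor, degree=10):
    # Expand x into polynomial terms, standardize them, then fit the regressor
    return make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        StandardScaler(),
        regressor,
    )


ols = poly_model(LinearRegression()).fit(x, y)
ridge = poly_model(Ridge(alpha=1.0)).fit(x, y)

# The L2 penalty shrinks the coefficient vector toward zero
ols_norm = np.linalg.norm(ols[-1].coef_)
ridge_norm = np.linalg.norm(ridge[-1].coef_)
print(f"Coefficient norm without regularization: {ols_norm:.2f}")
print(f"Coefficient norm with ridge (alpha=1.0): {ridge_norm:.2f}")
```

Smaller coefficients make the fitted curve less able to chase individual noisy points, which lowers variance at the cost of some added bias. A suitable `alpha` is normally chosen by cross-validation rather than fixed in advance.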
