Regularization: L1 (Lasso), L2 (Ridge)

Description

Regularization is a technique used to reduce overfitting in machine learning models by adding a penalty term to the loss function. The penalty discourages overly complex models by shrinking the coefficients, which improves generalization to unseen data.

L1 Regularization (Lasso)

L1 regularization adds the absolute value of the coefficients as a penalty term to the loss function.

  • Encourages sparsity in the model (many coefficients become exactly zero)
  • Can be used for feature selection
  • Loss function: Loss + λ * Σ|weights| (see the sketch after this list)
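
As a minimal sketch (plain NumPy, assuming a linear model with weight vector w, a mean-squared-error base loss, and a hand-picked penalty strength lam), the L1-penalized loss looks like this:

import numpy as np

def l1_penalized_loss(X, y, w, lam):
    """Mean squared error plus an L1 penalty on the weights."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)       # base loss
    penalty = lam * np.sum(np.abs(w))   # λ * Σ|weights|
    return mse + penalty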

L2 Regularization (Ridge)

L2 regularization adds the squared value of the coefficients as a penalty term to the loss function.

  • Encourages small but non-zero coefficients
  • Typically shrinks coefficients toward zero without setting them exactly to zero, so it does not perform feature selection
  • Helps reduce model complexity and mitigate multicollinearity
  • Loss function: Loss + λ * Σ(weights²) (see the sketch after this list)
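
Analogously, a sketch of the L2-penalized loss under the same assumptions (NumPy, linear model, hand-picked lam):

import numpy as np

def l2_penalized_loss(X, y, w, lam):
    """Mean squared error plus an L2 penalty on the weights."""
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)       # base loss
    penalty = lam * np.sum(w ** 2)      # λ * Σ(weights²)
    return mse + penalty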

Examples

Python Example: Using Lasso and Ridge Regression

from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error

# Load dataset (load_boston was removed in scikit-learn 1.2;
# the bundled diabetes dataset is used here instead)
data = load_diabetes()
X, y = data.data, data.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Lasso Regression (L1)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)

# Ridge Regression (L2)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)

print(f"Lasso MSE: {mse_lasso:.2f}")
print(f"Ridge MSE: {mse_ridge:.2f}")
print(f"Lasso Coefficients: {lasso.coef_}")
print(f"Ridge Coefficients: {ridge.coef_}")

Real-World Applications

Regularization Applications

  • Finance: Predicting credit risk with reduced overfitting on complex datasets.
  • Healthcare: Selecting relevant biomarkers in genomic data using Lasso for interpretability.
  • Marketing: Building robust customer segmentation models while avoiding noise overfitting.
  • Natural Language Processing: Reducing feature space dimensionality in text classification.

Resources

The following resources will be manually added later:

Video Tutorials

Interview Questions

1. What is the difference between L1 and L2 regularization?

L1 regularization adds the absolute values of the coefficients as a penalty and encourages sparsity, often setting some coefficients exactly to zero. L2 regularization adds the squared coefficients as a penalty, shrinking coefficients while keeping them non-zero.
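
A quick way to see the difference in practice; this sketch uses scikit-learn's bundled diabetes dataset with an illustrative alpha, so the exact counts will vary with the data and penalty strength:

from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso drives some coefficients exactly to zero; Ridge keeps all of them non-zero
print("Zero coefficients (Lasso):", int((lasso.coef_ == 0).sum()))
print("Zero coefficients (Ridge):", int((ridge.coef_ == 0).sum()))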

2. When would you prefer Lasso over Ridge regression?

You would prefer Lasso when you want feature selection to identify important variables, especially in high-dimensional data with many irrelevant features.

3. How does regularization help prevent overfitting?

Regularization adds a penalty to large coefficients, discouraging overly complex models that fit noise in training data, thus improving generalization on new data.
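
To make this concrete, here is a sketch on synthetic data (illustrative sizes and alpha): plain linear regression has more features than training samples, so it fits the training noise, while Ridge trades a little training error for much better test error:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: 50 features but only 5 carry signal, plus noise
rng = np.random.RandomState(0)
X = rng.randn(60, 50)
true_w = np.zeros(50)
true_w[:5] = 1.0
y = X @ true_w + 0.5 * rng.randn(60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for model in [LinearRegression(), Ridge(alpha=10.0)]:
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{type(model).__name__}: train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")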

4. What does the regularization parameter alpha control?

Alpha controls the strength of the penalty term. A higher alpha means more regularization (more shrinkage), while a lower alpha means less regularization.
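
A minimal sketch of this effect (scikit-learn's alpha plays the role of λ; the diabetes dataset and the alpha grid here are illustrative):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)

# Larger alpha -> stronger penalty -> more shrinkage of the coefficients
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(f"alpha={alpha:>6}: coefficient norm = {np.linalg.norm(ridge.coef_):.1f}")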