Regularization: L1 (Lasso), L2 (Ridge)
Description
Regularization is a technique for reducing overfitting in machine learning models by adding a penalty term to the loss function. The penalty discourages overly complex models by shrinking the model coefficients, which improves generalization to unseen data.
L1 Regularization (Lasso)
L1 regularization adds the absolute value of the coefficients as a penalty term to the loss function.
- Encourages sparsity in the model (many coefficients become exactly zero)
- Can be used for feature selection
- Loss function:
Loss + λ * Σ|weights|
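To make the penalty concrete, here is a minimal NumPy sketch that computes this L1-penalized loss by hand; the data, weight vector, and lambda value are all illustrative placeholders, not part of any library API.

import numpy as np

# Illustrative data: 100 samples, 5 features, only two of which carry signal.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.5, 0.0, 0.0]) + rng.normal(scale=0.1, size=100)

w = rng.normal(size=5)                 # candidate weight vector
lam = 0.1                              # regularization strength (lambda)

mse = np.mean((X @ w - y) ** 2)        # data-fit term
l1_penalty = lam * np.sum(np.abs(w))   # λ * Σ|weights|
print(f"Penalized loss: {mse + l1_penalty:.3f}")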
L2 Regularization (Ridge)
L2 regularization adds the squared value of the coefficients as a penalty term to the loss function.
- Encourages small but non-zero coefficients
- Does not set coefficients exactly to zero, so it does not perform feature selection on its own
- Helps reduce model complexity and multicollinearity
- Loss function:
Loss + λ * Σ(weights²)
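For L2 the penalized objective even has a closed-form minimizer, w = (XᵀX + λI)⁻¹Xᵀy. The sketch below (intercept omitted for brevity, all names illustrative) also shows why ridge helps with multicollinearity: adding λI makes XᵀX better conditioned even when columns are nearly collinear.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=100)])  # nearly collinear column
y = X @ np.array([1.0, -2.0, 0.5, 1.0]) + rng.normal(scale=0.1, size=100)

lam = 1.0
# Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print("Ridge weights:", np.round(w_ridge, 3))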
Examples
Python Example: Using Lasso and Ridge Regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.metrics import mean_squared_error
# Load dataset
data = load_diabetes()
X, y = data.data, data.target
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Lasso Regression (L1)
lasso = Lasso(alpha=0.1)
lasso.fit(X_train, y_train)
y_pred_lasso = lasso.predict(X_test)
mse_lasso = mean_squared_error(y_test, y_pred_lasso)
# Ridge Regression (L2)
ridge = Ridge(alpha=1.0)
ridge.fit(X_train, y_train)
y_pred_ridge = ridge.predict(X_test)
mse_ridge = mean_squared_error(y_test, y_pred_ridge)
print(f"Lasso MSE: {mse_lasso:.2f}")
print(f"Ridge MSE: {mse_ridge:.2f}")
print(f"Lasso Coefficients: {lasso.coef_}")
print(f"Ridge Coefficients: {ridge.coef_}")
Real-World Applications
- Finance: Predicting credit risk with reduced overfitting on complex datasets.
- Healthcare: Selecting relevant biomarkers in genomic data using Lasso for interpretability.
- Marketing: Building robust customer segmentation models while avoiding noise overfitting.
- Natural Language Processing: Reducing feature space dimensionality in text classification.

Resources
The following resources will be manually added later:
Video Tutorials
PDF/DOC Materials
Interview Questions
1. What is the difference between L1 and L2 regularization?
L1 regularization adds the absolute values of the coefficients as a penalty and encourages sparsity, often setting some coefficients exactly to zero. L2 regularization adds the squared coefficients as a penalty, shrinking them toward zero but typically keeping them non-zero.
2. When would you prefer Lasso over Ridge regression?
You would prefer Lasso when you want feature selection to identify important variables, especially in high-dimensional data with many irrelevant features.
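A quick way to see this selection effect is to fit Lasso on synthetic data where only a few features are informative; everything below is an illustrative sketch, and the non-zero count is what one would typically observe, not a guarantee.

import numpy as np
from sklearn.linear_model import Lasso

# 20 features, only the first 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
true_coef = np.zeros(20)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)
print("Non-zero coefficients:", int(np.sum(lasso.coef_ != 0)))  # typically about 3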
3. How does regularization help prevent overfitting?
Regularization adds a penalty to large coefficients, discouraging overly complex models that fit noise in training data, thus improving generalization on new data.
4. What does the regularization parameter alpha control?
Alpha controls the strength of the penalty term. A higher alpha means more regularization (more shrinkage), while a lower alpha means less regularization.
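A short sketch makes this visible by sweeping alpha on the bundled diabetes dataset and watching the overall coefficient norm shrink; the alpha grid below is arbitrary.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True)
for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    # Larger alpha means stronger shrinkage, so the norm decreases.
    print(f"alpha={alpha:>6}: coefficient norm = {np.linalg.norm(model.coef_):.2f}")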