Feature scaling

Introduction Reading Time: 12 min

Table of Contents

Description

Feature Scaling is a data preprocessing technique used to normalize the range of independent variables (features) in a dataset. Many machine learning algorithms (especially distance-based models like KNN, SVM, and gradient descent-based models) perform better when features are on a similar scale. The two most common scaling techniques are:

1.Min-Max Scaling Scales data to a fixed range, usually [0, 1]. Formula: X scaled =( X−X min)/(X max −X min)


2.Standardization (StandardScaler) Scales data so that it has a mean of 0 and a standard deviation of 1. Formula: X scaled= (X−μ)/σ

​ ​

Prerequisites

  • Basics of NumPy and Pandas
  • Understanding of Machine Learning workflow
  • Familiarity with scikit-learn

Examples

Here's a simple example of a data science task using Python:


import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Sample data
data = pd.DataFrame({
    'height': [150, 160, 170, 180, 190],
    'weight': [50, 60, 70, 80, 90]
})

# MinMax Scaling
min_max_scaler = MinMaxScaler()
scaled_minmax = min_max_scaler.fit_transform(data)
print("MinMax Scaled:\n", pd.DataFrame(scaled_minmax, columns=data.columns))

# Standard Scaling
standard_scaler = StandardScaler()
scaled_standard = standard_scaler.fit_transform(data)
print("\nStandard Scaled:\n", pd.DataFrame(scaled_standard, columns=data.columns))
          

Real-World Applications

Finance
Normalizing credit scores, loan amounts, or balances for risk models

Healthcare
Scaling test results or health indicators like blood pressure or sugar levels for diagnostic models

E-commerce
Normalizing user behavior data (clicks, views, time spent) in recommendation engines

Where topic Is Applied

Finance

  • Credit risk modeling, fraud detection using normalized financial metrics

E-commerce

  • Recommendation systems using normalized user ratings and behavior logs

Manufacturing

  • Scaling machine sensor data for predictive maintenance

Resources

Data Science topic PDF

Download

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Visit

Interview Questions

➤ It transforms features into the same scale, helping models like KNN or gradient-based models converge better and make fair comparisons.

➤ Use MinMaxScaler when you need bounded values (e.g., for neural networks or image processing where inputs should be in [0,1]).

➤ Models may perform poorly or give biased results toward features with larger ranges.

➤ Not directly. Categorical variables should first be encoded (e.g., with one-hot encoding), and typically don’t require scaling.

➤ Normalization (MinMax) scales to a range [0,1]; Standardization centers data with mean 0 and standard deviation 1.