Feature Scaling
Description
Feature Scaling is a data preprocessing technique used to normalize the range of independent variables (features) in a dataset. Many machine learning algorithms (especially distance-based models like KNN, SVM, and gradient descent-based models) perform better when features are on a similar scale. The two most common scaling techniques are:
1. Min-Max Scaling: scales data to a fixed range, usually [0, 1]. Formula: X_scaled = (X − X_min) / (X_max − X_min)
2. Standardization (StandardScaler): scales data so that it has a mean of 0 and a standard deviation of 1. Formula: X_scaled = (X − μ) / σ
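The two formulas above can be sketched directly in NumPy, without scikit-learn, to make the arithmetic concrete. The sample values are hypothetical:

```python
import numpy as np

# Hypothetical sample: five height measurements in cm
X = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

# Min-Max scaling: (X - X_min) / (X_max - X_min) -> values in [0, 1]
x_minmax = (X - X.min()) / (X.max() - X.min())
print(x_minmax)  # [0.   0.25 0.5  0.75 1.  ]

# Standardization: (X - mean) / std -> mean 0, standard deviation 1
x_std = (X - X.mean()) / X.std()
print(x_std)  # roughly [-1.414 -0.707  0.     0.707  1.414]
```

Note that `X.std()` uses the population standard deviation (ddof=0), which matches what scikit-learn's StandardScaler does.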
Prerequisites
- Basics of NumPy and Pandas
- Understanding of Machine Learning workflow
- Familiarity with scikit-learn
Examples
Here's a simple example applying both scaling techniques with Python and scikit-learn:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Sample data
data = pd.DataFrame({
'height': [150, 160, 170, 180, 190],
'weight': [50, 60, 70, 80, 90]
})
# MinMax Scaling
min_max_scaler = MinMaxScaler()
scaled_minmax = min_max_scaler.fit_transform(data)
print("MinMax Scaled:\n", pd.DataFrame(scaled_minmax, columns=data.columns))
# Standard Scaling
standard_scaler = StandardScaler()
scaled_standard = standard_scaler.fit_transform(data)
print("\nStandard Scaled:\n", pd.DataFrame(scaled_standard, columns=data.columns))
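One common pitfall worth noting: the scaler should be fitted on the training data only, and the resulting statistics reused on the test data, so no information leaks from the test set. A minimal sketch, with a hypothetical feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 10 samples, 2 features
X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics on the test data
```

Calling `fit_transform` on the test set instead would compute a second, different set of statistics and silently distort the comparison between train and test.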
Real-World Applications
Finance
Normalizing credit scores, loan amounts, or balances for risk models
Healthcare
Scaling test results or health indicators like blood pressure or sugar levels for diagnostic models
E-commerce
Normalizing user behavior data (clicks, views, time spent) in recommendation engines
Where Feature Scaling Is Applied
Finance
- Credit risk modeling, fraud detection using normalized financial metrics
E-commerce
- Recommendation systems using normalized user ratings and behavior logs
Manufacturing
- Scaling machine sensor data for predictive maintenance
Resources
Feature Scaling PDF
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
Q: Why is feature scaling important in machine learning?
➤ It transforms features onto the same scale, helping models like KNN or gradient-based models converge better and make fair comparisons.
Q: When should you use MinMaxScaler instead of StandardScaler?
➤ Use MinMaxScaler when you need bounded values (e.g., for neural networks or image processing where inputs should be in [0, 1]).
Q: What happens if you skip scaling for a scale-sensitive model?
➤ The model may perform poorly or give results biased toward features with larger ranges.
Q: Can feature scaling be applied to categorical variables?
➤ Not directly. Categorical variables should first be encoded (e.g., with one-hot encoding), and the resulting columns typically don’t require scaling.
Q: What is the difference between normalization and standardization?
➤ Normalization (MinMax) scales data to a range such as [0, 1]; standardization centers data to mean 0 and standard deviation 1.
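The point about categorical variables can be illustrated with scikit-learn's ColumnTransformer, which scales only the numeric columns while one-hot encoding the categorical one. The column names and values here are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data: two numeric columns, one categorical
df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'income': [40000, 52000, 80000, 75000],
    'city': ['NY', 'LA', 'NY', 'SF'],
})

# Standardize the numeric columns; one-hot encode the categorical column
pre = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['city']),
])
X = pre.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot columns
```

Keeping the encoding and scaling inside one transformer makes it easy to reuse the exact same preprocessing on new data via `pre.transform(new_df)`.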