Feature Scaling
Description
Feature Scaling is a data preprocessing technique used to normalize the range of independent variables (features) in a dataset. Many machine learning algorithms (especially distance-based models like KNN, SVM, and gradient descent-based models) perform better when features are on a similar scale. The two most common scaling techniques are:
1. Min-Max Scaling: scales data to a fixed range, usually [0, 1]. Formula: X_scaled = (X − X_min) / (X_max − X_min)
2. Standardization (StandardScaler): scales data so that it has a mean of 0 and a standard deviation of 1. Formula: X_scaled = (X − μ) / σ
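The two formulas above can be sketched directly in NumPy, without scikit-learn, to make the arithmetic concrete. The sample values are hypothetical:

```python
import numpy as np

# Hypothetical sample: five height measurements in cm
X = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

# Min-Max scaling: (X - X_min) / (X_max - X_min) -> values in [0, 1]
x_minmax = (X - X.min()) / (X.max() - X.min())
print(x_minmax)  # [0.   0.25 0.5  0.75 1.  ]

# Standardization: (X - mean) / std -> mean 0, standard deviation 1
x_std = (X - X.mean()) / X.std()
print(x_std)  # roughly [-1.414 -0.707  0.     0.707  1.414]
```

Note that `X.std()` uses the population standard deviation (ddof=0), which matches what scikit-learn's StandardScaler does.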
Prerequisites
- Basics of NumPy and Pandas
- Understanding of Machine Learning workflow
- Familiarity with scikit-learn
Examples
Here's a simple example applying both scaling techniques with Python and scikit-learn:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler
# Sample data
data = pd.DataFrame({
'height': [150, 160, 170, 180, 190],
'weight': [50, 60, 70, 80, 90]
})
# MinMax Scaling
min_max_scaler = MinMaxScaler()
scaled_minmax = min_max_scaler.fit_transform(data)
print("MinMax Scaled:\n", pd.DataFrame(scaled_minmax, columns=data.columns))
# Standard Scaling
standard_scaler = StandardScaler()
scaled_standard = standard_scaler.fit_transform(data)
print("\nStandard Scaled:\n", pd.DataFrame(scaled_standard, columns=data.columns))
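One common pitfall worth noting: the scaler should be fitted on the training data only, and the resulting statistics reused on the test data, so no information leaks from the test set. A minimal sketch, with a hypothetical feature matrix:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 10 samples, 2 features
X = np.arange(20, dtype=float).reshape(10, 2)
X_train, X_test = train_test_split(X, test_size=0.3, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the same statistics on the test data
```

Calling `fit_transform` on the test set instead would compute a second, different set of statistics and silently distort the comparison between train and test.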
Real-World Applications
Finance
Normalizing credit scores, loan amounts, or balances for risk models
Healthcare
Scaling test results or health indicators like blood pressure or sugar levels for diagnostic models
E-commerce
Normalizing user behavior data (clicks, views, time spent) in recommendation engines
Where Feature Scaling Is Applied
Finance
- Credit risk modeling, fraud detection using normalized financial metrics
E-commerce
- Recommendation systems using normalized user ratings and behavior logs
Manufacturing
- Scaling machine sensor data for predictive maintenance
Resources
Feature Scaling PDF
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
Q: Why is feature scaling important in machine learning?
➤ It transforms features onto the same scale, helping models like KNN or gradient-based models converge better and make fair comparisons.
Q: When should you use MinMaxScaler instead of StandardScaler?
➤ Use MinMaxScaler when you need bounded values (e.g., for neural networks or image processing where inputs should be in [0, 1]).
Q: What happens if you skip scaling for a scale-sensitive model?
➤ The model may perform poorly or give results biased toward features with larger ranges.
Q: Can feature scaling be applied to categorical variables?
➤ Not directly. Categorical variables should first be encoded (e.g., with one-hot encoding), and the resulting columns typically don’t require scaling.
Q: What is the difference between normalization and standardization?
➤ Normalization (MinMax) scales data to a range such as [0, 1]; standardization centers data to mean 0 and standard deviation 1.
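The point about categorical variables can be illustrated with scikit-learn's ColumnTransformer, which scales only the numeric columns while one-hot encoding the categorical one. The column names and values here are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data: two numeric columns, one categorical
df = pd.DataFrame({
    'age': [25, 32, 47, 51],
    'income': [40000, 52000, 80000, 75000],
    'city': ['NY', 'LA', 'NY', 'SF'],
})

# Standardize the numeric columns; one-hot encode the categorical column
pre = ColumnTransformer([
    ('num', StandardScaler(), ['age', 'income']),
    ('cat', OneHotEncoder(), ['city']),
])
X = pre.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot columns
```

Keeping the encoding and scaling inside one transformer makes it easy to reuse the exact same preprocessing on new data via `pre.transform(new_df)`.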