Binning, log transforms, datetime handling
Description
These are feature engineering techniques used to preprocess or transform data into more meaningful or model-friendly formats.
🔹 Binning:
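Binning converts a continuous variable into discrete intervals ("bins"). This can reduce noise from small fluctuations and lets simple models (e.g. linear models) capture non-linear relationships, because each bin can receive its own effect.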
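A minimal sketch of two common binning strategies in pandas. `pd.cut` with an integer `bins` splits the value range into equal-width intervals, while `pd.qcut` is a quantile-based alternative that puts roughly the same number of rows in each bin. The `income` values are hypothetical and chosen only to show how a single outlier affects equal-width bins.

```python
import pandas as pd

# Hypothetical, right-skewed income values used only for illustration
income = pd.Series([12, 15, 18, 22, 25, 30, 45, 60, 90, 250])

# Equal-width bins: the range [12, 250] is split into 3 intervals of equal width
equal_width = pd.cut(income, bins=3, labels=["low", "mid", "high"])

# Quantile bins: each bin holds (roughly) the same number of observations
equal_freq = pd.qcut(income, q=3, labels=["low", "mid", "high"])

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```

With equal-width bins, nine of the ten values land in "low" and the "mid" bin is empty because the single value 250 stretches the range; the quantile bins split the same data 4 / 3 / 3, which is often more useful for skewed features.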
🔹 Log Transforms:
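Applying a logarithmic transformation to a right-skewed, strictly positive feature compresses large values much more than small ones, making the distribution more symmetric and reducing the influence of outliers.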
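A short sketch of the effect, assuming a hypothetical salary-like feature with all values above zero. It compares pandas' sample skewness (`Series.skew()`) before and after applying the natural log; the log reduces the skew but does not necessarily make the data exactly normal.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed feature (e.g. salaries); all values are > 0
salary = pd.Series([3000, 4000, 5000, 6000, 8000, 15000, 35000, 80000])

# Natural log compresses large values much more than small ones
log_salary = np.log(salary)

# Sample skewness: values near 0 indicate a roughly symmetric distribution
print(f"skew before: {salary.skew():.2f}")
print(f"skew after:  {log_salary.skew():.2f}")
```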
🔹 Datetime Handling:
Datetime handling involves parsing date/time values and extracting meaningful components (such as year, month, day, hour, or weekday) so they can be used as numeric or categorical features in ML models.
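A minimal sketch, assuming hypothetical timestamps stored as strings. `pd.to_datetime` parses them into `datetime64` values, after which the `.dt` accessor exposes the individual components; the `is_weekend` flag is an illustrative derived feature.

```python
import pandas as pd

# Hypothetical event timestamps stored as strings
events = pd.DataFrame({"timestamp": ["2024-03-15 09:30", "2024-03-16 22:05", "2024-03-18 14:45"]})

# Parse strings into datetime64 values so the .dt accessor becomes available
events["timestamp"] = pd.to_datetime(events["timestamp"])

# Extract calendar and clock components as separate features
events["year"] = events["timestamp"].dt.year
events["month"] = events["timestamp"].dt.month
events["day"] = events["timestamp"].dt.day
events["hour"] = events["timestamp"].dt.hour
events["weekday"] = events["timestamp"].dt.dayofweek  # Monday=0, Sunday=6
events["is_weekend"] = events["weekday"] >= 5

print(events)
```

`dt.dayofweek` returns integers, which suit most models directly; `dt.day_name()` returns strings such as "Friday", which are easier to read but need encoding before modeling.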
Prerequisites
- pandas for data manipulation
- NumPy for numerical transformations
- Basic understanding of statistics and datetime formats
Examples
The following Python example applies all three techniques to a small DataFrame:
```python
import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    'Age': [15, 25, 35, 45, 55],
    'Salary': [3000, 5000, 15000, 35000, 80000],
    'Join_Date': pd.to_datetime(['2020-01-01', '2019-06-15', '2021-03-10', '2018-11-20', '2022-08-05'])
})

# ------------------- Binning -------------------
# Bin age into 3 categories: (0, 20] -> Young, (20, 40] -> Middle, (40, 60] -> Senior
df['Age_Bin'] = pd.cut(df['Age'], bins=[0, 20, 40, 60], labels=['Young', 'Middle', 'Senior'])

# ---------------- Log Transform ----------------
# Apply the natural log to the right-skewed Salary column
# (all salaries are positive, so np.log is defined for every value)
df['Log_Salary'] = np.log(df['Salary'])

# --------------- Datetime Handling --------------
# Extract useful time features via the .dt accessor
df['Join_Year'] = df['Join_Date'].dt.year
df['Join_Month'] = df['Join_Date'].dt.month
df['Join_Weekday'] = df['Join_Date'].dt.day_name()

print(df)
```
Real-World Applications
Finance: Binning credit score ranges, log-transforming income for loan approvals, extracting time features from transaction dates for fraud detection
Healthcare: Log-transforming medical costs, binning ages into risk groups, extracting admission year or day from hospital records
E-commerce: Binning customer purchase frequency, log-transforming product prices for pricing models, extracting features from the date of a customer's first transaction
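The finance use cases listed above combine all three techniques. A minimal sketch on a hypothetical transactions table: the column names, the credit-score band edges, and the night-time cutoff (before 06:00) are illustrative assumptions, not industry standards. `np.log1p` (log of 1 + x) is used instead of `np.log` so that a zero amount would not produce `-inf`.

```python
import numpy as np
import pandas as pd

# Hypothetical transactions; amounts, times and scores are made up for illustration
tx = pd.DataFrame({
    "amount": [25.0, 1200.0, 60.0, 15.5],
    "credit_score": [580, 690, 745, 810],
    "tx_time": pd.to_datetime([
        "2024-05-01 02:14", "2024-05-01 13:40",
        "2024-05-02 23:55", "2024-05-03 08:05",
    ]),
})

# Binning: hypothetical credit-score bands (right-inclusive intervals)
tx["score_band"] = pd.cut(
    tx["credit_score"],
    bins=[300, 600, 700, 800, 850],
    labels=["poor", "fair", "good", "excellent"],
)

# Log transform: log1p is defined at 0, so zero amounts would not break it
tx["log_amount"] = np.log1p(tx["amount"])

# Datetime handling: hour of day and a hypothetical "night" flag (00:00-05:59)
tx["hour"] = tx["tx_time"].dt.hour
tx["is_night"] = tx["hour"] < 6

print(tx)
```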
Where These Techniques Are Applied
Finance
- Credit score binning, salary normalization, transaction time analysis
E-commerce
- Price normalization, purchase-frequency binning, date-based seasonal features
Marketing
- Campaign duration tracking, engagement binning, weekday effectiveness
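The marketing items (campaign duration tracking and weekday effectiveness) reduce to simple datetime arithmetic. A sketch assuming a hypothetical campaign log with `start`, `end`, and `clicks` columns: subtracting two datetime columns yields Timedeltas, and `.dt.days` turns them into whole days.

```python
import pandas as pd

# Hypothetical campaign log; dates and engagement numbers are made up
campaigns = pd.DataFrame({
    "start": pd.to_datetime(["2024-01-01", "2024-02-10", "2024-03-05"]),
    "end": pd.to_datetime(["2024-01-15", "2024-02-12", "2024-04-04"]),
    "clicks": [340, 95, 610],
})

# Campaign duration: datetime subtraction gives Timedeltas; .dt.days converts to ints
campaigns["duration_days"] = (campaigns["end"] - campaigns["start"]).dt.days

# Weekday the campaign launched, for comparing effectiveness across weekdays
campaigns["start_weekday"] = campaigns["start"].dt.day_name()

# Average clicks per launch weekday
print(campaigns.groupby("start_weekday")["clicks"].mean())
print(campaigns)
```

With real data, many campaigns would share a launch weekday, and the grouped mean becomes a rough comparison of weekday effectiveness.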
Interview Questions
➤ Binning turns continuous values into categories. It can reduce noise and help simple models capture non-linear effects.
➤ A log transform reduces right skew and the influence of outliers, which can improve model performance.
➤ Avoid a plain log transform when the data contains zero or negative values, because the logarithm is undefined for them.
➤ Use the .dt accessor in pandas to extract year, month, weekday, etc., which can then be used as features.
➤ Binning is not always beneficial: it discards the variation within each bin, so use it only when it improves interpretability or model performance.
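A sketch of what happens with zero and negative inputs, using hypothetical values. `np.log` gives `-inf` at 0 and `NaN` for negatives; `np.log1p` handles 0 but still fails below -1. The signed log shown last, sign(x) · log(1 + |x|), is a common workaround swapped in here for illustration; it is not part of the original material.

```python
import numpy as np
import pandas as pd

# Hypothetical feature containing a zero and a negative value
values = pd.Series([-5.0, 0.0, 9.0, 99.0])

# Suppress the RuntimeWarnings NumPy emits for log(0) and log(negative)
with np.errstate(divide="ignore", invalid="ignore"):
    plain_log = np.log(values)        # -inf at 0, NaN for negatives
    log1p_vals = np.log1p(values)     # log(1 + x): 0 at x = 0, NaN for x < -1
    # Signed log: keeps the sign, compresses the magnitude, defined for all real x
    signed_log = np.sign(values) * np.log1p(np.abs(values))

print(pd.DataFrame({"x": values, "log": plain_log, "log1p": log1p_vals, "signed_log": signed_log}))
```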