Binning, log transforms, datetime handling

Description

Binning, log transforms, and datetime handling are feature engineering techniques that transform raw data into formats that are more meaningful or easier for a model to use.
🔹 Binning: Converts a continuous variable into discrete intervals ("bins"). Useful for reducing noise or handling non-linear relationships.
🔹 Log Transforms: Apply a logarithm to right-skewed features to make their distribution more symmetric and reduce the effect of outliers.
🔹 Datetime Handling: Extracts meaningful components (such as year, month, day, hour, weekday) from datetime features so they can be used in ML models.

Prerequisites

  • Pandas for data manipulation
  • NumPy for numerical transformations
  • Basic understanding of statistics and datetime formats

Examples

The following Python example applies all three techniques to a small DataFrame:


import pandas as pd
import numpy as np

# Sample DataFrame
df = pd.DataFrame({
    'Age': [15, 25, 35, 45, 55],
    'Salary': [3000, 5000, 15000, 35000, 80000],
    'Join_Date': pd.to_datetime(['2020-01-01', '2019-06-15', '2021-03-10', '2018-11-20', '2022-08-05'])
})

# ------------------- Binning -------------------
# Bin Age into 3 labeled intervals: (0, 20], (20, 40], (40, 60]
df['Age_Bin'] = pd.cut(df['Age'], bins=[0, 20, 40, 60], labels=['Young', 'Middle', 'Senior'])

# ---------------- Log Transform ----------------
# Natural log of Salary compresses the long right tail (values must be strictly positive)
df['Log_Salary'] = np.log(df['Salary'])

# --------------- Datetime Handling --------------
# Extracting useful time features
df['Join_Year'] = df['Join_Date'].dt.year
df['Join_Month'] = df['Join_Date'].dt.month
df['Join_Weekday'] = df['Join_Date'].dt.day_name()

print(df)
          

Real-World Applications

Finance: Binning credit score ranges, log-transforming income for loan approvals, extracting time features from transaction dates for fraud detection

Healthcare: Log-transforming medical costs, binning ages into risk groups, extracting admission year or day from hospital records

E-commerce: Binning customer purchase frequency, extracting features from the date of first transaction, log-transforming product prices for pricing models

Where These Techniques Are Applied

Finance

  • Credit score binning, salary normalization, transaction time analysis

E-commerce

  • Price normalization, purchase-frequency binning, date-based seasonal features

Marketing

  • Campaign duration tracking, engagement binning, weekday effectiveness

Resources

Data Science topic PDF

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ Binning converts continuous values into discrete categories. It can reduce noise and help simple models capture non-linear relationships.
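
As a rough illustration of two common ways to bin, the sketch below compares fixed-edge binning with pd.cut against quantile binning with pd.qcut. The ages mirror the sample data used earlier on this page; the "Lower half" and "Upper half" labels are made up for this example.

```python
import pandas as pd

ages = pd.Series([15, 25, 35, 45, 55], name="Age")

# Fixed edges: intervals are right-inclusive by default, i.e. (0, 20], (20, 40], (40, 60].
fixed_bins = pd.cut(ages, bins=[0, 20, 40, 60], labels=["Young", "Middle", "Senior"])

# Quantile binning: edges are derived from the data so each bin holds
# roughly the same number of rows (here split at the median).
quantile_bins = pd.qcut(ages, q=2, labels=["Lower half", "Upper half"])

print(pd.DataFrame({"Age": ages, "Fixed": fixed_bins, "Quantile": quantile_bins}))
```

Fixed edges are easier to explain to stakeholders, while quantile bins tend to avoid near-empty categories when the data is skewed.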

➤ A log transform reduces skewness and dampens the effect of outliers, which often improves model performance.
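
One way to check this claim on your own data is to compare skewness before and after the transform. The sketch below does so with Series.skew() on the sample salaries from the example earlier on this page; how much the skew drops will depend on the data.

```python
import numpy as np
import pandas as pd

salary = pd.Series([3000, 5000, 15000, 35000, 80000], name="Salary")
log_salary = np.log(salary)  # natural log; all values here are strictly positive

# Series.skew() returns the sample skewness; values near 0 indicate a symmetric distribution.
skew_before = salary.skew()
skew_after = log_salary.skew()

print(f"skew before: {skew_before:.2f}, after: {skew_after:.2f}")
```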

➤ Avoid a plain log transform when your data contains zero or negative values, because the logarithm is undefined for them.
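
For data that contains zeros, a common workaround is np.log1p, which computes log(1 + x). The sketch below uses a made-up Amount series to show it; note that log1p is still undefined for values at or below -1, so negative data needs a different treatment.

```python
import numpy as np
import pandas as pd

amounts = pd.Series([0, 10, 100, 1000], name="Amount")  # hypothetical data containing a zero

# np.log(0) would evaluate to -inf, which most models cannot use.
# np.log1p computes log(1 + x), so zero maps to zero.
log1p_amounts = np.log1p(amounts)

# np.expm1 inverts log1p, recovering the original scale (up to floating-point error).
restored = np.expm1(log1p_amounts)

print(pd.DataFrame({"Amount": amounts, "Log1p": log1p_amounts, "Restored": restored}))
```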

➤ Use the .dt accessor in Pandas to extract year, month, weekday, and similar components, which can then be used as features.
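
A minimal sketch of a few more .dt attributes follows. The events DataFrame, its timestamps, and the column names are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical event log with timestamps that include a time of day.
events = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2020-01-01 09:30", "2021-03-13 18:45", "2022-08-07 23:10",
    ])
})

events["Hour"] = events["Timestamp"].dt.hour
events["DayOfWeek"] = events["Timestamp"].dt.dayofweek  # Monday=0, Sunday=6
events["Is_Weekend"] = events["DayOfWeek"] >= 5          # Saturday or Sunday
events["Quarter"] = events["Timestamp"].dt.quarter

print(events)
```

Numeric components such as Hour or DayOfWeek can be fed to a model directly, whereas string outputs like day_name() usually need encoding first.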

➤ No, binning is not always a good idea. It can discard information, so use it only when it improves interpretability or model performance.
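
The information loss can be seen in a small sketch. The three ages are made up for illustration, and the bin edges are the ones from the example earlier on this page.

```python
import pandas as pd

ages = pd.Series([21, 39, 41], name="Age")  # hypothetical ages
age_bins = pd.cut(ages, bins=[0, 20, 40, 60], labels=["Young", "Middle", "Senior"])

# 21 and 39 differ by 18 years but share a bin,
# while 39 and 41 differ by only 2 years and are split apart.
print(pd.DataFrame({"Age": ages, "Age_Bin": age_bins}))
print("distinct ages:", ages.nunique(), "| distinct bins:", age_bins.nunique())
```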