Correlation matrices and heatmaps
Description
A correlation matrix is a table of correlation coefficients between pairs of variables. It shows how strongly, and in which direction, each pair of variables is linearly related.
A heatmap is a graphical representation of data in which values are encoded as colors. It is often used to visualize correlation matrices, because strong and weak relationships are easier to spot by color than by reading numbers.
Prerequisites
- Basics of Python
- Libraries: pandas, seaborn, matplotlib
- Understanding of correlation methods (Pearson, Kendall, Spearman)
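The three correlation methods named in the prerequisites measure different things: Pearson captures linear association, while Spearman and Kendall are rank-based and capture monotonic association. The following is a minimal sketch of that difference. The toy DataFrame and its column names are made up for illustration, and the sketch assumes SciPy is installed, since pandas relies on it for Kendall's method.

```python
import pandas as pd

# Hypothetical toy data: y increases monotonically with x, but not linearly.
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [1, 8, 27, 64, 125, 216],  # y = x ** 3
})

# DataFrame.corr accepts method="pearson" (the default), "spearman", or "kendall".
results = {
    method: float(df.corr(method=method).loc["x", "y"])
    for method in ("pearson", "spearman", "kendall")
}

for method, value in results.items():
    print(f"{method:>8}: {value:.3f}")
```

Because the relationship is perfectly monotonic, the rank-based coefficients reach 1, while Pearson stays below 1 because the relationship is curved.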
Examples
The following example computes the correlation matrix of the Iris dataset and plots it as a heatmap:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load example dataset
df = sns.load_dataset('iris')
# Compute the correlation matrix
correlation_matrix = df.corr(numeric_only=True)
# Display the correlation matrix
print(correlation_matrix)
# Create a heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Iris Dataset')
plt.show()
📝 Comments:
df.corr(numeric_only=True) calculates the pairwise correlation of the numeric columns (Pearson by default) and skips non-numeric columns such as species.
annot=True adds the correlation values in each box.
cmap defines the color scheme.
linewidths separates the grid cells clearly.
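Because correlation coefficients always fall between -1 and +1, it can help to pin the color scale to that range and to hide the upper triangle, which only mirrors the lower one. The sketch below shows one possible way to do this. The helper name plot_corr_heatmap and the toy data are illustrative assumptions; mask, vmin, vmax, center, and fmt are existing seaborn.heatmap parameters.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt


def plot_corr_heatmap(corr):
    """Draw a lower-triangle correlation heatmap on a fixed [-1, 1] color scale."""
    # True marks cells to hide: the strict upper triangle (the diagonal stays visible).
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    fig, ax = plt.subplots(figsize=(6, 5))
    sns.heatmap(corr, mask=mask, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, center=0, linewidths=0.5, ax=ax)
    return ax, mask


# Hypothetical data, for illustration only.
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 5, 4, 6],
    "c": [5, 3, 4, 1, 2],
})
ax, mask = plot_corr_heatmap(toy.corr())
ax.set_title("Lower-triangle correlation heatmap")
```

Call plt.show() afterwards to display the figure. Fixing vmin and vmax keeps colors comparable across different datasets, since the same shade always stands for the same coefficient.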
Real-World Applications
Finance
Understand correlation between stock prices or portfolio assets
Heatmaps for risk factor relationships
Healthcare
Find relationships between patient vitals, symptoms, and outcomes
Identify multicollinearity in diagnostic data
E-commerce
Analyze product price, rating, and sales correlations
Feature selection for recommendation systems
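Multicollinearity checks and correlation-based feature selection, as mentioned above, often start by listing the variable pairs whose absolute correlation exceeds some cutoff. A minimal sketch follows. The function name high_correlation_pairs, the 0.8 threshold, and the toy columns are assumptions chosen for illustration, not a standard.

```python
from itertools import combinations

import pandas as pd


def high_correlation_pairs(df, threshold=0.8):
    """Return (col_a, col_b, r) for each column pair with |r| >= threshold."""
    corr = df.corr(numeric_only=True)
    pairs = []
    # combinations() visits each unordered pair once and skips the diagonal.
    for col_a, col_b in combinations(corr.columns, 2):
        r = corr.loc[col_a, col_b]
        if abs(r) >= threshold:
            pairs.append((col_a, col_b, float(r)))
    return pairs


# Hypothetical data: height_in is just height_cm in different units.
toy = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [150 / 2.54, 160 / 2.54, 170 / 2.54, 180 / 2.54, 190 / 2.54],
    "noise": [3, 1, 4, 1, 5],
})
pairs = high_correlation_pairs(toy)
print(pairs)
```

Only the two height columns are flagged here, which suggests that one of them is redundant and could be dropped before modeling.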
Where Correlation Matrices and Heatmaps Are Applied
Finance
- Portfolio risk modeling, asset correlation
Healthcare
- Finding correlations in patient vitals or diagnostic parameters
Manufacturing
- Sensor data analysis, quality vs process variables
Resources
Data Science topic PDF
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
➤ A correlation matrix is a table showing the correlation coefficients between the variables in a dataset.
➤ Correlation values range from -1 (perfect negative) to +1 (perfect positive); 0 means no linear correlation.
➤ A heatmap is useful when you want a quick visual overview of the relationships between multiple variables.
➤ Pearson’s correlation, which measures linear relationships, is the most commonly used method and the default in pandas' df.corr().
➤ Categorical variables should be encoded numerically or analyzed with an appropriate association measure (such as Cramér’s V or point-biserial correlation).
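The two measures named for categorical data can be sketched with pandas and NumPy alone. This is a simplified illustration under stated assumptions: cramers_v applies no bias correction, point_biserial assumes its first argument has exactly two categories, and the function names and toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def cramers_v(a, b):
    """Cramér's V (without bias correction) between two categorical Series."""
    observed = pd.crosstab(a, b).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts if the two variables were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * (min(observed.shape) - 1))))


def point_biserial(binary, values):
    """Point-biserial r: Pearson correlation with the binary variable coded 0/1."""
    # factorize assigns 0 to the first category seen and 1 to the second,
    # so the sign of r depends on that order.
    codes = pd.factorize(binary)[0]
    return float(np.corrcoef(codes, np.asarray(values, dtype=float))[0, 1])


# Hypothetical data, for illustration only.
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "color": ["red", "red", "red", "blue", "blue", "blue"],
    "treated": ["no", "no", "no", "yes", "yes", "yes"],
    "score": [1, 2, 3, 7, 8, 9],
})
v = cramers_v(toy["group"], toy["color"])
r = point_biserial(toy["treated"], toy["score"])
print(f"Cramér's V: {v:.3f}, point-biserial r: {r:.3f}")
```

Cramér's V ranges from 0 to 1 and carries no sign, so it should be read as strength of association rather than direction.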