Advanced plots
Description
Advanced plots are essential for multivariate data exploration, pattern detection, and comparative visualizations:
Scatter Plot: Displays the relationship between two numerical variables.
Pair Plot: Visualizes pairwise relationships in a dataset (scatter plots for each pair of variables, with each variable's own distribution on the diagonal).
Bar Chart: Used to compare categories with rectangular bars (good for categorical variables).
Prerequisites
- Basic knowledge of Python
- Installed libraries: matplotlib, seaborn, pandas
- Dataset with numerical and/or categorical columns
Examples
The following Python example draws all three plot types using the Iris dataset:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load example dataset
df = sns.load_dataset('iris')  # Fetched from seaborn's online data repository on first use, then cached
# Scatter Plot - relationship between petal length and width
plt.figure(figsize=(6, 4))
sns.scatterplot(data=df, x='petal_length', y='petal_width', hue='species')
plt.title('Scatter Plot of Petal Length vs Width')
plt.show()
# Pair Plot - multivariate relationships in the Iris dataset
sns.pairplot(df, hue='species')
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()
# Bar Chart - average sepal length per species
avg_sepal = df.groupby('species')['sepal_length'].mean().reset_index()
plt.figure(figsize=(6, 4))
sns.barplot(data=avg_sepal, x='species', y='sepal_length')
plt.title('Average Sepal Length by Species')
plt.ylabel('Avg Sepal Length')
plt.xlabel('Species')
plt.show()
Real-World Applications
Finance
Scatter: Compare risk vs return
Bar: Quarterly revenue comparison
Pair: Visualizing correlation between stock indicators
Healthcare
Scatter: Age vs Blood Pressure
Bar: Disease incidence by region
Pair: Explore vitals for diagnostic insights
E-commerce
Scatter: Price vs Ratings
Bar: Top-selling categories
Pair: Customer behavior patterns
Where Advanced Plots Are Applied
Finance
- Risk-return comparison, stock indicator analysis
Healthcare
- Feature interaction in patient data, symptom correlation
Education
- Subject-wise comparison of student performance
Resources
Data Science topic PDF
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
➤ A scatter plot helps visualize the relationship or correlation between two numerical variables.
➤ A pair plot is used to observe multiple pairwise relationships at once, especially in multivariate datasets.
➤ Bar charts compare values across categories; histograms show the distribution of a continuous variable.
➤ To reduce overplotting in a scatter plot, use transparency (alpha) or jittering.
➤ Pair plots can become cluttered as the number of variables or points grows; plotting a sample or applying dimensionality reduction first is recommended.
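The transparency and jittering advice above can be sketched as follows. The data here is synthetic and the column names `rating` and `items` are hypothetical; discrete values are used on purpose so that many points fall on exactly the same coordinates.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: two discrete 1-5 variables, so 2000 points share at most 25 positions
rng = np.random.default_rng(0)
n = 2000
demo = pd.DataFrame({
    "rating": rng.integers(1, 6, size=n),
    "items": rng.integers(1, 6, size=n),
})

# Jitter: add small uniform noise so identical points no longer coincide
jitter = 0.2
x_jit = demo["rating"] + rng.uniform(-jitter, jitter, size=n)
y_jit = demo["items"] + rng.uniform(-jitter, jitter, size=n)

fig, (ax_raw, ax_fixed) = plt.subplots(1, 2, figsize=(10, 4),
                                       sharex=True, sharey=True)

# Left: raw points stack on top of each other and hide the density
ax_raw.scatter(demo["rating"], demo["items"])
ax_raw.set_title("Raw: points overlap")

# Right: jittered points with transparency, so denser cells look darker
ax_fixed.scatter(x_jit, y_jit, alpha=0.3, s=10)
ax_fixed.set_title("Jitter + alpha=0.3")

fig.tight_layout()
```

The jitter width is a judgment call: it should be large enough to separate stacked points but small enough that neighboring values do not blend into each other.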