Advanced plots

Description

Scatter plots, pair plots, and bar charts support multivariate data exploration, pattern detection, and comparison across groups:

  • Scatter Plot: Displays the relationship between two numerical variables.
  • Pair Plot: Shows every pairwise relationship in a dataset as a grid, with scatter plots off the diagonal and each variable's distribution (a histogram or density curve) on the diagonal.
  • Bar Chart: Compares categories using rectangular bars whose length encodes a value, which makes it a good fit for categorical variables.

Prerequisites

  • Basic knowledge of Python
  • Installed libraries: matplotlib, seaborn, pandas
  • Dataset with numerical and/or categorical columns

Examples

The example below uses seaborn's Iris example dataset to draw a scatter plot, a pair plot, and a bar chart:


import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Load the Iris example dataset (seaborn downloads it on first use, then caches it)
df = sns.load_dataset('iris')

# Scatter Plot - relationship between petal length and width
plt.figure(figsize=(6, 4))
sns.scatterplot(data=df, x='petal_length', y='petal_width', hue='species')
plt.title('Scatter Plot of Petal Length vs Width')
plt.show()

# Pair Plot - multivariate relationships in the Iris dataset
sns.pairplot(df, hue='species')
plt.suptitle('Pair Plot of Iris Dataset', y=1.02)
plt.show()

# Bar Chart - average sepal length per species
avg_sepal = df.groupby('species')['sepal_length'].mean().reset_index()

plt.figure(figsize=(6, 4))
sns.barplot(data=avg_sepal, x='species', y='sepal_length')
plt.title('Average Sepal Length by Species')
plt.ylabel('Avg Sepal Length')
plt.xlabel('Species')
plt.show()

Real-World Applications

Finance


Scatter: Compare risk vs return
Bar: Quarterly revenue comparison
Pair: Visualizing correlation between stock indicators
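
The risk-versus-return comparison can be sketched as a scatter plot of annualized volatility against annualized mean return. This is only an illustration under stated assumptions: the daily returns are randomly generated, the tickers (`AAA`, `BBB`, `CCC`, `DDD`) are made up, the helper `risk_return_table` is a hypothetical name, and the 252-trading-day convention is an assumed annualization factor.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

TRADING_DAYS = 252  # assumed annualization convention


def risk_return_table(daily_returns: pd.DataFrame) -> pd.DataFrame:
    """Summarize each column (one asset per column) by annualized return and volatility."""
    summary = pd.DataFrame({
        "annual_return": daily_returns.mean() * TRADING_DAYS,
        "annual_volatility": daily_returns.std() * np.sqrt(TRADING_DAYS),
    })
    summary.index.name = "asset"
    return summary.reset_index()


# Synthetic daily returns for four hypothetical assets (not real market data)
rng = np.random.default_rng(seed=0)
daily_returns = pd.DataFrame(
    rng.normal(loc=0.0005, scale=[0.005, 0.01, 0.015, 0.02], size=(500, 4)),
    columns=["AAA", "BBB", "CCC", "DDD"],
)

summary = risk_return_table(daily_returns)

fig, ax = plt.subplots(figsize=(6, 4))
sns.scatterplot(data=summary, x="annual_volatility", y="annual_return",
                hue="asset", s=80, ax=ax)
ax.set_title("Risk vs Return (synthetic data)")
ax.set_xlabel("Annualized volatility")
ax.set_ylabel("Annualized return")
# Call plt.show() to display the figure in an interactive session
```

Each point is one asset, so assets further to the right carry more risk and assets higher up earned more on average over the simulated period.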

Healthcare


Scatter: Age vs Blood Pressure
Bar: Disease incidence by region
Pair: Explore vitals for diagnostic insights

E-commerce


Scatter: Price vs Ratings
Bar: Top-selling categories
Pair: Customer behavior patterns

Where Advanced Plots Are Applied

Finance

  • Risk-return comparison, stock indicator analysis

Healthcare

  • Feature interaction in patient data, symptom correlation

Education

  • Subject-wise comparison of student performance
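
A subject-wise comparison like this one could be drawn as a bar chart of mean scores. The sketch below assumes randomly generated scores for hypothetical students in three made-up subjects; none of the numbers come from a real dataset. It relies on seaborn's `barplot` aggregating each category with the mean by default, so no explicit `groupby` is needed for the plot itself.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic scores: 30 hypothetical students per subject (not real data)
rng = np.random.default_rng(seed=1)
subjects = ["Math", "Science", "English"]
scores = pd.DataFrame({
    "subject": np.repeat(subjects, 30),
    "score": np.concatenate([
        rng.normal(70, 10, 30),
        rng.normal(75, 8, 30),
        rng.normal(65, 12, 30),
    ]).clip(0, 100),  # keep scores inside a 0-100 scale
})

# Mean score per subject, computed explicitly for reference
subject_means = scores.groupby("subject")["score"].mean()
print(subject_means.round(1))

fig, ax = plt.subplots(figsize=(6, 4))
# barplot aggregates each category with the mean by default and adds an error bar
sns.barplot(data=scores, x="subject", y="score", order=subjects, ax=ax)
ax.set_title("Average Score by Subject (synthetic data)")
ax.set_xlabel("Subject")
ax.set_ylabel("Average score")
# Call plt.show() to display the figure in an interactive session
```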

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ A scatter plot helps visualize the relationship or correlation between two numerical variables.

➤ A pair plot is used to observe multiple pairwise relationships at once, especially in multivariate datasets.

➤ Bar charts compare values across the categories of a categorical variable; histograms show the distribution of a continuous variable.

➤ To reduce overplotting in a scatter plot, use transparency (alpha) or jittering.

➤ Pair plots can become cluttered when a dataset has many variables or rows. Plotting a sample or applying dimensionality reduction first is recommended.