Basic plots

Description

Basic plots are essential for exploratory data analysis (EDA). They help visualize distributions, trends, and anomalies:

  • Histogram: shows the frequency distribution of a numerical feature by counting values in bins.
  • Box Plot: summarizes data using quartiles and highlights outliers.
  • Violin Plot: combines a box plot with a KDE (Kernel Density Estimate) to show the full shape of the distribution.
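
To make the histogram description concrete, here is a minimal sketch of the counting step in plain Python. The helper name `bin_counts` and the sample scores are illustrative assumptions, and plotting libraries may choose bin edges differently.

```python
def bin_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals.

    Assumes the values are not all identical (the bin width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right so the maximum is included.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Illustrative sample of exam scores
scores = [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
print(bin_counts(scores, bins=8))  # [1, 1, 1, 2, 2, 1, 3, 4]
```

Each count becomes the height of one bar, so the tall bars on the right show that most scores sit in the high range.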

Prerequisites

  • Python installed
  • Libraries: matplotlib, seaborn, pandas
  • Understanding of numerical data and distributions

Examples

The following example uses matplotlib, seaborn, and pandas to draw a histogram, a box plot, and a violin plot of a small sample of math scores:


import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Sample dataset
data = pd.DataFrame({
    'Math_Score': [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
})

# Histogram - distribution of Math scores
plt.figure(figsize=(6, 4))
sns.histplot(data['Math_Score'], bins=8, kde=True)
plt.title('Histogram of Math Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

# Box Plot - visualizing quartiles and outliers
plt.figure(figsize=(6, 4))
sns.boxplot(y=data['Math_Score'])
plt.title('Box Plot of Math Scores')
plt.show()

# Violin Plot - combines KDE and Box Plot
plt.figure(figsize=(6, 4))
sns.violinplot(y=data['Math_Score'])
plt.title('Violin Plot of Math Scores')
plt.show()

          

Real-World Applications

Finance: histogram of daily returns; box plot of monthly expenses

Healthcare: violin plot of patient blood pressure across age groups; histogram of recovery time

E-commerce: box plot of order values; distribution of product ratings

Where Basic Plots Are Applied

Finance

  • Visualizing return volatility, risk profiling

Retail

  • Product price range comparison

Education

  • Visual comparison of student marks across classes

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ A histogram is used to show the frequency distribution of a continuous variable.

➤ In a box plot, outliers are shown as points beyond the whiskers, which extend to the most extreme data points within 1.5×IQR of the first and third quartiles.
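
The 1.5×IQR rule can be checked by hand. The sketch below uses only the standard library; the helper name `iqr_outliers` and the extra score of 5 are hypothetical, and plotting libraries may compute quartiles with a slightly different interpolation.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k*IQR rule."""
    # method="inclusive" interpolates linearly between data points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers


scores = [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
print(iqr_outliers(scores))            # (33.25, 127.25, [])
# Hypothetical extra score of 5, added only to trigger the rule
print(iqr_outliers(scores + [5])[2])   # [5]
```

With the original scores no value falls outside the fences, so a box plot of them shows no outlier points.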

➤ Unlike a box plot, a violin plot includes the KDE, providing insight into the shape of the distribution.
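
As a rough illustration of what a KDE computes, the sketch below places a Gaussian bump on every data point and averages them. The function name `gaussian_kde_sketch` and the bandwidth of 8.0 are assumptions chosen for illustration; seaborn selects its own bandwidth automatically.

```python
import math


def gaussian_kde_sketch(values, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, then normalize.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


scores = [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
density = gaussian_kde_sketch(scores, bandwidth=8.0)  # bandwidth picked by hand
for x in range(40, 101, 10):
    print(x, round(density(x), 4))
```

The width of the violin at each score corresponds to this estimated density, which is why the violin is widest where scores cluster.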

➤ A violin plot is preferable to a box plot when you want to see both the quartiles and the estimated shape of the full distribution.

➤ These plots are primarily for numerical data; categorical variables are used for grouping (e.g., box plots by category).
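
Grouping by a categorical variable might look like the sketch below. The `Class` column, its labels, the function name `plot_scores_by_class`, and the output filename are hypothetical, added only to demonstrate grouping.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_scores_by_class(df):
    """Draw one box per class and return the Axes for further tweaking."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # x is the categorical grouping variable, y the numerical one
    sns.boxplot(data=df, x="Class", y="Math_Score", ax=ax)
    ax.set_title("Math Scores by Class")
    return ax


# The class labels are hypothetical; they are not part of the original data.
scores = pd.DataFrame({
    "Math_Score": [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99],
    "Class": ["A"] * 5 + ["B"] * 5 + ["C"] * 5,
})

ax = plot_scores_by_class(scores)
ax.figure.savefig("scores_by_class.png")  # or call plt.show() interactively
```

Swapping `sns.boxplot` for `sns.violinplot` with the same arguments gives one violin per class instead.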