Basic plots
Description
Basic plots are essential for exploratory data analysis (EDA). They help reveal how a numerical feature is distributed, how spread out its values are, and which values are unusual (outliers):
Histogram: Shows the frequency distribution of a numerical feature.
Box Plot: Summarizes data using quartiles and highlights outliers.
Violin Plot: Combines a box plot with a KDE (kernel density estimate) to show the full shape of the distribution.
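To see what a histogram actually counts, the bin edges and counts can be computed directly with numpy.histogram. This is a minimal sketch that assumes equal-width bins spanning the data range (what numpy does for an integer `bins` argument). It reuses the 15 math scores from the example below and prints a text version of the bars.

```python
import numpy as np

# The same 15 math scores used in the plotting example
scores = [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]

# 8 equal-width bins from min(scores) to max(scores); counts[i] is the bar height
counts, edges = np.histogram(scores, bins=8)

# Text "histogram": one '#' per score that falls in each bin
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} - {right:6.2f}: {'#' * int(c)}")
```

Each bin is half-open except the last, which also includes the maximum value, so every score is counted exactly once.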
Prerequisites
- Python installed
- Libraries: matplotlib, seaborn, pandas
- Understanding of numerical data and distributions
Examples
The following example draws all three plots for a small sample of math scores:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Sample dataset
data = pd.DataFrame({
'Math_Score': [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
})
# Histogram - distribution of Math scores
plt.figure(figsize=(6, 4))
sns.histplot(data['Math_Score'], bins=8, kde=True)
plt.title('Histogram of Math Scores')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()
# Box Plot - visualizing quartiles and outliers
plt.figure(figsize=(6, 4))
sns.boxplot(y=data['Math_Score'])
plt.title('Box Plot of Math Scores')
plt.show()
# Violin Plot - combines KDE and Box Plot
plt.figure(figsize=(6, 4))
sns.violinplot(y=data['Math_Score'])
plt.title('Violin Plot of Math Scores')
plt.show()
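The outline of a violin plot is a kernel density estimate. As a rough sketch of what that curve is: place a Gaussian bump on every observation and average the bumps. The helper `gaussian_kde_1d` below is hypothetical and assumes a Gaussian kernel with a bandwidth from Scott's rule (standard deviation times n^(-1/5)); seaborn's own KDE defaults to Scott's rule too, but its internals (and how far the violin extends past the data) differ from this sketch.

```python
import numpy as np

def gaussian_kde_1d(samples, grid):
    """Hypothetical helper: evaluate a 1-D Gaussian KDE at each point of `grid`."""
    x = np.asarray(samples, dtype=float)
    n = x.size
    # Scott's rule in one dimension: h = sample std * n^(-1/5)
    h = x.std(ddof=1) * n ** (-1 / 5)
    # z[i, j] = distance from grid point i to sample j, in bandwidth units
    z = (np.asarray(grid, dtype=float)[:, None] - x[None, :]) / h
    # Sum one normal density per sample, then divide by n so the curve integrates to 1
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

scores = [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
grid = np.linspace(0, 140, 1401)
density = gaussian_kde_1d(scores, grid)

# Approximate area under the curve (should be close to 1)
print(density.sum() * (grid[1] - grid[0]))
```

A larger bandwidth smooths the curve into a single hump; a smaller one makes it follow individual scores more closely.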
Real-World Applications
Finance: histogram of daily returns; box plot of monthly expenses
Healthcare: violin plot of patient blood pressure across age groups; histogram of recovery times
E-commerce: box plot of order values; distribution of product ratings
Where Basic Plots Are Applied
Finance
- Visualizing return volatility, risk profiling
Retail
- Product price range comparison
Education
- Visual comparison of student marks across classes
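Comparisons across groups, such as marks across classes, are usually drawn as one box per category. A minimal sketch follows; the two classes and their marks are hypothetical values chosen only for illustration. Passing a categorical column as `x` and a numerical column as `y` makes seaborn draw one box per category and label the axes with the column names.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical marks for two classes (illustrative values only)
marks = pd.DataFrame({
    'Class': ['A'] * 5 + ['B'] * 5,
    'Marks': [72, 85, 90, 64, 78, 55, 68, 74, 81, 60],
})

# Median mark per class: the line drawn inside each box
print(marks.groupby('Class')['Marks'].median())

# One box per class, side by side on a shared y-axis
plt.figure(figsize=(6, 4))
ax = sns.boxplot(data=marks, x='Class', y='Marks')
ax.set_title('Marks by Class')
plt.show()
```

Replacing `sns.boxplot` with `sns.violinplot` and the same arguments gives the violin version of the comparison.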
Interview Questions
What is a histogram used for?
➤ To show the frequency distribution of a continuous (numerical) variable.
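How does a box plot show outliers?
➤ Outliers are drawn as individual points beyond the whiskers. Under the common 1.5×IQR rule (the default in matplotlib and seaborn), each whisker extends to the most extreme data point that lies within 1.5×IQR of its quartile.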
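What does a violin plot show that a box plot does not?
➤ It includes a KDE, which reveals the shape of the distribution (for example, multiple peaks) that quartiles alone can hide.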
When should you choose a violin plot over a box plot?
➤ When you want to see both the quartiles and the estimated shape of the whole distribution.
Are these plots suitable for categorical data?
➤ No. They are meant for numerical data, though a categorical variable can be used for grouping (e.g., one box plot per category).
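The 1.5×IQR outlier rule can be checked numerically. The helper `iqr_outliers` below is hypothetical; it assumes pandas' default linear interpolation for quartiles (the same percentile method matplotlib uses for box plots) and returns the quartiles, the two fences, and any values outside them. The extra score of 10 in the second call is a made-up value added only to produce an outlier.

```python
import pandas as pd

def iqr_outliers(values, k=1.5):
    """Hypothetical helper: (q1, q3, lower_fence, upper_fence, outliers) by the IQR rule."""
    s = pd.Series(values, dtype=float)
    # Quartiles with linear interpolation (pandas default)
    q1, q3 = float(s.quantile(0.25)), float(s.quantile(0.75))
    iqr = q3 - q1
    # Fences sit k * IQR beyond the quartiles; values outside them are outliers
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = s[(s < lower) | (s > upper)].tolist()
    return q1, q3, lower, upper, outliers

scores = [75, 88, 93, 45, 67, 89, 94, 91, 77, 83, 70, 60, 55, 95, 99]
print(iqr_outliers(scores))         # (68.5, 92.0, 33.25, 127.25, [])
print(iqr_outliers(scores + [10]))  # (65.25, 91.5, 25.875, 130.875, [10.0])
```

For the sample scores, no value falls outside the fences, so the box plot in the example shows no outlier points and its whiskers reach 45 and 99.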