Titanic Dataset Survival Analysis

Introduction

Description

Data trend visualization is a powerful way to identify patterns and fluctuations over time. Using seaborn's built-in datasets (flights, tips, and titanic), we can stand in for real-world trends such as a rise in COVID-19 cases, daily sales, or the distribution of crime or survival across groups.

Prerequisites

  • Python installed with seaborn, matplotlib, pandas
  • Understanding of basic plotting, data grouping, and transformation
  • Familiarity with datasets like flights, tips, and titanic from seaborn

Examples

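The script below draws three charts from seaborn's built-in datasets: a time trend from flights, sales totals by day from tips, and a stacked breakdown by class and age group from titanic.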


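# Silence library warnings so the output stays readable
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

warnings.filterwarnings("ignore")

# 📈 COVID-style trend (using flights)
flights = sns.load_dataset('flights')
# 'month' is a categorical of abbreviations ('Jan', 'Feb', ...), so cast it to
# str before concatenating and parse with an explicit format
flights['Date'] = pd.to_datetime(
    flights['year'].astype(str) + '-' + flights['month'].astype(str),
    format='%Y-%b',
)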

plt.figure(figsize=(12,6))
sns.lineplot(x='Date', y='passengers', data=flights)
plt.title("COVID-like Trend (Flights Data)")
plt.ylabel("Passengers (simulated cases)")
plt.xlabel("Date")
plt.grid(True)
plt.show()

# 💸 Sales trend (using tips)
tips = sns.load_dataset('tips')
# 'day' is categorical; observed=True keeps only the days present in the data
sales_by_day = tips.groupby('day', observed=True)['total_bill'].sum().reset_index()

plt.figure(figsize=(8,5))
sns.barplot(data=sales_by_day, x='day', y='total_bill')
plt.title("Sales Trend by Day")
plt.ylabel("Total Sales")
plt.xlabel("Day of Week")
plt.show()

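# 🚨 Crime trend simulation (using titanic)
titanic = sns.load_dataset('titanic')
# Bins are right-inclusive: (0, 18], (18, 35], (35, 60], (60, 80]
titanic['age_group'] = pd.cut(titanic['age'], bins=[0,18,35,60,80], labels=['0-18','19-35','36-60','60+'])

# Count survivors per class and age group; rows with a missing age are excluded
crime_like = titanic.groupby(['pclass','age_group'], observed=False)['survived'].sum().unstack()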

crime_like.plot(kind='bar', stacked=True, figsize=(10,6))
plt.title("Simulated Crime Trend by Class and Age Group")
plt.ylabel("Survivors (simulated crime events)")
plt.xlabel("Passenger Class")
plt.legend(title='Age Group')
plt.grid(True)
plt.show()
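
The groupby(...).sum().unstack() step in the titanic example is what shapes the data for a stacked bar chart: each row becomes one bar, and each column becomes one stacked segment. A minimal sketch on a small hand-made table may make the reshaping easier to see. The toy values are invented for illustration and are not taken from the Titanic data; fill_value=0 is an optional extra that the script above does not use.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above
toy = pd.DataFrame({
    "pclass":    [1, 1, 1, 2, 2, 3],
    "age_group": ["0-18", "19-35", "19-35", "0-18", "19-35", "19-35"],
    "survived":  [1, 1, 0, 1, 0, 1],
})

# Sum survivors per (class, age group), then pivot age groups into columns.
# fill_value=0 replaces combinations that never occur (here: class 3, 0-18).
wide = toy.groupby(["pclass", "age_group"])["survived"].sum().unstack(fill_value=0)
print(wide)
```

Calling wide.plot(kind='bar', stacked=True) on a table of this shape would draw one bar per class with one segment per age group.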


Real-World Applications

Healthcare

Visualizing COVID-19 trends over time

Retail

Tracking sales performance by day or category

Public Safety

Monitoring crime patterns based on demographics

Where Data Trend Visualization Is Applied

  • Time series analysis using pandas
  • Statistical charting with seaborn and matplotlib
  • Simulating domain-specific scenarios using public datasets
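
As one illustration of time series analysis with pandas, the sketch below smooths a monthly series with a rolling mean and computes the month-over-month change. The series name and its values are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical monthly counts; the numbers are made up for illustration
dates = pd.date_range("2020-01-01", periods=6, freq="MS")
cases = pd.Series([10, 20, 30, 40, 50, 60], index=dates, name="cases")

# A 3-month rolling mean smooths short-term fluctuations;
# the first two entries are NaN because the window is not yet full
smoothed = cases.rolling(window=3).mean()

# Month-over-month change shows where the trend speeds up or slows down
change = cases.diff()

print(pd.DataFrame({"cases": cases, "smoothed": smoothed, "change": change}))
```

Plotting cases and smoothed on the same axes is a common way to show the raw series alongside its underlying trend.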

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ Common plotting libraries include matplotlib and seaborn for static charts, and plotly for interactive plots.

➤ Time trends can be plotted by converting categorical time-like data (e.g., months, years) to datetime and placing it on a time axis.
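
A minimal sketch of that conversion, assuming a frame with a numeric year column and a categorical month column of English abbreviations (the same shape as seaborn's flights data). The frame and its values here are illustrative only.

```python
import pandas as pd

# Hypothetical frame shaped like seaborn's flights data (illustrative values)
df = pd.DataFrame({
    "year": [1949, 1949, 1950],
    "month": pd.Categorical(["Jan", "Feb", "Jan"]),
    "passengers": [112, 118, 115],
})

# Categorical columns do not support string concatenation, so cast to str first.
# An explicit format avoids ambiguous parsing of strings like "1949-Jan".
df["date"] = pd.to_datetime(
    df["year"].astype(str) + "-" + df["month"].astype(str),
    format="%Y-%b",
)

# A sorted datetime index lets plotting libraries place points on a true time axis
df = df.sort_values("date").set_index("date")
print(df)
```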

➤ To visualize a continuous variable such as age by group, use pd.cut() to create bins, then group by the resulting age categories and plot the aggregated values.
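
The binning step can be sketched on a small hand-made table. The column names mirror the titanic dataset, but the rows are hypothetical and exist only to show how edge values and missing ages are handled.

```python
import pandas as pd

# Hypothetical ages and outcomes; None marks a missing age
people = pd.DataFrame({
    "age": [4, 18, 19, 35, 42, 71, None],
    "survived": [1, 1, 0, 1, 0, 0, 1],
})

# Bins are right-inclusive by default: (0, 18], (18, 35], (35, 60], (60, 80]
people["age_group"] = pd.cut(
    people["age"],
    bins=[0, 18, 35, 60, 80],
    labels=["0-18", "19-35", "36-60", "60+"],
)

# Mean of a 0/1 column is the survival rate per group;
# the row with a missing age has no group and is left out
rate = people.groupby("age_group", observed=False)["survived"].mean()
print(rate)
```

Because the bins are right-inclusive, an age of exactly 18 lands in the 0-18 group and an age of 35 lands in 19-35, which matches the labels.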