Titanic Dataset Survival Analysis
Description
Data trend visualization is a powerful way to identify patterns and fluctuations in data. Using seaborn's built-in datasets flights, tips, and titanic, we can stand in for real-world trends such as a rise in COVID-19 cases, daily sales, or crime and survival distributions. Only flights is a true time series; tips and titanic are summarized by category instead.
Prerequisites
- Python installed with seaborn, matplotlib, pandas
- Understanding of basic plotting, data grouping, and transformation
- Familiarity with datasets like flights, tips, and titanic from seaborn
Examples
The examples below plot one trend from each of the three datasets using Python:
# to disable warnings
import warnings
warnings.filterwarnings("ignore")
# 📈 COVID-Style Trend (using flights)
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
flights = sns.load_dataset('flights')
# 'month' is a categorical column, so cast it to str before concatenating
flights['Date'] = pd.to_datetime(flights['year'].astype(str) + '-' + flights['month'].astype(str))
plt.figure(figsize=(12,6))
sns.lineplot(x='Date', y='passengers', data=flights)
plt.title("COVID-like Trend (Flights Data)")
plt.ylabel("Passengers (simulated cases)")
plt.xlabel("Date")
plt.grid(True)
plt.show()
# 💸 Sales Trend (using tips)
tips = sns.load_dataset('tips')
sales_by_day = tips.groupby('day')['total_bill'].sum().reset_index()
plt.figure(figsize=(8,5))
sns.barplot(data=sales_by_day, x='day', y='total_bill')
plt.title("Sales Trend by Day")
plt.ylabel("Total Sales")
plt.xlabel("Day of Week")
plt.show()
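# 🚨 Crime Trend Simulation (using titanic)
titanic = sns.load_dataset('titanic')
# Bins are right-inclusive: (0, 18], (18, 35], (35, 60], (60, 80]
titanic['age_group'] = pd.cut(titanic['age'], bins=[0,18,35,60,80], labels=['0-18','19-35','36-60','61-80'])
# Count survivors per class and age group; rows with missing age are dropped
crime_like = titanic.groupby(['pclass','age_group'], observed=False)['survived'].sum().unstack()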
crime_like.plot(kind='bar', stacked=True, figsize=(10,6))
plt.title("Simulated Crime Trend by Class and Age Group")
plt.ylabel("Survivors (simulated crime events)")
plt.xlabel("Passenger Class")
plt.legend(title='Age Group')
plt.grid(True)
plt.show()
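The stacked bars above show survivor counts, which grow with the number of passengers in each group. A rate is often easier to compare across groups. The following is a minimal sketch of that idea, assuming a small hand-made table (the rows are hypothetical, not taken from the titanic dataset) that reuses the column names pclass and survived:

```python
import pandas as pd

# Hypothetical toy rows that mimic the titanic columns used above;
# the values are made up for illustration only.
toy = pd.DataFrame({
    "pclass":   [1, 1, 1, 2, 2, 3, 3, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 1, 0, 0],
})

# sum() counts survivors; mean() of a 0/1 column gives the survival rate.
summary = toy.groupby("pclass")["survived"].agg(survivors="sum", rate="mean")
print(summary)
```

With real data, the same aggregation can be applied to the titanic frame in place of the toy table.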
Real-World Applications
Healthcare
Visualizing COVID-19 trends over time
Retail
Tracking sales performance by day or category
Public Safety
Monitoring crime patterns based on demographics
Where Data Trend Visualization Is Applied
- Time series analysis using pandas
- Statistical charting with seaborn and matplotlib
- Simulating domain-specific scenarios using public datasets
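As a rough illustration of the time series item in the list above, the sketch below builds a datetime index from separate year and month columns and smooths the values with a rolling mean. The data frame, the column name cases, and the 3-month window are assumptions chosen for the example:

```python
import pandas as pd

# Hypothetical monthly counts; year is numeric and month is a categorical
# of abbreviated names, similar in shape to seaborn's flights data.
df = pd.DataFrame({
    "year": [2020] * 6,
    "month": pd.Categorical(["Jan", "Feb", "Mar", "Apr", "May", "Jun"]),
    "cases": [10, 20, 60, 50, 90, 100],
})

# Cast the categorical to str before concatenating, then parse as year-month.
df["date"] = pd.to_datetime(
    df["year"].astype(str) + "-" + df["month"].astype(str), format="%Y-%b"
)

# A 3-month rolling mean smooths short-term fluctuations in the trend.
series = df.set_index("date")["cases"].sort_index()
smoothed = series.rolling(window=3).mean()
print(smoothed)
```

The first two values of smoothed are NaN because a full 3-month window is not yet available; a shorter window or min_periods=1 would change that.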
Resources
Data Science topic PDF
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
➤ Common choices include matplotlib, seaborn, and plotly for interactive plots.
➤ By converting categorical time-like data (e.g., months, years) to datetime and plotting it on a time axis.
➤ Use pd.cut() to create bins and group by these categorical age groups for aggregated visualization.
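The binning answer above can be sketched as follows. This is a minimal example on made-up ages (hypothetical values, not the titanic data), assuming the same bin edges used earlier in this page:

```python
import pandas as pd

# Hypothetical ages and outcomes; one age is missing on purpose.
ages = pd.DataFrame({
    "age": [4, 18, 19, 35, 47, 60, 71, None],
    "survived": [1, 1, 0, 1, 0, 1, 0, 1],
})

# pd.cut uses right-inclusive intervals by default: (0, 18], (18, 35], ...
bins = [0, 18, 35, 60, 80]
labels = ["0-18", "19-35", "36-60", "61-80"]
ages["age_group"] = pd.cut(ages["age"], bins=bins, labels=labels)

# observed=False keeps empty categories in the result; rows whose
# age_group is missing are left out of the grouping.
counts = ages.groupby("age_group", observed=False)["survived"].sum()
print(counts)
```

Because the intervals are right-inclusive, an age of exactly 18 lands in the 0-18 group and an age of 60 lands in 36-60.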