Statistical Measures
Description
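Statistical measures are essential for understanding the distribution and characteristics of data. They summarize a large dataset with a few meaningful values that describe its center, its spread, and its shape.
Mean: Arithmetic average: the sum of the values divided by their count.
Median: Middle value when the data are sorted (the average of the two middle values when the count is even).
Mode: Most frequently occurring value (a dataset can have more than one).
Variance: Average of the squared deviations from the mean (divided by n - 1 instead of n for a sample); measures spread.
Standard Deviation: Square root of the variance; measures dispersion in the same units as the data.
Skewness: Measures the asymmetry of the distribution.
Kurtosis: Measures "tailedness", i.e. how heavy the tails are and therefore how prone the data are to outliers.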
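The definitions above can be written directly in plain Python, which makes it clear what each library call computes. This is a minimal sketch: the function names are illustrative, and skewness and kurtosis use the textbook moment-based (population) formulas, whereas pandas' skew() and kurt() apply small-sample corrections, so their results differ slightly.

```python
import math
from collections import Counter


def mean(xs):
    # Sum of the values divided by their count
    return sum(xs) / len(xs)


def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # Odd count: middle element; even count: average of the two middle elements
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2


def modes(xs):
    counts = Counter(xs)
    top = max(counts.values())
    # Return every value tied for the highest count (data can be multimodal)
    return sorted(v for v, c in counts.items() if c == top)


def variance(xs, ddof=0):
    m = mean(xs)
    # ddof=0 divides by n (population); ddof=1 divides by n - 1 (sample)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - ddof)


def std(xs, ddof=0):
    # Square root of the variance, in the same units as the data
    return math.sqrt(variance(xs, ddof))


def skewness(xs):
    # Third central moment divided by sigma^3 (no small-sample correction)
    m, sd = mean(xs), std(xs)
    return sum((x - m) ** 3 for x in xs) / len(xs) / sd ** 3


def excess_kurtosis(xs):
    # Fourth central moment divided by sigma^4, minus 3 so a normal distribution scores 0
    m, sd = mean(xs), std(xs)
    return sum((x - m) ** 4 for x in xs) / len(xs) / sd ** 4 - 3


data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]
print(mean(data), median(data), modes(data))
print(variance(data, ddof=1), std(data, ddof=1))
print(skewness(data), excess_kurtosis(data))
```

Returning a list from modes() mirrors the fact that a dataset can have several modes.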
Prerequisites
- Python Basics
- NumPy and Pandas Libraries
- Understanding of data types and DataFrames
Examples
The following example computes each measure for a small sample using pandas:
import pandas as pd
# Create a sample dataset
data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]
# Convert to a Pandas Series
series = pd.Series(data)
# Mean
mean_val = series.mean()
# Median
median_val = series.median()
# Mode (mode() returns a Series because data can have several modes; take the first)
mode_val = series.mode().iloc[0]
# Variance (pandas uses the sample formula, ddof=1, by default)
variance_val = series.var()
# Standard Deviation (also ddof=1 by default)
std_dev_val = series.std()
# Skewness (pandas returns the bias-adjusted sample skewness)
skewness_val = series.skew()
# Kurtosis (Fisher's definition: excess kurtosis, so a normal distribution scores 0)
kurtosis_val = series.kurt()
# Display results
print("Mean:", mean_val)
print("Median:", median_val)
print("Mode:", mode_val)
print("Variance:", variance_val)
print("Standard Deviation:", std_dev_val)
print("Skewness:", skewness_val)
print("Kurtosis:", kurtosis_val)
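As a cross-check, skewness and kurtosis can also be computed with scipy.stats. SciPy's defaults return the plain moment-based estimates; passing bias=False applies the same small-sample correction pandas uses, and kurtosis() already reports excess kurtosis by default (fisher=True). A short sketch on the same data:

```python
import numpy as np
import pandas as pd
from scipy import stats

data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]
series = pd.Series(data)

# SciPy defaults (bias=True): uncorrected moment-based estimates
raw_skew = stats.skew(data)
raw_kurt = stats.kurtosis(data)  # fisher=True by default: excess kurtosis

# bias=False applies the small-sample correction that pandas also uses
adj_skew = stats.skew(data, bias=False)
adj_kurt = stats.kurtosis(data, bias=False)

print(raw_skew, adj_skew, series.skew())
print(raw_kurt, adj_kurt, series.kurt())
```

The corrected and uncorrected values converge as the sample grows, so the distinction matters mostly for small datasets like this one.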
Real-World Applications
Finance: portfolio risk analysis (standard deviation); symmetry and tail behavior of market returns (skewness, kurtosis)
Healthcare: average patient vitals (mean, median); identifying anomalies in lab tests (variance, skewness)
E-commerce: customer purchase behavior analysis (mean order value); detecting unusual buying patterns (kurtosis)
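Several of these applications, such as spotting anomalous lab results or unusual purchases, come down to measuring how far a value lies from the mean in units of standard deviation (a z-score). A minimal sketch, using made-up order values and an assumed threshold of 2:

```python
import pandas as pd

# Hypothetical order values (illustrative data only)
orders = pd.Series([52, 48, 50, 47, 53, 49, 51, 250, 46, 50])

# z-score: number of standard deviations each value lies from the mean
z = (orders - orders.mean()) / orders.std()

# Flag values more than 2 standard deviations from the mean (threshold is an assumption)
anomalies = orders[z.abs() > 2]
print(anomalies)
```

Because the outlier itself inflates the mean and standard deviation, a single extreme value in a sample of 10 can never exceed |z| of about 2.85 here, which is why a threshold of 3 would miss it; median-based measures are a more robust alternative.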
Where Statistical Measures Are Applied
Finance
- Risk analysis, returns distribution
Retail
- Sales data analysis, customer behavior insights
Manufacturing
- Tolerance checking, defect detection based on deviation
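Tolerance checking in manufacturing can be sketched with the standard library. The diameters and specification limits below are invented for illustration, and the process capability index Cp = (USL - LSL) / 6σ is one common standard-deviation-based measure chosen here as an example; the text above does not name a specific method.

```python
import statistics

# Hypothetical shaft diameters in mm and specification limits (illustrative values only)
diameters = [10.02, 9.98, 10.01, 10.00, 9.99, 10.03, 9.97, 10.01, 10.00, 9.99]
LOWER_SPEC, UPPER_SPEC = 9.90, 10.10

mu = statistics.mean(diameters)
sigma = statistics.stdev(diameters)  # sample standard deviation (n - 1)

# Parts outside the tolerance band count as defects
defects = [d for d in diameters if not LOWER_SPEC <= d <= UPPER_SPEC]

# Process capability Cp: width of the spec band relative to the 6-sigma process spread
cp = (UPPER_SPEC - LOWER_SPEC) / (6 * sigma)
print(f"mean={mu:.3f} mm, std={sigma:.4f} mm, defects={len(defects)}, Cp={cp:.2f}")
```

A Cp well above 1 means the natural spread of the process fits comfortably inside the tolerance band; note that Cp ignores whether the process is centered.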
Resources
Harvard Data Science Course
Free online course from Harvard covering data science foundations
Interview Questions
➤ Mean is the average, median is the middle value, and mode is the most frequent value.
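➤ A high variance (or standard deviation) indicates that the data points are spread over a wide range, far from the mean.
➤ Skewness measures the asymmetry of a distribution. Positive skew means a longer tail on the right; negative skew means a longer tail on the left.
➤ Kurtosis measures tailedness: high kurtosis indicates heavier tails and more extreme values (outliers), while low kurtosis indicates lighter tails and fewer outliers.
➤ Use the var() and std() methods in pandas or NumPy. Note that pandas defaults to the sample formula (ddof=1), while NumPy defaults to the population formula (ddof=0).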
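pandas and NumPy use different default divisors for variance, which is easy to miss when switching libraries. This sketch computes both on the sample data used earlier:

```python
import numpy as np
import pandas as pd

data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]

# pandas defaults to ddof=1 (sample variance: divide by n - 1)
pd_var = pd.Series(data).var()
# NumPy defaults to ddof=0 (population variance: divide by n)
np_var = np.var(data)
# Passing ddof=1 makes NumPy match pandas
np_var_sample = np.var(data, ddof=1)

print(pd_var, np_var, np_var_sample)
```

The same ddof argument applies to std() in both libraries.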