Statistical Measures

Description

Statistical measures summarize a large dataset with a few numbers that describe its center, its variability, and the shape of its distribution. The most common ones are:

  • Mean: The arithmetic average of the values.
  • Median: The middle value of the sorted data (the average of the two middle values when the count is even).
  • Mode: The most frequently occurring value; a dataset can have more than one.
  • Variance: The average squared deviation from the mean.
  • Standard Deviation: The square root of the variance, expressed in the same units as the data.
  • Skewness: The asymmetry of the distribution around its mean.
  • Kurtosis: The "tailedness" of the distribution, i.e. how prone it is to extreme values (outliers).
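To make the definitions above concrete, here is a minimal sketch that computes each measure directly from its formula with NumPy. The helper name describe and the sample values are illustrative assumptions, and the sketch uses the population (divide-by-n) form, so its results will differ slightly from the bias-corrected estimates that Pandas returns.

```python
import numpy as np

def describe(values):
    """Compute population (divide-by-n) measures straight from the definitions.

    Hypothetical helper for illustration; assumes the values are not all equal,
    otherwise the standard deviation is zero and the shape measures are undefined.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    mean = x.sum() / n
    dev = x - mean                          # deviation of each value from the mean
    variance = (dev ** 2).sum() / n         # average squared deviation
    std_dev = np.sqrt(variance)
    skewness = (dev ** 3).sum() / n / std_dev ** 3
    excess_kurtosis = (dev ** 4).sum() / n / variance ** 2 - 3.0
    return {
        "mean": mean,
        "median": float(np.median(x)),
        "variance": variance,
        "std_dev": std_dev,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }

sample = [2, 4, 4, 4, 5, 5, 7, 9]           # illustrative values
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

For this sample the mean is 5.0, the variance 4.0, and the standard deviation 2.0; the positive skewness reflects the longer tail toward 9.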

Prerequisites

  • Python Basics
  • NumPy and Pandas Libraries
  • Understanding of data types and DataFrames

Examples

The following example computes each of these measures for a small dataset using Pandas:


import pandas as pd

# Create a sample dataset
data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]

# Convert to a Pandas Series
series = pd.Series(data)

# Mean
mean_val = series.mean()

# Median
median_val = series.median()

# Mode (mode() returns every tied value in ascending order; take the first)
mode_val = series.mode().iloc[0]

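# Variance (sample variance: Pandas divides by n - 1 by default)
variance_val = series.var()

# Standard Deviation (sample standard deviation, also n - 1)
std_dev_val = series.std()

# Skewness (bias-corrected sample estimate)
skewness_val = series.skew()

# Kurtosis (excess kurtosis: a normal distribution scores 0)
kurtosis_val = series.kurt()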

# Display results
print("Mean:", mean_val)
print("Median:", median_val)
print("Mode:", mode_val)
print("Variance:", variance_val)
print("Standard Deviation:", std_dev_val)
print("Skewness:", skewness_val)
print("Kurtosis:", kurtosis_val)
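As a cross-check, the shape measures from Pandas can be compared with scipy.stats. The sketch below assumes SciPy is installed and reuses the same dataset; bias=False asks SciPy for the bias-corrected estimates, and fisher=True for excess kurtosis, which are the conventions Pandas follows.

```python
import numpy as np
import pandas as pd
from scipy import stats

data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]
series = pd.Series(data)

# Pandas results
pandas_skew = series.skew()
pandas_kurt = series.kurt()

# SciPy results using the same conventions (bias-corrected, excess kurtosis)
scipy_skew = stats.skew(data, bias=False)
scipy_kurt = stats.kurtosis(data, fisher=True, bias=False)

skew_matches = bool(np.isclose(pandas_skew, scipy_skew))
kurt_matches = bool(np.isclose(pandas_kurt, scipy_kurt))

print("Skewness agrees:", skew_matches)
print("Kurtosis agrees:", kurt_matches)
```

With SciPy's defaults (bias=True) the numbers would not match, which is a common source of confusion when comparing the two libraries.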

          

Real-World Applications

Finance: Portfolio risk analysis (standard deviation); symmetry of market returns (skewness, kurtosis)

Healthcare: Average patient vitals (mean, median); identifying anomalies in lab tests (variance, skewness)

E-commerce: Customer purchase behavior analysis (mean order value); detecting unusual buying patterns (kurtosis)
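The finance case can be sketched in a few lines. The daily returns below are made-up numbers, not market data, and the variable names are illustrative; the point is only to show which measure answers which question.

```python
import pandas as pd

# Hypothetical daily returns as fractions (0.004 = 0.4%); not real market data
returns = pd.Series(
    [0.004, -0.002, 0.011, -0.015, 0.003, 0.007, -0.001, -0.032, 0.009, 0.002]
)

risk = returns.std()          # dispersion of returns, a common proxy for risk
asymmetry = returns.skew()    # negative: losses reach further than gains
tail_weight = returns.kurt()  # positive: heavier tails than a normal distribution

print(f"Risk (std dev): {risk:.4f}")
print(f"Skewness:       {asymmetry:.4f}")
print(f"Kurtosis:       {tail_weight:.4f}")
```

Here the single large loss of -3.2% pulls the skewness below zero and pushes the kurtosis above zero, which is the pattern the shape measures are meant to expose.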

Where Statistical Measures Are Applied

Finance

  • Risk analysis, returns distribution

Retail

  • Sales data analysis, customer behavior insights

Manufacturing

  • Tolerance checking, defect detection based on deviation
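Defect detection based on deviation is often done with z-scores: how many standard deviations a reading lies from the mean. The following is a minimal sketch; the diameters and the cut-off Z_THRESHOLD are assumptions chosen for illustration, not values from any real process.

```python
import pandas as pd

# Hypothetical part diameters in millimetres (illustrative, not real data)
diameters = pd.Series(
    [10.01, 9.99, 10.02, 9.98, 10.00, 10.01, 9.99, 10.00, 10.35, 10.00]
)

Z_THRESHOLD = 2.0  # assumed cut-off; choose one that suits the process

# Distance of each reading from the mean, in units of sample standard deviation
z_scores = (diameters - diameters.mean()) / diameters.std()

# Readings that deviate more than the threshold in either direction
suspects = diameters[z_scores.abs() > Z_THRESHOLD]
print(suspects)
```

Only the 10.35 mm reading is flagged, with a z-score of roughly 2.8. In a sample of 10 a z-score cannot exceed about 2.85, so a threshold of 3 would never fire on a batch this small.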

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations


Interview Questions

➤ Mean is the average, median is the middle value, and mode is the most frequent value.

➤ A high standard deviation (or variance) indicates that the data points are spread over a wide range, far from the mean.

➤ Skewness measures the asymmetry of a distribution. Positive skew means a longer tail on the right; negative skew on the left.

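➤ Kurtosis measures tailedness: high kurtosis means heavier tails and more outliers, and low kurtosis means lighter tails and fewer outliers.

➤ Variance and standard deviation are computed with the var() and std() methods in Pandas or NumPy. The defaults differ: Pandas divides by n - 1 (sample statistics), while NumPy divides by n (population statistics) unless ddof=1 is passed.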
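A follow-up that often comes up with the last answer is why Pandas and NumPy return different numbers for the same data. The sketch below, which reuses the dataset from the Examples section, shows that the difference comes from the ddof (delta degrees of freedom) default.

```python
import numpy as np
import pandas as pd

data = [12, 15, 14, 10, 18, 20, 12, 15, 14, 14, 13]
series = pd.Series(data)
n = len(data)

sample_var = series.var()        # Pandas default: ddof=1, divides by n - 1
population_var = np.var(data)    # NumPy default: ddof=0, divides by n

print("Pandas var():", sample_var)
print("NumPy var(): ", population_var)

# Passing ddof explicitly makes the two libraries agree
print(np.isclose(series.var(ddof=0), population_var))
print(np.isclose(np.var(data, ddof=1), sample_var))
```

The sample variance is always the population variance scaled by n / (n - 1), so the gap shrinks as the dataset grows.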