Probability Distributions
Description
A probability distribution describes how likely each possible value of a random variable is. Distributions are fundamental in statistics and machine learning, where they are used for modeling and prediction.
There are two main types:
- Discrete distributions (e.g., Binomial, Poisson)
- Continuous distributions (e.g., Normal, Exponential)
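A quick numerical check can make the distinction concrete. A discrete distribution assigns probability to individual values through a probability mass function (PMF) that sums to 1, while a continuous distribution has a probability density function (PDF) whose total area is 1. The sketch below uses SciPy's `binom` and `norm`; the parameters (10 trials with p = 0.5, and a standard normal) are illustrative choices, not values taken from this guide.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom, norm

# Discrete: the PMF gives the probability of each individual value,
# so summing over every possible outcome (0..10 successes) gives 1.
k = np.arange(0, 11)
pmf_total = binom.pmf(k, 10, 0.5).sum()

# Continuous: the PDF is a density, so probabilities come from areas.
# Integrating over the whole real line gives a total area of 1.
pdf_area, _ = quad(norm.pdf, -np.inf, np.inf)

# P(-1 <= X <= 1) for a standard normal, computed from the CDF.
prob_within_one_std = norm.cdf(1) - norm.cdf(-1)

print(f"Sum of binomial PMF:   {pmf_total:.4f}")
print(f"Area under normal PDF: {pdf_area:.4f}")
print(f"P(-1 <= X <= 1):       {prob_within_one_std:.4f}")
```

Note that a PDF value at a single point is not a probability; only an area over an interval is.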
Prerequisites
- Basics of probability and statistics
- Familiarity with NumPy and SciPy
- Concept of random variables
Examples
The following Python examples plot three common distributions using SciPy and Matplotlib:
#Normal Distribution (Continuous)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
# Generate data
x = np.linspace(-5, 5, 1000)
y = norm.pdf(x, loc=0, scale=1) # mean = 0, std = 1
# Plot
plt.plot(x, y)
plt.title("Normal Distribution")
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.grid(True)
plt.show()
#Binomial Distribution (Discrete)
from scipy.stats import binom
n = 10 # number of trials
p = 0.5 # probability of success
x = np.arange(0, n+1)
y = binom.pmf(x, n, p)
plt.bar(x, y)
plt.title("Binomial Distribution")
plt.xlabel("Number of Successes")
plt.ylabel("Probability")
plt.show()
#Poisson Distribution (Discrete)
from scipy.stats import poisson
mu = 3 # expected number of occurrences
x = np.arange(0, 10)
y = poisson.pmf(x, mu)
plt.bar(x, y)
plt.title("Poisson Distribution")
plt.xlabel("Events")
plt.ylabel("Probability")
plt.show()
Real-World Applications
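- Modeling asset returns (Normal distribution)
- Modeling the number of patients arriving in a fixed time interval (Poisson distribution)
- Modeling purchase behavior, such as how many customers out of a group make a purchase (Binomial distribution)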
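As a rough illustration of how such models answer practical questions, the sketch below computes two probabilities with SciPy. The numbers (an average of 3 patient arrivals per hour, and 20 visitors who each purchase with probability 0.1) are hypothetical and chosen only for the example.

```python
from scipy.stats import binom, poisson

# Hypothetical: patients arrive at an average rate of 3 per hour.
arrival_rate = 3
# P(more than 5 arrivals in an hour) = 1 - P(X <= 5), via the survival function.
p_busy_hour = poisson.sf(5, arrival_rate)

# Hypothetical: 20 visitors, each purchasing independently with probability 0.1.
visitors, purchase_prob = 20, 0.1
# P(at least one purchase) = 1 - P(no purchases).
p_any_purchase = 1 - binom.pmf(0, visitors, purchase_prob)

print(f"P(more than 5 arrivals):  {p_busy_hour:.4f}")
print(f"P(at least one purchase): {p_any_purchase:.4f}")
```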
Where Probability Distributions Are Applied
- Risk modeling and uncertainty prediction
- Fraud detection algorithms
- A/B testing and statistical decision-making
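For the A/B testing case, one simple approach is a one-sided exact binomial test against a known baseline rate. This is an assumption made for illustration, not the only or necessarily the best method, and the experiment numbers below are hypothetical.

```python
from scipy.stats import binom

# Hypothetical experiment: the baseline conversion rate is 10%, and the
# variant produced 18 conversions out of 120 visitors.
baseline_rate = 0.10
visitors = 120
conversions = 18

# One-sided p-value: the probability of seeing at least `conversions`
# successes if the variant really converted at the baseline rate.
# sf(k) is P(X > k), so sf(conversions - 1) is P(X >= conversions).
p_value = binom.sf(conversions - 1, visitors, baseline_rate)

alpha = 0.05  # conventional significance level, assumed for this sketch
print(f"p-value: {p_value:.4f}")
print("Reject the baseline rate" if p_value < alpha else "Insufficient evidence")
```

In practice, A/B tests usually compare two measured groups rather than one group against a fixed baseline, so treat this as a starting point only.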
Resources
- Data Science topic PDF
- Harvard Data Science Course: a free online course from Harvard covering data science foundations
Interview Questions
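➤ A probability distribution is a function that describes the likelihood of the different possible outcomes of an experiment.
➤ A PMF (probability mass function) is for discrete distributions (e.g., binomial); a PDF (probability density function) is for continuous ones (e.g., normal).
➤ The Poisson distribution applies when modeling counts of rare, independent events occurring over a fixed interval of time or space, like server failures per day.
➤ The Central Limit Theorem states that the distribution of sample means approximates a normal distribution as the sample size grows, regardless of the shape of the original distribution, provided the observations are independent and the original distribution has finite variance.
➤ The normal distribution is defined by its mean (μ) and standard deviation (σ).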
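The Central Limit Theorem answer can be checked with a small simulation. The sketch below draws repeated samples from an exponential distribution, which is strongly skewed, and compares the spread of the sample means with the predicted value of σ/√n. The sample size, number of samples, and seed are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Exponential distribution with scale 1: mean 1, standard deviation 1.
sample_size = 50       # observations per sample
num_samples = 10_000   # number of sample means to collect
samples = rng.exponential(scale=1.0, size=(num_samples, sample_size))
sample_means = samples.mean(axis=1)

# The CLT predicts the sample means cluster around the true mean (1)
# with standard deviation sigma / sqrt(n) = 1 / sqrt(sample_size).
predicted_std = 1 / np.sqrt(sample_size)

print(f"Mean of sample means: {sample_means.mean():.3f}")
print(f"Std of sample means:  {sample_means.std():.3f}")
print(f"Predicted std:        {predicted_std:.3f}")
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the underlying data are skewed.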