Probability Distributions

Description

A probability distribution describes how probability is assigned to the possible values of a random variable. Distributions are fundamental in statistics and machine learning for modeling and prediction. There are two main types:
  • Discrete distributions (e.g., Binomial, Poisson), described by a probability mass function (PMF)
  • Continuous distributions (e.g., Normal, Exponential), described by a probability density function (PDF)
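
To make the discrete/continuous distinction concrete, here is a minimal sketch using SciPy. It checks that the probabilities of a discrete distribution sum to 1, while for a continuous distribution it is the area under the density curve that equals 1. The parameter values (n = 10, p = 0.5, and a standard deviation of 0.1 for the narrow normal) are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import binom, norm

# Discrete: the PMF gives a probability for each value, and they sum to 1.
n, p = 10, 0.5
k = np.arange(0, n + 1)
pmf_total = binom.pmf(k, n, p).sum()

# Continuous: the PDF is a density, so the area under the curve equals 1.
pdf_area, _ = quad(norm.pdf, -np.inf, np.inf)

# A density value is not a probability and may exceed 1,
# as it does at the peak of a narrow normal distribution.
peak_density = norm.pdf(0, loc=0, scale=0.1)

print(f"Sum of binomial PMF:         {pmf_total:.6f}")
print(f"Area under normal PDF:       {pdf_area:.6f}")
print(f"Peak density with std = 0.1: {peak_density:.4f}")
```

The last line is a reminder that a PDF value is read as a density: probabilities for continuous variables come from integrating the density over an interval.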

Prerequisites

  • Basics of probability and statistics
  • Familiarity with NumPy and SciPy
  • Concept of random variables

Examples

The following Python script plots three common distributions using SciPy and Matplotlib: the Normal, the Binomial, and the Poisson.


# Normal Distribution (Continuous)
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate data
x = np.linspace(-5, 5, 1000)
y = norm.pdf(x, loc=0, scale=1)  # mean = 0, std = 1

# Plot
plt.plot(x, y)
plt.title("Normal Distribution")
plt.xlabel("x")
plt.ylabel("Probability Density")
plt.grid(True)
plt.show()

# Binomial Distribution (Discrete)
from scipy.stats import binom

n = 10  # number of trials
p = 0.5  # probability of success
x = np.arange(0, n+1)
y = binom.pmf(x, n, p)

plt.bar(x, y)
plt.title("Binomial Distribution")
plt.xlabel("Number of Successes")
plt.ylabel("Probability")
plt.show()

# Poisson Distribution (Discrete)
from scipy.stats import poisson

mu = 3  # expected number of occurrences
x = np.arange(0, 10)
y = poisson.pmf(x, mu)

plt.bar(x, y)
plt.title("Poisson Distribution")
plt.xlabel("Events")
plt.ylabel("Probability")
plt.show()


          

Real-World Applications

  • Modeling asset returns (Normal distribution)
  • Modeling the number of patient arrivals per time interval (Poisson distribution)
  • Modeling how many customers out of a group make a purchase (Binomial distribution)
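
As a rough illustration of how the applications listed above turn into calculations, the sketch below answers one question per distribution with SciPy. Every number in it (the return mean and volatility, the arrival rate, the visitor count and purchase probability) is a made-up assumption chosen for illustration, not real data.

```python
from scipy.stats import binom, norm, poisson

# All parameters below are illustrative assumptions, not real data.

# Asset returns: daily return ~ Normal(mean = 0.05%, std = 1%).
# Probability of losing more than 2% in a day.
p_big_loss = norm.cdf(-0.02, loc=0.0005, scale=0.01)

# Patient arrivals: arrivals per hour ~ Poisson(mu = 3).
# sf(k) is the survival function, P(X > k).
p_more_than_5 = poisson.sf(5, mu=3)

# Purchases: 100 visitors, each buys independently with probability 0.1.
# P(X >= 15) is the same as P(X > 14).
p_at_least_15 = binom.sf(14, n=100, p=0.1)

print(f"P(daily loss worse than 2%):   {p_big_loss:.4f}")
print(f"P(more than 5 patients/hour):  {p_more_than_5:.4f}")
print(f"P(at least 15 purchases):      {p_at_least_15:.4f}")
```

Using `sf` rather than `1 - cdf` is a small design choice: it gives the upper-tail probability directly and tends to be more accurate when that probability is very small.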

Where Probability Distributions Are Applied

  • Risk modeling and uncertainty prediction
  • Fraud detection algorithms
  • A/B testing and statistical decision-making
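
For the A/B testing item above, one common approach is a two-proportion z-test, which relies on the normal approximation to the binomial distribution. The sketch below is a minimal version under stated assumptions: the visitor and conversion counts are hypothetical, and the test is two-sided with a pooled conversion rate under the null hypothesis of no difference.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical A/B test counts (illustrative assumptions, not real data).
conversions_a, visitors_a = 120, 1000
conversions_b, visitors_b = 150, 1000

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Under the null hypothesis both variants share one conversion rate,
# estimated by pooling the two groups.
pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
std_err = np.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

# z-statistic for the difference in rates and its two-sided p-value.
z = (rate_b - rate_a) / std_err
p_value = 2 * norm.sf(abs(z))

print(f"Conversion rates: A = {rate_a:.3f}, B = {rate_b:.3f}")
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
```

The normal approximation is reasonable here because both groups are large; with small counts an exact binomial test would be a safer choice.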

Resources

Harvard Data Science Course

Free online course from Harvard covering data science foundations

Interview Questions

➤ A probability distribution is a function that describes the likelihood of each possible outcome of an experiment.

➤ A PMF gives probabilities for discrete distributions (e.g., binomial); a PDF gives probability densities for continuous ones (e.g., normal).

➤ The Poisson distribution is appropriate when modeling counts of rare, independent events occurring over time or space, like server failures per day.

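➤ The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size grows, regardless of the shape of the original distribution, provided the samples are independent and the population has finite variance.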

➤ A normal distribution is defined by two parameters: its mean (μ) and standard deviation (σ).
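
The statement above about sample means tending toward a normal distribution can be checked with a short simulation. This sketch draws repeated samples from a strongly skewed exponential population and compares the spread of the sample means with the theoretical value σ/√n. The sample size, number of repetitions, and seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Skewed population: Exponential with mean 1 and standard deviation 1.
sample_size = 50      # observations per sample (illustrative choice)
n_repeats = 10_000    # number of repeated samples (illustrative choice)

samples = rng.exponential(scale=1.0, size=(n_repeats, sample_size))
sample_means = samples.mean(axis=1)

# Theory: sample means center on 1 with standard deviation 1 / sqrt(sample_size).
expected_std = 1.0 / np.sqrt(sample_size)

print(f"Mean of sample means: {sample_means.mean():.4f}")
print(f"Std of sample means:  {sample_means.std(ddof=1):.4f}")
print(f"Theoretical std:      {expected_std:.4f}")
```

Plotting a histogram of `sample_means` should show a roughly bell-shaped curve even though the underlying exponential data are far from symmetric.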