Probability Distributions
Learn
What is a Probability Distribution?
A probability distribution describes all possible values of a random variable and their associated probabilities. Understanding distributions is essential for statistical inference, modeling real-world phenomena, and analyzing data patterns.
Probability Distribution
A probability distribution is a mathematical function that gives the probabilities of occurrence of different possible outcomes for a random variable.
For any distribution, the total probability equals 1: the probabilities sum to 1 for a discrete variable, and the density integrates to 1 for a continuous variable.
Types of Random Variables
| Type | Definition | Examples | Distribution Type |
|---|---|---|---|
| Discrete | Countable values (integers) | Coin flips, dice rolls, defects | Probability Mass Function (PMF) |
| Continuous | Any value in an interval | Height, weight, time, temperature | Probability Density Function (PDF) |
Binomial Distribution
Models the number of successes in a fixed number of independent trials, each with the same probability of success.
P(X = k) = C(n,k) · p^k · (1-p)^(n-k)
- n = number of trials
- p = probability of success on each trial
- k = number of successes
- C(n,k) = n!/(k!(n-k)!) (combinations)
Mean: μ = np
Standard Deviation: σ = √(np(1-p))
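As a minimal sketch, the formula above can be coded directly and the mean and standard deviation checked by summing over the PMF. The values n = 20 and p = 0.5 and the function name `binomial_pmf` are illustrative assumptions, not part of the lesson.

```python
from math import comb, sqrt

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n,k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.5  # illustrative values (assumption)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))     # should match n*p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
sd = sqrt(variance)                                # should match sqrt(n*p*(1-p))

print(f"total = {total:.6f}")
print(f"mean  = {mean:.4f}  (np = {n * p:.4f})")
print(f"sd    = {sd:.4f}  (sqrt(np(1-p)) = {sqrt(n * p * (1 - p)):.4f})")
```

Computing the mean and standard deviation from the PMF itself, rather than from the shortcut formulas, is a quick way to confirm that the two agree.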
Conditions for Binomial
- Fixed number of trials (n)
- Each trial has only two outcomes (success/failure)
- Trials are independent
- Probability of success (p) is constant
Normal Distribution
Normal (Gaussian) Distribution
The most widely used continuous distribution, characterized by its bell-shaped curve that is symmetric about the mean.
f(x) = (1/(σ√(2π))) · e^(-(x-μ)²/(2σ²))
- μ = mean (center of the distribution)
- σ = standard deviation (spread)
Notation: X ~ N(μ, σ²)
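The density formula can be written out by hand and compared against Python's standard library, as in the sketch below. The parameter values (μ = 75, σ = 10) and the function name `normal_pdf` are chosen only for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2 * sigma^2))."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

mu, sigma = 75.0, 10.0  # illustrative values (assumption)
library = NormalDist(mu, sigma)

for x in (55.0, 75.0, 95.0):
    # The hand-written formula and the library value should agree.
    print(x, round(normal_pdf(x, mu, sigma), 6), round(library.pdf(x), 6))
```

Note that the density is highest at x = μ and takes the same value at points equally far above and below the mean, which is the symmetry described above.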
The Empirical Rule (68-95-99.7)
For Normal Distributions
- About 68% of data falls within 1 standard deviation of the mean
- About 95% of data falls within 2 standard deviations of the mean
- About 99.7% of data falls within 3 standard deviations of the mean
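The three percentages are rounded values. One way to see where they come from is the identity P(|Z| < k) = erf(k/√2) for a standard normal variable; the short sketch below uses it (the helper name `prob_within` is an assumption made for this example).

```python
from math import erf, sqrt

def prob_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```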
Standard Normal Distribution
Z-Scores and Standard Normal
The standard normal distribution has μ = 0 and σ = 1.
z = (x - μ) / σ
The z-score tells you how many standard deviations a value is from the mean.
Use z-tables or calculators to find probabilities for any normal distribution by converting to z-scores.
Common Z-Score Values
| Z-Score | Area to Left | Area to Right |
|---|---|---|
| -2.0 | 0.0228 (2.28%) | 0.9772 (97.72%) |
| -1.0 | 0.1587 (15.87%) | 0.8413 (84.13%) |
| 0 | 0.5000 (50%) | 0.5000 (50%) |
| 1.0 | 0.8413 (84.13%) | 0.1587 (15.87%) |
| 1.645 | 0.9500 (95%) | 0.0500 (5%) |
| 1.96 | 0.9750 (97.5%) | 0.0250 (2.5%) |
| 2.0 | 0.9772 (97.72%) | 0.0228 (2.28%) |
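If a z-table is not at hand, the same areas can be computed with `statistics.NormalDist` from the Python standard library. The sketch below recomputes the rows of the table; the helper `z_score` is a hypothetical name introduced for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mu = 0, sigma = 1

def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma

print("     z    left   right")
for z in (-2.0, -1.0, 0.0, 1.0, 1.645, 1.96, 2.0):
    left = standard_normal.cdf(z)   # area to the left of z
    right = 1 - left                # area to the right of z
    print(f"{z:>6}  {left:.4f}  {right:.4f}")
```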
Other Important Distributions
Uniform Distribution
All values in an interval are equally likely.
For continuous: f(x) = 1/(b-a) for a ≤ x ≤ b
Mean: (a+b)/2
Variance: (b-a)²/12
Exponential Distribution
Models time between events in a Poisson process (waiting times).
f(x) = λe^(-λx) for x ≥ 0
Mean: 1/λ
Standard Deviation: 1/λ
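A quick simulation can make the "mean equals standard deviation" property concrete. This is a rough sketch, assuming a rate of λ = 2 and a fixed random seed so that the run is repeatable; the simulated values will be close to 1/λ = 0.5 but not exactly equal to it.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(42)   # fixed seed (assumption) for repeatability
lam = 2.0                 # illustrative rate: on average 2 events per unit time

# Draw simulated waiting times from an exponential distribution with rate lam.
samples = [rng.expovariate(lam) for _ in range(100_000)]

sample_mean = fmean(samples)
sample_sd = pstdev(samples)

print(f"theoretical mean and SD: {1 / lam:.3f}")
print(f"simulated mean: {sample_mean:.3f}, simulated SD: {sample_sd:.3f}")
```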
Central Limit Theorem
The Most Important Theorem in Statistics
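For sufficiently large samples (n ≥ 30 is a common rule of thumb), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution, provided the population has a finite variance.
x̄ ~ N(μ, σ²/n), approximately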
Standard error of the mean: SE = σ/√n
This is why the normal distribution is so widely used in statistical inference!
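The theorem can be explored with a small simulation. The sketch below assumes a strongly skewed population (exponential with rate 1, so μ = 1 and σ = 1), samples of size n = 36, and a fixed seed; all of these choices are illustrative. If the theorem holds, the sample means should center near μ, have a spread near σ/√n, and roughly 95% of them should land within 1.96 standard errors of μ.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)            # fixed seed (assumption) for repeatability
mu, sigma = 1.0, 1.0              # exponential(rate=1): mean 1, SD 1
n, repetitions = 36, 20_000

# Each entry is the mean of one random sample of size n from the skewed population.
sample_means = [
    fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(repetitions)
]

se = sigma / sqrt(n)              # standard error predicted by the theorem
within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / repetitions

print(f"mean of sample means: {fmean(sample_means):.3f} (theory: {mu})")
print(f"SD of sample means:   {pstdev(sample_means):.3f} (theory: {se:.3f})")
print(f"share within 1.96 SE: {within:.3f} (normal theory: 0.950)")
```

The match is approximate rather than exact, which is what "approximately normal" means in practice for a finite sample size.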
Examples
Example 1: Binomial Probability
Problem: A basketball player makes 70% of free throws. In 10 attempts, what's the probability of making exactly 8?
Solution:
This is binomial with n = 10, p = 0.70, k = 8
P(X = 8) = C(10,8) · (0.70)^8 · (0.30)^2
C(10,8) = 10!/(8!2!) = 45
P(X = 8) = 45 · (0.70)^8 · (0.30)^2
P(X = 8) = 45 · 0.05765 · 0.09
P(X = 8) = 0.2335 or about 23.35%
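The arithmetic in Example 1 can be checked step by step with a few lines of Python. The variable names below are chosen for this sketch only.

```python
from math import comb

n, p, k = 10, 0.70, 8               # values from Example 1

ways = comb(n, k)                   # C(10, 8): number of ways to choose the 8 makes
success_part = p ** k               # 0.70^8
failure_part = (1 - p) ** (n - k)   # 0.30^2

probability = ways * success_part * failure_part
print(ways, round(success_part, 5), round(failure_part, 2), round(probability, 4))
```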
Example 2: Normal Distribution - Finding Probability
Problem: Test scores are normally distributed with μ = 75 and σ = 10. What percentage of students score above 90?
Solution:
Step 1: Convert to z-score
z = (90 - 75) / 10 = 15/10 = 1.5
Step 2: Find probability
P(X > 90) = P(Z > 1.5)
From z-table: P(Z < 1.5) = 0.9332
P(Z > 1.5) = 1 - 0.9332 = 0.0668 or 6.68%
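As a cross-check on the z-table lookup, the same probability can be computed with `statistics.NormalDist`. This is a sketch of one possible approach, not the only one.

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=10)   # distribution from Example 2

z = (90 - scores.mean) / scores.stdev  # Step 1: z-score
p_above_90 = 1 - scores.cdf(90)        # Step 2: area to the right of 90

print(f"z = {z}, P(X > 90) = {p_above_90:.4f}")
```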
Example 3: Finding a Percentile
Problem: Heights of adult women are normally distributed with μ = 64 inches and σ = 3 inches. What height is at the 90th percentile?
Solution:
Step 1: Find z-score for 90th percentile
From z-table: z = 1.28 (gives area 0.90 to the left)
Step 2: Convert back to original scale
x = μ + z·σ = 64 + 1.28(3) = 64 + 3.84
x = 67.84 inches
90% of women are shorter than about 67.8 inches.
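Going from a percentile back to a value is the inverse of the lookup used for probabilities. In Python this can be sketched with `NormalDist.inv_cdf`; the small difference from 67.84 arises because the z-table value 1.28 is rounded.

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=3)   # distribution from Example 3

z_90 = NormalDist().inv_cdf(0.90)      # z-score with area 0.90 to its left
height_90 = heights.inv_cdf(0.90)      # same percentile on the original scale

print(f"z = {z_90:.4f}, 90th percentile height = {height_90:.2f} inches")
```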
Example 4: Central Limit Theorem
Problem: A population has mean 100 and standard deviation 15. If we take samples of size 36, describe the sampling distribution of x̄.
Solution:
By the Central Limit Theorem (n = 36 ≥ 30):
The sampling distribution of x̄ is approximately normal with:
Mean: μ_x̄ = μ = 100
Standard Error: SE = σ/√n = 15/√36 = 15/6 = 2.5
Therefore: x̄ ~ N(100, 2.5²)
About 95% of sample means will fall within 2 standard errors, 2(2.5) = 5 points, of 100.
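The closing statement about sample means falling within 5 points of 100 can be quantified. The sketch below treats the sampling distribution as exactly normal, which is the approximation the Central Limit Theorem provides.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 36          # values from Example 4

se = sigma / sqrt(n)                # standard error of the mean
sampling_dist = NormalDist(mu, se)  # approximate distribution of the sample mean

# Probability that a sample mean lands within 2 standard errors of mu.
p_within_5 = sampling_dist.cdf(mu + 2 * se) - sampling_dist.cdf(mu - 2 * se)

print(f"SE = {se}, P(95 < sample mean < 105) = {p_within_5:.4f}")
```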
Example 5: Combining Probabilities
Problem: For the normal distribution with μ = 50 and σ = 8, find P(42 < X < 62).
Solution:
Convert both boundaries to z-scores:
z₁ = (42 - 50)/8 = -1.0
z₂ = (62 - 50)/8 = 1.5
Find probability between:
P(-1.0 < Z < 1.5) = P(Z < 1.5) - P(Z < -1.0)
= 0.9332 - 0.1587
= 0.7745 or 77.45%
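The "area between two values" calculation is a difference of two cumulative probabilities, which can be sketched in Python as follows.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=50, sigma=8)   # distribution from Example 5

p_below_62 = x_dist.cdf(62)           # P(X < 62) = P(Z < 1.5)
p_below_42 = x_dist.cdf(42)           # P(X < 42) = P(Z < -1.0)
p_between = p_below_62 - p_below_42   # P(42 < X < 62)

print(f"{p_below_62:.4f} - {p_below_42:.4f} = {p_between:.4f}")
```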
Practice
Apply your understanding of probability distributions.
1. For a binomial distribution with n = 20 and p = 0.5, the mean is:
A) 5 B) 10 C) 15 D) 20
2. In a standard normal distribution, P(Z < 0) equals:
A) 0 B) 0.25 C) 0.50 D) 1.0
3. If X ~ N(100, 16), the standard deviation is:
A) 4 B) 8 C) 16 D) 100
4. Using the empirical rule, what percent of data in a normal distribution lies beyond 2 standard deviations from the mean?
A) 2.5% B) 5% C) 32% D) 95%
5. A value has z-score = -1.5. The value is:
A) 1.5 standard deviations above the mean B) 1.5 standard deviations below the mean C) At the mean D) 1.5 times the mean
6. For a binomial distribution with n = 100 and p = 0.3, the standard deviation is:
A) √21 ≈ 4.58 B) √30 ≈ 5.48 C) 30 D) 70
7. The Central Limit Theorem applies when:
A) Population is normal B) Sample size is large (n ≥ 30) C) Population is symmetric D) Sample is random
8. If σ = 20 and n = 100, the standard error is:
A) 0.2 B) 2 C) 5 D) 20
9. What z-score corresponds to the 75th percentile?
A) 0.25 B) 0.67 C) 0.75 D) 1.28
10. X is normally distributed with μ = 50, and P(X ≤ 60) = 0.84. What is σ?
A) 5 B) 10 C) 50 D) 100
Answers
1. B) 10 - Mean = np = 20(0.5) = 10
2. C) 0.50 - By symmetry, half the distribution is below z = 0
3. A) 4 - N(μ, σ²) means σ² = 16, so σ = 4
4. B) 5% - 95% is within 2 SD, so 5% is beyond
5. B) 1.5 standard deviations below the mean - negative z means below
6. A) √21 ≈ 4.58 - σ = √(np(1-p)) = √(100·0.3·0.7) = √21
7. B) Sample size is large (n ≥ 30) - key condition of CLT
8. B) 2 - SE = σ/√n = 20/10 = 2
9. B) 0.67 - z = 0.67 gives area 0.75 to the left
10. B) 10 - z = 1.0 for 84th percentile, so 60 = 50 + 1(σ), σ = 10
Check Your Understanding
1. Explain when you would use a binomial distribution versus a normal distribution.
Use binomial when counting discrete successes in a fixed number of independent trials with constant probability (like number of heads in 10 coin flips). Use normal for continuous measurements (height, weight, time) or when the Central Limit Theorem applies (sampling distributions of means). For large n with p not too close to 0 or 1, binomial can be approximated by normal: if np ≥ 10 and n(1-p) ≥ 10.
2. Why is the Central Limit Theorem so important?
The CLT is crucial because it allows us to make inferences about populations using sample data, even when we don't know the population distribution. It tells us that sample means are approximately normally distributed for large samples, which lets us calculate probabilities, construct confidence intervals, and perform hypothesis tests using the familiar normal distribution. This is the foundation of most statistical inference.
3. What does a z-score tell you, and why is it useful?
A z-score tells you how many standard deviations a value is from the mean. Z-scores are useful because they: (1) standardize different scales - you can compare scores from different tests, (2) allow probability calculations using the standard normal table, (3) identify outliers (z beyond ±2 or ±3), (4) determine percentiles, and (5) enable comparison across different normal distributions.
4. How does sample size affect the sampling distribution of the mean?
As sample size (n) increases: (1) The standard error (σ/√n) decreases, making the sampling distribution narrower and more concentrated around μ. (2) The distribution becomes more normal (even if population isn't). (3) Sample means become more reliable estimates of the population mean. This is why larger samples give more precise estimates - the variability in x̄ decreases as n increases.
🚀 Next Steps
- Review any concepts that felt challenging
- Move on to the next lesson when ready
- Return to practice problems periodically for review