Probability Distributions | Open Textbooks

What is a Probability Distribution?

A probability distribution describes all possible values of a random variable and their associated probabilities. Understanding distributions is essential for statistical inference, modeling real-world phenomena, and analyzing data patterns.

Probability Distribution

A probability distribution is a mathematical function that gives the probabilities of occurrence of different possible outcomes for a random variable.

For any distribution, the total probability equals 1: the probabilities sum to 1 in the discrete case, and the density integrates to 1 in the continuous case.

Types of Random Variables

Type	Definition	Examples	Described By
Discrete	Countable values (often integers)	Coin flips, dice rolls, defects	Probability Mass Function (PMF)
Continuous	Any value in an interval	Height, weight, time, temperature	Probability Density Function (PDF)

Binomial Distribution

Models the number of successes in a fixed number of independent trials, each with the same probability of success.

P(X = k) = C(n,k) · p^k · (1-p)^(n-k)

n = number of trials
p = probability of success on each trial
k = number of successes
C(n,k) = n!/(k!(n-k)!) (combinations)

Mean: μ = np
Standard Deviation: σ = √(np(1-p))
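
The formulas above can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `binomial_pmf` and the values n = 20, p = 0.3 are illustrative assumptions, not part of the lesson. It sums the PMF over every possible k and compares the resulting mean and standard deviation with np and √(np(1-p)).

```python
from math import comb, sqrt


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) = C(n,k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 20, 0.3  # illustrative values (assumption, not from the lesson)
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]

# A valid PMF sums to 1 over all possible outcomes.
total = sum(probs)

# Mean and variance computed directly from the definition of expectation.
mean_from_pmf = sum(k * pk for k, pk in enumerate(probs))
var_from_pmf = sum((k - mean_from_pmf) ** 2 * pk for k, pk in enumerate(probs))

print(f"total probability: {total:.6f}")
print(f"mean: {mean_from_pmf:.4f} vs np = {n * p:.4f}")
print(f"sd:   {sqrt(var_from_pmf):.4f} vs sqrt(np(1-p)) = {sqrt(n * p * (1 - p)):.4f}")
```

Both pairs of values should agree up to floating-point rounding, which is a quick way to catch a mistyped formula.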

Conditions for Binomial

Fixed number of trials (n)
Each trial has only two outcomes (success/failure)
Trials are independent
Probability of success (p) is constant

Normal Distribution

Normal (Gaussian) Distribution

The most important continuous distribution, characterized by its bell-shaped, symmetric curve.

f(x) = (1/(σ√(2π))) · e^(-(x-μ)²/(2σ²))

μ = mean (center of the distribution)
σ = standard deviation (spread)

Notation: X ~ N(μ, σ²)

The Empirical Rule (68-95-99.7)

For Normal Distributions

About 68% of data falls within 1 standard deviation of the mean
About 95% of data falls within 2 standard deviations of the mean
About 99.7% of data falls within 3 standard deviations of the mean
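
As a rough check of the empirical rule, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+). The helper name `within_k_sd` is an assumption made for illustration. Because the rule depends only on distance from the mean in standard deviations, the standard normal is enough.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mu = 0, sigma = 1


def within_k_sd(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```

The printed values are close to 0.68, 0.95, and 0.997, which is why the rule is stated with rounded percentages.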

Standard Normal Distribution

Z-Scores and Standard Normal

The standard normal distribution has μ = 0 and σ = 1.

z = (x - μ) / σ

The z-score tells you how many standard deviations a value is from the mean.

Use z-tables or calculators to find probabilities for any normal distribution by converting to z-scores.

Common Z-Score Values

Z-Score	Area to Left	Area to Right
-2.0	0.0228 (2.28%)	0.9772 (97.72%)
-1.0	0.1587 (15.87%)	0.8413 (84.13%)
0	0.5000 (50%)	0.5000 (50%)
1.0	0.8413 (84.13%)	0.1587 (15.87%)
1.645	0.9500 (95%)	0.0500 (5%)
1.96	0.9750 (97.5%)	0.0250 (2.5%)
2.0	0.9772 (97.72%)	0.0228 (2.28%)
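
If no z-table is at hand, the areas in the table above can be reproduced with the standard library. This is a small sketch, assuming Python 3.8+ for `statistics.NormalDist`; the function name `z_score` is a hypothetical helper that simply encodes z = (x - μ) / σ.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


std_normal = NormalDist(0, 1)

# Same z values as the table of common z-scores.
for z in (-2.0, -1.0, 0.0, 1.0, 1.645, 1.96, 2.0):
    left = std_normal.cdf(z)  # area to the left of z
    right = 1 - left          # area to the right of z
    print(f"z = {z:>6}: left = {left:.4f}, right = {right:.4f}")
```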

Other Important Distributions

Uniform Distribution

All values in an interval are equally likely.

For continuous: f(x) = 1/(b-a) for a ≤ x ≤ b

Mean: (a+b)/2
Variance: (b-a)²/12
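
One way to see where these formulas come from is to integrate the density numerically. The sketch below is an illustration only: the endpoints a = 2 and b = 10 and the helper names `uniform_pdf` and `midpoint_integral` are assumptions, and a simple midpoint rule stands in for exact integration.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0


def midpoint_integral(g, lo: float, hi: float, steps: int = 10_000) -> float:
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return sum(g(lo + (i + 0.5) * width) for i in range(steps)) * width


a, b = 2.0, 10.0  # illustrative interval (assumption)

area = midpoint_integral(lambda x: uniform_pdf(x, a, b), a, b)
mean = midpoint_integral(lambda x: x * uniform_pdf(x, a, b), a, b)
variance = midpoint_integral(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)

print(f"area     = {area:.6f}")
print(f"mean     = {mean:.6f} vs (a+b)/2    = {(a + b) / 2:.6f}")
print(f"variance = {variance:.6f} vs (b-a)^2/12 = {(b - a) ** 2 / 12:.6f}")
```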

Exponential Distribution

Models time between events in a Poisson process (waiting times).

f(x) = λe^(-λx) for x ≥ 0

Mean: 1/λ
Standard Deviation: 1/λ
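
The fact that the mean and standard deviation are both 1/λ can be illustrated with a quick simulation. This sketch assumes a rate of λ = 2 and a fixed random seed, both chosen only for the example, and draws waiting times with `random.expovariate` from the standard library.

```python
import random
from statistics import fmean, stdev

random.seed(0)  # fixed seed so the run is repeatable
lam = 2.0       # illustrative rate (assumption); theory gives mean = sd = 0.5

# Simulated waiting times between events.
waits = [random.expovariate(lam) for _ in range(100_000)]

print(f"sample mean: {fmean(waits):.3f} (theory: {1 / lam})")
print(f"sample sd:   {stdev(waits):.3f} (theory: {1 / lam})")
```

The sample values will not match 1/λ exactly, but with this many draws they should land very close to it.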

Central Limit Theorem

The Most Important Theorem in Statistics

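For sufficiently large samples (n ≥ 30 is a common rule of thumb), the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population distribution, provided the population has a finite variance.

x̄ ~ N(μ, σ²/n) (approximately)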

Standard error of the mean: SE = σ/√n

This is why the normal distribution is so widely used in statistical inference!
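
A simulation can make the theorem concrete. The sketch below is one possible setup, not a proof: it assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1, samples of size n = 36, and 5,000 repetitions. All of these values are illustrative choices.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(1)        # fixed seed so the run is repeatable
n, reps = 36, 5_000   # sample size and number of samples (assumptions)
mu, sigma = 1.0, 1.0  # exponential population with lambda = 1

# Draw many samples and record each sample mean.
sample_means = [
    fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)
]

se = sigma / sqrt(n)  # theoretical standard error, sigma / sqrt(n)
within_2se = sum(abs(m - mu) <= 2 * se for m in sample_means) / reps

print(f"mean of sample means: {fmean(sample_means):.3f} (theory: {mu})")
print(f"sd of sample means:   {stdev(sample_means):.3f} (theory: {se:.3f})")
print(f"share within 2 SE:    {within_2se:.3f} (normal theory: about 0.95)")
```

Even though individual exponential values are far from bell-shaped, the sample means cluster around μ with spread close to σ/√n.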

Examples

Example 1: Binomial Probability

Problem: A basketball player makes 70% of free throws. In 10 attempts, what's the probability of making exactly 8?

Solution:

This is binomial with n = 10, p = 0.70, k = 8

P(X = 8) = C(10,8) · (0.70)^8 · (0.30)^2

C(10,8) = 10!/(8!2!) = 45

P(X = 8) = 45 · (0.70)^8 · (0.30)^2

P(X = 8) = 45 · 0.05765 · 0.09

P(X = 8) = 0.2335 or about 23.35%
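
The arithmetic in this solution can be retraced step by step. A brief sketch with `math.comb` from the standard library, where the variable names are chosen only for illustration:

```python
from math import comb

n, k, p = 10, 8, 0.70

ways = comb(n, k)                   # C(10,8), number of ways to choose the 8 makes
p_makes = p**k                      # (0.70)^8
p_misses = (1 - p) ** (n - k)       # (0.30)^2
prob = ways * p_makes * p_misses    # P(X = 8)

print(ways, round(p_makes, 5), round(p_misses, 2), round(prob, 4))
```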

Example 2: Normal Distribution - Finding Probability

Problem: Test scores are normally distributed with μ = 75 and σ = 10. What percentage of students score above 90?

Solution:

Step 1: Convert to z-score

z = (90 - 75) / 10 = 15/10 = 1.5

Step 2: Find probability

P(X > 90) = P(Z > 1.5)

From z-table: P(Z < 1.5) = 0.9332

P(Z > 1.5) = 1 - 0.9332 = 0.0668 or 6.68%
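
The same two steps can be carried out without a z-table. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative:

```python
from statistics import NormalDist

scores = NormalDist(mu=75, sigma=10)

# Step 1: convert the cutoff to a z-score.
z = (90 - scores.mean) / scores.stdev

# Step 2: upper-tail probability is 1 minus the area to the left.
p_above = 1 - scores.cdf(90)

print(f"z = {z}, P(X > 90) = {p_above:.4f}")
```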

Example 3: Finding a Percentile

Problem: Heights of adult women are normally distributed with μ = 64 inches and σ = 3 inches. What height is at the 90th percentile?

Solution:

Step 1: Find z-score for 90th percentile

From z-table: z = 1.28 (gives area 0.90 to the left)

Step 2: Convert back to original scale

x = μ + z·σ = 64 + 1.28(3) = 64 + 3.84

x = 67.84 inches

90% of women are shorter than about 67.8 inches.
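
Looking up a percentile is the inverse of looking up a probability. The sketch below uses `NormalDist.inv_cdf` (Python 3.8+) in place of reading the z-table backwards; variable names are illustrative. The result differs slightly from the hand calculation in the third decimal because the table value z = 1.28 is rounded.

```python
from statistics import NormalDist

heights = NormalDist(mu=64, sigma=3)

# Step 1: z-score with 90% of the area to its left.
z90 = NormalDist().inv_cdf(0.90)

# Step 2: convert back to inches, x = mu + z * sigma.
x90 = heights.mean + z90 * heights.stdev

print(f"z = {z90:.4f}, 90th percentile = {x90:.2f} inches")
```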

Example 4: Central Limit Theorem

Problem: A population has mean 100 and standard deviation 15. If we take samples of size 36, describe the sampling distribution of x̄.

Solution:

By the Central Limit Theorem (n = 36 ≥ 30):

The sampling distribution of x̄ is approximately normal with:

Mean: μ_x̄ = μ = 100

Standard Error: SE = σ/√n = 15/√36 = 15/6 = 2.5

Therefore: x̄ ~ N(100, 2.5²)

About 95% of sample means will fall within 2 standard errors of the mean, that is, within 2(2.5) = 5 points of 100 (between 95 and 105).
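
The sampling distribution described here can be written down directly. A short sketch, assuming the normal approximation from the Central Limit Theorem holds and using `statistics.NormalDist` (Python 3.8+); variable names are illustrative:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 100, 15, 36

se = sigma / sqrt(n)            # standard error of the mean
xbar = NormalDist(mu, se)       # approximate sampling distribution of the sample mean

# Share of sample means expected within 2 standard errors of mu.
share = xbar.cdf(mu + 2 * se) - xbar.cdf(mu - 2 * se)

print(f"SE = {se}, P({mu - 2 * se} < sample mean < {mu + 2 * se}) = {share:.4f}")
```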

Example 5: Combining Probabilities

Problem: For the normal distribution with μ = 50 and σ = 8, find P(42 < X < 62).

Solution:

Convert both boundaries to z-scores:

z₁ = (42 - 50)/8 = -1.0

z₂ = (62 - 50)/8 = 1.5

Find probability between:

P(-1.0 < Z < 1.5) = P(Z < 1.5) - P(Z < -1.0)

= 0.9332 - 0.1587

= 0.7745 or 77.45%
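
An interval probability is the difference of two cumulative areas, which is easy to confirm in code. A minimal sketch with `statistics.NormalDist` (Python 3.8+); variable names are illustrative:

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=8)

z1 = (42 - X.mean) / X.stdev    # lower boundary as a z-score
z2 = (62 - X.mean) / X.stdev    # upper boundary as a z-score

# P(42 < X < 62) = P(X < 62) - P(X < 42)
p_between = X.cdf(62) - X.cdf(42)

print(f"z1 = {z1}, z2 = {z2}, P(42 < X < 62) = {p_between:.4f}")
```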

Practice

Apply your understanding of probability distributions.

1. For a binomial distribution with n = 20 and p = 0.5, the mean is:

A) 5 B) 10 C) 15 D) 20

2. In a standard normal distribution, P(Z < 0) equals:

A) 0 B) 0.25 C) 0.50 D) 1.0

3. If X ~ N(100, 16), the standard deviation is:

A) 4 B) 8 C) 16 D) 100

4. Using the empirical rule, what percent of data in a normal distribution lies beyond 2 standard deviations from the mean?

A) 2.5% B) 5% C) 32% D) 95%

5. A value has z-score = -1.5. The value is:

A) 1.5 standard deviations above the mean B) 1.5 standard deviations below the mean C) At the mean D) 1.5 times the mean

6. For a binomial distribution with n = 100 and p = 0.3, the standard deviation is:

A) √21 ≈ 4.58 B) √30 ≈ 5.48 C) 30 D) 70

7. The Central Limit Theorem applies when:

A) Population is normal B) Sample size is large (n ≥ 30) C) Population is symmetric D) Sample is random

8. If σ = 20 and n = 100, the standard error is:

A) 0.2 B) 2 C) 5 D) 20

9. What z-score corresponds to the 75th percentile?

A) 0.25 B) 0.67 C) 0.75 D) 1.28

10. P(X ≤ 60) = 0.84 when X ~ N(50, 100). What is σ?

A) 5 B) 10 C) 50 D) 100

Answers

1. B) 10 - Mean = np = 20(0.5) = 10
2. C) 0.50 - By symmetry, half the distribution is below z = 0
3. A) 4 - N(μ, σ²) means σ² = 16, so σ = 4
4. B) 5% - About 95% is within 2 SD, so about 5% is beyond
5. B) 1.5 standard deviations below the mean - negative z means below
6. A) √21 ≈ 4.58 - σ = √(np(1-p)) = √(100·0.3·0.7) = √21
7. B) Sample size is large (n ≥ 30) - the usual rule-of-thumb condition for applying the CLT
8. B) 2 - SE = σ/√n = 20/10 = 2
9. B) 0.67 - z = 0.67 gives area of about 0.75 to the left
10. B) 10 - z = 1.0 for the 84th percentile, so 60 = 50 + 1(σ), σ = 10

Check Your Understanding

1. Explain when you would use a binomial distribution versus a normal distribution.

Use binomial when counting discrete successes in a fixed number of independent trials with constant probability (like the number of heads in 10 coin flips). Use normal for continuous measurements (height, weight, time) or when the Central Limit Theorem applies (sampling distributions of means). For large n with p not too close to 0 or 1, the binomial can be approximated by a normal distribution; a common rule of thumb is np ≥ 10 and n(1-p) ≥ 10.

2. Why is the Central Limit Theorem so important?

The CLT is crucial because it allows us to make inferences about populations using sample data, even when we don't know the population distribution. It tells us that sample means are approximately normally distributed for large samples, which lets us calculate probabilities, construct confidence intervals, and perform hypothesis tests using the familiar normal distribution. This is the foundation of most statistical inference.

3. What does a z-score tell you, and why is it useful?

A z-score tells you how many standard deviations a value is from the mean. Z-scores are useful because they: (1) standardize different scales - you can compare scores from different tests, (2) allow probability calculations using the standard normal table, (3) identify outliers (z beyond ±2 or ±3), (4) determine percentiles, and (5) enable comparison across different normal distributions.

4. How does sample size affect the sampling distribution of the mean?

As sample size (n) increases: (1) The standard error (σ/√n) decreases, making the sampling distribution narrower and more concentrated around μ. (2) The distribution becomes more normal (even if population isn't). (3) Sample means become more reliable estimates of the population mean. This is why larger samples give more precise estimates - the variability in x̄ decreases as n increases.
