Hypothesis Testing

Introduction to Hypothesis Testing

Hypothesis testing is a statistical method for making decisions about populations based on sample data. It's used in science, medicine, business, and policy to determine whether observed effects are real or could have occurred by chance.

Hypothesis testing is a formal procedure that uses sample data to evaluate a claim about a population parameter. We test whether our data provides enough evidence to reject a default assumption.

The Two Hypotheses

Null and Alternative Hypotheses

Null Hypothesis (H₀): The default assumption - usually states "no effect" or "no difference." We assume this is true unless the sample data provide sufficient evidence against it.

Alternative Hypothesis (H₁ or Hₐ): What we're trying to find evidence for - usually states there IS an effect or difference.

H₀: parameter = hypothesized value

H₁: parameter ≠, <, or > hypothesized value

Types of Tests

Test Type	Alternative Hypothesis	When to Use
Two-tailed	H₁: μ ≠ μ₀	Testing for any difference (larger or smaller)
Right-tailed (upper)	H₁: μ > μ₀	Testing if parameter is greater than claimed
Left-tailed (lower)	H₁: μ < μ₀	Testing if parameter is less than claimed

The Testing Process

Steps of Hypothesis Testing

1. State the hypotheses: Write H₀ and H₁ using symbols
2. Choose significance level: Select α (commonly 0.05 or 0.01)
3. Collect data: Gather a random sample
4. Calculate test statistic: Measure how far the sample result is from the value claimed by H₀, in standard-error units
5. Find p-value: Probability of getting a result at least as extreme as the one observed, if H₀ is true
6. Make decision: Compare p-value to α
7. State conclusion: In context of the original question

Test Statistics

Z-Test (when σ is known or n is large)

z = (x̄ - μ₀) / (σ / √n)

x̄ = sample mean
μ₀ = hypothesized population mean
σ = population standard deviation
n = sample size
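
The z formula above translates directly into a few lines of code. The following is a minimal sketch; the function name z_statistic and the numbers in the usage line are hypothetical and chosen only for illustration.

```python
import math


def z_statistic(sample_mean: float, mu0: float, sigma: float, n: int) -> float:
    """Compute z = (x̄ - μ₀) / (σ / √n) for a one-sample z-test."""
    standard_error = sigma / math.sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu0) / standard_error


# Hypothetical inputs: x̄ = 103, μ₀ = 100, σ = 15, n = 25
z = z_statistic(103, 100, 15, 25)
print(z)  # 1.0
```

A z of 1.0 says the sample mean sits one standard error above the hypothesized mean.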

T-Test (when σ is unknown)

t = (x̄ - μ₀) / (s / √n)

s = sample standard deviation
df = n - 1 (degrees of freedom)

Use a t-distribution table or a calculator to find p-values.
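
When only raw data are available, s has to be estimated from the sample before t can be computed. The sketch below assumes a small hypothetical list of scores and a hypothetical helper named t_statistic; it relies on Python's statistics.stdev, which divides by n - 1.

```python
import math
import statistics


def t_statistic(sample: list[float], mu0: float) -> tuple[float, int]:
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


scores = [72, 75, 78, 80, 85]  # hypothetical data, for illustration only
t, df = t_statistic(scores, mu0=75)
print(f"t = {t:.3f}, df = {df}")
```

The returned df is what you would look up in a t-table alongside the t value.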

Significance Level and P-Values

Key Concepts

Significance Level (α): The threshold for rejecting H₀. Common values: 0.05 (5%) or 0.01 (1%).

P-value: The probability of obtaining results at least as extreme as observed, assuming H₀ is true.

If p-value ≤ α → Reject H₀ (significant result)

If p-value > α → Fail to reject H₀ (not significant)
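
The decision rule can be sketched in code for a z statistic. This assumes the test statistic follows a standard normal distribution under H₀; the helper names p_value_from_z and decide, and the value z = 1.8, are hypothetical.

```python
from statistics import NormalDist


def p_value_from_z(z: float, tail: str) -> float:
    """P-value for a z statistic; tail is 'two', 'right', or 'left'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        return 2 * std_normal.cdf(-abs(z))
    if tail == "right":
        return 1 - std_normal.cdf(z)
    if tail == "left":
        return std_normal.cdf(z)
    raise ValueError("tail must be 'two', 'right', or 'left'")


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


z = 1.8  # hypothetical test statistic
for tail in ("two", "right", "left"):
    p = p_value_from_z(z, tail)
    print(f"{tail}-tailed: p = {p:.4f} -> {decide(p)}")
```

With these inputs the same z leads to rejection only in the right-tailed test, which is why the tail has to be chosen before looking at the data.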

Interpreting P-Values

P-value	Evidence Against H₀
> 0.10	Little or no evidence
0.05 - 0.10	Weak evidence
0.01 - 0.05	Moderate evidence
0.001 - 0.01	Strong evidence
< 0.001	Very strong evidence
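
The table above can be read as a simple lookup. In this sketch the function name evidence_against_h0 is hypothetical, and because the table's ranges share endpoints, the handling of values exactly on a boundary is an assumption.

```python
def evidence_against_h0(p_value: float) -> str:
    """Map a p-value to the rough evidence labels in the table above."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "very strong"
    if p_value < 0.01:
        return "strong"
    if p_value < 0.05:
        return "moderate"
    if p_value <= 0.10:  # boundary handling is an assumption
        return "weak"
    return "little or no"


for p in (0.0004, 0.03, 0.08, 0.4):
    print(p, "->", evidence_against_h0(p))
```

These labels are conventions rather than sharp cutoffs, so a p-value of 0.049 and one of 0.051 carry nearly the same evidence.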

Types of Errors

	H₀ is True	H₀ is False
Reject H₀	Type I Error (α) - False Positive	Correct Decision (Power)
Fail to Reject H₀	Correct Decision	Type II Error (β) - False Negative

Understanding Errors

Type I Error: Rejecting H₀ when it's actually true (false alarm). Probability = α

Type II Error: Failing to reject H₀ when it's actually false (missed detection). Probability = β

Power: Probability of correctly rejecting a false H₀. Power = 1 - β
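
One way to make α, β, and power concrete is to simulate many samples and count how often a test rejects H₀. The sketch below does this for a two-tailed z-test; every number in it (μ₀ = 100, σ = 15, n = 30, a true mean of 108, the seed) is a hypothetical choice, and the function name rejection_rate is made up for illustration.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, mu0=100.0, sigma=15.0, n=30, alpha=0.05,
                   trials=10_000, seed=0):
    """Fraction of simulated two-tailed z-tests that reject H0: mu = mu0."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        z = (sample_mean - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_mu=100.0)  # H0 is true: rate should be near alpha
power = rejection_rate(true_mu=108.0)        # H0 is false: rate estimates 1 - beta
beta = 1 - power
print(f"Type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {beta:.3f}")
```

When the true mean equals μ₀, every rejection is a Type I error, so the rate lands close to α. When the true mean differs, the rejection rate estimates the power.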

Confidence Intervals and Hypothesis Testing

Connection

There's a direct relationship between confidence intervals and two-tailed hypothesis tests:

A 95% confidence interval corresponds to α = 0.05
If the hypothesized value falls outside the CI, reject H₀
If the hypothesized value falls inside the CI, fail to reject H₀
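
This correspondence can be checked numerically. The sketch below assumes a z-based interval with known σ and uses hypothetical values (x̄ = 52, σ = 8, n = 64); the function names are illustrative.

```python
import math
from statistics import NormalDist


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Two-sided z interval for the mean when sigma is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def two_tailed_p(x_bar, mu0, sigma, n):
    """Two-tailed p-value for H0: mu = mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))


x_bar, sigma, n, alpha = 52.0, 8.0, 64, 0.05  # hypothetical sample summary
low, high = z_confidence_interval(x_bar, sigma, n, confidence=1 - alpha)

checks = []
for mu0 in (50.0, 51.0):
    outside_ci = not (low <= mu0 <= high)
    rejects = two_tailed_p(x_bar, mu0, sigma, n) <= alpha
    checks.append((mu0, outside_ci, rejects))
    print(f"mu0 = {mu0}: outside CI = {outside_ci}, reject H0 = {rejects}")
```

For both hypothesized values the two columns agree: the test rejects exactly when μ₀ lies outside the interval.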

Examples

Example 1: Two-Tailed Z-Test

Problem: A company claims batteries last 500 hours on average. A sample of 36 batteries has mean life 490 hours. Population σ = 30 hours. Test at α = 0.05.

Solution:

Step 1: State hypotheses

H₀: μ = 500 (batteries last 500 hours as claimed)

H₁: μ ≠ 500 (batteries don't last 500 hours)

Step 2: Calculate test statistic

z = (x̄ - μ₀) / (σ / √n) = (490 - 500) / (30 / √36)

z = -10 / (30/6) = -10 / 5 = -2.0

Step 3: Find p-value

For z = -2.0, P(Z < -2.0) = 0.0228

Two-tailed p-value = 2(0.0228) = 0.0456

Step 4: Decision

Since 0.0456 < 0.05, we reject H₀

Conclusion: There is sufficient evidence at α = 0.05 that the true mean battery life differs from 500 hours.
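
The arithmetic in Example 1 can be reproduced with Python's standard library. This is a sketch for checking the numbers, with variable names chosen for illustration. The software value of the p-value (about 0.0455) differs slightly from the table-based 0.0456 because the table rounds the tail area to four decimals.

```python
import math
from statistics import NormalDist

# Values from Example 1
x_bar, mu0, sigma, n, alpha = 490, 500, 30, 36, 0.05

z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * NormalDist().cdf(-abs(z))  # two-tailed: double the tail area
reject_h0 = p_value <= alpha

print(f"z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```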

Example 2: One-Tailed T-Test

Problem: A teacher claims a new method raises the mean test score above 75. A sample of 25 students has mean 78 with s = 10. Test at α = 0.05.

Solution:

Step 1: State hypotheses

H₀: μ ≤ 75 (mean is at most 75)

H₁: μ > 75 (mean is greater than 75)

Step 2: Calculate test statistic

t = (78 - 75) / (10 / √25) = 3 / 2 = 1.5

df = 25 - 1 = 24

Step 3: Find p-value

Using t-table with df = 24: P(t > 1.5) ≈ 0.073

Step 4: Decision

Since 0.073 > 0.05, we fail to reject H₀

Conclusion: There is not sufficient evidence at α = 0.05 to conclude that the new method improves scores above 75.
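
The table lookup in Step 3 can be approximated in code. Python's standard library has no t-distribution, so this sketch swaps in a numerical integration of the t density using Simpson's rule, which is a stand-in for a table or a statistics package. The function names t_pdf and t_upper_tail are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t) for t >= 0 using Simpson's rule on [0, t]."""
    if t < 0 or steps % 2:
        raise ValueError("requires t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 0.5 - area_0_to_t  # the density is symmetric about 0


# Values from Example 2
x_bar, mu0, s, n, alpha = 78, 75, 10, 25, 0.05
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
p_value = t_upper_tail(t_stat, df=n - 1)
print(f"t = {t_stat:.2f}, p-value ~ {p_value:.3f}, reject H0: {p_value <= alpha}")
```

The result should agree with the table value of roughly 0.073, so the decision is unchanged.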

Example 3: Proportion Test

Problem: A company claims 80% of customers are satisfied. In a survey of 200 customers, 150 were satisfied. Test at α = 0.05.

Solution:

Step 1: State hypotheses

H₀: p = 0.80

H₁: p ≠ 0.80

Step 2: Calculate test statistic

p̂ = 150/200 = 0.75

z = (p̂ - p₀) / √(p₀(1-p₀)/n)

z = (0.75 - 0.80) / √(0.80 × 0.20 / 200)

z = -0.05 / √(0.0008) = -0.05 / 0.0283 = -1.77

Step 3: Find p-value

Two-tailed p-value = 2 × P(Z < -1.77) = 2(0.0384) = 0.077

Step 4: Decision

Since 0.077 > 0.05, we fail to reject H₀

Conclusion: There is not sufficient evidence to conclude the satisfaction rate differs from 80%.
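
Example 3 can be checked the same way. The sketch below follows the formula from Step 2, using p₀ (not p̂) in the standard error; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values from Example 3
satisfied, n, p0, alpha = 150, 200, 0.80, 0.05

p_hat = satisfied / n
standard_error = math.sqrt(p0 * (1 - p0) / n)  # computed under H0
z = (p_hat - p0) / standard_error
p_value = 2 * NormalDist().cdf(-abs(z))        # two-tailed

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}, "
      f"reject H0: {p_value <= alpha}")
```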

Example 4: Understanding Type I and Type II Errors

Problem: A drug company tests whether a new medication lowers blood pressure. Describe the Type I and Type II errors in context.

Solution:

H₀: The drug has no effect on blood pressure

H₁: The drug lowers blood pressure

Type I Error: Concluding the drug works when it actually doesn't. Consequence: Patients take an ineffective medication, possibly instead of treatments that work.

Type II Error: Concluding the drug doesn't work when it actually does. Consequence: An effective treatment is abandoned, patients miss beneficial medication.

Which is worse? It depends on context. If the drug has serious side effects, a Type I error might be worse. If the condition is serious and alternatives are limited, a Type II error might be worse.

Example 5: Using Confidence Intervals

Problem: A 95% CI for mean weight loss is (2.1, 5.3) pounds. Test H₀: μ = 0 vs H₁: μ ≠ 0 at α = 0.05.

Solution:

The hypothesized value μ = 0 falls outside the 95% CI (2.1, 5.3)

Therefore, we reject H₀ at α = 0.05

Conclusion: There is statistically significant evidence that the mean weight loss is different from zero. Since the entire CI is positive, there's evidence of actual weight loss (2.1 to 5.3 pounds on average).

Practice

Apply your understanding of hypothesis testing.

1. Which hypothesis contains the "=" sign?

A) Alternative hypothesis B) Null hypothesis C) Research hypothesis D) None of them

2. A p-value of 0.03 means:

A) 3% chance H₀ is true B) 3% chance of getting a result at least this extreme if H₀ is true C) 97% chance H₁ is true D) 3% chance of Type II error

3. At α = 0.05, which p-value leads to rejecting H₀?

A) 0.06 B) 0.10 C) 0.04 D) 0.08

4. Sample: n = 49, x̄ = 82, σ = 14. Test H₀: μ = 80 vs H₁: μ > 80. Find z.

A) 0.5 B) 1.0 C) 1.5 D) 2.0

5. A Type I error occurs when:

A) H₀ is rejected when false B) H₀ is rejected when true C) H₀ is not rejected when true D) H₀ is not rejected when false

6. The power of a test is:

A) Probability of Type I error B) Probability of Type II error C) 1 minus probability of Type II error D) The significance level

7. A 99% CI for μ is (12, 18). Which conclusion follows for testing H₀: μ = 15 at α = 0.01?

A) Reject H₀ B) Fail to reject H₀ C) Cannot determine D) Test is invalid

8. Increasing sample size generally:

A) Increases power B) Increases Type I error C) Increases Type II error D) Has no effect

9. For a left-tailed test with z = -2.1, the p-value is approximately:

A) 0.018 B) 0.036 C) 0.964 D) 0.982

10. "Statistically significant" means:

A) The result is practically important B) The p-value is less than α C) The effect size is large D) The null hypothesis is true

Answers

1. B) Null hypothesis - H₀ always contains equality
2. B) 3% chance of getting this result if H₀ is true - more precisely, a result at least this extreme
3. C) 0.04 - the only p-value below 0.05
4. B) 1.0 - z = (82-80)/(14/7) = 2/2 = 1.0
5. B) H₀ is rejected when true (false positive)
6. C) 1 minus probability of Type II error (1 - β)
7. B) Fail to reject H₀ - 15 is inside the CI
8. A) Increases power - more data means better detection
9. A) 0.018 - left tail area for z = -2.1
10. B) The p-value is less than α

Check Your Understanding

1. Why do we "fail to reject H₀" rather than "accept H₀"?

We say "fail to reject" because not finding evidence against H₀ doesn't prove H₀ is true - it just means we don't have enough evidence to conclude it's false. Absence of evidence isn't evidence of absence. Multiple reasons could explain insufficient evidence: small sample size, high variability, or the true effect might be small but real. The burden of proof is on rejecting H₀, not proving it.

2. Why might a statistically significant result not be practically significant?

Statistical significance only tells us that an effect this large would be unlikely to appear by chance if H₀ were true - not that the effect is meaningful. With large samples, even tiny differences become statistically significant. For example, a study with n=10,000 might find that a drug reduces blood pressure by 0.5 mmHg (p < 0.001), but this effect is too small to matter clinically. Always consider effect size and practical implications alongside p-values.
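
A short calculation shows how sample size alone can drive the p-value down. The sketch keeps the 0.5 mmHg effect from the answer above but assumes a hypothetical standard deviation of 10 mmHg and a two-tailed z-test; neither assumption comes from the text.

```python
import math
from statistics import NormalDist

effect = 0.5   # mmHg, the observed mean reduction in the example
sigma = 10.0   # mmHg, hypothetical standard deviation (an assumption)

p_values = {}
for n in (100, 1_000, 10_000):
    z = effect / (sigma / math.sqrt(n))
    p_values[n] = 2 * NormalDist().cdf(-z)  # two-tailed p-value
    print(f"n = {n:>6}: z = {z:.2f}, p-value = {p_values[n]:.2g}")
```

The effect is identical in every row. Only n changes, yet the result moves from clearly non-significant to p < 0.001.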

3. How do you choose between a one-tailed and two-tailed test?

Use a one-tailed test when: (1) you have a directional hypothesis BEFORE seeing data (e.g., "the new drug will LOWER blood pressure"), (2) only one direction matters for your decision, (3) you have theoretical reason to expect a specific direction. Use a two-tailed test when: (1) any difference (higher or lower) is of interest, (2) you're being conservative, (3) you're exploring without prior expectations. Two-tailed is more common and more conservative.

4. Explain the trade-off between Type I and Type II errors. How does α affect both?

There's an inverse relationship: decreasing α (being stricter) reduces Type I errors but increases Type II errors (more false negatives). Increasing α does the opposite. Choosing α depends on which error is worse in your context. In medical testing, Type I errors (approving ineffective drugs) might be weighted against Type II (rejecting effective treatments). The only way to reduce both errors simultaneously is to increase sample size or reduce variability.
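
The trade-off can be made concrete for a right-tailed z-test, where β has a closed form. In this sketch the function name beta_right_tailed and all inputs (μ₀ = 100, a true mean of 105, σ = 15) are hypothetical.

```python
import math
from statistics import NormalDist


def beta_right_tailed(alpha, mu0, mu_true, sigma, n):
    """Type II error probability for a right-tailed z-test when the mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)                # reject H0 when z > z_crit
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))      # mean of z when mu = mu_true
    return std_normal.cdf(z_crit - shift)                 # P(z <= z_crit | mu_true)


betas = {a: beta_right_tailed(a, 100, 105, 15, n=36) for a in (0.10, 0.05, 0.01)}
for a, b in betas.items():
    print(f"n = 36,  alpha = {a:.2f}: beta = {b:.3f}")

beta_large_n = beta_right_tailed(0.05, 100, 105, 15, n=100)
print(f"n = 100, alpha = 0.05: beta = {beta_large_n:.3f}")
```

Lowering α raises β at a fixed sample size, while increasing n lowers β without touching α.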
