Statistical Inference
📖 Learn
Statistical inference is the process of using data from a sample to draw conclusions about a larger population. Rather than measuring every individual in a population, we use sample statistics to estimate population parameters and assess the reliability of those estimates.
Key Terminology
| Term | Definition | Symbol |
|---|---|---|
| Population | The entire group you want to study | N (size) |
| Sample | A subset of the population you actually measure | n (size) |
| Parameter | A numerical value describing the population | μ (mean), σ (std dev), p (proportion) |
| Statistic | A numerical value calculated from sample data | x̄ (mean), s (std dev), p̂ (proportion) |
Sampling Methods
Random Sampling
For valid statistical inference, samples must be collected using proper random sampling methods:
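- Simple Random Sample (SRS): Every possible group of n individuals has the same chance of being selected, so every individual does too
- Stratified Random Sample: The population is divided into groups (strata), then an SRS is taken from each stratum
- Cluster Sample: Randomly select whole groups (clusters), then include every individual in the selected clusters
- Systematic Sample: Choose a random starting point among the first k individuals on a list, then select every kth individual after it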
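The four designs can be sketched with Python's standard `random` module. This is a minimal illustration, not a survey tool: the function names are hypothetical, individuals are plain list items, and strata and clusters are assumed to be supplied as a dictionary of lists.

```python
import random

def simple_random_sample(population, n, rng):
    # Every subset of size n is equally likely to be drawn.
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    # strata: dict mapping stratum name -> list of individuals.
    # Take an SRS of the same size from every stratum and combine them.
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every individual in them.
    chosen = rng.sample(list(clusters.values()), n_clusters)
    return [person for cluster in chosen for person in cluster]

def systematic_sample(population, k, rng):
    # Random start among the first k positions, then every kth individual.
    start = rng.randrange(k)
    return population[start::k]

rng = random.Random(42)
people = list(range(100))
print(simple_random_sample(people, 10, rng))
print(systematic_sample(people, 10, rng))
```

Passing an explicit `random.Random` instance keeps the draws reproducible, which is convenient for teaching; a real study would use a proper sampling frame rather than a list of integers.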
Bias in Sampling
- Selection bias: Some individuals are more likely to be chosen than others
- Response bias: Answers are influenced by question wording or social desirability
- Nonresponse bias: People who don't respond differ from those who do
Biased samples lead to unreliable conclusions about the population.
Sampling Distributions
A sampling distribution shows the distribution of a statistic (like x̄ or p̂) across all possible samples of a given size from a population. It tells us how much we expect the statistic to vary from sample to sample.
Central Limit Theorem (CLT)
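For sufficiently large sample sizes, the sampling distribution of the sample mean x̄ is approximately normal, regardless of the shape of the population distribution (as long as the population has a finite standard deviation). A common rule of thumb is n ≥ 30; strongly skewed populations may need larger samples.
Mean of x̄: μ_x̄ = μ
Standard error (standard deviation of x̄): SE = σ/√n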
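A quick simulation makes the theorem concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (strongly right-skewed), a sample size of n = 36, and 5,000 repeated samples; these choices are arbitrary. The mean of the simulated x̄ values should land near μ = 1 and their standard deviation near σ/√n = 1/6.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, rng):
    # Draw `reps` samples of size n from an exponential population
    # (mean 1, standard deviation 1, right-skewed) and return each sample's mean.
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

rng = random.Random(1)
n = 36
means = simulate_sample_means(n, reps=5000, rng=rng)

print(f"mean of x̄ values: {statistics.fmean(means):.3f}  (theory: μ = 1)")
print(f"SD of x̄ values:   {statistics.stdev(means):.3f}  (theory: σ/√n = {1 / math.sqrt(n):.3f})")
```

Plotting a histogram of `means` would show a roughly bell-shaped curve even though the individual exponential draws are far from normal.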
Confidence Intervals
A confidence interval is a range of plausible values for a population parameter, built by a method that captures the true parameter in a specified percentage of samples (the confidence level).
CI = point estimate ± margin of error
CI for a mean (σ known): x̄ ± z* · (σ/√n)
CI for a proportion (requires np̂ ≥ 10 and n(1-p̂) ≥ 10): p̂ ± z* · √(p̂(1-p̂)/n)
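The two interval formulas translate directly into code. In this minimal sketch the function names `ci_mean_z` and `ci_proportion` are hypothetical, the caller is assumed to supply z* (for example 1.96 for 95%), and the mean version assumes σ is known, as in the formula above.

```python
import math

def ci_mean_z(xbar, sigma, n, z_star):
    # x̄ ± z* · σ/√n  (population σ assumed known)
    moe = z_star * sigma / math.sqrt(n)
    return (xbar - moe, xbar + moe)

def ci_proportion(successes, n, z_star):
    # p̂ ± z* · √(p̂(1 - p̂)/n); assumes the large-counts condition holds
    p_hat = successes / n
    moe = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - moe, p_hat + moe)

print(ci_mean_z(72, 8, 64, 1.96))      # values from Example 1
print(ci_proportion(220, 400, 1.645))  # values from Example 2
```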
Common Critical Values (z*)
| Confidence Level | z* | Interpretation |
|---|---|---|
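| 90% | 1.645 | In repeated sampling, about 90% of intervals built this way capture the true parameter |
| 95% | 1.96 | In repeated sampling, about 95% of intervals built this way capture the true parameter |
| 99% | 2.576 | In repeated sampling, about 99% of intervals built this way capture the true parameter |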
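Rather than memorizing the table, z* can be computed as the standard normal quantile that leaves (1 − C)/2 in each tail, where C is the confidence level. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(confidence):
    # Two-sided critical value: area (1 - confidence)/2 lies in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_critical(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```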
Interpreting Confidence Intervals
Correct Interpretation
"We are 95% confident that the true population mean lies between [lower bound] and [upper bound]."
This means: If we took many samples and built intervals the same way, about 95% of them would contain the true parameter.
Common mistake: Saying "There is a 95% probability that μ is in this interval." (The parameter is fixed, not random.)
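The repeated-sampling interpretation can be checked by simulation: fix a true mean μ, draw many samples, build an interval from each, and count how often μ lands inside. This sketch assumes a normal population with μ = 100 and σ = 15, samples of n = 25, and 4,000 repetitions; all of these values are arbitrary choices for illustration, and `coverage_rate` is a hypothetical name.

```python
import math
import random

def coverage_rate(mu, sigma, n, z_star, reps, rng):
    # Repeatedly sample from a Normal(mu, sigma) population, build x̄ ± z*·σ/√n,
    # and return the fraction of intervals that contain the fixed true mean mu.
    moe = z_star * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - moe <= mu <= xbar + moe:
            hits += 1
    return hits / reps

rng = random.Random(2024)
print(coverage_rate(mu=100, sigma=15, n=25, z_star=1.96, reps=4000, rng=rng))
```

The printed fraction should sit close to 0.95. Note that μ never changes; only the intervals vary from sample to sample.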
Margin of Error
MOE = z* · (σ/√n) for a mean, or MOE = z* · √(p̂(1-p̂)/n) for a proportion
The margin of error depends on:
- Confidence level: Higher confidence → larger MOE
- Sample size: Larger n → smaller MOE
- Variability: More spread in the population (larger σ) → larger MOE
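Each of the three effects can be checked by changing one input at a time. The baseline (95% confidence, σ = 10, n = 100) is an arbitrary assumption, and `moe_mean` is a hypothetical helper for the mean version of the formula.

```python
import math

def moe_mean(z_star, sigma, n):
    # MOE = z* · σ/√n for a mean with σ known
    return z_star * sigma / math.sqrt(n)

base = moe_mean(1.96, 10, 100)  # 95% confidence, σ = 10, n = 100
print("higher confidence:", moe_mean(2.576, 10, 100) > base)
print("larger sample:    ", moe_mean(1.96, 10, 400) < base)
print("more variability: ", moe_mean(1.96, 20, 100) > base)
```

All three lines print `True`, matching the list above.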
Sample Size Determination
Required Sample Size
To achieve a desired margin of error E:
For means: n = (z* · σ / E)²
For proportions: n = p̂(1-p̂) · (z* / E)²
If p̂ is unknown, use p̂ = 0.5, which maximizes p̂(1-p̂) and gives the most conservative (largest) sample size. Always round n up to the next whole number.
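Both sample-size formulas, including the round-up rule, fit in two short functions. The names are hypothetical; `math.ceil` performs the rounding up, and the proportion version defaults to the conservative p̂ = 0.5.

```python
import math

def sample_size_mean(z_star, sigma, E):
    # n = (z* · σ / E)², rounded up to a whole number
    return math.ceil((z_star * sigma / E) ** 2)

def sample_size_proportion(z_star, E, p_hat=0.5):
    # n = p̂(1 - p̂) · (z* / E)², rounded up; p̂ = 0.5 is the conservative default
    return math.ceil(p_hat * (1 - p_hat) * (z_star / E) ** 2)

print(sample_size_mean(1.96, 10, 2))      # inputs from Example 4
print(sample_size_proportion(1.96, 0.03)) # inputs from Practice Problem 5
```

These reproduce the worked answers of 97 and 1068. Any prior guess for p̂ other than 0.5 gives a smaller n.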
💡 Examples
Example 1: Constructing a Confidence Interval for a Mean
Problem: A sample of 64 students has a mean test score of x̄ = 72 with a known population standard deviation σ = 8. Construct a 95% confidence interval for the population mean.
Step 1: Identify the values:
x̄ = 72, σ = 8, n = 64, z* = 1.96 (for 95%)
Step 2: Calculate the standard error:
SE = σ/√n = 8/√64 = 8/8 = 1
Step 3: Calculate the margin of error:
MOE = z* · SE = 1.96 · 1 = 1.96
Step 4: Construct the interval:
CI = x̄ ± MOE = 72 ± 1.96
CI = (70.04, 73.96)
Answer: We are 95% confident that the true population mean test score is between 70.04 and 73.96.
Example 2: Confidence Interval for a Proportion
Problem: In a survey of 400 voters, 220 support a ballot measure. Construct a 90% confidence interval for the true proportion of supporters.
Step 1: Calculate the sample proportion:
p̂ = 220/400 = 0.55
Step 2: Check conditions (np̂ ≥ 10 and n(1-p̂) ≥ 10):
400(0.55) = 220 ≥ 10 ✓
400(0.45) = 180 ≥ 10 ✓
Step 3: Calculate standard error:
SE = √(p̂(1-p̂)/n) = √(0.55 · 0.45/400) = √(0.000619) = 0.0249
Step 4: Calculate MOE (z* = 1.645 for 90%):
MOE = 1.645 · 0.0249 = 0.041
Step 5: Construct interval:
CI = 0.55 ± 0.041 = (0.509, 0.591)
Answer: We are 90% confident that the true proportion of all voters who support the measure is between 50.9% and 59.1%.
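Example 2 can be reproduced as a short Python check, including the large-counts condition from Step 2. The variable names are illustrative; the printed values match Steps 1 to 5.

```python
import math

n, successes, z_star = 400, 220, 1.645
p_hat = successes / n

# Large-counts condition from Step 2: np̂ ≥ 10 and n(1 - p̂) ≥ 10
if not (n * p_hat >= 10 and n * (1 - p_hat) >= 10):
    raise ValueError("normal approximation not justified for this sample")

se = math.sqrt(p_hat * (1 - p_hat) / n)
moe = z_star * se
lower, upper = p_hat - moe, p_hat + moe
print(f"p̂ = {p_hat:.2f}, SE = {se:.4f}, MOE = {moe:.3f}, CI = ({lower:.3f}, {upper:.3f})")
# p̂ = 0.55, SE = 0.0249, MOE = 0.041, CI = (0.509, 0.591)
```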
Example 3: Interpreting a Confidence Interval
Problem: A 95% confidence interval for the mean weight of apples from an orchard is (180g, 200g). Which statements are correct?
a) 95% of all apples weigh between 180g and 200g.
b) There is a 95% probability that μ is between 180g and 200g.
c) We are 95% confident that the true mean weight is between 180g and 200g.
a) INCORRECT: The CI is for the mean, not individual values. Individual apple weights have much more variation.
b) INCORRECT: The parameter μ is fixed (though unknown). Once the interval is computed, μ either is or is not in it; the 95% describes the long-run success rate of the method, not this particular interval.
c) CORRECT: This properly expresses confidence in the method. If we repeated this process many times, about 95% of the intervals would contain μ.
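Why statement (a) fails can also be seen numerically. The sketch below invents a hypothetical orchard (100,000 apple weights drawn from a normal distribution with mean 190 g and standard deviation 25 g, with σ treated as known), takes one random sample of 100 apples, builds the 95% CI for the mean, and then measures what share of individual apples fall inside that interval.

```python
import math
import random

rng = random.Random(3)
# Hypothetical orchard: 100,000 apple weights, Normal(mean 190 g, sd 25 g)
orchard = [rng.gauss(190, 25) for _ in range(100_000)]
sigma = 25  # treated as known for the z-interval

sample = rng.sample(orchard, 100)
xbar = sum(sample) / len(sample)
moe = 1.96 * sigma / math.sqrt(len(sample))
lower, upper = xbar - moe, xbar + moe

# Share of *individual* apples whose weight falls inside the CI for the *mean*
share_inside = sum(lower <= w <= upper for w in orchard) / len(orchard)
print(f"95% CI for mean: ({lower:.1f}, {upper:.1f}) g")
print(f"Share of individual apples inside it: {share_inside:.1%}")
```

With these assumed values the interval is only about ±4.9 g wide, so the share of individual apples inside it comes out well under one in five, nowhere near 95%.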
Example 4: Determining Sample Size
Problem: A researcher wants to estimate the mean commute time with a margin of error of 2 minutes at 95% confidence. If σ = 10 minutes, how large a sample is needed?
Formula: n = (z* · σ / E)²
where z* = 1.96, σ = 10, E = 2
Calculate:
n = (1.96 · 10 / 2)² = (9.8)² = 96.04
Answer: n = 97 (always round up for sample size)
The researcher needs at least 97 participants.
Example 5: Effect of Sample Size on MOE
Problem: A 95% CI based on n = 100 has MOE = 4. What would the MOE be if n = 400?
Key insight: MOE is proportional to 1/√n
When n quadruples (100 → 400), √n doubles (10 → 20)
So MOE is halved: 4 → 2
Verification:
MOE₁/MOE₂ = √(n₂/n₁) = √(400/100) = 2
MOE₂ = MOE₁/2 = 4/2 = 2
Answer: The new MOE would be 2.
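Because MOE ∝ 1/√n when the confidence level and σ stay the same, a known MOE can be rescaled to a new sample size without recomputing from raw data. `rescale_moe` is a hypothetical helper name, and it assumes both of those quantities are unchanged.

```python
import math

def rescale_moe(moe_old, n_old, n_new):
    # MOE ∝ 1/√n, so MOE_new = MOE_old · √(n_old / n_new)
    return moe_old * math.sqrt(n_old / n_new)

print(rescale_moe(4, 100, 400))  # Example 5: prints 2.0
```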
✏️ Practice
Apply your understanding of statistical inference.
Problem 1: A random sample of 49 light bulbs has a mean life of 1,200 hours. If σ = 140 hours, construct a 95% confidence interval for the mean life of all bulbs.
Problem 2: In a poll of 600 adults, 342 favor a new policy. Construct a 99% confidence interval for the true proportion.
Problem 3: A 90% confidence interval is (45, 55). (a) What is the point estimate? (b) What is the margin of error?
Problem 4: If we increase the sample size from 100 to 900, by what factor does the margin of error change?
Problem 5: A researcher wants a 95% CI for a proportion with MOE ≤ 0.03. What sample size is needed? (Use p̂ = 0.5)
Problem 6: Explain why convenience samples (like surveying friends) cannot be used for valid statistical inference.
Problem 7: A 95% CI for mean salary is ($48,000, $56,000). Can we conclude the population mean is $50,000?
Problem 8: Two researchers study the same population. Researcher A uses n = 50 and gets CI (10, 20). Researcher B uses n = 200. At the same confidence level, will B's interval be wider, narrower, or the same? Why?
Problem 9: Calculate the standard error for x̄ if σ = 15 and n = 225.
Problem 10: A survey asks "Don't you agree this harmful policy should be stopped?" What type of bias might this cause?
- Problem 1: SE = 140/√49 = 20; CI = 1200 ± 1.96(20) = (1160.8, 1239.2) hours
- Problem 2: p̂ = 342/600 = 0.57; SE = √(0.57·0.43/600) = 0.0202; CI = 0.57 ± 2.576(0.0202) = (0.518, 0.622)
- Problem 3: (a) Point estimate = (45+55)/2 = 50 (b) MOE = (55-45)/2 = 5
- Problem 4: The MOE decreases by a factor of √(900/100) = 3; the new MOE is 1/3 of the original.
- Problem 5: n = 0.5(0.5)(1.96/0.03)² = 0.25(4268.4) = 1067.1, so round up to n = 1068
- Problem 6: Convenience samples are biased because they don't give everyone in the population an equal chance of being selected. Results can't be generalized to the population.
- Problem 7: We cannot conclude the mean IS $50,000. We can only say $50,000 is a plausible value because it lies within the interval; many other values are also plausible.
- Problem 8: B's interval will be narrower because larger samples produce smaller margins of error. Since √(200/50) = 2, B's margin of error should be about half of A's.
- Problem 9: SE = σ/√n = 15/√225 = 15/15 = 1
- Problem 10: Response bias due to leading question wording. The question pushes respondents toward agreeing.
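The numerical answers to Problems 1, 2, 5, and 9 can be double-checked with a few lines of Python. This is a self-check sketch that re-applies the lesson's formulas with the z* values from the table; it is not a required part of the written solutions.

```python
import math

z95, z99 = 1.96, 2.576

# Problem 1: CI for a mean, σ known
se1 = 140 / math.sqrt(49)
ci1 = (1200 - z95 * se1, 1200 + z95 * se1)

# Problem 2: CI for a proportion
p2 = 342 / 600
se2 = math.sqrt(p2 * (1 - p2) / 600)
ci2 = (p2 - z99 * se2, p2 + z99 * se2)

# Problem 5: sample size for a proportion (p̂ = 0.5), rounded up
n5 = math.ceil(0.5 * 0.5 * (z95 / 0.03) ** 2)

# Problem 9: standard error of x̄
se9 = 15 / math.sqrt(225)

print("Problem 1:", tuple(round(x, 1) for x in ci1))
print("Problem 2:", tuple(round(x, 3) for x in ci2))
print("Problem 5:", n5)
print("Problem 9:", se9)
```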
✅ Check Your Understanding
Question 1: What is the difference between a parameter and a statistic?
A parameter is a numerical value that describes a characteristic of the entire population (e.g., population mean μ). It is usually unknown and fixed.
A statistic is a numerical value calculated from sample data (e.g., sample mean x̄). It is known but varies from sample to sample.
We use statistics to estimate parameters.
Question 2: Why is random sampling important for statistical inference?
Random sampling is essential because:
- It removes selection bias from the choosing process, because chance (not the researcher) decides who is selected
- It allows us to use probability theory to quantify uncertainty
- It tends to produce samples that are representative of the population
- It justifies generalizing from sample to population
Without random sampling, we cannot make valid inferences about the population.
Question 3: How does increasing the confidence level affect the width of a confidence interval?
Increasing the confidence level makes the interval wider. To be more confident that we capture the true parameter, we need a larger range of values.
For example, a 99% CI is wider than a 95% CI for the same data because we need more "room" to be more certain we've captured the parameter.
The trade-off: Higher confidence = wider (less precise) interval.
Question 4: What does the Central Limit Theorem tell us and why is it important?
The Central Limit Theorem states that for large enough sample sizes, the sampling distribution of the sample mean is approximately normal, regardless of the population's shape.
This is crucial because:
- It allows us to use normal distribution methods even when the population isn't normal
- It provides the mathematical foundation for confidence intervals and hypothesis tests
- It explains why the normal distribution appears so often in statistics
- Together with the standard error σ/√n, it tells us how much sample means vary and lets us attach probabilities to that variation
🚀 Next Steps
- Review any concepts that felt challenging
- Move on to the next lesson when ready
- Return to practice problems periodically for review