Unit Quiz: Advanced Data Analysis
Quiz Instructions
This comprehensive quiz tests your understanding of all concepts covered in the Advanced Data Analysis unit. Complete all 12 questions, then check your answers using the answer key at the bottom of the page.
Topics Covered
- Correlation and correlation coefficients
- Linear regression and predictions
- Residuals and their interpretation
- Coefficient of determination (r-squared)
- Statistical inference and confidence intervals
- Common mistakes and misinterpretations
Suggested Time: 25-30 minutes
Quiz Questions
Question 1
A researcher finds r = 0.72 between hours of exercise per week and cardiovascular health scores for 200 adults. Calculate the coefficient of determination and interpret it in context.
Question 2
The regression equation for predicting test scores (y) from study hours (x) is y = 45 + 6x. A student who studied 7 hours scored 82 on the test. Calculate the residual and explain what it indicates about this student's performance.
Question 3
A survey of 900 likely voters found that 48% support Candidate A. The margin of error is 3.2%. Construct the 95% confidence interval and determine whether the data supports the claim that Candidate A has majority support.
Question 4
Which of the following correlations indicates the strongest relationship?
A) r = 0.65
B) r = -0.89
C) r = 0.72
D) r = -0.55
Question 5
A regression model was built using data where x ranged from 20 to 80. A student uses the model to predict y when x = 150. What concern should be raised about this prediction?
Question 6
The correlation between a city's population and the number of hospitals is r = 0.94. Which statement correctly interprets this finding?
A) Building more hospitals causes population growth
B) There is a strong positive association between population and number of hospitals
C) Larger populations cause cities to build more hospitals
D) 94% of hospitals are located in large cities
Question 7
A company models sales revenue (in thousands) using the equation: Revenue = 120 + 8.5(Advertising), where advertising is in thousands of dollars. If they spend $25,000 on advertising, what is the predicted revenue? Show your work.
Question 8
Three polls show different results:
- Poll A: 52% (MOE: 4%)
- Poll B: 47% (MOE: 3%)
- Poll C: 49% (MOE: 2%)
Calculate the confidence interval for each poll. Is there a range of values that falls inside all three intervals? What does this tell you about whether the polls are consistent with one another?
Question 9
A student claims: "Since r-squared = 0.81, the correlation must be r = 0.81." Identify and correct the error in this reasoning.
Question 10
Data shows a strong positive correlation between the number of firefighters at a scene and the dollar amount of fire damage. A journalist writes: "Sending more firefighters leads to more damage." Explain why this conclusion is flawed and identify the likely lurking variable.
Question 11
The regression equation for predicting height (in inches) from shoe size is: Height = 50 + 1.8(ShoeSize). Interpret the slope and y-intercept in context. Is the y-intercept meaningful in this situation?
Question 12
A researcher reports a 95% confidence interval of (0.38, 0.46) for the proportion of adults who exercise regularly.
a) What is the sample proportion?
b) What is the margin of error?
c) A fitness company claims that "at least half of adults exercise regularly." Does this confidence interval support or refute this claim? Explain.
Answer Key
1. r-squared = (0.72)^2 = 0.5184, or 51.84%. This means about 52% of the variation in cardiovascular health scores can be explained by the linear relationship with hours of exercise per week. The remaining 48% or so is not explained by the model and reflects other factors or random variation.
2. Predicted score = 45 + 6(7) = 45 + 42 = 87. Residual = Actual - Predicted = 82 - 87 = -5. The negative residual indicates the student scored 5 points below what the model predicted for someone who studied 7 hours.
3. Confidence interval: 48% plus or minus 3.2% = (44.8%, 51.2%). The interval contains values both below and above 50%, so the data does NOT support the claim that Candidate A has majority support. The true proportion could plausibly be anywhere from 44.8% to 51.2%, so a majority is possible but not established.
4. B) r = -0.89. Strength is determined by the absolute value. |-0.89| = 0.89 is the largest absolute value among the choices, indicating the strongest relationship (even though it is negative).
5. This is extrapolation beyond the range of the data. The model was built using x values from 20 to 80, so predicting at x = 150 is unreliable. The linear relationship may not hold that far outside the observed range, and the prediction could be very inaccurate.
6. B) There is a strong positive association between population and number of hospitals. This is the only answer that describes the correlation without implying causation. Options A and C assert a cause-and-effect relationship, which a correlation alone cannot establish, and D misreads r = 0.94 as a percentage of hospitals.
7. Since advertising is in thousands, use x = 25: Revenue = 120 + 8.5(25) = 120 + 212.5 = 332.5. The predicted revenue is $332,500 (since revenue is also in thousands).
8. Poll A: (48%, 56%), Poll B: (44%, 50%), Poll C: (47%, 51%). Checking the pairs: Polls A and B overlap from 48% to 50%, Polls A and C overlap from 48% to 51%, and Polls B and C overlap from 47% to 50%. The range common to all three runs from the largest lower bound (48%) to the smallest upper bound (50%). Because a single true value between 48% and 50% would fall inside every interval, the polls are reasonably consistent with each other.
9. The error is confusing r and r-squared. If r-squared = 0.81, then r = plus or minus the square root of 0.81 = plus or minus 0.9. The correlation is 0.9 (or -0.9), not 0.81. The sign of r matches the direction of the relationship (the sign of the regression slope), so without that additional context we cannot determine the sign.
10. The journalist is making a correlation-causation error. The lurking variable is fire size/severity. Larger fires require more firefighters AND cause more damage. The firefighters are responding to damage that is already occurring or inevitable; they do not cause the damage. This is a classic example of confounding.
11. Slope: For each one-unit increase in shoe size, predicted height increases by 1.8 inches. Y-intercept: When shoe size is 0, predicted height is 50 inches. The y-intercept is NOT meaningful in this context because a shoe size of 0 is outside any realistic range and the relationship would not extend that far.
12.
a) Sample proportion = (0.38 + 0.46) / 2 = 0.42 or 42%
b) Margin of error = (0.46 - 0.38) / 2 = 0.04 or 4%
c) The claim is refuted. The entire confidence interval (38% to 46%) is below 50%, so we are 95% confident that the true proportion of adults who exercise regularly is less than half. The company's claim is not supported by this data.
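The arithmetic in answers 1, 2, 3, 7, 8, 9, and 12 can be double-checked with a short script. The following is a minimal Python sketch, not part of the original quiz; the function and variable names are illustrative assumptions, and percentages are entered as plain numbers (48 means 48%).

```python
import math


def predict_score(hours):
    """Question 2 model: y = 45 + 6x."""
    return 45 + 6 * hours


def predict_revenue(advertising_thousands):
    """Question 7 model: Revenue = 120 + 8.5(Advertising), both in thousands."""
    return 120 + 8.5 * advertising_thousands


def interval(estimate, margin):
    """Return (lower, upper) for estimate plus or minus margin."""
    return (estimate - margin, estimate + margin)


def common_overlap(intervals):
    """Return the range shared by every interval, or None if there is none."""
    low = max(lower for lower, _ in intervals)    # largest lower bound
    high = min(upper for _, upper in intervals)   # smallest upper bound
    return (low, high) if low <= high else None


def center_and_margin(lower, upper):
    """Recover the point estimate and margin of error from an interval."""
    return ((lower + upper) / 2, (upper - lower) / 2)


# Answer 1: coefficient of determination from r = 0.72
r_squared = 0.72 ** 2

# Answer 2: residual = actual - predicted
residual = 82 - predict_score(7)

# Answer 3: 48% plus or minus 3.2%
candidate_interval = interval(48, 3.2)

# Answer 7: $25,000 of advertising is x = 25 (thousands)
revenue_thousands = predict_revenue(25)

# Answer 8: each poll's interval, then the range common to all three
polls = [interval(52, 4), interval(47, 3), interval(49, 2)]
shared_range = common_overlap(polls)

# Answer 9: magnitude of r when r-squared = 0.81 (the sign is not determined)
r_magnitude = math.sqrt(0.81)

# Answer 12: sample proportion and margin of error from (0.38, 0.46)
proportion, margin = center_and_margin(0.38, 0.46)

print("1. r-squared:", round(r_squared, 4))
print("2. residual:", residual)
print("3. interval:", tuple(round(v, 1) for v in candidate_interval))
print("7. revenue (thousands):", revenue_thousands)
print("8. intervals:", polls, "shared range:", shared_range)
print("9. |r|:", round(r_magnitude, 2))
print("12. proportion:", round(proportion, 2), "margin:", round(margin, 2))
```

Values are rounded before printing because decimal inputs such as 0.72 and 3.2 are stored as binary floating-point approximations. The script only confirms the computations; the interpretations in the answer key still have to be reasoned out in context.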
Next Steps
- Score your quiz: 11-12 correct = Excellent, 9-10 = Good, 7-8 = Review needed, Below 7 = Return to earlier lessons
- Review any questions you missed by returning to the relevant lesson
- If you scored well, you are ready to move on to the next unit
- Consider retaking this quiz before standardized tests as a refresher
Congratulations!
You have completed the Advanced Data Analysis unit. These skills are essential for success on the SAT and ACT Problem Solving and Data Analysis sections. Continue practicing with real test questions to build speed and confidence.