Grade: Grade 11 | Subject: Mathematics | Unit: Advanced Data Analysis | SAT: ProblemSolving+DataAnalysis | ACT: Math

Word Problems: Data Analysis

Overview

This lesson presents real-world scenarios that require you to apply data analysis concepts. These problems mirror the types of questions you will encounter on the SAT and ACT, where data is presented in context and you must determine the appropriate analysis technique.

Problem-Solving Strategy

  1. Read carefully: Identify what the problem is asking for
  2. Identify the problem type: Is this about correlation, regression, or inference?
  3. Extract key information: Find the numbers and relationships given
  4. Choose the right tool: Select the appropriate formula or method
  5. Interpret your answer: Make sure your answer makes sense in context

Worked Examples

Example 1: Sales Analysis

Problem: A marketing team tracked advertising spending (in thousands of dollars) and monthly sales (in thousands of dollars) for 12 months. They found a correlation of r = 0.89 and developed the regression equation: Sales = 45 + 3.2(Advertising). If they spend $15,000 on advertising, what sales can they predict?

Solution:

  1. Identify: This is a prediction problem using regression.
  2. Convert units: the equation uses thousands of dollars, so $15,000 means Advertising = 15.
  3. Substitute: Sales = 45 + 3.2(15) = 45 + 48 = 93
  4. Answer: They can predict $93,000 in sales.
  5. Note: r = 0.89 indicates a strong positive linear relationship, so the prediction is reasonably reliable, provided $15,000 falls within the range of advertising spending the team actually observed.
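
The substitution in Example 1 can also be sketched in Python. This is a minimal illustration rather than part of the original lesson: the function name predict_sales is hypothetical, and the intercept (45) and slope (3.2) are taken from the regression equation in the problem.

```python
def predict_sales(advertising_thousands, intercept=45.0, slope=3.2):
    """Predicted monthly sales (in thousands of dollars) from the
    regression line Sales = 45 + 3.2(Advertising)."""
    return intercept + slope * advertising_thousands


# $15,000 of advertising is 15 in the model's units (thousands of dollars).
predicted = predict_sales(15)

# Convert back from thousands of dollars to dollars for the final answer.
print(f"Predicted sales: ${predicted * 1000:,.0f}")  # Predicted sales: $93,000
```

Keeping the input in thousands mirrors the unit conversion in the solution; passing 15000 instead of 15 would give a wildly inflated prediction.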

Example 2: Survey Analysis

Problem: A news organization surveyed 1,200 registered voters and found that 54% approve of a new policy. The margin of error is plus or minus 2.8%. A politician claims that "a majority definitely support this policy." Is this claim justified?

Solution:

  1. Calculate the confidence interval: 54% plus or minus 2.8%
  2. Lower bound: 54% - 2.8% = 51.2%
  3. Upper bound: 54% + 2.8% = 56.8%
  4. Check whether the interval lies entirely above 50%: Yes, since 51.2% > 50%
  5. Answer: The data support the claim of majority support, because the entire confidence interval lies above the 50% threshold. The word "definitely" overstates it, though: a confidence interval expresses a level of confidence (commonly 95%), not certainty.
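
The interval check in Example 2 follows a pattern that recurs in the practice problems, so a short Python sketch may help. The helper names confidence_interval and entirely_above are hypothetical, chosen only for this illustration; all values are in percentage points.

```python
def confidence_interval(estimate_pct, margin_pct):
    """Return (lower, upper) bounds in percentage points."""
    return estimate_pct - margin_pct, estimate_pct + margin_pct


def entirely_above(interval, threshold_pct):
    """True when even the lower bound exceeds the threshold."""
    lower, _upper = interval
    return lower > threshold_pct


# Example 2: 54% approval with a margin of error of 2.8 points.
interval = confidence_interval(54.0, 2.8)
print(f"Interval: {interval[0]:.1f}% to {interval[1]:.1f}%")
print("Entirely above 50%?", entirely_above(interval, 50.0))  # True
```

Comparing only the lower bound to the threshold is the key move: if the estimate were 51% with the same margin, the lower bound would drop to 48.2% and the check would fail.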

Practice Problems

Problem 1: College Admissions

A university admissions office found that the correlation between SAT scores and first-year GPA is r = 0.68. The regression equation is GPA = 0.8 + 0.002(SAT). What GPA would you predict for a student with an SAT score of 1200? What percentage of the variation in GPA is explained by SAT scores?

Problem 2: Medical Study

Researchers studying a new medication surveyed 500 patients and found that 72% reported improvement. With a margin of error of 4%, construct the confidence interval. Can the researchers claim that at least two-thirds of patients experience improvement?

Problem 3: Environmental Science

A scientist studying climate data found r = 0.76 between CO2 levels and average temperature over 50 years. If the regression slope is 0.015 degrees per ppm of CO2, and the y-intercept is 12 degrees, predict the temperature when CO2 is at 420 ppm.

Problem 4: Sports Analytics

A basketball analyst found that a team's winning percentage has a correlation of r = 0.82 with their average points per game. If the regression equation is WinPct = -0.45 + 0.012(PPG), what winning percentage would you predict for a team averaging 110 points per game?

Problem 5: Economics

An economist models housing prices using the equation: Price = 50000 + 125(SquareFeet). A house with 2,000 square feet sold for $310,000. Calculate the residual and interpret what it means about this particular house.

Problem 6: Education Research

A study of 800 students found that 45% prefer online learning. The margin of error is 3.5%. A school board member says "Less than half of students prefer online learning." Is this statement supported by the data?

Problem 7: Retail Analysis

A store found r = -0.71 between the price of a product and the number of units sold. Interpret this correlation. If they want to maximize revenue (Price times Units), should they simply lower prices? Explain your reasoning.

Problem 8: Health Study

A nutritionist collected data on daily sugar intake (grams) and body weight. The correlation was r = 0.55. Calculate the coefficient of determination and explain what it suggests about factors other than sugar intake.

Problem 9: Manufacturing

A factory's regression equation for predicting production costs is: Cost = 5000 + 12(Units). They produced 450 units last month at an actual cost of $10,200. Calculate the residual. Was their production more or less efficient than the model predicted?

Problem 10: Political Polling

Three polls show: Poll A: 48% (MOE 3%), Poll B: 52% (MOE 4%), Poll C: 50% (MOE 2.5%). Construct confidence intervals for each. Do any of the intervals fail to overlap with the others? What does this suggest?

Answer Key

1. GPA = 0.8 + 0.002(1200) = 0.8 + 2.4 = 3.2. r squared = (0.68)(0.68) = 0.4624, so about 46% of the variation in GPA is explained by SAT scores.
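
The "percentage of variation explained" is the square of the correlation when the model is a line with a single predictor. A minimal Python sketch of that step follows; the function name coefficient_of_determination is hypothetical and used only for illustration.

```python
def coefficient_of_determination(r):
    """Square the correlation r to get the share of variation explained
    by a one-predictor linear model."""
    return r ** 2


r_squared = coefficient_of_determination(0.68)
print(f"r squared = {r_squared:.4f}")         # r squared = 0.4624
print(f"About {r_squared:.0%} of variation")  # About 46% of variation
```

Because squaring removes the sign, a negative correlation such as r = -0.71 still gives a positive r squared (0.5041).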

2. Confidence interval: 72% plus or minus 4% = (68%, 76%). Yes, since the lower bound of 68% exceeds 66.67% (two-thirds), the claim is justified.

3. Temperature = 12 + 0.015(420) = 12 + 6.3 = 18.3 degrees

4. WinPct = -0.45 + 0.012(110) = -0.45 + 1.32 = 0.87, or a winning percentage of 87%

5. Predicted = 50000 + 125(2000) = $300,000. Residual = 310000 - 300000 = $10,000. This house sold for $10,000 more than predicted, suggesting it has features that add value beyond just square footage.
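
A residual is always actual minus predicted, and the sign tells you which side of the line the observation falls on. The sketch below works through the housing numbers in Python; the function names predicted_price and residual are hypothetical, while the intercept and slope come from the equation in Problem 5.

```python
def predicted_price(square_feet, intercept=50_000, slope=125):
    """Price predicted by the model Price = 50000 + 125(SquareFeet)."""
    return intercept + slope * square_feet


def residual(actual, predicted):
    """Residual = actual value minus predicted value."""
    return actual - predicted


predicted = predicted_price(2_000)
house_residual = residual(310_000, predicted)

print(predicted)       # 300000
print(house_residual)  # 10000 (positive: sold above the model's prediction)
```

The same residual function applies to any linear model. For the cost model in Problem 9, residual(10_200, 5_000 + 12 * 450) returns -200.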

6. Confidence interval: 45% plus or minus 3.5% = (41.5%, 48.5%). Since the entire interval is below 50%, the statement is supported by the data.

7. The correlation of r = -0.71 indicates a moderately strong negative linear relationship: higher prices are associated with fewer units sold. However, maximizing revenue does not mean minimizing price; there is an optimal price point. Lowering prices usually increases units sold, but total revenue falls if the percentage increase in units is smaller than the percentage decrease in price.
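
To see why the lowest price is not automatically the best, it can help to tabulate revenue for a made-up demand curve. Everything in this Python sketch is an assumption for illustration: the problem gives only the correlation, so the demand rule Units = 1000 - 20(Price) and the list of prices are invented, not derived from the store's data.

```python
def units_sold(price):
    # HYPOTHETICAL demand curve, invented for illustration only:
    # each $1 increase in price loses 20 units of sales.
    return 1000 - 20 * price


def revenue(price):
    """Revenue = Price times Units."""
    return price * units_sold(price)


prices = range(10, 45, 5)  # candidate prices: 10, 15, ..., 40

for price in prices:
    print(f"price ${price}: {units_sold(price)} units, revenue ${revenue(price):,}")

# Pick the candidate price with the largest revenue.
best_price = max(prices, key=revenue)
print("Best price among candidates:", best_price)  # 25
```

Under this assumed curve, dropping the price from $25 to $10 raises units from 500 to 800 but cuts revenue from $12,500 to $8,000, which is the trade-off the answer describes.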

8. r squared = (0.55)(0.55) = 0.3025, meaning only about 30% of the variation in body weight is explained by sugar intake. The remaining 70% or so is associated with other factors, such as genetics, overall calorie intake, exercise, and metabolism.

9. Predicted = 5000 + 12(450) = $10,400. Residual = 10200 - 10400 = -$200. The negative residual means actual costs were $200 less than predicted, indicating more efficient production than expected.

10. Poll A: (45%, 51%), Poll B: (48%, 56%), Poll C: (47.5%, 52.5%). Every interval overlaps with each of the others, suggesting the polls are consistent. The region common to all three intervals is 48% to 51%, which is a plausible range for the true proportion.
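
Checking three intervals against each other by eye is error-prone, so here is a minimal Python sketch of the comparison. The helper names make_interval and overlaps are hypothetical; the poll estimates and margins of error are the ones given in Problem 10.

```python
from itertools import combinations

# (estimate, margin of error) in percentage points, from Problem 10.
polls = {"A": (48.0, 3.0), "B": (52.0, 4.0), "C": (50.0, 2.5)}


def make_interval(estimate, margin):
    """Return (lower, upper) bounds for one poll."""
    return estimate - margin, estimate + margin


def overlaps(first, second):
    """Two intervals overlap when each one starts before the other ends."""
    return first[0] <= second[1] and second[0] <= first[1]


intervals = {name: make_interval(*values) for name, values in polls.items()}

# Compare every pair of polls: A-B, A-C, B-C.
all_overlap = all(
    overlaps(intervals[a], intervals[b]) for a, b in combinations(intervals, 2)
)

# The region shared by all three runs from the largest lower bound
# to the smallest upper bound.
common = (
    max(lower for lower, _ in intervals.values()),
    min(upper for _, upper in intervals.values()),
)

print(intervals)    # {'A': (45.0, 51.0), 'B': (48.0, 56.0), 'C': (47.5, 52.5)}
print(all_overlap)  # True
print(common)       # (48.0, 51.0)
```

If the largest lower bound ever exceeded the smallest upper bound, there would be no region common to all polls, even if some pairs still overlapped.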

Next Steps

  • Practice reading problems carefully to identify the type of analysis needed
  • Work on speed: these skills will be tested under time pressure on standardized tests
  • Move on to Common Mistakes to learn what pitfalls to avoid
  • Return to these problems for review before tests