Data Distribution
Learn how to describe the way data is spread out, identify patterns, and recognize the shape of data sets.
What is Data Distribution?
Distribution Describes How Data is Spread Out
When we look at a data set, we want to understand not just the center (mean, median, mode) but also how the values are spread across the number line.
Data distribution tells us the story of our data: Where are most values? How spread out are they? Are there any unusual values? Let's explore the key features of data distribution.
Range: Measuring the Spread
The Range Formula
Range = Highest - Lowest
The range tells us how spread out the data is from end to end.
For example, if test scores in a class go from 65 to 98, the range is 98 - 65 = 33.
What Range Tells Us
- Small range = Data values are close together (low variability)
- Large range = Data values are spread far apart (high variability)
- Range only uses the two extreme values - it doesn't tell us about the middle!
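The last point can be checked with a short Python sketch. The function name data_range and both score lists are made up for illustration; only the endpoints 65 and 98 come from the test-score example.

```python
def data_range(values):
    """Return highest value minus lowest value."""
    return max(values) - min(values)

# Two hypothetical score lists with the same lowest (65) and highest (98) values.
tight_middle = [65, 80, 81, 82, 83, 98]    # middle values bunched together
spread_middle = [65, 70, 76, 85, 92, 98]   # middle values spread out

print(data_range(tight_middle))   # same range for both lists,
print(data_range(spread_middle))  # even though the middles look very different
```

Because only the two extreme values enter the calculation, both lists get the same range.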
Visualizing Data with Dot Plots
A dot plot (also called a line plot) shows each data value as a dot above a number line. This helps us SEE the distribution!
Example: Quiz Scores (out of 10)
Data: 6, 7, 7, 8, 8, 8, 8, 9, 9, 10 | Range = 10 - 6 = 4
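If you want to draw a dot plot yourself, here is one possible sketch in Python that prints the quiz scores above as a text dot plot. The function name dot_plot_rows is an assumption for this example, and the sketch only handles whole-number data.

```python
from collections import Counter

scores = [6, 7, 7, 8, 8, 8, 8, 9, 9, 10]  # quiz scores from the example

def dot_plot_rows(values):
    """Build the lines of a text dot plot for whole-number data."""
    counts = Counter(values)          # how many times each value appears
    lo, hi = min(values), max(values)
    rows = []
    # Work from the tallest stack of dots down to the bottom row.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(
            " * " if counts[v] >= level else "   "
            for v in range(lo, hi + 1)
        )
        rows.append(row.rstrip())
    # The number line goes underneath the dots.
    axis = "".join(f"{v:^3}" for v in range(lo, hi + 1))
    rows.append(axis.rstrip())
    return rows

for line in dot_plot_rows(scores):
    print(line)
```

The tallest stack of dots sits above 8, the most common score.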
Shapes of Data Distribution
When we look at a dot plot or histogram, we can describe the overall shape of how the data is distributed.
Symmetric
Left and right sides are mirror images. Data is balanced around the center.
Skewed Right
Most data on the left, with a "tail" stretching to the right (higher values).
Skewed Left
Most data on the right, with a "tail" stretching to the left (lower values).
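One rough way to check the tail direction with code is to compare how far the lowest and highest values sit from the median. This is only a rule-of-thumb sketch, not a formal test: the function name describe_shape, the tolerance setting, and the two sample lists are all assumptions made up for this example.

```python
from statistics import median

def describe_shape(values, tolerance=0.25):
    """Guess the shape by comparing the lengths of the two tails."""
    middle = median(values)
    left_tail = middle - min(values)    # distance from the center to the lowest value
    right_tail = max(values) - middle   # distance from the center to the highest value
    longest = max(left_tail, right_tail)
    # Tails of similar length (or no spread at all) count as roughly symmetric.
    if longest == 0 or abs(right_tail - left_tail) <= tolerance * longest:
        return "roughly symmetric"
    return "skewed right" if right_tail > left_tail else "skewed left"

# Hypothetical data sets for illustration.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 5, 8, 12]
long_left_tail = [40, 45, 80, 85, 90, 90, 95, 95, 100]

print(describe_shape(long_right_tail))
print(describe_shape(long_left_tail))
```

A single extreme value can fool this check, so it is still worth looking at a dot plot or histogram before deciding on the shape.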
Clusters, Gaps, and Outliers
Clusters
Groups of data points that are close together. Shows where values concentrate.
Gaps
Empty spaces in the data where no values appear. May indicate something unusual.
Outliers
Values that are far away from the rest of the data. These are unusual or extreme values.
Example: Showing Clusters, Gaps, and an Outlier
This data has two clusters (around 20-25 and 35-40), a gap (no values between 25 and 35), and one outlier (55).
Why Distribution Features Matter
- Clusters might show different groups in your data (e.g., two classes combined)
- Gaps might indicate missing data or a real absence of certain values
- Outliers might be errors OR genuinely unusual values worth investigating
- Always ask: "What story is this distribution telling me?"
Worked Examples
Let's practice analyzing data distributions step by step.
Example 1: Finding the Range
Looking at all values: 72, 85, 91, 68, 79, 88, 95, 76
The highest value is 95 and the lowest value is 68, so Range = 95 - 68 = 27.
Example 2: Identifying the Shape
Describe the shape of this distribution.
Most values are clustered at 1-5 books.
There are a few values (8, 12) that are much higher.
The peak (most common values) is on the left (lower numbers).
The tail extends to the right (toward higher numbers).
Peak on left + tail on right = Skewed Right
Example 3: Finding Clusters, Gaps, and Outliers
Identify any clusters, gaps, and outliers.
Cluster 1: 5, 6, 6, 7, 8 (children, ages 5-8)
Cluster 2: 35, 36, 37, 38, 39, 40 (adults, ages 35-40)
Gap from age 9 to age 34 - no teenagers or young adults!
Age 72 is far from both clusters - this is an outlier (maybe a grandparent!)
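The same reasoning can be sketched in Python: sort the ages, start a new group wherever two neighboring values are far apart, and treat any group with a single value as a possible outlier. The function name split_at_gaps and the cutoff min_gap=5 are assumptions chosen for this example, not a standard rule, and the list must not be empty.

```python
def split_at_gaps(values, min_gap=5):
    """Split sorted data into groups wherever neighbors differ by more than min_gap."""
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > min_gap:
            groups.append([current])     # a gap: start a new group
        else:
            groups[-1].append(current)   # still in the same group
    return groups

ages = [5, 6, 6, 7, 8, 35, 36, 37, 38, 39, 40, 72]  # ages from Example 3

groups = split_at_gaps(ages)
clusters = [g for g in groups if len(g) > 1]
possible_outliers = [g[0] for g in groups if len(g) == 1]

print("Clusters:", clusters)
print("Possible outliers:", possible_outliers)
```

A different cutoff could group the values differently, so the result is a starting point for investigation rather than a final answer.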
Example 4: Comparing Two Data Sets
Class B quiz scores: 3, 5, 7, 8, 8, 9, 10, 10, 10 (Range = 7)
Which class had more consistent performance?
Class A: Range = 10 - 6 = 4
Class B: Range = 10 - 3 = 7
Class A has a smaller range, meaning scores are closer together.
Class B has a larger range, meaning scores are more spread out.
So Class A had the more consistent performance.
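Here is a small Python sketch of the same comparison. The Class A list is an assumption: it reuses the quiz scores from the dot plot example, which run from 6 to 10 and so match the Class A range used above. The Class B scores come straight from this example.

```python
def data_range(values):
    """Return highest value minus lowest value."""
    return max(values) - min(values)

class_a = [6, 7, 7, 8, 8, 8, 8, 9, 9, 10]  # assumed: quiz scores from the dot plot example
class_b = [3, 5, 7, 8, 8, 9, 10, 10, 10]   # Class B scores from this example

ranges = {
    "Class A": data_range(class_a),
    "Class B": data_range(class_b),
}

# The class with the smallest range has the most consistent scores.
most_consistent = min(ranges, key=ranges.get)

print(ranges)
print("More consistent:", most_consistent)
```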
Practice Problems
Try these problems to practice analyzing data distributions.
Problem 1: Calculate the Range
Find the range of these temperatures (in Fahrenheit): 45, 52, 48, 61, 55, 58, 49
Problem 2: Identify the Shape
A data set has most values between 80-100, with a few values around 40-50. What shape is this distribution?
Problem 3: Identify Outliers
Heights of plants in cm: 12, 14, 13, 15, 14, 13, 14, 32, 14, 15. Which value is most likely an outlier?
Problem 4: Compare Variability
Team A ages: 22, 24, 23, 25, 24 (Range = 3)
Team B ages: 18, 22, 25, 30, 35 (Range = 17)
Which team has MORE variability in ages?
Check Your Understanding: Distribution Challenge
Test your data distribution skills with this 6-question challenge!
Next Steps
Key Takeaways:
- Range = Highest - Lowest, and it measures how spread out the data is
- Symmetric distributions are balanced; skewed distributions have a tail
- Remember: The skew direction is where the tail points!
- Clusters are groups, gaps are empty spaces, outliers are extreme values
- Distribution helps us understand the full picture, not just the center
What to Do Next:
- Practice identifying distribution shapes in real-world data
- Look for patterns in graphs you see in news and science articles
- This is the last lesson in the Statistics Introduction unit - review both lessons!
- Continue exploring more advanced statistics concepts in the next unit