Data Distribution | Open Textbooks

What is Data Distribution?

Distribution Describes How Data is Spread Out

When we look at a data set, we want to understand not just the center (mean, median, mode) but also how the values are spread across the number line.

Data distribution tells us the story of our data: Where are most values? How spread out are they? Are there any unusual values? Let's explore the key features of data distribution.

Range: Measuring the Spread

The Range Formula

Range = Highest Value - Lowest Value

The range tells us how spread out the data is from end to end.

For example, if test scores in a class go from 65 to 98:

Range = 98 - 65 = 33 points
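If you want to check a range with a computer, the formula translates directly into a few lines of Python. This is a minimal sketch: the function name `data_range` is our own choice, and the data are the quiz scores from the dot plot example in this lesson.

```python
def data_range(values):
    """Return Highest Value - Lowest Value for a non-empty list of numbers."""
    if not values:
        raise ValueError("range is undefined for an empty data set")
    return max(values) - min(values)

quiz_scores = [6, 7, 7, 8, 8, 8, 8, 9, 9, 10]
print(data_range(quiz_scores))  # 4
```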

What Range Tells Us

Small range = Data values are close together (low variability)
Large range = Data values are spread far apart (high variability)
Range only uses the two extreme values: it doesn't tell us about the middle, and a single outlier can make it much larger!

Visualizing Data with Dot Plots

A dot plot (also called a line plot) shows each data value as a dot above a number line. This helps us SEE the distribution!

Example: Quiz Scores (out of 10)

Data: 6, 7, 7, 8, 8, 8, 8, 9, 9, 10 (Range = 10 - 6 = 4)

Reading Tip: Count the dots! Each dot represents one data value. Stack dots vertically when values repeat.
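A dot plot can also be sketched as text. The Python below is one possible sketch, assuming whole-number data; the function name `dot_plot` and the use of `o` for each dot are our own choices. It stacks one `o` per data value above each number on the number line, so the tallest stack marks the most common value.

```python
from collections import Counter

def dot_plot(values):
    """Build a text dot plot: one 'o' per data value, stacked above its number."""
    counts = Counter(values)              # how many times each value appears
    low, high = min(values), max(values)
    tallest = max(counts.values())
    width = len(str(high))                # width of the widest number label
    rows = []
    # Build rows from the top of the tallest stack down to the bottom row.
    for level in range(tallest, 0, -1):
        row = " ".join(
            ("o" if counts[x] >= level else " ").rjust(width)
            for x in range(low, high + 1)
        )
        rows.append(row.rstrip())
    # Number line labels under the dots (every whole number from low to high).
    rows.append(" ".join(str(x).rjust(width) for x in range(low, high + 1)))
    return "\n".join(rows)

print(dot_plot([6, 7, 7, 8, 8, 8, 8, 9, 9, 10]))
```

Counting the `o` characters gives back the 10 data values, and the stack of four above 8 shows that 8 was the most common score.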

Shapes of Data Distribution

When we look at a dot plot or histogram, we can describe the overall shape of how the data is distributed.

Symmetric

Left and right sides are mirror images. Data is balanced around the center.

Skewed Right

Most data on the left, with a "tail" stretching to the right (higher values).

Skewed Left

Most data on the right, with a "tail" stretching to the left (lower values).

Memory Trick: The direction of the "skew" is where the tail points, NOT where most of the data is!

Clusters, Gaps, and Outliers


Clusters

Groups of data points that are close together. Shows where values concentrate.


Gaps

Empty spaces in the data where no values appear. May indicate something unusual.


Outliers

Values that are far away from the rest of the data. These are unusual or extreme values.

Example: Showing Clusters, Gaps, and an Outlier

This data set has two clusters (around 20-25 and 35-40), a gap (no values between 25 and 35), and one outlier (55).

Why Distribution Features Matter

Clusters might show different groups in your data (e.g., two classes combined)
Gaps might indicate missing data or a real absence of certain values
Outliers might be errors OR genuinely unusual values worth investigating
Always ask: "What story is this distribution telling me?"

Worked Examples

Let's practice analyzing data distributions step by step.

Example 1: Finding the Range

Find the range of these test scores: 72, 85, 91, 68, 79, 88, 95, 76

Step 1: Find the highest value:
Looking at all values: 72, 85, 91, 68, 79, 88, 95, 76

Highest = 95

Step 2: Find the lowest value:

Lowest = 68

Step 3: Calculate the range:

Range = 95 - 68 = 27

The range is 27 points.

Example 2: Identifying the Shape

A class recorded how many books they read last month: 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 8, 12
Describe the shape of this distribution.

Step 1: Visualize or organize the data:
Most values are clustered at 1-5 books.
There are a few values (8, 12) that are much higher.

Step 2: Look for the peak and tail:
The peak (most common values) is on the left (lower numbers).
The tail extends to the right (toward higher numbers).

Step 3: Identify the shape:
Peak on left + tail on right = Skewed Right

The distribution is skewed right. Most students read 1-5 books, but a few read many more.
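The peak-and-tail reasoning can be roughly automated by comparing how far the data stretch below and above the median. This is a simple rule of thumb we are assuming for illustration, not a standard test: the function name `describe_shape` and the 25% `tolerance` (the fraction of the range within which the two tails count as equal) are hypothetical choices.

```python
from statistics import median

def describe_shape(values, tolerance=0.25):
    """Rough shape label from tail lengths (a rule of thumb, not a formal test).

    tolerance: fraction of the range within which the two tails are treated
    as equal (0.25 is an arbitrary, assumed cutoff).
    """
    mid = median(values)
    lower_tail = mid - min(values)   # how far the data stretch below the median
    upper_tail = max(values) - mid   # how far the data stretch above the median
    spread = max(values) - min(values)
    if abs(upper_tail - lower_tail) <= tolerance * spread:
        return "roughly symmetric"
    # The skew is named after the side with the longer tail.
    return "skewed right" if upper_tail > lower_tail else "skewed left"

books = [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 8, 12]
print(describe_shape(books))  # skewed right
```

With the book data the median is 3, the lower tail reaches only 2 books below it, and the upper tail reaches 9 books above it, so the sketch agrees with the skewed-right answer. One extreme value can dominate this check, so it is still worth looking at the dot plot.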

Example 3: Finding Clusters, Gaps, and Outliers

Ages of people at a family reunion: 5, 6, 6, 7, 8, 35, 36, 37, 38, 39, 40, 72
Identify any clusters, gaps, and outliers.

Step 1: Look for clusters (groups of close values):
Cluster 1: 5, 6, 6, 7, 8 (children, ages 5-8)
Cluster 2: 35, 36, 37, 38, 39, 40 (adults, ages 35-40)

Step 2: Look for gaps (empty spaces):
Gap from age 9 to age 34: no teenagers or young adults!

Step 3: Look for outliers (values far from the rest):
Age 72 is far from both clusters, so it is an outlier (maybe a grandparent!)

Two clusters (children 5-8, adults 35-40), one large gap (ages 9-34), and one outlier (72).
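Spotting clusters, gaps, and outliers comes down to looking for big jumps between neighboring values once the data are sorted. The sketch below assumes that any jump larger than `max_step` (here 10 years, an arbitrary choice) counts as a gap; the function name `split_into_groups` is hypothetical, and treating a group of one value as a possible outlier is a rough rule, not a definition.

```python
def split_into_groups(values, max_step):
    """Sort the data, then start a new group after every jump larger than max_step."""
    if not values:
        return []
    ordered = sorted(values)
    groups = [[ordered[0]]]
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > max_step:
            groups.append([curr])     # big jump: a gap, so begin a new group
        else:
            groups[-1].append(curr)   # close to the previous value: same group
    return groups

ages = [5, 6, 6, 7, 8, 35, 36, 37, 38, 39, 40, 72]
groups = split_into_groups(ages, max_step=10)
for group in groups:
    kind = "possible outlier" if len(group) == 1 else "cluster"
    print(kind, group)
for left, right in zip(groups, groups[1:]):
    print("gap between", left[-1], "and", right[0])
```

With these ages the sketch finds the same two clusters, the gap between 8 and 35, and 72 as a lone value. It also reports the jump from 40 to 72, which is the empty space that sets the outlier apart. Choosing `max_step` is a judgment call: a much smaller value would break the clusters apart, and a much larger one would merge everything into one group.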

Example 4: Comparing Two Data Sets

Class A quiz scores: 6, 7, 7, 8, 8, 8, 9, 9, 10 (Range = 4)
Class B quiz scores: 3, 5, 7, 8, 8, 9, 10, 10, 10 (Range = 7)
Which class had more consistent performance?

Step 1: Compare the ranges:
Class A: Range = 10 - 6 = 4
Class B: Range = 10 - 3 = 7

Step 2: Interpret:
Class A has a smaller range, meaning scores are closer together.
Class B has a larger range, meaning scores are more spread out.

Class A had more consistent performance with a range of only 4 points, compared to Class B's range of 7 points.
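Comparing consistency by range can be sketched the same way: compute each class's range and pick the smallest. The names `data_range` and `more_consistent` are our own, and in this sketch ties simply go to the class listed first.

```python
def data_range(values):
    """Highest value minus lowest value."""
    return max(values) - min(values)

def more_consistent(classes):
    """Return the name of the class whose scores have the smallest range.

    If two classes tie, the one listed first is returned.
    """
    return min(classes, key=lambda name: data_range(classes[name]))

scores = {
    "Class A": [6, 7, 7, 8, 8, 8, 9, 9, 10],
    "Class B": [3, 5, 7, 8, 8, 9, 10, 10, 10],
}
for name, values in scores.items():
    print(name, "range:", data_range(values))
print("More consistent:", more_consistent(scores))
```

Because range looks only at the two extreme values, a single unusual score can change the answer, so it is worth checking the dot plots as well.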

Practice Problems

Try these problems to practice analyzing data distributions.

Problem 1: Calculate the Range

Find the range of these temperatures (in Fahrenheit): 45, 52, 48, 61, 55, 58, 49

Range = ______

Problem 2: Identify the Shape

A data set has most values between 80 and 100, with a few values around 40-50. What shape is this distribution?

Problem 3: Identify Outliers

Heights of plants in cm: 12, 14, 13, 15, 14, 13, 14, 32, 14, 15. Which value is most likely an outlier?

Outlier = ______

Problem 4: Compare Variability

Team A ages: 22, 24, 23, 25, 24 (Range = 3)
Team B ages: 18, 22, 25, 30, 35 (Range = 17)
Which team has MORE variability in ages?


Key Takeaways

Range = Highest - Lowest measures how spread out the data is
Symmetric distributions are balanced; skewed distributions have a tail
Remember: The skew direction is where the tail points!
Clusters are groups, gaps are empty spaces, outliers are extreme values
Distribution helps us understand the full picture, not just the center

Next Steps

Practice identifying distribution shapes in real-world data
Look for patterns in graphs you see in news and science articles
This is the last lesson in the Statistics Introduction unit: review both lessons!
Continue exploring more advanced statistics concepts in the next unit