Title at the top reads Simpson's Paradox. On the left, a hand-drawn scatterplot shows several separate clusters that each slope upward, while the clusters themselves step downward from top-left to bottom-right—so subgroup trends oppose the overall trend. On the right, text reads: The direction of the trend would change if you zoomed in on a section of this graph. Website critikid.com appears in the bottom corner.

Simpson's Paradox

Simpson's Paradox is a situation in statistics where a trend appears in different groups of data but reverses or disappears when the groups are combined.

Imagine two tutoring centers, Center A and Center B, are helping students pass an exam. Overall, Center B has a higher pass rate than Center A:

Center | # Students | # Passed | Pass Rate
A      | 100        | 43       | 43%
B      | 100        | 62       | 62%

At first glance, Center B seems to be more successful. But let's divide the students into two groups: those taking the exam for the first time and those retaking it. Among first-timers, Center A has the higher pass rate. Among repeat takers, Center A also has the higher pass rate.

Group             | Center | # Students | # Passed | Pass Rate
First-time takers | A      | 80         | 28       | 35%
First-time takers | B      | 20         | 6        | 30%
Repeat takers     | A      | 20         | 15       | 75%
Repeat takers     | B      | 80         | 56       | 70%
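
The counts in the two tables can be checked with a short script. This is a minimal Python sketch, not part of the original lesson. The dictionary layout and the names `counts`, `pass_rate`, and `overall` are illustrative choices.

```python
# Counts from the tables above: (students, passed) per group and center.
# The data layout and all names here are assumptions made for illustration.
counts = {
    "first-time": {"A": (80, 28), "B": (20, 6)},
    "repeat": {"A": (20, 15), "B": (80, 56)},
}


def pass_rate(students, passed):
    """Return the pass rate as a fraction between 0 and 1."""
    return passed / students


def overall(center):
    """Pool both groups for one center and return (students, passed)."""
    students = sum(counts[group][center][0] for group in counts)
    passed = sum(counts[group][center][1] for group in counts)
    return students, passed


# Pass rate for each center within each group.
for group, centers in counts.items():
    for center, (students, passed) in centers.items():
        print(f"{group:<10} {center}: {pass_rate(students, passed):.0%}")

# Pass rate for each center with the groups combined.
for center in ("A", "B"):
    print(f"overall    {center}: {pass_rate(*overall(center)):.0%}")
```

Pooling the groups gives 43 of 100 for Center A and 62 of 100 for Center B, so the overall comparison favors B even though A is ahead in both subgroups.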

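The reason for this apparent contradiction is that the two centers serve very different mixes of students: 80% of Center A's students are first-time test takers, compared with 20% of Center B's. Since first-time test takers are less likely to pass at either center, Center A's overall pass rate is pulled down even though it does better within each group.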
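
Put another way, each center's overall pass rate is a weighted average of its subgroup pass rates, weighted by the share of its students in each subgroup. Here is a small Python sketch of that arithmetic. The function name `weighted_rate` and the list layout are chosen for illustration.

```python
def weighted_rate(shares_and_rates):
    """Combine subgroup pass rates, weighting each by its share of students."""
    return sum(share * rate for share, rate in shares_and_rates)


# (share of the center's students, subgroup pass rate), taken from the tables.
# Order within each list: first-time takers, then repeat takers.
center_a = [(0.80, 0.35), (0.20, 0.75)]  # mostly first-time takers
center_b = [(0.20, 0.30), (0.80, 0.70)]  # mostly repeat takers

print(f"Center A overall: {weighted_rate(center_a):.0%}")
print(f"Center B overall: {weighted_rate(center_b):.0%}")
```

Center A leads by five percentage points within each group, but 80% of its weight sits on the lower first-time rate, which drags its overall figure below Center B's.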

Simpson's Paradox teaches us to check subgroups within the data for hidden variables, such as first-time versus repeat status in this example, before trusting an overall trend.

To see a real-life example of Simpson's Paradox, read my blog post about The Kidney Conundrum.


