Unit 2 of AP Statistics: Exploring Two-Variable Data.
This unit covers scatterplots, correlation and least-squares regression — essential concepts for AP Statistics. Use our interactive study games to test your understanding, or review questions in traditional format below.
Focus on understanding.
Focus on understanding core concepts before memorizing details. Use the game modes to test yourself repeatedly over several sessions; spaced practice of this kind generally improves long-term retention.
Key Concepts Breakdown
1 Scatterplots
Scatterplots display the relationship between two quantitative variables, with the explanatory variable on the x-axis and the response variable on the y-axis. Students must describe associations using four characteristics: direction, form, strength, and outliers. The exam frequently asks students to interpret scatterplots in context.
Key Points
- Direction: positive (both variables tend to increase together) or negative (one tends to decrease as the other increases)
- Form: linear vs. nonlinear (curved)
- Strength: how closely points follow the pattern (strong, moderate, weak)
- Always identify outliers — points that deviate from the overall pattern
A scatterplot of hours studied (x) vs. exam score (y) shows points rising from lower-left to upper-right with most points close to an imaginary line, except one point at (1 hour, 95%).
The direction is positive because exam scores tend to increase as study hours increase. The form appears linear and the strength is strong because points cluster tightly around the trend. The point at (1, 95) is an outlier because it does not follow the pattern — the student scored very high despite studying very little.
2 Correlation
The correlation coefficient r measures the direction and strength of a linear association between two quantitative variables. r is always between -1 and 1, and students must know what values of r indicate weak, moderate, or strong linear relationships. Critically, correlation does not imply causation, and r only measures linear association.
Key Points
- r close to +1 or -1 indicates a strong linear association; r near 0 indicates weak or no linear association
- r has no units and is not affected by changes in units or scale; however, it is not resistant to outliers, so a single unusual point can change r substantially
- Correlation measures only linear relationships — a perfect curve can have r ≈ 0
- Switching x and y does not change r; both variables must be quantitative
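The properties listed above can be checked numerically. The following is a minimal sketch, assuming made-up data for hours studied and exam scores; `pearson_r` is a hypothetical helper written for this illustration, not part of any AP-provided tool.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (an assumption, not from the unit's question bank)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 63, 71, 78, 85]

r = pearson_r(hours, scores)

# Switching x and y does not change r
r_swapped = pearson_r(scores, hours)

# Changing units (hours -> minutes) does not change r
r_minutes = pearson_r([h * 60 for h in hours], scores)

# A perfect curve (y = x^2 on a symmetric range) can have r of about 0
xs = [-3, -2, -1, 0, 1, 2, 3]
r_curve = pearson_r(xs, [x ** 2 for x in xs])

# r is not resistant: adding one outlier at (1 hour, 95%) lowers r sharply
r_outlier = pearson_r(hours + [1], scores + [95])

print(f"r = {r:.3f}, swapped = {r_swapped:.3f}, minutes = {r_minutes:.3f}")
print(f"curve r = {r_curve:.3f}, with outlier r = {r_outlier:.3f}")
```

With this particular data set, the single added point pulls r from a strong value down to a weak one, which is why outliers should always be identified before interpreting r.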
A dataset of 10 students has r = −0.92 between hours of sleep and reaction time. A student concludes that sleeping more causes faster reaction times. Identify the error.
The error is confusing correlation with causation. While r = −0.92 indicates a strong negative linear association (more sleep tends to go with shorter, that is faster, reaction times), an observational study cannot establish that sleep causes the change in reaction time. A lurking variable (such as overall health) could affect both sleep and reaction time.
3 Least-Squares Regression
The least-squares regression line (LSRL) minimizes the sum of squared residuals and is used to predict the response variable from the explanatory variable. Students must be able to interpret the slope and y-intercept in context, calculate and interpret residuals, and understand the role of r² (coefficient of determination). The exam heavily tests interpretation, not just calculation.
Key Points
- LSRL equation: ŷ = a + bx, where the slope is b = r(s_y/s_x) and the y-intercept is a = ȳ − bx̄ (s_x and s_y are the sample standard deviations of x and y)
- Slope interpretation: 'For each additional [one unit of x], predicted [y] increases/decreases by [b] [units of y], on average'
- Residual = observed y − predicted ŷ; a positive residual means the model underestimated, and a negative residual means it overestimated
- r² = proportion of variation in y explained by the linear relationship with x (e.g., r² = 0.81 means 81% of the variation in y is explained by the linear relationship with x)
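The formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the data are made up, and `lsrl` is a hypothetical helper that computes the slope and intercept from r, the means, and the sample standard deviations, as in the key points.

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # r is the average product of z-scores, dividing by n - 1
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b, r

# Made-up illustrative data (an assumption for this sketch)
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 63, 71, 78, 85]

a, b, r = lsrl(hours, scores)
r_squared = r ** 2

# Residual = observed y - predicted y-hat
residuals = [y - (a + b * x) for x, y in zip(hours, scores)]

print(f"y-hat = {a:.2f} + {b:.2f}x, r^2 = {r_squared:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

For a least-squares line, the residuals sum to zero (up to rounding), which is a quick check that the slope and intercept were computed correctly.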
The LSRL for predicting weight (lbs) from height (inches) is ŷ = −100 + 3.5x, with r² = 0.64. A person is 68 inches tall and weighs 145 lbs. Find and interpret the residual.
The predicted weight is ŷ = −100 + 3.5(68) = 138 lbs. The residual is 145 − 138 = +7 lbs, meaning the model underestimated this person's weight by 7 lbs. The r² = 0.64 means that 64% of the variation in weight is explained by the linear relationship with height.
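The arithmetic in this example can be reproduced as a short check. The sketch below uses only the numbers given in the problem; the function name `predict_weight` is a hypothetical label chosen for illustration.

```python
import math

def predict_weight(height_in):
    """Predicted weight (lbs) from height (inches) using y-hat = -100 + 3.5x."""
    return -100 + 3.5 * height_in

observed_weight = 145
predicted_weight = predict_weight(68)            # 138.0
residual = observed_weight - predicted_weight    # 7.0, positive: model underestimated

# r takes the sign of the slope, so with a positive slope r = +sqrt(r^2)
r_squared = 0.64
r = math.sqrt(r_squared)                         # approximately 0.8

print(f"predicted = {predicted_weight}, residual = {residual}")
```

Note that the y-intercept of −100 lbs has no meaningful interpretation here, since a height of 0 inches is far outside the range of the data.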
Questions, answered.
What is Exploring Two-Variable Data?
Exploring Two-Variable Data is Unit 2 of AP Statistics, covering scatterplots, correlation and least-squares regression.
How should I study for AP Statistics Unit 2?
Start with the Quick Summary above, review the Key Concepts, then test yourself with our interactive study games. Aim for 80%+ accuracy before moving on.
How many questions are in this unit?
This unit has 28+ review questions across 5 different game modes.