Data tells a story, but sometimes that story is hard to see when you're looking at a long list of numbers. One powerful way to reveal patterns and relationships between two different measurements is to create a visual display called a scatterplot. A scatterplot (also called a scatter diagram or scatter graph) shows how two variables relate to each other by plotting points on a coordinate grid. Each point represents one observation or individual. By looking at the overall pattern of the points, you can quickly see whether the two variables tend to increase together, move in opposite directions, or have no clear relationship at all. Scatterplots are used everywhere, from analyzing the relationship between study time and test scores to exploring connections between temperature and ice cream sales to investigating patterns in scientific data.
A scatterplot is a graph that displays the relationship between two numerical variables. One variable is plotted along the horizontal axis (the \( x \)-axis), and the other variable is plotted along the vertical axis (the \( y \)-axis). Each individual data point is represented by a single dot (or other marker) on the graph. The position of each dot is determined by its \( x \)-value and \( y \)-value.
Unlike a line graph, where points are connected in sequence, the points in a scatterplot are not connected. Instead, we look at the overall pattern formed by all the dots together. This pattern helps us understand how the two variables are related.
Example: A teacher records the number of hours six students studied for a test and their test scores.
The data is: Student A (2 hours, 65 points), Student B (3 hours, 70 points), Student C (5 hours, 85 points), Student D (1 hour, 60 points), Student E (4 hours, 80 points), Student F (6 hours, 90 points).
Create a scatterplot for this data.
Solution:
Step 1: Identify the variables. Hours studied will be the \( x \)-variable (independent), and test score will be the \( y \)-variable (dependent).
Step 2: Set up axes. The \( x \)-axis ranges from 0 to 7 hours. The \( y \)-axis ranges from 50 to 100 points.
Step 3: Plot each point using its \((x, y)\) coordinates: (2, 65), (3, 70), (5, 85), (1, 60), (4, 80), (6, 90).
Step 4: Label axes and add a title such as "Study Time vs. Test Score."
The scatterplot would show six dots positioned according to their coordinates, revealing a general upward pattern.
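The four steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a replacement for graph paper or a plotting package: the helper name `text_scatter` and the 5-point row height are assumptions made for this sketch, while the data and axis ranges come from the example.

```python
# Study data from the example: (hours studied, test score)
points = [(2, 65), (3, 70), (5, 85), (1, 60), (4, 80), (6, 90)]

def text_scatter(points, x_min, x_max, y_min, y_max, y_step):
    """Return a rough text scatterplot as a list of strings (one per row)."""
    rows = []
    # Walk the y-axis from top to bottom so larger scores appear higher up.
    for y in range(y_max, y_min - 1, -y_step):
        cells = []
        for x in range(x_min, x_max + 1):
            # A point lands in this row if its y-value falls in [y, y + y_step).
            hit = any(px == x and y <= py < y + y_step for px, py in points)
            cells.append("*" if hit else ".")
        rows.append(f"{y:>3} | " + " ".join(cells))
    # Horizontal axis line and the x-axis tick labels (hours).
    rows.append("    +" + "-" * (2 * (x_max - x_min + 1)))
    rows.append("      " + " ".join(str(x) for x in range(x_min, x_max + 1)))
    return rows

# Axis ranges from Step 2: x from 0 to 7 hours, y from 50 to 100 points.
plot = text_scatter(points, x_min=0, x_max=7, y_min=50, y_max=100, y_step=5)
print("Study Time vs. Test Score")
print("\n".join(plot))
```

Each `*` marks one student, and the marks climb from lower left to upper right, which is the general upward pattern described above.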
Creating an accurate and useful scatterplot requires careful attention to setting up your axes and plotting points precisely. Follow these steps to construct a scatterplot from a data set.
Example: The table below shows the height (in inches) and shoe size for seven people.
Height (inches): 60, 65, 68, 70, 72, 74, 66
Shoe Size: 6, 7, 9, 10, 11, 12, 8
Create a scatterplot to display this relationship.
Solution:
Step 1: Identify variables. Height is the \( x \)-variable, shoe size is the \( y \)-variable.
Step 2: Determine range. Height ranges from 60 to 74 inches. Shoe size ranges from 6 to 12.
Step 3: Set up axes. Use 55 to 75 on the \( x \)-axis (height) and 5 to 13 on the \( y \)-axis (shoe size).
Step 4: Plot seven points: (60, 6), (65, 7), (68, 9), (70, 10), (72, 11), (74, 12), (66, 8).
Step 5: Add title "Height vs. Shoe Size" and label axes as "Height (inches)" and "Shoe Size."
The completed scatterplot displays seven points showing how shoe size tends to increase as height increases.
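Steps 2 and 3 (finding the data range, then choosing axis limits slightly beyond it) can be expressed as a small calculation. The sketch below is one possible rule, not the only reasonable one: the function name `axis_limits`, the one-unit margin, and the tick steps (5 inches for height, 1 for shoe size) are assumptions chosen so that the result matches the axes used in this example.

```python
import math

heights = [60, 65, 68, 70, 72, 74, 66]
shoe_sizes = [6, 7, 9, 10, 11, 12, 8]

def axis_limits(values, step, margin=1):
    """Pick axis limits on multiples of `step` that leave at least
    `margin` units of space beyond the smallest and largest values."""
    low = math.floor((min(values) - margin) / step) * step
    high = math.ceil((max(values) + margin) / step) * step
    return low, high

x_limits = axis_limits(heights, step=5)      # height axis, ticks every 5 inches
y_limits = axis_limits(shoe_sizes, step=1)   # shoe-size axis, ticks every size

# Pair the two lists into (x, y) coordinates, as in Step 4.
points = list(zip(heights, shoe_sizes))

print("x-axis:", x_limits)  # (55, 75)
print("y-axis:", y_limits)  # (5, 13)
```

Leaving a margin keeps the smallest and largest points from sitting directly on the edge of the graph, where they are easy to miss.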
The real value of a scatterplot comes from interpreting the pattern formed by the points. When you look at a scatterplot, you're trying to understand how the two variables are related. There are several key features to look for.
The direction of a relationship describes whether the variables tend to increase together, move in opposite directions, or show no clear pattern.
The form describes the shape of the pattern formed by the points.
The strength describes how closely the points follow the pattern.
When describing a scatterplot, statisticians typically state all three features: direction, form, and strength. For example: "The scatterplot shows a strong, positive, linear association between study time and test score."
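Direction and strength can also be summarized numerically with the correlation coefficient, commonly written r. The sketch below computes r for the study-time data from the first example. The cutoffs of 0.8 and 0.5 for "strong" and "moderate" are an assumed rule of thumb, not a universal standard, and r only measures linear association, so the form still has to be judged from the plot itself.

```python
from math import sqrt

hours = [2, 3, 5, 1, 4, 6]
scores = [65, 70, 85, 60, 80, 90]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r, strong=0.8, moderate=0.5):
    """Turn r into words; the cutoffs are assumptions, not fixed rules."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    size = abs(r)
    strength = "strong" if size >= strong else "moderate" if size >= moderate else "weak"
    return f"{strength}, {direction}"

r = pearson_r(hours, scores)
summary = describe(r)
print(f"r = {r:.3f}: {summary} linear association")
# r = 0.994: strong, positive linear association
```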
Example: A scatterplot displays the relationship between the number of hours spent watching TV per week and grade point average (GPA) for 30 students.
The points slope downward from left to right and are moderately clustered around an imaginary straight line.
Describe the association shown in this scatterplot.
Solution:
Direction: The pattern slopes downward, indicating that as TV watching time increases, GPA tends to decrease. This is a negative association.
Form: The points roughly follow a straight line, so this is a linear association.
Strength: The points are moderately clustered, so this is a moderate association.
The scatterplot shows a moderate, negative, linear association between TV watching time and GPA.
An outlier is a data point that stands apart from the overall pattern of the scatterplot. It is a point that doesn't fit with the rest of the data. Outliers are important to identify because they can significantly affect the interpretation of the relationship and any calculations based on the data.
When looking at a scatterplot, ask yourself: "Are there any points that are far away from where most of the other points are clustered?" An outlier might be:
Outliers can occur for several reasons, including measurement or recording errors and genuinely unusual individuals.
It's important to investigate outliers to determine their cause. If an outlier is due to an error, it should be corrected or removed. If it's a genuine unusual case, it should be kept in the data but noted in your analysis. Outliers can sometimes reveal interesting and important information that deserves special attention.
Example: A scatterplot shows the relationship between age and reaction time for 25 people.
Most points show a positive linear pattern, with reaction time increasing gradually as age increases.
However, one point shows a 30-year-old person with an unusually slow reaction time, far above the general pattern.
Is this point an outlier, and why might it exist?
Solution:
Yes, this point is an outlier because it is far from the pattern formed by the other 24 points.
Possible explanations could include: the person was distracted during the test, they misunderstood the instructions, the reaction time was recorded incorrectly, or they might have a medical condition affecting reaction time.
The researcher should investigate this data point to determine whether it represents a recording error or a genuine unusual case.
This point is an outlier that stands apart from the positive linear pattern shown by the other data points.
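Spotting an outlier by eye can be backed up with a simple check: fit a straight line to the points and flag any point that sits unusually far from it. The sketch below is one heuristic among many, not a standard definition of an outlier. The data values are hypothetical (seven invented age and reaction-time pairs, not the 25 people in the example), and the cutoff of 2 times the typical distance from the line is an assumption.

```python
from math import sqrt

# Hypothetical (age in years, reaction time in ms) pairs, invented for illustration.
data = [(20, 240), (30, 260), (40, 280), (50, 300), (60, 320), (70, 340), (30, 400)]

def fit_line(points):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def flag_outliers(points, k=2.0):
    """Return points whose vertical distance from the fitted line exceeds
    k times the root-mean-square distance (k is an assumed cutoff)."""
    slope, intercept = fit_line(points)
    residuals = [y - (slope * x + intercept) for x, y in points]
    spread = sqrt(sum(res ** 2 for res in residuals) / len(residuals))
    return [p for p, res in zip(points, residuals) if abs(res) > k * spread]

outliers = flag_outliers(data)
print(outliers)  # [(30, 400)]
```

A flagged point is only a candidate. Whether it is an error or a genuine unusual case still has to be investigated, and because the outlier itself pulls the fitted line toward it, this check can miss outliers in small data sets.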
One of the most important principles in statistics is understanding the difference between correlation and causation. A scatterplot can show that two variables are related (correlated), but this does not automatically mean that one variable causes the other to change.
Correlation means that two variables tend to change together in a predictable way. When you see a clear pattern in a scatterplot, the variables are correlated. The correlation might be positive (both increase together) or negative (one increases while the other decreases).
Causation means that changes in one variable directly cause changes in the other variable. Establishing causation requires much more evidence than just observing a pattern in a scatterplot. To establish causation, you typically need a carefully designed experiment that controls other variables that might affect the relationship.
Just because two variables are correlated doesn't mean one causes the other. There are several possible explanations for a correlation:
Consider this example: A study finds a strong positive correlation between ice cream sales and drowning deaths. Does eating ice cream cause people to drown? No! A confounding variable, a third factor that influences both measurements, is at work here: the temperature (and with it, the season). During hot summer months, both ice cream sales and swimming activity increase, leading to more drowning incidents. The two variables are correlated but not causally related.
When interpreting scatterplots, always be cautious about claiming causation. Use phrases like "is associated with" or "is related to" rather than "causes" unless you have strong experimental evidence to support a causal claim.
Example: A scatterplot shows a strong positive linear association between the number of fire trucks that respond to a fire and the amount of damage caused by the fire.
Does this mean that fire trucks cause more fire damage?
Solution:
No, this would be an incorrect interpretation. While the scatterplot shows correlation, it does not show causation.
The confounding variable is the size or severity of the fire. Larger fires cause more damage AND require more fire trucks to respond.
The number of fire trucks doesn't cause the damage; rather, the severity of the fire causes both more trucks to respond and more damage to occur.
This scatterplot shows correlation, not causation. The fire severity is the confounding variable that explains both the truck response and the damage.
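A small simulation can make the role of a confounding variable concrete. The sketch below uses entirely simulated numbers, not real fire data: the severity scale of 1 to 5, the noise levels, and the variable names are all assumptions. Truck response and damage are each generated from severity alone, so any correlation between them comes from the confounder.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def center_within_groups(values, groups):
    """Subtract each group's mean so only within-group variation remains."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]

random.seed(0)  # fixed seed so the simulation is repeatable

# Simulated fires: severity drives BOTH variables; neither drives the other.
severity = [random.randint(1, 5) for _ in range(500)]
trucks = [s + random.gauss(0, 0.5) for s in severity]       # simplified truck response
damage = [10 * s + random.gauss(0, 5) for s in severity]    # simplified damage amount

# Correlation across all fires, ignoring severity.
r_overall = pearson_r(trucks, damage)

# Correlation after comparing only fires of the same severity.
r_within = pearson_r(center_within_groups(trucks, severity),
                     center_within_groups(damage, severity))

print(f"all fires together:     r = {r_overall:.2f}")
print(f"within severity levels: r = {r_within:.2f}")
```

Across all fires the correlation is strong, but among fires of the same severity it nearly disappears. That is the pattern you would expect if severity, rather than the trucks, explains the damage.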
Scatterplots are used across many fields to explore and display relationships between variables. Understanding how to create and interpret them is a fundamental skill in data analysis.
When researchers collect data on two variables, creating a scatterplot is often the first analytical step. The visual display quickly reveals the direction, form, and strength of the relationship, as well as any outliers.
After examining the scatterplot, researchers can decide what additional statistical methods to apply, such as calculating a correlation coefficient or fitting a regression line. But the scatterplot usually comes first because it provides visual insight that summary numbers alone cannot convey.
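As an illustration of that follow-up step, the sketch below fits a least-squares regression line to the study-time data from the first example. It assumes the scatterplot has already shown a roughly linear pattern. The function name `least_squares_line` and the 4.5-hour prediction are illustrative choices, not part of the original example.

```python
hours = [2, 3, 5, 1, 4, 6]
scores = [65, 70, 85, 60, 80, 90]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(hours, scores)
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
# score = 6.29 * hours + 53.00

# Predicted score for a hypothetical student who studies 4.5 hours.
predicted = slope * 4.5 + intercept
print(f"predicted score for 4.5 hours: {predicted:.1f}")
```

The slope says that, in this small data set, each extra hour of study is associated with about 6.3 more points. That is a statement about association, not proof that studying causes the increase.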
When working with scatterplots, be aware of these common errors:
Scatterplots are powerful tools for visualizing and understanding relationships between two numerical variables. By plotting each observation as a point on a coordinate grid, we can quickly see patterns that would be hidden in a table of numbers. When describing a scatterplot, we examine its direction (positive, negative, or none), form (linear or nonlinear), and strength (strong, moderate, or weak). We also look for outliers, which are points that don't fit the overall pattern.
While scatterplots can reveal correlation between variables, we must be careful not to jump to conclusions about causation. A strong association in a scatterplot tells us that two variables are related, but determining whether one actually causes the other requires additional investigation and carefully controlled experiments.
Mastering scatterplots gives you a fundamental skill for data analysis that applies across all fields where quantitative data is collected. Whether you're analyzing scientific experiments, business trends, or social patterns, the scatterplot provides an essential first look at how variables relate to each other.