When you collect data and plot it on a graph, you often notice that the points don't form a perfect line or curve, but they do seem to follow a general pattern or direction. A trend line, also called a line of best fit, is a straight line drawn through or near the data points on a scatter plot to represent the overall direction or pattern of the data. Trend lines help us see relationships between two variables, make predictions about values we haven't observed yet, and understand whether one variable tends to increase or decrease as the other changes. In this chapter, you'll learn what trend lines are, how to identify them, how to draw them, and how to use them to make predictions and interpret real-world data.
Before we can talk about trend lines, we need to understand scatter plots. A scatter plot is a graph that shows the relationship between two numerical variables. Each point on a scatter plot represents one observation or data pair. The independent variable (the one we think might be causing a change) is usually placed on the horizontal axis (the x-axis), and the dependent variable (the one that responds to changes) is placed on the vertical axis (the y-axis).
Think of a scatter plot like a star map. Each star has a position based on two coordinates, and when you look at all the stars together, you might notice patterns or clusters.
Scatter plots help us answer questions like:
Example: A teacher records the number of hours students studied and their test scores.
The data is: (1, 65), (2, 70), (3, 75), (4, 80), (5, 85), where the first number is hours studied and the second is the test score.
How would you create a scatter plot for this data?
Solution:
Draw a coordinate plane with the horizontal axis labeled "Hours Studied" and the vertical axis labeled "Test Score."
Plot each data pair as a point: the point (1, 65) means 1 hour studied corresponds to a score of 65.
Mark all five points: (1, 65), (2, 70), (3, 75), (4, 80), and (5, 85).
The scatter plot shows five points that appear to follow an upward pattern from left to right.
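If you'd like to see the same picture on a computer without any graphing software, here is a minimal sketch in Python that prints a rough text version of the scatter plot. The function name `text_scatter` and the parameter `y_step` are made up for this illustration, and the sketch assumes whole-number coordinates whose y-values land exactly on the grid rows.

```python
# Study data from the example: (hours studied, test score).
points = [(1, 65), (2, 70), (3, 75), (4, 80), (5, 85)]

def text_scatter(points, y_step=5):
    """Return a rough text scatter plot with one row per y_step units of y.

    Assumption for this sketch: integer coordinates, and every y-value
    falls exactly on one of the printed rows.
    """
    marked = set(points)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    # Print the largest y-value first so the plot reads top to bottom.
    for y in range(max(ys), min(ys) - 1, -y_step):
        cells = "".join("*" if (x, y) in marked else " "
                        for x in range(min(xs), max(xs) + 1))
        rows.append(f"{y:>3} |{cells}".rstrip())
    return "\n".join(rows)

print(text_scatter(points))
```

Each `*` is one student. Reading from the bottom row to the top row, the stars move to the right, which is the upward pattern described in the solution.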
Once you have a scatter plot, the next step is to look for patterns. Not all scatter plots show clear relationships, but when they do, we describe them using three characteristics: direction, form, and strength.
The direction of a relationship tells us whether the dependent variable tends to increase or decrease as the independent variable increases.
The form describes the shape of the pattern. In this chapter, we focus on linear patterns, where the points roughly follow a straight line. Some data sets show nonlinear patterns (curved), but those require different methods for analysis.
The strength of a relationship describes how closely the points cluster around a trend line.
Example: A scientist measures the temperature outside (in °F) and the number of hot chocolate cups sold at a café.
The scatter plot shows points that slope downward from left to right and are fairly close together.
Describe the association.
Solution:
The direction is negative because as temperature increases, hot chocolate sales decrease.
The form is linear because the points roughly follow a straight line.
The strength is strong because the points are close together along that line.
The scatter plot shows a strong, negative, linear association.
A trend line is a straight line that we draw on a scatter plot to model the general pattern or trend in the data. The goal of a trend line is to summarize the relationship between the two variables with a single line, even though most of the data points won't fall exactly on that line.
Trend lines are useful because they allow us to:
Imagine you're watching runners on a track. Even though each runner has slightly different speeds and positions at any moment, you could imagine an "average path" that represents the group's overall direction. A trend line does the same for data points.
There are mathematical methods to calculate the exact "best" trend line (called the least-squares regression line), but you can also draw a reasonable trend line by hand using these steps:
A good trend line has roughly the same number of points above it as below it. You want to minimize the total distance from all points to the line.
Example: A student collects data on the age of used cars (in years) and their selling price (in thousands of dollars).
The data points are: (1, 18), (2, 16), (3, 14), (4, 13), (5, 10), (6, 9), (7, 7).
Draw a trend line by hand and describe the process.
Solution:
Plot the seven points on a coordinate plane with age on the x-axis and price on the y-axis.
Observe that the points slope downward from left to right, showing a negative association.
Draw a straight line that starts near the top-left and moves to the bottom-right, passing close to as many points as possible.
Check that roughly half the points are above the line and half are below it. Adjust if needed.
The trend line shows a clear negative relationship between car age and price.
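One way to check a hand-drawn line is to count how many points fall above and below it. The sketch below does this for the car data. The line \( y = -1.8x + 19.7 \) is only a hypothetical hand-drawn estimate chosen for illustration, and `check_balance` is a made-up helper name.

```python
# Used-car data from the example: (age in years, price in thousands of dollars).
cars = [(1, 18), (2, 16), (3, 14), (4, 13), (5, 10), (6, 9), (7, 7)]

def check_balance(points, m, b):
    """Count points above and below y = m*x + b and add up the vertical gaps."""
    above = below = 0
    total_gap = 0.0
    for x, y in points:
        gap = y - (m * x + b)   # positive when the point is above the line
        total_gap += abs(gap)
        if gap > 0:
            above += 1
        elif gap < 0:
            below += 1
    return above, below, total_gap

# Hypothetical hand-drawn line: slope -1.8, y-intercept 19.7.
above, below, total_gap = check_balance(cars, m=-1.8, b=19.7)
print(f"above: {above}, below: {below}, total vertical gap: {total_gap:.1f}")
```

For this candidate line, three points sit above and four sit below, so the line is reasonably balanced. Trying a different slope or intercept and comparing the total gap is a quick way to see whether an adjustment helps.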
Once you have a trend line, you can describe it with an equation. Since a trend line is a straight line, it follows the form of a linear equation:
\[ y = mx + b \]
where \( m \) is the slope and \( b \) is the y-intercept.
The slope \( m \) tells you the rate of change. A positive slope means the line goes upward (positive association), and a negative slope means the line goes downward (negative association).
The y-intercept \( b \) tells you where the line crosses the y-axis.
If you draw a trend line by hand, you can estimate the slope and y-intercept by choosing two points on your line (not necessarily data points, but points that lie on the line you drew) and using the slope formula:
\[ m = \frac{y_2 - y_1}{x_2 - x_1} \]
Then, once you know \( m \), substitute one of the points into \( y = mx + b \) and solve for \( b \).
Example: A trend line passes through the points (2, 10) and (6, 18) on a scatter plot.
What is the equation of the trend line?
Solution:
Find the slope using the two points:
\[ m = \frac{18 - 10}{6 - 2} = \frac{8}{4} = 2 \]
Use the point (2, 10) and the slope in the equation \( y = mx + b \):
\[ 10 = 2(2) + b \]
\[ 10 = 4 + b \]
\[ b = 6 \]
Write the equation by substituting \( m = 2 \) and \( b = 6 \):
\[ y = 2x + 6 \]
The equation of the trend line is \( y = 2x + 6 \).
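The same two steps (find the slope, then solve for the y-intercept) can be sketched in a few lines of Python. The function name `line_through` is made up for this illustration.

```python
def line_through(p1, p2):
    """Return (m, b) for the line y = m*x + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x-values")
    m = (y2 - y1) / (x2 - x1)   # slope formula
    b = y1 - m * x1             # rearranged from y1 = m*x1 + b
    return m, b

m, b = line_through((2, 10), (6, 18))
print(f"y = {m:g}x + {b:g}")   # prints: y = 2x + 6
```

Swapping the order of the two points gives the same line, which is a handy check on your own arithmetic.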
One of the most powerful uses of a trend line is making predictions. Once you have the equation of your trend line, you can substitute any value of \( x \) to predict the corresponding value of \( y \).
Interpolation means predicting a value within the range of your data. This is generally more reliable because you're staying within the pattern you observed.
Extrapolation means predicting a value outside the range of your data. This can be less reliable because the pattern might change beyond the data you collected.
Example: A trend line for the relationship between hours of study (\( x \)) and test score (\( y \)) is given by the equation \( y = 4x + 60 \).
Predict the test score for a student who studies for 7 hours.
Solution:
Substitute \( x = 7 \) into the equation:
\[ y = 4(7) + 60 \]
Multiply:
\[ y = 28 + 60 \]
Add:
\[ y = 88 \]
A student who studies for 7 hours is predicted to score 88 points.
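Here is a small Python sketch of the same prediction, together with a check of whether it is an interpolation or an extrapolation. The example does not say what range of study hours was actually observed, so the range of 1 to 5 hours used below is an assumption made only for illustration. The function names are also made up.

```python
def predict(x, m, b):
    """Predicted y-value on the trend line y = m*x + b."""
    return m * x + b

def prediction_type(x, x_min, x_max):
    """Label a prediction by whether x lies inside the observed x-range."""
    return "interpolation" if x_min <= x <= x_max else "extrapolation"

score = predict(7, m=4, b=60)

# Assumed (hypothetical) data range: students studied between 1 and 5 hours.
kind = prediction_type(7, x_min=1, x_max=5)

print(f"Predicted score: {score} ({kind})")
```

Under that assumed range, 7 hours lies outside the data, so the predicted score of 88 would be an extrapolation and deserves extra caution.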
Example: The equation \( y = -3x + 50 \) models the relationship between the number of absences (\( x \)) and a student's final grade (\( y \)).
What does the slope mean in this context?
Solution:
The slope is \( m = -3 \).
This means that for each additional absence, the predicted final grade decreases by 3 points.
The slope represents a decrease of 3 points per absence.
The y-intercept is the value of \( y \) when \( x = 0 \). In real-world problems, the y-intercept often has a meaningful interpretation, but sometimes it doesn't make practical sense.
Example: The trend line \( y = 5x + 20 \) represents the total cost (\( y \), in dollars) of renting a bike for \( x \) hours.
What does the y-intercept represent?
Solution:
The y-intercept is \( b = 20 \).
This is the value of \( y \) when \( x = 0 \), meaning when 0 hours are rented.
The y-intercept represents a flat rental fee of $20, charged even before any hours are used.
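You can confirm both interpretations by evaluating the equation directly. In this short sketch, `bike_cost` is a made-up name for the trend line \( y = 5x + 20 \).

```python
def bike_cost(hours):
    """Total rental cost in dollars from the trend line y = 5x + 20."""
    return 5 * hours + 20

flat_fee = bike_cost(0)                    # value of y when x = 0
hourly_rate = bike_cost(1) - bike_cost(0)  # change in y for one more hour

print(f"Flat fee: ${flat_fee}, cost per extra hour: ${hourly_rate}")
```

Plugging in \( x = 0 \) isolates the y-intercept, and comparing two inputs one hour apart isolates the slope.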
While trend lines are very useful, there are important cautions to keep in mind:
Imagine you observe that ice cream sales and drowning incidents both increase in the summer. A scatter plot might show a positive association, but eating ice cream doesn't cause drowning; both are influenced by warmer weather.
Graphing calculators, spreadsheet programs, and statistical software can calculate the exact equation of the trend line using a method called least-squares regression. This method finds the line that minimizes the sum of the squares of the vertical distances from each data point to the line.
When you use technology, the output typically includes:
A correlation coefficient close to 1 means a strong positive association, close to -1 means a strong negative association, and close to 0 means little or no linear association.
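To see roughly what the technology is doing, here is a minimal sketch of the standard least-squares formulas in plain Python, applied to the used-car data from the earlier example. The function name `least_squares` is made up for this illustration, and real calculators and spreadsheets may organize the computation differently.

```python
from math import sqrt

# Used-car data from the earlier example: (age in years, price in thousands).
cars = [(1, 18), (2, 16), (3, 14), (4, 13), (5, 10), (6, 9), (7, 7)]

def least_squares(points):
    """Return slope m, y-intercept b, and correlation coefficient r."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Sums of squared deviations and of cross-products.
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    syy = sum((y - y_bar) ** 2 for _, y in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    m = sxy / sxx                 # slope that minimizes the squared vertical distances
    b = y_bar - m * x_bar         # the line passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)     # correlation coefficient
    return m, b, r

m, b, r = least_squares(cars)
print(f"y = {m:.2f}x + {b:.2f}, r = {r:.3f}")
```

For the car data, the slope comes out to about -1.82, the y-intercept to about 19.71, and \( r \) is very close to -1, which matches the strong negative pattern seen in the scatter plot.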
Example: A graphing calculator gives the trend line \( y = -2.3x + 45 \) with a correlation coefficient \( r = -0.89 \) for a data set relating hours of screen time (\( x \)) to hours of sleep (\( y \)).
Interpret the slope and the correlation coefficient.
Solution:
The slope is \( m = -2.3 \), meaning that for each additional hour of screen time, the predicted amount of sleep decreases by about 2.3 hours.
The correlation coefficient is \( r = -0.89 \), which is close to -1.
This indicates a strong negative linear association between screen time and sleep hours.
After drawing or calculating a trend line, it's good practice to check how well it fits the data. You can do this visually by seeing if the points are evenly distributed above and below the line, or numerically by examining the correlation coefficient or another measure called the coefficient of determination (\( r^2 \)), which tells you what proportion of the variation in \( y \) is explained by the linear relationship with \( x \).
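As a quick illustration, here is how the coefficient of determination follows from the correlation coefficient in the screen-time example above (\( r = -0.89 \)). This is just the squaring step written in Python.

```python
r = -0.89            # correlation coefficient from the screen-time example
r_squared = r ** 2   # coefficient of determination

print(f"r^2 = {r_squared:.4f}, or about {r_squared:.0%}")
```

Squaring removes the sign, so \( r^2 \) is about 0.79. It tells you how much of the variation the line accounts for, but not whether the association is positive or negative.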
A trend line is a useful model when:
Trend lines are used in many fields:
A city planner might use a trend line to predict future traffic volume based on population growth, helping decide when to expand roads.
Example: A company finds that the trend line \( y = 12x + 150 \) models weekly profit (\( y \), in dollars) based on the number of ads posted online (\( x \)).
If the company posts 20 ads in a week, what profit can they expect?
Solution:
Substitute \( x = 20 \) into the equation:
\[ y = 12(20) + 150 \]
Multiply:
\[ y = 240 + 150 \]
Add:
\[ y = 390 \]
The company can expect a profit of $390 when posting 20 ads.
Trend lines are powerful tools for understanding and modeling relationships between two variables. By creating a scatter plot, identifying patterns, and drawing or calculating a trend line, you can summarize data, make predictions, and communicate findings clearly. Remember to interpret the slope and y-intercept in context, be cautious about extrapolation, and always keep in mind that correlation does not imply causation. With practice, you'll become skilled at using trend lines to analyze real-world data and solve meaningful problems.