Chapter Notes: Introduction To Scatterplots

Data tells a story, but sometimes that story is hard to see when you're looking at a long list of numbers. One powerful way to reveal patterns and relationships between two different measurements is to create a visual display called a scatterplot. A scatterplot (also called a scatter diagram or scatter graph) shows how two variables relate to each other by plotting points on a coordinate grid. Each point represents one observation or individual. By looking at the overall pattern of the points, you can quickly see whether the two variables tend to increase together, move in opposite directions, or have no clear relationship at all. Scatterplots are used in many settings, from analyzing the relationship between study time and test scores, to exploring connections between temperature and ice cream sales, to investigating patterns in scientific data.

What Is a Scatterplot?

A scatterplot is a graph that displays the relationship between two numerical variables. One variable is plotted along the horizontal axis (the \( x \)-axis), and the other variable is plotted along the vertical axis (the \( y \)-axis). Each individual data point is represented by a single dot (or other marker) on the graph. The position of each dot is determined by its \( x \)-value and \( y \)-value.

Unlike a line graph, where points are connected in sequence, the points in a scatterplot are not connected. Instead, we look at the overall pattern formed by all the dots together. This pattern helps us understand how the two variables are related.

Key Components of a Scatterplot

  • Horizontal Axis (\( x \)-axis): Represents the independent variable or explanatory variable. This is often the variable you think might influence the other variable.
  • Vertical Axis (\( y \)-axis): Represents the dependent variable or response variable. This is often the variable you think might be affected by the other variable.
  • Data Points: Each dot represents one observation or individual. The coordinates of the dot are \((x, y)\), where \( x \) is the value of the first variable and \( y \) is the value of the second variable.
  • Title: Describes what the scatterplot shows.
  • Axis Labels: Clearly identify what each axis represents, including units of measurement.
  • Scale: Both axes should have evenly spaced tick marks with labeled values.

Example:  A teacher records the number of hours six students studied for a test and their test scores.
The data is: Student A (2 hours, 65 points), Student B (3 hours, 70 points), Student C (5 hours, 85 points), Student D (1 hour, 60 points), Student E (4 hours, 80 points), Student F (6 hours, 90 points).

Create a scatterplot for this data.

Solution:

Step 1: Identify the variables. Hours studied will be the \( x \)-variable (independent), and test score will be the \( y \)-variable (dependent).

Step 2: Set up axes. The \( x \)-axis ranges from 0 to 7 hours. The \( y \)-axis ranges from 50 to 100 points.

Step 3: Plot each point using its \((x, y)\) coordinates: (2, 65), (3, 70), (5, 85), (1, 60), (4, 80), (6, 90).

Step 4: Label axes and add a title such as "Study Time vs. Test Score."

The scatterplot would show six dots positioned according to their coordinates, revealing a general upward pattern.
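
As a rough sketch of the plotting step, the short Python program below draws the same six points on a text grid. The function name `text_scatter` and the grid layout are illustrative choices, not part of the original example. The approach assumes every point lands exactly on a grid line (whole hours, scores in multiples of 5); a real plotting tool would handle arbitrary values.

```python
def text_scatter(points, x_min, x_max, y_min, y_max, y_step):
    """Return the rows of a simple text scatterplot, top row first.

    Assumes every point sits exactly on a grid position.
    """
    rows = []
    # Work from the top of the y-axis down to the bottom.
    for y in range(y_max, y_min - 1, -y_step):
        cells = []
        for x in range(x_min, x_max + 1):
            # Mark a star where an observation exists, a dot otherwise.
            cells.append("*" if (x, y) in points else ".")
        rows.append(f"{y:>3} | " + " ".join(cells))
    # Final row: the x-axis tick labels, lined up under the columns.
    rows.append("      " + " ".join(str(x) for x in range(x_min, x_max + 1)))
    return rows


# (hours studied, test score) for Students A to F
points = {(2, 65), (3, 70), (5, 85), (1, 60), (4, 80), (6, 90)}

rows = text_scatter(points, x_min=0, x_max=7, y_min=50, y_max=100, y_step=5)
print("Study Time vs. Test Score")
print("\n".join(rows))
```

Reading the printed grid from left to right, the stars climb toward the top right, which is the upward pattern described above.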

Creating a Scatterplot

Creating an accurate and useful scatterplot requires careful attention to setting up your axes and plotting points precisely. Follow these steps to construct a scatterplot from a data set.

Step-by-Step Process

  1. Organize Your Data: Arrange your data in a table with two columns, one for each variable. Decide which variable will be the independent variable (\( x \)) and which will be the dependent variable (\( y \)).
  2. Determine the Range: Look at the minimum and maximum values for each variable. This helps you decide what scale to use on each axis.
  3. Draw and Label Axes: Draw a horizontal axis for the \( x \)-variable and a vertical axis for the \( y \)-variable. Label each axis with the variable name and units. Choose an appropriate scale that makes good use of the space and includes all your data points.
  4. Plot the Points: For each observation, locate the \( x \)-value on the horizontal axis and the \( y \)-value on the vertical axis. Mark a dot at the point where these two values meet.
  5. Add a Title: Give your scatterplot a descriptive title that clearly states what relationship is being displayed.

Example:  The table below shows the height (in inches) and shoe size for seven people.

Height (inches): 60, 65, 68, 70, 72, 74, 66
Shoe Size: 6, 7, 9, 10, 11, 12, 8

Create a scatterplot to display this relationship.

Solution:

Step 1: Identify variables. Height is the \( x \)-variable, shoe size is the \( y \)-variable.

Step 2: Determine range. Height ranges from 60 to 74 inches. Shoe size ranges from 6 to 12.

Step 3: Set up axes. Use 55 to 75 on the \( x \)-axis (height) and 5 to 13 on the \( y \)-axis (shoe size).

Step 4: Plot seven points: (60, 6), (65, 7), (68, 9), (70, 10), (72, 11), (74, 12), (66, 8).

Step 5: Add title "Height vs. Shoe Size" and label axes as "Height (inches)" and "Shoe Size."

The completed scatterplot displays seven points showing how shoe size tends to increase as height increases.
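
Steps 2 and 3 of this example can be sketched in Python. The helper `axis_limits` below is a hypothetical function, and its rule (leave a little padding, then round outward to the nearest tick) is just one reasonable way to choose a scale. The `step` and `pad` values are assumptions picked to reproduce the axes used above.

```python
import math


def axis_limits(values, step, pad):
    """Pick axis limits that contain all values with some breathing room.

    step: spacing between tick marks
    pad:  minimum gap to leave beyond the smallest and largest value
    """
    low = math.floor((min(values) - pad) / step) * step
    high = math.ceil((max(values) + pad) / step) * step
    return low, high


heights = [60, 65, 68, 70, 72, 74, 66]   # inches
shoe_sizes = [6, 7, 9, 10, 11, 12, 8]

# Step 2: determine the range of each variable.
print("Height range:", min(heights), "to", max(heights))
print("Shoe size range:", min(shoe_sizes), "to", max(shoe_sizes))

# Step 3: choose axis limits slightly wider than the data.
height_axis = axis_limits(heights, step=5, pad=1)
shoe_axis = axis_limits(shoe_sizes, step=1, pad=1)
print("x-axis (height):", height_axis)
print("y-axis (shoe size):", shoe_axis)
```

With these settings the function gives 55 to 75 for height and 5 to 13 for shoe size, matching the axes chosen in Step 3.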

Interpreting Scatterplots

The real value of a scatterplot comes from interpreting the pattern formed by the points. When you look at a scatterplot, you're trying to understand how the two variables are related. There are several key features to look for.

Direction of Association

The direction of a relationship describes whether the variables tend to increase together, move in opposite directions, or show no clear pattern.

  • Positive Association: When one variable increases, the other variable also tends to increase. The points form a pattern that slopes upward from left to right. Example: As the temperature increases, ice cream sales tend to increase.
  • Negative Association: When one variable increases, the other variable tends to decrease. The points form a pattern that slopes downward from left to right. Example: As the price of a product increases, the number of units sold tends to decrease.
  • No Association: There is no clear pattern in how the variables relate to each other. The points appear scattered randomly with no upward or downward trend. Example: A person's shoe size likely has no relationship with their score on a history test.

Form of Association

The form describes the shape of the pattern formed by the points.

  • Linear: The points roughly follow a straight line. This is the most common form studied in introductory statistics.
  • Nonlinear (Curved): The points follow a curved pattern. This might be exponential, quadratic, or some other curved shape.
  • No Form: The points don't follow any recognizable pattern.

Strength of Association

The strength describes how closely the points follow the pattern.

  • Strong: The points are tightly clustered around a line or curve. There is little scatter, and the pattern is very clear.
  • Moderate: The points follow a general pattern but with noticeable scatter. The trend is visible but not perfect.
  • Weak: The points show only a slight pattern with considerable scatter. The relationship is barely visible.

When describing a scatterplot, statisticians typically state all three features: direction, form, and strength. For example: "The scatterplot shows a strong, positive, linear association between study time and test score."
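
Direction and strength can also be summarized with a number called the correlation coefficient, usually written \( r \). The sketch below computes \( r \) for the study-time data from the first example. The function names and the cutoffs of 0.3 and 0.7 for "weak," "moderate," and "strong" are assumptions made for illustration; different textbooks use different cutoffs. Note that \( r \) only measures linear association, so the form still has to be judged by looking at the plot.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe(r, moderate_cutoff=0.3, strong_cutoff=0.7):
    """Turn r into words. The cutoffs are illustrative, not a standard."""
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    if abs(r) >= strong_cutoff:
        strength = "strong"
    elif abs(r) >= moderate_cutoff:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength}, {direction}"


hours = [2, 3, 5, 1, 4, 6]
scores = [65, 70, 85, 60, 80, 90]

r = pearson_r(hours, scores)
summary = describe(r)
print(f"r = {r:.3f} ({summary})")
```

For this data \( r \) is about 0.994, which agrees with the description "strong, positive, linear."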

Example:  A scatterplot displays the relationship between the number of hours spent watching TV per week and grade point average (GPA) for 30 students.
The points slope downward from left to right and are moderately clustered around an imaginary straight line.

Describe the association shown in this scatterplot.

Solution:

Direction: The pattern slopes downward, indicating that as TV watching time increases, GPA tends to decrease. This is a negative association.

Form: The points roughly follow a straight line, so this is a linear association.

Strength: The points are moderately clustered, so this is a moderate association.

The scatterplot shows a moderate, negative, linear association between TV watching time and GPA.

Outliers in Scatterplots

An outlier is a data point that stands apart from the overall pattern of the scatterplot. It is a point that doesn't fit with the rest of the data. Outliers are important to identify because they can significantly affect the interpretation of the relationship and any calculations based on the data.

Identifying Outliers

When looking at a scatterplot, ask yourself: "Are there any points that are far away from where most of the other points are clustered?" An outlier might be:

  • A point that is far above or below the general pattern
  • A point that is far to the left or right of the general pattern
  • A point that is isolated from all the other points

Why Outliers Matter

Outliers can occur for several reasons:

  • Data entry error: Someone might have typed the wrong number.
  • Measurement error: The measuring instrument might have malfunctioned.
  • Natural variation: The outlier might represent a genuine unusual case that is legitimately part of the data.

It's important to investigate outliers to determine their cause. If an outlier is due to an error, it should be corrected or removed. If it's a genuine unusual case, it should be kept in the data but noted in your analysis. Outliers can sometimes reveal interesting and important information that deserves special attention.

Example:  A scatterplot shows the relationship between age and reaction time for 25 people.
Most points show a positive linear pattern, with reaction time increasing gradually as age increases.
However, one point shows a 30-year-old person with an unusually slow reaction time, far above the general pattern.

Is this point an outlier, and why might it exist?

Solution:

Yes, this point is an outlier because it is far from the pattern formed by the other 24 points.

Possible explanations could include: the person was distracted during the test, they misunderstood the instructions, the reaction time was recorded incorrectly, or they might have a medical condition affecting reaction time.

The researcher should investigate this data point to determine whether it represents a recording error or a genuine unusual case.

This point is an outlier that stands apart from the positive linear pattern shown by the other data points.
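
One way to check for outliers with a computer, sketched below, is to fit a straight line to the data and flag any point whose vertical distance from the line is unusually large. The ages and reaction times here are made-up values chosen only to resemble the example, and the "more than 2 standard deviations" rule is an assumption, not a universal definition. The function names `fit_line` and `find_outliers` are also hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares line through the data: returns (slope, intercept)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_outliers(xs, ys, k=2.0):
    """Flag points more than k standard deviations away from the fitted line."""
    slope, intercept = fit_line(xs, ys)
    # Residual = actual y minus the y predicted by the line.
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [(x, y) for x, y, res in zip(xs, ys, residuals) if abs(res) > k * spread]


# Hypothetical data: age (years) and reaction time (milliseconds)
ages = [20, 25, 30, 30, 35, 40, 45, 50, 55, 60]
times = [200, 210, 220, 400, 230, 240, 250, 260, 270, 280]

outliers = find_outliers(ages, times)
print("Possible outliers:", outliers)
```

The check flags only the 30-year-old with a 400 ms reaction time. A flagged point is a prompt to investigate, not an automatic reason to delete it.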

Correlation vs. Causation

One of the most important principles in statistics is understanding the difference between correlation and causation. A scatterplot can show that two variables are related (correlated), but this does not automatically mean that one variable causes the other to change.

Understanding Correlation

Correlation means that two variables tend to change together in a predictable way. When you see a clear pattern in a scatterplot, the variables are correlated. The correlation might be positive (both increase together) or negative (one increases while the other decreases).

Understanding Causation

Causation means that changes in one variable directly cause changes in the other variable. Establishing causation requires much more evidence than just observing a pattern in a scatterplot. To establish causation, you typically need a carefully designed experiment in which you control other variables that might affect the relationship.

Why Correlation Doesn't Imply Causation

Just because two variables are correlated doesn't mean one causes the other. There are several possible explanations for a correlation:

  • Direct Causation: Variable \( x \) directly causes changes in variable \( y \).
  • Reverse Causation: Variable \( y \) actually causes changes in variable \( x \) (the opposite of what you thought).
  • Confounding Variable: A third variable (called a confounding variable or lurking variable) causes changes in both \( x \) and \( y \), making them appear related even though neither directly affects the other.
  • Coincidence: The correlation appeared by pure chance and has no meaningful explanation.

Consider this example: A study finds a strong positive correlation between ice cream sales and drowning deaths. Does eating ice cream cause people to drown? No! The confounding variable here is temperature and season. During hot summer months, both ice cream sales and swimming activity increase, leading to more drowning incidents. The two variables are correlated but not causally related.
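
The ice cream example can be imitated with a small simulation. In the sketch below, every number is invented for illustration: temperature is generated at random, and then both ice cream sales and drowning incidents are calculated from temperature plus some random noise. Neither formula uses the other variable, yet the two end up strongly correlated.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(0)  # fixed seed so the simulation is repeatable

# Confounding variable: daily temperature (made-up values, degrees Celsius)
temperatures = [random.uniform(10, 35) for _ in range(200)]

# Each variable depends ONLY on temperature plus noise (invented formulas).
ice_cream_sales = [20 * t + random.gauss(0, 30) for t in temperatures]
drowning_index = [0.5 * t + random.gauss(0, 1) for t in temperatures]

r = pearson_r(ice_cream_sales, drowning_index)
print(f"Correlation between ice cream sales and drownings: r = {r:.2f}")
```

The strong correlation comes entirely from the shared dependence on temperature, which is exactly what a confounding variable does.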

When interpreting scatterplots, always be cautious about claiming causation. Use phrases like "is associated with" or "is related to" rather than "causes" unless you have strong experimental evidence to support a causal claim.

Example:  A scatterplot shows a strong positive linear association between the number of fire trucks that respond to a fire and the amount of damage caused by the fire.

Does this mean that fire trucks cause more fire damage?

Solution:

No, this would be an incorrect interpretation. While the scatterplot shows correlation, it does not show causation.

The confounding variable is the size or severity of the fire. Larger fires cause more damage AND require more fire trucks to respond.

The number of fire trucks doesn't cause the damage; rather, the severity of the fire causes both more trucks to respond and more damage to occur.

This scatterplot shows correlation, not causation. The fire severity is the confounding variable that explains both the truck response and the damage.

Applications of Scatterplots

Scatterplots are used across many fields to explore and display relationships between variables. Understanding how to create and interpret them is a fundamental skill in data analysis.

Real-World Uses

  • Education: Examining the relationship between attendance and grades, or between homework time and test performance.
  • Healthcare: Studying how dosage relates to treatment effectiveness, or how age relates to blood pressure.
  • Business: Analyzing how advertising spending relates to sales, or how product price relates to demand.
  • Environmental Science: Investigating relationships between temperature and plant growth, or between pollution levels and respiratory illness rates.
  • Sports Analytics: Exploring connections between practice time and performance, or between physical measurements and athletic ability.
  • Social Sciences: Studying relationships between income and education level, or between social media use and self-reported happiness.

Scatterplots as a First Step

When researchers collect data on two variables, creating a scatterplot is often the first analytical step. The visual display quickly reveals:

  • Whether a relationship exists
  • What type of relationship it is (linear or curved)
  • How strong the relationship appears to be
  • Whether there are any unusual data points that need investigation

After examining the scatterplot, researchers can decide what additional statistical methods to apply, such as calculating a correlation coefficient or fitting a regression line. The scatterplot should come first because it shows features, such as curved patterns or outliers, that summary numbers alone can hide.

Common Mistakes to Avoid

When working with scatterplots, be aware of these common errors:

  • Connecting the dots: Never draw lines connecting the points in a scatterplot. The points should remain separate so you can see the overall pattern.
  • Poor axis scaling: Choose scales that make good use of the available space and show the pattern clearly. Avoid scales that compress all points into a tiny corner of the graph.
  • Missing labels: Always label both axes with the variable names and units. Without labels, readers cannot understand what the scatterplot represents.
  • Confusing correlation with causation: Remember that a pattern in a scatterplot shows association, not necessarily a cause-and-effect relationship.
  • Ignoring outliers: Don't overlook unusual points. They may indicate errors or reveal important special cases.
  • Wrong variable placement: Generally, the independent variable (the one you think might influence the other) goes on the \( x \)-axis, and the dependent variable (the one being influenced) goes on the \( y \)-axis.

Summary

Scatterplots are powerful tools for visualizing and understanding relationships between two numerical variables. By plotting each observation as a point on a coordinate grid, we can quickly see patterns that would be hidden in a table of numbers. When describing a scatterplot, we examine its direction (positive, negative, or none), form (linear or nonlinear), and strength (strong, moderate, or weak). We also look for outliers, which are points that don't fit the overall pattern.

While scatterplots can reveal correlation between variables, we must be careful not to jump to conclusions about causation. A strong association in a scatterplot tells us that two variables are related, but determining whether one actually causes the other requires additional investigation and carefully controlled experiments.

Mastering scatterplots gives you a fundamental skill for data analysis that applies across all fields where quantitative data is collected. Whether you're analyzing scientific experiments, business trends, or social patterns, the scatterplot provides an essential first look at how variables relate to each other.

The document Chapter Notes: Introduction To Scatterplots is a part of the Grade 9 Course Statistics & Probability.