Statistics and probability help us make sense of information and predict what might happen in the future. We use statistics to collect, organize, and analyze data from the world around us, such as test scores, survey results, or the number of people who prefer different types of music. Probability helps us measure the chance or likelihood that something will happen, such as flipping a coin and getting heads or predicting rain tomorrow. Together, these tools help us understand patterns and make informed decisions based on data.
When we want to learn about a large group, we often cannot study every single member. Instead, we study a smaller part of that group. Understanding the difference between a population and a sample is essential in statistics.
A population is the entire group that we want to learn about. For example, if we want to know the average height of all seventh graders in your school, the population is every single seventh grader in the school.
A sample is a smaller group selected from the population. We collect data from the sample and use it to make conclusions about the whole population. For example, instead of measuring all seventh graders, we might measure 30 seventh graders chosen randomly.
A good sample should be representative, which means it reflects the characteristics of the whole population. If we only measured students from the basketball team, our sample would not represent all seventh graders because basketball players tend to be taller than average.
Example: A school principal wants to know what students think about the cafeteria food.
There are 800 students in the school.
She surveys 50 students by asking every 16th student who enters the cafeteria on Monday morning.
What is the population, and what is the sample?
Solution:
The population is all 800 students in the school, because we want to know what all students think.
The sample is the 50 students who were surveyed.
The population is 800 students, and the sample is 50 students.
A random sample is one where every member of the population has an equal chance of being selected. Random sampling helps reduce bias, which is when the sample does not accurately represent the population.
Some methods of random sampling include:
A biased sample is one that does not represent the population well. For example, if we wanted to know the favorite sport of all students but only surveyed students in gym class, the sample would be biased because those students are more likely to be interested in sports than the general population.
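As a rough illustration of random sampling, the sketch below draws 50 students from the school of 800 used in the earlier example. The numbering of students from 1 to 800 and the seed value are assumptions made only for this sketch.

```python
import random

# Hypothetical roster: students are identified by the numbers 1 through 800.
population = list(range(1, 801))

# A fixed seed is used here only so the sketch gives the same result each run.
rng = random.Random(42)

# sample() draws without replacement, so no student is picked twice
# and every student has the same chance of being chosen.
sample = rng.sample(population, 50)

print(len(sample))         # 50
print(sorted(sample)[:5])  # the five lowest student numbers that were drawn
```

Because each student number is equally likely to be drawn, this kind of sample avoids the bias that comes from surveying only one group, such as a gym class.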
When we collect data, we often want to find one number that represents the "typical" or "central" value. These are called measures of center. The three main measures of center are the mean, median, and mode.
The mean is the average of a set of numbers. To find the mean, we add all the values together and then divide by the number of values.
\[ \text{Mean} = \frac{\text{Sum of all values}}{\text{Number of values}} \]
In this formula, we add up all the data points and divide by how many data points there are.
Example: Five students scored the following points on a quiz: 8, 6, 9, 7, and 10.
What is the mean quiz score?
Solution:
First, add all the scores together:
8 + 6 + 9 + 7 + 10 = 40
Next, count how many scores there are:
There are 5 scores.
Finally, divide the sum by the number of scores:
Mean = 40 ÷ 5 = 8
The mean quiz score is 8 points.
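The same calculation can be sketched in a few lines of Python. The function name `mean` is chosen here for illustration.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)

quiz_scores = [8, 6, 9, 7, 10]
print(mean(quiz_scores))  # 8.0
```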
The median is the middle value when the data is arranged in order from least to greatest. If there is an even number of values, the median is the mean of the two middle values.
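To find the median, first arrange the values in order from least to greatest, then locate the middle value. If there are two middle values, find their mean.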
Example: Seven students ran a race and finished in the following times (in seconds): 45, 52, 48, 50, 47, 55, 49.
What is the median time?
Solution:
First, arrange the times in order from least to greatest:
45, 47, 48, 49, 50, 52, 55
Next, find the middle value:
There are 7 values, so the middle one is the 4th value.
The 4th value in the ordered list is 49.
The median time is 49 seconds.
Example: Six students scored the following on a test: 75, 82, 90, 85, 78, 88.
What is the median test score?
Solution:
First, arrange the scores in order:
75, 78, 82, 85, 88, 90
There are 6 values (an even number), so the median is the average of the 3rd and 4th values.
The 3rd value is 82 and the 4th value is 85.
Find the average of 82 and 85:
(82 + 85) ÷ 2 = 167 ÷ 2 = 83.5
The median test score is 83.5 points.
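Both median examples, the odd-count case and the even-count case, can be checked with a short sketch like the one below. The function name `median` is an illustrative choice.

```python
def median(values):
    """Return the middle value of the data, or the mean of the two middle values."""
    ordered = sorted(values)          # arrange from least to greatest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: one middle value
        return ordered[middle]
    # even count: average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2

race_times = [45, 52, 48, 50, 47, 55, 49]
test_scores = [75, 82, 90, 85, 78, 88]
print(median(race_times))   # 49
print(median(test_scores))  # 83.5
```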
The mode is the value that appears most often in a data set. A data set can have one mode, more than one mode, or no mode at all.
Example: A teacher recorded how many books students read over the summer: 3, 5, 7, 5, 4, 5, 6, 8, 5.
What is the mode?
Solution:
Count how many times each number appears:
3 appears once
4 appears once
5 appears four times
6 appears once
7 appears once
8 appears once
The number 5 appears most often (four times).
The mode is 5 books.
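Counting by hand works for small data sets; for larger ones, a sketch like the following can do the tally. It assumes one common convention: when every value appears exactly once, the data set is reported as having no mode.

```python
from collections import Counter

def modes(values):
    """Return every value that appears most often, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Assumption: when every value appears exactly once, report no mode.
        return []
    return sorted(value for value, count in counts.items() if count == highest)

books_read = [3, 5, 7, 5, 4, 5, 6, 8, 5]
print(modes(books_read))       # [5]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
print(modes([1, 2, 3]))        # []
```

Returning a list makes it possible to show all three cases: one mode, more than one mode, and no mode.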
Different situations call for different measures of center. The mean is useful when the data is evenly distributed without extreme values. The median is better when there are outliers (values much higher or lower than the rest) because outliers affect the mean more than the median. The mode is helpful when we want to know the most common value.
While measures of center tell us about the typical value, measures of variability (also called measures of spread) tell us how spread out the data is. Two data sets can have the same mean but be very different in how spread out their values are.
The range is the difference between the greatest and least values in a data set. It gives us a quick sense of how spread out the data is.
\[ \text{Range} = \text{Maximum value} - \text{Minimum value} \]
Example: The temperatures (in °F) for one week were: 68, 72, 75, 70, 73, 69, 74.
What is the range of temperatures?
Solution:
Find the maximum temperature: 75°F
Find the minimum temperature: 68°F
Subtract the minimum from the maximum:
Range = 75 - 68 = 7
The range is 7°F.
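In code, the range is a single subtraction. The name `data_range` is used in this sketch to avoid clashing with Python's built-in `range`.

```python
def data_range(values):
    """Return the greatest value minus the least value."""
    return max(values) - min(values)

temperatures = [68, 72, 75, 70, 73, 69, 74]
print(data_range(temperatures))  # 7
```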
The interquartile range (IQR) measures the spread of the middle 50% of the data. It is less affected by outliers than the range.
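To find the IQR, we first need to understand quartiles. Quartiles divide the data into four equal parts:
Q1 (the first quartile) is the median of the lower half of the data.
Q2 (the second quartile) is the median of the whole data set.
Q3 (the third quartile) is the median of the upper half of the data.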
The IQR is calculated as:
\[ \text{IQR} = Q_3 - Q_1 \]
This measures the spread of the middle 50% of the data.
Example: The ages of nine people at a party are: 12, 15, 16, 18, 20, 22, 24, 25, 30.
What is the interquartile range?
Solution:
The data is already in order. Find Q2 (the median):
The middle value is the 5th value: 20
Find Q1 (the median of the lower half):
Lower half: 12, 15, 16, 18
Q1 = (15 + 16) ÷ 2 = 15.5
Find Q3 (the median of the upper half):
Upper half: 22, 24, 25, 30
Q3 = (24 + 25) ÷ 2 = 24.5
Calculate the IQR:
IQR = Q3 - Q1 = 24.5 - 15.5 = 9
The interquartile range is 9 years.
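The sketch below follows the same method as the worked example: when the number of values is odd, the median itself is left out of both halves. Other tools may use slightly different quartile rules and give slightly different answers, so this is one convention rather than the only one. The function names are illustrative, and the data set is assumed to have at least two values.

```python
def median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]                 # values before the median position
    upper_half = ordered[len(ordered) - half:]  # values after the median position
    return median(lower_half), median(ordered), median(upper_half)

ages = [12, 15, 16, 18, 20, 22, 24, 25, 30]
q1, q2, q3 = quartiles(ages)
print(q1, q2, q3)  # 15.5 20 24.5
print(q3 - q1)     # 9.0
```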
The mean absolute deviation (MAD) measures the average distance of each data value from the mean. It tells us how much the data typically varies from the mean.
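To calculate MAD, find the mean of the data, find the distance of each value from the mean (the absolute value of the difference), and then find the mean of those distances.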
Example: Five students spent the following amounts of time on homework (in minutes): 30, 40, 35, 50, 45.
What is the mean absolute deviation?
Solution:
Step 1: Find the mean:
Mean = (30 + 40 + 35 + 50 + 45) ÷ 5 = 200 ÷ 5 = 40
Step 2: Find the distance of each value from the mean:
|30 - 40| = 10
|40 - 40| = 0
|35 - 40| = 5
|50 - 40| = 10
|45 - 40| = 5
Step 3: Find the mean of these distances:
MAD = (10 + 0 + 5 + 10 + 5) ÷ 5 = 30 ÷ 5 = 6
The mean absolute deviation is 6 minutes.
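The three steps translate directly into code. This is a minimal sketch with an illustrative function name.

```python
def mean_absolute_deviation(values):
    """Return the average distance of the values from their mean."""
    center = sum(values) / len(values)                   # Step 1: the mean
    distances = [abs(value - center) for value in values]  # Step 2: distances
    return sum(distances) / len(distances)               # Step 3: mean of distances

homework_minutes = [30, 40, 35, 50, 45]
print(mean_absolute_deviation(homework_minutes))  # 6.0
```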
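Probability is a measure of how likely an event is to occur. Probabilities are expressed as numbers between 0 and 1, or as percentages between 0% and 100%. A probability of 0 means the event cannot happen, and a probability of 1 means it is certain to happen.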
The probability of an event is calculated as:
\[ P(\text{event}) = \frac{\text{Number of favorable outcomes}}{\text{Total number of possible outcomes}} \]
In this formula, a favorable outcome is a result that we are interested in, and the total number of possible outcomes is all the different results that could occur.
Example: A bag contains 5 red marbles, 3 blue marbles, and 2 green marbles.
You randomly pick one marble without looking.
What is the probability of picking a blue marble?
Solution:
Count the number of blue marbles (favorable outcomes): 3
Count the total number of marbles: 5 + 3 + 2 = 10
Use the probability formula:
P(blue) = 3/10 = 0.3 or 30%
The probability of picking a blue marble is 3/10 or 30%.
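The marble example can be sketched with Python's `fractions` module, which keeps the answer as an exact fraction. The dictionary layout and function name are choices made for this illustration.

```python
from fractions import Fraction

marbles = {"red": 5, "blue": 3, "green": 2}

def probability(color, bag):
    """Favorable outcomes divided by total possible outcomes."""
    return Fraction(bag[color], sum(bag.values()))

p_blue = probability("blue", marbles)
print(p_blue, float(p_blue))  # 3/10 0.3
```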
Complementary events are two events where one event happens if and only if the other does not. The probabilities of complementary events always add up to 1.
For example, either it will rain tomorrow or it will not rain tomorrow. These are complementary events. If the probability of rain is 0.3, then the probability of no rain is 0.7.
\[ P(\text{not A}) = 1 - P(\text{A}) \]
Example: The probability that a basketball player makes a free throw is 0.75.
What is the probability that the player misses the free throw?
Solution:
Making and missing are complementary events.
P(miss) = 1 - P(make) = 1 - 0.75 = 0.25
The probability of missing the free throw is 0.25 or 25%.
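A short sketch of the complement rule, using exact fractions so that the two probabilities visibly add up to 1. The function name `complement` is illustrative.

```python
from fractions import Fraction

def complement(p):
    """Probability that the event does not happen."""
    return 1 - p

p_make = Fraction(3, 4)           # 0.75 written as a fraction
p_miss = complement(p_make)
print(p_miss, float(p_miss))      # 1/4 0.25
print(p_make + p_miss)            # 1
```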
There are two ways to think about probability: theoretical and experimental.
Theoretical probability is what we expect to happen based on mathematical reasoning. It assumes that all outcomes are equally likely.
For example, when flipping a fair coin, the theoretical probability of getting heads is 1/2 because there are two equally likely outcomes (heads and tails), and one of them is heads.
Experimental probability is based on what actually happens when we perform an experiment or collect data. It is calculated as:
\[ P(\text{event}) = \frac{\text{Number of times event occurred}}{\text{Total number of trials}} \]
Example: A student flips a coin 50 times and gets heads 28 times.
What is the experimental probability of getting heads?
Solution:
Number of times heads occurred: 28
Total number of flips: 50
Experimental probability = 28/50 = 0.56 or 56%
The experimental probability of getting heads is 0.56 or 56%.
As the number of trials increases, experimental probability tends to get closer to theoretical probability. This is called the Law of Large Numbers.
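One way to see this effect is to simulate coin flips on a computer. The sketch below is only an illustration: the simulated coin, the seed values, and the trial counts are all assumptions, and the exact numbers printed by the loop depend on the seed.

```python
import random

def experimental_probability(times_occurred, trials):
    """Number of times the event occurred divided by the number of trials."""
    return times_occurred / trials

def simulate_heads(trials, seed=0):
    """Flip a simulated fair coin and return the experimental probability of heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return experimental_probability(heads, trials)

# The student's result from the example above.
print(experimental_probability(28, 50))  # 0.56

# With more flips, the result tends to settle near the theoretical value 0.5.
for trials in (50, 5_000, 100_000):
    print(trials, simulate_heads(trials))
```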
A compound event is an event that consists of two or more simple events. For example, flipping a coin and rolling a die at the same time is a compound event.
The sample space is the set of all possible outcomes. We can organize sample spaces using lists, tables, or tree diagrams.
Example: You flip a coin and roll a standard six-sided die.
How many possible outcomes are there?
Solution:
The coin has 2 possible outcomes: heads (H) or tails (T).
The die has 6 possible outcomes: 1, 2, 3, 4, 5, or 6.
For each coin outcome, there are 6 die outcomes:
Total outcomes = 2 × 6 = 12
There are 12 possible outcomes.
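The sample space for this example can also be listed by a program. This sketch uses `itertools.product`, which pairs every coin outcome with every die outcome.

```python
from itertools import product

coin = ["H", "T"]
die = [1, 2, 3, 4, 5, 6]

# Every (coin, die) pair is one outcome in the sample space.
sample_space = list(product(coin, die))

print(len(sample_space))  # 12
print(sample_space[:3])   # [('H', 1), ('H', 2), ('H', 3)]
```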
Independent events are events where the outcome of one event does not affect the outcome of the other. When two events A and B are independent, the probability of both events occurring is:
\[ P(\text{A and B}) = P(\text{A}) \times P(\text{B}) \]
Example: You flip a coin and spin a spinner divided into 4 equal sections (numbered 1, 2, 3, 4).
What is the probability of getting heads on the coin and a 3 on the spinner?
Solution:
P(heads) = 1/2
P(3 on spinner) = 1/4
Since these are independent events:
P(heads and 3) = (1/2) × (1/4) = 1/8 = 0.125 or 12.5%
The probability is 1/8 or 12.5%.
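As a check on the multiplication rule, the sketch below computes the answer two ways: by multiplying the two probabilities, and by listing the whole sample space and counting the favorable outcome. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Method 1: multiply the probabilities of the two independent events.
p_heads = Fraction(1, 2)
p_three = Fraction(1, 4)
p_both = p_heads * p_three
print(p_both, float(p_both))  # 1/8 0.125

# Method 2: list all equally likely outcomes and count the favorable ones.
outcomes = list(product(["H", "T"], [1, 2, 3, 4]))
favorable = [outcome for outcome in outcomes if outcome == ("H", 3)]
print(Fraction(len(favorable), len(outcomes)))  # 1/8
```

Both methods agree because each of the 8 outcomes in the sample space is equally likely.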
We can use data from samples to make predictions about populations. If we know the experimental probability of an event from a sample, we can estimate how many times that event might occur in a larger population.
Example: A quality control inspector found that 3 out of every 50 light bulbs tested were defective.
A store receives a shipment of 2000 light bulbs from the same manufacturer.
About how many defective light bulbs would you expect in the shipment?
Solution:
Find the experimental probability of a defective bulb:
P(defective) = 3/50 = 0.06
Multiply this probability by the total number of bulbs:
Expected defective bulbs = 0.06 × 2000 = 120
You would expect about 120 defective light bulbs in the shipment.
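The same prediction can be sketched as a small function. Exact fractions are used so the result is not affected by decimal rounding; the function and parameter names are illustrative.

```python
from fractions import Fraction

def expected_count(occurrences_in_sample, sample_size, population_size):
    """Scale the rate seen in the sample up to the size of the population."""
    rate = Fraction(occurrences_in_sample, sample_size)  # experimental probability
    return rate * population_size

estimate = expected_count(3, 50, 2000)
print(estimate)  # 120
```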
This type of prediction is most reliable when the sample is random and representative of the population. The larger the sample size, the more confident we can be in our predictions.