
Chapter Notes: Sampling Distributions

Imagine you want to know the average height of all high school students in your state. Measuring every single student would take forever and cost a fortune. Instead, you could measure a sample, perhaps 100 students, and calculate their average height. But here's the interesting question: if you took a different sample of 100 students, would you get exactly the same average? Probably not. Each sample gives a slightly different result. Sampling distributions help us understand and predict how sample statistics (like the sample mean or sample proportion) vary from sample to sample. This powerful idea forms the foundation of statistical inference, allowing us to make conclusions about entire populations based on samples.

Understanding Samples and Populations

Before diving into sampling distributions, we need to clearly distinguish between a population and a sample.

A population is the entire group we want to study. It includes every individual, object, or measurement of interest. For example, all registered voters in the United States, every fish in Lake Superior, or all the bolts produced by a factory in one year.

A sample is a subset of the population that we actually observe and measure. We select a sample because examining the entire population is usually impractical, too expensive, or impossible. For instance, surveying 1,000 voters instead of all 150 million registered voters.

A parameter is a numerical characteristic of a population, such as the population mean \( \mu \) (mu) or the population proportion \( p \). Parameters are usually unknown because we rarely have access to the entire population.

A statistic is a numerical characteristic calculated from a sample, such as the sample mean \( \bar{x} \) (x-bar) or the sample proportion \( \hat{p} \) (p-hat). We use statistics to estimate parameters.

Think of a parameter as the true answer you're trying to find, and a statistic as your best guess based on the evidence you've collected. Just like different detectives examining different clues might form slightly different theories about the same case, different samples produce different statistics.

What is a Sampling Distribution?

A sampling distribution is the probability distribution of a statistic based on all possible samples of the same size from a population. In simpler terms, it shows us what values a statistic (like \( \bar{x} \) or \( \hat{p} \)) could take and how likely each value is when we repeatedly draw samples.

Here's how we can conceptually create a sampling distribution:

  1. Take a random sample of size \( n \) from the population.
  2. Calculate the statistic of interest (for example, the sample mean).
  3. Record that statistic.
  4. Repeat steps 1-3 many, many times (theoretically, infinitely many times).
  5. Create a distribution (histogram or probability distribution) of all the recorded statistics.

This distribution of sample statistics is the sampling distribution. It tells us how the statistic behaves across different samples.

Imagine you have a jar with 10,000 marbles of various weights. The average weight of all marbles is a parameter. You scoop out 50 marbles, weigh them, and calculate their average; that's one statistic. You pour them back, mix, and scoop again. The new average is slightly different. If you did this 1,000 times and made a histogram of those 1,000 averages, you'd see the sampling distribution of the sample mean.
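The marble-jar experiment can be simulated directly. This is a minimal sketch using only Python's standard library; the marble weights are made-up simulated values (drawn uniformly between 4 and 6 grams), not real data:

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 10,000 marble weights in grams (an assumption for illustration)
population = [random.uniform(4.0, 6.0) for _ in range(10_000)]
pop_mean = statistics.mean(population)  # the parameter

# Steps 1-4: repeatedly scoop 50 marbles and record each sample mean
sample_means = []
for _ in range(1_000):
    sample = random.sample(population, 50)
    sample_means.append(statistics.mean(sample))

# Step 5: the 1,000 recorded means approximate the sampling distribution of x-bar
print(f"population mean:      {pop_mean:.3f}")
print(f"mean of sample means: {statistics.mean(sample_means):.3f}")
```

A histogram of `sample_means` would show the bell-shaped pile of averages described above, centered very close to the population mean.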

Key Characteristics of Sampling Distributions

Sampling distributions have three important characteristics:

  • Center: The mean of the sampling distribution tells us where the statistics tend to cluster.
  • Spread: The standard deviation of the sampling distribution (often called the standard error) tells us how much variability there is from sample to sample.
  • Shape: The shape of the distribution (normal, skewed, uniform, etc.) tells us about the probability of obtaining different statistic values.

Sampling Distribution of the Sample Mean

The sampling distribution of the sample mean \( \bar{x} \) describes how sample means vary when we take repeated samples from a population. This is one of the most important sampling distributions in statistics.

Properties of the Sampling Distribution of \( \bar{x} \)

When we take all possible samples of size \( n \) from a population with mean \( \mu \) and standard deviation \( \sigma \), the sampling distribution of \( \bar{x} \) has these properties:

1. Mean of the Sampling Distribution:

\[ \mu_{\bar{x}} = \mu \]

The mean of all possible sample means equals the population mean. This property tells us that the sample mean is an unbiased estimator of the population mean. On average, sample means equal the true population mean.

2. Standard Deviation of the Sampling Distribution (Standard Error):

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]

The standard deviation of the sampling distribution is called the standard error of the mean. It equals the population standard deviation divided by the square root of the sample size. Notice that as \( n \) increases, the standard error decreases. Larger samples produce sample means that cluster more tightly around the population mean.

3. Shape of the Sampling Distribution:

  • If the population distribution is normal, the sampling distribution of \( \bar{x} \) is exactly normal for any sample size.
  • If the population distribution is not normal, the sampling distribution of \( \bar{x} \) becomes approximately normal as the sample size increases (this is the Central Limit Theorem, which we'll explore next).

Example:  A population of test scores has a mean of \( \mu = 75 \) and a standard deviation of \( \sigma = 12 \).
You plan to take random samples of 36 students.

What are the mean and standard error of the sampling distribution of \( \bar{x} \)?

Solution:

The mean of the sampling distribution is:

\[ \mu_{\bar{x}} = \mu = 75 \]

The standard error of the mean is:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{36}} = \frac{12}{6} = 2 \]

The mean of the sampling distribution is 75 and the standard error is 2.
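The same calculation can be checked with a couple of lines of Python (the numbers come straight from the worked example above):

```python
import math

mu, sigma, n = 75, 12, 36       # population mean, population SD, sample size

mean_xbar = mu                  # mean of the sampling distribution equals mu
se_xbar = sigma / math.sqrt(n)  # standard error of the mean

print(mean_xbar, se_xbar)       # 75 2.0
```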

The Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most remarkable results in all of statistics. It states:

Central Limit Theorem: For a population with mean \( \mu \) and standard deviation \( \sigma \), the sampling distribution of the sample mean \( \bar{x} \) becomes approximately normal as the sample size \( n \) increases, regardless of the shape of the population distribution.

This theorem is powerful because it means that even if we start with a population that is heavily skewed, uniform, or otherwise non-normal, the distribution of sample means will still be approximately normal if the sample size is large enough.

Guidelines for Applying the Central Limit Theorem

  • If the population is normally distributed, the sampling distribution of \( \bar{x} \) is exactly normal for any sample size.
  • If the population distribution is slightly skewed or symmetric but not normal, samples of size \( n \geq 15 \) usually produce a nearly normal sampling distribution.
  • If the population distribution is heavily skewed or has outliers, larger samples are needed. A common rule of thumb is \( n \geq 30 \).

The beauty of the CLT is that it allows us to use normal probability calculations for sample means, even when we know nothing about the shape of the population distribution, as long as our sample size is reasonably large.

Example:  The amount of time customers spend in a store is heavily right-skewed, with a mean of 18 minutes and a standard deviation of 6 minutes.
You take a random sample of 40 customers.

Can you assume the sampling distribution of \( \bar{x} \) is approximately normal? What are its mean and standard error?

Solution:

Even though the population distribution is heavily right-skewed, the sample size is \( n = 40 \), which is greater than 30. By the Central Limit Theorem, the sampling distribution of \( \bar{x} \) is approximately normal.

The mean of the sampling distribution is:

\[ \mu_{\bar{x}} = 18 \text{ minutes} \]

The standard error is:

\[ \sigma_{\bar{x}} = \frac{6}{\sqrt{40}} = \frac{6}{6.32} \approx 0.95 \text{ minutes} \]

The sampling distribution is approximately normal with mean 18 minutes and standard error 0.95 minutes.
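The CLT claim can be checked by simulation. The sketch below models the store-visit times with a gamma distribution with shape 9 and scale 2, which is right-skewed and happens to have mean 18 and SD 6; this specific distribution is a modeling assumption for illustration, not something stated in the example:

```python
import math
import random
import statistics

random.seed(0)

# Right-skewed population: gamma(shape=9, scale=2) has mean 18 and SD 6
def sample_mean(n):
    return statistics.mean(random.gammavariate(9, 2) for _ in range(n))

# Simulate many samples of 40 customers and record each sample mean
means = [sample_mean(40) for _ in range(5_000)]

print(f"mean of sample means: {statistics.mean(means):.2f}")  # close to 18
print(f"SD of sample means:   {statistics.stdev(means):.2f}")  # close to 6/sqrt(40) ~ 0.95
```

Despite the skewed population, a histogram of `means` would look roughly bell-shaped, with center near 18 and spread near the theoretical standard error of 0.95.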

Using the Sampling Distribution to Calculate Probabilities

Once we know that the sampling distribution of \( \bar{x} \) is approximately normal (or exactly normal), we can calculate probabilities about sample means using the standard normal distribution (z-distribution).

To find the probability that a sample mean falls in a certain range, we convert \( \bar{x} \) to a z-score using:

\[ z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Then we use a standard normal table or technology to find the corresponding probability.

Example:  The weights of bags of sugar filled by a machine are normally distributed with mean \( \mu = 5 \) pounds and standard deviation \( \sigma = 0.15 \) pounds.
A random sample of 25 bags is selected.

What is the probability that the sample mean weight is less than 4.95 pounds?

Solution:

Since the population is normally distributed, the sampling distribution of \( \bar{x} \) is exactly normal.

The mean is \( \mu_{\bar{x}} = 5 \) pounds.

The standard error is:

\[ \sigma_{\bar{x}} = \frac{0.15}{\sqrt{25}} = \frac{0.15}{5} = 0.03 \text{ pounds} \]

Now convert \( \bar{x} = 4.95 \) to a z-score:

\[ z = \frac{4.95 - 5}{0.03} = \frac{-0.05}{0.03} \approx -1.67 \]

Using a standard normal table, \( P(z < -1.67) \approx 0.0475 \).

The probability that the sample mean is less than 4.95 pounds is approximately 0.0475 or 4.75%.
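This probability can also be computed without a table, using the standard-library error function `math.erf` to build the standard normal CDF (a common identity, not anything specific to this example):

```python
import math

def normal_cdf(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mu, sigma, n = 5.0, 0.15, 25
se = sigma / math.sqrt(n)            # 0.03 pounds

z = round((4.95 - mu) / se, 2)       # -1.67, rounded as in the table lookup
print(f"z = {z}, P(x-bar < 4.95) = {normal_cdf(z):.4f}")
```

Rounding z to two decimals reproduces the table value 0.0475; carrying the unrounded z gives 0.0478, a difference only in the last digit.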

Sampling Distribution of the Sample Proportion

When we're interested in categorical data, such as the proportion of voters who support a candidate, the percentage of defective products, or the fraction of students who pass an exam, we work with proportions rather than means.

Let \( p \) represent the true population proportion (parameter) and \( \hat{p} \) (p-hat) represent the sample proportion (statistic). The sample proportion is calculated as:

\[ \hat{p} = \frac{\text{number of successes in the sample}}{n} \]

where \( n \) is the sample size.

Properties of the Sampling Distribution of \( \hat{p} \)

When we take all possible samples of size \( n \) from a population where the true proportion is \( p \), the sampling distribution of \( \hat{p} \) has these properties:

1. Mean of the Sampling Distribution:

\[ \mu_{\hat{p}} = p \]

The mean of all possible sample proportions equals the true population proportion. The sample proportion is an unbiased estimator.

2. Standard Deviation of the Sampling Distribution (Standard Error):

\[ \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]

This is the standard error of the sample proportion. Like the standard error of the mean, it decreases as the sample size increases.

3. Shape of the Sampling Distribution:

The sampling distribution of \( \hat{p} \) is approximately normal when the sample size is large enough. The commonly used condition is:

\[ np \geq 10 \quad \text{and} \quad n(1-p) \geq 10 \]

This ensures that we have at least 10 expected successes and 10 expected failures in our sample.

Example:  In a large city, 35% of residents support a new transportation tax.
A random sample of 200 residents is selected.

What are the mean and standard error of the sampling distribution of \( \hat{p} \)? Is the distribution approximately normal?

Solution:

The population proportion is \( p = 0.35 \) and the sample size is \( n = 200 \).

The mean of the sampling distribution is:

\[ \mu_{\hat{p}} = p = 0.35 \]

The standard error is:

\[ \sigma_{\hat{p}} = \sqrt{\frac{0.35(1-0.35)}{200}} = \sqrt{\frac{0.35 \times 0.65}{200}} = \sqrt{\frac{0.2275}{200}} = \sqrt{0.0011375} \approx 0.0337 \]

Check the conditions for normality:
\( np = 200(0.35) = 70 \geq 10 \) ✓
\( n(1-p) = 200(0.65) = 130 \geq 10 \) ✓

Both conditions are satisfied, so the sampling distribution is approximately normal with mean 0.35 and standard error 0.0337.
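The arithmetic in this example is easy to verify in Python, including the two normality conditions:

```python
import math

p, n = 0.35, 200  # population proportion and sample size from the example

mean_phat = p
se_phat = math.sqrt(p * (1 - p) / n)

# Normality check: at least 10 expected successes and 10 expected failures
print(f"np = {n * p:.0f}, n(1-p) = {n * (1 - p):.0f}")
print(f"standard error = {se_phat:.4f}")  # ~ 0.0337
```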

Calculating Probabilities for Sample Proportions

When the sampling distribution of \( \hat{p} \) is approximately normal, we can calculate probabilities by converting to z-scores:

\[ z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{\hat{p} - p}{\sqrt{\frac{p(1-p)}{n}}} \]

Example:  Suppose 60% of all adults in a country own a smartphone.
You take a random sample of 100 adults.

What is the probability that between 55% and 65% of the sample own a smartphone?

Solution:

Here \( p = 0.60 \) and \( n = 100 \).

First, check conditions:
\( np = 100(0.60) = 60 \geq 10 \) ✓
\( n(1-p) = 100(0.40) = 40 \geq 10 \) ✓

The sampling distribution is approximately normal with:

\[ \mu_{\hat{p}} = 0.60 \]

\[ \sigma_{\hat{p}} = \sqrt{\frac{0.60(0.40)}{100}} = \sqrt{\frac{0.24}{100}} = \sqrt{0.0024} \approx 0.049 \]

Convert \( \hat{p} = 0.55 \) to a z-score:

\[ z_1 = \frac{0.55 - 0.60}{0.049} = \frac{-0.05}{0.049} \approx -1.02 \]

Convert \( \hat{p} = 0.65 \) to a z-score:

\[ z_2 = \frac{0.65 - 0.60}{0.049} = \frac{0.05}{0.049} \approx 1.02 \]

Using a standard normal table:
\( P(z < 1.02) \approx 0.8461 \)
\( P(z < -1.02) \approx 0.1539 \)

\( P(-1.02 < z < 1.02) = 0.8461 - 0.1539 = 0.6922 \)

The probability is approximately 0.69 or 69%.
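The whole calculation can be reproduced with the same `math.erf`-based normal CDF used for sample means (a standard identity; the numbers are from the example):

```python
import math

def normal_cdf(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

p, n = 0.60, 100
se = math.sqrt(p * (1 - p) / n)  # ~ 0.049

z1 = (0.55 - p) / se             # ~ -1.02
z2 = (0.65 - p) / se             # ~  1.02
prob = normal_cdf(z2) - normal_cdf(z1)
print(f"P(0.55 < p-hat < 0.65) = {prob:.4f}")
```

Carrying the unrounded z-scores gives 0.6926 rather than the table-based 0.6922; both round to about 0.69.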

The Relationship Between Sample Size and Sampling Distributions

One of the most important insights from studying sampling distributions is understanding how sample size affects the variability of statistics.

For the sample mean: As sample size \( n \) increases, the standard error \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \) decreases. This means larger samples produce sample means that are more tightly clustered around the population mean.

For the sample proportion: As sample size \( n \) increases, the standard error \( \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \) decreases. Larger samples produce sample proportions closer to the true population proportion.

Notice that both standard errors involve \( \sqrt{n} \) in the denominator. Because of this square-root relationship, to cut the standard error in half you must quadruple the sample size.

Think of it like trying to estimate the color distribution of candies in a giant jar. If you grab just 5 candies, your estimate could be way off. But if you grab 500 candies, your estimate will be much closer to the true distribution. The larger sample "averages out" the randomness.
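The quadrupling rule is easy to see numerically. Reusing the population SD of 12 from the earlier test-score example:

```python
import math

sigma = 12  # population SD from the test-score example

for n in (36, 144, 576):  # each step quadruples the sample size
    se = sigma / math.sqrt(n)
    print(f"n = {n:4d}  standard error = {se}")
# SE goes 2.0 -> 1.0 -> 0.5: quadrupling n halves the standard error
```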

Common Misconceptions About Sampling Distributions

Students often confuse three different distributions when learning about sampling distributions. Let's clarify:

  • The population distribution: The distribution of all individual values in the population. This is what we're trying to learn about.
  • The sample distribution: The distribution of values in one particular sample. This is what we actually observe when we collect data.
  • The sampling distribution: The distribution of a statistic (like \( \bar{x} \) or \( \hat{p} \)) across all possible samples of the same size. This is a theoretical distribution that helps us understand sampling variability.

Another common misconception is thinking the Central Limit Theorem applies to individual values. It doesn't. The CLT tells us that sample means are approximately normally distributed for large samples, even if individual values in the population are not.

Why Sampling Distributions Matter

Sampling distributions are the bridge between the data we collect (samples) and the conclusions we make (inferences about populations). They answer crucial questions:

  • How much do sample statistics vary from sample to sample?
  • How likely is it that our sample statistic is close to the true population parameter?
  • How confident can we be in our estimates?
  • How large should our sample be to achieve a desired level of precision?

In the chapters ahead, you'll use sampling distributions to construct confidence intervals (ranges that likely contain the true parameter) and conduct hypothesis tests (procedures for making decisions based on data). Every inference we make relies on understanding how statistics behave across repeated samples, which is exactly what sampling distributions describe.

By mastering sampling distributions, you gain the fundamental tool needed to think statistically: recognizing that sample results vary, quantifying that variability, and using probability to make informed decisions despite uncertainty.

The document Chapter Notes: Sampling Distributions is a part of the Grade 9 Course Statistics & Probability.