Introduction
This unit applies probabilistic reasoning to sampling, introducing students to sampling distributions of statistics they will use when performing inference in Units 6 and 7. Students should understand that sample statistics can be used to estimate corresponding population parameters and that measures of center (mean) and variability (standard deviation) for these sampling distributions can be determined directly from the population parameters when certain sampling criteria are met. For large enough samples from any population, these sampling distributions can be approximated by a normal distribution. Simulating sampling distributions helps students to understand how the values of statistics vary in repeated random sampling from populations with known parameters.
What is a Sampling Distribution?
A sampling distribution is the distribution of a statistic computed from ALL possible samples of a given size, with those sample statistics put together as a data set. For example, let's say we are looking at the average number of snap peas taken from a field. If we take all possible samples of size 30, average each sample, and then average those averages together, we would recover the population mean (although taking every possible sample is likely unrealistic in practice). Sampling distributions are important because they lead the way to statistical inference: the act of making a prediction or testing a claim regarding a population parameter.
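To make the "ALL possible samples" idea concrete, here is a minimal sketch using a made-up population small enough to enumerate. The six values and the sample size of 3 are assumptions chosen only for illustration; they do not come from the notes.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population: snap pea counts from 6 plants (made-up values).
population = [4, 7, 8, 10, 12, 13]
sample_size = 3

# Every possible sample of the given size, drawn without replacement.
all_samples = list(combinations(population, sample_size))

# The sampling distribution: one statistic (here, the mean) per possible sample.
sample_means = [mean(s) for s in all_samples]

print(len(all_samples))    # how many samples of size 3 exist
print(mean(sample_means))  # center of the sampling distribution
print(mean(population))    # population mean, for comparison
```

With a real population the list of all samples is far too large to build, which is why simulation or formulas are used instead.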
Sampling Distribution for Proportions
The first type of sampling distribution you will encounter is a sampling distribution for proportions, used to estimate a population proportion. Here we take the sample proportion from all possible samples of our given size and average those together to find the mean of our sampling distribution, which equals the population proportion p. Our standard deviation is found using a formula given on the reference page. Once you have the mean and the standard deviation, you have the crux of a sampling distribution for a population proportion.
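For reference, the standard formulas for the center and spread of the sampling distribution of a sample proportion are written out below. The symbols p (population proportion), n (sample size), and p-hat (sample proportion) are the usual textbook notation; the spread formula assumes the independence condition described in the next section holds.

```latex
\mu_{\hat{p}} = p,
\qquad
\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}
```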
Conditions for Sampling Distribution
Sampling distributions hinge on certain conditions that make our sampling distributions an accurate portrayal of our population proportion:
- Random: The first and possibly most important condition necessary for creating a sampling distribution is that our sample is randomly selected. If our sample is not randomly selected, then all the calculations we do are for naught because our point estimate, or sample statistic, is biased.
- Independence (10% Condition): The observations within a sample have to be independent of one another. When sampling without replacement, checking the 10% condition tells us that the amount of dependence is so negligible that our observations are essentially independent. The population must be at least 10 times our sample size!
- Normality (Large Counts Condition): To calculate the probability of obtaining certain samples using a sampling distribution, we need to verify that our sampling distribution is approximately normal. For categorical data (proportions), we need to check that the expected numbers of successes and failures are both at least 10 (i.e., np ≥ 10 and n(1-p) ≥ 10).
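The two numeric conditions above can be checked in a few lines, after which a normal model gives probabilities for the sample proportion. This is a sketch under assumed values: the function name `proportion_conditions`, the scenario (n = 100, p = 0.30, population of 5000), and the cutoff 0.35 are all hypothetical. Randomness cannot be checked with arithmetic; it depends on how the data were collected.

```python
from math import sqrt
from statistics import NormalDist

def proportion_conditions(n, p, population_size):
    """Check the 10% and Large Counts conditions (hypothetical helper)."""
    return {
        "ten_percent": population_size >= 10 * n,
        "large_counts": n * p >= 10 and n * (1 - p) >= 10,
    }

# Hypothetical scenario, chosen only for illustration.
n, p, population_size = 100, 0.30, 5000
checks = proportion_conditions(n, p, population_size)
print(checks)

if all(checks.values()):
    # Normal approximation to the sampling distribution of the sample proportion.
    sd = sqrt(p * (1 - p) / n)
    sampling_dist = NormalDist(mu=p, sigma=sd)
    # Probability that a sample proportion is at least 0.35.
    prob = 1 - sampling_dist.cdf(0.35)
    print(round(prob, 3))
```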
Sampling Distribution for Means
When dealing with means, our center is the average of all of our sample means from all possible samples of size n, which equals the population mean. Our standard deviation is found by dividing our population standard deviation by the square root of our sample size. As our sample size increases, this standard deviation decreases, which is a big part of why a large sample size is vital in accurately estimating our population mean.
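A quick numeric illustration of how the spread shrinks as the sample size grows. The population standard deviation of 12 and the three sample sizes are made-up values, and `sd_of_sample_mean` is a hypothetical helper name.

```python
from math import sqrt

def sd_of_sample_mean(population_sd, n):
    """Standard deviation of the sampling distribution of the sample mean."""
    return population_sd / sqrt(n)

population_sd = 12.0  # hypothetical population standard deviation

for n in (9, 36, 144):
    print(n, sd_of_sample_mean(population_sd, n))
# prints:
# 9 4.0
# 36 2.0
# 144 1.0
```

Note that quadrupling the sample size only halves the standard deviation, because n sits under a square root.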
Conditions for Sampling Distribution
As you will find as we get into statistical inference, sampling distributions hinge on certain conditions that make our sampling distributions an accurate portrayal of our population mean:
- Random: It is essential that our sampling distribution is based on random samples. No mathematics or fancy statistics can "fix" a biased sample.
- Independence (10% Condition): We must check the 10% condition the same way as we do for population proportions.
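- Normality (Central Limit Theorem): We must verify that our population is normally distributed or that our sample size is at least 30. The second check relies on the Central Limit Theorem, which says the sampling distribution of the sample mean is approximately normal for large samples regardless of the population's shape; n ≥ 30 is the usual rule of thumb.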
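The Central Limit Theorem can be explored by simulation. The sketch below assumes a strongly right-skewed population (exponential with rate 1, so mean 1 and standard deviation 1) and draws 5000 repeated samples of size 30; these choices are illustrative assumptions, and repeated sampling only approximates the full sampling distribution.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # sample size (meets the n >= 30 rule of thumb)
num_samples = 5000  # repeated samples: a simulation, not ALL possible samples

# Assumed population: exponential with rate 1 (right-skewed, mean 1, SD 1).
pop_mean, pop_sd = 1.0, 1.0

# One sample mean per simulated sample.
sample_means = [
    sum(random.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

print("simulated center:", mean(sample_means), "theory:", pop_mean)
print("simulated spread:", stdev(sample_means), "theory:", pop_sd / sqrt(n))
```

Plotting a histogram of `sample_means` would show a roughly bell-shaped curve even though the population itself is far from normal.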
Sampling Distributions for the Differences in Means and Proportions
The last type of sampling distribution we encounter is used when we want to see whether there is a difference between two populations. In this type of sampling distribution, our center is the difference between the two population parameters (which is 0 if the two populations do not differ). The necessary formulas for the center and spread of these sampling distributions can be found on the reference page.
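For reference, the standard formulas for two independent samples are written out below. The subscripts 1 and 2 label the two populations; the notation is the usual textbook one, and the spread formulas assume the two samples are independent of each other.

```latex
\mu_{\hat{p}_1 - \hat{p}_2} = p_1 - p_2,
\qquad
\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}

\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2,
\qquad
\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}
```

The variances add even though the statistics are subtracted, so the difference is more variable than either statistic alone.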
Conditions for Inference
To check the conditions for inference when there are two samples, you essentially do the same checks as above, but twice: check randomness, independence, and normality for each sample.
Key Terms
- Bias: Refers to a systematic error that leads to an incorrect or misleading representation of a population or phenomenon.
- Central Limit Theorem: States that the sampling distribution of the sample mean approaches a normal distribution as the sample size increases, regardless of the population's distribution.
- Experimental Studies: Research designs that allow researchers to establish cause-and-effect relationships by manipulating variables.
- Independence (10% Condition): Requires that the sample be no more than 10% of the population, so that observations drawn without replacement from a finite population can be treated as independent.
- Inference: The process of drawing conclusions about a population based on sample data.
- Large Counts Condition: States that for the sampling distribution of sample proportions to be approximately normal, the counts of successes and failures must be large enough.
- Mean: A measure of central tendency that represents the average value of a set of numbers.
- Measures of Center: Values that represent the center point of a dataset, such as mean, median, and mode.
- Normal Distribution: A continuous probability distribution characterized by a symmetric, bell-shaped curve.
- Point Estimate: A single value that serves as an approximation of a population parameter.
- Population Parameters: Numerical values that describe characteristics of a population.
- Population Mean: The average value of a set of observations for an entire population.
- Probabilistic Reasoning: Drawing conclusions based on probabilities and uncertainty in outcomes.
- Random Sampling: A method of selecting individuals from a population where every member has an equal chance of being chosen.
- Sample Statistics: Numerical values that summarize or describe characteristics of a sample.
- Sampling Distribution for Means: A probability distribution showing all possible sample means from a population.
- Sampling Distribution: A probability distribution of a statistic obtained by selecting random samples from a population.
- Sampling Distribution for Proportions: The probability distribution of the sample proportion.
- Standard Deviation: A measure of the amount of variation or dispersion in a set of values.
- Variability: How spread out the values in a dataset are.