Chapter Notes: Comparing Two Proportions

When researchers and statisticians want to understand differences between groups, they often compare proportions. A proportion is the fraction or percentage of individuals in a group that have a certain characteristic. For example, you might want to know if a higher proportion of teenagers use social media compared to adults, or whether one medication has a higher success rate than another. In this chapter, you will learn how to compare two proportions using statistical methods, test hypotheses about differences between proportions, and make informed decisions based on data.

Understanding Proportions

A proportion represents part of a whole. In statistics, we calculate a sample proportion by dividing the number of individuals with a particular characteristic by the total number of individuals in the sample. We typically denote a sample proportion with the symbol \( \hat{p} \) (read as "p-hat").

The formula for a sample proportion is:

\[ \hat{p} = \frac{x}{n} \]

where \( x \) is the number of successes (individuals with the characteristic) and \( n \) is the total sample size.

Example:  A researcher surveys 200 high school students and finds that 150 of them own a smartphone.

What is the sample proportion of students who own a smartphone?

Solution:

Number of students with smartphones: x = 150

Total number of students surveyed: n = 200

Sample proportion: \( \hat{p} = \frac{150}{200} = 0.75 \)

The sample proportion of students who own a smartphone is 0.75 or 75%.
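The formula \( \hat{p} = x/n \) is simple enough to express directly in code. Below is a minimal Python sketch; the function name `sample_proportion` is our own choice for illustration, not part of any library.

```python
def sample_proportion(x: int, n: int) -> float:
    """Return p-hat = x / n, the fraction of the sample with the characteristic."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= x <= n:
        raise ValueError("x (number of successes) must be between 0 and n")
    return x / n

# Smartphone survey: 150 of 200 students own a smartphone.
p_hat = sample_proportion(150, 200)
print(p_hat)  # 0.75
```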

When comparing two groups, we calculate a proportion for each group separately. We use subscripts to distinguish them: \( \hat{p}_1 \) for the first group and \( \hat{p}_2 \) for the second group.

The Difference Between Two Proportions

To compare two proportions, we examine the difference between proportions, which is simply \( \hat{p}_1 - \hat{p}_2 \). This difference tells us how much larger or smaller one proportion is compared to the other.

If \( \hat{p}_1 - \hat{p}_2 = 0 \), the two proportions are equal. If the difference is positive, the first group has a higher proportion. If negative, the second group has a higher proportion.

Example:  In a clinical trial, 120 out of 200 patients receiving Treatment A recovered, while 90 out of 180 patients receiving Treatment B recovered.

What is the difference in recovery proportions between the two treatments?

Solution:

For Treatment A: \( \hat{p}_1 = \frac{120}{200} = 0.60 \)

For Treatment B: \( \hat{p}_2 = \frac{90}{180} = 0.50 \)

Difference in proportions: \( \hat{p}_1 - \hat{p}_2 = 0.60 - 0.50 = 0.10 \)

The difference in recovery proportions is 0.10 or 10 percentage points, with Treatment A having the higher recovery rate.
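The same calculation as a short Python sketch (the helper name `difference_in_proportions` is hypothetical). Note that floating-point subtraction gives a value extremely close to, but not exactly, 0.10, so the result is rounded for display.

```python
def difference_in_proportions(x1, n1, x2, n2):
    """Return (p1_hat, p2_hat, p1_hat - p2_hat) from success counts and sample sizes."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    return p1_hat, p2_hat, p1_hat - p2_hat

# Treatment A: 120 of 200 recovered; Treatment B: 90 of 180 recovered.
p1, p2, diff = difference_in_proportions(120, 200, 90, 180)
print(p1, p2, round(diff, 2))  # 0.6 0.5 0.1
```

A positive result means the first group (Treatment A) has the higher proportion.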

Sampling Distribution of the Difference Between Two Proportions

When we take samples from two populations, the difference \( \hat{p}_1 - \hat{p}_2 \) will vary from sample to sample due to random sampling variability. The sampling distribution of the difference between two proportions describes all possible values this difference could take and their probabilities.

Under certain conditions, this sampling distribution is approximately normal with:

  • Mean: \( p_1 - p_2 \) (the true difference in population proportions)
  • Standard deviation (standard error): \( \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \)

Since we usually don't know the true population proportions \( p_1 \) and \( p_2 \), we estimate the standard error using our sample proportions:

\[ SE = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \]
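As a sketch, the estimated standard error translates line for line into Python. Here we reuse the Treatment A and B numbers only to illustrate the arithmetic; `unpooled_se` is a hypothetical helper name.

```python
from math import sqrt

def unpooled_se(p1_hat, n1, p2_hat, n2):
    """Estimated SE of p1_hat - p2_hat, using each sample's own proportion."""
    return sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

se = unpooled_se(0.60, 200, 0.50, 180)
print(round(se, 4))  # 0.0509
```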

Conditions for Using the Normal Approximation

Before applying methods based on the normal distribution, we must verify these conditions:

  1. Random samples: Both samples must be randomly selected from their respective populations.
  2. Independence: The two samples must be independent of each other, and observations within each sample must be independent. A common rule is that each sample size should be less than 10% of its population.
  3. Success-failure condition: Both samples must have at least 10 successes and 10 failures:
    • \( n_1\hat{p}_1 \geq 10 \) and \( n_1(1-\hat{p}_1) \geq 10 \)
    • \( n_2\hat{p}_2 \geq 10 \) and \( n_2(1-\hat{p}_2) \geq 10 \)

When these conditions are met, we can use the normal distribution to construct confidence intervals and perform hypothesis tests.
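Only some of these conditions can be checked from the numbers alone. Random sampling and independence between the samples depend on how the data were collected, so code cannot verify them. The success-failure and 10% conditions can be checked, as in this minimal sketch. Since \( n\hat{p} = x \), the successes are simply the count \( x \) and the failures are \( n - x \). The helper names and the population sizes in the usage lines are hypothetical.

```python
def success_failure_ok(x, n, minimum=10):
    """True if the sample has at least `minimum` successes (x) and failures (n - x)."""
    return x >= minimum and (n - x) >= minimum

def ten_percent_ok(n, population_size):
    """True if the sample is less than 10% of its population."""
    return n < 0.10 * population_size

print(success_failure_ok(240, 400), success_failure_ok(175, 350))  # True True
print(success_failure_ok(8, 50))   # False: only 8 successes
print(ten_percent_ok(400, 3000))   # False: 400 is more than 10% of 3000
```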

Confidence Interval for the Difference Between Two Proportions

A confidence interval provides a range of plausible values for the true difference between two population proportions. The general form is:

\[ (\hat{p}_1 - \hat{p}_2) \pm z^* \times SE \]

where \( z^* \) is the critical value from the standard normal distribution corresponding to the desired confidence level, and \( SE \) is the standard error calculated using the sample proportions.

Common critical values include:

Confidence Level    Critical Value (\( z^* \))
90%                 1.645
95%                 1.96
99%                 2.576
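These critical values come from the standard normal distribution: for confidence level \( C \), \( z^* \) leaves \( (1-C)/2 \) in each tail. Python's standard library `statistics.NormalDist` can reproduce the table; the wrapper `critical_value` is our own helper.

```python
from statistics import NormalDist

def critical_value(confidence):
    """z* such that the middle `confidence` of the standard normal lies in (-z*, z*)."""
    upper_tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - upper_tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```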

Example:  A pollster surveys 400 urban voters and finds 240 support a ballot measure.
The same pollster surveys 350 rural voters and finds 175 support the measure.

Construct a 95% confidence interval for the difference in support proportions between urban and rural voters.

Solution:

Urban voters: \( \hat{p}_1 = \frac{240}{400} = 0.60 \), \( n_1 = 400 \)

Rural voters: \( \hat{p}_2 = \frac{175}{350} = 0.50 \), \( n_2 = 350 \)

Check conditions:
\( n_1\hat{p}_1 = 240 \geq 10 \), \( n_1(1-\hat{p}_1) = 160 \geq 10 \) ✓
\( n_2\hat{p}_2 = 175 \geq 10 \), \( n_2(1-\hat{p}_2) = 175 \geq 10 \) ✓

Calculate standard error:
\( SE = \sqrt{\frac{0.60(0.40)}{400} + \frac{0.50(0.50)}{350}} = \sqrt{\frac{0.24}{400} + \frac{0.25}{350}} = \sqrt{0.0006 + 0.000714} = \sqrt{0.001314} \approx 0.0362 \)

Difference in proportions: \( \hat{p}_1 - \hat{p}_2 = 0.60 - 0.50 = 0.10 \)

For 95% confidence, \( z^* = 1.96 \)

Margin of error: \( 1.96 \times 0.0362 \approx 0.071 \)

Confidence interval: \( 0.10 \pm 0.071 = (0.029, 0.171) \)

We are 95% confident that the true difference in support between urban and rural voters is between 2.9 and 17.1 percentage points, with urban voters showing higher support.
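The whole interval calculation can be sketched in a few lines of Python. This version computes \( z^* \) exactly instead of using the rounded 1.96, so the endpoints can differ from the hand calculation in the fourth decimal place. The function name `two_proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1_hat - p2_hat
    margin = z_star * se
    return diff - margin, diff + margin

# Urban: 240 of 400 support; rural: 175 of 350 support.
low, high = two_proportion_ci(240, 400, 175, 350)
print(round(low, 3), round(high, 3))  # 0.029 0.171
```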

Interpreting Confidence Intervals

When interpreting a confidence interval for the difference between two proportions, pay attention to whether the interval contains zero:

  • If the interval is entirely positive (both endpoints above zero), we can conclude that the first population proportion is likely larger than the second.
  • If the interval is entirely negative (both endpoints below zero), we can conclude that the second population proportion is likely larger than the first.
  • If the interval contains zero, we cannot conclude that there is a significant difference between the two population proportions at that confidence level.
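The three cases above amount to a simple check on the endpoints. A minimal sketch, with a hypothetical function name and wording of the messages:

```python
def interpret_interval(low, high):
    """Classify a confidence interval for p1 - p2 by where it sits relative to zero."""
    if low > 0:
        return "first proportion likely larger"
    if high < 0:
        return "second proportion likely larger"
    return "no significant difference at this confidence level"

print(interpret_interval(0.029, 0.171))  # first proportion likely larger
print(interpret_interval(-0.05, 0.02))   # no significant difference at this confidence level
```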

Hypothesis Testing for Two Proportions

A hypothesis test helps us determine whether the observed difference between two sample proportions provides sufficient evidence to conclude that a difference exists in the populations. The process follows these steps:

Step 1: State the Hypotheses

The null hypothesis (\( H_0 \)) typically states that there is no difference between the population proportions:

\[ H_0: p_1 = p_2 \quad \text{or equivalently} \quad H_0: p_1 - p_2 = 0 \]

The alternative hypothesis (\( H_a \)) can take three forms depending on the research question:

  • Two-sided test: \( H_a: p_1 \neq p_2 \) (the proportions are different)
  • One-sided test (greater): \( H_a: p_1 > p_2 \) (the first proportion is larger)
  • One-sided test (less): \( H_a: p_1 < p_2 \) (the first proportion is smaller)

Step 2: Check Conditions

Verify the same conditions as for confidence intervals: random samples, independence, and the success-failure condition for both groups.

Step 3: Calculate the Pooled Proportion and Standard Error

When testing \( H_0: p_1 = p_2 \), we assume the null hypothesis is true, meaning both samples come from populations with the same proportion. We estimate this common proportion using the pooled proportion:

\[ \hat{p}_{pooled} = \frac{x_1 + x_2}{n_1 + n_2} \]

The standard error under the null hypothesis uses this pooled proportion:

\[ SE_{pooled} = \sqrt{\hat{p}_{pooled}(1-\hat{p}_{pooled})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \]
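Here is a sketch of the pooled proportion and pooled standard error, applied to the drug data from the worked example later in this chapter (210 of 300 and 165 of 250). The helper names are our own.

```python
from math import sqrt

def pooled_proportion(x1, n1, x2, n2):
    """Combined proportion of successes, assuming H0: p1 = p2."""
    return (x1 + x2) / (n1 + n2)

def pooled_se(x1, n1, x2, n2):
    """Standard error of p1_hat - p2_hat under H0, using the pooled proportion."""
    p = pooled_proportion(x1, n1, x2, n2)
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))

p_pool = pooled_proportion(210, 300, 165, 250)
se = pooled_se(210, 300, 165, 250)
print(round(p_pool, 4), round(se, 4))  # 0.6818 0.0399
```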

Step 4: Calculate the Test Statistic

The test statistic (z-score) measures how many standard errors the observed difference is from the hypothesized difference (which is zero under the null hypothesis):

\[ z = \frac{(\hat{p}_1 - \hat{p}_2) - 0}{SE_{pooled}} = \frac{\hat{p}_1 - \hat{p}_2}{SE_{pooled}} \]
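The test statistic combines the observed difference with the pooled standard error. A self-contained sketch (the function name `two_proportion_z` is hypothetical):

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z = (p1_hat - p2_hat - 0) / SE_pooled for testing H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat - 0) / se_pool  # hypothesized difference is 0

z = two_proportion_z(210, 300, 165, 250)
print(round(z, 2))  # 1.0
```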

Step 5: Find the P-value

The P-value is the probability of observing a test statistic as extreme as or more extreme than the one calculated, assuming the null hypothesis is true. For different alternative hypotheses:

  • Two-sided: P-value = 2 × P(Z > |z|)
  • One-sided (greater): P-value = P(Z > z)
  • One-sided (less): P-value = P(Z < z)
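The three P-value rules can be sketched with the standard normal CDF from Python's `statistics.NormalDist`. The function name and the strings used for the alternatives are our own conventions. Note that \( P(Z > z) = 1 - \Phi(z) \), where \( \Phi \) is the CDF.

```python
from statistics import NormalDist

def p_value(z, alternative="two-sided"):
    """P-value for a z statistic under the standard normal distribution."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        return 2 * (1 - cdf(abs(z)))   # 2 * P(Z > |z|)
    if alternative == "greater":
        return 1 - cdf(z)              # P(Z > z)
    if alternative == "less":
        return cdf(z)                  # P(Z < z)
    raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")

print(round(p_value(1.00), 4))  # 0.3173
```

The result differs slightly from a table-based 0.3174 because the table value 0.1587 is itself rounded.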

Step 6: Make a Decision

Compare the P-value to the significance level (α, commonly 0.05):

  • If P-value < α, reject the null hypothesis. There is sufficient evidence to support the alternative hypothesis.
  • If P-value ≥ α, fail to reject the null hypothesis. There is insufficient evidence to support the alternative hypothesis.

Example:  A pharmaceutical company tests whether a new drug has a different success rate than the standard drug.
Of 300 patients taking the new drug, 210 showed improvement.
Of 250 patients taking the standard drug, 165 showed improvement.

At the 0.05 significance level, is there evidence that the success rates differ?

Solution:

Step 1: State hypotheses
\( H_0: p_1 = p_2 \) (success rates are equal)
\( H_a: p_1 \neq p_2 \) (success rates differ)

Step 2: Check conditions
New drug: \( \hat{p}_1 = \frac{210}{300} = 0.70 \), successes = 210 ≥ 10, failures = 90 ≥ 10 ✓
Standard drug: \( \hat{p}_2 = \frac{165}{250} = 0.66 \), successes = 165 ≥ 10, failures = 85 ≥ 10 ✓
We assume the random-sample and independence conditions are met.

Step 3: Calculate pooled proportion and standard error
\( \hat{p}_{pooled} = \frac{210 + 165}{300 + 250} = \frac{375}{550} \approx 0.6818 \)

\( SE_{pooled} = \sqrt{0.6818(0.3182)\left(\frac{1}{300} + \frac{1}{250}\right)} = \sqrt{0.2169 \times 0.007333} \approx \sqrt{0.001591} \approx 0.0399 \)

Step 4: Calculate test statistic
\( z = \frac{0.70 - 0.66}{0.0399} = \frac{0.04}{0.0399} \approx 1.00 \)

Step 5: Find P-value
For a two-sided test: P-value = 2 × P(Z > 1.00) = 2 × 0.1587 = 0.3174

Step 6: Make decision
Since P-value (0.3174) > α (0.05), we fail to reject the null hypothesis.

There is insufficient evidence to conclude that the success rates of the new drug and standard drug differ at the 0.05 significance level.
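All six steps (apart from stating hypotheses and checking the design conditions) can be combined into one function. This is a minimal sketch with hypothetical names, not the interface of any particular statistics package. Because it keeps z unrounded (about 1.003 rather than 1.00), its P-value comes out near 0.316 instead of the hand-computed 0.3174; the decision is the same.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided", alpha=0.05):
    """Return (z, P-value, decision) for H0: p1 = p2 using the pooled SE."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se_pool
    cdf = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - cdf(z)
    elif alternative == "less":
        p = cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision

# New drug: 210 of 300 improved; standard drug: 165 of 250 improved.
z, p, decision = two_proportion_z_test(210, 300, 165, 250)
print(round(z, 2), round(p, 3), decision)  # 1.0 0.316 fail to reject H0
```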

Practical Considerations and Common Mistakes

Statistical Significance vs. Practical Significance

A difference between proportions can be statistically significant (unlikely to occur by chance) without being practically significant (large enough to matter in real-world applications). Always consider the size of the difference, not just the P-value.

For example, in a study of 100,000 people, a difference of 1 percentage point in success rates might be statistically significant but too small to justify changing medical treatment protocols or incurring additional costs.

One-Sided vs. Two-Sided Tests

Choose between one-sided and two-sided tests based on the research question before collecting data. Use a two-sided test when interested in any difference. Use a one-sided test only when the research question specifically concerns whether one proportion is greater than (or less than) the other.

Confounding Variables

When comparing two proportions, be aware that other variables might explain the observed difference. Random assignment in experiments helps control for confounding variables, but in observational studies, differences might be due to factors other than the variable of interest.

Common Errors to Avoid

  • Using the wrong standard error: Remember to use the pooled proportion for hypothesis tests but separate sample proportions for confidence intervals.
  • Failing to check conditions: Always verify that sample sizes are large enough and that independence conditions are met.
  • Confusing proportion and count: Proportions are fractions or decimals between 0 and 1; counts are whole numbers. Always divide the count by the sample size to get the proportion.
  • Misinterpreting confidence intervals: A 95% confidence interval means that if we repeated the sampling process many times, about 95% of the intervals would contain the true parameter. It does not mean there is a 95% probability that the true parameter is in this specific interval.
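The "repeated sampling" interpretation can be seen in a small simulation. This sketch assumes hypothetical true proportions \( p_1 = 0.60 \) and \( p_2 = 0.50 \) with the urban/rural sample sizes, draws many pairs of samples, builds a 95% interval from each, and counts how often the interval captures the true difference of 0.10. The fraction should land near 0.95, though any single interval either contains the true value or does not.

```python
import random
from math import sqrt

def ci_covers(p1, p2, n1, n2, z_star, rng):
    """Simulate one pair of samples and report whether its CI contains p1 - p2."""
    x1 = sum(rng.random() < p1 for _ in range(n1))   # simulated successes, group 1
    x2 = sum(rng.random() < p2 for _ in range(n2))   # simulated successes, group 2
    ph1, ph2 = x1 / n1, x2 / n2
    se = sqrt(ph1 * (1 - ph1) / n1 + ph2 * (1 - ph2) / n2)
    diff = ph1 - ph2
    return diff - z_star * se <= p1 - p2 <= diff + z_star * se

rng = random.Random(2024)  # fixed seed so the run is reproducible
trials = 2000
hits = sum(ci_covers(0.60, 0.50, 400, 350, 1.96, rng) for _ in range(trials))
coverage = hits / trials
print(coverage)  # close to 0.95
```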
  • Claiming causation: Statistical tests show association, not causation, unless the data come from a well-designed randomized experiment.

Sample Size Determination

Researchers often need to determine how large a sample size is needed to detect a difference between proportions with adequate power. The required sample size depends on:

  • The desired confidence level or significance level
  • The desired margin of error or power
  • The expected proportions in each group
  • Whether the sample sizes in the two groups will be equal

For a confidence interval with margin of error \( ME \), assuming equal sample sizes (\( n_1 = n_2 = n \)), the formula is approximately:

\[ n \approx \frac{(z^*)^2[\hat{p}_1(1-\hat{p}_1) + \hat{p}_2(1-\hat{p}_2)]}{ME^2} \]

When planning a study and no preliminary estimates exist, researchers often use \( \hat{p}_1 = \hat{p}_2 = 0.5 \), which gives the most conservative (largest) sample size estimate.
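The sample-size formula can be sketched as follows. We round up with `ceil`, a common convention, since rounding down would give a margin of error slightly larger than requested. The function name and the 5-percentage-point target margin in the usage lines are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(margin, confidence=0.95, p1=0.5, p2=0.5):
    """Approximate n per group (n1 = n2 = n) for a CI on p1 - p2 with the given margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z_star ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / margin ** 2
    return ceil(n)

print(sample_size_per_group(0.05))                 # 769 (conservative: p1 = p2 = 0.5)
print(sample_size_per_group(0.05, p1=0.6, p2=0.5)) # 753
```

The second call shows why 0.5 is conservative: any other planning values give a smaller \( \hat{p}(1-\hat{p}) \) and therefore a smaller required sample.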

Technology and Calculations

In practice, statisticians use statistical software or graphing calculators to perform these calculations. Most statistical packages have built-in functions for:

  • Two-proportion z-tests
  • Two-proportion z-intervals
  • P-value calculations

However, understanding the underlying formulas and logic remains essential for interpreting results correctly and recognizing when conditions are not met or results don't make sense.

Connection to Other Statistical Methods

Comparing two proportions is closely related to other statistical procedures:

  • Chi-square test for independence: When comparing two proportions in a two-way table, the chi-square test gives equivalent results to the two-proportion z-test (for two-sided alternatives).
  • Comparing means: The logic of comparing two proportions parallels that of comparing two means, though the formulas differ because proportions and means have different distributions.
  • Comparing multiple proportions: When comparing more than two proportions, use a chi-square test rather than many separate two-sample tests, because running many tests raises the overall chance of a false positive.
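The equivalence between the chi-square test and the two-sided two-proportion z-test can be checked numerically: the Pearson chi-square statistic for the 2×2 table equals \( z^2 \). This sketch uses the drug data from earlier and computes the chi-square statistic by hand (observed vs. expected counts) rather than calling a library. It assumes no continuity correction; some software applies one by default for 2×2 tables, which makes the two numbers differ slightly.

```python
from math import sqrt

def pooled_z(x1, n1, x2, n2):
    """Two-proportion z statistic using the pooled standard error."""
    p_pool = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def chi_square_2x2(x1, n1, x2, n2):
    """Pearson chi-square (no continuity correction) for a 2x2 success/failure table."""
    observed = [[x1, n1 - x1], [x2, n2 - x2]]
    row_totals = [n1, n2]
    col_totals = [x1 + x2, (n1 - x1) + (n2 - x2)]
    grand_total = n1 + n2
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

z = pooled_z(210, 300, 165, 250)
chi2 = chi_square_2x2(210, 300, 165, 250)
print(round(z ** 2, 4), round(chi2, 4))  # the two values agree
```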
The document Chapter Notes: Comparing Two Proportions is a part of the Grade 9 Course Statistics & Probability.