# Inference For Categorical Data (Chi-Square Tests)
Section A: Multiple Choice Questions
Q1: A chi-square goodness-of-fit test is used to determine: (a) If two categorical variables are independent (b) If observed frequencies match expected frequencies for one categorical variable (c) The strength of association between two quantitative variables (d) If the means of two populations are equal
Solution:
Ans: (b) Explanation: A chi-square goodness-of-fit test compares the observed frequencies in each category to the expected frequencies for a single categorical variable to determine if the distribution fits a hypothesized model. Option (a) describes a chi-square test of independence, option (c) describes correlation analysis, and option (d) describes a two-sample t-test.
Q2: The formula for the chi-square test statistic is: (a) \(\chi^2 = \sum \frac{(O - E)}{E}\) (b) \(\chi^2 = \sum \frac{(O - E)^2}{O}\) (c) \(\chi^2 = \sum \frac{(O - E)^2}{E}\) (d) \(\chi^2 = \sum (O - E)^2\)
Solution:
Ans: (c) Explanation: The correct chi-square test statistic formula is \(\chi^2 = \sum \frac{(O - E)^2}{E}\), where O represents observed frequencies and E represents expected frequencies. The squared difference is divided by the expected frequency, not the observed frequency, and the difference must be squared.
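The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `chi_square_statistic` and the counts are hypothetical, chosen only to show the arithmetic.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts, used only to illustrate the formula.
observed = [18, 22, 20]
expected = [20, 20, 20]

stat = chi_square_statistic(observed, expected)
print(round(stat, 3))  # 0.2 + 0.2 + 0.0 = 0.4
```

Each term divides by the expected count, so the same absolute gap counts for more in a category where few observations were expected.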
Q3: For a chi-square test of independence with a 3×4 contingency table, the degrees of freedom are: (a) 12 (b) 7 (c) 6 (d) 11
Solution:
Ans: (c) Explanation: The degrees of freedom for a chi-square test of independence is calculated as \(df = (r - 1)(c - 1)\), where r is the number of rows and c is the number of columns. For a 3×4 table: \(df = (3 - 1)(4 - 1) = 2 × 3 = 6\).
Q4: Which condition must be met to use a chi-square test? (a) All expected frequencies must be at least 5 (b) All observed frequencies must be at least 10 (c) The sample size must be less than 30 (d) The data must be normally distributed
Solution:
Ans: (a) Explanation: A key condition for using a chi-square test is that all expected frequencies (not observed frequencies) must be at least 5. This ensures the chi-square distribution is a good approximation to the sampling distribution of the test statistic. Chi-square tests do not require normally distributed data because they are used for categorical counts.
Q5: If the p-value from a chi-square test is 0.03 and the significance level is 0.05, we should: (a) Fail to reject the null hypothesis (b) Reject the null hypothesis (c) Accept the alternative hypothesis as proven (d) Increase the sample size
Solution:
Ans: (b) Explanation: When the p-value (0.03) is less than the significance level α (0.05), we reject the null hypothesis. This provides evidence for the alternative hypothesis, though we don't say it's "proven." Option (a) would be correct if p-value ≥ α.
Q6: The null hypothesis for a chi-square test of independence states: (a) The two categorical variables are associated (b) The two categorical variables are independent (c) The observed frequencies equal zero (d) The expected frequencies are incorrect
Solution:
Ans: (b) Explanation: The null hypothesis for a chi-square test of independence always states that the two categorical variables are independent (no association). The alternative hypothesis states that the variables are dependent or associated.
Q7: For a chi-square goodness-of-fit test with 5 categories, the degrees of freedom are: (a) 5 (b) 4 (c) 6 (d) 3
Solution:
Ans: (b) Explanation: For a chi-square goodness-of-fit test, the degrees of freedom equals the number of categories minus 1: \(df = k - 1\). With 5 categories, \(df = 5 - 1 = 4\).
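The two degrees-of-freedom rules used in this worksheet (Q3 and Q7) can be written as small helper functions. The function names are hypothetical and serve only as a sketch.

```python
def df_goodness_of_fit(num_categories):
    """Degrees of freedom for a goodness-of-fit test: k - 1."""
    return num_categories - 1


def df_independence(num_rows, num_cols):
    """Degrees of freedom for a test of independence: (r - 1)(c - 1)."""
    return (num_rows - 1) * (num_cols - 1)


print(df_goodness_of_fit(5))  # 4, as in Q7
print(df_independence(3, 4))  # 6, as in Q3
```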
Q8: A larger chi-square test statistic value indicates: (a) Greater agreement between observed and expected frequencies (b) Greater discrepancy between observed and expected frequencies (c) The null hypothesis is definitely true (d) The sample size is too small
Solution:
Ans: (b) Explanation: A larger chi-square test statistic value indicates a greater discrepancy between the observed and expected frequencies. This provides stronger evidence against the null hypothesis. A value closer to zero would indicate good agreement between observed and expected values.
Section B: Fill in the Blanks
Q9: The chi-square distribution is always __________ and skewed to the right.
Solution:
Ans: non-negative (or positive) Explanation: The chi-square distribution can only take values that are zero or positive because the test statistic involves squaring differences. It is always non-negative and has a right-skewed shape.
Q10: In a chi-square test, the __________ frequencies are calculated based on the assumption that the null hypothesis is true.
Solution:
Ans: expected Explanation: Expected frequencies are the theoretical frequencies we would anticipate in each category if the null hypothesis were true. These are compared to the actual observed frequencies from the data.
Q11: A chi-square test of __________ examines whether two categorical variables are related or associated.
Solution:
Ans: independence Explanation: A chi-square test of independence is used to determine whether there is a significant association between two categorical variables presented in a contingency table.
Q12: The expected frequency for a cell in a contingency table is calculated as: __________ .
Solution:
Ans: \(\frac{\text{(row total)} × \text{(column total)}}{\text{grand total}}\) Explanation: For any cell in a contingency table, the expected frequency is calculated by multiplying the row total by the column total and dividing by the grand total of all observations.
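As a quick sketch of this formula, the helper below computes the expected count for one cell. The name `expected_cell` and the totals are hypothetical, picked only to make the arithmetic easy to follow.

```python
def expected_cell(row_total, col_total, grand_total):
    """Expected count for one cell: (row total * column total) / grand total."""
    return row_total * col_total / grand_total


# Hypothetical totals: a row with 40 observations, a column with 30,
# and 120 observations in the whole table.
print(expected_cell(40, 30, 120))  # 10.0
```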
Q13: When conducting a chi-square test, if any expected frequency is less than __________, the test results may not be reliable.
Solution:
Ans: 5 Explanation: A key condition for the chi-square test is that all expected frequencies should be at least 5. When this condition is violated, the chi-square approximation may not be valid.
Q14: The shape of the chi-square distribution becomes more __________ as the degrees of freedom increase.
Solution:
Ans: symmetric (or normal) Explanation: As the degrees of freedom increase, the chi-square distribution becomes less skewed and approaches a more symmetric or normal shape, though it remains non-negative.
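One way to see this numerically is through the skewness of the chi-square distribution, which equals \(\sqrt{8/df}\). This is a standard property of the distribution rather than something stated in the worksheet, and the function name below is hypothetical.

```python
import math


def chi_square_skewness(df):
    """Skewness of a chi-square distribution with df degrees of freedom."""
    return math.sqrt(8 / df)


for df in (1, 4, 16, 64):
    print(df, round(chi_square_skewness(df), 3))
# 1 2.828
# 4 1.414
# 16 0.707
# 64 0.354
```

The skewness shrinks toward zero as the degrees of freedom grow, which matches the distribution becoming more symmetric.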
Section C: Word Problems
Q15: A genetics experiment predicts that pea plants will appear in four colors in the ratio 9:3:3:1. Out of 160 plants observed, the counts are: 95 yellow, 28 green, 25 yellow-green, and 12 white. Calculate the chi-square test statistic to determine if the observed data fit the expected ratio. Use the expected frequencies based on the predicted ratio.
Solution:
Ans: Step 1: Calculate expected frequencies based on ratio 9:3:3:1. Total parts = 9 + 3 + 3 + 1 = 16 Expected yellow = \(\frac{9}{16} × 160 = 90\) Expected green = \(\frac{3}{16} × 160 = 30\) Expected yellow-green = \(\frac{3}{16} × 160 = 30\) Expected white = \(\frac{1}{16} × 160 = 10\)
Step 2: Calculate chi-square statistic using \(\chi^2 = \sum \frac{(O - E)^2}{E}\) For yellow: \(\frac{(95 - 90)^2}{90} = \frac{25}{90} = 0.278\) For green: \(\frac{(28 - 30)^2}{30} = \frac{4}{30} = 0.133\) For yellow-green: \(\frac{(25 - 30)^2}{30} = \frac{25}{30} = 0.833\) For white: \(\frac{(12 - 10)^2}{10} = \frac{4}{10} = 0.400\)
Step 3: Sum all components: \(\chi^2 = 0.278 + 0.133 + 0.833 + 0.400 = 1.644\)
Final Answer: The chi-square test statistic is 1.644
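The steps for Q15 can be checked with a short script. This is only a sketch of the hand calculation; the variable names are illustrative.

```python
ratio = [9, 3, 3, 1]            # predicted ratio from the question
observed = [95, 28, 25, 12]     # yellow, green, yellow-green, white

n = sum(observed)               # 160 plants
expected = [n * part / sum(ratio) for part in ratio]

# Contribution of each category: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

print(expected)          # [90.0, 30.0, 30.0, 10.0]
print(round(chi_sq, 3))  # 1.644
```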
Q16: A survey asked 200 students about their preferred lunch option (Pizza, Burger, or Salad) across two grade levels (Grade 9 and Grade 10). The results showed: Grade 9 (50 Pizza, 30 Burger, 20 Salad) and Grade 10 (45 Pizza, 35 Burger, 20 Salad). Calculate the expected frequency for Grade 9 students who prefer Pizza.
Solution:
Ans: Step 1: Calculate row and column totals. Grade 9 total = 50 + 30 + 20 = 100 Grade 10 total = 45 + 35 + 20 = 100 Pizza total = 50 + 45 = 95 Grand total = 200
Step 2: Use the expected frequency formula: \(E = \frac{\text{(row total)} × \text{(column total)}}{\text{grand total}}\) \(E = \frac{100 × 95}{200} = \frac{9500}{200} = 47.5\)
Final Answer: The expected frequency for Grade 9 students who prefer Pizza is 47.5
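The same formula can be applied to every cell of the Q16 table at once. The sketch below uses plain Python lists; the layout of the `table` dictionary is an assumption made for illustration.

```python
options = ["Pizza", "Burger", "Salad"]
table = {
    "Grade 9": [50, 30, 20],
    "Grade 10": [45, 35, 20],
}

row_totals = {grade: sum(counts) for grade, counts in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
grand_total = sum(row_totals.values())

# Expected count for each cell: (row total * column total) / grand total
expected = {
    grade: [row_totals[grade] * c / grand_total for c in col_totals]
    for grade in table
}

print(expected["Grade 9"])  # [47.5, 32.5, 20.0]
```

Both grades have 100 students, so the two rows of expected counts are identical.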
Q17: A researcher wants to test if the distribution of blood types (A, B, AB, O) in a sample of 300 people matches the national distribution (40% type A, 30% type O, 20% type B, 10% type AB). The observed frequencies are: 135 type A, 75 type O, 60 type B, and 30 type AB. Calculate the expected frequency for type O blood and find the contribution of type O to the chi-square statistic.
Solution:
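Ans: Step 1: Calculate expected frequency for type O. Expected frequency = 30% of 300 = 0.30 × 300 = 90
Step 2: Calculate the contribution of type O using \(\frac{(O - E)^2}{E}\). Contribution = \(\frac{(75 - 90)^2}{90} = \frac{225}{90} = 2.5\)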
Final Answer: The expected frequency for type O is 90, and its contribution to the chi-square statistic is 2.5
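A short script can reproduce the type O calculation for Q17 alongside the other blood types. Percentages are kept as integers to avoid floating-point rounding; the dictionary names are illustrative.

```python
n = 300
national_percent = {"A": 40, "O": 30, "B": 20, "AB": 10}
observed = {"A": 135, "O": 75, "B": 60, "AB": 30}

expected = {t: n * pct / 100 for t, pct in national_percent.items()}

# Contribution of each blood type: (O - E)^2 / E
contribution = {
    t: (observed[t] - expected[t]) ** 2 / expected[t] for t in observed
}

print(expected["O"])      # 90.0
print(contribution["O"])  # 2.5
```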
Q18: A biologist conducted a chi-square test of independence using a 2×3 contingency table and obtained a test statistic of 8.42. Using a significance level of 0.05 and the critical value of 5.991 for the appropriate degrees of freedom, state whether the null hypothesis should be rejected and explain what this means in context.
Solution:
Ans: Step 1: Identify degrees of freedom. For a 2×3 table: \(df = (2-1)(3-1) = 1 × 2 = 2\)
Step 2: Compare test statistic to critical value. Test statistic = 8.42 Critical value = 5.991 Since 8.42 > 5.991, the test statistic exceeds the critical value.
Step 3: Make decision. We reject the null hypothesis because the test statistic falls in the rejection region.
Final Answer: Reject the null hypothesis. This means there is sufficient evidence at the 0.05 significance level to conclude that the two categorical variables are not independent (they are associated).
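The decision in Q18 can also be checked numerically. The sketch below relies on a fact not stated in the worksheet: for exactly 2 degrees of freedom, the chi-square upper-tail probability is \(e^{-x/2}\), so both the critical value and the p-value have closed forms. This shortcut does not apply to other degrees of freedom.

```python
import math

alpha = 0.05
test_statistic = 8.42

# Valid only for df = 2: P(chi-square > x) = exp(-x / 2)
critical_value = -2 * math.log(alpha)
p_value = math.exp(-test_statistic / 2)

reject_null = test_statistic > critical_value

print(round(critical_value, 3))  # 5.991
print(round(p_value, 4))
print(reject_null)               # True
```

The critical-value comparison and the p-value comparison always lead to the same decision; here the p-value is below 0.05, so the null hypothesis is rejected either way.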
Q19: A marketing company surveyed 400 consumers about their preference for three different brands (X, Y, Z). The observed frequencies were 180 for Brand X, 120 for Brand Y, and 100 for Brand Z. Test whether consumers have equal preference for all three brands by calculating the chi-square test statistic. Assume equal expected frequencies.
Solution:
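Ans: Step 1: Calculate expected frequencies. Under equal preference, each brand should have: \(\frac{400}{3} = 133.33\) consumers
Step 2: Calculate each brand's contribution using \(\frac{(O - E)^2}{E}\). For Brand X: \(\frac{(180 - 133.33)^2}{133.33} \approx 16.33\) For Brand Y: \(\frac{(120 - 133.33)^2}{133.33} \approx 1.33\) For Brand Z: \(\frac{(100 - 133.33)^2}{133.33} \approx 8.33\)
Step 3: Sum all contributions. \(\chi^2 = 16.33 + 1.33 + 8.33 \approx 26.00\) (the unrounded contributions sum to exactly 26)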
Final Answer: The chi-square test statistic is 26.00
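The Q19 statistic can be verified with a few lines of Python. This is a sketch of the hand calculation, assuming equal expected counts as the question states.

```python
observed = {"X": 180, "Y": 120, "Z": 100}

n = sum(observed.values())         # 400 consumers
expected = n / len(observed)       # equal preference: 400 / 3 per brand

# Contribution of each brand: (O - E)^2 / E
contributions = {
    brand: (count - expected) ** 2 / expected
    for brand, count in observed.items()
}
chi_sq = sum(contributions.values())

for brand, value in contributions.items():
    print(brand, round(value, 2))
print(round(chi_sq, 2))  # 26.0
```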
Q20: A school counselor wants to determine if there is an association between gender (Male, Female) and participation in after-school activities (Sports, Arts, Neither). A survey of 150 students gave the following data: Males (35 Sports, 15 Arts, 25 Neither) and Females (20 Sports, 30 Arts, 25 Neither). Calculate the expected frequency for females who participate in Sports.
Solution:
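Ans: Step 1: Calculate totals. Male total = 35 + 15 + 25 = 75 Female total = 20 + 30 + 25 = 75 Sports total = 35 + 20 = 55 Grand total = 150
Step 2: Use the expected frequency formula: \(E = \frac{\text{(row total)} × \text{(column total)}}{\text{grand total}}\) \(E = \frac{75 × 55}{150} = \frac{4125}{150} = 27.5\)
Final Answer: The expected frequency for females who participate in Sports is 27.5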
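The totals above lead to the expected count through the independence assumption: if gender and activity are independent, the expected count is the grand total times the proportion of females times the proportion choosing Sports. The sketch below uses Python's `fractions` module so the arithmetic is exact; the variable names are illustrative.

```python
from fractions import Fraction

female_total = 20 + 30 + 25   # 75
sports_total = 35 + 20        # 55
grand_total = 150

# Under independence: P(Female and Sports) = P(Female) * P(Sports)
p_female = Fraction(female_total, grand_total)
p_sports = Fraction(sports_total, grand_total)

expected_female_sports = grand_total * p_female * p_sports

print(float(expected_female_sports))  # 27.5
```

Multiplying out gives (row total × column total) / grand total, which is the usual expected frequency formula.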