We have previously discussed the concepts of null and alternative hypotheses, as well as the application of z-tests for large samples and t-tests for small samples. In many instances, we are tasked with determining whether a sample significantly differs from a known population. For instance, suppose we conducted a survey of 400 households in the Raigarh district of Chhattisgarh state and calculated their per capita income. Our objective is to assess whether the per capita income derived from the sample differs significantly from that of the entire district.
In such scenarios, two distinct situations may arise: i) when the population variance is known, and ii) when the population variance is unknown to us. Below, we outline the steps to be followed in each case.
Population Variance is Known:
Example 1: Let us consider the case where we know the per capita income of Raigarh district of Chhattisgarh as well as its variance. Suppose official records show that the per capita income of Raigarh district is Rs. 10,000 and the standard deviation of per capita income is Rs. 1,500. However, a sample survey of 400 households finds that their per capita income is Rs. 10,500. Should we accept the data provided in official records?
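Given a large sample size and known population variance, we use the z-test, as illustrated in the figure. Our null hypothesis, H₀, states that the mean of the population from which the sample is drawn equals the recorded mean. Put differently, the per capita income derived from the sample is consistent with the figure documented in official records:
$H_0: \mu = \mu_0 = 10{,}000$
Our alternative hypothesis is
$H_A: \mu \neq 10{,}000$
The test statistic is
$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
Substituting $\bar{x} = 10{,}500$, $\mu_0 = 10{,}000$, $\sigma = 1{,}500$ and $n = 400$, we obtain
$z = \dfrac{10{,}500 - 10{,}000}{1{,}500 / \sqrt{400}} = \dfrac{500}{75} = 6.67$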
In this scenario, z = 6.67 far exceeds the two-tailed critical values of 1.96 (5% level) and 2.58 (1% level), so the sample falls within the critical region and we reject the null hypothesis. Consequently, the per capita income derived from the sample differs significantly from the per capita income documented in official records.
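The arithmetic of Example 1 can be checked with a few lines of Python. This is a minimal sketch: the helper name `one_sample_z` is our own, and the critical values 1.96 and 2.58 are the usual two-tailed normal-table values rather than anything computed here.

```python
import math

def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """Hypothetical helper: z-statistic for a sample mean when the population SD is known."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Example 1: Raigarh per capita income (official mean 10,000, SD 1,500; sample of 400)
z = one_sample_z(sample_mean=10_500, pop_mean=10_000, pop_sd=1_500, n=400)
print(f"z = {z:.2f}")

# Two-tailed critical values taken from the standard normal table
for alpha, critical in [(0.05, 1.96), (0.01, 2.58)]:
    decision = "reject H0" if abs(z) > critical else "do not reject H0"
    print(f"alpha = {alpha}: |z| = {abs(z):.2f} vs {critical} -> {decision}")
```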
Example 2: Consider a scenario where the voltage generated by a particular brand of battery follows a normal distribution. A random sample of 100 such batteries yielded a mean voltage of 1.4 volts. At a significance level of 0.01, does this suggest that these batteries possess an average voltage different from 1.5 volts? It is assumed that the population standard deviation is 0.21 volts.
Here, $H_0: \mu = 1.5$
Since the average voltage of the sample can differ from the average voltage of the population by being either less than or more than 1.5 volts, the rejection region lies in both tails of the normal curve. Thus it is a two-tailed test, and the alternative hypothesis is
$H_A: \mu \neq 1.5$
Since the population standard deviation σ is known, the test statistic is
$z = \dfrac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \dfrac{1.4 - 1.5}{0.21 / \sqrt{100}} = \dfrac{-0.1}{0.021} = -4.76$
According to Table 17.2, the critical value at the 1% significance level is 2.58. Since the absolute value of z (4.76) exceeds 2.58, we reject the null hypothesis at the 1% significance level and accept the alternative hypothesis: the average voltage of these batteries differs from 1.5 volts.
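Instead of reading the critical value from a table, it can be obtained from the standard normal distribution directly. The sketch below uses Python's standard-library `statistics.NormalDist` for the battery example; the two-tailed p-value is included only as an extra illustration and is not part of the original example.

```python
import math
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

alpha = 0.01
# Two-tailed test: alpha/2 in each tail, so use the (1 - alpha/2) quantile
critical = std_normal.inv_cdf(1 - alpha / 2)

# Example 2: battery voltages
x_bar, mu0, sigma, n = 1.4, 1.5, 0.21, 100
z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Two-tailed p-value: probability of a |Z| at least as large as the observed one
p_value = 2 * std_normal.cdf(-abs(z))

print(f"critical value = {critical:.3f}")
print(f"z = {z:.2f}, p-value = {p_value:.2e}")
print("reject H0" if abs(z) > critical else "do not reject H0")
```

The table value 2.58 is this quantile (about 2.576) rounded to two decimals.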
Population Variance not Known:
It is unrealistic to assume that we know the population standard deviation (σ) when we typically do not even know the population mean. When σ is unknown, we estimate it by the sample standard deviation (s). Two cases then arise, depending on the sample size. If the sample is large (n ≥ 30), we use the z-statistic
$z = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
If the sample is small (n < 30), we apply the t-statistic with n − 1 degrees of freedom. The test statistic is
$t = \dfrac{\bar{x} - \mu}{s / \sqrt{n}}$
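When only raw observations are available, s has to be estimated from the data before forming t. The sketch below assumes s is the sample standard deviation with divisor n − 1 (what `statistics.stdev` returns); the income figures and the helper name `one_sample_t` are hypothetical, for illustration only.

```python
import math
import statistics

def one_sample_t(data, pop_mean):
    """Hypothetical helper: return (t, degrees_of_freedom) for H0: population mean == pop_mean."""
    n = len(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    t = (statistics.mean(data) - pop_mean) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical small sample of 10 household incomes (Rs.), not from the text
incomes = [9800, 10400, 10100, 9900, 10600, 10300, 9700, 10200, 10500, 10000]
t, df = one_sample_t(incomes, pop_mean=10_000)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The computed t is then compared with the tabulated t-value for n − 1 degrees of freedom at the chosen level of significance.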
When we test whether two samples are drawn from populations with the same mean, the steps are as follows:
The null hypothesis states that the means of the two populations are identical. In mathematical notation:
$H_0: \mu_1 = \mu_2$
The alternative hypothesis, conversely, states that the population means differ:
$H_A: \mu_1 \neq \mu_2$
When the standard deviations (the positive square roots of the variances) of both populations are known, we use the z-statistic, defined as follows:
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
In this equation, the subscript 1 refers to the first sample and the subscript 2 to the second sample. Substituting the sample data into the equation gives the actual value of z, which we compare with the tabulated value for the specified level of significance.
Example 1: A bank wants to find out the average savings of its customers in Delhi and Kolkata. A sample of 250 accounts in Delhi shows an average savings of Rs. 22500 while a sample of 200 accounts in Kolkata shows an average savings of Rs. 21500. It is known that standard deviation of savings in Delhi is Rs. 150 and that in Kolkata is Rs. 200. Can we conclude at 1 percent level of significance that banking pattern of customers in Delhi and Kolkata is the same?
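In this case the null hypothesis is $H_0: \mu_1 = \mu_2$
and the alternative hypothesis is $H_A: \mu_1 \neq \mu_2$
We are provided with the information that
$\bar{x}_1 = 22{,}500,\ \sigma_1 = 150,\ n_1 = 250$ (Delhi) and $\bar{x}_2 = 21{,}500,\ \sigma_2 = 200,\ n_2 = 200$ (Kolkata).
Since σ₁ and σ₂ are known, we apply the z-test.
The test statistic is
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
By applying the information provided above we obtain
$z = \dfrac{22{,}500 - 21{,}500}{\sqrt{\dfrac{150^2}{250} + \dfrac{200^2}{200}}} = \dfrac{1{,}000}{\sqrt{90 + 200}} = \dfrac{1{,}000}{17.03} = 58.72$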
We find that at 1 per cent level of significance the critical value obtained from Table 17.1 is 2.58.
Since the actual value is greater than the tabulated value, the null hypothesis is rejected and the alternative hypothesis is accepted. Thus the banking pattern of customers in Delhi differs from that of customers in Kolkata.
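The two-sample z computation for the bank example can be reproduced as below. This is a minimal sketch; `two_sample_z` is our own helper name, and 2.58 is the two-tailed 1% value from the normal table.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Hypothetical helper: z-statistic for H0: mu1 == mu2 when both population SDs are known."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

# Example 1: savings in Delhi (sample 1) and Kolkata (sample 2)
z = two_sample_z(22_500, 150, 250, 21_500, 200, 200)
critical_1pct = 2.58  # two-tailed, 1% level, from the normal table
print(f"z = {z:.2f}")
print("reject H0" if abs(z) > critical_1pct else "do not reject H0")
```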
Example 2: A mathematics teacher wants to compare the performance of Class X students in two sections. She administers the same set of questions to 25 students in Section A and 20 students in Section B. She finds that Section A students have a mean score of 78 marks with a standard deviation of 4 marks, while Section B students have a mean score of 75 marks with a standard deviation of 5 marks. Is the performance of students in the two sections different at the 1 percent level of significance?
In this case the null hypothesis is $H_0: \mu_1 = \mu_2$
and the alternative hypothesis is $H_A: \mu_1 \neq \mu_2$
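We are provided with the information that
$\bar{x}_1 = 78,\ s_1 = 4,\ n_1 = 25$ (Section A) and $\bar{x}_2 = 75,\ s_2 = 5,\ n_2 = 20$ (Section B).
Since σ₁ and σ₂ are not known and the sample sizes are small, we apply the t-test. Using the pooled estimate of the common variance,
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \dfrac{24(16) + 19(25)}{43} = \dfrac{859}{43} = 19.98$
the test statistic is
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{78 - 75}{4.47 \times \sqrt{0.09}} = \dfrac{3}{1.34} = 2.24$
The degrees of freedom in this case are 25 + 20 − 2 = 43.
We can find from Table 17.3 that at the 1 per cent level of significance the t-value for 43 degrees of freedom is 2.69.
Since the actual value of t (2.24) is less than the tabulated value (2.69), we do not reject the null hypothesis at the 1 per cent level.
Therefore, the data do not show a significant difference between the performance of Section A and Section B students in mathematics at the 1 per cent level of significance.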
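The pooled two-sample t-test for the two sections can be sketched as follows. It assumes, as in the worked solution, that the reported standard deviations 4 and 5 are sample standard deviations (divisor n − 1) and that the two populations share a common variance; `pooled_t` is a hypothetical helper name.

```python
import math

def pooled_t(mean1, s1, n1, mean2, s2, n2):
    """Hypothetical helper: pooled-variance two-sample t-statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Example 2: Section A (sample 1) versus Section B (sample 2)
t, df = pooled_t(78, 4, 25, 75, 5, 20)
critical_1pct = 2.69  # from the t table for 43 d.f., two-tailed 1% level
print(f"t = {t:.2f}, df = {df}")
print("reject H0" if abs(t) > critical_1pct else "do not reject H0")
```

If SciPy is installed, `scipy.stats.ttest_ind_from_stats` with `equal_var=True` performs the same pooled test from summary statistics.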
For a sample of 650 families the data obtained is presented in Table 17.3.
Table 17.3 is called a contingency table because we are trying to find whether the number of children is contingent upon the occupation of the father. Our purpose is to test for a possible relationship between the number of children and the occupation of the father. Thus the null hypothesis is specified as:
H₀: The number of children and the occupation of the father are independent (there is no relationship between them).
In Table 17.3 we have presented the observed frequency for each cell in the table.
What should the expected frequency be when there is no relationship between the variables under consideration? The expected frequency is calculated under the assumption that there is no relationship between the number of children and the occupation of the father. For each cell in Table 17.2 the expected frequency is obtained by
$E_{ij} = \dfrac{R_i \times C_j}{N}$
where $E_{ij}$ is the expected frequency for row i and column j, $R_i$ is the total of row i, $C_j$ is the total of column j, and N is the grand total (here 650). For example, for row 2 and column 2 the expected frequency is
$E_{22} = \dfrac{R_2 \times C_2}{N}$
We compute the row and column totals for the data given in Table 17.3 and obtain the expected frequency for each cell. These are given in Table 17.4.
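The rule $E_{ij} = R_i C_j / N$ can be applied to every cell at once. Since the observed frequencies of Table 17.3 are not reproduced here, the sketch below uses a small hypothetical 2 × 3 table purely for illustration; `expected_frequencies` is our own helper name.

```python
def expected_frequencies(observed):
    """Hypothetical helper: E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2 x 3 table of counts (not Table 17.3)
observed = [[20, 30, 50],
            [30, 20, 50]]
expected = expected_frequencies(observed)
for row in expected:
    print(row)
```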
The next step is to compare the observed frequency with the expected frequency. For this purpose we construct the chi-square statistic, which is given by
$\chi^2 = \sum \dfrac{(O - E)^2}{E}$
where O refers to the observed frequency and E to the expected frequency, and the sum runs over all cells of the table.
The chi-square statistic has (r − 1)(c − 1) degrees of freedom, where r is the number of rows and c is the number of columns.
For example, if there are 3 rows and 4 columns, the degrees of freedom are (3 − 1)(4 − 1) = 6.
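Putting the expected-frequency rule, the chi-square formula and the degrees-of-freedom rule together gives the short sketch below. The helper name `chi_square_statistic` and the 2 × 3 table are hypothetical illustrations, not data from the text.

```python
def chi_square_statistic(observed):
    """Hypothetical helper: Pearson chi-square statistic and (r - 1)(c - 1) degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency E_ij
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical 2 x 3 table of counts (not Table 17.3)
observed = [[20, 30, 50],
            [30, 20, 50]]
chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.2f} with {df} d.f.")
```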
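Let us summarise the steps to be followed in the chi-square test. These are:
i) Specify the null hypothesis that the two variables are independent.
ii) Compute the row totals, the column totals and the expected frequency for each cell.
iii) Compute the chi-square statistic from the observed and expected frequencies.
iv) Find the degrees of freedom, (r − 1)(c − 1).
v) Compare the computed value with the critical value from the table at the chosen level of significance, and reject the null hypothesis if the computed value exceeds the critical value.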
Since there are 5 rows and 5 columns, the degrees of freedom are (5 − 1)(5 − 1) = 16.
The critical values of χ² at the 5% and 1% levels of significance for different degrees of freedom are given in Table 17.7 at the end of the Unit. We find from the table that for 16 d.f. the critical value of χ² at the 5 per cent level of significance is 26.30. The observed value of χ² is 63.41. Since the observed value is greater than the critical value, we reject the null hypothesis and accept the alternative hypothesis. Therefore, we conclude that the variables ‘number of children’ and ‘occupation of father’ are not independent.
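For an even number of degrees of freedom, such as the 16 d.f. here, the upper-tail probability of the chi-square distribution has a closed form, $P(\chi^2 > x) = e^{-x/2}\sum_{k=0}^{m-1} (x/2)^k / k!$ with $m = \text{d.f.}/2$. The sketch below uses it to check that 26.30 is close to the 5% point and that the observed 63.41 lies deep in the rejection region. The helper name is our own; for odd degrees of freedom a library routine such as SciPy's `scipy.stats.chi2.sf` would be needed instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Hypothetical helper: P(chi-square > x) for a positive even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # term is now half**k / k!
        total += term
    return math.exp(-half) * total

p_critical = chi2_sf_even_df(26.30, 16)  # should be close to 0.05
p_observed = chi2_sf_even_df(63.41, 16)
print(f"P(chi2 > 26.30) = {p_critical:.4f}")
print(f"P(chi2 > 63.41) = {p_observed:.2e}")
print("reject H0" if p_observed < 0.05 else "do not reject H0")
```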
1. What is the purpose of testing hypotheses for a single sample?
2. How is testing for differences between two samples useful in research?
3. What is the significance of using a contingency table in statistical analysis?
4. How does inference about population variances help in research studies?
5. What are some commonly asked questions related to hypothesis testing and statistical analysis in exams like UPSC?