Introduction
Hypothesis testing is typically conducted for one or two samples. In the case of one sample, researchers often aim to determine whether a population characteristic, such as the mean, is equal to a specific value. For two samples, the focus may be on assessing whether the true means differ. Statistical hypothesis tests rely on a statistic designed to quantify the strength of evidence for various alternative hypotheses. The process of hypothesis testing involves the following steps:
- Formulating a statement about the population.
- Drawing a sample from the population and analyzing the sample data.
- Recognizing that some deviation between sample and population characteristics is expected, because the sample is only a subset of the population, and then determining whether the observed difference between the sample and the stated claim could have occurred by chance alone (and is therefore not significant) or is large enough to be significant and cast doubt on the claim.
In essence, hypothesis testing uses sample evidence and probability theory to judge whether a hypothesis is reasonable. A hypothesis is a statement or claim about the entire population. A sample is drawn from that population and analyzed, and the results of the analysis are used to decide whether the claim can reasonably be accepted as true (Vohra, 1929).
Components of a hypothesis test include:
- Null hypothesis (H0): A statement regarding the value(s) of unknown parameter(s), typically implying no association between explanatory and response variables in our applications (always containing an equality).
- Alternative hypothesis (H1): A statement contradicting the null hypothesis (always containing an inequality).
- Test statistic: A quantity computed from the sample data under the null hypothesis and used to decide between the null and alternative hypotheses.
- Rejection region: Values of the test statistic for which we reject the null hypothesis in favor of the alternative hypothesis.
Alternative hypotheses can also be classified as one-tailed or two-tailed. A one-tailed (or one-sided) hypothesis specifies the direction of the association between the predictor and outcome variables, whereas a two-tailed hypothesis states only that an association exists. A one-tailed hypothesis has the statistical advantage of requiring a smaller sample size than a two-tailed hypothesis at the same significance level. However, one-tailed hypotheses are not always appropriate.
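The difference between one- and two-tailed tests can be seen in their rejection regions. The sketch below is only an illustration, assuming a z-test and using Python's standard library; the function name `critical_values` and the `tails` labels are hypothetical and not part of the original text.

```python
from statistics import NormalDist


def critical_values(alpha: float, tails: str) -> tuple[float, float]:
    """Return (lower, upper) critical z values for a z-test.

    H0 is rejected when the test statistic falls below `lower`
    or above `upper`. `tails` is "two", "upper", or "lower".
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tails == "two":
        # alpha is split equally between the two tails
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    if tails == "upper":
        # all of alpha sits in the upper tail
        return (float("-inf"), std_normal.inv_cdf(1 - alpha))
    if tails == "lower":
        # all of alpha sits in the lower tail
        return (std_normal.inv_cdf(alpha), float("inf"))
    raise ValueError("tails must be 'two', 'upper', or 'lower'")


two_tailed = critical_values(0.05, "two")
upper_tailed = critical_values(0.05, "upper")
print(two_tailed, upper_tailed)
```

At the same α, the one-tailed cutoff is closer to zero than the two-tailed cutoff, which is why a one-tailed test can detect an effect in the stated direction with a smaller sample.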
Question for Hypothesis testing for differences between means and proportions
Try yourself:
What is the purpose of hypothesis testing?
Explanation
- Hypothesis testing is used to quantify the strength of evidence for alternative hypotheses.
- It involves examining sample evidence and utilizing probability theory to determine the reasonableness of a hypothesis.
- The process includes formulating a statement about the population, drawing a sample, analyzing the sample data, and determining whether the observed difference between the sample and the statement made could have occurred by chance alone.
- The purpose is to assess the significance of the observed difference and make conclusions based on the evidence gathered.
Logic of Hypothesis Testing
- Assume the Null Hypothesis (H0) is true.
- Calculate the probability (p) of obtaining results at least as extreme as those observed in your data if the Null Hypothesis were true.
- If that probability is low (conventionally p < 0.05), reject the Null Hypothesis.
- Rejecting the Null Hypothesis leaves the Research Hypothesis (H1) as the supported explanation.
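The logic above can be sketched for a simple case. This is a minimal example, assuming a two-tailed one-sample z-test with a known population standard deviation; the function name `z_test_p_value` and all numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_p_value(sample_mean: float, mu0: float,
                   sigma: float, n: int) -> tuple[float, float]:
    """Two-tailed one-sample z-test; returns (z, p)."""
    # Step 1: assume H0 is true, so the population mean is mu0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 2: probability of a result at least this extreme under H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical data: H0 says the mean is 50; a sample of 36 has mean 52.
z, p = z_test_p_value(sample_mean=52.0, mu0=50.0, sigma=6.0, n=36)

# Step 3: reject H0 if the probability is below the chosen cutoff.
reject_h0 = p < 0.05
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject_h0}")
```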
Data Analysis Outcome:
Steps in hypothesis testing
- Formulate the null and alternative hypotheses: The null hypothesis is a statement regarding the value of a population parameter, while the alternative hypothesis is accepted if evidence suggests that the null hypothesis is false.
- Determine the significance level: Researchers select the significance level, denoted by alpha (α), which represents the probability of committing a Type I error. This determines the critical value or critical region for the test statistic.
- Choose the appropriate test statistic: Depending on whether the hypothesis concerns a proportion or a mean, on the sample size, and on whether the population standard deviation is known, researchers use either the z-statistic or the t-statistic.
- Establish the decision rule: Decision rules specify the conditions under which the null hypothesis will be retained or rejected. The rule is based on comparing the calculated test statistic to the critical value determined by the significance level.
- Perform computations: Researchers calculate the test statistic using the appropriate formula, depending on whether the z-statistic or the t-statistic is employed.
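When using the t-statistic for a single sample mean, researchers use the formula:
$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$
where $\bar{x}$ is the sample mean, $\mu_0$ is the population mean stated in the null hypothesis, $s$ is the sample standard deviation, and $n$ is the sample size.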
Subsequently, compare the calculated test statistic with the critical value. If the calculated value falls within the rejection region(s), researchers reject the null hypothesis; otherwise, they retain the null hypothesis.
- Draw a conclusion.
Diagram: Flowchart of hypothesis testing
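The steps above can be walked through in a short sketch. It assumes a two-tailed one-sample t-test on a small sample; the data values and the function name `one_sample_t` are hypothetical, and the critical value 2.262 is the standard t-table entry for 9 degrees of freedom at α = 0.05 (two-tailed).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample: list[float], mu0: float) -> float:
    """t = (sample mean - mu0) / (s / sqrt(n)), with s the sample sd."""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))


# Step 1: hypotheses. H0: mu = 12, H1: mu != 12 (hypothetical claim).
mu0 = 12.0
# Step 2: significance level.
alpha = 0.05
# Steps 3 and 4: t-statistic with n - 1 = 9 degrees of freedom;
# reject H0 if |t| exceeds the two-tailed table value for df = 9.
t_critical = 2.262
# Step 5: compute the statistic from (hypothetical) sample data.
sample = [12, 14, 11, 13, 15, 12, 14, 13, 16, 10]
t = one_sample_t(sample, mu0)
# Step 6: compare and conclude.
reject_h0 = abs(t) > t_critical
print(f"t = {t:.3f}, reject H0: {reject_h0}")
```

Here the statistic does not fall in the rejection region, so the null hypothesis is retained at the 5% level.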
Types of hypothesis testing
- Large Sample Tests, Population Mean (known population standard deviation)
- Large Sample Tests, Population Proportion (unknown population standard deviation)
- Small Sample Tests, Mean of a Normal Population
Errors in hypothesis testing
It can be observed from the formulation of hypotheses for a test that the null and alternative hypotheses represent competing statements about the true state of nature.
- Type I error, also referred to as a "false positive": This occurs when the null hypothesis is rejected despite being true. In simpler terms, the alternative hypothesis (the hypothesis of interest) is accepted even though the observed results are due to chance. Essentially, a difference is perceived when none exists in the population. The probability of making a Type I error in a test with rejection region R is α = P(R | H0 is true).
- Type II error, also known as a "false negative": This occurs when the null hypothesis is not rejected despite the alternative hypothesis being true. It is the failure to accept a true alternative hypothesis, typically because the test lacks sufficient power. Put plainly, a difference exists but is not detected. The probability of making a Type II error in a test with rejection region R is β = 1 − P(R | Ha is true). The power of the test is P(R | Ha is true) = 1 − β.
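The two error probabilities can be computed directly in a simple setting. The sketch below assumes an upper-tailed one-sample z-test with known σ and a single specific alternative mean; the function name `error_rates` and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def error_rates(mu0: float, mu1: float, sigma: float,
                n: int, alpha: float) -> tuple[float, float, float]:
    """Return (alpha, beta, power) for an upper-tailed z-test.

    H0: mu = mu0 versus Ha: mu = mu1, with mu1 > mu0 and sigma known.
    The rejection region R is z > z_alpha.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    # Under Ha the z statistic is centered at this shift instead of 0.
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    power = 1 - std_normal.cdf(z_alpha - shift)  # P(R | Ha is true)
    beta = 1 - power                             # Type II error probability
    return alpha, beta, power


# Same hypothetical setting evaluated at two significance levels.
_, beta_05, power_05 = error_rates(50.0, 52.0, 6.0, 36, alpha=0.05)
_, beta_01, power_01 = error_rates(50.0, 52.0, 6.0, 36, alpha=0.01)
print(f"alpha=0.05: beta={beta_05:.3f}, power={power_05:.3f}")
print(f"alpha=0.01: beta={beta_01:.3f}, power={power_01:.3f}")
```

With the sample size held fixed, choosing the smaller α gives a larger β, so the test becomes less likely to detect a real difference.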
Table: Comparison of Type I and Type II errors:
- Reducing Type I Errors: Prescriptive testing aims to raise the confidence level, which lowers the probability of a Type I error. Raising the confidence level is equivalent to choosing a smaller significance level α.
- Reducing Type II Errors: Descriptive testing describes the test conditions and acceptance criteria more clearly, which lowers the probability of a Type II error. However, this may lead to the Null hypothesis being rejected more often, which increases the occurrence of Type I errors (rejecting H0 when it is actually true). Hence, for a fixed sample size, reducing one type of error usually increases the other.
- Testing Hypotheses for Differences Between Means: Testing differences between two or more means is a common practice in experimental research. The statistical technique employed when examining more than two means is known as analysis of variance. The procedure for hypothesis testing concerning differences in means varies depending on several factors, including whether the samples are unrelated or related, whether the population standard deviations are known, and whether they can be assumed to be equal.
- Testing Hypotheses for Differences Between Proportions: It is frequently necessary to assess the difference between two population proportions. The first step is to calculate the standard error of the difference between the two sample proportions using the values implied by the null hypothesis (for example, the hypothesized proportions of defect-free and defective items in a quality-control setting).
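Both kinds of two-sample comparison can be sketched briefly. The example below makes choices the text does not specify: for means it assumes unrelated samples with unequal variances (Welch's t statistic with approximate degrees of freedom), and for proportions it assumes a pooled standard error under H0: p1 = p2. The function names and data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-tailed z-test for H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    # Standard error of the difference under H0.
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: two small samples, and 30/100 versus 20/100 successes.
t, df = welch_t([5, 7, 9], [2, 4, 6])
z, p_value = two_proportion_z(30, 100, 20, 100)
print(f"t = {t:.3f} (df = {df:.1f}); z = {z:.3f}, p = {p_value:.4f}")
```

The t statistic would then be compared with a t-table critical value for the computed degrees of freedom, since the standard library provides no t distribution.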
Advantages of Hypothesis Testing
- Suitable for comparing a treatment with the control.
- Relatively simple to compute.
Pitfalls of Hypothesis Testing:
- Statistical significance does not imply a cause-effect relationship and should be interpreted within the context of the study design.
- Results depend on the concentrations (treatment levels) tested.
- Statistical power is influenced by variability.
- Inability to calculate confidence intervals.
- Prone to issues with poorly behaved data, often requiring the use of non-parametric statistical methods.
Question for Hypothesis testing for differences between means and proportions
Try yourself:
What is the purpose of formulating null and alternative hypotheses in hypothesis testing?
Explanation
- The purpose of formulating null and alternative hypotheses in hypothesis testing is to make statements about the true state of nature.
- The null hypothesis represents the assumption that there is no significant difference or relationship between variables.
- The alternative hypothesis is accepted when there is evidence to suggest that the null hypothesis is false.
- By formulating these hypotheses, researchers can assess the probability of obtaining the observed results under the null hypothesis.
- If this probability is low, typically less than 0.05, the null hypothesis is rejected.
- Therefore, calculating the probability of obtaining the observed results under the null hypothesis helps researchers make informed decisions about accepting or rejecting the null hypothesis.
Conclusion
Hypothesis testing, also known as significance testing, is a method for assessing a claim or hypothesis about a population parameter using data collected from a sample. Researchers evaluate a hypothesis by determining how likely the observed sample statistic would be if the hypothesis about the population parameter were true. In the formal procedure, hypotheses are specified, an α-level is chosen, and a test statistic is calculated. In practice, however, hypotheses may be suggested by the data, the choice of α-level may be disregarded, multiple test statistics may be calculated, and the formal procedure may be modified in other ways.