This lecture discusses how to perform tests of hypotheses about the coefficients of a linear regression model estimated by ordinary least squares (OLS).
The lecture is divided into two parts:
in the first part, we discuss hypothesis testing in the normal linear regression model, in which the OLS estimator of the coefficients has a normal distribution conditional on the matrix of regressors;
in the second part, we show how to carry out hypothesis tests in linear regression models where the OLS estimator can be proved to be asymptotically normal.
In both parts, the regression model is
yi = xiβ + εi
where yi is an output variable, xi is a 1 x K vector of inputs, β is a K x 1 vector of coefficients and εi is an error term. There are N observations in the sample, so that i = 1, ..., N.
We also denote:
by y the N x 1 vector of outputs
by X the N x K matrix of inputs
by ε the N x 1 vector of errors
Using this notation, we can write
y = Xβ + ε
Moreover, the OLS estimator of β is
β̂ = (X'X)⁻¹ X'y
We assume that the design matrix X has full rank, so that the matrix X'X is invertible.
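As an illustration of the formula above, here is a minimal NumPy sketch (not part of the original lecture) that computes the OLS estimate on simulated data; the sample size, the true coefficients and the noise scale are arbitrary choices made only for the example.

```python
# Minimal sketch: OLS estimate beta_hat = (X'X)^(-1) X'y on simulated data.
# All numbers below (sample size, true coefficients, noise scale) are
# illustrative assumptions, not values from the lecture.
import numpy as np

rng = np.random.default_rng(0)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])  # design matrix with an intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.8, size=N)               # y = X beta + epsilon

# Solving the normal equations is numerically preferable to inverting X'X explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)
```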
In this section we derive tests about the coefficients of the normal linear regression model. In this model the vector of errors ε is assumed to have a multivariate normal distribution conditional on X, with mean equal to 0 and covariance matrix equal to
σ²I
where I is the N x N identity matrix and σ² is a positive constant.
It can be proved (see the lecture about the normal linear regression model) that the assumption of conditional normality implies that:
the OLS estimator β̂ is conditionally multivariate normal with mean β and covariance matrix σ²(X'X)⁻¹;
the adjusted sample variance of the residuals
σ̂² = ê'ê / (N - K),  where ê = y - Xβ̂ is the vector of residuals,
is an unbiased estimator of σ²; furthermore, it has a Gamma distribution with parameters N - K and σ²;
σ̂² is conditionally independent of β̂.
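The following short sketch (an illustration, not part of the lecture) computes the adjusted sample variance of the residuals defined above; the function name is our own.

```python
# Sketch: adjusted sample variance of the residuals, sigma2_hat = e'e / (N - K),
# which is an unbiased estimator of sigma^2 under the assumptions above.
import numpy as np

def adjusted_residual_variance(X, y, beta_hat):
    """Return e'e / (N - K), where e = y - X @ beta_hat is the residual vector."""
    N, K = X.shape
    e = y - X @ beta_hat
    return (e @ e) / (N - K)
```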
In a test of a restriction on a single coefficient, we test the null hypothesis
H0: βk = q
where βk is the k-th entry of the vector of coefficients β and q is a real number.
In other words, our null hypothesis is that the k-th coefficient is equal to a specific value.
This hypothesis is usually tested with the test statistic
t = (β̂k - q) / √(σ̂² Skk)
where β̂k is the k-th entry of β̂ and Skk is the k-th diagonal entry of the matrix (X'X)⁻¹.
The test statistic t has a standard Student's t distribution with N - K degrees of freedom. For this reason, it is called a t statistic and the test is called a t test.
Proof
The null hypothesis is rejected if t falls outside the acceptance region.
How the acceptance region is determined depends not only on the desired size of the test, but also on whether the test is two-tailed (if we think that βk could be either smaller or larger than q) or one-tailed (if we assume that only one of the two things, i.e., smaller or larger, is possible). For more details on how to determine the acceptance region, see the glossary entry on critical values.
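As an illustration, the sketch below implements the two-tailed version of the t test just described, using NumPy and SciPy; the function name and the default 5% size are our own choices.

```python
# Sketch: two-tailed t test of H0: beta_k = q in the normal linear model.
import numpy as np
from scipy import stats

def t_test_single_coefficient(X, y, k, q, alpha=0.05):
    N, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (N - K)                 # adjusted residual variance
    S = np.linalg.inv(X.T @ X)
    t = (beta_hat[k] - q) / np.sqrt(sigma2_hat * S[k, k])
    crit = stats.t.ppf(1 - alpha / 2, df=N - K)    # two-tailed critical value
    p_value = 2 * (1 - stats.t.cdf(abs(t), df=N - K))
    return t, p_value, abs(t) > crit               # reject H0 if |t| > crit
```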
When testing a set of linear restrictions, we test the null hypothesis
H0: Rβ = q
where R is an L x K matrix and q is an L x 1 vector. L is the number of restrictions.
Example Suppose that β is 2 x 1 and that we want to test the hypothesis β1 + β2 = 1. We can write it in the form Rβ = q by setting
R = [1 1], q = 1
Example Suppose that β is 3 x 1 and that we want to test whether the two restrictions β1 = β2 and β3 = 0 hold simultaneously. The first restriction can be written as
β1 - β2 = 0
So we have
R = [1 -1 0; 0 0 1] (one row per restriction) and q = [0; 0]
This hypothesis is usually tested with the test statistic
F = (Rβ̂ - q)' [σ̂² R(X'X)⁻¹R']⁻¹ (Rβ̂ - q) / L
which has an F distribution with L and N - K degrees of freedom. For this reason, it is called an F statistic and the test is called an F test.
Proof
The F test is usually one-tailed. A critical value in the right tail of the F distribution is chosen so as to achieve the desired size of the test. Then, the null hypothesis is rejected if the F statistic is larger than the critical value.
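The sketch below is one way to compute the F statistic and carry out the right-tailed test described above; as before, the function name and the default size are our own, and SciPy is assumed to be available.

```python
# Sketch: F test of H0: R beta = q in the normal linear model, using
# F = (R b - q)' [sigma2_hat R (X'X)^(-1) R']^(-1) (R b - q) / L.
import numpy as np
from scipy import stats

def f_test_linear_restrictions(X, y, R, q, alpha=0.05):
    N, K = X.shape
    L = R.shape[0]                                   # number of restrictions
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    sigma2_hat = (e @ e) / (N - K)                   # adjusted residual variance
    diff = R @ beta_hat - q
    middle = sigma2_hat * R @ np.linalg.inv(X.T @ X) @ R.T
    F = diff @ np.linalg.solve(middle, diff) / L
    crit = stats.f.ppf(1 - alpha, dfn=L, dfd=N - K)  # right-tail critical value
    p_value = 1 - stats.f.cdf(F, dfn=L, dfd=N - K)
    return F, p_value, F > crit                      # reject H0 if F > crit
```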
When you use a statistical package to run a linear regression, you often get a regression output that includes the value of an F statistic. Usually this is obtained by performing an F test of the null hypothesis that all the regression coefficients are equal to zero (except the coefficient on the intercept).
As we explained in the lecture entitled Linear regression - maximum likelihood, the maximum likelihood estimator of the vector of coefficients of a normal linear regression model is equal to the OLS estimator β̂. As a consequence, all the usual tests based on maximum likelihood procedures (e.g., Wald, Lagrange multiplier, likelihood ratio) can be employed to conduct tests of hypotheses about β.
In this section we explain how to perform hypothesis tests about the coefficients of a linear regression model when the OLS estimator is asymptotically normal.
As we have shown in the lecture entitled OLS estimator properties, in several cases (i.e., under different sets of assumptions) it can be proved that:
the OLS estimator is asymptotically normal, that is,
√N (β̂ - β) →d Z
where →d denotes convergence in distribution (as the sample size N tends to infinity) and Z is a multivariate normal random vector with mean 0 and covariance matrix V; the value of the K x K matrix V depends on the set of assumptions made about the regression model;
it is possible to derive a consistent estimator V̂ of V, that is,
V̂ →p V
where →p denotes convergence in probability (again as the sample size N tends to infinity). The estimator V̂ is an easily computable function of the observed inputs xi and outputs yi (one common choice is sketched after this list).
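The lecture does not fix a particular estimator V̂; as one concrete and commonly used possibility, the sketch below computes the heteroskedasticity-robust ("sandwich") estimator. The formula and the function name are our own illustration; other choices of V̂ are appropriate under other sets of assumptions.

```python
# Sketch: heteroskedasticity-robust ("sandwich") estimator of V, i.e.
# V_hat = (X'X/N)^(-1) (sum_i e_i^2 x_i x_i' / N) (X'X/N)^(-1).
import numpy as np

def sandwich_covariance(X, y):
    N, K = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat                       # residuals
    bread = np.linalg.inv(X.T @ X / N)         # (X'X/N)^(-1)
    meat = (X * e[:, None] ** 2).T @ X / N     # sum_i e_i^2 x_i x_i' / N
    return bread @ meat @ bread                # estimate of the asymptotic covariance V
```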
These two properties are used to derive the asymptotic distribution of the test statistics used in hypothesis testing.
In a z test the null hypothesis is a restriction on a single coefficient:
H0: βk = q
where βk is the k-th entry of the vector of coefficients β and q is a real number.
The test statistic is
zN = (β̂k - q) / √(V̂kk / N)
where V̂kk is the k-th diagonal entry of V̂, the estimator of the asymptotic covariance matrix.
The test statistic zN converges in distribution to a standard normal distribution as the sample size increases. For this reason, it is called a z statistic (because the letter z is often used to denote a standard normal distribution) and the test is called a z test.
Proof
When N is large, we approximate the actual distribution of zN with its asymptotic one (standard normal). We then employ the test statistic zN in the usual manner: based on the desired size of the test and on the distribution of zN, we determine the critical value(s) and the acceptance region. The null hypothesis is rejected if zN falls outside the acceptance region.
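As an illustration, here is a minimal sketch of the two-tailed z test, taking as input an estimate V_hat of the asymptotic covariance matrix (for example, the sandwich estimator sketched earlier); the function name and default size are our own choices.

```python
# Sketch: two-tailed z test of H0: beta_k = q using an estimate V_hat of the
# asymptotic covariance matrix of sqrt(N)(beta_hat - beta).
import numpy as np
from scipy import stats

def z_test_single_coefficient(X, y, V_hat, k, q, alpha=0.05):
    N, _ = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    z = (beta_hat[k] - q) / np.sqrt(V_hat[k, k] / N)   # estimated std. error is sqrt(V_kk / N)
    crit = stats.norm.ppf(1 - alpha / 2)               # two-tailed critical value
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return z, p_value, abs(z) > crit                   # reject H0 if |z| > crit
```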
In a Chi-square test, the null hypothesis is a set of L linear restrictions
H0: Rβ = q
where R is an L x K matrix and q is an L x 1 vector.
The test statistic is
χ²N = N (Rβ̂ - q)' [R V̂ R']⁻¹ (Rβ̂ - q)
which converges to a Chi-square distribution with L degrees of freedom. For this reason, it is called a Chi-square statistic and the test is called a Chi-square test.
Proof
When setting up the test, the actual distribution of the Chi-square statistic is approximated by the asymptotic one (Chi-square).
Like the F test, the Chi-square test is usually one-tailed. The desired size of the test is achieved by appropriately choosing a critical value in the right tail of the Chi-square distribution. The null is rejected if the Chi-square statistic is larger than the critical value.
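Finally, a minimal sketch of the Chi-square test, again taking an estimate V_hat of the asymptotic covariance matrix as input; as with the other sketches, the function name and the default size are assumptions of the example rather than part of the lecture.

```python
# Sketch: Chi-square (Wald-type) test of H0: R beta = q, using
# chi2 = N (R b - q)' [R V_hat R']^(-1) (R b - q), asymptotically chi-square(L).
import numpy as np
from scipy import stats

def chi_square_test(X, y, V_hat, R, q, alpha=0.05):
    N, _ = X.shape
    L = R.shape[0]                                             # number of restrictions
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    diff = R @ beta_hat - q
    stat = N * diff @ np.linalg.solve(R @ V_hat @ R.T, diff)   # N (Rb - q)'[R V_hat R']^(-1)(Rb - q)
    crit = stats.chi2.ppf(1 - alpha, df=L)                     # right-tail critical value
    p_value = 1 - stats.chi2.cdf(stat, df=L)
    return stat, p_value, stat > crit                          # reject H0 if stat > crit
```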