Multivariate Normal Distribution
The MVN distribution is a generalization of the univariate normal distribution, which has the probability density function (p.d.f.)

f(x) = (2πσ²)^(-1/2) exp{ -(x - μ)² / (2σ²) },   -∞ < x < ∞,

where μ = mean of the distribution and σ² = variance. In p dimensions the density becomes

f(x) = (2π)^(-p/2) |Σ|^(-1/2) exp{ -½ (x - μ)ᵀ Σ⁻¹ (x - μ) }.   (3.1)

Within the mean vector μ there are p (independent) parameters and within the symmetric covariance matrix Σ there are p(p + 1)/2 independent parameters [so p + p(p + 1)/2 = p(p + 3)/2 independent parameters in total]. We use the notation

x ~ Np(μ, Σ)   (3.2)

to denote a random vector x having the p-variate MVN distribution with mean E(x) = μ and covariance matrix Var(x) = Σ.
Note that MVN distributions are entirely characterized by the first and second moments of the distribution.
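As a quick numerical illustration (a minimal Python/NumPy sketch, not part of the notes; the mean vector and covariance matrix below are arbitrary choices), the p-variate density (3.1) can be evaluated directly and checked against SciPy's implementation:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters (p = 2)
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.5, 0.0])

# Density (3.1) written out explicitly
p = len(mu)
d = x - mu
quad = d @ np.linalg.solve(Sigma, d)          # (x - mu)' Sigma^{-1} (x - mu)
pdf_manual = ((2 * np.pi) ** (-p / 2)
              * np.linalg.det(Sigma) ** (-0.5)
              * np.exp(-0.5 * quad))

# Same value from SciPy
pdf_scipy = multivariate_normal(mean=mu, cov=Sigma).pdf(x)

print(pdf_manual, pdf_scipy)   # the two numbers should agree
```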
3.1 Basic properties
If x (p × 1) is MVN with mean μ and covariance matrix Σ, then:
i) Any linear combination of x is MVN. Let y = Ax + c with A (q × p) and c (q × 1); then

y ~ Nq(Aμ + c, AΣAᵀ).
ii) If x is MVN with covariance matrix Σ, then Ax and Bx are independent if and only if AΣBᵀ = 0.
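The linear-transformation property can be checked by simulation (a sketch with arbitrary illustrative A, c, μ and Σ, none taken from the notes): the sample mean and covariance of y = Ax + c should be close to Aμ + c and AΣAᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters
mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, -1.0]])   # q x p with q = 2
c = np.array([0.5, -0.5])

x = rng.multivariate_normal(mu, Sigma, size=100_000)   # n x p sample
y = x @ A.T + c                                        # y = Ax + c, row-wise

print(y.mean(axis=0), A @ mu + c)                # empirical vs theoretical mean
print(np.cov(y, rowvar=False), A @ Sigma @ A.T)  # empirical vs theoretical covariance
```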
Result
For the MVN distribution, variables are uncorrelated ⇔ variables are independent.
Proof  Let x (p × 1) be partitioned as

x = ( x1 )
    ( x2 ),   x1 (p1 × 1), x2 (p2 × 1), p1 + p2 = p,

with mean vector

μ = ( μ1 )
    ( μ2 )

and covariance matrix

Σ = ( Σ11  Σ12 )
    ( Σ21  Σ22 ).
i) Independent ⇒ uncorrelated (always holds).
ii) Uncorrelated ⇒ independent (for MVN)
This result depends on factorizing the p.d.f. (3.1) when Σ12 = 0. In this case (x - μ)ᵀΣ⁻¹(x - μ) has the partitioned form

(x1 - μ1)ᵀ Σ11⁻¹ (x1 - μ1) + (x2 - μ2)ᵀ Σ22⁻¹ (x2 - μ2),

and |Σ| = |Σ11| |Σ22|, so the density factorizes as

f(x) = g(x1) h(x2),

proving that x1 and x2 are independent.
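A numerical check of this factorization (an illustrative sketch with an arbitrary block-diagonal Σ, not from the notes): when Σ12 = 0 the joint density equals the product of the two marginal densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Block-diagonal covariance: Sigma12 = 0 (illustrative values)
mu1, mu2 = np.array([0.0, 1.0]), np.array([-1.0])
S11 = np.array([[1.0, 0.4],
                [0.4, 2.0]])
S22 = np.array([[0.5]])
Sigma = np.block([[S11, np.zeros((2, 1))],
                  [np.zeros((1, 2)), S22]])
mu = np.concatenate([mu1, mu2])

x = np.array([0.3, 0.7, -0.2])                 # arbitrary evaluation point
f_joint = multivariate_normal(mu, Sigma).pdf(x)
g_x1 = multivariate_normal(mu1, S11).pdf(x[:2])
h_x2 = multivariate_normal(mu2, S22).pdf(x[2:])

print(f_joint, g_x1 * h_x2)   # equal up to rounding: f(x) = g(x1) h(x2)
```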
3.2 Conditional distribution
Let x = (X1ᵀ, X2ᵀ)ᵀ be a partitioned MVN random p-vector, with mean

μ = ( μ1 )
    ( μ2 )

and covariance matrix

Σ = ( Σ11  Σ12 )
    ( Σ21  Σ22 ).

The conditional distribution of X2 given X1 = x1 is MVN with mean

E(X2 | X1 = x1) = μ2 + Σ21 Σ11⁻¹ (x1 - μ1)   (3.4a)

and covariance matrix

Var(X2 | X1 = x1) = Σ22 - Σ21 Σ11⁻¹ Σ12.   (3.4b)
Note: the notation X1 to denote the random vector and x1 to denote a specific constant value (a realization of X1) will be very useful here.
Proof of 3.4a
Define a transformation from (X1, X2) to new variables (X1, X2.1), where X2.1 = X2 - Σ21 Σ11⁻¹ X1. This is achieved by the linear transformation

( X1   )   ( I            0 ) ( X1 )
( X2.1 ) = ( -Σ21 Σ11⁻¹   I ) ( X2 ).

This linear relationship shows that X1, X2.1 are jointly MVN (by the first property of MVN stated above). We now show that X2.1 and X1 are independent by proving that X1 and X2.1 are uncorrelated.
Approach 1:

Cov(X1, X2.1) = Cov(X1, X2) - Cov(X1, X1) Σ11⁻¹ Σ12 = Σ12 - Σ11 Σ11⁻¹ Σ12 = 0.

Since X2.1 and X1 are MVN and uncorrelated they are independent. Thus

E(X2 | X1 = x1) = E(X2.1 + Σ21 Σ11⁻¹ X1 | X1 = x1)
               = E(X2.1) + Σ21 Σ11⁻¹ x1
               = μ2 - Σ21 Σ11⁻¹ μ1 + Σ21 Σ11⁻¹ x1
               = μ2 + Σ21 Σ11⁻¹ (x1 - μ1),
as required.
Proof of 3.4b  Because X2.1 is independent of X1,

Var(X2.1 | X1 = x1) = Var(X2.1).

The left hand side is

Var(X2 - Σ21 Σ11⁻¹ X1 | X1 = x1) = Var(X2 | X1 = x1),

since Σ21 Σ11⁻¹ X1 is constant when X1 = x1. The right hand side is

Var(X2.1) = Σ22 - Σ21 Σ11⁻¹ Σ12,

following from the general expansion

Var(AX1 + BX2) = A Σ11 Aᵀ + A Σ12 Bᵀ + B Σ21 Aᵀ + B Σ22 Bᵀ

with A = -Σ21 Σ11⁻¹ and B = I. Therefore

Var(X2 | X1 = x1) = Σ22 - Σ21 Σ11⁻¹ Σ12,
as required.
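The formulas (3.4a) and (3.4b) translate directly into code (a sketch, not part of the notes; the partition and the numerical values are illustrative):

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx1, x1):
    """Mean and covariance of X2 | X1 = x1 for x ~ N_p(mu, Sigma).
    idx1 indexes the conditioning block X1; the remaining components form X2."""
    idx1 = np.asarray(idx1)
    idx2 = np.setdiff1d(np.arange(len(mu)), idx1)
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S21 = Sigma[np.ix_(idx2, idx1)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    # (3.4a): mu2 + Sigma21 Sigma11^{-1} (x1 - mu1)
    cond_mean = mu[idx2] + S21 @ np.linalg.solve(S11, x1 - mu[idx1])
    # (3.4b): Sigma22 - Sigma21 Sigma11^{-1} Sigma12
    cond_cov = S22 - S21 @ np.linalg.solve(S11, S12)
    return cond_mean, cond_cov

# Illustrative 3-dimensional example, conditioning on the first component
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
m, V = conditional_mvn(mu, Sigma, idx1=[0], x1=np.array([1.2]))
print(m)   # conditional mean of (X2, X3) given X1 = 1.2
print(V)   # conditional covariance matrix
```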
Example  Let x = (X1, X2, X3)ᵀ have a MVN distribution with covariance matrix Σ = (σij).
Show that the conditional distribution of (X1, X2) given X3 = x3 is also MVN with mean

(μ1, μ2)ᵀ + (σ13, σ23)ᵀ σ33⁻¹ (x3 - μ3)

and covariance matrix

( σ11  σ12 )          ( σ13 )
( σ21  σ22 )  - σ33⁻¹ ( σ23 ) ( σ13  σ23 ).

Solution
Apply (3.4a) and (3.4b) with the conditioning block taken to be X3: here Σ11 = σ33, Σ21 = (σ13, σ23)ᵀ, and Σ22 is the 2 × 2 covariance matrix of (X1, X2). Hence

Σ21 Σ11⁻¹ = σ33⁻¹ (σ13, σ23)ᵀ,

and substituting into (3.4a) and (3.4b) gives the stated conditional mean and covariance matrix.
3.3 Maximum-likelihood estimation
Let Xᵀ = (x1, ..., xn) contain an independent random sample of size n from Np(μ, Σ). The maximum likelihood estimates (MLEs) of μ and Σ are the sample mean and covariance matrix (with divisor n):

μ̂ = x̄ = n⁻¹ ∑ᵢ xᵢ,    Σ̂ = S = n⁻¹ ∑ᵢ (xᵢ - x̄)(xᵢ - x̄)ᵀ.   (3.6)
The likelihood function is a function of the parameters μ, Σ given the data X:

L(μ, Σ; X) = ∏ᵢ f(xᵢ; μ, Σ) = (2π)^(-np/2) |Σ|^(-n/2) exp{ -½ ∑ᵢ (xᵢ - μ)ᵀ Σ⁻¹ (xᵢ - μ) }.   (3.7)

The RHS is evaluated by substituting the individual data vectors {x1, ..., xn} in turn into the p.d.f. of Np(μ, Σ) and taking the product.
Maximizing L is equivalent to minimizing minus twice the log likelihood,

l(μ, Σ) = -2 log L(μ, Σ; X) = n log|Σ| + ∑ᵢ (xᵢ - μ)ᵀ Σ⁻¹ (xᵢ - μ) + K,   (3.8)

where K is a constant independent of μ, Σ.
Result 3.3

l(μ, Σ) = n log|Σ| + n tr(Σ⁻¹ S) + n dᵀ Σ⁻¹ d   (3.9)

up to an additive constant, where d = x̄ - μ.
Proof
Writing xᵢ - μ = (xᵢ - x̄) + d, the final term in the likelihood expression (3.8) becomes

∑ᵢ (xᵢ - μ)ᵀ Σ⁻¹ (xᵢ - μ) = ∑ᵢ (xᵢ - x̄)ᵀ Σ⁻¹ (xᵢ - x̄) + n dᵀ Σ⁻¹ d = n tr(Σ⁻¹ S) + n dᵀ Σ⁻¹ d,

using aᵀBa = tr(B a aᵀ), proving the expression (3.9). Note that the cross-product terms have vanished because ∑ᵢ (xᵢ - x̄) = 0 and therefore

∑ᵢ (xᵢ - x̄)ᵀ Σ⁻¹ d = 0.
In (3.9) the dependence on μ is entirely through d. Now assume that Σ is positive definite (p.d.); then so is Σ⁻¹, as

Σ⁻¹ = Γ Λ⁻¹ Γᵀ,

where Σ = Γ Λ Γᵀ is the eigenanalysis of Σ. Hence dᵀ Σ⁻¹ d ≥ 0, with equality only when d = 0, so (3.9) is minimized with respect to μ for fixed Σ when d = 0, i.e.

μ̂ = x̄.
Final part of proof: to minimize the log-likelihood over Σ we must minimize

l(x̄, Σ) = n log|Σ| + n tr(Σ⁻¹ S).

We show that

log|Σ| + tr(Σ⁻¹ S) ≥ log|S| + p,

with equality if and only if Σ = S.
Lemma 1  Σ⁻¹S is positive semi-definite (proved elsewhere). Therefore the eigenvalues of Σ⁻¹S are positive (assuming S is non-singular).
Lemma 2  For any set of positive numbers,

A ≥ log G + 1,   (3.11)

where A and G are the arithmetic and geometric means respectively.
Proof
For all x > 0 we have x ≥ log x + 1 (simple exercise). Consider a set of n strictly positive numbers {yᵢ}:

A = (1/n) ∑ᵢ yᵢ ≥ (1/n) ∑ᵢ (log yᵢ + 1) = log G + 1,

as required.
Recall that for any (p × p) matrix A we have tr(A) = the sum of the eigenvalues and |A| = the product of the eigenvalues. Let λᵢ (i = 1, ..., p) be the positive eigenvalues of Σ⁻¹S and substitute in (3.11):

(1/p) tr(Σ⁻¹ S) = (1/p) ∑ᵢ λᵢ ≥ (1/p) ∑ᵢ log λᵢ + 1 = (1/p) log|Σ⁻¹ S| + 1.
Hence

tr(Σ⁻¹ S) ≥ log|Σ⁻¹ S| + p = log|S| - log|Σ| + p,

so that

log|Σ| + tr(Σ⁻¹ S) ≥ log|S| + p,

with equality if and only if all λᵢ = 1, i.e. Σ = S.
This proves that the MLEs are as stated in (3.6).
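In code the MLEs (3.6) are just the sample mean and the covariance matrix with divisor n (a sketch on simulated data; the true μ and Σ below are arbitrary, and the perturbation at the end is only an informal check that the MLE gives the smallest value of (3.9)):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a sample from an arbitrary N_p(mu, Sigma)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[1.0, 0.7],
                       [0.7, 2.0]])
n = 500
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)

# MLEs (3.6): sample mean and covariance with divisor n
mu_hat = X.mean(axis=0)
centred = X - mu_hat
S = centred.T @ centred / n       # note: divisor n, not n - 1

# Negative log-likelihood (3.9), up to an additive constant
def neg_loglik(mu, Sigma):
    d = mu_hat - mu
    Sinv = np.linalg.inv(Sigma)
    return n * (np.log(np.linalg.det(Sigma))
                + np.trace(Sinv @ S)
                + d @ Sinv @ d)

print(neg_loglik(mu_hat, S))                          # value at the MLE
print(neg_loglik(mu_hat + 0.1, S + 0.1 * np.eye(2)))  # any perturbation gives a larger value
```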
3.3 Sampling distribution of x̄ and S
The Wishart distribution (Definition)
If M (p × p) can be written M = XᵀX, where X (m × p) is a data matrix from Np(0, Σ), then M is said to have a Wishart distribution with scale matrix Σ and degrees of freedom m. We write

M ~ Wp(Σ, m).   (3.12)

When Σ = Ip the distribution is said to be in standard form.
Note:
The Wishart distribution is the multivariate generalization of the chi-squared (χ²) distribution.
Additive property of matrices with a Wishart distribution  Let M1, M2 be matrices having the Wishart distribution,

M1 ~ Wp(Σ, m1),   M2 ~ Wp(Σ, m2),

independently; then

M1 + M2 ~ Wp(Σ, m1 + m2).
This property follows from the definition of the Wishart distribution because data matrices are additive in the sense that if

X = ( X1 )
    ( X2 )

is a combined data matrix consisting of m1 + m2 rows, then

XᵀX = X1ᵀX1 + X2ᵀX2 = M1 + M2

is the matrix (known as the "Gram matrix") formed from the combined data matrix X.
Case of p = 1
When p = 1 we know from the definition of χ²_r as the distribution of the sum of squares of r independent N(0, 1) variates that

W1(1, m) = χ²_m,

so that

W1(σ², m) = σ² χ²_m.
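A sketch of the definition (3.12), not part of the notes: generate a data matrix X with rows from Np(0, Σ) and form M = XᵀX. The standard property E(M) = mΣ (each row contributes E(xxᵀ) = Σ) can then be checked by averaging many draws; Σ and m below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

Sigma = np.array([[1.0, 0.4],
                  [0.4, 1.5]])   # illustrative scale matrix
m = 10                           # degrees of freedom

def wishart_draw(Sigma, m, rng):
    """One draw M = X'X with X an (m x p) data matrix from N_p(0, Sigma)."""
    X = rng.multivariate_normal(np.zeros(Sigma.shape[0]), Sigma, size=m)
    return X.T @ X

# E(M) = m * Sigma: check by averaging many draws
draws = np.array([wishart_draw(Sigma, m, rng) for _ in range(5000)])
print(draws.mean(axis=0))   # should be close to ...
print(m * Sigma)            # ... m * Sigma
```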
Sampling distributions
Let x1, x2, ..., xn be a random sample of size n from Np(μ, Σ). Then
1. The sample mean has the normal distribution

x̄ ~ Np(μ, Σ/n).

2. The (scaled) sample covariance matrix has the Wishart distribution:

nS = (n - 1) Su ~ Wp(Σ, n - 1),

where Su denotes the unbiased sample covariance matrix with divisor n - 1.
3. The distributions of x̄ and Su are independent.
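Result 1 can be checked empirically (an illustrative sketch; μ, Σ and n are arbitrary): over repeated samples, the covariance matrix of the sample mean should be close to Σ/n.

```python
import numpy as np

rng = np.random.default_rng(3)

mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])
n = 25

# Repeatedly draw samples of size n and record the sample mean
xbars = np.array([
    rng.multivariate_normal(mu, Sigma, size=n).mean(axis=0)
    for _ in range(20_000)
])

print(np.cov(xbars, rowvar=False))   # empirical covariance of x-bar
print(Sigma / n)                     # theoretical value Sigma / n
```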
3.4 Estimators for special circumstances
3.4.1 Mean μ proportional to a given vector
Sometimes μ is known to be proportional to a given vector, so μ = kμ0 with μ0 a known vector and k an unknown scalar. For example, if x represents a sample of repeated measurements then μ = k1, where 1 = (1, 1, ..., 1)ᵀ is the p-vector of 1's. We find the MLE of k for this situation. Suppose Σ is known and μ = kμ0. From (3.9), retaining only the term involving k, the log likelihood to be minimized is

l(k) = n(x̄ - kμ0)ᵀ Σ⁻¹ (x̄ - kμ0).

Set dl/dk = 0 to minimize l(k) w.r.t. k:

dl/dk = -2n μ0ᵀ Σ⁻¹ (x̄ - kμ0) = 0,

from which

k̂ = μ0ᵀ Σ⁻¹ x̄ / (μ0ᵀ Σ⁻¹ μ0).   (3.13)
Properties  We now show that k̂ is an unbiased estimator of k and determine the variance of k̂. Taking expectations in (3.13),

E(k̂) = μ0ᵀ Σ⁻¹ E(x̄) / (μ0ᵀ Σ⁻¹ μ0) = k μ0ᵀ Σ⁻¹ μ0 / (μ0ᵀ Σ⁻¹ μ0) = k,

since E(x̄) = kμ0. Hence, using Var(x̄) = Σ/n,

Var(k̂) = μ0ᵀ Σ⁻¹ Var(x̄) Σ⁻¹ μ0 / (μ0ᵀ Σ⁻¹ μ0)² = 1 / (n μ0ᵀ Σ⁻¹ μ0),   (3.14)

showing that k̂ is an unbiased estimator of k with variance given by (3.14).
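A sketch of (3.13) and (3.14) on simulated data (the true k, μ0 and Σ below are arbitrary illustrative choices): the average of k̂ over many samples should be close to k, and its sampling variance close to 1/(n μ0ᵀΣ⁻¹μ0).

```python
import numpy as np

rng = np.random.default_rng(4)

mu0 = np.ones(3)                       # known direction, e.g. repeated measurements
Sigma = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.5, 0.3],
                  [0.1, 0.3, 0.8]])    # known covariance (illustrative)
k_true, n = 2.0, 50

Sinv_mu0 = np.linalg.solve(Sigma, mu0)
denom = mu0 @ Sinv_mu0                 # mu0' Sigma^{-1} mu0

def k_hat(X):
    # (3.13): k-hat = mu0' Sigma^{-1} x-bar / (mu0' Sigma^{-1} mu0)
    return (Sinv_mu0 @ X.mean(axis=0)) / denom

estimates = np.array([
    k_hat(rng.multivariate_normal(k_true * mu0, Sigma, size=n))
    for _ in range(10_000)
])

print(estimates.mean(), k_true)          # unbiasedness
print(estimates.var(), 1 / (n * denom))  # variance (3.14)
```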
3.4.2 Linear restriction on μ
We determine an estimator for μ to satisfy a linear restriction
Aμ = b
where A (m × p) and b (m × 1) are given constants and Σ is assumed to be known. We write the restriction in vector form g(μ) = Aμ - b = 0 and form the Lagrangean

l*(μ, λ) = n(x̄ - μ)ᵀ Σ⁻¹ (x̄ - μ) + 2λᵀ(Aμ - b),

where λᵀ = (λ1, ..., λm) is a vector of Lagrange multipliers (the factor 2 is inserted just for convenience).
Set ∂l*/∂μ = 0, using results from Example Sheet 2:

∂l*/∂μ = -2n Σ⁻¹ (x̄ - μ) + 2Aᵀλ = 0,

so that

μ̂ = x̄ - (1/n) Σ Aᵀ λ.   (3.16)
We use the constraint Aμ̂ = b to evaluate the Lagrange multipliers λ. Premultiply (3.16) by A:

b = Aμ̂ = Ax̄ - (1/n) A Σ Aᵀ λ,

so that

λ = n (A Σ Aᵀ)⁻¹ (Ax̄ - b).
Substitute into (3.16):

μ̂ = x̄ - Σ Aᵀ (A Σ Aᵀ)⁻¹ (Ax̄ - b).   (3.17)
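A sketch of the restricted estimator (3.17), not from the notes (A, b, μ and Σ are illustrative): the estimator satisfies Aμ̂ = b exactly.

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative setup: p = 3, one linear restriction (m = 1)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[1.0, 0.2, 0.0],
                  [0.2, 1.0, 0.3],
                  [0.0, 0.3, 1.0]])
A = np.array([[1.0, 1.0, 1.0]])       # restriction: mu1 + mu2 + mu3 = b
b = np.array([6.0])

X = rng.multivariate_normal(mu, Sigma, size=100)
xbar = X.mean(axis=0)

# (3.17): mu-hat = x-bar - Sigma A' (A Sigma A')^{-1} (A x-bar - b)
adj = Sigma @ A.T @ np.linalg.solve(A @ Sigma @ A.T, A @ xbar - b)
mu_hat = xbar - adj

print(mu_hat)
print(A @ mu_hat, b)    # the restriction A mu-hat = b holds exactly
```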
3.4.3 Covariance matrix proportional to a given matrix
We consider estimating k when Σ = kΣ0, where Σ0 is a given constant matrix. With μ estimated by x̄, the likelihood (3.8) takes the form, when Σ = kΣ0,

l(k) = n log|kΣ0| + n tr((kΣ0)⁻¹ S) = np log k + (n/k) tr(Σ0⁻¹ S)

plus constant terms (not involving k). Setting dl/dk = 0,

np/k - (n/k²) tr(Σ0⁻¹ S) = 0.

Hence

k̂ = tr(Σ0⁻¹ S) / p.   (3.18)
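A sketch of (3.18) on simulated data (Σ0 and k below are illustrative choices): with Σ = kΣ0, the estimate k̂ = tr(Σ0⁻¹S)/p recovers k.

```python
import numpy as np

rng = np.random.default_rng(6)

Sigma0 = np.array([[1.0, 0.5],
                   [0.5, 2.0]])       # given matrix (illustrative)
k_true, n, p = 3.0, 2000, 2

X = rng.multivariate_normal(np.zeros(p), k_true * Sigma0, size=n)
xbar = X.mean(axis=0)
S = (X - xbar).T @ (X - xbar) / n      # MLE of Sigma, divisor n

# (3.18): k-hat = tr(Sigma0^{-1} S) / p
k_hat = np.trace(np.linalg.solve(Sigma0, S)) / p
print(k_hat, k_true)
```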