Covariance and Correlation

Covariance and correlation are two fundamental concepts in probability and statistics that describe how two random variables change together. Both measure relationships between variables: covariance indicates the direction of the linear relationship, while correlation indicates both its direction and its standardised strength.

Covariance

Definition

Covariance of two random variables X and Y is the expected value of the product of their deviations from their respective means. It measures how X and Y vary together; a positive covariance indicates that when X is above its mean Y tends to be above its mean, while a negative covariance indicates opposite behaviour.

Mathematical expressions

  • Population covariance: Cov(X,Y) = E[(X - μX)(Y - μY)].
  • Sample covariance (divide-by-n form; biased when the means are estimated from the same sample): Cov(X,Y) = (1/n) Σ_{i=1}^{n} (x_i - x̄)(y_i - ȳ).
  • Sample covariance (unbiased estimator, divide by n-1): SXY = (1/(n-1)) Σ_{i=1}^{n} (x_i - x̄)(y_i - ȳ).

Notes on formulas:

  • μX and μY denote population means; x̄ and ȳ denote sample means.
  • Covariance has units equal to the product of units of X and Y; it is scale-dependent.
  • Covariance can take any real value: -∞ < Cov(X,Y) < ∞.
  • If Cov(X,Y) = 0 this indicates no linear relationship; X and Y can still be dependent nonlinearly.
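The last note can be checked numerically. The sketch below uses a hypothetical five-point dataset (not from the text above) in which Y is completely determined by X through Y = X², yet the covariance is exactly zero because the X values are symmetric about their mean. The helper name covariance and its ddof parameter are illustrative choices.

```python
from fractions import Fraction


def covariance(xs, ys, ddof=0):
    """Covariance of paired values; ddof=0 divides by n, ddof=1 by n-1."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - ddof)


# Hypothetical data: Y depends on X exactly (Y = X^2), but not linearly.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

print(covariance(xs, ys))          # 0
print(covariance(xs, ys, ddof=1))  # 0
```

Exact fractions are used so that the zero is not blurred by floating-point rounding.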

Properties of covariance

  • Cov(X,X) = Var(X).
  • Cov(aX + b, cY + d) = ac · Cov(X,Y) for constants a,b,c,d.
  • Cov(X,Y) = Cov(Y,X) (symmetry).
  • If X and Y are independent then Cov(X,Y) = 0; the converse need not hold.
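The first three properties above also hold for the sample covariance, so they can be spot-checked on a small dataset. The following is a minimal sketch: the data are X = {1, 2, 3} and Y = {2, 4, 5}, the constants a, b, c, d are arbitrary illustrative values, and sample_cov is a helper written for this example.

```python
from fractions import Fraction


def sample_cov(xs, ys):
    """Unbiased sample covariance (divides by n-1), in exact arithmetic."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3]
ys = [2, 4, 5]
a, b, c, d = 3, 7, -2, 10  # arbitrary constants chosen for illustration

# Cov(aX + b, cY + d) should equal ac * Cov(X, Y); the shifts b and d drop out.
lhs = sample_cov([a * x + b for x in xs], [c * y + d for y in ys])
rhs = a * c * sample_cov(xs, ys)
print(lhs, rhs)  # -9 -9

# Cov(X, X) is the sample variance of X, and covariance is symmetric.
print(sample_cov(xs, xs))                        # 1
print(sample_cov(xs, ys) == sample_cov(ys, xs))  # True
```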

Covariance matrix (multivariate)

For a vector of random variables X = (X1, X2, ..., Xk)ᵀ the covariance matrix Σ has entries Σij = Cov(Xi, Xj). The covariance matrix is symmetric and positive semi-definite.
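As a rough illustration, a sample covariance matrix can be assembled entry by entry from the pairwise sample covariances. The sketch below uses hypothetical observations of three variables; the function names are illustrative. Symmetry is checked directly, while positive semi-definiteness is only spot-checked by evaluating zᵀΣz for a few vectors z (this quantity is the sample variance of the combination Σ z_i X_i, so it cannot be negative). A spot check is not a proof.

```python
from fractions import Fraction


def sample_cov(xs, ys):
    """Unbiased sample covariance (divides by n-1), in exact arithmetic."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def covariance_matrix(columns):
    """Matrix whose (i, j) entry is the sample covariance of variables i and j."""
    return [[sample_cov(ci, cj) for cj in columns] for ci in columns]


def quadratic_form(matrix, z):
    """Evaluate z^T * matrix * z."""
    k = len(z)
    return sum(z[i] * matrix[i][j] * z[j] for i in range(k) for j in range(k))


# Hypothetical data: one list of four observations per variable.
columns = [
    [1, 2, 3, 4],
    [2, 4, 5, 9],
    [5, 3, 4, 1],
]
sigma = covariance_matrix(columns)
k = len(columns)

is_symmetric = all(sigma[i][j] == sigma[j][i] for i in range(k) for j in range(k))
test_vectors = [[1, 0, 0], [1, -1, 0], [2, -1, 3], [-1, 1, 1]]
is_nonnegative = all(quadratic_form(sigma, z) >= 0 for z in test_vectors)

print(is_symmetric)    # True
print(is_nonnegative)  # True
```

The diagonal entries of sigma are the sample variances of the individual variables.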

Worked numerical example

Example dataset: X = {1, 2, 3}, Y = {2, 4, 5}.

Compute the sample covariance using the unbiased estimator (divide by n-1).

n = 3.

x̄ = (1 + 2 + 3)/3 = 2.

ȳ = (2 + 4 + 5)/3 = 11/3 ≈ 3.6667.

Deviations for X: (1-2) = -1, (2-2) = 0, (3-2) = 1.

Deviations for Y: (2-11/3) = -5/3, (4-11/3) = 1/3, (5-11/3) = 4/3.

Products of deviations: (-1)(-5/3) = 5/3, (0)(1/3) = 0, (1)(4/3) = 4/3.

Sum of products = 5/3 + 0 + 4/3 = 9/3 = 3.

Sample covariance SXY = (1/(n-1)) · 3 = (1/2) · 3 = 1.5.
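The arithmetic above can be reproduced with a short script. This is a minimal sketch using exact fractions so that the intermediate values match the hand calculation; it also prints the divide-by-n value for comparison.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [2, 4, 5]
n = len(xs)

mean_x = Fraction(sum(xs), n)  # 2
mean_y = Fraction(sum(ys), n)  # 11/3

# Sum of products of deviations from the sample means.
sum_of_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

unbiased = sum_of_products / (n - 1)  # divide by n-1
biased = sum_of_products / n          # divide by n

print(mean_x, mean_y, sum_of_products)  # 2 11/3 3
print(unbiased, biased)                 # 3/2 1
```

For such a small sample the two conventions differ noticeably (1.5 versus 1); the gap shrinks as n grows.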

Correlation

Definition

Correlation quantifies the strength and direction of a linear relationship between two variables. The most common measure is the Pearson correlation coefficient, which is the covariance normalised by the product of standard deviations, making it dimensionless and bounded between -1 and +1.

Pearson correlation coefficient

  • Population (ρ): ρX,Y = Cov(X,Y) / (σX σY).
  • Sample (r): r = SXY / (sX sY), where SXY is the sample covariance and sX, sY are sample standard deviations.

Interpretation

  • r = +1 indicates a perfect positive linear relationship.
  • r = -1 indicates a perfect negative linear relationship.
  • r = 0 indicates no linear association; there may still be a nonlinear relationship.
  • Values close to ±1 indicate stronger linear association; values near 0 indicate weak linear association.

Properties and remarks

  • Correlation is dimensionless and scale-invariant: multiplying one variable by a positive constant does not change r.
  • Pearson correlation measures only linear association; it may be misleading for nonlinear relationships or for data with strong outliers.
  • The coefficient of determination R² = r² gives the fraction of the variance in Y explained by a linear model in X (in the context of simple linear regression).
  • Other correlation measures exist: Spearman rank correlation for monotonic relationships and Kendall's τ for concordance. Both are based on ranks, so they are less sensitive to outliers and can detect monotonic relationships that are not linear.
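The difference between Pearson and Spearman correlation can be seen on a monotonic but nonlinear relationship. The sketch below uses hypothetical data with Y = X³ and computes Spearman's coefficient as the Pearson coefficient of the ranks; the simple ranks helper assumes there are no tied values. The last line checks that rescaling X by a positive constant leaves r unchanged (up to floating-point rounding).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation; the 1/(n-1) factors cancel, so raw sums are used."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank values from 1 (smallest) to n. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: strictly increasing but clearly nonlinear.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

print(pearson_r(xs, ys))     # below 1: the relationship is not linear
print(spearman_rho(xs, ys))  # 1.0: the relationship is perfectly monotonic
print(math.isclose(pearson_r([10 * x for x in xs], ys), pearson_r(xs, ys)))  # True
```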

Worked numerical example (using previous dataset)

Use X = {1, 2, 3} and Y = {2, 4, 5} together with the sample covariance computed earlier, SXY = 1.5.

Compute the sample standard deviation sX.

Sample variance of X: sX² = (1/(n-1)) Σ (x_i - x̄)².

Squared deviations for X: (-1)² = 1, 0² = 0, 1² = 1.

Sum = 2.

sX² = 2/(3-1) = 1, so sX = 1.

Compute sY.

Deviations for Y (from the covariance example): -5/3, 1/3, 4/3.

Squared deviations: 25/9, 1/9, 16/9.

Sum = 42/9 = 14/3.

sY² = (14/3)/(3-1) = (14/3)/2 = 7/3 ≈ 2.3333, so sY = √(7/3) ≈ 1.5275.

Sample correlation r = SXY / (sX sY) = 1.5 / (1 · 1.5275) ≈ 0.982.

This indicates a very strong positive linear relationship for the given small sample.
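The same steps can be followed in code. This is a minimal sketch that keeps the variances and covariance as exact fractions and converts to floating point only for the square root; the variable names are illustrative.

```python
import math
from fractions import Fraction

xs = [1, 2, 3]
ys = [2, 4, 5]
n = len(xs)

mean_x = Fraction(sum(xs), n)
mean_y = Fraction(sum(ys), n)

# Sample covariance and sample variances, all with the n-1 divisor.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)

r = float(s_xy) / math.sqrt(var_x * var_y)
r_squared = s_xy ** 2 / (var_x * var_y)  # exact value of r^2

print(s_xy, var_x, var_y)  # 3/2 1 7/3
print(round(r, 3))         # 0.982
print(r_squared)           # 27/28
```

The exact value r² = 27/28 ≈ 0.964 is the coefficient of determination for a straight-line fit to these three points.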

Covariance versus Correlation

Direct comparisons

  • Meaning: Covariance measures direction and magnitude in the original units; correlation measures direction and standardised strength.
  • Range: Covariance is unbounded (-∞ to +∞); correlation is bounded between -1 and +1.
  • Units: Covariance has compound units (units of X × units of Y); correlation is dimensionless.
  • Scale dependence: Covariance changes if units or scale of X or Y change; correlation does not (apart from sign change if a variable is multiplied by a negative constant).
  • Interpretation: Correlation is preferable when comparing strengths of relationships across different variable pairs; covariance is used directly in multivariate statistics (covariance matrices, principal component analysis, multivariate normal distributions).
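The scale-dependence point can be illustrated with a change of units. In the sketch below the X values are assumed, purely for illustration, to be lengths in metres that are then re-expressed in centimetres; the covariance grows by a factor of 100 while the correlation stays the same, and negating X flips the sign of r.

```python
import math


def sample_cov(xs, ys):
    """Unbiased sample covariance (divides by n-1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pearson_r(xs, ys):
    """Sample Pearson correlation built from the sample covariance."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


x_m = [1.0, 2.0, 3.0]            # assumed to be in metres (illustrative)
y = [2.0, 4.0, 5.0]
x_cm = [100.0 * x for x in x_m]  # the same lengths in centimetres

print(sample_cov(x_m, y), sample_cov(x_cm, y))  # covariance scales by 100
print(pearson_r(x_m, y), pearson_r(x_cm, y))    # correlation is unchanged
print(pearson_r([-x for x in x_m], y))          # sign flips for a negative factor
```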

Applications and relevance for engineering

Civil Engineering (CE): Covariance and correlation are used in reliability analysis, structural health monitoring, geotechnical data analysis and in the assessment of correlated loads and material properties.

Computer Science Engineering (CSE): Correlation is used in feature selection for machine learning, principal component analysis for dimensionality reduction, signal processing and pattern recognition.

Electrical Engineering (EE): Covariance and correlation are central to signal processing, estimation theory (Wiener filter, Kalman filter), antenna array processing and analysis of stochastic signals and noise.

Practical remarks and pitfalls

  • Correlation does not imply causation; a strong correlation may be due to an underlying common cause or coincidence.
  • Outliers can dramatically affect covariance and Pearson correlation; consider robust alternatives (Spearman, Kendall) if outliers are present.
  • For small samples the sample correlation can be an unreliable estimate of the population correlation; use appropriate statistical tests (e.g., t-test for correlation) and confidence intervals.
  • When working with time series, use autocovariance and autocorrelation functions and account for non-stationarity before interpretation.
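The small-sample caveat can be made concrete with the three-point dataset X = {1, 2, 3}, Y = {2, 4, 5}. The sketch below computes the usual t statistic for testing zero population correlation, t = r·√(n-2)/√(1-r²) with n-2 degrees of freedom. The Python standard library has no t distribution, so the two-sided 5% critical value for 1 degree of freedom (12.706) is hard-coded here from standard t tables; this is an assumption of the example, not something computed by the code.

```python
import math

xs = [1, 2, 3]
ys = [2, 4, 5]
n = len(xs)

mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)

# t statistic for H0: population correlation = 0, with n-2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Two-sided 5% critical value for 1 degree of freedom, taken from t tables.
T_CRITICAL_DF1 = 12.706

print(round(r, 3))                   # 0.982
print(round(t_stat, 3))              # 5.196
print(abs(t_stat) > T_CRITICAL_DF1)  # False
```

Even though r ≈ 0.982 looks very strong, with only three observations the test does not reject zero correlation at the 5% level.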

Summary

Covariance and correlation both describe relationships between two variables. Covariance provides a measure in original units and indicates direction; correlation standardises this measure to a dimensionless index between -1 and +1 indicating both direction and relative strength. Both concepts are widely used across engineering disciplines for analysis, modelling and inference; choice between them depends on whether scale matters and whether comparison across different variable pairs is required.

The document Covariance and Correlation is part of the Engineering Mathematics course.