Karl Pearson’s Coefficient of Correlation
The measure of the degree of relationship between two variables is called the correlation coefficient. It is denoted by the symbol r. The assumptions that constitute a bivariate linear correlation population model, for which correlation is to be calculated, include the following (Ya-Lun Chou):
1. Both X and Y are random variables. Either variable can be designated as the independent variable, and the other variable is the dependent variable.
2. The bivariate population is normal. A bivariate normal population is, among other things, one in which both X and Y are normally distributed.
3. The relationship between X and Y is, in a sense, linear. This assumption implies that the means of the Y values associated with each X value fall on a straight line, which is the regression line of Y on X, and that the means of the X values associated with each Y value fall on a straight line, which is the regression line of X on Y. Furthermore, the two population regression lines are the same if and only if the relationship between Y and X is perfect, that is, r = ±1. Otherwise, the intercept and slope of the regression equation with Y dependent will differ from those of the equation with X dependent.
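The link between the two regression lines and r can be checked numerically. The following is a minimal sketch, assuming the usual least-squares slopes b_yx = Σxy / Σx² and b_xy = Σxy / Σy² (standard results that are not stated in this text) and using sample data rather than a population. The function name `regression_slopes` and all data values are hypothetical. The product of the two slopes equals r², so the two lines coincide only when that product is 1.

```python
def regression_slopes(xs, ys):
    """Return (b_yx, b_xy): slope of Y on X and slope of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]          # deviations of X from its mean
    dy = [v - mean_y for v in ys]          # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sum_x2, sum_xy / sum_y2

# Hypothetical data with a perfect linear relationship (Y = 2X + 1)
b_yx_perfect, b_xy_perfect = regression_slopes([1, 2, 3, 4], [3, 5, 7, 9])
product_perfect = b_yx_perfect * b_xy_perfect      # equals r squared

# Hypothetical data with an imperfect relationship
b_yx_loose, b_xy_loose = regression_slopes([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
product_loose = b_yx_loose * b_xy_loose            # equals r squared, below 1

print(product_perfect, product_loose)
```

When the product is 1, the slope of Y on X equals the reciprocal of the slope of X on Y, and since both lines pass through the point of means, they are the same line.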
Karl Pearson's method is the most widely used in practice. The coefficient is denoted by the symbol r. The formula for computing the coefficient of correlation can take various alternative forms, and the user chooses whichever suits the data at hand.
METHOD I — WHEN DEVIATIONS ARE TAKEN FROM ACTUAL ARITHMETIC MEAN
(A) WHEN STANDARD DEVIATIONS ARE GIVEN IN THE QUESTION.
r = Σxy / (N σx σy)
Where x = Deviations taken from actual mean of X series
y = Deviations taken from actual mean of Y series
N = Number of items
σx = Standard deviation of X series
σy = Standard deviation of Y series
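As a minimal sketch of Method I(A), the snippet below computes r = Σxy / (N σx σy), taking deviations from the actual means and using standard deviations with divisor N. The function name `pearson_r_with_sd` and the data values are illustrative assumptions and are not taken from the examples in this text.

```python
import math

def pearson_r_with_sd(xs, ys):
    """Method I(A): r = sum(xy) / (N * sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]                  # x = X - mean of X
    dy = [v - mean_y for v in ys]                  # y = Y - mean of Y
    sigma_x = math.sqrt(sum(d * d for d in dx) / n)  # standard deviation, divisor N
    sigma_y = math.sqrt(sum(d * d for d in dy) / n)
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / (n * sigma_x * sigma_y)

# Hypothetical data, for illustration only
x_vals = [1, 2, 3, 4, 5]
y_vals = [2, 4, 5, 4, 5]
r = pearson_r_with_sd(x_vals, y_vals)
print(round(r, 4))
```

For this data Σxy = 6, Σx² = 10 and Σy² = 6, so r = 6 / √60, which is about 0.7746.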
(B) WHEN STANDARD DEVIATIONS ARE NOT GIVEN IN THE QUESTION
r = Σxy / √(Σx² × Σy²)
Where Σxy = Sum of products of deviations of X and Y series from their actual means
Σx² = Sum of squares of deviations of X series from its mean
Σy² = Sum of squares of deviations of Y series from its mean
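Method I(B) can be sketched in a few lines of Python. The snippet assumes the form r = Σxy / √(Σx² × Σy²), with deviations taken from the actual means. The function name `pearson_r_from_deviations` and the data are hypothetical. No standard deviation is computed here, which is why this form involves less work by hand.

```python
import math

def pearson_r_from_deviations(xs, ys):
    """Method I(B): r = sum(xy) / sqrt(sum(x^2) * sum(y^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]                  # deviations of X from its mean
    dy = [v - mean_y for v in ys]                  # deviations of Y from its mean
    sum_xy = sum(a * b for a, b in zip(dx, dy))    # sum of products of deviations
    sum_x2 = sum(a * a for a in dx)                # sum of squared X deviations
    sum_y2 = sum(b * b for b in dy)                # sum of squared Y deviations
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data: one perfectly increasing pair, one perfectly decreasing pair
r_up = pearson_r_from_deviations([1, 2, 3], [2, 4, 6])
r_down = pearson_r_from_deviations([1, 2, 3], [6, 4, 2])
print(r_up, r_down)
```

Because N σx σy = √(Σx² × Σy²) when the standard deviations use divisor N, this form gives the same value of r as the form that uses standard deviations.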
Example 1 : Find the correlation between the marks obtained by 10 students in mathematics and statistics.
Calculation of coefficient of correlation
Calculation by Method I(A)
(Note: The above method should be used only when specifically asked for, or when standard deviations are already given in the question; otherwise the following method should be used, as it is less cumbersome.)
Calculation by Method I(B)
METHOD II WHEN DEVIATIONS ARE TAKEN FROM ASSUMED MEAN
This method is generally used when the actual mean of the X series, of the Y series, or of both is not a whole number, in which case Method I becomes tedious. Deviations are then taken from an assumed mean to simplify the calculations.
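A minimal sketch of the assumed-mean calculation follows. It assumes the standard form r = (N Σdxdy − Σdx Σdy) / (√(N Σdx² − (Σdx)²) × √(N Σdy² − (Σdy)²)), where dx = X − Ax and dy = Y − Ay are deviations from the assumed means Ax and Ay. The function name `pearson_r_assumed_mean`, the data, and the chosen assumed means are all hypothetical.

```python
import math

def pearson_r_assumed_mean(xs, ys, ax, ay):
    """Method II: r from deviations about assumed means ax and ay."""
    n = len(xs)
    dx = [v - ax for v in xs]                      # dx = X - Ax
    dy = [v - ay for v in ys]                      # dy = Y - Ay
    sum_dx = sum(dx)
    sum_dy = sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = (math.sqrt(n * sum_dx2 - sum_dx ** 2)
                   * math.sqrt(n * sum_dy2 - sum_dy ** 2))
    return numerator / denominator

# Hypothetical data whose actual means (4.5 and 7.25) are not whole numbers
x_vals = [2, 3, 5, 8]
y_vals = [4, 5, 9, 11]

r_a = pearson_r_assumed_mean(x_vals, y_vals, ax=5, ay=9)   # one choice of assumed means
r_b = pearson_r_assumed_mean(x_vals, y_vals, ax=0, ay=0)   # a different choice
print(r_a, r_b)
```

Both calls return the same value because the correction terms Σdx Σdy, (Σdx)² and (Σdy)² cancel the effect of the assumed means, so any convenient whole numbers may be chosen for Ax and Ay.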
Example 2 :
Calculate the coefficient of correlation from the following data.
Since the means of X and Y are decimals (11.5 and 12.1 respectively), we solve by Method II.
Example 3 : Find the correlation between the age of husband and the age of wife.
Taking Ax = 10 and Ay = 15 as the assumed means, with N = 7
Properties of Coefficient of Correlation
Assumptions of Karl Pearson’s Coefficient of Correlation
1. What is Karl Pearson's coefficient of correlation?
2. How is Karl Pearson's coefficient of correlation calculated?
3. What does a positive coefficient of correlation indicate?
4. How do you interpret the value of Karl Pearson's coefficient of correlation?
5. Can Karl Pearson's coefficient of correlation be used to establish causation between variables?