Method of Least Squares : When a straight line serves as a satisfactory trend for the data, perhaps the most accurate method of fitting it is that of least squares. This method is designed to accomplish two results.
(i) The sum of the vertical deviations from the straight line must equal zero.
(ii) The sum of the squares of all deviations must be less than the sum of the squares for any other conceivable straight line.
There will be many straight lines which meet the first condition, but among all these lines only one will satisfy the second condition. It is because of this second condition that this method is known as the method of least squares. It may be mentioned that a line fitted to satisfy the second condition will automatically satisfy the first condition.
The formula for a straight-line trend can most simply be expressed as
Yc = a + bX
where X represents the time variable, Yc is the dependent variable for which trend values are to be calculated, and a and b are the constants of the straight line to be found by the method of least squares.
Constant a is the Y-intercept. This is the distance between the origin (O) and the point where the trend line intersects the Y-axis. It shows the value of Y when X = 0. Constant b indicates the slope, which is the change in Y for each unit change in X.
Let us assume that we are given observations of Y for n years. We wish to find the values of constants a and b in such a manner that the two conditions laid down above are satisfied by the fitted equation.
Mathematical reasoning suggests that, to obtain the values of constants a and b according to the Principle of Least Squares, we have to solve simultaneously the following two equations.
∑Y = na + b∑X ...(i)
∑XY = a∑X + b∑X² ...(ii)
Solution of the two normal equations yields the following values for the constants a and b :
b = (n∑XY – ∑X∑Y) / (n∑X² – (∑X)²)
and a = (∑Y – b∑X) / n
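The closed-form solution of the two normal equations can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `fit_line` and `sum_of_squares` and the small data set are assumptions made for the example, not part of the text. The script also checks the two conditions stated at the start of this section.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Yc = a + b*X."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Closed-form solution of the two normal equations.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from Yc = a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only for illustration.
xs = [0, 1, 2, 3, 4]
ys = [2, 3, 5, 4, 6]

a, b = fit_line(xs, ys)

# Condition (i): the vertical deviations from the fitted line sum to zero.
residual_sum = sum(y - (a + b * x) for x, y in zip(xs, ys))

# Condition (ii): nudging either constant increases the sum of squares.
best = sum_of_squares(xs, ys, a, b)
worse_intercept = sum_of_squares(xs, ys, a + 0.1, b)
worse_slope = sum_of_squares(xs, ys, a, b - 0.1)

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"sum of deviations = {residual_sum:.10f}")
print(best < worse_intercept, best < worse_slope)
```

Comparing against only two perturbed lines does not prove condition (ii); it merely illustrates it for this data set.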
Least Squares Long Method : It makes use of the above-mentioned two normal equations without attempting to shift the origin of the time variable to a convenient mid-year. This method is illustrated by the following example.
Illustration : Fit a linear trend curve by the least-squares method to the following data :
Year Production (Kg.)
2001 3
2002 5
2003 6
2004 6
2005 8
2006 10
2007 11
2008 12
2009 13
2010 15
Solution : The first year 2001 is assumed to be 0, 2002 would become 1, 2003 would be 2 and so on. The various steps are outlined in the following table.
----------------------------------------------------
Year Production (Y) X XY X²
(1) (2) (3) (4) (5)
----------------------------------------------------
2001 3 0 0 0
2002 5 1 5 1
2003 6 2 12 4
2004 6 3 18 9
2005 8 4 32 16
2006 10 5 50 25
2007 11 6 66 36
2008 12 7 84 49
2009 13 8 104 64
2010 15 9 135 81
Total 89 45 506 285
----------------------------------------------------
The above table yields the following values for various terms mentioned below :
n = 10, ∑X = 45, ∑X² = 285, ∑Y = 89, and ∑XY = 506
Substituting these values in the two normal equations, we obtain
89 = 10a + 45b ...(i)
506 = 45a + 285b ...(ii)
Multiplying equation (i) by 9 and equation (ii) by 2, we obtain
801 = 90a + 405b ...(iii)
1012 = 90a + 570b ...(iv)
Subtracting equation (iii) from equation (iv), we obtain
211 = 165b or b = 211/165 = 1.28
Substituting the value of b in equation (i), we obtain
89 = 10a + 45 × 1.28
89 = 10a + 57.60
10a = 89 – 57.6
10a = 31.4
a = 31.4/10 = 3.14
Substituting these values of a and b in the linear equation, we obtain the following trend line
Yc = 3.14 + 1.28X
Inserting various values of X in this equation, we obtain the trend values as below :
-----------------------------------------------------------------
Year Observed Y a b × X Yc (Col. 3 plus Col. 4)
1 2 3 4 5
-----------------------------------------------------------------
2001 3 3.14 1.28 × 0 3.14
2002 5 3.14 1.28 × 1 4.42
2003 6 3.14 1.28 × 2 5.70
2004 6 3.14 1.28 × 3 6.98
2005 8 3.14 1.28 × 4 8.26
2006 10 3.14 1.28 × 5 9.54
2007 11 3.14 1.28 × 6 10.82
2008 12 3.14 1.28 × 7 12.10
2009 13 3.14 1.28 × 8 13.38
2010 15 3.14 1.28 × 9 14.66
-------------------------------------------------------------------
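The long method for this illustration can be reproduced with a short Python sketch. The variable names are assumptions made for the example. Because the script keeps b unrounded, the intercept and trend values may differ from the hand computation in the second decimal place.

```python
years = list(range(2001, 2011))
production = [3, 5, 6, 6, 8, 10, 11, 12, 13, 15]

# Code the time variable so that the first year, 2001, is X = 0.
xs = [year - years[0] for year in years]

n = len(xs)
sum_x = sum(xs)
sum_y = sum(production)
sum_xy = sum(x * y for x, y in zip(xs, production))
sum_x2 = sum(x * x for x in xs)

# Normal equations:
#   sum_y  = n * a     + sum_x  * b
#   sum_xy = sum_x * a + sum_x2 * b
# Scaling the first by sum_x and the second by n makes the a-terms equal;
# subtracting then leaves a single equation in b.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x * sum_x)
a = (sum_y - b * sum_x) / n

trend = [a + b * x for x in xs]

print(f"Yc = {a:.3f} + {b:.3f}X")
for year, y, yc in zip(years, production, trend):
    print(year, y, round(yc, 2))
```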
Least Squares Short-cut Method : We can take any other year as the origin, and for that year X would be 0. Considerable saving of both time and effort is possible if the origin is taken in the middle of the whole time span covered by the series. The origin would then be located at the mean of the X values, and the sum of the X values would equal 0. The two normal equations would then be simplified to
∑Y = Na or a = ∑Y/N ...(i)
and ∑XY = b∑X² or b = ∑XY/∑X² ...(ii)
Two cases of the short-cut method are given below. In the first case there is an odd number of years, while in the second case the number of observations is even.
Illustration : Fit a straight line trend on the following data :
Year 1996 1997 1998 1999 2000 2001 2002 2003 2004
Y 4 7 7 8 9 11 13 14 17
Solution : Since we have 9 observations, the origin is taken at the middle year, 2000, for which X is assumed to be 0.
------------------------------
Year Y X XY X²
------------------------------
1996 4 – 4 – 16 16
1997 7 – 3 – 21 9
1998 7 – 2 – 14 4
1999 8 – 1 – 8 1
2000 9 0 0 0
2001 11 1 11 1
2002 13 2 26 4
2003 14 3 42 9
2004 17 4 68 16
-----------------------------
Total 90 0 88 60
------------------------------
Thus n = 9, ∑Y = 90, ∑X = 0, ∑XY = 88, and ∑X² = 60
Substituting these values in the two normal equations, we get
90 = 9a or a = 90/9 or a = 10
88 = 60b or b = 88/60 or b = 1.47
Trend equation is : Yc = 10 + 1.47 X
Inserting the various values of X in this equation gives the trend values, for example Yc = 10 + 1.47(–4) = 4.12 for 1996 and Yc = 10 + 1.47(4) = 15.88 for 2004.
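The short-cut computation for an odd number of years can be sketched as follows. The variable names are assumptions made for the example, and b is kept unrounded, so the printed trend values may differ slightly from values computed by hand with b = 1.47.

```python
years = list(range(1996, 2005))
ys = [4, 7, 7, 8, 9, 11, 13, 14, 17]

# With an odd number of years, the middle year becomes the origin (X = 0).
mid_year = years[len(years) // 2]
xs = [year - mid_year for year in years]

# Because the X codes sum to zero, each normal equation has one unknown.
a = sum(ys) / len(ys)
b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

trend = [round(a + b * x, 2) for x in xs]

print(f"Yc = {a:.2f} + {b:.2f}X")
for year, yc in zip(years, trend):
    print(year, yc)
```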
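Illustration : Fit a straight line trend to the data for the years 2003 to 2010 shown in the table below.
Solution : Here there are two mid-years, viz., 2006 and 2007. The mid-point of the two years is taken as the origin (X = 0), and a period of six months is treated as the unit of time, so that X takes the values –7, –5, ..., 5, 7. On this basis the calculations are as shown below: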
----------------------------------------------
Year Observed Y X XY X²
----------------------------------------------
2003 6.7 – 7 – 46.9 49
2004 5.3 – 5 – 26.5 25
2005 4.3 – 3 – 12.9 9
2006 6.1 – 1 – 6.1 1
2007 5.6 1 5.6 1
2008 7.9 3 23.7 9
2009 5.8 5 29.0 25
2010 6.1 7 42.7 49
----------------------------------------------
Total 47.8 0 8.6 168
----------------------------------------------
From the above computations, we get the following values.
n = 8, ∑Y = 47.8, ∑X = 0, ∑XY = 8.6, ∑X² = 168
Substituting these values in the two normal equations, we obtain
47.8 = 8a or a = 47.8/8 or a = 5.98
and 8.6 = 168b or b = 8.6/168 or b = 0.051
The equation for the trend line is : Yc = 5.98 + 0.051X
Inserting the values of X in this equation gives the trend values, for example Yc = 5.98 + 0.051(–7) = 5.62 for 2003 and Yc = 5.98 + 0.051(7) = 6.34 for 2010.
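The coding of X for the even case can be sketched in Python as below. The helper name `centered_codes` is an assumption made for the example. It returns unit steps for an odd number of observations and odd integers (half-year units) for an even number, so the codes always sum to zero.

```python
def centered_codes(n):
    """Return n time codes that are symmetric about zero.

    Odd n:  ..., -2, -1, 0, 1, 2, ...   (unit = one year)
    Even n: ..., -3, -1, 1, 3, ...      (unit = six months)
    """
    if n % 2 == 1:
        half = n // 2
        return list(range(-half, half + 1))
    return list(range(-(n - 1), n, 2))


years = list(range(2003, 2011))
ys = [6.7, 5.3, 4.3, 6.1, 5.6, 7.9, 5.8, 6.1]

xs = centered_codes(len(ys))

a = sum(ys) / len(ys)
b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# X is measured in half-years, so the change per full year is 2 * b.
annual_change = 2 * b

print(xs)
print(f"Yc = {a:.3f} + {b:.3f}X  (X in half-year units)")
print(f"change per year = {annual_change:.3f}")
```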
Second Degree Parabola
The simplest example of a non-linear trend is the second degree parabola, whose equation is written in the form :
Yc = a + bX + cX²
When numerical values for a, b and c have been derived, the trend value for any year may be computed by substituting the value of X for that year in the equation. The values of a, b and c can be determined by solving the following three normal equations simultaneously:
(i) ∑Y = Na + b∑X + c∑X²
(ii) ∑XY = a∑X + b∑X² + c∑X³
(iii) ∑X²Y = a∑X² + b∑X³ + c∑X⁴
Note that the first equation is merely the summation of the given function, the second is the summation of X multiplied into the given function, and the third is the summation of X² multiplied into the given function.
When the time origin is taken at the middle of the period, so that the X values are symmetric about zero, ∑X and ∑X³ would both be zero. In that case the equations are reduced to :
(i) ∑Y = Na + c∑X²
(ii) ∑XY = b∑X²
(iii) ∑X²Y = a∑X² + c∑X⁴
The value of b can now be obtained directly from equation (ii), and the values of a and c by solving equations (i) and (iii) simultaneously. Thus,
b = ∑XY / ∑X²
c = (N∑X²Y – ∑X²∑Y) / (N∑X⁴ – (∑X²)²)
a = (∑Y – c∑X²) / N
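The reduced equations can be sketched in Python as below. The function name `fit_parabola_centered` and the data are assumptions made for the example. The data lie exactly on a known parabola, so the fit should recover its constants. The sketch requires X codes that are symmetric about zero, which makes both the sum of X and the sum of its cubes vanish.

```python
def fit_parabola_centered(xs, ys):
    """Fit Yc = a + b*X + c*X**2 for X codes symmetric about zero."""
    if sum(xs) != 0 or sum(x ** 3 for x in xs) != 0:
        raise ValueError("X codes must be symmetric about zero")
    n = len(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x ** 2 for x in xs)
    sum_x4 = sum(x ** 4 for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2y = sum(x ** 2 * y for x, y in zip(xs, ys))
    # Reduced equation (ii) gives b directly.
    b = sum_xy / sum_x2
    # Reduced equations (i) and (iii) are solved together for c and a.
    c = (n * sum_x2y - sum_x2 * sum_y) / (n * sum_x4 - sum_x2 ** 2)
    a = (sum_y - c * sum_x2) / n
    return a, b, c


# Hypothetical data generated from Y = 5 + 2X + 3X^2.
xs = [-2, -1, 0, 1, 2]
ys = [5 + 2 * x + 3 * x * x for x in xs]

a, b, c = fit_parabola_centered(xs, ys)
print(a, b, c)
```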
Illustration : The price of a commodity during 2000–2005 is given below. Fit a parabola Y = a + bX + cX² to this data. Estimate the price of the commodity for the year 2010 :
Year Price Year Price
2000 100 2003 140
2001 107 2004 181
2002 128 2005 192
Also plot the actual and trend values on a graph.
Solution : To determine the values of a, b and c, we solve the following normal equations:
∑Y = Na + b∑X + c∑X²
∑XY = a∑X + b∑X² + c∑X³
∑X²Y = a∑X² + b∑X³ + c∑X⁴
-----------------------------------------------------------------------------------
Year Y X X² X³ X⁴ XY X²Y Yc
-----------------------------------------------------------------------------------
2000 100 –2 4 –8 16 –200 400 97.724
2001 107 –1 1 –1 1 –107 107 110.406
2002 128 0 0 0 0 0 0 126.660
2003 140 +1 1 +1 1 +140 140 146.486
2004 181 +2 4 +8 16 +362 724 169.884
2005 192 +3 9 +27 81 +576 1728 196.854
-----------------------------------------------------------------------------------
N = 6 ∑Y = 848 ∑X = 3 ∑X² = 19 ∑X³ = 27 ∑X⁴ = 115 ∑XY = 771 ∑X²Y = 3099 ∑Yc = 848.014
-----------------------------------------------------------------------------------
Substituting these values in the normal equations, we get
848 = 6a + 3b + 19c ...(i)
771 = 3a + 19b + 27c ...(ii)
3,099 = 19a + 27b + 115c ...(iii)
Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i) from it, we get
35b + 35c = 694 ...(iv)
Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3, and subtracting the latter from the former, we get
5352 = 280b + 168c ...(v)
Solving Eqns. (iv) and (v), we get
c = 1.786
Substituting the value of c in Eqn. (iv), we get
b = 18.04 [35b + (35 × 1.786) = 694]
Putting the values of b and c in Eqn. (i), we get
a = 126.66 [848 = 6a + (3 × 18.04) + (19 × 1.786)]
Thus a = 126.66, b = 18.04 and c = 1.786
Substituting these values in the equation, we get
Yc = 126.66 + 18.04X + 1.786X²
When X = –2, Y = 126.66 + 18.04(–2) + 1.786(–2)²
= 126.66 – 36.08 + 7.144 = 97.724
When X = –1, Y = 126.66 + 18.04(–1) + 1.786(–1)²
= 126.66 – 18.04 + 1.786 = 110.406
When X = 0, Y = 126.66
When X = 1, Y = 126.66 + 18.04 + 1.786 = 146.486
When X = 2, Y = 126.66 + 18.04(2) + 1.786(2)²
= 126.66 + 36.08 + 7.144 = 169.884
When X = 3, Y = 126.66 + 18.04(3) + 1.786(3)²
= 126.66 + 54.12 + 16.074 = 196.854
Price for 2010: when X = 8, Y = 126.66 + 18.04(8) + 1.786(8)²
= 126.66 + 144.32 + 114.304 = 385.284
Thus the likely price of the commodity for the year 2010 is Rs. 385.284.
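The three normal equations for this illustration can also be solved mechanically. The sketch below uses Cramer's rule with exact fractions, which is a different route from the elimination used in the hand solution. The helper names `det3` and `replace_column` are assumptions made for the example. Because no intermediate rounding takes place, the results may differ slightly from the hand-computed figures.

```python
from fractions import Fraction

years = [2000, 2001, 2002, 2003, 2004, 2005]
prices = [100, 107, 128, 140, 181, 192]

# Origin at 2002, as in the table of the illustration.
xs = [year - 2002 for year in years]

n = len(xs)
sx = sum(xs)
sx2 = sum(x ** 2 for x in xs)
sx3 = sum(x ** 3 for x in xs)
sx4 = sum(x ** 4 for x in xs)
sy = sum(prices)
sxy = sum(x * y for x, y in zip(xs, prices))
sx2y = sum(x ** 2 * y for x, y in zip(xs, prices))


def det3(m):
    """Determinant of a 3 x 3 matrix given as a list of rows."""
    (p, q, r), (s, t, u), (v, w, z) = m
    return p * (t * z - u * w) - q * (s * z - u * v) + r * (s * w - t * v)


def replace_column(m, col, values):
    """Copy of m with column `col` replaced by `values`."""
    return [[values[i] if j == col else m[i][j] for j in range(3)]
            for i in range(3)]


coefficients = [[n, sx, sx2], [sx, sx2, sx3], [sx2, sx3, sx4]]
rhs = [sy, sxy, sx2y]

# Cramer's rule: each unknown is a ratio of two determinants.
d = det3(coefficients)
a, b, c = (Fraction(det3(replace_column(coefficients, k, rhs)), d)
           for k in range(3))

x_2010 = 2010 - 2002
forecast = a + b * x_2010 + c * x_2010 ** 2

print(f"a = {float(a):.3f}, b = {float(b):.3f}, c = {float(c):.3f}")
print(f"estimated price for 2010 = {float(forecast):.3f}")
```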
[Graph: actual and trend values plotted against year]
Conversion of Annual Trend Equation to Monthly Trend Equation
Fitting a trend line by least squares to monthly data may be excessively time consuming. It is more convenient to compute the trend equation from annual data and then convert this trend equation to a monthly trend equation.
There are two possible situations: (i) the Y units are annual totals, for example, the total number of passenger cars sold; (ii) the Y units are monthly averages, for example, the average monthly wholesale price index.
Where Data are Annual Totals
A trend equation operative on an annual level is to be reduced to a monthly level. The constant a is expressed in terms of annual Y values. To express it in terms of monthly values, we must divide it by 12. Similarly, b is to be divided by 12 to convert the annual change to a monthly change. But this division shows us only the change between the same month of two consecutive years, whereas we want the change between two consecutive months. Therefore b is to be divided by 12 once again. Consequently, to convert an annual trend equation to a monthly trend equation when the data are expressed as annual totals, we divide a by 12 and b by 144.
Where Data are Monthly Averages per Year
In this case, the Y values are already on a monthly level. Therefore, the value of a remains unchanged in the conversion process. The value of b shows the change on a monthly level, but from a month in one year to the corresponding month in the following year. Here, it is necessary only to divide b by 12 so that it measures the change between consecutive months.
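The two conversion rules can be captured in a small Python helper. This is a sketch only: the function name `annual_to_monthly`, its `annual_totals` flag and the sample trend equation Yc = 240 + 28.8X are hypothetical and do not come from the text.

```python
def annual_to_monthly(a, b, annual_totals=True):
    """Convert the constants of an annual trend Yc = a + b*X to monthly ones.

    annual_totals=True  : Y values are annual totals     -> a/12, b/144
    annual_totals=False : Y values are monthly averages  -> a,    b/12
    """
    if annual_totals:
        return a / 12, b / 144
    return a, b / 12


# Hypothetical annual trend equation: Yc = 240 + 28.8X.
totals_a, totals_b = annual_to_monthly(240.0, 28.8, annual_totals=True)
averages_a, averages_b = annual_to_monthly(240.0, 28.8, annual_totals=False)

print(f"annual totals:    Yc = {totals_a:.2f} + {totals_b:.2f}X")
print(f"monthly averages: Yc = {averages_a:.2f} + {averages_b:.2f}X")
```

In both converted equations X is measured in months rather than years.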
Merits
(i) This method has no place for subjectivity, since it is a mathematical method of measuring trend.
(ii) This method gives the line of best fit, because for this line the sum of the positive and negative deviations is zero and the total of the squares of these deviations is a minimum.
Limitations
The best practicable use of mathematical trends is for describing movements in time series. It does not provide a clue to the causes of such movements. Therefore, forecasting on this basis may be quite risky.
Forecasting will be valid if there is a functional relationship between the variable under consideration and time for a particular trend. But if trend describes the past behaviour, it hardly throws light on the causes which may influence the future behaviour.
The other limitation is that if some items are added to the original data, a new equation has to be obtained.
Curvilinear Trend
Sometimes the time series cannot be represented by a straight line trend. Such trends are known as curvilinear trends. If the curvilinear trend can be represented by a straight line on semi-log paper, by a polynomial of second or higher degree, or by a double logarithmic function, then the method of least squares is also applicable to such cases.
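As one illustration of the semi-log case, a series that plots as a straight line on semi-log paper can be fitted by applying least squares to the logarithms of Y. The sketch below uses hypothetical data that double every period, with X codes centred on zero so that the short-cut formulas apply. The variable names are assumptions made for the example.

```python
import math

# Hypothetical series that doubles in every period.
xs = [-2, -1, 0, 1, 2]
ys = [25.0, 50.0, 100.0, 200.0, 400.0]

# Fit log10(Yc) = a + b*X by least squares on the logarithms of Y.
logs = [math.log10(y) for y in ys]
a = sum(logs) / len(logs)
b = sum(x * v for x, v in zip(xs, logs)) / sum(x * x for x in xs)

# Convert back to the original scale: Yc = level * ratio**X.
level = 10 ** a
ratio = 10 ** b

trend = [level * ratio ** x for x in xs]
print(f"Yc = {level:.2f} * {ratio:.2f}^X")
```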