The document Method of Least Squares, Business Mathematics and Statistics B Com Notes | EduRev is a part of the B Com Course Business Mathematics and Statistics.


**Method of Least Squares :** When a straight line serves as a satisfactory trend for the data, perhaps the most accurate method of fitting it is that of least squares. This method is designed to accomplish two results.

(i) The sum of the vertical deviations from the straight line must equal zero.

(ii) The sum of the squares of all deviations must be less than the sum of the squares for any other conceivable straight line.

There will be many straight lines which can meet the first condition. Among all these lines, only one will satisfy the second condition. It is because of this second condition that the method is known as the method of least squares. It may be mentioned that a line fitted to satisfy the second condition will automatically satisfy the first condition.

The formula for a straight-line trend can most simply be expressed as

**Y_{c} = a + bX**

where X represents the time variable, Y_{c} is the dependent variable for which trend values are to be calculated, and a and b are the constants of the straight line to be found by the method of least squares.

Constant a is the Y-intercept: the distance between the origin (O) and the point where the trend line intersects the Y-axis. It shows the value of Y when X = 0. Constant b is the slope, which is the change in Y for each unit change in X.

Let us assume that we are given observations of Y for n years, and that we wish to find the values of constants a and b in such a manner that the two conditions laid down above are satisfied by the fitted equation.

Mathematical reasoning suggests that, to obtain the values of constants a and b according to the Principle of Least Squares, we have to solve simultaneously the following two equations.

∑Y = na + b∑X ...(i)

∑XY = a∑X + b∑X^{2} ...(ii)
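The "mathematical reasoning" behind these two equations can be sketched as follows. The symbol S is introduced here only for illustration and stands for the sum of squared deviations; setting its partial derivatives with respect to a and b to zero gives the normal equations.

```latex
\begin{align*}
S(a, b) &= \sum (Y - a - bX)^{2} \\
\frac{\partial S}{\partial a} = -2 \sum (Y - a - bX) = 0
  &\;\Longrightarrow\; \sum Y = na + b \sum X \\
\frac{\partial S}{\partial b} = -2 \sum X (Y - a - bX) = 0
  &\;\Longrightarrow\; \sum XY = a \sum X + b \sum X^{2}
\end{align*}
```

The first line of the result states that the deviations from the fitted line sum to zero, which is why a line satisfying the second condition also satisfies the first.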

Solving the two normal equations yields the following values for the constants a and b :

b = (n∑XY – ∑X∑Y) / (n∑X^{2} – (∑X)^{2})

and a = (∑Y – b∑X) / n
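The solution of the normal equations can be sketched in Python. The function name `fit_line` and the four sample points are illustrative assumptions, not part of the original notes; the points lie exactly on Y = 1 + 2X so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for the line Y_c = a + b*X."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from eliminating a between the two normal equations.
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept from the first normal equation: sum(Y) = n*a + b*sum(X).
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data lying exactly on Y = 1 + 2X.
sample_x = [0, 1, 2, 3]
sample_y = [1, 3, 5, 7]
a, b = fit_line(sample_x, sample_y)
print(a, b)
```

The denominator is zero only when all X values are equal, which cannot happen for a time variable that advances by one unit per observation.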

**Least Squares Long Method :** It makes use of the above-mentioned two normal equations without attempting to shift the time variable to a convenient mid-year. This method is illustrated by the following example.

**Illustration :** Fit a linear trend curve by the least-squares method to the following data :

| Year | Production (Kg.) |
|---|---|
| 2001 | 3 |
| 2002 | 5 |
| 2003 | 6 |
| 2004 | 6 |
| 2005 | 8 |
| 2006 | 10 |
| 2007 | 11 |
| 2008 | 12 |
| 2009 | 13 |
| 2010 | 15 |

**Solution :** The first year 2001 is assumed to be 0, 2002 would become 1, 2003 would be 2 and so on. The various steps are outlined in the following table.

| Year (1) | Production Y (2) | X (3) | XY (4) | X^{2} (5) |
|---|---|---|---|---|
| 2001 | 3 | 0 | 0 | 0 |
| 2002 | 5 | 1 | 5 | 1 |
| 2003 | 6 | 2 | 12 | 4 |
| 2004 | 6 | 3 | 18 | 9 |
| 2005 | 8 | 4 | 32 | 16 |
| 2006 | 10 | 5 | 50 | 25 |
| 2007 | 11 | 6 | 66 | 36 |
| 2008 | 12 | 7 | 84 | 49 |
| 2009 | 13 | 8 | 104 | 64 |
| 2010 | 15 | 9 | 135 | 81 |
| **Total** | 89 | 45 | 506 | 285 |

The above table yields the following values for various terms mentioned below :

n = 10, ∑X = 45, ∑X^{2} = 285, ∑Y = 89, and ∑XY = 506

Substituting these values in the two normal equations, we obtain

89 = 10a + 45b ...(i)

506 = 45a + 285b ...(ii)

Multiplying equation (i) by 9 and equation (ii) by 2, we obtain

801 = 90a + 405b ...(iii)

1012 = 90a + 570b ...(iv)

Subtracting equation (iii) from equation (iv), we obtain

211 = 165b or b = 211/165 = 1.28

Substituting the value of b in equation (i), we obtain

89 = 10a + 45 × 1.28

89 = 10a + 57.60

10a = 89 – 57.6

10a = 31.4

a = 31.4/10 = 3.14

Substituting these values of a and b in the linear equation, we obtain the following trend line

Y_{c} = 3.14 + 1.28X

Inserting various values of X in this equation, we obtain the trend values as below :

| Year (1) | Observed Y (2) | a (3) | b × X (4) | Y_{c} (Col. 3 plus Col. 4) (5) |
|---|---|---|---|---|
| 2001 | 3 | 3.14 | 1.28 × 0 | 3.14 |
| 2002 | 5 | 3.14 | 1.28 × 1 | 4.42 |
| 2003 | 6 | 3.14 | 1.28 × 2 | 5.70 |
| 2004 | 6 | 3.14 | 1.28 × 3 | 6.98 |
| 2005 | 8 | 3.14 | 1.28 × 4 | 8.26 |
| 2006 | 10 | 3.14 | 1.28 × 5 | 9.54 |
| 2007 | 11 | 3.14 | 1.28 × 6 | 10.82 |
| 2008 | 12 | 3.14 | 1.28 × 7 | 12.10 |
| 2009 | 13 | 3.14 | 1.28 × 8 | 13.38 |
| 2010 | 15 | 3.14 | 1.28 × 9 | 14.66 |
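As a check on the two conditions stated at the outset, the deviations of the observed production from the trend line Y_{c} = 3.14 + 1.28X can be computed directly. This is a minimal sketch; the comparison line with a = 3.0 and b = 1.3 is an arbitrary choice made here for illustration, and the helper name `sum_of_squares` is likewise an assumption.

```python
production = [3, 5, 6, 6, 8, 10, 11, 12, 13, 15]   # observed Y, 2001..2010
xs = list(range(len(production)))                  # 2001 -> 0, 2002 -> 1, ...


def sum_of_squares(a, b):
    """Sum of squared vertical deviations from the line a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, production))


a, b = 3.14, 1.28                                  # rounded constants from the solution
deviations = [y - (a + b * x) for x, y in zip(xs, production)]
total_deviation = sum(deviations)                  # condition (i): should be about zero
fitted_ss = sum_of_squares(a, b)
other_ss = sum_of_squares(3.0, 1.3)                # arbitrary nearby line (assumption)

print(round(total_deviation, 6))
print(fitted_ss < other_ss)
```

The deviations cancel out, and the fitted line has a smaller sum of squares than the arbitrary comparison line.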

**Least Squares Short-cut Method :** We can take any other year as the origin, and for that year X would be 0. Considerable saving of both time and effort is possible if the origin is taken in the middle of the whole time span covered by the series. The origin would then be located at the mean of the X values, and the sum of the X values would equal 0. The two normal equations would then be simplified to

∑Y = Na ...(i)

or a = ∑Y / N

and ∑XY = b∑X^{2} or b = ∑XY / ∑X^{2} ...(ii)

Two cases of the short-cut method are given below. In the first case there is an odd number of years, while in the second case the number of observations is even.
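The coding of the time variable in the two cases can be sketched as below. The function names `code_years` and `fit_centered` and the sample years and Y values are hypothetical; the sketch assumes consecutive years in ascending order.

```python
def code_years(years):
    """Measure X from the middle of the series so that sum(X) == 0.

    Odd number of years: the middle year is 0 and the unit is one year.
    Even number of years: the origin lies between the two middle years
    and the unit is six months, so X runs ..., -3, -1, 1, 3, ...
    """
    n = len(years)
    if n % 2 == 1:
        middle = years[n // 2]
        return [year - middle for year in years]
    middle = (years[n // 2 - 1] + years[n // 2]) / 2
    return [int(2 * (year - middle)) for year in years]


def fit_centered(xs, ys):
    """Simplified normal equations, valid only when sum(X) is zero."""
    if sum(xs) != 0:
        raise ValueError("X values must sum to zero for the short-cut method")
    a = sum(ys) / len(ys)
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return a, b


odd_x = code_years([2001, 2002, 2003])           # hypothetical years
even_x = code_years([2001, 2002, 2003, 2004])    # hypothetical years
a, b = fit_centered(odd_x, [2, 4, 6])            # hypothetical Y values
print(odd_x, even_x, a, b)
```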

**Illustration :** Fit a straight line trend on the following data :

| Year | 1996 | 1997 | 1998 | 1999 | 2000 | 2001 | 2002 | 2003 | 2004 |
|---|---|---|---|---|---|---|---|---|---|
| Y | 4 | 7 | 7 | 8 | 9 | 11 | 13 | 14 | 17 |

**Solution :** Since we have 9 observations, the origin is taken at the middle year, 2000, for which X is assumed to be 0.

| Year | Y | X | XY | X^{2} |
|---|---|---|---|---|
| 1996 | 4 | –4 | –16 | 16 |
| 1997 | 7 | –3 | –21 | 9 |
| 1998 | 7 | –2 | –14 | 4 |
| 1999 | 8 | –1 | –8 | 1 |
| 2000 | 9 | 0 | 0 | 0 |
| 2001 | 11 | 1 | 11 | 1 |
| 2002 | 13 | 2 | 26 | 4 |
| 2003 | 14 | 3 | 42 | 9 |
| 2004 | 17 | 4 | 68 | 16 |
| **Total** | **90** | **0** | **88** | **60** |

Thus n = 9, ∑Y = 90, ∑X = 0, ∑XY = 88, and ∑X^{2} = 60

Substituting these values in the two normal equations, we get

90 = 9a or a = 90/9 or a = 10

88 = 60b or b = 88/60 or b = 1.47

Trend equation is : Y_{c} = 10 + 1.47X

Inserting the various values of X, we obtain the trend values as below :
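The trend values for 1996 to 2004 follow from the trend equation by substituting each coded X. A minimal sketch, which recomputes a and b from the data and then rounds b to two decimals as in the worked solution:

```python
years = list(range(1996, 2005))
ys = [4, 7, 7, 8, 9, 11, 13, 14, 17]
xs = [year - 2000 for year in years]          # origin at the middle year, 2000

a = sum(ys) / len(ys)                         # 90 / 9
b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)   # 88 / 60
b_rounded = round(b, 2)                       # rounded as in the worked solution

trend = [round(a + b_rounded * x, 2) for x in xs]
for year, observed, fitted in zip(years, ys, trend):
    print(year, observed, fitted)
```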

**Solution :** Here there are two mid-years, viz. 2006 and 2007. The mid-point of the two years is taken as the origin (X = 0), and a period of six months is treated as the unit. On this basis the calculations are as shown below:

| Year | Observed Y | X | XY | X^{2} |
|---|---|---|---|---|
| 2003 | 6.7 | –7 | –46.9 | 49 |
| 2004 | 5.3 | –5 | –26.5 | 25 |
| 2005 | 4.3 | –3 | –12.9 | 9 |
| 2006 | 6.1 | –1 | –6.1 | 1 |
| 2007 | 5.6 | 1 | 5.6 | 1 |
| 2008 | 7.9 | 3 | 23.7 | 9 |
| 2009 | 5.8 | 5 | 29.0 | 25 |
| 2010 | 6.1 | 7 | 42.7 | 49 |
| **Total** | **47.8** | **0** | **8.6** | **168** |

From the above computations, we get the following values.

n = 8, ∑Y = 47.8, ∑X = 0, ∑XY = 8.6, ∑X^{2} = 168

Substituting these values in the two normal equations, we obtain

47.8 = 8a or a = 47.8/8 or a = 5.98

and 8.6 = 168b or b = 8.6/168 or b = 0.051

The equation for the trend line is : **Y_{c} = 5.98 + 0.051X**

Trend values generated by this equation are below :
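The trend values for 2003 to 2010 can be computed from this equation as sketched below. Because X is measured in six-month units here, b is the change per half-year, so the change from one year to the next is 2b; the variable names are illustrative.

```python
years = list(range(2003, 2011))
# Origin between 2006 and 2007, unit of six months: X = -7, -5, ..., 5, 7.
xs = [int(2 * (year - 2006.5)) for year in years]

a, b = 5.98, 0.051                        # rounded constants from the solution
trend = [round(a + b * x, 3) for x in xs]
yearly_change = 2 * b                     # two half-year units per calendar year

for year, fitted in zip(years, trend):
    print(year, fitted)
print(yearly_change)
```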

**Second Degree Parabola**

The simplest example of a non-linear trend is the second degree parabola, whose equation is written in the form :

Y_{c} = a + bX + cX^{2}

When numerical values for a, b and c have been derived, the trend value for any year may be computed by substituting in the equation the value of X for that year. The values of a, b and c can be determined by solving the following three normal equations simultaneously:

(i) ∑Y = Na + b∑X + c∑X^{2}

(ii) ∑XY = a∑X + b∑X^{2} + c∑X^{3}

(iii) ∑X^{2}Y = a∑X^{2} + b∑X^{3} + c∑X^{4}

Note that the first equation is merely the summation of the given function, the second is the summation of X multiplied into the given function, and the third is the summation of X^{2} multiplied into the given function.

When the time origin is taken at the middle of the time span, ∑X and ∑X^{3} would be zero. In that case the equations are reduced to :

(i) ∑Y = Na + c∑X^{2}

(ii) ∑XY = b∑X^{2}

(iii) ∑X^{2}Y = a∑X^{2} + c∑X^{4}

The value of b can now directly be obtained from equation (ii) and value of a and c by solving equations (i) and (iii) simultaneously. Thus,

a = (∑Y – c∑X^{2}) / N

b = ∑XY / ∑X^{2}

c = (N∑X^{2}Y – ∑X^{2}∑Y) / (N∑X^{4} – (∑X^{2})^{2})

**Illustration :** The price of a commodity during 2000 – 2005 is given below. Fit a parabola Y = a + bX + cX^{2} to this data. Estimate the price of the commodity for the year 2010 :

| Year | Price |
|---|---|
| 2000 | 100 |
| 2001 | 107 |
| 2002 | 128 |
| 2003 | 140 |
| 2004 | 181 |
| 2005 | 192 |

Also plot the actual and trend values on graph.

**Solution :** Taking 2002 as the origin (X = 0), we determine the values of a, b and c by solving the following normal equations:

∑Y = Na + b∑X + c∑X^{2}

∑XY = a∑X + b∑X^{2} + c∑X^{3}

∑X^{2}Y = a∑X^{2} + b∑X^{3} + c∑X^{4}

| Year | Y | X | X^{2} | X^{3} | X^{4} | XY | X^{2}Y | Y_{c} |
|---|---|---|---|---|---|---|---|---|
| 2000 | 100 | –2 | 4 | –8 | 16 | –200 | 400 | 97.724 |
| 2001 | 107 | –1 | 1 | –1 | 1 | –107 | 107 | 110.406 |
| 2002 | 128 | 0 | 0 | 0 | 0 | 0 | 0 | 126.660 |
| 2003 | 140 | +1 | 1 | +1 | 1 | +140 | 140 | 146.486 |
| 2004 | 181 | +2 | 4 | +8 | 16 | +362 | 724 | 169.884 |
| 2005 | 192 | +3 | 9 | +27 | 81 | +576 | 1728 | 196.854 |
| **N = 6** | **∑Y = 848** | **∑X = 3** | **∑X^{2} = 19** | **∑X^{3} = 27** | **∑X^{4} = 115** | **∑XY = 771** | **∑X^{2}Y = 3099** | **∑Y_{c} = 848.014** |

Substituting the totals in the normal equations, we get

848 = 6a + 3b + 19c ...(i)

771 = 3a + 19b + 27c ...(ii)

3,099 = 19a + 27b + 115c ...(iii)

Multiplying Eqn. (ii) by 2 and subtracting Eqn. (i), we get

35b + 35c = 694 ...(iv)

Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3, and subtracting the latter from the former, we get

5352 = 280b + 168c ...(v)

Solving Eqns. (iv) and (v), we get

c = 1.786

Substituting the value of c in Eqn. (iv), we get

b = 18.04 [35b + (35 × 1.786) = 694]

Putting the values of b and c in Eqn. (i), we get

a = 126.66 [848 = 6a + (3 × 18.04) + (19 × 1.786)]

Thus a = 126.66, b = 18.04 and c = 1.786

Substituting these values in the equation, we get

Y_{c} = 126.66 + 18.04X + 1.786X^{2}

When X = –2, Y = 126.66 + 18.04(–2) + 1.786(–2)^{2}

= 126.66 – 36.08 + 7.144 = 97.724

When X = –1, Y = 126.66 + 18.04(–1) + 1.786(–1)^{2}

= 126.66 – 18.04 + 1.786 = 110.406

When X = 0, Y = 126.66

When X = 1, Y = 126.66 + 18.04 + 1.786 = 146.486

When X = 2, Y = 126.66 + 18.04(2) + 1.786(2)^{2}

= 126.66 + 36.08 + 7.144 = 169.884

When X = 3, Y = 126.66 + 18.04(3) + 1.786(3)^{2}

= 126.66 + 54.12 + 16.074 = 196.854

Price for 2010 (X = 8): Y = 126.66 + 18.04(8) + 1.786(8)^{2}

= 126.66 + 144.32 + 114.304 = 385.284

Thus the likely price of the commodity for the year 2010 is Rs. 385.284.
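The three normal equations can also be solved numerically as a cross-check. The sketch below builds the system from the price data with 2002 as the origin and solves it by Gauss-Jordan elimination; the function names `solve_linear_system` and `fit_parabola` are assumptions made for this example. No intermediate rounding is applied, so the last decimals may differ slightly from a hand computation.

```python
def solve_linear_system(matrix, rhs):
    """Gauss-Jordan elimination with partial pivoting for a small square system."""
    size = len(rhs)
    aug = [list(map(float, row)) + [float(value)] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Bring the row with the largest entry in this column into pivot position.
        pivot_row = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [value / pivot for value in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def fit_parabola(xs, ys):
    """Build and solve the three normal equations for Y_c = a + b*X + c*X^2."""
    def sum_x(power):
        return sum(x ** power for x in xs)

    def sum_xy(power):
        return sum((x ** power) * y for x, y in zip(xs, ys))

    matrix = [
        [len(xs), sum_x(1), sum_x(2)],
        [sum_x(1), sum_x(2), sum_x(3)],
        [sum_x(2), sum_x(3), sum_x(4)],
    ]
    rhs = [sum_xy(0), sum_xy(1), sum_xy(2)]
    return solve_linear_system(matrix, rhs)


prices = [100, 107, 128, 140, 181, 192]              # 2000..2005
xs = [year - 2002 for year in range(2000, 2006)]     # 2002 is the origin
a, b, c = fit_parabola(xs, prices)
forecast_2010 = a + b * 8 + c * 8 ** 2               # X = 8 for the year 2010
print(round(a, 2), round(b, 2), round(c, 3), round(forecast_2010, 2))
```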

The graph of the actual and trend values is given below:

**Conversion of Annual Trend Equation to Monthly Trend Equation**

Fitting a trend line by least squares to monthly data may be excessively time consuming. It is more convenient to compute the trend equation from annual data and then convert this trend equation to a monthly trend equation.

There are two possible situations: (i) the Y units are annual totals, for example, the total number of passenger cars sold; (ii) the Y units are monthly averages, for example, the average monthly wholesale price index.

**Where Data are Annual Totals**

A trend equation operative on an annual level is to be reduced to a monthly level. The constant a is expressed in terms of annual Y values; to express it in terms of monthly values, we must divide it by 12. Similarly, b is to be divided by 12 to convert the annual change to a monthly change. But this division shows us only the change for the same month in two consecutive years, whereas we want the change between two consecutive months. Therefore b is to be divided by 12 once again. Consequently, to convert an annual trend equation to a monthly trend equation when the data are expressed as annual totals, we divide a by 12 and b by 144.

**Where the Data are given as monthly averages per year**

In this case, Y values are on a monthly level. Therefore, the value of a remains unchanged in the conversion process. The b value in this case shows us the change on a monthly level, but from a month in one year to the corresponding month in the following year. Here it is necessary only to convert the b value so that it measures the change between consecutive months, by dividing it by 12.
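The two conversion rules can be written as a pair of small functions. This is a minimal sketch; the function names and the annual constants a = 144 and b = 72 are hypothetical values chosen so that the divisions come out evenly.

```python
def monthly_from_annual_totals(a, b):
    """Y values are annual totals: divide a by 12 and b by 144."""
    return a / 12, b / 144


def monthly_from_monthly_averages(a, b):
    """Y values are monthly averages per year: keep a, divide b by 12."""
    return a, b / 12


annual_a, annual_b = 144, 72        # hypothetical annual trend constants
totals_case = monthly_from_annual_totals(annual_a, annual_b)
averages_case = monthly_from_monthly_averages(annual_a, annual_b)
print(totals_case, averages_case)
```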

**Merits**

(i) This method has no place for subjectivity since it is a mathematical method of measuring trend.

(ii) This method gives the line of best fit because from this line the sum of the positive and negative deviations is zero and the total of the squares of these deviations is minimum.

**Limitations**

The best practicable use of mathematical trends is for describing movements in time series. It does not provide a clue to the causes of such movements. Therefore, forecasting on this basis may be quite risky.

Forecasting will be valid if there is a functional relationship between the variable under consideration and time for a particular trend. But if trend describes the past behaviour, it hardly throws light on the causes which may influence the future behaviour.

The other limitation is that if some items are added to the original data, a new equation has to be obtained.

**Curvilinear Trend**

Sometimes, the time series may not be represented by a straight line trend. Such trends are known as curvilinear trends. If the curvilinear trend is represented by a straight line on semi-log paper, or by polynomials of second or higher degree, or by a double logarithmic function, then the method of least squares is also applicable to such cases.
