2. The Method of Least Squares (1/2)

The method of fitting a function to a dataset containing observations of two or more variables, using the criterion of minimising the sum of squared residuals, is called Least Squares. If we fit a line of the form b1·X + b0 = Y + ε to the data, then we get one equation for each matched pair of observations. If we have N observed data pairs, then we get N observation equations:

$$
\begin{aligned}
b_1 \cdot x_1 + b_0 &= y_1 + \varepsilon_1 \\
b_1 \cdot x_2 + b_0 &= y_2 + \varepsilon_2 \\
&\;\;\vdots \\
b_1 \cdot x_N + b_0 &= y_N + \varepsilon_N
\end{aligned}
$$

These we want to solve for b0 and b1, where (x1, y1), (x2, y2), …, (xN, yN) are the observed pairs of data values. Usually no data pair falls exactly on the best-fitting line; each point has a residual. We find the best fit by minimising the sum of the squares of these residuals.
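To make the residuals concrete before deriving anything, here is a minimal Python sketch; the data pairs and the trial coefficients b0 and b1 are invented purely for illustration:

```python
# Minimal sketch: residuals and their sum of squares for a candidate line.
# The data pairs and the trial coefficients are invented for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]          # observed x values
y = [2.1, 3.9, 6.2, 7.8, 10.1]         # observed y values
b1, b0 = 2.0, 0.1                      # a candidate slope and intercept

residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]
sum_of_squares = sum(e ** 2 for e in residuals)
print(residuals, sum_of_squares)
```

The method of least squares picks the b0 and b1 that make this sum of squares as small as possible.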

From the above equations, each residual is of the form

$$
\varepsilon_i = y_i - b_1 \cdot x_i - b_0
$$

So that the square of a residual is of the form

$$
\varepsilon_i^2 = (y_i - b_1 \cdot x_i - b_0)^2 = y_i^2 - 2 b_1 x_i y_i - 2 b_0 y_i + b_1^2 x_i^2 + 2 b_0 b_1 x_i + b_0^2
$$
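If you want to verify this expansion rather than work it out by hand, a short sketch using the sympy library (an added illustration, not part of the original derivation) expands the squared residual symbolically:

```python
import sympy as sp

# Symbols standing in for one observation pair and the two coefficients.
x_i, y_i, b0, b1 = sp.symbols('x_i y_i b0 b1')

squared_residual = (y_i - b1 * x_i - b0) ** 2
print(sp.expand(squared_residual))
# Prints the six terms of the expansion above (the ordering may differ).
```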

If you partially differentiate this with respect to the two unknowns (for partial derivatives, see also Supplement 1 of the SEOS tutorial Time Series Analysis), then you get

$$
\frac{\partial \varepsilon_i^2}{\partial b_1} = -2 x_i y_i + 2 b_1 x_i^2 + 2 b_0 x_i
$$

and

$$
\frac{\partial \varepsilon_i^2}{\partial b_0} = -2 y_i + 2 b_1 x_i + 2 b_0
$$
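As a check on these two derivatives, the same sympy-based sketch can differentiate the squared residual directly (again an added illustration, under the same assumptions as the previous snippet):

```python
import sympy as sp

x_i, y_i, b0, b1 = sp.symbols('x_i y_i b0 b1')
squared_residual = (y_i - b1 * x_i - b0) ** 2

# Partial derivatives of the squared residual with respect to b1 and b0.
print(sp.expand(sp.diff(squared_residual, b1)))   # -2*x_i*y_i + 2*b1*x_i**2 + 2*b0*x_i
print(sp.expand(sp.diff(squared_residual, b0)))   # -2*y_i + 2*b1*x_i + 2*b0
```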

When these partial derivatives are zero, the gradient of the sum of squares is zero and so the sum of squares is minimised. Each derivative gives us one equation, so there are two equations to solve for the two unknowns. To create these equations, divide each derivative by -2 (which does not change where it is zero) and sum over the N observations to give

$$
\begin{aligned}
\sum xy - b_1 \sum x^2 - b_0 \sum x &= 0 \\
\sum y - b_1 \sum x - N b_0 &= 0
\end{aligned}
$$

or

$$
\begin{aligned}
b_1 \sum x^2 + b_0 \sum x &= \sum xy \\
b_1 \sum x + N b_0 &= \sum y
\end{aligned}
$$
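To show how these two normal equations could be solved numerically, here is a short NumPy sketch; the data values are invented, and numpy.polyfit is used only as an independent cross-check of the result:

```python
import numpy as np

# Invented example data, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
N = len(x)

# The two normal equations in matrix form:
#   b1*sum(x^2) + b0*sum(x) = sum(x*y)
#   b1*sum(x)   + b0*N      = sum(y)
A = np.array([[np.sum(x**2), np.sum(x)],
              [np.sum(x),    N       ]])
rhs = np.array([np.sum(x * y), np.sum(y)])

b1, b0 = np.linalg.solve(A, rhs)
print(b1, b0)

# Cross-check with NumPy's own least-squares polynomial fit.
print(np.polyfit(x, y, 1))   # returns [slope, intercept]
```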