Scientists often wish to fit a smooth curve to experimental data. Given (n + 1) points, an obvious approach is to use the interpolating polynomial of degree n, but when n is large, this is usually unsatisfactory. Better results are obtained by piecewise use of polynomials, i.e., by fitting lower degree polynomials through subsets of the data points. The use of spline functions, which, as a rule, provide a particularly smooth fit, has become widespread (Step 28 ).
A rather different, but often quite suitable approach is a least square fit, in which, instead of trying to fit points exactly, a polynomial of low degree (often linear or quadratic) is obtained which fits the points closely (after all, the points themselves may not, in general, be exact, but subject to experimental error).
Suppose we are studying experimentally the relationship between two variables x and y - for example, quantities x of drug injected and observed responses y, reorded in a laboratory experiment. By carrying out the appropriate experiment, say, six times, we obtain six pairs of values (xj, yj), which can be plotted on a diagram such as Figure 11(a).

Fig. 12 Fitting a straight line and a parabola
We may believe that the relationship between the variables can be described satisfactorily by a function y = f (x), but that the y-values, obtained experimentally, are subject to errors (or noise). Therefore one arrives at the mathematical model:
![]()
with n data, where f (xi
) are the values of y, corresponding to the value of xi,
used in the experiment, and ei is the experimental error involved
in the measurement of the variable y at the point. Thus,
the error in y at the observed point is ![]()
In the problem of curve fitting, we use the information of the sample data points to determine a suitable curve (i.e., find a suitable function f ) so that the equation y = f (x) gives a description of the (x, y) relationship, in other words, it is hoped that predictions made by means of this equation will not be too much in error.
How does on choose the function f ? There is an unlimited range of functions available. Figure 11(b) shows four possibilities. The polygon A passes through all six points; intuitively, however, we would prefer to fit a straight line B, or an exponential curve such as C. The curve D is clearly not a good candidate for our model.
Let us, first of all, answer the question
regarding the choice of function. Given a set of values (x1,
y1), (x2, y2),.
. , (xn, yn);
we shall pick a function which we can specify completely except
for·the values of a set of k parameters c1,
c2, .. , ck; we shall
denote this function by
. We then choose values for the
parameters which will make the errors at the observation points (xi,
yi) as small as possible. Next, we shall
suggest three ways by which the phrase as small as possible can be given specific meaning.
Examples of functions to use are:
These examples) may be termed general, linear forms:
In 1., the set of functions is
; in
2.,
with
a constant chosen to coincide with a periodicity in the data,
while in 3., the set is
Other functions commonly used in curve
fitting are exponential functions, Bessel
functions, Legendre polynomials, and Chebyshev polynomials (cf., for example, Burden and Faires (1993)).
We now present criteria which make precise the
concept of choosing a function to make measurement errors as
small as possible. We suppose that the curve to be fitted can be
expressed in a general linear form, with a known set of functions ![]()
The errors
at the n data
points are:
If the number of data points is less than or
equal to the number of parameters, i.e.,
, it is possible to find values for {c1,
c2,. .. ., ck) which make
all the errors ei zero. If n is an infinite
number of solutions for {ci} which
make al1 the errors zero, then an infinite number of curves of
the given form pass through all the experimental points; in this
case, the problem is not fully determined, i.e., more information
is needed to choose an appropriate curve.
If n > k, which, in practice, is mostly the case, then it is not normally possible to make all the errors zero by a choice of the {ci}. There are three possible choices:
In general, Procedures 1 and 2 are not readily applied. Procedure 3 leads to a linear system of equations for the set {cI}, referred to as the Principle of least squares; it is used almost exclusively.
In order to apply the principle of least squares, use has to be made of partial differentiation, a calculus technique which may not be known to some readers of this text. For that reason, a general description of the method will not be given until the Step 27(which is optional), but we describe it here and give examples, in order to show how it is used.
The sum of squared errors to be minimized is
![]()
The n values of (xi,
yi) are the known measurements taken
from n experiments. When they are inserted on the
right-hand side, S becomes an expression involving only
the k unknowns c1, c2,
. . , ck. In other words, S may be
regarded as a function of the ci, i.e.,
.
The problem is now to choose that set of values {ci}
which makes S a minimum.
A theorem in calculus tells us that, under certain conditions which are usually satisfied in practice, the minimum of S occurs when all the partial derivatives
![]()
vanish. The partial derivative
coincides here with the differential coefficient
,
while all the other ci are held
constant; for instance, if S = 3cl + 5c2,
then
![]()
Thus, we have to solve the system of k equations:

This system is a set of equations which is linear in the variables cl, c2, . . , ck and is referred to as the normal equations for the least squares approximation. One of the numerical methods presented in Step 11, Step 13, or Step 15 may be used to obtain the required set {cI} which minimizes S. . However, we note that the normal equations may be ill-conditioned, when it is preferable to invoke QR factorization as outlined in the next (optional) Step 27, or to employ orthogonal basis functions (cf. for example, Conte and de Boor ( 1980).
The following points were obtained in an experiment:
.
We shall plot the points on a diagram and use the method of least squares to fit through them
a) a straight line, and b) a parabola.
The plotted points are shown in Figure 12(a). In order to fit a straight line, we have to find a function y = cl + c2x, i.e., a first degree polynomial which minimizes
![]()
Differentiating first with respect to cl (keeping c2 constant) and then with respect to c2 (keeping cl constant), and setting the results equal to zero, yields the normal equations:

We may divide both equations by -2, take the summation operations through the brackets, and rearrange, in order to obtain:

We see that, in order to obtain a solution, we
have to evaluate the four sums
and insert them into
these equations. We can arrange the work in a table as follows
(the last three columns are devoted to fitting of the parabola
and the required sums are in the last row):

The corresponding normal equations for fitting a straight line are:
![]()
The solutions to 2D are c1 = 2.13 and c2 = 0.20, whence the required line is (Figure 12(b)):
![]()
In order to fit a parabola, we must find the second degree polynomial
![]()
which minimizes
.
Taking partial derivatives and proceeding as above we obtain the normal equations:

Inserting the values for the sums (see the table above), we obtain the system of linear equations:
.
The solution to 3D is c1 = -1.200, c2 = 2.700, and c3 = -0.357. The required parabola is therefore (retaining 2D):
.
it is also plotted in Figure 13(b). Obviously, the parabola is a better fit than the straight line!

a) Toughness x and percentage of nickel y in eight specimens of alloy steel.
![]()
b) Aptitude test marks x, given to six trainee sales people, and their first-year sales y in thousands of dollars.
![]()
For both sets of data, plot the points and draw the least squares line. Use the lines to predict the % - nickel of a specimen of steel the toughness of which is 38, and the likely first-year sales of a trainee sales person who obtains a mark of 48 in the aptitude test.

Deduce the matrix form of the normal equations for fitting a fourth-degree polynomial.