Chapter V
5.0 Introduction. Another dialogue:
Student. When I started reading Algebra, you advised me to study carefully the systems of linear equations; so I did. Moreover, I read general algebra and continued fractions. At first, it was hard work, but later on, I was quite successful.
Teacher: All right, but I don't think that you have met me for this. You are looking rather despondent.
Student: Indeed. I am again in the wilderness.
Teacher: Why? Is there some difficulty in the book, which you want me to explain?
Student: It is not for that, but the whole subject became again problematical for me.
Teacher: I wonder how.
Student: Yesterday an engineer asked me to solve a certain algebraic equation. I replied that by the fundamental theorem of general algebra, there exist roots in a suitable extension of the field of the coefficients and that for the fundamental theorem of classical algebra this extension can be chosen as the field of complex numbers. Thus there exist complex roots of the equation and some of them might be real.
Teacher: Was the engineer satisfied by your reply?
Student: Not at all! He said that I seemed to be a great philosopher and that I had missed the point completely. He was interested in real roots only and he had no doubt about their existence. He has found out that the force (expressed in kilograms), acting on a certain part of an engine, was bound to satisfy that equation. He was asking me to compute that force and nothing else.
Teacher: And you could not; the polynomial was too complicated.
Student:. It looked very simple. Something like x5 + 4x3 +2x + 6. By Eisenstein's theorem, it is irreducible, whence its real roots must be irrational; this I told the engineer.
Teacher: Perhaps, the good man did not know anything about irrationality.
Student:. He did! But he was not at all interested in my statement. He said: "I don't want to have an infinity of decimals, even if you can provide me with them; compute the kilograms; I leave the grams, etc., to you !" Now, for any positive x, the polynomial takes positive values only. So I told him that the real roots of the polynomial must be negative.
Teacher: And was this statement of any use to the engineer ?
Student:. No, he knew already that the force was directed to the negative side, and then he said: "The direction of the force is not very interesting to me, as there is a little difference whether the material is exposed to stress or to pressure; if you give me a solution with an error of 30% and a wrong sign, I could make some use of it, but your philosophical talk is worth nothing.
Teacher:. And you?
Student: I am bewildered. After having read about 200 pages of the book, I am still not able to solve a very simple algebraic problem), even not if an error of 30% and a wrong sign are admitted! Though I got very interested in algebra, the engineer's argument has impressed me; I am afraid that all my hard work has been spent in vain.
Teacher: I rather think you have stopped reading at the wrong stage. If you continue, you will be able to provide your friend with a solution which has a considerably smaller error than 30%.
Student: I had already a glance at the next chapter, but I do not see any connection between its content and the preceding parts of the book, for example, with general algebra, and then, there is another thing which strikes me: Every solution is given only approximately. I should like to know the solutions exactly. If for a particular application a few decimals only are requested, then I may neglect the higher terms of the correct result, but as a student of Pure Mathematics, I must know at first the proper solution before admitting some error for the sake of abbreviation.
Teacher:. How do you want to represent the solution, if it happens to be an irrational number?
Student: There are many ways of expressing irrational numbers. For instance, Ö2 is irrational; it cannot be expressed as a ratio of two integers, but nevertheless, Ö 2 is a number. Everybody knows what is Ö2.
Teacher: Suppose that I do not know it and try to explain!
Student: Ö2 is the positive number the square of which equals 2.
Teacher:. Well, I take it for granted that one and only one such positive number exists. Let x be positive and x² = 2, then x = Ö2, or Ö2 is the positive root of x² - 2 = 0. I think this statement is completely equivalent to yours.
Student: It is.
Teacher: You seem to be satisfied with this manner of expressing irrational numbers.
Student: Of course, I am. If I could express the roots of every algebraic equation in a similar way, then there would be nothing to complain about.
Teacher: My point is that in this case the roots are expressed by a tautology.
Student: I cannot follow you.
Teacher: Listen to me! What are the roots of the polynomial x² - 2 ?
Student: Ö2 and -Ö2.
Teacher: Whereby Ö2 is nothing else than a symbol for the positive root of x²-2. Besides the statement that x² - 2 has two real roots and that their sum is equal to zero, your solution of the problem to find the roots of x² - 2 is a mere tautology. Your conclusion goes like this: "Who is Amal ?" "The brother of Bimal". - "And who is Bimal ?". "Amal's brother." That means only that there exist two brothers: Amal and Bimal, but it does not explain who is Amal.
Student: But Ö2 is a well known number; mathematicians have got used to it and they calculate with Ö2 as they do with 23 or 1:7. For me, there is no problem about Ö2.
Teacher: Is it for the symbol Ö that you hold this opinion?
Student: Ö as a symbol is a mere convention; the mathematicians could use any other notation instead of it, but I do not see any reason why symbols familiar to everybody should be replaced by new ones.
Teacher: I fully agree with you, but for the sake of our conversation, let us denote the real roots of a polynomial f(x), as far as they exist, by [f(x)]1, [f(x)]2, ····,[f(x)]n in their order of magnitude, starting with the largest root. Then your explanation Ö2 means simply that Ö2 is equal to [x² - 2].
Student: Now you want me to admit that for every f(x) which has a real root, the symbol [f(x)] must be considered as a solution of the equation f(x) = 0. You propose that there is no higher justification in considering Ö2 as a given number than, for example, [x5 + 4x3 + 2 x +6]. But there is a huge difference between these two cases.
Teacher:. How is that?
Student: We know more about Ö2 than that it is positive and that its square is equal to 2.
Teacher:. What do you know about Ö2?
Student:. Ö2 = 1.4142 ···!
Teacher: 1.4142 is a rational number, whereas Ö2 is irrational.
Student:. Certainly, it is an infinite decimal fraction, but they have computed 200 decimals or even more of them. You cannot deny that Ö2 is very well known.
Teacher: There still remains a certain error!
Student:. But a negligible one!
Teacher:. That depends on the purpose of the calculation. I have been told that certain students of Pure Mathematics must know the proper solution before admitting some error for the sake of abbreviation. Is it not so?
Student: But Ö2 is uniquely determined as the only positive root of x² - 2.
Teacher: Yes, there exists one and only one such root. This is a statement on existence, and on uniqueness, but nothing more than that. I think that we have agreed already about this aspect. On the other hand, I admit that we know more than that about Ö2. For instance, Ö2 is approximately equal to 1.4142, or to put it more clearly: Ö2 lies between 1.4142 and 1.4143. One can discover easily smaller intervals in which Ö2 is located; there is no limit to the improvement of the approximation and the reduction of the error. This "error" is not a kind of "mistake" which is the result of negligent treatment; on the contrary, it is an essential part of the solution of the problem. One cannot determine irrational numbers otherwise than approximately. This fact is concealed by some symbols we may be using. Numbers represented by them are uniquely determined in the sense that there exists one and only one such number, but one cannot determine the place of an irrational number on the real axis otherwise than approximately.
Student: Thus a formula like
is only a
"recipe" how to determine an irrational number
approximately.
Teacher:. The formula denotes the greatest root of x6 - 28x3 + 26) and it shows, of course, a recipe how to compute that number approximately. A mental calculation furnishes 3 as a first approximation which could satisfy your friend completely.
Student:. Suppose that one could represent every root of a polynomial with the aid of similar symbols; this would furnish recipes to determine every root approximately.
Teacher:. As a matter of fact, not every root is representable in that manner, and even if it is, one prefers sometimes a different method.
Student:. My impression is that those methods have no connection with general algebra. Apparently, the Theory of Approximation and General Algebra belong to different branches of Mathematics if not to two different Sciences.
Teacher: They are complementary to each other. You have already mentioned the two fundamental theorems which state the existence of roots in certain fields. They must be supplemented by investigations regarding "where" the roots are situated in the field. These methods of investigation must tally with the structure of the particular field under consideration; they cannot be of a general nature. The real numbers are ordered linearly, whereas the complex numbers correspond to the points of a plane. Hence one subdivides the real axis into intervals to determine real numbers, and similarly the plane into certain domains (e.g., rectangular or circular ones) to locate complex numbers. Both ways lead to an approximate determination of numbers, e.g., of roots of a polynomial.
Student: As the n roots of the polynomial xn + an-1xn-1 + ··· + a0 are uniquely determined by the numbers a0, ··· , an-1, there must exist functions f(a0,···,an-1) which show the distribution of the roots in the complex plane. One should investigate these functions; this would be a worthwhile continuation of General Algebra.
Teacher: If you take "function" in the most general sense, there exist indeed such functions f(a0, ··· , an-1), e.g., the set of the roots itself is one. The problem is how to represent those functions; you should not expect that all of them are polynomials in a0, ··· , an-1. Every polynomial xn + an-1xn-1 + ··· + a0 can be represented by a point P = (a0, ··· , an-1) of an n-dimensional space. There are theorems which state that, if certain inequalities in a0, ··· , an-1 hold, i.e., if P is situated in a particular domain of the n-dimensional space, the n roots are distributed in the complex plane in a particular manner. These investigations are very interesting, but at the present time the approximation of the roots is based more on the methods of calculation than on these theorems. In many cases, the theorems seem to be the result of computational practice. For this reason, the author has started from Horner's scheme which gives the clue to the whole theory. I advise you to work out many numerical examples; it will help you understand the theoretical part.
5.1 Horner's scheme: The theory of approximation of the roots of a polynomial f(x) with real coefficients is based on the fact that f(x) represents a continuous function, if x is considered to be a real variable. As a consequence, the function f(x) takes all the values between f(a) and f(b) in the interval (a, b). In particular, if f(a) and f(b) have opposite signs, there must exist a root of f(a) in this interval. This conclusion has an important role in the approximation of roots, but. a theory of approximation based on it alone would involve an enormous amount of numerical calculations. A second important item is that
![]()
is the Taylor expansion of the function represented in the neighbourhood of x=0. Hence
![]()
Thus one knows from (1) at first sight much more about the behaviour of the function near x = 0 than in the neighbourhood of any other value. It is therefore a matter of the greatest importance to have a scheme for a quick calculation of the Taylor expansion at any point x = q, say
![]()
The method which is used for this purpose is called Horner's scheme; although it is very elementary, it discloses much about the function represented by f(x). The coefficients a'0, a"1, ··· , a(n+1)n are continuous functions of q. For instance, a'0 = f(q), and a(n+1)n = an are constant numbers. It is characteristic for applications of Horner's scheme that not a'0 alone, but the full set of the coefficients and their interconnections are being considered. Although Horner's scheme is mostly applied to real numbers, it can also be used when the coefficients belong to any field (or even to an arbitrary commutative ring) K.
5.11 Expansion of f(x) as a polynomial in x - q: Let K be an arbitrary field, e.g., the field of real or complex numbers, q an element of K and f(x) a polynomial of K[z]:
![]()
where
![]()
whence

Arrange the computation of the coefficients a' as follows:

By the same method,
![]()
is determined satisfying
![]()
After n - 1 steps, f(x) is represented as a polynomial in x - q
![]()
The appearance of Horner's complete scheme leading to the Taylor expansion at x = q is demonstrated by the example

The lines of this scheme correspond to the consecutive steps. Their significance is:

5.12 Approximate computation of roots by Horner's scheme: Horner's scheme is very useful for the calculation of roots. The method will be explained with the aid of the example above. Set x = y + 1, then

Thus, f(x) has decreased from 67 at x = 0 to 2 at x = 1, and it is still decreasing, as is seen from the value of the derivative, whence one may expect that there is a root of f(x) near x = 1. In the neighbourhood of x = 1, the function can be represented approximately by the two lowest terms. Applying the notation ~ for approximations, one finds f(x)~ -24y + 2, whence f(x)=0 for y= - 0.1 and it will be advantageous to represent g(y) by a polynomial in y-0.1=x-1.1:

Since f(1.1) = - 0.1209 < 0, there is a root between 1 and 1.1, whence one approximates f(x) by - 18.5126 (x - 1.1) - 0.1200, so that x - 1.1 ~ -0.007. Now, apply Horner's scheme again:

Hence the root is approximately equal to 1.093. The next approximation is q=0.000,53. By continuing in exactly the same manner, the calculations would become very burdensome. The number of decimals to be considered increases at every step. On the other hand, the influence of the higher terms decreases with q, whence one can omit those digits which do not affect the terms required in the final result. It is convenient to state at the very beginning of the calculation, the magnitude of the admissible error. In the present problem, one obtains an approximation of the root, correct up to seven decimal digits by considering
18.888,199,957,2q = 0.010,047,878,201 + 25.982,894q2 - 10.628q3 + q4.
Since q ~ 5.3·10-4, the last two terms will only influence the 9-th and following decimals on the right hand side for 5.3·10-4 < q < 5.4·10-4, the quadratic term be comes 0.000,007, whence
q = 0.000,532,4, x - 1.093,532,4.
In this manner, the approximation of the root can be gradually improved.
5.13 A modification of Horner's scheme: In order to obtain a first approximation of the roots, it is often important to get a quick and simple review of the values of the function for different values of x. For this purpose, the following modification of Homer's scheme is sometimes helpful:
Given
![]()
compute by Horner's scheme

In order to explain the method by an example, expand the polynomial under consideration in the preceding example in the following manner:

This representation shows that for x < l, f(x) >0, since each of the terms is >0, and that for x > 9, f(x) < 0, the first, the fourth and the sum of the two other terms being >9. Thus, all the roots are situated in the interval (1, 9). We have already computed one root in the interval (1, 2); moreover, there is at least one root in the interval (3, 9). f(8) = -117, whence there is a root in the interval (8,9). The reader should calculate it by Homer's scheme as an exercise.
5.14 Lagrange's method:
Next, a different method for the computation of roots will be
explained and applied to the example above. If
then 1/x
satisfies the condition
and there corresponds to every root x
in the interval (0, 1) a value of 1/x >l. These
considerations lead to the method of approximation due to
Lagrange:
If x is a root of g(x), a < x < a + 1 = a + 1/x, g(x) = f(a - x) = g1(1/(a - x); h1>1 is root of g1 and b £ h1 < b + 1, h1 = b + 1/h2. Repetition of this procedure yields a representation of x in the form of a continued fraction. Repetition of these calculations yields after n steps the approximation Pn/Qn with the error

This method will be illustrated by means of the
example above. It is known that
has a root in the interval (8, 9),
whence we represent this polynomial by Horner's method as a
polynomial in x 8:

Hence
![]()
Mental arithmetic shows that for q = 2 the last coefficient is positive, while it must be negative for q = l , whence h lies in the interval (1,2), and one has to arrange for the Horner expansion for q = 1:

One proves in the same manner, as it has been done for h1, that h2 is located in the interval (1, 2):

h3 lies in the interval (2,3):

The reader may by now have got the experience that f must be chosen in such a manner that the sign of the last coefficient does not change, but that the procedure adopted for q + 1 would alter this sign; q = 4 will alter the sign of the second coefficient in the second main row of Horner's scheme, but the third coefficient will not change its sign, 6601 being too big. Hence the following coefficients will increase and therefore will not be negative. However, for q = 5, - 6601 is counterbalanced by more than 3000 and therefore 2761 by more than 12000, whence the sign of the last coefficient would be altered. Hence q=4:

At the next step, one obtains q = 1, whence x = (8. 1. 1, 2, 4, 1, ···):

Hence x ~ 232/27, the error 232/27 - x is only correct to the second decimal; the third decimal may be 2 or 1. It appears from this example that Lagrange's method is sometimes slower than the method of 5.1.2. One learns best by practice how to discover the most convenient combination of methods in any particular case.
5.15 Kakeya's Theorem: In 5.1.1 to 5.1.4, Horner's scheme has been used for calculating the real roots of equations with real coefficients. But the scheme can be appliedas has been stated at the beginning of this section - to arbitrary fields. It will be used now to obtain a theorem on complex numbers. Let b0, b1 + b0, ··· , bn + ··· + b0) be the coefficients of a polynomial. Set g=a; then the first line of Horner's scheme is:

Hence, if a is a root,
Let bk be positive
numbers and a complex, then
whence in this case (1) cannot be
satisfied. The absolute value of a root therefore cannot be
smaller than 1. If |a|=1, then
![]()
when the numbers bk are positive, unless q = 2kp, but a = 1 cannot be a root, since no positive number satisfies an equation when all the coefficients are positive, whence |a| > 1. If therefore a0yn + a1yn-1 + ··· + an,
![]()
then for every root a of this polynomial |a| > 1. Hence if y = 1/x, the roots b of Sak xk satisfy |b | < 1. This is
Kakeya's Theorem: The complex roots of Sak xk have all absolute values <1, if the coefficients satisfy (2).
5.2 Roots of real polynomials: In this section
a, b, c, d, e (1)
- with or without indices and dashes - denote real numbers, while
a, b, g, e (2)
denote complex numbers, and
denotes the conjugate of a.
Hence a +
is real; a ·
is non-negative;
a -
= ci.
![]()
can be represented by
(4)
5.2.1 Real and complex roots:
Theorem: If a is a root of a polynomial f(x) with
real coefficients,
is also a root.
First Proof:. Let K be the field of real numbers, i and -i
the roots of x² + 1, then K(i) = K(-i)
is the field of the complex numbers and there is an automorphism J
of this field interchanging i with -i and
leaving the real numbers unaltered. f(x) will
not he altered by J, whence a will be transformed into a root of f(x),
but as a will be transformed into
, the theorem is true.
Second Proof: If a is real, a =
. If a is not real, (x a) (x
) =
g(x) is a real polynomial and irreducible in
the field of real numbers. As f(x) and g(x)
have a common root, these polynomials have a common factor of
positive degree. Hence f(x) is divisible by g(x)
and
is therefore a root of f(x).
Corollary 1:
![]()
where n = r + 2k.
Corollary 2: If every root of f(x) is counted as many times as its order of multiplicity in (1), the number of real roots is º n (mod 2).
Corollary 3: If n is odd, there exists at least one real root.
![]()
is called the discriminant of f(x). As F is a symmetric polynomial in a1, ··· , an with integral coefficients, it follows (cf. 3.5.2) that
![]()
where g is a polynomial with integral coefficients. It follows from (2) that
![]()
Let all the n roots aj be different. As has been proved in 3.3.5,

In order to find the conjugate of d, we have to interchange every number in (5) with its conjugate. It follows from (1) that this operation implies k interchanges of rows in the determinant (5). Now the product of two conjugate numbers is their norm N, which is a non-negative real number (cf. 3.3.2). Thus
![]()
Hence one has the
Theorem: Let f(x) of degree n have n different roots. Then the discriminant of f(x) is positive (negative) when the number of pairs of conjugate non-real roots is even (odd).
Corollary: A real polynomial of degree 3 has three real roots if and only if the discriminant is positive.
A real polynomial of degree 4 with positive discriminant has either four different real roots or two pairs of conjugate complex roots.
Exercise: Prove the preceding theorem without the help of (4).
5.2.2 Changes of sign: At the beginning of 5.1, it has already been mentioned that, if a < b and the signs of f(a) and f(b) differ, then there is a root of f(x) in the interval (a,b). This statement, which is fundamental for the calculation of real roots, must be complemented by two other statements (well known from the elements of Analysis) which show the link between the roots of a real function and those of its derivative.
1. If a < b and f(a) = f(b), then there exists a root of f '(x) in the interval (a, b). 2. If
![]()
then
![]()
where g1(a) ¹ 0, since g1(x) = kg(x) + (x - a)g'(x)
Hence, if the roots of f(x) take m different real values a1, ··· , am, there exists at least one root of f '(x) in each of the m 1 different intervals (ai,ai+1). If ak is a multiple root with the multiplicity q + 1, it can be considered to be a set of q degenerate intervals, each of them containing exactly one root of f '(x). f(x) has the same sign at every point of an interval; in two consecutive intervals (ai-1,ai)and (ai,ai+1), the sign of f(x) differs, when ai is a simple root or a multiple root of an odd order; the sign is not different if ai is a multiple root of even order. Thus, if ak is a multiple root of order q + 1 of f(x), it is a multiple root of order q of f '(x).
These properties hold for every analytic function with a finite number of roots and are not special properties of polynomials. If the coefficients of f(x) are all positive (negative), f(x) is obviously positive (negative) for every positive value of x. Hence f(x) has no positive roots when there is no change of sign in the sequence of the coefficients of f(x). Thus, we are led to study the connection between the existence of roots and the signs of the coefficients. The experience with Horner's scheme will be very helpful.
Expand f(x) as a polynomial in x - b
![]()
Given f(x) and any real number b, then a set of n + I real numbers ab,n,ab,n-1,···,ab,0 is uniquely determined; of these, ab,n = an independently of b and ab,0 = f(b), whence ab,0 = 0, if and only if b is a real root of f(x) and for 0 < k < n
![]()
Consider now the changes of sign in the sequence
ab,n, ab,n-1, ··· , ab,0 (3)
To determine uniquely the number of these changes, it is necessary to give some sign to those coefficients which are equal to zero. Without loss of generality, assume that an ¹ 0; if any coefficient or any set of consecutive coefficients which are equal to zero, follow in (3) immediately after a positive (negative) coefficient, they will be considered to be positive (negative) themselves, By this rule, there is allotted to every element of (3) a uniquely determined sign. For example, if
![]()
and b = 0, the sequence (3) is
![]()
and the signs are
![]()
The number of changes of sign in (3) is not altered when from any set of consecutive equal signs the first only is considered and the other ones are struck out; hence one gets the same number of changes if the zero-coefficients are simply struck out, but it is convenient to allot a uniquely determined sign to every element of the sequence (3).
Given f(x), the number of changes will be considered now as a function C(b) of b. Then
![]()
If the first and the last element of the sequence (3) have the same sign, C(b) is an even number; if they have different signs, C is an odd number. Hence C(b) is even when an f(b) > 0, and odd when an f(b) <0.
5.2.21 Alterations of the first and second kind: In order to study C(b) as a function of b, one need not consider the sequence of the coefficients ab,m (for m = n, ··· , 1, 0) themselves; it is sufficient to examine the sequence
![]()
of the signs of these n + 1 coefficients. A few general remarks on the alterations of any sequence (1) generated by transforming signs + into - or conversely will be helpful; one can restrict the consideration to such alterations where the first sign remains unchanged, since the first coefficient of the polynomial is constant for all values of b.
Every alteration of the sequence (1) which does not change the first sign can be generated with the aid of at most n alterations done in sequence and each changing one sign only. An alteration which changes the second, third, ···, (n + 1)-th sign either causes that sign to be equal to the preceding one - when it will be called an alteration of the first kind - or it will cause it to be different from the preceding sign - when it will be called an alteration of the second kind.
Lemma: An alteration of the first kind either diminishes the number of changes in (1) by one or two or it leaves them unaltered; the diminution by one occurs if and only if the last sign of (1) is altered.
Proof: If the last sign of (1) is altered by an alteration of the first kind, then the last two signs were different before the alteration and are made equal by it; hence one change is lost, whereas the other changes are preserved, and no new change is created. If the sign altered is not the last one, then it is the middle of a triplet of signs in which the two first signs differ. One has therefore to consider the following four cases (the arrows denoting the alteration)

In the first two cases, the number of changes is unaltered, in the third and fourth case, two changes are lost, whence follows the lemma.
Corollary 1: If an alteration A is composed only of alterations of the first kind, the number of changes either decreases or it remains unaltered; in particular. if the last sign is not altered by A, the number of the lost changes is an even non-negative number.
Proof: As a change cannot be gained by an alteration of the first kind, no change can be gained by A. Suppose now that the last sign is unaltered by A. Let there be among A's alterations of first kind s alterations losing one change and t alterations losing two changes. As those s alterations are the only ones, where the last sign is altered, s is even, say, s = 2k ; the number of changes lost by A is equal to 2k + 2t.
Corollary 2: If A is composed of alterations of the first kind and the r first signs of (1) are equal, then they remain equal when A is performed. Conversely, if A' is composed of alterations of the second kind and the r first signs are alternately + and -, then they remain so when A' is performed.
Proof: The statements hold obviously for alterations of the first (second) kind and therefore for the alterations A (alterationsA').
Exercises:
1. Show that an
alteration composed of alterations of the first kind only can
also be generated by composition of alterations of the first and
of the second kind.
2. Prove that an alteration, composed of one or
more alterations of the first kind, cannot be generated by
composition of alterations of the second kind only (and
conversely).
5.2.22 Monotony of C(b): The results of 5.2.21 will now be applied to investigate C(b) when b runs over the real axis. Consider two different values of b, say b1 and b2 = b1 + q, where q > 0. Let F(x) be any polynomial of degree n in the real variable x with real coefficients. Let

The coefficients of f(x - q) are obtained by Horner's scheme with the steps

The alteration leading from the first row of (3) to the second row can be performed by n consecutive steps, replacing successively

Since q > 0, the sign of a'n-1 can be different from the sign of an-1 only when it is equal to the sign of an. Hence the first step of (4) either does not alter the sequence of the signs or it generates an alteration of the first kind. It is similar for the other steps; thus, if the sequence of the signs in the second row of (3) is different from the sequence of the signs in the first row, the alteration is composed of alterations of the first kind. The same statement applies to the alterations leading from any row of (3) to the following one, whence an alteration leading from the coefficients of f(x) to those of f (xq) is composed of alterations of the first kind; conversely, the alteration from f to f is composed of alterations of the second kind. Indeed, if one interchanges f to f and replaces q by -q, then the first row of (3) is transformed into the second row by the equations

Hence, in this case, if the sign of a'n-1 differs from that of an-1, it must be opposite to the sign of an. The. by the corollaries of 5.2.21, one has:
1. The function C(b)
decreases as b increases.
2. When at any step in a Horner scheme for a positive q,
the first r coefficients have the same sign, then the
same applies to the final result of the scheme and the expansions
corresponding to all higher values of b.
3. When at any step in a Horner scheme for
negative q the first r coefficients have
alternating signs, then the same applies to the final result of
the scheme and the expansions corresponding to all lower values
of b.
Assume now that an, ··· , an-r+1 have the same sign, say, positive ones (without loss of generality) and q > 0, then it follow from (4) that

for a suitable q, whence a'n-r > 0. Hence one can find a value of b for which the first r +1 signs are equal, and by repetition of the procedure reach a value of b such that all the signs are equal for this value of b and for all higher values. For all these values of b, C(b) = 0. Now let an, ··· , an-r have alternating signs and -q < 0, then, by (4'),
![]()
and the signs alternate. Now, let an-r+1 > 0, then
![]()
is negative for a suitably chosen value of q. Similarly, if an-r+1 < 0, then a'n-r can be made positive. Thus, one can find a negative value -q such that the first r+1 coefficients are alternating, and corresponding to the case of equal signs and positive q, one finds that there exists a value of b for which the signs are alternating for that value of b and all smaller values. For all these values, C(b)=n, whence follows the
Theorem: C(b) decreases steadily from n to 0 as b moves along the real axis.
5.2.23 Budan-Fourier Theorem: By the last theorem, the function C(b) has been proved to be steadily decreasing with the real variable b. As it takes integral values only, it is constant by segments; it changes its value at points of discontinuity only, where it decreases, the step being an integral number. An alteration of C(b) occurs where a coefficient of the polynomial changes its sign, and as these coefficients are continuous functions of b, a coefficient must take the value zero as it changes its sign. If there occurs an odd loss of change, the last sign is altered (cf. 5.2.2.1 Lemma), i.e., the variable b passes through a root of the polynomial.
The discontinuity of C(b) at a root of F(x) will now be investigated. Let b be a root of F(x) and m be its multiplicity, where m may be any positive integral number. Again, let x - b = x and F(x) = f(x). Then
![]()
The coefficients f(x) are therefore
![]()
Apply Horner's scheme for a negative value -q; the second row will then be
![]()
thus, the second row has at least m changes more than the first row. a'm is a continuous function of q; if therefore |q| is small enough, a'm has the same sign as am and (-q)ma'm = a'0 = f(-q) has the same sign as am or a different sign, depending on whether m is even or odd. The coefficients of f(x) = f(x + q) show therefore m + 2k more changes of sign than those of f(x), where k is non-negative. By passing through a w-fold root of F(x), the function C(b) decreases by m or an even positive number larger than m. In any interval (b,c), there exists a finite number (may be zero) of points of discontinuity of C; at the non-roots among them, it changes by an even number, the alteration at any root is congruent (mod 2) to the multiplicity of the root, whence on has
Theorem 1: Let f(b) ¹ 0, f(c) ¹ 0, b < c and r be the number of the roots of f(x) in the interval (b,c), every root being counted with its own multiplicity, then
C(b) = C(c) + 2k,
where k ³ 0 is an integral number.
Applying this theorem to an interval (0,c), where c is chosen so large that C(c)=0, one arrives at the corollary
Descartes' Rule: The number of positive roots of f(x) (every root being counted with its own multiplicity) is equal to the number of the changes of sign of the coefficients of f(x) or to a number less than it by an even number.
The number of changes is not altered if one multiplies the coefficients of f(x) by positive factors, e.g., if one replaces the coefficient of xk by f (k)(b). Moreover, one may write the sequence in its reverse order; in this case, the first element of the sequence may become zero, and the last element a constant; hence the elements equal to zero must now be be provided with the sign of the next non-zero element on the right hand side. Using this notation, Theorem 1 becomes:
Budan-Fourier Theorem: Let f(b) ¹ 0, f(c) ¹ 0, then the number of the roots of f(x) in the interval (b,c) (every root being counted with its own multiplicity) is equal to the difference of the numbers of changes of signs in the sets

or to a number less than it by an even number.
Let x = (cy + b)/(y + 1) and therefore y = (x - b)/(c - x), then the positive roots of g(y) = (y + 1)kf(x) are in (1,1) correspondence to the roots of f(x) in the interval (b,c), whence the number of roots in this interval can be determined by Descartes' Rule.
These formulae do not always yield directly the exact number of the roots in an interval, but they are very useful for determining it in even more complicated cases.
5.2.231 An example: Consider again the example of 5.1.1
![]()
As stated earlier, the real roots are positive and situated in the interval (1,9). One could get this result also by considering the changes of sign. f(-x) has no change and therefore there is then no positive root, i.e., f(x) has no negative root. From the earlier calculations for this example, one only finds by considering the signs:

Two roots have been computed already, one in the interval (1, 2), another in the interval (8,9); there corresponds to each of these roots a loss of one change. We want to find out whether the loss of the two changes in (2,8) corresponds to roots of f(x). For this purpose, we try to approximate these suspected roots by Horner's scheme and obtain by very simple calculations

Hence the two roots can only be located in the interval 2.64 < x <2.65, but we shall prove that f(x) is negative in this interval. As stated previously,

Hence for 2.64 < x < 2.65, f(x) has only the two roots calculated in 5.1.2 and 5.1.4. The same result can also be obtained by calculating the discriminant and verifying that it is negative.
5.2.3 Sturm's theorem: By the Budan-Fourier Theorem, the number of the roots in a given interval (b,c) is not determined uniquely, since the difference of the number of changes in the sequences

may be greater than the number of the roots by an even number. Thus, there arises the task of modifying the earlier method by construction of a sequence of continuous functions
![]()
with the property that the number of changes C'(b) in
![]()
gives the exact number of the roots which are greater than b, or a number differing from it only by a constant number. Then
![]()
is equal to the number of the roots in the interval including its left (but not its right) end point. The solution of the problem, due to J. K. Fr. Sturm, will be given below. Although Sturm's method is applicable to every case, it is not very convenient for practical calculations
An m-fold root of f(x) is an (m - l)-fold root of f '(x) and therefore has the highest common factor h(x) = (f(x), f '(x)). Hence f(x) : h(x) has the same roots as f(x), each root being simple. Since f(x) : h(x) can be calculated by rational operations, one may replace f(x) by f(x) : h(x), whence there is no loss of generality in assuming that f(x) has only simple roots. This assumption will now be made. Then the sequence (2) has to be rearranged in such a way that C'(x) decreases by one at every root of f(x), but is constant elsewhere.
Let 1 < k < m; if fk(x) changes its sign at x = x0 and the sign of fk-1(x0) differs from fk+1(x0), then no change is gained or lost by fk(x) changing its sign. However, if fk-1(x0) and fk+1(x0) have the same sign, then two changes are either gained or lost. Now, let the sequence (2) be constructed in such a way that the number of the changes alters by one only, namely in the roots of f(x), and decreases with increasing x. For this purpose, construct a sequence, where the number of changes is altered only when one of the outer elements changes its sign. Say, f1(x) has the same sign as f(x) and fm(x) has a constant sign. These considerations lead to
Sturm's Theorem: Let (2) he a sequence of polynomials satisfying the conditions:
1.
f1(x) has the same sign as f(x),
2. f2(x)
·························· f
'(x),
3. fm(x)
has a constant sign,
4. of 1 < k < m and fk(x0)
= 0, then fk-1(x0)fk+1(x0)
< 0;
let C'(x) be the number of changes of sign in the sequence (2) for any particular value of x, then the number of the roots of f(x) in the interval (b,c) - the left end point b being excluded - is given by (3).
Proof: The number of changes can only be altered by passing through those points which are roots of the polynomials forming the sequence. Let x0 be such a point and k = k1, ··· , ks be the indices of the functions for which x0 is a root. By 3., then k ¹ m. By 4., for 1 < k < m, the triplet of the polynomials fk-1, fk, fk+1 contributes exactly one change in an interval containing x0, but no other root of fk-1, fk, or fk+1. Thus C (x) is altered in the roots of f1(x) only and its alteration is equal to the gain or loss of changes in the portion f1(x), f2(x) of the sequence (2). As there exist only simple roots, f1(x) changes its sign at any root x0 of f(x). If it changes from - to + , f(x) increases and therefore f '(x) > 0 and by 2 f2(x) > 0; hence one change is lost. Similarly, if f(x) changes from + to -, f'(x) and therefore f2(x) is negative and one change is lost. Under the assumption made for the counting of number of changes in 5.2.23, the zeroes of f1(x0) have to be counted with the sign of f2(x0). Hence C'(b) - C'(c) yields the number of the roots of f2(x0) which are located on the right hand side of b, but not on the right hand side of c, i.e., in the interval (b, c) which includes c, but excludes b. Hence follows the theorem.
A sequence (2) satisfying the conditions of Sturm's Theorem is said to be a Sturm chain. It is possible to construct such a chain by the rule:

If n is the degree of f(x), the procedure ends after not more than n steps; the last polynomial, say fm(x), is a common factor of the sequence; since f1(x)=f(x) and f2(x) = f '(x) do not have a common factor of positive degree, fm(x) is a positive or negative constant. Thus, Conditions 1, 2 and 3 of Sturm's Theorem are satisfied. Since (fj-1(x), fj(x)) = fm(x) is a constant, fj-1(x) and fj(x) have no common root. For 0 < k < m, let fk(x0) = 0, then -fk+1(x0) = fk-1(x0) ¹ 0. Hence fk+1(x0) fk-1(x0) = - (fk-1(x0))² < 0. Thus, the sequence which is determined by (5) is a Sturm chain . 0ther chains can be obtained, e.g., by multiplication of polynomials with arbitrary positive constants.
5.2.3.1* Legendre polynomials: Sturm's Theorem will now be applied to Legendre polynomials. Let
![]()
here Dm denotes the m-th derivative of the function inside [ ] and D0 is this function itself. If u and v are polynomials in x,

whence by (2) for m > 1
![]()
* May be omitted at a first reading.
On the other hand, one gets from (2) for m > 1

Subtracting (3) from (4) and applying (1), one finds

This equation also applies for m = 1 and therefore generally.
From

follows
![]()
Eliminate from (5) and (6) P'm-1 and obtain
![]()
After replacing m in (5) by m + 1, one finds
![]()
Eliminate P'm(x) between (7) and (5') and obtain
![]()
Now, consider the sequence
![]()
in the interval
-1 £ x £ +1.
By (7), for m = n, if Pn(x) = 0, P'n(x) has the same sign as Pn-1(x). If Pm+1(x) and Pm(x) have a common root, by (8), this root must also be a root of Pm-1(x), whence it must be so of all subsequent polynomials of (9), in contradiction to P0(x) =l. Hence for Pm(x) = 0, Pm+1(x) ¹ 0, whence, by (8), Pm+1(x), Pm-1(x) < 0 at every root of P'm(x).
As Pn(x) and Pn-1(x) have no common root, by (7), Pn(x) has no common root with its derivative and has therefore only simple roots. Hence (9) is a Sturm chain in the interval -1 £ x £ l for every n.
By (7),
![]()
Since P0(x) = 1,
![]()
Whence the number of changes of sign in (9) is C'( 1) = n, C'(l) = 0 and Pn(x) has n different roots in the interval (-1,+1). Then, by (1), Pn(x) is of degree n, that is, the roots of the Legendre polynomials are all located in the interval (-1, + 1) and are simple roots.
5.2.4 Method for calculation of roots: In order to find the roots of a polynomial, say,
![]()
one can proceed as follows: First determine an interval in which all the roots are located. For this purpose, set
![]()
and assume that x ³ t, so that
![]()
and therefore
![]()
Hence , f(x) > 0 for x ³ t. Similarly, for (-1)n£ and x £ t:

In any case, if there exists any real root of f(x), it lies between -t and +t. In general, it is not difficult to determine a smaller interval (a,b) in which all the roots of f(x) are located; it is sufficient that the coefficients of the Taylor expansion of f(x) for x = a have alternating signs, while the coefficients of the Taylor expansion for x = b are all positive. Then C(a) = n, and C(b) = 0, whence there cannot be any alteration of the monotone decreasing function C outside the interval (a, b).
Once an interval containing all the roots has been determined, one subdivides the interval into smaller sections. By Sturm's Theorem, one is able to decide how many roots are contained in each of these intervals; the intervals containing roots are subdivided again and the method is repeated. Given a positive number e, after a finite number of steps, there will be found, for every root x of f(x) an interval of length < e , such that x is situated in that interval, i.e., every root x is determined with an error < e.
Although this method is theoretically absolutely sound, a clever calculator will hardly use it without essential modifications. The application of Sturm's method demands many operations, whence one tries to avoid it. It is mostly not difficult to get an idea about the general behaviour of the function f(x) = y and the intervals where roots might be located. For this purpose, graphical methods are very helpful (e.g., Lill's rectangular method *), provided the calculator is sufficiently familiar with the theory and the practice of mathematical plotting. The method explained here aims to separate the roots, i.e., to determine intervals containing a single root and to narrow down each interval till its length does not exceed the admissible error. It is convenient to set the error in advance; this is done mostly by demanding that a certain number of decimals must be correct. At every step of a calculation, one may neglect some digits, but one has to take care that the accumulated error does not affect the digits of the final, required result. We can explain here only the methods of calculation; their skilful handling must be learned by practice.
* Cf., for example, Bieberbach-Bauer, Lectures about Algebra, pp. 134140.
5.2.41 Linear interpolation: If f(a) and f(b) have different signs, the graph of the function f(x) may be replaced by the straight line connecting the points with the abscissae a and b. This line intersects the x-axis at
x = [af(b) - bf(a)]:[f(b - f(a)].
This value may be taken as a first approximation of the root. Consider the example of 5.1:
Set

The approximation by the straight line yields y = 0.4 which is obviously too large. Of course, the graph of the polynomial in that interval is very different from a straight line. Now

The graph is therefore considerably deflected in the interval and the root must be near the points x - 1 = 0. For the suitability of the approximation by linear interpolation, it is essential that the derivate does not change its sign in the interval.
5.2.42 Newton's method: The graph of y = f(x) can be approximated by its tangent for an interval near the point of contact. This means that one omits in the Taylor expansion of the polynomial at the point of contact the terms which are of second and higher degrees. When the interval is small and the coefficient of the linear term is small in comparison with the coefficients of the higher terms, this approximation yields good results. When it is applied to the previous example, Newton's method yields y = 1: 12. Indeed, this approximation was the starting point for the calculation of the same example by Horner's method in 5.1.2.
If an approximation of a root is obtained which is already near the root, then Horner's method yields a Taylor expansion for this approximation
![]()
in which) for small values of x, the higher terms cause little change (unless a1 is small compared with the coefficients of the higher terms). Let x1 be an approximation of a root of f(x) which lies near 0 and set
![]()
then x2 is, in general, a better approximation of that root. Applying this method to the example considered above and setting x1 = 1:12, one finds x2 = 0.091··· which is a little too small. Indeed, the value of the root has been found in 5.1.2 to equal 0.093,532,4 ···
The methods explained here are very helpful when applied in connection with Horner's scheme, especially when single roots are required. When all the roots are asked for, it is often better to apply Graeffe's method, explained in 5.3.
5.2.5 Poulain's Theorem: The number of the real roots of a polynomial with real coefficients is interconnected with the corresponding number of a different polynomial by
Poulain's Theorem: Let h(x) = bmzm + ··· + b1z + b0 be a polynomial with real coefficients and only real roots, bm ¹ 0, b0 ¹ 0 and f(x) a polynomial with real coefficients, then the number of different real roots of
![]()
is not less than that of f(x) and the corresponding proposition holds, when each root has been counted with its own multiplicity.
This theorem is a generalization of the
Lemma: Let a ¹ 0, then the number of different real roots of af(x) + f '(x) is not less than the number of different real roots of f(x) and the corresponding proposition holds when each root has been counted with its own multiplicity.
Proof of Lemma: Let m1 be the number of the different real roots of f(x) and m2 be the number of its real roots counted with their own multiplicities; moreover, let m'1 and m'2 denote the corresponding for k(x) = af(x) + f '(x). The proof will be given in several steps. To start with, it will shown that

and then

This will prove the lemma.
1. Let
be the different real
roots of f(x) written in ascending order,
forming m1 consecutive intervals. It will be
proved that k(x) changes its sign in each of
these intervals. The sign of f(x) in anyone of
these intervals is constant; without loss of generality; this can
be assumed to be positive in (c1,c2).
Hence f(x) increases in a sub-interval (c1,c')
and decreases in (c",c2). If r1
is the multiplicity of the root c1 of f(x),
then f '(x) has a root of multiplicity r1-
1 in c1. Since the number of roots of f
'(x) is finite, one can assume that
f '(x) has no root inside the interval (c1,c'),
whence it is positive in the interval and is either positive or
zero at c1 according to whether r1
= 1or r1 > 1. In the first case, it
is obvious that k(x) has the same sign as f
'(x) in a sub-interval (c1,d')
of (c1,c'), where c1
< d' < c', and d' is suitably
chosen, but even if r1>1, the quotient af(x):f
'(x) tends to 0 as x tends to c1,
and therefore k(x) has the same sign as f
'(x) in a suitably chosen interval (c, d').
Thus k(x) is positive in (c1,d'),
and in the same manner it can be shown that k(x)
is negative in an interval (d",c2).
Hence k(x) changes its sign in (c1,c2)
and there exists therefore a root of k(x) in
this interval. Corresponding reasoning holds for each of the m1-1
intervals. Hence (1) is true.
2. If f(x)
has a root of multiplicity r1 in c1
(for i = 1, ··· , m1), then f
'(x) has a root of multiplicity r1 -
1 and k(x) = af(x) + f '(x)
has also a root of multiplicity r1 - 1 in c1,
where a root of multiplicity 0 means that there is no root at
that point. Counting the roots of k(x) in
with
their multiplicities and considering that in every interval (ci,ci+1)
there is at least one root of k(x), one arrives
at the inequality
![]()
This proves (2).
3. and 4. f(x)
and k(x) are of the same degree, say n.
As the number of complex roots is even, m2 º n º m2'
(mod 2). This rules out m2' = m2
- 1, whence (4) is true. Suppose now that m1'
= m1 - 1. As k(x) changes
its sign in each interval (ci,ci+1),
it has exactly one root in each of these intervals; this root is
of odd multiplicity and k(x) has no other
roots, whence
are non-roots of k(x) and
therefore simple roots of f(x). From these
considerations follows that
m1 = m2 = n (mod 2).
However, as the roots of k(x) are of odd order, m'1 º m'2 (mod 2). Hence m'2 º n (mod. 2) º n (mod. 2) and this consequence contradicts m1 º n (mod. 2). Hence the assumption m'1 = m1 - 1 leads to a contradiction, whence 3. applies. As 1., 2., 3. and 4. have been proved, the lemma applies.
Proof of Poulain's Theorem: Without any loss of generality, assume that bm=1, whence for m = 1 g(x) = b0 f(x) + f '(x), and in this case the theorem is reduced to the lemma. Let p > 0 and assume that the theorem holds for m = p. We shall prove it for m = p + 1. If a is a root of h(z), then a ¹ 0 and h(z)=(z-a)h1(z), where h1(z) has only real roots.
Let h1(z) = zp
+ ··· + c1z +c0.
As the theorem holds for m = p, the number of
the roots of
is larger than or equal to the number of the roots
of f(x). In this formula, f (0)(x)
means f(x).
![]()
If one replaces the powers of z in h(z) =(z - a)h1(z) by the corresponding derivatives of f(x), one gets
![]()
By the preceding lemma, the number of the roots of g(x) is not smaller than the number of the roots of f(x), whence the theorem applies.