Chapter 1

Linear Equations

This chapter deals with system of linear equations

1.1 Introduction - A dialogue

Student: The problem posed at the beginning of this chapter seems to me to be a very easy one, but I have seen such words as vector space, rank, matrix later on in the hook, also determinants and formulae with subscripts and superscripts. I cannot understand why the author is trying to make a very simple thing so complicated. The problem can be solved with the help of the methods which I have learned when I worked for the matriculation examination.

Teacher: Of course, it is my duty to help you to understand this theory clearly, but Mathematica is not a matter of seniority. History has shown several examples where mathematiciana were superior to their masters at a very early age. I should not miss the opportunity to learn something from you; please, explain your solution of the problem.

Student: It is simply the method of substitution! It follows from the last equation that xn = (k - k1x1, ... , - kn-1xn-1)/kn. Substituting this value into the remaining linear equations, I get linear equations with only n - 1 unknowns. After having solved this system, you calculate the value of xn by substituting the values of x1, ... , xn-1) into that equation, Is that so?

Teacher: Yes provided kn ¹ 0.

Student: In the case kn = 0,xn is infiiute!

Teacher: I do not think so! For example, consider the two equations - n=2, k2=0:

Obviously, the only solution of this system is x1 = 1, x1 = 2.

Student: Yes, that is true. - If kn = 0, then I take another equation of the system instead of the last one. Thus, without a loss of generality, I suppose kn¹0. I think you will be satisfied.

Teacher: Unless the coefficient of xn is zero in each equation of the system.

Student: In this case, the system has to be considered as a system with only (n-l) variables; it would be absurd to consider it as a system with n variables as the equations are actually independent of xn.

Teacher: Perhaps less absurd than you may believe, but I accept your definition that a system of linear equations should be considered to depend only on such variables, as have at least one non-zero coefficient.

Student: Certainly

Teacher:T. After xn has been eliminated, how do you continue?

Student: I shall repeat the process again and again until I arrive at a single equation with one variable x1, and then there is no problem left.

Teacher: You assume that the number of equations is equal to the number of the variables and believe that at every step of your procedure, both the numbers decrease by exactly one ?

Student: Certainly, but the number of equations may also be less than the number of variables, let us say that we hve m < n equations in n variables. In that case you shift the terms with xm+1, ... , xn to the right hand side. These variables may take arbitrary values. For every set of values xm+1, ... , xn, there exists one solution x1, ... , xm, since there are as many of these variables as there are equations. The number n— m is the degree of freedom of our system, as the values of n ~ m variables may be chosen arbitrarily.

Teacher:. And if there are more equations than variables ?

Student: Then there cannot exist any solution. It is obvious that n variables cannot satisfy a system of more than n conditions.

Teacher: But it seems to me that the system x = 1, 2x = 2 has a solution, although it is a system of two equations with a single variable.

Student: But these equations are not different. Equations which differ by a common factor only cannot be considered as different, and it is common sense to consider only equations which are different.

Teacher: Thus, if two equal equations are given, one of them should be dropped?

Student: Yes!

Teacher: But this must be done also at the later stages of the procedure.

Student: I cannot follow you exactly.

Teacher: Consider the system

Three different equations in x1, x2, x3. Since the degree of freedom is zero, do you expect to get exactly one solution ?

Student: Yes! Set x3 =6 - 2x1 + 5 x2 in the first two equations, so that

Teacher: These equations differ by a factor —1 only; hence one of them must be dropped. Thus you may choose x2 in an arbitrary manner. Put x1 = 23, x2 = 5, x3 = - 41x2 + 16, and this will solve the system of equations for every value of x2. You have one degree of freedom, although the number of equations is equal to the number of variables.

Student: That is true. This example ie obviously a wicked exception.

Teacher: You may call it an exception if you like, but there are plenty of them.

Student: I see!—There may be certain cases, where the degree of freedom is higher than the difference between the number of variables and the number of equations, but at any rate n equations with n variables have at least one solution which can be found by the method of substitution.

Teacher: Why?

Student: Because the number of equations can decrease, as one may get two equal equations by the procedure of substitution, and then one of them must be dropped, but the number of the variables cannot.

Teacher: Try the equations

 

Student: Substitute x3 = 2x1 - 5x2 - 2 in. the first and in the second equation, whence

This is funny!

Teacher: Indeed, the coefficients of x2 in both the equations became zero; thus the equations have to be considered as equations of one variable only. Now x1 should be equal to 7:3 and to 5:7; that is impossible.

Student: Perhaps I was somewhat rash in conceding that a system of equations in n variables should be considered as a system in n - 1 variables, if the coefficients of one of the variables are all equal to zero. Let us retain x2 and put

Hence x1 = 5/7 - 0x2, and therefore 0x2 = 34:7, x2 = ¥.

Teacher: What do you mean by ¥?

Student: Infinity! That is a number which is greater than every other number and equal to 1/0.

Teacher:. Can you calculate with this ¥ as with an ordinary number?

Student: Certainly.

Teacher: Then -¥ = -l/0 = l/(-0) and there for -0 = 0 there holds -¥ = ¥, whence 0=2·¥ and therefore 0 = ¥.

Student: No, that is not so. One cannot calculate with this symbol as with an ordinary number. But, as a matter of fact, ¥ occurs in mathematics. It is a somewhat complicated matter; one needs differential calculus to handle it, and I was hoping that you may explain it to me clearly one day.

Teacher: On another occasion. The symbol ¥ does occur, sometimes it is used rightly, sometimes wrongly; you find use as well as misuse in textbooks. Considering systems of linear equations, we enquire about those numbers which taken for x1, ... , xn, respectively, satisfy those equations. Numbers can be added, subtracted and multiplied; one can also divide a number by a number, unless the divisor is zero. The division by zero is meaningless as far as numbers are concerned.

Student: And the example which I attempted just now?

Teacher:T. It has no solution, since x1 cannot simultaneously be equal to 7/3 and 6/7.

Student: So there exist systems of 3 equations in 3 variables which have no solutions, systems which have an infinity of solutions and systems which have exactly one solution. How did you construct those examples by which you cornered me?

Teacher: It is not difficult if one knows a little bit of the theory.

Student: Those vector spaces, matrices, ranks, etc., Sir, I should be thankful if you could explain to me some part of the theory without using those notions. I do not like those innovations.

Teacher: Then try the simplest case : ax = b.

Student: Then x = b/a.

Teacher: Provided a ¹ 0.

Student: If a = 0, b ¹ 0, the equation has no solution as there exists no number x for which 0x = b ¹ 0. If however a = 0, b = 0, then every value x is a solution.

Teacher: Indeed! This simple case is the seed of the whole theory. Now try

Student: Let Dy = D1, Dx = D2, where D = a1b2 - b1a2, D1 = a1b - b1a, D2 = ab2 - ba2. If D ¹ 0, then x = D2/D, y = D1/D.

Teacher: In this case, there exists no other solution and these values satisfy the given equations, as you may readily verify.

Student: If D = 0, but D1 or D2 differ from zero, there is no solution. If D=D1=D2 = 0, then every couple of values (x, y) satisfies the equations.Teacher: Consider the equations

Here D = D1 = D2 = 0, but, for example, (x,y) = (0, 0) is not a solution as x=5-2y.

Student: This is true, but I cannot understand it. The equations Dx = D2, Dy=D1 are satisfied by every pair of values if D = D1 = D2 = 0.

Teacher: These equations are consequences of the given equations; they are necessarily conditions for solutions x,y of the given system, but they may not be sufficient. The case D = D1 = D2 = 0 contains the different cases:

1. If all the coefficients are equal to zero, then every pair (x,y) is a solution.
2. Let a
1 = a2 = b1 = 0, but (a,b) ¹ (0, 0), then there is no solution.
3. Let at least one of the 4 coefficients on the left hand side be
¹ 0. Say, without loss of generality, a1 ¹ 0. Set = b1/a1 = l, whence

Thus x = (a— a2y)/a1 for arbitrary y yields all the solutions. There are therefore 5 different cases.

Student: For a higher number of variables and of equations a full analysis may become very complicated. How do you tackle the problem for arbitrary n?

Teacher: For this material, I suggest you study the notions of vector space, rank, matrix, determinant etc., as they are explained in the following sections.

1.2 n-vectors:

Definition 1: An ordered set of n numbers is called an n-vector

The n numbers a defining a are called its coordinates.

As the set is supposed to be an ordered set, the n-vector will, in general, be altered by an interchange of the coordinates.

Definition 2: The product of a number c and an n-vector a is the n-vector

Definition 3: The sum of a and n-vector b = (b1, ... ,bn) is the n-vector

These definitions yield:

Commutative Law:

Associative Law:

First Distributive Law:

Second Distributive Law:

Due to these laws, you can use the notation of the sum of n-vectors in the same manner as it is used for numbere. Thus

where j = 1, ... , m, the c are arbitrary numbers and a = (a1j, ... , anj) are n-vectors.

The n-vector (—1)a is called the negative of a and is denoted by -a. The mverse of the addition of a is the addition of -a. As in elementary arithmetic, this operation is called subtraction of a and is denoted by the sign -. Thus, b-a is put written b + (-a).

Notation:

Formulae:

The vectors of Plane Geometry can considered as 2-vectors, those of Solid Geometry as 3-vectors. In Geometry, these vectors are added by placing them together in such a way that the endpoint of the first vector is the starting point of the second vector. The result of this addition is the same as here; n-vectors occur also in other context. For examplem let n be the number of the clients of a bank and a1, a2, ... , an, their account balances on a certain day; then the state of the bank on that day is represented by the n-vector (a1,a2,...,an) = a. Similarly, the change of the state on that day is represented by an n-vector b, and the state of the bank on the following day is given by a+b.

1.3 Vector Spaces:

Definition 1:

An on n-vector a is said to depend on the n-vectors aj(j = 1, ... ,m) if a can be represented by .

Definition 2:

The n-vectors depending on a1, a2, ... , am are said to form a vector space generated by a1, a2, ... , am.

Proposition 1: If b1, ... ,b r belong to a vector space V, then every vector which depends on them belongs to V.

Proof: Let V be generated by a1, ... ,am, say then where , whence there follows the proposition.

Definition 1 can also be expressed as follows: am+1 depends on a1, ... ,am, if there exists an equation

Definition 3:. a1, ... ,am+1 are independent, if implies

This definition and the remark above yields directly:

Proposition 2:

m + 1 > 1 n-vectors are independent, if and only if none of them depends on the m other vectors; a single n-vector is independent, if it differs from the zero-vector.

Definition 4: A set of independent n-vectors which generate a vector space V is called a base of V.

Proposition 3: Every vectorspace V which contains n-vectors ¹ 0 has a base.

Proof: Let V be generated by a1, ... ,am. If these n-vectorsa do not form a base, then either they are all = 0 or one of them, say am depends on the other vectors. In the first case, V contains no n-vector ¹ 0, in the second case, V is generated by a1, ... ,am+1. Thus one can reduce the number of the generating n-vectors until one obtains a generating system of independent n-vectors. At every step, the number of generating n-vectors decreases ; hence, after less than m steps, the procedure cannot be repeated any more, i.e., the generating n-vectors are independent and form a base.

Proposition 4:. Let a1, ... ,am be a base of V, and cm ¹ 0, then a1, ... ,am-1, b is a base of V.

Proof: Since b is contained in V, it follows from Proposition 1 that every n-vector of the vector space V', generated by a1, ... ,am-1, b is contained in V; on the other hand, am, and therefore every n-vector of V, belongs to V'. Hence V'= V. In order to prove that the n·vectors are independent, assume that there exists a system of numbers d1, ···, dm-1, dm not all = 0 such that

0=d1a1+d2a2+...+dm-1am-1+dmb=(d1+dmc1)a1+···+(dm-1+dmcm-1)am-1 + dmcmam.

Since a1, ... ,am-1 are independent, dm ¹ 0, and moreover cm ¹ 0, not all of the coefficients on the right hand side vanish; this contradicts the assumption that a1, ... ,am are independent, whence follows the proposition.

Proposition 5: Let V be a vector space, a1, ... ,am its base and b1, ... ,b t be t independent n-vectors n V ; then a new base of V is obtained by replacing t suitable elements of the base by the n-vectors b j; thus t < m holds.

Proof (by mathematical induction):. Proposition 4 yields that the theorem holds for t = l. Let it hold for t=r; without loss of generality, assume that
b1, ... ,b r, a1, ... ,ar+1, ... , am is a base of V. Thus

b r+1 = c1b1 + ··· + crb r + cr+1ar+1 + ··· + cmam.

As b r+1 does not depend on b1, ... ,b r, at least one of the numbers cr+1, ··· , cm differst from zero, say cr+1 ¹ 0. Then Proposition 4 yields that in the base ar+1 can be replaced by b r+1, whence follows Proposition 5.

Proposition 6: Every base of a vector space V contains the same number of elements; this number is called the Rank of V.

Proof: Let m be the number of elements of a base of V and t the number of elements of an arbitrary system of independent n-vectors in V. Proposition 5 yields that t £ m. Hence there exists one system of m, but no system of more than m independent n-vectors in V. The number of elements of any base is therefore equal to the maximum number of independent n-vectors, whence follows the proposition.

 

Definition 5: If every n-vector of a vector space V' is an n-vector of V, then V' is a sub-space of V. This relationship is denoted V' Í V.

Proposition 7: If V' Í V, either V' = V or Rank V' < Rank V.

Proof: The n-vectors b1, ... ,b t which form a base of V' are t independent n-vectors of V, whence it follows from Proposition 5 that t suitable n-vectors of any base of V can be replaced by the b vectors. Hence V has a basis b1, ... ,b t, at+1, ... ,a m, where m = Rank V and therefore t £ m. In the special case when t = m, the vector spaces V and V' have the same base b1, ... ,b t and this implies that V = V'.

In order to state that the sub-space V' of V differs from V, you use the notation

V' Ì V.

Proposition 8: The Rank of a vector space of n-vectors is at most n.

Proof: The n unit vectors generate a vector space which contains every n-vector, whence the proposition follows from Proposition 7.

Proposition 9: There exists always between p > n of n-vectors a linear equation with coefficients not all of which vanish.

Proof: If the proposition is not true, there must exist p > n independent n-vectors which contradicts Proposition 8.

Proposition 10: Let A be a system of n-vectors with the property that the sum of any two n-vectors of A as well as the product of any number and a n-vector of A belongs to A, then A is a vector space.

Proof: There cannot exist more than n independent n-vectors in A, say a1, ... ,ar are independent ; then every n-vector of A depends on the ai, but every n-vector depending on them must belong to A ; thus A is a vector space generated by a1, ... ,ar.

Proposition 11: The n-vectors

are independent.

Proof: Suppose Then Hence l ¹ 0 unless d1 = d2 = ··· = dr = 0.

1.4. Matrices. Method of "Sweep out":

Definition: A matrix M is a rectangular scheme which consists of nm numbers which are called its elements and are arranged in m (horizontal) rows and n (vertical) columns. The rows can be considered to be n-vectors generating a vector space R(M), the columns m-vectors generating a vector space C(M). If every element of M is equal to zero, M is called t the zero-matrix.

For example, consider the right hand side of 1-3, (2), where m = r. The row-vectors are independt, whence Rank R(M) = r; the first r column vectors are also independent and, since the vectors are m-vectors, Rank C(M) = r. It will now be proved that the Ranks of those two vector spaces are always equal; this number will be called the Rank of the matrix M. In order to achieve this result, consider certain operations whih neither alter the vector space R(M) nor the Rank of the vector space C(M). By means of these operations, the matrix is gradually swept out, i.e., in a certain portion of the matrix the elements are replaced by zero and eventually the matrix is reduced to a type which is similar to the matrix 1.3, (2). The method of sweep out is very important for the solution of systems of linear equation. Let A be the matrix

and a1, ··· , am the n-vectors formed by the row vectors. The vector space R(A) is obviously not altered by the operations:

Let a1, ··· , am be the m-vectors formed by the columns and

It will be shown now that the same equation holds after performance on the matrix A of any one of the operations I, II, III. Of course, (2) can be rewritten in the form

with i = 1, ··· , m.

Then, for any particular j, k, you have

whence (2') remains invariant for the operations I and II. The operation III only causes omission of a condition which is identically satisfied. Hence every linear equation (2) "between column vectors is invariant for all the operations I, II, III. The inverse operation of I is an operation of the same type, where c is replaced by c-1 ; the inverse operation of II is such an operation II, where d is replaced by — d; the inverse operation of III is addition of a new coordinate which takes the value 0 for every column vector. Hence, a linear equation (2) cannot hold after operations I, II, III unless it held before. Let r of the column vectors be independent and the other column-vectors depend on them, then these r vectors form a base and Rank C(A) = r. By the operation I, II, III, those r vectors remain independent and the other vectors depend on them. Hence the Rank of C(A) is not altered. The essence of these reflections can be formulated as follows

Proposition 1: Repeated application of Operations I, II, III does not alter the vector space R(A) and the Rank of the vector space C(A).

Note that, in general, the vector space C(A) will be changed, for example, the number of the coordinates may decrease; on the other hand, the Rank R(A) is invariant, since itself is not altered.

Theorem 1: Repeated application of the operations can transform the matrix A into

or into a matrix which differs from (3) by a permutation of only its columns. (Asterisks represent any numbers). The rows of this matrix form a base of R(A).

Proof: If any row vector is equal to 0, the row should be omitted. Neither I nor II,can transform the matrix into 0. Thus, at every later stage of the operations in this proof every row-vector 0 has been omitted automatically; the matrix cannot be annihilated thereby. Let a11 differ from zero, replace a1 by (a11)-1a1, thus a11 becomes 1 by I; then replace a1 by a1 - a11a1 (II) for i = 2, 3, ··· , m. This sequence of operations sweeps out the first column, i.e., one element (the first one) becomes 1, the other elements 0; during the following operations, the first row will not be multiplied by any number nor will it be added to any other row, whence the first column will remain swept out. If a11 = 0, there exists a number j1 so that next view column j1 to be first column and sweep it out accordingly. Since in the theorem a permutation of columns does not matter, let without loss of generality j1 = l. After the first column has been swept out, denote the elements again as in (1). Now a21 = 0 and every row with vanishing elements is omitted; hence there element for which j2 > 1. Without loss of generality, assume that j2 = 2. Again sweep out the second colnmn on replacing a2 by (a22)-1a2 and ak by ak - ak2a2 for k ¹ 3. Proceed in this manner intil the matrix is either reduced to the form (3) or only differs from it by a permutation of its columns. The rows of the matrix generate R(A) and as they are independent (cf. 1.3, II), they form a base of R(A).

Theorem 2: Rank R(A) = Rank C(A) for every matrix A. This number is called Rank of fhe matrix A

Proof: If A is the 0 matrix, then both the vector spaces are of Rank 0. If A¹0, sweep out A ; these operations do not change the Ranks (Proposition 1). In (3), the rank of both the vector spaces equal to the number of rows ; the same applies to the matrices obtained by interchanging the columns of (3). Hence follows the theorem.

1.5 Orthogonality. Homogeneous linear equations:

Definition: Two n-vectors a = (a1, ··· , an) and b = (b1, ··· , bn) are said to be orthogonal if

Thus, if a is orthogonal to b, then b is orthogonal to a, i.e., orthogonality is a symmetric property. This notation presents the opportunity to apply vectors to systems of linear equations. First of all, the homogeneous equations

For the matrix ( (aik) ), its rows, etc. use the notation of 1.4. Every ordered system of numbers

which satisfies (1) is called a solution of (1). Hence, a solution is an n-vector which is orthogonal to a1, ··· , am. Let (2) be a solution of (1) and be an arbitrary vector of the vector space R(A). Since

then

Hence x is orthogonal to a, you have

Proposition 1: Every solution of (1) is orthogonal to every vector of R(A).

If x is a solution of (1) and c an arbitrary number, then cx is also a solution of (1).

Moreover, let

be a solution of (1), then for j = 1, ··· , m:

Hence also x + h is a solution of (1).

It follows from 1.3, Proposition 10 that the solutions of (1) form a vector space. Every n-vector of this vector space is orthogonal to every n-vector of R(A), whence:

Proposition 2: The solutions of (1) form a vector space X(A). Every n-vector of X(A) is orthogonal to every n-vector of R(A).

In order to find the solutions of (1), you only must know the vectors which are orthogonal to any base of R(A). Using the method of sweep out, one obtains the base in a suitable standard form

(the asterisks of 1.4, (3) have been replaced by -bik or by a matrix which differs from (3) by a suitable permutation of columns. Let

be this permutation. An n-vector (2) is orthogonal to n-vectors of R(A) and is therefore a solution of (1) if and only if it satisfies the conditions

The values of can be chosen arbitrarily; they uniquely define the remaining r coordinates of the n-vector .

For example, set

then

You obtain n -- r indepondent n-vectors of X(A) which have coordinates, ordered corresponding to the permutation (4), whence assume the sets of values

These n-vectors form the base of a vector space X' of Rank (n r) and X' is a sub-space of X(A). In X', the coordinates assume every set of values, but it follows from (5) that by values determine uniquely a solution of (1). Hence every solution of (1) is an n-vector of X', whence X' = X(A). This result is formulated by the next two propositions:

Proposition 3: Rank R(A) + Rank X(A) = n. (7)

Proposition 4: In order to solve (1), apply sweep out to the matrix A; let the result be a matrix, which differs from (3) by a permutation (4) of the columns only; then a base of X(A) is given by (6).

Every n-vector a of R(A) is orthogonal to every n-vector of X(A). The n-vectors which are orthogonal to the n-vectors of X(A) form a vector space of Rank n— (n - r) = r in which R(A) is a sub-space. By 1.3, Proposition 7, this vector space is identical to R(A). This result is in

Proposition 5: If for every solution of (1), then (a1, ··· ,an)is an n-vector of R(A).

Propositions 1, 2, 3, 5 are condensed in the

Theorem: Every system (1) of homogeneous linear equations generates two vector spaces B(A) and X(A) subject to (7). R(A) is generated by the rows of (1) and X(A) consists of the solutions of (1). An n-vector is orthogonal to R(A) [X(A)] if and only if it belongs to X(A) [ R(A) ].

For the theory of homogeneous linear equations and its applications, it is important that the problem of solving such equations leads to two vector spaces which are connected by a reciprocity or duality, that is, R(A) defines X(A) in the same manner as X(A) defines R(A).

The advantage of the use of homogeneoua systems is mainly based on this fact. The treatment of non-homogeneous systems is more complicated. One may apply to them the method of homogenization discussed next. This method is used also in geometry, when ordinary coordinates are replaced by homogeneous ones and thereby the space is extended to a projective space. The duality thus obtained corresponds exactly to the duality between X(A) and R(A).

next last