Chapter VI
The problem of solving a system of linear equations leads to the concept of matrix (cf. Chapter 1); moreover, it has been shown that any linear transformation of a vector space is completely determined by a matrix. As linear transformations are very important for many branches of mathematics, it has become necessary to develop a theory of matrices of which some basic sections will be explained in this chapter. The concept of matrix has been generalized, the theory has been extended far beyond the modest aims of this book and is now applied nearly everywhere in Mathematics.
In Chapter 1, a matrix has been defined as a rectangular scheme of m n numbers ordered in n rows and m columns. It has been shown later on (cf. 2.6.1) that the results of Chapter I, excluding 1.7, are not affected if elements of an arbitrary field K instead of numbers are arranged in a rectangular scheme to form a matrix. This generalization will be used here. If the elements of a matrix A are elements of a field K, then A is said to be a matrix over the field K. If A is a matrix over K, it is obviously also a matrix over every extension of K. It is sometimes necessary to extend a field over which the matrices are supposed to be situated. On the other hand, it is often necessary to restrict the considerations to matrices, the elements of which belong to a certain integral domain D in K; these matrices are called matrices over D. Thus, a matrix over D is simultaneously a matrix over K, whereas not every matrix over K is a matrix over D. With a few exceptions, the matrices under consideration in this chapter will be square matrices, i.e., the number of rows and columns will be the same; this number is said to be the degree of the matrix. Thus

is a matrix* of degree n. The notation (1) is very convenient for general investigations. Throughout this chapter, the elements of a matrix of any degree, denoted by capital letters A, B, C, ··· will be expressed by the corresponding small type with upper and lower indices as in (1) unless the elements are expressly given in a different form.
* Instead of brackets, pairs of vertical double bars are used to denote a matrix.
6.1 Addition and multiplication of matrices of degree n: Let K be a field, 0 its zero and 1 its unit element; let D be an integral domain in K (which may be identical with K), A, B, C, ··· matrices over D of degree n and, in particular, O be the matrix all elements of which are equal to zero. Using this notation, one defines addition and subtraction as follows:

From these definitions follow directly

and the notation

Thus, addition and subtraction are inverse operations The matrix -B has the elements -bik and the set of all matrices over D of degree n forms a module of which O is the zero-element. The elements of this module can be multiplied by the elements of D with the aid of the definition:
![]()
Hence, in order to multiply the matrix A by an element c of D, one must multiply every element of A by c. By (5), one has:

In (7), m denotes an element of the prime field of K which is obtained by repeated addition of 1 to itself [cf. 2.2.5). All these elements belong to D as D is an integral domain; on the other hand, all these elements m belong to the prime field of K (no restriction has been made for the characteristic of K). Denote by
![]()
the matrix of degree n, the elements of which are each equal to 0, except the common element of the r-th row and the s-th column which is assumed to be equal to 1. Then
![]()
The representation (9) of a matrix A is unique. The left hand side of (9) is O if and only if each of the n² elements aik are equal to zero. Hence the n² matrices (8) are independent over D and form therefore a base. Let D be a field (D = K), then the matrices form a vector space over K (cf. 2.6.1). Hence
Theorem 1: The matrices over K of degree n form a vector space of Rank n² over K. The matrices (8) form a base of that vector space.
Multiplication of matrices of degree n has been defined already in 1.1.11 by
![]()
if
. It has been proved in 1.11.1 (3) that
the associative law
![]()
holds for this multiplication. That the commutative law is not satisfied by the muliplication (10), is seen, e.g., from the example of the two matrices
![]()
which are obviously non-commutative. From (1) and (10) follow easily the distributive laws

In the notation introduced in 2.16, one obtains
Theorem 2: The matrices over D of degree n form a ring R(D, n).
The zero element of R(D,n) is O, its unit element E is the diagonal matrix (cf. 1.11.12) with diagonal elements equal to 1. R(D,n) contains a sub-ring D' which consists of those diagonal matrices in which all the diagonal elements are equal. By mapping each of these matrices onto its diagonal element, one gets an isomorphism between D' and the integral domain D. The elements of D' are commutative with every matrix A of R(D,n). Multiplication (5) of A by an element c of D can be replaced by multiplication of A by the matrix of D' which corresponds to c. The matrices of D' are sometimes called scalars, but this terminology will not be used here. Let A be a matrix of R(D,n), then one can form the powers E = A0, A = A1, AA = A2, ··· , AkA = Ak+1 and it follows from the associative law that Ar As = Ar+s = As Ar. Hence
![]()
form a system of matrices which are commutative.
Denote by A and B (with or without indices) matrices of R(D,n) and correspondingly by c,d elements of D, then dE = D belongs to D' and therefore
![]()
More generally,

Hence, if each Ai is commutative with each Bj, then SciAi and SdjBj are also commutative. Hence
![]()
are commutative. Now let
![]()
run over all the polynomials D[x], then
![]()
runs over a system of matrices of degree n which form a commutative sub-ring R(D,A) of R(D,n). Since R(D,n) has a base of n² elements, the elements (13) cannot be independent. Hence there exists a polynomial f(x) which differs from the zero-polynomial such that
![]()
Hence A is a root of a polynomial, but it is not necessarily a root of a polynomial which is irreducible in D[x], as f1(A) f2(A) = 0 does not imply that f1(A) or f2(A) is equal to O. Although R(D,A ) is a commutative ring containing the unit element E, it might not be an integral domain.
If two fields D1 and D2 have the same prime field, the rings R(D1, n) and R(D2,n) n) have the same zero element O and the same unit element E. When different zero or unit elements are considered simultaneously (e.g., 6.21), a proper distinction by indices will be made; otherwise the notation O and E will be employed.
6.11 The group G(K,n). If n > 1, the ring R(D,n) cannot be mapped isomorphically onto D, since D is an integral domain and R(D, n) is non-commutative. However, there exists a non-isomorphism mapping for which multiplication is invariant, but addition is not. This mapping is
![]()
Indeed, it has been shown in 1.11.3 that
![]()
Matrices of Rank < n are mapped onto 0, whereas matrices of Rank n are mapped onto the elements ¹ 0 of D. Consider now the case when D is a field K. The matrices of R(K, n) of Rank n are mapped onto the elements ¹ 0 of the field K. These elements of K form a multiplicative Abelian group (cf. 2.12), as the product of any two elements ¹ 0 of K is again such an element and the multiplication is commutative, associative and satisfies the law of inverse existence. The system G(K,n) of the matrices which are mapped onto that Abelian group has similar properties except that multiplication is not commutative. In order to characterize the nature of that system, we shall employ the
Definition: A system G of elements is said to be a group, if every ordered pair of elements a, b of G can be composed into an element ab of G and the composition satisfies the conditions:
1. It is associative.
2. There exists an element e in G (the unit
element) such that ae = a = ea holds
for every element sa of G.
3. There corresponds to every element a of G,
there corresponds an (inverse) element a-1 of
G such that aa-1 = e = a-1a.
There cannot exist more than one unit element in G, for if e and e' are unit-elements, ee' rnust be equal to e and to e'. Moreover, there exists only one inverse element, because ba = e implies baa-1 = ea-1 and therefore b = a-1. Similarly, one sees that ax = b has, for given a and b, one and only one solution, namely x=a-1b. Correspondingly, y=ba-1 is the only solution of ya=b.
In the particular cases when the composition is commutative, the conditions (4m), (5m), (6m) of 2.11 are satisfied; if one uses the sign of addition in place of the sign of multiplication, those three formulas are transformed into (4a), (5a), (6a). An additive commutative group is therefore a module; on the other hand, every module is obviously an additive commutative group. This justifies the term Abelian group introduced in 2.13. The terms Abelian group and commutative group are, of course, synonymous. Thus, the notion of group, which has been introduced here, appears to be a generalization of the Abelian group (module) which has been very often used in earlier chapters.
Theorem: The system G(K,n) of the matrices over the field K of degree n with determinant ¹0 is a group, the matrix multiplication being the composition.
Proof: That matrix multiplication is associative, and that there exists a unit element, namely E, has already been proved. It is therefore sufficient to demonstrate the existence of an inverse element A-1 satisfying
![]()
but it has already been shown in l-(ll).4 that when Ai k denotes the co-factor of ai k in A, and bi k = Ak i : detA, then B = A-1 satisfies (2), whence follows the theorem.
It follows from
![]()
that
![]()
Thus, in order to form the inverse of a product, one has to put the inverse values of its factors, but in the opposite order.
6.13 The ring R(D):*
*May be omitted at a first reading.
The operations of addition and multiplication, as defined in 6.1 (1) and (10), cannot be applied without restrictions to rectangular matrices. In order to apply 6.1 (1), one must assume that the two matrices A and B have the same number of rows and that they have the same number of columns, which may differ from the number of rows. Moreover, 6.1 (10) can be applied if and only if the number of rows of A is equal to the number of columns of B; both these conditions cannot hold simultaneously, unless the two matrices are square and of the same degree. This difficulty can be overcome to a certain extent by a method similar to that applied during addition and multiplication of polynomials (cf. 2.32). It may be remembered that one is allowed to add higher terms with zero-coefficients to a polynomial f(x) or that one may omit such terms, and that this operation does not alter f(x). Similarly, one may supplement a matrix by adding rows composed of zeros below it, and similarly extending it to the right hand side by columns of zeros; all these matrices may be considered to be equivalent. In doing so, one replaces the investigation of the matrices by that of classes of matrices, which differ only by zero-rows (below) and zero-columns (on the right hand side). Obviously, an equality of matrices defined in this manner satisfies the conditions for equivalence 2.12.
Now, let A and B be two matrices over D; by adding zero-rows and zero-columns, one can extend them to square matrices A' and B' of a sufficiently high degree, say n', and to square matrices A* and B* of any degree n" > n'. Let

then one gets S" from S' by adding n" n' rows below and the same number of columns on the right hand side to S', and in the same manner one gets Q" from G'. Thus (1) defines addition and multiplication of classes of rectangular matrices of any number of rows and columns, the result being independent of the choice of the square representatives A' and B' of the two classes. As addition and multiplication defined in this manner satisfy (3), (4), (11) and (12) of 6.1, these classes form a ring R(D). This ring is non-commutative and does not contain a unit element, although there exist unities for every element of the ring which reproduce it (but not every element of R(D)) by multiplication. For example, let a, b, g be the elements of R(D) which are represented by the matrices

respectively. Then ag = ga = a, whereas

It seems to be reasonable to overcome this difficulty by extending every finite matrix to an (enumerably) infinite matrix by addition of zero-rows and zero-columns and to consider only matrices with an (enumerable) infinity of rows and columns. The infinite diagonal-matrix, where all diagonal elements are equal to 1, is a unit element in this system. This procedure displays much analogy to the step leading from polynomials to power series, and, as in the theory of power series, considerations concerning convergence become necessary here. However, such investigations of infinite matrices and the corresponding vector spaces lie beyond the scope of this book.
6.13 Notation and formulae: A matrix can be subdivided by horizontal and vertical lines into smaller arrays, and these again can be considered as matrices (sub-matrices) and denoted by capital letters. This procedure will be applied here to square matrices only and, as a matter of convenience, the intersection by horizontal lines is supposed to be symmetric to the intersection by vertical lines; thus the matrices in the diagonal are always supposed to be square.
It is not always convenient to give a particular notation for each element or for each sub-matrix in the array of a matrix. Portions left empty will be assumed to be filled with zeros, whereas those portions which are marked with asterisks are occupied by elements of any kind (zeros or non-zeros). Using this notation, one arrives directly at the following formulae from the definition of multiplication:

In these formulae, matrices with the same index are assumed to be of the same degree (which, in particular, may be 1). Denote the two matrices on the left hand side of (2) by A and B, respectively, then

If one interchanges the rows and columns of a matrix, say A, one gets the transpose of A,denoted by

In the notation of l.ll.ll

These matrices are very useful for representing n-vectors. (x) is often called a column vector, (x)T a row vector.
Let A and B be matrices over a field K and of degree n, let B have Rank n, then B-1 exists and
![]()
is also a matrix over K of degree n. A' is said to be the transform of A by B and to be similar to A. This will be denoted by
![]()
Similarity satisfies the conditions for equivalence (cf. 2.13). Indeed,
A = E-1AE ~
A (law
of reflexivity). (7) implies A =
BA'B-1, whence A ~ A' (law
of symmetry). (7) and A" = C-1
A'C together imply A"=(BC)-1A(B)~A
(law of
transitivity), whence similar
matrices form classes.
Let f(x) = a0 + a1x + ··· + anxn and f(A) = O, then
![]()
Hence follows the
Theorem: If A is a root of a polynomial, the matrices similar to A are roots of the same polynomial.
In some respect, the properties of the classes of similar matrices are more interesting than the matrices themselves as will be shown next.
6.2 Transformation of vector spaces:
Let W be a vector space of Rank n over the field K (cf. 2.61) and
![]()
be a base of W. Then every element of W can be expressed by
![]()
where x1, ··· , xn are elements of K. Thus every element of W can be represented by an n-vector (x1, ··· , xn) as has been done in Chapter 1, but this representation depends on the choice of the bass (cf. 2.61). If (1) is selected as the base, then the vectors ei are represented as unit-n-vectors, but if a different system of n independent vectors is chosen, then the vectors of the new base are represented by unit-n-vectors. For example, let
![]()
be another base of W, where
![]()
then the elements bik belong to K and form a matrix B, with
![]()
On the other hand, (5) implies the independence of the vectors (3), whence (5) is the necessary and sufficient condition for (3) to form a base of W. Every vector of W can be represented as a linear function of the base (1) as well as of the base (3). Compare the coefficients of these functions, i.e., the coordinates of the representing n-vectors:
![]()
As the vectors ei are independent, one has for i = 1, ··· ,
![]()
Again, every n-vector can be represented by a column vector* (cf. 6.13 (3)). Thus the last equation can be replaced by the matrix equation
![]()
whence
![]()
* Instead of using column vectors, one can represent the n-vectors by row vectors (x)T and (y)T which are linked by (x)T = (y)T BT.
From these considerations follows
Theorem 1: Let (1) and (3) be two bases of the same vector space W and be linked by (4). If the same vector is represented by (x) in System (1) and by (y) in System (3), then (x) and (y) are interrelated by (6) and (6').
In particular, when Base (1) is used, the vector ek is represented by the k-th unit vector and bk by the k-th column of the matrix B, whereas when (3) is used, ek is represented by the k-th column of B-1 and bk by the k-th unit vector.
Consider now a linear transformation A of the vector space W. By A, the vectors forming Base (1) are transformed into certain other vectors of W, which can again be represented by the base, say
![]()
then an arbitrary vector x = Saikxkei is transformed into

This equation can be written as a matrix equation
![]()
Hence, using any particular base (1), a linear transformation of W can be expressed by (8). On the other hand, if a matrix A over K is given, (8) determines a linear transformation of W (cf. 1.11). In particular, if det A ¹ 0, the vector space is mapped onto itself and there exists an inverse operation to A which is a linear transformation. If det A = 0, then W is mapped onto a sub-space W', the Rank of which is equal to Rank A. If one uses a different base, say (3), then x is represented as an n-vector (y) and x' by (y'). It follows from (6) and (8) that
![]()
whence follows
Theorem 2: If, under the assumptions of Theorem 1, the linear transformation A of W is expressed by (8) when the vectors are represented with the aid of Base (1), then A is expressed by (9) when the same vectors are represented with the aid of (3).
Again, let (1) denote any n independent vectors of W, then a linear transformation A is uniquely determined by the vector transformation (1). Comparing (7) and (7') and setting aik = ck, one obtains immediately
Theorem 3: If by a linear transformation A a set of n independent vectors (1) is transformed
![]()
then the transformation A is represented by the transposed matrix CT when (1) is used as base.
If in (9), B runs over all the matrices over K of degree and Rank n, then (3) runs over all the bases of W and B-1AB over all matrices similar to A (cf. 6.13). Thus, there does not correspond to a linear transformation A a single matrix A, but the full class of similar matrices. On the other hand, a matrix, say A, does not correspond to a single linear transformation A as it generates different transformations, according to different bases employed. Of course, the matrix A generates A if Base (1) is used; however, if a different base, say, (3) is used, A generates the same linear transformation as is generated by the matrix BAB-1 in connection with Base (1). The matrices, which represent this transformation with the aid of all possible bases, are similar to A. Thus, they form the same class of matrices as those which represent A, whence there is a (1,1) correspondence between the classes of similar matrices and classes of linear transformations, but a correspondence between single matrices and transformations demands the distinction of a particular base. This interconnection shows the importance of the notion of similarity of matrices.
Let A and B be two linear transformations which, with the aid of Base (1) are represented by A and B, respectively. Assume that, when x runs over the vector space W,
![]()
Then one gets linear transformations x ® x3, to be denoted by
![]()
and (for any pair a, b of elements of K) c ® ax1 + bx2 is denoted by
![]()
When Base (1) is used, (10) is represented by BA and (11) by aA+bB, whence the linear transformations of W form a ring by using any particular base; this ring is mapped isomorphically onto R(K,n). In general, different bases furnish isomorphisms. The unit element corresponds to E, whatever base may be in use, whence it is denoted by E.
6.21 Permutations as linear transformations: Let

be a permutation of n objects (cf. 0.3). Then there exists a linear transformation of a vector space of Rank n over K which interchanges correspondingly the vectors of a particular basis e1, ··· , en:
![]()
for k = 1, ··· , n. Then
where,
since ak= j,
![]()
Hence
![]()
where

The matrix P has in every column exactly one
element which is equal to 1, the other elements being
equal to 0. In the k-th column, the element 1
stands in row ak. As the numbers ak take
every value 1, ··· , n once and only once as k runs over 1,
··· , n, there is in every row exactly one element which is
equal to 1. Now, let Q be any matrix of degree n which
shows in every row and in every column exactly one element 1,
all other places being occupied by zero elements; if in the k-th
column, the element 1 stands at bk, then b1,
··· , bn form a permutation of 1, ··· , n. Hence
the permutation
of the base is effected by the linear transformation
(x') = Q(x). Thus, the matrices which have in
every row and in every column one element equal to 1 and
all other elements equal to 0 are exactly those which
effect a permutation of the corresponding base. If one computes
det P as the sum of n! products of elements taken of n different
rows and n different columns, one finds that only one of these
products differs from 0.
![]()
In particular, let P represent a transposition (i,k), then pik = pki = 1 and pjj = 1 for i ¹ j ¹ k, whereas the other elements are zeros. In this case, det P = -1. Consider now 2 permutations of the same base

and, as above, let P and Q be the corresponding matrices. Perform first
![]()
and then
(cf. 0.32), then one gets the permutation
which transforms ![]()
Let (x") = Q(x')
, then
whence (x") = QP(x) and ![]()
Thus, a composition of two permutations corresponds to a composition of the matrices representing them. The theory of permutations appears to be a special case of the theory of matrices; every even (odd) permutation consists of an even (odd) number of transpositions; the determinant of a matrix, corresponding to a transposition, is equal to -1, whence a permutation is even or odd according to whether its determinant is equal to + 1 or - 1.
As a corollary of the preceding considerations, one finds that the two products of two matrices of degree n which have in every row and every column exactly one element equal to 1, the other elements being 0, are matrices of the same type. It readily seen that these n ! matrices form a group (cf. 6.11).
Again consider the matrix P representing the
permutation
of the base- vectors e1,
··· , en. The same transformation of the
vector space is represented by B-1PB if the basis 6.2
(3) is used. In general, this matrix does not represent a
permutation as the vectors of the new base will not be
interchanged by the transformation, but transformed into other
vectors; however, if the two bases differ only by the order of
the vectors, B-1PB represents the permutation of the
vectors
![]()
For example, one can select B m such a way that
the different cycles which constitute the permutation
form
sets of consecutive indices of (6). Let b1,
··· , br form a cycle, then

where

is a matrix of degree r.
If B is arranged by its cycles and the number of the cycles is m, then

where the C's are matrices of the type (7).
Example:

Hence

In order to have the cycles formed by consecutive digits, one has to interchange 2 and 4. Hence put

Then

where

Again, consider (8). lf Gi-1CiGi = Hi, then it follows from 6.13 (4) that

In order to give P a simpler form, it is sufficient to investigate the transformation of the matrices (7) which correspond to a cyclic permutation of the vectors e1, ··· , er. Obviously, the vector e1+ ··· + er is transformed into itself by C. Now, we will investigate whether there exist in the vector space V over the field R of real numbers with the base e1, ··· , er other vectors x ¹ 0 which are transformed into a vector lx, where l is real. Let x = z1e1+···+ zrer , then by (7)
![]()
Hence lr = 1. As l is supposed to be real, there are the two cases

If x ® ± x, then x/z ® ± x/z, whence the arbitrary factor z can be omitted.
Now, two cases have to be considered:
![]()


Let
A be a linear transformation of a vector space
of Rank n over the field R of real numbers and n
independent vectors be interchanged by A, the
permutation being
; suppose
is composed of r= s + t
cycles, where s is the number of odd cycles (which indeed are
even permutations !) and t the number of even cycles. Then one
can represent A by the matrix (7) and obtain by
a further transformation a diagonal system of matrices where the
C's are replaced by H's. After a further transformation which
only interchanges the base vectors, one finds

where the unit matrices Er and Et are of the degree of their subscripts, s £ s is the number of the even cycles of more than two elements and t £ t the number of the odd cycles of more than one element.
As an example, consider a 6-dimensional
Euclidean vector space in which 5-vectors are interchanged by the
permutation
1 as above. Then this
transformation can be expressed by

Give a geometrical interpretation to this result!