# Linear algebra

##### Experimental html version of downloadable textbook, see https://theartofhpc.com
$% mathjax inclusion. \newcommand\inv{^{-1}}\newcommand\invt{^{-t}} \newcommand\bbP{\mathbb{P}} \newcommand\bbR{\mathbb{R}} \newcommand\defined{ \mathrel{\lower 5pt \hbox{{\equiv\atop\mathrm{\scriptstyle D}}}}} \newcommand\macro[1]{\langle#1\rangle} \newcommand\dtdxx{\frac{\alpha\Delta t}{\Delta x^2}} \newcommand\toprule{\hline} \newcommand\midrule{\hline} \newcommand\bottomrule{\hline} \newcommand\nobreak{} \newcommand\multicolumn[3]{#3}$

13.1 : Norms
13.1.1 : Vector norms
13.1.2 : Matrix norms
13.2 : Gram-Schmidt orthogonalization
13.3 : The power method
13.4 : Nonnegative matrices; Perron vectors
13.5 : The Gershgorin theorem
13.6 : Householder reflectors
Back to Table of Contents

# 13 Linear algebra

In this course it is assumed that you know what a matrix and a vector are, simple algorithms such as how to multiply them, and some properties such as invertibility of a matrix. This appendix introduces some concepts and theorems that are not typically part of a first course in linear algebra.

## 13.1 Norms

crumb trail: > norms > Norms

A norm is a way to generalize the concept of absolute value to multi-dimensional objects such as vectors and matrices. There are many ways of defining a norm, and there is theory of how different norms relate. Here we only give the basic concepts.

### 13.1.1 Vector norms

crumb trail: > norms > Norms > Vector norms

A norm is any function $n(\cdot)$ on a vector space $V$ with the following properties:

• $n(x)\geq0$ for all $x\in V$ and $n(x)=0$ only for $x=0$,

• $n(\lambda x)=|\lambda|n(x)$ for all $x\in V$ and $\lambda\in\bbR$.

• $n(x+y)\leq n(x)+n(y)$

For any $p\geq1$, the following defines a vector norm:

$$|x|_p = \sqrt[p]{\sum_i|x_i|^p}.$$

Common norms are the 1-norm, 2-norm, and infinity norm:

• The 1-norm $\|\cdot\|_1$ is the sum of absolute values:

$$\| x\|_1 = \sum |x_i|.$$

• The 2-norm $\|\cdot\|_2$ is the root of the sum of squares:

$$\| x\|_2 = \sqrt{ \sum x^2}.$$

• The infinity-norm $\|\cdot\|_\infty$ norm is defined as $\lim_{p\rightarrow\infty}\|\cdot\|_p$, and it is not hard to see that this equals

$$\|x\|_\infty=\max_i |x_i|.$$

### 13.1.2 Matrix norms

crumb trail: > norms > Norms > Matrix norms

By considering a matrix of size $n$ as a vector of length $n^2$, we can define the Frobenius matrix norm:

$$\|A\|_F=\sqrt{\sum_{i,j}|a_{ij}|^2}.$$

norms}:

$$\|A\|_p=\sup_{\|x\|_p=1}\|Ax\|_p= \sup_x\frac{\|Ax\|_p}{\|x\|_p}.$$

From their definition, it then follows that

$$\|Ax\|\leq\|A\|\|x\|$$

for associated norms.

The following are easy to derive:

• $\|A\|_1=\max_j\sum_i|a_{ij}|$, the maximum column-absolute sum;

• $\|A\|_\infty=\max_i\sum_j|a_{ij}|$, the maximum row-absolute sum.

By observing that $\|A\|_2=\sup_{\|x\|_2=1} x^tA^tAx$, it is not hard to derive that $\|A\|_2$ is the maximal singular value of $A$, which is the root of the maximal eigenvalue of $A^tA$, which is the largest eigenvalue of $A$ is $A$ is symmetric.

The matrix condition number is defined as

$$\kappa(A)=\|A\|\,\|A\inv\|.$$

In the symmetric case, and using the 2-norm, this is the ratio between the largest and smallest eigenvalue ; in general, it is the ratio between the largest and smallest singular value  .

## 13.2 Gram-Schmidt orthogonalization

crumb trail: > norms > Gram-Schmidt orthogonalization

The GS algorithm takes a series of vectors and inductively orthogonalizes them. This can be used to turn an arbitrary basis of a vector space into an orthogonal basis; it can also be viewed as transforming a matrix $A$ into one with orthogonal columns. If $Q$ has orthogonal columns, $Q^tQ$ is diagonal, which is often a convenient property to have.

The basic principle of the GS algorithm can be demonstrated with two vectors $u,v$. Suppose we want a vector $v'$ so that $u,v$ spans the same space as $u,v'$, but $v'\perp u$. For this we let

$$v'\leftarrow v-\frac{u^tv}{u^tu}u.$$

It is easy to see that this satisfies the requirements.

Suppose we have an set of vectors $u_1,\ldots,u_n$ that we want to orthogonalize. We do this by successive applications of the above transformation:

For $i=1,\ldots,n$:
For $j=1,\ldots i-1$:
let $c_{ji}=u_j^tu_i/u_j^tu_j$
For $i=1,\ldots,n$:
update $u_i\leftarrow u_i-u_jc_{ji}$

Often the vector $v$ in the algorithm above is normalized; this adds a line

$u_i\leftarrow u_i/\|u_i\|$

to the algorithm. GS orthogonalization with this normalization, applied to the columns of a matrix, is also known as the QR factorization  .

Exercise Suppose that we apply the GS algorithm to the columns of a rectangular matrix $A$, giving a matrix $Q$. Prove that there is an upper triangular matrix $R$ such that $A=QR$. (Hint: look at the $c_{ji}$ coefficients above.) If we normalize the orthogonal vector in the algorithm above, $Q$ has the additional property that $Q^tQ=I$. Prove this too.
End of exercise

The GS algorithm as given above computes the desired result, but only in exact arithmetic. A computer implementation can be quite inaccurate if the angle between $v$ and one of the $u_i$ is small. In that case, the MGS algorithm will perform better:

For $i=1,\ldots,n$:
For $j=1,\ldots i-1$:
let $c_{ji}=u_j^tu_i/u_j^tu_j$
update $u_i\leftarrow u_i-u_jc_{ji}$

To contrast it with MGS  , the original GS algorithm is also known as CGS  .

As an illustration of the difference between the two methods, consider the matrix

$$A= \begin{pmatrix} 1&1&1\\ \epsilon&0&0\\ 0&\epsilon&0\\ 0&0&\epsilon \end{pmatrix}$$

where $\epsilon$ is of the order of the machine precision, so that $1+\epsilon^2=\nobreak1$ in machine arithmetic. The CGS method proceeds as follows:

• The first column is of length 1 in machine arithmetic, so

$$q_1 = a_1 = \begin{pmatrix} 1\\ \epsilon\\ 0\\ 0 \end{pmatrix} .$$

• The second column gets orthogonalized as $v\leftarrow a_2-1\cdot q_1$, giving

$$v= \begin{pmatrix} 0\\ -\epsilon\\ \epsilon\\ 0 \end{pmatrix} , \quad\hbox{normalized:}\quad q_2 = \begin{pmatrix} 0\\ -\frac{\sqrt 2}2\\ \frac{\sqrt2}2\\ 0 \end{pmatrix}$$

• The third column gets orthogonalized as $v\leftarrow a_3-c_1q_1-c_2q_2$, where

$$\begin{cases} c_1 = q_1^ta_3 = 1\\ c_2 = q_2^ta_3=0 \end{cases} \Rightarrow v= \begin{pmatrix} 0\\ -\epsilon\\ 0\\ \epsilon \end{pmatrix} ;\quad \hbox{normalized:}\quad q_3= \begin{pmatrix} 0\\ \frac{\sqrt2}2\\ 0\\ \frac{\sqrt2}2 \end{pmatrix}$$

It is easy to see that $q_2$ and $q_3$ are not orthogonal at all. By contrast, the MGS method differs in the last step:

• As before, $q_1^ta_3=1$, so

$$v\leftarrow a_3-q_1 = \begin{pmatrix} 0\\ -\epsilon\\ \epsilon\\ 0 \end{pmatrix} .$$

Then, $q_2^tv=\frac{\sqrt2}2\epsilon$ (note that $q_2^ta_3=0$ before), so the second update gives

$$v\leftarrow v- \frac{\sqrt2}2\epsilon q_2= \begin{pmatrix} 0\\ \frac\epsilon2\\ -\frac\epsilon2\\ \epsilon \end{pmatrix} ,\quad\hbox{normalized:}\quad \begin{pmatrix} 0\\ \frac{\sqrt6}6\\ -\frac{\sqrt6}6\\ 2\frac{\sqrt6}6 \end{pmatrix}$$

Now all $q_i^tq_j$ are on the order of $\epsilon$ for $i\not=\nobreak j$.

## 13.3 The power method

crumb trail: > norms > The power method

The vector sequence

$$x_i = Ax_{i-1},$$

where $x_0$ is some starting vector, is called the \indexterm{power method} since it computes the product of subsequent matrix powers times a vector:

$$x_i = A^ix_0.$$

There are cases where the relation between the $x_i$ vectors is simple. For instance, if $x_0$ is an eigenvector of $A$, we have for some scalar $\lambda$

$$Ax_0=\lambda x_0 \qquad\hbox{and}\qquad x_i=\lambda^i x_0.$$

However, for an arbitrary vector $x_0$, the sequence $\{x_i\}_i$ is likely to consist of independent vectors. Up to a point.

Exercise Let $A$ and $x$ be the $n\times n$ matrix and dimension $n$ vector

$$A = \begin{pmatrix} 1&1\\ &1&1\\ &&\ddots&\ddots\\ &&&1&1\\&&&&1 \end{pmatrix} ,\qquad x = (0,…,0,1)^t.$$

Show that the sequence $[x,Ax,\ldots,A^ix]$ is an independent set for $i Now consider the matrix$B$: $$B=\left( \begin{array}{cccc|cccc} 1&1&&\\ &\ddots&\ddots&\\ &&1&1\\ &&&1\\ \hline &&&&1&1\\ &&&&&\ddots&\ddots\\ &&&&&&1&1\\ &&&&&&&1 \end{array} \right),\qquad y = (0,…,0,1)^t$$ Show that the set$[y,By,\ldots,B^iy]$is an independent set for$i
End of exercise

While in general the vectors $x,Ax,A^2x,\ldots$ can be expected to be independent, in computer arithmetic this story is no longer so clear.

Suppose the matrix has eigenvalues $\lambda_0>\lambda_1\geq\cdots \lambda_{n-1}$ and corresponding eigenvectors $u_i$ so that

$$Au_i=\lambda_i u_i.$$

Let the vector $x$ be written as

$$x=c_0u_0+\cdots +c_{n-1}u_{n-1},$$

then

$$A^ix = c_0\lambda_0^iu_i+\cdots +c_{n-1}\lambda_{n-1}^iu_{n-1}.$$

If we write this as

$$A^ix = \lambda_0^i\left[ c_0u_i+c_1\left(\frac{\lambda_1}{\lambda_0}\right)^i+ \cdots +c_{n-1}\left(\frac{\lambda_{n-1}}{\lambda_0}\right)^i \right],$$

we see that, numerically, $A^ix$ will get progressively closer to a multiple of $u_0$, the dominant eigenvector  . Hence, any calculation that uses independence of the $A^ix$ vectors is likely to be inaccurate. Section  5.5.9 discusses iterative methods that can be considered as building an orthogonal basis for the span of the power method vectors.

## 13.4 Nonnegative matrices; Perron vectors

crumb trail: > norms > Nonnegative matrices; Perron vectors

If $A$ is a nonnegative matrix, the maximal eigenvalue has the property that its eigenvector is nonnegative: this is the the Perron-Frobenius theorem  .

Theorem If a nonnegative matrix $A$ is irreducible, its eigenvalues satisfy

• The eigenvalue $\alpha_1$ that is largest in magnitude is real and simple:

$$\alpha_1> |\alpha_2|\geq\cdots.$$

• The corresponding eigenvector is positive.

End of theorem

The best known application of this theorem is the Googe PageRank algorithm; section  9.5  .

## 13.5 The Gershgorin theorem

crumb trail: > norms > The Gershgorin theorem

Finding the eigenvalues of a matrix is usually complicated. However, there are some tools to estimate eigenvalues. In this section you will see a theorem that, in some circumstances, can give useful information on eigenvalues.

Let $A$ be a square matrix, and $x,\lambda$ an eigenpair: $Ax=\lambda x$. Looking at one component, we have

$$a_{ii}x_i+\sum_{j\not=i} a_{ij}x_j=\lambda x_i.$$

Taking norms:

$$(a_{ii}-\lambda) \leq \sum_{j\not=i} |a_{ij}| \left|\frac{x_j}{x_i}\right|$$

Taking the value of $i$ for which $|x_i|$ is maximal, we find

$$(a_{ii}-\lambda) \leq \sum_{j\not=i} |a_{ij}|.$$

This statement can be interpreted as follows:

The eigenvalue $\lambda$ is located in the circle around $a_{ii}$ with radius $\sum_{j\not=i}|a_{ij}|$.

Since we do not know for which value of $i$ $|x_i|$ is maximal, we can only say that there is some value of $i$ such that $\lambda$ lies in such a circle. This is the Gershgorin theorem  .

Theorem Let $A$ be a square matrix, and let $D_i$ be the circle with center $a_{ii}$ and radius $\sum_{j\not=i}|a_{ij}|$, then the eigenvalues are contained in the union of circles $\cup_i D_i$.
End of theorem

We can conclude that the eigenvalues are in the interior of these discs, if the constant vector is not an eigenvector.

## 13.6 Householder reflectors

crumb trail: > norms > Householder reflectors

In some contexts the question comes up how to transform one subspace into another. Householder reflectors a unit vector $u$, and let

$$H= I-2uu^t.$$

For this matrix we have $Hu=-u$, and if $u\perp v$, then $Hv=v$. In other words, the subspace of multiples of $u$ is flipped, and the orthogonal subspace stays invariant.

Now for the original problem of mapping one space into another. Let the original space be spanned by a vector $x$ and the resulting by $y$, then note that

$$\begin{cases} x = (x+y)/2 + (x-y)/2\\ y = (x+y)/2 - (x-y)/2. \end{cases}$$

FIGURE 13.1: Householder reflector.

In other words, we can map $x$ into $y$ with the reflector based on $u=(x-y)/2$.

We can generalize Householder reflectors to a form

$$H=I-2uv^t.$$

The matrices $L_i$ used in LU factorization (see section  5.3  ) can then be seen to be of the form $L_i = I-\ell_ie_i^t$ where $e_i$ has a single one in the $i$-th location, and $\ell_i$ only has nonzero below that location. That form also makes it easy to see that $L_i\inv = I+\ell_ie_i^t$:

$$(I-uv^t)(I+uv^t) = I-uv^tuv^t = 0$$

if $v^tu=0$.

Back to Table of Contents