Echelon Forms
Table of Contents
In the next chapter, we'll look at Gaussian elimination, an algorithm which can be used to solve systems of linear equations. This is arguably the most important topic for the first part of the course.
That said, I think the best way to understand Gaussian elimination is from the punchline: Gaussian elimination converts any matrix into a unique row-equivalent matrix in row-reduced echelon form (RREF). RREF matrices are interesting because it's very easy to "read off" their solutions, no matter whether there are no solutions, one unique solution, or infinitely many solutions.1 In other words, RREF matrices constitute the "final matrices" that "represent solutions" which were hinted at in the previous chapter.
So before discussing how to solve systems of linear equations in general, we'll take one more scheduled detour to look at "solving" this particularly simple class of linear systems, i.e., those with RREF augmented matrices.2
Recap: Linear Systems with Unique Solutions
We've already seen what the "final matrix" look like when we're trying to solve a linear system in \(n\) variables with unique solution. It's an \(n \times (n + 1)\) matrix of the form:
\begin{bmatrix} 1 & 0 & \dots & 0 & b_1 \\ 0 & 1 & \dots & 0 & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \dots & 1 & b_n \end{bmatrix}which represents the linear system
\begin{align*} x_1 &= b_1 \\ x_2 &= b_2 \\ &\vdots \\ x_n &= b_n \end{align*}which has a unique solution \((b_1, b_2, \dots, b_n)\). In other words, it's a system whose coefficient matrix has \(1\) entries along the diagonal and \(0\) entries everywhere else.3 The point: it's very easy to "read off" what the solution is in this system, so if we can reduce a matrix to this form, then we can read off its unique solution easily.
But not all matrices can be reduced to a form like this. First off, the shape might not be right: if the augmented matrix of a system is \(2 \times 7\), we can't somehow end up with a \(2 \times 3\) or a \(6 \times 7\) matrix by row reductions.4 Secondly, not all matrices have unique solutions.5 This leaves us with the following questions:
- Can we characterize the matrices for which it's easy to "read off" their solutions?
- What do these matrices look like if there are no solutions?
- If there are infinitely many solutions, how do we characterize all possible solutions?
Echelon Form
We'll actually introduce two kinds of matrices, echelon form matrices and row-reduced echelon form (RREF) matrices (the reason will hopefully become clear, but I ask that you suspend your disbelief for now). Row-reduced echelon form is more restrictive than just echelon form; RREF matrices are also in echelon form, but not all matrices in echelon form are RREF matrices.6
Aside. Echelon is not a name (as many people, including myself, think when they first encounter this topic), it's a military term which describes a formation in which each successive unit in the formation is behind and to one side of the unit in front of it.
This imagery is an incredibly useful visual pun for understanding how echelon forms look.
First we have to set up some machinery. This is just a math-y formalization of exactly the description of the above aside.
Terminology. The leading entry of a row in a matrix its first nonzero entry.
Example. The leading entries of each row in the following matrix are underlined. The third row does not have a leading entry.
\begin{bmatrix} 0 & \underline{1} & -1 & 2 \\ \underline{1} & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \underline{4} & 0 \end{bmatrix}
Definition. A matrix is in echelon form if
- no all-zero rows appear above a row with a leading entry
- if a row has a leading entry and it is not the first row, then it appears to the right of the leading entry in the row above it
Example. The matrix from the previous example is not in echelon form. It has a all-zeros row which appears above a row with a leading entry, and the leading entry of the second row appears to the left of the leading entry in the first. But it is equivalent to a matrix in echelon form. Applying the transformations \(R_1 \leftrightarrow R_2\) and \(R_3 \leftrightarrow R_4\) gives us the matrix
\begin{bmatrix} \underline{1} & 0 & 1 & 0 \\ 0 & \underline{1} & -1 & 2 \\ 0 & 0 & \underline{4} & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}which satisfies the property that the leading entries create a kind of "cascading triangular" shape, with the all-zero rows at the end.
The other visual pun that tends to be used to understand echelon form is something like the following:
\begin{bmatrix} 0&\blacksquare&*&*&*&*&*&*&*&*\\ 0&0&0&\blacksquare&*&*&*&*&*&*\\ 0&0&0&0&\blacksquare&*&*&*&*&*\\ 0&0&0&0&0&\blacksquare&*&*&*&*\\ 0&0&0&0&0&0&0&0&\blacksquare&*\\ 0&0&0&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&0&0&0\\ \end{bmatrix}Here the \(\blacksquare\) entries represent nonzero entries and the \(*\) entries represent any number whatsoever. All the black squares appear to the right and below the black squares above it (where applicable) and the all-zero rows are at the bottom. If a matrix has this general "shape" then it is in echelon form.
Exercise. Convert the following matrix into echlon form by a short sequence of row reductions.
\begin{bmatrix} 0 & 0 & 0 & 0 & 2 & 1 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 & 2 \end{bmatrix}
Given an echelon form, its not yet easy to "read off" a solution if there is one. Take for example:
\begin{bmatrix} 1 & 4 & 3 & -3 \\ 0 & 2 & 6 & -1 \\ 0 & 0 & 3 & 1 \end{bmatrix}Can you tell immediately that \((2, -1.5, 0.\overline{3})\) is a solution?
Where echelon forms are interesting is in the case of inconsistent systems. Consider the simple 2D system
\begin{align*} 2x + 3y = 5 \\ -4x - 6y = 0 \\ \end{align*}which has the augmented matrix
\begin{bmatrix} 2 & 3 & 5 \\ -4 & -6 & 0 \end{bmatrix}which is row-equivalent to the matrix (by the operation \(R_2 \gets R_2 + 2R_1\))
\begin{bmatrix} 2 & 3 & 5 \\ 0 & 0 & 10 \end{bmatrix}which is in echelon-form (!), and which is represents the system
\begin{align*} 2x + 3y &= 5 \\ 0 &= 5 \end{align*}which is obviously inconsistent (\(0\) will never be equal to \(5\), ever), which means we now know we no longer have to do any reductions (which is great).
This turns out to be a general feature of echelon forms.
Theorem. Let \(A\) be the augmented matrix of an inconsistent linear system. If \(A \sim B\) and \(B\) is in echelon form, then the rightmost column of \(B\) contains a leading entry.
This leading entry represents the "obviously inconsistent" linear equations which equates \(0\) to an nonzero value (like 5). And its important to read this correctly: every echelon form to which \(A\) is equivalent has this property. Since we're often interested in these equivalent matrices, we'll call them, in a bit of terminological abuse, the echelon forms of A.
Thus, we will think of the general echelon form as a "pit-stop" we can make on our way to a RREF matrix (just as a reminder Gaussian elimination is about converting a matrix to an equivalent RREF matrix).
And if all we care about is characterizing the solutions, we can stop here!7
Row-Reduced Echelon Form
Our next goal is to further restrict the notion of echelon form to a form for which it is easy to "read off" solutions. This time, we start with the visual pun:
\begin{bmatrix} 0&1&*&0&0&0&*&*&0&*\\ 0&0&0&1&0&0&*&*&0&*\\ 0&0&0&0&1&0&*&*&0&*\\ 0&0&0&0&0&1&*&*&0&*\\ 0&0&0&0&0&0&0&0&1&*\\ 0&0&0&0&0&0&0&0&0&0\\ 0&0&0&0&0&0&0&0&0&0\\ \end{bmatrix}Again, \(*\) entries can be any number whatsoever, so this form differs in that
- leading entries are \(1\)
- there are \(0\) entries above every leading entry (not just below)
And that's all there is to it.
Definition. A matrix is in row-reduced echelon form (RREF) if
- it is in echelon form
- every leading entry is \(1\)
- every column containing a leading entry has \(0\) entries everywhere else
Example. One answer to the exercise above is
\begin{align*} \begin{bmatrix} 0 & 0 & 0 & 0 & 2 & 1 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 & 2 \end{bmatrix} \sim \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 0 & 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \end{align*}The matrix on the right is not quite in row-reduced echelon form. The leading entry of the third row is not \(1\) and there is a nonzero entry above it. But via a sequence of row operations, we can get to the matrix
\begin{bmatrix} 1 & 0 & -1 & -2 & 0 & -0.5 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0.5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}which is an RREF matrix.
Exercise. Determine a sequence of row operations for the previous example.
Before moving on, let's think about why RREF matrices are useful. It's clear that the RREF matrix in the previous example does not have a form like the one at the beginning of the chapter which represented a unique solution. If we take this matrix and write it as a system of linear equations we get:
\begin{align*} x_1 - x_3 - 2x_4 &= -0.5 \\ x_2 + 2x_3 + 3x_4 &= 1 \\ x_5 &= 0.5 \\ 0 &= 0 \end{align*}The third row has a familiar form, we get a value for \(x_5\) in any solution to this system. But in the case of the other variables, we're not able to completely isolate them. For example, \(x_1\) is not a fixed value in any solution but has a relationship with \(x_3\) and \(x_4\). Same with \(x_2\).
The (kind of magical) point: All the leading entries in each row are in fact isolated, in the sense that they can be written in terms of all the variables which are not in leading positions.
In this example, if we isolate \(x_1\) and \(x_2\) (and ignore the extraneous \(0 = 0\)) we get:
\begin{align*} x_1 &= -0.5 + x_3 + 2x_4 \\ x_2 &= 1 - 2x_3 - 3x_4 \\ x_5 &= 0.5 \end{align*}We interpret this as meaning that no matter what values we give to \(x_3\) and \(x_4\), once we fix those values, we can derive a solution to the above system.
So \((-0.5, 1, 0, 0, 0.5)\) is a solution, but so is \((0, 0, 0.5, 0, 0.5)\). In the first case, we've set \(x_3 = x_4 = 0\) and in the second we've set \(x_3 = 0.5\) and \(x_4 = 0\).
Exercise. Write down the system whose augmented matrix is
\begin{bmatrix} 0 & 0 & 0 & 0 & 2 & 1 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 \\ 2 & 2 & 2 & 2 & 2 & 2 \end{bmatrix}and verify that the above two solutions are, in fact, solutions to this system.
Again, I want to emphasize that this is a special feature of RREF matrices. If we instead had the matrix (which is not RREF)
\begin{bmatrix} 1 & 1 & -1 & -2 & 0 & -0.5 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0.5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}then trying to isolate the leading terms as above would yield:
\begin{align*} x_1 &= -0.5 - \underline{x_2} + x_3 + 2x_4 \\ \underline{x_2} &= 1 - 2x_3 - 3x_4 \\ x_5 &= 0.5 \end{align*}and now there's a more complex relationship between \(x_1\) and \(x_2\) because \(x_2\) appears on the LHS and RHS of equals signs.
The other important fact of RREF matrices is that every matrix is equivalent to (a unique) one.
Theorem.
- No two distinct RREF matrix are equivalent.
- Every matrix is equivalent to an RREF matrix.
Exercise. Convince yourself that this implies every matrix is equivalent to exactly one RREF matrix.
What's more, Sympy has a convenient function for getting the unique RREF of a matrix!
from sympy import * A = Matrix([ [1, 1, 1, 1], [0, -2, 1, -1], [3, 1, -3, 3] ]) pprint(A.rref()) print() pprint(A.rref()[0])
⎛⎡1 0 0 5/7 ⎤ ⎞ ⎜⎢ ⎥ ⎟ ⎜⎢0 1 0 3/7 ⎥, (0, 1, 2)⎟ ⎜⎢ ⎥ ⎟ ⎝⎣0 0 1 -1/7⎦ ⎠ ⎡1 0 0 5/7 ⎤ ⎢ ⎥ ⎢0 1 0 3/7 ⎥ ⎢ ⎥ ⎣0 0 1 -1/7⎦
The second argument argument holds the indices of the pivot columns of
the matrix. We won't use these for now, but they may be useful to
have in the future. Just remember that if you want to use a.rref()
to grab the first element as in a.rref()[0]
.
General form solutions
Taking stock, solving a general system of linear equations will follow this outline:
- Write your system as an augmented matrix
- Use Gaussian elimination to convert this matrix into an equivalent RREF matrix
- Read off the solution from the RREF matrix8
Our goal now is to deal with this last step, formalizing the process from the end of the last section. We begin with some terminology.
Terminology. The position of a leading entry in an echelon form of a matrix is called a pivot position. A column with a pivot position is called a pivot column.
We won't worry too much about why they're called pivot positions, suffices to say for now that it has to do with some of the nitty-gritty details of Gaussian elimination.
Note also that we're interested in the pivot positions/columns of the original matrix, not just a matrix in echelon form.
Example. The pivot positions of both equivalent matrices are underlined.
\begin{align*} \begin{bmatrix} \underline{0} & 0 & 0 & 0 & 2 & 1 \\ 0 & \underline{1} & 2 & 3 & 0 & 1 \\ 1 & 1 & 1 & 1 & \underline{1} & 1 \\ 2 & 2 & 2 & 2 & 2 & 2 \end{bmatrix} \sim \begin{bmatrix} \underline{1} & 1 & 1 & 1 & 1 & 1 \\ 0 & \underline{1} & 2 & 3 & 0 & 1 \\ 0 & 0 & 0 & 0 & \underline{2} & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \end{align*}
We want to have a uniform way of writing down the solution sets of a given linear system. There are a couple ways to do this, we'll use what's called general form also sometimes called parametric form.
Warning. In what follows, we assume we're working with an augmented matrix. If the matrix is not the augmented matrix of a system, then the terminology doesn't make sense.
Writing a general form solution is about isolating a collection of variables (called basic variables) and writing them in terms of the remaining variables (called free variables, becuase we're "free" to choose their values when we want to find a specific solution to a system). It's a way of describing not just single solutions, but solution sets "paramatrized" by choices of values for a collection of varaibles. With the terminology we have, we can formalize this a bit (and introduce more terminology).
Terminology. A variable is basic if its corresponding column is a pivot column. Otherwise it is called free.
Example. In the previous example, the basic variables are \(x_1\), \(x_2\), and \(x_5\) since the first, third, and fifth columns have leading entries. The remaining variables (\(x_3\) and \(x_4\)) are free variables.
At this point, there isn't much left to do but state everything we've said up to now in a concise way (and get lots of practice with the concepts).
HOW-TO. Writing a general-form solution from an RREF matrix
- Write the RREF as a system of linear equations.
- Drop any extraneous \(0 = 0\) equations. If the system includes the equation \(0 = 1\), then just write no solution. Otherwise continue.
- Isolate the leading terms of each equation, writing each leading term as an equation of the free variables.
- Write \(x_i\) is free for the remaining variables.9
Example. We come back to this RREF matrix:
\begin{bmatrix} 1 & 1 & -1 & -2 & 0 & -0.5 \\ 0 & 1 & 2 & 3 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 & 0.5 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}
We write this as the system:
\begin{align*} x_1 - x_3 - 2x_4 &= -0.5 \\ x_2 + 2x_3 + 3x_4 &= 1 \\ x_5 &= 0.5 \\ 0 &= 0 \end{align*}We drop the extranous \(0 = 0\) equation:
\begin{align*} x_1 - x_3 - 2x_4 &= -0.5 \\ x_2 + 2x_3 + 3x_4 &= 1 \\ x_5 &= 0.5 \\ \end{align*}We isolate the leading terms in each row:
\begin{align*} x_1 &= -0.5 + x_3 + 2x_4 \\ x_2 &= 1 - 2x_3 - 3x_4 \\ x_5 &= 0.5 \end{align*}We add lines which tell us which variables are free (which variables on which \(x_1\), \(x_2\), and \(x_5\) depend:
\begin{align*} x_1 &= -0.5 + x_3 + 2x_4 \\ x_2 &= 1 - 2x_3 - 3x_4 \\ x_3 &\quad \text{is free} \\ x_4 &\quad \text{is free} \\ x_5 &= 0.5 \end{align*}
Again, we can get a specific solution by choosing values for the free variables (\(x_3\) and \(x_4\)) and calculating the values of the basic variables.
Example. If we take \(x_3 = 1\) and \(x_4 = 1\), then we get the solution:
\begin{align*} x_1 &= -0.5 + 1 + 2 = 2.5 \\ x_2 &= 1 - 2 - 3 = -4 \\ x_3 &= 1 \\ x_4 &= 1 \\ x_5 &= 0.5 \end{align*}or \((-3.5, -2, 1, 1, 0.5)\) written as a point in \(\mathbb R^5\).
Exercise. Write the general-form solution for the following RREF matrix (and convince yourself that it is, in fact, RREF)
\begin{bmatrix} 1 & 2 & 0 & -2 & 4 \\ 0 & 0 & 1 & 3 & 5 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}Then write three distinct solutions to this system by choosing particular values for the free variables.
Footnotes:
And since the resultant matrix is row-equivalent to the starting matrix, these solutions are exactly the solutions to the starting linear system.
"Solving" is in quotes because this class represents, in essence, "trivial" linear systems.
This matrix, call the \(n \times n\) identity matrix, will become very familiar to us.
It may be obvious, but it's worth noting explicitly that the row operations never change the shape of the matrix
Perhaps less obvious, this implies that a system with a \(2 \times 7\) augmented matrix cannot have a unique solution.
We'll also often write "reduced echelon form" (dropping the "row-" part).
We often still want an RREF matrix, even if the matrix represents an inconsistent system, it depends on the application.
Again, in the case of inconsistent systems, Gaussian elimination, depending on the implementation, may stop at just echelon form because inconsistency can already be determined.
Typically we maintain the order of variables in a general-form solutions, so we would usually write \(x_2\) is free on the second line of a general-form solution.