Gaussian Elimination

Table of Contents

We arrive at the proverbial main event that we've been so unapologetically forward referencing up to now. I'd like to preface what follows with a couple personal opinions about learning this material.

So, as they say, without further ado…

Example. One more interruption. Let's start with a visualization. Gaussian elimination is one of those algorithms which is simpler in how it looks when it runs than when it's written down in code. Step through this a couple times, see if you can anticipate where we're going.

The Algorithm

As a reminder, the goal of Gaussian elimination is, given any matrix, convert it to its unique equivalent reduced-echelon form. A bit more pseudocode-y, we're defining a function with the following input/output behavior.

FUNCTION GE(A):
  # INPUT: m × n matrix A
  # OUTPUT: equivalent m × n RREF matrix
  ...

We've already set the groundwork for defining Gaussian elimination. It has two phases:

  • a forward eliminiation phase which converts a matrix to (just) echelon form
  • a backwards substitution phase which converts a (just) echelon form matrix to a RREF

In essence, all we're doing is codifying and generalizing the process we used to solve systems of linear equations in our first chapter.

FUNCTION fwd_elim(A):
  # INPUT: m × n matrix A
  # OUTPUT: equivalent m × n echelon form matrix
  ...

FUNCTION back_sub(A):
  # INPUT: m × n echelon form matrix A
  # OUTPUT: equivalent m × n RREF matrix
  ...

FUNCTION GE(A):
  RETURN back_sub(fwd_elim(A))

If we want to short-circuit before back substitution in the case of inconsistent systems (if all we care about is consistency, we can determine that from any echelon form), we can rewrite Gaussian elimination to include this.

FUNCTION GE'(A):
  # INPUT: m × n matrix A
  # OUTPUT: equivalent m × n echelon form matrix A,
  #         which is RREF if A represents a consistent system
  A ← fwd_elim(A)
  if [A has a leading entry in its last column]:
    RETURN A
  else:
    RETURN back_sub(A)

We'll stick with the first version here. Most of the remaining work then is dealing with these two phases.

Forward Elimination

This phase, in rough terms, follows what we did for small systems. We incrementally eliminate leading variables from all the equations below a given equation, going from top to bottom.

That is, we eliminate the first variable from all but the first equation, then the second variable from all but the first and second, then the third from all but the first, second and third, etc. etc.

The only detail we need to be weary about for now is the fact that variable \(i\) may not appear in the equation \(i\). So we may need to swap rows in order to make sure equation \(i\) has variable \(i\).

But that's a bit of a lie too. Variable \(i\) may not appear in any of the remaining equations. Then we need to make sure that the equation \(i\) has a leftmost leading entry. This is so, as we eliminate variables, we maintain the requirements of echelon form (i.e., that the new leading entry of every row is to the right of the one above it).

We can write this as the following psuedocode. Do your best to understand what's going on here, how this maps onto the rough description above.

FUNCTION fwd_elim(A):
  # INPUT: m × n matrix A
  # OUTPUT: equivalent m × n echelon form matrix
  FOR [i from 1 to m]:
    IF [rows i...m are all-zeros]:
      RETURN A
    ELSE:
      (j, k) ← [position of leftmost entry in the rows i...m]
      [swap row i and row j]
      FOR [l from i + 1 to m]:
        [zero out A[l, k] using a replacement operation]
  RETURN A

Example. Coming back to the matrix from the top of this chapter, we can split the process given and isolate the forward elimination part.

This pseudocode can then be pretty readily converted into Python code which works on SymPy matrices. Note that this code differs from the psuedocode in that it mutates the matrix \(A\) and does not return it.

from sympy import *

def leftmost_nonzero(A, curr_row):
    for j in range(A.cols):
        for i in range(A.rows):
            if not A[i, j].is_zero:
                return (i + curr_row, j)
    return

def fwd_elim(A):
    for i in range(A.rows):
        if A[i:,:].is_zero_matrix:
            return
        (j, k) = leftmost_nonzero(A[i:,:], i)
        A[i,:], A[j,:] = A[j,:], A[i,:] # SWAP ROWS
        for l in range(i + 1, A.rows):
            A[l,:] -= A[l, k] / A[i, k] * A[i,:] # ZERO OUT A[l, k]
    return

A = Matrix([
    [1, 1, 1, 1],
    [2, 0, 3, -1],
    [3, 1, -3, 3]
])
fwd_elim(A)
pprint(A)
⎡1  1   1   1 ⎤
⎢             ⎥
⎢0  -2  1   -3⎥
⎢             ⎥
⎣0  0   -7  3 ⎦

Exercise. Find an echelon form of the following matrix.

\begin{bmatrix} 0 & 1 & 2 & 2 \\ 1 & 3 & 7 & 8 \\ 1 & -2 & -3 & 2 \end{bmatrix}

Back Substitution

You've probably guessed it by now, but back substitution also goes essentially how it went when we solved small linear systems by hand.

Once we have a matrix in echelon form, the only thing we need to do to make it RREF is:

  • divide every row by its leading entry (so that the leading entry becomes 1)
  • zero out the entries above leading entries

This phase is a fair amount simpler than the elimination phase (both conceptually and in terms of running time). In psuedocode it looks something like this:

FUNCTION back_sub(A):
  # INPUT: m × n echelon form matrix
  # OUTPUT: equivalent m × n RREF matrix
  FOR [i from 1 to m]:
    IF [row i has a leading entry]:
      j ← index of leading entry of row i
      R_i(A) ← R_i(A) / A[i, j]            # DIVIDE BY THE LEADING ENTRY
      FOR [k from 1 to i - 1]:
        R_k(A) ← R_k(A) - R[k, j] * R_i(A) # ZERO OUT R[k, j] ABOVE THE LEADING ENTRY
  RETURN A

Example. Coming again back to our matrix from the top, we can take a look second part of the process, which is the back substitution phase.

Exercise. Find the RREF of the following matrix (the same one as in the previous exercise).

\begin{bmatrix} 0 & 1 & 2 & 2 \\ 1 & 3 & 7 & 8 \\ 1 & -2 & -3 & 2 \end{bmatrix}

Exercise. Implement back_sub as a function in Python which mutates SymPy matrices.

And that's it. If this were an algorithms class we'd dwell much more on this. Instead we'll use this as the basis of an intuition that we will build over time for solving systems of linear equations by hand. We'll also come back to all this when we talk about NumPy and floating-point error.

Example. Here is a "typical" run of Gaussian elimination. Try to follow along and internalize the rough order of operations. I recommend just stepping through this a couple times, getting a sense for it.

Example. It may also help to see Gaussian elimination in the simplest case, when a matrix is row equivalent to the identity matrix, and the order of row operations follows a very regular pattern.

Using Sympy

As a reminder, outside of exams, the real computer-science-y way to get the RREF of a matrix is to use SymPy:3

from sympy import *
A = Matrix([
    [1, 1, 1, 1],
    [0, -2, 1, -1],
    [3, 1, -3, 3]
])

pprint(A.rref())
print()
pprint(A.rref()[0])
⎛⎡1  0  0  5/7 ⎤           ⎞
⎜⎢             ⎥           ⎟
⎜⎢0  1  0  3/7 ⎥, (0, 1, 2)⎟
⎜⎢             ⎥           ⎟
⎝⎣0  0  1  -1/7⎦           ⎠

⎡1  0  0  5/7 ⎤
⎢             ⎥
⎢0  1  0  3/7 ⎥
⎢             ⎥
⎣0  0  1  -1/7⎦

Again, the second argument argument holds the indices of the pivot columns of the matrix. Just remember that if you want to use a.rref() to grab the first element as in a.rref()[0].

Footnotes:

1

This is a presumption for students at BU taking this course.

2

I like to think of Gaussian elimination as an informal process that I can imagine in my minds eye, the way one might imagine a web-crawler traversing page links, or an knitting machine constructing a fabric, without knowing how it works exactly.

3

Of course, on assignments, you may be asked to show your work, and this won't be much help, but you can use it to check your answer.