Homework 7

Table of Contents

The following assignment is due Thursday 10/31 by 11:59 PM. You should submit all your written solutions to Gradescope as a single pdf. Follow the instructions in the programming problem for code solutions.

1. Stochastic Matrices

For each of the following matrices:

  • Draw the state diagram for \(A\). You should label each node with a positive integer for the column of \(A\) which describes its edges.
  • Determine if \(A\) is regular. If it is write down the smallest exponent \(k\) such that \(A^k\) has strictly positive values. If it is not, then justify your answer.
  • Determine the general form solution of the equation \((A - I)\mathbf x = \mathbf 0\). You may use a computer to do this. You answer should be expressed in fractions, not decimals.
  • Determine a steady state vector for \(A\). If the steady state vector is you unique, write the steady state is unique. You must do this by hand and show your work.

1.1. (5 points)

\begin{bmatrix} 0.9 & 0.6 \\ 0.1 & 0.4 \end{bmatrix}

1.2. (5 points)

\begin{bmatrix} 1 & 0.3 & 0.4 \\ 0 & 0.7 & 0.1 \\ 0 & 0 & 0.5 \end{bmatrix}

1.3. (5 points)

\begin{bmatrix} 0 & 0.5 & 0.5 \\ 0.5 & 0 & 0.5 \\ 0.5 & 0.5 & 0 \\ \end{bmatrix}

1.4. (5 points)

\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}

1.5. (5 points)

\begin{bmatrix} 0 & 0 & 0 & 0.5 \\ 1 & 0 & 0 & 0.3 \\ 0 & 1 & 0 & 0.2 \\ 0 & 0 & 1 & 0 \end{bmatrix}

1.6. (5 points)

\begin{bmatrix} 0.4 & 0.8 & 0.5 & 0 \\ 0.6 & 0.2 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0.5 & 1 \end{bmatrix}

2. Market Prediction

(6 points) Suppose you've been watching the stock price of your favorite company and you've discovered the following trends based on the data you've taken:

  • If the price is STEADY there's a 70% chance it will remain steady the next day, but a 20% chance it will go HIGH and a 10% chance it will dip LOW.
  • If the price is HIGH there's a 55% chance it will remain high the next day, but a 40% change it will go back to being STEADY, and just a 5% chance it will dip all the way down to LOW.
  • If the price is LOW, there is a 60% change it will remain low the next day, but a 30% chance it will reset to STEADY, and a 10% chance it will suddenly jump to HIGH.

Suppose that the price is currently STEADY. In the long term, is more likely to remain STEADY, go up HIGH, or dip down LOW? Justify your answer. You may use a computer to solve any systems you need to solve, and you must present your final answer in fractions, not decimals.

3. LU Factorizations

For each of the following matrices, determine its LU factorization by hand. You must show your work.

3.1. (2 points)

\begin{bmatrix} 2 & 3 \\ 1 & -4 \end{bmatrix}

3.2. (4 points)

\begin{bmatrix} -3 & 0 & -6 & -9 \\ 0 & 2 & 4 & -4 \\ 3 & 2 & 10 & 9 \\ -3 & 2 & -2 & -9 \end{bmatrix}

3.3. (4 points)

Without doing any row operations, determine the LU factorization for the matrix obtained by applying the row operations

\begin{align*} R_3 &\gets R_3 - 2R_2 \\ R_3 &\gets R_3 + R_1 \\ R_2 &\gets R_2 + R_1 \\ \end{align*}

to the matrix

\begin{bmatrix} 2 & 8 & -4 & 1 \\ 0 & 0 & 3 & 2 \\ 0 & 0 & 0 & -7 \end{bmatrix}

Justify your answer.

4. Sparse Factorization

In this problem, you will use the provided code in sparse.py to benchmark LU factorization and matrix inversion.

4.1. (2 points)

The provided code includes a function test_matrix which builds a matrix given a positive integer parameter. Write down the matrix test_matrix(4).

4.2. (4 points)

Determine the number of \(\textcolor{red}{\textit{nonzero}}\) entries in the inverse of test_matrix(10 ** 3). Then use scipy.linalg.lu to determine the LU factorization of test_matrix(10 ** 3) and determine the number of entries in \(L\) and \(U\). Read the linked documentation on scipy.linalg.lu for more details (you can ignore the permutation matrix in the given factorization). Report these three entry-counts in your solution.

4.3. (4 points)

Uncomment the code at the end of the file sparse.py and run it. Write down the printed values and discuss what they mean. In particular, is the amount of time that it takes to factor versus invert consistent with our discussion in lecture? (If you are running on a fast computer you may have to change \(n\) to \(15^3\) to see the difference).

5. ChatMarkov (Programming)

(12 points) Large language models like GPT-4 use fancy techniques to predict what word or sequence of words should follow a given piece of text. It’s possible to build a very unfancy version of this using random walks. Given a corpus of text, we can build a state diagram whose nodes are words, and whose edges indicate word adjacency in the corpus weighted proportionally to the number of occurrences of that adjacent pair. So words which tend to be next to each other will have larger weights on their edges.

As an example, suppose we used the statement

a dog and a cat and a bird

as our corpus. We can build a state diagram with nodes for each word

V = ["a", "dog", "and", "cat", "bird"]

and with edges for each pair of adjacent words

E = [("a", "dog"), ("dog", "and"), ("and", "a"), ("a", "cat"), ...]
W = [1/3         , 1             , 1           , 1/3         , ...]

weighted proportionally to the number of adjacencies relative to other adjacencies with the same starting word.

Rather than using a single statement for our corpus, we’ll use the entirety of Shakespeare’s sonnets to generate new (nonsense) sonnets by taking a random walk on this graph and collecting words along the way.

The process is roughly as follows:

  • Read in the text of Shakespeare’s sonnets and build the stochastic matrix for the state machine described above.
  • Perform a random walk on this matrix, keeping track of the nodes you’ve visited so far.
  • Use that list of nodes to generate a list of words which will make up the poem. Then format that poem so that it can be nicely printed.

You are given starter code in the file hw07.py. Don’t change the name of this file when you submit. Also don’t change any of the names of functions included in the starter code. The only changes you should make are to fill in the TODO items in the starter code. Most of the above process in implemented for you. All together you have to fill in two functions:

  • to_stochastic which converts a list of word pairs into a stochastic matrix as described above (Hint. Make sure to look at and understand numpy.sum).
  • random_walk which performs a random walk of a given length and returns the list of nodes visited. There is already an implementation of a single random step in the function random_step. You should use this function and collect the outputs into a list.

See the docstrings in the starter code for more details. Once you've completed this, you should be able to run your code using python3 hw07.py and it should print a (unhinged) poem for you:

from fairest wights, and so thine
own, in one of my sinful
then, and suffer dearth, painting thy
shadow’s form another; whose worth’s unknown,
although their treasure. so oft when
clouds to trust; enjoy’d no name,
blesses an older friend, you most
precious friends possess’d, desiring this more.
then, if thou leave? thy lovers
gone, save what they see thy
mind. these poor beauty of nature’s
riches from those errors hath her
pipe in my soul of faults
should despair, which shall stay. no
shape so suited, and straight will...

You will upload a single file hw07.py. You will also be provided with a text file sonnets.txt which contains the text of the entirety of Shakespeare’s sonnets. You do not need to upload this when you submit, you just need to upload hw07.py.