CSCI 150: PreLab 5

The Game of Life
Due: 9AM on Wednesday, March 1 (in the back of the classroom King 106)

In this prelab you will formulate some of the ideas necessary to complete Lab 05. Please type your solutions, and hand in a paper copy before the beginning of lecture on Wednesday. Remember, no late prelabs allowed!

Watching

Now that you've done some basic image manipulation, here's a neat video of some relatively recent research on content-aware image rescaling. It's worth noting that by the end of this semester, you'll know enough to implement this yourself (although Python is unfortunately too slow with images for this sort of application in real-time).

1. Did watch the video? (Yes or no will suffice.)

List Practice

2. Write a chunk of code that creates a list containing the first 30 Fibonacci numbers (1, 1, 2, 3, 5, 8, and so on). Don't use any variables other than the list itself, and an index variable (i.e. don't duplicate the code we developed earlier to compute the Fibonacci numbers -- use the list!).

3. Write a chunk of code that prints the second largest integer in a list A of integers. You may assume A has already been created and assigned values, and that all values are distinct. Don't use any Python list methods (like sort), and don't change the list. You should be able to find the second largest integer in a single pass through the list.

4. Write pseudocode for an algorithm that counts the number of pixels of every possible red value in a Picture. That is, the algorithm keeps track of how many pixels have a red value of 0, how many have a red value of 1, and so on, up to how many have a red value of 255.

Looking for a Match

As you may know, proteins are chains of molecules called amino acids. There are 20 amino acids, each of which is typically represented by a single letter, and any protein can be specified by its sequence of amino acids. This sequence determines the properties of the protein, including its 3D structure.

Left: A general amino acid (structure of R determines the particular amino acid).
Right: 3D structure of a protein. Image source: wikipedia.org.

When a new protein is found, one way in which we might attempt to guess the functionality of that protein would be to see if it contains certain markers common to a known class of proteins. For example (and an entirely bogus example at that), suppose we discover a new protein, that we've named Duane, with the following amino acid sequence:

STTECQLKDNRAWTSLFIHTGHTECA

We may also suspect that Duane might belong to one of two possible classes of proteins: Spiffs and Blorts. As you well know, most Spiffs contain the pattern TECQRKMN or at least something close to it. That is, most of the sequences in the class of Spiff proteins have the subsequence TECQRKMN with only a few of the letters changed. Blorts, meanwhile, have the pattern ALFHHTTGT, or something very similar.

In this case, we can deduce that Duane is most likely a Spiff: Duane contains the pattern TECQLKDN which only has 2 mismatches from TECQRKMN (the errors are marked with a ^ below).

TECQLKDN
TECQRKMN
    ^ ^ 
                

The closest pattern to the Blort sequence is

SLFIHTGHT
ALFHHTTGT
^  ^  ^^ 
                

which has 4 mismatches.

Describe the Problem:

The problem you will be solving on your lab is as follows.

Input: A file that contains a string s representing a protein sequence, along with some number of strings, each representing a marker sequence.
Goal: For each marker sequence, find its best match in the protein sequence and report its location and the number of errors in the match.

Understand the Problem:

The file test.txt is in the format you should expect for your input. In particular, the first line will always contain the protein sequence. Following the protein sequence will be some number of marker sequences. For each of these sequences, you should report the location of the best match, and the number of errors at that location.

For example, the contents of test.txt are as follows:

STTECQLKDNRAWTSLFIHTGHTECA
TECQRKMN
ALFHHTTGT
TTECQ
HT
ZZZ
TTZZZRAWT
                      
For this file your program should have something like the following output:

Example Output

  Sequence 1 has 2 errors when aligned at position 2.
  Sequence 2 has 4 errors when aligned at position 14.
  Sequence 3 has 0 errors when aligned at position 1.
  Sequence 4 has 0 errors when aligned at position 18.
  Sequence 5 has 3 errors when aligned at position 0.
  Sequence 6 has 5 errors when aligned at position 5.
                        

Design an Algorithm:

In order to solve this problem, you will need to figure out how to find the best match between a single marker sequence and the original protein sequence.
5. Write pseudocode for an algorithm that finds the best fit for a given subsequence to a given protein. Assume you have a String or (if you'd prefer) an list of characters P (representing the protein) and a shorter String or list of characters S (representing the marker subsequence). Your pseudocode algorithm should find the location at which to align the subsequence with the protein so as to minimize the number of mismatches. You should print both the starting index of the best alignment as well as the number of mismatches at that location. If there are multiple best alignments, use the earliest appearing one.

For example, if P contains the characters

STTECQLKDNRAWTSLFIHTGHTECA

and S contains the characters

ALFHHTTGT

then your algorithm should print

Best match site: 14
Mismatches: 4

since the best match occurs starting with the 14th element in P and has 4 mismatches, as discussed above.

The Game of Life

The Game of Life, created by mathematician John Conway, is what is referred to as a cellular automaton. It is a set of rules rules which are used to generate patterns that evolve over time. Despite the name, the Game of Life is not a game; it is really a simulation. In some ways, it can be thought of as an extremely simplified biological simulator which produces unexpectedly complex behavior.

The Game of Life is played on an infinite board made up of square cells. It takes way too long to draw an infinite board, so we'll make do with a small finite piece. Each cell can be either live or dead. We'll indicate a live cell as a red square, and a dead cell as a black one. The board begins in some initial configuration, which just mean a setting of each cell to be either live or dead (generally we'll start mostly dead cells, and only a few live cells).

A configuration of a 10-by-10 portion of the board with 9 live cells.

The board is repeatedly updated according to a set of rules, thereby generating a new configuration based on the previous configuration. Some dead cells will become live, some live cells will die, and some cells will be unchanged. This configuration is then updated according to those same rules, producing yet another configuration. This continues in a series of rounds indefinitely (or until you get bored of running your simulation).

Rules of Life

The rules are pretty simple: to figure out whether a cell (x,y) will be live or dead in the following round, you just look at the 8 neighbors of that cell (those that share a corner or an edge, so N, S, W, E, NW, NE, SW and SE). What happens to (x,y) next round depends on the number of its neighbors who are live and whether it is currently live or not. In particular:

  • If a cell has 4 or more live neighbors, it dies (overcrowding).
  • If a cell has 0 or 1 live neighbors, it dies (loneliness... *sniff*).
  • If a cell has exactly 2 live neighbors, nothing changes (if it was live, it stays live, if it was dead, it stays dead).
  • If a cell has exactly 3 live neighbors, it comes to life (I don't have a good explanation for this one, but it does seem to make for cool patterns).

For example, consider the following 3 initial configurations, and the two configurations that follow each.

Three initial configurations and two subsequent iterations.

In the first example, both live cells have only one live neighbor, so they both die. Any dead cell has at most two live neighbors, so no new live cells spawn. Thus in one step, there are no live cells. Clearly, at this point, the configuration is stable.

In the second example, two live cells have only one neighbor, so both die. But the third cell lives, and the cell to its immediate left has exactly 3 live neighbors, so it spawns. On the next iteration, we find ourselves in a case similar to the previous example, and all cells die.

Note that we can't set a cell to be live or dead the instant we determine its status for the subsequent round; we will likely need to know whether it is alive or dead on this round to determine the future status of other nearby cells. To see this, consider the second example. We can immediately tell that the top-most live cell will die. But had we set it to dead immediately, then when we got to the second live cell, it would have only had 1 live neighbor and we would have (erroneously) determined that it too must die. Thus it is critical that we first determine for every cell whether or not it will be live, and only after doing so update the status of each.

In the last example, all currently living cells die; the middle cell has too many neighbors, and the other have too few. However, four dead cells have exactly 3 live neighbors, and so those cells spawn. In the following round, there are neither cells that die nor cells that spawn, so we have a stable configuration. In general, a pattern that doesn't change is called a still-life.

While all of these patterns stabilized quickly, some patterns take a long time to stabilize, and some never do. Of those that never stabilize, some at least have a regularity to them; they eventually eventually repeat states. These are said to be oscillators. Others never repeat the same state again, and produce an infinite number of configurations.

Describe the Problem:

In this lab, you will create a visual simulation of the game of life for a number of preset starting conditions.

Understand the problem:

Consider the following four intial configurations.

Initial configurations A, B, C and D.

For each of these, you should specify the next four generations of each. You can either draw pictures, or type solutions. If you do the latter, I suggest you use a fixed width-font (like courier) and use a period (.) for a dead cell, and an X for a live cell. For example, the initial configurations would look like:

.....   .....   .....   .X...
.....   .XXX.   .XX..   ..X..
.XXX.   XXX..   .XX..   XXX..
.....   .....   .....   .....
.....   .....   .....   .....
                        

6. Specify the next 4 generations of configuration A.

7. Specify the next 4 generations of configuration B.

8. Specify the next 4 generations of configuration C.

9. Specify the next 4 generations of configuration D.

Honor Code

If you followed the Honor Code in this assignment, write the following sentence attesting to the fact at the top of your homework.

I affirm that I have adhered to the Honor Code in this assignment.