# CSCI 150: PreLab 5

Lists and Strings
Due: 9AM on Wednesday, March 7

In this prelab you will formulate some of the ideas necessary to complete Lab 05. Please type your solutions, and hand in a paper copy before the beginning of lecture on Wednesday. Remember, no late prelabs allowed!

## Watching

Now that you've done some basic image manipulation, here's a neat video of some relatively recent research on content-aware image rescaling. It's worth noting that by the end of this semester, you'll know enough to implement this yourself (although Python is unfortunately too slow with images for this sort of application in real-time).

1. Did watch the video? (Yes or no will suffice.)

## List Practice

2. Write a chunk of code that creates a list containing the first 30 Fibonacci numbers (1, 1, 2, 3, 5, 8, and so on). Don't use any variables other than the list itself, and an index variable (i.e. don't duplicate the code we developed earlier to compute the Fibonacci numbers -- use the list!).

3. Write a chunk of code that prints the second largest integer in a list A of integers. You may assume A has already been created and assigned values, and that all values are distinct. Don't use any Python list methods (like sort), and don't change the list. You should be able to find the second largest integer in a single pass through the list.

4. Write pseudocode for an algorithm that counts the number of pixels of every possible red value in a Picture. That is, the algorithm keeps track of how many pixels have a red value of 0, how many have a red value of 1, and so on, up to how many have a red value of 255.

## Looking for a Match

As you may know, proteins are chains of molecules called amino acids. There are 20 amino acids, each of which is typically represented by a single letter, and any protein can be specified by its sequence of amino acids. This sequence determines the properties of the protein, including its 3D structure.

Left: A general amino acid (structure of R determines the particular amino acid).
Right: 3D structure of a protein. Image source: wikipedia.org.

When a new protein is found, one way in which we might attempt to guess the functionality of that protein would be to see if it contains certain markers common to a known class of proteins. For example (and an entirely bogus example at that), suppose we discover a new protein, that we've named Duane, with the following amino acid sequence:

`STTECQLKDNRAWTSLFIHTGHTECA`

We may also suspect that Duane might belong to one of two possible classes of proteins: Spiffs and Blorts. As you well know, most Spiffs contain the pattern TECQRKMN or at least something close to it. That is, most of the sequences in the class of Spiff proteins have the subsequence TECQRKMN with only a few of the letters changed. Blorts, meanwhile, have the pattern ALFHHTTGT, or something very similar.

In this case, we can deduce that Duane is most likely a Spiff: Duane contains the pattern TECQLKDN which only has 2 mismatches from TECQRKMN (the errors are marked with a ^ below).

```TECQLKDN
TECQRKMN
^ ^
```

The closest pattern to the Blort sequence is

```SLFIHTGHT
ALFHHTTGT
^  ^  ^^
```

which has 4 mismatches.

#### Describe the Problem:

The problem you will be solving on your lab is as follows.

Input: A file that contains a string s representing a protein sequence, along with some number of strings, each representing a marker sequence.
Goal: For each marker sequence, find its best match in the protein sequence and report its location and the number of errors in the match.

#### Understand the Problem:

The file test.txt is in the format you should expect for your input. In particular, the first line will always contain the protein sequence. Following the protein sequence will be some number of marker sequences. For each of these sequences, you should report the location of the best match, and the number of errors at that location.

For example, the contents of test.txt are as follows:

```STTECQLKDNRAWTSLFIHTGHTECA
TECQRKMN
ALFHHTTGT
TTECQ
HT
ZZZ
TTZZZRAWT
```
For this file your program should have something like the following output:

### Example Output

```  Sequence 1 has 2 errors when aligned at position 2.
Sequence 2 has 4 errors when aligned at position 14.
Sequence 3 has 0 errors when aligned at position 1.
Sequence 4 has 0 errors when aligned at position 18.
Sequence 5 has 3 errors when aligned at position 0.
Sequence 6 has 5 errors when aligned at position 5.
```

#### Design an Algorithm:

In order to solve this problem, you will need to figure out how to find the best match between a single marker sequence and the original protein sequence.
5. Write pseudocode for an algorithm that finds the best fit for a given subsequence to a given protein. Assume you have a String or (if you'd prefer) an list of characters P (representing the protein) and a shorter String or list of characters S (representing the marker subsequence). Your pseudocode algorithm should find the location at which to align the subsequence with the protein so as to minimize the number of mismatches. You should print both the starting index of the best alignment as well as the number of mismatches at that location. If there are multiple best alignments, use the earliest appearing one.

For example, if P contains the characters

`STTECQLKDNRAWTSLFIHTGHTECA`

and S contains the characters

`ALFHHTTGT`

```Best match site: 14
Mismatches: 4```

since the best match occurs starting with the 14th element in P and has 4 mismatches, as discussed above.

## Mind Mastery

Mastermind is a neat (although oftentimes frustrating) puzzle game. It works a something like this: There are two players. One player is the codemaker (your porgram), the other is the codebreaker (the user). The codemaker chooses a sequence of four colored pegs, out of a possible six colors (red, blue, green, yellow, orange, and purple). He may repeat colors and place them in any order he wishes. This sequence is hidden from the codebreaker. The codebreaker has 10 chances to guess the sequence. The codebreaker places colored pegs down to indicate each of her guesses. After each guess, the codemaker is required to reveal certain information about how close the guess was to the actual hidden sequence.

#### Describe the Problem:

In this part of the lab, you will create a program to play Mastermind, where computer is playing the codemaker, and the human user is the codebreaker. Thus your program needs to generate a secret code, and repeatedly prompt the user for guesses. For each guess, your program needs to give appropriate feedback (more detail below). The game ends when either the user guesses correctly (wins) or uses up 10 guesses (loses).

#### Understand the Problem:

The trickiest part of this game is determining how to provide feedback on the codebreaker's guesses. In particular, next to each guess that the codebreaker makes, the codemaker places up to four clue pegs. Each clue peg is either black or white. Each black peg indicates a correct color in a correct spot. Each white peg indicates a correct color in an incorrect spot. No indication is given as to which clue corresponds to which guess.

For example, suppose that the code is RYGY (red yellow green yellow). Then the guess GRGY (green red green yellow) would cause the codemaker to put down 2 black pegs (since guesses 3 and 4 were correct) and 1 white peg (since the red guess was correct, but out of place). Note that no peg was given for guess 1 even though there was a green in the code; this is because that green had already been "counted" (a black peg had been given for that one).

As another example, again using RYGY as our code, the guess YBBB would generate 1 white peg and 0 black; yellow appears twice in the code, but the guess only contains one yellow peg. Likewise, for the guess BRRR, only 1 white peg is given; there is an R in the code, but only one.

Check here for an online graphical version of the game. Note this version uses red instead of black pegs to indicate a correct guess. Also note, this isn't the pinacle of gaming technology.
6. Assuming the code is RYGY, fill in the appropriate number of black and white pegs for each guess.

guess black pegs white pegs
YYYY
YRYR
BBPO
PGYR
YYYG
RYGY

7. Consider the following algorithmic approach for calculating the number of white pegs to be awarded for a given guess.
``` set a white counter to 0
loop through the four positions of the guess
loop through the four positions of the code
if the current code character matches the current guess character
and the guess and code positions are not the same,
increment the white counter and exit the inner loop
```
What goes wrong with this algorithm? Give an example of a code, a guess, the value that should be generated and the value that this pseudocode would generate.

## Honor Code

If you followed the Honor Code in this assignment, write the following sentence attesting to the fact at the top of your homework.

I affirm that I have adhered to the Honor Code in this assignment.