CSCI 241 - Homework 3

Due by 11:59:59pm Monday, 05 October 2015

For each item below, give a command that uses a single egrep or grep -E on /usr/share/dict/words.241 to find the matching words. For our purposes, consider only a, e, i, o, and u to be vowels. Put your answers in a file called README.

You may want to review the lecture notes and the class readings, and possibly work through an online tutorial, before beginning. The course home page has useful links to regex visualizers.

Protip: you can run export WORDS=/usr/share/dict/words.241 and then use $WORDS as your input file. Also, piping the output to wc -l will count the lines printed.
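
As a minimal sketch of that workflow, the snippet below points WORDS at a small temporary file rather than the real word list, so that it runs anywhere. The five sample words and the pattern 'an' are made up for illustration and are not one of the assigned problems.

```shell
# Hypothetical stand-in for /usr/share/dict/words.241 (assumption: a tiny
# sample list, one word per line, just so this sketch runs on any machine).
WORDS=$(mktemp)
printf '%s\n' apple banana cherry Zebra kiwi > "$WORDS"
export WORDS

# Print every line that matches the pattern.
grep -E 'an' "$WORDS"

# Pipe the same output to wc -l to count the matching lines instead.
count=$(grep -E 'an' "$WORDS" | wc -l)
echo "matches: $count"
```

On clyde you would skip the mktemp and printf lines and export the real path instead.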

  1. All words that contain exactly one lowercase vowel (5948 on clyde)
  2. All words that contain the lowercase vowels a, e, i, o, and u in that order (6 on clyde)
  3. All words that are exactly 22 lowercase letters long (2 on clyde)
  4. All words that have a 4-letter sequence repeated (24 on clyde)
  5. All words that start and end with the same 3-letter sequence (32 on clyde)
  6. All lowercase words that are made up of only consonant-vowel pairs, like banana, and are at least 6 letters long (545 on clyde)
  7. All words that end with their first 3 letters reversed like detected (14 on clyde)
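
The problems above lean on anchors, character classes, and backreferences. The sketch below shows the general shape of each on a made-up five-word list; neither pattern is an answer to a numbered problem. It assumes your grep accepts backreferences such as \1 with -E, which GNU and BSD grep do as an extension.

```shell
# Hypothetical sample list (assumption: not the course dictionary).
SAMPLE=$(mktemp)
printf '%s\n' apple banana Zebra kiwi book > "$SAMPLE"

# Anchors plus a character class: the whole line must be lowercase letters.
grep -E '^[a-z]+$' "$SAMPLE"

# A group plus a backreference: some character immediately followed by itself.
grep -E '(.)\1' "$SAMPLE"

# grep -c reports the number of matching lines directly.
lower=$(grep -Ec '^[a-z]+$' "$SAMPLE")
doubled=$(grep -Ec '(.)\1' "$SAMPLE")
echo "all lowercase: $lower, doubled letter: $doubled"
```

Without the ^ and $ anchors, the first pattern would match any line that merely contains a lowercase letter, which is a common source of counts that are too high.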

For this portion of the assignment, I want you to construct sed commands that do the following. (Don't forget the -E flag!)

  1. Replace all instances of "snow fall" or "wind chill" with "summertime"
  2. Assuming the input is a dictionary file like /usr/share/dict/words.241 (one word per line, in alphabetical order), print out all words between "computer" and "science"
  3. Replace all instances of "Teh" with "The" and "teh" with "the", but only in standalone words
  4. Move the last word on a line to the front
  5. Find lines where a word has been repeated on the same line and replace that line with the repeated word. Don't print the other lines.
  6. Delete all shell comment lines, that is, lines with optional whitespace followed by a #. Don't delete comments that are on lines with actual instructions (i.e., something other than whitespace before the #)
  7. Convert C block comments that are on one line, and at the end of that line, into line comments.
    So /* add things up */ would become // add things up
  8. Only print out lines that contain "cs 241", but change that to "CSCI 241"
  9. Take the previous command, but modify it to handle "CS" or "CSCI", with or without a space, and with any capitalization (e.g., "cScI241")
  10. Truncate all lines after exactly 20 characters.
  11. Replace all instances of "Thomas B. Wexler" with "T-Wex" (including variations with "Tom" and/or no middle initial)
  12. Assuming that a name is made up of two adjacent words that each start with a capital letter followed by one or more lowercase letters, anonymize the input by changing every name to just its initials. So "Roberto Hoyle" becomes "RH". Be sure to handle multiple names on a line.
  13. Check to see if there is a 10-digit number on a line, possibly with non-letters between the digits, and print it out in the format 1234567890
  14. If there is a 10-digit number on a line (not part of another word), reformat the number as (123) 456-7890
  15. Assume that the input is being piped from wget --quiet -O- (which will print the xkcd comic's HTML page to stdout). Print out the Image and Title information as follows:
    Title: I'm the Philosopher until someone hands me a burrito.
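
As a reminder of the sed -E mechanics these tasks rely on, here is a small sketch of capture groups and of combining -n with the p flag. The input lines are invented for illustration and neither command is an answer to a numbered task.

```shell
# Capture groups: with -E, plain ( ) create groups that \1 and \2 refer to.
# Here the two colon-separated fields are swapped.
swapped=$(printf 'key:value\n' | sed -E 's/^([^:]+):([^:]+)$/\2:\1/')
echo "$swapped"

# -n suppresses sed's default printing; the p flag on s/// then prints only
# the lines where the substitution actually happened.
only=$(printf 'error: disk\ninfo: ok\nerror: net\n' |
  sed -n -E 's/^error: (.*)$/E \1/p')
echo "$only"
```

Without -E, the parentheses would have to be written as \( and \), which is one reason forgetting the flag tends to produce patterns that silently match nothing.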

Hand in your README

Be sure your README has your name and the honor code statement, then hand it in (or hand in the folder containing it).

    % handin -c 241 -a 3 README

    % lshand 241

Once you are done, try out Regex Golf!

Last Modified: Sep. 24, 2015 - Roberto Hoyle