CSCI 241 - Homework 4:
Creating our first Unix tools

Due by 11:59.59pm Fri, 17 Oct 2008

Update: You may work with a partner on this project. Let me know who you are working with, and I'll set up a shared project for this assignment. Also, I've moved the due date back. I'd encourage you to get started before the last minute.

Introduction

For this assignment, you'll be creating a few tools that are Unix-like in their design. We've not discussed this in class, but the general Unix tool philosophy is that you have a program that does one thing (hopefully well) and can be composed together to perform more powerful operations.

You'll be creating two tools, one that will allow you to reformat text, and a second one that will allow you to sort text.

You'll also learn about the C99 replacement for atoi() and get some experience working with strings.


format

The first program you'll be creating will be called format. Its job will be to read in input and "neaten" it up. It reads in paragraphs of words and rearranges them such that they fit nicely onto a line of specified width inserting line breaks as needed. A paragraph is separated from other paragraphs by one or more empty lines (which might contain whitespace).

Example

INPUT:
Atop a large, ice-covered plateau,
 a struggle
 for survival is occurring. Two groups of mechanical 
                            creatures -- Chompers and 
Lobbers -- are battling to control this square,
                 slippery field for reasons that are beyond human 
comprehension.

However, the Lobbers believe that you,
 The Programmer
 knows what you are doing
and have decided that you will instruct them in their activities
    during
        this
            fateful
                day.
OUTPUT:
Atop a large, ice-covered plateau, a struggle for survival is occurring.
Two groups of mechanical creatures -- Chompers and Lobbers -- are
battling to control this square, slippery field for reasons that are
beyond human comprehension.

However, the Lobbers believe that you, The Programmer knows what you are
doing and have decided that you will instruct them in their activities
during this fateful day.

Command line arguments

format supports 4 command line arguments which change the behavior slightly.

Change line width

The -w flag allows you to change the width of the output lines. The default width is 72 characters per line.

Using the same input as above, we can specify our program to run as ./format -w 40 and get the following:

Atop a large, ice-covered plateau, a
struggle for survival is occurring. Two
groups of mechanical creatures --
Chompers and Lobbers -- are battling to
control this square, slippery field for
reasons that are beyond human
comprehension.

However, the Lobbers believe that you,
The Programmer knows what you are doing
and have decided that you will instruct
them in their activities during this
fateful day.

Right alignment

The -r flag allows you to specify that you want all the text aligned on the right side, not the left. Using the same input as above, we can specify our program to run as ./format -r and get the following:

Atop a large, ice-covered plateau, a struggle for survival is occurring.
       Two groups of mechanical creatures -- Chompers and Lobbers -- are
    battling to control this square, slippery field for reasons that are
                                             beyond human comprehension.

However, the Lobbers believe that you, The Programmer knows what you are
  doing and have decided that you will instruct them in their activities
                                                during this fateful day.

Fully justified

The -j flag allows you to fully justify the text. That is each line extends from the left side all the way to the maximum width of the line. To simplify things, do this for even the last line of a paragraph.

Using the same input as above, we can specify our program to run as ./format -j and get the following:

Atop a large, ice-covered plateau, a struggle for survival is occurring.
Two  groups  of mechanical  creatures  -- Chompers  and  Lobbers  -- are
battling to  control this  square, slippery field  for reasons  that are
beyond                        human                       comprehension.

However, the Lobbers believe that you, The Programmer knows what you are
doing and have decided that  you will instruct them in  their activities
during                 this                 fateful                 day.

Professor Woods came up with a nifty way to calculate the number of spaces that go between words in the fully justified case. If you know the number of words on a line, then you can figure the number of gaps that need to be filled.

Similarly, if you know the length of the total number of words on that line, you can determine the number of spaces that should be inserted.

You then can apply integer division to calculate the total number of spaces that should have been seen by the time you finish a gap.

                                (# of gaps so far) * (total # of spaces)
    # of spaces seen so far =   ----------------------------------------
                                            (total # of gaps)

As an example, imagine we have 4 words on a line, and 8 extra spaces. There are 3 gaps to fill. At the first one we put 2 spaces:

    1 * 8 / 3 = 2 spaces

At the second, 3 spaces because the total needs to be:

    2 * 8 / 3 = 5 spaces

And finally, 3 more spaces to finish off the set:

    3 * 8 / 3 = 8 spaces

While you aren't required to use this formula to determine spacing, it seems to work quite well. Certainly better than just pushing the excess into one of the edge positions. (That would have been 2-2-4 instead of 2-3-3.)

Skip multiple blank lines

Finally, you should also support the -s flag to indicate that you want multiple blank lines to be compressed into just one. (Normally, you'd have a bunch of empty paragraphs.)

INPUT:
I like cheese.







How about you?
OUTPUT:
I like cheese.

How about you?

Multiple options

These options should be cumulative -- you can specify width, alignment, and skipping of blank lines together. The -r and -j flags should not be used together. If they are, just use whichever was specified last.

Programming notes

Some guidelines you should follow when working on your solution:

You might also want to think about how you can break this down into the basic functionality and how each flag modifies the behavior. There are many ways to correctly implement these specifications -- read it in a line at a time and then process it OR read in a bunch of words, then assemble the lines from that.

You should be doing a process of stepwise refinement. Add in one new component and test to be sure it works before moving on to the next. Trying to do everything all at once will likely only lead to much confusion. Also, sketching out your design on paper beforehand is invaluable.

For example, when sketching things out, you decide you'd like it if you could just implement some of the functionality of Java's Scanner.next(), you'd have a design of how to handle the basic case. Then you should implement that function and then test to see if it behaves as it should on a variety of input. Once it is working, you can move on to the next phase of your design.


sort

The second tool you'll be creating is sort, modeled after the Unix tool of the same name. This program reads in lines of text and then sorts them based on specified criteria. By default, it does so using the ordering of strcmp().

Example

INPUT:
ALICE
bob
BOB
CAROL
alice
carol

INPUT:
ALICE
BOB
CAROL
alice
bob
carol

Folding case

Notice how all the capital letters come before the lower case ones? That is not a mistake. The numeric value of 'A' is 32 less than the value of 'a' so it is natural for them to come first.

However, sometimes you don't want this to take place. Therefore you should support the -f flag which folds lowercase letters into uppercase ones (i.e., case insensitive comparisons).

OUTPUT:
ALICE
alice
BOB
bob
CAROL
carol

Ordering of upper vs. lower case lines is up to the sort, not something you need to handle.

Numeric sorting

Sometimes, you want to sort lines based on a leading number. To do that you'll use the -n flag. Note that this should ignore leading whitespace, treat the first thing as a number, then sort based on the rest of the line as before if multiple lines have the same leading number.

INPUT:
    0    Alice
    99   Bob
    20   Carol
    172  Eve
    76   Mallory
    9    Trent
    59   Plod
    14   Steve
OUTPUT:
    0    Alice
    9    Trent
    14   Steve
    20   Carol
    59   Plod
    76   Mallory
    99   Bob
    172  Eve

I'd like you to do this by implementing long mystrtol(char *start, char **rest) based on the strtol() function added in to C99 to replace atoi().

This function should take a pointer to the start of a string, skip leading whitespace, and attempt to convert the next characters into a long integer. If it isn't a number, you should return 0.

The second parameter is where you can pass in the address to a character pointer. If this value is not NULL, you should store the address of the first character after the number you interpreted. If no number was interpreted, you should just set it to the start of the line, whitespace included.

NOTE: The real version of strtol() takes a third parameter that lets you specify the base of the number being interpreted, but we aren't implementing that.

Reversing the sort

Finally, you sometimes want things in descending order instead of ascending order. To do this, you will support the -r flag to reverse the sort, regardless of the other options specified. So you can do reverse numeric, folded, or normal sorts.

Programming notes

Some guidelines you should follow when working on your solution:


More on command line arguments

In addition to the flags listed above, I'd like you to implement a -h and -? flag that prints out a brief usage message and then exits the program with a non-zero value. You should do the same behavior if an unknown flag is passed to your program.

You should have your program's main function return 0 upon successful completion of the assigned task.


handin

README

Create a file called README that contains

  1. Your name and a description of the programs
  2. A listing of the files with a short one line description of the contents
  3. Any known bugs or incomplete functions
  4. Your affirmation as to the honor code if you followed it

Now you should make clean to get rid of your executables and handin your folder containing your source files, Makefile, and README.

    % cd ~/cs241
    % handin -c 241 -a 4 hw4
    % lshand

Grading

Here is what I am looking for in this assignment:


VI PoweredLast Modified: October 09, 2008 - Benjamin A. Kuperman