CSCI 241 - Homework 2:
Shell Scripting

Due by 11:59:59pm Sunday, 20 February 2016

Introduction

For this assignment you will be creating a number of shell scripts.

mypath

Create a shell script called mypath that takes zero or more program names as parameters. For each argument, the script should check whether a file of that name exists (and can be run by the user) in each of the directories of the current PATH. Order your output by the sequence of arguments on the command line first, and by the order of directories in PATH second. Here's my output on Clyde, with some extra newlines for line wrapping. Yours will depend on your PATH setting.

% ./mypath make emacs gzip 
/usr/bin/make: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
    dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
    BuildID[sha1]=0x1d76b881b71091d37e6653d7c8b8e19a2a414591, stripped
/usr/bin/X11/make: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
    dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
    BuildID[sha1]=0x1d76b881b71091d37e6653d7c8b8e19a2a414591, stripped
/usr/bin/emacs: symbolic link to `/etc/alternatives/emacs'
/usr/bin/X11/emacs: symbolic link to `/etc/alternatives/emacs'
/bin/gzip: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
    dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
    BuildID[sha1]=0xe144f688d03d65a26b2de66426ea63f4bbef2dd6, stripped

You'll probably want to look over the man pages for file and sh for insight. The behavior is similar to a combination of which -a and file. NOTE: you are not supposed to use "which -a" in your solutions -- you need to create that functionality yourself.
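One way to walk PATH by hand is to let the shell split it on colons. The following is a minimal sketch, not a full solution: find_in_path is a hypothetical helper name, and it only prints matching paths rather than describing them with file. It assumes no PATH entry contains wildcard characters.

```shell
# Sketch: print every executable match for one program name, in PATH order.
# find_in_path is a hypothetical helper name, not part of the assignment.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:                          # make the shell split $PATH on colons
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        candidate=$dir/$name
        # -f: regular file (symlinks are followed); -x: runnable by this user
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            echo "$candidate"
        fi
    done
    IFS=$old_ifs                   # restore normal word splitting
}
```

Calling find_in_path once per command-line argument gives the ordering described above: arguments first, PATH directories second.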

myseq

Create a shell script myseq that prints to the screen a sequence of numbers based on the command line arguments. Here is how I want you to interpret the arguments.

Three arguments: START STOP STEP
Two arguments: START STOP (assume step is 1)
One argument: STOP (assume start and step are both 1)

You may assume STEP is always positive. However, if STOP is less than START, you should subtract STEP at each iteration, and of course the condition on your loop should be different. HINT: because shell scripts are interpreted on the fly, you can just have local variables to hold your operator and test. If no arguments or too many arguments are given, you should print out a usage message and exit with a failure value. You might want to look at expr and test.

% ./myseq 100 -20 17
100
83
66
49
32
15
-2
-19
% ./myseq
Usage: ./myseq <start> <stop> <step>     # go from start to stop by step
       ./myseq <start> <stop>            # assume step is 1
       ./myseq <stop>                    # assume start and step are 1
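The hint about holding the operator and the test in variables could look something like the sketch below. count_range is a hypothetical helper name, all three arguments are assumed to be present, and the usage check is left out. It uses expr and backticks rather than $(( )).

```shell
# Sketch: pick the arithmetic operator and the loop test once, then reuse them.
# count_range is a hypothetical helper; argument checking is omitted.
count_range() {
    start=$1
    stop=$2
    step=$3
    if [ "$start" -le "$stop" ]; then
        op=+                       # counting up: add the step
        cmp=-le                    # keep going while i <= stop
    else
        op=-                       # counting down: subtract the step
        cmp=-ge                    # keep going while i >= stop
    fi
    i=$start
    while [ "$i" "$cmp" "$stop" ]; do
        echo "$i"
        i=`expr "$i" "$op" "$step"`
    done
}
```

With this sketch, count_range 100 -20 17 prints the same eight numbers as the sample run above, from 100 down to -19.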

sumup

Create a shell script sumup that reads a sequence of integers from the standard input and then prints out the total. You may assume that each number appears on its own line, is an int, and that the total won't cause any overflow. In shell, loops don't have to use test; they can use any other command (such as read) as their condition.

% ./myseq 100 -20 17 | ./sumup
324
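A loop whose condition is read rather than test might be sketched as follows. sum_lines is a hypothetical name, and the sketch relies on the stated assumption that every input line is a single integer (a blank line would make expr fail).

```shell
# Sketch: use read itself as the loop condition; it fails at end of input.
# sum_lines is a hypothetical helper name.
sum_lines() {
    total=0
    while read -r n; do
        total=`expr "$total" + "$n"`
    done
    echo "$total"
}
```

Piping the eight numbers from the myseq example into sum_lines gives 324, matching the run above.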

diskhog

Create a shell script diskhog that lists the 5 largest items (files or folders) in the current directory in decreasing order of size. Output the sizes in human readable format.

% cd ~/pub/cs241 
% ./diskhog 
3.9M    week03
572K    old
348K    hw06
152K    week06
112K    week05

You'll want to look through the manpages of du, cut, sort, xargs, and head (or tail).
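As a rough illustration of how those tools can chain together, here is one possible pipeline. It is a sketch under some assumptions: largest_items is a hypothetical name, hidden files are ignored because * does not match them, and names containing whitespace will confuse xargs.

```shell
# Sketch: measure in kilobytes so sort can compare numbers, then re-measure
# the winners in human-readable form. largest_items is a hypothetical name.
largest_items() {
    # du -sk   : one line per item, "<kilobytes><TAB><name>"
    # sort -rn : numeric sort, largest first
    # head     : keep the top N (default 5)
    # cut -f2- : drop the size column, keep the name
    # xargs    : hand the surviving names to du -sh, in the same order
    du -sk -- * | sort -rn | head -n "${1:-5}" | cut -f2- | xargs du -sh
}
```

The sizes are computed twice because human-readable values such as 3.9M and 572K do not sort correctly with a plain numeric sort.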

For extra fun, you could have it take a flag to change the number of items to display, or maybe another that limits it to just files or just directories.

linecount

Create a shell script called linecount that by default will report the total number of lines in all of the files in the current working directory (recursively).

Have your shell script support an optional argument that will be used as a file glob pattern for the types of files. The user is responsible for properly quoting things on the command line. For example, to get a sum of all of the lines in your java source files you would use:

% ./linecount '*.java'

Finally, if a glob is specified and additional arguments are supplied, treat each of those as a directory to examine recursively. So, if you wanted to know how many lines of code were in your 150 and 151 class folders, you might run:

% ./linecount '*.java' ~/cs150 ~/cs151

You'll want to take a look at wc, pushd/popd or cd, find, and test.
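One possible shape for this, leaning on find rather than pushd/popd or cd, is sketched below. count_lines is a hypothetical name, and this is only one of several reasonable approaches.

```shell
# Sketch: first argument is the glob (default: every file), the rest are
# directories (default: the current one). count_lines is a hypothetical name.
count_lines() {
    pattern=${1:-*}
    [ $# -gt 0 ] && shift          # drop the glob, leaving only directories
    [ $# -gt 0 ] || set -- .       # no directories given: use the current one
    # Concatenate every matching regular file and count the lines once.
    find "$@" -type f -name "$pattern" -exec cat {} + | wc -l
}
```

The glob has to stay quoted all the way into find, which is why the caller quotes it on the command line and the sketch quotes "$pattern".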

Data file analysis

I often find myself using shell tools to answer questions about a data file that I'm working on. Here is a data file from a machine learning dataset that I'd like you to download and unzip: adult.data.zip. The fields in the data set are described at http://archive.ics.uci.edu/ml/datasets/Adult.

Answer the following questions in your README file (and give the commands used to find the answer):

How many entries are marked "Male" and how many are marked "Female"?
The last column is the label that is applied to the entry. How many of each label type are there?
Give the counts for each label used for "race" in decreasing order.
Give the counts for a combined "race"/"sex" attribute in decreasing order.

Potentially useful commands to look at include cut, sort, and uniq. If you include the commands you used to generate your answers, it might be possible to give you partial credit. Once you have answered the questions, you should delete the adult.data and adult.data.zip files so that you don't hand them in.
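The usual pattern with these three commands is to pull out one column, sort it so equal values sit next to each other, and let uniq count the runs. A minimal sketch, assuming comma-separated input on standard input (count_values is a hypothetical name):

```shell
# Sketch: tally the distinct values in one comma-separated column.
# count_values is a hypothetical helper; $1 is the 1-based column number.
count_values() {
    # cut     : keep only the requested column
    # sort    : group identical values together (uniq needs adjacent lines)
    # uniq -c : collapse each group to "<count> <value>"
    # sort -rn: order the tallies from most to least frequent
    cut -d, -f"$1" | sort | uniq -c | sort -rn
}
```

For example, feeding it the made-up lines "a, red", "b, blue", and "c, red" with column 2 reports red with a count of 2 ahead of blue with a count of 1.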

handin

README

Create a file called README that contains:

Your name
A description of the programs
Your answers to the "Data File Analysis" questions and commands
An estimate of the amount of time it took to complete each part
Any known bugs or incomplete functions
Any interesting design decisions you'd like to share

Now you should clean up your folder (remove test case detritus, etc.) and hand in your folder containing your scripts and README.

% cd ~/cs241
% handin -c 241 -a 2 hw2

% lshand

Grading

Here is what I am looking for in this assignment:

A working set of shell scripts as described above
All programs should work on clyde.cs.oberlin.edu and not use BASH evaluation extensions (e.g., no $(( )) blocks).
Good comments
A README with the information requested above. The listing of known bugs is important.

Last Modified: February 10, 2016 - Roberto Hoyle
