CSCI 241 - Homework 2:
Shell Scripting

Due by 11:59:59pm Friday, 25 September 2015

Introduction

For this assignment you will be creating a number of shell scripts.

mypath

Create a shell script called mypath that takes zero or more program names as parameters. For each argument given, the script should check whether a file with that name exists (and can be run by the user) in each of the directories of the current PATH. Your output should be ordered first by the sequence of arguments on the command line, and then by the order of the directories in PATH. Here's my output on Clyde, with some extra newlines for line wrapping. Yours will depend on your PATH setting.

% ./mypath make emacs gzip 
/usr/bin/make: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
    dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
    BuildID[sha1]=0x1d76b881b71091d37e6653d7c8b8e19a2a414591, stripped
/usr/bin/X11/make: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
    dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
    BuildID[sha1]=0x1d76b881b71091d37e6653d7c8b8e19a2a414591, stripped
/usr/bin/emacs: symbolic link to `/etc/alternatives/emacs'
/usr/bin/X11/emacs: symbolic link to `/etc/alternatives/emacs'
/bin/gzip: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
    dynamically linked (uses shared libs), for GNU/Linux 2.6.24,
    BuildID[sha1]=0xe144f688d03d65a26b2de66426ea63f4bbef2dd6, stripped

You'll probably want to look over the man pages for file and sh for insight. The behavior is similar to a combination of which -a and file. NOTE: you are not supposed to use "which -a" in your solutions -- you need to create that functionality yourself.
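
As a starting point, here is a minimal sketch of the PATH-walking part only, for a single name. The function name find_in_path is hypothetical, and the sketch assumes a POSIX sh; it does not call file and does not loop over several arguments, so it is not a complete mypath.

```shell
#!/bin/sh
# Sketch only: print every executable match for ONE name along PATH.
# find_in_path is a hypothetical helper name, not part of the assignment.
find_in_path() (
    name=$1
    IFS=:        # split $PATH on colons (the subshell body keeps this local)
    set -f       # turn off globbing while $PATH is expanded unquoted
    for dir in $PATH; do
        [ -n "$dir" ] || dir=.    # an empty PATH entry means the current directory
        candidate=$dir/$name
        # -f: regular file (symlinks are followed); -x: runnable by this user
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)
```

Each path it prints is in PATH order, so something like find_in_path gzip gives you the list that you would then hand to file.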

myseq

Create a shell script myseq that prints to the screen a sequence of numbers based on the command line arguments. Here is how I want you to interpret the fields; the usage message in the example below summarizes the three forms.

You may assume STEP is always positive; however, if STOP is less than START, you should subtract STEP at each iteration, and the condition on your loop must change accordingly. HINT: because shell scripts are interpreted on the fly, you can just have local variables to hold your operator and test. If no arguments or too many arguments are given, you should print out a usage message and exit with a failure value. You might want to look at expr and test.

% ./myseq 100 -20 17
100
83
66
49
32
15
-2
-19
% ./myseq
Usage: ./myseq <start> <stop> <step>     # go from start to stop by step
       ./myseq <start> <stop>            # assume step is 1
       ./myseq <stop>                    # assume start and step are 1
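
To illustrate the hint about holding the operator and the test in variables, here is a minimal sketch. The name count_from is hypothetical, it always expects all three arguments, and it leaves out the usage message and argument checking. It uses POSIX arithmetic expansion; expr "$cur" "$op" "$step" would do the same job.

```shell
#!/bin/sh
# Sketch: pick the arithmetic operator and the loop test once, then reuse them.
# count_from is a hypothetical name; argument checking is deliberately omitted.
count_from() (
    cur=$1
    stop=$2
    step=$3
    if [ "$stop" -lt "$cur" ]; then
        op=-
        cmp=-ge     # counting down: keep going while cur >= stop
    else
        op=+
        cmp=-le     # counting up: keep going while cur <= stop
    fi
    while [ "$cur" "$cmp" "$stop" ]; do
        echo "$cur"
        cur=$(( cur $op step ))   # $op is expanded before the arithmetic is evaluated
    done
)
```

Called as count_from 100 -20 17, it prints the same eight numbers as the example above.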

sumup

Create a shell script sumup that reads a sequence of integers from the standard input and then prints out the total. You may assume that each number appears on its own line, that each is an int, and that the total won't cause any overflow. In shell, loops don't have to use test; they can use any other command (such as read) as their condition.

% ./myseq 100 -20 17 | ./sumup
324
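
A minimal sketch of a loop whose condition is read rather than test follows. The function name sum_lines is hypothetical, and the sketch relies on the stated assumption that every input line is a single integer.

```shell
#!/bin/sh
# Sketch: the while condition is the exit status of read, which fails at end of input.
# sum_lines is a hypothetical name; every line is assumed to hold one integer.
sum_lines() (
    total=0
    while read -r n; do
        total=$(( total + n ))
    done
    echo "$total"
)
```

The loop ends as soon as read hits end of file, so no explicit counter or test is needed.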

diskhog

Create a shell script diskhog that lists the 5 largest items (files or folders) in the current directory in decreasing order of size. Output the sizes in human-readable format.

% cd ~/pub/cs241 
% ./diskhog 
3.9M    week03
572K    old
348K    hw06
152K    week06
112K    week05

You'll want to look through the manpages of du, cut, sort, xargs, and head (or tail).
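
One way those commands might fit together is sketched below. The function name largest_items and its optional count argument are hypothetical. The sketch ignores dot files and does not cope with names containing whitespace, so treat it as a starting point, not a finished diskhog.

```shell
#!/bin/sh
# Sketch: rank by numeric size first, then ask du again for human-readable sizes.
# largest_items and its optional count argument are hypothetical.
largest_items() (
    count=${1:-5}
    du -sk -- * 2>/dev/null |   # size in kilobytes, one line per item
        sort -rn |              # largest first, compared numerically
        head -n "$count" |      # keep only the top entries
        cut -f2- |              # drop the size column, keep the name
        xargs du -sh --         # re-run du on the survivors, in the same order
)
```

Sorting on the kilobyte counts avoids having to compare strings such as 3.9M and 572K directly.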

For extra fun, you could have it take a flag to change the number of items to display, or maybe another that limits it to files/directories.

linecount

Create a shell script called linecount that by default will report the total number of lines in all of the files in the current working directory (recursively).

Have your shell script support an optional argument that will be used as a file glob pattern for the types of files. The user is responsible for properly quoting things on the command line. For example, to get a sum of all of the lines in your java source files you would use:

% ./linecount '*.java' 

Finally, if a glob is specified and any other argument is supplied, use each of those as the directories to recursively examine. So, if you wanted to know how many lines of code were in your 150 and 151 class folders, you might run:

% ./linecount '*.java' ~/cs150 ~/cs151 

You'll want to take a look at wc, pushd/popd or cd, find, and test.
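
Here is a minimal sketch of the counting step for one pattern and one directory. The name count_lines is hypothetical, and handling several directories is left out. It lets find do the recursion and uses a single wc, so there are no per-file totals to add up.

```shell
#!/bin/sh
# Sketch: find does the recursion and pattern matching, a single wc does the counting.
# count_lines is a hypothetical name; it handles one pattern and one directory.
count_lines() (
    pattern=${1:-*}     # default: every file
    dir=${2:-.}         # default: the current directory
    find "$dir" -type f -name "$pattern" -exec cat {} + | wc -l
)
```

Because the pattern is passed to find in quotes, it is find that expands it in each subdirectory, which is why the user has to quote it on the command line as well.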

Data file analysis

I often find myself using shell tools to answer questions about a data file that I'm working on. Here is a data file from a machine learning dataset that I'd like you to download and unzip: adult.data.zip. The fields in the data set are described at http://archive.ics.uci.edu/ml/datasets/Adult.

Answer the following questions in your README file (and give the commands used to find the answer):

  1. How many entries are marked "Male" and how many are marked "Female"?
  2. The last column is the label that is applied to the entry. How many of each label type are there?
  3. Give the counts for each label used for "race" in decreasing order
  4. Give the counts for a combined "race"/"sex" attribute in decreasing order

Potentially useful commands to look at include cut, sort, and uniq. If you include the commands you used to generate your answers, it might be possible to give you partial credit. Once you have answered the questions, you should delete the adult.data and adult.data.zip files so that you don't hand them in.
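
The usual pattern with these three commands is sketched below on comma-separated input. The function name tally_field is hypothetical, and which column number answers which question is left for you to work out from the field description.

```shell
#!/bin/sh
# Sketch: count how often each value occurs in one comma-separated column.
# tally_field is a hypothetical name; it reads the data on standard input.
tally_field() {
    cut -d, -f"$1" |    # keep only the requested column
        sort |          # uniq only merges adjacent lines, so sort first
        uniq -c |       # prefix each distinct value with its count
        sort -rn        # largest count first
}
```

For example, printf 'a,x\nb,y\nc,x\n' | tally_field 2 reports x with a count of 2 and y with a count of 1. If a file puts a space after each comma, the values will keep that leading space.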

handin

README

Create a file called README that contains

  1. Your name
  2. A description of the programs
  3. Your answers to the "Data File Analysis" questions and commands
  4. An estimate of the amount of time it took to complete each part
  5. Any known bugs or incomplete functions
  6. Any interesting design decisions you'd like to share

Now you should clean up your folder (remove test case detritus, etc.) and hand in your folder containing your scripts and README.

% cd ~/cs241
% handin -c 241 -a 2 hw2

% lshand

Grading

Here is what I am looking for in this assignment:


Last Modified: September 11, 2015 - Roberto Hoyle