CSCI 241 - Homework 1:
Shell Scripting

Due by 11:59:59 pm on Friday, October 22

We're going to start things off nice and simple. I recommend that you create a cs241 directory in your CS account and a hw1 directory inside it.

First off, go to https://classroom.github.com/a/9i9oVJE_ and create a repository in the CS 241 organization.

Follow the link to the newly created repository, which will be called hw1-username.

Back in your CS account, change to the cs241 directory you created above and clone the repository there. Refer to the GitHub tutorials for how to do this step.

To work with a partner, follow these steps:

Everyone is expected to turn in a submission. It makes no sense to work in two separate repositories, however, so for this assignment one person will add the other as a collaborator and you will both do the assignment in that repository. In the other repository, just commit a README file with the name of your group (optional) and the name of your partner.

Introduction

For this assignment you will be creating a number of shell scripts.

Part 1 - URL Testing

Write a shell script called testurl.sh that takes as its argument a file containing a list of URLs and tests whether each website is up. You might find it useful to check out the curl, wget, and tail commands.

rhoyle@clyde$ cat urls
http://cs.oberlin.edu/~ncare/cs241/labs/lab8.html
https://occs.cs.oberlin.edu/~rhoyle/17s-cs241/assignments/hw02.html
http://no.such.url
http://occs.cs.oberlin.edu
rhoyle@clyde$ ./testurl.sh urls
Not found: http://no.such.url

This script should also handle errors: if the user doesn't provide a file of URLs, it should print a usage message.
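
As a starting point, here is a minimal sketch of two building blocks you might use, assuming curl is installed. The function names url_is_up and usage, the wording of the usage message, and the 10-second timeout are all my own choices for illustration, not requirements.

```shell
#!/bin/bash
# usage: print a reminder of how to call the script (hypothetical wording) on standard error.
usage() {
    echo "Usage: testurl.sh URLFILE" >&2
}

# url_is_up URL: succeed (exit status 0) only if curl can fetch URL.
url_is_up() {
    # -s silences the progress meter, -f makes HTTP errors (status >= 400) fail,
    # -o /dev/null discards the page, --max-time keeps a dead host from hanging.
    curl -s -f -o /dev/null --max-time 10 "$1"
}
```

In a script you could call usage and exit when $# is 0, and write something like url_is_up "$url" || echo "Not found: $url" for each line of the file. wget can be used in much the same way if you prefer it.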


Part 2 - Back it up a step

Next, I want you to create a script called backup.sh. The script should take as arguments a directory to back up into, followed by a list of one or more files to copy to the backup directory.

Your script should copy a file only if its timestamp is more recent than that of the copy already in the backup directory when the script is run. You might find it helpful to check bash's test (i.e. [ ]) syntax. Additionally, you should make your script executable using chmod. That is, the command should be runnable as follows:

$ ./backup.sh ~/.backup file1 file2 dir1
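
The timestamp comparison can be done with the -nt ("newer than") operator of test. The sketch below is one possible way to wrap it; the function name needs_copy is hypothetical, and treating a missing backup copy as "needs copying" is my assumption about the sensible behavior.

```shell
#!/bin/bash
# needs_copy SRC DEST: succeed if DEST does not exist yet (assumed behavior)
# or if SRC was modified more recently than DEST.
needs_copy() {
    local src=$1 dest=$2
    [ ! -e "$dest" ] || [ "$src" -nt "$dest" ]
}
```

You might then write needs_copy "$file" "$backupdir/$file" && cp "$file" "$backupdir" inside a loop over the arguments. Remember that chmod +x backup.sh is what makes ./backup.sh runnable.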

Part 3 - Diskhogger

Create a shell script called diskhog.sh that lists the 5 largest items (files or directories) in the current directory in decreasing order of size. You should output the sizes in a human-readable format like so:

% cd ~rhoyle/pub/cs241
% ./diskhog.sh
3.9M week03
572K old
348K hw06
152K week06
112K week05

Check out the man pages for du, cut, sort, xargs, and head (or tail).

If the script is run from an empty directory, make sure it prints nothing at all (and not an error message).
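
To illustrate how these commands fit together, here is a rough sketch rather than a finished solution. It assumes GNU coreutils (sort -h understands sizes such as 572K and 3.9M), the function name list_largest is hypothetical, and because it relies on the * glob it ignores dotfiles.

```shell
#!/bin/bash
# list_largest N: print the N largest entries of the current directory, largest first.
list_largest() {
    # With nothing to measure, return quietly; otherwise du would be handed
    # a literal '*' and complain that it does not exist.
    [ -n "$(ls)" ] || return 0
    # du -s gives one total per argument, -h makes the sizes human readable;
    # sort -rh orders those sizes from largest to smallest.
    du -sh -- * | sort -rh | head -n "$1"
}
```

Note that du separates the size and the name with a tab, so you may want to tidy the output to match the example above.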

Part 4 - linecount

Create a shell script called linecount that by default will report the total number of lines in all of the files in the current working directory (recursively).

You'll want to take a look at wc, cd, find, and test.
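
One possible way to combine find and wc is sketched below. The function name count_lines is hypothetical, and this version counts every regular file it finds, including binary files, which may or may not be what you want.

```shell
#!/bin/bash
# count_lines [DIR]: print the total number of lines in all regular files
# under DIR (default: the current directory), searching recursively.
count_lines() {
    # find hands the files to cat in batches; wc -l then counts the
    # newline characters in the combined stream.
    find "${1:-.}" -type f -exec cat {} + | wc -l
}
```

Running count_lines with no argument reports on the current working directory, which matches the default behavior described above.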

Part 5 - Data file analysis

I often find myself using shell tools to answer questions about a data file that I'm working on. The file that we will work on can be downloaded from https://coronavirus.ohio.gov/static/dashboards/COVIDSummaryData.csv. You should use wget or curl to download it.

Answer the following questions in your README file (and give the commands used to find the answer):

  1. How many cases are in Lorain county?
  2. How many cases are there for Males vs. Females?
  3. Which county has the most cases?
  4. Which county has the most cases for 20-29 year olds?

Potentially useful commands to look at include cut, sort, and uniq. If you include the commands you used to generate your answers, it might be possible to give you partial credit. Include the downloaded data file with your submission, since the file will change over time.
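
To show how cut, sort, and uniq work together, here is a small sketch on made-up data. The sample file, its two columns, and the function name rows_per_value are all hypothetical; the real file has a different layout, so look at its header line first. This pipeline only counts rows, so if the real file stores a count in one of its columns you will need to add those values up instead.

```shell
#!/bin/bash
# Hypothetical sample data, NOT the layout of the real COVID file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
county,sex
Lorain,Male
Erie,Female
Lorain,Female
Lorain,Male
EOF

# rows_per_value FILE FIELD: count how many data rows carry each value in
# comma-separated column FIELD, most frequent value first.
rows_per_value() {
    # tail skips the header line, cut picks the column, sort groups equal
    # values together so that uniq -c can count them.
    tail -n +2 "$1" | cut -d, -f"$2" | sort | uniq -c | sort -rn
}

rows_per_value "$sample" 1
```

Be aware that cut splits on every comma, so it will be confused by quoted fields that contain commas.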

Programming Hints


Extra Credit

These are ideas for extra credit. You do not need to implement them all. You can also come up with your own ideas; if you have doubts, just ask me whether they would count for extra credit.

  1. Modify testurl.sh to report whether a file is a valid HTML file according to the W3C validator at https://validator.w3.org/
  2. Modify your backup.sh script to keep a list of the five most recent backup directories and store copies as symlinks.
  3. Modify diskhog.sh to take a flag that changes the number of items to display, and perhaps another that limits the output to files or to directories.
  4. Have your linecount script support an optional argument that will be used as a file glob pattern for the types of files to count. The user is responsible for properly quoting things on the command line. For example, to get a sum of all of the lines in your Java source files you would use:

     % ./linecount '*.java'

  5. Take the COVID data and create a plot of infections by county in Ohio through a shell script using gnuplot. All manipulation should be done via the shell or shell commands; no hand-editing is allowed.

Turning it In

README

Create a file called README that contains:

  1. Your name
  2. A description of the programs
  3. Your answers to the "Data File Analysis" questions and commands
  4. An estimate of the amount of time it took to complete each part
  5. Any known bugs or incomplete functions
  6. Any interesting design decisions you'd like to share

Now you should clean up your folder (remove test case detritus, etc.) and hand in your folder containing your scripts and README.

Grading

Here is what I am looking for in this assignment:


Last Modified: September 15, 2020 - Roberto Hoyle and Nick Care. Some material based on work by Benjamin Kuperman.