CSCI 151 - Prelab 8: Picking the very best webpage

Due 9am, Monday, April 15

In this prelab, you will familiarize yourself with some of the design and implementation issues in the upcoming lab 7. Please write or type up your solutions, and submit them on Gradescope before class on Monday. Remember, late prelabs receive zero credit.

Overview

In the previous lab, you created a WebPageIndex class that represents the data from a single document (either a local file or a URL). In this lab, you will be creating a collection of those indexes and then determining which page best matches what a user is searching for.

MyPriorityQueue

In this section of the lab, you will be implementing your own version of a binary-heap-based priority queue. To begin, look over the Java documentation for java.util.PriorityQueue<E> and for java.util.AbstractQueue<E>, which it extends.
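To get a feel for the behavior your class will need to reproduce, here is a minimal sketch that uses the library's java.util.PriorityQueue directly. The class name PriorityQueueDemo and the helper drain() are made up for this illustration and are not part of the lab.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical demo class; not part of the lab's starter code.
public class PriorityQueueDemo {
    // Adds the values to a java.util.PriorityQueue, then removes them one at a time.
    static List<Integer> drain(int... values) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(); // natural ordering of Integer
        for (int v : values) {
            pq.add(v);
        }
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) {
            out.add(pq.poll()); // poll() removes the smallest remaining element
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(5, 1, 4, 2)); // prints [1, 2, 4, 5]
    }
}
```

Notice that the elements come out in sorted order no matter what order they went in; that is the contract your heap will have to honor.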

Bridging the generational gap

You will be using an ArrayList as an efficient implementation of a complete binary tree. You will need to be able to move up and down the tree.

  1. Assuming that your ArrayList is an instance variable named "heap":

    1. At what index will you store the root of the tree?
    2. At what index is the parent of the node at index i?
    3. At what index is the left child of the node at index i?
    4. At what index is the right child of the node at index i?

Comparators

In order to make these heaps work, you will need to create Comparators of various sorts. Begin by looking over the documentation for java.util.Comparator<T>. Pay special attention to the compare() method you are required to implement.
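As a warm-up, here is a small sketch of what a Comparator class looks like. It orders Strings by length, which is deliberately a different ordering from the one you are asked to write below; the class name LengthComparator is invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example: orders Strings from shortest to longest.
public class LengthComparator implements Comparator<String> {
    // Returns a negative number if a is shorter than b, zero if they are
    // the same length, and a positive number if a is longer than b.
    @Override
    public int compare(String a, String b) {
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("cherry", "fig", "pear"));
        words.sort(new LengthComparator());
        System.out.println(words); // prints [fig, pear, cherry]
    }
}
```

The only thing compare() has to get right is the sign of its result; the magnitude does not matter.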

  1. Give the Java code for a comparator class StringComparator that compares two Strings, ignoring the case of the strings themselves. Hint: you might want to just let the String's compareToIgnoreCase() method do all the heavy lifting.

Methods

Inside our binary heap, there are a few private methods we will need to implement. We discussed these in class, but you will need to implement them yourself.

  1. Give the definition for the method percolateUp(int x) that takes the value currently located at index x and moves it up to the correct location.

Weighting web queries

In the application portion of this lab, you will be reading in and creating a number of WebPageIndex objects (from Lab 6), storing them in your heap, and then processing user search queries on those objects.

Simple queries

Explain how you would compute the "score" of a particular web page given a String that represents a user query of one or more words under the following conditions (pseudocode or just a concise description is fine):

  1. Based on just the sum of the word counts of the page for the words in the query.
  2. Based on just the sum of the word counts, but requiring every word in the query to be present on the page.

We'll be using these comparators in our PriorityQueue which will be based on a binary heap. Recall that in class we discussed that these are "min-heaps" -- heaps where the minimum value is at the root.
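To see what "minimum at the root" means once a Comparator is involved, here is a small sketch using the library's java.util.PriorityQueue. The class MinHeapOrderDemo and its head() helper are hypothetical names used only for illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical demo class; not part of the lab's starter code.
public class MinHeapOrderDemo {
    // Builds a PriorityQueue ordered by the given comparator and
    // returns the element at its head (null if there are no items).
    static String head(Comparator<String> order, String... items) {
        PriorityQueue<String> pq = new PriorityQueue<>(order);
        for (String s : items) {
            pq.add(s);
        }
        return pq.peek(); // peek() looks at the head without removing it
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        System.out.println(head(byLength, "banana", "fig", "cherry")); // prints fig
    }
}
```

The head of the queue is whichever element the comparator considers smallest, so the comparator alone decides what ends up at the root.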

  1. Describe how you could use the previous score calculations within a Comparator to have the best scoring page be at the top of the heap (and therefore at the front of your PriorityQueue).

Advanced queries

Our WebPageIndex objects allow us to also search for phrases in our web pages.

  1. Explain how you would process a user query to identify phrases that are set off by double quotes, and then score the various pages. For example,

    pancakes "maple syrup" bacon

    is looking for pages that contain the words "pancakes" and "bacon" as well as the phrase "maple syrup".

Last Modified: April 05, 2015 - Benjamin A. Kuperman