CMPT 225 Assignment 4 – Dictionary
In this assignment you are to implement a hash table class that you will use to store words. In the second part of the assignment you are to implement functions that could be used by a spell checker – which will use your hash table to look up valid English words.
Please read the requirements carefully, as usual pay attention to the names and input and output requirements of the class and its methods. We will be testing your class in our test program, and if you do not follow the requirements this test program may not compile, and your class may therefore not be marked.
This assignment is worth 6% of your final grade.
Hash Table Dictionary
Implement a hash table class called HashTable to store standard strings composed of lower-case letters. You should handle collisions using double hashing, which will require implementing a second hash function (h2).
Method parameters and return values are noted (and highlighted) in the method descriptions; you must not add additional parameters to the methods. If the method description does not mention a parameter it does not have one, similarly if no return value is mentioned the method is void (or a constructor or destructor).
Public Methods
▪ default constructor – creates a hash table of default size: 101; sets all array elements to the empty string (“”); sets the value used by h2
▪ constructor – creates a hash table to store n items where the value of n is given by the constructor’s single integer parameter, the size of the underlying array should be the smallest prime number equal to or greater than 2n; sets all array elements to the empty string; sets the value used by h2
▪ copy constructor – a constructor that creates a deep copy of its constant HashTable reference parameter
▪ destructor – de-allocates any dynamic memory associated with the calling object
▪ operator= – overloads the assignment operator for HashTable objects – makes the calling object a deep copy of its constant HashTable reference parameter; returns a reference to
the calling object
▪ insert – searches the hash table for the method’s string parameter, if a matching string is not
found it inserts the parameter string into the hash table; if the hash table’s load factor is greater then 0.67 after insertion (1) creates a new hash table of size equal to the first prime number greater than twice the size of the original table, (2) inserts the original contents in the new table at the appropriate places and (3) sets the value used by h2
▪ find – returns true if its string parameter is in the hash table, returns false if it is not
▪ size – returns the number of items stored in the hash table
▪ maxSize – returns the size of the hash table’s underlying array ▪ loadFactor – returns the load factor of the hash table
Private Methods and Attributes
I expect that you will need to implement helper methods (which should be private). Your class should (at least) have the following private attributes.
▪ string* to refer to the underlying array of string objects
▪ int to record the size of the underlying array
▪ int to record the value used by h2
▪ int to record the current number of items in the hash table
Additional Requirements
▪ Your class should be implemented in separate .h and .cpp files
▪ The calling object should be made constant for any method where the calling object’s
attribute values should not change
▪ The insert and find methods should be performed in constant time, with the exception of an
insertion which results in the doubling of the size of the underlying array
Constructor and Hash Table Size
As noted above the constructor(int) should create an underlying array of size roughly 2n, where the size is a prime number. This will entail writing a (private) Boolean method to find a suitable prime number. There is an obvious (O(n)) algorithm for this.
Hash Function
Insertions should be mapped to array indexes using the function: string_value % array_size. To achieve this your function must first generate the integer string_value from a given string. This value should be calculated by assigning the values 1 to 26 to the letters a to z (regardless of letter case) and calculating a unique value for each string by multiplying each character’s value by 32letter position. A string of length n therefore has the following value:
Where ch0 is the left most character of the string and n is the size of the string. For example, the string “cat” has this value:
Note that using the result of this calculation on large strings will cause overflow. You should therefore use Horner’s method to perform the calculation and apply the mod operator after computing each expression in Horner’s method. Horner’s method was discussed in class.
ch0 *32n-1 +ch1 *32n-2 +…+chn-2 *321 +chn-1 *320
3*322 +1*32+20=3,124
To find the integer corresponding to a lower-case character at position i in a string str subtract 96 from the character: int asc = str[i] – 96; //ASCII a = 97
You may assume that your hash table will only be required to store strings with lower case letters, and will not be tested with other characters.
Second Hash Function for Collisions
The hash function to determine the distance to move from the initial index in the event of a collision should be generated by h2: P2 – (string_value % P2) where string_value is an integer representation of the string (discussed above) and P2 is the first prime number greater than array_size / 2. P2 should be stored as an attribute of the HashTable object and be calculated on construction of the object and when the hash table size doubles on insertion. This attribute should not be named P2 but should be given some descriptive name.
Word File
Below is a link to a small file of (lower-case) English words that you can use to test your hash table and spell check functions.
wordList1000.txt
And, just to be generous, here is a function to read the contents of a file into a vector.
// Opens a file and reads the contents into a vector
// PARAM: infile = name of the file to be opened
// PRE: the file contains words separated by white space // POST: returns a vector containing the contents of infile vector
{
ifstream ist(infile.c_str());
if (ist.fail())
throw runtime_error(infile + ” not found”);
istream_iterator
vector
ist.close();
return result;
}
Spell Check Functions
You are to implement the functions described below. Each checks to see if its input string is in a given dictionary and, if not, returns an ordered list of words in the dictionary that is similar to its input, as determined by the function’s process. All the functions have the same return type and parameter lists. The general form of each functions is:
vector
where ht contains the dictionary and word is a string composed solely of lower-case letters.
There are three possible return values for each of the functions.
▪ success – if word parameter is in ht the result should contain word, and no other strings.
▪ near-miss – if word is not in ht but ht contains strings similar to word – as determined by the
function – the result should contain these near-misses in order.
▪ failure – if word is not in ht and ht contains no strings similar to word – as determined by the
function – the result should be empty.
Functions
▪ extraLetter – if word is not in ht returns all strings in ht that can be made by deleting a single letter from word
▪ transposition – if word is not in ht returns all strings in ht that can be made swapping any two neighbouring letters in word
▪ missingSpace – if word is not in ht returns all pairs of strings in ht that can be made by inserting a single space in word; each string in the result should consist of the two words from ht separated by a space
▪ missingLetter – if word is not in ht returns all strings in ht that can be made by inserting any single letter of the alphabet at any position in word
▪ incorrectLetter – if word is not in ht returns all strings in ht that can be made by changing any single letter of word to a different letter in the alphabet
Examples
For these examples I am not giving you the contents of the dictionary – but it is assumed to contain most common English words.
▪ extraLetter – input: chate, output: {chat, hate} – not hat or cat as they have two fewer letters
▪ transposition – input: atr, output: {tar, art} – not rat as that requires swapping more than two
neighbouring letters
▪ missingSpace – input: rateat, output: {rat eat, rate at}
▪ missingLetter – input: poe, output: {poet, poke, pole, pope, pose}
▪ incorrectLetter – input: thed, output: {shed, thee, them, they, thud, toed}
Additional Requirements
▪ These functions should be implemented in a .h file named spellcheck.h. You should make a separate file with a main function for your own testing which should not be submitted.
Assessment
The assignment is out of 55.
▪ Hash table structure (constructors, operator= etc.) – 5 ▪ Insertion – 10
▪ Find – 4 marks
▪ Size, max size and load factor methods – 6 (2 each)
▪ Spell check functions – 10 (2 each)
▪ Comments, naming and class design – 15 (see the code quality page for more detail) ▪ Memory management – 5 (valgrind check)
Assessment Details – PLEASE READ
▪ If your hash table class is unable to store our dictionary, we will not mark your spell check functions – focus on making your hash table work correctly before moving on to the rest of the assignment
▪ Make sure you test your hash table with large amounts of data (we will)
▪ Make sure you test your hash table with long strings (we will)
▪ You must implement your hash table using a hash function as described in class; if your insert
and find methods perform a linear search of the underlying array you will receive 0 marks for these methods
Submission
You should submit your assignment online to the CoursSys submission server. You must submit three files:
▪ HashTable.h
▪ HashTable.cpp
▪ spellcheck.h
These files should not be enclosed in a .zip file. Please read the documentation on site for further
information. The assignment is due by 11:59pm on Wednesday the 18th of March.
If you are unable to complete one of the methods, make sure that any calls to that method will still compile and run by writing a stub for that method that returns a default value of the appropriate type.
CMPT 225 Home
John Edgar (johnwill@sfu.ca)