Overview
CSCI 1100 — Computer Science 1 Homework 6 Sets and Files
This homework is worth 135 points toward your overall homework grade, and is due Thursday, November 8, 2018 at 11:59:59 pm. It has two parts, each worth 70 points. Note that 70 + 70 = 140 which is greater than 135. There is no extra credit, but this will give you some room for minor formatting issues. Please download hw6_files.zip. and unzip it into the directory for your HW6. You will find multiple data files to be used in both parts.
The goal of this assignment is to work with sets and files. Most of the file processing is pretty simple. The actual work lies in manipulating the data in them using sets. You must use sets in both parts; both to get points but also to make sure your program runs in a reasonable amount of time. An incorrect implementation that does not use sets, or one that inadvertently converts from sets to lists will consume unnecessary amounts of processing power in Submitty and may not complete, especially for part 1.
Please remember to name your files hw6Part1.py and hw6Part2.py.
As always, make sure you follow the program structure guidelines. You will be graded on program
correctness as well as good program structure.
Remember as well that we will be continuing to test homeworks for similarity, so follow our guide- lines for the acceptable levels of collaboration. You can download the guidelines from the resources section in Piazza if you need a refresher. Note that this includes using someone else’s code from a previous semester. Make sure the code you submit is truly your own.
Autocorrect fail…
How does autocorrect work? We use it almost every day. The idea behind it is very simple. If you type a word that is not in my dictionary, I go through all possible ways you could have misspelled a word (within reason) and check whether any of the corrections of misspellings is in my dictionary, if so I return it as my prediction.
Fair warning: we will solve a simplified version of autocorrect in this part. You must use sets. We will revisit the same problem in Homework #7 with a more complex solution using dictionaries. You will be able to reuse most of what you write here at that time, so spend some time to structure your code well, and use functions to make it more modular. Think about the future you of a week from now trying to read and modify your code, and be nice to that person!
To solve this problem, your program will read the name of two files: the first containing a dictionary of words and the second containing a list of words to autocorrect. Both files have a single word per line.
Your program will read the words from the dictionary into a set. All operations that check for membership of words in the dictionary or for finding the common words within a dictionary must be done with sets.
Your program will then go through every single word in the input file and autocorrect each word. To correct a single word, you will consider the following:
FOUND If the word is in the dictionary, it is correct. There is no need for a change. Print it as
found, and go on to the next word.
DROP If the word is not found, consider all possible ways to drop a single letter from the word. If any of them are in the dictionary, then print the word as corrected with drop, and stop. For example, quinecunx can be changed to quincunx by dropping e.
SWAP If the word is not yet corrected, consider all possible ways to swap two consecutive letters from the word. If any of one of these in the dictionary, then print the word as corrected with swap, and stop. For example, serednipity can be transformed to serendipity by swapping the letters d and n.
REPLACE If the word is not yet corrected, consider all possible ways to change a single letter in the word with any other letter from the alphabet. You can hardcode a list of all letters in the alphabet for this part:
letters = [ ‘a’, ‘b’, ‘c’, ‘d’, ‘e’, ‘f’, ‘g’, ‘h’, ‘i’, ‘j’, ‘k’, ‘l’, ‘m’, ‘n’, ‘o’, ‘p’, ‘q’, ‘r’, ‘s’, ‘t’, ‘u’, ‘v’, ‘w’, ‘y’, ‘z’ ]
If any of these are in the dictionary, then print it as corrected with replace, and stop. For example, sockpolager can be changed to sockdolager by replacing p with d.
NO MATCH If the word is not corrected by any of the above steps, print NO MATCH.
If there are multiple matches in any step, return the match that is smallest lexicographically. This is not super meaningful, but it will do for now. Assume words only contain English letters, no hyphens, quotations or other non-alphabetical symbols.
Implement the potential ways to do autocorrection in the order given to you: Match, Drop, Swap, Replace. You will get plenty of partial credit for all of your possible matches. When printing words and matches, format them to to 15 characters.
Here is a potential output of your program for the dictionary we have given you (note that as usual, we will use a more extensive dictionary on Submitty, so your results will vary during submission):
Dictionary file => words_10pt.txt words_10pt.txt Input file => input_words.txt input_words.txt
ditrust catroon martial fridges throned frozen blizzard bailers ampuls fool generously ramosely imprudences thiotepa lair
-> ditrust -> cartoon -> marital -> fidges -> thorned -> frozen -> blizzard -> bailors -> ampul
-> food -> generously -> rimosely -> imprudence -> thiotepa -> liar
:NO MATCH :SWAP :SWAP :DROP :SWAP :FOUND :NO MATCH :REPLACE :DROP :REPLACE :NO MATCH :REPLACE :DROP :FOUND :SWAP
dissatisfaktion -> dissatisfaction :REPLACE
aphthae blunging casas blungign casas culverts butterscottches -> butterscotches :DROP
curries pymurgies aland babbled cruises seta melodise crepe
-> curries -> zymurgies -> alane -> babbler -> crises -> eta -> melodies -> creep
-> aphthae -> blunging -> casa -> blunging -> casa
:NO MATCH :FOUND :DROP :SWAP :DROP
-> culverts
:NO MATCH
:NO MATCH :REPLACE :REPLACE :REPLACE :DROP :DROP :SWAP :SWAP
When you are sure your homework works properly, submit it to Submitty. Your program must be named hw6Part1.py to work correctly.
To finish up, you can see problems with this approach. Some later matches may have been better than earlier ones for some words, but we considered them in a strict order. We also did not consider some potential misspellings like adding a letter or phonetic misspelling of werdz! We still would like to limit the potential changes to the most likely ones, like keeping in mind what keyboard people are using or which words are more common within the language. We will address some of these in the next homework by adding on to this solution.
Fantastic beasts in the wizarding world of CS1…
In this part, you will use a file containing various beasts that were featured in different titles in the Harry Potter series of books and movies. We give you a simple data file called titles.txt to test your program, but we will be working on a more comprehensive one for Submitty.
Each line of the file contains a title (movie or book), followed by all the beasts featured in this title. For example, the following line:
Harry Potter and the Goblet of Fire|Hippocampus|Merpeople|Niffler
mentions that the title Harry Potter and the Goblet of Fire featured three beasts: Hippocampus, Merpeople and Niffler. You can assume that the data is valid and that splitting on the ‘|’ string will correctly parse the lines in titles.txt
Your program must do the following:
• Read the name of a title from the user. The user will enter any part of the title (in any case, upper or lower). Find the first title in the list that contains the input string. For example, the user may enter FIRE, which will match the above line.
If no match is found, print the message.
If a match is found, then do the following:
– Print all the beasts that were featured in this title in lexicographical order
– Print all the other titles that have at least one beast in common with this title in lexicographical order
– Print all the beasts that were featured only in this title (i.e. no other title has these beasts), again in lexicographical order
Consider which set operation will help with the last two operations . . .
• Once you have finished processing the input (match or no match), ask for another input, until the user inputs stop.
Your output will likely contain long lists of values. It must be formatted nicely! If your output is correct but is not formatted properly, you will lose some points but not all. However, Submitty may not catch this and you may only get points back when your TA reads your homework. In either case, your formatting must be identical to that expected in Submitty to get full credit,
Each list you print should be separated with a comma and a space. You will find that using print to achieve this is difficult. It is easier to construct a string with the information you want, for example: line = val1 + “, ” + val2, and then print the line.
If you have a very long line, you can use the textwrap module to break it into a list of multiple lines. Here is an example use of this module.
>>> import textwrap >>> par = "this is a very long line with many unnecessary words using it to show how
textwrap works"
>>> lines = textwrap.wrap(par)
>>> lines
[‘this is a very long line with many unnecessary words using it to show’, ‘how textwrap
works’]
>>> for line in lines:
… print(line)
…
this is a very long line with many unnecessary words using it to show to how textwrap works
Here is an example run of your program:
Enter a title (stop to end) => fantastic fantastic
Found the following title: Fantastic Beasts Beasts in this title: Ashwinder, Billywig, Bowtruckle, Demiguise, Diricawl, Doxy, Erumpent, Fwooper, Graphorn, Hippocampus, Lethifold, Mooncalf, Murtlap, Nundu, Occamy, Runespoor, Swooping Evil, Thestral
Other titles containing beasts in common: Harry Potter and the Cursed Child, Harry Potter and the Deathly Hallows, Harry Potter and the Goblet of Fire, Harry Potter and the Order of the Phoenix, the Wizarding World of Harry Potter
Beasts appearing only in this title: Ashwinder, Billywig, Diricawl, Graphorn, Lethifold, Mooncalf, Nundu, Occamy, Runespoor, Swooping Evil
Enter a title (stop to end) => phoenix phoenix
Found the following title: Harry Potter and the Order of the Phoenix
Beasts in this title: Bowtruckle, Doxy, Murtlap, Thestral Other titles containing beasts in common: Fantastic Beasts Beasts appearing only in this title: Enter a title (stop to end) => end
end This title is not found!
Enter a title (stop to end) => stop stop
When you are sure your homework works properly, name it correctly as hw6Part2.py and submit it to Submitty.