Coursework 2A Questions 1-3: Words¶
Last updated: 4th November 2019¶
To check for later updates, go to the coursework web page.
Overview¶
The first part of the coursework is intended to develop basic skills in extracting, testing, manipulating and presenting data. You will learn to work with a dictionary of English words to implement text processing functionality.
Required files:¶
• PP_CW2A_words.ipynb (this file)
• PP_CW2A_tests.py
• english_words.txt
• spelling.txt
Questions overview:¶
• Q1. Check whether a string is an English word (6 marks)

• Q2. Anagrams and palindromes
▪ (a) Test whether two strings are anagrams. (4 marks)
▪ (b) Find anagrams of a given word. (4 marks)
▪ (c) Test if a given string is a palindrome. (4 marks)
▪ (d) Identify English words of a given length that are palindromes. (2 marks)
• Q3. Write a simple spell-checker program (10 marks)


Provided Code¶
I provide the following code which initialises the global variable ENGLISH_WORDS to be a list of all the words in the file english_words.txt. You will need to have that file in the same folder as this file (PP_CW2A_words.ipynb).
You are recommended to use ENGLISH_WORDS in the following questions, whenever you are required to perform a calculation involving all the words in english_words.txt. This is particularly important for a function that needs to check the list of words many times. Reading information from a file typically takes more computational time than other operations, so if data can be stored in memory (e.g. as the value of a Python variable) this will usually be a lot more efficient than reading it from a file each time it is needed. (Of course, when dealing with very large datasets, it may not be possible to store the whole dataset in memory.)
In [4]:
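def get_english_words():
    # Read the word list once; each line of the file holds one word.
    with open("english_words.txt") as f:
        words = f.readlines()
    # Remove the trailing newline (and any stray whitespace) from each word.
    words = [word.strip() for word in words]
    return words

ENGLISH_WORDS = get_english_words()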

Q1. Check whether a string is an English word¶
Write a function is_english_word(s), which will test if its input string s is an English word, according to the file english_words.txt. This file contains a large number of English words, including all common words and many very rare words. Proper names are not included, and all words are given in lower case, with one word on each line of the file. You need to download the file of English words. Note that it is not a program file and you do not need to edit it.
Your function is_english_word(s) should take any string as its argument and return a Boolean value — i.e. True or False. More specifically your function should return True if any of the following conditions hold:
• The input string is one of the words in english_words.txt.
• The input string is the same as one of the words in english_words.txt except that the input string starts with a capital letter (with all the other letters being small).
• The input string is the same as one of the words in english_words.txt except that the input string is all in capital letters.
• The input string is the word “I”.
If none of these conditions hold, your function should return False.
Examples:¶
INPUT       OUTPUT
"python"    True
"Python"    True
"PYTHON"    True
"pyThon"    False
"splap"     False
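The four conditions above can be illustrated with a minimal sketch. It is one possible approach rather than the expected answer, and it assumes a tiny hypothetical list SAMPLE_WORDS in place of ENGLISH_WORDS so that it runs without the data file; the name is_english_word_sketch is likewise made up for illustration.

```python
# Hypothetical stand-in for ENGLISH_WORDS (the real list comes from english_words.txt).
SAMPLE_WORDS = ["python", "word", "cheese"]

def is_english_word_sketch(s, words=SAMPLE_WORDS):
    """Return True if s matches a listed word under the capitalisation rules."""
    # Special case: the word "I" is always accepted.
    if s == "I":
        return True
    # Accept only three shapes: all lower case, Capitalised, or ALL CAPS.
    if s in (s.lower(), s.capitalize(), s.upper()):
        return s.lower() in words
    # Any other mix of cases (e.g. "pyThon") is rejected.
    return False

for example in ["python", "Python", "PYTHON", "pyThon", "splap"]:
    print(example, is_english_word_sketch(example))
```

Note that str.capitalize() lower-cases every character after the first, which is what makes the comparison reject mixed-case strings such as "pyThon".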
In [ ]:
## Q1 answer code cell
## Modify this function definition to fulfill the given requirements.
## Expected code length: less than 15 lines.

def is_english_word(s):
    pass  # do nothing
In [ ]:
# Run this cell to test your is_english_word function
# The PP_CW2A_tests program must be in the same folder as this file.
from PP_CW2A_tests import do_tests
do_tests(is_english_word)

Q2. Identify anagrams and palindromes¶
This question involves determining properties of words and identifying words with those properties. More specifically, question parts Q2(a-d) are concerned with writing functions to identify and find anagrams and palindromes. These concepts are defined as follows:
• Two words are anagrams if they both contain the same letters but in different orders. For example listen is an anagram of silent.
• A word is a palindrome if it is the same forward and backward. In other words, reversing the order of its letters results in the same word. For instance, bob and rotator are palindromes. The term palindrome is also applied to sentences, which may contain spaces and punctuation marks as well as letters. When considering whether a sentence is a palindrome, the punctuation marks and the case of the letters are usually ignored.

Q2(a) Test whether two strings are anagrams¶
Write a Boolean valued function anagrams(s1, s2), which returns True just in case the string s1 contains the same letters as string s2 but in a different order. The function should be case insensitive — in other words it should return the same value if any letters in either s1 or s2 are changed from upper to lower case or from lower to upper case. You may assume that the input strings contain only letters.
Examples¶
INPUT 1 (string)    INPUT 2 (string)    OUTPUT (boolean)
"sidebar"           "seabird"           True
"cheese"            "frizbee"           False
"listen"            "silent"            True
"this"              "this"              False
"Tar"               "Rat"               True
"Tar"               "TAR"               False
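One common way to test this definition is to compare the sorted letters of the two strings after converting both to lower case. The following is a minimal sketch of that idea, not necessarily the intended solution; the name anagrams_sketch is hypothetical.

```python
def anagrams_sketch(s1, s2):
    """Return True if s1 and s2 have the same letters in a different order."""
    a, b = s1.lower(), s2.lower()
    # Identical strings (ignoring case) are in the same order, so not anagrams.
    # Otherwise the two strings are anagrams when their sorted letters match.
    return a != b and sorted(a) == sorted(b)

print(anagrams_sketch("listen", "silent"))
print(anagrams_sketch("Tar", "TAR"))
```

Sorting puts the letters of each string into a canonical order, so two strings contain the same letters exactly when their sorted forms are equal.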
In [ ]:
## Q2(a) answer code cell
## Modify this function definition to fulfill the given requirements.
## Expected code length: less than 10 lines.

def anagrams(s1, s2):
    pass
In [ ]:
# Run this cell to test your anagrams function
# The PP_CW2A_tests program must be in the same folder as this file.
from PP_CW2A_tests import do_tests
do_tests(anagrams)

Q2(b) Find all anagrams of a word¶
Write a function find_all_anagrams(string), which takes a string as input and returns a list of all words in the file english_words.txt that are anagrams of the input string. More specifically, given an input string, the function should return a list [word1, …, wordN] containing exactly those words in the dictionary file for which anagrams(string, word) is True (as specified in Q2(a) of this coursework).
Examples¶
INPUT         OUTPUT
'cheese'      []
'Python'      ['phyton', 'typhon']
'relating'    ['alerting', 'altering', 'integral', 'tanglier', 'triangle']
Note: Though the instructions for this question are quite brief, they do exactly specify the requirements for this function. Since you should have already defined the anagrams(s1,s2) function for part Q2(a), you should call this function in your definition of the find_all_anagrams(string) function. It is much better programming style to do that, rather than repeat the full code for checking anagrams within find_all_anagrams(string).
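The reuse pattern described in the note can be sketched as a filter over a word list that delegates each check to an existing predicate. In this sketch the word list SAMPLE_WORDS, the helper anagrams_sketch and the name find_all_anagrams_sketch are all hypothetical stand-ins, so the block runs without english_words.txt.

```python
# Hypothetical miniature word list; the coursework uses ENGLISH_WORDS instead.
SAMPLE_WORDS = ["alerting", "altering", "cheese", "integral",
                "listen", "silent", "triangle"]

def anagrams_sketch(s1, s2):
    """Stand-in for the Q2(a) function: same letters, different order."""
    a, b = s1.lower(), s2.lower()
    return a != b and sorted(a) == sorted(b)

def find_all_anagrams_sketch(string, words=SAMPLE_WORDS):
    # Keep every listed word for which the predicate holds, in list order.
    return [word for word in words if anagrams_sketch(string, word)]

print(find_all_anagrams_sketch("relating"))
# prints ['alerting', 'altering', 'integral', 'triangle']
```

Because the filtering function only calls the predicate, any later fix to the anagram test is picked up automatically, which is the point of the style advice above.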
In [ ]:
## Q2(b) answer code cell
## Modify this function definition to fulfill the given requirements.
## Expected code length: less than 5 lines.

def find_all_anagrams(s):
    pass
In [ ]:
# Run this cell to test your find_all_anagrams function.
# The PP_CW2A_tests program must be in the same folder as this file.
from PP_CW2A_tests import do_tests
do_tests(find_all_anagrams)

Q2(c) Test whether a string is a palindrome¶
Define a function is_palindrome(s) that returns True if the given string is a palindrome, and otherwise returns False.
More specifically, the function should return True if the alphabetic characters in the input string form the same sequence whether they are read forwards or backwards. Any non-alphabetic characters, such as spaces and punctuation marks, should be ignored, and letter characters are considered to be the same if one is lower case and the other is upper case. Thus the string “Do geese see God?” is counted as a palindrome, so the function should return True for this string.
Examples:¶
Input                   Output
"Bob"                   True
"God"                   False
"Abba"                  True
"No lemon, no melon"    True
"I love Python!"        False
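The rule above (keep only letters, ignore case, compare with the reverse) can be sketched as follows. This is one possible reading of the specification, with the hypothetical name is_palindrome_sketch; it relies on str.isalpha() to decide what counts as an alphabetic character.

```python
def is_palindrome_sketch(s):
    """Return True if the letters of s read the same in both directions."""
    # Keep alphabetic characters only, converted to lower case.
    letters = [ch.lower() for ch in s if ch.isalpha()]
    # A slice with step -1 gives the reversed sequence.
    return letters == letters[::-1]

print(is_palindrome_sketch("Do geese see God?"))
print(is_palindrome_sketch("I love Python!"))
```

Under this sketch a string with no letters at all counts as a palindrome, since an empty sequence equals its own reverse; the specification does not say how that case should be treated.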
In [ ]:
## Q2(c) answer code cell
## Modify this function definition to fulfill the given requirements
## Expected code length: less than 10 lines.

def is_palindrome(s):
    pass
In [ ]:
# Run this cell to test your is_palindrome function
# The PP_CW2A_tests program must be in the same folder as this file.
from PP_CW2A_tests import do_tests
do_tests( is_palindrome )

Q2(d) Find palindromes of length n (in english_words.txt)¶
Define a function find_palindromes_of_length(n) that returns, in alphabetical order, the list of all palindromes in english_words.txt that are n letters long.
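As a rough illustration of the requirement, the sketch below filters a word list by length and by the palindrome property, then sorts the result. The list SAMPLE_WORDS and the name find_palindromes_sketch are hypothetical, and the simple reversal test assumes the words are lower case with no punctuation, as the word file is described to be.

```python
# Hypothetical miniature word list, deliberately not in alphabetical order.
SAMPLE_WORDS = ["rotator", "level", "python", "civic", "bob", "noon"]

def find_palindromes_sketch(n, words=SAMPLE_WORDS):
    # Select words of exactly n letters that equal their own reverse,
    # and return them in alphabetical order.
    return sorted(w for w in words if len(w) == n and w == w[::-1])

print(find_palindromes_sketch(5))
# prints ['civic', 'level']
```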
In [ ]:
## Q2(d) answer code cell
## Modify this function definition to fulfill the given requirements.
## Expected code length: less than 5 lines.

def find_palindromes_of_length(n):
    pass
In [ ]:
# Run this cell to test your find_palindromes_of_length function
# The PP_CW2A_tests program must be in the same folder as this file.
from PP_CW2A_tests import do_tests
do_tests( find_palindromes_of_length )

Q3. Write a simple spell-checker¶
To answer this question, you need to write a function spell_check_file, which takes one argument that is the name of a file and prints output similar to that given below, showing all words in the file that may be spelling mistakes. More specifically, for each line where potential spelling mistakes are identified, it should print out the line number followed by a list of the words that have been identified as possibly misspelled.
Thus the output should be similar to the following, which I obtained by running my code on the file spelling.txt:
3 ['primarry', 'secondarry']
4 ['recieved', 'Phisics']
5 ['Comunication', 'Religeon', 'll']
8 ['ambigous']
9 ['cource']
10 ['atempt', 'commprehend', 'Luckilly']
12 ['aquainted', 'langauge']
13 ['simillar']
14 ['paticular']
22 ['expresion']
23 ['expresive']
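The output format above (a line number followed by a list of flagged words, with clean lines omitted) can be produced with enumerate and print. The sketch below only illustrates that reporting pattern: it works on a list of strings rather than a file, it assumes lines are numbered from 1, and the names flag_lines, sample_lines and known are hypothetical.

```python
def flag_lines(lines, is_known):
    """Return (line number, flagged words) pairs for lines with unknown words."""
    report = []
    # Assumption: the first line is numbered 1, hence start=1.
    for number, line in enumerate(lines, start=1):
        flagged = [word for word in line.split() if not is_known(word)]
        if flagged:  # lines with nothing flagged are left out of the report
            report.append((number, flagged))
    return report

sample_lines = ["the cat sat", "the catt sat", "", "a dogg barked"]
known = {"the", "cat", "sat", "a", "barked"}

for number, flagged in flag_lines(sample_lines, known.__contains__):
    print(number, flagged)
# prints:
# 2 ['catt']
# 4 ['dogg']
```

Passing the word test in as a function keeps the reporting logic separate from the question of what counts as a known word.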
Code Restrictions and Size Limits¶
I am looking for a self-contained, simple and concise piece of code which does a reasonable job of identifying potential spelling mistakes. Hence, your code should satisfy the following limitations:
• It should not import any modules.
• It should not make use of any functions or constants defined in other cells of your PP_CW2A_words.ipynb file, apart from your is_english_word function, which you should definitely use.
• It should not exceed 30 lines in length.
• It should not contain any line longer than 80 characters.
My solution, which produced the output shown above, falls well within these restrictions, consisting of 16 lines of code, none longer than 40 characters (not counting indentation).
Notes and Suggestions¶
I have not specified exactly which strings in the input file should be counted as “potential spelling mistakes”. This is intentional, since I want you to think of the problem as one of real data handling, rather than a purely artificial exercise. Like many real problems involving manipulation of real data, exact criteria that determine its interpretation and classification may be difficult or even impossible to state precisely. In such cases, we typically start with an intuitive idea of what we want to do with the data, and soon realise that we have to make this more precise to actually implement a solution. We then try to specify the details of the processing required and results we want to obtain. After some analysis and consideration of examples, we can usually come up with a specification that works well and gives useful results in nearly all cases. However, for a real problem involving a complex real data source, it is unlikely that we can find a solution that gives useful and desirable results in 100% of cases.
Some useful functions to construct a simple solution¶
Clearly, the is_english_word function that you defined for Q1 will be very useful. You may assume that any English word is not a spelling mistake.
The following basic Python built-in string methods are very useful for manipulating raw data:
• tokens = s.split() chops a string (or the contents of a whole document) into parts. Given no arguments, it splits the string at all points where there is whitespace (spaces, tabs and/or newlines).
• s = s.strip() replaces s by a cleaned-up version with no whitespace at either end (whitespace in the middle is kept).
• s = s.replace(x, y) replaces all occurrences of x with y in the string s, where x and y can be either single characters or strings.
• s = s.replace(x, '') replaces all occurrences of x in s by nothing, i.e. deletes them.
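The behaviour of these string methods can be checked on a short example. The sample sentence below is made up for illustration.

```python
line = "  Hello, world! It's a well-known fact.\n"

# split() with no arguments breaks the string at runs of whitespace.
tokens = line.split()
print(tokens)
# prints ['Hello,', 'world!', "It's", 'a', 'well-known', 'fact.']

# strip() removes whitespace (including the newline) from both ends only.
cleaned = line.strip()
print(cleaned)
# prints Hello, world! It's a well-known fact.

# replace(x, '') deletes every occurrence of x.
no_commas = cleaned.replace(",", "")
print(no_commas)
# prints Hello world! It's a well-known fact.
```

Note that split() leaves punctuation attached to the neighbouring word ('Hello,' and 'fact.'), so tokens usually need further cleaning before they are looked up in a word list.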
Issues to deal with¶
• Punctuation: this presents an immediate problem, especially since punctuation symbols may occur directly before or after a word. Worse still, certain punctuation marks such as hyphens and apostrophes may occur within a word. Many cases can be dealt with by simply deleting punctuation symbols, but this is by no means a perfect solution.

• Odd capitalisation: our is_english_word function assumes that an English word must be either all lower case, all upper case, or start with an upper-case letter and have the rest all lower case. This is quite a good rule but different forms are sometimes used (for example eBook).

• Unusual or colloquial words: clearly many books contain peculiar non-standard words, especially in quoted speech.

• Proper names: typically, these start with a capital letter; but how can you tell a proper name from a wrongly spelled word that is capitalised because it is at the beginning of a sentence? There is no easy way.

• Runtime: checking words against a list of correct English words can take a long time if every word is being checked against every word in the list. However, most of this time is unnecessary. Preprocessing of a list can enable one to tell much more quickly whether it contains a given element.

I do not expect your code to perfectly solve any of these problems, but try to deal with them as best you can given the code length restrictions given above and the limited time you have available.
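Two of the issues above, punctuation at the edges of a word and lookup speed, can be illustrated with a short sketch. It is only one possible approach: the word list SAMPLE_WORDS, the helper clean_token and the chosen set of punctuation characters are all assumptions made for this example. Converting a list to a set is one form of the preprocessing mentioned under Runtime, since a set answers membership tests with a hash lookup instead of scanning every element.

```python
# Hypothetical miniature word list standing in for the full dictionary.
SAMPLE_WORDS = ["a", "fact", "hello", "well-known", "world"]

# Built once; "x in WORD_SET" does not scan the whole collection.
WORD_SET = set(SAMPLE_WORDS)

def clean_token(token):
    # strip(chars) removes the listed characters from both ends only,
    # so hyphens and apostrophes inside a word are left alone.
    return token.strip(".,;:!?\"'()")

tokens = "Hello, (world)! It's a well-known fact.".split()
cleaned = [clean_token(t) for t in tokens]
print(cleaned)
# prints ['Hello', 'world', "It's", 'a', 'well-known', 'fact']

unknown = [t for t in cleaned if t.lower() not in WORD_SET]
print(unknown)
# prints ["It's"]
```

The contraction "It's" is flagged here simply because it is absent from the sample list, which shows why apostrophes need a deliberate decision rather than blanket deletion.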
In [ ]:
## Q3 answer code cell
## Modify this function definition to fulfill the given requirements.
## Maximum code length: 30 lines

def spell_check_file(filename):
    pass

spell_check_file testing¶
Run the following cell to run your spell_check_file function on the example file spelling.txt and see the result.
You may want to also test your spell_check_file function with some other longer examples (such as the book files provided for CW2 Part B). But please delete such tests from the final version you submit, since they may not run correctly if they try to load an external file.
In [ ]:
%%time
from PP_CW2A_tests import SPELL_TEST_FILENAME
print("Testing spell_check_file on file:", SPELL_TEST_FILENAME)
spell_check_file(SPELL_TEST_FILENAME)
In [ ]:
# Hand grading cell
# The marker will fill in your grade after running the cell above.
HAND_GRADES = {"spell_check_file": None}

Feedback and Grading¶
Feedback¶
The marker will write some feedback here giving a brief explanation of the marking and reasons why you have lost marks. This will usually be just for your spell_check_file function, since the other functions have very specific requirements for which automated tests will be made.
marker’s feedback goes here

Grading¶
The following code will run tests for all functions that count towards your grade for Coursework 2A, Questions 1-3, and calculate the final mark you will get for this part of the coursework. (2A Questions 4 and 5 are specified and graded in the separate file PP_CW2A_CSV_HTML.ipynb.)
To get your grade, simply select the “Run All” option from the Jupyter “Cell” menu above. Test results followed by your overall grade will be shown below. You can do this at any point to see the marks you would get for what you’ve done so far.
In [ ]:
from PP_CW2A_tests import do_all_tests
functions = [
    is_english_word,
    anagrams,
    find_all_anagrams,
    is_palindrome,
    find_palindromes_of_length
]

do_all_tests(functions, HAND_GRADES)

Submission¶
• Coursework should be submitted via an upload widget on the Assessment page of the module’s Minerva website.
• An announcement will be made via Minerva and email, to let you know when the assessment upload is available.
• You should submit your edited file PP_CW2A_words.ipynb.
• The deadline for submission will be posted on the Assessment page of the module’s Minerva site.