
Project 2B: NGordnet (Wordnet) | CS 61B Fall 2022




Each assignment will have an FAQ linked at the top. You can also access it by adding “/faq” to the end of the URL. The
FAQ for Project 2b is located

Due 10/28/2022 #

In this project, you’ll complete your implementation of the NGordnet tool.

Unlike Project 2a, the implementation for this part of the project is very open-ended. Deciding on an overall design is
an important skill that we’ll also revisit in Project 3. The number of lines of code for this project isn’t necessarily
large, but there are a lot of independent decisions that you’ll need to make along the way.

DISCLAIMER: As this is a totally new project, there may be occasional bugs or confusions with the spec. If you notice
anything of this sort, please post on Ed or contact Professor Hug directly with any anomalies that you observe at

Getting Started #

As you’ve done with other assignments in this class, run git pull skeleton main to get the skeleton code for this
project.
NOTE: You’ll notice that this skeleton is (almost) exactly the same as the Project 2a skeleton. Rather than having
you use your own implementations of TimeSeries, NGramMap, HistoryTextHandler, and HistoryHandler, we’ve
instead provided you with working (and obfuscated) implementations of these classes in library-fa22 (see the
next step).

To get the additional libraries for this project, cd into your library-fa22 directory and run git pull. Then,
import all the libraries from library-fa22 into this project like you normally would.
Now that you’ve pulled and imported the libraries, you’ll notice that the code in Main.java (including the lines
that use NGramMap) should no longer be red.

Download the new data files for this project using this link and move them into your proj2b folder on the same level
as ngordnet.

Using the WordNet Dataset #

Before we can incorporate WordNet into our project, we first need to understand the WordNet dataset.

WordNet is a “semantic lexicon for the English language” that is used
extensively by computational linguists and cognitive scientists; for example, it was a key component in IBM’s Watson.
WordNet groups words into sets of synonyms called synsets and describes semantic relationships between them. One such
relationship is the is-a relationship, which connects a hyponym (more specific synset) to a hypernym (more
general synset). For example, “change” is a hypernym of “demotion”, since “demotion” is-a (type of) “change”.
“change” is in turn a hyponym of “action”, since “change” is-a (type of) “action”. A visual depiction of some hyponym
relationships in English is given below:

Each node in the graph above is a synset. Synsets consist of one or more words in English that all have the same
meaning. For example, one synset is “jump, parachuting”, which represents the act of descending to the ground with a
parachute. “jump, parachuting” is a hyponym of “descent”, since “jump, parachuting” is-a “descent”.

Words in English may belong to multiple synsets. This is just another way of saying words may have multiple meanings.
For example, the word “jump” also belongs to the synset “jump, leap”, which represents the more figurative notion of
jumping (e.g. a jump in attendance) rather than the literal meaning of jump from the other synset (e.g. a jump over a
puddle). The hypernym of the synset “jump, leap” is “increase”, since “jump, leap” is-an “increase”. Of course, there
are other ways to “increase” something: for example, we can increase something through “augmentation”, and thus it is
no surprise that we have an arrow pointing downwards from “increase” to “augmentation” in the diagram.

Synsets may include not just words, but also what are known as collocations. You can think of these as single words
that occur next to each other so often that they are considered a single word, e.g. nasal_decongestant. To avoid
ambiguity, we will represent the constituent words of collocations as being separated with an underscore _ instead of
the usual convention in English of separating them with spaces. For simplicity, we will refer to collocations as
simply “words” throughout this document.

A synset may be a hyponym of multiple synsets. For example, “actifed” is a hyponym of both “antihistamine” and
“nasal_decongestant”, since “actifed” is both of these things.

If you’re curious, you can browse the WordNet database by using the web interface, though this is not necessary for
this project.

Hyponyms (Basic Case) #

Setting up a HyponymsHandler #

In your web browser, open the ngordnet.html file in the static folder. You’ll see that there is a new button:
“Hyponyms”. Note that there is also a new input box called k.

Try clicking the Hyponyms button. You’ll see nothing happens (and if you open the developer tools feature of your web
browser, you’ll see that your browser shows an error).

In Project 2B, your primary task is to implement this button, which will require reading in a different type of dataset
and synthesizing the results with the dataset from Project 2A. Unlike 2A, it will be entirely up to you to decide what
classes you need to support this task.

Start by opening your ngordnet.main.Main.java file.
Register a new handler called HyponymsHandler that simply returns the word “Hello!” when the user clicks the
Hyponyms button in the browser. You’ll need to create a new HyponymsHandler class that extends
the NgordnetQueryHandler class. See your other Handler classes for examples. Make sure when you register your
handler that you use the string “hyponyms” as the first argument to the register method, and not “hyponym”.
Once you’ve modified Main so that your new handler is registered to handle hyponyms requests, start up Main and
try clicking the Hyponyms button in your web browser again. You should see text appear that says “Hello!”.

Hyponyms Handler (Basic Case) #

Next, you’ll create a partial implementation of the Hyponyms button. For now, this button should:

Assume that the “words” entered is only a single word.
Ignore startYear, endYear, and k.
Return a string representation of a list of the hyponyms of the single word, including the word itself. The list
should be in alphabetical order, with no repeated words.

For example, suppose the WordNet dataset looks like the diagram below (given to you as the input files synsets11.txt
and hyponyms11.txt). Suppose that the user enters “descent” and clicks on the Hyponyms button.

In this case, the output of your handler should be the string representation of a list containing “descent”, “jump”,
and “parachuting”, i.e. [descent, jump, parachuting]. Note that the words are in alphabetical order.

As another example, suppose we’re using a bigger dataset such as the one below (given to you as the input
files synsets16.txt and hyponyms16.txt):

Suppose the user enters “change” and clicks on the Hyponyms button. In this case, the hyponyms are all the words in the
blue nodes in the diagram below:

That is, the output
is [alteration, change, demotion, increase, jump, leap, modification, saltation, transition, variation]. Note that
even though “change” belongs to two different synsets, it only appears once.

Note: Don’t overthink this and make life harder than it needs to be. Specifically, observe that the output does NOT
include:

Synonyms of synonyms (e.g. does not include adjustment)
Hyponyms of synonyms (e.g. does not include conversion)
Hyponyms of other definitions of hyponyms (e.g. does not include “flashback”, which is a hyponym of another definition
of “transition”)

To complete this task, you’ll need to decide what classes you need to create to support the HyponymsHandler. DO NOT
DO ALL THE WORK IN HYPONYMSHANDLER. Instead, you should have helper classes. For example, to handle the “History”
button, we created an NGramMap class. You’ll want to do something similar.

In order to complete this task, you’ll need to understand the input format of the WordNet dataset. This description is
given in the section below.

For this part, you may NOT import any existing graph library into your code. That is, you can’t import, for example, the
graph implementations from the optional Princeton algorithms textbook. Instead, you should build your own graph class or

Just like NGramMap, you’ll want your helper classes to only parse the input files once, in the constructor. DO NOT
CREATE METHODS WHICH HAVE TO READ THE ENTIRE INPUT FILE EVERY TIME THEY ARE CALLED. This will be too slow!
We strongly recommend creating at least two classes for this part of the project as follows: One which implements the
idea of a directed graph. One which reads in the WordNet dataset and constructs an instance of the directed graph
class. This second class should also be able to take a word and return its hyponyms. You may also want additional
helper classes that represent the idea of a traversal.
Don’t worry about writing JUnit tests yet; we’ll talk about how to do that later in the spec. Simply use the web front
end to check the two input examples (“descent” and “change”) from the diagrams above for synsets16.txt and
hyponyms16.txt.
While you can (and should) write unit tests for the helper classes/methods that you create for this project, another
good way to test and see what’s going on with your code is to simply run Main.java, open ngordnet.html, enter some
inputs into the boxes, and click the “Hyponyms” button. You may find visual debugging can lead to some useful
discoveries in this project.
Because of the obfuscation that we applied to the Project 2a files (in particular, TimeSeries and NGramMap), the
argument name previews when using these classes in IntelliJ may look a little weird. You may see long, random strings;
these are intentional in order to obfuscate the code, and they do not represent an issue with your own code in any way.

WordNet File Format #

We now describe the two types of data files that store the WordNet dataset. These files are in comma separated format,
meaning that each line contains a sequence of fields, separated by commas.

File type #1: List of noun synsets. The file synsets.txt (and other smaller files with synset in the name) lists all
the synsets in WordNet. The first field is the synset id (an integer), the second field is the synonym set (or
synset), and the third field is its dictionary definition (also called its “gloss” for some reason). For example, the line

36,AND_circuit AND_gate,a circuit in a computer that fires only when all of its inputs fire

means that the synset { AND_circuit, AND_gate } has an id number of 36 and its definition is “a circuit in a
computer that fires only when all of its inputs fire”. The individual nouns that comprise a synset are separated by
spaces (and a synset element is not permitted to contain a space). The S synset ids are numbered 0 through S − 1; the
id numbers will appear consecutively in the synset file. You will not (officially) use the definitions in this
project, though you’re welcome to use them in some interesting way if you decide to add optional features at the end
of this project. The id numbers are useful because they also appear in the hyponym files, described as file type #2.

File type #2: List of hyponyms. The file hyponyms.txt (and other smaller files with hyponym in the name) contains
the hyponym relationships: The first field is a synset id; subsequent fields are the id numbers of the synset’s direct
hyponyms. For example, the following line

79537,38611,9007

means that the synset 79537 (“viceroy vicereine”) has two hyponyms: 38611 (“exarch”) and 9007 (“Khedive”),
representing that exarchs and Khedives are both types of viceroys (or vicereine). The synsets are obtained from the
corresponding lines in the file synsets.txt:

79537,viceroy vicereine,governor of a country or province who rules…
38611,exarch,a viceroy who governed a large province in the
9007,Khedive,one of the Turkish viceroys who ruled Egypt between…

There may be more than one line that starts with the same synset ID. For example, in hyponyms16.txt, we have

11,12
11,13

This indicates that both synsets 12 and 13 are direct hyponyms of synset 11. These two could also have been combined
onto one line, i.e. the line below would have the exact same meaning, namely that synsets 12 and 13 are direct
hyponyms of synset 11.

11,12,13

You might ask why there are two ways of specifying the same thing. Real world data is often messy, and we have to deal
with it.
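As a rough illustration of the two formats described above, here is a minimal sketch that parses one line of each file type. The class WordNetLineParserSketch and its method names are made up for this example and are not part of the skeleton; your own design may look quite different. The one detail worth noticing is that the synsets line is split into at most three fields, since a definition may itself contain commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of the skeleton: parses single lines of the two file types. */
class WordNetLineParserSketch {

    /** Returns the leading id field of a line from either file type. */
    static int parseId(String line) {
        return Integer.parseInt(line.split(",", 2)[0].trim());
    }

    /** Returns the words of the synset on a synsets line; the definition is ignored. */
    static List<String> parseSynsetWords(String line) {
        // Split into at most 3 fields so that commas inside the definition are left alone.
        String[] fields = line.split(",", 3);
        // Words within a synset are separated by single spaces.
        return Arrays.asList(fields[1].split(" "));
    }

    /** Returns the ids of the direct hyponyms listed on a hyponyms line. */
    static List<Integer> parseHyponymIds(String line) {
        String[] fields = line.split(",");
        List<Integer> ids = new ArrayList<>();
        // Field 0 is the hypernym id; every later field is a direct hyponym id.
        for (int i = 1; i < fields.length; i++) {
            ids.add(Integer.parseInt(fields[i].trim()));
        }
        return ids;
    }

    public static void main(String[] args) {
        String synsetLine =
            "36,AND_circuit AND_gate,a circuit in a computer that fires only when all of its inputs fire";
        System.out.println(parseId(synsetLine) + " -> " + parseSynsetWords(synsetLine));

        String hyponymLine = "79537,38611,9007";
        System.out.println(parseId(hyponymLine) + " -> " + parseHyponymIds(hyponymLine));
    }
}
```

Because several hyponyms lines may start with the same id, whatever consumes parseHyponymIds should add to an existing list of hyponyms for that id rather than overwrite it.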

Suggested Steps to Take #

To get the Hyponyms button working you’ll need to:

Develop a graph class. If you aren’t familiar with this data structure, take a look at lectures 21 and 22.
Write code that converts the WordNet dataset files into a graph. This could be part of your graph class, or it
could be a class that uses your graph class.
Write code that takes a word, and uses a graph traversal to find all hyponyms of that word in the given graph.

We strongly recommend writing a test that evaluates the examples above (on synsets11/hyponyms11 and
synsets16/hyponyms16 using “descent” and “change” respectively).

Handling Lists of Words #

Your next task is to handle lists of words. As an example, if the user enters “change, occurrence” for the diagram
below, we should get only words from the nodes in blue,
i.e. [alteration, change, increase, jump, leap, modification, saltation, transition]. “Demotion” and “variation” are
not included because they are not hyponyms of both words; specifically, they are not hyponyms of “occurrence”.

As you can see, we only want to return words which are hyponyms of ALL words in the list. Furthermore, note that the
list of words provided by the user can include more than just 2 words, even though our examples in this spec do not.
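One way to think about this requirement is as a set intersection: compute the hyponyms of each word separately, then keep only the words that appear in every set. The sketch below assumes you already have those per-word sets; the class name HyponymIntersectionSketch is made up, and the second set in main is a simplified stand-in for the hyponyms of “occurrence”, not the real contents of the dataset.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch: intersecting the hyponym sets of several words. */
class HyponymIntersectionSketch {

    /** Keeps only the words that appear in every one of the given sets, in sorted order. */
    static Set<String> commonHyponyms(List<Set<String>> hyponymSets) {
        if (hyponymSets.isEmpty()) {
            return new TreeSet<>();
        }
        // Copy the first set so that the caller's sets are never modified.
        Set<String> common = new TreeSet<>(hyponymSets.get(0));
        for (Set<String> other : hyponymSets.subList(1, hyponymSets.size())) {
            common.retainAll(other);
        }
        return common;
    }

    public static void main(String[] args) {
        // Hyponyms of "change", as listed earlier in this spec.
        Set<String> ofChange = Set.of("alteration", "change", "demotion", "increase", "jump",
                "leap", "modification", "saltation", "transition", "variation");
        // Assumed stand-in for the hyponyms of "occurrence" (illustrative only).
        Set<String> ofOccurrence = Set.of("alteration", "change", "increase", "jump", "leap",
                "modification", "occurrence", "saltation", "transition");

        // Prints [alteration, change, increase, jump, leap, modification, saltation, transition]
        System.out.println(commonHyponyms(List.of(ofChange, ofOccurrence)));
    }
}
```

Since the method takes a list of sets, it works unchanged when the user enters three or more words.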

As another example which demonstrates the usefulness of this feature, let’s say we are using the full synsets.txt and
hyponyms.txt and enter “female, animal” in the words box. Then, clicking “Hyponyms” should
display [dam, female, female_mammal, filly, hen], as these are all the words which are hyponyms of female and animal.
Or, if we enter “female, leader” in the words box and then click “Hyponyms”, you should get
back [materfamilias, matriarch]. (Interesting question to answer: Why don’t we get back specific female leaders? I know
if you just enter “woman”, you get back a ton of random people, some of whom are leaders. I’m guessing the leader
annotation doesn’t include every property of these people.)

To test this part of your code, we recommend manually constructing examples using synsets16.txt and hyponyms16.txt
and using the front end to evaluate correctness.

Handling k > 0 #

Above, we handled the situation where k = 0, which is the default value when the user does not enter a k value.

Your final required task is to handle the case where the user enters k. k represents the maximum number of hyponyms
that we want in our output. For example, if someone enters the word “dog”, and then enters k = 5, your code would
return at most 5 words.

To choose the 5 hyponyms, you should return the k words which occurred the most times in the time range requested. For
example, if someone entered words = “food, cake”, startYear = 1950, endYear = 1990, and k = 5, then you would
find the 5 most popular words in that time period that are hyponyms of both food and cake. Here, the popularity is
defined as the total number of times the word appears over the entire time period. The words should be returned in
decreasing popularity. In this case, the answer is [biscuit, cake, kiss, snap, wafer]. Here it is purely coincidence
that the popularity of the words is also alphabetical.

If two words are tied, you may break ties arbitrarily. In other words, if k = 3, and “slime” occurred 93 times in the
given time period, “ooze” occurred 44 times, “unguent” occurred 12 times, and “oil” also occurred 12 times, then
[“slime”, “ooze”, “oil”] and [“slime”, “ooze”, “unguent”] would both be valid.

Note that if the front end doesn’t supply a year, default values of startYear = 1900 and endYear = 2020 are provided by
NgordnetQueryHandler.readQueryMap.

If k = 0, or the user does not enter k (which results in a default value of zero), then the startYear
and endYear should be totally ignored.

If a word never occurs in the time frame specified, i.e. the count is zero, it should not be returned. In other words,
if k > 0, we should not show any words that do not appear in the ngrams dataset.

If there are no words that have non-zero counts, you should return an empty list, i.e. “[]”.
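The selection rules above for k > 0 can be sketched as follows. This example assumes the total count of each candidate hyponym over the requested years has already been computed and stored in a map; how you obtain those totals from the NGramMap is up to your design. The class TopKSketch and the method topK are hypothetical names, and the k = 0 case is deliberately not handled here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: choosing the k most popular words for k > 0. */
class TopKSketch {

    /** Returns up to k words with non-zero total counts, most popular first. */
    static List<String> topK(Map<String, Double> totalCounts, int k) {
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Double> entry : totalCounts.entrySet()) {
            // Words that never occur in the time range are dropped entirely.
            if (entry.getValue() > 0) {
                candidates.add(entry.getKey());
            }
        }
        // Sort by total count, largest first. Ties end up in an arbitrary order.
        candidates.sort(
            Comparator.comparingDouble((String word) -> totalCounts.get(word)).reversed());
        // There may be fewer than k words left after filtering.
        return new ArrayList<>(candidates.subList(0, Math.min(k, candidates.size())));
    }

    public static void main(String[] args) {
        Map<String, Double> counts = new HashMap<>();
        counts.put("slime", 93.0);
        counts.put("ooze", 44.0);
        counts.put("unguent", 12.0);
        counts.put("oil", 12.0);
        counts.put("gloop", 0.0); // made-up word with a zero count; never returned

        // Starts with slime, then ooze; the third word is either oil or unguent.
        System.out.println(topK(counts, 3));
    }
}
```

Sorting every candidate is the simplest approach. If the candidate list were very large, a bounded priority queue would avoid sorting words that cannot make the top k, but that is an optimization you may not need.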

This task will be a little trickier since you’ll need to figure out how to pass information around so that the
HyponymsHandler knows how to access a useful NGramMap.

IMPORTANT: DO NOT MAKE A STATIC NGRAMMAP FOR THIS TASK! It might be tempting to simply make some sort of
public static NGramMap that can be accessed from anywhere in your code. This is called a “global variable”. We
strongly discourage this way of thinking about programming, and instead suggest that you should be passing an NGramMap
to either constructors or methods. We’ll come back to talking about this during the software engineering lectures.
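As a small illustration of passing a dependency in rather than reaching for a global, consider the sketch below. To keep it self-contained it does not use the real NGramMap; instead, CountSourceSketch is a made-up interface standing in for whatever part of NGramMap your code needs, and InjectedRankerSketch is a made-up class that receives it through its constructor.

```java
/** Hypothetical stand-in for the part of NGramMap that this example needs. */
interface CountSourceSketch {
    double totalCount(String word, int startYear, int endYear);
}

/** Hypothetical helper that is handed its count source instead of looking up a static one. */
class InjectedRankerSketch {
    // An instance field set once in the constructor; deliberately not static.
    private final CountSourceSketch counts;

    InjectedRankerSketch(CountSourceSketch counts) {
        this.counts = counts;
    }

    /** Returns true if the word occurs at least once in the given year range. */
    boolean appearsInRange(String word, int startYear, int endYear) {
        return counts.totalCount(word, startYear, endYear) > 0;
    }

    public static void main(String[] args) {
        // A fake count source for demonstration: only "cake" ever occurs.
        CountSourceSketch fake = (word, startYear, endYear) -> word.equals("cake") ? 10.0 : 0.0;

        // The object is created in one place and passed to whoever needs it.
        InjectedRankerSketch ranker = new InjectedRankerSketch(fake);
        System.out.println(ranker.appearsInRange("cake", 1950, 1990)); // true
        System.out.println(ranker.appearsInRange("wafer", 1950, 1990)); // false
    }
}
```

A side benefit of this style is visible in main: because the dependency is passed in, a test can supply a tiny fake instead of loading the large data files.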

Until you use the autograder, you’ll need to construct your own test cases. We provide one
above: words = “food, cake”, startYear = 1950, endYear = 1990, k = 5.
When constructing your own test cases, consider making your own input files. Using the large input files we provide is
extremely tedious.
In the coming sections of this spec, we’ll tell you how to set up your code for submission to the autograder, and how
to write your own JUnit tests to mimic the test cases provided by the grader.

Grading Details and Deliverables #

For Project 2b, the only deliverable is the HyponymsHandler.java file. As noted above, you will likely need to
implement a few helper classes along the way to fully implement this handler; however, we will not be directly grading
these classes, since they can vary from student to student.

This portion of project 2 will be worth 2400 points. The Gradescope autograder will be up sometime in the coming week,
and this section will be updated with additional grader details at that time.

Submitting Your Code, Automated Hyponym Testing, and Grader Compatibility #

Throughout this assignment, we’ve had you use your front end to test your code. Our grader is not sophisticated enough
to pretend to be a web browser and call your code. Instead, we’ll need you to provide a method in the
proj2b_testing.AutograderBuddy class that provides a handler that can deal with hyponyms request
