School of Computing and Information Systems The University of Melbourne COMP90042
NATURAL LANGUAGE PROCESSING (Semester 1, 2020)
Workshop exercises: Week 6
1. Give illustrative examples that show the difference between:
Discussion
(a) Synonyms and hypernyms
(b) Hyponyms and meronyms
2. Using some Wordnet visualisation tool, for example,
http://wordnetweb.princeton.edu/perl/webwn and the Wu & Palmer definition of word similarity, check whether the word information is more similar to the word retrieval or the word science (choose the sense which minimises the distance). Does this mesh with your intuition?
3. What is word sense disambiguation?
4. For the following term co-occurrence matrix (suitably interpreted):
cup not(cup) world 55 225
not(world) 315 1405
(a) Find the Point-wise Mutual Information (PMI) between these two terms in
this collection.
(b) What does the value from (a) tell us about distributional similarity?
5. In the 09-distributional-semantics iPython notebook, a term–document matrix is built to learn word vectors.
(a) What is the Singular Value Decomposition (SVD) method used for here? Why is this helpful?
6. What is a word embedding and how does it relate to distributional similarity? (a) What is the difference between a skip-gram model and a CBOW model?
(b) How are the above models trained?
Programming
1. Consider the iPython notebook 08-lexical-semantics. Repeat the exercise about word similarity from the Discussion problems above (Q2); confirm that you get the same answer. Now try the Lin similarity — do you get the same result? Why or why not?
2. Use the 09-distributional-semantics iPython notebook to find some in- teresting collocations, using PMI.
(a) Write a wrapper function which finds the 10 collocations with the greatest PMI, amongst all of bi-grams in the collection. (Note that you might want to be careful about your strategy for doing this in a very large collection!)
1
(b) NLTK has an in-built method collocations() (of a Text object) — does it come up with the same collocations as PMI? Why do you think this is the case?
Catch-up
• What is information, with respect to entropy? How might we calculate the in- formation of a word in a corpus? How might we calculate the information of a (textual) message with respect to the information in a corpus? (There are many different ways!)
• What is WordNet? What is a synset?
• What is the cosine similarity and how is it calculated?
• What is entropy and how is it calculated? What is entropy attempting to mea- sure?
• What is a term–document matrix? How is it different to an inverted index?
• What is a TF-IDF model? What are its intuitions and how do they appear in a
typical model (formula)?
Get ahead
• Choose an individual word and consider its different synsets in Wordnet.
– Find a number of instances of that word in a corpus. Assign each token to its
sense. Is one sense more frequent than the other senses?
• Build a system which attempts to use the Lesk strategy of WSD based on the Wordnet bindings in NLTK. Choose some word(s) in some sentence(s) and ob- serve its output — does the correct sense get returned? Why or why not?
• In the notebook 09-distributional-semantics:
– Try doing the same calculations on the collection without the SVD method.
How much time does the truncation step save?
– Try different values for truncating the decomposition; at what point do the results seem to become noticeably worse?
– How important is the TF-IDF step? Try the retrieval without it; do the results change? What if you omit it from only the document matrix, or only the query vector?
– Try to find some queries where the results are different with and without the TF-IDF transformation.
2