Semantics 1: Lexical Meaning & WordNet
This time:
Language and Meaning
Lexical Semantics
– Lexemes, Lemmas and Word Senses
– Lexical Relations: Homonymy/Polysemy, Hyponymy/Hypernymy, Synonymy/Antonymy, Holonymy/Meronymy
WordNet
– WordNet Synsets
– WordNet Hierarchies
WordNet-Based Similarity
Data Science Group (Informatics), NLE/ANLP, Autumn 2015
Language and Meaning
1 Lexical Semantics
The meaning of individual words
2 Phrasal/Sentential Semantics
How do word meanings combine to build meanings for phrases and sentences?
Compositional Semantics
3 Context and World Knowledge
How sentence meanings combine with each other and with context
Reference: she, it, here, that dog, my cat
Discourse semantics
Pragmatics: ‘Could you pass the salt?’, ‘I like salt’, ‘Very funny!’
Lexical Semantics
Lexical semantics concerns the meaning(s) or sense(s) of words – more generally, the meaning of lexemes
The big questions:
How can lexical semantics be represented?
How can we acquire it?
Problem:
Word sense is a hidden variable
We observe the words, not the word senses
Lexemes and Word Senses
A lexeme is an abstract pairing of meaning and form
A lemma is the canonical (citation) form used to represent a lexeme
WEEP is the lemma for wept; MOUSE is the lemma for mice
Specific surface forms such as wept, weeps and mice are called wordforms
The lemma BANK has two senses:
A bank can hold the investments in a custodial account
As agriculture burgeons on the east bank, the river will shrink
A sense is a discrete representation of one aspect of the meaning of a word
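The wordform-to-lemma mapping can be sketched as a lookup table. This is a minimal illustration only: the table `LEMMA_OF` and the function `lemmatize` are hypothetical, and a real lemmatiser would combine exception lists for irregular forms with morphological rules.

```python
# Toy lemma table (hypothetical data, not taken from any real lexicon):
# each wordform maps to the lemma (citation form) of its lexeme.
LEMMA_OF = {
    "wept": "weep",
    "weeps": "weep",
    "weeping": "weep",
    "mice": "mouse",
    "mouse": "mouse",
}

def lemmatize(wordform):
    """Return the lemma for a wordform, falling back to the lowercased form itself."""
    form = wordform.lower()
    return LEMMA_OF.get(form, form)

print([lemmatize(w) for w in ["Wept", "weeps", "mice", "bank"]])
# ['weep', 'weep', 'mouse', 'bank']
```

Both BANK sentences above reduce to the same lemma, so the lemma alone does not tell us which sense is meant.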
Word Senses
Consider the possible meanings (senses) of the word head:
body part
leader
head teacher
Sense inventory:
A collection of senses
Can include information about how senses relate to each other
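A sense inventory can be sketched as a mapping from each lemma to a list of sense records. The `Sense` class, the sense IDs and the `hyponym_of` link below are hypothetical, chosen only to show where information about how senses relate to each other could be stored.

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    sense_id: str                                  # hypothetical ID, e.g. "head.2"
    gloss: str                                     # short description of the sense
    relations: dict = field(default_factory=dict)  # relation name -> list of sense IDs

# Toy inventory for 'head', using the three senses listed above.
INVENTORY = {
    "head": [
        Sense("head.1", "body part"),
        Sense("head.2", "leader"),
        Sense("head.3", "head teacher", {"hyponym_of": ["head.2"]}),
    ],
}

def glosses(lemma):
    """All sense glosses recorded for a lemma (empty list if the lemma is unknown)."""
    return [s.gloss for s in INVENTORY.get(lemma, [])]

def related(lemma, sense_id, relation):
    """Sense IDs linked to one sense of a lemma by the named relation."""
    for s in INVENTORY.get(lemma, []):
        if s.sense_id == sense_id:
            return s.relations.get(relation, [])
    return []

print(glosses("head"))                          # ['body part', 'leader', 'head teacher']
print(related("head", "head.3", "hyponym_of"))  # ['head.2']
```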
Lexico-Semantic Relations
Relationships between word senses:
Homonymy/Polysemy
— same form, multiple meanings
Hyponymy/Hypernymy
— relationship between the general and the specific
Holonymy/Meronymy
— relationship between a whole and its parts
Synonymy/Antonymy
— identical/opposite meanings
Homonymy & Polysemy
Homonymy: different lexemes with same form but unrelated meanings
bank (financial institution) vs. bank (riverside)
bar (stick-like thing) vs. bar (place to buy drink)
bass (musical instrument) vs. bass (fish)
Homonymy can be a problem for NLP applications:
Text-to-Speech
– same orthographic form but different pronunciation: bass (/beɪs/, music) vs. bass (/bæs/, fish)
Information Retrieval
– different meanings, same orthographic form: QUERY: bass care
Machine Translation
Speech Recognition
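The information retrieval problem shows up in any keyword matcher that knows nothing about senses. The documents and the `keyword_search` function below are hypothetical: d1 is about the instrument, d2 about the fish, and the query bass care matches both.

```python
# Toy document collection (hypothetical); two unrelated lexemes share the form "bass".
DOCS = {
    "d1": "bass guitar care and string cleaning",
    "d2": "bass care tips for a freshwater fish tank",
    "d3": "choosing strings for an acoustic guitar",
}

def keyword_search(query, docs):
    """Return ids of documents containing every query term; no sense information is used."""
    terms = query.lower().split()
    return sorted(
        doc_id for doc_id, text in docs.items()
        if all(t in text.lower().split() for t in terms)
    )

print(keyword_search("bass care", DOCS))  # ['d1', 'd2']
```

A sense-aware system would have to decide which lexeme the user means, for example from extra query terms such as fish or guitar.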
Homonymy & Polysemy
Polysemy: a single lexeme with multiple, related meanings
bank (the financial institution) vs. bank (the building)
chicken (the animal) vs. chicken (the meat)
Most non-rare words have multiple meanings
– the number of meanings generally increases with a word’s frequency
Verbs tend to be more polysemous than nouns
Distinguishing polysemy from homonymy is not always easy – or necessary
Hyponymy & Hypernymy
One sense is a hyponym of another if the first denotes a subclass of the second:
car is a hyponym of vehicle
mouse is a hyponym of mammal
tree is a hyponym of plant
Conversely:
vehicle is a hypernym of car
mammal is a hypernym of mouse, etc.
The hyponymy relation is closely related to the IS_A (class inclusion) relation of inheritance hierarchies.
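The link to inheritance hierarchies can be made concrete with Python classes, where `issubclass` plays the role of "is a hyponym of". The classes are illustrative only, and the top class `Entity` is an assumption added to give the toy hierarchy a single root.

```python
# Toy classes named after the slide's examples (illustrative only).
class Entity:
    pass

class Vehicle(Entity):
    pass

class Car(Vehicle):
    pass

class Mammal(Entity):
    pass

class Mouse(Mammal):
    pass

print(issubclass(Car, Vehicle))   # True: car is a hyponym of vehicle
print(issubclass(Vehicle, Car))   # False: hyponymy is not symmetric
print(issubclass(Mouse, Entity))  # True: class inclusion is transitive
# The method resolution order lists the chain of hypernyms up to the root.
print([c.__name__ for c in Car.__mro__])  # ['Car', 'Vehicle', 'Entity', 'object']
```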
Synonymy & Antonymy
One word is a synonym of another if they have the same meaning:
automobile is a synonym of car
big is a synonym of large
cry is a synonym of weep
Two lexemes are synonyms if one can be substituted for the other without changing the meaning of the sentence:
he drove the car/automobile down the road
she cried/wept tears of joy
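The substitution test can be sketched as generating the swapped sentence and then judging it. The helper `substitute` is hypothetical and uses naive whitespace tokenisation.

```python
def substitute(sentence, word, candidate):
    """Replace every token equal to `word` with `candidate` (whitespace tokens only)."""
    return " ".join(candidate if tok == word else tok for tok in sentence.split())

print(substitute("he drove the car down the road", "car", "automobile"))
print(substitute("she cried tears of joy", "cried", "wept"))
print(substitute("she is my big sister", "big", "large"))
```

The code only produces the candidate sentence; deciding whether the meaning survived is still a judgement. The third pair shows where substitution fails: big and large are interchangeable in many contexts, but my large sister does not mean my big (older) sister.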
Synonymy & Antonymy
Antonyms are senses that are opposite with respect to some aspect of meaning:
hot is an antonym of cold
success is an antonym of failure
attack is an antonym of defend, etc.
More generally, antonymy is a (context-dependent) binary opposition:
red vs. white (wine)
red vs. green (traffic lights)
red vs. black (bank balance)
There are few (no?) examples of perfect synonymy/antonymy – Why?
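Because the opposition depends on context, an antonym lookup needs the context as part of its key. This is a minimal sketch using the slide's red examples; the table `ANTONYM_IN_CONTEXT` and the function `antonym` are hypothetical.

```python
# Context-dependent antonyms of "red", taken from the slide's examples.
ANTONYM_IN_CONTEXT = {
    ("red", "wine"): "white",
    ("red", "traffic lights"): "green",
    ("red", "bank balance"): "black",
}

def antonym(word, context):
    """Return the antonym of `word` in `context`, or None if none is recorded."""
    return ANTONYM_IN_CONTEXT.get((word, context))

for ctx in ("wine", "traffic lights", "bank balance"):
    print(ctx, "->", antonym("red", ctx))
```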
Holonymy & Meronymy
Holonymy is the relationship between a whole and a part:
car is a holonym of engine
mouse is a holonym of tail
tree is a holonym of bark
Meronymy is the converse relation:
engine is a meronym of car
tail is a meronym of mouse, etc.
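Since meronymy is the converse of holonymy, storing one direction is enough: the other can be derived by inverting the mapping. The data below are the slide's examples and the names `HAS_PART`, `invert` and `PART_OF` are hypothetical; NLTK exposes the real WordNet relations through synset methods such as `part_meronyms()` and `part_holonyms()`.

```python
from collections import defaultdict

# Whole (holonym) -> its parts (meronyms), from the slide's examples.
HAS_PART = {
    "car": ["engine"],
    "mouse": ["tail"],
    "tree": ["bark"],
}

def invert(relation):
    """Build the converse relation: each part (meronym) -> list of wholes (holonyms)."""
    converse = defaultdict(list)
    for whole, parts in relation.items():
        for part in parts:
            converse[part].append(whole)
    return dict(converse)

PART_OF = invert(HAS_PART)
print(PART_OF["engine"])  # ['car']
```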
WordNet: http://wordnet.princeton.edu
A hierarchically organized lexical database for English – online thesaurus + aspects of a dictionary
Nouns, verbs, adjectives and adverbs
– grouped into sets of synonyms (synsets)
Each synset expresses a distinct concept – synsets are interlinked by lexical relations
Many APIs available:
– Java, Perl, Python, R, Ruby, …
WordNet: Some Statistics
POS         Word Forms   Synsets
Noun            117097     81426
Verb             11488     13650
Adjective        22141     18877
Adverb            4601      3644
Totals          155327    117597
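The totals row can be checked directly from the per-POS counts, which also gives the proportion of synsets that are nouns. The numbers are copied from the table; the variable names are illustrative.

```python
# Counts copied from the table above: POS -> (word forms, synsets).
STATS = {
    "Noun": (117097, 81426),
    "Verb": (11488, 13650),
    "Adjective": (22141, 18877),
    "Adverb": (4601, 3644),
}

total_forms = sum(forms for forms, _ in STATS.values())
total_synsets = sum(synsets for _, synsets in STATS.values())
print(total_forms, total_synsets)  # 155327 117597

# Share of all synsets that are noun synsets.
noun_share = STATS["Noun"][1] / total_synsets
print(f"{noun_share:.1%}")  # 69.2%
```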
WordNet Hierarchies
beer (sense 1, hypernyms)
⇒ brew
  ⇒ alcohol, alcoholic beverage, intoxicant, inebriant
    ⇒ beverage, drink, potable
      ⇒ food, nutrient
        ⇒ substance, matter
          ⇒ object, inanimate object, physical object
            ⇒ entity
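Walking such a hierarchy amounts to following direct-hypernym links until a node with no hypernym is reached. This sketch copies the beer chain above and, as a simplification, writes each synset as its first lemma; `HYPERNYM` and `hypernym_chain` are hypothetical names. Real WordNet synsets can have more than one hypernym, which is why NLTK's `Synset.hypernym_paths()` returns a list of paths rather than a single chain.

```python
# Direct-hypernym links from the beer (sense 1) chain above;
# each synset is written as its first lemma (a simplification).
HYPERNYM = {
    "beer": "brew",
    "brew": "alcohol",
    "alcohol": "beverage",
    "beverage": "food",
    "food": "substance",
    "substance": "object",
    "object": "entity",
}

def hypernym_chain(synset):
    """Follow direct hypernyms from `synset` up to the root."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

print(" => ".join(hypernym_chain("beer")))
# beer => brew => alcohol => beverage => food => substance => object => entity
```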
WordNet Synsets
The main organisational relation in WordNet is synonymy
The set of (near-)synonyms for a WordNet sense is called a synset (synonym set)
– represents a sense or a concept
Example: dog as a noun in the sense:
‘a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll’
{frank, frankfurter, hotdog, hot_dog, dog, wiener, wienerwurst, weenie}
For WordNet, the meaning of this sense of dog is just this list.
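If a sense is just its list of members, then looking up a word means finding every synset that contains it. The sketch below represents synsets as sets of lemma names: the first is the slide's hot-dog synset, the second a simplified domestic-dog synset, and both keys (`hotdog_sense`, `animal_sense`) are hypothetical labels. In NLTK the corresponding calls are `wn.synsets('dog')` and `Synset.lemma_names()`.

```python
# Synsets as sets of lemma names; the keys are hypothetical labels.
SYNSETS = {
    "hotdog_sense": frozenset({"frank", "frankfurter", "hotdog", "hot_dog", "dog",
                               "wiener", "wienerwurst", "weenie"}),
    "animal_sense": frozenset({"dog", "domestic_dog", "Canis_familiaris"}),
}

def synsets_for(word):
    """Keys of all synsets that contain `word` as a member."""
    return sorted(key for key, members in SYNSETS.items() if word in members)

def synonyms_in(key, word):
    """The other members of one synset: the near-synonyms of `word` in that sense."""
    return sorted(SYNSETS[key] - {word})

print(synsets_for("dog"))                   # ['animal_sense', 'hotdog_sense']
print(synsets_for("wiener"))                # ['hotdog_sense']
print(synonyms_in("animal_sense", "dog"))   # ['Canis_familiaris', 'domestic_dog']
```

The word dog belongs to two synsets, so the word alone does not pick out a concept; the synset does.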
Sense Inventories
Being definitive can be hard:
We are forced to decide how to carve meanings up – this can feel rather arbitrary
– no scope for fuzziness
Sense inventories vary in the number of senses they include, i.e. in how fine-grained their distinctions are
– WordNet is very fine-grained in some places
It is particularly hard to be definitive about synonyms – there is almost no such thing as complete synonymy
WordNet in the NLTK
>>> from nltk.corpus import wordnet as wn  # needs the WordNet data: nltk.download('wordnet')
>>> wn.synsets('mouse')  # every sense of 'mouse', nouns and verbs
[Synset('mouse.n.01'), Synset('shiner.n.01'), Synset('mouse.n.03'), Synset('mouse.n.04'), Synset('sneak.v.01'), Synset('mouse.v.02')]
>>> wn.synset('mouse.n.01').hypernyms()  # direct hypernyms of the rodent sense
[Synset('rodent.n.01')]
>>> print(wn.synset('mouse.n.03').definition())
person who is quiet or timid
>>> print(wn.synset('mouse.n.04').definition())
a hand-operated electronic device that controls the coordinates of a cursor on your computer screen as you move it around on a pad; on the bottom of the device is a ball that rolls on the surface of the pad
Limitations of WordNet
WordNet has some limitations:
Does not include:
etymology
pronunciation
forms of irregular verbs
Doesn’t distinguish between homonymy and polysemy
Contains limited information about word usage
A database of many common words
– does not cover special domain vocabulary
Word Similarity
It can be useful to know how similar two words are in meaning – NB: a looser notion of similarity than synonymy
Words are more similar if they share more features of meaning – NB: actually, similarity between word senses
Word similarity has application to many NLP tasks:
Spell checking
Information retrieval
Question answering
Machine translation
Natural language generation
Language modeling
Automatic essay grading
Word Similarity Algorithms
WordNet-based algorithms:
– Based on whether words are ‘near to’ one another in WordNet
– How do we define ‘near to’ in WordNet?
Distributional algorithms:
– Based on whether words share ‘distributional contexts’ in corpora
– How do we define ‘distributional context’ in corpora?
WordNet Similarity
Make use of the WordNet hyponym/hypernym hierarchy
Simplest idea: measure the length of the path between synsets
– the shorter the path, the more similar the two word senses are
Variations on this simple scheme: Leacock-Chodorow, Wu-Palmer
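The path-based idea and the two variations can be sketched on a small tree. Assumptions: the hierarchy `PARENT` is a hypothetical toy, depth counts nodes with the root at depth 1, and the formulas are common textbook forms: path similarity 1/(1 + edges), Leacock-Chodorow −log((edges + 1)/2D) with D the depth of the deepest node, and Wu-Palmer 2·depth(LCS)/(depth(a) + depth(b)), where the LCS is the lowest common subsumer. Real implementations differ in details such as whether depth and path length count nodes or edges; NLTK exposes its own versions as the synset methods `path_similarity`, `lch_similarity` and `wup_similarity`.

```python
import math

# Toy IS_A tree (hypothetical): child -> parent.
PARENT = {
    "animal": "entity",
    "vehicle": "entity",
    "mammal": "animal",
    "mouse": "mammal",
    "dog": "mammal",
    "car": "vehicle",
}

def ancestors(node):
    """[node, parent, grandparent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    """Nodes from the root down to `node`, inclusive (the root has depth 1)."""
    return len(ancestors(node))

def lcs(a, b):
    """Lowest common subsumer: the first ancestor of a that also subsumes b."""
    b_ancestors = set(ancestors(b))
    return next(n for n in ancestors(a) if n in b_ancestors)

def path_length(a, b):
    """Edges on the shortest path between a and b; in a tree it runs through the LCS."""
    c = lcs(a, b)
    return (depth(a) - depth(c)) + (depth(b) - depth(c))

MAX_DEPTH = max(depth(n) for n in PARENT)  # D: depth of the deepest node

def path_sim(a, b):
    return 1 / (1 + path_length(a, b))

def lch_sim(a, b):
    # Leacock-Chodorow: -log(path length / 2D), with path length counted as edges + 1
    return -math.log((path_length(a, b) + 1) / (2 * MAX_DEPTH))

def wup_sim(a, b):
    # Wu-Palmer: 2 * depth(LCS) / (depth(a) + depth(b))
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

for a, b in [("mouse", "dog"), ("mouse", "car")]:
    print(a, b, round(path_sim(a, b), 3), round(lch_sim(a, b), 3), round(wup_sim(a, b), 3))
```

All three measures rank mouse/dog above mouse/car, yet every edge still counts as the same distance.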
WordNet Similarity
Simple path-based measure has limitations:
Each link is treated as representing a uniform distance
– but nickel and money seem closer than nickel and standard
Need metrics that associate a cost with each edge
Information-theoretic similarity measures – e.g. Resnik
Various similarity measures are implemented for WordNet:
– Ted Pedersen’s WordNet::Similarity package
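Resnik similarity scores two concepts by the information content of their lowest common subsumer: IC(c) = −log P(c), where P(c) is the probability that a corpus word token falls under concept c or one of its hyponyms. The sketch below uses a hypothetical tree in which nickel is exactly as many edges from money as from standard, and hypothetical corpus counts; it only illustrates how IC breaks the tie that path length cannot. In NLTK, `Synset.res_similarity` takes an information-content dictionary such as one loaded with `nltk.corpus.wordnet_ic.ic('ic-brown.dat')`.

```python
import math
from collections import Counter

# Hypothetical toy hierarchy: child -> parent.
PARENT = {
    "medium_of_exchange": "standard",
    "scale": "standard",
    "money": "medium_of_exchange",
    "currency": "medium_of_exchange",
    "coinage": "currency",
    "coin": "coinage",
    "nickel": "coin",
    "dime": "coin",
}
# Hypothetical corpus counts of words mapped directly to each concept.
COUNTS = {"standard": 5, "scale": 40, "medium_of_exchange": 5, "money": 30,
          "currency": 5, "coinage": 5, "coin": 5, "nickel": 3, "dime": 2}

def ancestors(node):
    """[node, parent, ..., root]."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

# Each count also counts towards every subsumer of its concept.
FREQ = Counter()
for concept, n in COUNTS.items():
    for anc in ancestors(concept):
        FREQ[anc] += n
TOTAL = sum(COUNTS.values())

def ic(concept):
    """Information content, -log P(c), written as log(N / freq(c))."""
    return math.log(TOTAL / FREQ[concept])

def lcs(a, b):
    """Lowest common subsumer of a and b."""
    b_ancestors = set(ancestors(b))
    return next(n for n in ancestors(a) if n in b_ancestors)

def path_length(a, b):
    """Edges between a and b via their LCS."""
    c = lcs(a, b)
    return ancestors(a).index(c) + ancestors(b).index(c)

def resnik(a, b):
    return ic(lcs(a, b))

for other in ("money", "standard"):
    print("nickel", other, path_length("nickel", other), round(resnik("nickel", other), 3))
```

Both pairs are five edges apart, but nickel/money share the informative subsumer medium_of_exchange, while nickel/standard share only the root, whose information content is zero.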
Next Topic: Distributional Similarity
Distributional Models of Meaning
Context Features
Words as Feature Vectors
Measuring Similarity