Semantics 1: Lexical Meaning & WordNet
Data Science Group (Informatics) NLE/ANLP Autumn 2015

This time:
Language and meaning
Lexical Semantics
Lexemes, Lemmas and Word Senses
Lexical Relations
Homonymy/Polysemy
Hyponymy/Hypernymy
Synonymy/Antonymy
Holonymy/Meronymy
WordNet
WordNet Synsets
WordNet Hierarchies
WordNet-Based Similarity

Language and Meaning
1 Lexical Semantics
The meaning of individual words
2 Phrasal/Sentential Semantics
How do word meanings combine to build meanings for phrases?
Compositional Semantics
3 Context and World Knowledge
How sentential meanings combine with each other and in context
Reference: she, it, here, that dog, my cat
Discourse semantics
Pragmatics: Could you pass the salt?, I like salt, Very funny!

Lexical Semantics
Lexical semantics concerns the meaning(s) or sense(s) of words – more generally, the meaning of lexemes
The big questions:
How can lexical semantics be represented?
How can we acquire it?
Problem:
Word sense is a hidden variable
We observe the words, not the word senses

Lexemes and Word Senses
A lexeme is an abstract pairing of meaning and form
A lemma is the grammatical form used to represent a lexeme
WEEP is the lemma for wept
MOUSE is the lemma for mice
Specific surface forms (wept, weeps, mice) are called wordforms
The lemma BANK has two senses:
A bank can hold the investments in a custodial account
As agriculture burgeons on the east bank, the river will shrink
A sense is a discrete representation of one aspect of the meaning of a word
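The relationship between lemmas, wordforms and senses can be sketched with a small hand-written lexicon. This is only an illustration: the dictionary layout, the sense glosses and the helper name candidate_senses are assumptions made for the example, not WordNet data or an NLTK API.

```python
# Toy lexicon, hand-written for illustration (not WordNet data).
LEXICON = {
    "weep": {
        "wordforms": {"weep", "weeps", "wept"},
        "senses": ["shed tears"],
    },
    "mouse": {
        "wordforms": {"mouse", "mice"},
        "senses": ["small rodent"],
    },
    "bank": {
        "wordforms": {"bank", "banks"},
        "senses": ["financial institution", "sloping land beside a river"],
    },
}

# Reverse index: each observed wordform points back to its lemma.
LEMMA_OF = {
    form: lemma
    for lemma, entry in LEXICON.items()
    for form in entry["wordforms"]
}


def candidate_senses(wordform):
    """Return all senses the observed wordform might carry.

    Observation gives us the wordform only; the intended sense
    remains hidden, so every sense of the lemma is a candidate.
    """
    lemma = LEMMA_OF[wordform.lower()]
    return LEXICON[lemma]["senses"]


print(LEMMA_OF["wept"])
print(LEMMA_OF["mice"])
print(candidate_senses("banks"))
```

Looking up banks returns both senses of BANK, which is the hidden-variable problem in miniature: the text alone does not say which one was meant.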

Word Senses
Consider the possible meanings (senses) of the word head:
body part
leader
head teacher
Sense inventory:
A collection of senses
Can include information about how senses relate to each other

Lexico-Semantic Relations
Relationships between word senses:
Homonymy/Polysemy
– same form, multiple meanings
Hyponymy/Hypernymy
– relationship between the general and the specific
Holonymy/Meronymy
– relationship between a whole and its parts
Synonymy/Antonymy
– identical/opposite meanings

Homonymy & Polysemy
Homonymy: different lexemes with same form but unrelated meanings
bank (financial institution) vs. bank (riverside)
bar (stick-like thing) vs. bar (place to buy drink)
bass (musical instrument) vs. bass (fish)
Homonymy can be a problem for NLP applications:
Text-to-Speech
Same orthographic form but different pronunciation: (bass vs bass)
Information Retrieval
Different meanings, same orthographic form: QUERY: bass care
Machine Translation
Speech Recognition

Homonymy & Polysemy
Polysemy: single lexeme with multiple, related meanings
bank (the financial institution) vs. bank (the building)
chicken (the animal) vs. chicken (the meat)
Most non-rare words have multiple meanings
– number of meanings generally increases with a word’s frequency
Verbs tend more to polysemy
Distinguishing polysemy & homonymy not always easy – or necessary
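The information retrieval problem noted for the query bass care can be illustrated with a few lines of Python. The three documents, their hand-assigned sense labels and both function names are hypothetical; the sketch only shows that plain keyword matching cannot separate the two homonyms, whereas a search that knows the intended sense can.

```python
# Hypothetical mini-collection; the sense labels are assigned by hand.
DOCS = [
    {"id": 1, "text": "bass care tips for your aquarium", "bass_sense": "fish"},
    {"id": 2, "text": "bass care and string maintenance", "bass_sense": "instrument"},
    {"id": 3, "text": "guitar care basics", "bass_sense": None},
]


def keyword_search(query, docs):
    """Return ids of documents that contain every query term as a token."""
    terms = query.lower().split()
    return [
        d["id"]
        for d in docs
        if all(term in d["text"].lower().split() for term in terms)
    ]


def sense_search(query, sense, docs):
    """Keyword search restricted to documents labelled with one sense of 'bass'."""
    hits = set(keyword_search(query, docs))
    return [d["id"] for d in docs if d["id"] in hits and d["bass_sense"] == sense]


print(keyword_search("bass care", DOCS))        # matches both homonyms
print(sense_search("bass care", "fish", DOCS))  # only the fish document
```

In practice the sense labels are not given and would have to come from word sense disambiguation.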

Hyponymy & Hypernymy
One sense is a hyponym of another if the first denotes a subclass of the second:
car is a hyponym of vehicle
mouse is a hyponym of mammal
tree is a hyponym of plant
Conversely:
vehicle is a hypernym of car
mammal is a hypernym of mouse
etc.
The hyponymy relation is closely related to the IS_A (class inclusion) relation of inheritance hierarchies.

Synonymy & Antonymy
One word is a synonym of another if they have the same meaning:
automobile is a synonym of car
big is a synonym of large
cry is a synonym of weep
Two lexemes are synonyms if they can be successfully substituted for each other:
he drove the car/automobile down the road
she cried/wept tears of joy

Synonymy & Antonymy
Senses that are opposite in respect of some aspect of meaning:
hot is an antonym of cold
success is an antonym of failure
attack is an antonym of defend
etc., etc.
More generally, antonymy is a (context dependent) binary opposition:
red vs. white (wine)
red vs. green (traffic lights)
red vs. black (bank balance)
There are few (no?) examples of perfect synonymy/antonymy – Why?

Holonymy & Meronymy
Holonymy is the relationship between a whole and a part:
car is a holonym of engine
mouse is a holonym of tail
tree is a holonym of bark
Meronymy is the converse relation:
engine is a meronym of car
tail is a meronym of mouse
etc., etc.
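The converse pairs above (hyponym/hypernym, meronym/holonym) can each be stored once and read in both directions. The following is a minimal sketch using the slide examples; the mammal to animal link is an extra assumption added to show that IS_A is transitive, and all names are made up for the example.

```python
# Hand-written toy relations based on the examples above.
HYPERNYM_OF = {          # hyponym -> its direct hypernym (IS_A parent)
    "car": "vehicle",
    "mouse": "mammal",
    "tree": "plant",
    "mammal": "animal",  # assumed extra link, added to show transitivity
}
PARTS_OF = {             # holonym (whole) -> its meronyms (parts)
    "car": {"engine"},
    "mouse": {"tail"},
    "tree": {"bark"},
}


def is_a(word, ancestor):
    """True if `ancestor` is reachable from `word` by following hypernym links."""
    current = HYPERNYM_OF.get(word)
    while current is not None:
        if current == ancestor:
            return True
        current = HYPERNYM_OF.get(current)
    return False


def hyponyms(word):
    """Direct hyponyms: read the hypernym table in the converse direction."""
    return sorted(h for h, parent in HYPERNYM_OF.items() if parent == word)


def holonyms(part):
    """Wholes that contain `part`: the converse of the meronym table."""
    return sorted(whole for whole, parts in PARTS_OF.items() if part in parts)


print(is_a("mouse", "animal"))  # True, via mammal
print(hyponyms("vehicle"))
print(holonyms("engine"))
```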

WordNet: http://wordnet.princeton.edu
A hierarchically organized lexical database for English
– online thesaurus + aspects of a dictionary
Nouns, verbs, adjectives and adverbs
– grouped into sets of synonyms (synsets)
Each synset expresses a distinct concept
– synsets interlinked by lexical relations
Many APIs available:
– Java, Perl, Python, R, Ruby, …

WordNet: Some Statistics
POS          Word Forms   Synsets
Noun             117097     81426
Verb              11488     13650
Adjective         22141     18877
Adverb             4601      3644
Totals           155327    117597

WordNet Hierarchies
beer (sense 1, hypernyms)
⇒ brew
  ⇒ alcohol, alcoholic beverage, intoxicant, inebriant
    ⇒ beverage, drink, potable
      ⇒ food, nutrient
        ⇒ substance, matter
          ⇒ object, inanimate object, physical object
            ⇒ entity

WordNet Synsets
The main organisational relation in WordNet is synonymy
The set of (near-)synonyms for a WordNet sense is called a synset (synonym set)
– represents a sense or a concept
Example: dog as a noun in the sense:
‘a smooth-textured sausage of minced beef or pork usually smoked; often served on a bread roll’
{frank, frankfurter, hotdog, hot_dog, dog, wiener, wienerwurst, weenie}
For WordNet, the meaning of this sense of dog is just this list.
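The idea that a sense is just a set of lemmas can be sketched as a mapping from synset labels to sets of words. The labels hotdog and canine, the second synset and the two helper functions are assumptions for illustration; real WordNet identifiers and contents differ.

```python
# Toy synset table. The first entry is the list shown above;
# the second is an assumed extra sense of "dog", added for contrast.
SYNSETS = {
    "hotdog": {"frank", "frankfurter", "hotdog", "hot_dog", "dog",
               "wiener", "wienerwurst", "weenie"},
    "canine": {"dog", "domestic dog"},
}


def synsets_for(word):
    """Labels of every synset that lists `word` as one of its lemmas."""
    return sorted(label for label, lemmas in SYNSETS.items() if word in lemmas)


def are_synonyms(a, b):
    """Two words count as synonyms here if some synset contains both."""
    return any(a in lemmas and b in lemmas for lemmas in SYNSETS.values())


print(synsets_for("dog"))                 # one label per sense of "dog"
print(are_synonyms("frank", "wiener"))
print(are_synonyms("frankfurter", "domestic dog"))
```

A word with several senses simply appears in several synsets, so synonymy holds between senses rather than between words as such.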

WordNet in the NLTK
>>> # Requires the WordNet corpus: run nltk.download('wordnet') once beforehand
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('mouse')
[Synset('mouse.n.01'), Synset('shiner.n.01'), Synset('mouse.n.03'), Synset('mouse.n.04'), Synset('sneak.v.01'), Synset('mouse.v.02')]
>>> wn.synset('mouse.n.01').hypernyms()
[Synset('rodent.n.01')]
>>> print(wn.synset('mouse.n.03').definition())
person who is quiet or timid
>>> print(wn.synset('mouse.n.04').definition())
a hand-operated electronic device that controls the coordinates of a cursor on your computer screen as you move it around on a pad; on the bottom of the device is a ball that rolls on the surface of the pad

Sense Inventories
Being definitive can be hard:
Forced to make decisions as to how to carve meanings up
– sometimes feels rather arbitrary
– no scope for fuzziness
Sense inventories vary as to the number of senses included
– how fine-grained are the distinctions
– WordNet is very fine-grained in some places
Particularly hard to be definitive about synonyms
– almost no such thing as complete synonymy

Limitations of WordNet
WordNet has some limitations:
Does not include:
etymology,
pronunciation,
forms of irregular verbs
Doesn’t distinguish between homonymy and polysemy
Contains limited information about word usage
A database of many common words
– does not cover special domain vocabulary

Word Similarity
Can be useful to know how similar two words are in meaning
– NB: a looser notion of similarity than synonymy
Words are more similar if they share more features of meaning
– NB: actually, similarity between word senses
Word similarity has application to many NLP tasks:
Spell checking
Information retrieval
Question answering
Machine translation
Natural language generation
Language modeling
Automatic essay grading

Word Similarity Algorithms
WordNet-based algorithms:
– Based on whether words are ‘near to’ one another in WordNet
– How do we define ‘near to’ in WordNet?
Distributional algorithms:
– Based on whether words share ‘distributional contexts’ in corpora
– How do we define ‘distributional context’ in corpora?

WordNet Similarity
Make use of the WordNet hyponym/hypernym hierarchy
Simplest idea: measure length of path between synsets
– the shorter the path the more similar two word senses are
Variations on this simple scheme: Leacock-Chodorow, Wu-Palmer
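The path-length idea and the Wu-Palmer variation can be sketched on a small hand-built hierarchy. The tree below reuses the beer hypernym chain; the wine and milk nodes are assumptions added to give the example some siblings. The scoring conventions are also assumptions: path similarity is taken as 1 / (1 + number of edges), and Wu-Palmer as 2 * depth(LCS) / (depth(a) + depth(b)) with the root at depth 1. Leacock-Chodorow is not shown.

```python
# child -> parent links of a toy hypernym tree (wine and milk are assumed additions).
PARENT = {
    "beer": "brew",
    "brew": "alcohol",
    "wine": "alcohol",
    "alcohol": "beverage",
    "milk": "beverage",
    "beverage": "food",
    "food": "substance",
    "substance": "object",
    "object": "entity",
}


def ancestors(node):
    """Path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_subsumer(a, b):
    """Most specific node that is an ancestor of (or equal to) both a and b."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def path_length(a, b):
    """Number of edges on the shortest path between a and b."""
    lcs = lowest_common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)


def path_similarity(a, b):
    """Shorter path means higher score; identical nodes score 1.0."""
    return 1.0 / (1.0 + path_length(a, b))


def depth(node):
    """Number of nodes from `node` to the root (the root has depth 1)."""
    return len(ancestors(node))


def wu_palmer(a, b):
    """Reward pairs whose common subsumer lies deep in the hierarchy."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


print(path_similarity("beer", "wine"), path_similarity("beer", "milk"))
print(wu_palmer("beer", "wine"), wu_palmer("beer", "milk"))
```

On this tree beer comes out closer to wine than to milk under both measures, because the two share the more specific ancestor alcohol.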
WordNet Similarity
Simple path-based measure has limitations:
Each link represents a uniform distance
– but nickel and money seem closer than nickel and standard
Need metrics that associate a cost with each edge
Information-theoretic similarity measures
– e.g. Resnik
Various similarity measures implemented for WordNet:
– Ted Pedersen’s WordNet::Similarity package

Next Topic: Distributional Similarity
Distributional Models of Meaning
Context Features
Words as Feature Vectors
Measuring Similarity
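As a preview of the distributional view, words can be represented as vectors of context counts and compared with cosine similarity. This is a rough sketch under simple assumptions: the three-sentence corpus is invented, the context of a word is taken to be the other words in the same sentence, and raw counts are used without weighting.

```python
import math
from collections import Counter

# Hypothetical three-sentence corpus, invented for illustration.
CORPUS = [
    "the cat drinks milk",
    "the dog drinks water",
    "the car needs fuel",
]


def context_vector(target, corpus):
    """Count the words that co-occur with `target` in the same sentence."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        if target in tokens:
            counts.update(t for t in tokens if t != target)
    return counts


def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


cat = context_vector("cat", CORPUS)
dog = context_vector("dog", CORPUS)
car = context_vector("car", CORPUS)
print(cosine(cat, dog), cosine(cat, car))
```

Here cat and dog share more contexts than cat and car, so they receive the higher score.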