
Lexical Semantics
COMP90042
Natural Language Processing
Lecture 9
Semester 1 2021 Week 5 Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE

Sentiment Analysis
• Bag of words, kNN classifier. Training data:
‣ “This is a good movie.” → ☺
‣ “This is a great movie.” → ☺
‣ “This is a terrible film.” → ☹
• Test: “This is a wonderful film.” → ?
• Two problems:
‣ The model does not know that “movie” and “film” are synonyms. Since “film” appears only in negative examples, the model learns that it is a negative word.
‣ “wonderful” is not in the vocabulary (OOV: out-of-vocabulary).

Sentiment Analysis
• Comparing words directly will not work. How do we make sure we compare word meanings instead?
• Solution: add this information explicitly through a lexical database.

Word Semantics
• Lexical semantics (this lecture)
‣ How the meanings of words connect to one another.
‣ Manually constructed resources: lexical databases.
• Distributional semantics (next lecture)
‣ How words relate to each other in text.
‣ Resources created automatically from corpora.

Outline
• Lexical Database
• Word Similarity
• Word Sense Disambiguation

What Is Meaning?
• Their dictionary definition
‣ But dictionary definitions are necessarily circular
‣ Only useful if meaning is already understood
• Their relationships with other words
‣ Also circular, but better suited for text analysis

Definitions
• A word sense describes one aspect of the meaning of a word
• If a word has multiple senses, it is polysemous

Meaning Through Dictionary
• Gloss: textual definition of a sense, given by a dictionary
• Bank
‣ financial institution that accepts deposits and channels the money into lending activities
‣ sloping land (especially the slope beside a body of water)

Meaning Through Relations
• Another way to define meaning: by looking at how a word relates to other words
• Synonymy: near-identical meaning
‣ vomit vs. throw up
‣ big vs. large
• Antonymy: opposite meaning
‣ long vs. short
‣ big vs. little

Meaning Through Relations (2)
• Hypernymy: is-a relation
‣ cat is an animal
‣ mango is a fruit
• Meronymy: part-whole relation
‣ leg is part of a chair
‣ wheel is part of a car

What are the relations for these words?
• dragon and creature
• book and page
• comedy and tragedy
PollEv.com/jeyhanlau569

Meaning Through Relations (3)
(Figure: a fragment of the WordNet hypernymy hierarchy)

WordNet
• A database of lexical relations
• English WordNet includes ~120,000 nouns, ~12,000 verbs, ~21,000 adjectives, ~4,000 adverbs
• On average: a noun has 1.23 senses; a verb 2.16
• WordNets are available in most major languages (www.globalwordnet.org, https://babelnet.org/)
• The English version is freely available (accessible via NLTK)

WordNet Example
(Figure: a sample WordNet entry)

Synsets
• Nodes of WordNet are not words or lemmas, but senses
• They are represented by sets of synonyms, or synsets
• Bass synsets:
‣ {bass1, deep6}
‣ {bass6, bass voice1, basso2}
• Another synset:
‣ {chump1, fool2, gull1, mark9, patsy1, fall guy1, sucker1, soft touch1, mug2}
‣ Gloss: a person who is gullible and easy to take advantage of

Synsets (2)
>>> nltk.corpus.wordnet.synsets('bank')
[Synset('bank.n.01'), Synset('depository_financial_institution.n.01'), Synset('bank.n.03'), Synset('bank.n.04'), Synset('bank.n.05'), Synset('bank.n.06'), Synset('bank.n.07'), Synset('savings_bank.n.02'), Synset('bank.n.09'), Synset('bank.n.10'), Synset('bank.v.01'), Synset('bank.v.02'), Synset('bank.v.03'), Synset('bank.v.04'), Synset('bank.v.05'), Synset('deposit.v.02'), Synset('bank.v.07'), Synset('trust.v.01')]
>>> nltk.corpus.wordnet.synsets('bank')[0].definition()
u'sloping land (especially the slope beside a body of water)'
>>> nltk.corpus.wordnet.synsets('bank')[1].lemma_names()
[u'depository_financial_institution', u'bank', u'banking_concern', u'banking_company']

Noun Relations in WordNet
(Table: noun relations such as hypernymy, hyponymy, meronymy, and antonymy, with examples)
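
These relations can be queried directly in NLTK. A minimal sketch (the outputs in the comments are indicative; run nltk.download('wordnet') first):

from nltk.corpus import wordnet as wn

cat = wn.synset('cat.n.01')
print(cat.hypernyms())        # is-a parents of cat, e.g. feline
chair = wn.synset('chair.n.01')
print(chair.part_meronyms())  # parts of a chair, e.g. leg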

Hypernymy Chain
(Figure: the chain of hypernyms from a synset up to the root of the hierarchy)
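
A minimal NLTK sketch that walks such a chain, following the first hypernym at each step (a simplifying assumption; some synsets have several hypernyms):

from nltk.corpus import wordnet as wn

# Walk up the is-a hierarchy from one sense of "bank" to the root.
synset = wn.synset('bank.n.01')
while synset is not None:
    print(synset.name())
    hypers = synset.hypernyms()
    synset = hypers[0] if hypers else None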

Word Similarity

Word Similarity
• Synonymy: film vs. movie
• What about show vs. film? opera vs. film?
• Unlike synonymy (which is a binary relation), word similarity is a spectrum
• We can use a lexical database (e.g. WordNet) or a thesaurus to estimate word similarity

Word Similarity with Paths
• Given WordNet, find similarity based on path length:

$\mathrm{pathlen}(c_1, c_2) = 1 + $ edge length of the shortest path between senses $c_1$ and $c_2$

• Similarity between two senses (synsets):

$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = \frac{1}{\mathrm{pathlen}(c_1, c_2)}$

• Similarity between two words:

$\mathrm{wordsim}(w_1, w_2) = \max_{c_1 \in \mathrm{senses}(w_1),\ c_2 \in \mathrm{senses}(w_2)} \mathrm{sim}_{\mathrm{path}}(c_1, c_2)$

• Remember that a node in the WordNet graph is a synset (sense), not a word!
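
A minimal NLTK sketch of wordsim: Synset.path_similarity implements simpath, and we take the max over all sense pairs explicitly. (Values can differ slightly from the slides, since NLTK simulates a shared root node for some parts of speech.)

from itertools import product
from nltk.corpus import wordnet as wn

def word_similarity(word1, word2):
    # wordsim(w1, w2): max path similarity over all sense pairs
    scores = [c1.path_similarity(c2)
              for c1, c2 in product(wn.synsets(word1), wn.synsets(word2))]
    return max((s for s in scores if s is not None), default=0.0)

print(word_similarity('nickel', 'coin'))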

Examples
(Figure: fragment of the WordNet noun hierarchy containing nickel, coin, currency, money, and Richter scale)
• Each node is a synset! For simplicity we use just the representative word.

$\mathrm{sim}_{\mathrm{path}}(c_1, c_2) = \frac{1}{\mathrm{pathlen}(c_1, c_2)} = \frac{1}{1 + \mathrm{edgelen}(c_1, c_2)}$

• simpath(nickel, coin) = 1/2 = 0.5
• simpath(nickel, currency) = 1/4 = 0.25
• simpath(nickel, money) = 1/6 = 0.17
• simpath(nickel, Richter scale) = 1/8 = 0.13

Beyond Path Length
• simpath(nickel, money) = 0.17
• simpath(nickel, Richter scale) = 0.13
• Problem: edges vary widely in actual semantic distance
‣ Much bigger jumps near the top of the hierarchy
• Solution 1: include depth information (Wu & Palmer)
‣ Use paths to find the lowest common subsumer (LCS)
‣ Compare using depths:

$\mathrm{sim}_{\mathrm{wup}}(c_1, c_2) = \frac{2 \times \mathrm{depth}(\mathrm{LCS}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}$

• High simwup when:
‣ the parent (LCS) is deep
‣ the senses are shallow
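
The corresponding NLTK calls, as a sketch (nickel.n.02 is assumed here to be the coin sense; check wn.synsets('nickel') to be sure):

from nltk.corpus import wordnet as wn

nickel = wn.synset('nickel.n.02')  # assumed: the coin sense of "nickel"
money = wn.synset('money.n.01')
print(nickel.lowest_common_hypernyms(money))  # the LCS
print(nickel.wup_similarity(money))           # Wu & Palmer similarity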

Examples

$\mathrm{sim}_{\mathrm{wup}}(c_1, c_2) = \frac{2 \times \mathrm{depth}(\mathrm{LCS}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}$

simwup(nickel, money) = (2 × 2) / (6 + 3) = 0.44
simwup(dime, Richter scale) = ?
PollEv.com/jeyhanlau569


Examples

$\mathrm{sim}_{\mathrm{wup}}(c_1, c_2) = \frac{2 \times \mathrm{depth}(\mathrm{LCS}(c_1, c_2))}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}$

simwup(nickel, money) = (2 × 2) / (6 + 3) = 0.44
simwup(dime, Richter scale) = (2 × 1) / (6 + 3) = 0.22

Abstract Nodes
• But node depth is still a poor semantic distance metric
‣ simwup(nickel, money) = 0.44
‣ simwup(nickel, Richter scale) = 0.22
• Nodes high in the hierarchy are very abstract or general
• How can we capture this better?



Concept Probability Of A Node
• Intuition:
‣ general node → high concept probability (e.g. object)
‣ narrow node → low concept probability (e.g. vocalist)
• Find all the children of the node, and sum up their unigram probabilities:

$P(c) = \frac{\sum_{s \in \mathrm{child}(c)} \mathrm{count}(s)}{N}$

• child(c): synsets that are children of c
‣ child(geological-formation) = {hill, ridge, grotto, coast, natural elevation, cave, shore}
‣ child(natural elevation) = {hill, ridge}
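
A rough sketch of this computation over the Brown corpus. It assumes count(s) can be approximated by the corpus frequency of each single-word lemma in the synset and its hyponym closure (multiword lemmas simply get zero counts here); real implementations use smoothed or sense-tagged counts:

from collections import Counter
from nltk.corpus import brown, wordnet as wn

# Unigram counts from the Brown corpus (requires nltk.download('brown'))
counts = Counter(w.lower() for w in brown.words())
N = sum(counts.values())

def concept_probability(synset):
    # child(c): the synset itself plus everything below it in the hierarchy
    nodes = {synset} | set(synset.closure(lambda s: s.hyponyms()))
    total = sum(counts[l.name().lower()] for s in nodes for l in s.lemmas())
    return total / N

print(concept_probability(wn.synset('geological_formation.n.01')))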

Example
• Abstract nodes higher in the hierarchy have a higher P(c)
(Figure: hierarchy fragment annotated with concept probabilities)

Similarity with Information Content
• Use IC instead of depth (cf. simwup):

$\mathrm{IC}(c) = -\log P(c)$

‣ general concept = small value
‣ narrow concept = large value

$\mathrm{sim}_{\mathrm{lin}}(c_1, c_2) = \frac{2 \times \mathrm{IC}(\mathrm{LCS}(c_1, c_2))}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}$

• High simlin when:
‣ the concept of the parent is narrow
‣ the concepts of the senses are general
• Example:

$\mathrm{sim}_{\mathrm{lin}}(\mathrm{hill}, \mathrm{coast}) = \frac{2 \times -\log P(\text{geological-formation})}{-\log P(\mathrm{hill}) - \log P(\mathrm{coast})} = \frac{-2 \log 0.00176}{-\log 0.0000189 - \log 0.0000216}$

• If the LCS were entity, the numerator would be −2 log(0.395), a much smaller value!
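
NLTK ships precomputed information content tables, so simlin is a one-liner. A sketch (requires nltk.download('wordnet_ic')):

from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')  # IC estimated from the Brown corpus
hill = wn.synset('hill.n.01')
coast = wn.synset('coast.n.01')
print(hill.lin_similarity(coast, brown_ic))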

Word Sense Disambiguation

Word Sense Disambiguation
• Task: select the correct sense for words in a sentence
• Baseline:
‣ Assume the most popular sense
• Good WSD is potentially useful for many tasks
‣ Knowing which sense of mouse is used in a sentence is important!
‣ Less popular nowadays, because sense information is implicitly captured by contextual representations (lecture 11)

Supervised WSD
• Apply standard machine learning classifiers
• Feature vectors: typically the words and syntax around the target word
‣ But context is ambiguous too!
‣ How big should the context window be? (in practice, small)
• Requires sense-tagged corpora
‣ E.g. SENSEVAL, SEMCOR (available in NLTK)
‣ Very time-consuming to create!
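
A minimal sketch of this setup with scikit-learn; the tiny sense-tagged training set for "mouse" is hypothetical, and a bag of context words stands in for richer features:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical sense-tagged contexts for "mouse"
train = [("click the left mouse button twice", "device"),
         ("the mouse ran under the kitchen floor", "animal"),
         ("plug the mouse into the usb port", "device"),
         ("the cat chased a mouse across the field", "animal")]
texts, senses = zip(*train)

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(texts, senses)
print(clf.predict(["move the mouse to the corner of the screen"]))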

Unsupervised: Lesk
• Lesk: choose the sense whose WordNet gloss overlaps most with the context
• “The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.”
‣ bank1: 2 overlapping non-stopwords, deposits and mortgage
‣ bank2: 0
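
NLTK includes a simplified Lesk implementation; a sketch on the example sentence above (requires nltk.download('punkt') for tokenisation):

from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

context = word_tokenize(
    "The bank can guarantee deposits will eventually cover future tuition "
    "costs because it invests in adjustable-rate mortgage securities.")
print(lesk(context, 'bank', 'n'))  # the Synset whose gloss overlaps most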

Unsupervised: Clustering
• Gather usages of the word:
‣ … a bank is a financial institution that …
‣ … reserve bank, or monetary authority is an institution …
‣ … bed of a river, or stream. The bank consists of the sides …
‣ … right bank to the right. The river channel …
• Perform clustering on context words to learn the different senses
‣ Rationale: context words of the same sense should be similar
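
One concrete way to realise this, sketched with scikit-learn (an assumption: the lecture does not prescribe a particular clustering algorithm; k-means over bag-of-words contexts is used here for illustration):

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical usages of "bank" gathered from a corpus
contexts = [
    "a bank is a financial institution that accepts deposits",
    "the reserve bank or monetary authority is an institution",
    "the bed of a river or stream the bank consists of the sides",
    "the right bank to the right of the river channel",
]
X = CountVectorizer(stop_words='english').fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # contexts sharing a label are treated as the same sense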

Unsupervised: Clustering
• Disadvantages:
‣ Sense clusters are not very interpretable
‣ Need to align clusters with dictionary senses

Final Words
• Creation of a lexical database involves expert curation (by linguists)
• Modern methods attempt to derive semantic information directly from corpora, without human intervention
• Distributional semantics (next lecture!)

Reading
• JM3 Ch 18-18.4.1