Semantics 2: Distributional Lexical Similarity
This time:
The Distributional Hypothesis
Distributional Models of Meaning
Context Features
Bag-of-Words vs. Grammatical Relations
Comparing Word Meanings
Vector-Space Model for Words
Impact of Feature Choice
Feature Weighting
Lin’s Similarity Measure
Some Examples
The Distributional Hypothesis
Words that appear in similar contexts tend to have similar meanings
— Harris, 1954
A word is characterised by the company it keeps — Firth, 1957
Distributional Models of Meaning
Ingredients
Meaning expressed in terms of co-occurrence features
Aspects of the context in which the word or phrase appears
Forms a ‘vector’ of (weighted) features for each word
Distributional Models of Meaning
Co-occurrence features for apple
A noun that is modified by red
A noun that is the object of the verb eat
A noun that is coordinated with orange
A noun that immediately follows the word think
Selectional Restrictions
Words are selective as to what they combine with
Being the object of the verb eat is informative
– Expect it to be something edible
– You can eat an apple but not a table
Being coordinated with orange is informative
– Expect it to be a member of a similar class of entities
– apple and orange versus apple and piano
Selectional Restrictions
Most often seen as features of verbs
You can break a window, but not drink one
You can fly a kite, but not a pair of glasses
Faces can be red, but tennis matches can’t
You can climb a (river) bank but not a (financial institution) bank
Note that this can be the basis for word sense disambiguation
Co-occurrences
Co-occurrences are features of local context
– Aspects of an occurrence of a word
They help to determine intended sense
Context Features
Two kinds of co-occurrences:
Bag-of-words
Words that appear nearby
e.g. within the same sentence, the same paragraph, or a fixed window
Ignore stop words
May use a frequency cut-off
Capturing local context (a minimal extraction sketch follows below)
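A minimal sketch of window-based bag-of-words extraction; the tokeniser, window size, and stop-word list here are illustrative assumptions, not the course's actual settings:

```python
from collections import Counter

# Illustrative stop-word list and window size (assumptions, not fixed choices)
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "but"}
WINDOW = 3  # words either side of the target

def bow_features(tokens, target):
    """Count words co-occurring with `target` within a fixed window."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - WINDOW), min(len(tokens), i + WINDOW + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] not in STOP_WORDS:
                counts[tokens[j]] += 1
    return counts

tokens = "you can eat an apple but you cannot eat a table".split()
print(bow_features(tokens, "apple"))
# Counter({'can': 1, 'eat': 1, 'you': 1, 'cannot': 1})
```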
Context Features
Grammatically related words
Words or phrases occurring in a specific relation to the target word — occurring together more often than expected by chance
bright in adjectival relationship with star
mouse in object relation with chase
dog in subject relation with bark
Cameron in co-ordination with Corbyn
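The course's features come from RASP grammatical relations; purely as an illustration, here is a sketch using spaCy (a substitute parser, not the toolkit used in the slides) to pull (word, relation:co-word) features from dependency parses:

```python
# Sketch only: spaCy stands in for the RASP toolkit used in the slides.
import spacy

nlp = spacy.load("en_core_web_sm")

def grammatical_features(text):
    """Yield (lemma, feature) pairs such as ('apple', 'dobj-of:eat')."""
    for token in nlp(text):
        if token.dep_ in {"nsubj", "dobj", "amod", "conj"}:
            # the dependent gets a feature naming the relation and its head...
            yield token.lemma_, f"{token.dep_}-of:{token.head.lemma_}"
            # ...and the head gets the mirror-image feature
            yield token.head.lemma_, f"{token.dep_}:{token.lemma_}"

for pair in grammatical_features("She ate a red apple and an orange."):
    print(pair)
# e.g. ('apple', 'dobj-of:eat'), ('apple', 'amod:red'), ('apple', 'conj:orange')
```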
Bag-of-Words vs. Grammatical Relations
Things to note:
Extracting bag-of-words is trivial
Extracting grammatical relations requires parsing
Potential to use more data with the bag-of-words approach
Comparing Word Meanings
Similar to the situation with documents
Can capture “meaning” of a word with a vector
Can measure similarity of vectors
All Pairs Similarity
Rows are words or phrases t1, …, tn; columns are co-occurrence features f1, …, fm; entry xi,k encodes the strength of feature fk for term ti:

        f1    …    fk    …    fm
  t1
  .
  ti   xi,1   …   xi,k   …   xi,m
  .
  tj   xj,1   …   xj,k   …   xj,m
  .
  tn

To compare the meanings of ti and tj, measure how similar the two rows are.
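A sketch of one way to hold this matrix sparsely, assuming (term, feature) pairs from an extractor like the ones above; entries are raw counts at this stage, to be weighted later:

```python
from collections import Counter, defaultdict

def build_matrix(pairs):
    """matrix[t][f] is x_{t,f}: the count of term t with feature f."""
    matrix = defaultdict(Counter)
    for term, feature in pairs:
        matrix[term][feature] += 1
    return matrix

m = build_matrix([("apple", "dobj-of:eat"), ("apple", "amod:red"),
                  ("orange", "dobj-of:eat"), ("piano", "dobj-of:play")])
print(m["apple"])  # Counter({'dobj-of:eat': 1, 'amod:red': 1})
```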
Components of a Distributional Similarity Model
Issues:
Components of vectors
— what are the salient distributional features?
Weights of features
— how significant is it that a feature occurs a certain number of times?
Comparing vectors
— how can we measure the similarity of two weighted feature vectors?
Selecting Co-occurrence Features
Some candidates
Being immediately after/near some particular word
Being in the same sentence/paragraph/document as some particular word
Being the object of a particular verb
Being modified by an adjective
Selecting Co-occurrence Features
Being co-ordinated with another word
Being in a document about some topic/written by an expert
Being the object in a phrase with a particular verb and subject
Being in a positive review of a laptop computer
Impact of Feature Choice
Degree of substitutability
How does a co-occurrence feature constrain the meaning?
Bag-of-words co-occurrence within a document/sentence gives a loose association
– Gives rise to topically related words
e.g. horse is similar to saddle and cart, etc.
Sharing grammatical relations gives a tighter semantic association
– Similar words are more likely to be inter-substitutable
e.g. horse is similar to donkey and camel, etc.
Pointwise MI
From frequency to weight
Start by counting the number of co-occurrences
Not all co-occurrences are equally surprising
Not all features are equally significant

PMI(w, f) = log [ p(w, f) / ( p(w) p(f) ) ]
Probabilities estimated based on observed frequencies
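A sketch of this weighting step, assuming the sparse count matrix from the earlier sketch; probabilities are estimated as relative frequencies over all observed (term, feature) pairs:

```python
import math
from collections import Counter

def pmi_weights(matrix):
    """Replace raw counts x_{t,f} with PMI(t, f) = log[p(t,f) / (p(t)p(f))]."""
    total = sum(sum(feats.values()) for feats in matrix.values())
    t_count = {t: sum(feats.values()) for t, feats in matrix.items()}
    f_count = Counter()
    for feats in matrix.values():
        f_count.update(feats)
    weights = {}
    for t, feats in matrix.items():
        weights[t] = {
            # probabilities estimated from observed relative frequencies
            f: math.log((n / total) /
                        ((t_count[t] / total) * (f_count[f] / total)))
            for f, n in feats.items()
        }
    return weights
```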
A Popular Measure: Lin, 1998
f is a significant feature of w if PMI(w, f) > 0
Let F(w) be the set of significant features of w
Similarity of w1 and w2 is the following ratio:

sim(w1, w2) = ( Σ_{f ∈ F(w1) ∩ F(w2)} [ PMI(w1, f) + PMI(w2, f) ] ) / ( Σ_{f ∈ F(w1)} PMI(w1, f) + Σ_{f ∈ F(w2)} PMI(w2, f) )

The numerator sums over features significant to both w1 and w2; the denominator sums over features significant to w1 and to w2 respectively.
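A direct transcription of the ratio above, assuming the PMI-weighted vectors from the previous sketch (dicts mapping feature to weight):

```python
def lin_similarity(w1, w2, weights):
    """Lin (1998): shared significant PMI mass over total significant PMI mass."""
    F1 = {f for f, v in weights[w1].items() if v > 0}  # significant for w1
    F2 = {f for f, v in weights[w2].items() if v > 0}  # significant for w2
    numerator = sum(weights[w1][f] + weights[w2][f] for f in F1 & F2)
    denominator = (sum(weights[w1][f] for f in F1) +
                   sum(weights[w2][f] for f in F2))
    return numerator / denominator if denominator else 0.0
```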
Example
[Figure: bar chart of PMI weight (y-axis) against co-occurrence features f1 f2 f3 f4 f5 f6 f7 f8 (x-axis), built up in stages: the weights of the features of ti; the significant (PMI > 0) features of ti; likewise for tj; the combined weight of the significant features of ti and tj; and finally the combined weight of the common significant features of ti and tj.]
Many Alternatives
Dice coefficient:
( 2 · #shared features ) / ( #features of w1 + #features of w2 )
Jaccard:
#shared features (intersection) / #features of either (union)
Others:
L1 norm
α-skew divergence
cosine
…
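Minimal sketches of these alternatives, assuming sets of significant features for Dice and Jaccard, and weighted feature vectors (dicts) for cosine:

```python
import math

def dice(F1, F2):
    """Dice coefficient over two feature sets."""
    return 2 * len(F1 & F2) / (len(F1) + len(F2))

def jaccard(F1, F2):
    """Jaccard: intersection over union of two feature sets."""
    return len(F1 & F2) / len(F1 | F2)

def cosine(v1, v2):
    """Cosine similarity; v1, v2 are dicts mapping feature -> weight."""
    dot = sum(v1[f] * v2[f] for f in v1.keys() & v2.keys())
    norms = (math.sqrt(sum(x * x for x in v1.values())) *
             math.sqrt(sum(x * x for x in v2.values())))
    return dot / norms if norms else 0.0
```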
Examples
How well does this work?
Compute distributional neighbours using Lin’s measure
Using the British National Corpus — 100M words
Parsed by the RASP toolkit
Co-occurrence features involve grammatical relationships — subject, direct object, adjectival modifier, …
Examples of Distributional Neighbours
ability skill 0.2653861287224737
ability capacity 0.24158674130868701
ability strength 0.2114777646773671
ability talent 0.2076767754080262
ability achievement 0.19765303477224855
ability competence 0.19559254341931434
ability capability 0.19138921847416082
ability complexity 0.18846415145752846
ability effectiveness 0.18821908294071854
ability quality 0.187423680699889
ability extent 0.18579047998139933
ability potential 0.1850935394990447
ability success 0.18431715359735912
ability importance 0.1841749565076892
ability power 0.18341563164153438
Examples of Distributional Neighbours
abnormality defect 0.250891348357222
abnormality disorder 0.2134658615339322
abnormality lesion 0.17280573936353938
abnormality similarity 0.17141592714689266
abnormality symptom 0.16713562882588154
abnormality complication 0.16181910601010285
abnormality anomaly 0.16131125869289575
abnormality disease 0.1582805417649711
abnormality damage 0.1559793457881014
abnormality alteration 0.15113014285180482
abnormality pathology 0.15092685913222414
abnormality disturbance 0.14960959094604764
abnormality disability 0.14922115998356333
abnormality deficiency 0.14648983524386774
abnormality difference 0.14648585652274848
abnormality tumour 0.1463837036208119
Examples of Distributional Neighbours
abode lodging 0.12037859727011194
abode catacomb 0.10408476402714487
abode respite 0.10009360258777102
abode playlist 0.09840801002693085
abode backstreet 0.09698117076010394
abode mooring 0.09665664880035932
abode werewolf 0.09598491030520157
abode stateroom 0.09584537476690254
abode schoolhouse 0.09287407501428385
abode oasis 0.08975639920522817
abode deputation 0.08884960555263872
abode chopper 0.08870767589643078
abode caldera 0.08653059788351661
abode bairn 0.08621180302490923
abode malting 0.086187727036622
Examples of Distributional Neighbours
absurdity futility 0.17913298381910323
absurdity impossibility 0.15998099614211908
absurdity flaw 0.14305651853884077
absurdity contradiction 0.13983984752211198
absurdity vulnerability 0.13959619970970927
absurdity reluctance 0.1335777187181582
absurdity necessity 0.1332709259629267
absurdity weakness 0.12717529677255307
absurdity fragility 0.12669122913562716
absurdity inadequacy 0.1260337663945221
absurdity hypocrisy 0.1253052888974753
absurdity paradox 0.12343145355210407
absurdity inconsistency 0.12087015353698795
absurdity ambiguity 0.1171107326690902
Examples of Distributional Neighbours
abuse crime 0.1684367318132214
abuse discrimination 0.1638288105649468
abuse violation 0.16236081451721868
abuse violence 0.156119124972739
abuse breach 0.15326122713610954
abuse rape 0.14761674307098122
abuse harassment 0.14729929413577136
abuse fraud 0.14715621410161467
abuse corruption 0.1362609591386032
abuse scandal 0.13587928418128017
abuse injustice 0.13547842908170693
abuse theft 0.1346134217813101
abuse murder 0.13388781688145954
abuse racism 0.1336783417207329
abuse assault 0.13279629881154606
abuse torture 0.1320158165999866
Examples of Distributional Neighbours
academy consulate 0.14104697339468888
academy embassy 0.1344775524996607
academy university 0.12267517447393221
academy navy 0.11898248292566677
academy seminary 0.11775987099677779
academy institute 0.11399317217792267
academy kibbutz 0.11362415353213605
academy observatory 0.11277019805768182
academy settler 0.11186926252299229
academy conservationist 0.11131336518566892
academy confederation 0.10941998628996055
academy marxist 0.10693724392811511
academy journal 0.10611580873799023
academy museum 0.10512981496364776
academy national 0.10333983951155033
Examples of Distributional Neighbours
acceleration displacement 0.1822200651873758
acceleration decrease 0.16287733208280517
acceleration deviation 0.16228010485840536
acceleration torque 0.14670994344578564
acceleration velocity 0.1458547852308107
acceleration improvement 0.1449987752793196
acceleration shift 0.14478255893502884
acceleration deterioration 0.1395964127489254
acceleration diminution 0.13866195528885972
acceleration slowing 0.13700349175571752
acceleration surge 0.13158985330555506
acceleration decline 0.13077782925619233
acceleration separation 0.13069269298032196
acceleration turnover 0.12936956547913667
acceleration disintegration 0.1287153901620071
acceleration rotation 0.12606730336631253
Examples of Distributional Neighbours
accommodation housing 0.2365364174174781
accommodation apartment 0.20179006736499902
accommodation premise 0.18043068438095392
accommodation flat 0.17501172937070936
accommodation dwelling 0.17078839043194455
accommodation hotel 0.17019641516413908
accommodation facility 0.16941191466547378
accommodation space 0.16121088256532623
accommodation bedroom 0.15674866925327283
accommodation villa 0.1518160889477184
accommodation venue 0.14475829849810373
accommodation residence 0.143432795282533
accommodation cottage 0.14252757946727565
accommodation rent 0.1382986957755147
accommodation bungalow 0.13794886644386686
Next Topic: Information Extraction
Phrasal Chunking
Named Entity Recognition
Named Entity Linking
Relation Extraction