
LECTURE 9

Sequence Classification and Part-of-Speech Tagging

Arkaitz Zubiaga, 5th February, 2018

2

 Sequence Classification

 Sequence Classifiers:

 Hidden Markov Models (HMM).

 Maximum Entropy Markov Models (MEMM).

 Conditional Random Fields (CRF).

 Using Sequence Classifiers for Part-of-Speech (POS) Tagging.

LECTURE 9: CONTENTS

3

 Sometimes, classification of items in a sequence is dependent
on other sequence items:

 Part-of-Speech (POS) tagging:
Assigning categories to words, e.g. adjective, noun or verb

Example:
The man saw a cat
DET NN VB DET NN

SEQUENCE CLASSIFICATION

DET: determiner
NN: noun
VB: verb

4

 Why is classification dependent on other sequence items?

In our example, “The man saw a cat”:

‘saw’ can be:
a noun (if it refers to the tool for cutting)
a verb (if it’s the simple past of ‘see’)

we can’t classify whether ‘saw’ is a verb or a noun by looking at the
word alone → we need to look at the context, i.e. the sequence

SEQUENCE CLASSIFICATION

5

 The man saw a cat

 ??? ??? ??? ??? ???

 The first item, ‘the’, is easy to classify: it’s always a determiner:

The man saw a cat

 DET ??? ??? ??? ???

SEQUENCE CLASSIFICATION

6

 The man saw a cat

 DET ??? ??? ??? ???

 Now ‘man’ is ambiguous: (1) noun, i.e. male person, or (2) verb,
i.e. ‘take charge of’? We can look at:

 P(‘manNN’) vs P(‘manVB’) → probability of the word as noun or verb

 P(NN | DET) vs P(VB | DET) → probability of a noun or a verb being
preceded by a determiner

SEQUENCE CLASSIFICATION

7

 Two probabilities to determine the POS of a word:
[1] P(‘manNN’) vs P(‘manVB’)
& [2] P(‘manNN’ | DET) vs P(‘manVB’ | DET)

 Classifiers we have seen so far (NB, MaxEnt, SVM) can handle [1].

 But they can’t handle [2].

SEQUENCE CLASSIFICATION

8

 Using training data, we can learn the probability of word
category POSi being preceded by POSj.

SEQUENCE CLASSIFICATION

9

SEQUENCE CLASSIFICATION

Prepositions can be followed by
nouns (.7) or determiners (.3),
never by verbs.

10

SEQUENCE CLASSIFICATION

Nouns can be followed by verbs,
prepositions, or another noun.

11

SEQUENCE CLASSIFICATION

Determiners are ALWAYS
followed by a noun.

12

 [1] P(‘manNN’) vs P(‘manVB’)

 [2] P(NN | DET) vs P(VB | DET)

The man saw a cat

DET ??? ??? ??? ???

SEQUENCE CLASSIFICATION


No matter the outcome of [1], we know it’s NN
thanks to [2].

13

 OK, and how do we achieve this?

 Hidden Markov Models (HMM)

 Maximum Entropy Markov Models (MEMM)

 Conditional Random Fields (CRF)

 Deep Learning:

 Recurrent Neural Networks (RNN).

 Long Short-Term Memory Networks (LSTM).

 Convolutional Neural Networks (CNN).

SEQUENCE CLASSIFICATION


HIDDEN MARKOV MODELS

16

 A Markov chain:

 Set of nodes connected with probabilities, where the weights on
all edges leaving a node sum to 1.

WHAT IS A MARKOV CHAIN?

17

MARKOV CHAINS

18

 In a first-order Markov chain, the probability of a particular state
depends ONLY on the previous state.

 Remember: Lecture 3 on language models, Markov assumption:
P(library | I found two pounds in the) ≈ P(library | the)

MARKOV CHAINS
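As a minimal sketch, a Markov chain can be written as a transition table whose outgoing weights sum to 1, with a path's probability being the product of its transitions (first-order assumption). The weather states and numbers below are invented for illustration:

```python
# A toy Markov chain over invented weather states.
# transitions[s] maps each next state to P(next | s);
# the weights leaving every node must sum to 1.
transitions = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

# Sanity check: outgoing edges of each node sum to 1.
for state, out in transitions.items():
    assert abs(sum(out.values()) - 1.0) < 1e-9

def path_probability(path):
    """P(s1 -> s2 -> ... -> sn): product of transition probabilities,
    using the first-order Markov assumption (each step depends only
    on the previous state)."""
    prob = 1.0
    for prev, nxt in zip(path, path[1:]):
        prob *= transitions[prev][nxt]
    return prob

print(path_probability(["sunny", "sunny", "rainy"]))  # 0.8 * 0.2
```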

19

WHAT IS A HIDDEN MARKOV MODEL (HMM)?

20

HMM: TWO KINDS OF PROBABILITIES

 Transition probabilities: pairwise probabilities of tags being
preceded/followed by another tag,

e.g. a determiner is likely followed by a noun or an adjective.

 Emission probabilities: probability for a particular tag to be a
particular word, e.g. a verb is very likely to be “is”.
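The two kinds of probabilities can be sketched as dictionaries, with the joint score of a tag sequence multiplying one transition and one emission term per word. The numbers below are invented for illustration, not learnt from a real corpus:

```python
# Invented HMM parameters for illustration; <s> marks sentence start.
# Transition probabilities: P(tag | previous tag).
trans = {
    ("<s>", "DET"): 0.6, ("<s>", "NN"): 0.4,
    ("DET", "NN"): 1.0,  # determiners always followed by a noun
    ("NN", "VB"): 0.5, ("NN", "NN"): 0.3, ("NN", "DET"): 0.2,
    ("VB", "DET"): 0.7, ("VB", "NN"): 0.3,
}
# Emission probabilities: P(word | tag).
emit = {
    ("DET", "the"): 0.5, ("DET", "a"): 0.5,
    ("NN", "man"): 0.1, ("NN", "saw"): 0.02, ("NN", "cat"): 0.1,
    ("VB", "saw"): 0.05, ("VB", "man"): 0.01,
}

def joint_probability(words, tags):
    """P(tags, words) = prod_i P(tag_i | tag_{i-1}) * P(word_i | tag_i)."""
    prob, prev = 1.0, "<s>"
    for word, tag in zip(words, tags):
        prob *= trans.get((prev, tag), 0.0) * emit.get((tag, word), 0.0)
        prev = tag
    return prob

sent = ["the", "man", "saw", "a", "cat"]
p_verb = joint_probability(sent, ["DET", "NN", "VB", "DET", "NN"])
p_noun = joint_probability(sent, ["DET", "NN", "NN", "DET", "NN"])
print(p_verb > p_noun)  # the verb reading of 'saw' scores higher
```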

21

 Again, the Markov assumption, i.e. dependence only on the
previous state.

 There must be a dependency between sequence items, e.g.:

 Weather (dependency): this morning’s weather may
determine the probability of this afternoon’s weather.

 Independent events: my laptop having been broken doesn’t
determine the probability of my next laptop breaking.

TWO IMPORTANT ASSUMPTIONS IN HMM

22

 Limitations of HMM:

 Models dependencies between each state and only its
corresponding observation.

 Learns the joint distribution of states and observations P(Y, X),
but not the conditional probability P(Y|X).

LIMITATIONS OF HMM

23

 NLTK HMM:
http://www.nltk.org/_modules/nltk/tag/hmm.html

 hmmlearn:
https://github.com/hmmlearn/hmmlearn

 Scikit HMM:
http://scikit-learn.sourceforge.net/stable/modules/hmm.html

HMM IMPLEMENTATIONS


24

 Maximum Entropy Markov Models (MEMM): model
dependencies between each state and the full observation
sequence.

 Learning objective function consistent with the predictive function:
P(Y|X).

MEMM: ALTERNATIVE TO HMM

25

MEMM: THE LABEL BIAS PROBLEM

26

 In principle, locally, state 1 prefers state 2 as the 2nd state.

MEMM: THE LABEL BIAS PROBLEM

27

 But if we look at the entire path (which MEMM does):

 P(1→1→1→1) = 0.4*0.45*0.5 = 0.09

 P(1→2→2→2) = 0.6*0.3*0.3 = 0.054

MEMM: THE LABEL BIAS PROBLEM

Despite 1→2 being more
likely locally,

the entire path probability
will favour 1→1
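The two path scores can be checked in a few lines of Python, using the probabilities from the slide above:

```python
from math import prod

# Locally, edge 1→2 (0.6) beats 1→1 (0.4), but the full paths disagree:
p_stay  = prod([0.4, 0.45, 0.5])   # P(1→1→1→1) ≈ 0.09
p_leave = prod([0.6, 0.3, 0.3])    # P(1→2→2→2) ≈ 0.054

# The globally best path starts with the locally worse edge.
print(p_stay > p_leave)
```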

28

 Solution: don’t normalise probabilities locally, use local
potentials.

SOLUTION TO LABEL BIAS

29

 CRF:

 Global normaliser Z(x) overcomes the label bias issue of MEMM.

 Models the dependency between each state and the entire
observation sequence (like MEMM).

CONDITIONAL RANDOM FIELDS

30

 Widely used for a range of sequence classification tasks:

Lafferty, J., McCallum, A., & Pereira, F. C. (2001). Conditional
random fields: Probabilistic models for segmenting and labeling
sequence data.
https://repository.upenn.edu/cgi/viewcontent.cgi?article=1162&context=cis_papers

CONDITIONAL RANDOM FIELDS


31

 Python-crfsuite:
http://python-crfsuite.readthedocs.io/en/latest/

 PyStruct:
https://pystruct.github.io/

IMPLEMENTATIONS OF CRF


32

 We’ve talked about classifying linear sequences.

 But sometimes we can have more complex sequences:

 Graph- or tree-structured sequences.

BEYOND CLASSIFIER LINEAR SEQUENCES

33

 Post classification in forums, e.g. binary classification: does a post
answer the question of the 1st post?

 Post 1

 Post 1.1

 Post 1.1.1

 Post 1.1.2

 Post 1.2

 Post 1.2.1

 Post 1.2.1.1

 Post 1.2.2

WHERE DO WE FIND TREE SEQUENCES IN NLP?

34

HMM VS CRF

USING SEQUENCE CLASSIFIERS
FOR PART-OF-SPEECH (POS)
TAGGING

36

PART-OF-SPEECH TAGGING

 As we were saying, sequence can play an important role in
determining POS tags in a sentence:

The man saw a cat

DET NN VB DET NN

“man” can’t be a verb if it’s preceded by a determiner.

37

PART-OF-SPEECH TAGGING

 As we were saying, sequence can play an important role in
determining POS tags in a sentence:

The man saw a cat

DET NN VB DET NN

 With an HMM, looking only at the previous label, we can end up
predicting a sequence of 5 nouns (NN, NN, NN, NN, NN).

 Looking at the entire sequence (MEMM, CRF), we avoid this: we
never see a sentence made up only of 5 nouns.

38

PART-OF-SPEECH TAGGING

 The list of POS tag types is actually quite large!

Open class (lexical) words:

 Nouns: Proper (IBM, Italy), Common (cat / cats, snow)
 Verbs: Main (see, registered), Modals (can, had)
 Adjectives (old, older, oldest)
 Adverbs (slowly)
 Numbers (122,312; one)
 … more

Closed class (functional) words:

 Prepositions (to, with)
 Particles (off, up)
 Determiners (the, some)
 Conjunctions (and, or)
 Pronouns (he, its)
 Interjections (Ow, Eh)
 … more

39

OPEN VS CLOSED CLASSES

 Open vs. Closed classes:
 Closed:

 determiners: a, an, the,…
 pronouns: she, he, I,…
 prepositions: on, under, over, near, by,…

 Open:
 Nouns, Verbs, Adjectives, Adverbs.

40

CHALLENGES IN POS TAGGING: AMBIGUITY

 Words often have more than one possible POS, e.g. back:
 The back door = JJ (adjective)
 On my back = NN (noun)
 Win the voters back = RB (adverb)
 Promised to back the bill = VB (verb)

 See list of POS tags:

http://rednoise.org/rita/reference/PennTags.html


41

PART-OF-SPEECH TAGGING

 POS tagging example:

 Input: Plays well with others

Candidate tags per word:
Plays: NNS, VBZ, NN
well: UH, JJ, RB
with: IN
others: NNS

 Output: Plays/VBZ well/RB with/IN others/NNS

1. List candidate labels for each word.
2. Based on probabilities learnt from training data, the classifier
predicts the most likely POS sequence.

NB: the fact that there are 2 unambiguous words (with & others)
makes it useful to first label them, and then predict the other 2.
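A minimal Viterbi decoder makes this procedure concrete. The candidate tags come from the example above; the transition and emission numbers below are invented purely for illustration:

```python
def viterbi(words, candidates, trans, emit, start="<s>"):
    """Return the most likely tag sequence under first-order HMM scoring."""
    # best[tag] = (score of best path ending in tag, that path)
    best = {start: (1.0, [])}
    for word in words:
        new_best = {}
        for tag in candidates[word]:
            # Extend the best previous path with this tag.
            score, path = max(
                (s * trans.get((prev, tag), 0.0) * emit.get((tag, word), 0.0),
                 p + [tag])
                for prev, (s, p) in best.items()
            )
            new_best[tag] = (score, path)
        best = new_best
    return max(best.values())[1]

# Candidate tags from the example; probabilities are illustrative only.
candidates = {"Plays": ["NNS", "VBZ", "NN"], "well": ["UH", "JJ", "RB"],
              "with": ["IN"], "others": ["NNS"]}
trans = {("<s>", "VBZ"): 0.4, ("<s>", "NNS"): 0.3, ("<s>", "NN"): 0.3,
         ("VBZ", "RB"): 0.5, ("VBZ", "JJ"): 0.2, ("VBZ", "UH"): 0.01,
         ("NNS", "RB"): 0.2, ("NNS", "JJ"): 0.1, ("NNS", "UH"): 0.01,
         ("NN", "RB"): 0.2, ("NN", "JJ"): 0.1, ("NN", "UH"): 0.01,
         ("RB", "IN"): 0.5, ("JJ", "IN"): 0.3, ("UH", "IN"): 0.1,
         ("IN", "NNS"): 0.6}
emit = {("VBZ", "Plays"): 0.3, ("NNS", "Plays"): 0.2, ("NN", "Plays"): 0.1,
        ("RB", "well"): 0.4, ("JJ", "well"): 0.2, ("UH", "well"): 0.1,
        ("IN", "with"): 0.9, ("NNS", "others"): 0.8}

print(viterbi(["Plays", "well", "with", "others"], candidates, trans, emit))
```

With these numbers the decoder recovers the slide's output, VBZ RB IN NNS.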

42

PART-OF-SPEECH TAGGING

 Uses of POS tagging in NLP:
 Text-to-speech.
 Phrase search through regular expressions, e.g. (Det) Adj* N+
 As input to or to speed up a full linguistic parser (later lectures)
 If you know the tag, you can back off to it, e.g. in lemmatisation:
saw → see, or saw → saw?

 In other fields:
 Computational biology.
 Prediction of series of data, e.g. weather forecasting.
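The phrase-search use can be sketched as a regular expression over the tag sequence; the sentence, the simplified tag set, and the mapping back to words are all assumed for illustration:

```python
import re

# A tagged sentence; tags are a simplified set assumed for illustration.
tagged = [("the", "DET"), ("old", "ADJ"), ("grey", "ADJ"),
          ("cat", "N"), ("sat", "V"), ("quietly", "ADV")]

# (Det) Adj* N+ : optional determiner, any adjectives, one or more nouns.
pattern = re.compile(r"(?:DET )?(?:ADJ )*(?:N )+")

# Match against the space-joined tag string, then map back to words.
tag_string = " ".join(tag for _, tag in tagged) + " "
match = pattern.search(tag_string)
if match:
    start = tag_string[:match.start()].count(" ")
    end = start + match.group().count(" ")
    print([w for w, _ in tagged[start:end]])  # ['the', 'old', 'grey', 'cat']
```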

43

POS TAGGING PERFORMANCE

 Current POS tagging systems are very accurate:
 A baseline system that predicts the most common POS for a word

already gets 90% accuracy → owing to many words not being
ambiguous.

 Standard classifiers (no sequence) can achieve 93% accuracy.
 Sequence classifiers can achieve 97% accuracy.
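The most-common-POS baseline mentioned above takes only a few lines; the toy training data and the NN back-off for unseen words are invented for illustration:

```python
from collections import Counter, defaultdict

# Toy tagged training data, invented for illustration.
train = [("the", "DET"), ("man", "NN"), ("saw", "VB"), ("a", "DET"),
         ("cat", "NN"), ("the", "DET"), ("saw", "NN"), ("cut", "VB")]

# Count how often each word carries each tag.
counts = defaultdict(Counter)
for word, tag in train:
    counts[word][tag] += 1

def baseline_tag(word, default="NN"):
    """Predict the most frequent tag seen for the word in training;
    unseen words back off to a default tag."""
    if word in counts:
        return counts[word].most_common(1)[0][0]
    return default

print([baseline_tag(w) for w in ["the", "man", "dog"]])  # ['DET', 'NN', 'NN']
```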

44

REFERENCES

 List of software for POS tagging:
https://aclweb.org/aclwiki/Part-of-speech_tagging


45

ASSOCIATED READING

 Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language
Processing: An Introduction to Natural Language Processing, Speech
Recognition, and Computational Linguistics. 3rd edition. Chapters 9-
10.

 Bird Steven, Ewan Klein, and Edward Loper. Natural Language
Processing with Python. O’Reilly Media, Inc., 2009. Chapter 6
Section 1.6.
