
ANLP Week 4/Unit 1 Part-of-speech tags and tagging
Sharon Goldwater
(based on slides by Philipp Koehn)
Video 1: Intro to tagging and parts of speech
Orientation
Week 2
  Task: Language modelling
  Model: Sequence model, all variables directly observed
Week 3
  Task: Text classification
  Model: Bag-of-words model, includes hidden variables (categories of documents)
Week 4
  Task: Part-of-speech tagging
  Model: Sequence model, includes hidden variables (categories of words in sequence)

This unit
• What are parts of speech and POS tagging?
• What linguistic information should we consider?
• What are some different tagsets and cross-linguistic issues?
• What is a Hidden Markov Model?
• (Next unit: what algorithms do we need for HMMs?)

What is part of speech tagging?
• Given a string:
  This is a simple sentence
• Identify parts of speech (syntactic categories):
  This/DET is/VERB a/DET simple/ADJ sentence/NOUN
• First step towards syntactic analysis
• Illustrates the use of hidden Markov models to label sequences

Other tagging tasks
Other problems can also be framed as tagging (sequence labelling):
• Case restoration: if we just get lowercased text, we may want to restore proper casing, e.g. the river Thames
• Named entity recognition: it may also be useful to find names of persons, organizations, etc. in the text, e.g. Barack Obama
• Information field segmentation: given a specific type of text (classified advert, bibliography entry), identify which words belong to which “fields” (price/size/#bedrooms, author/title/year)
• Prosodic marking: in speech synthesis, determine which words/syllables have stress/intonation changes, e.g. He’s going. vs. He’s going?
Parts of Speech
• Open class words (or content words)
  – nouns, verbs, adjectives, adverbs
  – mostly content-bearing: they refer to objects, actions, and features in the world
  – open class, since there is no limit to what these words are; new ones are added all the time (email, website)
• Closed class words (or function words)
  – pronouns, determiners, prepositions, connectives, …
  – there is a limited number of these
  – mostly functional: to tie the concepts of a sentence together

Video 2: POS tags for English and other languages
How many parts of speech?
• Both linguistic and practical considerations
• Corpus annotators decide. Distinguish between:
  – proper nouns (names) and common nouns?
  – singular and plural nouns?
  – past and present tense verbs?
  – auxiliary and main verbs?
  – etc.

English POS tag sets
Usually have 40-100 tags. For example:
• Brown corpus (87 tags)
  – One of the earliest large corpora collected for computational linguistics (1960s)
  – A balanced corpus: different genres (fiction, news, academic, editorial, etc.)
• Penn Treebank corpus (45 tags)
  – First large corpus annotated with POS and full syntactic trees (1992)
  – Possibly the most-used corpus in NLP
  – Originally, just text from the Wall Street Journal (WSJ)

(J&M Fig 5.6: Penn Treebank POS tags)

POS tags in other languages
• Morphologically rich languages often have compound morphosyntactic tags, e.g.
  Noun+A3sg+P2sg+Nom (J&M, p. 196)
• Hundreds or thousands of possible combinations
• Predicting these requires more complex methods than what we will discuss (e.g., may combine an FST with a probabilistic disambiguation system)
Universal POS tags (Petrov et al., 2011)
NOUN (nouns)
VERB (verbs)
ADJ (adjectives)
ADV (adverbs)
PRON (pronouns)
DET (determiners and articles)
ADP (prepositions and postpositions)
NUM (numerals)
CONJ (conjunctions)
PRT (particles)
’.’ (punctuation marks)
X (anything else, such as abbreviations or foreign words)

• A move in the other direction
• Simplify the set of tags to the lowest common denominator across languages
• Map existing annotations onto universal tags, e.g.
  {VB, VBD, VBG, VBN, VBP, VBZ, MD} ⇒ VERB
• Allows interoperability of systems across languages
• Promoted by Google and others
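As a minimal sketch of what such a mapping looks like in code: only the verb grouping {VB, VBD, VBG, VBN, VBP, VBZ, MD} ⇒ VERB comes from the slide above; the other groupings here are plausible assumptions for illustration, not the published Petrov et al. (2011) mapping.

```python
# Sketch of mapping Penn Treebank tags onto universal tags.
# Only the VERB group is from the slide; NOUN/DET/ADJ groups
# are assumed for illustration.
PTB_TO_UNIVERSAL = {
    **{t: "VERB" for t in ["VB", "VBD", "VBG", "VBN", "VBP", "VBZ", "MD"]},
    **{t: "NOUN" for t in ["NN", "NNS", "NNP", "NNPS"]},  # assumed grouping
    **{t: "DET" for t in ["DT", "PDT"]},                  # assumed grouping
    **{t: "ADJ" for t in ["JJ", "JJR", "JJS"]},           # assumed grouping
}

def to_universal(tag: str) -> str:
    """Map a Penn Treebank tag to a universal tag, defaulting to X."""
    return PTB_TO_UNIVERSAL.get(tag, "X")

print(to_universal("VBD"))  # VERB
print(to_universal("FW"))   # X (anything not covered by the mapping)
```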
Video 3: Hidden Markov model

Why is POS tagging hard?
The usual reasons!
• Ambiguity (homographs):
  glass of water/NOUN vs. water/VERB the plants
  lie/VERB down vs. tell a lie/NOUN
  wind/VERB down vs. a mighty wind/NOUN
  How about time flies like an arrow?
• Sparse data:
  – Words we haven’t seen before (at all, or in this context)
  – Word-tag pairs we haven’t seen before

Relevant knowledge for POS tagging
• The word itself
  – Some words may only be nouns, e.g. arrow
  – Some words are ambiguous, e.g. like, flies
  – Probabilities may help, if one tag is more likely than another
• Local context
  – two determiners rarely follow each other
  – two base form verbs rarely follow each other
  – a determiner is almost always followed by an adjective or noun

A probabilistic model for tagging
Let’s define a new generative process for sentences.
• To generate a sentence of length n:
  Let t0 = <s>
  For i = 1 to n:
    Choose a tag conditioned on the previous tag: P(ti|ti−1)
    Choose a word conditioned on its tag: P(wi|ti)
• So, the model assumes:
  – Each tag depends only on the previous tag: a bigram model over tags.
  – Words are conditionally independent given tags.

Generative process example
• Arrows indicate probabilistic dependencies (each tag depends on the previous tag, and each word on its tag):

  DT  NN   VBD  DT   NNS   VBG
  a   cat  saw  the  rats  jumping
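A minimal Python sketch of this generative process, assuming invented toy transition and emission tables (the slides specify the process but not any probability values):

```python
import random

# Toy parameters, invented for illustration only.
TRANS = {  # P(tag | previous tag), including <s> and </s> markers
    "<s>": {"DT": 1.0},
    "DT": {"JJ": 0.3, "NN": 0.7},
    "JJ": {"NN": 1.0},
    "NN": {"VBD": 0.6, "</s>": 0.4},
    "VBD": {"DT": 0.7, "</s>": 0.3},
}
EMIT = {  # P(word | tag)
    "DT": {"a": 0.5, "the": 0.5},
    "JJ": {"simple": 1.0},
    "NN": {"cat": 0.5, "sentence": 0.5},
    "VBD": {"saw": 1.0},
}

def sample(dist):
    """Draw one outcome from a {outcome: probability} distribution."""
    r, total = random.random(), 0.0
    for outcome, p in dist.items():
        total += p
        if r < total:
            return outcome
    return outcome  # guard against floating-point rounding

def generate():
    """Walk the tag chain from <s> to </s>, emitting one word per tag."""
    tag, pairs = "<s>", []
    while True:
        tag = sample(TRANS[tag])
        if tag == "</s>":
            return pairs
        pairs.append((sample(EMIT[tag]), tag))

print(generate())  # e.g. [('the', 'DT'), ('cat', 'NN'), ('saw', 'VBD')]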

Probabilistic finite-state machine
• One way to view the model: sentences are generated by walking through states in a graph. Each state represents a tag.
• When passing through a state, emit a word, e.g. the VB state might emit like or flies.
• Prob of moving from state s to s′ (transition probability): P(ti = s′|ti−1 = s)
• Prob of emitting w from state s (emission probability): P(wi = w|ti = s)

(Figure: a state graph with states START, DET, VB, NN, IN, and END)

What can we do with this model?
• Simplest thing: if we know the parameters (tag transition and word emission probabilities), we can compute the probability of a tagged sentence.
• Let S = w1…wn be the sentence and T = t1…tn be the corresponding tag sequence. Then

  p(S, T) = ∏_{i=1}^{n} P(ti|ti−1) P(wi|ti)

Example: computing joint prob. P(S, T)
What’s the probability of this tagged sentence?

  This/DT is/VB a/DT simple/JJ sentence/NN

• First, add begin- and end-of-sentence markers <s> and </s>. Then:

  p(S, T) = ∏_{i=1}^{n} P(ti|ti−1) P(wi|ti)
          = P(DT|<s>) P(VB|DT) P(DT|VB) P(JJ|DT) P(NN|JJ) P(</s>|NN)
            · P(This|DT) P(is|VB) P(a|DT) P(simple|JJ) P(sentence|NN)

• But now we need to plug in probabilities… from where?
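A minimal Python sketch of this computation, with invented probability values (the slides deliberately leave them unspecified until the training slide below). Log space is used to avoid underflow when multiplying many small probabilities.

```python
import math

# Invented parameter values, for illustration only.
trans = {("<s>", "DT"): 0.5, ("DT", "VB"): 0.1, ("VB", "DT"): 0.3,
         ("DT", "JJ"): 0.2, ("JJ", "NN"): 0.7, ("NN", "</s>"): 0.4}
emit = {("DT", "This"): 0.2, ("VB", "is"): 0.3, ("DT", "a"): 0.4,
        ("JJ", "simple"): 0.05, ("NN", "sentence"): 0.01}

def joint_log_prob(words, tags):
    """log p(S, T): sum of log transition and log emission probs,
    with the tag sequence padded by <s> and </s>."""
    padded = ["<s>"] + tags + ["</s>"]
    logp = sum(math.log(trans[(prev, cur)])
               for prev, cur in zip(padded, padded[1:]))
    logp += sum(math.log(emit[(t, w)]) for w, t in zip(words, tags))
    return logp

words = ["This", "is", "a", "simple", "sentence"]
tags = ["DT", "VB", "DT", "JJ", "NN"]
print(joint_log_prob(words, tags))
```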
Training the model
Given a corpus annotated with tags (e.g., Penn Treebank), we estimate P(wi|ti) and P(ti|ti−1) using familiar methods (MLE/smoothing).

(Fig from J&M draft 3rd edition)
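A minimal sketch of MLE training on a tiny invented corpus. Real training would use the Penn Treebank and apply smoothing for unseen words and transitions.

```python
from collections import Counter

# Tiny hand-made tagged corpus, invented for illustration.
corpus = [
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")],
]

trans_counts, emit_counts, tag_counts = Counter(), Counter(), Counter()
for sentence in corpus:
    prev = "<s>"
    for word, tag in sentence:
        trans_counts[(prev, tag)] += 1  # count(prev, tag)
        emit_counts[(tag, word)] += 1   # count(tag, word)
        tag_counts[prev] += 1           # count(prev), for the denominator
        prev = tag
    trans_counts[(prev, "</s>")] += 1
    tag_counts[prev] += 1

def p_trans(prev, tag):
    """MLE estimate: count(prev, tag) / count(prev)."""
    return trans_counts[(prev, tag)] / tag_counts[prev]

def p_emit(tag, word):
    """MLE estimate: count(tag, word) / count(tag)."""
    return emit_counts[(tag, word)] / sum(
        c for (t, _), c in emit_counts.items() if t == tag)

print(p_trans("DT", "NN"))  # 1.0 (DT is always followed by NN here)
print(p_emit("NN", "cat"))  # 0.5 (NN emits cat in 1 of 2 cases)
```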

Video 4: Answer to an HMM question
(no slides)

Summary
• Parts of speech (syntactic categories) provide the beginning of syntactic analysis, categorizing words by their behaviour.
• Hidden Markov models are a probabilistic model for POS tagging (and other sequence labelling tasks).
• An HMM defines the joint probability of (tags, words).
• To find the best tag sequence, use the Viterbi algorithm (details next time).

References
Petrov, S., Das, D., and McDonald, R. (2011). A universal part-of-speech tagset. arXiv preprint arXiv:1104.2086.