PowerPoint Presentation
Comp90042
Workshop
Week 8
9 May
1
1
Language theory
Parsing
2
Table of Contents
What are regular grammar and regular language? How are they different?
Language: set of acceptable strings (e.g., sentences)
Grammar: a generative description of a language
Regular language: language accepted by regular expression w
Regular grammar: a formal grammar defined by a set of productions rules
A → xB,
A → x
A → ε
where A and B are non-terminals, x is a terminal and ε is the empty string.
3
Regular language
3
Rules:
S → A
A → a A
A → ε
Example strings?
a, aa, aaa, aaa.
Regular expression?
(a)*
4
Regular grammar example
Rules:
S → A
A → a A
A → ε
Example strings?
a, aa, aaa, aaa.
Regular expression?
(a)*
5
An FSA is formally represented as a 5-tuple M1 = (Q, Σ, δ, , F)
1. States Q = {, , },
2. Input alphabet Σ = {0,1},
3. Transition function δ
4. is the start state, and
5. Final state F = {}
What language does M1 accept?
6
Finite State Automata
0 1
Transition function
6
7
Example: Word Morphology
Draw a Finite State Acceptor (FSA) for word morphology to show the possible derivations from root forms using the words:
play, played, playing;
walk, walked, walking;
sit, sat, sitting.
7
Regular languages are closed under union, intersection and concatenation. What does it mean? Why is it important?
8
Properties of regular languages
If L1 and L2 are two regular languages, then the union, intersection and concatenation of L1 and L2 are also regular languages
NLP problems can be factored into small simple parts,
we can develop regular languages for each part,
and combine them into a complex system to handle the NLP problems.
union
concatenation
8
FSAs can only accept/reject inputs:: is it grammatical?
WFSAs give a score: how grammatical/likely a sequence is (eg. N-gram language model)
Add weights to initial/final states of FSA
λ : Q → R; assign weights to initial state
ρ : Q → R; assign weight to final state
δ : Q × Σ × Q → R; assign weight to transitions
Weighted FSA for scoring any path π = , …
S(π) = λ() + N X δ() + ρ().
Dijkstra algorithm for shortest path
9
Weighted FSA
9
This is the dog that worried the cat that killed the rat that ate the malt that lay in the house that Jack built
VS
A man that a woman that a child that a bird that I heard saw knows loves
10
Context-free grammar
Length is unbounded (recursive), but structure is local → can describe with FSA = Regular
Need to remember the n subject nouns, to ensure n verbs follow (and that they agree etc) * can’t be done with finite number of states
10
A context-free grammar is a 4-tuple G = (V, Σ, R, S)
V, a set of non-terminal symbols (or variables)
Σ, a set of terminal symbols (disjoint from V)
R a set of rules in the form A → β, where A is a non-terminal, β is a string of variables and terminals
S ∈ V, the start variable
11
Context-free grammar
11
A pushdown automaton is a 6-tuple (Q, Σ, Γ, δ, q0, F), where
1. Q is the set of states,
2. Σ is the input alphabet,
3. Γ is the stack alphabet,
4. δ : Q × Σε × Γε−→P(Q × Γε) is the transition function,
5. ∈ Q is the start state, and
6. F ⊆ Q is the set of accept states.
12
Pushdown automata
Parsing in general is the process of identifying the structure(s) of a sentence, according to a grammar of the language.
Example parse trees
13
Parsing
(a) What changes need to be made to this CFG to make it suitable for CYK parsing?
S -> NP VP
VP -> V NP | V NP PP
PP -> P NP
V -> “saw” | “walked”
NP -> “John” | “Bob” | Det N | Det N PP
Det -> “a” | “an” | “the” | “my”
N -> “man” | “cat” | “telescope” | “park”
P -> “on” | “by” | “with”
Step 1: Chomsky Normal Form: Each rule consists of
A -> B C
A -> a a (single) non-terminal which re-writes as a single terminal
14
Parsing
NP -> Det Y
Y -> N PP
VP -> V X
X -> NP PP
a (single) non-terminal which re-writes as exactly two non-terminals
14
15
CYK Example 1
S -> NP VP
VP -> V NP | V X
PP -> P NP
X -> NP PP
Y -> N PP
NP -> “John” | “Bob” | Det N | Det Y
V -> “saw” | “walked”
Det -> “a” | “an” | “the” | “my”
N -> “man” | “cat” | “telescope” | “park”
P -> “on” | “by” | “with”
S
VP
NP
N
Det
NP
V
man
a
John
saw
15
16
CYK example 2
S -> NP VP
VP -> V NP | V X
PP -> P NP
X -> NP PP
Y -> N PP
NP -> “John” | “Bob” | Det N | Det Y
V -> “saw” | “walked”
Det -> “a” | “an” | “the” | “my”
N -> “man” | “cat” | “telescope” | “park”
P -> “on” | “by” | “with”
17
CYK example 3
S -> NP VP
VP -> V NP | V X
PP -> P NP
X -> NP PP
Y -> N PP
NP -> “John” | “Bob” | Det N | Det Y
V -> “saw” | “walked”
Det -> “a” | “an” | “the” | “my”
N -> “man” | “cat” | “telescope” | “park”
P -> “on” | “by” | “with”
18
Takeaways
Regular languages:
Regular Grammar
Finite state acceptors (FSA)
Weighted finite state acceptors (WFSA)
Context Free Languages
Context Free Grammar
CYK algorithm for parsing
/docProps/thumbnail.jpeg