
Computational Linguistics

Copyright © 2017 Suzanne Stevenson, Graeme Hirst and Gerald Penn. All rights reserved.

2. Introduction to syntax and parsing

Gerald Penn
Department of Computer Science, University of Toronto
CSC 2501 / 485, Fall 2018

Reading: Jurafsky & Martin: 5.0–1, 12.0–12.3.3, 12.3.7, [13.1–2]. Bird et al.: 8.0–4.

Syntactic structure 1

• Syntax:

• The combinatorial structure of words.

• How words can be linearly organized: left/right precedence, and contiguity.

• How words can be hierarchically organized into phrases and sentences.

Syntactic structure 2

The cat hunted the squirrel living in the tree with persistence.

[ [The cat] [hunted [the squirrel [living [in [the tree] ] ] ] [with [persistence] ] ] ]

Syntactic structure 2

The cat hunted the squirrel living in the tree with persistence.

[Tree diagram: “hunted” at the root, dominating “The cat”, “the squirrel” (itself modified by “living”, which dominates “in” → “the tree”), and “with” → “persistence”.]

Syntactic structure 2

The cat hunted the squirrel living in the tree with persistence.

[S [NP [DET The] [N cat]]
   [VP [V hunted]
       [NP [NP [DET the] [N squirrel]]
           [S [V living]
              [PP [P in] [NP [DET the] [N tree]]]]]
       [PP [P with] [NP [N persistence]]]]]

Syntactic structure 3

• Goal: meaning, interpretation, semantics.

• So why do we care about syntax?


Grammars and parsing

• Grammar:

• Formal specification of allowable structures.

• Knowledge

• Representation

• Parsing:

• Analysis of a string of words to determine the structure assigned by the grammar.

• Algorithm

• Process

Using grammar to capture structure

• Main issues:

• Which words are grouped together into phrases.

• How words within a phrase project the properties of a single, common word (the head of the phrase).

• How different phrases relate to each other.

• Grammar encodes these relations. Some grammars interpret these relations with respect to meaning.

Good and bad grammars

• There are many possible grammars for any natural language.

• Some are better than others.

• Desiderata:

• Faithfulness to (vastly divergent) details about language.

• Economy of description.

• Fidelity to some prevailing linguistic intuition.

• Efficiency of parsing.


Elements of grammar

• Primitives: lexical categories or parts of speech.

• Each word-type is a member of one or more.

• Each word-token is an instance of exactly one.
e.g., The cat in the hat sat.

• Categories are open or closed to new words.

• Eight main categories, many subcategories.

• The categories might possibly be language-specific as well.

Parts of speech 1

• Nouns: denote an object, a concept, a place, …

• Count nouns: dog, spleen, Band-Aid, …

• Mass nouns: water, wheat, …

• Proper nouns: Fred, New York City, …

• Pronouns: he, she, you, I, they, …

• Adjectives: denote an attribute of the denotation of a noun.

• Intersective: pink, furry, …

• Measure: big, …

• Intensional: former, alleged, …

Parts of speech 2

• Determiners, articles: specify certain attributes of the denotation of a noun that are grammatically relevant.

• the, a, some, …

• Verbs: predicates; denote an action or a state. Numerous distinctions, e.g. transitivity:

• Intransitive: sleep, die, …

• Transitive: eat, kiss, …

• Ditransitive: give, sell, …

• Copula: be, feel, become, …

Parts of speech 3

• Adverbs: denote an attribute of the denotation of a predicate.

• Time and place: today, there, now, …

• Manner: happily, furtively, …

• Degree: much, very, …

• Prepositions: relate two phrases with a location, direction, manner, etc.

• up, at, with, in front of, before, …

Parts of speech 4

• Conjunctions: combine two clauses or phrases:

• Coordinating conjunctions: and, or, but

• Subordinating conjunctions: because, while, …

• Interjections: stand-alone emotive expressions:

• um, wow, oh dear, balderdash, crikey, …

Elements of grammar

• Combinations:

• Phrase: a hierarchical grouping of words and/or phrases.

• Clause: a phrase consisting of a verb and (almost) all of its dependents.

• Sentence: a clause that is syntactically independent of other clauses.

• Can be represented by a tree (or a labelled bracketing).

• Terminology: A constituent is a well-formed phrase with overtones of semantic and/or psychological significance.

Types of phrase 1

• Noun phrase (NP):
• a mouse
• mice
• Mickey
• the handsome marmot
• the handsome marmot on the roof
• the handsome marmot whom I adore

• Verb phrase (VP):
• laughed loudly
• quickly gave the book to Mary

Types of phrase 2

• Adjective phrase (AP):
• green
• proud of Kyle
• very happy that you went

• Prepositional phrase (PP):
• in the sink
• without feathers
• astride the donkey

Clauses and sentences 1

• Clauses:
• Ross remarked upon Nadia’s dexterity
• to become a millionaire by the age of 30
• that her mother had lent her for the banquet

• Sentences:
• Ross remarked upon Nadia’s dexterity.
• Nathan wants to become a millionaire by the age of 30.
• Nadia rode the donkey that her mother had lent her for the banquet.
• The handsome marmot on the roof [in dialogue].

Clauses and sentences 2

• Clauses may act as noun phrases:
• To become a millionaire by the age of 30 is what Ross wants.
• Nadia riding her donkey is a spectacular sight.
• Ross discovered that Nadia had been feeding his truffles to the donkey.

The structure of an idealized phrase

XP → ZP X YP

• X is the head word of the phrase XP.
• ZP is the subject or pre-modifier.
• YP is the object, complement, or post-modifier (adjunct).

Example phrases

AP → ADV A S: [AP [ADV very] [A happy] [S that you went]]

VP → ADV V PP PP: [VP [ADV quickly] [V go] [PP to the store] [PP with May]]

S → NP AUX VP: [S [NP Kim] [AUX will] [VP go]]

Formal definition of a CFG

• A context-free grammar is a quadruple G = (Vt, Vn, P, S), where:

• Vt is a finite set of terminal symbols.

• Vn is a finite set of non-terminal symbols.

• P is a finite set of production rules of the form

A → α

where A ∈ Vn and α is a sequence of symbols in (Vn ∪ Vt)*.

• S ∈ Vn is the start symbol.

A very simple grammar

P = {
S   → NP VP
NP  → Det N
NP  → Det Adj N
NP  → NP PP
VP  → V
VP  → V NP
PP  → P NP
Det → the | a | an
Adj → old | red | happy | …
N   → dog | park | statue | contumely | run | …
V   → saw | ate | run | disdained | …
P   → in | to | on | under | with | …
}

• S = S.

• Vt and Vn can be inferred from the production rules.

• Lexical categories: NTs that rewrite as a single T.

• The lexicon (the last five rules): in practice, a separate data structure.
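The slide’s grammar can be written down directly as Python data. The sketch below is illustrative (variable names and layout are my own, not from the course code); it keeps the lexicon as a separate structure, as the slide suggests, and shows how Vt and Vn are inferred from the rules.

```python
# The slide's grammar G = (Vt, Vn, P, S) as plain Python data.
# Variable names and layout are illustrative, not from the course code.

# Production rules P, as (LHS, RHS) pairs.
P = [
    ("S",  ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("NP", ["Det", "Adj", "N"]),
    ("NP", ["NP", "PP"]),
    ("VP", ["V"]),
    ("VP", ["V", "NP"]),
    ("PP", ["P", "NP"]),
]

# The lexicon: in practice, a separate data structure.
LEXICON = {
    "Det": {"the", "a", "an"},
    "Adj": {"old", "red", "happy"},
    "N":   {"dog", "park", "statue", "contumely", "run"},
    "V":   {"saw", "ate", "run", "disdained"},
    "P":   {"in", "to", "on", "under", "with"},
}

S = "S"  # start symbol

# Vt and Vn can be inferred from the production rules and the lexicon.
Vn = {lhs for lhs, _ in P} | set(LEXICON)
Vt = set().union(*LEXICON.values())
```

Note that a word such as “run” can appear under more than one pre-terminal (N and V) without any change to the representation.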

Terminology

• Non-terminal (NT): a symbol that occurs on the left-hand side (LHS) of some rule.

• Pre-terminal: a kind of non-terminal located on the LHS of a lexical entry.

• Terminal (T): a symbol that never occurs on the LHS of a rule.

• Start symbol: a specially designated NT that must be the root of any tree derived from the grammar. In our grammars, it is usually S, for sentence.

Parsing 1

• Parsing: determining the structure of a sequence of words, given a grammar.

• Which grammar rules should be used?

• To which symbols (words / terminals and nodes / non-terminals) should each rule apply?

Parsing 2

• Input:

• A context-free grammar.

• A sequence of words
Time flies like an arrow
or, more precisely, of sets of parts of speech:
{noun, verb} {noun, verb} {verb, prep} {det} {noun}

• Process:

• Working from left to right, guess how each word fits in.
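The “sets of parts of speech” view of the input can be sketched in a few lines of Python. The lexicon here is a hypothetical toy covering only the slide’s example sentence:

```python
# Hypothetical toy PoS lexicon for the slide's example sentence.
POS = {
    "time":  {"noun", "verb"},
    "flies": {"noun", "verb"},
    "like":  {"verb", "prep"},
    "an":    {"det"},
    "arrow": {"noun"},
}

def pos_sets(sentence):
    """Map a sentence to the sequence of PoS sets a parser would consider."""
    return [POS[w.lower()] for w in sentence.split()]

# Each position is ambiguous among its set, e.g. the first is {"noun", "verb"}.
tags = pos_sets("Time flies like an arrow")
```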

Parsing 3

• If a guess leads to failure (the parse is stymied), back up to a choice point and try a different guess.

• Backtracking, non-determinism.

• At each guess, must save the state of the parse on a stack.

• (Or, explore in parallel.)

• Want to guess right:

• Order of preference for rules.

Top-down parsing 1

• Top-down or rule-directed parsing: “Can I take these rules and match them to this input?”

• The initial goal is an S.

• Repeatedly look for rules that decompose / expand current goals and give new goals.
E.g., a goal of S may decompose to goals NP and VP.

• Eventually get to goals that look at the input.
E.g., a goal of NP may decompose to det noun.

• Succeed iff the entire input stream is accounted for as an S.

Top-down parsing 2

• Example: a recursive descent parser.
>>> nltk.app.rdparser()

• Operations on the leftmost frontier node:

• Expand it.

• Match it to the next input word.

Top-down parsing 3

• Choice of next operation (in the NLTK demo):

• If it’s a terminal, try matching it to the input.

• If it’s a non-terminal, try expanding it with the first-listed untried rule for that non-terminal.
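The expand/match loop can be sketched as a plain-Python recursive descent parser. This is an illustrative sketch over a fragment of the earlier simple grammar, not NLTK’s rdparser implementation:

```python
# A sketch of recursive-descent (top-down, backtracking) parsing in plain
# Python. Illustrative only -- not NLTK's rdparser implementation.

GRAMMAR = {                      # non-terminal -> list of right-hand sides
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Det", "Adj", "N"]],
    "VP":  [["V", "NP"], ["V"]],
    "Det": [["the"], ["a"]],
    "Adj": [["old"]],
    "N":   [["dog"], ["park"]],
    "V":   [["saw"]],
}

def parse(goal, words, i):
    """Yield (tree, next_position) for each way `goal` matches at position i."""
    if goal not in GRAMMAR:                  # terminal: match it to the input
        if i < len(words) and words[i] == goal:
            yield goal, i + 1
        return
    for rhs in GRAMMAR[goal]:                # expand with each rule in turn;
        for children, j in expand(rhs, words, i):
            yield [goal] + children, j       # backtracking is implicit: a
                                             # failed branch yields nothing

def expand(symbols, words, i):
    """Match a whole right-hand side starting at position i."""
    if not symbols:
        yield [], i
        return
    for tree, j in parse(symbols[0], words, i):
        for rest, k in expand(symbols[1:], words, j):
            yield [tree] + rest, k

def parses(sentence):
    words = sentence.split()
    return [t for t, j in parse("S", words, 0) if j == len(words)]

trees = parses("the dog saw the old park")   # exactly one complete parse
```

The “order of preference for rules” from the Parsing 3 slide is simply the listing order of each non-terminal’s right-hand sides in GRAMMAR.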

Bottom-up parsing 1

• Bottom-up or data-directed parsing: “Can I take this input and match it to these rules?”

• Try to find rules that match a possible PoS of the input words …

• … and then rules that match the constituents so formed.

• Succeed iff the entire input is eventually matched to an S.

Bottom-up parsing 2

• Example: a shift–reduce parser.
>>> nltk.app.srparser()

• Operations:

• Shift the next input word onto the stack.

• Match the top n elements of the stack to the RHS of a rule; reduce them to its LHS.

Bottom-up parsing 3

• Choice of next operation (in the NLTK demo):

• Always prefer reduction to shifting.

• Choose the first-listed reduction that applies.

• Choice of next operation (in real life):

• Always prefer reduction to shifting for words, but not necessarily for larger constituents.
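The shift/reduce loop with the “first-listed reduction” preference can be sketched as a plain-Python recognizer over a toy grammar. This is illustrative only, not NLTK’s srparser:

```python
# A sketch of shift-reduce (bottom-up) recognition in plain Python.
# Illustrative toy only -- not NLTK's srparser. Following the demo's
# strategy, a reduction is always preferred to a shift.

RULES = [                        # (LHS, RHS), tried in listed order
    ("NP", ["Det", "N"]),
    ("VP", ["V", "NP"]),
    ("S",  ["NP", "VP"]),
]
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "park": "N", "saw": "V"}

def shift_reduce(sentence):
    """Return the final stack; ["S"] means the input was recognized."""
    words = [LEXICON[w] for w in sentence.split()]   # words -> PoS first
    stack = []
    while True:
        for lhs, rhs in RULES:                       # prefer reductions;
            if stack[-len(rhs):] == rhs:             # first-listed rule wins
                stack[-len(rhs):] = [lhs]            # replace RHS by LHS
                break
        else:                                        # no reduction applies:
            if not words:
                return stack                         # done (success iff ["S"])
            stack.append(words.pop(0))               # shift the next word
```

The greedy reduce-first strategy can reduce too early on some grammars and get stuck, which is exactly the choice-of-operation problem the slide raises.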

Problems

• Neither top-down nor bottom-up search exploits useful idiosyncrasies that CFG rules, alone or together, often have.

• Problems:

• Recomputation of constituents.

• Recomputation of common prefixes.

• Solution: keep track of:

• Completed constituents.

• Partial matches of rules.
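One standard realization of “keep track of completed constituents” is chart parsing. As an illustration (the slides do not commit to a specific chart parser), the CKY algorithm records, for every span of the input, which constituents cover it, so no constituent is ever recomputed. The toy grammar below is in Chomsky normal form (binary rules plus a lexicon):

```python
# A sketch of chart parsing via the CKY algorithm over a toy grammar in
# Chomsky normal form (binary rules plus a lexicon). Illustrative only.

BINARY = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP"))]
LEXICON = {"the": {"Det"}, "dog": {"N"}, "park": {"N"}, "saw": {"V"}}

def cky(sentence):
    words = sentence.split()
    n = len(words)
    # chart[i][j]: the completed constituents spanning words i .. j-1.
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(LEXICON[w])
    for width in range(2, n + 1):            # longer spans from shorter ones
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):        # each split point ...
                for lhs, (b, c) in BINARY:   # ... against each binary rule
                    if b in chart[i][k] and c in chart[k][j]:
                        chart[i][j].add(lhs) # stored once, never recomputed
    return chart

chart = cky("the dog saw the park")          # "S" spans the whole input
```

Because each cell is filled exactly once, the backtracking parsers’ repeated work on common prefixes and shared constituents disappears; partial matches of rules are what Earley-style charts add on top of this.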