
Natural Language Processing

CMPSC 442
Week 13, Meeting 38, Three Segments

Outline

● Early Decades
● Shift to Machine Learning Paradigm
● NLP Deep Learning: Excerpts from Mirella Lapata 2017 Keynote


Natural Language Processing

CMPSC 442
Week 13, Meeting 38, Segment 1: Early Decades

Early Vision

● The Ultimate Goal – For computers to use natural language as
effectively as humans do . . .


1989 White paper for DARPA on NLP

Application Areas

● Reading and writing text
○ Summarization
○ Extraction into Databases (Information Extraction)
○ Question Answering (as distinct from Information retrieval/search)

● Interactive spoken dialogue for human-machine interaction
○ Informal Speech Input and Output
○ Dialogue ≠ Alexa/Google Assistant/Siri

● Translation: Input and Output Across Different Human Languages
● Natural Language Generation (Translation from symbols to language)


Speech: Continuous Acoustic Energy

● Acoustic models translate continuous acoustic energy to units of sound: many possible analyses for one utterance

● Language models translate combinations of sound to words and phrases: many possible analyses
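This split follows the standard noisy-channel formulation of speech recognition. A minimal sketch of that decomposition (not from the slides; A denotes the acoustic signal and W a candidate word sequence, the usual textbook notation):

```latex
% Noisy-channel view of speech recognition (standard textbook form):
% pick the word sequence W that is most probable given the acoustics A.
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \underbrace{P(A \mid W)}_{\text{acoustic model}}\;
                       \underbrace{P(W)}_{\text{language model}}
```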


An Early Interactive NLP System

● SHRDLU (Winograd, 1968) had a text-based interface

● Could answer questions and simulate actions on the blocks


Another Early Interactive NLP System

● Lunar (Woods, 1971): NLP database access to 1971 lunar samples

● Handled 78% of sentences typed by geologists at the 1971 Lunar Rocks conference
○ What is the average concentration of aluminum in high alkali rocks?
○ How many breccias contain olivine?
○ Give me the modal analyses of those samples for all phases.

Photo: Apollo 11 astronauts Buzz Aldrin, Michael Collins & Neil Armstrong showing a moonrock to the director of the Smithsonian

Information Extraction

Goal: extract facts


1 – Label Named Entities (NEs)

2 – Assign Relations between NEs

3 – Create database entries
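A minimal sketch of these three steps, assuming the third-party spaCy package and its small English model; the sentence, the "CEO_of" relation rule, and the tuple "database" are illustrative choices, not part of the original slides:

```python
# Sketch of a tiny information-extraction pipeline (assumes: pip install spacy
# and python -m spacy download en_core_web_sm).
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Tim Cook is the CEO of Apple.")

# 1 - Label named entities
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)  # e.g. [('Tim Cook', 'PERSON'), ('Apple', 'ORG')]

# 2 - Assign a relation between NEs with a crude keyword pattern (illustrative only)
persons = [text for text, label in entities if label == "PERSON"]
orgs = [text for text, label in entities if label == "ORG"]
relations = [(p, "CEO_of", o) for p in persons for o in orgs if "CEO" in doc.text]

# 3 - Create "database entries" (here, just a list of fact tuples)
database = list(relations)
print(database)  # e.g. [('Tim Cook', 'CEO_of', 'Apple')]
```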

Zipf’s Law: A Few Words are Everywhere

● Word rank by word frequency on a log-log scale for the first 10M words in 30 Wikipedias

● Accounts for the 80/20 rule (Pareto principle): roughly 80% of the effects come from 20% of the causes
○ 80% of the text comes from 20% of the words in the vocabulary
○ Very long-tailed distribution of many, many relatively rare words

● Most of language follows Zipf’s law, making it relatively easy for ML to handle 80–90% of cases and quite difficult to handle the rest (see the sketch below)
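A minimal sketch of how the rank-frequency relationship can be checked on any plain-text corpus; the file name is a placeholder, and Zipf's law predicts frequency roughly proportional to 1/rank:

```python
# Rank-frequency check for Zipf's law: frequency(r) is roughly C / r**s with s near 1.
from collections import Counter

with open("corpus.txt", encoding="utf-8") as f:   # placeholder corpus file
    tokens = f.read().lower().split()

counts = Counter(tokens)
ranked = counts.most_common()                     # word types sorted by frequency

# Share of tokens covered by the top 20% of word types (the "80/20" effect)
cutoff = max(1, len(ranked) // 5)
top_share = sum(freq for _, freq in ranked[:cutoff]) / len(tokens)
print(f"top 20% of types cover {top_share:.0%} of tokens")

# rank * frequency is roughly constant if the corpus follows Zipf's law
for rank, (word, freq) in enumerate(ranked[:10], start=1):
    print(rank, word, freq, rank * freq)
```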


Ambiguity and Language Efficiency (Piantadosi)

● Language processing evidence suggests that use of contextual information for inference is very rapid

● Capitalizing on re-use of the same word forms for different meanings makes language very efficient

● Ambiguity of words is highest for monosyllabic words, i.e., lowest production effort

● Methods do not yet exist for machines to handle contextual information well


Natural Language Processing

CMPSC 442
Week 13, Meeting 38, Segment 2: Scaling Up through
Machine Learning

Motivation for Syntax and Compositional Semantics

● European languages have more or less fixed word order inside phrases

● Some substrings can stand alone; others cannot


While white is the coolest summer shade, there are lots of pastel hues along with tintable fabrics that will blend with any wardrobe color.

While white is the coolest summer shade, along with tintable fabrics there are lots of pastel hues that will blend with any wardrobe color.

There are lots of pastel hues along with tintable fabrics that will blend with any wardrobe color, while white is the coolest summer shade.

There are lots of pastel hues along with tintable fabrics

white is the coolest summer shade that will blend with any wardrobe color

Phrase Structure Parse Trees

● There are many different grammar formalisms
● Context Free Grammar (CFG)


S → NP VP
NP → Pro
NP → Det Nom
VP → V NP
Nom → Noun
Pro → {I, you, . . .}
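A minimal sketch of these rules as a toy grammar, assuming the third-party NLTK package; the lexical entries 'the', 'saw', and 'patient' are illustrative additions needed to make the grammar parse a sentence:

```python
# Toy context-free grammar built from the rules above (extra lexicon added for
# illustration). Assumes: pip install nltk
import nltk

grammar = nltk.CFG.fromstring("""
    S    -> NP VP
    NP   -> Pro | Det Nom
    VP   -> V NP
    Nom  -> Noun
    Pro  -> 'I' | 'you'
    Det  -> 'the'
    Noun -> 'patient'
    V    -> 'saw'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I saw the patient".split()):
    tree.pretty_print()   # prints the phrase-structure parse tree
```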

Penn Treebank: 1988-1994

First large, annotated NLP corpus
● Mitchell P. Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, Ann Taylor
● 1M-word Brown corpus + 1M-word Switchboard corpus + 1.3M-word Wall Street Journal
● All tagged with part-of-speech and syntax; consensus on syntactic structure
● Finished before it had practical use


Parsers Trained on Penn TreeBank

● 10⁶-word TreeBank + ML = Robust parsing


Dramatic Gains in Parsing Coverage and Accuracy


Gaps between CFG Syntax and Logical Form

● Active versus passive constructions
○ The doctor saw the patient (Subject of the verb is the agent of “to see”)
○ The patient was seen by the doctor (Subject of the verb is not the agent)

● Syntactically elided arguments
○ The doctor decided to see the patient. (Elided subject of “to see”: doctor)
○ The doctor persuaded the patient to exercise. (Elided subject of “exercise”: patient)

● Expletive subjects (Logical form has no “agent”)
○ It rained.
○ There was a problem.


A Dual Formalism: Combinatory Categorial Grammar

● Categories: atomic elements or functions (alternative to POS)
○ Atomic elements: fewer than the Penn Treebank POS tagset
○ Some words are syntactic functions: e.g., prove: (S\NP)/NP
○ Every syntactic rule is paired with a semantic rule, e.g.,
proved := (S\NP3s)/NP : λx.λy.prove′ x y
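A minimal sketch (not from the slides) of how such a transitive-verb category pairs syntax with semantics: the verb is a function that consumes its object NP by forward application, then its subject NP by backward application, and the logical form is built by the same function applications. The sentence "Marcel proved completeness" and the Python encoding of the lambdas are illustrative.

```python
# λx.λy.prove'xy written as nested Python lambdas (curried: object first, subject second).
proved = lambda x: (lambda y: f"prove'({x})({y})")

vp = proved("completeness'")   # forward application with the object NP  -> category S\NP
s = vp("marcel'")              # backward application with the subject NP -> category S
print(s)                       # prove'(completeness')(marcel')
```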


Other Syntactic TreeBanks

● Prague Dependency Treebank for Czech: 1.5M words annotated for morphology and dependency syntax

● Negra Treebank for German (355K words)
● CCGBank
○ Created by automatically translating phrase-structure trees from the Penn Treebank via a rule-based approach
○ Produced successful translations of over 99% of the trees in the Penn Treebank, resulting in 48,934 sentences with CCG derivations
○ Provides a lexicon of 44,000 words with over 1200 categories
● The Wikipedia page on treebanks lists > 100 for three dozen languages


Abstract Meaning Representation (AMR)

● A semantic representation language with an annotated corpus
● Currently 60K sentences
● Rooted, directed, edge-labeled, leaf-labeled graphs
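A minimal illustration, assuming the third-party penman package; the graph is the stock AMR example for "The boy wants to go", where the re-used variable b shows the rooted, edge-labeled structure:

```python
# Decode a small AMR graph written in PENMAN notation (assumes: pip install penman).
import penman

graph = penman.decode("""
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
""")

print(graph.top)          # 'w' -- the root of the graph
for triple in graph.triples:
    print(triple)         # e.g. ('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'), ...
```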


Feature Engineering and Feature Selection for NLP

● Words (counts, weighted counts, proportions, conditional probabilities)
● Word classes
○ Syntactic: POS
○ Semantic and functional: Discourse cue words, sentiment words, pronouns

● Syntactic features
○ CFG: Subtree/node types
○ Dependency grammar: Dependency relations

● Very time-consuming and sometimes lacking in generality
● Requires methods to select features, ideally as part of training (see the sketch below)
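A minimal sketch of this kind of feature pipeline, assuming the third-party scikit-learn package; the tiny sentiment data set and the choice of chi-squared selection are illustrative, not from the slides:

```python
# Word-count features plus feature selection folded into one training pipeline.
# Assumes: pip install scikit-learn
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great movie", "terrible movie", "great plot", "terrible acting"]
labels = [1, 0, 1, 0]                       # toy sentiment labels

model = make_pipeline(
    CountVectorizer(),                      # hand-specified word-count features
    SelectKBest(chi2, k=2),                 # keep the 2 most predictive features
    LogisticRegression(),
)
model.fit(texts, labels)
print(model.predict(["great acting"]))      # e.g. [1]
```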

Natural Language Processing

CMPSC 442
Week 13, Meeting 38, Segment 3: Excerpts from Mirella
Lapata 2017 Keynote

Pros and Cons of Neural Architectures

Pros
● For certain tasks (language modeling, machine translation, semantic parsing, natural language generation), neural models can handle longer contexts much more easily than statistical models
● Can reduce or eliminate costly feature engineering

Cons
● The usual: data hungry, large computational resource needs
● None of the above applications requires symbol grounding

○ Linking words to objects and actions in the world, across contexts


Translating from Multiple Modalities to Text and Back

● Excerpts from Mirella Lapata keynote address at 2017 annual
meeting of the Association for Computational Linguistics



NLP Comes to the Rescue!

riding a horse

define function with argument n
if n is not an integer value, throw a TypeError exception

Suggs rushed for 82 yards and scored a touchdown.

The Port Authority gave permission to exterminate Snowy Owls at NY City airports.

Which animals eat owls?



A Brief History of Neural Networks

Source: http://qingkaikong.blogspot.com/



Encoder-Decoder Modeling Framework
Kalchbrenner and Blunsom (2013); Cho et al. (2014); Sutskever et al. (2014);
Karpathy and Fei-Fei (2015); Vinyals et al. (2015).

Source: https://medium.com/@felixhill/


Encoder-Decoder Modeling Framework

1. End-to-end training: all parameters are simultaneously optimized to minimize a loss function on the network’s output.

2. Distributed representations share strength: better exploitation of word and phrase similarities.

3. Better exploitation of context: we can use a much bigger context – both the source and the partial target text – to translate more accurately.

Essentially a Conditional Recurrent Language Model!
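A minimal sketch of such a conditional recurrent language model, assuming the third-party PyTorch package; vocabulary sizes, dimensions, and the random batch are illustrative, and attention, padding, and beam search are omitted:

```python
# Bare-bones GRU encoder-decoder: the decoder is a language model conditioned
# on the encoder's summary of the source. Assumes: pip install torch
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))               # h summarizes the source
        dec_states, _ = self.decoder(self.tgt_emb(tgt), h)   # decode conditioned on h
        return self.out(dec_states)                          # logits over target vocab

# End-to-end training: a single loss on the output drives all parameters.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 7))    # toy batch of 2 source sequences
tgt = torch.randint(0, 1000, (2, 5))    # toy batch of 2 target sequences
logits = model(src, tgt[:, :-1])        # predict each next target token
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), tgt[:, 1:].reshape(-1))
loss.backward()
```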



In the Remainder of the Talk

We will look at the encoder-decoder framework across tasks and along these dimensions:

● Translation: different modalities vs. same modality
● Data: comparable vs. parallel
● Training Size: S / M / L
● Model: encoder, decoder, training objective


Deep Reinforcement Learning

X = x1 x2 x3 x4 x5 → Ŷ = ŷ1 ŷ2 ŷ3

A vanilla encoder-decoder model only learns to copy. We enforce task-specific constraints via reinforcement learning (Ranzato et al., 2016; Li et al., 2016; Narasimhan et al., 2016; Zhang and Lapata, 2017; Williams et al., 2017).




Deep Reinforcement Learning

[Diagram: the agent reads X = x1 x2 x3 x4 x5 and outputs the action sequence Ŷ = ŷ1 ŷ2 ŷ3; Simplicity, Relevance, and Fluency models score Ŷ, and the resulting reward updates the agent]

REINFORCE algorithm

View the model as an agent which reads the source X.
The agent takes action ŷt ∈ V according to policy PRL(ŷt | ŷ1:t−1, X).
The agent outputs Ŷ = (ŷ1, ŷ2, . . . , ŷ|Ŷ|) and receives reward r.
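A minimal sketch of one REINFORCE step, assuming the third-party PyTorch package; the policy network, its hypothetical update_state method, the reward function, and the greedy sampling loop are placeholders, not the actual simplification model:

```python
# One REINFORCE step: sample an output sequence from the policy, score it with
# the task reward r, and scale the log-likelihood gradient by that reward.
# Assumes: pip install torch
import torch
from torch.distributions import Categorical

def reinforce_step(policy, optimizer, state, reward_fn, max_len=20):
    log_probs, tokens = [], []
    for _ in range(max_len):
        logits = policy(state)                    # scores over the vocabulary V
        dist = Categorical(logits=logits)
        y_t = dist.sample()                       # action ŷ_t ~ P_RL(. | ŷ_1:t-1, X)
        log_probs.append(dist.log_prob(y_t))
        tokens.append(y_t.item())
        state = policy.update_state(state, y_t)   # hypothetical decoder-state update
    r = reward_fn(tokens)                         # scalar reward for the whole Ŷ
    loss = -r * torch.stack(log_probs).sum()      # policy-gradient objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return tokens, r
```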


Deep Reinforcement Learning

REINFORCE algorithm: the reward r combines three components

● Simplicity: SARI (Xu et al., 2016), the arithmetic average of n-gram precision and recall of addition, copying, and deletion
● Relevance: cosine similarity between vectors representing the source X and the predicted target Ŷ
● Fluency: normalized sentence probability assigned by an LSTM language model trained on simple sentences
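A minimal sketch of how such a reward might be assembled; the three scoring functions are placeholders for the SARI, cosine-similarity, and LSTM-LM components named above, and the equal weights are illustrative, not the paper's exact setup:

```python
# Combine simplicity, relevance, and fluency scores into one scalar reward r.
def reward(source, prediction, references,
           sari_score, relevance_score, fluency_score,
           weights=(1.0, 1.0, 1.0)):
    r_simplicity = sari_score(source, prediction, references)   # SARI-style score
    r_relevance = relevance_score(source, prediction)           # cosine similarity
    r_fluency = fluency_score(prediction)                       # normalized LM probability
    w_s, w_r, w_f = weights
    return w_s * r_simplicity + w_r * r_relevance + w_f * r_fluency
```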



Take-Home Message

● Sequence-to-sequence model with a task-specific objective
● The RL framework could be used for other rewriting tasks
● Training data is not perfect and will never be huge
● Simplifications are decent; the system performs well out of domain


Summary

● Natural language processing (NLP) applies machine learning to language data for a range of applications

● NLP depends heavily on collections of labeled data (treebanks, corpora)

● Early NLP handled grounded language (grounded in databases or simulations of the world)

● Encoder-decoder models, which translate between symbol systems and modalities (language, code, images), are the most prevalent neural architectures for NLP
