COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
COMP90042
Natural Language Processing
Lecture 21
Semester 1 2021 Week 11
Jey Han Lau
Summarisation
Summarisation
• Distill the most important information from a text to
produce a shortened or abridged version
• Examples
‣ outlines of a document
‣ abstracts of a scientific article
‣ headlines of a news article
‣ snippets of search results
What to Summarise?
• Single-document summarisation
‣ Input: a single document
‣ Output: summary that characterises the content
• Multi-document summarisation
‣ Input: multiple documents
‣ Output: summary that captures the gist of all
documents
‣ E.g. summarise a news event from multiple
sources or perspectives
How to Summarise?
• Extractive summarisation
‣ Summarise by selecting representative
sentences from documents
• Abstractive summarisation
‣ Summarise the content in your own words
‣ Summaries will often be paraphrases of the
original content
Goal of Summarisation?
• Generic summarisation
‣ Summary gives important information in the
document(s)
• Query-focused summarisation
‣ Summary responds to a user query
‣ Similar to question answering
‣ But answer is much longer (not just a phrase)
Query-Focused Summarisation
[example figure]
Outline
• Extractive summarisation
‣ Single-document
‣ Multi-document
• Abstractive summarisation
‣ Single-document (deep learning models!)
• Evaluation
Extractive: Single-Doc
Summarisation System
• Content selection: select what sentences to
extract from the document
• Information ordering: decide how to order
extracted sentences
• Sentence realisation: cleanup to make sure
combined sentences are fluent
Summarisation System
• We will focus on content selection
• For single-document summarisation, information
ordering is not necessary
‣ present extracted sentences in their original order
• Sentence realisation is also not necessary if the
sentences are presented as dot points
Content Selection
• Not much data with ground truth extractive
sentences
• Mostly unsupervised methods
• Goal: Find sentences that are important or salient
Method 1: TF-IDF
• Frequent words in a doc → salient
• But some generic words are very frequent but
uninformative
‣ function words
‣ stop words
• Weigh each word w in document d by its inverse
document frequency:
‣ weight(w) = tf_{d,w} × idf_w
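As a minimal sketch of this weighting (the toy corpus, tokenisation, and function names below are illustrative, not part of the lecture):

```python
import math
from collections import Counter

def tfidf_weights(document, corpus):
    """Weight each word in `document` by tf_{d,w} x idf_w,
    where idf_w = log(N / df_w) over `corpus` (a list of tokenised docs)."""
    tf = Counter(document)
    df = Counter()
    for doc in corpus:
        for word in set(doc):
            df[word] += 1
    n_docs = len(corpus)
    # words never seen in the corpus are skipped here for simplicity
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf if df[w] > 0}

corpus = [["the", "cat", "sat"], ["the", "dog", "ran"], ["a", "cat", "ran"]]
weights = tfidf_weights(["the", "cat", "sat", "sat"], corpus)
# "sat" is frequent in the document and rare in the corpus -> high weight;
# "the" appears in most documents -> low idf, low weight
```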
Method 2: Log Likelihood Ratio
• Intuition: a word is salient if its probability in the input corpus is
very different to its probability in a background corpus
• weight(w) = 1 if −2 log λ(w) > 10, and 0 otherwise
• λ(w) is the ratio between:
‣ the likelihood of observing w in I and observing w in B, assuming P(w|I) = P(w|B) = p
‣ the likelihood of observing w in I and observing w in B, assuming P(w|I) = p_I and P(w|B) = p_B
• λ(w) = [C(N_I, x) p^x (1−p)^{N_I−x} × C(N_B, y) p^y (1−p)^{N_B−y}]
  / [C(N_I, x) p_I^x (1−p_I)^{N_I−x} × C(N_B, y) p_B^y (1−p_B)^{N_B−y}]
‣ where x and y are the counts of w in the input corpus I and the
background corpus B, N_I and N_B are the corpus sizes, and the
maximum-likelihood estimates are p = (x+y)/(N_I+N_B), p_I = x/N_I, p_B = y/N_B
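The ratio is easiest to compute in log space, and the binomial coefficients cancel between numerator and denominator, so they can be dropped. A sketch (function names and the toy counts are illustrative):

```python
import math

def log_binom_likelihood(k, n, p):
    """log[p^k (1-p)^(n-k)]; the binomial coefficients are identical in
    the numerator and denominator of lambda(w), so they are omitted."""
    if p in (0.0, 1.0):                      # guard against log(0)
        return 0.0 if k == n * p else float("-inf")
    return k * math.log(p) + (n - k) * math.log(1 - p)

def llr_weight(x, n_i, y, n_b, threshold=10.0):
    """weight(w) = 1 if -2 log lambda(w) > threshold, else 0.
    x, y: counts of w in input I and background B; n_i, n_b: corpus sizes."""
    p = (x + y) / (n_i + n_b)     # shared MLE, assuming P(w|I) = P(w|B)
    p_i, p_b = x / n_i, y / n_b   # separate MLEs
    log_lambda = (log_binom_likelihood(x, n_i, p)
                  + log_binom_likelihood(y, n_b, p)
                  - log_binom_likelihood(x, n_i, p_i)
                  - log_binom_likelihood(y, n_b, p_b))
    return 1 if -2 * log_lambda > threshold else 0

salient = llr_weight(50, 1000, 10, 10000)      # 50x more frequent in I -> 1
not_salient = llr_weight(5, 1000, 50, 10000)   # same relative rate -> 0
```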
Saliency of a Sentence?
• weight(s) = (1/|S|) Σ_{w∈S} weight(w)
• Only consider non-stop words in S
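A sketch of the sentence-level score, averaging word weights over the non-stop words (the stop list and weights below are toy values):

```python
STOPWORDS = {"the", "a", "is", "of", "in", "has"}  # toy stop list

def sentence_weight(sentence, word_weight):
    """Average weight(w) over the non-stop words of the sentence."""
    content = [w for w in sentence if w not in STOPWORDS]
    if not content:
        return 0.0
    return sum(word_weight.get(w, 0.0) for w in content) / len(content)

word_weight = {"mars": 3.0, "weather": 2.0, "frigid": 1.0}
score = sentence_weight(["mars", "has", "frigid", "weather"], word_weight)
# "has" is dropped; (3.0 + 1.0 + 2.0) / 3 = 2.0
```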
Method 3: Sentence Centrality
• Alternative approach to ranking sentences
• Measure distance between sentences, and
choose sentences that are closer to other
sentences
• Use tf-idf BOW to represent sentence
• Use cosine similarity to measure distance
• centrality(s) = (1/#sent) Σ_{s′} cos_tfidf(s, s′)
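Centrality can be sketched with sparse tf-idf vectors represented as dicts (the vectors below are toy values; here each score averages over the other sentences):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse tf-idf vectors (word -> weight)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def centrality(sentences):
    """Score each sentence by its average cosine similarity to the
    other sentences; higher = more central."""
    scores = []
    for i, s in enumerate(sentences):
        sims = [cosine(s, t) for j, t in enumerate(sentences) if j != i]
        scores.append(sum(sims) / len(sims) if sims else 0.0)
    return scores

scores = centrality([
    {"mars": 1.0},                  # overlaps only with the 2nd sentence
    {"mars": 1.0, "weather": 1.0},  # overlaps with both -> most central
    {"weather": 1.0},
])
```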
Final Extracted Summary
• Use top-ranked sentences as extracted summary
‣ Saliency (tf-idf or log likelihood ratio)
‣ Centrality
Method 4: RST Parsing
With its distant orbit – 50 percent farther from the sun
than Earth – and slim atmospheric blanket, Mars
experiences frigid weather conditions. Surface
temperatures typically average about -70 degrees
Fahrenheit at the equator, and can dip to -123 degrees C
near the poles. Only the midday sun at tropical latitudes
is warm enough to thaw ice on occasion, but any liquid
water formed in this way would evaporate almost
instantly because of the low atmospheric pressure.
Although the atmosphere holds a small amount of water,
and water-ice clouds sometimes develop, most Martian
weather involves blowing dust or carbon dioxide.
Method 4: RST Parsing
• Rhetorical structure theory (L12, Discourse):
explain how clauses are connected
• Define the types of relations between a nucleus
(main clause) and a satellite (supporting clause)
Method 4: RST Parsing
• Nucleus more important than satellite
• A sentence that functions as a nucleus to more sentences = more
salient
Which sentence is the best summary sentence?
(PollEv.com/jeyhanlau569)
Extractive: Multi-Doc
Summarisation System
• Similar to single-document extractive
summarisation system
• Challenges:
‣ Redundancy in terms of information
‣ Sentence ordering
Content Selection
• We can use the same unsupervised content
selection methods (tf-idf, log likelihood ratio,
centrality) to select salient sentences
• But ignore sentences that are redundant
Maximum Marginal Relevance
• Iteratively select the best sentence to add to
summary
• Sentences to be added must be novel
• Penalise a candidate sentence if it’s similar to
extracted sentences:
‣ MMR-penalty(s) = λ max_{s_i∈S} sim(s, s_i)
• Stop when a desired number of sentences are
added
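A sketch of the MMR selection loop; the saliency scores and similarity function below are toy stand-ins for the tf-idf/LLR scores and cosine similarity described above:

```python
def mmr_select(candidates, saliency, sim, n, lam=0.5):
    """Iteratively pick the sentence maximising
    saliency(s) - lam * max similarity to already-selected sentences."""
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < n:
        def score(s):
            penalty = lam * max((sim(s, t) for t in selected), default=0.0)
            return saliency[s] - penalty
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

saliency = {"s1": 1.0, "s2": 0.9, "s3": 0.5}
def sim(a, b):  # s1 and s2 are near-duplicates
    return 0.95 if {a, b} == {"s1", "s2"} else 0.0

summary = mmr_select(["s1", "s2", "s3"], saliency, sim, n=2, lam=1.0)
# s2 is heavily penalised once s1 is selected, so s3 is picked instead
```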
Information Ordering
• Chronological ordering:
‣ Order by document dates
• Coherence:
‣ Order in a way that makes adjacent sentences similar
‣ Order based on how entities are organised (centering
theory, L12)
Sentence Realisation
• Make sure entities are referred to coherently
‣ Full name at first mention
‣ Last name at subsequent mentions
• Apply coreference methods to first extract names
• Write rules to clean up
Abstractive: Single-Doc
Example
• Paraphrase
• A very difficult task
• Can we train a neural network to generate
summary?
Document: a detained iranian-american academic accused of acting against
national security has been released from a tehran prison after a
hefty bail was posted, a top judiciary official said tuesday
Summary: iranian-american academic held in tehran released on bail
Encoder-Decoder?
[Figure: an encoder RNN reads the source sentence "牛 吃 草"; a decoder
RNN generates the target sentence "cow eats grass"]
• What if we treat:
‣ Source sentence = “document”
‣ Target sentence = “summary”
Encoder-Decoder?
Document: a detained iranian-american academic accused of acting against
national security has been released from a tehran prison after a
hefty bail was posted, a top judiciary official said tuesday
Summary: iranian-american academic held in tehran released on bail
[Figure: the encoder RNN reads the document as the source sentence; the
decoder RNN generates the summary as the target sentence]
Data
• News headlines
• Document: First sentence of article
• Summary: News headline/title
• Technically more like a “headline generation task”
And It Kind of Works…
Rush et al. (2015): A Neural Attention Model for Abstractive Sentence Summarization
More Summarisation Data
• But headline generation isn’t really exciting…
• Other summarisation data:
‣ CNN/Dailymail: 300K articles, summary in bullets
‣ Newsroom: 1.3M articles, summary by authors
– Diverse; 38 major publications
‣ XSum: 200K BBC articles
– Summary is more abstractive than other
datasets
Improvements
• Attention mechanism
• Richer word features: POS tags, NER tags, tf-idf
• Hierarchical encoders
‣ One LSTM for words
‣ Another LSTM for sentences
[Figure: a flat encoder (input → hidden → output) vs. a hierarchical
encoder (input → word-level hidden → sentence-level hidden → output)]
Nallapati et al. (2016): Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
Potential issues of an attention encoder-decoder summarisation system?
(PollEv.com/jeyhanlau569)
• Has the potential to generate new details not in the
source document
• Unable to handle unseen words in the source document
• Information bottleneck: a vector is used to represent the
source document
• Can only generate one summary
Encoder-decoder with Attention
[Figure from See et al. (2017): Get To The Point: Summarization with Pointer-Generator Networks]
Encoder-decoder with Attention + Copying
See et al. (2017): Get To The Point: Summarization with Pointer-Generator Networks
P(Argentina) = (1 − p_gen) × P_attn(Argentina) + p_gen × P_voc(Argentina)
‣ p_gen is a scalar (e.g. 0.8) giving the probability of generating from
the vocabulary; (1 − p_gen) is the probability of "copying" from the
source via attention
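The mixing step alone (not the full network) can be sketched as follows; the distributions are toy values, and word strings stand in for vocabulary entries and source positions:

```python
def final_distribution(p_gen, p_vocab, p_attn):
    """Pointer-generator output distribution:
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_attn(w),
    where P_attn is the attention distribution over source words."""
    words = set(p_vocab) | set(p_attn)
    return {w: p_gen * p_vocab.get(w, 0.0) + (1 - p_gen) * p_attn.get(w, 0.0)
            for w in words}

# "smergle" is out-of-vocabulary: P_vocab(smergle) = 0, so its
# probability comes entirely from the copy (attention) term
p = final_distribution(0.8,
                       {"released": 0.6, "held": 0.4},     # P_vocab
                       {"smergle": 0.5, "released": 0.5})  # P_attn
```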
Copy Mechanism
• Generate summaries that reproduce details in the
document
• Can produce out-of-vocabulary words in the summary
by copying them from the document
‣ e.g. smergle = out of vocabulary
‣ P(smergle) = attention probability + generation
probability = attention probability, since the generation
probability of an out-of-vocabulary word is 0
Latest Development
• State-of-the-art models use transformers instead
of RNNs
• Lots of pre-training
• Note: BERT not directly applicable because we
need a unidirectional decoder (BERT is only an
encoder)
Evaluation
ROUGE
(Recall-Oriented Understudy for Gisting Evaluation)
• Similar to BLEU, evaluates the degree of word
overlap between generated summary and
reference/human summary
• But recall oriented
• Measures overlap in N-grams separately (e.g.
from 1 to 3)
• ROUGE-2: calculates the percentage of bigrams
from the reference that are in the generated
summary
ROUGE-2: Example
• Ref 1: Water spinach is a green leafy vegetable grown in the tropics.
• Ref 2: Water spinach is a commonly eaten leaf vegetable of Asia.
• Generated summary: Water spinach is a leaf vegetable commonly
eaten in tropical areas of Asia.
• ROUGE-2 = (3 + 6) / (10 + 9) ≈ 0.47
‣ 3 and 6 are the bigrams of Ref 1 and Ref 2 that appear in the
generated summary; 10 and 9 are the total bigrams in Ref 1 and Ref 2
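The example can be checked with a few lines of code; this is a bare-bones recall computation, not the full ROUGE toolkit (no stemming or stopword handling):

```python
from collections import Counter

def bigrams(tokens):
    return Counter(zip(tokens, tokens[1:]))

def rouge2(references, generated):
    """Overlapping bigrams with each reference, divided by the total
    number of bigrams across all references (recall-oriented)."""
    gen = bigrams(generated)
    overlap = total = 0
    for ref in references:
        ref_bigrams = bigrams(ref)
        overlap += sum(min(c, gen[b]) for b, c in ref_bigrams.items())
        total += sum(ref_bigrams.values())
    return overlap / total

ref1 = "water spinach is a green leafy vegetable grown in the tropics".split()
ref2 = "water spinach is a commonly eaten leaf vegetable of asia".split()
gen = ("water spinach is a leaf vegetable commonly eaten "
       "in tropical areas of asia").split()
score = rouge2([ref1, ref2], gen)  # (3 + 6) / (10 + 9)
```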
A Final Word
• Research focus on single-document abstractive
summarisation
‣ Mostly news data
• But many types of data for summarisation:
‣ Images, videos
‣ Graphs
‣ Structured data: e.g. patient records, tables
• Multi-document abstractive summarisation