Summarisation
COMP90042 Natural Language Processing, Lecture 21
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE
Summarisation
• Distill the most important information from a text to produce a shortened or abridged version
• Applications:
  ‣ outlines of a document
  ‣ abstracts of a scientific article
  ‣ headlines of a news article
  ‣ snippets of a search result
What to Summarise?
• Single-document summarisation
  ‣ Input: a single document
  ‣ Output: a summary that characterises the content
• Multi-document summarisation
  ‣ Input: multiple documents
  ‣ Output: a summary that captures the gist of all documents
  ‣ E.g. summarise a news event from multiple sources or perspectives
How to Summarise?
• Extractive summarisation
  ‣ Summarise by selecting representative sentences from the documents
• Abstractive summarisation
  ‣ Summarise the content in your own words
  ‣ Summaries will often be paraphrases of the original content
Why Summarise?
• Generic summarisation
  ‣ Summary gives important information in the document(s)
• Query-focused summarisation
  ‣ Summary responds to a user query
  ‣ Similar to question answering
  ‣ But the answer is much longer (not just a phrase)
Query-Focused Summarisation
[Figure: an example of a query-focused summary.]
Outline
• Extractive summarisation
  ‣ Single-document
  ‣ Multi-document
• Abstractive summarisation
  ‣ Single-document (deep learning models!)
• Evaluation
Extractive: Single-Doc
Summarisation System
• Content selection: select which sentences to extract from the document
• Information ordering: decide how to order the extracted sentences
• Sentence realisation: clean up to make sure the combined sentences are fluent
• We will focus on content selection
• For single-document summarisation, information ordering is not necessary
  ‣ present the extracted sentences in their original order
• Sentence realisation is also not necessary if the sentences are presented as dot points
Content Selection
• Not much data with ground-truth extractive sentences
• Mostly unsupervised methods
• Goal: find sentences that are important or salient
Method 1: TF-IDF
• Frequent words in a document → salient
• But some generic words are very frequent but uninformative
  ‣ function words
  ‣ stop words
• Weigh each word w in document d by its inverse document frequency (a sketch follows):

  weight(w) = tf_{d,w} × idf_w
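A minimal sketch of this weighting in Python, assuming pre-tokenised documents (the function and variable names are illustrative):

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, corpus):
    """Weight each word w in a document by tf(d, w) * idf(w).

    doc_tokens: token list for the target document d.
    corpus: list of token lists, one per document, used to estimate idf.
    """
    n_docs = len(corpus)
    df = Counter()                      # document frequency of each word
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(doc_tokens)            # term frequency in d
    # idf = log(N / df): words that occur in every document get weight ~0
    return {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
```

Frequent content words in d receive high weight, while words that appear in most documents (function words, stop words) are suppressed by the idf term.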
Method 2: Log Likelihood Ratio
• Intuition: a word is salient if its probability in the input corpus is very different from that in a background corpus:

  weight(w) = 1 if −2 log λ(w) > 10, and 0 otherwise

• λ(w) is the ratio between two likelihoods of observing w in the input corpus I and the background corpus B, where x and y are the counts of w in I and B, N_I and N_B are the corpus sizes, and C(n, k) is the binomial coefficient (a sketch follows):
  ‣ numerator: assuming P(w|I) = P(w|B) = p, with p = (x + y) / (N_I + N_B):

    C(N_I, x) p^x (1 − p)^{N_I − x} × C(N_B, y) p^y (1 − p)^{N_B − y}

  ‣ denominator: assuming P(w|I) = p_I = x / N_I and P(w|B) = p_B = y / N_B:

    C(N_I, x) p_I^x (1 − p_I)^{N_I − x} × C(N_B, y) p_B^y (1 − p_B)^{N_B − y}
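A sketch of the test in Python. The binomial coefficients are identical in the numerator and denominator, so they cancel in the ratio and can be omitted; the threshold of 10 follows the slide:

```python
import math

def log_bernoulli(k, n, p):
    # log of p^k * (1 - p)^(n - k), treating 0 * log(0) as 0
    out = 0.0
    if k > 0:
        out += k * math.log(p)
    if n - k > 0:
        out += (n - k) * math.log(1 - p)
    return out

def llr_salient(x, y, N_I, N_B, threshold=10.0):
    """x, y: counts of the word in input corpus I and background corpus B;
    N_I, N_B: total corpus sizes. Returns True if -2 log lambda > threshold."""
    p = (x + y) / (N_I + N_B)            # shared probability (numerator)
    p_I, p_B = x / N_I, y / N_B          # corpus-specific (denominator)
    log_lambda = (log_bernoulli(x, N_I, p) + log_bernoulli(y, N_B, p)
                  - log_bernoulli(x, N_I, p_I) - log_bernoulli(y, N_B, p_B))
    return -2 * log_lambda > threshold
```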
Saliency of a Sentence?
• weight(s) = (1/|S|) × Σ_{w∈S} weight(w)
• Only consider non-stop words in S (see the sketch below)
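A sketch of the sentence score, reusing the per-word weights from either method above (the stop-word list is an assumed input):

```python
def sentence_weight(sentence_tokens, word_weights, stopwords):
    """Average the weights of the non-stop words in a sentence."""
    content = [w for w in sentence_tokens if w not in stopwords]
    if not content:
        return 0.0
    return sum(word_weights.get(w, 0.0) for w in content) / len(content)
```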
Method 3: Sentence Centrality
• An alternative approach to ranking sentences
• Measure distance between sentences, and choose sentences that are closer to other sentences
• Use tf-idf to represent a sentence
• Use cosine similarity to measure distance (see the sketch below):

  centrality(s) = (1/#sent) × Σ_{s′} cos_tfidf(s, s′)
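A sketch with NumPy, assuming each sentence has already been mapped to a tf-idf vector; self-similarity is excluded from the average here, which is one reasonable reading of the formula:

```python
import numpy as np

def centrality_scores(tfidf_matrix):
    """tfidf_matrix: (n_sentences, n_terms) array, one tf-idf vector per
    sentence. Returns each sentence's mean cosine similarity to the others."""
    X = np.asarray(tfidf_matrix, dtype=float)
    X = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sim = X @ X.T                  # pairwise cosine similarities
    np.fill_diagonal(sim, 0.0)     # drop each sentence's similarity to itself
    return sim.mean(axis=1)
```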
Final Extracted Summary
• Use the top-ranked sentences as the extracted summary, ranked by:
  ‣ saliency (tf-idf or log likelihood ratio)
  ‣ centrality
Method 4: RST Parsing
Example text: "With its distant orbit – 50 percent farther from the sun than Earth – and slim atmospheric blanket, Mars experiences frigid weather conditions. Surface temperatures typically average about -70 degrees Fahrenheit at the equator, and can dip to -123 degrees C near the poles. Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, but any liquid water formed in this way would evaporate almost instantly because of the low atmospheric pressure. Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, most Martian weather involves blowing dust or carbon dioxide."
• Rhetorical structure theory (L12, Discourse) explains how clauses are connected
• It defines the types of relations between a nucleus (main clause) and a satellite (supporting clause)
• The nucleus is more important than the satellite
• A sentence that functions as a nucleus to more sentences = more salient
Extractive: Multi-Doc
Summarisation System
• Similar to the single-document extractive summarisation system
• Challenges:
  ‣ redundancy in terms of information
  ‣ sentence ordering
Content Selection
• We can use the same unsupervised content selection methods (tf-idf, log likelihood ratio, centrality) to select salient sentences
• But ignore sentences that are redundant
Maximum Marginal Relevance
• Iteratively select the best sentence to add to the summary
• Sentences to be added must be novel
• Penalise a candidate sentence if it is similar to already extracted sentences (see the sketch below):

  MMR-penalty(s) = λ × max_{s_i ∈ S} sim(s, s_i)

• Stop when a desired number of sentences has been added
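A greedy sketch of the selection loop. The combined score used here (saliency minus the penalty from the slide) and the default λ are assumptions; only the penalty term comes from the slide:

```python
def mmr_select(saliency, sim, k, lam=0.5):
    """saliency[i]: salience score of sentence i; sim[i][j]: similarity
    between sentences i and j; k: number of sentences to extract."""
    selected, candidates = [], set(range(len(saliency)))
    while candidates and len(selected) < k:
        def score(i):
            # penalise similarity to the most similar already-selected sentence
            penalty = max((sim[i][j] for j in selected), default=0.0)
            return saliency[i] - lam * penalty
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```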
Sentence Simplification
• Create multiple simplified versions of sentences before extraction
• E.g. "Former Democratic National Committee finance director Richard Sullivan faced more pointed questioning from Republicans during his second day on the witness stand in the Senate's fund-raising investigation"
  ‣ Richard Sullivan faced pointed questioning
  ‣ Richard Sullivan faced pointed questioning from Republicans during day on stand in Senate fundraising investigation
• Use MMR to make sure only non-redundant sentences are selected
Information Ordering
• Chronological ordering:
  ‣ order by document dates
• Coherence:
  ‣ order in a way that makes adjacent sentences similar (one greedy sketch below)
  ‣ order based on how entities are organised (centering theory, L12)
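One simple greedy instantiation of the coherence idea (my own illustration, not from the lecture): start from an arbitrary sentence and repeatedly append the unplaced sentence most similar to the last one placed:

```python
def order_by_coherence(sentences, sim, start=0):
    """Greedy ordering that keeps adjacent sentences similar.
    sim[i][j]: similarity between sentences i and j."""
    remaining = set(range(len(sentences))) - {start}
    order = [start]
    while remaining:
        nxt = max(remaining, key=lambda j: sim[order[-1]][j])
        order.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in order]
```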
Sentence Realisation
• Make sure entities are referred to coherently:
  ‣ full name at first mention
  ‣ last name at subsequent mentions
• Apply coreference methods to first extract names
• Write rules to clean up (toy sketch below)
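A toy sketch of such a rule, assuming a coreference system has already linked the mentions; the plain string matching here is purely illustrative:

```python
def realise_name(sentences, full_name):
    """Keep the full name at its first mention, then use the last name."""
    last_name = full_name.split()[-1]
    seen = False
    out = []
    for s in sentences:
        if full_name in s:
            if seen:
                s = s.replace(full_name, last_name)
            seen = True
        out.append(s)
    return out

# e.g. realise_name(summary, "Richard Sullivan") keeps "Richard Sullivan"
# at the first mention and writes "Sullivan" afterwards
```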
Abstractive: Single-Doc
Example
• Input document: "a detained iranian-american academic accused of acting against national security has been released from a tehran prison after a hefty bail was posted, a top judiciary official said tuesday"
• Summary: "iranian-american academic held in tehran released on bail"
• The summary is a paraphrase of the document
• A very difficult task
• Can we train a neural network to generate the summary?
Encoder-Decoder?
[Figure: an encoder RNN reads the source sentence "牛吃草" and a decoder RNN generates the target sentence "cow eats grass", as in machine translation.]
• What if we treat:
  ‣ source sentence = "document"
  ‣ target sentence = "summary"
[Figure: the same encoder-decoder applied to summarisation: the encoder reads the document "a detained iranian-american academic accused of acting against national security has been released from a tehran prison after a hefty bail was posted, a top judiciary official said tuesday" and the decoder generates the summary "iranian-american academic held in tehran released on bail". A minimal code sketch follows.]
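A minimal PyTorch sketch of this idea (no attention; the sizes and names are illustrative, not the lecture's model). The encoder's final state summarises the document and initialises the decoder, which is trained with teacher forcing on the reference summary:

```python
import torch
import torch.nn as nn

class Seq2SeqSummariser(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, doc_ids, summary_ids):
        # doc_ids: (batch, doc_len); summary_ids: (batch, sum_len)
        _, h = self.encoder(self.embed(doc_ids))               # encode document
        dec_out, _ = self.decoder(self.embed(summary_ids), h)  # teacher forcing
        return self.out(dec_out)         # (batch, sum_len, vocab_size) logits
```

Training minimises the cross-entropy of each summary token; at test time the summary is decoded greedily or with beam search.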
Data
• News headlines
• Document: first sentence of the article
• Summary: news headline/title
• Technically more like a "headline generation" task
And It Kind of Works…
Rush et al. (2015): A Neural Attention Model for Abstractive Sentence Summarisation
Improvements
• Attention mechanism
• Richer word features: POS tags, NER tags, tf-idf
• Hierarchical encoders
  ‣ one LSTM for words
  ‣ another LSTM for sentences
[Figure: a flat encoder (input → hidden → output) compared with a hierarchical encoder (input → word-level hidden states → sentence-level hidden states → output).]
Nallapati et al. (2016): Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
Issues
• Occasionally reproduces statements incorrectly (hallucinates new details!)
• Unable to handle out-of-vocab words in the document
  ‣ generates UNK in the summary
  ‣ e.g. new names in test documents
• Solution: allow the decoder to copy words directly from the input document during generation
Encoder-decoder with Attention
See et al. (2017): Get To The Point: Summarization with Pointer-Generator Networks
Encoder-decoder with Attention + Copying

  P(Argentina) = (1 − p_gen) × P_attn(Argentina) + p_gen × P_voc(Argentina)

• p_gen is a scalar (e.g. 0.8): the probability of generating from the vocabulary
• (1 − p_gen) × P_attn is the probability of "copying" the word from the input via the attention distribution
See et al. (2017): Get To The Point: Summarization with Pointer-Generator Networks
Copy Mechanism
• Generates summaries that reproduce details in the document
• Can produce out-of-vocab words in the summary by copying them from the document (see the sketch below)
  ‣ e.g. smergle = out of vocabulary
  ‣ P(smergle) = attention probability + generation probability = attention probability, since an out-of-vocabulary word has zero generation probability
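A sketch of how the final distribution can be assembled, following the pointer-generator formulation: out-of-vocabulary source words are given temporary ids beyond the fixed vocabulary (an "extended vocabulary"), so their only probability mass comes from the attention (copy) term:

```python
import numpy as np

def final_distribution(p_gen, p_vocab, attention, src_ids, vocab_size, n_oov):
    """p_gen: scalar probability of generating from the vocabulary.
    p_vocab: (vocab_size,) distribution over the fixed output vocabulary.
    attention: (src_len,) attention weights over the source positions.
    src_ids: (src_len,) source token ids; OOV words use ids >= vocab_size."""
    extended = np.zeros(vocab_size + n_oov)
    extended[:vocab_size] = p_gen * p_vocab
    # scatter-add the copy probability onto the words present in the source
    np.add.at(extended, src_ids, (1 - p_gen) * attention)
    return extended
```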
Generated Summaries
[Figure: example summaries generated by the model.]
More Summarisation Data
• But headline generation isn't really exciting…
• Latest summarisation data:
  ‣ CNN/Dailymail: 300K articles, summary in bullets
  ‣ Newsroom: 1.3M articles, summary by authors
    – diverse; 38 major publications
  ‣ XSum: 200K BBC articles
    – summary is more abstractive than other datasets
Latest Development
• State-of-the-art models use transformers instead of RNNs
• Lots of pre-training
• Note: BERT is not directly applicable because we need a unidirectional decoder (BERT is only an encoder)
Evaluation
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
• Similar to BLEU: evaluates the degree of word overlap between the generated summary and reference/human summaries
• But recall-oriented
• Measures overlap in N-grams (e.g. from 1 to 3)
• ROUGE-2: calculates the percentage of bigrams from the references that appear in the generated summary
ROUGE-2: Example
• Ref 1: Water spinach is a green leafy vegetable grown in the tropics.
• Ref 2: Water spinach is a commonly eaten leaf vegetable of Asia.
• Generated summary: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.
• ROUGE-2 = (3 + 6) / (10 + 9) ≈ 0.47 (see the sketch below)
  ‣ Ref 1 has 10 bigrams, of which 3 appear in the generated summary; Ref 2 has 9 bigrams, of which 6 appear
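A minimal sketch of this multi-reference ROUGE-2 (real implementations add tokenisation, stemming, and clipping conventions that may differ):

```python
from collections import Counter

def bigrams(tokens):
    return list(zip(tokens, tokens[1:]))

def rouge_2(references, generated):
    """(matched reference bigrams) / (total reference bigrams), with match
    counts clipped by how often each bigram occurs in the generated summary."""
    gen_counts = Counter(bigrams(generated))
    matched = total = 0
    for ref in references:
        ref_counts = Counter(bigrams(ref))
        total += sum(ref_counts.values())
        matched += sum(min(c, gen_counts[b]) for b, c in ref_counts.items())
    return matched / total

refs = ["water spinach is a green leafy vegetable grown in the tropics".split(),
        "water spinach is a commonly eaten leaf vegetable of asia".split()]
gen = "water spinach is a leaf vegetable commonly eaten in tropical areas of asia".split()
print(rouge_2(refs, gen))   # (3 + 6) / (10 + 9) ≈ 0.47
```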
A Final Word
• Research focuses on single-document abstractive summarisation
  ‣ mostly news data
• But there are many types of data for summarisation:
  ‣ images, videos
  ‣ graphs
  ‣ structured data: e.g. patient records, tables
• Multi-document abstractive summarisation