
Summarisation
COMP90042
Natural Language Processing Lecture 21
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE

Summarisation
• Distill the most important information from a text to produce a shortened or abridged version
• Applications:
‣ outlines of a document
‣ abstracts of a scientific article
‣ headlines of a news article
‣ snippets of a search result

What to Summarise?
• Single-document summarisation
‣ Input: a single document
‣ Output: summary that characterises the content
• Multi-document summarisation
‣ Input: multiple documents
‣ Output: summary that captures the gist of all documents
‣ E.g. summarise a news event from multiple sources or perspectives

How to Summarise?
• Extractive summarisation
‣ Summarise by selecting representative sentences from documents
• Abstractive summarisation
‣ Summarise the content in your own words
‣ Summaries will often be paraphrases of the original content

Why Summarise?
• Generic summarisation
‣ Summary gives important information in the document(s)
• Query-focused summarisation
‣ Summary responds to a user query
‣ Similar to question answering
‣ But answer is much longer (not just a phrase)

Query-Focused Summarisation

Outline
• Extractive summarisation
‣ Single-document
‣ Multi-document
• Abstractive summarisation
‣ Single-document (deep learning models!)
• Evaluation

Extractive: Single-Doc

Summarisation System
• Content selection: select what sentences to extract from the document
• Information ordering: decide how to order extracted sentences
• Sentence realisation: cleanup to make sure combined sentences are fluent

Summarisation System
• We will focus on content selection
• For single-document summarisation, information ordering is not necessary
‣ present extracted sentences in their original order
• Sentence realisation is also not necessary if sentences are presented as dot points

Content Selection
• Not much data with ground truth extractive sentences
• Mostly unsupervised methods
• Goal: find sentences that are important or salient

Method 1: TF-IDF
• Frequent words in a doc → salient
• But some generic words are very frequent but uninformative
‣ function words
‣ stop words
• Weigh each word w in document d by its frequency in the document, scaled by its inverse document frequency:

$$\text{weight}(w) = tf_{d,w} \times idf_w$$
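As a concrete illustration, here is a minimal Python sketch of this weighting. The function name and the use of a separate background collection to estimate document frequencies are illustrative assumptions, not from the lecture:

```python
import math
from collections import Counter

def tfidf_weights(doc_tokens, background_docs):
    """Weight each word w in a document by tf × idf, using a (non-empty)
    collection of background documents to estimate document frequencies."""
    n_docs = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))          # count each word once per document
    tf = Counter(doc_tokens)
    return {
        w: tf[w] * math.log(n_docs / (1 + df[w]))  # +1 avoids division by zero
        for w in tf
    }
```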

Method 2: Log Likelihood Ratio
• Intuition: a word is salient if its probability in the input corpus I is very different from its probability in a background corpus B:

$$\text{weight}(w) = \begin{cases} 1, & \text{if } -2\log\lambda(w) > 10 \\ 0, & \text{otherwise} \end{cases}$$

• λ(w) is the ratio between two likelihoods, where x and y are the counts of w in I and B, and N_I and N_B are the sizes of I and B:
‣ P(observing w in I) and P(observing w in B), assuming P(w|I) = P(w|B) = p = (x + y)/(N_I + N_B):

$$\binom{N_I}{x} p^x (1-p)^{N_I-x} \binom{N_B}{y} p^y (1-p)^{N_B-y}$$

‣ P(observing w in I) and P(observing w in B), assuming P(w|I) = p_I = x/N_I and P(w|B) = p_B = y/N_B:

$$\binom{N_I}{x} p_I^x (1-p_I)^{N_I-x} \binom{N_B}{y} p_B^y (1-p_B)^{N_B-y}$$
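A sketch of this test in Python (names are illustrative). Note that the binomial coefficients cancel in the ratio, so only the likelihood terms need to be computed:

```python
import math

def llr_weight(x, y, n_i, n_b, threshold=10.0):
    """Log likelihood ratio salience (a sketch).
    x, y: counts of the word in input corpus I and background corpus B;
    n_i, n_b: total token counts of I and B."""
    def log_likelihood(k, n, p):
        # log of p^k (1-p)^(n-k); the binomial coefficients cancel in λ(w).
        # p hits 0 or 1 only in degenerate MLE cases where the term equals 1.
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return k * math.log(p) + (n - k) * math.log(1.0 - p)

    p = (x + y) / (n_i + n_b)        # null hypothesis: one shared probability
    p_i, p_b = x / n_i, y / n_b      # alternative: corpus-specific probabilities
    log_lambda = (log_likelihood(x, n_i, p) + log_likelihood(y, n_b, p)
                  - log_likelihood(x, n_i, p_i) - log_likelihood(y, n_b, p_b))
    return 1.0 if -2.0 * log_lambda > threshold else 0.0
```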

Saliency of a Sentence?
• Average the weights of a sentence's words:

$$\text{weight}(s) = \frac{1}{|S|} \sum_{w \in S} \text{weight}(w)$$

• Only consider non-stop words in S
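A sketch of this scoring, assuming `word_weights` comes from tf-idf or the LLR test above and the stopword list is supplied by the caller:

```python
def sentence_saliency(sentence_tokens, word_weights, stopwords):
    """Score a sentence as the mean weight of its non-stop words."""
    content = [w for w in sentence_tokens if w not in stopwords]
    if not content:
        return 0.0
    return sum(word_weights.get(w, 0.0) for w in content) / len(content)
```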

Method 3: Sentence Centrality
• Alternative approach to ranking sentences
• Measure distance between sentences, and choose sentences that are closer to other sentences
• Use tf-idf to represent sentences
• Use cosine similarity to measure distance:

$$\text{centrality}(s) = \frac{1}{\#\text{sent}} \sum_{s'} \cos_{tfidf}(s, s')$$
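A minimal sketch using scikit-learn (a library choice of mine; the lecture does not prescribe one). Unlike the formula above, this version excludes each sentence's similarity to itself, and assumes at least two sentences:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def sentence_centrality(sentences):
    """Score each sentence by its average cosine similarity (over tf-idf
    vectors) to all other sentences in the document(s)."""
    vectors = TfidfVectorizer().fit_transform(sentences)
    sim = cosine_similarity(vectors)   # pairwise similarity matrix
    np.fill_diagonal(sim, 0.0)         # exclude self-similarity
    return sim.sum(axis=1) / (len(sentences) - 1)
```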

Final Extracted Summary
• Use top-ranked sentences as the extracted summary, ranked by:
‣ Saliency (tf-idf or log likelihood ratio)
‣ Centrality
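A small sketch of this final step (hypothetical names), taking scores from either method and presenting the selected sentences in their original document order, as the earlier slide suggests:

```python
def extract_summary(sentences, scores, k=3):
    """Take the k top-scoring sentences, restoring document order."""
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```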

Method 4: RST Parsing
With its distant orbit – 50 percent farther from the sun than Earth – and slim atmospheric blanket, Mars experiences frigid weather conditions. Surface temperatures typically average about -70 degrees Fahrenheit at the equator, and can dip to -123 degrees C near the poles. Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, but any liquid water formed in this way would evaporate almost instantly because of the low atmospheric pressure. Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop, most Martian weather involves blowing dust or carbon dioxide.

Method 4: RST Parsing
• Rhetorical structure theory (L12, Discourse): explains how clauses are connected
• Defines the types of relations between a nucleus (main clause) and a satellite (supporting clause)

Method 4: RST Parsing
• Nucleus is more important than satellite
• A sentence that functions as a nucleus to more sentences = more salient

Extractive: Multi-Doc

Summarisation System
• Similar to single-document extractive summarisation system
• Challenges:
‣ Redundancy in terms of information
‣ Sentence ordering

Content Selection
• We can use the same unsupervised content selection methods (tf-idf, log likelihood ratio, centrality) to select salient sentences
• But ignore sentences that are redundant

Maximum Marginal Relevance
• Iteratively select the best sentence to add to the summary
• Sentences to be added must be novel
• Penalise a candidate sentence if it’s similar to already extracted sentences:

$$\text{MMR-penalty}(s) = \lambda \max_{s_i \in \mathcal{S}} \text{sim}(s, s_i)$$

• Stop when a desired number of sentences are added
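A greedy MMR loop as a sketch. The `saliency` and `similarity` scoring functions are caller-supplied assumptions, and combining salience minus the penalty above is the standard MMR objective rather than something spelled out on the slide:

```python
def mmr_select(candidates, saliency, similarity, k, lam=0.5):
    """Greedily pick k sentences, trading salience off against redundancy
    with respect to what has already been selected."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def mmr_score(s):
            # penalty is the max similarity to any already-selected sentence
            penalty = max((similarity(s, t) for t in selected), default=0.0)
            return saliency(s) - lam * penalty
        best = max(pool, key=mmr_score)
        selected.append(best)
        pool.remove(best)
    return selected
```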

Sentence Simplification
• Create multiple simplified versions of sentences before extraction
• Former Democratic National Committee finance director Richard Sullivan faced more pointed questioning from Republicans during his second day on the witness stand in the Senate’s fund-raising investigation
‣ Richard Sullivan faced pointed questioning
‣ Richard Sullivan faced pointed questioning from Republicans during day on stand in Senate fundraising investigation
• Use MMR to make sure only non-redundant sentences are selected

Information Ordering
• Chronological ordering:
‣ Order by document dates
• Coherence:
‣ Order in a way that makes adjacent sentences similar
‣ Order based on how entities are organised (centering theory, L12)

Sentence Realisation
• Make sure entities are referred to coherently
‣ Full name at first mention
‣ Last name at subsequent mentions
• Apply coreference methods to first extract names
• Write rules to clean up

Abstractive: Single-Doc

Example
• Input: a detained iranian-american academic accused of acting against national security has been released from a tehran prison after a hefty bail was posted, a top judiciary official said tuesday
• Summary: iranian-american academic held in tehran released on bail
• The summary is a paraphrase of the input
• A very difficult task
• Can we train a neural network to generate the summary?

Encoder-Decoder?
[Figure: an encoder RNN reads the source sentence 牛吃草 (“cow eats grass”); its final state feeds a decoder RNN that generates the target sentence “cow eats grass”]
• What if we treat:
‣ Source sentence = “document”
‣ Target sentence = “summary”
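A minimal PyTorch sketch of this setup (layer sizes and names are illustrative assumptions), using teacher forcing during training:

```python
import torch
import torch.nn as nn

class Seq2SeqSummariser(nn.Module):
    """Encoder-decoder sketch: the encoder RNN reads the document and its
    final state initialises the decoder RNN, which generates the summary."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, doc_ids, summary_ids):
        # encode the document; its final state initialises the decoder
        _, state = self.encoder(self.embed(doc_ids))
        # teacher forcing: feed gold summary tokens (shifted right in practice)
        dec_out, _ = self.decoder(self.embed(summary_ids), state)
        return self.out(dec_out)   # logits over the vocabulary per step
```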

Encoder-Decoder?
[Figure: the same encoder-decoder, with the document as the source sequence and the summary as the target sequence]
• Document: a detained iranian-american academic accused of acting against national security has been released from a tehran prison after a hefty bail was posted, a top judiciary official said tuesday
• Summary: iranian-american academic held in tehran released on bail

Data
• News headlines
• Document: first sentence of article
• Summary: news headline/title
• Technically more like a “headline generation task”

And It Kind of Works…
Rush et al. (2015): A Neural Attention Model for Abstractive Sentence Summarisation

Improvements
• Attention mechanism
• Richer word features: POS tags, NER tags, tf-idf
• Hierarchical encoders
‣ One LSTM for words
‣ Another LSTM for sentences

Nallapati et al. (2016): Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond
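A sketch of a hierarchical encoder in PyTorch (sizes are illustrative): a word-level LSTM produces one vector per sentence, then a sentence-level LSTM contextualises the sentence vectors:

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Word-level LSTM over each sentence, sentence-level LSTM over the
    resulting sentence vectors (a minimal sketch)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.word_lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.sent_lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

    def forward(self, doc_ids):
        # doc_ids: (num_sents, max_words) token ids for one document
        _, (h_n, _) = self.word_lstm(self.embed(doc_ids))
        sent_vecs = h_n[-1]                        # one vector per sentence
        sent_states, _ = self.sent_lstm(sent_vecs.unsqueeze(0))
        return sent_states.squeeze(0)              # contextual sentence states
```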

Issues
• Occasionally reproduce statements incorrectly (hallucinate new details!)
• Unable to handle out-of-vocab words in the document
‣ Generate UNK in summary
‣ E.g. new names in test documents
• Solution: allow the decoder to copy words directly from the input document during generation

Encoder-Decoder with Attention
See et al. (2017): Get To The Point: Summarization with Pointer-Generator Networks

Encoder-Decoder with Attention + Copying

$$P(\text{Argentina}) = (1 - p_{gen}) \times P_{attn}(\text{Argentina}) + p_{gen} \times P_{voc}(\text{Argentina})$$

where (1 − p_gen) × P_attn is the probability of “copying”, and p_gen is a scalar (e.g. 0.8)

See et al. (2017): Get To The Point: Summarization with Pointer-Generator Networks
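A sketch of this mixture in PyTorch (tensor shapes and names are my assumptions): the copy term scatters the attention mass onto the vocabulary ids of the source tokens:

```python
import torch

def final_distribution(p_gen, p_vocab, attn, src_ids):
    """Pointer-generator mixture, as in the equation above (a sketch).
    p_gen: (batch,) generation gate; p_vocab: (batch, vocab_size);
    attn: (batch, src_len) attention weights;
    src_ids: (batch, src_len) LongTensor of source-token vocabulary ids."""
    p_gen = p_gen.unsqueeze(1)                     # (batch, 1)
    gen_part = p_gen * p_vocab                     # p_gen × P_voc
    copy_part = (1.0 - p_gen) * attn               # (1 − p_gen) × P_attn
    # add copy mass onto each source token's vocabulary id;
    # repeated source tokens accumulate their attention weights
    return gen_part.scatter_add(1, src_ids, copy_part)
```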

Copy Mechanism
• Generates summaries that reproduce details in the document
• Can produce out-of-vocab words in the summary by copying them from the document
‣ e.g. smergle = out of vocabulary
‣ p(smergle) = attention probability + generation probability = attention probability (generation probability is zero for out-of-vocab words)

Generated Summaries

More Summarisation Data
• But headline generation isn’t really exciting…
• Latest summarisation data:
‣ CNN/Dailymail: 300K articles, summary in bullets
‣ Newsroom: 1.3M articles, summary by authors; diverse, 38 major publications
‣ XSum: 200K BBC articles; summary is more abstractive than other datasets

Latest Development
• State-of-the-art models use transformers instead of RNNs
• Lots of pre-training
• Note: BERT not directly applicable because we need a unidirectional decoder (BERT is only an encoder)

Evaluation

ROUGE (Recall Oriented Understudy for Gisting Evaluation)
• Similar to BLEU; evaluates the degree of word overlap between generated summary and reference/human summary
• But recall oriented
• Measures overlap in N-grams (e.g. from 1 to 3)
• ROUGE-2: calculates the percentage of bigrams from the references that are in the generated summary

ROUGE-2: Example
• Ref 1: Water spinach is a green leafy vegetable grown in the tropics.
• Ref 2: Water spinach is a commonly eaten leaf vegetable of Asia.
• Generated summary: Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia.

$$\text{ROUGE-2} = \frac{3 + 6}{10 + 9} \approx 0.47$$

(3 of Ref 1’s 10 bigrams and 6 of Ref 2’s 9 bigrams appear in the generated summary)
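A small sketch that reproduces this calculation; whitespace tokenisation and stripping the trailing full stop are simplifying assumptions:

```python
from collections import Counter

def rouge_2(references, generated):
    """Fraction of reference bigrams that also appear in the generated
    summary, pooled over all references (with clipped counts)."""
    def bigrams(text):
        toks = text.lower().rstrip('.').split()
        return Counter(zip(toks, toks[1:]))

    gen = bigrams(generated)
    matched = total = 0
    for ref in references:
        ref_bi = bigrams(ref)
        total += sum(ref_bi.values())
        matched += sum(min(c, gen[bg]) for bg, c in ref_bi.items())
    return matched / total

refs = ["Water spinach is a green leafy vegetable grown in the tropics.",
        "Water spinach is a commonly eaten leaf vegetable of Asia."]
gen = "Water spinach is a leaf vegetable commonly eaten in tropical areas of Asia."
print(rouge_2(refs, gen))   # (3 + 6) / (10 + 9) ≈ 0.47
```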

A Final Word
• Research focus on single-document abstractive summarisation
‣ Mostly news data
• But many types of data for summarisation:
‣ Images, videos
‣ Graphs
‣ Structured data: e.g. patient records, tables
• Multi-document abstractive summarisation