COMP90042
Natural Language Processing
Lecture 19: Question Answering
Semester 1 2021, Week 10
Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
Introduction
• Definition: question answering (“QA”) is the task of automatically determining the answer to a natural language question
• We mostly focus on “factoid” questions
Factoid Questions
Factoid questions have short, precise answers:
• What war involved the battle of Chapultepec?
• What is the date of Boxing Day?
• What are some fragrant white climbing roses?
• What are tannins?
Non-factoid Questions
Non-factoid questions generally require a longer answer, critical analysis, a summary, a calculation, and more:
• Why is the date of Australia Day contentious?
• What is the angle 60 degrees in radians?
Why do we focus on factoid questions in NLP?
• They are easier
• They have an objective answer
• Current NLP technologies cannot handle non-factoid answers
• There’s less demand for systems to automatically answer non-factoid questions
2 Key Approaches
• Information retrieval-based QA
‣ Given a query, search relevant documents
‣ Find answers within these relevant documents
• Knowledge-based QA
‣ Build a semantic representation of the query
‣ Query a database of facts to find answers
Outline
• IR-based QA
• Knowledge-based QA
• Hybrid QA
IR-based QA
IR-based Factoid QA: TREC-QA
1. Use the question to formulate a query for the IR engine
2. Find relevant documents, and passages within those documents
3. Extract a short answer string
Question Processing
• Find the key parts of the question that will help retrieval
‣ Discard non-content words/symbols (wh-word, ?, etc.)
‣ Formulate as a tf-idf query, using unigrams or bigrams
‣ Identify entities and prioritise matching them
• May reformulate the question using templates
‣ E.g. “Where is Federation Square located?”
‣ Query = “Federation Square located”
‣ Query = “Federation Square is located [in/at]”
• Predict the expected answer type (here = LOCATION)
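As a rough illustration of this step, here is a minimal Python sketch of query formulation; the stop-list and the helper names are simplified assumptions for illustration, not the exact rules of a TREC system:

```python
import re

# Assumed, highly simplified stop-list of wh-words and function words to discard;
# a real system would use a full stop-word list and tf-idf term weighting.
NON_CONTENT = {"who", "what", "when", "where", "why", "how", "which",
               "is", "are", "was", "were", "the", "a", "an", "of", "do", "does"}

def formulate_query(question):
    """Strip punctuation and non-content words, keeping content terms for the IR query."""
    tokens = re.findall(r"[a-z0-9']+", question.lower())
    return [t for t in tokens if t not in NON_CONTENT]

def bigrams(terms):
    """Optional bigram query terms."""
    return list(zip(terms, terms[1:]))

query_terms = formulate_query("Where is Federation Square located?")
print(query_terms)            # ['federation', 'square', 'located']
print(bigrams(query_terms))   # [('federation', 'square'), ('square', 'located')]
```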
Answer Types
• Knowing the type of answer can help in:
‣ finding the right passage containing the answer
‣ finding the answer string
• Treat as classification:
‣ given the question, predict the answer type
‣ the key feature is the question headword
‣ e.g. “What are the animals on the Australian coat of arms?” (headword = “animals”)
‣ Generally not a difficult task
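A minimal sketch of headword-based answer type prediction; the wh-word rules and the headword-to-type table below are toy assumptions, whereas a real system would train a classifier over richer features:

```python
# Toy mapping from question headwords to answer types (assumed, far from exhaustive).
HEADWORD_TO_TYPE = {
    "animals": "ANIMAL",
    "city": "LOCATION",
    "country": "LOCATION",
    "war": "EVENT",
    "year": "DATE",
}

def predict_answer_type(question):
    """Crude rules: wh-word cues first, then the first known headword."""
    q = question.lower().rstrip("?")
    if q.startswith("who"):
        return "PERSON"
    if q.startswith("when"):
        return "DATE"
    if q.startswith("where"):
        return "LOCATION"
    for word in q.split():
        if word in HEADWORD_TO_TYPE:
            return HEADWORD_TO_TYPE[word]
    return "OTHER"

print(predict_answer_type("What are the animals on the Australian coat of arms?"))  # ANIMAL
```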
Retrieval
• Find the top n documents matching the query (standard IR)
• Next find passages (paragraphs or sentences) in these documents (also driven by IR)
• A good passage should contain:
‣ many instances of the question keywords
‣ several named entities of the answer type
‣ these terms in close proximity to one another
‣ a high ranking by the IR engine
• Re-rank IR outputs to find the best passage (e.g., using supervised learning)
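One way to picture this re-ranking step is a simple feature-based scorer; the features follow the list above (the proximity cue is omitted for brevity), but the hand-picked weights are assumptions, where a real system would learn them with supervised learning:

```python
def score_passage(passage, query_terms, answer_type_entities, ir_rank):
    """Score a passage by keyword hits, answer-type entity hits, and IR rank (assumed weights)."""
    text = passage.lower()
    keyword_hits = sum(text.count(t.lower()) for t in query_terms)
    entity_hits = sum(text.count(e.lower()) for e in answer_type_entities)
    rank_bonus = 1.0 / ir_rank            # higher-ranked passages get a larger bonus
    return 1.0 * keyword_hits + 2.0 * entity_hits + 3.0 * rank_bonus

passages = [
    ("Federation Square is located in Melbourne, Victoria.", 1),
    ("The square hosts cultural events throughout the year.", 2),
]
query = ["federation", "square", "located"]
locations = ["melbourne", "victoria"]     # named entities of the expected type (LOCATION)
best = max(passages, key=lambda p: score_passage(p[0], query, locations, p[1]))
print(best[0])   # "Federation Square is located in Melbourne, Victoria."
```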
Answer Extraction
• Find a concise answer to the question, as a span in the passage
‣ “Who is the federal MP for Melbourne?”
‣ The Division of Melbourne is an Australian Electoral Division in Victoria, represented since the 2010 election by Adam Bandt, a member of the Greens.
‣ “How many Australian PMs have there been since 2013?”
‣ Australia has had five prime ministers in five years. No wonder Merkel needed a cheat sheet at the G-20.
How?
• Use a neural network to extract the answer
• AKA the reading comprehension task
• But deep learning models require lots of data
• Do we have enough data to train comprehension models?
MCTest
• Crowdworkers write fictional stories, questions and answers
• 500 stories, 2000 questions
• Multiple choice questions
SQuAD
• Use Wikipedia passages
• A first set of crowdworkers creates questions (given a passage)
• A second set of crowdworkers labels the answer
• 150K questions (!)
• Second version includes unanswerable questions
Reading Comprehension
• Given a question and a context passage, predict where the answer span starts and ends in the passage
• Compute:
‣ $p_{\mathrm{start}}(i)$: probability that token $i$ is the starting token
‣ $p_{\mathrm{end}}(i)$: probability that token $i$ is the ending token
LSTM-Based Model
• Feed question tokens to a bidirectional LSTM
• Aggregate the LSTM outputs via a weighted sum to produce $\mathbf{q}$, the final question embedding
LSTM-Based Model
• Process the passage in a similar way, using another bidirectional LSTM
• The input is more than just word embeddings:
‣ a feature denoting whether the word matches a question word
‣ a POS feature
‣ a weighted question embedding, produced by attending to each question word
LSTM-Based Model
• $\{\mathbf{p}_1, \ldots, \mathbf{p}_m\}$: one vector for each passage token from the bidirectional LSTM
• To compute the start and end probability for each token:
  $p_{\mathrm{start}}(i) \propto \exp(\mathbf{p}_i \mathbf{W}_s \mathbf{q})$
  $p_{\mathrm{end}}(i) \propto \exp(\mathbf{p}_i \mathbf{W}_e \mathbf{q})$
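Below is a minimal PyTorch sketch of this model, showing the attention-pooled question embedding q and the bilinear start/end scores from the equations above; the dimensions, the simple attention pooling, and the omission of the extra passage features (exact match, POS, weighted question embedding) are assumptions of this sketch rather than the exact model from the slides:

```python
import torch
import torch.nn as nn

class SpanPredictor(nn.Module):
    def __init__(self, emb_dim=100, hidden=128):
        super().__init__()
        self.q_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.p_lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.q_attn = nn.Linear(2 * hidden, 1)                     # weights for pooling question states into q
        self.W_s = nn.Linear(2 * hidden, 2 * hidden, bias=False)   # start bilinear term
        self.W_e = nn.Linear(2 * hidden, 2 * hidden, bias=False)   # end bilinear term

    def forward(self, q_emb, p_emb):
        # Encode the question; pool its states into a single vector q via attention weights
        q_states, _ = self.q_lstm(q_emb)                        # (B, n, 2h)
        alpha = torch.softmax(self.q_attn(q_states), dim=1)     # (B, n, 1)
        q = (alpha * q_states).sum(dim=1)                       # (B, 2h)

        # Encode the passage; p_i is one vector per passage token
        p_states, _ = self.p_lstm(p_emb)                        # (B, m, 2h)

        # p_start(i) ∝ exp(p_i W_s q), p_end(i) ∝ exp(p_i W_e q)
        start_logits = torch.bmm(self.W_s(p_states), q.unsqueeze(2)).squeeze(2)  # (B, m)
        end_logits = torch.bmm(self.W_e(p_states), q.unsqueeze(2)).squeeze(2)
        return start_logits.softmax(dim=1), end_logits.softmax(dim=1)

model = SpanPredictor()
p_start, p_end = model(torch.randn(1, 6, 100), torch.randn(1, 40, 100))  # toy question/passage embeddings
print(p_start.shape, p_end.shape)   # torch.Size([1, 40]) each
```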
BERT-Based Model
• Fine-tune BERT to predict the answer span:
  $p_{\mathrm{start}}(i) \propto \exp(\mathbf{S}^\top \mathbf{T}'_i)$
  $p_{\mathrm{end}}(i) \propto \exp(\mathbf{E}^\top \mathbf{T}'_i)$
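A rough sketch of this span-prediction head, assuming the HuggingFace transformers library; S and E are the learned start/end vectors and T'_i is the BERT output vector for token i (the fine-tuning loop is omitted, so the predicted span here is meaningless):

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

# Learned start (S) and end (E) vectors; in practice trained jointly while fine-tuning BERT.
S = nn.Parameter(torch.randn(bert.config.hidden_size))
E = nn.Parameter(torch.randn(bert.config.hidden_size))

question = "Who is the federal MP for Melbourne?"
passage = ("The Division of Melbourne is an Australian Electoral Division in Victoria, "
           "represented since the 2010 election by Adam Bandt, a member of the Greens.")

# BERT takes question and passage as a single packed input: [CLS] question [SEP] passage [SEP]
inputs = tokenizer(question, passage, return_tensors="pt")
T = bert(**inputs).last_hidden_state[0]          # (seq_len, hidden): one vector T'_i per token

# p_start(i) ∝ exp(S·T'_i), p_end(i) ∝ exp(E·T'_i)
p_start = torch.softmax(T @ S, dim=0)
p_end = torch.softmax(T @ E, dim=0)

start, end = p_start.argmax().item(), p_end.argmax().item()
print(tokenizer.decode(inputs["input_ids"][0][start:end + 1]))  # untrained, so the span is arbitrary
```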
Why does BERT work better than LSTMs?
• It has more parameters
• It’s pre-trained and so already “knows” language before it’s adapted to the task
• Multi-head attention is the secret sauce
• The self-attention architecture allows fine-grained analysis between words in the question and the context paragraph
Knowledge-based QA
QA over structured KB
• Many large knowledge bases exist
‣ Freebase, DBpedia, Yago, …
• Can we support natural language queries?
‣ E.g. “When was Ada Lovelace born?”
‣ Link “Ada Lovelace” with the correct entity in the KB to find the triple (Ada Lovelace, birth-year, 1815)
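To make the idea concrete, here is a toy Python sketch of answering that question against a KB of triples; the tiny triple store, the single question pattern, and the string-based entity linking are assumptions for illustration, not how Freebase or DBpedia are actually queried:

```python
# Toy knowledge base of (subject, relation) -> object triples.
KB = {
    ("Ada Lovelace", "birth-year"): "1815",
    ("Ada Lovelace", "occupation"): "mathematician",
}

def answer(question):
    """Hand-written mapping from one question pattern to a KB lookup."""
    if question.startswith("When was ") and question.endswith(" born?"):
        entity = question[len("When was "):-len(" born?")]   # crude entity linking by exact string
        return KB.get((entity, "birth-year"))
    return None

print(answer("When was Ada Lovelace born?"))   # 1815
```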
But…
• Converting a natural language sentence into a triple is not trivial
• Entity linking is also an important component
‣ Ambiguity: “When was Lovelace born?”
• Can we simplify this two-step process?
Semantic Parsing
• Convert questions into logical forms to query the KB directly
‣ Predicate calculus
‣ Programming query (e.g. SQL)
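For example, a semantic parser of this kind is typically trained on (question, logical form) pairs like those sketched below; the SQL table schema and the lambda-calculus notation are illustrative assumptions, not an actual dataset:

```python
# (natural language question, target logical form) pairs for a text-to-text semantic parser.
# An encoder-decoder model (next slide) is trained to map the left string to the right string.
training_pairs = [
    ("When was Ada Lovelace born?",
     "SELECT birth_year FROM person WHERE name = 'Ada Lovelace';"),
    ("What war involved the battle of Chapultepec?",
     "lambda x. war(x) AND involved(x, battle_of_chapultepec)"),
]
for question, logical_form in training_pairs:
    print(f"{question}  ->  {logical_form}")
```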
How to Build a Semantic Parser?
• Text-to-text problem:
‣ Input = natural language sentence
‣ Output = string in logical form
• Encoder-decoder model (lecture 17!)
Hybrid QA
Hybrid Methods
• Why not use both text-based and knowledge-based resources for QA?
• IBM’s Watson, which won the game show Jeopardy!, uses a wide variety of resources to answer questions
‣ THEATRE: A new play based on this Sir Arthur Conan Doyle canine classic opened on the London stage in 2007.
‣ The Hound Of The Baskervilles
Core Idea of Watson
• Generate lots of candidate answers from text-based and knowledge-based sources
• Use a rich variety of evidence to score them
• Many components in the system, most trained separately
Watson
• Use web documents (e.g. Wikipedia) and knowledge bases
• Answer extraction: extract all NPs, or use the article title if it is a Wikipedia document
Watson
• Use many sources of evidence to score candidate answers
• Search for extra evidence: passages that support the candidate answers
• The candidate answer should really match the expected answer type
Watson
• Merge similar answers:
‣ JFK, John F. Kennedy, John Fitzgerald Kennedy
• Use synonym lexicons and Wikipedia
QA Evaluation
• IR: Mean Reciprocal Rank (MRR) for systems returning matching passages or answer strings (see the sketch below)
‣ E.g. the system returns 4 passages for a query, and the first correct passage is the 3rd passage
‣ MRR = ⅓
• MCTest: Accuracy
• SQuAD: Exact match of the predicted string against the gold answer
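A small sketch of the two string metrics above, mean reciprocal rank and exact match; the simple lower-case/whitespace normalisation is an assumption (SQuAD's official script also strips punctuation and articles):

```python
def mean_reciprocal_rank(first_correct_ranks):
    """first_correct_ranks[i] = rank of the first correct passage/answer for query i (None if missing)."""
    return sum(1.0 / r for r in first_correct_ranks if r is not None) / len(first_correct_ranks)

def exact_match(predictions, gold_answers):
    """Fraction of predictions that equal the gold answer after lower-casing and trimming whitespace."""
    hits = sum(p.strip().lower() == g.strip().lower() for p, g in zip(predictions, gold_answers))
    return hits / len(gold_answers)

print(mean_reciprocal_rank([3]))                      # 0.333..., the slide's example (MRR = 1/3)
print(exact_match(["Adam Bandt"], ["adam bandt"]))    # 1.0
```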
A Final Word
• IR-based QA: search textual resources to answer questions
‣ Reading comprehension: assumes the question and passage are given
• Knowledge-based QA: search structured resources to answer questions
• Hot area: many new approaches & evaluation datasets are being created all the time (narrative QA, commonsense reasoning, etc.)
Reading
• JM3 Ch. 23.2, 23.4, 23.6