Question Answering
COMP90042
Natural Language Processing Lecture 19
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE
Introduction
• Definition: question answering (“QA”) is the task of automatically determining the answer for a natural language question
• Main focus on “factoid” QA
‣ Who is the prime minister of the United Kingdom in 2020?
→ Boris Johnson
Examples
Factoid questions have short, precise answers:
• What war involved the battle of Chapultepec?
• Who was Confucius?
• What is the date of Boxing Day?
• What are some fragrant white climbing roses?
• What are tannins?
General non-factoid questions require a longer answer, critical analysis, summary, calculation and more:
• Why is the date of Australia Day contentious?
• What is the angle 60 degrees in radians?
2 Key Approaches
• Information retrieval-based QA
‣ Given a query, search relevant documents
‣ Find answers within these relevant documents
• Knowledge-based QA
‣ Builds semantic representation of the query
‣ Query database of facts to find answers
IR-Based QA
IR-based Factoid QA: TREC-QA
1. Use question to make query for IR engine
2. Find document, and passage within document
3. Extract short answer string
Question Processing
• Find key parts of question that will help retrieval
‣ discard structural parts (wh-word, ?, etc.)
‣ formulate as tf-idf query, using unigrams or bigrams
‣ identify entities and prioritise match
• May reformulate question using templates
• E.g., “Where is Federation Square located?”
‣ query = “Federation Square located”
‣ query = “Federation Square is located [in/at]”
• Predict expected answer type (here = LOCATION)
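As a rough illustration of this step, the sketch below strips wh-words and other structural tokens to form a keyword query and applies one hand-written reformulation template. The word lists and the template are assumptions for illustration, not the actual TREC-QA recipe.

```python
# A minimal sketch of query formulation: drop wh-words and structural tokens,
# keep the remaining terms as a bag-of-words query, and optionally add a
# template-based reformulation.
import re

WH_WORDS = {"who", "what", "when", "where", "which", "why", "how", "whom", "whose"}
STRUCTURAL = WH_WORDS | {"is", "are", "was", "were", "do", "does", "did", "the", "a", "an"}

def formulate_query(question: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    return [t for t in tokens if t not in STRUCTURAL]

def template_reformulations(question: str) -> list[str]:
    # e.g. "Where is X located?" -> "X is located in", "X is located at"
    m = re.match(r"where is (.+) located\??", question, flags=re.IGNORECASE)
    if m:
        return [f"{m.group(1)} is located in", f"{m.group(1)} is located at"]
    return []

print(formulate_query("Where is Federation Square located?"))
# ['federation', 'square', 'located']
print(template_reformulations("Where is Federation Square located?"))
# ['Federation Square is located in', 'Federation Square is located at']
```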
Answer Types
• Knowing the type of answer can help in:
‣ finding the right passage containing the answer
‣ finding the answer string
• Treat as classification
‣ given question, predict answer type
‣ key feature is question headword
‣ e.g., What are the animals on the Australian coat of arms? (headword: animals)
‣ Generally not a difficult task
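A minimal sketch of treating answer type prediction as text classification with scikit-learn; the tiny training set and the type labels below are invented purely for illustration.

```python
# Toy answer type classifier: bag-of-words features (which include the
# wh-word and question headword) plus a linear classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_questions = [
    "Who was Confucius?",
    "Who is the prime minister of the United Kingdom?",
    "Where is Federation Square located?",
    "What country hosted the 2000 Olympics?",
    "When is Boxing Day?",
    "What year did World War II end?",
    "What are the animals on the Australian coat of arms?",
    "What bird is on the New Zealand one dollar coin?",
]
train_types = ["PERSON", "PERSON", "LOCATION", "LOCATION",
               "DATE", "DATE", "ANIMAL", "ANIMAL"]

clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_questions, train_types)

print(clf.predict(["Where is the Eiffel Tower?"]))  # likely LOCATION on this toy data
```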
Retrieval
• Find top n documents matching query (standard IR)
• Next find passages (paragraphs or sentences) in these documents
• Should contain:
‣ many instances of the question keywords
‣ several named entities of the answer type
‣ close proximity of these terms in the passage
‣ high ranking by IR engine; etc.
• Re-rank IR outputs to find best passage (e.g., using supervised learning)
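The heuristic below sketches how such passage evidence might be combined into a single re-ranking score. The feature weights are arbitrary; in a real system the entities would come from an NER tagger and the weights from supervised learning.

```python
# Hand-rolled passage scorer: question keyword hits, answer-type entity hits,
# and a bonus for keywords occurring close together in the passage.
def score_passage(passage_tokens, question_keywords, passage_entities, answer_type):
    keywords = {k.lower() for k in question_keywords}
    positions = [i for i, tok in enumerate(passage_tokens) if tok.lower() in keywords]
    keyword_hits = len(positions)
    type_matches = sum(1 for _, etype in passage_entities if etype == answer_type)
    # proximity bonus: a smaller span covering all matched keywords is better
    proximity = 1.0 / (1 + (max(positions) - min(positions))) if positions else 0.0
    return 2.0 * keyword_hits + 1.5 * type_matches + proximity

passage = "Federation Square is located in the centre of Melbourne , Australia".split()
entities = [("Federation Square", "LOCATION"), ("Melbourne", "LOCATION"), ("Australia", "LOCATION")]
print(score_passage(passage, ["federation", "square", "located"], entities, "LOCATION"))
```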
Answer Extraction
• Find a concise answer to the question, as a span in the text
‣ “Who is the federal MP for Melbourne?”
‣ The Division of Melbourne is an Australian Electoral Division in Victoria, represented since the 2010 election by Adam Bandt, a member of the Greens.
‣ “How many Australian PMs have there been since 2013?”
‣ Australia has had five prime ministers in five years. No wonder Merkel needed a cheat sheet at the G-20.
Feature-Based Answer Extraction
• Frame it as a classification problem
• Classify whether a candidate answer (typically a short span) contains an answer
• Various features based on match to question, expected entity type match, specific answer patterns
‣ “Who is the federal MP for Melbourne?”
‣ The Division of Melbourne is an Australian Electoral Division in Victoria, represented since the 2010 election by Adam Bandt, a member of the Greens.
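The sketch below computes a few such features for one candidate span. The window size and feature set are assumptions, and entity types are passed in directly rather than predicted by an NER tagger.

```python
# Example features for the candidate answer "Adam Bandt"; a real system would
# feed these to a supervised classifier.
def candidate_features(candidate, candidate_type, expected_type, passage_tokens, question_keywords):
    window = 20  # tokens of context around the candidate to inspect
    start = passage_tokens.index(candidate.split()[0])
    context = passage_tokens[max(0, start - window): start + window]
    keywords = {k.lower() for k in question_keywords}
    return {
        "type_match": candidate_type == expected_type,                    # matches expected answer type?
        "keyword_overlap": sum(t.lower() in keywords for t in context),   # question words near the candidate
        "candidate_in_question": candidate.lower() in " ".join(question_keywords).lower(),  # penalise echoes
    }

passage = ("The Division of Melbourne is an Australian Electoral Division in Victoria , "
           "represented since the 2010 election by Adam Bandt , a member of the Greens .").split()
print(candidate_features("Adam Bandt", "PERSON", "PERSON", passage, ["federal", "MP", "Melbourne"]))
# {'type_match': True, 'keyword_overlap': 1, 'candidate_in_question': False}
```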
Neural Answer Extraction
• Use a neural network to extract the answer
• AKA reading comprehension task
• But deep learning models require lots of data
• Do we have enough data to train comprehension models?
MCTest (Richardson et al. 2013)
• Crowdworkers write fictional stories, questions and answers
• 500 stories, 2000 questions
• Multiple choice questions
SQuAD (Rajpurkar et al. 2016)
• Use Wikipedia passages
• First set of crowdworkers create questions (given passage)
• Second set of crowdworkers label the answer
• 150K questions (!)
• Second version includes unanswerable questions
Reading Comprehension
• Answer span starts/ends at which token in passage?
• Compute:
‣ pstart(i): probability that token i is the starting token
‣ pend(i): probability that token i is the ending token
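A small sketch of how these probabilities are typically decoded into an answer: pick the span (i, j) maximising pstart(i) · pend(j), subject to i ≤ j and a maximum span length. The toy distributions below are made up.

```python
# Decode the best answer span from per-token start/end probabilities.
import numpy as np

def best_span(p_start, p_end, max_len=15):
    best, best_score = (0, 0), -1.0
    for i in range(len(p_start)):
        for j in range(i, min(i + max_len, len(p_end))):
            score = p_start[i] * p_end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# toy distributions over a 6-token passage
p_start = np.array([0.05, 0.6, 0.1, 0.1, 0.1, 0.05])
p_end   = np.array([0.05, 0.1, 0.6, 0.1, 0.1, 0.05])
print(best_span(p_start, p_end))   # ((1, 2), ~0.36)
```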
LSTM-Based (Chen et al. 2017)
• Feed question tokens to a bidirectional LSTM
• Aggregate LSTM outputs via weighted sum to produce q, the final question embedding
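A condensed PyTorch sketch of this question encoder under assumed dimensions; the learned scoring vector for the weighted sum is implemented here as a small linear layer.

```python
# BiLSTM over question embeddings, collapsed into one vector q via a learned
# weighted sum (attention) over the per-token outputs.
import torch
import torch.nn as nn

emb_dim, hidden = 100, 64
lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
w = nn.Linear(2 * hidden, 1, bias=False)      # scores each token for the weighted sum

question_embs = torch.randn(1, 7, emb_dim)    # (batch, question length, embedding dim)
outputs, _ = lstm(question_embs)              # (1, 7, 2*hidden)
b = torch.softmax(w(outputs).squeeze(-1), dim=-1)   # attention weights over tokens
q = (b.unsqueeze(-1) * outputs).sum(dim=1)    # (1, 2*hidden): final question embedding
print(q.shape)                                # torch.Size([1, 128])
```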
LSTM-Based (Chen et al. 2017)
• Process passage in a similar way, using another bidirectional LSTM
• More than just word embeddings as input:
‣ a feature to denote whether the word matches a question word
‣ POS feature
‣ weighted question embedding: produced by attending to each question word
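The sketch below (PyTorch, shapes invented) shows one way the weighted question embedding can be computed: each passage word embedding attends over the question word embeddings through a shared projection. The exact-match and POS features are simple indicator or embedded categorical inputs concatenated alongside this one.

```python
# Aligned ("weighted") question embedding: one attention-weighted question
# vector per passage token.
import torch
import torch.nn as nn

emb_dim = 100
alpha = nn.Sequential(nn.Linear(emb_dim, emb_dim), nn.ReLU())   # shared projection

p_embs = torch.randn(1, 30, emb_dim)     # passage word embeddings (batch, m, d)
q_embs = torch.randn(1, 7, emb_dim)      # question word embeddings (batch, n, d)

scores = alpha(p_embs) @ alpha(q_embs).transpose(1, 2)   # (1, 30, 7): unnormalised a_ij
a = torch.softmax(scores, dim=-1)                        # passage token i attends over question
f_align = a @ q_embs                                     # (1, 30, d): weighted question embedding
print(f_align.shape)                                     # torch.Size([1, 30, 100])
```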
LSTM-Based (Chen et al. 2017)
• {p1, …, pm}: one vector for each passage token from the bidirectional LSTM
• To compute start and end probability for each token:
‣ pstart(i) ∝ exp(pi Ws q)
‣ pend(i) ∝ exp(pi We q)
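A direct sketch of these bilinear scores in PyTorch (dimensions assumed): Ws and We are implemented as linear maps, so that pi Ws q becomes a dot product between pi and Ws q.

```python
# Bilinear start/end scoring: p_start(i) ∝ exp(p_i Ws q), p_end(i) ∝ exp(p_i We q).
import torch
import torch.nn as nn

d = 128
Ws = nn.Linear(d, d, bias=False)          # bilinear weight for the start score
We = nn.Linear(d, d, bias=False)          # bilinear weight for the end score

P = torch.randn(1, 30, d)                 # passage token vectors p_1..p_m from the BiLSTM
q = torch.randn(1, d)                     # final question embedding

start_scores = (P @ Ws(q).unsqueeze(-1)).squeeze(-1)     # (1, 30): p_i Ws q
end_scores   = (P @ We(q).unsqueeze(-1)).squeeze(-1)
p_start = torch.softmax(start_scores, dim=-1)            # normalise over passage tokens
p_end   = torch.softmax(end_scores, dim=-1)
print(p_start.shape, p_end.shape)                        # torch.Size([1, 30]) twice
```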
BERT-Based
• Fine-tune BERT to predict answer span
‣ pstart(i) ∝ exp(S⊺T′i)
‣ pend(i) ∝ exp(E⊺T′i)
‣ where S and E are learned start/end vectors, and T′i is BERT’s final representation of passage token i
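A hedged sketch of span prediction with the Hugging Face transformers library; the specific checkpoint name is an assumption (any model already fine-tuned on SQuAD works the same way).

```python
# Span prediction with a BERT-style model: take the argmax of the start and
# end logits and decode the tokens in between.
import torch
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

name = "distilbert-base-cased-distilled-squad"   # assumed SQuAD-fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForQuestionAnswering.from_pretrained(name)

question = "Who is the federal MP for Melbourne?"
passage = ("The Division of Melbourne is an Australian Electoral Division in Victoria, "
           "represented since the 2010 election by Adam Bandt, a member of the Greens.")

inputs = tokenizer(question, passage, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)                    # has .start_logits and .end_logits

start = int(outputs.start_logits.argmax())       # most likely start token
end = int(outputs.end_logits.argmax())           # most likely end token
answer = tokenizer.decode(inputs["input_ids"][0][start:end + 1])
print(answer)                                    # ideally "Adam Bandt"
```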
Knowledge-Based QA
QA over structured KB
• Many large knowledge bases
‣ Sports statistics, Moon rock data, …
‣ Freebase, DBpedia, Yago, …
• Each with own query language: SQL, SPARQL, etc.
• Can we support natural language queries?
‣ E.g., map “When was Ada Lovelace born?” to a logical form such as birth-year(Ada Lovelace, ?x)
‣ And use it to query the KB to find the triple (Ada Lovelace, birth-year, 1815) and provide the answer 1815.
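A toy illustration of this last step, with the KB represented as a set of (subject, relation, object) triples and the parsed query treated as a pattern with one variable; the extra triples are invented for illustration.

```python
# Knowledge-based QA in miniature: birth-year(Ada Lovelace, ?x) becomes a
# pattern match against a set of triples.
KB = {
    ("Ada Lovelace", "birth-year", "1815"),
    ("Ada Lovelace", "occupation", "mathematician"),
    ("Boris Johnson", "position-held", "Prime Minister of the United Kingdom"),
}

def query(subject, relation):
    """Return all objects x such that (subject, relation, x) is in the KB."""
    return [obj for (s, r, obj) in KB if s == subject and r == relation]

# "When was Ada Lovelace born?"  ->  birth-year(Ada Lovelace, ?x)
print(query("Ada Lovelace", "birth-year"))   # ['1815']
```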
Semantic Parsing
• Based on aligned questions and their logical form, e.g., GeoQuery (Zelle & Mooney 1996)
‣ What is the capital of the state with the largest population?
‣ answer(C, (capital(S,C), largest(P, (state(S), population(S,P))))).
• Can model using parsing (Zettlemoyer & Collins 2005) to build compositional logical form
[Figure 2 from Zettlemoyer & Collins (2005): two example CCG parses, deriving the logical forms idaho for “Idaho” and λx.state(x) ∧ borders(x, texas) for “What states border Texas”.]
Hybrid Methods
• Why not use both text-based and knowledge-based resources for QA?
• IBM Watson (Ferrucci et al., 2010), which won the game show Jeopardy!, uses a wide variety of resources to answer questions
‣ William Wilkinson’s “An Account of the Principalities of Wallachia and Moldavia” inspired this author’s most famous novel.
‣ Bram Stoker, Dracula
IBM’s WATSON
• Extract focus, answer type
• Perform question classification
• Answer types particularly tricky
• Standard named entities only cover half of 20K training questions
• Found 5K answer types for these 20K questions
• Not all is ML; plenty of rules
IBM’s WATSON
• Use web documents (e.g., Wikipedia) and knowledge bases
• Queries for retrieving web documents: stopwords removed, certain terms upweighted
• Lots of answers come from Wikipedia article titles!
IBM’s WATSON
• Use many sources of evidence to score candidate answers
• Search for extra evidence: passages that support the candidate answers
• Candidate answer should really match answer type (uses WordNet)
IBM’s WATSON
• Merge similar answers:
‣ JFK, John F. Kennedy, John Fitzgerald Kennedy
• Use synonym lexicons, Wikipedia
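A toy sketch of this merging step: map surface variants to a canonical form using a small synonym lexicon and pool their evidence scores. The lexicon and scores below are invented; Watson builds such mappings from resources like Wikipedia redirects.

```python
# Merge equivalent candidate answers before final ranking.
from collections import defaultdict

SYNONYMS = {
    "jfk": "John F. Kennedy",
    "john f. kennedy": "John F. Kennedy",
    "john fitzgerald kennedy": "John F. Kennedy",
}

def merge_candidates(scored_candidates):
    merged = defaultdict(float)
    for answer, score in scored_candidates:
        canonical = SYNONYMS.get(answer.lower(), answer)
        merged[canonical] += score          # pool evidence for equivalent answers
    return dict(merged)

print(merge_candidates([("JFK", 0.5), ("John Fitzgerald Kennedy", 0.25), ("Nixon", 0.2)]))
# {'John F. Kennedy': 0.75, 'Nixon': 0.2}
```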
QA Evaluation
• TREC-QA: Mean Reciprocal Rank for systems returning matching passages or answer strings
‣ E.g., system returns 4 passages for a query, and the first correct passage is the 3rd passage
‣ MRR = 1/3 (for this single query; over a test set, MRR averages the reciprocal rank across queries)
• SQuAD:
‣ Exact match of string against gold answer
‣ F1 score over bag of selected tokens
• MCQ reading comprehension: Accuracy
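Minimal sketches of these metrics in Python. Note that the official SQuAD script additionally normalises answers (lowercasing, stripping articles and punctuation); that normalisation is simplified here.

```python
# Mean reciprocal rank, exact match, and token-level F1 against a gold answer.
from collections import Counter

def mean_reciprocal_rank(first_correct_ranks):
    """first_correct_ranks: rank of the first correct result per query (None if none correct)."""
    return sum(0.0 if r is None else 1.0 / r for r in first_correct_ranks) / len(first_correct_ranks)

def exact_match(prediction, gold):
    return prediction.strip().lower() == gold.strip().lower()

def token_f1(prediction, gold):
    pred, gold_toks = prediction.lower().split(), gold.lower().split()
    common = Counter(pred) & Counter(gold_toks)          # bag-of-tokens overlap
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)

print(mean_reciprocal_rank([3]))                             # 0.333... (the example above)
print(exact_match("Adam Bandt", "Adam Bandt"))               # True
print(token_f1("the Greens MP Adam Bandt", "Adam Bandt"))    # ~0.571
```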
A Final Word
• IR-based QA: search textual resources to answer questions
‣ Reading comprehension: assumes question + passage
• Knowledge-based QA: search structured resources to answer questions
• Hot area: many new approaches & evaluation datasets being created all the time (narratives, QA, commonsense reasoning, etc.)
Reading
• JM3 Ch. 25
References
• Chen, D., Fisch, A., Weston, J., and Bordes, A. (2017). Reading Wikipedia to answer open-domain questions. In ACL 2017.
• Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., Lally, A., Murdock, J. W., Nyberg, E., Prager, J., et al. (2010). Building Watson: An overview of the DeepQA project. AI Magazine 31(3), 59–79.
• Rajpurkar, P., Zhang, J., Lopyrev, K., and Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. In EMNLP 2016.
• Richardson, M., Burges, C. J. C., and Renshaw, E. (2013). MCTest: A challenge dataset for the open-domain machine comprehension of text. In EMNLP 2013.
• Zettlemoyer, L. and Collins, M. (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI 2005.
• Zelle, J. M. and Mooney, R. J. (1996). Learning to parse database queries using inductive logic programming. In AAAI 1996.