Computational Linguistics
CSC 485 Summer 2020
12 Reading: Jurafsky & Martin: 21.0–8.
Copyright © 2017 Graeme Hirst and Gerald Penn.
All rights reserved.
12. Anaphora and coreference resolution
Gerald Penn
Department of Computer Science, University of Toronto
Anaphora and proforms
• Anaphora: Abbreviated backward reference in text.
Anaphor: A word that makes an anaphoric reference.
• Pronouns as canonical anaphors.
Eugene O’Neill was an American playwright. His plays involve characters who inhabit the fringes of society, engaging in depraved behavior, where they struggle to maintain their hopes and aspirations …
Example based on Wikipedia article on Eugene O’Neill, November 2007.
2
Anaphora and proforms
• Pro-verbs, pro-adjectives, ellipsis:
Ross stared at the TV screen in horror. Mary did too.
Ross was searching for a purple wombat, but such wombats are not easy to find.
Ross and Nadia wanted to dance together, but Nadia’s mother said they couldn’t.
• Anaphoric tense (German):
Ich habe gestern einen Artikel geschrieben.
(I have yesterday an article written.)
Das war sehr schwierig.
(That was very difficult.)
3
Some antecedent problems
• Composite or implicit antecedents:
After Nadia_i met Ross_j, they_{i,j} went swimming.
Ross gave each girl_i a crayon_j. They_i used them_j to draw pictures.
• Event antecedents (expressed as verbs or whole sentences):
The St. Louis bank Page, Bacon & Co. suspended operations in February; it caused a panic in San Francisco.
Based on “Banks, businesses cashed in by ‘mining the miners.’” by Dale Kasler, 18 Jan 1998. http://www.calgoldrush.com/part4/04business.html
4
Constraints on antecedents 1
• Pronouns must match antecedent in number and gender.
• Antecedent must be “nearby” in text.
Hobbs 1978: 90% of antecedents are in the same sentence as the pronoun, 8% are in the previous sentence.
• Antecedent must be “in focus”.
?Ross put the wine on the table_i. It_i was brown and round.
5
Reference and antecedence
• Loose usage: “referring back in text”.
• Better terminology:
[Diagram: the person in the photo is the referent; the name Eugene O’Neill in the text is the antecedent; him is the anaphor.]
The anaphor and antecedent corefer.
Photo from Wikipedia article on Eugene O’Neill, November 2007.
6
Constraints on antecedents 2
• Syntactic constraints?
Ross_i rewarded {him_j | himself_i} for his_{i,j?} good work.
Nadia_i said that she_{i,j?} had done good work.
She_i said that she_{i,j?} was pleased with Nadia_k.
A (non-possessive) pronoun in an NP or PP that refers to the subject NP at the same level must be reflexive. A pronoun subject cannot corefer with a full NP at the same or lower level.
Because he_{i,j?} likes chocolate, Ross_j bought a box of truffles.
The antecedent of a forward-referring pronoun must command it: i.e., the antecedent’s immediately-dominating S node must non-immediately dominate the pronoun. Exercise: Draw this. (A code sketch of the command test follows this slide.)
The general consensus now is that syntactic constraints are by themselves unreliable.
7
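The command test can be made concrete in a few lines of code. The following is a minimal sketch, not from the slides, assuming a toy constituency-tree Node class with illustrative labels, and reading “non-immediately dominate” as “dominate, but not as the direct parent”.

    # Toy constituency-tree node (illustrative assumption, not from the slides).
    class Node:
        def __init__(self, label, children=()):
            self.label = label
            self.children = list(children)
            self.parent = None
            for child in self.children:
                child.parent = self

    def dominates(a, b):
        """True if node a properly dominates node b."""
        while b.parent is not None:
            b = b.parent
            if b is a:
                return True
        return False

    def commands(antecedent, pronoun):
        """The S node immediately dominating the antecedent must dominate
        the pronoun, but not as its direct parent."""
        s = antecedent.parent
        while s is not None and s.label != "S":
            s = s.parent
        return s is not None and dominates(s, pronoun) and pronoun.parent is not s

    # Because he likes chocolate, Ross bought a box of truffles.
    he = Node("NP-he")
    ross = Node("NP-Ross")
    because_clause = Node("SBAR", [Node("S", [he, Node("VP-likes-chocolate")])])
    main = Node("S", [because_clause, ross, Node("VP-bought-truffles")])
    print(commands(ross, he))   # True: Ross is a licit antecedent for "he"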
Anaphor resolution
• Anaphor resolution: Determining the antecedent or referent of an anaphor.
• Antecedent might be another anaphor — a chain of coreferring expressions.
• Baseline algorithm: Choose the most recent NP that matches in gender and number. (A code sketch follows this slide.)
• Helpful tool: Gender guesser for names.
8
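A minimal sketch of this baseline, assuming mentions arrive as dictionaries with position, gender, and number fields; the field names and the agreement test are illustrative, and unknown feature values are simply never allowed to rule a pair out.

    def agrees(pronoun, candidate):
        """Gender/number agreement; unknown (None) values never block a match."""
        for feature in ("gender", "number"):
            p, c = pronoun.get(feature), candidate.get(feature)
            if p and c and p != c:
                return False
        return True

    def baseline_resolve(pronoun, mentions):
        """Return the most recent preceding NP that matches in gender and number."""
        candidates = [m for m in mentions if m["position"] < pronoun["position"]]
        for candidate in sorted(candidates, key=lambda m: m["position"], reverse=True):
            if agrees(pronoun, candidate):
                return candidate
        return None

    mentions = [
        {"text": "Eugene O'Neill", "position": 0, "gender": "male", "number": "sg"},
        {"text": "his plays",      "position": 1, "gender": None,   "number": "pl"},
    ]
    he = {"text": "he", "position": 2, "gender": "male", "number": "sg"}
    print(baseline_resolve(he, mentions)["text"])   # Eugene O'Neill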
Hobbs’s algorithm 1
• Traverse parse tree searching for candidate antecedents for pronouns.
• Choose the first candidate NP that matches in gender and number (and maybe also in basic selectional preferences).
• Search order (sketched in code after this slide):
• Start at pronoun, work upwards, scanning S and NP nodes left-to-right, breadth-first.
• If necessary, traverse previous sentences, scanning S and NP nodes left-to-right, breadth-first.
Hobbs, Jerry R. “Resolving pronoun references”, Lingua, 44, 1978, 311–338.
9
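A sketch of the simplified search order above, not Hobbs’s full nine-step procedure (it ignores, for example, the conditions on intervening NP and S nodes). It assumes the same toy Node interface (label, children, parent) as the earlier sketch, and takes the agreement test as a parameter.

    from collections import deque

    def breadth_first_nps(root, skip=frozenset()):
        """Yield NP nodes left-to-right, breadth-first, skipping searched subtrees."""
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if node in skip:
                continue
            if node.label.startswith("NP"):
                yield node
            queue.extend(node.children)

    def hobbs_search(pronoun, previous_trees, agrees):
        # Work upwards from the pronoun; at each S or NP ancestor, scan its
        # subtree breadth-first, left-to-right.
        searched = {pronoun}
        node = pronoun.parent
        while node is not None:
            if node.label.startswith(("NP", "S")):
                for np in breadth_first_nps(node, skip=searched):
                    if agrees(pronoun, np):
                        return np
                searched.add(node)
            node = node.parent
        # If nothing matched, traverse previous sentences, most recent first.
        for tree in reversed(previous_trees):
            for np in breadth_first_nps(tree):
                if agrees(pronoun, np):
                    return np
        return None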
Hobbs’s algorithm 2
Adapted from: Hobbs 1978, figure 2.
10
Hobbs’s algorithm 3
Adapted from: Hobbs 1978, figure 2.
13
Hobbs’s algorithm 4
• Evaluation:
• In 300 examples, it found the correct antecedent 88% of the time without selectional restrictions, 92% with.
• But 168 had only one plausible antecedent. Performance on this subset was apparently 100%.
• On the other 132, scores were 73%, 82% resp.
14
Definite reference
• Anaphors as a special case of definite reference.
• Any NP that somehow refers back.
Anaconda Mines Inc. today announced a new joint venture with Diamante Minerals Inc. Brenda Nichols, CEO of Anaconda, said the agreement covered the companies’ potash operations in Saskatchewan and the Yukon. Ms Nichols said that demand for the product had increased more than 12% in the past year.
[Lexical chains (Morris and Hirst, 1991)]
15
Soon, Ng, and Lim 2001 1
• A statistical pattern-recognition approach.
• Goal: Find chains of coreferences in text (including definite references and anaphors).
• Basic idea:
• Classify every pair of NPs in text as either coreferring or not.
• Classifier is learned from data: text marked with positive and negative examples of coreference.
• Features for classification are largely superficial, not syntactic.
Soon, Wee Meng; Ng, Hwee Tou; and Lim, Daniel Chung Yong. “A machine learning approach to coreference resolution of noun phrases.” Computational Linguistics, 27(4), 2001, 521–544.
16
Soon, Ng, and Lim 2001 2
• Method (sketched in code after this slide):
• Find “markables” in text (including nested ones, e.g. both CEO of Anaconda and Anaconda).
• For each one, work backwards through preceding markables until a coreferent one is found (or give up by running out of candidates).
• Yes/no decision on markable and candidate antecedent is made with decision-tree classifier induced from data by C5.0 algorithm.
Soon, Wee Meng; Ng, Hwee Tou; and Lim, Daniel Chung Yong. “A machine learning approach to coreference resolution of noun phrases.” Computational Linguistics, 27(4), 2001, 521–544.
17
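A sketch of the decoding side of this method, assuming the decision-tree classifier has already been trained; the function names are illustrative. Coreference chains then fall out as the transitive closure of the pairwise links.

    def resolve_document(markables, classifier):
        """Link each markable to the nearest preceding markable the
        classifier accepts (or to nothing)."""
        links = {}                              # anaphor index -> antecedent index
        for j in range(1, len(markables)):
            for i in range(j - 1, -1, -1):      # work backwards through candidates
                if classifier(markables[i], markables[j]):
                    links[j] = i
                    break                       # stop at the first (closest) hit
        return links

    def chains(links, n):
        """Coreference chains = transitive closure of the pairwise links."""
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                x = parent[x]
            return x
        for j, i in links.items():
            parent[find(j)] = find(i)
        groups = {}
        for k in range(n):
            groups.setdefault(find(k), []).append(k)
        return [group for group in groups.values() if len(group) > 1]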
Features for classification of pairs
• Distance apart (in sentences).
• Is either a pronoun?
• Is reference definite or demonstrative?
• String match or alias match?
Bart Simpson, Mr Simpson; IBM, International Business Machines.
• Number, gender agreement?
• Semantic class agreement (by first sense in WordNet)?
FEMALE, MALE, PERSON, ORGANIZATION, LOCATION, DATE, TIME, MONEY, PERCENT, OBJECT.
• Are both proper names?
• One is a proper name, and the reference is appositive?
18
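The feature set on the preceding slide can be pictured as a single function from a candidate pair to a feature dictionary. The sketch below is in that spirit only; the mention dictionaries, their keys, and the crude alias test are assumptions, not Soon et al.’s actual representation.

    def is_acronym(short, full):
        """Crude alias test (illustrative): is `short` the initials of `full`?"""
        return short.isupper() and short == "".join(w[0].upper() for w in full.split())

    def pair_features(antecedent, anaphor):
        return {
            "sentence_distance": anaphor["sentence"] - antecedent["sentence"],
            "antecedent_is_pronoun": antecedent["is_pronoun"],
            "anaphor_is_pronoun": anaphor["is_pronoun"],
            "anaphor_is_definite": anaphor["text"].lower().startswith("the "),
            "anaphor_is_demonstrative": anaphor["text"].lower().startswith(
                ("this ", "that ", "these ", "those ")),
            "string_match": antecedent["head"].lower() == anaphor["head"].lower(),
            "alias_match": is_acronym(anaphor["head"], antecedent["text"])
                or is_acronym(antecedent["head"], anaphor["text"]),
            "number_agree": antecedent["number"] == anaphor["number"],
            "gender_agree": antecedent["gender"] == anaphor["gender"],
            "semclass_agree": antecedent["semclass"] == anaphor["semclass"],
            "both_proper_names": antecedent["is_proper"] and anaphor["is_proper"],
            "appositive": anaphor.get("is_appositive", False),
        }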
Training data
• Generate training data from text annotated with coreferences. (Pair generation is sketched in code after this slide.)
• From two Message Understanding Conferences shared data; 30 docs each. (12K, 19K words resp.)
• But only the markables that the pre-processing step is able to find.
• Positive examples: consecutive coreference pairs. (1360, 2150 pairs resp.)
• Negative examples: (prior) non-antecedents closer to the reference than the true antecedent is. (19550, 46722 pairs resp.)
19
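A sketch of the pair-generation scheme on this slide, assuming markables are in document order and chain_id[i] gives the gold coreference chain of markable i (None for singletons); these names are illustrative.

    def training_pairs(markables, chain_id):
        """Positive: each anaphor with its closest preceding markable in the
        same gold chain. Negative: every markable in between."""
        positives, negatives = [], []
        for j in range(len(markables)):
            if chain_id[j] is None:
                continue
            antecedent = None
            for i in range(j - 1, -1, -1):      # closest preceding coreferent markable
                if chain_id[i] == chain_id[j]:
                    antecedent = i
                    break
            if antecedent is None:
                continue                        # first mention of its chain
            positives.append((antecedent, j))
            for i in range(antecedent + 1, j):  # intervening markables become negatives
                negatives.append((i, j))
        return positives, negatives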
Evaluation
• Two MUC standard test sets (30, 20 docs).
• Try with many different parameter settings in the C5.0 algorithm.
• Best P, R, F = (.673, .586, .626) and (.655, .561, .604). (F is the harmonic mean of P and R; see the note after this slide.)
• Better than most MUC competitors.
• Overtraining reduced performance.
20
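For reference (not on the slide), F here is the balanced F-measure, the harmonic mean of precision and recall; the first reported triple checks out:

    F = \frac{2PR}{P + R} = \frac{2 \times 0.673 \times 0.586}{0.673 + 0.586} \approx 0.626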
Coreference classifier for MUC-6 data
[Figure: the decision tree induced for the MUC-6 data. Its tests include string match, whether the reference is a pronoun, whether the candidate is a pronoun, whether the reference is appositive, alias match, gender match, number match, and whether the pair is in the same sentence; each leaf marks the pair as coreferent (✔) or not (✘).]
Based on figure 2 of Soon et al., 2001.
21
Issues
• Performance not very high.
• Accuracy depends on accuracy of pre-processing:
• Finding markables (85%), determining semantic classes (???%).
• Not all features are used.
• Semantic classes too inaccurate?
• String match and alias are important features, but are sometimes misleading.
22
Building on this method
• Improve with classification features that make better use of syntax and semantics:
• E.g., (Vincent) Ng, 2007:
• Separate 7-feature classifier for semantic class.
• 22 features for syntactic roles of reference and antecedent.
• 9 different kinds of string matching.
• Corpus: Penn Treebank, hand-annotated for pronoun coreference.
• Evaluation: 9-point increase in F over Soon et al.
Ng, Vincent. “Semantic class induction and coreference resolution.” Proceedings, 45th Annual Meeting of the Association for Computational Linguistics, Prague, June 2007, 536–543.
23
Raghunathan et al’s sieve method 1
• Avoid statistical methods that consider all features at once.
• Low-precision features may overwhelm high-precision features.
• Avoid methods that consider only one candidate at a time.
• Might make selection too soon.
• Avoid supervised learning-based models.
• Use knowledge-based heuristics (in 2010!).
Raghunathan, Karthik; Lee, Heeyoung; Rangarajan, Sudarshan; Chambers, Nate; Surdeanu, Mihai; Jurafsky, Dan; Manning, Christopher. “A Multi-Pass Sieve for Coreference Resolution.” Proc, 2010 Conf on Empirical Methods in Natural Language Processing, Cambridge, MA, 492– 501.
24
Raghunathan et al’s sieve method 2
• “Sieve” is pipeline of rule-based modules.
• Considers all candidates simultaneously.
• Starts with high-precision features, working through to low-precision.
• Mentions may be clustered (or added to cluster) even if not resolved.
• Label clusters with gender, number, etc., as they become known, e.g.:
The second attack occurred after some rocket firings aimed, apparently, toward …
• Greedy algorithm: cluster the Israelis and we.
• Sieve: cluster the Israelis with Israel:[-animate].
25
Raghunathan et al’s sieve method 3
Modules, in order of decreasing precision (a skeleton of the pipeline follows this slide):
• Exact match
• Appositives and similar, acronyms, demonyms.
• Strict head-matching (no non-matching stop words): matches the Florida Supreme Court, the Florida court, but not Yale University, Harvard University.
• Progressively less strict head-matching (3 modules).
• Pronoun matches (lexicon for gender, animacy).
26
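The pipeline can be pictured as an ordered list of pass functions over a shared set of clusters, applied from the highest-precision pass down. Below is a skeleton in that spirit, with a single illustrative exact-match pass; the Cluster class, the pass interface, and the names are simplified assumptions, not the authors’ code.

    class Cluster:
        def __init__(self, mention):
            self.mentions = [mention]
            self.attributes = dict(mention.get("attributes", {}))  # gender, number, animacy, ...

        def absorb(self, other):
            self.mentions.extend(other.mentions)
            for key, value in other.attributes.items():            # attributes accumulate
                self.attributes.setdefault(key, value)

    def run_sieve(mentions, passes):
        """Apply passes in order of decreasing precision; each pass may merge
        a mention's cluster into an earlier cluster."""
        cluster_of = {i: Cluster(m) for i, m in enumerate(mentions)}
        for sieve_pass in passes:
            for j in range(len(mentions)):
                candidates = [cluster_of[i] for i in range(j)]      # all candidates at once
                chosen = sieve_pass(mentions[j], cluster_of[j], candidates)
                old = cluster_of[j]
                if chosen is not None and chosen is not old:
                    chosen.absorb(old)
                    for i in cluster_of:
                        if cluster_of[i] is old:
                            cluster_of[i] = chosen
        unique, seen = [], set()
        for cluster in cluster_of.values():
            if id(cluster) not in seen:
                seen.add(id(cluster))
                unique.append(cluster)
        return unique

    def exact_match_pass(mention, own_cluster, candidates):
        """Highest-precision pass: merge mentions with identical extents."""
        for cluster in candidates:
            if any(m["text"].lower() == mention["text"].lower() for m in cluster.mentions):
                return cluster
        return None

Calling run_sieve(mentions, [exact_match_pass, ...]) would slot the remaining passes in after exact match, in the precision order listed above.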
Raghunathan et al’s sieve method 4
• Results: On MUC data, P, R, F = (.905, .680, .777).
• High precision, moderate recall (as expected by design).
• Most noun recall errors are due to lack of semantic knowledge.
• E.g., recognizing that settlements are agreements, Gitano is the company.
27
Adding world knowledge
• Some anaphors seem to need complex knowledge and inference to resolve.
The city councillors denied the demonstrators a permit because they were communists.
The city councillors denied the demonstrators a permit because they feared violence.
Winograd, Terry. Understanding Natural Language. Academic Press, 1972.
28
Winograd Schema Challenge 1
• Resolve anaphors with different antecedents in minimally-different sentence pairs.
• World knowledge and inference are required.
The trophy would not fit in the brown suitcase because it was too [big | small].
What was too big?
• No “cheap tricks”: Searching corpora, using closed-world assumption.
• In practice: an artificial problem, rarely seen in text; the solution requires puzzle-solving, not language understanding.
Levesque, Hector. The Winograd schema challenge. Proc, AAAI- Spring Symposium on Logical Formalizations of Commonsense Reasoning, 2011.
29
Winograd Schema Challenge 2
Nonetheless, Rahman and Ng:
• Special resolver for Winograd sentences.
• 73% accuracy on test set of 282 pairs. (Baseline 50%; conventional system ~55%.)
• Uses FrameNet, Google searches, and from corpora: common narrative chains, connective relations, selectional restrictions.
• Cheap tricks or legitimate human-like method?
Rahman, Altaf and Ng, Vincent. Resolving complex cases of definite pronouns: The Winograd schema challenge. Proc, 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Jeju, 2012, 777–789.
30