Computational Linguistics
Copyright © 2017 Graeme
Hirst, Suzanne Stevenson
and Gerald Penn. All rights
reserved.
1. Introduction to computational linguistics
Gerald Penn
Department of Computer Science, University of Toronto
(many slides taken or adapted from others)
CSC 2501 / 485
Fall 2018
Reading: Jurafsky & Martin: 1.
Bird et al: 1, [2.3, 4].
Why would a computer need to use natural language?
Why would anyone want to talk to a computer?
• Computer as autonomous agent.
Has to talk and understand like a human.
• Computer as servant.
Has to take orders.
• Computer as personal assistant.
Has to take orders.
Schedule a meeting tomorrow with George.
Book me a flight to Vancouver for the
conference. Find out why our sales have
dropped in Lithuania. And write a thank-you
note to my grandma for the birthday present.
• Computer as researcher.
Needs to read and listen to everything.
• Computer as researcher.
Brings us the information we need.
Find me a well-rated hotel in or near
Stockholm where the food is good,
but not one that has any complaints
about noise.
Did people in 1878 really speak like the
characters in True Grit?
Is it true that if you turn a guinea pig
upside down, its eyes will fall out?
• Computer as researcher.
Organizes the information we need.
Please write a 500-word essay for me on “Why trees are important to our environment”. And also write a thank-you note to my grandma for the birthday present.
• Computer as researcher.
Wins television game shows.
IBM’s Watson on Jeopardy!, 16 February 2011
https://www.youtube.com/watch?v=yJptrlCVDHI
• Computer as language expert.
Translates our communications.
• Input:
Spoken
Written
• Output:
An action
A document or artifact
Some chosen text or speech
Some newly composed text or speech
Intelligent language processing
• Document applications
Searching for documents by meaning
Summarizing documents
Answering questions
Extracting information
Content and authorship analysis
Helping language learners
Helping people with disabilities
…
Example: Answering clinical questions at the point of care

In a patient with suspected MI, does thrombolysis decrease the risk of death even if it is administered ten hours after the onset of chest pain?
Example: Early detection of Alzheimer’s
• Look for deterioration in complexity of
vocabulary and syntax.
• Study: Compare three British writers
  Iris Murdoch (died of Alzheimer’s): rise in short-distance word repetition, p < .01
  P.D. James (no Alzheimer’s): n.s.
  Agatha Christie (suspected Alzheimer’s): rise in short-distance word repetition, p < .01

Spoken documents
• “Google for speech”: search, indexing, and browsing through audio documents.
• Speech summarization: automatically select the 5–20% most important sentences of audio documents.

Speech recognition for dysarthria
• Use articulation data to improve speech recognition for people with speech disabilities.
• Created a large database of dysarthric speech and articulation data for study.

Speech transformation for dysarthria
• Transform dysarthric speech to improve its comprehensibility.

Models of human language processing
• Highly multidisciplinary approach.
• Exploit the relation between linguistic knowledge and the statistical behaviour of words.

Models of children’s language acquisition
• Models of how children learn their language just from what they hear and observe.
• Apply machine-learning techniques to show how children can learn:
  ° to map words in a sentence to real-world objects;
  ° the relation between verbs and their arguments.

Mathematics of syntax and language
• Fowler’s algorithm (2009): first quasi-polynomial-time algorithm for parsing with Lambek categorial grammars.
• McDonald’s algorithm (2005): novel dependency-grammar parsing algorithm based upon minimum spanning trees.
• Parsing in freer-word-order languages.

Knowledge representation and reasoning

[Diagram: CL/NLP at the intersection of Linguistics, Information Science, Psycholinguistics, Machine Learning, and Signal Processing.]

Computational linguistics 1
• Anything that brings together computers and human languages …
• … using knowledge about the structure and meaning of language (i.e., not just string processing).
• The dream: “the linguistic computer”.
• Human-like competence in language.

Computational linguistics 2
• The development of computational models with natural language as input and/or output.
• Goal: A set of tools for processing language (semi-)automatically:
  To access linguistic information easily and to transform it, e.g., summarize, translate, ….
  To facilitate communication with a machine.
• “NLP”: natural language processing.

Computational linguistics 3
• Use of computational models in the study of natural language.
• Goal: A scientific theory of communication by language:
  To understand the structure of language and its use as a complex computational system.
  To develop the data structures and algorithms that can implement or approximate that system.

What does it mean to “understand” language?

The Turing Test
  Interrogator: In the first line of your sonnet which reads “Shall I compare thee to a summer’s day,” would not “a spring day” do as well or better?
  Witness: It wouldn’t scan.
  Interrogator: How about “a winter’s day”? That would scan all right.
  Witness: Yes, but nobody wants to be compared to a winter’s day.

  Interrogator: Would you say Mr Pickwick reminded you of Christmas?
  Witness: In a way.
  Interrogator: Yet Christmas is a winter’s day, and I do not think Mr Pickwick would mind the comparison.
  Witness: I don’t think you’re serious. By a winter’s day one means a typical winter’s day, rather than a special one like Christmas.

Alan Turing, “Computing machinery and intelligence”, Mind, 59, 1950, 433–460.

Using language is not necessarily understanding language

A computer psychiatrist (ELIZA)
  Patient: Men are all alike.
  ELIZA: In what way?
  Patient: They’re always bugging us about something or the other.
  ELIZA: Can you think of a specific example?
  Patient: Well, my boyfriend made me come here.
  ELIZA: Your boyfriend made you come here.
  Patient: He says I’m depressed.
  ELIZA: I’m sorry to hear you are depressed.
  Patient: It’s true. I am unhappy.
  ELIZA: Do you think coming here will help you not to be unhappy?
  Patient: Perhaps I could learn to get along with my mother.
  ELIZA: Tell me more about your family.
Joseph Weizenbaum, Computer Power and Human Reason, W.H. Freeman, 1976.

Levels of understanding 1
• 0. Keyword processing: limited knowledge of particular words or phrases, or their collocations.
  Chatterbots.
  Information retrieval.
  Web searching.

Levels of understanding 2
• 1. Limited linguistic ability: appropriate response to simple, highly constrained sentences.
  Database queries in NL: “Show all sales staff who exceeded their quota in May.”
  Simple NL interfaces: “I want to fly from Toronto to Vancouver next Sunday.”

Levels of understanding 3
• 2. Full text comprehension: understanding multi-sentence text and its relation to the “real world”.
  Conversational dialogue.
  Automatic knowledge acquisition.
• 3. Emotional understanding/generation:
  Responding to literature, poetry.
  Story narration.

Current research trends
• Emphasis on large-scale NLP applications.
  Combines language processing and machine learning.
• Availability of large text corpora; development of statistical methods.
  Combines grammatical theories and actual language use.
• Embedding structure into known problem spaces (especially neural networks!).
  Combines statistical pattern recognition and some sophisticated linguistic knowledge.

Building blocks of CL systems 1
• Language interpretation, language generation, and machine translation.
• Part-of-speech (PoS) tagging.
• Parsing and grammars.
• Reference resolution.
• Dialogue management.

Natural language interpretation
  Does Flight 207 serve lunch?
  → YNQ( ∃e SERVING(e) ∧ SERVER(e, flight-207) ∧ SERVED(e, lunch) )

Natural language generation
  (spray-1 (OBJECT paint-1)
           (PATH (path-1 (DESTINATION wall-1)))
           (CAUSER sally-1))
  → Sally sprayed paint onto the wall.

Machine translation
• Current systems are based purely on statistical associations.
• Getting incrementally better as they learn from more data.
• Probably more emergent knowledge of linguistics in there than we give them credit for, but it’s nearly impossible for us to extract.

[Screenshots: http://www.duchcov.cz/gymnazium/ translated by Google Translate, 14 July 2008; http://gymdux.sokolici.eu/index.php/informace/historie-koly translated by Google Translate, 3 August 2010 and 17 June 2013; http://www.gspsd.cz/historie/historie-skoly translated by Google Translate, 26 May 2014.]

Building blocks of CL systems 2
• Information extraction
  Chunking (instead of parsing).
  Template filling.
  Named-entity recognition.

Information extraction
  “Bridgestone Sports Co. said Friday it has set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be shipped to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990.”

  Tie-up-1:
    Relation: Tie-up
    Entities: Bridgestone Sports Co.; a local concern; a Japanese trading house
    Joint venture: Bridgestone Sports Taiwan Co.
    Activity: Activity-1
    Amount: NT$ 20,000,000

  Activity-1:
    Company: Bridgestone Sports Taiwan Co.
    Product: golf clubs
    Start date: January 1990

Building blocks of CL systems 3
• Lexical semantics
  Word sense disambiguation (WSD).
  Taxonomies of word senses.
  Analysis of verbs and other predicates.
• Computational morphology
• The structures that we are interested in are richer than strings: often hierarchical or scope-bearing.

Why is understanding hard? 1
  Nadia knows Ross left.
  [Parse tree: S → NP VP, with the embedded clause “Ross left” as an S inside the VP]
  KNOWS(Nadia, LEFT(Ross))
• Mapping from surface form to meaning is many-to-one: expressiveness.
  Nadia kisses Ross.
  Nadia gave Ross a kiss.
  Nadia gave a kiss to Ross.
  Ross is kissed by Nadia.
  → KISS(Nadia, Ross)
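The many-to-one mapping above can be made concrete with a toy normalizer. This is only a sketch: the regular expressions and the rendering of the KISS predicate are hand-written for this one example, not a general semantic parser.

```python
import re

# Toy normalizer: maps several surface paraphrases onto one logical form.
# The patterns are invented for this single example (illustration only).
PATTERNS = [
    (re.compile(r"^(\w+) kisses (\w+)\.$"), (1, 2)),          # active
    (re.compile(r"^(\w+) gave (\w+) a kiss\.$"), (1, 2)),     # double object
    (re.compile(r"^(\w+) gave a kiss to (\w+)\.$"), (1, 2)),  # PP dative
    (re.compile(r"^(\w+) is kissed by (\w+)\.$"), (2, 1)),    # passive
]

def logical_form(sentence):
    """Return KISS(agent, patient) for any of the paraphrases above."""
    for pattern, (agent, patient) in PATTERNS:
        m = pattern.match(sentence)
        if m:
            return f"KISS({m.group(agent)}, {m.group(patient)})"
    return None

for s in ["Nadia kisses Ross.", "Nadia gave Ross a kiss.",
          "Nadia gave a kiss to Ross.", "Ross is kissed by Nadia."]:
    print(s, "->", logical_form(s))   # all four map to KISS(Nadia, Ross)
```

Four different surface forms, one meaning representation; the hard part, of course, is doing this for more than one verb.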
Why is understanding hard? 2
• Mapping is one-to-many: ambiguity at all levels.
  Lexical.
  Syntactic.
  Semantic.
  Pragmatic.

Why is understanding hard? 3
  The lawyer walked to the bar and addressed the jury.
  The lawyer walked to the bar and ordered a beer.
  You held your breath and the door for me. (Alanis Morissette)

Lexical ambiguity
• Computational issues
  Representing the possible meanings of words, their frequencies, and their indications.
  Representing semantic relations between words.
  Maintaining adequate context.

Concordance lines for plant:
  automated manufacturing plant in Fremont
  vast manufacturing plant and distribution
  chemical manufacturing plant , producing viscose
  keep a manufacturing plant profitable without
  computer manufacturing plant and adjacent
  discovered at a St. Louis plant manufacturing
  copper manufacturing plant found that they
  copper wire manufacturing plant , for example
  ’s cement manufacturing plant in Alpena
  used to strain microscopic plant life from the
  zonal distribution of plant life .
  close-up studies of plant life and natural
  too rapid growth of aquatic plant life in water
  the proliferation of plant and animal life
  establishment phase of the plant virus life cycle
  that divide life into plant and animal kingdom
  many dangers to plant and animal life
  mammals . Animal and plant life are delicately
  vinyl chloride monomer plant , which is
  molecules found in plant and animal tissue
  Nissan car and truck plant in Japan is
  and Golgi apparatus of plant and animal cells
  union responses to plant closures .
  cell types found in the plant kingdom are
  company said the plant is still operating
  Although thousands of plant and animal species
  animal rather than plant tissues can be
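A Yarowsky-style decision list over collocations like those in the concordance above can be sketched in a few lines. The cues and scores below loosely echo the slide's log-likelihood figures, and the feature test is simplified to "the cue word occurs near plant"; a real system would distinguish adjacent collocations from wider windows.

```python
# Sketch of a decision-list WSD classifier for "plant"
# (sense A = living plant, sense B = factory). Cues and scores are
# illustrative simplifications of the figures on the slide.
DECISION_LIST = [
    (8.10, "life", "A"),
    (7.58, "manufacturing", "B"),
    (6.27, "animal", "A"),
    (4.70, "equipment", "B"),
    (4.39, "employee", "B"),
    (3.52, "species", "A"),
]

def disambiguate(context_words, default="B"):
    """Pick the sense of 'plant' from the highest-ranked matching cue."""
    words = set(w.lower() for w in context_words)
    # Try cues in order of decreasing score; first match wins.
    for _score, cue, sense in sorted(DECISION_LIST, reverse=True):
        if cue in words:
            return sense
    return default

print(disambiguate("too rapid growth of aquatic plant life in water".split()))  # A
print(disambiguate("automated manufacturing plant in Fremont".split()))         # B
```

The appeal of the decision list is exactly what the table shows: a single strong collocational cue usually settles the sense on its own.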
Decision list for plant
  LogL   Collocation                          Sense
  8.10   plant life                           → A
  7.58   manufacturing plant                  → B
  7.39   life (within ±2–10 words)            → A
  7.20   manufacturing (within ±2–10 words)   → B
  6.27   animal (within ±2–10 words)          → A
  4.70   equipment (within ±2–10 words)       → B
  4.39   employee (within ±2–10 words)        → B
  4.30   assembly plant                       → B
  4.10   plant closure                        → B
  3.52   plant species                        → A
  3.48   automate (within ±2–10 words)        → B
  3.45   microscopic plant                    → A
  ...

Syntactic ambiguity 1
  Nadia saw the cop with the binoculars.
  [Two parse trees: one attaches the PP “with the binoculars” to the VP (“saw … with the binoculars”), the other attaches it to the NP (“the cop with the binoculars”)]

Syntactic ambiguity 2
  Put the book in the box on the table.
  Put the book in the red book box.
  Visiting relatives can be trying.
  [Bracketings showing the alternative groupings, e.g. “visiting” as verb vs. adjective within the noun phrase]

Syntactic ambiguity 3
• These are absolutely everywhere. Some real headlines:
  Juvenile Court to Try Shooting Defendant
  Teacher Strikes Idle Kids
  Stolen Painting Found by Tree
  Clinton Wins on Budget, but More Lies Ahead
  Hospitals are Sued by 7 Foot Doctors
  Ban on Nude Dancing on Governor’s Desk
• Usually we don’t even notice; we’re that good at this kind of resolution.

Syntactic ambiguity 4
• Most syntactic ambiguity is local, resolved by syntactic or semantic context.
  Visiting relatives is trying.
  Visiting relatives are trying.
  Nadia saw the cop with the gun.
• Sometimes, resolution comes too fast!
  The cotton clothing is made from comes from Mississippi.
  “Garden-path” sentences.

Syntactic ambiguity 5
• Computational issues
  Representing the possible combinatorial structure of words.
  Capturing syntactic preferences and frequencies.
  Devising incremental parsing algorithms.

Semantic ambiguity
• A sentence can have more than one meaning, even when the words and structure are agreed on.
  Nadia wants a dog like Ross’s.
  Everyone here speaks two languages.
  Iraqi Head Seeks Arms.
  DCS Undergrads Make Nutritious Snacks.

Pragmatic ambiguity
• A sample dialogue:
  Nadia: Do you know who’s going to the party?
  Emily: Who?
  Nadia: I don’t know.
  Emily: Oh … I think Carol and Amy will be there.
• Computational issues
  Representing intentions and beliefs.
  Planning and plan recognition.
  Inferencing and diagnosis.

Need for domain knowledge 1
  Derivatization of the carboxyl function of retinoic acid by fluorescent or electroactive reagents prior to liquid chromatography was studied. Ferrocenylethylamine was synthesized and could be coupled to retinoic acid. The coupling reaction involved activation by diphenylphosphinyl chloride. The reaction was carried out at ambient temperature in 50 min with a yield of ca. 95%. The derivative can be detected by coulometric reduction (+100 mV) after on-line coulometric oxidation (+400 mV). The limit of detection was 1 pmol of derivative on-column, injected in a volume of 10 µl, but the limit of quantification was 10 pmol of retinoic acid.

S. El Mansouri, M. Tod, M. Leclercq, M. Porthault, J. Chalom, “Precolumn derivatization of retinoic acid for liquid chromatography with fluorescence and coulometric detection.” Analytica Chimica Acta, 293(3), 29 July 1994, 245–250.
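The PP-attachment ambiguity above ("Nadia saw the cop with the binoculars") can be counted mechanically. The sketch below runs CYK over a tiny hand-written grammar in Chomsky normal form, invented just for this sentence, and counts derivations instead of building trees, so the ambiguity shows up as a parse count greater than one.

```python
from collections import defaultdict

# Toy CNF grammar for the PP-attachment example (invented for this sentence).
RULES = [                 # (parent, left_child, right_child)
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "saw ... with the binoculars" (instrument reading)
    ("NP", "NP", "PP"),   # "the cop with the binoculars" (modifier reading)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {"Nadia": "NP", "saw": "V", "the": "Det",
           "cop": "N", "binoculars": "N", "with": "P"}

def count_parses(words, start="S"):
    n = len(words)
    # chart[i][j][X] = number of ways category X derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1][LEXICON[w]] = 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

sent = "Nadia saw the cop with the binoculars".split()
print(count_parses(sent))   # 2: VP attachment vs. NP attachment
```

With a broad-coverage grammar the counts explode combinatorially, which is why capturing attachment preferences and frequencies matters.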
Need for domain knowledge 2
  In doing sociology, lay and professional, every reference to the “real world”, even where the reference is to physical or biological events, is a reference to the organized activities of everyday life. Thereby, in contrast to certain versions of Durkheim that teach that the objective reality of social facts is sociology’s fundamental principle, the lesson is taken instead, and used as a study policy, that the objective reality of social facts as an ongoing accomplishment of the concerted activities of daily life, with the ordinary, artful ways of that accomplishment being by members known, used, and taken for granted is, for members doing sociology, a fundamental phenomenon.

Harold Garfinkel, Preface, Studies in Ethnomethodology, Prentice-Hall, 1967, page vii.

Levels of linguistic structure and analysis 1
• Phonology: the sound system of a language.
• Morphology: the minimal meaningful units of language (root of a word; suffixes and prefixes), and how they combine.
• Lexicon: the semantic and syntactic properties of words.

Levels of linguistic structure and analysis 2
• Syntax: the means of expressing meaning; how words can combine, and in what order.
• Semantics: the meaning of a sentence (a logical statement?).
• Pragmatics: the use of a sentence; pronominal referents; intentions; multi-sentence structure.

Focus of this course 1
• Grammars and parsing.
• Resolving “syntactic” ambiguities.
• Determining “argument structure”.
• Lexical semantics; resolving word-sense ambiguities.
• “Compositional” semantics.
• Understanding pronouns.

Focus of this course 2
• Current methods:
  Integrating statistical knowledge into grammars and parsing algorithms.
  Using text corpora as sources of linguistic knowledge.
Not included
• Machine translation, language models, text classification, part-of-speech tagging, …*
• Graph-theoretic and spectral methods%
• Speech recognition and synthesis*¶
• Cognitively based methods§
• Semantic inference,% semantic change/drift^
• Understanding dialogues and conversations¶
• Bias, fake-news detection, ethics in NLP$

* CSC 401 / 2511.  % CSC 2517.  ¶ CSC 2518.  § CSC 2540.  ^ CSC 2519.  $ CSC 2528.

What about “deep learning”?
• Yes, we’ll definitely cover neural methods.
• But “deep learning” is more of a euphemism in NLP: the depth of the networks hasn’t really paid off to the same extent that it has in other areas. It would be more accurate to call what we do “fat learning”.
• And deep/fat learning isn’t the answer to all of our problems …

The Case of Text-to-Speech Synthesis
• “We find that the proposed system significantly outperforms all other TTS systems, and results in an MOS comparable to that of the ground truth audio.”
  “ground truth audio” = human-generated speech; MOS = “mean opinion score”.

Mean Opinion Score
• Ask this guy and his friends to label the “naturalness” of speech samples, some of them human-generated, some synthesized.
• … but assume all humans are professional readers.
• … and don’t look too closely at what “naturalness” means.
The Case of Text-to-Speech Synthesis (continued)
• “We find that the proposed system significantly outperforms all other TTS systems, and results in an MOS comparable to that of the ground truth audio.”
  In other words, it almost failed to prove incomparability.

Disclaimer 1
• “The overall mean score of −0.270 ± 0.155 shows that raters have a small but statistically significant preference towards ground truth over our results.” (ICASSP 2018)

Disclaimer 2
• We asked Google to provide us with Tacotron prompts for the text that we used in our study, and they refused. So we used Polly, Amazon’s deep-learning-based TTS system.
• We asked Google which statistical significance test they used in the ICASSP 2018 paper, and they did not respond to us. So what we are going to describe now is what most recent publications have used for significance testing.

“Significantly More Natural”
• What most people do is run Friedman’s test, together with a post-hoc pairwise comparison called the Wilcoxon signed-rank test.
• This was probably selected because it is a “paired” test. Pairs = naturalness judgements by a single human subject on two systems/speakers.
• But “paired” implies a within-subjects protocol, whereas MOS computes means across human judges.
• And the null hypothesis is very strong: that the naturalness distributions of the two systems are the same.
• A better null hypothesis: P(A < B) = P(B < A).
• A weaker null hypothesis is harder to refute.
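The weaker null hypothesis P(A < B) = P(B < A) can be tested with an exact two-sided binomial sign test. A stdlib sketch, assuming ties have already been dropped before counting:

```python
from math import comb

def sign_test(prefers_a, prefers_b):
    """Exact two-sided binomial sign test of H0: P(A < B) = P(B < A).

    prefers_a / prefers_b are counts of judges preferring each system;
    ties are assumed to have been dropped before counting.
    """
    n = prefers_a + prefers_b
    k = min(prefers_a, prefers_b)
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# 8 judges prefer system A, 2 prefer system B: not significant at .05.
print(round(sign_test(8, 2), 4))   # 0.1094
```

Even an 8-to-2 split fails to reach significance at the .05 level, which illustrates why the weaker null hypothesis is genuinely harder to refute.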
Other Corrections
• Broaden the “human” population:
  Professional speakers.
  Native non-professional speakers of North American English.
  Native non-professional speakers of Indian English.
  Four dialects of Polly (AU, GB, IN, US).
• Include competitive non-deep-learning baselines:
  Four (non-DL) TTS systems from the 2013 Blizzard Challenge.
• Compare paired AB tests in addition to MOS calculations:
  Lower variance than absolute 1–5 judgements.
  Known to correlate with 6 of the 8 ITU P.85 scales for “quality”.
  Nominal data: need the Stuart–Maxwell test with a post-hoc binomial sign test.

Post-hoc Binomial Sign Test
• 2013 TTS was already indistinguishable from Indian speakers.
• Polly is still not distinguishable from the 2013 Blizzard systems.

Thurstone’s Law of Comparative Judgement
• First studied by the USAF for cockpit speech interfaces (Sampson & Navarro, 1984).
• The dialectal provenance of the sample is an important factor in determining judgements of naturalness.
• AB tests can be mapped onto an absolute scale. These values were computed using the Bradley–Terry–Luce (BTL) model with Morrisey–Gulliksen scaling.
• As observed earlier, a foreign accent helps TTS naturalness but harms human naturalness.

The Morals of the Speech Synthesis Story
• Did deep learning usher us into a new age of human-like speech production? Well, it didn’t drag us out of it.
• Deep learning is more competitive against professional readers and native speakers of North American English.
• The purpose of a scientific evaluation is to create reasonable circumstances under which the experimental hypothesis might fail.
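The mapping of AB tests onto an absolute scale can be illustrated with a plain Bradley–Terry–Luce fit via Hunter's MM iteration. This is a sketch of the basic BTL model only, not the Morrisey–Gulliksen scaling actually used for the values on the slides; the win counts below are invented.

```python
# Sketch of Bradley-Terry-Luce fitting by the MM iteration: maps pairwise
# AB preferences onto a single strength scale. Data are invented.
def btl_strengths(wins, iters=200):
    """wins[i][j] = number of times item i was preferred over item j."""
    m = len(wins)
    p = [1.0] * m
    for _ in range(iters):
        new = []
        for i in range(m):
            w_i = sum(wins[i])                       # total wins of item i
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(m) if j != i)
            new.append(w_i / denom if denom else p[i])
        total = sum(new)
        p = [x / total for x in new]                 # normalize to sum to 1
    return p

# Three "systems": 0 usually beats 1, and 1 usually beats 2.
wins = [[0, 9, 9],
        [1, 0, 9],
        [1, 1, 0]]
p = btl_strengths(wins)
print([round(x, 3) for x in p])   # strengths come out in decreasing order
```

The fitted strengths put the three systems on one comparable scale, which is exactly what lets AB preferences be read off as absolute naturalness values.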