What is Natural Language Processing (NLP)?
From Wikipedia: “Natural language processing (NLP) is a subfield of linguistics, computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to process and analyze large amounts of natural language data.”
What is Computational Linguistics (CL)?
From the ACL website: “Computational linguistics is the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena. These models may be “knowledge-based” (“hand-crafted”) or “data-driven” (“statistical” or “empirical”). Work in computational linguistics is in some cases motivated from a scientific perspective in that one is trying to provide a computational explanation for a particular linguistic or psycholinguistic phenomenon; and in other cases the motivation may be more purely technological in that one wants to provide a working component of a speech or natural language system.”
What is the relation between CL and NLP?
I Most of the time the two terms are used interchangeably.
I In practice, some researchers are more interested in building computational models that explain some linguistic phenomenon (“Can the structure of a sentence be represented with a tree?”), while others are more interested in building a working system or an NLP application (a Machine Translation (MT) system, a dialogue system).
I There are more and more of the latter types as these applications are put to everyday use. Have you ordered anything with Alexa?
NLP Applications
I Sentiment, opinion, emotion analysis
I Information Extraction, Knowledge Acquisition
I Question Answering, Machine Reading
I Machine Translation
I Text summarization
I Spoken Language Understanding, Dialogue systems
I Many others
Sentiment Analysis
Information Extraction
Machine Translation
Machine Reading
Question Answering
Text Summarization
Dialogue Systems
NLP Tasks
Many of these problems are complex and cannot be solved with a single model. So they are decomposed into smaller self-contained problems that can be solved individually and then chained together into a pipeline system. These problems are called tasks in NLP convention.
Take Information Extraction (IE), for example.
I Named Entity Recognition (NER)
I Coreference resolution
I Relation Extraction
I Knowledge base population
Pipeline systems are susceptible to error propagation. Anytime you can combine individual tasks and do joint inference, you can usually improve system performance.
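To get a feel for error propagation, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each of the four IE stages is correct 90% of the time and that errors are independent, so the chance that an input passes through the whole pipeline without error is the product of the per-stage accuracies. The accuracy figures and the function name are hypothetical, not measurements.

```python
from math import prod

# Hypothetical per-stage accuracies for the IE pipeline (illustrative only).
stage_accuracy = {
    "named_entity_recognition": 0.90,
    "coreference_resolution": 0.90,
    "relation_extraction": 0.90,
    "knowledge_base_population": 0.90,
}

def pipeline_accuracy(accuracies):
    """Probability that every stage is correct, assuming independent errors."""
    return prod(accuracies)

end_to_end = pipeline_accuracy(stage_accuracy.values())
print(f"end-to-end accuracy: {end_to_end:.3f}")
```

Under these assumptions, four stages that each look strong on their own still lose roughly a third of the inputs end to end, which is the motivation for joint inference.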
NLP Tasks are often influenced by linguistic conceptions
Linguistic layer    NLP task
Morphology          Word tokenization/segmentation, morphological analysis
Syntax              POS tagging, syntactic parsing (dependency/constituency)
Semantics           Semantic role labeling, meaning representation parsing
Discourse           Discourse parsing
Pragmatics          Dialogue act tagging
Formal characterization of NLP tasks
I Simple classification problems
  I Sentiment/opinion/emotion analysis, text classification, word sense disambiguation (WSD), etc.
I Sequence labeling problems
  I Tokenization, POS tagging, NP-chunking, NER, code switching, dialogue act tagging, multi-word expression (MWE) detection, etc.
I Problems that can be modeled as trees and graphs
  I Syntactic (dependency and constituent) parsing, meaning representation parsing
I Sequence-to-sequence problems
  I Machine Translation, Text Summarization, dialogue systems(?)
Steps in developing an application
Suppose you are asked (or want) to develop an NLP application. You need to think about:
I How do I decompose the application into solvable tasks, given the current state of the art?
I For each task, what is the most appropriate machine learning method?
I What type of training data should I create (purchase, license)?
General formulation of Learning
Many NLP problems can be formulated mathematically as optimization:
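$$\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \theta)$$
where,
I $x$ is an element of a set $\mathcal{X}$
I $y$ is an element of the set $\mathcal{Y}(x)$
I $\Psi$ is a scoring function or a model, which maps from the set $\mathcal{X} \times \mathcal{Y}$ to real numbers
I $\theta$ is a vector of parameters for $\Psi$
I $\hat{y}$ is the predicted output, which is chosen to maximize the scoring function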
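The formulation above can be made concrete with a minimal sketch. It assumes a linear scoring function over (word, label) features and a two-label sentiment task. The weights in `THETA`, the label set, and the function names `score` and `predict` are all hypothetical and chosen only for illustration.

```python
# Hypothetical parameter vector theta, indexed by (feature, label) pairs.
THETA = {
    ("great", "positive"): 1.5,
    ("great", "negative"): -1.0,
    ("boring", "positive"): -1.2,
    ("boring", "negative"): 1.4,
}
LABELS = ["positive", "negative"]  # the output set Y(x); here it is the same for every x

def score(x, y, theta):
    """Scoring function Psi(x, y; theta): sum the weights of the (word, label) features that fire."""
    return sum(theta.get((word, y), 0.0) for word in x)

def predict(x, theta, labels=LABELS):
    """y_hat = argmax over y in Y(x) of Psi(x, y; theta)."""
    return max(labels, key=lambda y: score(x, y, theta))

print(predict(["a", "great", "movie"], THETA))
print(predict(["a", "boring", "movie"], THETA))
```

With only two candidate outputs, the argmax is a plain loop over the labels. For structured outputs such as tag sequences, the same formulation holds but the argmax needs a real search algorithm.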
Learning and search
I Search is the procedure of finding the output $\hat{y}$ that gets the best score with respect to the input $x$ by computing the argmax of the scoring function
  I The search can be simple if it’s a matter of finding the best label among a small set of labels (e.g., sentiment analysis)
  I Or it needs a non-trivial search algorithm (e.g., finding the best part-of-speech sequence for a sentence)
I Learning is the process of finding the parameters $\theta$.
  I This is done by optimizing some function of the model and the labeled data in a training process
  I The parameters are usually continuous, and learning algorithms generally rely on numerical optimization to identify vectors of real-valued numbers
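As one illustration of how learning and search fit together, the sketch below trains a perceptron, one of the simplest learners, on a toy sentiment set. Search is the argmax over two labels, and learning adjusts the weights whenever the search returns the wrong label. The data, the (word, label) feature scheme, and all function names are assumptions made for the example.

```python
from collections import defaultdict

def score(x, y, theta):
    """Linear score: sum of the weights of the (word, label) features."""
    return sum(theta[(word, y)] for word in x)

def predict(x, theta, labels):
    """Search: pick the label with the highest score."""
    return max(labels, key=lambda y: score(x, y, theta))

def train_perceptron(data, labels, epochs=5):
    """Learning: on each mistake, reward the gold label and penalize the guess."""
    theta = defaultdict(float)
    for _ in range(epochs):
        for x, gold in data:
            guess = predict(x, theta, labels)
            if guess != gold:
                for word in x:
                    theta[(word, gold)] += 1.0
                    theta[(word, guess)] -= 1.0
    return theta

# Hypothetical toy training set.
TOY_DATA = [
    (["great", "movie"], "positive"),
    (["boring", "movie"], "negative"),
    (["great", "acting"], "positive"),
    (["boring", "plot"], "negative"),
]
LABELS = ["positive", "negative"]

theta = train_perceptron(TOY_DATA, LABELS)
print(predict(["great", "plot"], theta, LABELS))
```

The perceptron update is not gradient descent on a smooth objective, but it shows the same loop that most learners share: predict with the current parameters, compare with the labeled data, adjust.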
Learning Methods for Simple Classification
I Supervised Learning
  I Generative
    I Naive Bayes
  I Discriminative
    I Linear models: Logistic Regression, Perceptron, Support Vector Machines
    I Non-linear models (neural network models or deep learning, multiple layers): MLP, CNN, RNN
I Unsupervised Learning
  I EM-based algorithms (forward-backward, inside-outside), clustering algorithms (K-means, EM, Hierarchical)
I Semi-supervised methods
I Search is usually trivial, and involves finding the label that gets the highest score.
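For the generative case, a minimal Naive Bayes sketch is given below. It assumes bag-of-words features, add-one smoothing, and that words unseen in training are simply skipped at prediction time. The toy data and function names are hypothetical. Note that the search step is a single `max` over the labels.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(data, smoothing=1.0):
    """Estimate log P(label) and log P(word | label) with additive smoothing."""
    label_counts = Counter(label for _, label in data)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in data:
        word_counts[label].update(words)
        vocab.update(words)
    log_prior = {y: math.log(c / len(data)) for y, c in label_counts.items()}
    log_likelihood = {}
    for y in label_counts:
        total = sum(word_counts[y].values()) + smoothing * len(vocab)
        log_likelihood[y] = {
            w: math.log((word_counts[y][w] + smoothing) / total) for w in vocab
        }
    return log_prior, log_likelihood

def classify(words, log_prior, log_likelihood):
    """Return the label with the highest log joint probability."""
    def log_joint(y):
        # Words outside the training vocabulary are skipped (an assumption).
        return log_prior[y] + sum(
            log_likelihood[y][w] for w in words if w in log_likelihood[y]
        )
    return max(log_prior, key=log_joint)

# Hypothetical toy training set.
TOY_DATA = [
    (["great", "movie"], "positive"),
    (["boring", "movie"], "negative"),
    (["great", "acting"], "positive"),
    (["boring", "plot"], "negative"),
]
LOG_PRIOR, LOG_LIKELIHOOD = train_naive_bayes(TOY_DATA)
print(classify(["great", "plot"], LOG_PRIOR, LOG_LIKELIHOOD))
```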
Sequence Labeling methods
Sequence labeling methods can be viewed as classification combined with a search algorithm
I Supervised Learning
  I Generative
    I Hidden Markov Models (HMM): Naive Bayes combined with the Viterbi Algorithm
  I Discriminative
    I Linear models: Conditional Random Fields (CRF), i.e., Logistic Regression combined with the Viterbi Algorithm; Perceptron combined with the Viterbi Algorithm
    I Non-linear models: LSTM-CRF, a form of RNN combined with a search algorithm
I Unsupervised Learning
  I Forward-backward, a form of EM algorithm for sequences
I Semi-supervised methods
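The search half of sequence labeling can be sketched with the Viterbi algorithm over a tiny HMM. The two-tag tag set and all probabilities below are made up for illustration, and the helper names (`logs`, `viterbi`) are assumptions. Scores are kept in log space so that products of probabilities become sums.

```python
import math

def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(p) for key, p in table.items()}

# Hypothetical toy HMM parameters.
TAGS = ["N", "V"]
LOG_START = logs({"N": 0.8, "V": 0.2})
LOG_TRANS = {"N": logs({"N": 0.3, "V": 0.7}), "V": logs({"N": 0.6, "V": 0.4})}
LOG_EMIT = {
    "N": logs({"dogs": 0.5, "bark": 0.1, "fish": 0.4}),
    "V": logs({"dogs": 0.1, "bark": 0.6, "fish": 0.3}),
}

def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the highest-scoring tag sequence for a non-empty word list."""
    # best[i][t]: best log-score of any tag sequence for words[:i+1] ending in t
    best = [{t: log_start[t] + log_emit[t].get(words[0], -math.inf) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_trans[p][t])
            best[i][t] = (best[i - 1][prev] + log_trans[prev][t]
                          + log_emit[t].get(words[i], -math.inf))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

print(viterbi(["dogs", "bark"], TAGS, LOG_START, LOG_TRANS, LOG_EMIT))
```

The same dynamic program works for discriminative models: swap the HMM log-probabilities for the local scores of a CRF or perceptron and the search is unchanged.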
Tree-based learning algorithms
Tree-based methods can also be viewed as classification combined with a search algorithm
I Supervised Learning
  I Generative parsing models
    I Naive Bayes combined with the CKY algorithm for constituent parsing
  I Discriminative
    I Perceptron combined with Beam Search for dependency parsing
    I Perceptron or logistic regression combined with greedy search for constituent parsing
  I Non-linear models:
    I LSTM combined with CKY
I Unsupervised Learning
  I Inside-outside, a form of EM algorithm for trees
I Semi-supervised methods
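As a rough illustration of search over trees, the sketch below runs probabilistic CKY with a toy grammar in Chomsky normal form. A small PCFG stands in here for the generative parsing model. The grammar, its probabilities, and the function name `cky` are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy PCFG in Chomsky normal form (probabilities are made up).
BINARY = {  # (left child, right child) -> [(parent, probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
LEXICAL = {  # word -> [(label, probability)]
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "chased": [("V", 1.0)],
    "dogs": [("NP", 0.4)],
}

def cky(words):
    """Return (log-probability, bracketed tree) of the best S parse, or None."""
    n = len(words)
    # chart[(i, j)][label] = (best log-prob, tree) for the span words[i:j]
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        for label, p in LEXICAL.get(word, []):
            chart[(i, i + 1)][label] = (math.log(p), f"({label} {word})")
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # split point between the two children
                for left, (lp, ltree) in chart[(i, k)].items():
                    for right, (rp, rtree) in chart[(k, j)].items():
                        for parent, p in BINARY.get((left, right), []):
                            cand = lp + rp + math.log(p)
                            if cand > cell.get(parent, (-math.inf, ""))[0]:
                                cell[parent] = (cand, f"({parent} {ltree} {rtree})")
    return chart[(0, n)].get("S")

result = cky("the dog chased the cat".split())
print(result[1] if result else "no parse")
```

The chart is filled from short spans to long ones, so every cell only depends on cells that are already complete. This is the tree-shaped counterpart of Viterbi for sequences.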
Learning and linguistic knowledge
I The relative importance of learning and linguistic knowledge has been a recurring topic of debate.
  I “Every time I fire a linguist, the performance of our speech recognition system goes up.”
I Linguistic knowledge figures prominently in early rule-based systems
I Statistical systems in the 1990s and early 2000s focus on linguistically inspired tasks with carefully engineered features
I Deep learning methods enabled end-to-end systems
  I Particularly effective in areas like MT where there are large-scale training sets
  I “Natural language processing from scratch”
  I Model architectures are still inspired by linguistic theories
I The debate is far from being settled. Linguistic knowledge is particularly needed in problems that require “deep understanding”.
What type of math is needed
I Probabilities
  I The output of a classifier is often expressed in terms of a probability distribution: “This review has an 82% probability of being positive, and an 18% probability of being negative”
  I For some models (e.g., Naive Bayes), the parameters $\theta$ are in the form of probabilities
I Calculus: know how to compute derivatives for various functions
  I The most basic form of machine learning is optimization based on gradient descent/ascent to find the maxima or minima of a function
I Linear algebra: increasingly, you have to manipulate vectors and matrices (or “tensors”)
I Some of you may not have taken calculus. We plan to have some tutorials on this topic.
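Two of these ingredients fit in a few lines of plain Python. The sketch below assumes a softmax to turn classifier scores into a probability distribution and a one-dimensional gradient descent loop. The scores, the objective $f(w) = (w - 3)^2$, the learning rate, and the function names are all chosen for illustration. The scores 2.0 and 0.5 were picked so that the resulting distribution is close to the 82%/18% review example.

```python
import math

def softmax(scores):
    """Turn real-valued scores into a probability distribution."""
    shift = max(scores)  # subtracting the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Minimize a one-dimensional function given its derivative `grad`."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)  # step against the gradient
    return w

# f(w) = (w - 3)^2 has derivative f'(w) = 2 (w - 3) and its minimum at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), start=0.0)

# Hypothetical scores for the labels (positive, negative).
probs = softmax([2.0, 0.5])
print(round(w_min, 4), [round(p, 3) for p in probs])
```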
Where do you need linguistics?
I Most of the NLP tasks rely on linguistic concepts that are intuitive.
  I The part of speech of a word: verbs, nouns, adjectives
  I Named entities: person, organization, geographic entities, geo-political entities, etc.
  I Reviews: This is a positive review
I Some linguistic concepts require some formal linguistic training
  I Syntactic trees (constituency or dependency)
I When breaking down an application into smaller tasks that you can solve with machine learning, you need a good understanding of different layers of analysis: morphology, syntax, semantics, discourse and dialogues, etc.
I You also need good linguistic intuition to come up with good features or architectures in your statistical model
What kind of programming skills do you need?
I Python, Python, Python!
I Python libraries:
  I NumPy, TensorFlow (version 1.12 or 1.13), Keras (now part of TensorFlow), PyTorch, MXNet
  I We are going to use PyTorch for some of our projects
I We will hold tutorials on PyTorch during recitation for people who are new to it.
Course requirements
I Prerequisites
  I CS114 or permission of the instructor (talk to me if you’re not sure you should take the course)
  I Programming experience: familiar with Python and Python packages such as NumPy and PyTorch
  I Statistics/machine learning/math background and ability to pick up some linguistics
I Disciplines Relevant to NLP
  I Linguistics
  I Computer science, Artificial Intelligence
  I Machine learning
Textbooks
I Textbooks
  I Required: Introduction to Natural Language Processing by Jacob Eisenstein (pre-publication version can be downloaded from https://github.com/jacobeisenstein/gt-nlp-class/blob/master/notes/eisenstein-nlp-notes.pdf)
  I Recommended: Aston Zhang, Zack C. Lipton, Mu Li, Alex J. Smola: Dive into Deep Learning (online book, can be accessed from https://d2l.ai)
I Supplemental online material
  I PyTorch tutorials: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
  I PyTorch Documentation: https://pytorch.org/docs/stable/index.html
Perusall
I I plan to experiment with Perusall: https://app.perusall.com/
I Perusall allows users to highlight specific areas of a PDF document and enter comments
I When possible, I plan to post the slides to Perusall before each class so that you can enter comments and questions. I can then address those questions during the class.
I To access the course, enter the access code: XUE-XPWJB
Course work
I Participation (5%)
  I Credit for active participation in class discussions and for asking questions.
I 4 Projects (50%)
  I Projects are more open-ended, though starter code is provided for some projects. Projects require experiments and a write-up.
I 3-4 Assignments (30%)
  I Programming assignments for well-defined problems
I 1 final Quiz (15%)
  I The quiz tests important concepts covered in the course.
I Academic integrity: You should finish homework assignments, exams, and project reports on your own unless a project is explicitly stated as a collaborative project. Late projects are subject to grade deduction.