Analysis Methods in Neural Language Processing: A Survey
Yonatan Belinkov1,2 and James Glass1
1MIT Computer Science and Artificial Intelligence Laboratory
2Harvard School of Engineering and Applied Sciences
Cambridge, MA, USA
{belinkov, glass}@mit.edu
Abstract
The field of natural language processing has
seen impressive progress in recent years,
with neural network models replacing many
of the traditional systems. A plethora of new
models have been proposed, many of which
are thought to be opaque compared to their
feature-rich counterparts. This has led re-
searchers to analyze, interpret, and evalu-
ate neural networks in novel and more fine-
grained ways. In this survey paper, we re-
view analysis methods in neural language
processing, categorize them according to
prominent research trends, highlight exist-
ing limitations, and point to potential direc-
tions for future work.
1 Introduction
The rise of deep learning has transformed the
field of natural language processing (NLP) in re-
cent years. Models based on neural networks
have obtained impressive improvements in vari-
ous tasks, including language modeling (Mikolov
et al., 2010; Jozefowicz et al., 2016), syntactic
parsing (Kiperwasser and Goldberg, 2016), ma-
chine translation (MT) (Bahdanau et al., 2014;
Sutskever et al., 2014), and many other tasks; see
Goldberg (2017) for example success stories.
This progress has been accompanied by a myr-
iad of new neural network architectures. In many
cases, traditional feature-rich systems are being re-
placed by end-to-end neural networks that aim to
map input text to some output prediction. As end-
to-end systems are gaining prevalence, one may
point to two trends. First, some push back against
the abandonment of linguistic knowledge and call
for incorporating it inside the networks in different
ways.1 Second, others strive to better understand how neural language processing models work. This theme
of analyzing neural networks has connections to
the broader work on interpretability in machine
learning, along with specific characteristics of the
NLP field.
1 See, for instance, Noah Smith’s invited talk at ACL 2017: vimeo.com/234958746. See also a recent debate on this matter by Chris Manning and Yann LeCun: www.youtube.com/watch?v=fKk9KhGRBdI. (Videos accessed on December 11, 2018.)
Why should we analyze our neural NLP mod-
els? To some extent, this question falls into
the larger question of interpretability in machine
learning, which has been the subject of much de-
bate in recent years.2 Arguments in favor of in-
terpretability in machine learning usually mention
goals like accountability, trust, fairness, safety,
and reliability (Doshi-Velez and Kim, 2017; Lip-
ton, 2016). Arguments against typically stress per-
formance as the most important desideratum. All
these arguments naturally apply to machine learn-
ing applications in NLP.
2 See, for example, the NIPS 2017 debate: www.youtube.com/watch?v=2hW05ZfsUUo. (Accessed on December 11, 2018.)
In the context of NLP, this question needs to
be understood in light of earlier NLP work, often
referred to as feature-rich or feature-engineered
systems. In some of these systems, features are
more easily understood by humans – they can be
morphological properties, lexical classes, syntac-
tic categories, semantic relations, etc. In theory,
one could observe the importance assigned by sta-
tistical NLP models to such features in order to
gain a better understanding of the model.3 In con-
trast, it is more difficult to understand what hap-
pens in an end-to-end neural network model that
takes input (say, word embeddings) and generates
an output (say, a sentence classification). Much of
the analysis work thus aims to understand how lin-
guistic concepts that were common as features in
NLP systems are captured in neural networks.
3 Nevertheless, one could question how feasible such an analysis is; consider, for example, interpreting support vectors in high-dimensional support vector machines (SVMs).
As the analysis of neural networks for language
is becoming increasingly prevalent, neural net-
works in various NLP tasks are being analyzed;
different network architectures and components
are being compared; and a variety of new anal-
ysis methods are being developed. This survey
aims to review and summarize this body of work,
highlight current trends, and point to existing lacu-
nae. It organizes the literature into several themes.
Section 2 reviews work that targets a fundamen-
tal question: what kind of linguistic information
is captured in neural networks? We also point to
limitations in current methods for answering this
question. Section 3 discusses visualization meth-
ods, and emphasizes the difficulty in evaluating vi-
sualization work. In Section 4 we discuss the com-
pilation of challenge sets, or test suites, for fine-
grained evaluation, a methodology that has old
roots in NLP. Section 5 deals with the generation
and use of adversarial examples to probe weak-
nesses of neural networks. We point to unique
characteristics of dealing with text as a discrete
input and how different studies handle them. Sec-
tion 6 summarizes work on explaining model pre-
dictions, an important goal of interpretability re-
search. This is a relatively under-explored area,
and we call for more work in this direction. Sec-
tion 7 mentions a few other methods that do not
fall neatly into one of the above themes. In the
conclusion, we summarize the main gaps and po-
tential research directions for the field.
The paper is accompanied by online supple-
mentary materials that contain detailed references
for studies corresponding to Sections 2, 4, and
5 (Tables SM1, SM2, and SM3, respectively),
available at http://boknilev.github.io/nlp-analysis-methods.
Before proceeding, we briefly mention some
earlier work of a similar spirit.
A historical note. Reviewing the vast literature
on neural networks for language is beyond our
scope.4 However, we mention here a few repre-
sentative studies that focused on analyzing such
networks, in order to illustrate how recent trends
have roots that go back to before the recent deep
learning revival.
4 For instance, a neural network that learns distributed representations of words was developed already in Miikkulainen and Dyer (1991). See Goodfellow et al. (2016, chapter 12.4) for references to other important milestones.
Rumelhart and McClelland (1986) built a feed-
forward neural network for learning the English
past tense and analyzed its performance on a va-
riety of examples and conditions. They were es-
pecially concerned with the performance over the
course of training, as their goal was to model the
past form acquisition in children. They also ana-
lyzed a scaled-down version having 8 input units
and 8 output units, which allowed them to de-
scribe it exhaustively and examine how certain
rules manifest in network weights.
In his seminal work on recurrent neural net-
works (RNNs), Elman trained networks on syn-
thetic sentences in a language prediction task (El-
man, 1989, 1990, 1991). Through extensive anal-
yses, he showed how networks discover the no-
tion of a word when predicting characters; cap-
ture syntactic structures like number agreement;
and acquire word representations that reflect lexi-
cal and syntactic categories. Similar analyses were
later applied to other networks and tasks (Har-
ris, 1990; Niklasson and Linåker, 2000; Pollack,
1990; Frank et al., 2013).
While Elman’s work was limited in some ways, such as in its evaluation of generalization and its coverage of linguistic phenomena—as Elman himself recognized (Elman, 1989)—it introduced methods that
are still relevant today: from visualizing network
activations in time, through clustering words by
hidden state activations, to projecting representa-
tions to dimensions that emerge as capturing prop-
erties like sentence number or verb valency. The
sections on visualization (Section 3) and identi-
fying linguistic information (Section 2) contain
many examples of these kinds of analysis.
2 What linguistic information is captured in neural networks?
Neural network models in NLP are typically
trained in an end-to-end manner on input-output
pairs, without explicitly encoding linguistic fea-
tures. Thus, a primary question is the following:
what linguistic information is captured in neural
networks? When examining answers to this ques-
tion, it is convenient to consider three dimensions:
which methods are used for conducting the analy-
sis, what kind of linguistic information is sought,
and which objects in the neural network are be-
ing investigated. Table SM1 (in the supplementary
materials) categorizes relevant analysis work ac-
cording to these criteria. In the next sub-sections,
we discuss trends in analysis work along these
lines, followed by a discussion of limitations of
current approaches.
http://boknilev.github.io/nlp-analysis-methods
http://boknilev.github.io/nlp-analysis-methods
2.1 Methods
The most common approach for associating neu-
ral network components with linguistic properties
is to predict such properties from activations of
the neural network. Typically, in this approach
a neural network model is trained on some task
(say, MT) and its weights are frozen. Then, the
trained model is used for generating feature repre-
sentations for another task by running it on a cor-
pus with linguistic annotations and recording the
representations (say, hidden state activations). An-
other classifier is then used for predicting the prop-
erty of interest (say, part-of-speech (POS) tags).
The performance of this classifier is used for eval-
uating the quality of the generated representations,
and by proxy that of the original model. This kind
of approach has been used in numerous papers in
recent years; see Table SM1 for references.5 It is
referred to by various names, including “auxiliary
prediction tasks” (Adi et al., 2017b), “diagnostic
classifiers” (Veldhoen et al., 2016), and “probing
tasks” (Conneau et al., 2018).
5 A similar method has been used to analyze hierarchical structure in neural networks trained on arithmetic expressions (Veldhoen et al., 2016; Hupkes et al., 2018).
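To make this recipe concrete, the following minimal sketch trains a logistic regression probe with scikit-learn. The representations and POS labels here are random placeholders standing in for the activations of a frozen, trained model run over an annotated corpus:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: in a real analysis, `states` would be hidden-state
# activations from a frozen, pre-trained model (e.g., an NMT encoder)
# run over an annotated corpus, and `pos_tags` the gold annotations.
rng = np.random.RandomState(0)
n_tokens, hidden_dim, n_tags = 1000, 64, 12
states = rng.randn(n_tokens, hidden_dim)
pos_tags = rng.randint(0, n_tags, size=n_tokens)

X_train, X_test, y_train, y_test = train_test_split(
    states, pos_tags, test_size=0.2, random_state=0)

# The probe: a simple classifier trained on top of frozen representations.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# Probing accuracy serves as a proxy for how much POS information
# the representations encode.
print("probing accuracy:", probe.score(X_test, y_test))
```

One would typically compare the resulting accuracy against simple baselines (e.g., a majority-class predictor) to put the number in context.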
As an example of this approach, let us
walk through an application to analyzing syn-
tax in neural machine translation (NMT) by
Shi et al. (2016b). In this work, two NMT
models were trained on standard parallel data
– English→French and English→German. The
trained models (specifically, the encoders) were
run on an annotated corpus and their hidden states
were used for training a logistic regression clas-
sifier that predicts different syntactic properties.
The authors concluded that the NMT encoders
learn significant syntactic information at both
word-level and sentence-level. They also com-
pared representations at different encoding layers
and found that “local features are somehow pre-
served in the lower layer whereas more global,
abstract information tends to be stored in the up-
per layer.” These results demonstrate the kind of
insights that the classification analysis may lead
to, especially when comparing different models or
model components.
Other methods for finding correspondences be-
tween parts of the neural network and certain
properties include counting how often attention
weights agree with a linguistic property like
anaphora resolution (Voita et al., 2018) or directly
computing correlations between neural network
activations and some property, for example, cor-
relating RNN state activations with depth in a
syntactic tree (Qian et al., 2016a) or with Mel-
frequency cepstral coefficient (MFCC) acoustic
features (Wu and King, 2016). Such correspon-
dence may also be computed indirectly. For in-
stance, Alishahi et al. (2017) defined an ABX dis-
crimination task to evaluate how a neural model of
speech (grounded in vision) encoded phonology.
Given phoneme representations from different lay-
ers in their model, and three phonemes, A, B, and
X, they compared whether the model representa-
tion for X is closer to A or B. This discrimina-
tion task enabled them to draw conclusions about
which layers encode phonology better, observing
that lower layers generally encode more phonolog-
ical information.
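A minimal sketch of the ABX decision rule, assuming cosine similarity over placeholder representation vectors (the actual representations and distance measure are model-specific):

```python
import numpy as np

def cosine(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def abx_correct(a, b, x):
    # In an ABX trial, x belongs to the same category as a, so the
    # trial counts as correct if x is closer to a than to b.
    return cosine(x, a) > cosine(x, b)

# Placeholder representations of three phoneme tokens taken from one
# layer of the model; accuracy is averaged over many such trials and
# computed separately per layer to compare layers.
rng = np.random.RandomState(0)
a, b, x = rng.randn(3, 256)
print(abx_correct(a, b, x))
```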
2.2 Linguistic phenomena
Different kinds of linguistic information have been
analyzed, ranging from basic properties like sen-
tence length, word position, word presence, or
simple word order, to morphological, syntactic,
and semantic information. Phonetic/phonemic in-
formation, speaker information, and style and ac-
cent information have been studied in neural net-
work models for speech, or in joint audio-visual
models. See Table SM1 for references.
While it is difficult to synthesize a holistic pic-
ture from this diverse body of work, it appears
that neural networks are able to learn a substan-
tial amount of information on various linguistic
phenomena. These models are especially success-
ful at capturing frequent properties, while some
rare properties are more difficult to learn. Linzen
et al. (2016), for instance, found that long short-
term memory (LSTM) language models are able
to capture subject-verb agreement in many com-
mon cases, while direct supervision is required for
solving harder cases.
Another theme that emerges in several studies
is the hierarchical nature of the learned represen-
tations. We have already mentioned such findings
regarding NMT (Shi et al., 2016b) and a visually
grounded speech model (Alishahi et al., 2017).
Hierarchical representations of syntax were also
reported to emerge in other RNN models (Blevins
et al., 2018).
Finally, a couple of papers discovered that mod-
els trained with latent trees perform better on nat-
ural language inference (NLI) (Williams et al.,
2018; Maillard and Clark, 2018) than ones trained
with linguistically-annotated trees. Moreover, the
trees in these models do not resemble syntactic
trees corresponding to known linguistic theories,
which casts doubt on the importance of syntax learning in the underlying neural network.6
2.3 Neural network components
In terms of the object of study, various neural network components have been investigated, including word embeddings, RNN hidden states or gate
activations, sentence embeddings, and attention
weights in sequence-to-sequence (seq2seq) mod-
els. Generally less work has analyzed convolu-
tional neural networks (CNNs) in NLP, but see
Jacovi et al. (2018) for a recent exception. In
speech processing, researchers have analyzed lay-
ers in deep neural networks for speech recognition
and different speaker embeddings. Some analy-
sis has also been devoted to joint language-vision
or audio-vision models, or to similarities between
word embeddings and convolutional image rep-
resentations. Table SM1 provides detailed refer-
ences.
2.4 Limitations
The classification approach may find that a cer-
tain amount of linguistic information is captured
in the neural network. However, this does not
necessarily mean that the information is used by
the network. For example, Vanmassenhove et al.
(2017) investigated aspect in NMT (and in phrase-
based statistical MT). They trained a classifier on
NMT sentence encoding vectors and found that
they can accurately predict tense about 90% of the
time. However, when evaluating the output trans-
lations, they found them to have the correct tense
only 79% of the time. They interpreted this re-
sult to mean that “part of the aspectual informa-
tion is lost during decoding”. Relatedly, Cífka and
Bojar (2018) compared the performance of vari-
ous NMT models in terms of translation quality
(BLEU) and representation quality (classification
tasks). They found a negative correlation between
the two, suggesting that high-quality systems may
not be learning certain sentence meanings. In con-
trast, Artetxe et al. (2018) showed that word em-
beddings contain divergent linguistic information,
which can be uncovered by applying a linear trans-
formation on the learned embeddings. Their re-
sults suggest an alternative explanation, showing
that “embedding models are able to encode diver-
gent linguistic information but have limits on how
this information is surfaced.”
6 Others found that even simple binary trees may work well in MT (Wang et al., 2018b) and sentence classification (Chen et al., 2015).
From a methodological point of view, most of
the relevant analysis work is concerned with cor-
relation: how correlated are neural network com-
ponents with linguistic properties? What may be
lacking is a measure of causation: how does the
encoding of linguistic properties affect the sys-
tem output? Giulianelli et al. (2018) make some
headway on this question. They predicted number
agreement from RNN hidden states and gates at
different time steps. They then intervened in how
the model processes the sentence by changing a
hidden activation based on the difference between
the prediction and the correct label. This improved
agreement prediction accuracy, and the effect per-
sisted over the course of the sentence, indicating
that this information has an effect on the model.
However, they did not report the effect on overall
model quality, for example by measuring perplex-
ity. Methods from causal inference may shed new
light on some of these questions.
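To illustrate the mechanics of such an intervention, the following toy PyTorch sketch runs an untrained LSTM language model and modifies a hidden state mid-sentence; the correction vector is random here, whereas Giulianelli et al. (2018) derived it from the probe’s prediction error:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, emb_dim, hid_dim = 100, 32, 64

# A toy, untrained LSTM language model, only to show the mechanics.
embed = nn.Embedding(vocab_size, emb_dim)
cell = nn.LSTMCell(emb_dim, hid_dim)
out = nn.Linear(hid_dim, vocab_size)

tokens = torch.tensor([5, 17, 42, 7])  # a placeholder sentence
h = torch.zeros(1, hid_dim)
c = torch.zeros(1, hid_dim)

# Hypothetical correction vector; in the actual study it was derived
# from the probe's prediction error on number agreement.
correction = 0.1 * torch.randn(1, hid_dim)

for t, tok in enumerate(tokens):
    h, c = cell(embed(tok).unsqueeze(0), (h, c))
    if t == 1:               # intervene at a chosen time step
        h = h + correction   # directly modify the hidden activation
logits = out(h)              # next-word predictions after intervention
print(logits.shape)
```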
Finally, the predictor for the auxiliary task is
usually a simple classifier, such as logistic re-
gression. A few studies compared different clas-
sifiers and found that deeper classifiers lead to
overall better results, but do not alter the respec-
tive trends when comparing different models or
components (Qian et al., 2016b; Belinkov, 2018).
Interestingly, Conneau et al. (2018) found that
tasks requiring more nuanced linguistic knowl-
edge (e.g., tree depth, coordination inversion) gain
the most from using a deeper classifier. However,
the approach is usually taken for granted; given its prevalence, better theoretical and empirical foundations appear to be needed.
3 Visualization
Visualization is a valuable tool for analyzing neu-
ral networks in the language domain and beyond.
Early work visualized hidden unit activations in
RNNs trained on an artificial language modeling
task, and observed how they correspond to certain
grammatical relations such as agreement (Elman,
1991). Much recent work has focused on visualizing activations on specific examples in modern neural networks for language (Karpathy et al., 2015; Kádár et al., 2017; Qian et al., 2016a; Liu et al., 2018) and speech (Wu and King, 2016; Nagamine et al., 2015; Wang et al., 2017b). Figure 1 shows an example visualization of a neuron that captures the position of words in a sentence. The heatmap uses blue and red colors for negative and positive activation values, respectively, enabling the user to quickly grasp the function of this neuron.

Figure 1: A heatmap visualizing neuron activations. In this case, the activations capture position in the sentence.
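A visualization of this sort takes only a few lines to produce. The sketch below uses matplotlib with synthetic activations that mimic a position-tracking neuron like the one in Figure 1:

```python
import numpy as np
import matplotlib.pyplot as plt

words = "analysis methods in neural language processing".split()
# Placeholder activations of a single neuron over the sentence; in a
# real analysis these come from a trained network. This toy neuron
# increases with position, as in Figure 1.
activations = np.linspace(-1.0, 1.0, len(words))

fig, ax = plt.subplots(figsize=(6, 1.2))
# One row, one cell per word; blue = negative, red = positive.
ax.imshow(activations[np.newaxis, :], cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(words)))
ax.set_xticklabels(words, rotation=45, ha="right")
ax.set_yticks([])
plt.tight_layout()
plt.show()
```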
The attention mechanism that originated in
work on NMT (Bahdanau et al., 2014) also lends
itself to a natural visualization. The alignments
obtained via different attention mechanisms have been visualized in a range of tasks, including NLI (Rocktäschel et al., 2016; Yin et al., 2016), summarization (Rush et al., 2015), MT post-editing (Jauregi Unanue et al., 2018), morphological inflection (Aharoni and Goldberg, 2017), and matching users on social media (Tay et al., 2018). Figure 2 reproduces a visualization of
attention alignments from the original work by
Bahdanau et al. (2014). Here grayscale values corre-
spond to the weight of the attention between words
in an English source sentence (columns) and its
French translation (rows). As Bahdanau et al.
explain, this visualization demonstrates that the
NMT model learned a soft alignment between
source and target words. Some aspects of word
order may also be noticed, as in the reordering
of noun and adjective when translating the phrase
“European Economic Area”.
Another line of work computes various saliency
measures to attribute predictions to input features.
The important or salient features can then be vi-
sualized in selected examples (Li et al., 2016a;
Aubakirova and Bansal, 2016; Sundararajan et al.,
2017; Arras et al., 2017a,b; Ding et al., 2017; Mur-
doch et al., 2018; Mudrakarta et al., 2018; Mon-
tavon et al., 2018; Godin et al., 2018). Saliency
can also be computed with respect to intermediate
values, rather than input features (Ghaeini et al.,
2018).7
7Generally, many of the visualization methods are
adapted from the vision domain, where they have been ex-
tremely popular; see Zhang and Zhu (2018) for a survey.
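As a minimal illustration, the sketch below computes gradient-times-input saliency scores, one per token, for a toy untrained bag-of-embeddings classifier in PyTorch; real analyses apply the same recipe to trained models, along with other saliency variants:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, emb_dim, n_classes = 100, 32, 2

# A toy bag-of-embeddings classifier (untrained), standing in for a
# real model; the saliency recipe is the same for larger networks.
embed = nn.Embedding(vocab_size, emb_dim)
clf = nn.Linear(emb_dim, n_classes)

tokens = torch.tensor([3, 14, 15, 9, 2])
emb = embed(tokens)              # (seq_len, emb_dim)
emb.retain_grad()                # keep gradients on a non-leaf tensor
score = clf(emb.mean(dim=0))[1]  # score of the class of interest
score.backward()

# Gradient-times-input saliency, reduced to one value per token.
saliency = (emb.grad * emb).sum(dim=1).abs()
print(saliency)
```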
Figure 2: A visualization of attention weights,
showing soft alignment between source and target
sentences in an NMT model. Reproduced from
Bahdanau et al. (2014), with permission.
An instructive visualization technique is to clus-
ter neural network activations and compare them
to some linguistic property. Early work clustered
RNN activations, showing that they organize into
lexical categories (Elman, 1989, 1990). Similar
techniques have been followed by others. Re-
cent examples include clustering of sentence em-
beddings in an RNN encoder trained in a multi-
task learning scenario (Brunner et al., 2017), and
phoneme clusters in a joint audio-visual RNN
model (Alishahi et al., 2017).
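A minimal sketch of this kind of analysis, clustering placeholder hidden-state vectors with scikit-learn (in the studies above, the vectors come from a trained network):

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder hidden-state vectors for a handful of words; in the
# analyses cited above these come from a trained RNN.
rng = np.random.RandomState(0)
words = ["dog", "cat", "eats", "sleeps", "the", "a"]
states = rng.randn(len(words), 128)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(states)
for word, label in zip(words, kmeans.labels_):
    print(word, "-> cluster", label)
# Clusters can then be inspected for correspondence to lexical
# categories (nouns, verbs, function words, etc.).
```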
A few online tools for visualizing neu-
ral networks have recently become available.
LSTMVis (Strobelt et al., 2018b) visualizes RNN
activations, focusing on tracing hidden state dy-
namics.8 Seq2Seq-Vis (Strobelt et al., 2018a)
visualizes different modules in attention-based
seq2seq models, with the goal of examining model
decisions and testing alternative decisions. An-
other tool focused on comparing attention align-
ments was proposed by Rikters (2018). It also pro-
vides translation confidence scores based on the
distribution of attention weights. NeuroX (Dalvi
et al., 2019b) is a tool for finding and analyzing
individual neurons, focusing on machine transla-
tion.
Evaluation. As in much work on interpretability, evaluating visualization quality is difficult and often limited to qualitative examples. A few notable exceptions report human evaluations of visualization quality. Singh et al. (2018) showed humans hierarchical clusterings of input words generated by two interpretation methods, and asked them to evaluate which method is more accurate, or which method they trust more. Others reported human evaluations for attention visualization in conversation modeling (Freeman et al., 2018) and medical code prediction tasks (Mullenbach et al., 2018).
8 RNNVis (Ming et al., 2017) is a similar tool, but its online demo does not seem to be available at the time of writing.
The availability of open-source tools of the sort
described above will hopefully encourage users to
utilize visualization in their regular research and
development cycle. However, it remains to be seen
how useful visualizations turn out to be.
4 Challenge sets
The majority of benchmark datasets in NLP are
drawn from text corpora, reflecting a natural
frequency distribution of language phenomena.
While useful in practice for evaluating system
performance in the average case, such datasets
may fail to capture a wide range of phenomena.
An alternative evaluation framework consists of
challenge sets, also known as test suites, which
have been used in NLP for a long time (Lehmann
et al., 1996), especially for evaluating MT sys-
tems (King and Falkedal, 1990; Isahara, 1995;
Koh et al., 2001). Lehmann et al. (1996) noted
several key properties of test suites: systematicity,
control over data, inclusion of negative data, and
exhaustivity. They contrasted such datasets with
test corpora, “whose main advantage is that they
reflect naturally occurring data.” This idea under-
lines much of the work on challenge sets and is
echoed in more recent work (Wang et al., 2018a).
For instance, Cooper et al. (1996) constructed a se-
mantic test suite that targets phenomena as diverse
as quantifiers, plurals, anaphora, ellipsis, adjecti-
val properties, and so on.
After a hiatus of a couple of decades,9 challenge
sets have recently gained renewed popularity in
the NLP community. In this section, we include
datasets used for evaluating neural network mod-
els that diverge from the common average-case
evaluation. Many of them share some of the prop-
erties noted by Lehmann et al. (1996), although
negative examples (ill-formed data) are typically less utilized. The challenge datasets can be cate-
gorized along the following criteria: the task they
seek to evaluate, the linguistic phenomena they
aim to study, the language(s) they target, their
size, their method of construction, and how perfor-
mance is evaluated.10 Table SM2 (in the supple-
mentary materials) categorizes many recent chal-
lenge sets along these criteria. Below we discuss
common trends along these lines.
9 One could speculate that their decrease in popularity can be attributed to the rise of large-scale quantitative evaluation of statistical NLP systems.
4.1 Task
By far, the most targeted tasks in challenge sets
are NLI and MT. This can partly be explained by
the popularity of these tasks and the prevalence
of neural models proposed for solving them. Per-
haps more importantly, tasks like NLI and MT ar-
guably require inferences at various linguistic lev-
els, making the challenge set evaluation especially
attractive. Still, other high-level tasks like read-
ing comprehension or question answering have not
received as much attention, and may also benefit
from the careful construction of challenge sets.
A significant body of work aims to evaluate
the quality of embedding models by correlating
the similarity they induce on word or sentence
pairs with human similarity judgments. Datasets
containing such similarity scores are often used
to evaluate word embeddings (Finkelstein et al.,
2002; Bruni et al., 2012; Hill et al., 2015, in-
ter alia) or sentence embeddings; see the many
shared tasks on semantic textual similarity in Se-
mEval (Cer et al., 2017, and previous editions).
Many of these datasets evaluate similarity at a
coarse-grained level, but some provide a more
fine-grained evaluation of similarity or related-
ness. For example, some datasets are dedicated
to specific word classes such as verbs (Gerz et al.,
2016) or rare words (Luong et al., 2013), or for
evaluating compositional knowledge in sentence
embeddings (Marelli et al., 2014). Multilingual
and cross-lingual versions have also been col-
lected (Leviant and Reichart, 2015; Cer et al.,
2017). Although these datasets are widely used,
this kind of evaluation has been criticized for
its subjectivity and questionable correlation with
downstream performance (Faruqui et al., 2016).
10 Another typology of evaluation protocols was put forth by Burlot and Yvon (2017). Their criteria partially overlap with ours, although they did not provide as comprehensive a categorization as the one compiled here.
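To illustrate the similarity-based evaluation described above, the following sketch computes the Spearman correlation between cosine similarities of toy embeddings and invented human judgments; real evaluations use pre-trained embeddings and datasets such as those cited above:

```python
import numpy as np
from scipy.stats import spearmanr

# Toy embeddings and a toy similarity dataset of (word1, word2,
# human score) triples; the words and scores are invented.
rng = np.random.RandomState(0)
emb = {w: rng.randn(50) for w in ["tiger", "cat", "car", "bus"]}
dataset = [("tiger", "cat", 7.35), ("car", "bus", 5.9),
           ("tiger", "car", 1.0)]

def cos(u, v):
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

model_scores = [cos(emb[w1], emb[w2]) for w1, w2, _ in dataset]
human_scores = [s for _, _, s in dataset]
rho, _ = spearmanr(model_scores, human_scores)
print("Spearman correlation:", rho)
```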
4.2 Linguistic phenomena
One of the primary goals of challenge sets is to
evaluate models on their ability to handle spe-
cific linguistic phenomena. While earlier stud-
ies emphasized exhaustivity (Cooper et al., 1996;
Lehmann et al., 1996), recent ones tend to fo-
cus on a few properties of interest. For exam-
ple, Sennrich (2017) introduced a challenge set for
MT evaluation focusing on 5 properties: subject-
verb agreement, noun phrase agreement, verb-
particle constructions, polarity, and transliteration.
Slightly more elaborate is an MT challenge set
for morphology, including 14 morphological prop-
erties (Burlot and Yvon, 2017). See Table SM2 for
references to datasets targeting other phenomena.
Other challenge sets cover a more diverse range
of linguistic properties, in the spirit of some of
the earlier work. For instance, extending the cat-
egories in Cooper et al. (1996), the GLUE analysis set for NLI (Wang et al., 2018a) covers more than 30 phenomena in four coarse categories (lexical semantics,
predicate-argument structure, logic, and knowl-
edge). In MT evaluation, Burchardt et al. (2017)
reported results using a large test suite cover-
ing 120 phenomena, partly based on Lehmann
et al. (1996).11 Isabelle et al. (2017) and Is-
abelle and Kuhn (2018) prepared challenge sets
for MT evaluation covering fine-grained phenom-
ena at morpho-syntactic, syntactic, and lexical lev-
els.
Generally, datasets that are constructed pro-
grammatically tend to cover less fine-grained lin-
guistic properties, while manually constructed
datasets represent more diverse phenomena.
4.3 Languages
As is unfortunately common in much NLP work, espe-
cially neural NLP, the vast majority of challenge
sets are in English. This situation is slightly better
in MT evaluation, where naturally all datasets fea-
ture other languages (see Table SM2). A notable
exception is the work by Gulordava et al. (2018),
who constructed examples for evaluating number
agreement in language modeling in English, Rus-
sian, Hebrew, and Italian. Clearly, there is room
for more challenge sets in non-English languages.
However, perhaps more pressing is the need for
large-scale non-English datasets (besides MT) to
develop neural models for popular NLP tasks.
11Their dataset does not seem to be available yet, but more
details are promised to appear in a future publication.
4.4 Scale
The size of proposed challenge sets varies greatly
(Table SM2). As expected, datasets constructed
by hand are smaller, with typical sizes in the
hundreds. Automatically-built datasets are much
larger, ranging from several thousand to close to a
hundred thousand (Sennrich, 2017), or even more
than one million examples (Linzen et al., 2016).
In the latter case, the authors argue that such a
large test set is needed for obtaining a sufficient
representation of rare cases. A few manually-
constructed datasets contain a fairly large number
of examples, up to 10K (Burchardt et al., 2017).
4.5 Construction method
Challenge sets are usually created either program-
matically or manually, by hand-crafting specific
examples. Often, semi-automatic methods are
used to compile an initial list of examples that
is manually verified by annotators. The specific
method also affects the kind of language use and
how natural or artificial/synthetic the examples
are. We describe here some trends in dataset con-
struction methods in the hope that they may be
useful for researchers contemplating new datasets.
Several datasets were constructed by modify-
ing or extracting examples from existing datasets.
For instance, Sanchez et al. (2018) and Glockner
et al. (2018) extracted examples from SNLI (Bow-
man et al., 2015) and replaced specific words with hypernyms, synonyms, or antonyms, followed
by manual verification. Linzen et al. (2016), on
the other hand, extracted examples of subject-verb
agreement from raw texts using heuristics, result-
ing in a large-scale dataset. Gulordava et al. (2018)
extended this to other agreement phenomena, but
they relied on syntactic information available in
treebanks, resulting in a smaller dataset.
Several challenge sets utilize existing test suites,
either as a direct source of examples (Burchardt
et al., 2017) or for searching similar naturally oc-
curring examples (Wang et al., 2018a).12
Sennrich (2017) introduced a method for eval-
uating NMT systems via contrastive translation
pairs, where the system is asked to estimate the
probability of two candidate translations that are
designed to reflect specific linguistic properties.
Sennrich generated such pairs programmatically by applying simple heuristics, such as changing
gender and number to induce agreement errors, re-
sulting in a large-scale challenge set of close to
100K examples. This framework was extended to evaluate other properties, often requiring more sophisticated generation methods like
using morphological analyzers/generators (Burlot
and Yvon, 2017) or more manual involvement
in generation (Bawden et al., 2018) or verifica-
tion (Rios Gonzales et al., 2017).
12 Wang et al. (2018a) also verified that their examples do not contain annotation artifacts, a potential problem noted in recent studies (Gururangan et al., 2018; Poliak et al., 2018b).
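The core of the contrastive evaluation is a simple comparison loop, sketched below; DummyModel is a hypothetical stand-in, since a real evaluation would use the NMT model’s conditional log-probability of each candidate translation:

```python
class DummyModel:
    """Hypothetical stand-in for a trained NMT model; a real
    evaluation would use the model's conditional log-probability
    of the target given the source."""
    def score(self, source, target):
        # Toy heuristic so the sketch runs end to end: penalize the
        # deliberately wrong verb form in the corrupted candidate.
        return -target.count("bellt")

def contrastive_accuracy(model, pairs):
    # Each item: (source, correct translation, corrupted translation),
    # where the corruption introduces, e.g., an agreement error.
    correct = sum(model.score(src, good) > model.score(src, bad)
                  for src, good, bad in pairs)
    return correct / len(pairs)

pairs = [("the dogs bark", "die Hunde bellen", "die Hunde bellt")]
print(contrastive_accuracy(DummyModel(), pairs))
```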
Finally, a few studies define templates
that capture certain linguistic properties and in-
stantiate them with word lists (Dasgupta et al.,
2018; Rudinger et al., 2018; Zhao et al., 2018a).
Template-based generation has the advantage of
providing more control, for example for obtaining
a specific vocabulary distribution, but this comes
at the expense of how natural the examples are.
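A minimal sketch of template-based generation for subject-verb agreement, with invented word lists; real datasets use many more templates and larger vocabularies:

```python
import itertools

# A hypothetical template for subject-verb agreement examples, in the
# spirit of the template-based datasets cited above.
template = "The {noun} near the {distractor} {verb} here."
nouns = ["dog", "dogs"]
distractors = ["cars", "tree"]
verb_for = {"dog": "is", "dogs": "are"}  # verb agrees with the subject

examples = [
    template.format(noun=noun, distractor=distractor,
                    verb=verb_for[noun])
    for noun, distractor in itertools.product(nouns, distractors)
]
for ex in examples:
    print(ex)
```

The word lists give direct control over the vocabulary distribution, at the cost of naturalness, as noted above.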
4.6 Evaluation
Systems are typically evaluated by their perfor-
mance on the challenge set examples, either with
the same metric used for evaluating the system in
the first place, or via a proxy, as in the contrastive
pairs evaluation of Sennrich (2017). Automatic
evaluation metrics are cheap to obtain and can be
calculated on a large scale. However, they may
miss certain aspects. Thus a few studies report hu-
man evaluation on their challenge sets, such as in
MT (Isabelle et al., 2017; Burchardt et al., 2017).
We note here also that judging the quality of a
model by its performance on a challenge set can
be tricky. Some authors emphasize their wish to
test systems on extreme or difficult cases, “beyond
normal operational capacity” (Naik et al., 2018).
However, whether or not one should expect sys-
tems to perform well on specially chosen cases (as
opposed to the average case) may depend on one’s
goals. To put results in perspective, one may com-
pare model performance to human performance on
the same task (Gulordava et al., 2018).
5 Adversarial examples
Understanding a model also requires an understanding of its failures. Despite their success in
many tasks, machine learning systems can also be
very sensitive to malicious attacks or adversarial
examples (Szegedy et al., 2014; Goodfellow et al.,
2015). In the vision domain, small changes to the
input image can lead to misclassification, even if
such changes are imperceptible to humans.
The basic setup in work on adversarial examples
can be described as follows.13 Given a neural net-
work model f and an input example x, we seek to
generate an adversarial example x′ that will have
a minimal distance from x, while being assigned a
different label by f :
$$\min_{x'} \|x - x'\| \quad \text{s.t.}\quad f(x) = l,\; f(x') = l',\; l \neq l'$$
In the vision domain, x can be the input image pix-
els, resulting in a fairly intuitive interpretation of
this optimization problem: measuring the distance
||x − x′|| is straightforward, and finding x′ can be
done by computing gradients with respect to the
input, since all quantities are continuous.
In the text domain, the input is discrete (for ex-
ample, a sequence of words), which poses two
problems. First, it is not clear how to measure the
distance between the original and adversarial ex-
amples, x and x′, which are two discrete objects
(say, two words or sentences). Second, minimiz-
ing this distance cannot be easily formulated as an
optimization problem, as this requires computing
gradients with respect to a discrete input.
In the following, we review methods for han-
dling these difficulties according to several cri-
teria: the adversary’s knowledge, the specificity
of the attack, the linguistic unit being modified,
and the task on which the attacked model was
trained.14 Table SM3 (in the supplementary ma-
terials) categorizes work on adversarial examples
in NLP according to these criteria.
5.1 Adversary’s knowledge
Adversarial examples can be generated using ac-
cess to model parameters, also known as white-
box attacks, or without such access, with black-
box attacks (Papernot et al., 2016a, 2017; Narodyt-
ska and Kasiviswanathan, 2017; Liu et al., 2017).
White-box attacks are difficult to adapt to the
text world as they typically require computing gra-
dients with respect to the input, which would be
discrete in the text case. One option is to com-
pute gradients with respect to the input word em-
beddings, and perturb the embeddings. Since this
may result in a vector that does not correspond to
13The notation here follows Yuan et al. (2017).
14These criteria are partly taken from Yuan et al. (2017),
where a more elaborate taxonomy is laid out. At present,
though, the work on adversarial examples in NLP is more
limited than in computer vision, so our criteria will suffice.
any word, one could search for the closest word
embedding in a given dictionary (Papernot et al.,
2016b); Cheng et al. (2018) extended this idea to
seq2seq models. Others computed gradients with
respect to input word embeddings to identify and
rank words to be modified (Samanta and Mehta,
2017; Liang et al., 2018). Ebrahimi et al. (2018b)
developed an alternative method by representing
text edit operations in vector space (e.g., a bi-
nary vector specifying which characters in a word
would be changed) and approximating the change
in loss with the derivative along this vector.
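The following toy PyTorch sketch illustrates the embedding-perturbation strategy described above: step along the loss gradient in embedding space, then project each perturbed vector back to the nearest word in the vocabulary. The model is untrained and the step size arbitrary; the sketch shows the mechanics rather than any specific published attack:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, emb_dim, n_classes = 50, 16, 2

# Toy, untrained classifier over averaged word embeddings.
embed = nn.Embedding(vocab_size, emb_dim)
clf = nn.Linear(emb_dim, n_classes)

tokens = torch.tensor([4, 8, 15])
emb = embed(tokens)
emb.retain_grad()  # keep gradients on this non-leaf tensor
loss = nn.functional.cross_entropy(
    clf(emb.mean(dim=0)).unsqueeze(0), torch.tensor([1]))
loss.backward()

# Step along the gradient in embedding space (increasing the loss),
# then project each perturbed vector onto the nearest real word,
# as in the dictionary-search strategy described above.
with torch.no_grad():
    perturbed = emb + 0.5 * emb.grad
    dists = torch.cdist(perturbed, embed.weight)  # (seq_len, vocab)
    adversarial_tokens = dists.argmin(dim=1)
print(tokens.tolist(), "->", adversarial_tokens.tolist())
```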
Given the difficulty in generating white-box ad-
versarial examples for text, much research has
been devoted to black-box examples. Often, the
adversarial examples are inspired by text edits that
are thought to be natural or commonly generated
by humans, such as typos, misspellings, and so
on (Sakaguchi et al., 2017; Heigold et al., 2018;
Belinkov and Bisk, 2018). Gao et al. (2018) de-
fined scoring functions to identify tokens to mod-
ify. Their functions do not require access to model
internals, but they do require the model prediction
score. After identifying the important tokens, they
modify characters with common edit operations.
Zhao et al. (2018c) used generative adversar-
ial networks (GANs) (Goodfellow et al., 2014) to
minimize the distance between latent representa-
tions of input and adversarial examples, and per-
formed perturbations in latent space. Since the la-
tent representations do not need to come from the
attacked model, this is a black-box attack.
Finally, Alzantot et al. (2018) developed an in-
teresting population-based genetic algorithm for
crafting adversarial examples for text classifica-
tion, by maintaining a population of modifications
of the original sentence and evaluating fitness of
modifications at each generation. They do not re-
quire access to model parameters, but do use pre-
diction scores. A similar idea was proposed by
Kuleshov et al. (2018).
5.2 Attack specificity
Adversarial attacks can be classified into targeted vs. non-targeted attacks (Yuan et al., 2017). A targeted attack specifies a specific false class, l′, while a non-targeted attack only cares that the predicted class is wrong, l′ ≠ l. Targeted attacks
are more difficult to generate, as they typically re-
quire knowledge of model parameters, i.e., they
are white-box attacks. This might explain why
the majority of adversarial examples in NLP are
non-targeted (see Table SM3). A few targeted at-
tacks include Liang et al. (2018), which specified
a desired class to fool a text classifier, and Chen
et al. (2018a), which specified words or captions
to generate in an image captioning model. Oth-
ers targeted specific words to omit, replace, or
include when attacking seq2seq models (Cheng
et al., 2018; Ebrahimi et al., 2018a).
Methods for generating targeted attacks in NLP
could possibly take more inspiration from adver-
sarial attacks in other fields. For instance, in at-
tacking malware detection systems, several stud-
ies developed targeted attacks in a black-box sce-
nario (Yuan et al., 2017). A black-box targeted at-
tack for MT was proposed by Zhao et al. (2018c),
who used GANs to search for attacks on Google’s
MT system after mapping sentences into contin-
uous space with adversarially regularized autoen-
coders (Zhao et al., 2018b).
5.3 Linguistic unit
Most of the work on adversarial text examples
involves modifications at the character- and/or
word-level; see Table SM3 for specific references.
Other transformations include adding sentences
or text chunks (Jia and Liang, 2017) or gen-
erating paraphrases with desired syntactic struc-
tures (Iyyer et al., 2018). In image captioning,
Chen et al. (2018a) modified pixels in the input im-
age to generate targeted attacks on the caption text.
5.4 Task
Generally, most work on adversarial examples
in NLP concentrates on relatively high-level lan-
guage understanding tasks, such as text classifi-
cation (including sentiment analysis) and reading
comprehension, while work on text generation fo-
cuses mainly on MT. See Table SM3 for refer-
ences. There is relatively little work on adversar-
ial examples for more low-level language process-
ing tasks, although one can mention morphologi-
cal tagging (Heigold et al., 2018) and spelling cor-
rection (Sakaguchi et al., 2017).
5.5 Coherence & perturbation measurement
In adversarial image examples, it is fairly straight-
forward to measure the perturbation, either by
measuring distance in pixel space, say ||x − x′||
under some norm, or with alternative measures
that are better correlated with human percep-
tion (Rozsa et al., 2016). It is also visually com-
pelling to present an adversarial image with imper-
ceptible difference from its source image. In the
text domain, measuring distance is not as straight-
forward and even small changes to the text may
be perceptible by humans. Thus, evaluation of at-
tacks is fairly tricky. Some studies imposed con-
straints on adversarial examples to have a small
number of edit operations (Gao et al., 2018). Oth-
ers ensured syntactic or semantic coherence in
different ways, such as filtering replacements by
word similarity or sentence similarity (Alzantot
et al., 2018; Kuleshov et al., 2018), or by us-
ing synonyms and other word lists (Samanta and
Mehta, 2017; Yang et al., 2018).
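As an example of such a constraint, a Levenshtein edit distance can serve as a simple perturbation budget; a self-contained sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insertions,
    deletions, and substitutions), a simple perturbation measure."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,            # deletion
                dp[j - 1] + 1,        # insertion
                prev + (ca != cb))    # substitution
    return dp[-1]

# Constrain adversarial candidates to a small number of edits.
original, candidate = "the movie was great", "the movie was graet"
print(edit_distance(original, candidate) <= 2)
```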
Some reported whether a human can classify
the adversarial example correctly (Yang et al.,
2018), but this does not indicate how perceptible
the changes are. More informative human studies
evaluate grammaticality or similarity of the adver-
sarial examples to the original ones (Zhao et al.,
2018c; Alzantot et al., 2018). Given the inherent
difficulty in generating imperceptible changes in
text, more such evaluations are needed.
6 Explaining predictions
Explaining specific predictions is recognized as
a desideratum in interpretability work (Lipton,
2016), argued to increase the accountability of ma-
chine learning systems (Doshi-Velez et al., 2017).
However, explaining why a deep, highly non-
linear neural network makes a certain prediction
is not trivial. One solution is to ask the model to
generate explanations along with its primary pre-
diction (Zaidan et al., 2007; Zhang et al., 2016),15
but this approach requires manual annotations of
explanations, which may be hard to collect.
An alternative approach is to use parts of the
input as explanations. For example, Lei et al.
(2016) defined a generator that learns a distribu-
tion over text fragments as candidate rationales
for justifying predictions, evaluated on sentiment
analysis. Alvarez-Melis and Jaakkola (2017) dis-
covered input-output associations in a sequence-
to-sequence learning scenario, by perturbing the
input and finding the most relevant associations.
Gupta and Schütze (2018) inspected how informa-
tion is accumulated in RNNs towards a prediction,
and associated peaks in prediction scores with im-
portant input segments. As these methods use input segments to explain predictions, they do not
shed much light on the internal computations that
take place in the network.
15 Other work considered learning textual-visual explanations from multi-modal annotations (Park et al., 2018).
At present, despite the recognized importance
for interpretability, our ability to explain predic-
tions of neural networks in NLP is still limited.
7 Other methods
We briefly mention here several analysis methods
that do not fall neatly into the previous sections.
A number of studies evaluated the effect of eras-
ing or masking certain neural network compo-
nents, such as word embedding dimensions, hid-
den units, or even full words (Li et al., 2016b;
Feng et al., 2018; Khandelwal et al., 2018; Bau
et al., 2018). For example, Li et al. (2016b) erased
specific dimensions in word embeddings or hid-
den states and computed the change in proba-
bility assigned to different labels. Their exper-
iments revealed interesting differences between
word embedding models, where in some models
information is more focused in individual dimen-
sions. They also found that information is more
distributed in hidden layers than in the input layer,
and erased entire words to find important words in
a sentiment analysis task.
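A toy sketch of the erasure recipe, zeroing one dimension at a time of an (untrained) classifier’s input representation and measuring the resulting change in a class probability:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb_dim, n_classes = 16, 2

# Toy, untrained classifier over a fixed sentence representation.
clf = nn.Sequential(nn.Linear(emb_dim, n_classes), nn.Softmax(dim=-1))
representation = torch.randn(emb_dim)

with torch.no_grad():
    base = clf(representation)[1].item()   # probability of class 1
    importance = []
    for d in range(emb_dim):
        erased = representation.clone()
        erased[d] = 0.0                    # erase one dimension
        importance.append(base - clf(erased)[1].item())

# Dimensions whose erasure most reduces the class probability are
# deemed most important.
top = sorted(range(emb_dim), key=lambda d: importance[d],
             reverse=True)[:3]
print("most important dimensions:", top)
```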
Several studies conducted behavioral experi-
ments to interpret word embeddings by defining
intrusion tasks, where humans need to identify
an intruder word, chosen based on difference in
word embedding dimensions (Murphy et al., 2012;
Fyshe et al., 2015; Faruqui et al., 2015).16 In this
kind of work, a word embedding model may be
deemed more interpretable if humans are better
able to identify the intruding words. Since the
evaluation is costly for high-dimensional represen-
tations, alternative automatic metrics were consid-
ered (Park et al., 2017; Senel et al., 2018).
A long tradition in work on neural networks is
to evaluate and analyze their ability to learn dif-
ferent formal languages (Das et al., 1992; Casey,
1996; Gers and Schmidhuber, 2001; Bodén and
Wiles, 2002; Chalup and Blair, 2003). This trend
continues today, with research into modern ar-
chitectures and what formal languages they can
learn (Weiss et al., 2018; Bernardy, 2018; Suzgun
et al., 2019), or the formal properties they pos-
sess (Chen et al., 2018b).
16The methodology follows earlier work on evaluating the
interpretability of probabilistic topic models with intrusion
tasks (Chang et al., 2009).
8 Conclusion
Analyzing neural networks has become a hot topic
in NLP research. This survey attempted to review
and summarize as much of the current research as
possible, while organizing it along several promi-
nent themes. We have emphasized aspects in anal-
ysis that are specific to language – namely, what
linguistic information is captured in neural net-
works, which phenomena they are successful at
capturing, and where they fail. Many of the analy-
sis methods are general techniques from the larger
machine learning community, such as visualiza-
tion via saliency measures, or evaluation by ad-
versarial examples. But even those sometimes re-
quire non-trivial adaptations to work with text in-
put. Some methods are more specific to the field,
but may prove useful in other domains. Challenge
sets or test suites are such a case.
Throughout this survey, we have identified sev-
eral limitations or gaps in current analysis work:
• The use of auxiliary classification tasks for
identifying which linguistic properties neural
networks capture has become standard prac-
tice (Section 2), while lacking both a theoretical foundation and a thorough empirical consideration of the link between the auxiliary tasks and the original task.
• Evaluation of analysis work is often lim-
ited or qualitative, especially in visualization
techniques (Section 3). Newer forms of eval-
uation are needed for determining the success
of different methods.
• Relatively little work has been done on ex-
plaining predictions of neural network mod-
els, apart from providing visualizations (Sec-
tion 6). With the increasing public de-
mand for explaining algorithmic choices in
machine learning systems (Doshi-Velez and
Kim, 2017; Doshi-Velez et al., 2017), there is a pressing need for progress in this direction.
• Much of the analysis work is focused on the
English language, especially in constructing
challenge sets for various tasks (Section 4),
with the exception of MT due to its inherent
multilingual character. Developing resources
and evaluating methods on other languages is
important as the field grows and matures.
• More challenge sets for evaluating other tasks
besides NLI and MT are needed.
Finally, as with any survey in a rapidly evolving
field, this paper is likely to omit relevant recent
work by the time of publication. While we in-
tend to continue updating the online appendix with
newer publications, we hope that our summariza-
tion of prominent analysis work and its categoriza-
tion into several themes will be a useful guide for
scholars interested in analyzing and understanding
neural networks for NLP.
Acknowledgments
We would like to thank the anonymous reviewers
and the TACL Action Editor for their very help-
ful comments. This work was supported by the
Qatar Computing Research Institute. Y.B. is also
supported by the Harvard Mind, Brain, Behavior
Initiative.
References
Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer
Lavi, and Yoav Goldberg. 2017a. Analysis of
sentence embedding models using prediction
tasks in natural language processing. IBM Jour-
nal of Research and Development, 61(4):3–9.
Yossi Adi, Einat Kermany, Yonatan Belinkov, Ofer
Lavi, and Yoav Goldberg. 2017b. Fine-grained
Analysis of Sentence Embeddings Using Auxil-
iary Prediction Tasks. In International Confer-
ence on Learning Representations (ICLR).
Roee Aharoni and Yoav Goldberg. 2017. Morpho-
logical Inflection Generation with Hard Mono-
tonic Attention. In Proceedings of the 55th
Annual Meeting of the Association for Compu-
tational Linguistics (Volume 1: Long Papers),
pages 2004–2015. Association for Computa-
tional Linguistics.
Wasi Uddin Ahmad, Xueying Bai, Zhechao
Huang, Chao Jiang, Nanyun Peng, and Kai-Wei
Chang. 2018. Multi-task Learning for Univer-
sal Sentence Embeddings: A Thorough Evalua-
tion using Transfer and Auxiliary Tasks. arXiv
preprint arXiv:1804.07911v2.
Afra Alishahi, Marie Barking, and Grzegorz Chru-
pała. 2017. Encoding of phonology in a re-
current neural model of grounded speech. In
Proceedings of the 21st Conference on Com-
putational Natural Language Learning (CoNLL
2017), pages 368–378. Association for Compu-
tational Linguistics.
David Alvarez-Melis and Tommi Jaakkola. 2017.
A causal framework for explaining the predic-
tions of black-box sequence-to-sequence mod-
els. In Proceedings of the 2017 Conference on
Empirical Methods in Natural Language Pro-
cessing, pages 412–421. Association for Com-
putational Linguistics.
Moustafa Alzantot, Yash Sharma, Ahmed Elgo-
hary, Bo-Jhang Ho, Mani Srivastava, and Kai-
Wei Chang. 2018. Generating Natural Lan-
guage Adversarial Examples. In Proceedings
of the 2018 Conference on Empirical Methods
in Natural Language Processing, pages 2890–
2896. Association for Computational Linguis-
tics.
Leila Arras, Franziska Horn, Grégoire Montavon,
Klaus-Robert Müller, and Wojciech Samek.
2017a. “What is relevant in a text document?”:
An interpretable machine learning approach.
PLOS ONE, 12(8):1–23.
Leila Arras, Grégoire Montavon, Klaus-Robert
Müller, and Wojciech Samek. 2017b. Explain-
ing Recurrent Neural Network Predictions in
Sentiment Analysis. In Proceedings of the
8th Workshop on Computational Approaches to
Subjectivity, Sentiment and Social Media Anal-
ysis, pages 159–168. Association for Computa-
tional Linguistics.
Mikel Artetxe, Gorka Labaka, Inigo Lopez-
Gazpio, and Eneko Agirre. 2018. Uncover-
ing Divergent Linguistic Information in Word
Embeddings with Lessons for Intrinsic and Ex-
trinsic Evaluation. In Proceedings of the 22nd
Conference on Computational Natural Lan-
guage Learning, pages 282–291. Association
for Computational Linguistics.
Malika Aubakirova and Mohit Bansal. 2016. In-
terpreting Neural Networks to Improve Polite-
ness Comprehension. In Proceedings of the
2016 Conference on Empirical Methods in Nat-
ural Language Processing, pages 2035–2041.
Association for Computational Linguistics.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua
Bengio. 2014. Neural Machine Translation by
Jointly Learning to Align and Translate. arXiv
preprint arXiv:1409.0473v7.
Anthony Bau, Yonatan Belinkov, Hassan Sajjad,
Nadir Durrani, Fahim Dalvi, and James Glass.
2018. Identifying and Controlling Important
Neurons in Neural Machine Translation. arXiv
preprint arXiv:1811.01157v1.
Rachel Bawden, Rico Sennrich, Alexandra Birch,
and Barry Haddow. 2018. Evaluating Dis-
course Phenomena in Neural Machine Transla-
tion. In Proceedings of the 2018 Conference
of the North American Chapter of the Associ-
ation for Computational Linguistics: Human
Language Technologies, Volume 1 (Long Pa-
pers), pages 1304–1313. Association for Com-
putational Linguistics.
Yonatan Belinkov. 2018. On Internal Language
Representations in Deep Learning: An Analy-
sis of Machine Translation and Speech Recog-
nition. Ph.D. thesis, Massachusetts Institute of
Technology.
Yonatan Belinkov and Yonatan Bisk. 2018. Syn-
thetic and Natural Noise Both Break Neural
Machine Translation. In International Confer-
ence on Learning Representations (ICLR).
Yonatan Belinkov, Nadir Durrani, Fahim Dalvi,
Hassan Sajjad, and James Glass. 2017a. What
do Neural Machine Translation Models Learn
about Morphology? In Proceedings of the 55th
Annual Meeting of the Association for Compu-
tational Linguistics (Volume 1: Long Papers),
pages 861–872. Association for Computational
Linguistics.
Yonatan Belinkov and James Glass. 2017. Ana-
lyzing Hidden Representations in End-to-End
Automatic Speech Recognition Systems. In
I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach,
R. Fergus, S. Vishwanathan, and R. Garnett, ed-
itors, Advances in Neural Information Process-
ing Systems 30, pages 2441–2451. Curran As-
sociates, Inc.
Yonatan Belinkov, Lluís Màrquez, Hassan Sajjad,
Nadir Durrani, Fahim Dalvi, and James Glass.
2017b. Evaluating Layers of Representation in
Neural Machine Translation on Part-of-Speech
and Semantic Tagging Tasks. In Proceedings
of the Eighth International Joint Conference on
Natural Language Processing (Volume 1: Long
Papers), pages 1–10. Asian Federation of Natu-
ral Language Processing.
Jean-Philippe Bernardy. 2018. Can Recurrent
Neural Networks Learn Nested Recursion?
LiLT (Linguistic Issues in Language Technol-
ogy), 16(1).
Arianna Bisazza and Clara Tump. 2018. The Lazy
Encoder: A Fine-Grained Analysis of the Role
of Morphology in Neural Machine Translation.
In Proceedings of the 2018 Conference on Em-
pirical Methods in Natural Language Process-
ing, pages 2871–2876. Association for Compu-
tational Linguistics.
Terra Blevins, Omer Levy, and Luke Zettlemoyer.
2018. Deep RNNs Encode Soft Hierarchical
Syntax. In Proceedings of the 56th Annual
Meeting of the Association for Computational
Linguistics (Volume 2: Short Papers), pages 14–
19. Association for Computational Linguistics.
Mikael Bodén and Janet Wiles. 2002. On
learning context-free and context-sensitive lan-
guages. IEEE Transactions on Neural Net-
works, 13(2):491–493.
Samuel R. Bowman, Gabor Angeli, Christopher
Potts, and Christopher D. Manning. 2015. A
large annotated corpus for learning natural lan-
guage inference. In Proceedings of the 2015
Conference on Empirical Methods in Natural
Language Processing, pages 632–642. Associ-
ation for Computational Linguistics.
Elia Bruni, Gemma Boleda, Marco Baroni, and
Nam Khanh Tran. 2012. Distributional Seman-
tics in Technicolor. In Proceedings of the 50th
Annual Meeting of the Association for Compu-
tational Linguistics (Volume 1: Long Papers),
pages 136–145. Association for Computational
Linguistics.
Gino Brunner, Yuyi Wang, Roger Wattenhofer,
and Michael Weigelt. 2017. Natural Language
Multitasking: Analyzing and Improving Syn-
tactic Saliency of Hidden Representations. The
31st Annual Conference on Neural Information
Processing Systems (NIPS) – Workshop on Learning
Disentangled Features: from Perception to Control.
Aljoscha Burchardt, Vivien Macketanz, Jon De-
hdari, Georg Heigold, Jan-Thorsten Peter, and
Philip Williams. 2017. A Linguistic Evaluation
of Rule-Based, Phrase-Based, and Neural MT
Engines. The Prague Bulletin of Mathematical
Linguistics, 108(1):159–170.
Franck Burlot and François Yvon. 2017. Evaluat-
ing the morphological competence of Machine
Translation Systems. In Proceedings of the Sec-
ond Conference on Machine Translation, pages
43–55. Association for Computational Linguis-
tics.
Mike Casey. 1996. The Dynamics of Discrete-
Time Computation, with Application to Recur-
rent Neural Networks and Finite State Machine
Extraction. Neural Computation, 8(6):1135–
1178.
Daniel Cer, Mona Diab, Eneko Agirre, Inigo
Lopez-Gazpio, and Lucia Specia. 2017.
SemEval-2017 Task 1: Semantic Textual Sim-
ilarity Multilingual and Crosslingual Focused
Evaluation. In Proceedings of the 11th Inter-
national Workshop on Semantic Evaluation
(SemEval-2017), pages 1–14. Association for
Computational Linguistics.
Rahma Chaabouni, Ewan Dunbar, Neil Zeghi-
dour, and Emmanuel Dupoux. 2017. Learning
weakly supervised multimodal phoneme em-
beddings. In Interspeech 2017.
Stephan K. Chalup and Alan D. Blair. 2003. Incre-
mental Training of First Order Recurrent Neu-
ral Networks to Predict a Context-sensitive Lan-
guage. Neural Networks, 16(7):955–972.
Jonathan Chang, Sean Gerrish, Chong Wang, Jor-
dan L. Boyd-Graber, and David M. Blei. 2009.
Reading Tea Leaves: How Humans Interpret
Topic Models. In Y. Bengio, D. Schuurmans,
J. D. Lafferty, C. K. I. Williams, and A. Cu-
lotta, editors, Advances in Neural Information
Processing Systems 22, pages 288–296. Curran
Associates, Inc.
Hongge Chen, Huan Zhang, Pin-Yu Chen, Jinfeng
Yi, and Cho-Jui Hsieh. 2018a. Attacking visual
language grounding with adversarial examples:
A case study on neural image captioning. In
Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 1: Long Papers), pages 2587–2597. Asso-
ciation for Computational Linguistics.
Xinchi Chen, Xipeng Qiu, Chenxi Zhu, Shiyu Wu,
and Xuanjing Huang. 2015. Sentence Model-
ing with Gated Recursive Neural Network. In
Proceedings of the 2015 Conference on Empir-
ical Methods in Natural Language Processing,
pages 793–798. Association for Computational
Linguistics.
Yining Chen, Sorcha Gilroy, Andreas Maletti,
Jonathan May, and Kevin Knight. 2018b. Re-
current Neural Networks as Weighted Lan-
guage Recognizers. In Proceedings of the
2018 Conference of the North American Chap-
ter of the Association for Computational Lin-
guistics: Human Language Technologies, Vol-
ume 1 (Long Papers), pages 2261–2271. Asso-
ciation for Computational Linguistics.
Minhao Cheng, Jinfeng Yi, Huan Zhang, Pin-Yu
Chen, and Cho-Jui Hsieh. 2018. Seq2Sick:
Evaluating the Robustness of Sequence-to-
Sequence Models with Adversarial Examples.
arXiv preprint arXiv:1803.01128v1.
Grzegorz Chrupała, Lieke Gelderloos, and Afra
Alishahi. 2017. Representations of language in
a model of visually grounded speech signal. In
Proceedings of the 55th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 1: Long Papers), pages 613–622. Associa-
tion for Computational Linguistics.
Ondřej Cífka and Ondřej Bojar. 2018. Are BLEU
and Meaning Representation in Opposition? In
Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 1: Long Papers), pages 1362–1371. Asso-
ciation for Computational Linguistics.
Alexis Conneau, Germán Kruszewski, Guillaume
Lample, Loïc Barrault, and Marco Baroni.
2018. What you can cram into a single $&!#*
vector: Probing sentence embeddings for lin-
guistic properties. In Proceedings of the 56th
Annual Meeting of the Association for Compu-
tational Linguistics (Volume 1: Long Papers),
pages 2126–2136. Association for Computa-
tional Linguistics.
Robin Cooper, Dick Crouch, Jan van Eijck, Chris
Fox, Josef van Genabith, Jan Jaspars, Hans
Kamp, David Milward, Manfred Pinkal, Mas-
simo Poesio, Steve Pulman, Ted Briscoe, Hol-
ger Maier, and Karsten Konrad. 1996. Using
the framework. Technical report, The FraCaS
Consortium.
Fahim Dalvi, Nadir Durrani, Hassan Sajjad,
Yonatan Belinkov, D. Anthony Bau, and James
Glass. 2019a. What Is One Grain of Sand
in the Desert? Analyzing Individual Neurons
in Deep NLP Models. In Proceedings of the
Thirty-Third AAAI Conference on Artificial In-
telligence (AAAI).
Fahim Dalvi, Nadir Durrani, Hassan Sajjad,
Yonatan Belinkov, and Stephan Vogel. 2017.
Understanding and Improving Morphological
Learning in the Neural Machine Translation
Decoder. In Proceedings of the Eighth In-
ternational Joint Conference on Natural Lan-
guage Processing (Volume 1: Long Papers),
pages 142–151. Asian Federation of Natural
Language Processing.
Fahim Dalvi, Avery Nortonsmith, D. Anthony
Bau, Yonatan Belinkov, Hassan Sajjad, Nadir
Durrani, and James Glass. 2019b. NeuroX:
A Toolkit for Analyzing Individual Neurons
in Neural Networks. In Proceedings of the
Thirty-Third AAAI Conference on Artificial In-
telligence (AAAI): Demonstrations Track.
Sreerupa Das, C. Lee Giles, and Guo-Zheng Sun.
1992. Learning Context-free Grammars: Capa-
bilities and Limitations of a Recurrent Neural
Network with an External Stack Memory. In
Proceedings of the Fourteenth Annual Confer-
ence of the Cognitive Science Society, Indiana
University, page 14.
Ishita Dasgupta, Demi Guo, Andreas Stuhlmüller,
Samuel J. Gershman, and Noah D. Good-
man. 2018. Evaluating Compositionality
in Sentence Embeddings. arXiv preprint
arXiv:1802.04302v2.
Dhanush Dharmaretnam and Alona Fyshe. 2018.
The Emergence of Semantics in Neural Net-
work Representations of Visual Information.
In Proceedings of the 2018 Conference of the
North American Chapter of the Association for
Computational Linguistics: Human Language
Technologies, Volume 2 (Short Papers), pages
776–780. Association for Computational Lin-
guistics.
Yanzhuo Ding, Yang Liu, Huanbo Luan, and
Maosong Sun. 2017. Visualizing and Under-
standing Neural Machine Translation. In Pro-
ceedings of the 55th Annual Meeting of the As-
sociation for Computational Linguistics (Vol-
ume 1: Long Papers), pages 1150–1159. Asso-
ciation for Computational Linguistics.
Finale Doshi-Velez and Been Kim. 2017. To-
wards A Rigorous Science of Interpretable
Machine Learning. arXiv preprint
arXiv:1702.08608v2.
Finale Doshi-Velez, Mason Kortz, Ryan Budish,
Chris Bavitz, Sam Gershman, David O’Brien,
Stuart Shieber, James Waldo, David Wein-
berger, and Alexandra Wood. 2017. Account-
ability of AI Under the Law: The Role of Ex-
planation. Berkman Center Publication Forth-
coming.
Jennifer Drexler and James Glass. 2017. Analy-
sis of Audio-Visual Features for Unsupervised
Speech Recognition. In International Work-
shop on Grounding Language Understanding.
Javid Ebrahimi, Daniel Lowd, and Dejing Dou.
2018a. On Adversarial Examples for Character-
Level Neural Machine Translation. In Proceed-
ings of the 27th International Conference on
Computational Linguistics, pages 653–663. As-
sociation for Computational Linguistics.
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and De-
jing Dou. 2018b. HotFlip: White-Box Adver-
sarial Examples for Text Classification. In Pro-
ceedings of the 56th Annual Meeting of the As-
sociation for Computational Linguistics (Vol-
ume 2: Short Papers), pages 31–36. Association
for Computational Linguistics.
Ali Elkahky, Kellie Webster, Daniel Andor, and
Emily Pitler. 2018. A Challenge Set and Meth-
ods for Noun-Verb Ambiguity. In Proceedings
of the 2018 Conference on Empirical Methods
in Natural Language Processing, pages 2562–
2572. Association for Computational Linguis-
tics.
Zied Elloumi, Laurent Besacier, Olivier Galib-
ert, and Benjamin Lecouteux. 2018. Analyzing
Learned Representations of a Deep ASR Per-
formance Prediction Model. In Proceedings of
the 2018 EMNLP Workshop BlackboxNLP: An-
alyzing and Interpreting Neural Networks for
NLP, pages 9–15. Association for Computa-
tional Linguistics.
Jeffrey L. Elman. 1989. Representation and Struc-
ture in Connectionist Models. Technical report,
University of California, San Diego, Center for
Research in Language.
Jeffrey L. Elman. 1990. Finding Structure in
Time. Cognitive Science, 14(2):179–211.
Jeffrey L. Elman. 1991. Distributed representa-
tions, simple recurrent networks, and grammat-
ical structure. Machine Learning, 7(2-3):195–
225.
Allyson Ettinger, Ahmed Elgohary, and Philip
Resnik. 2016. Probing for semantic evidence
of composition by means of simple classifica-
tion tasks. In Proceedings of the 1st Workshop
on Evaluating Vector-Space Representations for
NLP, pages 134–139. Association for Computa-
tional Linguistics.
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Ras-
togi, and Chris Dyer. 2016. Problems With
Evaluation of Word Embeddings Using Word
Similarity Tasks. In Proceedings of the 1st
Workshop on Evaluating Vector Space Repre-
sentations for NLP.
Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama,
Chris Dyer, and Noah A. Smith. 2015. Sparse
Overcomplete Word Vector Representations. In
Proceedings of the 53rd Annual Meeting of the
Association for Computational Linguistics and
the 7th International Joint Conference on Natu-
ral Language Processing (Volume 1: Long Pa-
pers), pages 1491–1500. Association for Com-
putational Linguistics.
Shi Feng, Eric Wallace, Alvin Grissom II, Mo-
hit Iyyer, Pedro Rodriguez, and Jordan Boyd-
Graber. 2018. Pathologies of Neural Models
Make Interpretations Difficult. In Proceedings
of the 2018 Conference on Empirical Methods
in Natural Language Processing, pages 3719–
3728. Association for Computational Linguis-
tics.
Lev Finkelstein, Evgeniy Gabrilovich, Yossi Ma-
tias, Ehud Rivlin, Zach Solan, Gadi Wolfman,
and Eytan Ruppin. 2002. Placing Search in
Context: The Concept Revisited. ACM Trans-
actions on Information Systems, 20(1):116–131.
Robert Frank, Donald Mathis, and William
Badecker. 2013. The Acquisition of Anaphora
by Simple Recurrent Networks. Language Ac-
quisition, 20(3):181–227.
Cynthia Freeman, Jonathan Merriman, Abhinav
Aggarwal, Ian Beaver, and Abdullah Mueen.
2018. Paying Attention to Attention: Highlight-
ing Influential Samples in Sequential Analysis.
arXiv preprint arXiv:1808.02113v1.
Alona Fyshe, Leila Wehbe, Partha P. Talukdar,
Brian Murphy, and Tom M. Mitchell. 2015.
A Compositional and Interpretable Semantic
Space. In Proceedings of the 2015 Conference
of the North American Chapter of the Associ-
ation for Computational Linguistics: Human
Language Technologies, pages 32–41. Associ-
ation for Computational Linguistics.
David Gaddy, Mitchell Stern, and Dan Klein.
2018. What’s Going On in Neural Constituency
Parsers? An Analysis. In Proceedings of the
2018 Conference of the North American Chap-
ter of the Association for Computational Lin-
guistics: Human Language Technologies, Vol-
ume 1 (Long Papers), pages 999–1010. Associ-
ation for Computational Linguistics.
J. Ganesh, Manish Gupta, and Vasudeva Varma.
2017. Interpretation of Semantic Tweet Rep-
resentations. In Proceedings of the 2017
IEEE/ACM International Conference on Ad-
vances in Social Networks Analysis and Mining
2017, ASONAM ’17, pages 95–102, New York,
NY, USA. ACM.
Ji Gao, Jack Lanchantin, Mary Lou Soffa,
and Yanjun Qi. 2018. Black-box Genera-
tion of Adversarial Text Sequences to Evade
Deep Learning Classifiers. arXiv preprint
arXiv:1801.04354v5.
Lieke Gelderloos and Grzegorz Chrupała. 2016.
From phonemes to images: Levels of represen-
tation in a recurrent neural model of visually-
grounded language learning. In Proceedings of
COLING 2016, the 26th International Confer-
ence on Computational Linguistics: Technical
Papers, pages 1309–1319, Osaka, Japan. The
COLING 2016 Organizing Committee.
Felix A. Gers and Jürgen Schmidhuber. 2001.
LSTM Recurrent Networks Learn Simple
Context-Free and Context-Sensitive Lan-
guages. IEEE Transactions on Neural
Networks, 12(6):1333–1340.
Daniela Gerz, Ivan Vulić, Felix Hill, Roi Reichart,
and Anna Korhonen. 2016. SimVerb-3500: A
Large-Scale Evaluation Set of Verb Similarity.
In Proceedings of the 2016 Conference on Em-
pirical Methods in Natural Language Process-
ing, pages 2173–2182. Association for Compu-
tational Linguistics.
Hamidreza Ghader and Christof Monz. 2017.
What does Attention in Neural Machine Trans-
lation Pay Attention to? In Proceedings of the
Eighth International Joint Conference on Natu-
ral Language Processing (Volume 1: Long Pa-
pers), pages 30–39. Asian Federation of Natural
Language Processing.
Reza Ghaeini, Xiaoli Fern, and Prasad Tadepalli.
2018. Interpreting Recurrent and Attention-
Based Neural Models: A Case Study on Nat-
ural Language Inference. In Proceedings of the
2018 Conference on Empirical Methods in Nat-
ural Language Processing, pages 4952–4957.
Association for Computational Linguistics.
Mario Giulianelli, Jack Harding, Florian Mohn-
ert, Dieuwke Hupkes, and Willem Zuidema.
2018. Under the Hood: Using Diagnostic Clas-
sifiers to Investigate and Improve how Lan-
guage Models Track Agreement Information.
In Proceedings of the 2018 EMNLP Workshop
BlackboxNLP: Analyzing and Interpreting Neu-
ral Networks for NLP, pages 240–248. Associ-
ation for Computational Linguistics.
Max Glockner, Vered Shwartz, and Yoav Gold-
berg. 2018. Breaking NLI Systems with Sen-
tences that Require Simple Lexical Inferences.
In Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(Volume 2: Short Papers), pages 650–655. As-
sociation for Computational Linguistics.
Fréderic Godin, Kris Demuynck, Joni Dambre,
Wesley De Neve, and Thomas Demeester. 2018.
Explaining Character-Aware Neural Networks
for Word-Level Prediction: Do They Discover
Linguistic Rules? In Proceedings of the 2018
Conference on Empirical Methods in Natural
Language Processing, pages 3275–3284. Asso-
ciation for Computational Linguistics.
Yoav Goldberg. 2017. Neural Network Methods
for Natural Language Processing, volume 10 of
Synthesis Lectures on Human Language Tech-
nologies. Morgan & Claypool Publishers.
Ian Goodfellow, Yoshua Bengio, and Aaron
Courville. 2016. Deep Learning. MIT Press.
http://www.deeplearningbook.org.
Ian Goodfellow, Jean Pouget-Abadie, Mehdi
Mirza, Bing Xu, David Warde-Farley, Sherjil
Ozair, Aaron Courville, and Yoshua Bengio.
2014. Generative Adversarial Nets. In Ad-
vances in Neural Information Processing Sys-
tems, pages 2672–2680.
Ian J. Goodfellow, Jonathon Shlens, and Christian
Szegedy. 2015. Explaining and Harnessing Ad-
versarial Examples. In International Confer-
ence on Learning Representations (ICLR).
Kristina Gulordava, Piotr Bojanowski, Edouard
Grave, Tal Linzen, and Marco Baroni. 2018.
Colorless Green Recurrent Networks Dream
Hierarchically. In Proceedings of the 2018
Conference of the North American Chapter
of the Association for Computational Linguis-
tics: Human Language Technologies, Volume 1
(Long Papers), pages 1195–1205. Association
for Computational Linguistics.
Abhijeet Gupta, Gemma Boleda, Marco Baroni,
and Sebastian Padó. 2015. Distributional vec-
tors encode referential attributes. In Proceed-
ings of the 2015 Conference on Empirical Meth-
ods in Natural Language Processing, pages 12–
21. Association for Computational Linguistics.
Pankaj Gupta and Hinrich Schütze. 2018. LISA:
Explaining Recurrent Neural Network Judg-
ments via Layer-wIse Semantic Accumulation
and Example to Pattern Transformation. In
Proceedings of the 2018 EMNLP Workshop
BlackboxNLP: Analyzing and Interpreting Neu-
ral Networks for NLP, pages 154–164. Associ-
ation for Computational Linguistics.
Suchin Gururangan, Swabha Swayamdipta, Omer
Levy, Roy Schwartz, Samuel Bowman, and
Noah A. Smith. 2018. Annotation Artifacts in
Natural Language Inference Data. In Proceed-
ings of the 2018 Conference of the North Amer-
ican Chapter of the Association for Computa-
tional Linguistics: Human Language Technolo-
gies, Volume 2 (Short Papers), pages 107–112.
Association for Computational Linguistics.
Catherine L. Harris. 1990. Connectionism and
Cognitive Linguistics. Connection Science,
2(1-2):7–33.
David Harwath and James Glass. 2017. Learn-
ing Word-Like Units from Joint Audio-Visual
Analysis. In Proceedings of the 55th Annual
Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pages
506–517. Association for Computational Lin-
guistics.
Georg Heigold, Günter Neumann, and Josef van
Genabith. 2018. How Robust Are Character-
Based Word Embeddings in Tagging and MT
Against Wrod Scramlbing or Randdm Nouse?
In Proceedings of the 13th Conference of The
Association for Machine Translation in the
Americas (Volume 1: Research Track), pages
68–79.
Felix Hill, Roi Reichart, and Anna Korhonen.
2015. SimLex-999: Evaluating Semantic
Models With (Genuine) Similarity Estimation.
Computational Linguistics, 41(4):665–695.
Dieuwke Hupkes, Sara Veldhoen, and Willem
Zuidema. 2018. Visualisation and ’diagnos-
tic classifiers’ reveal how recurrent and recur-
sive neural networks process hierarchical struc-
ture. Journal of Artificial Intelligence Research,
61:907–926.
Pierre Isabelle, Colin Cherry, and George Foster.
2017. A Challenge Set Approach to Evaluat-
ing Machine Translation. In Proceedings of the
2017 Conference on Empirical Methods in Nat-
ural Language Processing, pages 2486–2496.
Association for Computational Linguistics.
Pierre Isabelle and Roland Kuhn. 2018. A Chal-
lenge Set for French→English Machine Trans-
lation. arXiv preprint arXiv:1806.02725v2.
Hitoshi Isahara. 1995. JEIDA's test-sets for qual-
ity evaluation of MT systems – technical evalua-
tion from the developer's point of view. In Pro-
ceedings of MT Summit V.
Mohit Iyyer, John Wieting, Kevin Gimpel, and
Luke Zettlemoyer. 2018. Adversarial Exam-
ple Generation with Syntactically Controlled
Paraphrase Networks. In Proceedings of the
2018 Conference of the North American Chap-
ter of the Association for Computational Lin-
guistics: Human Language Technologies, Vol-
ume 1 (Long Papers), pages 1875–1885. Asso-
ciation for Computational Linguistics.
Alon Jacovi, Oren Sar Shalom, and Yoav Gold-
berg. 2018. Understanding Convolutional Neu-
ral Networks for Text Classification. In Pro-
ceedings of the 2018 EMNLP Workshop Black-
boxNLP: Analyzing and Interpreting Neural
Networks for NLP, pages 56–65. Association
for Computational Linguistics.
Inigo Jauregi Unanue, Ehsan Zare Borzeshi, and
Massimo Piccardi. 2018. A Shared Attention
Mechanism for Interpretation of Neural Auto-
matic Post-Editing Systems. In Proceedings of
the 2nd Workshop on Neural Machine Transla-
tion and Generation, pages 11–17. Association
for Computational Linguistics.
Robin Jia and Percy Liang. 2017. Adversarial ex-
amples for evaluating reading comprehension
systems. In Proceedings of the 2017 Confer-
ence on Empirical Methods in Natural Lan-
guage Processing, pages 2021–2031. Associa-
tion for Computational Linguistics.
Rafal Jozefowicz, Oriol Vinyals, Mike Schuster,
Noam Shazeer, and Yonghui Wu. 2016. Explor-
ing the Limits of Language Modeling. arXiv
preprint arXiv:1602.02410v2.
Akos Kádár, Grzegorz Chrupała, and Afra Al-
ishahi. 2017. Representation of Linguistic
Form and Function in Recurrent Neural Net-
works. Computational Linguistics, 43(4):761–
780.
Andrej Karpathy, Justin Johnson, and Fei-
Fei Li. 2015. Visualizing and Understand-
ing Recurrent Networks. arXiv preprint
arXiv:1506.02078v2.
Urvashi Khandelwal, He He, Peng Qi, and Dan Ju-
rafsky. 2018. Sharp Nearby, Fuzzy Far Away:
How Neural Language Models Use Context. In
Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 1: Long Papers), pages 284–294. Associa-
tion for Computational Linguistics.
Margaret King and Kirsten Falkedal. 1990. Using
Test Suites in Evaluation of Machine Transla-
tion Systems. In COLING 1990 Volume 2: Pa-
pers presented to the 13th International Confer-
ence on Computational Linguistics.
Eliyahu Kiperwasser and Yoav Goldberg. 2016.
Simple and Accurate Dependency Parsing Us-
ing Bidirectional LSTM Feature Representa-
tions. Transactions of the Association for Com-
putational Linguistics, 4:313–327.
Sungryong Koh, Jinee Maeng, Ji-Young Lee,
Young-Sook Chae, and Key-Sun Choi. 2001. A
test suite for evaluation of English-to-Korean
machine translation systems. In MT Summit
Conference.
Arne Köhn. 2015. What’s in an Embedding? Ana-
lyzing Word Embeddings through Multilingual
Evaluation. In Proceedings of the 2015 Con-
ference on Empirical Methods in Natural Lan-
guage Processing, pages 2067–2073, Lisbon,
Portugal. Association for Computational Lin-
guistics.
Volodymyr Kuleshov, Shantanu Thakoor,
Tingfung Lau, and Stefano Ermon. 2018.
Adversarial Examples for Natural Language
Classification Problems. OpenReview preprint:
https://openreview.net/forum?id=r1QZ3zbAZ.
Brenden Lake and Marco Baroni. 2018. Gener-
alization without Systematicity: On the Com-
positional Skills of Sequence-to-Sequence Re-
current Networks. In Proceedings of the 35th
International Conference on Machine Learning,
volume 80 of Proceedings of Machine Learning
Research, pages 2873–2882, Stockholmsmäs-
san, Stockholm, Sweden. PMLR.
Sabine Lehmann, Stephan Oepen, Sylvie Regnier-
Prost, Klaus Netter, Veronika Lux, Judith Klein,
Kirsten Falkedal, Frederik Fouvry, Dominique
Estival, Eva Dauphin, Herve Compagnion, Ju-
dith Baur, Lorna Balkan, and Doug Arnold.
1996. TSNLP – Test Suites for Natural Lan-
guage Processing. In COLING 1996 Volume 2:
The 16th International Conference on Compu-
tational Linguistics.
Tao Lei, Regina Barzilay, and Tommi Jaakkola.
2016. Rationalizing Neural Predictions. In
Proceedings of the 2016 Conference on Empir-
ical Methods in Natural Language Processing,
pages 107–117. Association for Computational
Linguistics.
Ira Leviant and Roi Reichart. 2015. Separated
by an Un-common Language: Towards Judg-
ment Language Informed Vector Space Model-
ing. arXiv preprint arXiv:1508.00106v5.
Jiwei Li, Xinlei Chen, Eduard Hovy, and Dan Ju-
rafsky. 2016a. Visualizing and Understanding
Neural Models in NLP. In Proceedings of the
2016 Conference of the North American Chap-
ter of the Association for Computational Lin-
guistics: Human Language Technologies, pages
681–691. Association for Computational Lin-
guistics.
Jiwei Li, Will Monroe, and Dan Jurafsky.
2016b. Understanding Neural Networks
through Representation Erasure. arXiv preprint
arXiv:1612.08220v3.
Bin Liang, Hongcheng Li, Miaoqiang Su, Pan
Bian, Xirong Li, and Wenchang Shi. 2018.
Deep Text Classification Can be Fooled. In
Proceedings of the Twenty-Seventh Interna-
tional Joint Conference on Artificial Intelli-
gence, IJCAI-18, pages 4208–4215. Interna-
tional Joint Conferences on Artificial Intelli-
gence Organization.
Tal Linzen, Emmanuel Dupoux, and Yoav Gold-
berg. 2016. Assessing the Ability of LSTMs to
Learn Syntax-Sensitive Dependencies. Trans-
actions of the Association for Computational
Linguistics, 4:521–535.
Zachary C. Lipton. 2016. The Mythos of Model
Interpretability. In ICML Workshop on Human
Interpretability of Machine Learning.
Nelson F. Liu, Omer Levy, Roy Schwartz, Chen-
hao Tan, and Noah A. Smith. 2018. LSTMs Ex-
ploit Linguistic Attributes of Data. In Proceed-
ings of The Third Workshop on Representation
Learning for NLP, pages 180–186. Association
for Computational Linguistics.
Yanpei Liu, Xinyun Chen, Chang Liu, and Dawn
Song. 2017. Delving into Transferable Adver-
sarial Examples and Black-box Attacks. In In-
ternational Conference on Learning Represen-
tations (ICLR).
Thang Luong, Richard Socher, and Christopher
Manning. 2013. Better Word Representa-
tions with Recursive Neural Networks for Mor-
phology. In Proceedings of the Seventeenth
Conference on Computational Natural Lan-
guage Learning, pages 104–113. Association
for Computational Linguistics.
Jean Maillard and Stephen Clark. 2018. La-
tent Tree Learning with Differentiable Parsers:
Shift-Reduce Parsing and Chart Parsing. In
Proceedings of the Workshop on the Relevance
of Linguistic Structure in Neural Architectures
for NLP, pages 13–18. Association for Compu-
tational Linguistics.
Marco Marelli, Luisa Bentivogli, Marco Ba-
roni, Raffaella Bernardi, Stefano Menini, and
Roberto Zamparelli. 2014. SemEval-2014 Task
1: Evaluation of Compositional Distributional
Semantic Models on Full Sentences through
Semantic Relatedness and Textual Entailment.
In Proceedings of the 8th International Work-
shop on Semantic Evaluation (SemEval 2014),
pages 1–8. Association for Computational Lin-
guistics.
R. Thomas McCoy, Robert Frank, and Tal Linzen.
2018. Revisiting the poverty of the stimulus:
Hierarchical generalization without a hierarchi-
cal bias in recurrent neural networks. In Pro-
ceedings of the 40th Annual Conference of the
Cognitive Science Society.
Risto Miikkulainen and Michael G. Dyer. 1991.
Natural Language Processing With Modular
PDP Networks and Distributed Lexicon. Cog-
nitive Science, 15(3):343–399.
Tomáš Mikolov, Martin Karafiát, Lukáš Bur-
get, Jan Černocký, and Sanjeev Khudanpur.
2010. Recurrent neural network based language
model. In Eleventh Annual Conference of the
International Speech Communication Associa-
tion.
Yao Ming, Shaozu Cao, Ruixiang Zhang, Zhen
Li, Yuanzhe Chen, Yangqiu Song, and Huamin
Qu. 2017. Understanding Hidden Memories of
Recurrent Neural Networks. In IEEE Confer-
ence on Visual Analytics Science and Technol-
ogy (IEEE VAST 2017).
Grégoire Montavon, Wojciech Samek, and Klaus-
Robert Müller. 2018. Methods for interpreting
and understanding deep neural networks. Digi-
tal Signal Processing, 73:1–15.
Pramod Kaushik Mudrakarta, Ankur Taly,
Mukund Sundararajan, and Kedar Dhamdhere.
2018. Did the Model Understand the Question?
In Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1896–1906.
Association for Computational Linguistics.
James Mullenbach, Sarah Wiegreffe, Jon Duke,
Jimeng Sun, and Jacob Eisenstein. 2018. Ex-
plainable Prediction of Medical Codes from
Clinical Text. In Proceedings of the 2018
Conference of the North American Chapter
of the Association for Computational Linguis-
tics: Human Language Technologies, Volume 1
(Long Papers), pages 1101–1111. Association
for Computational Linguistics.
W. James Murdoch, Peter J. Liu, and Bin Yu.
2018. Beyond Word Importance: Contex-
tual Decomposition to Extract Interactions from
LSTMs. In International Conference on Learn-
ing Representations.
Brian Murphy, Partha Talukdar, and Tom Mitchell.
2012. Learning Effective and Interpretable Se-
mantic Models using Non-Negative Sparse Em-
bedding. In Proceedings of COLING 2012,
pages 1933–1950. The COLING 2012 Organiz-
ing Committee.
Tasha Nagamine, Michael L. Seltzer, and Nima
Mesgarani. 2015. Exploring How Deep Neural
Networks Form Phonemic Categories. In Inter-
speech 2015.
Tasha Nagamine, Michael L. Seltzer, and Nima
Mesgarani. 2016. On the Role of Nonlin-
ear Transformations in Deep Neural Network
Acoustic Models. In Interspeech 2016, pages
803–807.
Aakanksha Naik, Abhilasha Ravichander, Nor-
man Sadeh, Carolyn Rose, and Graham Neu-
big. 2018. Stress Test Evaluation for Nat-
ural Language Inference. In Proceedings of
the 27th International Conference on Compu-
tational Linguistics, pages 2340–2353. Associ-
ation for Computational Linguistics.
Nina Narodytska and Shiva Kasiviswanathan.
2017. Simple Black-Box Adversarial Attacks
on Deep Neural Networks. In 2017 IEEE Con-
ference on Computer Vision and Pattern Recog-
nition Workshops (CVPRW), pages 1310–1318.
Lars Niklasson and Fredrik Linåker. 2000. Dis-
tributed representations for extended syntac-
tic transformation. Connection Science, 12(3-
4):299–314.
Tong Niu and Mohit Bansal. 2018. Adversar-
ial Over-Sensitivity and Over-Stability Strate-
gies for Dialogue Models. In Proceedings of
the 22nd Conference on Computational Natural
Language Learning, pages 486–496. Associa-
tion for Computational Linguistics.
Nicolas Papernot, Patrick McDaniel, and Ian
Goodfellow. 2016a. Transferability in Ma-
chine Learning: From Phenomena to Black-
Box Attacks using Adversarial Samples. arXiv
preprint arXiv:1605.07277v1.
Nicolas Papernot, Patrick McDaniel, Ian Goodfel-
low, Somesh Jha, Z. Berkay Celik, and Anan-
thram Swami. 2017. Practical Black-Box At-
tacks Against Machine Learning. In Proceed-
ings of the 2017 ACM on Asia Conference on
Computer and Communications Security, ASIA
CCS ’17, pages 506–519, New York, NY, USA.
ACM.
Nicolas Papernot, Patrick McDaniel, Ananthram
Swami, and Richard Harang. 2016b. Craft-
ing Adversarial Input Sequences for Recurrent
Neural Networks. In Military Communications
Conference, MILCOM 2016-2016 IEEE, pages
49–54. IEEE.
Dong Huk Park, Lisa Anne Hendricks, Zeynep
Akata, Anna Rohrbach, Bernt Schiele, Trevor
Darrell, and Marcus Rohrbach. 2018. Multi-
modal Explanations: Justifying Decisions and
Pointing to the Evidence. In The IEEE Confer-
ence on Computer Vision and Pattern Recogni-
tion (CVPR).
Sungjoon Park, JinYeong Bak, and Alice Oh.
2017. Rotated Word Vector Representations
and their Interpretability. In Proceedings of the
2017 Conference on Empirical Methods in Nat-
ural Language Processing, pages 401–411. As-
sociation for Computational Linguistics.
Matthew Peters, Mark Neumann, Luke Zettle-
moyer, and Wen-tau Yih. 2018. Dissecting
Contextual Word Embeddings: Architecture
and Representation. In Proceedings of the 2018
Conference on Empirical Methods in Natural
Language Processing, pages 1499–1509. Asso-
ciation for Computational Linguistics.
Adam Poliak, Aparajita Haldar, Rachel Rudinger,
J. Edward Hu, Ellie Pavlick, Aaron Steven
White, and Benjamin Van Durme. 2018a. Col-
lecting Diverse Natural Language Inference
Problems for Sentence Representation Evalua-
tion. In Proceedings of the 2018 Conference on
Empirical Methods in Natural Language Pro-
cessing, pages 67–81. Association for Compu-
tational Linguistics.
Adam Poliak, Jason Naradowsky, Aparajita
Haldar, Rachel Rudinger, and Benjamin
Van Durme. 2018b. Hypothesis Only Baselines
in Natural Language Inference. In Proceedings
of the Seventh Joint Conference on Lexical
and Computational Semantics, pages 180–191.
Association for Computational Linguistics.
Jordan B. Pollack. 1990. Recursive dis-
tributed representations. Artificial Intelligence,
46(1):77–105.
Peng Qian, Xipeng Qiu, and Xuanjing Huang.
2016a. Analyzing Linguistic Knowledge in Se-
quential Model of Sentence. In Proceedings of
the 2016 Conference on Empirical Methods in
Natural Language Processing, pages 826–835,
Austin, Texas. Association for Computational
Linguistics.
Peng Qian, Xipeng Qiu, and Xuanjing Huang.
2016b. Investigating Language Universal and
Specific Properties in Word Embeddings. In
Proceedings of the 54th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 1: Long Papers), pages 1478–1488, Berlin,
Germany. Association for Computational Lin-
guistics.
Marco Tulio Ribeiro, Sameer Singh, and Carlos
Guestrin. 2018. Semantically Equivalent Ad-
versarial Rules for Debugging NLP models. In
Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 1: Long Papers), pages 856–865. Associa-
tion for Computational Linguistics.
Matīss Rikters. 2018. Debugging Neural
Machine Translations. arXiv preprint
arXiv:1808.02733v1.
Annette Rios Gonzales, Laura Mascarell, and Rico
Sennrich. 2017. Improving Word Sense Disam-
biguation in Neural Machine Translation with
Sense Embeddings. In Proceedings of the Sec-
ond Conference on Machine Translation, pages
11–19. Association for Computational Linguis-
tics.
Tim Rocktäschel, Edward Grefenstette,
Karl Moritz Hermann, Tomáš Kočiský,
and Phil Blunsom. 2016. Reasoning about
Entailment with Neural Attention. In Interna-
tional Conference on Learning Representations
(ICLR).
Andras Rozsa, Ethan M. Rudd, and Terrance E.
Boult. 2016. Adversarial Diversity and Hard
Positive Generation. In Proceedings of the
IEEE Conference on Computer Vision and Pat-
tern Recognition Workshops, pages 25–32.
Rachel Rudinger, Jason Naradowsky, Brian
Leonard, and Benjamin Van Durme. 2018.
Gender Bias in Coreference Resolution. In Pro-
ceedings of the 2018 Conference of the North
American Chapter of the Association for Com-
putational Linguistics: Human Language Tech-
nologies, Volume 2 (Short Papers), pages 8–14.
Association for Computational Linguistics.
D. E. Rumelhart and J. L. McClelland. 1986. On
Learning the Past Tenses of English Verbs. In
Parallel Distributed Processing: Explorations in
the Microstructure of Cognition, volume 2,
pages 216–271. MIT Press, Cambridge, MA,
USA.
Alexander M. Rush, Sumit Chopra, and Jason
Weston. 2015. A Neural Attention Model for
Abstractive Sentence Summarization. In Pro-
ceedings of the 2015 Conference on Empiri-
cal Methods in Natural Language Processing,
pages 379–389. Association for Computational
Linguistics.
Keisuke Sakaguchi, Kevin Duh, Matt Post, and
Benjamin Van Durme. 2017. Robsut Wrod Re-
ocginiton via Semi-Character Recurrent Neu-
ral Network. In Proceedings of the Thirty-
First AAAI Conference on Artificial Intelli-
gence, February 4–9, 2017, San Francisco, Cal-
ifornia, USA, pages 3281–3287. AAAI Press.
Suranjana Samanta and Sameep Mehta. 2017.
Towards Crafting Text Adversarial Samples.
arXiv preprint arXiv:1707.02812v1.
Ivan Sanchez, Jeff Mitchell, and Sebastian Riedel.
2018. Behavior Analysis of NLI Models: Un-
covering the Influence of Three Factors on Ro-
bustness. In Proceedings of the 2018 Confer-
ence of the North American Chapter of the As-
sociation for Computational Linguistics: Hu-
man Language Technologies, Volume 1 (Long
Papers), pages 1975–1985. Association for
Computational Linguistics.
Motoki Sato, Jun Suzuki, Hiroyuki Shindo, and
Yuji Matsumoto. 2018. Interpretable Adversar-
ial Perturbation in Input Embedding Space for
Text. In Proceedings of the Twenty-Seventh In-
ternational Joint Conference on Artificial In-
telligence, IJCAI-18, pages 4323–4330. Inter-
national Joint Conferences on Artificial Intelli-
gence Organization.
Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy,
Aykut Koc, and Tolga Cukur. 2018. Seman-
tic Structure and Interpretability of Word Em-
beddings. IEEE/ACM Transactions on Audio,
Speech, and Language Processing.
Rico Sennrich. 2017. How Grammatical is
Character-level Neural Machine Translation?
Assessing MT Quality with Contrastive Trans-
lation Pairs. In Proceedings of the 15th Confer-
ence of the European Chapter of the Association
for Computational Linguistics: Volume 2, Short
Papers, pages 376–382. Association for Com-
putational Linguistics.
Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning
Jiang, and Jian Sun. 2018. Learning Visually-
Grounded Semantics from Contrastive Adver-
sarial Samples. In Proceedings of the 27th In-
ternational Conference on Computational Lin-
guistics, pages 3715–3727. Association for
Computational Linguistics.
Xing Shi, Kevin Knight, and Deniz Yuret. 2016a.
Why Neural Translations are the Right Length.
In Proceedings of the 2016 Conference on Em-
pirical Methods in Natural Language Process-
ing, pages 2278–2282. Association for Compu-
tational Linguistics.
Xing Shi, Inkit Padhi, and Kevin Knight. 2016b.
Does String-Based Neural MT Learn Source
Syntax? In Proceedings of the 2016 Con-
ference on Empirical Methods in Natural Lan-
guage Processing, pages 1526–1534, Austin,
Texas. Association for Computational Linguis-
tics.
Chandan Singh, W. James Murdoch, and Bin
Yu. 2018. Hierarchical interpretations for
neural network predictions. arXiv preprint
arXiv:1806.05337v1.
Hendrik Strobelt, Sebastian Gehrmann, Michael
Behrisch, Adam Perer, Hanspeter Pfister, and
Alexander M. Rush. 2018a. Seq2Seq-Vis:
A Visual Debugging Tool for Sequence-
to-Sequence Models. arXiv preprint
arXiv:1804.09299v1.
Hendrik Strobelt, Sebastian Gehrmann, Hanspeter
Pfister, and Alexander M. Rush. 2018b. LST-
MVis: A Tool for Visual Analysis of Hidden
State Dynamics in Recurrent Neural Networks.
IEEE Transactions on Visualization and Com-
puter Graphics, 24(1):667–676.
Mukund Sundararajan, Ankur Taly, and Qiqi Yan.
2017. Axiomatic Attribution for Deep Net-
works. In Proceedings of the 34th Interna-
tional Conference on Machine Learning, vol-
ume 70 of Proceedings of Machine Learning
Research, pages 3319–3328, International Con-
vention Centre, Sydney, Australia. PMLR.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le.
2014. Sequence to Sequence Learning with
Neural Networks. In Advances in Neural Infor-
mation Processing Systems, pages 3104–3112.
Mirac Suzgun, Yonatan Belinkov, and Stuart M.
Shieber. 2019. On Evaluating the Generaliza-
tion of LSTM Models in Formal Languages. In
Proceedings of the Society for Computation in
Linguistics (SCiL).
Christian Szegedy, Wojciech Zaremba, Ilya
Sutskever, Joan Bruna, Dumitru Erhan, Ian
Goodfellow, and Rob Fergus. 2014. Intrigu-
ing properties of neural networks. In Interna-
tional Conference on Learning Representations
(ICLR).
Gongbo Tang, Rico Sennrich, and Joakim Nivre.
2018. An Analysis of Attention Mechanisms:
The Case of Word Sense Disambiguation in
Neural Machine Translation. In Proceedings of
the Third Conference on Machine Translation:
Research Papers, pages 26–35. Association for
Computational Linguistics.
Yi Tay, Anh Tuan Luu, and Siu Cheung Hui.
2018. CoupleNet: Paying Attention to Couples
with Coupled Attention for Relationship Rec-
ommendation. In Proceedings of the Twelfth In-
ternational AAAI Conference on Web and Social
Media (ICWSM).
Ke Tran, Arianna Bisazza, and Christof Monz.
2018. The Importance of Being Recurrent
for Modeling Hierarchical Structure. In Pro-
ceedings of the 2018 Conference on Empiri-
cal Methods in Natural Language Processing,
pages 4731–4736. Association for Computa-
tional Linguistics.
Eva Vanmassenhove, Jinhua Du, and Andy Way.
2017. Investigating ‘Aspect’ in NMT and
SMT: Translating the English Simple Past and
Present Perfect. Computational Linguistics in
the Netherlands Journal, 7:109–128.
Sara Veldhoen, Dieuwke Hupkes, and Willem
Zuidema. 2016. Diagnostic Classifiers: Reveal-
ing how Neural Networks Process Hierarchical
Structure. In CEUR Workshop Proceedings.
Elena Voita, Pavel Serdyukov, Rico Sennrich, and
Ivan Titov. 2018. Context-Aware Neural Ma-
chine Translation Learns Anaphora Resolution.
In Proceedings of the 56th Annual Meeting of
the Association for Computational Linguistics
(Volume 1: Long Papers), pages 1264–1274.
Association for Computational Linguistics.
Ekaterina Vylomova, Trevor Cohn, Xuanli He,
and Gholamreza Haffari. 2016. Word Rep-
resentation Models for Morphologically Rich
Languages in Neural Machine Translation.
arXiv preprint arXiv:1606.04217v1.
Alex Wang, Amanpreet Singh, Julian Michael, Fe-
lix Hill, Omer Levy, and Samuel R. Bowman.
2018a. GLUE: A Multi-Task Benchmark and
Analysis Platform for Natural Language Under-
standing. arXiv preprint arXiv:1804.07461v1.
Shuai Wang, Yanmin Qian, and Kai Yu. 2017a.
What Does the Speaker Embedding Encode? In
Interspeech 2017, pages 1497–1501.
Xinyi Wang, Hieu Pham, Pengcheng Yin, and
Graham Neubig. 2018b. A Tree-based Decoder
for Neural Machine Translation. In Conference
on Empirical Methods in Natural Language
Processing (EMNLP), Brussels, Belgium.
Yu-Hsuan Wang, Cheng-Tao Chung, and Hung-yi
Lee. 2017b. Gate Activation Signal Analysis
for Gated Recurrent Neural Networks and Its
Correlation with Phoneme Boundaries. In In-
terspeech 2017.
Gail Weiss, Yoav Goldberg, and Eran Yahav. 2018.
On the Practical Computational Power of Finite
Precision RNNs for Language Recognition. In
Proceedings of the 56th Annual Meeting of the
Association for Computational Linguistics (Vol-
ume 2: Short Papers), pages 740–745. Associa-
tion for Computational Linguistics.
Adina Williams, Andrew Drozdov, and Samuel R.
Bowman. 2018. Do latent tree learning mod-
els identify meaningful structure in sentences?
Transactions of the Association for Computa-
tional Linguistics, 6:253–267.
Zhizheng Wu and Simon King. 2016. Investigat-
ing gated recurrent networks for speech syn-
thesis. In 2016 IEEE International Confer-
ence on Acoustics, Speech and Signal Process-
ing (ICASSP), pages 5140–5144. IEEE.
Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-
Ling Wang, and Michael I. Jordan. 2018.
Greedy Attack and Gumbel Attack: Generating
Adversarial Examples for Discrete Data. arXiv
preprint arXiv:1805.12316v1.
Wenpeng Yin, Hinrich Schütze, Bing Xiang, and
Bowen Zhou. 2016. ABCNN: Attention-Based
Convolutional Neural Network for Modeling
Sentence Pairs. Transactions of the Association
for Computational Linguistics, 4:259–272.
Xiaoyong Yuan, Pan He, Qile Zhu, and Xiaolin
Li. 2017. Adversarial Examples: Attacks and
Defenses for Deep Learning. arXiv preprint
arXiv:1712.07107v3.
Omar Zaidan, Jason Eisner, and Christine Piatko.
2007. Using “Annotator Rationales” to Im-
prove Machine Learning for Text Categoriza-
tion. In Human Language Technologies 2007:
The Conference of the North American Chap-
ter of the Association for Computational Lin-
guistics; Proceedings of the Main Conference,
pages 260–267. Association for Computational
Linguistics.
Quan-shi Zhang and Song-chun Zhu. 2018. Vi-
sual interpretability for deep learning: A sur-
vey. Frontiers of Information Technology &
Electronic Engineering, 19(1):27–39.
Ye Zhang, Iain Marshall, and Byron C. Wal-
lace. 2016. Rationale-Augmented Convolu-
tional Neural Networks for Text Classification.
In Proceedings of the 2016 Conference on Em-
pirical Methods in Natural Language Process-
ing, pages 795–804. Association for Computa-
tional Linguistics.
Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente
Ordonez, and Kai-Wei Chang. 2018a. Gen-
der Bias in Coreference Resolution: Evaluation
and Debiasing Methods. In Proceedings of the
2018 Conference of the North American Chap-
ter of the Association for Computational Lin-
guistics: Human Language Technologies, Vol-
ume 2 (Short Papers), pages 15–20. Association
for Computational Linguistics.
Junbo Zhao, Yoon Kim, Kelly Zhang, Alexan-
der Rush, and Yann LeCun. 2018b. Adversar-
ially Regularized Autoencoders. In Proceed-
ings of the 35th International Conference on
Machine Learning, volume 80 of Proceedings
of Machine Learning Research, pages 5902–
5911, Stockholmsmässan, Stockholm, Sweden.
PMLR.
Zhengli Zhao, Dheeru Dua, and Sameer Singh.
2018c. Generating Natural Adversarial Exam-
ples. In International Conference on Learning
Representations.
Supplementary Materials
Reference | Component | Property | Method
(Elloumi et al., 2018) | CNN activations | Style, accent, broadcast program | Classification
(Belinkov and Glass, 2017) | CNN/RNN activations | Phonetic units | Classification
(Dalvi et al., 2019a) | NMT and LM neurons | POS, morphology, lexical semantics | Classification
(Shi et al., 2016a) | NMT encoder neurons | Sentence length | Regression
(Belinkov et al., 2017b) | NMT states | POS, lexical semantics | Classification
(Belinkov et al., 2017a; Dalvi et al., 2017) | NMT states | POS, morphology | Classification
(Bisazza and Tump, 2018) | NMT states | Morphology | Classification
(Tran et al., 2018) | RNN / self-attention states | Subject-verb agreement | Likelihood comparison, direct classification
(Wang et al., 2017b) | RNN gates | Phoneme boundaries | Change in activation signal
(McCoy et al., 2018) | RNN sentence embedding | Hierarchical structure | Classification
(Blevins et al., 2018) | RNN states | POS, ancestor label prediction, dependency relation prediction | Classification
(Shi et al., 2016b) | RNN states | POS, top syntactic sequence, smallest constituent, tense, voice | Classification
(Gulordava et al., 2018) | RNN states | Number agreement | Likelihood comparison
(Linzen et al., 2016) | RNN states | Subject-verb agreement | Likelihood comparison, direct classification
(Liu et al., 2018) | RNN states | Word presence | Direct classification
(Alishahi et al., 2017) | RNN states in audio-visual model | Phonemes, synonyms | Classification, clustering, discrimination
(Gelderloos and Chrupała, 2016) | RNN states in language-vision model | Word boundary, word similarity | Classification
(Qian et al., 2016a) | RNN states/gates | POS, syntactic role, gender, case, definiteness, verb form, mood | Classification, correlation
(Wu and King, 2016) | RNN states/gates | Acoustic features | Correlation
(Ghader and Monz, 2017) | Attention weights | POS, word alignment | Distribution measures, match with alignments
(Voita et al., 2018) | Attention weights | Anaphora | Attention score
(Tang et al., 2018) | Attention weights | Word sense disambiguation | Distribution measures
(Drexler and Glass, 2017) | Audio-visual CNN activations | Phonemes, speakers, word identity | Clustering, discrimination
(Harwath and Glass, 2017) | Audio-visual CNN embeddings | Word classes | Clustering
(Chrupała et al., 2017) | Audio-visual RNN activations | Utterance length, word presence, homonym disambiguation | Classification, regression, similarity measures
(Peters et al., 2018) | biLM representations (RNN, Transformer, gated CNN) | POS, constituency parsing, coreference | Classification; similarity scores
(Nagamine et al., 2016) | Hidden activations in feed-forward acoustic model | Phonemes, phonetic features | Classification, clustering measures
(Nagamine et al., 2015) | Hidden activations in feed-forward acoustic model | Phonemes, phonetic features, gender | Clustering, average activations by group/label
(Chaabouni et al., 2017) | Hidden activations in feed-forward audio-visual model | Phonetic features | Discrimination
(Vylomova et al., 2016) | NMT word embeddings | Synonyms, morphological features | Nearest neighbors
(Gaddy et al., 2018) | Parser word embeddings | Word features (shape, etc.) | Classification; also other methods
(Ettinger et al., 2016) | Sentence embeddings | Semantic role, word presence | Classification
(Adi et al., 2017a,b) | Sentence embeddings | Sentence length, word presence, word order | Classification
(Ahmad et al., 2018) | Sentence embeddings | Sentence length, word presence, word order; POS, word sense disambiguation; sentence order | Classification
(Ganesh et al., 2017) | Sentence embeddings | Sentence length, word presence, word order; orthography; social tasks | Classification
(Conneau et al., 2018) | Sentence embeddings | Sentence length, word presence, word order; tree depth, top constituent; main tense, subject/object number, semantic odd man out, coordinate inversion | Classification
(Brunner et al., 2017) | Sentence embeddings | Synthetic syntactic patterns | Clustering
(Wang et al., 2017a) | Speaker embeddings | Speaker, speech content, word order, utterance length, channel, gender, speaking rate | Classification
(Qian et al., 2016b) | Word embeddings | POS, dependency relations, morphological features, emotions | Classification
(Köhn, 2015) | Word embeddings | POS, head POS, dependency relation, gender, case, number, tense | Classification
(Gupta et al., 2015) | Word embeddings | Referential attributes | Classification
(Dharmaretnam and Fyshe, 2018) | Word embeddings, vision CNN | Concepts | Similarity measures

Table SM1: A categorization of work trying to find linguistic information in neural networks according to the neural network component investigated, the linguistic property sought, and the analysis method.
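
The most common method in Table SM1 is classification: training a diagnostic classifier on frozen representations to predict a linguistic property. The following is a minimal sketch of that setup, not any specific paper's implementation; the random arrays are placeholders for representations and labels that would be extracted from a real model and an annotated corpus, and the linear probe is one common but not universal choice.

```python
# Minimal sketch of the "classification" analysis method in Table SM1:
# a probing classifier trained on frozen hidden representations to predict
# a linguistic property (here, POS tags). The random arrays below are
# placeholders for representations extracted from a real model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_tokens, hidden_dim, n_tags = 5000, 512, 17       # e.g., Universal POS tag set
states = rng.standard_normal((n_tokens, hidden_dim))  # placeholder: encoder states
tags = rng.integers(0, n_tags, size=n_tokens)         # placeholder: gold POS tags

X_train, X_test, y_train, y_test = train_test_split(
    states, tags, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # a linear probe keeps the analysis simple
probe.fit(X_train, y_train)

# Probing accuracy is read as evidence of how much property information the
# representations encode, relative to suitable baselines (e.g., random states).
print(f"probing accuracy: {probe.score(X_test, y_test):.3f}")
```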
Reference | Task | Phenomena | Languages | Size | Construction
(Naik et al., 2018) | NLI | Antonyms, quantities, spelling, word overlap, negation, length | English | 7596 | Automatic
(Dasgupta et al., 2018) | NLI | Compositionality | English | 44010 | Automatic
(Sanchez et al., 2018) | NLI | Antonyms, hyper/hyponyms | English | 6279 | Semi-auto.
(Wang et al., 2018a) | NLI | Diverse semantics | English | 550 | Manual
(Glockner et al., 2018) | NLI | Lexical inference | English | 8193 | Semi-auto.
(Poliak et al., 2018a) | NLI | Diverse | English | 570K | Manual, semi-auto., automatic
(Rios Gonzales et al., 2017) | MT | Word sense disambiguation | German→English/French | 13900 | Semi-auto.
(Burlot and Yvon, 2017) | MT | Morphology | English→Czech/Latvian | 18500 | Automatic
(Sennrich, 2017) | MT | Polarity, verb-particle constructions, agreement, transliteration | English→German | 97K | Automatic
(Bawden et al., 2018) | MT | Discourse | English→French | 400 | Manual
(Isabelle et al., 2017; Isabelle and Kuhn, 2018) | MT | Morpho-syntax, syntax, lexicon | English↔French | 108 + 506 | Manual
(Burchardt et al., 2017) | MT | Diverse | English↔German | 10000 | Manual
(Linzen et al., 2016) | LM | Subject-verb agreement | English | ∼1.35M | Automatic
(Gulordava et al., 2018) | LM | Number agreement | English, Russian, Hebrew, Italian | ∼10K | Automatic
(Rudinger et al., 2018) | Coref. | Gender bias | English | 720 | Semi-auto.
(Zhao et al., 2018a) | Coref. | Gender bias | English | 3160 | Semi-auto.
(Lake and Baroni, 2018) | seq2seq | Compositionality | English | 20910 | Automatic
(Elkahky et al., 2018) | POS tagging | Noun-verb ambiguity | English | 32654 | Semi-auto.

Table SM2: A categorization of challenge sets for evaluating neural networks according to the NLP task, the linguistic phenomena, the represented languages, the dataset size, and the construction method.
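
The agreement challenge sets in Table SM2 (Linzen et al., 2016; Gulordava et al., 2018) are typically scored by likelihood comparison on minimal pairs: the model passes an item if it assigns higher probability to the grammatical verb form than to the ungrammatical one in the same context. The sketch below shows only the shape of this protocol; toy_logprob is a hypothetical stand-in for a real language model's score, and the two items and their scores are illustrative.

```python
# Minimal sketch of likelihood-comparison evaluation on agreement minimal pairs.
def toy_logprob(context, verb):
    # Hypothetical stand-in for a language model score. A real evaluation
    # would compute the model's log-probability of `verb` given `context`.
    scores = {"are": -1.0, "is": -1.5, "writes": -2.0, "write": -2.2}
    return scores.get(verb, -5.0)

# Minimal pairs: (context, grammatical verb, ungrammatical verb).
items = [
    ("The keys to the cabinet", "are", "is"),
    ("The author of the books", "writes", "write"),
]

# The model passes an item if the grammatical form scores higher.
correct = sum(
    toy_logprob(ctx, good) > toy_logprob(ctx, bad)
    for ctx, good, bad in items
)
print(f"accuracy: {correct / len(items):.2f}")
```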
Method | Knowledge | Targeted | Unit | Task
(Belinkov and Bisk, 2018) | Black | ✗ | Char | MT
(Heigold et al., 2018) | Black | ✗ | Char | MT, morphology
(Sakaguchi et al., 2017) | Black | ✗ | Char | Spelling correction
(Zhao et al., 2018c) | Black | ✓, ✗ | Word | MT, natural language inference
(Gao et al., 2018) | Black | ✗ | Char | Text classification, sentiment
(Jia and Liang, 2017) | Black | ✗ | Sentence | Reading comprehension
(Iyyer et al., 2018) | Black | ✗ | Syntax | Sentiment, entailment
(Shi et al., 2018) | Black | ✗ | Word | Image captioning
(Alzantot et al., 2018) | Black | ✗ | Word | NLI, sentiment
(Kuleshov et al., 2018) | Black | ✗ | Word | Text classification, sentiment
(Ribeiro et al., 2018) | Black | ✗ | Word | Reading comprehension, visual QA, sentiment
(Niu and Bansal, 2018) | Black | ✗ | Word | Dialogue
(Chen et al., 2018a) | White | ✓ | Pixels | Image captioning
(Ebrahimi et al., 2018a) | White | ✓ | Word | MT
(Cheng et al., 2018) | White | ✓ | Word | MT, summarization
(Mudrakarta et al., 2018) | White | ✗ | Word | Reading comprehension, visual and table QA
(Papernot et al., 2016b) | White | ✗ | Word | Sentiment
(Samanta and Mehta, 2017) | White | ✗ | Word | Sentiment, gender detection
(Sato et al., 2018) | White | ✗ | Word | Text classification, sentiment, grammatical error detection
(Liang et al., 2018) | White | ✓ | Word/Char | Text classification
(Ebrahimi et al., 2018b) | White | ✗ | Word/Char | Text classification
(Yang et al., 2018) | White | ✗ | Word/Char | Text classification

Table SM3: A categorization of methods for adversarial examples in NLP according to the adversary's knowledge (white-box vs. black-box), attack specificity (targeted, marked ✓, vs. non-targeted, marked ✗), the modified linguistic unit (words, characters, etc.), and the attacked task.
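
The black-box, non-targeted entries in Table SM3 largely share one loop: perturb the input at some unit (here, characters), query the model, and keep a perturbation if it flips the prediction. The sketch below is a generic illustration of that loop, not any cited paper's method; toy_sentiment is a hypothetical stand-in for the attacked model, and real attacks query the actual system and constrain perturbations more carefully.

```python
# Minimal sketch of a black-box, non-targeted character-level attack:
# randomly swap adjacent inner characters of a word and keep the change
# if it flips the model's prediction. No gradients are used (black-box).
import random

def toy_sentiment(text):
    # Hypothetical stand-in for the attacked model: a lexicon-based classifier.
    pos, neg = {"good", "great", "excellent"}, {"bad", "awful", "terrible"}
    words = text.lower().split()
    return "pos" if sum(w in pos for w in words) > sum(w in neg for w in words) else "neg"

def swap_inner_chars(word, rng):
    # Swap one random pair of adjacent inner characters (first/last kept),
    # a perturbation style inspired by character-scrambling noise.
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def attack(text, model, n_trials=50, seed=0):
    rng = random.Random(seed)
    original = model(text)
    words = text.split()
    for _ in range(n_trials):
        i = rng.randrange(len(words))
        candidate = words[:i] + [swap_inner_chars(words[i], rng)] + words[i + 1:]
        perturbed = " ".join(candidate)
        if model(perturbed) != original:
            return perturbed  # successful (non-targeted) adversarial example
    return None  # no flip found within the query budget

print(attack("the movie was excellent", toy_sentiment))
```

A targeted variant would instead accept a perturbation only when the model outputs one specific wrong label, which is the distinction the Targeted column in Table SM3 records.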