Last Modified: March 5, 2018
CS 295: Statistical NLP, Winter 2018
Homework 4: Neural Machine Translation
Sameer Singh (and Robert L. Logan)
http://sameersingh.org/courses/statnlp/wi18/

One of the most widespread and public-facing applications of natural language processing is machine translation. It has gained a lot of attention in recent years, both infamously for its inability to understand the nuance of human communication, and for the near human-level performance achieved by neural models. In this homework, we will be looking at neural machine translation from modern English to Shakespearean English. Submissions are due by midnight on March 18, 2018.

1 Task: Neural Machine Translation

Machine translation is the task of designing a model that automatically translates text or speech from one language (which we refer to as the source language S) to another language (which we refer to as the target language T). In order to train such a model, we assume we have access to a parallel corpus C = {<s, t> | s ∈ S, t ∈ T} of pairs of sentences in each language that have equivalent meanings. Our goal is then to find a model that maximizes the probability P(t|s) for the sentence pairs in this corpus. The sub-field of neural machine translation parameterizes this probability distribution as a neural network. I will briefly describe the neural machine translation model in this section. For more details, refer to Sutskever et al.'s paper on the topic, available here: https://arxiv.org/pdf/1409.3215.pdf.

1.1 Sequence-to-sequence Models

At its core, this problem entails mapping a sequence of inputs (words in the source language) to a sequence of outputs (words in the target language). As we've discussed in class, recurrent neural networks (RNNs) are effective at working with this kind of sequential data. One difficulty that arises in machine translation is that there is not a one-to-one correspondence between the input and output sequences. That is, the sequences are typically of different lengths, and the word alignment may be non-trivial (e.g. words that are direct translations of each other may not occur in the same order).

To address these issues, we will use a more flexible architecture known as a sequence-to-sequence model. This model is composed of two parts, an encoder and a decoder, both of which are RNNs. The encoder takes as its input the sequence of words in the source language, and outputs the final hidden states of the RNN layers. The decoder is similar, except it also has an additional fully connected layer (with softmax activation) used to define a probability distribution over the next word in the translation. In this way, the decoder essentially functions as a neural language model for the target language. The key difference is that the decoder uses the output of the encoder as its initial hidden state, as opposed to a vector of zeros.
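To make the encoder/decoder split concrete, here is a minimal sketch of the two modules in PyTorch. It is not the code you are given (that lives in model.py, described in the next section); the class layout, the single-layer GRU, and the default dimensions are assumptions made purely for illustration.

import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Embeds source-language word ids and runs them through a GRU."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) tensor of word ids in the source language
        embedded = self.embedding(src_ids)        # (batch, src_len, embed_dim)
        outputs, hidden = self.rnn(embedded)      # hidden: (1, batch, hidden_dim)
        return hidden                             # fixed-size summary of the source sentence

class Decoder(nn.Module):
    """A conditional language model over the target language, run one step at a time."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)  # logits; softmax gives P(next word)

    def forward(self, tgt_ids, hidden):
        # tgt_ids: (batch, 1) id of the previous target word; hidden: carried-over state
        embedded = self.embedding(tgt_ids)            # (batch, 1, embed_dim)
        output, hidden = self.rnn(embedded, hidden)   # condition on the running state
        logits = self.out(output.squeeze(1))          # (batch, vocab_size)
        return logits, hidden

In this sketch, the decoder's initial hidden state is the final hidden state returned by the encoder rather than a vector of zeros, which is exactly what ties the two halves together.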
1.2 Data and Source Code

I have released the initial source code, available at https://github.com/sameersingh/uci-statnlp/tree/master/hw4, and the data archive, available on Canvas. You will need to uncompress the archive and put it in the data/ folder for the code to work. The source code contains the following:

◦ data/: The folder into which the uncompressed data archive should be placed.
◦ config.yaml: Contains the hyper-parameters used by the model. It will be copied to the folder where your model checkpoints are saved during training. This is done to prevent you from inadvertently making changes to your hyper-parameters in the middle of training, as well as to remind you of the exact configurations used for the experiments you perform (which is useful if you train many different models). If you introduce additional hyper-parameters to the model (e.g. dropout rate, number of layers, etc.), I recommend keeping track of them in this file.
◦ model.py: Provides implementations of the Encoder and Decoder modules. The basic structure is similar to the examples in the recurrent neural network tutorial notebook (https://github.com/sameersingh/uci-statnlp/tree/master/tutorials/rnn_examples.ipynb), though there are a few key differences.
Any architectural changes you make should be done in this file.
◦ train.py: The script used to train your model. Example usage: python train.py --config config.yaml. You will probably not need to modify this file unless you decide to use a training method other than teacher forcing (a sketch of a teacher-forced update appears after this list), or an optimizer other than Adam.
◦ evaluation.py: Measures the BLEU score of your model on the test set, and also provides a couple of example translations for qualitative evaluation. Example usage: python evaluate.py --config config.yaml.
◦ utils/data.py: Defines the ShakespeareDataset class.
◦ utils/vocab.py: Defines the Vocab class, used to associate an integer id with each word encountered in the corpora.

Details about what you need to implement are in the sections below.
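Since train.py trains with teacher forcing, here is a minimal sketch of what a single teacher-forced update could look like, assuming encoder and decoder modules shaped like the sketch in Section 1.1. It is not taken from the provided script; the names train_step, src_batch, tgt_batch, and pad_id are placeholders for illustration.

import torch
import torch.nn as nn

def train_step(encoder, decoder, optimizer, src_batch, tgt_batch, pad_id=0):
    """One teacher-forced update: the decoder is always fed the gold previous word."""
    criterion = nn.CrossEntropyLoss(ignore_index=pad_id)
    optimizer.zero_grad()

    hidden = encoder(src_batch)                        # summarize the source sentences
    loss = 0.0
    # tgt_batch: (batch, tgt_len), beginning with a start-of-sentence token
    for t in range(tgt_batch.size(1) - 1):
        prev_words = tgt_batch[:, t].unsqueeze(1)      # gold word, not the model's own prediction
        logits, hidden = decoder(prev_words, hidden)   # scores for the next target word
        loss = loss + criterion(logits, tgt_batch[:, t + 1])

    loss.backward()
    optimizer.step()
    return loss.item() / (tgt_batch.size(1) - 1)

Here the optimizer would be something like torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters())). At translation time the loop is the same, except the decoder is fed its own highest-scoring word from the previous step instead of the gold word.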
2 What to Submit?

This assignment is designed to be extremely open-ended. The only task is to improve upon the provided baseline model. Accordingly, there are a variety of things you can try. Here are a couple of suggestions:

◦ Trivial modifications:

2.1 Difficulty of Approach (50 points)

As the suggestions above indicate, some modifications may be much more difficult to implement than others. In order to encourage students to pursue more difficult approaches, points will be assigned based on the amount of effort put into the work. We assume as a baseline that all students will make the trivial modifications to the model. After that, the basic rubric is: 1 hard modification = 2 medium modifications = 3 easy modifications. To make this clearer, full credit would be earned in each of the following scenarios:
2.2 Quality of the Write-Up (50 points)

You should give a detailed description of the approach you took in the above section, using tables and figures to describe what you did. Separately, you need to perform an analysis of your approach, in terms of both the quantitative (BLEU score) and qualitative (example translations) results, again using tables and figures as necessary. In the above examples, the students could receive full credit by doing the following:
4 Statement of Collaboration

It is mandatory to include a Statement of Collaboration in each submission, with respect to the guidelines below. Include the names of everyone involved in the discussions (especially in-person ones), and what was discussed.

All students are required to follow the academic honesty guidelines posted on the course website. For programming assignments in particular, I encourage students to organize (perhaps using Piazza) to discuss the task descriptions, requirements, bugs in my code, and the relevant technical content before they start working on it. However, you should not discuss the specific solutions, and, as a guiding principle, you are not allowed to take anything written or drawn away from these discussions (i.e. no photographs of the blackboard, written notes, referring to Piazza, etc.). Especially after you have started working on the assignment, try to restrict the discussion to Piazza as much as possible, so that there is no doubt as to the extent of your collaboration. Since we do not have a leaderboard for this assignment, you are free to discuss the numbers you are getting with others, and again, I encourage you to use Piazza to post your translations and compare them with others.

Acknowledgements

This homework was made possible by the course reader Robert Logan, who wrote both the source code and this assignment description. In addition, we would like to thank the PyTorch team, whose machine translation tutorial we drew on heavily to create this assignment.