Independent Project 3: Language Modeling

1 Data


For this project, we are going to use the wikitext-2 data for language modeling. I did some additional pre-processing on this dataset, so it is not exactly the same as the version available online.


In the data files, there are four special tokens:

• <unk>: special token for low-frequency words
• <num>: special token for all the numbers
• <start>: special token to indicate the beginning of a sentence
• <stop>: special token to indicate the end of a sentence

Here are some simple statistics about the dataset (Table 1):

              Training    Development    Test
# Sentences   17,556      1,841          2,183
# Tokens      1,800,391   188,963        212,719

Table 1: Dataset statistics

2 Recurrent Neural Network Language Models

In this section, you are going to implement an RNN for language modeling. To be specific, it is an RNN with LSTM units. As a starting point, you need to implement a very simple RNN with LSTM units, so please read the instructions carefully!

The goals of this part are to
• learn to implement a simple RNN LM with LSTM units
• get some experience tuning hyper-parameters for a better model

I recommend using PyTorch for all the implementations in this section.
1. (5 points) Please implement a simple RNN with LSTM units to meet the following requirements:


  • Input and hidden dimensions: 32
  • No mini-batching (equivalently, mini-batch size is 1)
  • No truncation on sentence length: every token in a sentence must be read into the RNN LM to compute a hidden state, except the last token <stop>
  • Use SGD with no momentum and no weight decay; you may want to use gradient norm clipping to make sure you can train the model without being interrupted by gradient explosion
  • Use a single-layer LSTM
  • Use nn.Embedding with default initialization for word embeddings

Please write the code into a Python file with name [computingID]-simple-rnnlm.py. Please follow the requirements strictly; otherwise you will lose some points for this question and your answers to the following questions will be invalid. If you want to use some technique or deep learning trick that is not covered in the requirements, feel free to use it and explain it in your report.
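For concreteness, here is a minimal sketch of the kind of model and training loop these requirements describe. It is not a complete submission: the vocabulary dict `vocab`, the list `training_sentences` (one tensor of token ids per sentence), the learning rate, and the clipping threshold are all assumptions for illustration.

    import torch
    import torch.nn as nn

    class SimpleRNNLM(nn.Module):
        def __init__(self, vocab_size, dim=32):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)   # default initialization
            self.lstm = nn.LSTM(dim, dim, num_layers=1)  # single-layer LSTM
            self.out = nn.Linear(dim, vocab_size)

        def forward(self, token_ids):
            # token_ids: 1-D tensor for one sentence (no mini-batching)
            emb = self.embed(token_ids).unsqueeze(1)     # (seq_len, 1, dim)
            hidden, _ = self.lstm(emb)                   # (seq_len, 1, dim)
            return self.out(hidden.squeeze(1))           # (seq_len, vocab_size)

    model = SimpleRNNLM(vocab_size=len(vocab))               # `vocab` is assumed
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # no momentum, no weight decay
    criterion = nn.CrossEntropyLoss()

    for sentence in training_sentences:              # assumed: one id tensor per sentence
        # read <start>, w_1, ..., w_N; predict w_1, ..., w_N, <stop>
        inputs, targets = sentence[:-1], sentence[1:]
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)  # guard against exploding gradients
        optimizer.step()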

2. (3 points) Perplexity. It should be computed at the corpus level. For example, if a corpus has only the following two sentences

<start>, w_{1,1}, ..., w_{1,N_1}, <stop>
<start>, w_{2,1}, ..., w_{2,N_2}, <stop>

then to compute the perplexity, we first compute the average of the log probabilities as

avg = \frac{1}{N_1 + 1 + N_2 + 1} \big\{ \log p(w_{1,1}) + \cdots + \log p(w_{1,N_1}) + \log p(<stop>)
      + \log p(w_{2,1}) + \cdots + \log p(w_{2,N_2}) + \log p(<stop>) \big\}    (1)

and then

Perplexity = e^{-avg}.    (2)

Please implement the function to compute perplexity as explained above and write the code into a separate Python file with name [computingID]-perplexity.py.
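As a sketch, the corpus-level computation follows directly from Equations (1) and (2); the input format below (one list of natural-log probabilities per sentence) is an assumed interface, not a required one.

    import math

    def corpus_perplexity(sentence_logprobs):
        # sentence_logprobs: one list of natural-log probabilities per sentence,
        # covering w_1, ..., w_N and <stop> (the <start> token is never predicted)
        total = sum(sum(lps) for lps in sentence_logprobs)
        count = sum(len(lps) for lps in sentence_logprobs)  # N_i + 1 per sentence
        avg = total / count
        return math.exp(-avg)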

3. (3 points) Report the perplexity numbers of your simple RNN LM (as in [computingID]-simple-rnnlm.py) on the training and development datasets. In addition, run your model on the test data, and write the log probabilities into the file [computingID]-tst-logprob.txt with the following format on each line:

   token\t log probability
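A minimal sketch of writing this file, assuming `tokens` and `logprobs` are aligned lists covering every predicted token in the test data, in order:

    # `tokens` and `logprobs` are assumed to exist; one line per predicted token
    with open("[computingID]-tst-logprob.txt", "w") as f:
        for token, lp in zip(tokens, logprobs):
            f.write(f"{token}\t{lp}\n")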

4. (3 points) Stacked LSTM. Based on your implementation in [computingID]-simple-rnnlm.py, modify the model to use a multi-layer LSTM (stacked LSTM). Based on the perplexity on the dev data, you can tune the number of hidden layers n as a hyper-parameter to find a better model. Here, 1 ≤ n ≤ 3. In your report, include the following information:

• the value of n in the better model
• perplexity number on the training data based on the better model
• perplexity number on the dev data based on the better model

    Submit your code with file name [computingID]-stackedlstm-rnnlm.py
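Continuing the SimpleRNNLM sketch from question 1, the change is a single argument; the class name and the default n below are illustrative.

    import torch.nn as nn

    class StackedRNNLM(SimpleRNNLM):        # reuses the question 1 sketch
        def __init__(self, vocab_size, dim=32, n=2):
            super().__init__(vocab_size, dim)
            # the only change: n stacked LSTM layers instead of one
            self.lstm = nn.LSTM(dim, dim, num_layers=n)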

5. (3 points) Optimization. Based on your implementation in [computingID]-simple-rnnlm.py again, choose different optimization methods (SGD with momentum, AdaGrad, etc.) to find a better model. In your report, include the following information:

• the optimization method used in the better model
• perplexity number on the training data based on the better model


• perplexity number on the dev data based on the better model

Submit your code with file name [computingID]-opt-rnnlm.py
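Swapping the optimizer is a one-line change in the training loop; for example (learning rates are illustrative):

    # in place of plain SGD from the question 1 sketch:
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    # optimizer = torch.optim.Adagrad(model.parameters(), lr=0.1)
    # optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)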

6. (3 points) Model Size. Based on your implementation in [computingID]-simple-rnnlm.py again, choose different input/hidden dimensions. Suggested dimensions for both input and hidden are 32, 64, 128, and 256. Try different combinations of input/hidden dimensions to find a better model. In your report, include the following information:

• input/hidden dimensions used in the better model
• perplexity number on the training data based on the better model
• perplexity number on the dev data based on the better model

    Submit your code with file name [computingID]-model-rnnlm.py
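The question 1 sketch ties the input and hidden dimensions together; for this part they can differ, so a variant model takes them as separate sizes. The class name below is illustrative.

    import torch.nn as nn

    class SizedRNNLM(nn.Module):
        def __init__(self, vocab_size, input_dim, hidden_dim):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, input_dim)
            self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=1)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids):
            emb = self.embed(token_ids).unsqueeze(1)   # (seq_len, 1, input_dim)
            hidden, _ = self.lstm(emb)                 # (seq_len, 1, hidden_dim)
            return self.out(hidden.squeeze(1))         # (seq_len, vocab_size)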

7. (5 points, extra) Mini-batch. Based on your implementation in [computingID]-simple-rnnlm.py again, modify the code to add mini-batch support. Tune the mini-batch size b over {16, 24, 32, 64} to see whether it makes a difference. In your report, include the following information:

• whether different mini-batch sizes make a difference; if the answer is yes, then also
• the best batch size
• perplexity number on the training data based on the better model
• perplexity number on the dev data based on the better model

    Submit your code with file name [computingID]-minibatch-rnnlm.py
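One common way to batch variable-length sentences in PyTorch is to pad each batch and pack the padded embeddings so the LSTM skips the padding. The sketch below assumes the model exposes embed, lstm, and out as in the question 1 sketch, and that a dedicated padding id `pad_id` is reserved in the vocabulary.

    import torch
    import torch.nn as nn
    from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence, pad_packed_sequence

    # `batch` is assumed to be a list of b id tensors, one per sentence
    def batch_loss(model, batch, pad_id):
        lengths = torch.tensor([len(s) - 1 for s in batch])          # inputs drop <stop>
        inputs = pad_sequence([s[:-1] for s in batch], padding_value=pad_id)
        targets = pad_sequence([s[1:] for s in batch], padding_value=pad_id)
        emb = model.embed(inputs)                                    # (max_len, b, dim)
        packed = pack_padded_sequence(emb, lengths, enforce_sorted=False)
        hidden, _ = model.lstm(packed)
        hidden, _ = pad_packed_sequence(hidden)                      # (max_len, b, dim)
        logits = model.out(hidden)                                   # (max_len, b, V)
        # ignore_index keeps the padded positions out of the loss
        return nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1), ignore_index=pad_id)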
