readme.md
Deep Learning
Due: 12. October (23:55)
Overview
For gaining a better understanding of deep learning, we’re going to be looking at deep averaging networks. These are a very simple framework, but they work well for a variety of tasks and will help introduce some of the core concepts of using deep learning in practice.
In this homework, you’ll use pytorch to implement a DAN classifier for determining which category the quiz-bowl question is talking about (Literature, History or Science).
You’ll turn in your code on the submit server. This assignment is worth 40 points.
Dataset
The data is sampled from quiz bowl bonus question. We tokenize the questuion and split them into train/dec/test set. Each example includes the question text and the label (0 for Literature, 1 for History and 2 for Science).
Pytorch data loader
In this homework, we use pytorch build-in data loader to do data mini-batching, which provides single or multi-process iterators over the dataset(https://pytorch.org/docs/stable/data.html).
For data loader, there includes two functions, batichfy and vectorize. For each example, we need to vectorize the question text in to a vector using vocabuary. In this assignment, you need to write the vectorize function yourself. Then we provide the batchify function to split the dataset into mini-batches.
What you have to do
Coding:
- Understand the structure of the code.
- Write the data vectorize funtion.
- Write DAN model initialization.
- Write model forward function.
- Write the model training/testing function. We don’t have unit test for this part, but to get reasonable performance, it’s necessary to get it correct.
Analysis:
- Report the accuracy of test set. (It would get around 0.8 accuracy)
- Look at the dev set, give some examples and explain the possible reasons why these examples are predicted incorrectly.
Pytorch install
In this homework, we use CPU version of Pytorch 0.4.1(latest version)
You can install it using the following command(linux): conda install pytorch torchvision -c pytorch or conda install pytorch=0.4.1 torchvision -c pytorch
If you are using macos or windows, check the pytorch website for installation
For more information, check https://pytorch.org/get-started/locally/
Extra Credit
For extra credit, you need to initialize the word representations with word2vec, GloVe, or some other representation. Compare the final performance based on these initializations and see how the word representations change. Write down your findings in analysis.pdf.
What to turn in
- Submit your dan.py file
- Submit your analysis.pdf file
No more than one page
Include your name at the top of the pdf
GPUs
In this homework, we don’t use GPUs, the training is fast enough for CPU based code.