COMP6714 Project Specification (stage 2)
October 4, 2018
1 COMP6714 18s2 Project
2 Stage 2: Modify a baseline model of hyponymy classification
2.1 Deadline and Late Penalty
The project deadline is 23:59 26 Oct 2018 (Fri).
Late penalty is -10% each day for the first three days, and then -20% each day afterwards.
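As a rough illustration of the penalty arithmetic above, the sketch below computes the total deduction for a given number of late days. The function name `late_penalty_percent` is hypothetical, and the sketch assumes that lateness is counted in whole days and that the deduction is capped at 100%; neither assumption is stated in this specification.

```python
def late_penalty_percent(days_late):
    """Total deduction in percent (assumed: whole days, capped at 100)."""
    if days_late <= 0:
        return 0
    first_three = min(days_late, 3) * 10   # 10% per day for days 1 to 3
    afterwards = max(days_late - 3, 0) * 20  # 20% per day from day 4 onwards
    return min(first_three + afterwards, 100)  # cap is an assumption


if __name__ == "__main__":
    for d in range(0, 6):
        print(d, late_penalty_percent(d))
```

Under these assumptions, a submission that is four days late would lose 30% for the first three days plus 20% for the fourth.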
2.2 Objective
As explained in stage 1, in this project you need to build a system that can extract hyponym and hypernym relations from a sentence.
We now provide you with the baseline model, and you are required to modify it according to the specification given below. The baseline model follows the architecture introduced in the Stage 1 spec.
2.3 Run (and understand) the baseline model
In order to play with the baseline model, you just need to execute the following command:
python train.py
where
• You can modify config.py to change hyper-parameters.
• You can modify randomness.py to manipulate the randomness (e.g., change the random seed).
• If you want to test the performance of the trained model, you need to implement a test method yourself.
We suggest that you read and understand the baseline model first.
2.4 Your tasks
You need to complete your implementation in the file todo.py. You are required to implement the following three methods:
• get_char_sequence()
• new_LSTMCell()
• evaluate()
The modified todo.py will be submitted for evaluation.
NOTE: you can modify config.py to enable the above modifications in your model.
2.5 Task 1: Implement evaluate() (30%)
You are required to implement the evaluate() method in todo.py. This method computes the F1 score of the given predicted tags and golden tags (i.e., ground truth).
The input arguments of evaluate() are:
• golden_list is a list of lists of tags, which stores the golden tags.
• predict_list is a list of lists of tags, which stores the predicted tags.
The method should return the F1 score based on golden_list and predict_list. In this project, we only consider phrase-level matching for TAR and HYP (O is not considered). Two entities are matched when both the boundaries and the tags are the same.
For example, given
golden_list = [['B-TAR', 'I-TAR', 'O', 'B-HYP'], ['B-TAR', 'O', 'O', 'B-HYP']]
predict_list = [['B-TAR', 'O', 'O', 'O'], ['B-TAR', 'O', 'B-HYP', 'I-HYP']]
• The first TAR in golden_list does not match any entity in predict_list, as the boundary is incorrect (e.g., predict_list[0][1] is O, which should be I-TAR for a correct match).
• The second TAR in golden_list matches the second TAR in predict_list, as both the boundary and the tag are the same.
• The number of false positives in the above example is 2, the number of false negatives is 3, and the number of true positives is 1. This gives a precision of 1/3 and a recall of 1/4; therefore, the F1 score should be 0.286.
NOTE:
• The two lists have the same length, and the i-th instances in both lists have the same length as well. This means that you do not need to handle any alignment issue.
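The phrase-level matching described above can be sketched as follows. This is only one possible approach, not the reference solution: the helper `extract_spans` and the function name `evaluate_f1` are hypothetical, and the sketch assumes that an I- tag without a preceding B- tag of the same type is ignored and that the score is 0 when there are no true positives. Neither edge case is defined in this specification.

```python
def extract_spans(tags):
    """Return the set of (start, end, label) phrases in one BIO tag sequence."""
    spans = set()
    start, label = None, None
    # A trailing 'O' sentinel closes a phrase that runs to the end.
    for i, tag in enumerate(list(tags) + ['O']):
        continues = tag.startswith('I-') and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i - 1, label))
        if tag.startswith('B-'):
            start, label = i, tag[2:]
        else:
            # 'O' or a stray 'I-' tag (assumption: stray 'I-' is ignored)
            start, label = None, None
    return spans


def evaluate_f1(golden_list, predict_list):
    """Phrase-level F1 over all sentences; O tags never form a phrase."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(golden_list, predict_list):
        gold = extract_spans(gold_tags)
        pred = extract_spans(pred_tags)
        tp += len(gold & pred)   # same boundaries and same tag
        fp += len(pred - gold)
        fn += len(gold - pred)
    if tp == 0:
        return 0.0  # assumption: avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    golden_list = [['B-TAR', 'I-TAR', 'O', 'B-HYP'], ['B-TAR', 'O', 'O', 'B-HYP']]
    predict_list = [['B-TAR', 'O', 'O', 'O'], ['B-TAR', 'O', 'B-HYP', 'I-HYP']]
    print(round(evaluate_f1(golden_list, predict_list), 3))
```

Comparing sets of (start, end, label) triples means a predicted phrase counts as correct only when both its boundaries and its tag agree with a golden phrase, which is the matching rule stated above.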
2.6 Task 2: Implement new_LSTMCell() (30%)
You are required to implement a new version of the LSTM cell (i.e., new_LSTMCell() in todo.py), which uses different logic to control the input gate.
Instead of separately deciding what to forget and what we should add new information to, we make those decisions together. We only forget when we’re going to input something in its place. We only input new values to the state when we forget something older.
Specifically, before the modification, we have
$$C_t = f_t \ast C_{t-1} + i_t \ast \tilde{C}_t$$
where $i_t$ is the activation vector of the input gate, and $f_t$ is the activation vector of the forget gate. By letting $i_t = 1 - f_t$, after the modification, we have
$$C_t = f_t \ast C_{t-1} + (1 - f_t) \ast \tilde{C}_t$$
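To make the coupled update concrete, the following sketch evaluates the modified equation element-wise on plain Python lists. It only mirrors the arithmetic of the cell-state update and is not the PyTorch implementation expected in todo.py; the function names `sigmoid` and `coupled_cell_state` and the toy numbers are assumptions made for illustration, as is the use of sigmoid and tanh activations, which follows the standard LSTM formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def coupled_cell_state(forget_pre, cell_pre, c_prev):
    """Element-wise C_t = f_t * C_{t-1} + (1 - f_t) * C~_t.

    forget_pre and cell_pre are pre-activation values; c_prev is C_{t-1}.
    """
    new_state = []
    for f_raw, c_raw, c_old in zip(forget_pre, cell_pre, c_prev):
        f = sigmoid(f_raw)           # forget gate activation f_t
        c_tilde = math.tanh(c_raw)   # candidate value C~_t
        # The input gate is not computed separately: it is 1 - f_t.
        new_state.append(f * c_old + (1.0 - f) * c_tilde)
    return new_state


if __name__ == "__main__":
    print(coupled_cell_state([0.0, 50.0, -50.0], [0.0, 1.0, 1.0], [2.0, 2.0, 2.0]))
```

Because the two weights sum to one, each component of the new state is a convex combination of the old state and the candidate value: a forget gate near 1 keeps the old state, and a forget gate near 0 replaces it with the candidate.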
NOTE:
• Your implementation should be based on the original implementation (i.e., torch.nn._functions.rnn.LSTMCell()). Please read and understand it first.
• Please do not change the input arguments of the method, i.e.,
def new_LSTMCell(input, hidden, w_ih, w_hh, b_ih=None, b_hh=None):
• We do not use a GPU in this project. Therefore, you do not need to change the following part (or you can simply remove it from your implementation), as input.is_cuda is always false:
if input.is_cuda:
    igates = F.linear(input, w_ih)
    hgates = F.linear(hidden[0], w_hh)
    state = fusedBackend.LSTMFused.apply
    return state(igates, hgates, hidden[1]) if b_ih is None else state(igates, hgates, hidden[1], b
• In order to avoid unnecessary errors during the project evaluation, please do not change the following three lines from the original implementation, even though you may not use all the variables.
hx, cx = hidden
gates = F.linear(input, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
ingate, forgetgate, cellgate, outgate = gates.chunk(4, 1)
2.7 Task 3: Implement get_char_sequence() (20%)
You are required to implement a BiLSTM layer for character embedding (i.e., char BiLSTM in the following figure). The output of the char BiLSTM will be concatenated with the word embedding from the data preprocessing to form the new input tensor.
The new architecture will be as follows:
[Figure: model architecture with the char BiLSTM layer]
More specifically, you need to implement the method get_char_sequence() in todo.py. Its input arguments are:
• model is an object of sequence labeling; refer to line 49 of model.py to see how the method is called.
• batch_char_index_matrices is a tensor that can be viewed as a list of matrices storing char ids, where each matrix corresponds to a sentence, each sentence corresponds to a list of words, and each word corresponds to a list of char ids.
• batch_word_len_lists is a tensor that can be viewed as a list of lists, where each list corresponds to a sentence and stores the length of each word.
The output dimension of the char BiLSTM is defined in config.py (i.e., char_lstm_output_dim).
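The nesting of the two inputs can be pictured with plain Python lists, as in the sketch below. It is only an illustration of the data layout, not the required PyTorch implementation: the helper name `flatten_words`, the toy char ids, and the use of 0 as a padding id are all assumptions. The sketch flattens the sentence level so that every word becomes one row, and also computes a longest-first ordering of the rows, which is often useful before feeding variable-length sequences to a recurrent layer.

```python
def flatten_words(batch_char_index_matrices, batch_word_len_lists):
    """Flatten (sentence, word, char) nesting into one word-level batch."""
    words, lengths = [], []
    for char_matrix, word_lens in zip(batch_char_index_matrices,
                                      batch_word_len_lists):
        for char_ids, length in zip(char_matrix, word_lens):
            words.append(list(char_ids))  # one row of char ids per word
            lengths.append(length)        # true (unpadded) word length
    # Indices of the rows sorted by decreasing word length (stable sort).
    order = sorted(range(len(words)), key=lambda i: lengths[i], reverse=True)
    return words, lengths, order


if __name__ == "__main__":
    # Two sentences, two words each, words padded to three chars (toy data).
    batch_char_index_matrices = [[[5, 6, 0], [7, 8, 9]],
                                 [[3, 0, 0], [4, 2, 0]]]
    batch_word_len_lists = [[2, 3], [1, 2]]
    print(flatten_words(batch_char_index_matrices, batch_word_len_lists))
```

In this toy batch the four words have lengths 2, 3, 1 and 2, so the longest-first ordering of the flattened rows is 1, 0, 3, 2.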
Hint:
• We suggest that you read and understand the code in model.py first, especially the part implementing the BiLSTM layer.
An example of the char BiLSTM layer is shown below:
[Figure: example of the char BiLSTM layer]
2.8 Report (20%)
You are required to experiment with your implementation and submit a report (named report.pdf). The report should at least answer the following questions (you should answer them in separate sections):
• How do you implement evaluate()?
• How does Modification 1 (i.e., storing model with best performance on the development set) affect the performance?
• How do you implement new_LSTMCell()?
• How does Modification 2 (i.e., re-implemented LSTM cell) affect the performance?
• How do you implement get_char_sequence()?
• How does Modification 3 (i.e., adding Char BiLSTM layer) affect the performance?
You may need to implement a test function in order to test the performance of models.
2.9 Submission
You need to submit the following 2 files:
1. todo.py
2. report.pdf
NOTE: Details of how to submit your files will be announced later in the Piazza forum.
2.10 Bonus
After completing the project, you are welcome to implement your own model (rather than modifying the given baseline implementation). If you choose to do so, please make sure that
1. your implementation outperforms the baseline model by a large margin (the number will be announced later) on the given test set.
2. you report the implementation details as a short report.
There are some research papers for your reference:
• Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum – ACL18
• Deep contextualized word representations – NAACL18
• Fast and accurate entity recognition with iterated dilated convolutions – EMNLP17
• Neural models for sequence chunking – AAAI17
2.10.1 Submission of Bonus Part
If you choose to do the bonus part, you need to submit a .zip file which contains:
1. the code of your model
2. the report (as a pdf file)
The report should contain at least the following two parts:
1. The implementation details of your model (e.g., what the differences are between your model and the baseline model).
2. Instructions on how to execute your code.
NOTE:
• It is unnecessary to include the training, development and testing files in your submission.
• The details of the bonus part will be announced later in the Piazza forum.