Advanced Natural Language Engineering (G5114): Assessed coursework
Julie Weeds February 18, 2021
Format: Submit a single zip file containing at least 1 pdf and an appendix of your code (which may be a .ipynb or a .py file).
Word Count: 8 pages (approx. 3000 words) plus code appendix.
Due date: Submit your assignment online before 4pm on Friday 30th April (week 11). Submissions will be accepted up to 7 days late but there is a penalty for this.
Marking: You will be told your mark and receive feedback via Canvas before Friday 21st May.
Weighting: This assignment is worth 60% of your mark for this module.
1 Practical assignment (3000 words)
The Microsoft Research Sentence Completion Challenge (Zweig and Burges, 2011) requires a system to predict which word (from a set of 5 possibilities) is most likely to complete a sentence. In the labs you have evaluated unigram and bigram models on this task. In this assignment you are expected to investigate at least 2 extensions or alternative approaches to making predictions. Your solution does not need to be novel. You might choose to investigate 2 of the following approaches, or 1 of the following approaches and 1 of your own devising.
• Tri-gram (or even quadrigram) models
• Word similarity methods e.g., using GoogleNews vectors or WordNet?
• Combining n-gram methods with word similarity methods e.g., distributional smoothing?
• Using a neural language model?
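As an illustration of the first option, the following is a minimal sketch of how a trigram model might be used to choose between candidate words. It is not the lab code: the function names, the add-k smoothing with k = 0.1, the `_____` blank marker and the toy corpus are all assumptions made for this example.

```python
from collections import Counter
import math

BLANK = "_____"  # assumed marker for the missing word


def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts over tokenised sentences."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            trigrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return trigrams, contexts, vocab


def log_prob(tokens, trigrams, contexts, vocab_size, k=0.1):
    """Add-k smoothed trigram log-probability of a whole sentence."""
    padded = ["<s>", "<s>"] + tokens + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        numerator = trigrams[context + (padded[i],)] + k
        denominator = contexts[context] + k * vocab_size
        total += math.log(numerator / denominator)
    return total


def choose(question, candidates, trigrams, contexts, vocab_size, k=0.1):
    """Fill the blank with each candidate and return the highest-scoring one."""
    def score(word):
        filled = [word if t == BLANK else t for t in question]
        return log_prob(filled, trigrams, contexts, vocab_size, k)
    return max(candidates, key=score)


# Toy corpus (hypothetical data, for illustration only).
corpus = [
    "the dog chased the cat".split(),
    "the dog chased the ball".split(),
    "the cat sat on the mat".split(),
]
trigrams, contexts, vocab = train_trigram_counts(corpus)
question = "the dog _____ the ball".split()
candidates = ["chased", "sat", "mat", "on", "cat"]
best = choose(question, candidates, trigrams, contexts, len(vocab))
print(best)
```

Here the smoothing constant k is a hyper-parameter; other smoothing schemes (for example backoff or interpolation) could be substituted in `log_prob` without changing `choose`.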
It does not matter how well your method(s) perform. However, your methods should be clearly described, any hyper-parameters (whether fixed, varied or optimised) should be discussed, and there should be a clear comparison of the approaches with each other and with the unigram and bigram baselines, from both a practical and an empirical perspective.
You have been provided with the training and test data for this task in the labs. You may (and are expected to) use any of the code that you have developed throughout the labs. This includes code provided to you in the exercises or solutions. You may use any other resources to which you have access. You are encouraged to make use of one or more of WordNet, the Lin dependency thesaurus provided in NLTK and/or Word2Vec word embeddings (all discussed in earlier labs). You may also download other resources from the Internet and make use of any Python libraries that you are familiar with.
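To illustrate how word embeddings might be applied to this task, the sketch below picks the candidate whose vector has the highest mean cosine similarity to the other words in the sentence. This is only one possible scoring scheme. The function names are hypothetical and the 3-dimensional vectors are made-up toy values standing in for real Word2Vec embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def choose_by_similarity(context_words, candidates, vectors):
    """Return the candidate with the highest mean similarity to the context."""
    def score(candidate):
        if candidate not in vectors:
            return float("-inf")  # out-of-vocabulary candidates rank last
        sims = [cosine(vectors[candidate], vectors[w])
                for w in context_words if w in vectors]
        return sum(sims) / len(sims) if sims else float("-inf")
    return max(candidates, key=score)


# Toy vectors (hypothetical values, not real embeddings).
toy_vectors = {
    "doctor": [0.9, 0.1, 0.0],
    "patient": [0.8, 0.2, 0.1],
    "hospital": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.9, 0.3],
    "guitar": [0.1, 0.0, 0.95],
}
context = ["doctor", "patient"]
candidates = ["hospital", "banana", "guitar"]
best_word = choose_by_similarity(context, candidates, toy_vectors)
print(best_word)
```

Choices such as which context words to include (all words, a window around the blank, or content words only) and how to handle out-of-vocabulary words are hyper-parameters that would need discussing in the report.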
Your report should be in the style of an academic paper. It should include an introduction to the problem and the methods you have implemented. You should discuss the hyper-parameter settings – both those which you have decided to fix and any which you are investigating. You should discuss and justify the method of evaluation. You should provide your results and compare them with the unigram and bigram baselines. You should also provide some analysis of errors – do the approaches make the same or different mistakes and can you comment on the types or causes of errors being made? You should end with your conclusions and areas for further work. You should also submit your code as an appendix. Your report (including figures and bibliography but not including code appendix) should be no longer than 8 sides (3000 words of text plus figures and bibliography). Your code in the appendix should be clearly commented.
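One way to start the error analysis described above is to compute accuracy for each system and then check which questions each system gets wrong. The sketch below assumes predictions and gold answers are stored as dictionaries keyed by question id. The function names and the toy data are hypothetical.

```python
def accuracy(predictions, gold):
    """Proportion of questions answered correctly."""
    return sum(predictions[q] == gold[q] for q in gold) / len(gold)


def compare_errors(pred_a, pred_b, gold):
    """Partition the errors of two systems by which system made them."""
    errors_a = {q for q in gold if pred_a[q] != gold[q]}
    errors_b = {q for q in gold if pred_b[q] != gold[q]}
    return {
        "both_wrong": errors_a & errors_b,
        "only_a_wrong": errors_a - errors_b,
        "only_b_wrong": errors_b - errors_a,
    }


# Toy data (hypothetical): question id -> chosen answer label.
gold = {1: "c", 2: "a", 3: "e", 4: "b"}
bigram_preds = {1: "c", 2: "b", 3: "e", 4: "d"}
trigram_preds = {1: "c", 2: "a", 3: "d", 4: "d"}

print(accuracy(bigram_preds, gold), accuracy(trigram_preds, gold))
overlap = compare_errors(bigram_preds, trigram_preds, gold)
print(overlap)
```

In this toy example the two systems have the same accuracy but make different mistakes, which is the kind of observation that a single accuracy figure would hide.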
Marks will not be awarded simply for how well your system does or for programming wizardry. Marks will be awarded for clearly evaluating possible solutions to the sentence completion challenge.
2 Marking Criteria and Requirements
Table 1 shows the number of marks available for each requirement (Total = 60).
Requirement | Max mark | Interpretation
problem outline | 7 | Does the introduction explain the task and the motivation for finding methods which do well at this task?
method | 10 | Is there a clear description of the proposed methods for tackling the task? Do the proposed methods seem sensible? Novel or more interesting methods may score highly here (if well-described) but methods will not necessarily gain more marks simply by being more ambitious.
hyper-parameter settings | 5 | Within each proposed method, are there any hyper-parameter settings which are being fixed or explored? Are these clearly explained?
evaluation | 10 | Is the method of evaluation stated, explained and justified? Are results clearly presented (in a table and/or a graph!)?
analysis | 10 | Is there an analysis of errors of the methods? Are there particular types of question which one or both methods do badly at?
conclusion | 3 | Is there a sensible conclusion?
further work | 5 | Are there sensible suggestions for further work to do in this area? These might include improvements to the method, other methods or other applications of the method.
academic style | 5 | Is the report written in the style of a research paper? Are major points backed up with references? Is the report well-written and well-structured?
code appendix | 5 | Is the code in the appendix clear and correct?
Table 1: Breakdown of marks
For each requirement, the following scale will be used when deciding the number of marks awarded.
85%-100% Outstanding. Demonstrates a thorough understanding and appreciation of the material without significant error or omission; evidence of extra study or creative thought
70%-84% Excellent. Demonstrates a thorough understanding and appreciation of the material producing work without significant error or omission
60%-69% Very good. Clear understanding demonstrated, substantially complete and correct. There may be minor gaps in knowledge/understanding. Evidence of independent thought
50%-59% Reasonable knowledge and understanding of basic issues demonstrated.
45%-49% Basic knowledge and understanding demonstrated with some appreciation of the issues involved. Gaps in knowledge and understanding; confusion over more complex material.
40%-44% Significant issues neglected with little or no appreciation of the complexity of the problem.
20%-39% Some correct or relevant material but significant issues neglected / significant errors or misconceptions
0%-19% Very little or nothing that is correct and relevant
References
Geoffrey Zweig and Christopher Burges. 2011. The Microsoft Research Sentence Completion Challenge. Technical report, Microsoft Research, December.