Natural Language Engineering: Assessed Coursework

Submission format: You should submit one file, and that file must be an iPython notebook.

Due date: Submit your iPython notebook on the module’s Study Direct site before 4pm on Wednesday 25th November. This is Wednesday of week 10. The standard late penalties apply.

Return date: Marks and feedback will be provided on Sussex Direct on Friday December 11th for all submissions that are submitted by the due date.

Weighting: This assessment contributes 50% of the mark for the module.

Overview

For this assignment you are asked to write a report on the activities covered in lab sessions 8 and 9. During these sessions you extended an opinion extractor function provided by us so that it deals with a variety of ways that opinions are expressed in Amazon DVD reviews. For this report you should create a single iPython notebook containing the following (further details are given below):

  1. the python code for your opinion extractor;
  2. a demonstration that your opinion extractor produces the correct output for the example sentences that we have given for each of the extensions;
  3. an assessment of the limitations of your extractor when applied to the sentences in the corpus of Amazon DVD reviews that we have provided; and
  4. a proposal as to how you would go about creating a website that automatically produces summaries of what people think about various aspects of a DVD.

Marking Criteria and Requirements

Your submission will be marked out of 100, based on the following criteria.

Overall quality of report: 15 marks

This concerns issues such as writing style, organisation of material, clarity of presentation, etc.

  • Your report should be no more than 3000 words in length excluding the content of graphs and tables and any references. You must specify the length of your report. This is a strict limit.
  • You must use a formal writing style.
  • All figures and tables should be clearly numbered, and have an appropriate caption.

  • All graphs should have each axis clearly labelled.
  • Use subsections with meaningful headings.
  • Do not add external text (e.g. code, output) as images, or if you absolutely must, make sure it is high-resolution.
  • There is no need to include the CONLL representation of the parsed sentences as it is equivalent to the tree representation, but harder to read.

Quality of opinion extraction algorithm: 15 marks

This concerns the quality of the Python code implementation (coding style, appropriate use of comments, and efficiency).

  • Details of the extensions to the basic opinion extractor are given in the notebook Session8.ipynb.
  • You are required to do extensions 1 to 5 only. Dealing with the examples given in Section Additional Extensions is optional.
  • Do not write separate opinion extractor functions for each of the different extensions. It is important that all of the different extensions are integrated, so that a single opinion extractor function deals with all of the cases you are covering.
  • You should state explicitly whether or not you have been successful in extending your opinion extractor in the ways required.

  • Do not forget to include comments where necessary. Split long functions into smaller coherent sub-functions.
  • You should submit a single iPython Notebook that includes all of the Python code you have written or adapted, i.e. all code that you have written except code in the NLTK and Sussex NLTK packages.

Validating opinion extractor on the examples test set: 20 marks

This concerns how well you demonstrate that your opinion extractor produces the correct output for the sentences in the examples test set. For each extension we have provided a number of examples that illustrate what we are looking for. These examples can be found in the Examples Test Set. For each extension you need to show that your opinion extractor produces the correct output for those sentences that are used as examples in the description of that particular extension.
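One way to present this demonstration is with a small checking harness that runs the extractor on each example and reports any mismatch. The following is a minimal sketch only. The function names `validate` and `toy_extractor`, the `(sentence, aspect)` calling convention, and the two sentences are all hypothetical and are not taken from the lab notebooks or the Examples Test Set. Your own extractor will most likely take a parsed sentence rather than a raw string.

```python
def validate(extractor, examples):
    """Return a list of (sentence, aspect, expected, actual) for every mismatch."""
    failures = []
    for sentence, aspect, expected in examples:
        actual = extractor(sentence, aspect)
        # Compare as sorted lists so that ordering differences are not counted as errors.
        if sorted(actual) != sorted(expected):
            failures.append((sentence, aspect, expected, actual))
    return failures


def toy_extractor(sentence, aspect):
    """Hypothetical placeholder: returns the word immediately before the aspect word."""
    words = sentence.lower().rstrip(".").split()
    return [words[i - 1] for i, word in enumerate(words) if word == aspect and i > 0]


# Invented sentences, not taken from the Examples Test Set.
EXAMPLES = [
    ("It has an exciting plot.", "plot", ["exciting"]),
    ("The plot was dull.", "plot", ["dull"]),
]

failures = validate(toy_extractor, EXAMPLES)
for sentence, aspect, expected, actual in failures:
    print(f"MISMATCH for {aspect!r} in {sentence!r}: expected {expected}, got {actual}")
print(f"{len(EXAMPLES) - len(failures)} of {len(EXAMPLES)} examples correct")
```

The placeholder deliberately fails on the second sentence, which shows how a mismatch would be reported. Replacing `toy_extractor` with your integrated extractor leaves the harness unchanged.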

Discussion of performance on Amazon DVD review sentences: 20 marks

This concerns how well you describe the limitations of your opinion extractor.

  • Consider performance on examples involving all four aspects of the DVD: plot, characters, cinematography, and dialogue.
  • Illustrate your remarks with examples from the Amazon DVD review corpus.
  • In cases where an incorrect analysis has been made, make it clear whether the problem has arisen due to: (i) a deficiency in your opinion extractor algorithm, (ii) errors made by the part-of-speech tagger, or (iii) errors made by the dependency parser.
  • When analysing your opinion extractor, provide a quantitative breakdown such as “I looked at 50 sentences, 40 were correct. Out of the other 10, 8 were due to parser errors and 2 were due to PoS tagger errors”.
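A quantitative breakdown of this kind can be produced by labelling each inspected sentence by hand and then counting the labels. The sketch below is illustrative only: the function name `error_breakdown` and the label strings are assumptions, and the counts simply reproduce the figures quoted in the bullet above rather than real results.

```python
from collections import Counter


def error_breakdown(labels):
    """Map each label to a (count, percentage of all inspected sentences) pair."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Hand-assigned labels, one per inspected sentence (illustrative figures only).
labels = ["correct"] * 40 + ["parser error"] * 8 + ["PoS tagger error"] * 2

breakdown = error_breakdown(labels)
for label, (n, pct) in breakdown.items():
    print(f"{label}: {n} sentences ({pct:.1f}%)")
```

Adding a further label such as "extractor deficiency" covers case (i) without any change to the function.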

Technical understanding: 20 marks

This concerns how well you have explained each of the methods that you have used in your experiments.

  • You should explain both part-of-speech tagging and dependency parsing in sufficient detail that a well-educated computer scientist who does not know about these particular Natural Language Processing methods will be able to understand what you have done.

  • Do not just repeat details directly from the lecture notes or other sources. Your explanations must be expressed in your own words and be related to the specific context of this report.

Quality of proposal for DVD summary website: 10 marks

This concerns how well you describe the opportunities and challenges involved in creating such a website.

  • You should explain what you see as the motivation for having such a website.
  • You should explain what would be involved in creating the website.
  • You should give a realistic assessment as to how accurate the summaries on the website would be.
  • Focus on the backend and underlying NLP system. What problems might you encounter? How could you approach them? How well is your website likely to work? Why would users be interested in your product? We are not interested in issues such as graphic design, implementation language, database backend, etc., but in what the service does and how it works.
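As an illustration of the kind of backend step a proposal might discuss, the sketch below aggregates extracted opinions into a per-aspect summary. It is a sketch under stated assumptions, not a required design: the function name `summarise`, the `(aspect, opinion)` pair format, and the sample data are all hypothetical.

```python
from collections import Counter, defaultdict


def summarise(extracted, top_n=3):
    """Return the top_n most frequent opinion words for each aspect."""
    by_aspect = defaultdict(Counter)
    for aspect, opinion in extracted:
        by_aspect[aspect][opinion.lower()] += 1
    return {aspect: counts.most_common(top_n) for aspect, counts in by_aspect.items()}


# Invented (aspect, opinion) pairs standing in for extractor output over many reviews.
extracted = [
    ("plot", "predictable"),
    ("plot", "Predictable"),
    ("plot", "gripping"),
    ("dialogue", "witty"),
]

summary = summarise(extracted)
for aspect, opinions in summary.items():
    print(aspect, opinions)
```

A summary built this way inherits every tagging, parsing and extraction error upstream of it, which is one of the accuracy issues a proposal should address.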
