COMP8410 Data Mining 2022 Assignment 2
Maximum marks: 100
Weight: 25% of the total marks for the course
Length: Maximum of 10 pages excluding cover sheet, bibliography and appendices.
Layout: A4 margin, at least 11 point type size, use of typeface, margins and headings consistent with a professional style.
Submission deadline: 9:00am, Monday, 9
Submission mode: via Wattle
Estimated time:
Penalty for lateness: 100% after the deadline has passed
First posted: 29th March, 10 AM
Last modified: 29th March, 10 AM
Questions to: Wattle Discussion Forum
This assignment specification may be updated to reflect clarifications and modifications after it is first issued. It is strongly suggested that you start working on the assignment right away. You can submit as many times as you like. Only the most recent submission at the due date will be assessed.
In this assignment, you are required to submit a single report in the form of a PDF file. You may also attach supporting information (appendices) as one or more identified sections at the end of the same PDF file. Appendices will not be marked but may be treated as supporting information to your report. Please use a cover sheet at the front that identifies you as author of the work using your u-number and name and identifies this as your submission for COMP8410 Assignment 2. The cover sheet and appendices do not contribute to the page limit or word count.
You are expected to write in a style appropriate to a professional report. You may refer to http://www.anu.edu.au/students/learning-development/writing-assessment/report-writing for some stylistic advice. You are expected to have both an introduction and a conclusion in your report.
No particular layout is specified, but you should use no smaller than 11 point typeface and stay within the maximum specified page count. Page margins, heading sizes, paragraph breaks and so forth are not specified but a professional style must be maintained. Text beyond the page limit will be treated as non-existent.
This is a single-person assignment and should be completed on your own. Make certain you carefully reference all the material that you use, although the nature of this assignment suggests few references will be needed. It is unacceptable to cut and paste another author’s work and pass it off as your own. Anyone found doing this, from whatever source, will get a mark of zero for the assignment and, in addition, CECS procedures for plagiarism will apply.
No particular referencing style is required. However, you are expected to reference conventionally, conveniently, and consistently. References are not included in the page limit. Due to the context in which this assignment is placed, you may refer to the course notes or course software where appropriate (e.g. “For this experiment Rattle was used”) without formal reference to original sources, unless you copy text, which always requires a formal reference to the source.
An assessment rubric is provided. The rubric will be used to mark your assignment. You are advised to use it to supplement your understanding of what is expected for the assignment and to direct your effort towards the most rewarding parts of the work.
Your assignment submission will be treated confidentially. It will be available to ANU staff involved in the course for the purposes of marking. It may be shared, de-identified, as an exemplar for other students.
You are to study the supplied data set and to apply data mining processes and techniques to discover interesting things about the data. You are to write a short report that justifies and explains your methods in detail, presents your results, and evaluates and interprets the results you find. In the following, the task is described in terms of what your report should contain, not in terms of the steps you should take to carry out the assignment. In your report, similarly, you should describe the methods used in terms of the language of data mining, not in the terms of commands you typed or buttons you selected.
1. Introduce the problem
You must provide some context to the data mining project you are working on. You could properly refer to the purpose of learning and assessment for COMP8410, but in addition you should set some goals for the exercise – what do you expect to learn from the data? What are you looking for? It is possible that you may not achieve the goals you set here, but it should be possible to trace the results you present back to the goals as motivating questions. Furthermore, you should review the goals you state here in your conclusion.
2. Describe your data
• identify the source of the data and the population over which the data is sampled,
• broadly describe the attributes in the data,
• offer a cursory assessment of data quality, and
• include a basic statistical summary of the data you have.
This should comprise a brief description of the data necessary to explain the context for the work presented here in a self-contained way, although for more detail it might refer to information provided with this assignment specification or elsewhere.
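As an illustration only, a basic statistical summary and a cursory quality check of the kind listed above could be sketched as follows. The records and attribute names are hypothetical and are not taken from the supplied data set; only the Python standard library is used.

```python
import statistics

# Hypothetical toy records (not the assignment data); None marks a missing value.
records = [
    {"age": 34, "income": 52000.0, "voted": "yes"},
    {"age": 51, "income": None, "voted": "no"},
    {"age": 27, "income": 38000.0, "voted": "yes"},
    {"age": 45, "income": 61000.0, "voted": None},
]


def summarise(rows):
    """Return, per attribute, the row count, missing count and simple statistics."""
    summary = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        present = [v for v in values if v is not None]
        info = {"n": len(values), "missing": len(values) - len(present)}
        if present and all(isinstance(v, (int, float)) for v in present):
            # Numeric attribute: report range and central tendency.
            info.update(
                min=min(present),
                max=max(present),
                mean=statistics.mean(present),
                median=statistics.median(present),
            )
        else:
            # Categorical attribute: report the distinct levels observed.
            info["levels"] = sorted(set(present))
        summary[name] = info
    return summary


summary = summarise(records)
for name, info in summary.items():
    print(name, info)
```

The missing-value counts give a first indication of data quality, while the per-attribute statistics form the basic summary. In a report, these figures would be presented as a table and interpreted rather than listed as program output.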
3. Describe your methods
You are encouraged to use Rattle or R for this assignment. You may use external tools instead for part or all of the work (e.g. you might prefer to use Python for data pre-processing). Use of alternative tools may make your explanations of methods more wordy, your methods more difficult to reproduce, and your assignment harder to mark, so take this into account. You will not be awarded marks for methods where your method cannot be understood.
You must use at least two clearly distinct data mining methods as taught in this course. The distinct methods should be diverse regarding: i. different problems like classification, prediction, etc. and ii. different algorithms like NN, DT, etc. You may additionally use multiple other methods taught in this course. Further, you may choose to use some methods not addressed in this course. You must justify your choice of methods with reference to the data types involved, the questions you are looking to answer, the benefit of application to practice, computational feasibility, experimentation experience, or other reasons.
Application of some methods, or addressing particular questions, may require you to pre-process the data in some way. For example, if you are looking to predict outcomes independently of time or political preference, you could consider removing time and political preference attributes from the dataset. You must include either a statement that no such processing was performed or else brief information on any
• removal of provided data from consideration,
• imputation or other transformation, or
• differences in the basic data summary from that you prepared from the original data.
Data pre-processing can be a never-ending task. Be careful to exercise your judgement on how much you do here, taking account of the marking rubric.
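One way to keep the required pre-processing statement accurate is to record each step as it is applied. The following is a minimal sketch under assumed conditions: the attribute names, the choice of median imputation, and the helper `preprocess` are all hypothetical and are not prescribed by this specification.

```python
import statistics


def preprocess(rows, drop=(), impute_median=()):
    """Return (cleaned_rows, log) without modifying the input rows.

    drop: attribute names to remove from every row.
    impute_median: numeric attribute names whose missing values (None)
    are replaced by the median of the observed values.
    """
    log = []
    # Build new dictionaries so the original data stays untouched.
    cleaned = [{k: v for k, v in row.items() if k not in drop} for row in rows]
    for name in drop:
        log.append(f"removed attribute '{name}'")
    for name in impute_median:
        present = [r[name] for r in cleaned if r[name] is not None]
        fill = statistics.median(present)
        n_filled = 0
        for r in cleaned:
            if r[name] is None:
                r[name] = fill
                n_filled += 1
        log.append(f"imputed {n_filled} missing '{name}' value(s) with median {fill}")
    return cleaned, log


# Hypothetical toy records; None marks a missing value.
raw = [
    {"year": 2019, "party": "A", "income": 52000.0},
    {"year": 2020, "party": "B", "income": None},
    {"year": 2021, "party": "A", "income": 38000.0},
]
cleaned, log = preprocess(raw, drop=("year", "party"), impute_median=("income",))
for entry in log:
    print(entry)
```

The log entries correspond to the first two bullet points above. Re-running the basic data summary on the cleaned rows and comparing it with the original summary would cover the third.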
Your description must be sufficient for a reasonably competent professional in the field to reproduce your major results. You may choose to attach detailed specifications or configuration parameters as an appendix (which does not contribute to the word count). If you are using methods that were not taught in the course, it would normally be necessary to provide extra detail beyond what can be assumed for methods taught in the course. Extended technical detail may be included in an appendix or by well-chosen references that contain enough information to implement the technique.
4. Present your results
You must explain what you found. This should not be a complete listing of everything you found. You should select results that are interesting, surprising, explanatory, answer your initial questions, or are otherwise meaningful, and explain why they are meaningful. Your selected results must be supported by appropriate formal quality measures and must be interpreted within the problem context you gave. Your interpretation must be pitched towards an expert in the field related to the data source and business problem but who may not be an expert in data mining. You might consider using diagrams to assist but use your judgement about any added value of diagrams.
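Which formal quality measures are appropriate depends on the method used. For a binary classifier, for example, a confusion matrix with accuracy, precision and recall is a common choice. The sketch below assumes hypothetical label lists and a hypothetical helper `binary_quality`; it is illustrative only and does not prescribe the measures you should report.

```python
def binary_quality(actual, predicted, positive="yes"):
    """Compute confusion-matrix counts and common measures for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when no positives are predicted or present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels and model predictions.
actual = ["yes", "yes", "no", "no", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes", "no", "no", "yes"]

quality = binary_quality(actual, predicted)
print(quality)
```

For a reader who is a domain expert rather than a data miner, such measures are best explained in terms of consequences: for instance, recall as the share of true cases the model finds, and precision as the share of flagged cases that are correct.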
5. Conclude with opportunities for application of your results and identification of further work
Here you should write about the significance of your results and the challenge (or not) of using the results to make changes in the practice for which your data was collected. This analysis should be made in the context of the goals you set in your introduction, and you can afford to speculate about possible impacts of what you found.
You are not expected to be an expert in the area of application, nor to solve challenges you might raise with putting your results into practice. Identifying further work may include identifying additional data that could be used to refine the results you found, or alternative methods that should be tried with additional resources.
Assessment Rubric
This rubric will be used to mark your assignment. You are advised to use it to supplement your understanding of what is expected for the assignment and to direct your effort towards the most rewarding parts of the work. Your assignment will be marked out of 100, and marks will be scaled back to contribute to the defined weighting for assessment of the course.
Columns: Review Criteria, Max Mark, Exemplary, Excellent, Good, Acceptable, Unsatisfactory
Overall holistic evaluation of the report
Exemplary: Highly original and very interesting. Excellent, detailed and relevant discussion that develops and enhances the reader’s understanding of the topic. Very clear key message and closely associated conclusion.
Excellent: Interesting with some originality. Relevant discussion of sufficient detail to allow the reader to develop a clear understanding of the topic. Identifiable key message and related conclusion.
Good: Interesting but lacking originality. Although mostly relevant, discussion sometimes lacks sufficient detail to allow the reader to develop a consistent understanding of the topic. Apparent key message and associated conclusion.
Acceptable: Not very interesting or original. Discussion is not always relevant nor sufficiently detailed to enable the reader to develop an understanding of the topic. Difficult to be certain what the key message is and how the conclusion relates to it.
Unsatisfactory: Boring and mundane. Discussion lacks detail, is mostly irrelevant and doesn’t help the reader to develop an understanding of the topic. No discernible key message or conclusion.
Communication, Structure and Presentation
Exemplary: Exemplary use of language enhancing the quality of the submission. Very well ordered with logical and clear structure supported by appropriate headings and sub-headings. All use of others’ ideas and materials acknowledged. References are all included and are formatted consistently and appropriately. Diagrams and/or images are ideally suited to the points where they are used.
Excellent: Very good use of language. Well-ordered and logical. Headings and sub-headings help to clarify text. All use of others’ ideas and material is acknowledged. All references are included, though with some minor inconsistency of in-text citation or formatting. Diagrams and/or images are used effectively.
Good: Reasonable but needs some revision. Mostly well-ordered and logical, most supported by headings and sub-headings. All use of others’ ideas and material is acknowledged. Some references are missing and occasional inconsistencies of in-text citation and formatting. Diagrams and/or images improve readability.
Acceptable: Poor, needs significant revision. Order is not always logical and is sometimes confusing. Headings are largely those suggested by the assignment specification and the questions posed. All use of others’ ideas and material is acknowledged, though sometimes inconsistently. Missing references and inconsistent in-text citation and formatting. Diagrams and/or images are not well selected.
Unsatisfactory: Very difficult to understand. Order is confusing and not always logical. Headings and sub-headings do little to help clarify the text. Not all use of others’ ideas and material is acknowledged. Missing in-text citations, i.e. plagiarism. References in the bibliography not used in the text. Poorly and inconsistently formatted. Diagrams and/or images detract from the key messages.
Problem Description
Exemplary: Goals are clear, challenging, and suitable for the data used. Wider context of goals is discussed (e.g. expected impact or importance). The problem description provides context for the data mining that is connected and used in the rest of the work.
Excellent: Problem description is clear and suitable for the data used. The problem description provides context for the data mining that is connected and used in the rest of the work.
Good: The problem description provides adequate context for the mining work although some key elements could be expanded to support richer analytical work.
Acceptable: Problem description is barely adequate for the purpose. Problem description does not connect tightly with the work performed.
Unsatisfactory: Key elements of the problem description are missing or insufficiently explained.
Data Description
Exemplary: Source, attributes, population, quality assessment and basic statistical summary provided. Description demonstrates deep understanding of the data.
Excellent: Source, attributes, population, quality, and basic statistical summary provided. Data interpreted in terms of challenges for mining or suitability for the problem goals.
Good: Most of the source, attributes, population, quality, and basic statistical summary provided.
Acceptable: Some of the required information provided and correct.
Unsatisfactory: Required information not provided and/or incorrect or misleading, demonstrating lack of engagement with the data.
Method description
Exemplary: At least 2 course methods applied, plus at least one more method which may or may not be taught in the course. R, Rattle and other tools have been properly identified and used appropriately. Reproduction of major results is possible from description of methods. Data pre-processing is well-suited to the methods used and the mining goals, with justification (or no pre-processing, with convincing justification). Careful parameter setting and tuning explained and justified by experimentation or theory or both. Justification for methods chosen demonstrates careful attention to the applicability and limitations of the methods to the problem goals.
Excellent: At least 2 course methods applied. R, Rattle and other tools have been properly identified and used appropriately. Reproduction of major results is possible from description of methods. Extensive, directed, experimentation with data preprocessing or tuning parameters explained. Justification for methods chosen is clear and linked to problem goals.
Good: At least 2 course methods applied. R, Rattle and other tools have been properly identified and used appropriately. Reproduction of major results is possible from description of methods. Some experimentation with data preprocessing or tuning parameters evident. Justification for methods chosen is clear.
Acceptable: At least 2 course methods applied. Not always clear what software tools were used. Unclear that reproduction of results is possible from description of methods. Experimentation with data pre-processing or tuning parameters barely evident, suggesting a simplistic approach to the problem. Weak justification for methods chosen.
Unsatisfactory: Less than 2 course methods applied. Not clear what software tools were used. Methods not described in adequate detail for reproduction. Justification for methods chosen absent or unconvincing.
Results
Exemplary: Outstandingly useful and potentially actionable results found. Results are clearly interpreted for a domain expert. Results supported by well-selected quality measures, explained in terms of impact for domain expert.
Excellent: Results presented are well selected for significance to the mining goals. Results are interpreted for domain expert. Results are well supported by selected quality measures that are explained for domain expert.
Good: Major results are clearly presented, with quality measures present but overall interpretation for domain experts could be sharper.
Acceptable: Results are clearly presented, with typical quality measures present.
Unsatisfactory: Scant attention to evaluation appropriate to the methods used.
Conclusion and further work
Exemplary: Thoughtful analysis of how results could be applied, including identifying challenges. Ideas for further work are creative, relevant and exciting, and tied to application context.
Excellent: Analysis of potential application of results ties to problem goals and recognises the application context. Ideas for further work are significant and realistic.
Good: Statement of how results could be applied to the domain is realistic in the context of the problem goals. Ideas for further work are present but could be better tied to the problem and application domain.
Acceptable: Statement of how results could be applied to the domain is given. Possible further work identified.
Unsatisfactory: Missing analysis of application or extension.