KIT108 ARTIFICIAL INTELLIGENCE
Assignment
Due Friday, 31st May 2019
TASMANIA ABALONE RING PREDICTION
Description
In this assignment we will apply machine learning techniques learned in the lectures and tutorials to analyse the population biology of abalone in Tasmania. In particular, we will predict the age of abalone from physical measurements. The age of an abalone can be estimated from the number of rings visible under a microscope, so in this assignment you will predict the number of rings from the other attributes.
Generated by Akari
Name            Data Type   Meas.  Description
----            ---------   -----  -----------
sex             nominal            M, F, and I (infant)
length          continuous  mm     Longest shell measurement
diameter        continuous  mm     Perpendicular to length
height          continuous  mm     With meat in shell
whole weight    continuous  grams  Whole abalone
shucked weight  continuous  grams  Weight of meat
viscera weight  continuous  grams  Gut weight (after bleeding)
shell weight    continuous  grams  After being dried
edible          boolean            True and False
rings           integer            +1.5 gives the age in years
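The rings row implies a simple conversion from ring count to age: age (years) = rings + 1.5. Below is a minimal sketch of that arithmetic. The function name is illustrative only.

```python
def age_from_rings(rings):
    """Estimate abalone age in years from its ring count (age = rings + 1.5)."""
    return rings + 1.5


# An abalone with 9 rings is estimated to be 10.5 years old.
print(age_from_rings(9))
```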
Task 1: Data Collection – 10%
Identify any irrelevant attribute(s) in the data and remove them to clean the data.
Hint: Use Weka or Excel.
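If you prefer a program to Weka or Excel, removing columns from a CSV file can be sketched as follows. This assumes the data has been exported as CSV with a header row. The columns `a`, `b`, `c` in the example are toy placeholders; deciding which abalone attribute is irrelevant is the task itself.

```python
import csv
import io


def drop_attributes(csv_text, to_drop):
    """Return CSV text with the named columns removed (a header row is assumed)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in to_drop]
    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped columns in each row dict.
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


print(drop_attributes("a,b,c\n1,2,3\n", {"b"}))
```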
Task 2: Data Pre-processing – 10%
Some values of the height and rings attributes are missing. Decide how you will handle them and explain why.
Hint: Use Weka or Excel.
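One common option that keeps every sample is imputation: replace each missing value with a statistic of the values that are present. The sketch below is one possible approach, not the required answer. It uses the mean for height and a rounded median for the integer rings attribute. It assumes rows are dicts and missing values are stored as `None`; the function name and toy values are hypothetical.

```python
from statistics import mean, median


def impute(rows, column, strategy="mean", as_int=False):
    """Replace missing (None) values of `column` in place and return the fill value.

    strategy: "mean" or "median", computed over the non-missing values only.
    as_int: round the fill value, e.g. for an integer attribute such as rings.
    """
    present = [r[column] for r in rows if r[column] is not None]
    if strategy == "mean":
        fill = mean(present)
    elif strategy == "median":
        fill = median(present)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if as_int:
        fill = round(fill)  # Python rounds exact halves to the nearest even integer
    for r in rows:
        if r[column] is None:
            r[column] = fill
    return fill


rows = [
    {"height": 0.10, "rings": 9},
    {"height": None, "rings": 11},
    {"height": 0.30, "rings": None},
    {"height": 0.20, "rings": 12},
]
impute(rows, "height", "mean")
impute(rows, "rings", "median", as_int=True)
print(rows[1]["height"], rows[2]["rings"])
```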
Task 3: Data Transformation – 20%
Create a new attribute, volume, from existing attributes: volume = length * diameter * height.
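A minimal sketch of the volume formula follows. It assumes rows are dicts keyed by attribute name and that missing heights have already been handled (Task 2). It also computes volume from the raw measurements before normalisation, following the task order. With lengths in mm, the result is in mm³. The function name is hypothetical.

```python
def add_volume(rows):
    """Add volume = length * diameter * height to each row (in place)."""
    for r in rows:
        r["volume"] = r["length"] * r["diameter"] * r["height"]
    return rows


print(add_volume([{"length": 0.5, "diameter": 0.4, "height": 0.1}]))
```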
Normalise the data into the [0, 1] range (all attributes except rings).
Hint: Use Excel or write a program (if you know how).
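A common way to map values into [0, 1] is min-max scaling: x' = (x - min) / (max - min), applied per attribute. The sketch below assumes rows are dicts. Only the listed columns are rescaled, so rings is left untouched by not listing it. The function name is hypothetical.

```python
def min_max_normalise(rows, columns):
    """Rescale each named numeric column to [0, 1] via (x - min) / (max - min)."""
    for col in columns:
        values = [r[col] for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in rows:
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            r[col] = (r[col] - lo) / span if span else 0.0
    return rows


data = [{"length": 2.0, "rings": 5}, {"length": 4.0, "rings": 7}, {"length": 6.0, "rings": 9}]
min_max_normalise(data, ["length"])  # rings is deliberately not listed
print(data)
```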
Task 4: Data Mining & Pattern Evaluation – 30%
4.1 Split your data into:
– A training set of the first 2500 samples
– A validation set of the next 633 samples
– A test set of the last 1044 samples
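The three sizes add up to 2500 + 633 + 1044 = 4177 samples. The split therefore assumes that every sample is kept after cleaning. The split is by position, not random. A minimal sketch, with a hypothetical function name:

```python
def split_dataset(rows, n_train=2500, n_val=633):
    """Split rows in their original order: first n_train, next n_val, the rest as test."""
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


train, val, test = split_dataset(list(range(4177)))
print(len(train), len(val), len(test))
```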
4.2 Run 15 machine learning algorithms and report their accuracy on the validation set in a table.
4.3 Select the algorithm that gives the highest accuracy on the validation set, train it on the training set and evaluate it on the test set. Report its result on the test set.
A leaderboard of the top test results will be kept. Email me your test-set results (a screenshot of your results in Weka) at any time to have your name added to the ranking.
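Assume rings is treated as a nominal class, as with Weka's "Correctly Classified Instances" figure. Under that assumption, accuracy is the fraction of samples whose ring count is predicted exactly. The sketch below computes it and ranks results for the 4.2 table. The function names and the algorithm names `AlgoA`/`AlgoB` are hypothetical placeholders.

```python
def accuracy(predicted, actual):
    """Fraction of samples whose predicted ring count exactly matches the true count."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def rank_by_accuracy(results):
    """Sort (algorithm_name, validation_accuracy) pairs, best first."""
    return sorted(results, key=lambda pair: pair[1], reverse=True)


print(accuracy([9, 10, 11, 7], [9, 10, 12, 7]))
print(rank_by_accuracy([("AlgoA", 0.21), ("AlgoB", 0.27)]))
```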
4.4 Explain (in the report) how the best algorithms work.
Tips on improving performance:
– Handle the missing data issue effectively, use data normalisation, try different techniques learned in lectures, and select the ones that give the top accuracy on the validation set.
Task 5: Write a Report – 30%
Write a report using the following template.
ABALONE RINGS DETECTION USING AI
Student name-Student ID
Abstract— Briefly explain what you have done.
I. INTRODUCTION
Introduce the problem you are solving. What is its impact? How did you solve it using AI techniques?
II. DATA COLLECTION
Explain what you did for Task 1.
III. DATA PROCESSING
Explain what you did for Task 2.
IV. DATA TRANSFORMATION
Explain what you did for Task 3.
V. DATA MINING & PATTERN EVALUATION
Explain what you did for Task 4.
VI. VII. CONCLUSION
Summarize the report: what did you find (best data processing method, best learning algorithm, etc.)?
Reference:
[1]
How & What to submit
Assignments will be submitted via MyLO (an Assignment submission will be created). Your submission will include:
• A compressed file containing:
  o A report
  o The data sets as in Task 4.1
  o The results for the 5 best algorithms (screen captures of Weka)
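The marking criteria expect the Task 4.1 data sets as ARFF files, which is Weka's native format. A minimal writer is sketched below. It assumes attribute names contain no spaces (for example `whole_weight`), because names with spaces would need quoting. It writes missing values as `?`. Whether rings is declared numeric or nominal is left to you. The function name and example rows are hypothetical.

```python
def to_arff(relation, attributes, rows):
    """Serialise rows as Weka ARFF text.

    attributes: list of (name, kind) where kind is "numeric" or a list of nominal values.
    rows: sequences in the same order as attributes; None is written as "?".
    """
    lines = [f"@relation {relation}", ""]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            # Nominal attribute: list the allowed values inside braces.
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join("?" if v is None else str(v) for v in row))
    return "\n".join(lines) + "\n"


text = to_arff("abalone", [("sex", ["M", "F", "I"]), ("length", "numeric")],
               [["M", 0.455], ["I", None]])
print(text)
```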
Plagiarism and Academic misconduct
Plagiarism
Plagiarism is a form of cheating. It is taking and using someone else’s thoughts, writings or inventions and representing them as your own; for example, using an author’s words without putting them in quotation marks and citing the source, using an author’s ideas without proper acknowledgment and citation, copying another student’s work.
If you have any doubts about how to refer to the work of others in your assignments, please consult your lecturer or tutor for relevant referencing guidelines. You may also find the Academic Honesty site on MyLO of assistance.
The intentional copying of someone else’s work as one’s own is a serious offence punishable by penalties that may range from a fine or deduction/cancellation of marks and, in the most serious of cases, to exclusion from a unit, a course or the University.
The University and any persons authorised by the University may submit your assessable works to a plagiarism checking service, to obtain a report on possible instances of plagiarism. Assessable works may also be included in a reference database. It is a condition of this arrangement that the original author’s permission is required before a work within the database can be viewed.
For further information on this statement and general referencing guidelines, see the Plagiarism and Academic Integrity page on the University web site or the Academic Honesty site on MyLO.
Academic misconduct includes cheating, plagiarism, allowing another student to copy work for an assignment or an examination, and any other conduct by which a student:
a. seeks to gain, for themselves or for any other person, any academic advantage or advancement to which they or that other person are not entitled; or
b. improperly disadvantages any other student.
Students engaging in any form of academic misconduct may be dealt with under the Ordinance of Student Discipline, and this can include imposition of penalties that range from a deduction/cancellation of marks to exclusion from a unit or the University. Details of penalties that can be imposed are available in Ordinance 9: Student Discipline – Part 3 Academic Misconduct.
KIT108 ARTIFICIAL INTELLIGENCE: MAJOR ASSIGNMENT
Synopsis of the task and its context
This is an individual assignment making up 20% of the overall unit assessment. The assessment criteria for this task are:
1) Apply a machine learning pipeline to solve a real-world problem (the biology of Tasmanian abalone).
a) Identify relevant data
b) Process and clean data
c) Transform data (making new attribute and normalise data)
d) Apply machine learning techniques to predict abalone rings.
2) Writing a scientific report (1.5-2 pages A4, double column)
a) Understand the impact of this work.
b) Analysis of the results.
c) Identify the best technique for this problem and understand how it works.
Match between learning outcomes and criteria for the task:
Unit learning outcomes (on successful completion of this unit…) [task criteria]
1. understand the local and global impact of AI on individuals, organizations, and society [2]
2. adapt and apply techniques for acquiring, representing, and reasoning with data, information, and knowledge [1]
3. select and effectively apply techniques to develop simple AI solutions [1]
4. analyze a problem, apply knowledge of AI principles, and use ICT technical skills to develop potential solutions [1, 2]
5. evaluate strengths and weaknesses of potential AI solutions [1, 2]
Criteria (grade descriptors: HD = High Distinction, DN = Distinction, CR = Credit, PP = Pass, NN = Fail)
1. Machine learning pipeline (70%)
a) Data collection
HD: Excellent choice of irrelevant attribute(s), and able to remove that attribute(s) from the original dataset.
DN: Can identify the irrelevant attribute(s) but also removes 1 other relevant attribute.
CR: Can identify the irrelevant attribute(s) but also removes more than 1 relevant attribute.
PP: Wrongly identifies the irrelevant attribute(s).
NN: Cannot identify any irrelevant attributes.
b) Data processing (10%)
HD: Identifies and applies relevant data processing techniques for both the height and rings attributes without losing any data samples.
DN: Applies processing techniques but loses some data samples.
CR: Applies data processing techniques to only one of the two attributes.
PP: Attempted, but applies incorrect data processing techniques.
NN: Cannot identify and apply any data processing techniques.
c) Data transformation
HD: Can create the new attribute volume and normalise the data attributes to the [0, 1] range (except rings).
DN: Can create the new attribute and normalise some (not all, except rings) attributes.
CR: Cannot normalise the data but manages to create the new attribute.
PP: Cannot create the new attribute but manages to normalise the data.
NN: Cannot create the new attribute and cannot normalise the data.
d) Data Mining and Pattern evaluation
HD: Can divide the data, create ARFF files, run all 15 different machine learning techniques and report the accuracy. The best accuracy is within 10% of the best result from all students.
DN: Can divide the data, create ARFF files, run all 15 different machine learning techniques and report the accuracy. The best accuracy is more than 10% below the best result from all students.
CR: Can divide the data, create ARFF files, run more than 1 (but not all 15) different machine learning techniques and report the accuracy.
PP: Can divide the data, create ARFF files and run 1 machine learning algorithm.
NN: Cannot divide the data and create ARFF files.
2. Scientific Report (30%)
a) Understand the impact of the work
HD: Good introduction. Explains clearly how machine learning can help in the case of the biology of Tasmanian abalone.
DN: Provides a dataset or use cases for training/evaluation that have the potential to establish the likely success or failure of the proposed method.
CR: Provides a dataset or use cases for training/evaluation.
PP: Provides limited data for training/testing.
b) Analysis of the results
HD: Reports the results in a table and analyses them, i.e. explains the effect of the data selection, data processing, and data transformation steps on the results.
DN: Reports the results in a table and explains the effect of some steps (among data selection, data processing and data transformation) on the results.
CR: Reports the results and gives only a general explanation.
PP: Just reports the results.
NN: Fails to report the results and provides no analysis.
c) Explain the best ML technique
HD: Shows understanding of the selected ML technique that gives the best results, and explains the advantages of this technique that contribute to its performance.
DN: Shows understanding of the selected ML technique that gives the best results.
CR: Provides a general description of how the selected ML technique works.
PP: Provides a general description of how the selected ML technique works, with some errors.
NN: Cannot describe the selected ML algorithm.