INTRODUCTION
This assignment contributes 50% of the overall module mark for AINT252 and is an individual assignment. Both the report and code must be submitted to the DLE by the specified submission dates.
The coursework addresses the following assessed learning outcomes:
ALO2. Compare computing and artificial intelligence paradigms and evaluate the appropriateness of a particular computing paradigm for specific application domains.
ALO3. Choose and apply appropriate computation theory and artificial intelligence methods to a chosen sample domain.
The coursework has two parts: the first is a machine learning exercise and the second concerns evolutionary computation. You must complete and submit both parts. Each part is worth 50% of the coursework mark. A Jupyter notebook has been placed on the DLE for you to use. You should download it and use it either locally or in the Azure cloud environment we have used during the lab sessions.
PART 1 – MACHINE LEARNING
You have been provided with a dataset relating to wine quality. The dataset contains data relating to different types of wine, each of which has been given a quality score. There are 4,898 types of wine. The data has 11 inputs, which describe the composition of each type of wine. Your task is to implement a program that enables the regression of the quality value of each type of wine according to the 11 inputs. You must complete the following tasks.
Task 1.1 – Data preparation (10% of total mark)
The first phase of the work requires you to load the data you have been provided with into your Python program. Before the data can be used to train and test the regression models you must first prepare it – this means that the inputs must be normalised. There is no missing data in the dataset.
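As an illustration of the normalisation step, the following is a minimal sketch using min-max scaling with NumPy. The function name min_max_normalise and the randomly generated demo array are assumptions made for the example; in your notebook the array would come from the dataset provided, and other normalisation schemes (for example, standardisation to zero mean and unit variance) are equally valid choices.

```python
import numpy as np

def min_max_normalise(X):
    """Rescale each column of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

# Placeholder data standing in for the 11 input columns of the wine dataset.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 50.0, size=(20, 11))
X_norm = min_max_normalise(X_demo)
```

Note that if you split the data into training and test sets, the scaling parameters are normally computed from the training set only and then applied to the test set.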
Task 1.2 – Regression (20% of total mark)
Having prepared the data you must now build regression models that can predict the quality of new data points. Use the following regression implementations within the scikit-learn package to construct regression models for the dataset, all of which follow the same approach to training and prediction:
• Random Forest (sklearn.ensemble.RandomForestRegressor)
• Neural Network (sklearn.neural_network.MLPRegressor)
• Support Vector Machine (sklearn.svm.SVR)
You must demonstrate that the models are capable of providing a predicted quality value for a given input (e.g. a type of wine).
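The shared training and prediction approach can be sketched as follows. This is an illustrative example only: the synthetic X and y stand in for the normalised wine inputs and quality scores, and the hyperparameter values (n_estimators, hidden_layer_sizes, max_iter) are arbitrary assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

# Synthetic stand-in data: 200 samples with 11 inputs in [0, 1].
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 11))
y = 3.0 + 4.0 * X[:, 0] + rng.normal(0.0, 0.1, size=200)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Neural Network": MLPRegressor(hidden_layer_sizes=(20,), max_iter=500,
                                   random_state=0),
    "Support Vector Machine": SVR(),
}

# Every scikit-learn regressor exposes the same fit/predict interface.
new_wine = X[:1]  # a single row, kept two-dimensional for predict()
predictions = {}
for name, model in models.items():
    model.fit(X, y)
    predictions[name] = float(model.predict(new_wine)[0])
```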
Task 1.3 – Assessment of regression (20% of total mark)
The regression models you have used in the previous section must be assessed. To do this, you are required to report the mean absolute error (MAE) for each model. You may use the MAE implementation available in scikit-learn to do this. It is not sufficient to report a single MAE value. You must use cross-validation to obtain training results and test results, and report these values using boxplots.
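One possible way to obtain cross-validated training and test MAE values and display them as boxplots is sketched below. The synthetic data, the choice of five folds and the use of only two models are assumptions made to keep the example short; scikit-learn reports the score as a negated MAE, so the sign is flipped before plotting.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate
from sklearn.svm import SVR

# Synthetic stand-in data for the normalised wine inputs and quality scores.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(150, 11))
y = 3.0 + 4.0 * X[:, 0] + rng.normal(0.0, 0.1, size=150)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=30, random_state=0),
    "SVM": SVR(),
}
cv = KFold(n_splits=5, shuffle=True, random_state=0)

mae = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv,
                            scoring="neg_mean_absolute_error",
                            return_train_score=True)
    # Scores are negated MAE values, one per fold.
    mae[name] = {"train": -scores["train_score"],
                 "test": -scores["test_score"]}

# One panel for training MAE and one for test MAE, one box per model.
fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
for ax, split in zip(axes, ("train", "test")):
    ax.boxplot([mae[name][split] for name in models])
    ax.set_xticks(range(1, len(models) + 1))
    ax.set_xticklabels(list(models))
    ax.set_title(f"{split} MAE")
axes[0].set_ylabel("Mean absolute error")
```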
You must also produce scatter plots of the target/model output results. There should be a plot for each regression model.
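A target/output scatter plot for a single model might be produced along the following lines. This sketch assumes out-of-fold predictions obtained with cross_val_predict and uses synthetic data; the dashed diagonal marks where a perfect prediction would fall. The same pattern would be repeated for each regression model.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

# Synthetic stand-in data for the normalised wine inputs and quality scores.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(150, 11))
y = 3.0 + 4.0 * X[:, 0] + rng.normal(0.0, 0.1, size=150)

# Each prediction is made by a model that did not see that sample in training.
y_pred = cross_val_predict(SVR(), X, y, cv=5)

fig, ax = plt.subplots()
ax.scatter(y, y_pred, s=10)
limits = [min(y.min(), y_pred.min()), max(y.max(), y_pred.max())]
ax.plot(limits, limits, linestyle="--")  # line of perfect prediction
ax.set_xlabel("Target quality")
ax.set_ylabel("Predicted quality")
ax.set_title("Support Vector Machine")
```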
PART 2 – EVOLUTIONARY COMPUTATION
The second part of this assignment requires you to construct an evolutionary algorithm that can optimise single-objective optimisation problems. The problems are as follows:
[Table: Problem and Formulation. The formulation column is not reproduced here.]
• Sphere
• Rosenbrock
• Rastrigin
Each solution should have D = 10 continuous decision variables that can take any value (you are recommended to start with random values between -1 and 1 for all three problems).
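For reference, the three problems could be coded as shown below. These are the commonly used textbook definitions of the Sphere, Rosenbrock and Rastrigin functions and are an assumption here; check them against the formulations given for this module before relying on them. All three are minimisation problems with an optimal fitness of 0.

```python
import numpy as np

def sphere(x):
    """Sum of squares; minimum 0 at x = (0, ..., 0)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x ** 2))

def rosenbrock(x):
    """Common Rosenbrock form; minimum 0 at x = (1, ..., 1)."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2
                        + (1.0 - x[:-1]) ** 2))

def rastrigin(x):
    """Common Rastrigin form; minimum 0 at x = (0, ..., 0)."""
    x = np.asarray(x, dtype=float)
    return float(10.0 * x.size
                 + np.sum(x ** 2 - 10.0 * np.cos(2.0 * np.pi * x)))

# A single random starting solution with D = 10 variables in [-1, 1].
D = 10
rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=D)
```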
Task 2.1 – Generation of random solutions (10%)
When evaluating an evolutionary algorithm it is standard to compare its results against randomly generated solutions. Generate 500 random solutions to the problem and plot their fitness values using Matplotlib.
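A minimal sketch of the random baseline is given below, shown for the Sphere problem only. The textbook Sphere definition, the sampling range of -1 to 1 and the choice of a simple index-against-fitness plot are assumptions for the example; a histogram would be an equally reasonable presentation.

```python
import numpy as np
import matplotlib.pyplot as plt

def sphere(x):
    """Textbook Sphere function (assumed formulation): sum of squares."""
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

rng = np.random.default_rng(0)
# 500 random solutions, each with D = 10 variables drawn from [-1, 1].
random_solutions = rng.uniform(-1.0, 1.0, size=(500, 10))
random_fitness = np.array([sphere(s) for s in random_solutions])

fig, ax = plt.subplots()
ax.plot(random_fitness, ".", markersize=3)
ax.set_xlabel("Solution index")
ax.set_ylabel("Fitness (Sphere)")
ax.set_title("Fitness of 500 random solutions")
```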
Task 2.2 – Algorithm implementation (25%)
You should implement a population-based evolutionary algorithm as described in the lectures. Your algorithm must have the following features:
• A crossover operator that performs uniform crossover.
• A mutation operator that performs an additive Gaussian mutation.
• A selection operator that combines a generation’s parent and child populations and identifies the parent solutions for the next generation.
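The three operators listed above could fit together as in the following sketch. It is not the algorithm from the lectures: the random pairing of parents, the truncation selection that keeps the best mu solutions of the combined population, and the values of mu, sigma, rate and generations are all assumptions chosen for illustration, as is the textbook Sphere function used as the fitness.

```python
import numpy as np

def sphere(x):
    """Textbook Sphere function (assumed formulation), to be minimised."""
    return float(np.sum(x ** 2))

def uniform_crossover(p1, p2, rng):
    """Take each variable from either parent with equal probability."""
    mask = rng.random(p1.size) < 0.5
    return np.where(mask, p1, p2)

def gaussian_mutation(x, rng, sigma=0.1, rate=0.1):
    """Add N(0, sigma^2) noise to each variable with probability `rate`."""
    mutate = rng.random(x.size) < rate
    return x + mutate * rng.normal(0.0, sigma, size=x.size)

def select(parents, children, fitness_fn, mu):
    """Combine parents and children, keep the mu fittest (lowest values)."""
    combined = np.vstack([parents, children])
    fitness = np.array([fitness_fn(ind) for ind in combined])
    order = np.argsort(fitness)[:mu]
    return combined[order], fitness[order]

def evolve(fitness_fn, dims=10, mu=20, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    parents = rng.uniform(-1.0, 1.0, size=(mu, dims))
    best_history = []
    for _ in range(generations):
        children = np.empty_like(parents)
        for i in range(mu):
            # Assumption: the two parents are picked uniformly at random.
            a, b = rng.choice(mu, size=2, replace=False)
            child = uniform_crossover(parents[a], parents[b], rng)
            children[i] = gaussian_mutation(child, rng)
        parents, fitness = select(parents, children, fitness_fn, mu)
        best_history.append(float(fitness[0]))
    return parents[0], best_history

best, best_history = evolve(sphere)
```

Because the selection step keeps the fittest solutions from the combined population, the best fitness in this sketch can never get worse from one generation to the next.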
Task 2.3 – Visualisation of results (15%)
Modify your optimiser to record the average fitness at each generation. Then, after your optimiser has run, produce a plot showing the change in average fitness over the runtime of the algorithm. Your visualisation code must be separate from the optimiser.
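The separation between optimiser and visualisation might look like the sketch below. The function run_optimiser is a deliberately simplified stand-in (mutation and truncation selection only, on an assumed textbook Sphere function) whose sole purpose is to return one average fitness value per generation; plot_average_fitness is a hypothetical name for the plotting code, which knows nothing about how the values were produced.

```python
import numpy as np
import matplotlib.pyplot as plt

def run_optimiser(generations=50, mu=20, dims=10, sigma=0.1, seed=0):
    """Simplified stand-in optimiser that records average fitness."""
    rng = np.random.default_rng(seed)
    population = rng.uniform(-1.0, 1.0, size=(mu, dims))
    average_fitness = []
    for _ in range(generations):
        children = population + rng.normal(0.0, sigma, size=population.shape)
        combined = np.vstack([population, children])
        fitness = np.sum(combined ** 2, axis=1)  # Sphere fitness per row
        order = np.argsort(fitness)[:mu]
        population = combined[order]
        # Record the mean fitness of the surviving population.
        average_fitness.append(float(fitness[order].mean()))
    return average_fitness

def plot_average_fitness(average_fitness):
    """Plot average fitness per generation; independent of the optimiser."""
    fig, ax = plt.subplots()
    ax.plot(range(1, len(average_fitness) + 1), average_fitness)
    ax.set_xlabel("Generation")
    ax.set_ylabel("Average fitness")
    return ax

history = run_optimiser()
ax = plot_average_fitness(history)
```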
COURSEWORK DELIVERABLES
A Jupyter notebook has been provided on the DLE for you to use for this coursework. You should implement your code in it, and submit it to the DLE ahead of the deadline specified in the module calendar earlier in this handbook. Please indicate which task each section of the notebook refers to using a Markdown cell.
Please consider submitting your files from a computer on the University campus as submitting files of this size over an unreliable connection can be problematic. Please check your submitted files are correct by downloading them again and checking that they work. You will receive a confirmation receipt by email when your work has been properly submitted – if you do not receive this email then your work has not been submitted.
PLAGIARISM AND ACADEMIC OFFENCES
You are expected to write your own code. Discussing ideas with other people is acceptable, but getting help to write your code is not acceptable. Failure to demonstrate a good understanding of your own code during the demonstration may lead to a significantly reduced mark on this assignment, or, in the worst case, action taken against you under the University’s regulations on examination and assessment offences. Your attention is drawn to these regulations, available at https://www1.plymouth.ac.uk/essentialinfo/regulations/Pages/Plagiarism.aspx.
If you submit code files or libraries that you have not written yourself you must make this clear in your submission. Submitting code that you have not written yourself without declaring it clearly in the report may lead to action taken against you under the University’s regulations on examination and assessment offences.
ASSESSMENT AND MARKING
Each task will be assessed according to the following mark scheme:
• Quality of completed task (60%): To what extent does the code fully implement the requirements specified? Are there elements of the task that have been omitted? Does it run with errors? How well are the results presented? Are visualisations of data clear and properly constructed?
• Efficiency of implementation (40%): Have you provided an efficient implementation? Is the code well structured?
Q3. Regular Expressions
For each of the following examples you should write a Regular Expression and the corresponding Finite State Automaton. For each automaton, you should identify:
• the set of accepting states;
• the transition table;
• the set of states;
• and the alphabet the automaton uses.
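To show how the four components listed above relate to one another, here is a small sketch of a deterministic automaton written out in Python. The language it accepts (binary strings ending in 1) is a toy example chosen for brevity and is not one of the required answers; the names states, alphabet, transitions, start_state and accepting_states are illustrative.

```python
# The components of the automaton, written out explicitly.
states = {"q0", "q1"}
alphabet = {"0", "1"}
start_state = "q0"
accepting_states = {"q1"}

# Transition table: (current state, input symbol) -> next state.
transitions = {
    ("q0", "0"): "q0",
    ("q0", "1"): "q1",
    ("q1", "0"): "q0",
    ("q1", "1"): "q1",
}

def accepts(text):
    """Return True if the automaton ends in an accepting state."""
    state = start_state
    for symbol in text:
        if symbol not in alphabet:
            return False  # symbols outside the alphabet are rejected
        state = transitions[(state, symbol)]
    return state in accepting_states
```

For example, accepts("101") is True because the run finishes in q1, while accepts("10") is False because it finishes in q0.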
You are required to produce Regular Expressions and a Finite State Automaton for:
• An IP address.
• A UK national insurance number (the correct format is described here: https://www.gov.uk/hmrc-internal-manuals/national-insurance-manual/nim39110).
You should submit your Python source code for the Regular Expressions (include this in your PDF report) alongside the Finite State Automata. You must include some examples of the Regular Expressions working with test data.
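Testing a Regular Expression against sample data in Python might be sketched as follows. Two assumptions are made: "IP address" is taken to mean a dotted-decimal IPv4 address with each part in the range 0 to 255 and no leading zeros, and the national insurance number pattern encodes only the basic shape of two letters, six digits and a suffix letter from A to D. Any further restrictions on the prefix letters described on the linked HMRC page are not encoded here.

```python
import re

# One IPv4 octet: 250-255, 200-249, 100-199, or 0-99 without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4 = re.compile(rf"{OCTET}(?:\.{OCTET}){{3}}")

# Basic shape only: two letters, six digits, suffix letter A-D.
NINO_SHAPE = re.compile(r"[A-Z]{2}[0-9]{6}[A-D]")

def is_ipv4(text):
    """True if the whole string is a dotted-decimal IPv4 address."""
    return IPV4.fullmatch(text) is not None

def has_nino_shape(text):
    """True if the whole string has the basic NI number shape."""
    return NINO_SHAPE.fullmatch(text) is not None

# Test data: each string paired with the result of the check.
ip_results = {s: is_ipv4(s) for s in ["192.168.0.1", "256.1.1.1", "1.2.3"]}
nino_results = {s: has_nino_shape(s)
                for s in ["AB123456C", "AB12345C", "AB123456E"]}
```

Using fullmatch rather than search ensures that the entire string must fit the pattern, which mirrors an automaton that must consume all of its input before accepting.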