Assignment 2: Naive Bayes [20 marks]
Student Name:
Student ID:
General info
Due date: Friday, 2 September 2022, 5pm
Submission method: Canvas submission
Submission materials: completed copy of this iPython notebook
Late submissions: -10% per day up to 5 days (both weekdays and weekends count)
one day late, -2.0;
two days late, -4.0;
three days late, -6.0;
four days late, -8.0;
five days late, -10.0;
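The schedule above is linear in the number of days late (10% of the 20 available marks, i.e. 2.0 marks per day). A minimal sketch of that arithmetic follows; the function name `late_penalty` and its parameters are hypothetical, and because the specification does not say what happens after five days, the sketch raises an error rather than guessing.

```python
def late_penalty(days_late, per_day=2.0, max_days=5):
    """Return the mark deduction for a submission that is `days_late` days late."""
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > max_days:
        # Assumption: the specification only lists penalties up to five days late.
        raise ValueError("penalty beyond max_days is not specified")
    return per_day * days_late


# Weekdays and weekends both count, so pass calendar days.
for days in range(6):
    print(days, "day(s) late:", -late_penalty(days))
```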
Marks: 20% of mark for class.
Materials: See the Using Jupyter Notebook and Python page on Canvas (under Modules > Coding Resources) for information on the basic setup required for this class, including an iPython notebook viewer. If your iPython notebook doesn’t run on the marker’s machine, you will lose marks. You should use Python 3.
Evaluation: Your iPython notebook should run end-to-end without any errors in a reasonable amount of time, and you must follow all instructions provided below, including specific implementation requirements and instructions for what needs to be printed (please avoid printing output we don’t ask for). You should implement functions for the skeletons listed below. You may implement any number of additional (helper) functions. You should leave the output from running your code in the iPython notebook you submit, to assist with marking. The amount each section is worth is given in parentheses after the instructions.
You will be marked not only on the correctness of your methods, but also the quality and efficiency of your code: in particular, you should be careful to use Python built-in functions and operators when appropriate and pick descriptive variable names that adhere to Python style requirements. If you think it might be unclear what you are doing, you should comment your code to help the marker make sense of it. We reserve the right to deduct up to 4 marks for unreadable or excessively inefficient code.
7 of the marks available for this Project will be assigned to whether the five specified Python functions work in a manner consistent with the materials from COMP90049. Any other implementation will not be directly assessed (except insofar as it is required to make these five functions work correctly).
13 of the marks will be assigned to your responses to the questions, in terms of both accuracy and insightfulness. We will be looking for evidence that you have an implementation that allows you to explore the problem, but also that you have thought deeply about the data and the behaviour of the Naive Bayes classifier.
Updates: Any major changes to the assignment will be announced via Canvas. Minor changes and clarifications will be announced on the discussion board (Piazza -> Assignments -> A2); we recommend you check it regularly.
Academic misconduct: While you may discuss this homework in general terms with other students, it ultimately is still an individual task. Reuse of code or other instances of clear influence will be considered cheating. Please check the CIS Academic Honesty training for more information. We will be checking submissions for originality and will invoke the University’s Academic Misconduct policy where inappropriate levels of collusion or plagiarism are deemed to have taken place.
Please carefully read and fill out the Authorship Declaration form at the bottom of the page. Failure to fill out this form results in the following deductions:
missing Authorship Declaration at the bottom of the page, -10.0
incomplete or unsigned Authorship Declaration at the bottom of the page, -5.0
Part 1: Base code [7 marks]
Instructions
1. Do not shuffle the data set.
2. Treat the features as nominal and use them as provided (e.g., do not convert them to other feature types, such as numeric ones). Implement a Naive Bayes classifier with an appropriate likelihood function for the data.
3. You should implement the Naive Bayes classifier from scratch. Do not use existing implementations/learning algorithms. You must use the epsilon smoothing strategy as discussed in the Naive Bayes lecture.
4. Apart from the instructions in point 3, you may use libraries to help you with data reading, representation, maths or evaluation.
5. Ensure that all and only required information is printed, as indicated in the final three code cells. Failure to print the required information will result in [-1 mark] per case. (We don’t mind details such as whether you print a list or several numbers; just make sure the information is displayed so that it is easily accessible.)
6. Please place the Jupyter notebook into the same folder as the input data.
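To make the smoothing requirement concrete, here is a minimal sketch on toy data of a categorical likelihood with epsilon smoothing. It assumes that epsilon smoothing means replacing any zero likelihood with a small constant; check this against the lecture definition. The names `fit_counts`, `log_score` and the value of `EPSILON` are hypothetical, and this is an illustration of the idea rather than a reference solution.

```python
import math
from collections import Counter, defaultdict

EPSILON = 1e-9  # assumed value; it should be much smaller than 1 / N


def fit_counts(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for row, label in zip(rows, labels):
        for index, value in enumerate(row):
            value_counts[(index, label)][value] += 1
    return class_counts, value_counts


def log_score(row, label, class_counts, value_counts):
    """Unnormalised log posterior: log P(label) + sum of log P(value | label)."""
    total = sum(class_counts.values())
    score = math.log(class_counts[label] / total)
    for index, value in enumerate(row):
        count = value_counts[(index, label)][value]
        # Epsilon smoothing: an unseen (value, label) pair gets EPSILON, not 0.
        likelihood = count / class_counts[label] if count else EPSILON
        score += math.log(likelihood)
    return score


# Toy nominal data, invented for illustration only.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
labels = ["no", "no", "yes", "yes"]
class_counts, value_counts = fit_counts(rows, labels)

query = ("sunny", "hot")
scores = {c: log_score(query, c, class_counts, value_counts) for c in class_counts}
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

Working in log space avoids numerical underflow when many small likelihoods are multiplied, and it keeps the epsilon term from collapsing the whole product to exactly zero.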
# This function should open a csv file and read the data into a useable format [0.5 mark]
def preprocess(filename):
    return

# This function should build a supervised NB model [3 marks]
def train():
    return

# This function should predict the class for a set of instances, based on a trained model [1.5 marks]
def predict():
    return

# This function should evaluate a set of predictions [1 mark]
def evaluate():
    return
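As one possible shape for the evaluation step, the sketch below computes accuracy and a table of (actual, predicted) counts from two parallel lists of labels. The function names `accuracy` and `confusion_counts` are hypothetical, and which metrics to report is an assumption; the skeleton above only requires that `evaluate()` scores a set of predictions.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of instances whose predicted label equals the actual label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot evaluate an empty set of predictions")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def confusion_counts(predicted, actual):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


# Toy labels, invented for illustration only.
toy_actual = ["yes", "no", "yes", "yes"]
toy_predicted = ["yes", "no", "no", "yes"]
print("Accuracy:", accuracy(toy_predicted, toy_actual))
print("Confusion counts:", dict(confusion_counts(toy_predicted, toy_actual)))
```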
Bank Marketing
# This cell should act as your "main" function where you call the above functions
# on the full Bank Marketing data set, and print the evaluation score. [0.33 marks]
# First, read in the data and apply your NB model to the Bank Marketing data
# Second, print the full evaluation results from the evaluate() function
# Third, print data statistics and model predictions, as instructed below
# N is the total number of instances, F the total number of features, L the total number of labels
# The "class probabilities" may be unnormalized
print("Feature vectors of instances [0, 1, 2]: ", )
print("\nNumber of instances (N): ", )
print("Number of features (F): ", )
print("Number of labels (L): ", )
print("\n\nPredicted class probabilities for instance N-3: ", )
print("Predicted class for instance N-3: ", )
print("\nPredicted class probabilities for instance N-2: ", )
print("Predicted class for instance N-2: ", )
print("\nPredicted class probabilities for instance N-1: ", )
print("Predicted class for instance N-1: ", )
# This cell should act as your "main" function where you call the above functions
# on the full Student data set, and print the evaluation score. [0.33 marks]
# First, read in the data and apply your NB model to the Student data
# Second, print the full evaluation results from the evaluate() function
# Third, print data statistics and model predictions, as instructed below
# N is the total number of instances, F the total number of features, L the total number of labels
# The "class probabilities" may be unnormalized
print("Feature vectors of instances [0, 1, 2]: ", )
print("\nNumber of instances (N): ", )
print("Number of features (F): ", )
print("Number of labels (L): ", )
print("\n\nPredicted class probabilities for instance N-3: ", )
print("Predicted class for instance N-3: ", )
print("\nPredicted class probabilities for instance N-2: ", )
print("Predicted class for instance N-2: ", )
print("\nPredicted class probabilities for instance N-1: ", )
print("Predicted class for instance N-1: ", )
# This cell should act as your "main" function where you call the above functions
# on the full Obesity data set, and print the evaluation score. [0.33 marks]
# First, read in the data and apply your NB model to the Obesity data
# Second, print the full evaluation results from the evaluate() function
# Third, print data statistics and model predictions, as instructed below
# N is the total number of instances, F the total number of features, L the total number of labels
# The "class probabilities" may be unnormalized
print("Feature vectors of instances [0, 1, 2]: ", )
print("\nNumber of instances (N): ", )
print("Number of features (F): ", )
print("Number of labels (L): ", )
print("\n\nPredicted class probabilities for instance N-3: ", )
print("Predicted class for instance N-3: ", )
print("\nPredicted class probabilities for instance N-2: ", )
print("Predicted class for instance N-2: ", )
print("\nPredicted class probabilities for instance N-1: ", )
print("Predicted class for instance N-1: ", )
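The cells above allow the printed class probabilities to be unnormalized. If you would rather print values that sum to 1, the following sketch shows one common way to normalise per-class log scores; it assumes the model produces a dictionary of log scores per instance, and the name `normalise_log_scores` is hypothetical.

```python
import math


def normalise_log_scores(log_scores):
    """Convert {label: log score} into probabilities that sum to 1."""
    highest = max(log_scores.values())
    # Subtracting the maximum before exponentiating guards against underflow.
    shifted = {label: math.exp(score - highest) for label, score in log_scores.items()}
    total = sum(shifted.values())
    return {label: value / total for label, value in shifted.items()}


# Toy unnormalised scores, invented for illustration only.
toy_log_scores = {"yes": math.log(0.03), "no": math.log(0.01)}
toy_probabilities = normalise_log_scores(toy_log_scores)
print(toy_probabilities)
```

Normalisation does not change which class has the highest score, so the predicted class is the same either way.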
Part 2: Conceptual questions [13 marks]
Question 1: One-R Baseline [3 marks]
# Write additional code here, if necessary (you may insert additional code cells)
# You should implement the One-R classifier from scratch. Do not use existing implementations/learning algorithms.
# Print the feature name and its corresponding error rate that One-R selects, in addition to any evaluation scores.
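For orientation, a minimal sketch of the One-R idea on toy data: for each feature, predict the majority class for each of its values, measure the resulting error rate on the training data, and keep the feature with the lowest error. The function name `one_r`, the toy data and the tie-breaking behaviour (first feature wins, and ties within a value are broken arbitrarily) are assumptions for illustration, not requirements of the assignment.

```python
from collections import Counter, defaultdict


def one_r(rows, labels, feature_names):
    """Return (feature name, error rate, rule) for the lowest-error feature."""
    best = None
    for index, name in enumerate(feature_names):
        # Count the class labels observed for each value of this feature.
        by_value = defaultdict(Counter)
        for row, label in zip(rows, labels):
            by_value[row[index]][label] += 1
        # The rule maps each value to its majority class.
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        # Every instance outside the majority class of its value is an error.
        errors = sum(sum(counts.values()) - max(counts.values())
                     for counts in by_value.values())
        error_rate = errors / len(labels)
        if best is None or error_rate < best[1]:
            best = (name, error_rate, rule)
    return best


# Toy nominal data, invented for illustration only.
toy_rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "hot")]
toy_labels = ["no", "no", "yes", "yes"]
feature, error_rate, rule = one_r(toy_rows, toy_labels, ["outlook", "temperature"])
print("Selected feature:", feature, "| error rate:", error_rate)
```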
Provide your text answer to Question 1.b of 100-150 words in this cell.
Question 2: Evaluation strategy [3 marks]
# Write additional code here, if necessary (you may insert additional code cells)
Provide your text answer to Question 2 of 100-150 words in this cell.
Question 3: Feature Selection and Naive Bayes Assumptions [3 marks]
# Write additional code here, if necessary (you may insert additional code cells)
Provide your text answer to Question 3.a of 100-150 words in this cell.
Provide your text answer to Question 3.b of 100-150 words in this cell.
Question 4: Feature Selection and Ethics [4 marks]
# Write additional code here, if necessary (you may insert additional code cells)
Provide your text answer to Question 4.a of 100-150 words in this cell.
Provide your text answer to Question 4.b of 100-150 words in this cell.
Provide your text answer to Question 4.c of 100-150 words in this cell.
Authorship Declaration:
(1) I certify that the program contained in this submission is completely
my own individual work, except where explicitly noted by comments that
provide details otherwise. I understand that work that has been developed
by another student, or by me in collaboration with other students,
or by non-students as a result of request, solicitation, or payment,
may not be submitted for assessment in this subject. I understand that
submitting for assessment work developed by or in collaboration with
other students or non-students constitutes Academic Misconduct, and
may be penalized by mark deductions, or by other penalties determined
via the University of Melbourne Academic Honesty Policy, as described
at https://academicintegrity.unimelb.edu.au.
(2) I also certify that I have not provided a copy of this work in either
softcopy or hardcopy or any other form to any other student, and nor will
I do so until after the marks are released. I understand that providing
my work to other students, regardless of my intention or any undertakings
made to me by that other student, is also Academic Misconduct.
(3) I further understand that providing a copy of the assignment
specification to any form of code authoring or assignment tutoring
service, or drawing the attention of others to such services and code
that may have been made available via such a service, may be regarded
as Student General Misconduct (interfering with the teaching activities
of the University and/or inciting others to commit Academic Misconduct).
I understand that an allegation of Student General Misconduct may arise
regardless of whether or not I personally make use of such solutions
or sought benefit from such actions.
Signed by: [Enter your full name and student number here before submission]
Dated: [Enter the date that you “signed” the declaration]