DATA MINING AND MACHINE LEARNING (NEU) ASSIGNMENT 1
DUE ON 23:59 3 AUG, 2018 (SAT)
Q1. (20 marks)
In Gaussian elimination, we perform three types of elementary row operations on a
matrix M:
• Type 1: Swap the positions of two rows.
• Type 2: Multiply a row by a nonzero scalar.
• Type 3: Add to one row a scalar multiple of another.
Each of the above operations can be implemented as a matrix multiplication OM, where the matrix O multiplies M from the left. Write out the three O matrices. For concreteness, you can assume M is a 5-by-5 matrix and fix any parameter of the operation (e.g., for Type 1, you may assume you want to swap the first row and the second row).
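As an illustration of the idea only, the following minimal sketch applies each type of operation to a small matrix by multiplying on the left, which is the side on which a matrix acts on rows. The 3-by-3 size, the helper names (identity, matmul, swap_rows, scale_row, add_multiple), and the chosen parameters are assumptions made for this example and are not part of the question.

```python
# Minimal sketch: elementary row operations as left-multiplication by a matrix.
# The 3-by-3 size and all parameters below are illustrative assumptions.

def identity(n):
    """n-by-n identity matrix as a list of rows."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain matrix product a * b for lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def swap_rows(n, i, j):
    """Type 1: identity with rows i and j exchanged."""
    o = identity(n)
    o[i], o[j] = o[j], o[i]
    return o

def scale_row(n, i, c):
    """Type 2: identity with entry (i, i) replaced by the nonzero scalar c."""
    o = identity(n)
    o[i][i] = c
    return o

def add_multiple(n, target, source, c):
    """Type 3: identity plus c at (target, source), with target != source.
    Left-multiplying adds c times row `source` to row `target`."""
    o = identity(n)
    o[target][source] = c
    return o

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

swapped = matmul(swap_rows(3, 0, 1), M)        # rows 0 and 1 exchanged
scaled = matmul(scale_row(3, 2, 10), M)        # row 2 multiplied by 10
added = matmul(add_multiple(3, 0, 2, -1), M)   # row 0 minus row 2
```

Each O here is the identity matrix with the same row operation applied to it, which is one convenient way to check a hand-written answer.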
Q2. (40 marks)
Consider the following training dataset with three features (A, B, and C), and one class
variable (Class).
A  B  C  Class
T  T  T  Y
F  F  T  Y
T  F  T  Y
F  F  T  N
F  F  T  N
T  T  F  N
Answer the following questions. You need to show your steps.
(1) Which attribute will be selected as the first splitting attribute in a decision tree based on the Gini index?
(2) Consider the test data (A = F, B = T, C = F). Predict its class using a Naive Bayes classifier with add-1 smoothing.
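The two quantities involved can be sketched as follows. This is a minimal illustration on a hypothetical four-row toy data set, not the table from this question. It assumes Boolean features, unsmoothed class priors, and add-1 smoothing applied only to the conditional probabilities; other conventions exist, so state yours in your answer. The names ROWS, gini, weighted_gini, and naive_bayes_scores are made up for this example.

```python
from collections import Counter

# Hypothetical toy data (not the assignment's table): (feature0, feature1, label).
ROWS = [("T", "T", "Y"), ("T", "F", "Y"), ("F", "T", "N"), ("F", "F", "N")]

def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, feature):
    """Gini index of a split on `feature`: size-weighted impurity of the parts."""
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[-1])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def naive_bayes_scores(rows, x, n_values=2):
    """Unnormalised P(class) * prod_j P(x_j | class) with add-1 smoothing.

    Assumptions: priors are left unsmoothed, and every feature takes
    `n_values` distinct values (2 for Boolean features).
    """
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(rows)
        for j, value in enumerate(x):
            matches = sum(1 for r in rows if r[-1] == label and r[j] == value)
            score *= (matches + 1) / (n_label + n_values)
        scores[label] = score
    return scores

# The attribute with the smallest weighted Gini index is the preferred split.
split_quality = {j: weighted_gini(ROWS, j) for j in range(2)}

# The class with the largest score is the Naive Bayes prediction.
scores = naive_bayes_scores(ROWS, ("T", "F"))
prediction = max(scores, key=scores.get)
```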
Q3. (20 marks)
Consider applying logistic regression on the following data set:
(1) What’s the likelihood of the data set for w0 = −5, w1 = 4, and w2 = 1?
(2) Is it possible that this w = [w0 w1 w2]⊤ is what the Logistic Regression classifier finally uses after learning has finished? Justify your answer.
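For reference, the likelihood of a data set under a logistic regression model is the product of the per-sample probabilities of the observed labels. The sketch below shows one way to compute it. It assumes labels encoded as 0 or 1 and treats w0 as the bias term; the two-point data set and the weight vector in it are hypothetical and are not the ones from this question.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def likelihood(w, data):
    """Product over samples of P(y | x; w), assuming labels y in {0, 1}.

    w = (w0, w1, w2) with w0 the bias; data is a list of ((x1, x2), y).
    """
    total = 1.0
    for (x1, x2), y in data:
        p = sigmoid(w[0] + w[1] * x1 + w[2] * x2)  # P(y = 1 | x; w)
        total *= p if y == 1 else 1.0 - p
    return total

# Hypothetical two-point data set and weights, not those of the assignment.
DATA = [((1.0, 0.0), 1), ((0.0, 0.0), 0)]
W = (-1.0, 2.0, 0.0)
L = likelihood(W, DATA)
```

If your labels use a different encoding, such as −1 and +1, adapt the per-sample term accordingly.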
Q4. (20 marks)
Play several rounds of the Akinator game at http://en.akinator.com/.¹
(1) It is not uncommon that users may give completely or partially wrong answers during a game. Assume the site maintains a large table, where each row is about a person, each column is a Boolean-type question, and each cell value is the correct answer (“Yes” or “No”), and that the core algorithm the site uses is a decision tree. To accommodate possible errors, let’s assume the site allows up to one error in a game. That is, a person will still be a candidate if at most one of the answers the user provided does not match the correct answer in the data table. Now describe how you will modify the ID3 decision tree construction algorithm to build a decision tree for the site while allowing up to one error in a game.
(2) Assume that you do not think the site uses decision trees as the backbone algorithm. What are the reason(s) to support this conjecture? You may list more than one reason. If you design some experiments and will refer to them, please include the setup and the details of the experiments.
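To make the "up to one error" rule in (1) concrete, the sketch below filters a table of people by the number of answers that disagree with the user's. It only illustrates the candidate rule as stated; it is not a modified ID3 algorithm. The table contents and the names TABLE, mismatches, and candidates are hypothetical.

```python
def mismatches(person_answers, user_answers):
    """Count answered questions where the user's answer differs from the table."""
    return sum(1 for q, a in user_answers.items() if person_answers[q] != a)

def candidates(table, user_answers, max_errors=1):
    """People whose row disagrees with the user on at most `max_errors` questions."""
    return [name for name, row in table.items()
            if mismatches(row, user_answers) <= max_errors]

# Hypothetical table: rows are people, columns are Boolean questions.
TABLE = {
    "person_a": {"q1": True, "q2": True, "q3": False},
    "person_b": {"q1": True, "q2": False, "q3": True},
    "person_c": {"q1": False, "q2": False, "q3": True},
}

answers = {"q1": True, "q2": False, "q3": False}
remaining = candidates(TABLE, answers)                # tolerate one wrong answer
strict = candidates(TABLE, answers, max_errors=0)     # exact matching only
```

In this toy case no person matches the answers exactly, while two people remain candidates once a single error is tolerated.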
Data set for Q3:
x1 x2 y
1 2 1
5
4 −1
0 1 0
Submission
Please write down your answers in a file named ass1_NAME.pdf, where NAME is your full name in Pinyin (Last name + first name).
Please see the course homepage for the submission instructions.
Late Penalty. -10% per day for the first two days, and -20% for each of the following days.
¹ You need to use a VPN to access the website.