ELEC2103/9103 Assignment 2015
Table of Contents
Student details
Abstract
Load Data
Classification Tree Method
Naive Bayes method
Neural Network Method
Summary
University of Sydney statement concerning plagiarism
Put your assignment title in this comment
Student details
NAME = 'Your name';
SID = 'Your SID';
DATE = 'The date';
Abstract
I use the Customer Household Data set, which contains details of individual households, to classify household income level. The target feature is ASSRTD_HHOLD_INCOME_GROUP_CD, which has three classes 'LOW', 'MED' and 'HI', represented by 1, 2 and 3 in this program. I use other features such as ASSRTD_GAS_USAGE_GROUP_CD, ASSRTD_ELECTRICITY_USE_GRP_CD, DRYER_USAGE_CD and HAS_GENERATION to predict it.
I use three different methods for the classification: a classification tree, naive Bayes and a neural network.
Load Data
I read the data and split it into training and testing parts. The category strings have been converted to integer values beforehand, for example 'LOW', 'MED' and 'HI' to 1, 2 and 3, and the converted data is stored in out.csv.
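The conversion itself happens before this script runs; a minimal sketch of how such a mapping could be produced (the raw file name raw.csv, the use of readtable/writetable, and showing only the income column are assumptions, not the exact preprocessing used):

% hypothetical preprocessing sketch: map the income group strings to
% integers 1..3 and write a numeric copy of the data to out.csv
raw = readtable('raw.csv');
[~, incomeCode] = ismember(raw.ASSRTD_HHOLD_INCOME_GROUP_CD, {'LOW', 'MED', 'HI'});
raw.ASSRTD_HHOLD_INCOME_GROUP_CD = incomeCode;   % 'LOW' -> 1, 'MED' -> 2, 'HI' -> 3
writetable(raw, 'out.csv');

The same kind of mapping would be repeated for the other categorical columns such as ASSRTD_GAS_USAGE_GROUP_CD.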
% load the data
d = csvread('out.csv', 1, 0);
% first 60000 rows used for training, remaining used for testing
trainNum = 60000;
label = d(1:trainNum, 2);
data = d(1:trainNum, 3:end);
testLable = d(trainNum+1:end, 2);
testData = d(trainNum+1:end, 3:end);
testAccuracies = [0, 0, 0];
Warning: The encoding 'GB2312' is not supported.
See the documentation for FOPEN.
Classification Tree Method
The classification tree is a simple classification model that is easy for humans to interpret. It is suitable for training data in which most features are categorical.
% train the classification tree
tree = fitctree(data, label);
% predict on training data
pre = tree.predict(data);
% compute training accuracy
trainAccuracy = sum(pre == label)/ length(label)
% compute training confusion matrix
trainConfusematrix = confusionmat(label, pre)
% predict on testing data
preTest = tree.predict(testData);
% compute testing accuracy
testAccuracy = sum(preTest == testLable)/ length(testLable)
% compute testing confusion matrix
testConfusematrix = confusionmat(testLable, preTest)
testAccuracies(1) = testAccuracy;
trainAccuracy =
0.7172
trainConfusematrix =
17719  2343  3907
3089  16182  577
6397  655  9131
testAccuracy =
0.6661
testConfusematrix =
4753  978  1505
1005  5061  216
2370  176  2656
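Since interpretability is the main attraction of the tree, the fitted model could also be displayed graphically; a minimal sketch using the view method of the fitted tree object:

% display the fitted tree as a graph to inspect the split rules
view(tree, 'Mode', 'graph');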
Naive Bayes method
Naive Bayes is a probabilistic model. It assumes that the features are conditionally independent given the class, which may not be true for this data, but it is still worth a try.
model = fitNaiveBayes(data, label);
trainPre = model.predict(data);
trainAccuracy = sum(trainPre == label)/ length(label)
trainConfusematrix = confusionmat(label, trainPre)
p = model.predict(testData);
testAccuracy = sum(p == testLable)/ length(testLable)
testConfusematrix = confusionmat(testLable, p)
testAccuracies(2) = testAccuracy;
trainAccuracy =
0.4392
trainConfusematrix =
19410 2191 2368
13353 4170 2325
11853 1560 2770
testAccuracy =
0.4337
testConfusematrix =
5818 705 713
3831 1425 1026
3800 527 875
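Under the independence assumption the model scores each class roughly as the class prior times the product of per-feature likelihoods, P(y) * prod_i P(x_i | y). As a quick sanity check, the class posteriors for a few test rows could be inspected; a minimal sketch (posterior is a method of the NaiveBayes object returned by fitNaiveBayes, and looking at the first five rows is only illustrative):

% inspect posterior class probabilities for a few test rows;
% each row sums to 1 and the largest column is the predicted class
post = posterior(model, testData(1:5, :));
disp(post)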
Neural Network Method
A neural network is a relatively complex model, but the Neural Network Toolbox makes it easy to implement. It has a powerful learning capacity and is suitable when a large amount of training data is available.
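The code below relies on a small helper toOneShot that converts the integer labels into one-hot rows. Its implementation is not part of this listing; a minimal sketch consistent with the comments below (saved as toOneShot.m) might look like this:

function onehot = toOneShot(label)
% convert an N-by-1 vector of integer class labels (1..K) into an
% N-by-K one-hot matrix, e.g. label 2 with K = 3 becomes [0 1 0]
K = max(label);
onehot = zeros(length(label), K);
onehot(sub2ind(size(onehot), (1:length(label))', label)) = 1;
end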
inputs = data';
% the neural network labels should use a one-hot representation
% label 1 represented as [1 0 0], 2 as [0 1 0], 3 as [0 0 1]
oneshot_label = toOneShot(label);
targets = oneshot_label’;
% set the network architecture with one hidden layer of 5 neurons
net = patternnet(5);
% set data division
net.divideParam.trainRatio = 80/100;
net.divideParam.valRatio = 10/100;
net.divideParam.testRatio = 10/100;
% train the Network
[net,tr] = train(net,inputs,targets);
% plots the training, validation, and test performances
figure, plotperform(tr)
% plots the training state
figure, plottrainstate(tr)
% view the net
view(net)
% prediction on training data
trainPre = net(inputs);
% evaluate the network performance on the training data
trainAccuracy = perform(net,targets,trainPre)
% show train confusion matrix
figure, plotconfusion(targets,trainPre)
testTargets = toOneShot(testLable);
testTargets = testTargets';
% prediction on testing data
outputs = net(testData');
% evaluate the network performance on the testing data
testAccuracy = perform(net,testTargets,outputs)
% show test confusion matrix
figure, plotconfusion(testTargets,outputs)
testAccuracies(3) = testAccuracy;
trainAccuracy =
0.5125
testAccuracy =
0.5206
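Note that perform reports the network's performance measure (cross-entropy by default for patternnet) rather than a raw classification accuracy, so the two values above are not directly comparable with the tree and naive Bayes accuracies. A classification accuracy could be obtained by taking the winning output class for each sample, roughly as follows (a sketch using the variable names from the code above; nnTrainAccuracy and nnTestAccuracy are new names):

% convert the one-hot network outputs to class labels and compute accuracy
[~, trainClass] = max(trainPre, [], 1);
nnTrainAccuracy = sum(trainClass' == label) / length(label);
[~, testClass] = max(outputs, [], 1);
nnTestAccuracy = sum(testClass' == testLable) / length(testLable);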
Summary
The classification tree has the highest test accuracy, the neural network is second and naive Bayes is the lowest. I think the reasons are that the conditional independence assumption made by naive Bayes does not hold for this data set, and that the data set is not large enough for the neural network to show its strength.
figure, bar(testAccuracies);
xlabel('1: Classification Tree  2: Naive Bayes  3: Neural Network');
ylabel('Test Accuracy');
University of Sydney statement concerning plagiarism
I certify that:
(1) I have read and understood the University of Sydney Academic Dishonesty and Plagiarism Policy (found by following the Academic Honesty link on the Unit web page).
(2) I understand that failure to comply with the Student Plagiarism: Coursework Policy and Procedure can lead to the University commencing proceedings against me for potential student misconduct under Chapter 8 of the University of Sydney By-Law 1999 (as amended).
(3) This Work is substantially my own, and to the extent that any part of this Work is not my own I have indicated that it is not my own by Acknowledging the Source of that part or those parts of the Work as described in the assignment instructions.
Name: Please put your name in this comment
SID: Please put your SID in this comment
Date: Please put the date in this comment
Published with MATLAB® R2014b