代写 Northeastern University

Northeastern University
College of Professional Studies

ALY6020 – Predictive Analytics
Spring 2019 CPS Quarter – Second Half
Dr. Marco Montes de Oca

Assignment: Image recognition with Logistic Regression.

Due date: 06/06/19
Goal

Implement a basic image recognition system of fashion items using logistic regression. The data set provided contains instances of 10 different clothing items (these items are labeled with ten integers: 0 through 9). Your goal is to train and test 10 different logistic regression models, one per clothing item.

Task

0. Download the training and test sets: https://drive.google.com/file/d/1FHrIMthmGaVzdmxrPs-YMs4t0_6QjjIt/view?usp=sharing. Two files are provided. Each of these files contains thousands of 28×28 grayscale images such as the following:

Each image is encoded as a row of 784 integer values between 0 and 255 indicating the brightness of each pixel. The label associated with each image is encoded as an integer value between 0 and 9. The arrangement of the data is as follows:

If you want to see the image encoded in each row, you can use the following functions:

rotate <- function(x) { return(t(apply(x, 2, rev))) } plot_matrix <- function(vec) { q <- matrix(vec, 28, 28, byrow = TRUE) nq <- apply(q, 2, as.numeric) image(rotate(nq), col = gray((0:255)/255)) } If you want to plot the third image, for example, you can call the plot function as follows: plot_matrix(mydata[3, 2:785]) • For each digit: • Relabel each row with a 1 if it corresponds to the clothing item you are training for, and 0 otherwise. • Train a logistic regression model that predicts the label as a function of the image pixels: Label~Pixel1+Pixel2+Pixel3+ …+Pixel784 • Use only a random sample of 20000 training images • Test your models on the test set (use the function predict to do this. See example usage in the logistic regression example we saw in class). The output of each of your models will be the probability of an image being a particular clothing item. You will therefore have 10 probabilities per image. The predicted clothing item is the one associated with the model that produces the maximum probability among those 10 probabilities. • Create a confusion matrix with the counts of the correct and incorrect classifications. You may use the function “confusionMatrix” from the package “caret”. For example: Prediction 0 1 ... 9 Actuals 0 890 50 ... 10 1 0 900 ... 30 ... ... ... ... ... 9 78 1 ... 895 4. Discuss the obtained results, focusing on the accuracy of the classifications.