Mariia Okuneva, M.Sc.
Data Mining Home Assignment 2
In this home assignment you will again use the Smarket data set, which is part of the ISLR package. In the first part, you will write your own function designed to fit an LDA model to the train set and compare its performance with the pre-implemented lda() function. Furthermore, you will use this fitted model to make predictions on the test set. In the second part, you will learn about another classification method called KNN (K nearest neighbours). You will program it from scratch and compare the results with the output of the existing knn() function. In both parts of this assignment, your goal is to predict Direction of the market with Lag1 and Lag2, the percentage returns one and two days ago, respectively. Your task is to write an R script that contains the following parts. But first, download the script template HA2_yournames.R from OLAT.
Part 1. LDA
1. Write your own function mylda to fit an LDA model to the train data set. This function should work for any number of features, but only for binary classification. You are welcome to make the function more general, but this is not required for successful completion of the assignment. In Part 1, the train data set should include the observations from 2001 through 2004. Additional information on the inputs and outputs of this function is given in the script template HA2_yournames.R. Note that the lda() function applies scaling to the discriminant function coefficients (the formula is given in the template; see ESL, Chapter 4, Equations 4.15 and 4.16 for details). Compare your results with the output of the lda() function.
2. Visualize the data and the fitted decision boundary.
3. Write a function my_predict that outputs the predicted class for one observation. Use apply() and my_predict to produce predicted classes for the whole test data set. Calculate the prediction accuracy and produce a confusion matrix. Compare your results with those based on the pre-implemented lda() function.
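The arithmetic behind the three Part 1 tasks can be sketched outside R. The block below is a minimal pure-Python sketch under stated assumptions, not the template's R code: fit_lda, predict_one, boundary_line, accuracy and confusion are hypothetical names standing in for mylda, my_predict and the evaluation steps, and their inputs and outputs are assumptions rather than the template's specification. It uses the standard two-class LDA estimates (ESL, Section 4.3): priors pi_k = N_k / N, class means mu_k, a pooled covariance Sigma with denominator N - K, and the linear discriminant delta_k(x) = x' Sigma^{-1} mu_k - 1/2 mu_k' Sigma^{-1} mu_k + log(pi_k). It does not apply the scaling lda() uses for its reported coefficients (the template gives that formula), so its raw coefficients will not match lda()'s "Coefficients of linear discriminants" directly.

```python
import math


def mean_vector(rows):
    """Column means of a list of equal-length feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(a[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lda(x, y):
    """Two-class LDA: priors, class means, pooled covariance, discriminant coefficients."""
    classes = sorted(set(y))
    if len(classes) != 2:
        raise ValueError("this sketch handles binary classification only")
    n, p = len(x), len(x[0])
    groups = {k: [xi for xi, yi in zip(x, y) if yi == k] for k in classes}
    priors = {k: len(groups[k]) / n for k in classes}
    means = {k: mean_vector(groups[k]) for k in classes}
    # Pooled within-class covariance, divided by n - K with K = 2 classes.
    sigma = [[0.0] * p for _ in range(p)]
    for k in classes:
        for xi in groups[k]:
            d = [xi[j] - means[k][j] for j in range(p)]
            for a in range(p):
                for b in range(p):
                    sigma[a][b] += d[a] * d[b] / (n - 2)
    # delta_k(x) = x' w_k + c_k, with w_k = Sigma^{-1} mu_k and
    # c_k = -1/2 mu_k' Sigma^{-1} mu_k + log(pi_k).
    coef = {}
    for k in classes:
        w = solve(sigma, means[k])
        c = -0.5 * sum(mk * wk for mk, wk in zip(means[k], w)) + math.log(priors[k])
        coef[k] = (w, c)
    return {"classes": classes, "priors": priors, "means": means,
            "sigma": sigma, "coef": coef}


def predict_one(model, xi):
    """Predicted class for a single observation: the largest discriminant score."""
    def score(k):
        w, c = model["coef"][k]
        return sum(xj * wj for xj, wj in zip(xi, w)) + c
    return max(model["classes"], key=score)


def boundary_line(model):
    """With exactly two features, return (intercept, slope) of the boundary x2 = a + b * x1."""
    k0, k1 = model["classes"]
    (w0, c0), (w1, c1) = model["coef"][k0], model["coef"][k1]
    dw = [w1[0] - w0[0], w1[1] - w0[1]]
    dc = c1 - c0
    # The boundary is delta_1(x) = delta_0(x), i.e. dw[0]*x1 + dw[1]*x2 + dc = 0.
    return -dc / dw[1], -dw[0] / dw[1]


def accuracy(pred, truth):
    """Share of predictions that equal the true class."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def confusion(pred, truth, classes):
    """Counts keyed by (predicted, true), the orientation of R's table(pred, truth)."""
    return {(a, b): sum(1 for p, t in zip(pred, truth) if p == a and t == b)
            for a in classes for b in classes}
```

The list comprehension [predict_one(model, xi) for xi in x_test] plays the role that apply() over the rows of the test set (MARGIN = 1) plays in R. With Lag1 on the horizontal axis and Lag2 on the vertical axis, boundary_line gives the intercept and slope of the straight line separating the two predicted regions.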
Part 2. KNN
1. Program the KNN classification algorithm from scratch (do not use any built-in knn functions). Detailed information on the inputs and output of the function myknn can be found in the template. In Part 2 of this assignment, use the first half of the data set as the train set. Use Euclidean distance as the measure of similarity between observations.
2. Write a function MER that calculates the misclassification error rate. Test the performance of myknn and MER with K = 5. Additionally, produce a confusion matrix.
3. Compare the results of the pre-implemented knn() function with those of your own functions from Tasks 1 and 2.
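The Part 2 steps can likewise be sketched in plain Python. knn_predict, mer and confusion are hypothetical names standing in for myknn and MER; their signatures are assumptions, not the template's. Each test row is classified by a majority vote among the K training rows closest in Euclidean distance, and the misclassification error rate is the share of test observations whose predicted class differs from the true class, i.e. one minus the accuracy.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_x, train_y, test_x, k=5):
    """Classify each test row by majority vote among its k nearest training rows."""
    preds = []
    for xi in test_x:
        # Training indices sorted by distance to xi (stable sort keeps earlier rows on ties).
        order = sorted(range(len(train_x)), key=lambda i: euclidean(xi, train_x[i]))
        # Votes are counted nearest-first, so on a vote tie most_common() returns
        # the tied label that appears first, i.e. the one with the nearest neighbour.
        votes = Counter(train_y[i] for i in order[:k])
        preds.append(votes.most_common(1)[0][0])
    return preds


def mer(pred, truth):
    """Misclassification error rate: share of predictions that differ from the truth."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


def confusion(pred, truth):
    """Counts keyed by (predicted, true), the orientation of R's table(pred, truth)."""
    classes = sorted(set(pred) | set(truth))
    return {(a, b): sum(1 for p, t in zip(pred, truth) if p == a and t == b)
            for a in classes for b in classes}
```

One design difference is worth keeping in mind when comparing with knn() from the class package: according to its documentation it breaks voting ties at random and includes every training point tied for the K-th smallest distance in the vote, whereas this sketch keeps exactly K neighbours and resolves ties deterministically. With K = 5 and two classes a vote tie cannot occur, but distance ties can still make the results differ slightly.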

Remarks: Write comments for everything you do. Code that is not written using the template and/or that returns error messages will not be evaluated. If you are working in groups (no more than 5 students per group), make sure to note down every participant’s name and ID.
Submission: Submit your scripts via email to mokuneva[at]stat-econ.uni-kiel.de by the end of June 10th (i.e., before 00:00:00 on June 11th).