
University of Toronto Scarborough

Introduction to Machine Learning and Data Mining

CSCC11H3

Fall 2021

Take-home Final Exam

Due December 21, 2021 at 11:59 pm

Analysis of Stock Market Fluctuations, Anomalies and the Fear Index Using Data-driven Methodologies

Overview

Throughout this take-home exam, you will apply your machine learning and analytic skills to build an
algorithm that investigates the driving forces behind anomalous events in stock markets. In particular,
you will characterize the extent to which the interplay among economic, political, psychological and
behavioural forces affects stock market movements, by applying a variety of machine learning
concepts and techniques. The data provided to you cover diverse sources of information, mainly
retrieved from public news and social media, various government bodies and international
organizations, company balance sheets, and so on. The ultimate goal is to construct a classifier for
predicting the future direction of the VIX index (also known as the fear index), and hence of the stock
market, which in turn carries a great deal of valuable information. Along the way, you will use
probability mixture models, clustering algorithms, neural networks, logistic regression and class
conditionals, among other techniques.
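
Purely as an illustration of the final classification step (not a required design), a minimal baseline might look like the sketch below. The file name exam_data.csv, the column names date and vix, and the choice of logistic regression are all placeholder assumptions made for this sketch; the dataset you receive and the models you choose will differ.

    # A minimal, purely illustrative baseline for the direction-of-VIX task.
    # File name, column names and model choice are placeholder assumptions.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    # Hypothetical table: one row per trading day, a 'vix' column plus feature columns.
    df = pd.read_csv("exam_data.csv", parse_dates=["date"]).sort_values("date")

    # Binary target: does the VIX rise on the next trading day?
    df["vix_up_next"] = (df["vix"].shift(-1) > df["vix"]).astype(int)
    df = df.iloc[:-1]  # drop the last row, whose target is undefined

    feature_cols = [c for c in df.columns if c not in ("date", "vix", "vix_up_next")]
    X, y = df[feature_cols].values, df["vix_up_next"].values

    # Chronological split (no shuffling), so the test set lies in the "future".
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    # Standardize features using statistics from the training portion only.
    scaler = StandardScaler().fit(X_train)
    clf = LogisticRegression(max_iter=1000).fit(scaler.transform(X_train), y_train)
    print("Held-out accuracy:", clf.score(scaler.transform(X_test), y_test))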

Why this exam?
The take-home exam will walk you through the end-to-end process of investigating data through a
machine learning lens. It will teach you how to model complex phenomena based on the knowledge you
acquired during the semester, and how to develop algorithms that extract the salient features that best
represent your data. Furthermore, you will gain first-hand experience with some powerful machine
learning algorithms and, more importantly, invaluable experience evaluating the performance of your
implemented algorithms and validating the accuracy of their results.

What will I learn?
By the end of this exam, you will be able to:
– Deal with an imperfect, real-world dataset.
– Validate a machine learning result using test data, critical thinking and other analytic skills.
– Evaluate the results of your machine learning algorithms using different quantitative metrics (a
minimal example appears after this list).
– Create, select and transform features, and compare the performance of different machine learning
algorithms in order to fine-tune them.
– Communicate your ideas, methodologies and results concisely, coherently and formally.
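
For reference, the quantitative metrics mentioned above can be computed along the following lines. This is only a sketch: the y_test and y_pred values shown are made-up placeholders standing in for your own held-out labels and model predictions.

    # Illustrative only: common quantitative metrics for a binary classifier.
    from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                                 f1_score, confusion_matrix)

    y_test = [1, 0, 1, 1, 0, 0, 1, 0]  # placeholder ground-truth labels
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # placeholder model predictions

    print("accuracy :", accuracy_score(y_test, y_pred))
    print("precision:", precision_score(y_test, y_pred))
    print("recall   :", recall_score(y_test, y_pred))
    print("F1 score :", f1_score(y_test, y_pred))
    print("confusion matrix:")
    print(confusion_matrix(y_test, y_pred))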

Pep Talk
The exam might sound a bit complicated and time-consuming when you read its description for the
first time. However, I can assure you that it has been designed to be completed in 5-8 hours, and I
believe most of you will enjoy working on the problem once you get into it. Besides, I am scheduling
bi-daily office hours to help you along this path, and of course, you are always welcome to send me an
email and book a personal appointment.

Outline of the Written Report

(Please note that, in general, I am not expecting a work of literature; it is sufficient to convey your
main points and interpret your results meaningfully. That said, the following items form the structure
of most formal reports you will write in your future career, so I will mention some of them.)

1. Jupyter Notebook: You are required to prepare your report in a Jupyter notebook and submit a PDF
or HTML version of the final report on Crowdmark. Please organize your report into sections and use
the available headings and typesetting features to present your work clearly.

2. Title and Abstract: Choose an appropriate title and provide a brief, high-level abstract summarizing
the entire work, its main points and its most significant findings. Aim for at most 200 words.

3. Introduction: This part introduces the reader to the dataset and to the area to which it pertains.
Describe why this is an important problem to investigate and give the reader a review of pertinent
background information; in short, introduce the problem and explain why it merits investigation. The
introduction should be written at a very basic level (i.e., no mathematics or notation), and remember
that your reader may not know anything about the area in which you are writing. Aim for 1000 words
maximum. The introduction is also the best place to state the important and critical questions you
tried to tackle and answer, regardless of the final outcome. For instance, you may address the
following questions (these are just a few examples and the list is not exhaustive):

1. Is it even possible to forecast the stock market fear index? What are the reasons for and
against such a possibility?

2. Are the past data available to you sufficient for constructing a predictive model to forecast
the direction of future stock market movements?

3. Are there other factors which might have a strong influence on the behaviour one observes
in stock markets? If so, is it possible to omit these additional elements from the analysis and
simplify the problem to make its study more practical? If not, what would be the main
drawbacks of making such simplifications?

...

4. Model Specification: In this section, describe in clear detail the data analysis used to specify your
candidate models. Pretend you are taking readers by the hand and leading them through the thought
process behind your model selections. In doing this, however, try not to overdo the first-person
writing; it can sound less professional and less authoritative if you continually write things like, “I
tried this, and then I tried that …”.

5. Fitting and Diagnostics: This part of the report should describe the model fitting and diagnostic
techniques you used, with the goal of identifying a “final” model for forecasting. Also identify what
deficiencies your final model has; remember, no model is perfect. Be relatively detailed (as much as
your time allows) and use your critical thinking to come up with, and answer, questions about the
possible obstacles.

6. Forecasting: This section should describe the techniques you used to forecast future observations (a
minimal sketch of one way to evaluate such forecasts on held-out “future” data appears after this
outline). Why is forecasting important? What impacts could forecasting the VIX have?

7. Discussion: Here you want to offer a summary of what you have done and draw your main
conclusions. This is also a good place to discuss other issues related to the data analysis. For example,
does your analysis have any shortcomings or limits to its generalization? What were the main
problems you encountered? It is OK if your final model is not picture-perfect, as real-life data analysis
is often more difficult than textbook problems.

8. Bibliography: Cite all the references (if necessary).

9. Appendices: Use appendices to catalogue extra graphics, plots and output. Basically, an appendix is
a good place to house information that you want the reader to have access to, but that would
interrupt the flow of the main body of the report.
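
As a rough, purely illustrative reference for points 5 and 6 above: one common way to evaluate a forecasting model is walk-forward (expanding-window) validation, where the model is repeatedly trained on the past and scored on the block of observations that follows. The sketch below uses scikit-learn's TimeSeriesSplit with made-up placeholder data; your own features, labels and model will of course differ.

    # Illustrative only: walk-forward (expanding-window) evaluation of a forecaster.
    # X and y stand in for time-ordered feature rows and next-day direction labels;
    # the random data below are placeholders.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import TimeSeriesSplit

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 4))            # placeholder features
    y = (rng.random(500) > 0.5).astype(int)  # placeholder direction labels

    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
        # Each fold trains on the past and scores on the block that follows it.
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(clf.score(X[test_idx], y[test_idx]))

    print("per-fold accuracy:", np.round(scores, 3))
    print("mean accuracy    :", round(float(np.mean(scores)), 3))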

General Points and Grading Scale

1. You are permitted to use any library which would facilitate your understanding of the problem.

2. There is no specific target length for your report. You should do enough to provide an analysis of
this dataset, with attention paid to each section listed above. You may decide to include descriptions
of the methods of analysis which shed light on some of the fundamental questions you tried to answer.

3. Your reports will be read very carefully by my senior PhD assistant, under my supervision, in order
to ensure a fair assessment of your understanding of the course material and the abilities you have
developed throughout the course.

4. Your report will be graded out of a total of 100 points, based on your understanding of the context,
your analysis and your writing (this last component is not mandatory given the current circumstances):

1. Context: Were the questions answered in terms of the variables of the dataset? Have you
attempted to frame your conclusions and interpretations in a subject-matter context? Have you
provided some background information about the dataset and why it is of interest?

2. Analysis: Were the chosen models, graphs and data analyses appropriate for the problem?
Were the analyses carried out correctly? Were your conclusions about the data sensible and
clearly justified by numerical or graphical evidence?

3. Writing: How organized, clearly written and comprehensible is the report? Would a client
reading this report be confident that it was written by an educated, well-trained computer
scientist (allowing, of course, for the short period of time available to write it)?
