AFIN8015-Financial Data Science Assessment-2 (Session-1, 2022)
Data Analysis-I
Total Marks: 100
Submission Deadline: Assessment must be submitted by 11:59pm, 10 April 2022
General Instructions
• This assignment has two parts.
• Part-I is on theoretical background and descriptive analysis, and Part-II is on machine learning, specifically classification models.
• You have been assigned a company to work with (see ‘Assignment_afin8015_s1_2022.xlsx’ on ilearn). You must use the company listed against your name in your analysis.
• The data time period used in the assessment will be from 1 July 2019 to 15 March 2022.¹
• Both parts must be documented in one document as Part-I and Part-II.
• The assignment requires submission of your working R code files:
– All data files used in the code must be submitted.
– The code must be included in the appendix of the document and an R code file should be uploaded.
– You are not required to use the R Markdown format for this assignment but you are encouraged to use it.
• Your individual paper must not exceed 12 A4 pages in 11pt font with double (2.0) line spacing. This excludes any appendices, tables and lengthy R output you may elect to incorporate in the report.
• The word count mentioned in the questions is the maximum word count and excludes any figures and/or tables.
• Marks will be awarded for depth of coverage, quality of insight, succinctness and accuracy of answers.
• Marks will be deducted for poorly informed reports that lack proper formatting, referencing, etc. The following deductions will apply:
– No references (in-text and end-text), including the reference to the data source: -10
– No coversheet: -5
– Illegible presentation: -10
– Lack of informed research: -10
– Plagiarism will be dealt with according to the university policy, and a high similarity score will be penalised.
¹ Inform the unit convenor as soon as possible if the data time period or OHLC data is not available for the company allocated to you.
• The discussion must be informed by research and the report must cite all the sources.
• Both in-text and end-text citations are required. End-text references are excluded from the page limit. Use one citation style, either APA or Harvard.
• FACTSET is the preferred data source for the assignment, along with publicly available information from the company website and the ASX, and the sources mentioned in this document.
• Assignment (document) must include a cover sheet.
• A sample coversheet is provided on ilearn; you may choose to use it.
Please contact your unit convenor well before the submission deadline for any clarifications you may need on the assignment instructions. You may also post your questions on the discussion forum.
Assignment Questions
You are an intern data scientist at Shootingformars Corp. and have data analysis and machine learning skills, particularly in the financial service sector. As it happens, your mentor Ms Fowler has just been approached by Mr Musk, a new client who has recently started investing in the Share Market and has been using some past information to make his trading decisions on a daily basis. Mr Musk has also been researching during his spare time and has heard that modern Data Science methods such as Machine Learning can be used to predict the price direction for stocks and other financial assets. Unfortunately, he has limited understanding of the Data Science process and limited programming skills, although he does have some background in statistics.
After the initial meeting with Mr Musk, Ms Fowler has decided to treat this as an educational/proof-of-concept project and has brought you on board to conduct the analysis and prepare the documentation for the project.
Ms Fowler has assigned you a publicly traded stock listed in ‘Assignment_afin8015_s1_2022.xlsx’ and given you a set of tasks as listed in Part-I and Part-II of this document. Part-I aims to help Mr Musk develop a better understanding of Data Science and descriptive analysis using statistics and visualisation. Part-II uses the stock assigned to you to conduct a classification exercise for demonstration. The task requires you to create a professional-standard document to be presented to the client.
You have been given a choice between a traditional workflow, in which you create a Word document and code the methods separately in R before bringing them together in one document, and a reproducible workflow using an R Markdown file.
Part I. Data Science Concepts & Descriptive Analysis
Task-1 (10 Marks)
1. Explain the concept of Data Science, and outline and explain the Life Cycle of a Data Science Project. Use example(s) from the financial service sector domain.
(10 marks)
Go beyond the textbook and in-class resources to include recent developments, and explain the concept with the Financial Service Sector as the main domain. All references must be cited.
Task-2 (10 Marks)
1. Use FACTSET and download the daily Open, High, Low and Close (OHLC) Prices and Trading Volume for the company stock assigned to you from 01-July-2019 to 15-March-2022.
2. Use the closing prices and percentage logarithmic returns of the closing prices to generate descriptive statistics (including Skewness, Kurtosis and Test for Normal Distribution). Present the statistics in the document and briefly discuss the range, distribution and tail behaviour of the price and return series. Keep the discussion brief and to the point; remember that your client has some statistical background and understanding of the stock market. (Word limit: 250 words)
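As a rough illustration of item 2, the sketch below computes percentage log returns and moment-based descriptive statistics in base R. It is only a sketch under stated assumptions: the prices are synthetic stand-ins for the FACTSET download, the helper names (`skewness`, `kurtosis`, `describe`) are hypothetical, and the Shapiro-Wilk test is just one possible normality test (the brief does not prescribe one).

```r
# Synthetic closing prices stand in for the FACTSET download (assumption).
set.seed(1)
close <- 100 * exp(cumsum(rnorm(250, mean = 0, sd = 0.01)))

# Percentage logarithmic returns: r_t = 100 * ln(P_t / P_{t-1}).
log_ret <- 100 * diff(log(close))

# Moment-based skewness and kurtosis (hypothetical helper functions).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2   # equals 3 for a normal distribution
}

# One row of summary statistics for a numeric series.
describe <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    min = min(x), max = max(x),
    skewness = skewness(x), kurtosis = kurtosis(x))
}

stats_table <- rbind(close = describe(close), log_return = describe(log_ret))
print(round(stats_table, 4))

# Shapiro-Wilk test from the base stats package; a small p-value is
# evidence against normality of the return series.
print(shapiro.test(log_ret))
```

A kurtosis well above 3 in the return row would point to heavier tails than a normal distribution, which is the kind of tail behaviour the discussion is asked to cover.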
Task-3 (10 Marks)
1. Plot and present the closing prices and log returns using the ggplot2 package in R. Hint: One way is to extract the dates, closing prices and returns into a data frame and convert it from wide to long format.
2. Use the last 6 months of OHLC prices and the Volume data to plot the following charts:
(a) Line Chart
(b) Candlestick Chart
(c) Add the following Technical indicators to the candlestick chart
i. 5 Day Simple Moving Average
ii. 5 Day Exponential Moving Average
(6 marks: 2+2+2)
3. Comment on the trend and price direction based on the plots generated in 1 and 2 above.
(Word limit: 150 words)
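To make the two moving averages in item 2(c) concrete, here is a minimal base R sketch. The function names `sma` and `ema` and the example prices are hypothetical. Seeding the exponential average with the simple average of the first n observations is an assumption, and library implementations may seed differently.

```r
# Simple moving average: mean of the latest n observations,
# NA until n observations are available.
sma <- function(x, n) {
  stopifnot(length(x) >= n)
  out <- rep(NA_real_, length(x))
  for (t in n:length(x)) out[t] <- mean(x[(t - n + 1):t])
  out
}

# Exponential moving average with smoothing factor alpha = 2 / (n + 1),
# seeded with the simple average of the first n observations (assumption).
ema <- function(x, n) {
  stopifnot(length(x) >= n)
  alpha <- 2 / (n + 1)
  out <- rep(NA_real_, length(x))
  out[n] <- mean(x[1:n])
  if (length(x) > n) {
    for (t in (n + 1):length(x)) {
      out[t] <- alpha * x[t] + (1 - alpha) * out[t - 1]
    }
  }
  out
}

close <- c(10, 11, 12, 13, 14, 15, 16)  # hypothetical closing prices
sma5 <- sma(close, 5)
ema5 <- ema(close, 5)
print(data.frame(close, sma5, ema5))
```

The exponential average weights recent prices more heavily, so on real data it usually turns sooner than the simple average when the trend changes. On this perfectly linear toy series the two coincide.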
Part II. Classification Models & Application
Task-4 (10 Marks)
1. As Mr Musk has limited exposure to Machine Learning (ML) and its various methods, you are tasked with conducting a short review of ML and ML methods with a focus on Classification models. Your review should also include the following:
(a) An overview of Machine Learning.
(b) Discussion on Supervised and Unsupervised Machine Learning and the following methods.
i. Logistic Regression
ii. K-Nearest Neighbour Algorithm
(c) As the modelling task requires you to conduct a price direction forecast exercise, the review should also include examples of previous research using ML for stock price movement/direction prediction.
Go beyond the textbook and in-class resources to include recent developments and research. All references must be cited. (Word Limit: 300 words)
(10 marks)
Task-5 (60 Marks)
Your final task is to conduct a proof of concept comparative analysis of two classification methods to demonstrate classification and predictive ability of ML methods in modelling and predicting the price direction based on various technical indicators. Specifically, the task should conduct the following:
1. Select the closing prices from the OHLC stock price data downloaded from FACTSET (same as in Task-2) and create the one-period lags of the following technical indicators.²
(a) Moving Average: 10 day moving average
(b) Log returns
(c) MACD (default values for nFast, nSlow and nSig)
(d) Exponential Moving Average: 10 day
(e) Momentum: 5 day
(10 marks: 2×5)
2. Create a dichotomous price direction indicator output variable based on the 4 day lagged price (this is not a lagged indicator):
\[
\text{Direction}_t =
\begin{cases}
1 & \text{if } P_t \ge P_{t-4} \\
0 & \text{otherwise}
\end{cases}
\]
3. Combine the indicators in a data frame and visualise the data using
(a) A time series plot, and
(b) Box plots of indicators categorised by price direction
(8 marks: 4+4)
4. Create a 70:30 training and testing sample from the dataset and conduct a classification exercise using Logistic Regression. The analysis should include the following:
(a) Training on the training sample using a ‘timeslice’ sampling. Use at least 250 days as window size and 14 days for prediction horizon in a fixed window.
(b) Data pre-processing to standardise the data.
(c) Prediction on the test set and corresponding confusion matrix.
(d) Brief discussion on the accuracy of the prediction based on the confusion matrix.
(20 marks)
5. Conduct the classification exercise (as in 4 above) using the k-Nearest Neighbours algorithm. The analysis should include the following:
(a) An odd-number grid search for the ‘k’ parameter from 1 to 30.
(b) Prediction on the test set and corresponding confusion matrix.
(c) Brief discussion on the accuracy of the prediction based on the confusion matrix.
(15 marks)
² Hint: Use the TTR and quantmod packages.
6. Compare the performance of the Logistic Regression Model and k-NN model based on their accuracy (based on the confusion matrix from the two models) and provide a recommendation for Mr Musk. (Word Limit: 150 words)
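The flow of items 1 to 4 (lagged indicators, direction variable, 70:30 split, standardisation, logistic regression and confusion matrix) might be sketched in base R as follows. This is an illustration under several assumptions, not a model answer: the prices are synthetic, only three of the five indicators are built (MACD and the 10 day EMA are omitted for brevity), the base R calculations are stand-ins for the TTR functions the hint points to, the split is taken chronologically, and the helper `lag1` is hypothetical.

```r
set.seed(42)
n <- 400
close <- 50 * exp(cumsum(rnorm(n, 0, 0.01)))   # synthetic prices (assumption)

lag1 <- function(x) c(NA, x[-length(x)])        # one-period lag (hypothetical helper)

# Indicators at time t, computed in base R as stand-ins for TTR functions.
sma10  <- as.numeric(stats::filter(close, rep(1 / 10, 10), sides = 1))
logret <- c(NA, diff(log(close)))
mom5   <- c(rep(NA, 5), diff(close, lag = 5))

# Direction: 1 if P_t >= P_{t-4}, otherwise 0 (not lagged).
direction <- c(rep(NA, 4), as.integer(close[5:n] >= close[1:(n - 4)]))

dat <- na.omit(data.frame(
  direction = factor(direction, levels = c(0, 1)),
  sma10 = lag1(sma10), logret = lag1(logret), mom5 = lag1(mom5)
))

# Chronological 70:30 split (assumption: no shuffling of time-ordered rows).
n_train <- floor(0.7 * nrow(dat))
train <- dat[seq_len(n_train), ]
test  <- dat[(n_train + 1):nrow(dat), ]

# Standardise using training-set mean and sd only, to avoid look-ahead.
x_cols <- c("sma10", "logret", "mom5")
mu <- colMeans(train[x_cols])
s  <- apply(train[x_cols], 2, sd)
train[x_cols] <- as.data.frame(scale(train[x_cols], center = mu, scale = s))
test[x_cols]  <- as.data.frame(scale(test[x_cols],  center = mu, scale = s))

# Logistic regression; predicted probabilities refer to direction == 1.
fit  <- glm(direction ~ sma10 + logret + mom5, data = train, family = binomial)
prob <- predict(fit, newdata = test, type = "response")
pred <- factor(as.integer(prob >= 0.5), levels = c(0, 1))

# Confusion matrix and accuracy on the test set.
cm <- table(predicted = pred, actual = test$direction)
accuracy <- sum(diag(cm)) / sum(cm)
print(cm)
print(accuracy)
```

Scaling the test set with the training-set mean and standard deviation keeps information from the test period out of the pre-processing step.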
Your final report must include both Part-I and Part-II and must contain the output from the analysis conducted in R. Final code and data files must be submitted on the relevant links on ilearn.
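For the ‘timeslice’ sampling in item 4(a) and the odd-number grid in item 5(a), the sketch below shows one way the pieces could fit together in base R. It rests on assumptions: the features and labels are synthetic, `time_slices` and `knn_predict` are hypothetical helpers written for illustration, and the fixed window advances one observation at a time. The brief does not name a package for this step, so a library routine may behave differently.

```r
# Fixed-window time slices: each slice trains on `window` consecutive
# observations and validates on the next `horizon` observations.
time_slices <- function(n_obs, window = 250, horizon = 14) {
  stopifnot(n_obs >= window + horizon)
  starts <- seq_len(n_obs - window - horizon + 1)
  lapply(starts, function(s) list(
    train = s:(s + window - 1),
    test  = (s + window):(s + window + horizon - 1)
  ))
}

# k-nearest neighbours by majority vote on Euclidean distance.
# train_y holds 0/1 labels; an odd k avoids tied votes.
knn_predict <- function(train_x, train_y, test_x, k) {
  apply(test_x, 1, function(row) {
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    votes <- train_y[order(d)[seq_len(k)]]
    as.integer(mean(votes) >= 0.5)
  })
}

set.seed(7)
n_obs <- 280
x <- matrix(rnorm(n_obs * 2), ncol = 2)                # synthetic standardised features
y <- as.integer(x[, 1] + rnorm(n_obs, sd = 0.5) > 0)   # synthetic 0/1 direction

slices <- time_slices(n_obs)
k_grid <- seq(1, 29, by = 2)   # odd values of k between 1 and 30

# Average validation accuracy across all slices for each candidate k.
cv_accuracy <- sapply(k_grid, function(k) {
  mean(sapply(slices, function(s) {
    pred <- knn_predict(x[s$train, , drop = FALSE], y[s$train],
                        x[s$test, , drop = FALSE], k)
    mean(pred == y[s$test])
  }))
})

best_k <- k_grid[which.max(cv_accuracy)]
print(data.frame(k = k_grid, accuracy = round(cv_accuracy, 3)))
print(best_k)
```

The selected k would then be used to refit on the full training sample and predict on the held-out test set, giving a confusion matrix comparable with the logistic regression one.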
**End of Assignment Questions**