2020/2021 MSCI 570 Forecasting Coursework Part 1
Coursework Information & Submission
This is the first of two individual assignments. It is weighted 40% and requires you to explore a dataset containing a single time series, individually assigned to you, in order to understand the data. Completing this assignment will prepare you for the second assignment, weighted 60%, in which you will forecast the same time series with multiple forecasting algorithms and models.
The coursework deadline is November 23, 2020, 10:00 am.
Standard departmental penalties apply to late work unless the course administrator has granted you an extension for exceptional reasons. All submissions will be checked with plagiarism-detection software. Coursework must be submitted online via Moodle: submit your report PLUS all R scripts, included in the appendix.
Assignment: Data Exploration of a real-world time series
Your task is to explore and critically discuss the time series pattern(s) of a real-world time series. Document your findings comprehensively in a technical report, making adequate use of (readable and correctly labelled) graphs, which you should also critically discuss to support your arguments. Base your justification on evidence and document your iterative data exploration process throughout, where useful transforming the time series and analysing the resulting patterns.
40% of points – Explore graphically & verbally
Explore the regular and irregular components of the time series, making good use of graphs, plots and descriptive summary statistics. Critically discuss in words the patterns you observe, both regular (level, trend, season, etc.) and irregular (outliers, breaks, etc.), and conclude which patterns are present. Consider transforming the time series to highlight individual patterns, either by removing patterns or by aggregating the time series to a lower frequency (e.g. quarterly, monthly or weekly buckets instead of daily).
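As an illustration of temporal aggregation, the following minimal sketch sums daily observations into calendar-month buckets and into complete 7-day blocks. It is written in Python with only the standard library for self-containment, although the course recommends R; the function names are hypothetical, and summing assumes a flow variable such as daily demand (for a stock variable, such as a price or an inventory level, averaging within each bucket is usually more appropriate).

```python
from collections import OrderedDict
from datetime import date, timedelta


def to_monthly(dates, values):
    """Sum daily values into calendar-month buckets keyed by (year, month).

    Hypothetical helper: assumes `dates` and `values` are aligned and that the
    series is a flow variable, so summing within a bucket is meaningful.
    """
    totals = OrderedDict()
    for d, v in zip(dates, values):
        key = (d.year, d.month)
        totals[key] = totals.get(key, 0.0) + v
    return totals


def to_weekly(values, block=7):
    """Sum consecutive non-overlapping blocks of `block` days.

    An incomplete final block is dropped so that every bucket covers the same
    number of days; a partial bucket would otherwise look like an outlier.
    """
    n_full = len(values) // block
    return [sum(values[i * block:(i + 1) * block]) for i in range(n_full)]


if __name__ == "__main__":
    # Synthetic illustration only: 60 days from 1 January 2020 with a toy
    # day-of-week cycle added to a constant level of 10.
    start = date(2020, 1, 1)
    days = [start + timedelta(days=i) for i in range(60)]
    demand = [10.0 + (i % 7) for i in range(60)]
    print(to_monthly(days, demand))
    print(to_weekly(demand))
```

In the toy data every complete weekly total is identical, which shows how aggregating to weekly buckets removes a day-of-week pattern while preserving the level; in R, the `aggregate()` function applied to a `ts` object serves a similar purpose.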
20% of points – Statistical Tests
Explore the data using statistical tests. Complement your visual analysis with statistical tests, interpret their outputs, and discuss any discrepancies with your visual findings. Consider using alternative packages in R to find suitable tests. Note that important tests relate not only to stationarity (e.g. the ADF test) but also to regular time series patterns (e.g. seasonality, trend, etc.) and irregular time series patterns (outliers, structural breaks).
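Many such tests are available ready-made in R packages. Purely to illustrate how a trend test turns a visual impression into a p-value, the following sketch implements the Mann-Kendall test for a monotonic trend in plain Python. It is a simplified version under stated assumptions: no tied values (so no tie correction in the variance) and serially independent observations, which means strongly autocorrelated or seasonal series would need adjusting first. The function name `mann_kendall` is hypothetical.

```python
import math


def mann_kendall(x):
    """Mann-Kendall test for a monotonic trend (simplified: no tie correction).

    Returns (S, z, p): S is the Kendall score, z the continuity-corrected
    normal statistic, and p the two-sided p-value under H0 of no monotonic trend.
    """
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # S = number of increasing pairs minus number of decreasing pairs (j > i).
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under H0, assuming no ties.
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity correction: move S one unit towards zero.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value using the standard normal CDF via math.erf.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p = 2.0 * (1.0 - phi)
    return s, z, p
```

A p-value below the chosen significance level (e.g. 0.05) would support a trend seen in the time series plot; a large p-value alongside an apparent visual trend is exactly the kind of discrepancy the report should discuss.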
20% of points – ACF Analysis
Explore the data using the ACF and PACF. Plot the ACF and/or PACF graphs of the time series as appropriate. Also consider iteratively transforming the time series, depending on the patterns found, to address non-stationarity, trends and/or seasonality, and demonstrate the effect of each transformation on the ACF/PACF graphs with a correct interpretation.
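To illustrate how a transformation changes the ACF, the following sketch computes the sample autocorrelation function and applies ordinary or seasonal differencing. It uses plain Python for self-containment; in R, the built-in `acf()`, `pacf()` and `diff()` functions serve these purposes. The helper names are hypothetical, the series is synthetic, and the bound 1.96/sqrt(n) is the usual approximate 95% limit for the ACF of white noise.

```python
import math


def difference(x, lag=1):
    """Return x_t - x_{t-lag}; lag=1 removes a linear trend, lag=m a season of length m."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag using the full-sample mean and variance."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; the ACF is undefined")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def white_noise_bound(n):
    """Approximate 95% significance bound for the ACF of a white-noise series."""
    return 1.96 / math.sqrt(n)


if __name__ == "__main__":
    # Synthetic series: linear trend plus a period-4 seasonal pattern.
    season = [3.0, -1.0, 0.0, -2.0]
    x = [0.5 * t + season[t % 4] for t in range(48)]
    d = difference(x, 1)
    print("bound      :", round(white_noise_bound(len(d)), 3))
    print("raw ACF    :", [round(r, 2) for r in acf(x, 8)])
    print("diff(1) ACF:", [round(r, 2) for r in acf(d, 8)])
```

On this synthetic series the raw ACF decays slowly because of the trend, whereas after first differencing the ACF instead shows pronounced spikes at the seasonal lags 4 and 8, suggesting seasonal differencing as the next step in the iterative process.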
10% of points – Conclusions
Conclude by recommending one (or multiple) suitable algorithm(s) and forecasting model form(s), and critically discuss your choice, weighing the different options. As time series patterns are not always clear, there are often multiple suitable forecasting models for each time series. Please recommend all that are suitable.
10% of points – General report writing skills
General report writing skills, which are considered in marking each section, include critical discussion of findings, thoroughness of documentation, clarity of arguments, structure of the report and readability of the report (e.g. absence of spelling and grammatical mistakes). Please see the general suggestions on writing a report below for some further technical considerations.
SUM 100%
We highly recommend using R, but you are free to use any other software, provided you report which software you used.
Please also consider the general recommendations on writing a technical report below!
General suggestions on writing a report
The coursework requires you to document your analysis and critically discuss your chosen experimental design, modelling approaches and results in a technical report. Write this report for an Analytics specialist (e.g. someone who holds an MSc from Lancaster University, has taken the MSCI750 course, and wants to evaluate both your results AND your decision-making process to judge your modelling skills and whether you have missed anything). This means that you are not required to write general descriptions (e.g. of what a statistical test, the ACF or Exponential Smoothing is), as an Analytics expert would already know this! Instead, the report should document your modelling process so that the reader can understand your choices and replicate your experiments.
The report should contain an introduction, a summary with conclusions on your findings, numbered headings, a list of figures and tables, and an executive summary (tailored to senior management) presenting the most relevant findings. The report should have a logical and concise structure, be generally “readable”, and support your argument using plots of time series, forecasts and/or accuracy. Make adequate use of graphs showing the time series, model fit/predictions and residuals to support your arguments (all graphs must be fully readable and labelled), as well as tables to compare results.
The page limit for the report is 10 pages (note that this is a maximum to make your life easier: you can produce shorter reports! Only the main text, including graphs and tables, counts towards the limit; the cover sheet, executive summary, contents page and appendices do not). Reports of excessive length will be penalised by 10 marks (i.e. 10% of 100), but only if they include unnecessary material. For formatting, use single spacing, set normal text in Times New Roman, font size 12, set text in tables and figure and table headings in font size 10, and leave 2 cm margins on the left and right.
Extensive evidence (e.g. the complete output of statistical tests) may be placed in the appendix, but must be referenced directly at the corresponding place in the main text; otherwise it is not taken into consideration. Include any technical details and printouts that support your arguments in a set of appendices, which do not count towards the page limit (e.g. put the printouts from ADF tests in the appendix and state only the conclusion of significance or insignificance at a given probability level in the main text). You must ensure that the main text is readable and that your argument is coherent without the reader needing to consult the appendices. All parts of the text supported by an appendix must cross-reference the relevant part directly.
Non-disclosure clause: these datasets and the coursework task are subject to copyright © by Sven Crone, all rights reserved. By downloading the documents and submitting the assignment for assessment, you are deemed to accept the copyright agreement. Any publication of the dataset, the coursework task or its solution, or any part thereof (e.g. on a coursework website or a social network site), will be considered a violation of copyright. Anyone breaching the copyright may be held liable for damages in an international lawsuit. Furthermore, such publication will cause the assignment to be treated as plagiarism, even retrospectively after the MSc degree has been awarded, leading to a mark of zero, with the usual right of appeal to the university court in an official hearing.
Contact details:
Questions regarding the coursework: Sven F. Crone
Room A53a
s.crone@lancaster.ac.uk
Questions regarding R and workshops: Anna-Lena Sachs
Room 55
a.sachs@lancaster.ac.uk
If you have any questions, please don’t hesitate to contact us! Please bear in mind that I cannot always respond within a few hours, so don’t leave questions to the last minute: start early!
Best of luck! Anna-Lena & Sven