UNIVERSITY OF GLASGOW
ADAM SMITH BUSINESS SCHOOL
Data Science & Machine Learning in Finance (ACCFIN5246) Assignment 1 – Spring 2024
A. Instruction
- This assignment counts towards 35% of the overall course grade. This is an individual assessment. Answer all questions. Submissions are to be made electronically via the course Moodle page. Each part specifies further instructions. The grading weights are described below:
Part     1     2     3     4
Weight   10%   30%   30%   30%
- Results should be reported in a clear format. Avoid reporting numbers in 'scientific format', e.g. 7.2031e-06. All reported numbers should be rounded to two decimal places. For example, report 0.00 in place of 7.2031e-06.
Report Organization
The requested results are described under Section (E.), Parts [1]-[4]. Clearly number each part as [1]-[4] in the report. The contents are to be structured as follows:
1. two numbers reported under Part [1], xr% and xrm%, and a diagram (with captions, labels, legend, etc.) showing the cumulative daily log-returns {$r_t$, $r_{M,t}$, $r_{f,t}$} over the time horizon,
2. three diagrams (with captions, labels, legend, etc.) each including three series of estimation outputs described under Part [2],
3. the optimal thresholds $\tau_j^*$ and the FMSE($\tau_j^*$) values associated with (c1)-(c2) under Part [3], and
4. comments of 500 words under Part [4].
The grading is carried out strictly based on the precision of results and clarity of visualisations. The final part is graded based on the relevance of finance analysis supported by the methodologies and empirical results.
B. Data Acquisition
Obtain data for the variables needed to construct and estimate the model in Section (D.). The data should cover the period 2000/01/03-2022/12/30, on a daily basis. When acquiring the data, ensure that relevant characteristics, such as calendar dates and timestamps, are obtained, as these additional characteristics are essential throughout the data cleaning and dataset arrangement.
($r_t$) real log-returns, to be constructed from the Microsoft stock price, acquired from WRDS¹
¹ wharton.upenn.edu: Get Data, CRSP, Annual Update, Stock / Security Files, Daily Stock File
($r_{M,t}$) real market log-returns, to be constructed from the S&P500 composite market index, acquired from WRDS
($r_{f,t}$) real interest rates, associated with US 10-year maturity treasuries, acquired from FRED²
(CPI) the US consumer price index, which may be used to transform nominal data to real terms³
All series must be researched thoroughly to ensure consistency with the other variables in terms of economic interpretation, units, frequency, and other characteristics.
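As an illustration of the unit and frequency checks, the following is a minimal Python sketch (Python is used for illustration only; the brief itself points to MATLAB routines). It assumes the nominal price and the CPI level have already been matched to the same pair of dates, and that the treasury yield is quoted as an annualised percentage. Both function names are hypothetical, and the 365-day divisor follows the trading-year convention stated under Data Preparation.

```python
import math

def real_log_return(p_prev, p_now, cpi_prev, cpi_now):
    """Nominal log-return minus log inflation over the same interval."""
    return math.log(p_now / p_prev) - math.log(cpi_now / cpi_prev)

def daily_log_rate(annual_pct, days_per_year=365):
    """Annualised yield in percent -> per-day log rate (assumed convention)."""
    return math.log(1.0 + annual_pct / 100.0) / days_per_year
```

How a monthly CPI series is mapped onto daily dates is left open here; that choice should be researched and stated in the report.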
C. Data Preparation
Construction of the financial dataset must first take into account the possibility of sporadic observations. When multiple series are used within the same model, variable timestamps must be aligned.
The combined dataset, including all variables alongside a common timeline, may then contain missing values, NaNs and other irregularities; the data cleaning therefore retains datapoints only when all observations are recorded and economically meaningful at each date.
The definition of daily log-return provides a measurement for value changes between consecutive observation points, which may or may not be consecutive days as a result of discarding unbalanced observations. In reality, when multiple days are omitted, for example as a result of an unbalanced dataset, computed financial returns are split based on the time distance between the two neighbouring datapoints, to adjust for the corresponding performance over a fixed 24-hour window. For simplicity and consistency in this empirical exercise, a daily return is generated from two adjacent post-cleaning datapoints, regarding the gap between them as one calendar day.
Assume a 'calendar year' comprises exactly 52 weeks. This may amount to minor discrepancies, since a year ≈ 52.17 weeks; disregard this discrepancy and set each window (w) to contain exactly 52 consecutive weekly datapoints.
When required, assume a trading year comprises 365 days, disregarding any variations such as leap years or public holidays that affect the number of trading days.
Weekly log-returns are defined as the percentage value change from one week's first trading day to the next week's first trading day.
The cleaned dataset must be arranged in both daily and weekly frequencies in preparation for various results requested in Section (E.).
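The alignment and cleaning steps described in this section can be sketched in Python as follows (illustration only; the brief recommends MATLAB timetables). The layout, with each series held as a dict from a sortable date key to a number, and both function names are assumptions. The sketch only screens for missing dates and non-finite values; any further test of what counts as "economically meaningful" is left to the analyst.

```python
import math

def align_and_clean(*series):
    """Keep dates present in every series with finite values, sorted ascending.

    Each series is a dict mapping a sortable date key to a number
    (a hypothetical layout chosen for this sketch).
    """
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)
    keep = [d for d in sorted(common)
            if all(math.isfinite(s[d]) for s in series)]
    return keep, [[s[d] for d in keep] for s in series]

def log_returns(levels):
    """Log-return between each pair of adjacent retained datapoints."""
    return [math.log(b / a) for a, b in zip(levels, levels[1:])]
```

Because returns are taken between adjacent retained points, a gap left by a discarded date is treated as a single step, matching the simplification adopted above.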
² fred.stlouisfed.org/, key: DGS10
³ fred.stlouisfed.org/, key: CPILFESL
D. The Model
Consider the capital asset pricing model characterised by the following specification, used to interrelate the real excess log-return on a given asset, $r_t - r_{f,t}$, where $r_{f,t}$ is the risk-free rate, to the market real log-return denoted by $r_{M,t}$:
$$ r_t - r_{f,t} = \alpha_w + \beta_w (r_{M,t} - r_{f,t}) + u_t \qquad (1) $$
Note that the object of interest is the time-varying feature of the coefficients $\hat{\alpha}_w$ and $\beta_w$. In particular, $\beta_w$ summarizes the conditional relationship, given a rolling window incorporating a consecutive but limited span of data, between the market excess log-return $r_{M,t} - r_{f,t}$ and an individual investment excess log-return. The diagram below illustrates the overlapping windows (w), each including a calendar year of data:
Figure 1: The timeline illustrates a rolling window set-up in which each iteration includes 52 consecutive weekly datapoints; W1, W2, … refer to week numbers throughout the entire sample and $w_i$ refers to a rolling window identifier.
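The rolling set-up in Figure 1 can be sketched in Python as below (illustration only). The one-week step between windows is an assumption read off the figure, and the function names are hypothetical. The estimator shown is plain unconstrained least squares for specification (1), included only to show how the windows are consumed.

```python
def rolling_windows(n_obs, width=52, step=1):
    """(start, end) index pairs, end exclusive, for every full window."""
    return [(s, s + width) for s in range(0, n_obs - width + 1, step)]

def ols(x, y):
    """Intercept and slope of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - beta * mx, beta

def rolling_capm(xrm, xr, width=52):
    """One (alpha_w, beta_w) pair per window of weekly excess returns."""
    return [ols(xrm[s:e], xr[s:e]) for s, e in rolling_windows(len(xr), width)]
```

With 55 weekly observations this yields four windows, the first covering W1 to W52 and the last covering W4 to W55.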
E. Implementations and Results
Based on the dataset and instructions in Sections (B.)-(C.), complete the following parts.
Part [1] Construct daily real excess MSFT log-returns (xr) and daily real excess market log-returns (xrm). Report (i) the precise values of two averages, xr% and xrm%, over the entire sample as daily net log-return averages (rounded to two decimal places) and (ii) a diagram depicting the cumulative daily log-returns {$r_t$, $r_{M,t}$, $r_{f,t}$}, overlaid within the same diagram⁴ space, with the vertical axis showing the cumulative log-returns cumsum(rets) and the horizontal axis showing a representation of calendar time. (Mark: 10%)
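The calculations behind the two averages and the cumulative series can be sketched in Python as follows (illustration only; plotting is omitted). The function names are hypothetical, and reading "xr%" as the sample mean expressed in percent is an assumption.

```python
from itertools import accumulate

def excess(returns, risk_free):
    """Element-wise excess log-return r_t - r_f,t."""
    return [r - f for r, f in zip(returns, risk_free)]

def mean_pct(series):
    """Sample average in percent, formatted to two decimal places."""
    return f"{100.0 * sum(series) / len(series):.2f}"

def cumulative(series):
    """Running sum, the analogue of cumsum(rets)."""
    return list(accumulate(series))
```

Formatting with a fixed number of decimals also avoids scientific notation in the reported values.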
Arrange the dataset, based on a calendar variable, such that all data are set to a weekly frequency. The construction must be based on the first trading days of two consecutive weeks. Proceed to Parts [2]-[4] based on this data frequency. Each window within the rolling specification contains 52 adjacent weekly observations.
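One way to pick each week's first trading day is sketched below in Python (illustration only; the brief recommends MATLAB's week() and weekday()). Using ISO weeks, which start on Monday, is an assumption, as are the function names and the choice to stamp each weekly return with the later of its two dates.

```python
import math
from datetime import date

def first_trading_days(dates):
    """First available date in each ISO (year, week); input sorted ascending."""
    seen, firsts = set(), []
    for d in dates:
        key = tuple(d.isocalendar())[:2]  # (ISO year, ISO week number)
        if key not in seen:
            seen.add(key)
            firsts.append(d)
    return firsts

def weekly_log_returns(dates, prices):
    """Log-return from one week's first trading day to the next week's."""
    level = dict(zip(dates, prices))
    firsts = first_trading_days(dates)
    rets = [math.log(level[b] / level[a]) for a, b in zip(firsts, firsts[1:])]
    return firsts[1:], rets
```

A week whose Monday is a holiday is represented by its first available trading day, so no week is dropped merely because Monday is missing.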
⁴ Overlaid diagrams are created via various approaches, e.g. use plot(xseries,yseries); hold on, followed by additional plots in the same format, ending the last plot with hold off.
Part [2] Implement a restricted least squares model based on specification (1) in addition to the following constraints, implemented individually across two cases $c_j$, $j = 1, 2$: $\alpha_w \geq \tau$
Assume $\tau = 0$. Store all values obtained for $\hat{\alpha}_w(c_j)$ and $\hat{\beta}_w(c_j)$ and the estimated Lagrange multiplier(s) $\lambda_w(c_j)$ for each of the cases described in expressions (c1)-(c2). Present the results in three diagrams, one for each of the estimated variables, along the time horizon. For example, the first diagram depicts the two $\hat{\alpha}_w$ series in expressions (c1)-(c2) (overlaid in the same diagram on the vertical axis) versus time (horizontal axis, displayed as the year or a simplified date format), with the diagram legend identifying each series as the outcome for expressions (c1)-(c2).⁵ The lines must be clearly identifiable by either colors or line patterns. Clearly label all axes with variable names and their units. (Mark: 30%)
Part [3] Assume $\tau$ is now variable and implement the following optimization. Consider a linear regression of the LHS of expression (1) on the lagged value of $\lambda_w(c_j)$, i.e. for the two cases $j = 1, 2$: $xr_t = \theta_0 + \theta_1 \hat{\lambda}_{w,t-1}(c_j) + \nu_t$. Compute the forecast MSE (FMSE) of 1-step-ahead predictions (based on 52 consecutive observations to predict one week ahead, using weekly data) for both expressions (c1)-(c2) throughout the series horizon, computed separately for every point in a grid for the parameter $\tau \in [-1\% : 0.01\% : +1\%]$, i.e. the points are defined as −1%, −0.99%, …, +0.99%, +1.00%. Based on the FMSEs (lower FMSE is better) for each of the 201 cases within the grid search, report the best value $\tau_j^*$ for expressions (c1)-(c2) that generates the lowest FMSE, together with the FMSE($\tau_j^*$) values for both expressions (no additional comments). (Mark: 30%)
Part [4] Explain, with comments, why predictions based on expressions (c1)-(c2) should result in the FMSE values (ranking) above. Comments should draw on finance analysis in connection with the empirical framework developed in Parts [1]-[3]. The research may refer to the two references (G[5], G[6]). (Mark: 30%, 500 words)
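For the restricted least squares step in Part [2], the single-window problem with the constraint $\alpha_w \geq \tau$ admits a closed form, sketched below in Python (illustration only; the brief recommends lsqlin). Only the constraint legible in this brief is covered. The function names are hypothetical, and the multiplier is scaled for the objective of one half of the sum of squared residuals, which is the objective lsqlin documents.

```python
def ols(x, y):
    """Intercept and slope of a simple least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - beta * mx, beta

def fit_alpha_floor(x, y, tau=0.0):
    """Least squares of y on (1, x) subject to alpha >= tau.

    Returns (alpha, beta, lam). lam is zero when the constraint is slack;
    otherwise it equals minus the sum of residuals at the bound.
    """
    alpha, beta = ols(x, y)
    if alpha >= tau:
        return alpha, beta, 0.0
    # Constraint binds: fix alpha at tau and refit the slope through (0, tau).
    beta = sum(a * (b - tau) for a, b in zip(x, y)) / sum(a * a for a in x)
    lam = -sum(b - tau - beta * a for a, b in zip(x, y))
    return tau, beta, lam
```

Applying this to each 52-week window gives one ($\alpha_w$, $\beta_w$, $\lambda_w$) triple per window; a strictly positive multiplier marks the windows in which the bound is active.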
F. Computational Notes
While calendar timeline may be handled via various approaches, using the following built-in libraries is recommended:
• Utilising the timetable variable type facilitates timestamped operations. Upon inputting the variable type⁶, (i) week() returns the week number in a given year, (ii) weekday() returns the day position in a given week.
• Inequality constraints within the constrained least-squares method can be implemented via various approaches (G[3], G[4]). The built-in routine lsqlin is a suitable and efficient method for solving these problems (G[2]).
⁵ Similarly, a diagram including the $\beta_w$ series in expressions (c1)-(c2), and a separate diagram including the $\lambda_w$ series in expressions (c1)-(c2).
⁶ Timetables & Functions.
• The grid search can be constructed with linspace(lb,ub,201) for the case described in Part [3]; the function generates 201 evenly spaced values between lb and ub, endpoints included.
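The grid search and the one-step-ahead forecast error in Part [3] can be sketched in Python as follows (illustration only; the names are hypothetical). The sketch assumes the predictor passed in is already lagged, so that z[t] holds the multiplier from the previous week, and that more observations than the window width are available. When the predictor is constant within a window, for example all zeros, the forecast falls back to the window mean; that fallback is a design choice of this sketch.

```python
def tau_grid(lb=-0.01, ub=0.01, n=201):
    """Evenly spaced points with both ends included, like linspace(lb, ub, n)."""
    step = (ub - lb) / (n - 1)
    return [lb + k * step for k in range(n - 1)] + [ub]

def rolling_fmse(y, z, width=52):
    """Mean squared one-step-ahead error; z[t] is the already-lagged predictor."""
    errors = []
    for t in range(width, len(y)):
        zs, ys = z[t - width:t], y[t - width:t]
        mz, my = sum(zs) / width, sum(ys) / width
        szz = sum((a - mz) ** 2 for a in zs)
        sxy = sum((a - mz) * (b - my) for a, b in zip(zs, ys))
        slope = sxy / szz if szz > 0 else 0.0  # constant predictor: use the mean
        errors.append((y[t] - (my + slope * (z[t] - mz))) ** 2)
    return sum(errors) / len(errors)

def best_tau(grid, fmse_of_tau):
    """Grid point with the lowest FMSE; fmse_of_tau is supplied by the caller."""
    scores = [fmse_of_tau(t) for t in grid]
    k = min(range(len(grid)), key=scores.__getitem__)
    return grid[k], scores[k]
```

In use, fmse_of_tau would re-estimate the constrained model at the given threshold, lag the resulting multiplier series, and pass it to rolling_fmse.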
G. Background Reading
[G1] Lecture slides
[G2] Constrained least squares optimization with inequality constraints may be carried out using analytical or computational approaches such as lsqlin, as described in The MathWorks Inc., Optimization Toolbox: Solve Constrained Linear Least-Squares Problems (lsqlin), 2022
[G3] Wolak. An exact test for multiple inequality and equality constraints in the linear regression model. Journal of the American Statistical Association, 82(399):782–793, 1987
[G4] Liew. Inequality Constrained Least-Squares Estimation. Journal of the American Statistical Association, 71(355):746–751, 1976
[G5] S&P U.S. Indices Methodology
[G6] Microsoft Annual Report (2023)