Certificate in Quantitative Finance Final Project Brief
January 2024 Cohort
This document outlines the topics available for this cohort; no other topics can be submitted. Each topic has step-by-step instructions that give you a structure (not a limit) on what to implement and how.
Marks earned will strongly depend on your coding of numerical techniques and on the presentation of how you explored and tested a quantitative model (report in PDF or HTML). Certain numerical methods are too involved or auxiliary to the model: for example, do not recode optimisation or random number generation. Adopting existing code is allowed only if you have fully modified it yourself.
A capstone project requires independent study and the ability to work with documentation for packages that implement numerical methods in your coding environment, e.g., Python, R, Matlab, C#, C++, Java. You do not need to pre-approve the coding language or the use of libraries, including very specialised tools such as Scala, kdb+ and q. However, software like EViews is not coding.
Exclusively for current CQF delegates. No distribution.
To complete the project, you must code the model(s) and its numerical techniques from one topic from the options below and write an analytical report. If you continue from a previous cohort, please review the topic description because tasks are regularly revised. It is not possible to submit past topics.
1. Credit Spread for a Basket Product (CR)
2. Deep Learning for Financial Time Series (DL)
3. Pairs Trading Strategy Design & Back test (TS)
4. Portfolio Construction using Black-Litterman Model and Factors (PC)
5. Optimal Hedging with Advanced Greeks (DH)
6. Blending Ensemble for Classification (ML)
7. Algorithmic Trading for Reversion and Trend-Following (AL)
8. Deep Neural Networks for Solving High Dimensional PDEs (DN)
The Topics List for the current cohort will be available on the relevant page of the Canvas Portal.
Project Report and Submission
• First recommendation: do not submit a Python Notebook 'as is' – there is work to be done to transform it into an analytical report. Remove printouts of large tables/output. Write up mathematical sections (with LaTeX markup). Write up analysis and comparison of results and stress-testing (or alike). Explain your plots. Think like a quant about the computational and statistical properties: convergence, accuracy, variance and bias. Make a table of the numerical techniques you coded/utilised.
• The Project Report must contain sufficient mathematical model(s), numerical methods and an adequate conclusion discussing pros and cons and further development.
• There is no set number of pages. Some delegates prefer to present multiple plots on one page for comparability, others choose more narrative style.
• It is optimal to save Python Notebook reports as HTML, but do include a PDF with page numbers – for markers to refer to.
• Code must be submitted and working.
FILE 1. For our download and processing scripts to work, you must name and upload the project report as ONE file (pdf or html): the two-letter project code, followed by your name as registered on the CQF Portal.
Examples: TS John REPORT.pdf or PC Xiao REPORT.pdf
FILE 2. All other files – code and a pdf declaration (if not on the front page) – must be uploaded as ONE additional zip file, for example TS John CODE.zip. In that zip, include the converted PDF, Python, and other code files. Do not submit unzipped .py or .cpp files, as cloud anti-virus is likely to flag them on our side.
Do not submit files with generic names, such as CODE.zip, FinalProject.zip, Final Project Declaration.pdf, etc. Such files will be disregarded.
Submission date for the project is Thursday 22nd August 2024, 23.59 BST
There is no extension time for the Final Project.
Projects without a hand-signed declaration or working code are incomplete.
Failure to submit ONE report file and ONE zip file according to the naming instructions means the project will miss its allocation for grading.
All projects are checked for originality. We reserve the option of a viva voce before the qualification is awarded.
Project Support
Advanced Electives
To gain background knowledge in a focused way, we ask you to review two Advanced Electives. Electives cover knowledge areas and can be reviewed before, at the same time as, or closer to writing up the Analysis and Discussion (the explanation of your results).
➢ There is no immediate match between Project Topics and Electives
➢ Several workable combinations for each Project Topic are possible.
➢ One elective learning strategy is to select one 'topical elective' and one 'coding elective'.
To access the electives:
Login to the CQF Learning Hub
Navigate to Module 6 on your Dashboard.
Click on the Electives button on the global navigation menu.
Scroll down to Electives, then click the Electives Catalogue.
You will be redirected to the Electives Catalogue, where you can view and review all electives available to you. Full descriptions for each elective can be found there.
When on an elective, click the Enrol button.
You will see the confirmation page; click the Enrol in Course button to confirm your selection.
You will land on the successful-enrolment page, where you can click to start the elective or return to the catalogue page.
When on the catalogue page, you can click the Learning Platform link to return to Canvas. Your selected electives will appear on your learning dashboard.
Workshop & Tutorials
Each project title is supported by a faculty member alongside a set of project workshops and tutorials.
DATE         EVENT                                                 TIME
06/07/2024   Final Project Workshop I (CR, PC & DH)                13:00 – 15:30 BST
13/07/2024   Final Project Workshop II (TS, DL, ML, AL, DN & DH)   13:00 – 15:30 BST
19/07/2024   Final Project Tutorial I (CR Topic)                   18:00 – 19:00 BST
22/07/2024   Final Project Tutorial II (TS, DH & AL Topics)        18:00 – 19:00 BST
23/07/2024   Final Project Tutorial III (DL & ML Topics)           18:00 – 19:00 BST
24/07/2024   Final Project Tutorial IV (PC & DN Topics)            18:00 – 19:00 BST
Faculty Support
Title: Credit Spread for a Basket Product (CR)
Project Code: CR
Faculty Lead:

Title: Deep Learning for Financial Time Series (DL) & Blending Ensemble for Classification (ML)
Project Code: DL & ML
Faculty Lead:

Title: Pairs Trading Strategy Design & Backtest (TS), Optimal Hedging with Advanced Greeks (DH) & Algorithmic Trading for Reversion and Trend-Following (AL)
Project Code: TS, DH & AL
Faculty Lead:

Title: Portfolio Construction using Black-Litterman Model and Factors (PC) & Deep Neural Networks for Solving High Dimensional PDEs (DN)
Project Code: PC & DN
Faculty Lead:
To ask faculty a question on your chosen topic, please submit a support ticket by clicking the Support button in the bottom right-hand corner of your portal.
Coding for Quant Finance
• Choose a programming environment that has appropriate strengths and facilities to implement the topic (pricing model). Common choices are Python, Java, C++, R and Matlab. Exercise judgement as a quant: which language has the libraries that allow you to code faster and validate more easily?
• Use of R/Matlab/Mathematica is encouraged. Often a specific library in Matlab/R gives a fast solution for specific models, e.g., robust covariance matrix or cointegration analysis tasks.
• The Project Brief gives links to useful demonstrations in Matlab, and Webex sessions demonstrate Python notebooks – this does not mean your project should be based on that ready code.
• Python with pandas, matplotlib, sklearn and tensorflow poses a considerable challenge to Matlab, even for visualisation. The Matlab plot editor is clunky, and it is not that difficult to learn the various plot types in Python.
• ‘Scripted solution’ means that ready functionality from toolboxes and libraries is called, while the amount of your own coding of numerical methods is minimal or non-existent. This particularly applies to Matlab/R.
• Projects done using Excel spreadsheet functions only are not robust, are notoriously slow and do not give an understanding of the underlying numerical methods. CQF-supplied Excel spreadsheets are a starting point and help to validate results, but coding of numerical techniques/use of industry code libraries is expected.
• The aim of the project is to enable you to code numerical methods and develop model prototypes in a production environment. Spreadsheet-only or scripted solutions are below the expected standard for completion of the project.
• What should I code? Delegates are expected to re-code numerical methods that are central to the model and to exercise judgement in identifying them. Balanced use of libraries is at your own discretion as a quant.
• Produce a small table in the report that lists the methods you implemented/adjusted. If using ready functions/borrowed code for a technique, indicate this and describe the limitations of the numerical method implemented in that code/standard library.
• It is up to delegates to develop their own test cases, sanity checks and validation. It is normal to observe irregularities when the model is implemented on real-life data. If in doubt, reflect on the issue in the project report.
• The code must be thoroughly tested and well-documented: each function must be described, and comments must be used. Provide instructions on how to run the code.
Credit Spread for a Basket Product
Price a fair spread for a portfolio of CDS on 5 reference names (Basket CDS) as an expectation over the joint distribution of default times. The distribution is unknown analytically, so co-dependent uniform variables are sampled from a copula and then converted to default times using a marginal term structure of hazard rates (separately for each name). The copula is calibrated by estimating an appropriate default correlation (historical data of CDS spread differences is a natural candidate but poses a market-noise issue). Initial results are histograms (uniformity checks) and scatter plots (co-dependence checks). The substantial result is a sensitivity analysis by repricing.
A successful project will implement sampling from both Gaussian and t copulae and price all k-th-to-default instruments (1st to 5th). Spread convergence can require low-discrepancy sequences (e.g., Halton, Sobol) when sampling. Sensitivity analysis w.r.t. inputs is required.
Data Requirements
Two separate datasets are required, together with matching discounting curve data for each.
1. A snapshot of credit curves on a particular day. A debt issuer is likely to have a USD/EUR CDS curve, from which a term structure of hazard rates is bootstrapped and utilised to obtain exact default times, ui → τi. In the absence of data, spread values for each tenor can be assumed or stripped visually from plots in the financial media. The typical credit curve is concave (positive slope), monotonically increasing over the 1Y, 2Y, ..., 5Y tenors.
2. Historical credit spread time series taken at the most liquid tenor, 5Y, for each reference name. For five names, one computes a 5 × 5 default correlation matrix. When choosing corporate names, it is much easier to compute the correlation matrix from equity returns.
Corporate credit spreads are unlikely to be in open access; they can be obtained from Bloomberg or Reuters terminals (via your firm or a colleague). For sovereign credit spreads, time series of ready-bootstrapped PD5Y used to be available from DB Research; however, open access varies. Explore data sources such as www.datagrapple.com and www.quandl.com.
Even if CDS5Y and PD5Y series are available with daily frequency, the co-movement of daily changes reflects market noise more than the correlation of default events, which are rare to observe. Weekly/monthly changes give a more appropriate input for default correlation; however, that entails using 2-3 years of historical data, given that we need at least 100 data points to estimate correlation with a degree of significance.
If access to historical credit spreads poses a problem, remember that the default correlation matrix can be estimated from historical equity returns or debt yields.
Step-by-Step Instructions
1. For each reference name, bootstrap implied default probabilities from quoted CDS and convert them to a term structure of hazard rates, τ ∼ Exp(λ̂_1Y, ..., λ̂_5Y). A bootstrap sketch is given after these steps.
2. Estimate default correlation matrices (linear and rank) and the d.f. parameter (i.e., calibrate the copulæ). You will need to implement pricing with the Gaussian and t copulæ separately. A calibration sketch follows after these steps.
3. Using the sampling-from-copula algorithm, repeat the following routine (simulation):
(a) Generate a vector of correlated uniform random variables.
(b) For each reference name, use its term structure of hazard rates to calculate the exact time of default (or use semi-annual accrual).
(c) Calculate the discounted values of the premium and default legs for every instrument from 1st- to 5th-to-default. Conduct the MC separately for each instrument or use one big simulated dataset.
4. Average the premium and default legs across simulations separately and calculate the fair spread (see the pricing-loop sketch below).
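For step 1, a minimal Python sketch of bootstrapping piecewise-constant hazard rates from par CDS quotes. The spreads, recovery rate and flat discount rate below are illustrative placeholders, and the premium leg ignores accrued-on-default for brevity; these are design choices to treat explicitly in your implementation.

```python
# Sketch: bootstrap piecewise-constant hazard rates from par CDS spreads.
# All market inputs below are illustrative, not real quotes.
import numpy as np
from scipy.optimize import brentq

tenors   = [1.0, 2.0, 3.0, 4.0, 5.0]                 # years
spreads  = [0.010, 0.012, 0.014, 0.015, 0.016]       # par CDS spreads (decimal)
recovery = 0.40
r        = 0.02                                      # flat cont.-compounded rate

def survival(t, lambdas, knots):
    """Q(t) = exp(-integral of the piecewise-constant hazard up to t)."""
    integral, prev = 0.0, 0.0
    for lam, T in zip(lambdas, knots):
        integral += lam * (min(t, T) - prev)
        if t <= T:
            break
        prev = T
    return np.exp(-integral)

def par_spread(lambdas, knots, horizon, dt=0.25):
    """Par spread implied by a hazard curve; quarterly premiums, no accrual."""
    grid = np.arange(dt, horizon + 1e-9, dt)
    disc = np.exp(-r * grid)
    Q = np.array([survival(t, lambdas, knots) for t in grid])
    Qprev = np.concatenate(([1.0], Q[:-1]))
    annuity = np.sum(disc * dt * Q)                   # premium leg per unit spread
    protection = (1 - recovery) * np.sum(disc * (Qprev - Q))
    return protection / annuity

lambdas = []
for i, (T, s) in enumerate(zip(tenors, spreads)):
    f = lambda lam: par_spread(lambdas + [lam], tenors[:i + 1], T) - s
    lambdas.append(brentq(f, 1e-6, 5.0))
print("bootstrapped hazard rates:", np.round(lambdas, 5))
```

As a sanity check, the credit triangle λ ≈ s/(1 − R) should roughly match each bootstrapped level (e.g., 0.016/0.6 ≈ 2.7% near the 5Y point).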
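For step 2, a sketch of estimating the correlation input and the t-copula degrees of freedom from historical spread changes. Here `changes` is a hypothetical (T × 5) array of weekly 5Y CDS spread changes, and the grid search over ν is one simple calibration choice among several.

```python
# Sketch: calibrate the copula correlation matrix and t d.f. parameter.
# `changes` is assumed to be a (T x 5) array of weekly CDS spread changes.
import numpy as np
from scipy import stats

def calibrate_copula(changes):
    n = changes.shape[1]
    # Kendall rank correlation, mapped to the copula's linear correlation
    # via rho = sin(pi * tau / 2); robust to outliers in noisy spread data.
    rho = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            tau = stats.kendalltau(changes[:, i], changes[:, j])[0]
            rho[i, j] = rho[j, i] = np.sin(np.pi * tau / 2)
    # For 5 names this rho is usually positive definite; otherwise project
    # it to the nearest correlation matrix before proceeding.

    # Pseudo-observations: empirical CDF transforms of each margin.
    u = stats.rankdata(changes, axis=0) / (changes.shape[0] + 1)

    # Profile likelihood of the t copula over a grid of degrees of freedom.
    def loglik(nu):
        z = stats.t.ppf(u, df=nu)
        joint = stats.multivariate_t.logpdf(z, shape=rho, df=nu).sum()
        margins = stats.t.logpdf(z, df=nu).sum()
        return joint - margins

    grid = np.arange(2, 31)
    nu_hat = grid[np.argmax([loglik(nu) for nu in grid])]
    return rho, nu_hat
```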
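And for steps 3-4, a sketch of the t-copula simulation and the fair-spread calculation for all k-th-to-default instruments. It reuses `tenors`, `r`, `recovery`, `rho` and `nu_hat` from the sketches above; `hazards` is a hypothetical container holding one bootstrapped hazard vector per name. Discounting is flat and accrued premium is again ignored.

```python
# Sketch: price 1st- to 5th-to-default via t-copula Monte Carlo.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_names, n_sims, maturity = 5, 100_000, 5.0

def default_time(u, lambdas):
    """Invert Q(tau) = 1 - u: solve cumulative hazard H(tau) = -ln(1 - u)."""
    target = -np.log(1.0 - u)
    knots = np.concatenate(([0.0], tenors))
    cumhaz = np.concatenate(([0.0], np.cumsum(np.asarray(lambdas) * np.diff(knots))))
    tau = np.full_like(u, np.inf)                    # inf => survives past 5Y
    for i, lam in enumerate(lambdas):
        hit = (target > cumhaz[i]) & (target <= cumhaz[i + 1])
        tau[hit] = knots[i] + (target[hit] - cumhaz[i]) / lam
    return tau

# Correlated uniforms from the t copula: Z ~ N(0, rho), W ~ chi2(nu).
L = np.linalg.cholesky(rho)
z = rng.standard_normal((n_sims, n_names)) @ L.T
w = stats.chi2.rvs(nu_hat, size=(n_sims, 1), random_state=rng)
u = stats.t.cdf(z * np.sqrt(nu_hat / w), df=nu_hat)

tau = np.column_stack([default_time(u[:, j], hazards[j]) for j in range(n_names)])
tau_sorted = np.sort(tau, axis=1)

for k in range(1, n_names + 1):
    tk = tau_sorted[:, k - 1]
    hit = tk <= maturity
    stop = np.where(hit, tk, maturity)
    annuity = (1 - np.exp(-r * stop)) / r            # premium leg per unit spread
    default_leg = np.where(hit, (1 - recovery) * np.exp(-r * stop), 0.0)
    print(f"{k}th-to-default fair spread: {default_leg.mean() / annuity.mean():.4%}")
```

A quick validation: the printed spreads should decrease monotonically in k, and the 1st-to-default spread should lie between the widest single-name spread and the sum of all five.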
Model Validation
• The fair spread for the kth-to-default Basket CDS should be less than that of the (k−1)th-to-default. Why?
• The Project Report on this topic should have a section on Risk and Sensitivity Analysis of the fair spread w.r.t.:
1. default correlation among reference names: either stress-test with constant high/low correlation or apply a ± percentage change in correlation from the actual estimated levels;
2. credit quality of each individual name (change in credit spread, credit delta) as well as recovery rate.
Make sure you discuss and compare sensitivities for all five instruments.
• Ensure that you explain the historical sampling of the default correlation matrix and the copula fit (uniformity of pseudo-samples) – that is, the Correlations Experiment and the Distribution Fitting Experiment as will be described at the Project Workshop. Use histograms.
Copula, CDF and Tails for Market Risk
A practical tutorial on using copulas to generate correlated samples is available at:
https://www.mathworks.com/help/stats/copulas-generate-correlated-samples.html
Semi-parametric CDF fitting gives us percentile values by fitting both the middle and the tails. A Generalised Pareto Distribution is applied to model the tails, while the CDF interior is Gaussian kernel-smoothed. The approach comes from Extreme Value Theory, which suggests a correction to the empirical (kernel-fitted) CDF because of tail exceedances.
http://uk.mathworks.com/help/econ/examples/using-extreme-value-theory-and-copulas-to-evaluate-market-risk.html
http://uk.mathworks.com/help/stats/examples/nonparametric-estimates-of-cumulative-distribution-functions-and-their-inverses.html
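The MathWorks pages above are Matlab; a minimal Python sketch of the same semi-parametric construction (kernel-smoothed interior, GPD tails beyond the 10th/90th percentiles) might look as follows. The tail threshold and the stitching of the pieces are design decisions, not prescriptions.

```python
# Sketch: semi-parametric CDF with Gaussian-kernel interior and GPD tails.
import numpy as np
from scipy import stats

def semiparametric_cdf(x, data, tail=0.10):
    lo, hi = np.quantile(data, [tail, 1 - tail])
    kde = stats.gaussian_kde(data)
    # Fit GPD to exceedances over each threshold (sign-flipped lower tail).
    c_hi, _, s_hi = stats.genpareto.fit(data[data > hi] - hi, floc=0)
    c_lo, _, s_lo = stats.genpareto.fit(lo - data[data < lo], floc=0)

    x = np.atleast_1d(np.asarray(x, dtype=float))
    F = np.empty_like(x)
    mid = (x >= lo) & (x <= hi)
    # Interior: kernel-smoothed CDF via integration of the KDE.
    F[mid] = [kde.integrate_box_1d(-np.inf, xi) for xi in x[mid]]
    F[x > hi] = 1 - tail * stats.genpareto.sf(x[x > hi] - hi, c_hi, scale=s_hi)
    F[x < lo] = tail * stats.genpareto.sf(lo - x[x < lo], c_lo, scale=s_lo)
    return F  # note: kernel interior and GPD tails join only approximately
```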
Deep Learning for Asset Prediction
Trend prediction has drawn a lot of research over many decades, using both statistical and computing approaches, including machine learning techniques. Trend prediction is valuable for investment management, as accurate prediction could help asset managers outperform the market. It remains a challenging task due to the semi-strong form of market efficiency, the high noise-to-signal ratio, and the multitude of factors that affect asset prices, including, but not limited to, the stochastic nature of the underlying instruments. However, sequential financial time series can be modeled effectively using sequence-modeling approaches such as recurrent neural networks.
Objective
Your objective is to produce a model to predict positive moves (up trends) using Long Short-Term Memory networks. Your proposed solution should be comprehensive, with a detailed model architecture, and evaluated with a backtest applied to a trading strategy.
• Choose one ticker of interest: an index, equity, ETF, crypto token, or commodity.
• Predict the trend only, for a short-term return (example: daily, 6 hours). Limit the prediction to binomial classification: the dependent variable is best labelled [0, 1]. Avoid using [-1, 1] as class labels.
• Analysis should be comprehensive with detailed feature engineering, data pre-processing, model building, and evaluation.
Note: You are free to make study-design choices to make the task achievable. You may redefine the task and predict the momentum sign (vs the return sign) or the direction of volatility. Limit your exploration to ONLY one asset. At each step, the process followed should be expanded and explained in detail. Merely presenting Python code without a proper explanation shall not be accepted. The report should present the study in a detailed manner with a proper conclusion. Code reproducibility is a must, and the use of modular programming approaches is recommended. Under this topic, you do not recode existing indicators, libraries, or the optimization used to compute neural network weights and biases.
Step-by-Step Instructions
1. The problem statement should be explicitly specified without any ambiguity including the selection of underlying assets, datasets, timeframe, and frequency of data used.
• If predicting short-term return signs (for the daily move), then training and testing over up to 5 years should be sufficient. If you attempt to predict the 5D or 10D return for an equity, or the 1W or 1M return for a factor, you will have to increase the data to at least 10 years.
2. Perform exhaustive Feature Engineering (FE).
• FE should be detailed, including a listing of derived features and the specification of the target/label. Devise your approach to categorizing extremely small, near-zero returns (drop them from the training sample or group them with positive/negative returns); see the labelling sketch after these instructions. The threshold will strongly depend on your ticker. Example: small positive returns below 0.25% can be labelled as negative.
• Class imbalances should be addressed – either through model parameters or via label definition.
• Use of features from cointegrated pairs and across assets is permitted, but be tactical about the design. There is no single recommended set of features for all assets; however, the initial feature set should be sufficiently large. Financial ratios, advanced technical indicators (including volatility estimators) and volume information can be predictors of price direction.
• OPTIONAL: Use of a news heatmap, credit spreads (CDS), historical data for financial ratios, dividend history, purchases/disposals by key stakeholders (director dealings) or by large funds, or Fama-French factor data can enhance your prediction; these can be sourced from your professional subscription.
3. Conduct a detailed Exploratory Data Analysis (EDA).
• EDA helps in dimensionality reduction via a better understanding of the relationships between features, uncovers underlying structure, and invites detection/explanation of outliers. The choice of feature-scaling techniques should be determined by EDA.
4. Proper handling of data is a must. The use of different sets of features, lookback lengths, and datasets warrants cleaning and/or imputation.
5. Feature transformation should be applied based on EDA.
• Multi-collinearity analysis should be performed among predictors.
• Multi-scatter plots presenting relationships among features are always a good idea.
• Large feature sets (including repeated kinds and different lookbacks) warrant a reduction in feature dimensionality. Self-Organizing Maps (SOM), K-Means clustering, or other methods can be used for dimensionality reduction (see the sketch after these instructions). Avoid using Principal Component Analysis (PCA) for non-linear datasets/predictors.
6. Perform extensive and exhaustive model building.
• Design the neural network architecture after an extensive study of alternatives.
• The best model should be presented only after performing hyperparameter optimization, and compared with a baseline model.
• The choice and number of hyperparameters to optimize for the best model are design choices. Use experiment trackers like MLflow or TensorBoard to present your study. A baseline architecture sketch follows after these instructions.
7. The performance of your proposed classifier should be evaluated using multiple metrics including backtesting of the predicted signal applied to a trading strategy.
• Investigate the prediction quality using the AUC, confusion matrix, and classification report (including precision, recall and F1-score); a minimal evaluation sketch follows after these instructions.
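A minimal labelling sketch for the binary target discussed in the Feature Engineering step; the 0.25% threshold and the drop-the-middle treatment of near-zero returns are examples to be tuned per ticker.

```python
# Sketch: label next-period up moves, dropping near-zero returns.
import numpy as np
import pandas as pd

def make_labels(close: pd.Series, threshold: float = 0.0025) -> pd.Series:
    """1 = up move above +threshold, 0 = down move below -threshold.
    Returns inside (-threshold, +threshold) are dropped from training."""
    fwd_ret = close.pct_change().shift(-1)       # next-period simple return
    labels = pd.Series(np.where(fwd_ret > threshold, 1,
                                np.where(fwd_ret < -threshold, 0, np.nan)),
                       index=close.index)
    return labels.dropna().astype(int)
```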
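For the dimensionality-reduction step, one simple non-PCA option is to cluster standardised features with K-Means and keep one representative per cluster. A sketch, assuming `X` is a DataFrame of engineered features:

```python
# Sketch: K-Means feature selection - cluster features, keep one per cluster.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def reduce_features(X: pd.DataFrame, n_clusters: int = 10) -> list:
    Xs = StandardScaler().fit_transform(X)
    # Transpose: cluster features by their time-series profiles.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(Xs.T)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dist = np.linalg.norm(Xs.T[members] - km.cluster_centers_[c], axis=1)
        keep.append(X.columns[members[np.argmin(dist)]])   # closest to centroid
    return keep
```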
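For the model-building step, a minimal Keras LSTM baseline, assuming input windows of shape (samples, lookback, n_features) and binary labels. The layer sizes, dropout and dimensions are illustrative starting points, not a tuned architecture.

```python
# Sketch: baseline LSTM classifier for the binary up-move target.
import tensorflow as tf

lookback, n_features = 20, 15            # illustrative dimensions

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(lookback, n_features)),
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=50, batch_size=64,
#           callbacks=[tf.keras.callbacks.EarlyStopping(
#               patience=5, restore_best_weights=True)])
```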
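Finally, a sketch of the evaluation and a naive long/flat backtest of the predicted signal, assuming hypothetical arrays `y_test` (labels), `p` (predicted probabilities) and `ret_test` (realised test-period returns); transaction costs are omitted for brevity.

```python
# Sketch: classification metrics plus a naive long/flat signal backtest.
import numpy as np
from sklearn.metrics import (roc_auc_score, confusion_matrix,
                             classification_report)

y_pred = (p > 0.5).astype(int)           # the 0.5 cut-off is a design choice
print("AUC:", roc_auc_score(y_test, p))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Long when the model predicts an up move, flat otherwise.
strategy_ret = np.where(y_pred == 1, ret_test, 0.0)
equity = np.cumprod(1 + strategy_ret)
sharpe = np.sqrt(252) * strategy_ret.mean() / strategy_ret.std()  # daily data
print(f"Total return {equity[-1] - 1:.2%}, annualised Sharpe {sharpe:.2f}")
```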