Assignment – Automatically formulating trading rules
The assignment for this topic is going to look at the problem of automatically formulating trading rules using one of the techniques introduced in the lecture: GA, PSO, DE or ANN (or others of your own choice but check with me first). The choice of technique is up to you, as is the exact form of the problem you choose. The approaches covered in the lecture looked at:
• Predicting the high price (HP) and low price (LP) for the coming day, based upon various key factors from previous days such as opening price, closing price, high price, low price, volume, Relative Strength Index (RSI), and Exponential Moving Average (EMA). These likely values for HP and LP can then be built into trading rules which will automatically buy or sell assets at what should be the optimum point.
• Optimising the parameters for pre-existing trading rules, in particular the long, short and signal values for MACD.
You can investigate either of these or some other aspect of algorithmic trading of your own choice (but best check with me first).
With the above problems you will need to either train a neural network or evaluate a GA (or similar algorithm), so you will need some notion of fitness or performance. This will be based upon profit: for whichever trading rule you are considering, you will need to iterate through the training data and calculate what you would have earned (through returns or sales) on some notional amount of investment had your trading rule been applied.
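As a rough illustration of a profit-based fitness, the sketch below scores a MACD crossover rule on a plain vector of closing prices. The names ema() and macd_profit(), the notional capital of 10000, and the simple "hold while the MACD line is above its signal line" rule are all assumptions made for this example, not the method from the lecture. ema() is a hand-rolled stand-in so the code runs without extra packages; TTR provides its own EMA() and MACD() functions, which may seed their averages differently.

```r
# Hypothetical helper: exponential moving average with smoothing 2 / (n + 1),
# seeded with the first value of the series.
ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# Hypothetical fitness: profit on a notional investment when we hold the asset
# on each day after the MACD line closed above its signal line.
# params = c(fast, slow, signal) window lengths.
macd_profit <- function(params, close, capital = 10000) {
  n_fast <- round(params[1])
  n_slow <- round(params[2])
  n_sig  <- round(params[3])
  if (n_fast < 1 || n_sig < 1 || n_fast >= n_slow) return(-Inf)  # invalid rule

  macd   <- ema(close, n_fast) - ema(close, n_slow)
  signal <- ema(macd, n_sig)
  hold   <- as.numeric(macd > signal)       # 1 = holding the asset, 0 = in cash
  ret    <- diff(close) / head(close, -1)   # daily simple returns

  # Yesterday's position earns today's return, so the rule never uses
  # information from the day it is trading on.
  capital * prod(1 + head(hold, -1) * ret) - capital
}
```

A GA, PSO or DE run would then search over the three parameters, maximising something like macd_profit(c(12, 26, 9), close) on the training portion of the data only. Transaction costs are ignored here, which flatters rules that trade often.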
You will also need to keep some data in reserve for back-testing: i.e. keep some unseen historical data in reserve on which to evaluate the profitability of your final rule.
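A minimal sketch of holding data back for back-testing follows. It assumes the prices are already in a plain vector, and the function name split_series() and the 80/20 proportion are illustrative choices only. The split is chronological rather than random, so that the reserved data really is "future" data relative to the training period.

```r
# Hypothetical helper: split a series chronologically into a training part
# (the earliest observations) and a back-testing part (the most recent ones).
split_series <- function(x, train_frac = 0.8) {
  n_train <- floor(length(x) * train_frac)
  stopifnot(n_train >= 1, n_train < length(x))
  list(
    train = x[seq_len(n_train)],
    test  = x[seq(n_train + 1, length(x))]
  )
}

prices <- c(101, 103, 102, 105, 107, 106, 108, 110, 109, 112)  # made-up values
parts  <- split_series(prices)
length(parts$train)  # 8 observations for training
length(parts$test)   # 2 observations kept unseen
```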
Also, as with forecasting and prediction, you will be working with time series data, so the size of the window needs to be considered (this will depend upon the characteristics of the data you are using).
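One possible way to build fixed-size windows is base R's embed() function, sketched below. The helper name make_windows() and the window size are assumptions for illustration; the right window size is something you would need to justify from your own data.

```r
# Hypothetical helper: turn a series into rows of w lagged inputs plus the
# value that follows them. embed() puts the most recent value in column 1,
# so column 1 is the target and the remaining columns are the lags
# (most recent lag first).
make_windows <- function(x, w) {
  e <- embed(x, w + 1)
  list(
    inputs = e[, -1, drop = FALSE],
    target = e[, 1]
  )
}

win <- make_windows(1:6, w = 2)
win$inputs  # each row holds the two values preceding the target
win$target  # the value to be predicted from each row
```

Each row of inputs could then be fed to a neural network, or to a rule evolved by a GA, with target as the value to predict.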
Some Useful Resources
A useful package to assist in doing this is quantmod – designed to support the rapid development and evaluation of trading models. Amongst other things it makes getting hold of data very straightforward and also provides functions for the opening and closing prices, high and low values, volume etc. – just take a look at some of the examples.
Another valuable package is TTR, which again provides a myriad of functions for building trading rules, in particular ones for RSI and EMA. If you use quantmod then TTR will be installed by default, as quantmod depends on it.
NOTE: A couple of hints…
When using the quantmod package the data is in a time series structure called “xts”. You can check this as follows (for example):
> getSymbols("GOOG", src = "yahoo")
> str(GOOG)   # this will give you the structure
> head(GOOG)  # will give you the first few lines of the data
This means it has to be extracted for use within the fitness function. There are several ways of doing this, but one fairly simple one is to use the coredata() function.
So if we wanted to extract the opening prices:
> OP <- coredata(GOOG$GOOG.Open)
If you check this using
> str(OP)
> head(OP)
You will see that this is still a two-dimensional structure (a matrix with several rows and one column; check this using dim(), nrow() and ncol()), so we need to extract the column using [,1]. Or in one step:
> OP <- coredata(GOOG$GOOG.Open)[,1]
Which gives a simple dimensionless vector. Hope that helps (and you probably worked this out already). I suspect there may be a simpler way of doing this so don't hesitate to let me know!
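To see what is going on without downloading anything, the sketch below uses a small one-column matrix as a stand-in for what coredata() returns. The values are made up, and op_matrix is a hypothetical name. It also shows as.numeric() as one possible shortcut for flattening the matrix.

```r
# Stand-in for coredata(GOOG$GOOG.Open): a matrix with one named column.
op_matrix <- matrix(c(100.5, 101.2, 99.8), ncol = 1,
                    dimnames = list(NULL, "GOOG.Open"))

dim(op_matrix)      # 3 rows, 1 column: still two-dimensional

op1 <- op_matrix[, 1]        # select the column, as in the text
op2 <- as.numeric(op_matrix) # alternative: flatten to a plain numeric vector

dim(op1)            # NULL: a plain vector has no dimensions
identical(op1, op2) # both routes give the same vector
```

as.numeric() can also be applied directly to the xts column, e.g. as.numeric(GOOG$GOOG.Open), which drops the time index in the same way.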
Also, one of the other things you might find is that the volume values are too large for your chosen technique to handle without losing accuracy, so you could try scaling these down by several orders of magnitude.
Assignment Requirements
You will also need to submit a report (around 6-10 pages and worth 40 marks) covering:
• Background to the problem - a short review of the new approach you are going to employ and an overview of related work in the area drawn from key published papers (12 marks)
• Your choice of data - what did you choose to work with? As usual, summary plots would be welcome. (4 marks)
• Details of the approach taken and any specific decisions about the representation, fitness function etc. along with details about any key parameters. (4 marks)
• Presentation of and comments on the solutions achieved: How well do they fit the training data? What fitness values were achieved? (4 marks)
• The performance of the model. How does this perform over unseen data and what level of profit would it yield? (8 marks)
• Comparison against other approaches. The choice of what you compare with is yours. At least try a random solution or the performance of the mean value (if appropriate) and look at the performance obtained. If you wish you may also consider other approaches. (8 marks)
• Please also include the R code, either as an appendix or in a separate file.
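For the comparison against other approaches, two simple baselines can be sketched as follows. The function names, the notional capital of 10000, and the coin-flip trading rule are assumptions for illustration only; buy-and-hold is offered here as one possible extra reference point alongside the random solution mentioned in the list above.

```r
# Hypothetical baseline 1: buy on the first day and sell on the last.
buy_and_hold_profit <- function(close, capital = 10000) {
  capital * (tail(close, 1) / close[1]) - capital
}

# Hypothetical baseline 2: decide at random each day whether to hold the
# asset, and repeat many times to get a distribution of profits.
random_rule_profit <- function(close, capital = 10000, n_runs = 100, seed = 1) {
  set.seed(seed)                           # fixed seed so runs are repeatable
  ret <- diff(close) / head(close, -1)     # daily simple returns
  replicate(n_runs, {
    hold <- rbinom(length(ret), 1, 0.5)    # 1 = hold the asset, 0 = stay in cash
    capital * prod(1 + hold * ret) - capital
  })
}

close <- c(100, 110, 121)                  # made-up prices rising 10% a day
buy_and_hold_profit(close)                 # profit from simply holding
summary(random_rule_profit(close))         # spread of profits from random trading
```

A rule found by your chosen technique is only interesting if its profit on the unseen data beats baselines like these.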