Final Assessment Brief: Spike Detection
Modelling in Finance S2 2021 – Updated 25/09/2021
Dinh Tang
Background
The algorithm in front of you detects spikes in time-series data using moving averages and moving standard deviations. A spike occurs when a data point, x_t, lies more than n × σ_t above x̄_t, where n is an integer threshold and x̄_t and σ_t are, respectively, the moving average and the moving standard deviation.
Recall from your basic statistics course that the Z-score tells you how far a data point is from the mean, in units of standard deviation. In our case, both the mean and the standard deviation are moving, so the Z-score for x_t is calculated as:
$$z(x_t) = \frac{x_t - \bar{x}_t}{\sigma_t}$$
If the raw data point, x_t, is above (below) the mean, then the Z-score is positive (negative). If x_t equals the mean, then the Z-score is 0. For example, a spike can be identified when the moving Z-score for x_t is greater than 3. This is the same as saying that x_t lies more than 3 moving standard deviations above the moving mean. Note that 3 is an arbitrary choice, and it may not be ideal for all datasets.
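The moving Z-score test above can be sketched in a few lines. This is a minimal Python illustration only, since the assessment itself must be done in VBA. The function name `moving_zscore_spikes` is hypothetical. The sketch assumes that the moving statistics at t use the `lag` points immediately before t and the population standard deviation. Neither choice is specified by the brief.

```python
import statistics


def moving_zscore_spikes(data, lag, threshold):
    """Return a 0/1 list flagging x_t when its moving Z-score exceeds threshold.

    Assumptions (not from the brief): the moving mean and standard deviation
    at t use the `lag` points before t (x_{t-lag} .. x_{t-1}) and the
    population standard deviation. The first `lag` points cannot be scored
    and stay 0.
    """
    signals = [0] * len(data)
    for t in range(lag, len(data)):
        window = data[t - lag:t]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window)
        if sd == 0:
            continue  # Z-score is undefined for a flat window; leave as 0
        z = (data[t] - mean) / sd
        if z > threshold:  # one-sided: only points above the moving mean
            signals[t] = 1
    return signals
```

For example, `moving_zscore_spikes([1, 2, 1, 2, 1, 2, 9, 1], 4, 3)` flags only index 6. At that point the window `[1, 2, 1, 2]` has mean 1.5 and standard deviation 0.5, which gives a Z-score of 15.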
The image below is an illustration of the algorithm (see Figure 1):
Figure 1: In the top panel, the blue trace is the raw time-series data. The yellow trace is the moving average and the purple trace is the moving standard deviation, both computed over a defined lag window. In the bottom panel, the red lines mark the spikes detected when the conditions above are met. A value of one is assigned when true and zero otherwise.
In neuroscience, action potentials (spikes) from the same neuron cannot occur in immediate succession because of the neuron's absolute refractory period: the membrane potential must reset before another action potential can fire. To obtain the VALID number of spikes, you must therefore calculate the time between spikes (also known as the interspike interval). For this assessment, the membrane potential takes 50 ms to reset. Therefore, if a spike occurs at x_0, the same neuron cannot spike again until 50 ms after x_0 has passed. Any spike detected within this period is a false positive, and its signal must be reverted to 0.
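The refractory-period correction can be sketched as a single pass over the binary spike signal. Each electrophysiology data point represents 10 ms, so a 50 ms reset spans 50 / 10 = 5 data points. The Python function `remove_refractory_spikes` below is hypothetical and only illustrates the logic, because the assessment must be written in VBA. It makes two assumptions. First, the 50 ms is measured from the last *valid* spike, not from the last detected spike. Second, reading "until 50 ms after x_0 has passed" strictly, a spike exactly 50 ms later is still rejected.

```python
def remove_refractory_spikes(signals, sample_ms=10, refractory_ms=50):
    """Reset to 0 any spike that falls inside the refractory period.

    Assumptions (not from the brief): samples are `sample_ms` apart, the
    refractory clock restarts only at a valid spike, and a later spike is
    valid only if it is strictly more than `refractory_ms` after it.
    """
    valid = list(signals)  # copy so the raw detections are kept intact
    last_valid = None      # index of the most recent valid spike
    for t, s in enumerate(signals):
        if s != 1:
            continue
        if last_valid is not None and (t - last_valid) * sample_ms <= refractory_ms:
            valid[t] = 0  # false positive: neuron is still refractory
        else:
            last_valid = t
    return valid
```

For example, `remove_refractory_spikes([1, 1, 0, 0, 0, 1, 1, 0, 1])` keeps only the spikes at indices 0 and 6, which gives `[1, 0, 0, 0, 0, 0, 1, 0, 0]`.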
For the purpose of the presentation (not marked by the automatic marking
system):
1. Calculate the average interspike interval, starting from the very first spike in the list of valid spikes.
2. Calculate the frequency (count) of interspike intervals per given time bin to generate an interspike interval histogram.
• The time bins will be provided in a separate sheet, and the interspike interval histogram will be plotted automatically to visualise the temporal distribution of interspike intervals.
NB: (1) and (2) above will NOT be marked by the automatic marking system. You will need to demonstrate them in the week 12 presentation. ALL calculations must be completed in VBA.
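Both presentation statistics follow directly from the positions of the valid spikes. The Python sketch below shows the logic only, since your own implementation must be in VBA, and its function names are hypothetical. It assumes 10 ms per data point. It also assumes each bin counts intervals in [lower edge, upper edge), which may differ from the convention used in the provided bins sheet.

```python
def interspike_intervals(valid, sample_ms=10):
    """Intervals (in ms) between consecutive valid spikes in a 0/1 list."""
    times = [t * sample_ms for t, s in enumerate(valid) if s == 1]
    return [later - earlier for earlier, later in zip(times, times[1:])]


def isi_summary(valid, bin_edges, sample_ms=10):
    """Return (average interspike interval, counts per bin).

    Assumption: `bin_edges` is ascending and bin i counts intervals in
    [bin_edges[i], bin_edges[i + 1]). Intervals outside every bin are ignored.
    """
    isis = interspike_intervals(valid, sample_ms)
    average = sum(isis) / len(isis) if isis else float("nan")
    counts = [0] * (len(bin_edges) - 1)
    for isi in isis:
        for i in range(len(counts)):
            if bin_edges[i] <= isi < bin_edges[i + 1]:
                counts[i] += 1
                break
    return average, counts
```

With valid spikes at indices 0, 6, 14 and 30, the intervals are 60, 80 and 160 ms, so the average is 100 ms. With hypothetical edges `[0, 50, 100, 150, 200]`, the counts are `[0, 2, 0, 1]`.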
Workbooks and Datasets
I have provided two sets of data:
1. The first is an electrophysiological recording of a single neuron. This is a common trace in neuroscience that shows the change in electrical activity (measured in voltage) over time, with spikes denoting action potentials. Each data point represents 10 ms.
2. The second is historical daily closing price data for the ASX200 index from 1992 to this week. In this case, spikes denote abnormal daily returns.
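The ASX200 data are closing prices, so they need to be converted to daily returns before spikes can be read as abnormal returns. The sketch below assumes simple returns, r_t = P_t / P_{t-1} − 1. Log returns, ln(P_t / P_{t-1}), are a common alternative. The function name `daily_returns` is hypothetical, and Python is used here only to illustrate the step.

```python
def daily_returns(closes):
    """Simple daily returns from a list of closing prices.

    Assumption: simple (not log) returns. The first day has no previous
    close, so the result is one element shorter than `closes`.
    """
    return [curr / prev - 1 for prev, curr in zip(closes, closes[1:])]
```

The resulting return series, rather than the price level, is what a moving Z-score would then be applied to.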
In addition, there are two workbooks on Moodle: one for the week 12 presentation and the other for submission to the automatic marking system.
In the submission workbook, you are provided with the electrophysiology dataset (1), and you are only required to output the list of valid spikes and nothing more. Your output should be binary, where 0 = no spike and 1 = spike at t. Do not implement the interspike interval statistics or the financial modelling extension here. In the presentation workbook, you are provided with both the electrophysiology (1) and ASX200 (2) datasets. You should copy your working code for detecting valid spikes into this workbook to do the following:
• Output the average interspike interval and histogram in sheet 1; and
• Provide your creative input to improve the usability of abnormal-return detection in sheet 2.
Description of the function
The spike detection algorithm is called as a function and relies on four main inputs:
1. Time-series data
2. Lag window (the number of data points used for the rolling average and standard deviation)
3. Threshold (n): the number of moving standard deviations an observation must lie away from the moving average to count as a spike
4. A weight parameter, which plays an important role in this algorithm. I will let you work out what it does yourself (it is somewhere in the code).
The function MUST be called as follows:
=(Data, Lag, Threshold, Weight)
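The brief leaves the role of the weight for you to discover in the code. Purely as an assumption, the Python sketch below treats it like the "influence" parameter of the smoothed Z-score approach. A flagged point enters the moving statistics only partially, as weight × x_t + (1 − weight) × the previous filtered value. The function name `detect_spikes` and the internal `filtered` series are hypothetical and are not the brief's VBA function. The sketch also keeps the signal one-sided (0/1) to match the required binary output.

```python
import statistics


def detect_spikes(data, lag, threshold, weight):
    """Smoothed Z-score sketch: returns a 0/1 spike signal.

    Assumption: `weight` (0..1) sets how much a flagged point feeds back into
    the moving mean and standard deviation. weight = 1 treats spikes like
    ordinary points; weight = 0 excludes them from the moving statistics.
    """
    n = len(data)
    signals = [0] * n
    # working copy of the series used for the moving statistics
    filtered = list(data[:lag]) + [0.0] * max(n - lag, 0)
    for t in range(lag, n):
        window = filtered[t - lag:t]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window)
        if sd > 0 and (data[t] - mean) > threshold * sd:
            signals[t] = 1
            # damp the spike's influence on future moving statistics
            filtered[t] = weight * data[t] + (1 - weight) * filtered[t - 1]
        else:
            filtered[t] = data[t]
    return signals
```

For the series `[1, 2, 1, 2, 1, 2, 9, 9, 1]` with a lag of 4 and a threshold of 3, the two settings behave differently. With weight 1, only index 6 is flagged, because the first 9 inflates the moving standard deviation. With weight 0, indices 6 and 7 are both flagged. This kind of consecutive detection is what the refractory-period correction then has to clean up.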
Assessment
You will be assessed across weeks 11 and 12 on the content and quality of your ideas and solutions, and on your ability to engage in an intelligent and informed discussion.
Q&A in week 11 (5 minutes)
This component focuses on your understanding of the idea (similar to a code plan). I want you to do the following, but you are not limited to it:
1. Demonstrate your understanding of the problem;
2. Outline the missing condition and identify the bugs and issues in the algorithm;
3. Present a decision tree showing how you would isolate valid spikes. Your decision tree should be descriptive enough to highlight the algorithmic processes;
4. Propose methods to improve the efficiency of the code, since there are redundancies in the algorithm that slow it down; and
5. Recommend what you think the optimal inputs for the ASX200 sheet would be, and explain why.
Presentation in week 12 (5 minutes)
This week focuses on your code and solution quality.
1. The algorithm is poorly written and presented. Your job is to fix and improve the code with respect to the general marking guide posted on Moodle (see the written communication section);
2. Implement the average interspike interval and the interspike interval histogram;
3. Explain the issues and improvements you have made; and
4. Modify the ASX200 sheet to improve its usability. Be creative and imagine you are designing this sheet for a client (me).
Spreadsheet submission due on Sunday, October 24
1. Make sure you submit the submission spreadsheet and NOT the presentation spreadsheet;
2. You should not modify the structure of the electrophysiological sheet; and
3. All your creative improvements on usability should be in the presentation spreadsheet.
Again, how you structure and present the information is up to you. If you have any questions, feel free to post them on the discussion forum. I will respond either on the forum or in our email chain so that everyone else can see them.
I look forward to seeing your work in weeks 11 and 12.
Good luck!