
Financial Econometrics and Data Science
Introduction to Econometrics & Statistical Foundations
Dr Ran Tao


1. Introduction
1.1 What is Econometrics?
1.2 Special Characteristics of Financial Data
1.3 Formulation of Econometric Models
1.4 Data Types & Data Aggregation
2. Statistical Foundations
2.1 Revision
2.2 Probability and Probability Distributions
2.3 Descriptive Statistics
2.4 Returns in Financial Modelling
2.5 Real and Nominal Series

1. Introduction
1.1 What is Econometrics?

What is Econometrics?
The literal meaning is “measurement in economics.”
Definition of financial econometrics:
Financial econometrics is the application of statistical and mathematical techniques and models to problems in finance.
⇒ Very broad definition…
• What models and techniques are applied?
• What problems do we face in finance?

Examples of the kind of problems that may be solved by an Econometrician:
1. Testing whether financial markets are weak-form informationally efficient.
2. Testing whether the CAPM or APT represent superior models for the determination of returns on risky assets.
3. Measuring and forecasting the volatility of bond returns.
4. Explaining the determinants of bond credit ratings used by the ratings agencies.
5. Modelling long-term relationships between prices and exchange rates.

6. Determining the optimal hedge ratio for a spot position in
7. Testing technical trading rules to determine which makes the most money.
8. Testing the hypothesis that earnings or dividend announcements have no effect on stock prices.
9. Testing whether spot or futures markets react more rapidly to news.
10. Forecasting the correlation between the returns to the stock indices of two countries.

1.2 Special Characteristics of Financial Data

What are special characteristics of financial data?
Financial data
• all financial and economic information about an economy
– GDP, growth rates of unemployment, consumption etc.
• all information about the participants of this economy
– firm data; stock prices and returns, dividends etc.
– bond prices, yields
– interest rates
• all financial information about the goods produced/processed/consumed in this economy
– commodity prices (energy, metals, agriculture, …)
– machines, cars, etc.
– links back to price indices and GDP growth

Finance ≠ Economics
⇒ We focus on price data and calculate returns, volatilities, correlations, …
⇒ Asset perspective of finance (in contrast to a micro- or macroeconomic perspective in economics)

Frequency & quantity of data
• stock market prices are measured every time there is a trade or somebody posts a new quote
– this yields large amounts of data
– in many econometric models, daily data are used
– this requires data aggregation
• recorded asset prices are usually those at which the transaction took place
– there is no possibility of measurement error, but financial data are “noisy”

1.3 Formulation of Econometric Models

1a. Economic or financial theory (previous studies)
1b. Formulation of an estimable theoretical model
2. Collection of data
3. Model estimation
4. Is the model statistically adequate?
– No: reformulate model
– Yes: continue with step 5
5. Interpret model
6. Use for analysis
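
The steps above can be sketched in R on simulated data. Everything in this sketch is an illustrative assumption: the linear market-model form, the simulated series market and asset, the chosen coefficients, and the use of a Shapiro-Wilk test on the residuals as a stand-in for a fuller adequacy check.

```r
set.seed(42)                                  # reproducible simulated data

# Step 1: theory suggests asset returns depend linearly on market returns
# Step 2: "collect" data (simulated here, purely for illustration)
n      <- 250
market <- rnorm(n, mean = 0.05, sd = 1)       # hypothetical market returns (%)
asset  <- 0.1 + 1.2 * market + rnorm(n, sd = 0.5)

# Step 3: estimate the model by ordinary least squares
fit <- lm(asset ~ market)

# Step 4: one simple adequacy check: are the residuals plausibly normal?
adequacy <- shapiro.test(residuals(fit))

# Step 5: interpret the estimates (intercept and slope)
estimates <- coef(fit)
print(estimates)
print(adequacy$p.value)
```

If the adequacy check failed, the model would be reformulated before moving on to interpretation and use.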

1.4 Data Types & Data Aggregation

Data types
• There are 3 types of data which econometricians might use for analysis:
1. Time series data
2. Cross-sectional data
3. Panel data, a combination of 1. and 2.
• The data may be quantitative (e.g. exchange rates, stock prices, number of shares outstanding), or qualitative (e.g. day of the week, month seasonality)
• Examples of time series data:
Series                           Frequency
GNP or unemployment              monthly, or quarterly
government budget deficit        annually
money supply                     …
value of a stock market index    as transactions occur

Time Series versus Cross-sectional Data
• Examples of problems that could be tackled using a time series regression:
– How the value of a country’s stock index has varied with that country’s macroeconomic fundamentals
– How the value of a company’s stock price has varied when it announced the value of its dividend payment
– The effect on a country’s currency of an increase in its interest rate
• Cross-sectional data are data on one or more variables collected at a single point in time, e.g.
– The age of investors who use internet stock broking services
– Cross-section of stock returns on the Stock
– A sample of bond credit ratings for UK banks

Cross-sectional and Panel Data
• Examples of problems that could be tackled using a cross-sectional regression:
– The relationship between company size and the return to investing in its shares
– The relationship between a country’s GDP level and the probability that the government will default on its sovereign debt
• Panel data have the dimensions of both time series and cross-sections, e.g. the daily prices of a number of blue chip stocks over two years
• It is common to denote each observation by the letter t and the total number of observations by T for time series data, and to denote each observation by the letter i and the total number of observations by N for cross-sectional data
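
As a minimal sketch, the three data types and the t/T and i/N notation can be laid out as R data frames. All names (ts_data, cs_data, panel, T_obs, N_obs) and all values are made up for illustration.

```r
# Time series: one entity observed over t = 1, ..., T
T_obs   <- 5
ts_data <- data.frame(t = 1:T_obs,
                      price = c(100, 101, 99.5, 102, 103))

# Cross-section: i = 1, ..., N entities observed at a single point in time
N_obs   <- 3
cs_data <- data.frame(i = 1:N_obs,
                      firm = c("A", "B", "C"),
                      ret  = c(0.8, -0.3, 1.1))

# Panel: every entity i observed in every period t, giving N * T rows
panel       <- expand.grid(t = 1:T_obs, i = 1:N_obs)
panel$price <- 100 + panel$t + 10 * panel$i   # made-up prices

print(dim(panel))   # N * T rows, 3 columns
```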

Continuous and Discrete Data
• Continuous data can take on any value and are not confined to take specific numbers.
• Their values are limited only by precision.
– For example, the rental yield on a property could be 6.2%, 6.24%, or 6.238%.
• On the other hand, discrete data can only take on certain values, which are usually integers.
– For instance, the number of people in a particular underground carriage or the number of shares traded during a day.
• They do not necessarily have to be integers (whole numbers) though, and are often defined to be count numbers.
– For example, until recently when they became ‘decimalised’, many financial asset prices were quoted to the nearest 1/16 or 1/32 of a dollar.

Cardinal, Ordinal and Nominal Numbers
• Another way in which we could classify numbers is according to whether they are cardinal, ordinal, or nominal.
• Cardinal numbers are those where the actual numerical values that a particular variable takes have meaning, and where there is an equal distance between the numerical values.
– Examples of cardinal numbers would be the price of a share or of a building, and the number of houses in a street.
• Ordinal numbers can only be interpreted as providing a position or an ordering.
– Thus, for cardinal numbers, a figure of 12 implies a measure that is ‘twice as good’ as a figure of 6. On the other hand, for an ordinal scale, a figure of 12 may be viewed as ‘better’ than a figure of 6, but could not be considered twice as good. An example of ordinal numbers would be the position of a runner in a race.

• Nominal numbers occur where there is no natural ordering of the values at all.
– Such data often arise when numerical values are arbitrarily assigned, such as telephone numbers, or when codings are assigned to qualitative data (e.g. when describing the exchange that a US stock is traded on).
• Cardinal, ordinal and nominal variables may require different modelling approaches or at least different treatments, as should become evident in the subsequent chapters.
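
One way to see why the three kinds of numbers need different treatment is how they might be stored in R. This is a sketch only: the variable names and values are hypothetical, and using credit ratings and exchange names as examples is an assumption.

```r
# Cardinal: numerical values and the distances between them are meaningful
price <- c(6, 12, 9)
ratio <- price[2] / price[1]          # a ratio can be interpreted

# Ordinal: only the ordering carries information
rating <- factor(c("BBB", "AAA", "A"),
                 levels = c("BBB", "A", "AAA"), ordered = TRUE)
better <- rating[2] > rating[1]       # AAA ranks above BBB

# Nominal: labels with no natural ordering
exchange <- factor(c("NYSE", "NASDAQ", "NYSE"))
counts   <- table(exchange)           # counting is fine, ranking is not

print(ratio)
print(better)
print(counts)
```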

Further reading:
Brooks (2019), Chapter 1, pp. 2–6; Chapter 2, pp. 63–66.

2. Statistical Foundations
2.1 Revision

The population and the sample
• The population is the total collection of all objects to be studied.
• The population may be either finite or infinite, while a sample is a selection of just some items from the population.
• A population is finite if it contains a fixed number of elements.
• In general, either the observations for the entire population will not all be available, or they may be so many in number that it is infeasible to work with them, in which case a sample of data is taken for analysis.

The population and the sample, cont’d
• The sample is usually random, and it should be representative of the population of interest.
• A random sample is one in which each individual item in the population is equally likely to be drawn.

• FIN3018 students out of all QUB students: random?
• students sitting in the last row of this classroom out of all present students: random?
• random draw of 5 students out of all FIN3018 students: random?
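
The last case can be sketched with R's sample() function, which draws every element with equal probability by default. The class list students and its size of 120 are hypothetical.

```r
set.seed(1)
students <- paste0("student_", 1:120)   # hypothetical class list

# Simple random sample: every student is equally likely to be drawn
random_five <- sample(students, size = 5, replace = FALSE)

# Not random: a fixed block (e.g. the "last row") excludes everyone else
last_row <- tail(students, 5)

print(random_five)
print(last_row)
```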

2.2 Probability and Probability Distributions

• A random variable can take any value from a given set
• A discrete random variable can take on only certain specific values (e.g., the sum of two dice thrown)
• A probability is the likelihood of a particular event happening
• A probability distribution function shows the outcomes that are possible from a random process and how likely each one is to occur
• A continuous random variable can take any value (possibly only within a fixed range), and the probabilities associated with each range of outcomes are shown in a probability density function (pdf)

• The probability that a continuous variable takes on a specific value is always zero, since the variable could be defined to any arbitrary degree of accuracy (0.1 vs 0.0000001 etc.), and thus we can only calculate the probability that the variable lies within a particular range.
• There are many continuous distributions, including the uniform and the normal distribution.
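
A short sketch of range probabilities in R, using the built-in cdf functions pnorm() and punif(). The particular intervals and the uniform range [0, 10] are arbitrary choices for illustration.

```r
# P(-1 < X <= 1) for X ~ N(0, 1) is a difference of cdf values
p_range <- pnorm(1) - pnorm(-1)

# Shrinking an interval around the point 0.5 drives its probability towards zero
widths  <- c(0.1, 0.001, 0.00001)
p_point <- pnorm(0.5 + widths / 2) - pnorm(0.5 - widths / 2)

# Uniform on [0, 10]: probability of landing in [2, 5]
p_unif <- punif(5, min = 0, max = 10) - punif(2, min = 0, max = 10)

print(p_range)
print(p_point)
print(p_unif)
```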

The normal distribution
• The Normal (Gaussian) distribution is the most commonly used in statistics
• It has many desirable properties and is easy to work with
• It is unimodal (has only one peak) and symmetric
• The moments of a distribution describe its properties.
• The first two moments of a distribution are its mean and variance respectively
• For a normal distribution, knowledge of the mean and variance alone is enough to describe the distribution completely

• A normal distribution has a skewness of zero and a kurtosis of 3 (excess kurtosis of zero)
• Skewness and kurtosis are the (standardised) third and fourth moments of the distribution respectively.

Definitions
• The formula for the pdf of a normal distribution is given by:
$f(y) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(y-\mu)^2 / 2\sigma^2}$
• A standard normally distributed variable can be constructed from any normal random variable by subtracting its mean (μ) and dividing by its standard deviation (σ):
$Z = \frac{y - \mu}{\sigma} \sim N(0, 1)$
• The probability that a continuous random variable lies at or below a certain value is given by the cumulative distribution function (cdf); the probability that it lies above that value is one minus the cdf
• The cdf for a normal distribution has a sigmoid shape
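
As a check on these definitions, the pdf can be written out by hand and compared with R's built-in dnorm(), and the cdf can be evaluated before and after standardising. The values of mu, sigma and y are arbitrary illustrative choices.

```r
mu <- 2; sigma <- 1.5; y <- 3.2          # arbitrary illustrative values

# pdf written out from the formula, compared with R's built-in dnorm()
pdf_manual  <- 1 / (sigma * sqrt(2 * pi)) * exp(-(y - mu)^2 / (2 * sigma^2))
pdf_builtin <- dnorm(y, mean = mu, sd = sigma)

# Standardising: Z = (y - mu) / sigma follows N(0, 1)
z <- (y - mu) / sigma

# The cdf gives P(Y <= y); it is the same whether computed on y or on z
cdf_y <- pnorm(y, mean = mu, sd = sigma)
cdf_z <- pnorm(z)

print(c(pdf_manual, pdf_builtin, cdf_y, cdf_z))
```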

[Figure: plot of the pdf f(x) for a normal distribution N(μ,σ)]

Some comments on the central limit theorem (CLT)
• If we take N independent draws from a normally distributed random variable with population mean μ and variance σ², then the sample mean will also be normally distributed, with mean μ and variance σ²/N
• In fact, the sample mean of independent draws from any random variable with finite variance (whether normally distributed or not) will tend towards a normal distribution as the sample size tends to infinity.
• This is known as the central limit theorem.
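
The CLT can be illustrated with a small simulation. This sketch assumes draws from an exponential distribution with rate 1 (mean 1, variance 1) as the non-normal example; the sample size of 50 and the 5000 replications are arbitrary.

```r
set.seed(123)
n_draws <- 50        # sample size N
n_reps  <- 5000      # number of simulated samples

# Draw from an exponential distribution (clearly non-normal, mean 1, variance 1)
sample_means <- replicate(n_reps, mean(rexp(n_draws, rate = 1)))

# The CLT suggests the sample means are roughly N(1, 1 / N)
print(mean(sample_means))          # should be close to 1
print(var(sample_means))           # should be close to 1 / 50
```

Plotting density(sample_means) would show the bell shape emerging even though the underlying draws are strongly skewed.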

Other important distributions
• Three other important continuous distributions are the chi-squared (χ²), the F and the t (sometimes known as Student’s t)
• The sum of squares of n independent standard normal random variables follows a χ² distribution with n degrees of freedom
• The ratio of two independent χ² variables, each divided by its respective degrees of freedom n1 and n2, follows an F-distribution with n1 and n2 degrees of freedom
• The t distribution tends to the normal as its degrees of freedom increase towards infinity
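
Two of these statements can be checked numerically. The sketch below simulates the sum of n squared standard normal draws and compares t critical values with the normal one; n = 4, the number of replications and the 97.5% quantile are arbitrary choices.

```r
set.seed(7)
n    <- 4
reps <- 20000

# Sum of squares of n independent N(0, 1) draws: chi-squared with n d.o.f.
chi_sim <- replicate(reps, sum(rnorm(n)^2))
print(mean(chi_sim))               # the chi-squared mean equals n

# t critical values approach the normal one as degrees of freedom grow
t_crit <- qt(0.975, df = c(5, 30, 1000))
z_crit <- qnorm(0.975)
print(t_crit)
print(z_crit)
```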

2.3 Descriptive Statistics

Usage of Descriptive Statistics
• in the literature, descriptive statistics are a way to quickly outline the shape of the data
• What do we deal with? What are we looking into?
• this should support the model choice and methodology
• almost all papers with some econometric background have a descriptive statistics table in the data description

Measures of central tendency / Centrality
• The average value of a series is its measure of location or measure of central tendency, capturing its typical behaviour
• There are three broad methods to calculate the average value of a series: the mean, median and mode
• The mean is the very familiar sum of all N observations divided by N
• More strictly, this is known as the arithmetic mean
$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

R command for the mean of a series (exampleseries):
mean(exampleseries)
Cumulative example at the end of this subsection.

• The mode is the most frequently occurring value in a set of observations
• The median is the middle value in a series when the observations are arranged in ascending order
Each of the three methods of calculating an average has advantages and disadvantages
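
A minimal sketch of all three averages in R, using the series v from the summary statistics example in this subsection. Base R's mode() reports the storage type rather than the statistical mode, so the helper stat_mode below is a hypothetical name introduced here, not a built-in function.

```r
# Series taken from the summary statistics example in this subsection
v <- c(5, 6, 6, 5.5, 7, 5.4, 6.1, 5.6, 6.8, 7.2, 4.9)

v_mean   <- mean(v)
v_median <- median(v)

# Tabulate the values and return the most frequent one(s)
# (stat_mode is a hypothetical helper name)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}
v_mode <- stat_mode(v)

print(c(mean = v_mean, median = v_median, mode = v_mode))
```

The helper returns several values if there is a tie, which is one of the practical disadvantages of the mode.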

R command for calculating the median of a series (exampleseries):
median(exampleseries)
Cumulative example at the end of this subsection.

Measures of spread
• the spread of a series around its mean value can be measured using the variance or standard deviation (which is the square root of the variance)
• this quantity is an important measure of risk in finance
• the standard deviation scales with the data, whereas the variance scales with the square of the data
• if the units of the data points are US dollars, the standard deviation will also be measured in dollars whereas the variance will be in dollars squared

R command for calculating the standard deviation of a series (exampleseries):
sd(exampleseries)
Cumulative example at the end of this subsection.

• Other measures of spread include the range (the difference between the largest and smallest of the data points) and the semi-interquartile range (half the difference between the third and first quartile points in the series)
• The coefficient of variation divides the standard deviation by the sample mean to obtain a unit-free measure of spread that can be compared across series with different scales.
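
The measures of spread can be computed in base R as sketched below, again using the series v from this subsection. The sketch takes the semi-interquartile range to be half the interquartile range, and relies on quantile() with its default interpolation method; other quantile conventions give slightly different quartiles.

```r
v <- c(5, 6, 6, 5.5, 7, 5.4, 6.1, 5.6, 6.8, 7.2, 4.9)   # series from this subsection

v_var   <- var(v)                       # sample variance (divides by N - 1)
v_sd    <- sd(v)                        # square root of the variance
v_range <- max(v) - min(v)              # largest minus smallest value
q       <- quantile(v, c(0.25, 0.75))   # first and third quartiles
v_iqr   <- unname(q[2] - q[1])          # interquartile range
v_siqr  <- v_iqr / 2                    # semi-interquartile range
v_cv    <- v_sd / mean(v)               # coefficient of variation (unit-free)

print(c(sd = v_sd, range = v_range, iqr = v_iqr, siqr = v_siqr, cv = v_cv))
```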

Higher moments
The higher moments of a data sample give further indications of its features and shape.
• Skewness is the standardised third moment of a distribution and indicates the extent to which it is asymmetric
$\gamma_s = E\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$
• Kurtosis is the standardised fourth moment and measures whether a series is fat or thin tailed
$\text{kurtosis} = E\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$
• Skewness can be positive or negative while kurtosis can only be positive
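
Both formulas can be evaluated directly in base R by replacing the expectation with a sample average. The sketch below uses the series v from this subsection and standardises with the divide-by-N standard deviation; other sample conventions (for example bias-corrected versions) give slightly different numbers.

```r
v <- c(5, 6, 6, 5.5, 7, 5.4, 6.1, 5.6, 6.8, 7.2, 4.9)   # series from this subsection

# Standardise with the population-style standard deviation (divide by N)
m <- mean(v)
s <- sqrt(mean((v - m)^2))
z <- (v - m) / s

skew        <- mean(z^3)     # standardised third moment
kurt        <- mean(z^4)     # standardised fourth moment (3 for a normal)
excess_kurt <- kurt - 3      # excess kurtosis relative to the normal

print(c(skewness = skew, kurtosis = kurt, excess = excess_kurt))
```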

Calculating the skewness of a series (exampleseries) in R requires loading the moments package first and then executing the skewness command:
library(moments)
skewness(exampleseries)
The kurtosis is calculated with:
kurtosis(exampleseries)
Cumulative example at the end of this subsection.

[Figure: plot of a skewed series versus a normal distribution]

[Figure: plot of a leptokurtic series versus a normal distribution]

R example of summary statistics of a series v:
v <- c(5,6,6,5.5,7,5.4,6.1,5.6,6.8,7.2,4.9)

[Figure: R output of the density plot of v]

2.4 Returns in Financial Modelling

simple returns vs. continuous/logarithmic returns
let $P_{t-1} > 0$ and $P_t > 0$ be two consecutive prices (e.g. tick-prices, daily, or monthly prices, etc.), then:
simple returns
$R_t := \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} - 1 \in (-1, \infty)$
logarithmic returns¹
$r_t := \ln(1 + R_t) = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1}) \in (-\infty, +\infty)$
¹ we use the natural logarithm (ln), which is only defined for values greater than zero

R command for calculating the logarithmic returns (in percent) of a price series (prices):
returns = c(NA,100*diff(log(prices)))
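
A short sketch comparing the two return definitions on a hypothetical price series. Unlike the command above, the sketch leaves out the leading NA and the scaling by 100 so that the link between the two definitions can be checked directly.

```r
prices <- c(100, 102, 101, 105, 103)            # hypothetical price series

simple_ret <- diff(prices) / head(prices, -1)   # R_t = P_t / P_{t-1} - 1
log_ret    <- diff(log(prices))                 # r_t = ln(P_t) - ln(P_{t-1})

# The two definitions are linked by r_t = ln(1 + R_t)
link_check <- all.equal(log_ret, log(1 + simple_ret))

print(round(100 * simple_ret, 3))
print(round(100 * log_ret, 3))
print(link_check)
```

For small price changes the two returns are nearly identical; the log return is never larger than the simple return.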

Now we face a serious problem:
• which of the two, simple returns (Rt) or log-returns (rt), is to be assumed normally distributed?
• but first: what does normally distributed mean?

Normal or Gaussian distribution
• refers to a random variable
• if this random variable follows a Normal distribution, it is said to be normally distributed
• the Normal distribution is one of the simplest probability distributions
• it is characterized by two parameters:
– expected value: μ ∈ R
– variance: σ² > 0
• if you draw from a normal distribution, you should obtain a value around the mean, though how far it deviates depends on the variance
• with many draws, we obtain a bell curve

Example: standard Normal distribution N(0, 1)
⇒ mean 0 and variance 1; if we draw from N(0, 1) we obtain values that vary around 0
[Figure: density plot for N(0,1), N(1,1), and N(0,0.5)]

R command for plotting the density of a series (here for a return series returns; na.rm = TRUE drops the leading NA created when the returns were calculated):
plot(density(returns, na.rm = TRUE))

Which return should be assumed to be normally distributed?
• Markowitz and the CAPM assume that $R_t \sim N(\mu_t, \sigma_t^2)$, i.e. that simple returns are normally distributed
• problem:
– $R_t \in (-1, \infty)$, but with $R_t \sim N(\mu_t, \sigma_t^2)$ it holds that $P(R_t < -1) > 0$
– with this assumption, negative prices are possible ⟹ contradiction
– returns have to be limited at −100%, which would be a total loss
– if we assume simple returns to be normally distributed, we can lose more than 100%
– for shares, this does not make any sense
⟹ we hence assume that logarithmic returns are normally distributed: $r_t \sim N(\mu_t, \sigma_t^2)$.
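
The size of the problem can be made concrete with pnorm(). The mean of 8% and standard deviation of 40% below are hypothetical figures chosen only for illustration.

```r
# Hypothetical figures: mean return 8%, standard deviation 40%
mu    <- 0.08
sigma <- 0.40

# Probability of a simple return below -100% if R_t were normal
p_below <- pnorm(-1, mean = mu, sd = sigma)
print(p_below)                      # small, but strictly positive

# With normal log returns, the implied simple return exp(r_t) - 1 stays above -1
set.seed(99)
r_log    <- rnorm(10000, mean = mu, sd = sigma)
R_simple <- exp(r_log) - 1
print(min(R_simple))
```

Because the exponential function is strictly positive, no draw of a normal log return can imply a negative price.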

Log Returns
• Log returns are also known as log price relatives
1. They have the nice property that they can be interpreted as continuously compounded returns.
2. They can be added up over time, e.g. if we want a weekly return and we have calculated daily log returns:
$r_1 = \ln(p_1/p_0) = \ln p_1 - \ln p_0$
$r_2 = \ln(p_2/p_1) = \ln p_2 - \ln p_1$
$r_3 = \ln(p_3/p_2) = \ln p_3 - \ln p_2$
$r_4 = \ln(p_4/p_3) = \ln p_4 - \ln p_3$
$r_5 = \ln(p_5/p_4) = \ln p_5 - \ln p_4$
$r_1 + r_2 + r_3 + r_4 + r_5 = \ln p_5 - \ln p_0 = \ln(p_5/p_0)$
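
The telescoping sum can be verified numerically. The six prices below are hypothetical; the sketch also shows that simple returns compound rather than add.

```r
p <- c(100, 101, 99, 102, 104, 103)   # hypothetical prices p0, ..., p5

daily_log  <- diff(log(p))            # r1, ..., r5
weekly_log <- sum(daily_log)          # r1 + ... + r5

# Direct weekly log return from the first and last price
weekly_direct <- log(p[6] / p[1])

# Simple returns do not add up: they compound instead
daily_simple  <- diff(p) / head(p, -1)
weekly_simple <- prod(1 + daily_simple) - 1

print(c(weekly_log, weekly_direct, sum(daily_simple), weekly_simple))
```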

A Disadvantage of using Log Returns
• There is a disadvantage of using log-returns. The simple return on a portfolio of assets is a weighted average of the simple returns on the individual assets:
$R_{pt} = \sum_{i=1}^{N} w_i R_{it}$
• But this does not work for the continuously compounded returns.
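
A numerical sketch of this point, with hypothetical weights and asset returns: the weighted average of the assets' log returns differs from the log return of the portfolio.

```r
w    <- c(0.5, 0.3, 0.2)            # hypothetical portfolio weights (sum to 1)
R_it <- c(0.04, -0.02, 0.10)        # hypothetical simple returns of three assets

# Simple portfolio return: weighted average of the simple asset returns
R_pt <- sum(w * R_it)

# Log return of the portfolio versus the weighted average of asset log returns
r_pt_true     <- log(1 + R_pt)
r_pt_weighted <- sum(w * log(1 + R_it))

print(c(R_pt, r_pt_true, r_pt_weighted))
```

Because the logarithm is concave, the weighted average of log returns understates the portfolio's log return whenever the asset returns differ.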

2.5 Real and Nominal Series

Real Versus Nominal Series
• The general level of prices has a tendency to rise most of the time because of inflation
• We may wish to transform nominal series into real ones to adjust them for inflation
• This is called deflating a series or displaying a series at