
Non-Parametrics
Chris Hansman
Empirical Finance: Methods and Applications, Imperial College Business School
March 1-2, 2021

Non-Parametrics
1. Kernel Density Estimation
2. Non-Parametric Regression

Kernel Density Estimation
1. Parametric vs. non-parametric approaches
2. Histograms and the uniform kernel
3. Different bandwidths
4. Different kernels

Estimating Densities
- Suppose we see n = 100 draws from a continuous random variable X: x_1, x_2, \dots, x_n
- We are often interested in the distribution of X:
  - CDF: F_X(u) = P(X \le u)
  - PDF: f_X(u) = \frac{dF_X(u)}{du}
- How do we uncover the distribution of X from the data?

Scatter Plot of x_1, x_2, \dots, x_n
[Figure: scatter plot of the 100 draws, ranging roughly from -20 to 20]

Estimating Densities
- How do we uncover the distribution of X from the data x_1, x_2, \dots, x_n?
- Parametric approach:
  - One strategy is to assume we know the form of the distribution
    - e.g. normal or \chi^2
  - But we don't know the particular parameters:
    - Use the data to estimate the unknown parameters
  - For example: we know X \sim N(\mu, \sigma^2), but we don't know \mu or \sigma^2
    - Estimate: \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i
    - Estimate: \hat{\sigma}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \hat{\mu})^2
  - Plot N(\hat{\mu}, \hat{\sigma}^2) (see the R sketch below)
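A minimal R sketch of this parametric approach, assuming the 100 draws sit in a numeric vector `draws` (simulated here purely as a placeholder, not course data):

```r
## Parametric fit: assume the draws come from N(mu, sigma^2) and estimate both.
## `draws` is a stand-in for the observed x_1, ..., x_n.
set.seed(1)
draws <- rnorm(100, mean = -0.75, sd = 9.24)   # placeholder data

mu_hat    <- mean(draws)   # sample mean
sigma_hat <- sd(draws)     # uses the n - 1 denominator, as on the slide

hist(draws, freq = FALSE, breaks = 20, xlab = "x", main = "Parametric (normal) fit")
curve(dnorm(x, mean = mu_hat, sd = sigma_hat), add = TRUE, lwd = 2)
```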

Normal Density with Estimated \hat{\mu} = -0.75, \hat{\sigma} = 9.24
[Figure: fitted normal density N(\hat{\mu}, \hat{\sigma}^2), x from -20 to 20, y-axis "Density"]

Downsides of the Parametric Approach
- In practice, we often don't know the underlying distribution
  - e.g. the assumption of normality may provide a very bad fit
- Non-parametric approach:
  - No assumptions about the underlying distribution
  - Recover it directly from the data
  - Simplest form: the histogram

Histogram Built from x_1, x_2, \dots, x_n
[Figure: histogram of the draws, x from -20 to 20, y-axis "Density"]

Histograms
- Histograms (appropriately scaled) provide a non-parametric estimate of the density
- But there are a few downsides:
  - They don't provide a smooth, continuous distribution
  - Lots of holes in the distribution when the bins are small
  - Uninformative when the bins are big

Histogram Built from x_1, x_2, \dots, x_n: Bin Size = 1
[Figure: histogram of the draws with bin size 1]

Histogram Built from x_1, x_2, \dots, x_n: Bin Size = 5
[Figure: histogram of the draws with bin size 5]

Histogram Built from x_1, x_2, \dots, x_n: Bin Size = 20
[Figure: histogram of the draws with bin size 20]

Kernel Density Estimation: Uniform Kernel
- To uncover smoother non-parametric densities we use a technique called kernel density estimation
- There are many different versions ("choices of kernel"), but let's start with one very similar to a histogram
- Suppose we are interested in estimating \hat{f}(u) for any u
  - First, let's count how many x_i are "near" u
  - We'll define "near" as within 1/2 of u in either direction:

    \text{number of } x_i \text{ near } u = \sum_{i=1}^{n} 1\left\{|u - x_i| \le \tfrac{1}{2}\right\}

[Diagram: the window from u - 1/2 to u + 1/2 around the point u]

Kernel Density Estimation: Uniform Kernel
- To turn this count into a density, just scale by n:

    \hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n} 1\left\{|u - x_i| \le \tfrac{1}{2}\right\}

- Average number of x_i near u (per unit of x), scaled by the n observations
  - A density
- Note that \int_{-\infty}^{\infty} \hat{f}(u)\,du = 1
- A short R sketch of this estimator follows below
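A minimal R sketch of this uniform-kernel estimator, again assuming the observations sit in a numeric vector `draws` (simulated here as a placeholder):

```r
## Uniform-kernel density estimate at a point u: the share of observations
## within 1/2 of u (window width 1, so no further scaling is needed).
draws <- rnorm(100, mean = -0.75, sd = 9.24)        # placeholder data

f_hat_uniform <- function(u, x) mean(abs(u - x) <= 0.5)

f_hat_uniform(0, draws)                             # estimate of f(0)
u_grid <- seq(-20, 20, by = 0.1)
plot(u_grid, sapply(u_grid, f_hat_uniform, x = draws),
     type = "l", xlab = "u", ylab = "Density")
```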

Kernel Density Estimation: Uniform Kernel
[Figure: uniform-kernel density estimate of the draws, x from -20 to 20, y-axis "Density"]

Kernel Density Estimation: Uniform Kernel
- Naturally, we can adjust the definition of "near" depending on the context
- For example, define "near" as within 1 of u in either direction:

    \sum_{i=1}^{n} 1\{|u - x_i| \le 1\}

- Doubling "near" ⇒ divide by 2 to keep things comparable:

    \sum_{i=1}^{n} \frac{1\{|u - x_i| \le 1\}}{2}

  - Number of x_i near u per unit of x
- To get a density:

    \hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n} \frac{1\{|u - x_i| \le 1\}}{2}

Kernel Density Estimation: Uniform Kernel
- We call the function

    K(z) = \frac{1\{|z| \le 1\}}{2}

  the uniform (or box, or rectangular) kernel
- Note that above we evaluate:

    K(u - x_i) = \frac{1\{|u - x_i| \le 1\}}{2}

- We can write the density in terms of the kernel:

    \hat{f}(u) = \frac{1}{n}\sum_{i=1}^{n} K(u - x_i)

What Defines a Kernel?
- Typically, a kernel is a function K(\cdot) that satisfies two properties:
  1. K(\cdot) integrates to 1: \int_{-\infty}^{\infty} K(z)\,dz = 1
  2. Symmetry: K(-z) = K(z)
- You can think of it as a weighting function (a quick numerical check is sketched below)
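As a quick illustration (not from the slides), these two properties can be checked numerically in R for the uniform and Gaussian kernels:

```r
## Check the two kernel properties: integrates to 1, and symmetric.
K_unif <- function(z) 0.5 * (abs(z) <= 1)           # uniform (box) kernel

integrate(K_unif, lower = -1, upper = 1)$value      # = 1 (K is zero outside [-1, 1])
integrate(dnorm, lower = -Inf, upper = Inf)$value   # Gaussian kernel: ~ 1
K_unif(0.3) == K_unif(-0.3)                         # symmetry: TRUE
```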

Kernel Density Estimation: Different Bandwidths
    K(u - x_i) = \frac{1\{|u - x_i| \le 1\}}{2}

- By adjusting the definition of "near" u, we get smoother densities
- For example, define "near" as within 3 of u:

    \text{number of } x_i \text{ within 3 of } u = \sum_{i=1}^{n} 1\left\{\frac{|u - x_i|}{3} \le 1\right\}

- Average number of x_i near u (per unit):

    \frac{\text{number of } x_i \text{ near } u}{\text{unit}} = \frac{1}{6}\sum_{i=1}^{n} 1\left\{\frac{|u - x_i|}{3} \le 1\right\} = \frac{1}{3}\sum_{i=1}^{n} K\left(\frac{u - x_i}{3}\right)

- Then we can estimate the density as:

    \hat{f}(u) = \frac{1}{n}\cdot\frac{1}{3}\sum_{i=1}^{n} K\left(\frac{u - x_i}{3}\right)

Uniform Kernel Density Estimation: Bandwidth=3
[Figure: uniform-kernel density estimate with bandwidth 3, x from -20 to 20, y-axis "Density"]

Kernel Density Estimation: Different Bandwidths
    K(u - x_i) = \frac{1\{|u - x_i| \le 1\}}{2}

- In general, we can estimate our density as:

    \hat{f}_h(u) = \frac{1}{n}\cdot\frac{1}{h}\sum_{i=1}^{n} K\left(\frac{u - x_i}{h}\right) = \frac{1}{n}\sum_{i=1}^{n} K_h(u - x_i)

- We call h the bandwidth
- Larger bandwidth ⇒ smoother
- Note that for any choice of h:

    \int_{-\infty}^{\infty} \hat{f}_h(u)\,du = 1

- A general-purpose version of this estimator is sketched in the code below
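A general-purpose sketch of this estimator in R, with the kernel K and bandwidth h as arguments; `draws` is again a simulated placeholder for the data:

```r
## General kernel density estimator: f_hat_h(u) = (1/(n*h)) * sum_i K((u - x_i)/h).
draws <- rnorm(100, mean = -0.75, sd = 9.24)        # placeholder data

kde_at <- function(u, x, h, K = function(z) 0.5 * (abs(z) <= 1)) {
  mean(K((u - x) / h)) / h
}

kde_at(0, draws, h = 3)              # uniform (box) kernel, bandwidth 3
kde_at(0, draws, h = 6)              # larger bandwidth => smoother estimate
kde_at(0, draws, h = 3, K = dnorm)   # swap in a Gaussian kernel
```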

Kernel Density Estimation: Bandwidth=6
[Figure: uniform-kernel density estimate with bandwidth 6, x from -20 to 20, y-axis "Density"]

Kernel Density Estimation: Different Kernels
- The uniform kernel is one of the simplest:

    \frac{1}{n}\sum_{i=1}^{n} K(u - x_i) = \frac{1}{n}\sum_{i=1}^{n} \frac{1\{|u - x_i| \le 1\}}{2}

- Many other choices of kernel do a better job
- In fact, we can choose any function K(z) such that:

    \int_{-\infty}^{\infty} K(z)\,dz = 1

- Common choice: the Gaussian kernel

    K(z) = \phi(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}z^2}

Kernel Density Estimation: Different Kernels
- For any choice of K_h: K_h(u - x_i) gives a weight for observation x_i
- Uniform (h = 1):
  - Weight = 1/2 if x_i is within 1 of u
  - 0 otherwise
- Gaussian:
  - Weight is positive for all x_i
  - But declines with distance from u
- By taking the average of these weights (across all x_i), we get an estimate of the density at any point u:

    \hat{f}_h(u) = \frac{1}{n}\sum_{i=1}^{n} K_h(u - x_i)

- A sketch using R's built-in density() follows below
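In practice you rarely hand-roll this: base R's density() computes the same kernel-weighted average over a grid of points, with the kernel and bandwidth as arguments. A short sketch, again on placeholder data:

```r
## Built-in kernel density estimation with different kernels.
draws <- rnorm(100, mean = -0.75, sd = 9.24)             # placeholder data

plot(density(draws, kernel = "gaussian"), main = "Kernel density estimates")
lines(density(draws, kernel = "epanechnikov"), lty = 2)
lines(density(draws, kernel = "rectangular"), lty = 3)   # the uniform / box kernel
```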

Different Kernels
[Figure: comparison of different kernel functions]

Different Kernels
[Figure: density estimates using the Uniform, Gaussian, and Epanechnikov kernels, overlaid on the data; x from -20 to 20, y-axis "Density"]

Kernel Density Estimation: Epanechnikov
- A frequently used kernel is the Epanechnikov:

    K(z) = \begin{cases} \frac{3}{4\sqrt{5}}\left(1 - \frac{z^2}{5}\right) & \text{if } z^2 < 5 \\ 0 & \text{otherwise} \end{cases}

- Optimal under certain assumptions; the default in many software packages
- But the difference between this and, e.g., the Gaussian is not huge

Different Kernels
[Figure: density estimates using the Uniform, Gaussian, and Epanechnikov kernels, overlaid on the data]

Kernel Density Estimation: Bandwidth Choice
- The choice of bandwidth (h) often matters a lot more
- Many different approaches to choose the bandwidth optimally
- Most software will have a decent bandwidth choice built in as the default
- One rule of thumb (works well when the underlying data is normal) is

    h = 1.06\,\hat{\sigma}\,n^{-1/5}

Bandwidth (h) too Big
[Figure: over-smoothed density estimate]

Bandwidth (h) too Small
[Figure: under-smoothed, jagged density estimate]

Bandwidth: h = 1.06\,\hat{\sigma}\,n^{-1/5}
[Figure: density estimate with the rule-of-thumb bandwidth]

Kernel Density Estimation
- Write a kernel density estimator with:
  - Epanechnikov kernel
  - Rule-of-thumb bandwidth
- What is \hat{f}(0)?
- Hints:
  - The Epanechnikov function can be accessed by installing the "kader" package
  - You can implement it with kader:::epanechnikov(x)
- (A sketch of one possible solution appears at the end of these notes)

Non-Parametric Regression
1. Nearest-Neighbors
2. Nadaraya-Watson
3. Local Polynomial Regression

Non-Parametric Regression
- Given y and x, we previously wrote:

    y = E[y|x] + \varepsilon

- We showed that the OLS estimator provides the best linear approximation of the conditional mean function
- In many settings, E[y|x] = h(x) is a non-linear function
  - Known functional form: can often use OLS to estimate parameters
  - For example: \beta_0, \beta_1, \beta_2 in E[y|x] = \beta_0 + \beta_1 x + \beta_2 x^2
  - Or other methods if non-additive
- But we often do not know the functional form

Non-Parametric Regression

    h(u) = E[y|x = u]

- Given data: (y_1, x_1), (y_2, x_2), \dots, (y_n, x_n)
- Goal: estimate \hat{h}(u) without knowing the functional form
- Simple approach: Nearest Neighbors (local averaging)
  - For any u, define K_u as the set of the K individuals with x_i nearest to u
    - The K observations (y_i, x_i) with the smallest values of |u - x_i|
  - For any point u, define:

    \hat{h}(u) = \frac{\sum_{i \in K_u} y_i}{K}

Nearest Neighbors Regression (K = 3)
[Figure, built up over several slides: observations x_1, \dots, x_5 on a line, with the evaluation point u between x_3 and x_4; the three nearest observations are x_2, x_3, x_4, so \hat{h}(u) = (y_2 + y_3 + y_4)/3]

Non-Parametric Regression: Nearest Neighbors

    \hat{h}(u) = \frac{\sum_{i \in K_u} y_i}{K}

- Downsides of nearest neighbors:
  - Problems in the extremes: suppose h(u) is an increasing function:
    - For small u all of the neighbors will be above u
    - But a slightly larger u has almost exactly the same neighbors
    - Awkward flattening in the extremes
  - Big jumps as large values of y_i enter K_u

Non-Parametric Regression: Nadaraya-Watson
- Instead of averaging y_i for x_i close to u, take a weighted average of all y_i:

    \hat{h}(u) = \sum_{i=1}^{n} \omega_i(u) y_i

- With \sum_{i=1}^{n} \omega_i(u) = 1 for any u
- If we choose \omega_i(u) = \frac{1}{n} for all i, u, we get:

    \hat{h}(u) = \sum_{i=1}^{n} \frac{1}{n} y_i = \bar{y}

  - Not very informative
- Choose \omega_i that give higher weight to observations with x_i close to u

Non-Parametric Regression: Nadaraya-Watson

    \hat{h}(u) = \sum_{i=1}^{n} \omega_i(u) y_i

- Choose \omega_i that give higher weight to observations with x_i close to u
- What about:

    \omega_i(u) = K\left(\frac{u - x_i}{h}\right)

  - Where K(\cdot) is a kernel (e.g. Gaussian)
  - Gives higher weight to observations with x_i close to u
  - Does not necessarily sum to 1...
- Solution:

    \omega_i(u) = \frac{K\left(\frac{u - x_i}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{u - x_j}{h}\right)}

Non-Parametric Regression: Nadaraya-Watson

    \hat{h}(u) = \sum_{i=1}^{n} \omega_i(u) y_i

- We want to choose \omega_i that give higher weight to observations with x_i close to u
- Hence, the Nadaraya-Watson estimator for the point u is:

    \hat{h}(u) = \sum_{i=1}^{n} \frac{K\left(\frac{u - x_i}{h}\right)}{\sum_{j=1}^{n} K\left(\frac{u - x_j}{h}\right)}\, y_i

Non-Parametric Regression: Local Polynomial Regression

    h(x) = E[y|x]

- Consider a Taylor expansion at a point \tilde{x} close to u:

    h(\tilde{x}) \approx h(u) + h^{(1)}(u)(\tilde{x} - u) + \frac{h^{(2)}(u)}{2!}(\tilde{x} - u)^2 + \dots + \frac{h^{(p)}(u)}{p!}(\tilde{x} - u)^p
                = \beta_0 + \beta_1(\tilde{x} - u) + \beta_2(\tilde{x} - u)^2 + \dots + \beta_p(\tilde{x} - u)^p

- Where \beta_0 = h(u), \beta_1 = h'(u), \dots
- Key idea: estimate \hat{\beta}_0 at any u

Non-Parametric Regression: Local Polynomial Regression
- Key idea: estimate \hat{\beta}_0 at any u
- If all x_i are close to u then

    y_i \approx \beta_0 + \beta_1(x_i - u) + \beta_2(x_i - u)^2 + \dots + \beta_p(x_i - u)^p

- Can just run a regression (for any given point u):

    \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1(x_i - u) - \dots - \beta_p(x_i - u)^p \right)^2

- Estimate the unknown \beta s, and specifically \beta_0
  - This is just a regression of y_i on (x_i - u), (x_i - u)^2, \dots, (x_i - u)^p
  - \hat{\beta}_0 = \hat{h}(u)
  - A different regression for each u

Non-Parametric Regression: Local Polynomial Regression

    \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1(x_i - u) - \dots - \beta_p(x_i - u)^p \right)^2

- Of course, as x_i gets far from u, the approximation is bad
- Solution: give more weight locally:

    \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1(x_i - u) - \dots - \beta_p(x_i - u)^p \right)^2 K\left(\frac{x_i - u}{h}\right)

- Where K(\cdot) is some kernel
- If p = 0 this is just Nadaraya-Watson
- If p = 1 this is local linear regression

Non-Parametrics
1. Kernel Density Estimation
2. Non-Parametric Regression
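To close, here is a minimal R sketch tying the two halves of the lecture together: the kernel density exercise above (an Epanechnikov kernel with the rule-of-thumb bandwidth, evaluated at 0), plus Nadaraya-Watson and local linear regression at a single point. The vectors `draws`, `x`, and `y` are simulated placeholders rather than course data, and the hand-written epanechnikov() below is used in place of the kader package mentioned in the hints.

```r
## Part 1: the exercise -- Epanechnikov kernel, rule-of-thumb bandwidth, f_hat(0).
set.seed(1)
draws <- rnorm(100, mean = -0.75, sd = 9.24)        # placeholder data

epanechnikov <- function(z) ifelse(z^2 < 5, 3 / (4 * sqrt(5)) * (1 - z^2 / 5), 0)
h_rot <- 1.06 * sd(draws) * length(draws)^(-1 / 5)  # h = 1.06 * sigma_hat * n^(-1/5)

kde_at <- function(u, x, h, K) mean(K((u - x) / h)) / h
kde_at(0, draws, h_rot, epanechnikov)               # estimate of f(0)

## Part 2: Nadaraya-Watson and local linear regression at a point u,
## both with a Gaussian kernel. `x` and `y` are again placeholders.
x <- runif(200, -3, 3)
y <- sin(x) + rnorm(200, sd = 0.3)

nw_at <- function(u, x, y, h) {
  w <- dnorm((u - x) / h)                # kernel weights (constants cancel in the ratio)
  sum(w * y) / sum(w)                    # weighted average of the y_i
}

loclin_at <- function(u, x, y, h) {
  w <- dnorm((u - x) / h)
  fit <- lm(y ~ I(x - u), weights = w)   # p = 1: regress y on (x_i - u), weighted by K
  unname(coef(fit)[1])                   # beta_0_hat = h_hat(u)
}

nw_at(0, x, y, h = 0.5)       # p = 0 case (Nadaraya-Watson)
loclin_at(0, x, y, h = 0.5)   # p = 1 case (local linear)
```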