
Non-Parametrics
Chris Hansman
Empirical Finance: Methods and Applications Imperial College Business School
Feb. 28-March 1, 2022


Non-Parametrics
1. Kernel Density Estimation
2. Non-Parametric Regression

Kernel Density Estimation
1. Parametric vs. non-parametric approaches
2. Histograms and the uniform kernel
3. Different bandwidths
4. Different kernels

Estimating Densities

- Suppose we see n = 100 draws from a continuous random variable X: x_1, x_2, ..., x_n
- We are often interested in the distribution of X:
  - CDF: F_X(u) = P(X ≤ u)
  - PDF: f_X(u) = dF_X(u)/du
- How do we uncover the distribution of X from the data?

Scatter Plot of x_1, x_2, ..., x_n

[figure]

Estimating Densities

- How do we uncover the distribution of X from the data x_1, x_2, ..., x_n?
- Parametric Approach
  - One strategy is to assume we know the form of the distribution
    - e.g. Normal or χ²
  - But we don't know the particular parameters:
    - Use the data to estimate the unknown parameters
  - For example: we know X ∼ N(μ, σ²), but we don't know μ or σ²
    - Estimate: μ̂ = (1/n) ∑_{i=1}^n x_i
    - Estimate: σ̂² = (1/(n−1)) ∑_{i=1}^n (x_i − μ̂)²
    - Plot N(μ̂, σ̂²) (a short code sketch follows below)
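
To make the parametric recipe concrete, here is a minimal R sketch. The simulated vector x is only a stand-in for the observed draws (its parameters echo the estimates on the next slide); it is not the lecture's data or code.

```r
# Parametric approach: assume X ~ N(mu, sigma^2) and plug in the estimates.
# The vector x is a simulated stand-in for the observed draws, not the lecture data.
set.seed(1)
x <- rnorm(100, mean = -0.75, sd = 9.24)

mu_hat    <- mean(x)                       # estimate of mu
sigma_hat <- sd(x)                         # uses the (n - 1) denominator

# Plot the fitted N(mu_hat, sigma_hat^2) density over a grid
grid <- seq(-20, 20, length.out = 200)
plot(grid, dnorm(grid, mean = mu_hat, sd = sigma_hat), type = "l",
     xlab = "x", ylab = "density")
rug(x)                                     # mark the raw draws along the axis
```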

Normal Density with Estimated μ̂ = −0.75, σ̂ = 9.24

[figure]

Downsides of the Parametric Approach

- In practice, we often don't know the underlying distribution
  - e.g. the assumption of normality may provide a very bad fit
- Non-parametric Approach
  - No assumptions about the underlying distribution
  - Recover it directly from the data
  - Simplest form: the histogram

Histogram Built from x_1, x_2, ..., x_n

[figure]

Histograms

- Histograms (appropriately scaled) provide a non-parametric estimate of the density (sketched in code below)
- But a few downsides:
  - They don't provide a smooth, continuous distribution
  - Lots of holes in the distribution when the bins are small
  - Uninformative when the bins are big
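
In R, hist() produces this kind of scaled histogram directly: freq = FALSE puts it on the density scale, and the breakpoints control the bin size. A minimal sketch, using simulated stand-in data and an illustrative bins() helper (neither is from the lecture):

```r
# Density-scaled histograms with different bin widths.
# The data and the bins() helper are illustrative, not the lecture's code.
set.seed(1)
x <- rnorm(100, mean = -0.75, sd = 9.24)    # stand-in for the observed draws

# Breakpoints of a given width that are guaranteed to cover the data
bins <- function(width) seq(min(x) - width, max(x) + width, by = width)

hist(x, freq = FALSE, breaks = bins(1))     # bin size 1: lots of holes
hist(x, freq = FALSE, breaks = bins(5))     # bin size 5
hist(x, freq = FALSE, breaks = bins(20))    # bin size 20: uninformative
```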

Histogram Built from x_1, x_2, ..., x_n: Bin Size = 1

[figure]

Histogram Built from x_1, x_2, ..., x_n: Bin Size = 5

[figure]

Histogram Built from x_1, x_2, ..., x_n: Bin Size = 20

[figure]

Kernel Density Estimation: Uniform Kernel

- To uncover smoother non-parametric densities we use a technique called kernel density estimation
- Many different versions ("choices of kernel"), but let's start with one very similar to a histogram
- Suppose we are interested in estimating f̂(u) for any u
  - First, let's count how many x_i are "near" u
  - We'll define "near" as within 1/2 of u in either direction:

    \text{number of } x_i \text{ near } u = \sum_{i=1}^{n} 1\left\{ |u - x_i| \le \tfrac{1}{2} \right\}

Kernel Density Estimation: Uniform Kernel

- To turn this count into a density, just scale by n:

    \hat{f}(u) = \frac{1}{n} \sum_{i=1}^{n} 1\left\{ |u - x_i| \le \tfrac{1}{2} \right\}

- Average number of x_i near u (per unit of x), scaled by the n observations
  - A density
- Note that ∫_{−∞}^{∞} f̂(u) du = 1 (a short code sketch of this estimator follows)
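
A minimal R sketch of the counting estimator evaluated at a single point u. The simulated data and the name f_hat are illustrative assumptions, not the lecture's code.

```r
# Counting estimator: f_hat(u) = (1/n) * #{x_i within 1/2 of u}.
# The data and the function name are illustrative, not from the lecture.
set.seed(1)
x <- rnorm(100, mean = -0.75, sd = 9.24)

f_hat <- function(u, x) {
  sum(abs(u - x) <= 1/2) / length(x)
}

f_hat(0, x)    # estimated density at u = 0
```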

Kernel Density Estimation: Uniform Kernel

[figure]

Kernel Density Estimation: Uniform Kernel

- Naturally, we can adjust the definition of "near" depending on the context
- For example, define "near" as within 1 of u in either direction:

    \text{number of } x_i \text{ near } u = \sum_{i=1}^{n} 1\{ |u - x_i| \le 1 \}

- Doubling "near" ⇒ divide by 2 to keep things comparable:

    \text{number of } x_i \text{ near } u \text{ per unit of } x = \sum_{i=1}^{n} \frac{1\{ |u - x_i| \le 1 \}}{2}

- To get a density:

    \hat{f}(u) = \frac{1}{n} \sum_{i=1}^{n} \frac{1\{ |u - x_i| \le 1 \}}{2}

Kernel Density Estimation: Uniform Kernel

- We call the function

    K(z) = \frac{1\{ |z| \le 1 \}}{2}

  the uniform (or box, or rectangular) kernel
- Note that above we evaluate:

    K(u - x_i) = \frac{1\{ |u - x_i| \le 1 \}}{2}

- We can write the density estimate in terms of the kernel:

    \hat{f}(u) = \frac{1}{n} \sum_{i=1}^{n} K(u - x_i)

What Defines a Kernel?

- Typically, a kernel is a function K(·) that satisfies two properties:
  1. It integrates to 1: \int_{-\infty}^{\infty} K(z)\, dz = 1
  2. Symmetry: K(−z) = K(z)
- You can think of it as a weighting function (both properties are checked numerically in the sketch below)
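
A quick numerical check of both properties for the uniform kernel, written in R for illustration (K_unif is an assumed name, not lecture code):

```r
# Numerical check of the two kernel properties for the uniform (box) kernel.
# K_unif is an illustrative name, not from the lecture code.
K_unif <- function(z) 0.5 * (abs(z) <= 1)

integrate(K_unif, lower = -1, upper = 1)$value   # 1: integrates to one (K is 0 outside [-1, 1])

z <- c(0.2, 0.7, 1.5)
all(K_unif(-z) == K_unif(z))                     # TRUE: symmetry
```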

Kernel Density Estimation: Different Bandwidths

- Recall the uniform kernel: K(u − x_i) = (1/2) · 1{|u − x_i| ≤ 1}
- By adjusting the definition of "near" u, we get smoother densities
- For example, define "near" as within 3 of u:

    \text{number of } x_i \text{ within 3 of } u = \sum_{i=1}^{n} 1\left\{ \frac{|u - x_i|}{3} \le 1 \right\}

- Average number of x_i near u (per unit):

    \frac{\text{number of } x_i \text{ near } u}{\text{unit}} = \frac{1}{6} \sum_{i=1}^{n} 1\left\{ \frac{|u - x_i|}{3} \le 1 \right\} = \frac{1}{3} \sum_{i=1}^{n} K\!\left( \frac{u - x_i}{3} \right)

- Then we can estimate the density as:

    \hat{f}(u) = \frac{1}{n} \cdot \frac{1}{3} \sum_{i=1}^{n} K\!\left( \frac{u - x_i}{3} \right)

Uniform Kernel Density Estimation: Bandwidth = 3

[figure]

Kernel Density Estimation: Different Bandwidths

- Recall the uniform kernel: K(u − x_i) = (1/2) · 1{|u − x_i| ≤ 1}
- In general, we can estimate our density as:

    \hat{f}_h(u) = \frac{1}{n} \cdot \frac{1}{h} \sum_{i=1}^{n} K\!\left( \frac{u - x_i}{h} \right) = \frac{1}{n} \sum_{i=1}^{n} K_h(u - x_i), \qquad \text{where } K_h(z) = \frac{1}{h} K\!\left( \frac{z}{h} \right)

- We call h the bandwidth (implemented in the sketch below)
- Larger bandwidth ⇒ smoother
- Note that for any choice of h:

    \int_{-\infty}^{\infty} \hat{f}_h(u)\, du = 1
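
Putting the general formula into code: a minimal R sketch with the bandwidth h as an argument. The names kde_at() and K_unif() and the simulated data are illustrative assumptions, not the lecture's own code.

```r
# General kernel density estimator f_hat_h(u) = (1/(n*h)) * sum_i K((u - x_i)/h).
# kde_at(), K_unif() and the simulated data are illustrative, not the lecture code.
K_unif <- function(z) 0.5 * (abs(z) <= 1)

kde_at <- function(u, x, h, K = K_unif) {
  mean(K((u - x) / h)) / h
}

set.seed(1)
x    <- rnorm(100, mean = -0.75, sd = 9.24)      # stand-in for the observed draws
grid <- seq(-20, 20, length.out = 200)
f3   <- sapply(grid, kde_at, x = x, h = 3)       # bandwidth 3
f6   <- sapply(grid, kde_at, x = x, h = 6)       # bandwidth 6: smoother
```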

Kernel Density Estimation: Bandwidth = 6

[figure]

Kernel Density Estimation: Different Kernels

- The uniform kernel is one of the simplest:

    \frac{1}{n} \sum_{i=1}^{n} K(u - x_i) = \frac{1}{n} \sum_{i=1}^{n} \frac{1\{ |u - x_i| \le 1 \}}{2}

- Many other choices of kernel do a better job
- In fact, we can choose any function K(z) such that:

    \int_{-\infty}^{\infty} K(z)\, dz = 1

- Common choice: the Gaussian (see the sketch below):

    K(z) = \phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} z^2}
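
In R the Gaussian kernel is simply the standard normal density, dnorm(). A small self-contained sketch plugging it into the same estimator form (names and data are illustrative assumptions):

```r
# Gaussian kernel: the standard normal density, available in R as dnorm().
# The estimator is repeated here so the block runs on its own; names are illustrative.
K_gauss <- function(z) dnorm(z)
kde_at  <- function(u, x, h, K) mean(K((u - x) / h)) / h

set.seed(1)
x <- rnorm(100, mean = -0.75, sd = 9.24)         # stand-in for the observed draws
kde_at(0, x, h = 3, K = K_gauss)                 # Gaussian-kernel estimate of f(0)
```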

Kernel Density Estimation: Different Kernels

- For any choice of K_h: K_h(u − x_i) gives a weight for observation x_i
- Uniform (h = 1):
  - Weight = 1/2 if x_i is within 1 of u
  - 0 otherwise
- Gaussian:
  - Weight is positive for all x_i
  - But declines with distance from u
- By taking the average of these weights (across all x_i), we get an estimate of the density at any point u:

    \hat{f}_h(u) = \frac{1}{n} \sum_{i=1}^{n} K_h(u - x_i)

Different Kernels

Different Kernels

[figure: Uniform, Gaussian, and Epanechnikov kernel density estimates, overlaid on the data]

Kernel Density Estimation: Epanechnikov

- A frequently used kernel is the Epanechnikov:

    K(z) = \begin{cases} \frac{3}{4\sqrt{5}} \left( 1 - \frac{z^2}{5} \right) & \text{if } z^2 < 5 \\ 0 & \text{otherwise} \end{cases}

- Optimal under certain assumptions; the default in many software packages
- But the difference between this and, e.g., the Gaussian is not huge

Different Kernels

[figure: Uniform, Gaussian, and Epanechnikov kernel density estimates, overlaid on the data]

Kernel Density Estimation: Bandwidth Choice

- The choice of bandwidth (h) often matters a lot more
- Many different approaches to choosing the bandwidth optimally
- Most software will have a decent bandwidth choice built in as the default
- One rule of thumb (works well when the underlying data is normal) is:

    h = 1.06\, \hat{\sigma}\, n^{-1/5}

Bandwidth (h) too Big

[figure]

Bandwidth (h) too Small

[figure]

Bandwidth: h = 1.06 σ̂ n^{−1/5}

[figure]

Kernel Density Estimation

- Exercise: write a kernel density estimator with:
  - An Epanechnikov kernel
  - The rule-of-thumb bandwidth
- What is f̂(0)?
- The Epanechnikov function can be accessed by installing the "kader" package
  - You can implement it with kader:::epanechnikov(x)
- One possible implementation is sketched below
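
One possible answer to the exercise, sketched in R. The simulated data and the names epan_kernel() and kde_epan() are assumptions for illustration, not the lecture's solution; the kader package mentioned above could supply the kernel instead of writing it out by hand.

```r
# One possible answer: Epanechnikov kernel + rule-of-thumb bandwidth, evaluated at 0.
# The simulated data and the names epan_kernel()/kde_epan() are illustrative; the
# kader package's kader:::epanechnikov() could be used for the kernel instead.
epan_kernel <- function(z) {
  ifelse(z^2 < 5, (3 / (4 * sqrt(5))) * (1 - z^2 / 5), 0)
}

kde_epan <- function(u, x) {
  n <- length(x)
  h <- 1.06 * sd(x) * n^(-1/5)                   # rule-of-thumb bandwidth
  mean(epan_kernel((u - x) / h)) / h
}

set.seed(1)
x <- rnorm(100, mean = -0.75, sd = 9.24)         # stand-in for the observed draws
kde_epan(0, x)                                   # f_hat(0)
```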

Non-Parametric Regression

1. Nearest-Neighbors
2. Nadaraya-Watson
3. Local Polynomial Regression

Non-Parametric Regression

- Given y and x, we previously wrote:

    y = E[y|x] + ε

- Showed that the OLS estimator provides the best linear approximation of the conditional mean function
- In many settings, E[y|x] = h(x) is a non-linear function
- Known functional form: can often use OLS to estimate the parameters
  - For example: β_0, β_1, β_2 in E[y|x] = β_0 + β_1 x + β_2 x²
  - Or other methods if non-additive
- But we often do not know the functional form

Non-Parametric Regression

    h(u) = E[y|x = u]

- Given data: (y_1, x_1), (y_2, x_2), ..., (y_n, x_n)
- Goal: estimate ĥ(u) without knowing the functional form
- Simple approach: Nearest Neighbors (local averaging)
  - For any u, define K_u as the set of the K individuals with x_i nearest to u
    - i.e. the K observations (y_i, x_i) with the smallest values of |u − x_i|
  - For any point u, define:

    \hat{h}(u) = \frac{1}{K} \sum_{i \in K_u} y_i

Nearest Neighbors Regression (K = 3)

[figure, built up over several slides: with x_1, ..., x_5 on the axis and u between x_3 and x_4, the three nearest neighbors are x_2, x_3, x_4, so ĥ(u) = (y_2 + y_3 + y_4)/3]

Non-Parametric Regression: Nearest Neighbors

    \hat{h}(u) = \frac{1}{K} \sum_{i \in K_u} y_i

- Downsides of nearest neighbors:
  - Problems in the extremes: suppose h(u) is an increasing function:
    - For small u all of the neighbors will be above u
    - But a slightly larger u has almost exactly the same neighbors
    - Awkward flattening in the extremes
  - Big jumps as large values of y_i enter K_u

Non-Parametric Regression: Nadaraya-Watson

- Instead of averaging the y_i for x_i close to u, take a weighted average of all y_i:

    \hat{h}(u) = \sum_{i=1}^{n} \omega_i(u)\, y_i

- With ∑_{i=1}^n ω_i(u) = 1 for any u
- If we choose ω_i(u) = 1/n for all i, u, we get:

    \hat{h}(u) = \sum_{i=1}^{n} \frac{y_i}{n} = \bar{y}

  - Not very informative
- Choose ω_i that give higher weight to observations with x_i close to u

Non-Parametric Regression: Nadaraya-Watson

    \hat{h}(u) = \sum_{i=1}^{n} \omega_i(u)\, y_i

- Choose ω_i that give higher weight to observations with x_i close to u
- What about:

    \omega_i(u) = K\!\left( \frac{u - x_i}{h} \right)

  - Where K(·) is a kernel (e.g. Gaussian)
  - Gives higher weight to observations with x_i close to u
  - But does not necessarily sum to 1...
- Solution:

    \omega_i(u) = \frac{ K\!\left( \frac{u - x_i}{h} \right) }{ \sum_{j=1}^{n} K\!\left( \frac{u - x_j}{h} \right) }

Non-Parametric Regression: Nadaraya-Watson

    \hat{h}(u) = \sum_{i=1}^{n} \omega_i(u)\, y_i

- Want to choose ω_i that give higher weight to observations with x_i close to u
- Hence, the Nadaraya-Watson estimator at the point u is:

    \hat{h}(u) = \sum_{i=1}^{n} \frac{ K\!\left( \frac{u - x_i}{h} \right) }{ \sum_{j=1}^{n} K\!\left( \frac{u - x_j}{h} \right) }\, y_i

Non-Parametric Regression: Local Polynomial Regression

    h(x) = E[y|x]

- Consider a Taylor expansion at a point x̃ close to u:

    h(\tilde{x}) \approx h(u) + h^{(1)}(u)(\tilde{x} - u) + \frac{h^{(2)}(u)}{2!}(\tilde{x} - u)^2 + \cdots + \frac{h^{(p)}(u)}{p!}(\tilde{x} - u)^p
                = \beta_0 + \beta_1 (\tilde{x} - u) + \beta_2 (\tilde{x} - u)^2 + \cdots + \beta_p (\tilde{x} - u)^p

- Where β_0 = h(u), β_1 = h'(u), ...
- Key idea: estimate β̂_0 at any u

Non-Parametric Regression: Local Polynomial Regression

- Key idea: estimate β̂_0 at any u
- If all the x_i are close to u, then:

    y_i \approx \beta_0 + \beta_1 (x_i - u) + \beta_2 (x_i - u)^2 + \cdots + \beta_p (x_i - u)^p

- Can just run a regression (for any given point u):

    \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 (x_i - u) - \cdots - \beta_p (x_i - u)^p \right)^2

- Estimate the unknown βs, and specifically β_0
  - This is just a regression of y_i on (x_i − u), (x_i − u)², ..., (x_i − u)^p
  - β̂_0 = ĥ(u)
  - A different regression for each u

Non-Parametric Regression: Local Polynomial Regression

    \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 (x_i - u) - \cdots - \beta_p (x_i - u)^p \right)^2

- Of course, as x_i gets far from u, the approximation is bad
- Solution: give more weight locally:

    \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 (x_i - u) - \cdots - \beta_p (x_i - u)^p \right)^2 K\!\left( \frac{x_i - u}{h} \right)

- Where K(·) is some kernel
- If p = 0 this is just Nadaraya-Watson
- If p = 1 it is local linear regression (both cases are sketched in code at the end of these notes)

Non-Parametrics

1. Kernel Density Estimation
2. Non-Parametric Regression
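
To close the regression half, a minimal R sketch of the two estimators just discussed: Nadaraya-Watson (the p = 0 case) and local linear regression (p = 1), each evaluated at a single point. The simulated data, the bandwidth, and the function names nw_at() and loclin_at() are illustrative assumptions, not the lecture's own code.

```r
# Closing sketch: Nadaraya-Watson (p = 0) and local linear (p = 1) regression at a
# single point u. The simulated data, bandwidth and function names are illustrative.
set.seed(1)
x <- runif(200, -3, 3)
y <- sin(x) + rnorm(200, sd = 0.3)               # true h(x) = E[y|x] = sin(x)

K_gauss <- function(z) dnorm(z)

# Nadaraya-Watson: kernel-weighted average of the y_i
nw_at <- function(u, x, y, h) {
  w <- K_gauss((u - x) / h)
  sum(w * y) / sum(w)
}

# Local linear: kernel-weighted regression of y_i on (x_i - u); the intercept is h_hat(u)
loclin_at <- function(u, x, y, h) {
  w   <- K_gauss((u - x) / h)
  fit <- lm(y ~ I(x - u), weights = w)
  unname(coef(fit)[1])
}

nw_at(0.5, x, y, h = 0.5)                        # estimate of E[y | x = 0.5]
loclin_at(0.5, x, y, h = 0.5)                    # local linear estimate at the same point
```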