

4.0 – Smoothing


4.1 – Nonparametric regression


Nonparametric regression model

Parametric regression models are usually of the form

Y_i = g(\theta, X_i) + \varepsilon_i, \quad i = 1, \dots, N

where g and the X_i's are specified (i.e. "known"), and the additive noise term is characterized, e.g. \varepsilon_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2). The problem is then to estimate \theta.

Nonparametric regression consists in leaving g unspecified; the problem becomes that of estimating g in

Y_i = g(X_i) + \varepsilon_i, \quad i = 1, \dots, N
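For a concrete contrast between the two settings, here is a minimal sketch on simulated data (the exponential form of g and all numerical values are illustrative assumptions, not taken from these notes):

# simulated data where the true g is known
set.seed(1)
X <- runif(100, 0, 10)
Y <- 2 * exp(-X / 3) + rnorm(100, sd = 0.1)
# parametric: specify g(theta, x) = theta1 * exp(-x / theta2), estimate theta
fit.par <- nls(Y ~ t1 * exp(-X / t2), start = list(t1 = 1, t2 = 1))
# nonparametric: leave g unspecified and estimate its shape by smoothing
fit.np <- loess(Y ~ X)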


Various forms may be considered for g(x), such as additive regression models

g(x) = \alpha + g_1(x_1) + g_2(x_2) + \cdots + g_k(x_k)
g(x) = \alpha + \theta_1 x_1 + g_2(x_2) + \cdots + g_k(x_k)
g(x) = \alpha + g_{12}(x_1, x_2) + g_3(x_3) + \cdots + g_k(x_k)

g is often assumed to be continuous and smooth.
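Such additive and semiparametric forms can be fitted directly, e.g. with gam() from the mgcv library (one of the R implementations listed later in this section); a minimal sketch on the Prestige data used throughout these notes:

library(car)   # provides the Prestige data
library(mgcv)
# additive model: g(x) = alpha + g1(income) + g2(education)
mod.add <- gam(prestige ~ s(income) + s(education), data = Prestige)
# semiparametric variant: linear in income, smooth in education
mod.semi <- gam(prestige ~ income + s(education), data = Prestige)
plot(mod.add, pages = 1)   # estimated smooth components g1 and g2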

Local polynomial regression, smoothing splines and
(Nadaraya-Watson) kernel regression are among the most
popular “smoothing” techniques available to estimate the shape of
g.


Local polynomial regression: simple regression

Consider the simple regression problem of estimating g in

Yi = g(Xi) + εi

A p-th order weighted least squares (WLS) polynomial regression of observations y = (y_1, \dots, y_N)^T on design points x = (x_1, \dots, x_N)^T, evaluated at a given point x_0, yields the approximation

y_{0i} = \alpha + \theta_1 (x_i - x_0) + \theta_2 (x_i - x_0)^2 + \cdots + \theta_p (x_i - x_0)^p
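As an illustration of this local fit, here is a minimal sketch computing the WLS polynomial fit at a single point x_0, using the tricube weights defined just below (the simulated data and tuning values are illustrative assumptions):

set.seed(1)
x <- sort(runif(100, 0, 10))
y <- sin(x) + rnorm(100, sd = 0.3)
x0 <- 5; h <- 2; p <- 2                       # evaluation point, bandwidth, order
z <- (x - x0) / h
w <- ifelse(abs(z) < 1, (1 - abs(z)^3)^3, 0)  # tricube weights
fit <- lm(y ~ poly(x - x0, p, raw = TRUE), weights = w)
g.hat.x0 <- unname(coef(fit)[1])              # the intercept alpha estimates g(x0)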


Using WLS, the weights are usually chosen to account for the distance between each design point x_i and the reference point x_0, using e.g. the tricube function

W(z) = \begin{cases} (1 - |z|^3)^3 & \text{for } |z| < 1 \\ 0 & \text{for } |z| \ge 1 \end{cases}

with z_{0i} = (x_i - x_0)/h, where h is a control parameter picked according to the scale of the data.

library(car)
attach(Prestige)
plot(income, prestige, xlab="Average income", ylab="Prestige")
lines(lowess(income, prestige, f=0.25, iter=0), lwd=2, col='blue')
lines(lowess(income, prestige, f=0.5, iter=0), lwd=2, col='red')
lines(lowess(income, prestige, f=1, iter=0), lwd=2, col='green')

Degree of smoothing: f=0.25 (blue) is too jittery, f=0.5 (red) is about right, f=1 (green) is too smooth. Weight control: iter=0 means "do not refit to address outliers" (ref [Cox2002]).

Local polynomial regression: multiple regression

Consider the multiple regression problem of estimating g in

Y_i = g(X_{i1}, \dots, X_{ik}) + \varepsilon_i

Local polynomial regression consists in fitting a weighted polynomial, e.g. of the linear form at a given point x_0 = (x_{01}, \dots, x_{0k}):

y_{0i} = \alpha + \theta_1 (x_{i1} - x_{01}) + \theta_2 (x_{i2} - x_{02}) + \cdots + \theta_k (x_{ik} - x_{0k}) + \varepsilon_i

This approach requires choosing a distance metric and a weighting scheme for multivariate observations.

The default distance metric is commonly the Euclidean distance

D(x_i, x_0) = \sqrt{\sum_{j=1}^{k} (z_{ij} - z_{0j})^2}

using standardized values z_{ij} = (x_{ij} - \bar{x}_j)/s_j (with sample mean \bar{x}_j and standard deviation s_j).

The weights are usually defined using scaled distances between each x_i and the reference point x_0, e.g.

w_i = W\!\left( \frac{D(x_i, x_0)}{h} \right)

where h denotes the bandwidth (e.g. the half-width of the neighbourhood).

mod.lo = loess(prestige ~ income + education, span=.5, degree=2)
summary(mod.lo)
# plot this smoothing...
inc <- seq(min(income), max(income), len=25)
ed <- seq(min(education), max(education), len=25)
newdata <- expand.grid(income=inc, education=ed)
fit.prestige <- matrix(predict(mod.lo, newdata), 25, 25)
persp(inc, ed, fit.prestige, theta=45, phi=30, ticktype='detailed',
      expand=2/3, shade=0.5,
      xlab='Income', ylab='Education', zlab='Prestige')

Nadaraya-Watson regression

In this approach one represents the regression problem via

g(x) = E(Y_i \mid X_i = x)

and evaluates the importance of the neighbourhood x \pm h around each design point x (for h > 0).

Then the average of observations Yi’s yields the following
nonparametric estimator for g:

\hat{g}(x) = \frac{\sum_{i=1}^{N} \mathbf{1}(|X_i - x| \le h)\, Y_i}{\sum_{i=1}^{N} \mathbf{1}(|X_i - x| \le h)} = \frac{\sum_{i=1}^{N} K\!\left(\frac{X_i - x}{h}\right) Y_i}{\sum_{i=1}^{N} K\!\left(\frac{X_i - x}{h}\right)}

using uniform kernels K.
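A minimal sketch of this estimator with a uniform kernel on simulated data follows; the comparison with ksmooth() assumes that, for the "box" kernel, the bandwidth argument corresponds to the full window width (so bandwidth = 2h), per ksmooth's quartile-based scaling:

# Nadaraya-Watson estimator with a uniform (indicator) kernel
nw <- function(xs, X, Y, h) {
  sapply(xs, function(x0) {
    K <- as.numeric(abs(X - x0) <= h)  # uniform kernel weights
    sum(K * Y) / sum(K)                # NaN wherever the window is empty
  })
}
set.seed(1)
X <- runif(200, 0, 10); Y <- sin(X) + rnorm(200, sd = 0.3)
xs <- seq(0, 10, len = 100)
plot(X, Y)
lines(xs, nw(xs, X, Y, h = 0.5), lwd = 2, col = 'blue')
lines(ksmooth(X, Y, "box", bandwidth = 1, x.points = xs), lwd = 2, col = 'red')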


Note that the estimator may also be defined by plugging \hat{f}(x, y) into

g(x) = \frac{\int y\, f(x, y)\, dy}{f(x)}

This approach extends to the choice of other kernels; the Normal
distribution is (once again) very commonly used.

The density estimator used should be well-defined (i.e. f̂(x) > 0)
for the kernel regression estimator to be well defined.

Kernel regression extends easily to multivariate problems by using
multivariate kernel constructs (this is beyond our scope).


Prestige example

library(car)

attach(Prestige)

plot(income, prestige)

inc.100 <- seq(min(income), max(income), len=100)
mod.lo.inc <- loess(prestige ~ income, span=.7, degree=1)
pres <- predict(mod.lo.inc, data.frame(income=inc.100))

# This other implementation works better:
library(KernSmooth)
lp = locpoly(income, prestige, bandwidth=1500)
lines(lp, lwd=2, col='red')

car example

with(cars, {
  plot(speed, dist)
  ex2 = ksmooth(speed, dist, "normal", bandwidth = 2)
  ex5 = ksmooth(speed, dist, "normal", bandwidth = 5)
  lines(ex2, lwd=2, col = 2)
  lines(ex5, lwd=2, col = 3)
})

Smooth control!?

At this point one ought to wonder: how do we set (or select) the value of h? This is a very common question for most nonparametric methods (cf. KDEs!)... Typical approaches are:

- a simulation study (using plotting and trial-and-error)
- some type of criterion optimization (usually some likelihood or MSE function)
- cross-validation (typically based on the MSE)

The choice of strategy usually depends on the problem at hand (and on the user's taste!)...

4.2 – Splines

Splines

Spline = a continuous function (i.e. curve) constructed by piecewise linkage of polynomials/functions

- Free splines (Bézier curves): not suitable for mortality curves
- Regression splines: use equidistant knots
- Smoothing splines: penalize roughness; knots are data points [6]

[Figure: bronze spline weights]

B-splines

- Explain f(x) in terms of a basis (B = 'basis')
- A B-spline is defined by its order m and number of interior knots K (x_0 and x_{K+1} are end-knots):

x_0 \le x_1 \le \cdots \le x_K \le x_{K+1}

- The polynomial is of order m - 1 (one often picks m = 4)
- Univariate construction:

S(x) = \alpha_0 + \sum_{j=1}^{J} \alpha_j B_j(x, m)

where J = K - 1 + m is the number of basis functions
- Basis elements are defined in a recursive manner [cf. lit.]
- Fitting: the basis coefficients are calculated via linear regression

B-splines: specifications

Placing knots can be achieved in a number of ways:

- One way is to define regular intervals within [x_min, x_max]
- Another way is to define the interior knots as the quantiles of the empirical distribution of the underlying variable:
  - this enforces an equal number of observations in each interval
  - the intervals have different lengths

B-splines: specifications

The basis elements (or design matrix) are evaluated at the control points for the specified number of knots. Ex: compare

require(splines)
(BM = bs(dat$Age))
matplot(dat$Age, BM, xlab='Age')

with

xs = seq(min(dat$Age), max(dat$Age), length=1000)
ks = quantile(xs, seq(0.1, 0.9, by=.1))
(BM2 = bs(dat$Age, knots=ks))
matplot(dat$Age, BM2, xlab='Age', t='b')
abline(v=ks, lty=2, col=8)

Choice of design matrix (Life dataset)

[Figure: spline basis functions plotted against Age (30–90); panels compare a 3-basis and a 6-basis design.]
Choice of design matrix (Life dataset)

[Figure: spline basis functions plotted against Age (30–90); panels compare the 3-basis design with one using deciles as knots.]

B-spline questions

Question
(a) Compute the design matrix for the Life dataset, and quote the projection of age 40 onto that basis (cf. figure below).
(b) Reconstruct the B-spline approximation to the crude male force data using linear regression. Plot the output over the data points and criticize.

[Figure: 3-basis spline basis functions plotted against Age (30–90).]

B-spline questions

Question
(a) Continuing from the previous question, compute the B-spline as a power series, i.e. compute the traditional cubic spline representation

S(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{j=1}^{J} \alpha_j (x - x_j)_+^3

where (x - x_j)_+^3 = \max((x - x_j)^3, 0), using J = 3 interior knots at 40, 60 and 80 (a sketch is given after the next slide).
(b) Compare with the fitted values from the regression onto the design matrix obtained from bs() for those knots.

P-splines

Splines are piece-wise polynomial functions of the underlying target "response function". The pieces are joined at knots {X_1, \dots, X_N} to yield a continuous function.

A simple smoothing spline \hat{g} is defined as the function that minimizes the penalized sum of squares

SS_h(g) = \sum_{i=1}^{N} (Y_i - g(X_i))^2 + h \int_{X_{min}}^{X_{max}} (g''(u))^2\, du

defined over a given grid [X_{min}, \dots, X_{max}].
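Returning to the B-spline question above, a minimal sketch of the power-series fit, assuming the Life data sit in a data frame dat with columns Age and mx (mx being a hypothetical name for the crude male force of mortality):

library(splines)
knots <- c(40, 60, 80)
tp <- function(x, k) pmax(x - k, 0)^3             # truncated power term (x - xj)^3_+
X <- cbind(dat$Age, dat$Age^2, dat$Age^3,
           sapply(knots, function(k) tp(dat$Age, k)))
fit.tp <- lm(dat$mx ~ X)                          # power-series representation
fit.bs <- lm(dat$mx ~ bs(dat$Age, knots = knots)) # B-spline design matrix
# both bases span the same space of cubic splines with these knots,
# so the fitted values should agree up to numerical error:
max(abs(fitted(fit.tp) - fitted(fit.bs)))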
Smoothing splines

SS_h(g) = \sum_{i=1}^{N} (Y_i - g(X_i))^2 + h \int_{X_{min}}^{X_{max}} (g''(u))^2\, du

- P-splines use all data points as knots and penalize for smoothness
- The penalty term is usually controlled by one parameter (h)
- This smoothing parameter h is user-controlled
- Can also define periodic splines

The solution (which defines the smoothing spline)

\hat{g} = \operatorname{argmin}_g\, SS_h(g)

is a cubic spline with knots located at the design points X_1, \dots, X_N.

The cubic spline approach assumes the existence of two continuous derivatives, which defines a roughness penalty in the integral term: the larger g'', the more jittery g.

library(car)
attach(Prestige)
plot(income, prestige)
inc.100 <- seq(min(income), max(income), len=100)
mod.lo.inc <- loess(prestige ~ income, span=.7, degree=1)
pres <- predict(mod.lo.inc, data.frame(income=inc.100))
ssp <- smooth.spline(income, prestige, df=3.85)
lines(inc.100, pres, lty=2, lwd=2)
lines(ssp, lwd=2, col='blue')

R implementations

There are many, many implementations of nonparametric regression methods in R, including:

- lowess for simple regression, i.e. g(x) = g_1(x_1)
- loess for local polynomial regression
- smooth.spline for fitting simple regression smoothing splines
- ksmooth for Nadaraya-Watson kernel regression
- the sm library for local regression, local likelihood and pdf estimation
- the gss library for generalized smoothing splines and regression models
- locfit, gam and mgcv also come up regularly

4.3 – Graduation

Dealing with rough data (simulated life data, D=585)

[Figure: crude force of mortality (Males) plotted against age (30–90 years).]

Or with unbalanced information (Irish Males, 2005-07)

[Figure: central rates of mortality, Irish Males 2005/7, against time (0–100 years), on the raw and log scales.]

Fitting mortality rates

Objective: to fit a curve of mortality rates from either dense information (e.g. a life table from a population census) or a small, patchy sample (e.g. late-life characteristics for pensioners).

Steps:
- Obtain observed mortality rates
- Fit some functional model to them
- Forecast late-life, age-derived features, or future generations

Challenges:
- High variability due to small-sample information
- Missing information for specific ages
- Parametric models may be ill-suited (late-life)

Some standard smoothers

- Running mean / median
- Linear time-invariant filters
- Kernel smoothing
- Loess / polynomial regression
- Splines
- ... and parametric models?

Running mean/median

- Write out an algorithmic formula for the running mean (see the sketch below)
- What calibration must be done? What must be controlled?
- What advantage(s)/inconvenience(s) do these methods have?

[Figure: crude force of mortality (Males) against age (30–90), with 5-MA, 9-MA and 10-MA running means overlaid on the empirical rates.]

Running mean/median

cf. the example on 2000-2010 USA mortality rates...
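A minimal sketch of a centred running mean, one possible answer to the question above, assuming mx is a hypothetical numeric vector of crude mortality rates ordered by age:

# (2k+1)-term centred running mean via a moving-average filter;
# the k values at each end are returned as NA
run.mean <- function(y, k) stats::filter(y, rep(1/(2*k + 1), 2*k + 1), sides = 2)
plot(mx)
lines(run.mean(mx, k = 2), lwd = 2, col = 'red')   # 5-MA
lines(run.mean(mx, k = 4), lwd = 2, col = 'blue')  # 9-MA
# running median: runmed() handles the end rules internally
lines(runmed(mx, k = 9), lwd = 2, col = 'green')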
[Figure: observed death rates (USA), log death rates by age (0–100) and year (2000–2010).]

Linear time-invariant filters

- What is an algorithmic formula for an LTI filter?
- What calibration must be done? What must be controlled?
- What advantage(s)/inconvenience(s) does this method have?

Kernel smoothing

- Write out an algorithmic formula for a kernel smoother
- What calibration must be done? What must be controlled?
- What advantage(s)/inconvenience(s) do these methods have?

Polynomial regression

- Write out an algorithmic formula for a polynomial smoother
- What calibration must be done? What must be controlled?
- What advantage(s)/inconvenience(s) do these methods have?

Question:
1. Apply lowess() to both the USA 2000 log-mortality rates and the (non-log) crude mortality rates from the Life dataset, and visualize the outputs. Adjust the smoother span to successively 2/3, 1/3 and 0.1, and compare the outputs.
2. Compute the MSE for each smooth, for both datasets.
3. Based on the above, describe the main steps of a grid-search approach for selecting an appropriate value of the span.

Assessment of graduation

Typical issues:
- Lack of goodness-of-fit, either overall or at specific locations
- Consistent bias
- Inconsistency in overall shape

Comparing smoothing procedures: use tests, e.g. the χ²-test (a sketch in R is given at the end of this section).

Assessment of graduation / comparisons

χ²-test

H_0: the graduated estimates {\mu_x} are true.

The test assumes independent numbers of deaths at different ages. Assuming D_x \sim \mathcal{N}(E^c_x \mu^0_x, E^c_x \mu^0_x) (Poisson-Normal approximation),

Actual deaths - Expected deaths = D_x - E^c_x \mu^0_x

(E^c_x is the central exposed-to-risk at age x nearest birthday), and

z_x = \frac{D_x - E^c_x \mu^0_x}{\sqrt{E^c_x \mu^0_x}} \sim \mathcal{N}(0, 1) \quad \text{(under } H_0\text{)}

so that

\xi = \sum_{i=1}^{m} z_{x_i}^2 \sim \chi^2_m, \quad m = \text{number of age groups}

Plenty of methods yield smooth curves...

- The choice of method should be data-driven (and user-driven?)
- Parametric modelling is a form of smoothing; ex: fit a Makeham model to a rough mortality curve...
- Exponential smoothing (cf. time series) is another adaptive procedure
- Weighted functional polynomials
- Etc.

Mortality rates projections

- IFoA: in-house method for mortality projection
- Cf. the implementation of the 2009 model for mortality rate projections
- Office for National Statistics population mortality data
- Use of an age-cohort P-spline early in the processing step
- Recent CMI documents [7] illustrate concerns that current projection methods are not statistical in nature
- Lee-Carter is typically used for projection of smoothed data
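To make the χ²-test concrete, a minimal sketch in R, assuming vectors Dx (observed deaths), Ecx (central exposed-to-risk) and mu0 (graduated rates), with one entry per age group; all three names are hypothetical:

# z_x = (D_x - E^c_x mu^0_x) / sqrt(E^c_x mu^0_x), approx. N(0,1) under H0
zx <- (Dx - Ecx * mu0) / sqrt(Ecx * mu0)
xi <- sum(zx^2)                              # test statistic
m  <- length(zx)                             # number of age groups
p.value <- pchisq(xi, df = m, lower.tail = FALSE)
p.value                                      # a small value suggests rejecting H0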