4.0 – Smoothing
4.1 – Nonparametric regression
Nonparametric regression model
Parametric regression models are usually of the form
$$Y_i = g(\theta, X_i) + \varepsilon_i, \qquad i = 1, \ldots, N$$
where $g$ and the $X_i$'s are specified (i.e. "known"), and the additive noise term is characterized, e.g. $\varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)$. The problem is then to estimate $\theta$.
Nonparametric regression consists in leaving $g$ unspecified; the problem becomes that of estimating $g$ in
$$Y_i = g(X_i) + \varepsilon_i, \qquad i = 1, \ldots, N$$
Various forms may be considered for $g(x)$, such as additive regression models
$$g(x) = \alpha + g_1(x_1) + g_2(x_2) + \cdots + g_k(x_k)$$
partly linear variants such as
$$g(x) = \alpha + \theta_1 x_1 + g_2(x_2) + \cdots + g_k(x_k)$$
or forms including bivariate interaction terms, e.g.
$$g(x) = \alpha + g_{12}(x_1, x_2) + g_3(x_3) + \cdots + g_k(x_k)$$
g is often assumed to be continuous and smooth.
Local polynomial regression, smoothing splines and (Nadaraya-Watson) kernel regression are among the most popular "smoothing" techniques available to estimate the shape of g.
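To make the additive form concrete, here is a minimal sketch using the mgcv package (an assumption of this note, not prescribed by the slides), on the Prestige data used throughout:

library(car)   # Prestige data
library(mgcv)  # gam() with smooth terms
# Additive model: g(x) = alpha + g1(income) + g2(education),
# with each g_j estimated nonparametrically
mod.add <- gam(prestige ~ s(income) + s(education), data = Prestige)
plot(mod.add, pages = 1)  # estimated component functions g1 and g2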
Local polynomial regression: simple regression
Consider the simple regression problem of estimating $g$ in
$$Y_i = g(X_i) + \varepsilon_i$$
A $p$-th order weighted least squares (WLS) polynomial regression of observations $y = (y_1, \ldots, y_N)^T$ on design points $x = (x_1, \ldots, x_N)$, evaluated at a given point $x_0$, yields the approximation
$$y_i^0 = \alpha + \theta_1 (x_i - x_0) + \theta_2 (x_i - x_0)^2 + \cdots + \theta_p (x_i - x_0)^p$$
Using WLS, the weights are usually chosen to account for the distance between the design points $x_i$ and the reference point $x_0$, using e.g. the tricube function
$$W(z) = \begin{cases} (1 - |z|^3)^3 & \text{for } |z| < 1 \\ 0 & \text{for } |z| \geq 1 \end{cases}$$
with $z_i^0 = \frac{x_i - x_0}{h}$, where $h$ is a control parameter that is picked according to the scale of the data.
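To make the construction concrete, here is a hand-rolled sketch of a local linear WLS fit with tricube weights (a sketch only, not the actual lowess/loess algorithm; the bandwidth h = 5 and the cars data are illustrative choices):

# Tricube weight function W(z)
tricube <- function(z) ifelse(abs(z) < 1, (1 - abs(z)^3)^3, 0)

# Local WLS fit of order p at reference point x0; the intercept
# estimates g(x0) since the polynomial is centred at x0
local.fit <- function(x0, x, y, h, p = 1) {
  w <- tricube((x - x0) / h)
  fit <- lm(y ~ poly(x - x0, p, raw = TRUE), weights = w)
  unname(coef(fit)[1])
}

x <- cars$speed; y <- cars$dist
grid <- seq(min(x), max(x), length = 100)
ghat <- sapply(grid, local.fit, x = x, y = y, h = 5)
plot(x, y); lines(grid, ghat, lwd = 2)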
library(car)
attach(Prestige)
plot(income, prestige, xlab="Average income", ylab="Prestige")
lines(lowess(income, prestige, f=0.25, iter=0), lwd=2, col='blue')
lines(lowess(income, prestige, f=0.5, iter=0), lwd=2, col='red')
lines(lowess(income, prestige, f=1, iter=0), lwd=2, col='green')
f = 0.25 (blue): degree of smoothing too jittery
f = 0.5 (red): degree of smoothing about right
f = 1 (green): degree of smoothing too smooth
iter = 0 (weight control): "do not refit to address outliers" [Cox2002]
Local polynomial regression: multiple regression
Consider the multiple regression problem of estimating $g$ in
$$Y_i = g(X_{i1}, \ldots, X_{ik}) + \varepsilon_i$$
Local polynomial regression consists in fitting a weighted polynomial, e.g. of the linear form at a given point $x_0 = (x_{01}, \ldots, x_{0k})$:
$$y_i^0 = \alpha + \theta_1 (x_{i1} - x_{01}) + \theta_2 (x_{i2} - x_{02}) + \cdots + \theta_k (x_{ik} - x_{0k}) + \varepsilon_i$$
This approach requires choosing a distance metric and a weighting scheme for multivariate observations.
The default distance metric is commonly the Euclidean distance
$$D(x_i, x_0) = \sqrt{\sum_{j=1}^{k} (z_{ij} - z_{0j})^2}$$
using standardized values $z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}$ (with sample mean $\bar{x}_j$ and standard deviation $s_j$).
The weights are usually defined using scaled distances between the observation points $x_i$ and the reference point $x_0$, e.g.
$$w_i = W\left(\frac{D(x_i, x_0)}{h}\right)$$
where $h$ denotes the bandwidth (e.g. the half-width of the neighbourhood).
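A small sketch of this weighting scheme, computing standardized Euclidean distances to a reference point and the corresponding tricube weights (the bandwidth h = 2 is an illustrative choice):

library(car)
# z_ij = (x_ij - xbar_j) / s_j, columnwise standardization
Z  <- scale(Prestige[, c("income", "education")])
z0 <- Z[1, ]  # reference point: first observation
# Euclidean distances D(x_i, x_0) on the standardized scale
D <- sqrt(rowSums(sweep(Z, 2, z0)^2))
h <- 2
w <- ifelse(D / h < 1, (1 - (D / h)^3)^3, 0)  # tricube of scaled distances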
mod.lo <- loess(prestige ~ income + education, span=.5, degree=2)
summary(mod.lo)
# plot this smoothing...
inc <- seq(min(income), max(income), len=25)
ed <- seq(min(education), max(education), len=25)
newdata <- expand.grid(income=inc, education=ed)
fit.prestige <- matrix(predict(mod.lo, newdata), 25, 25)
persp(inc, ed, fit.prestige, theta=45, phi=30,
      ticktype='detailed', expand=2/3, shade=0.5,
      xlab='Income', ylab='Education', zlab='Prestige')
Nadaraya-Watson regression
In this approach one likes to represent the regression problem based on
$$g(x) = E(Y_i \mid X_i = x)$$
and to evaluate the importance of the neighbourhood $x \pm h$ around each design point $x$ (for $h > 0$).
Then the average of the observations $Y_i$ yields the following nonparametric estimator for $g$:
$$\hat{g}(x) = \frac{\sum_{i=1}^{N} \mathbf{1}(|X_i - x| \leq h)\, Y_i}{\sum_{i=1}^{N} \mathbf{1}(|X_i - x| \leq h)} = \frac{\sum_{i=1}^{N} K\left(\frac{X_i - x}{h}\right) Y_i}{\sum_{i=1}^{N} K\left(\frac{X_i - x}{h}\right)}$$
using uniform kernels $K$.
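A hand-rolled sketch of this estimator with a uniform kernel follows; note that R's ksmooth offers a comparable "box" kernel, though its bandwidth convention differs from the plain h used here:

# Nadaraya-Watson estimate of g(x0) with a uniform kernel:
# local average of the Y_i with |X_i - x0| <= h
nw <- function(x0, x, y, h) {
  k <- abs(x - x0) <= h
  if (!any(k)) return(NA)  # undefined where no X_i falls in x0 +/- h
  sum(y[k]) / sum(k)
}

x <- cars$speed; y <- cars$dist
grid <- seq(min(x), max(x), length = 100)
ghat <- sapply(grid, nw, x = x, y = y, h = 2)
plot(x, y); lines(grid, ghat, lwd = 2, col = 'blue')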
Note that the estimator may also be defined by plugging $\hat{f}(x, y)$ into
$$g(x) = \frac{\int y f(x, y)\, dy}{f(x)}$$
This approach extends to the choice of other kernels; the Normal distribution is (once again) very commonly used.
The density estimator used should be well-defined (i.e. $\hat{f}(x) > 0$) for the kernel regression estimator to be well defined.
Kernel regression extends easily to multivariate problems by using multivariate kernel constructs (this is beyond our scope).
Prestige example
library(car)
attach(Prestige)
plot(income, prestige)
inc.100 <- seq(min(income), max(income), len=100)
mod.lo.inc <- loess(prestige ~ income, span=.7, degree=1)
pres <- predict(mod.lo.inc, data.frame(income=inc.100))

# This other implementation works better:
library(KernSmooth)
lp <- locpoly(income, prestige, bandwidth=1500)
lines(lp, lwd=2, col='red')

cars example

with(cars, {
  plot(speed, dist)
  ex2 <- ksmooth(speed, dist, "normal", bandwidth = 2)
  ex5 <- ksmooth(speed, dist, "normal", bandwidth = 5)
  lines(ex2, lwd=2, col = 2)
  lines(ex5, lwd=2, col = 3)
})

Smooth control!?
At this point one ought to wonder: how do we set (or select) the value for h? This is a very common question for most nonparametric methods (cf. KDEs!)...
Typical approaches are:
- simulation study (using plotting and trial-and-error)
- some type of criterion optimization (usually some likelihood or MSE function)
- cross-validation (typically based on the MSE)
The choice of strategy usually depends on the problem at hand (and on the user's taste!)...

4.2 – Splines

Splines
A spline is a continuous function (i.e. curve) constructed by piecewise linkage of polynomials/functions.
- Free splines (Bezier curves): not suitable for mortality curves
- Regression splines: use equidistant knots
- Smoothing splines: penalize roughness, knots are data points [6]
(Figure: bronze spline weights.)

B-splines
- Explain f(x) in terms of a basis (B = 'basis')
- A B-spline is defined by its order $m$ and number of interior knots $K$ ($x_0$ and $x_{K+1}$ are end-knots):
$$x_0 \leq x_1 \leq \cdots \leq x_K \leq x_{K+1}$$
- The polynomial is of order $m - 1$ (one often picks $m = 4$)
- Univariate construction:
$$S(x) = \alpha_0 + \sum_{j=1}^{J} \alpha_j B_j(x, m)$$
where $J = K - 1 + m$ is the number of basis functions
- Basis elements are defined in a recursive manner [cf. lit.]
- Fitting: basis coefficients are calculated via linear regression (see the sketch after the figure below)

B-splines: specifications
Placing knots can be achieved in a number of ways:
- One way is to define regular intervals within $[x_{\min}, x_{\max}]$
- Another way is to define the interior knots as quantiles of the empirical distribution of the underlying variable:
  - enforces an equal number of observations in each interval
  - intervals have different lengths

B-splines: specifications
The basis elements (or design matrix) are evaluated at the control points for the specified number of knots. Ex: compare

require(splines)
BM <- bs(dat$Age)
matplot(dat$Age, BM, xlab='Age')

with

xs <- seq(min(dat$Age), max(dat$Age), length=1000)
ks <- quantile(xs, seq(0.1, 0.9, by=.1))
(BM2 <- bs(dat$Age, knots=ks))
matplot(dat$Age, BM2, xlab='Age', t='b')
abline(v=ks, lty=2, col=8)

Choice of design matrix (Life dataset)
(Figure: B-spline basis functions over ages 30-90, comparing a 3-element basis with a 6-element basis.)
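As a minimal sketch of the fitting step mentioned above (basis coefficients via linear regression), assuming a Life-style data frame dat with an Age column and a crude force-of-mortality column mu (the column name mu is illustrative):

library(splines)
# Cubic B-spline design matrix with interior knots at 40, 60, 80
B <- bs(dat$Age, knots = c(40, 60, 80))
fit <- lm(dat$mu ~ B)  # basis coefficients alpha_j by OLS
plot(dat$Age, dat$mu, xlab = 'Age', ylab = 'Crude force of mortality')
lines(dat$Age, fitted(fit), lwd = 2, col = 'red')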
Choice of design matrix (Life dataset)
(Figure: B-spline basis functions over ages 30-90, comparing a 3-element basis with a basis using deciles as knots.)

B-spline questions
Question
(a) Compute the design matrix for the Life dataset, and quote the projection of age 40 onto that basis (cf. figure below).
(b) Reconstruct the B-spline approximation to the crude male force data using linear regression. Plot the output over the data points and criticize.
(Figure: 3-element spline basis over ages 30-90.)

B-spline questions
Question
(a) Continuing from the previous question, compute the B-spline as a power series, i.e. compute the traditional cubic spline representation
$$S(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \sum_{j=1}^{J} \alpha_j (x - x_j)_+^3$$
where $(x - x_j)_+^3 = \max((x - x_j)^3, 0)$, using $J = 3$ interior knots at 40, 60 and 80 (see the power-basis sketch after the next slide).
(b) Compare with the fitted values from the regression onto the design matrix obtained from bs() for those knots.

P-splines
Splines are piece-wise polynomial functions of the underlying target "response function". The pieces are joined at knots $\{X_1, \ldots, X_N\}$ to yield a continuous functional.
A simple smoothing spline $\hat{g}$ is defined as the function that minimizes the penalized sum of squares
$$SS_h(g) = \sum_{i=1}^{N} (Y_i - g(X_i))^2 + h \int_{X_{\min}}^{X_{\max}} (g''(u))^2\, du$$
defined over a given grid $[X_{\min}, \ldots, X_{\max}]$.
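Relating to the power-series question above, here is a sketch of the truncated power basis with interior knots at 40, 60 and 80 (again using the illustrative dat$Age / dat$mu names):

# Truncated power term (x - x_j)_+^3
tpower <- function(x, knot) pmax((x - knot)^3, 0)
# Design matrix: x, x^2, x^3 plus one truncated cubic term per knot
X <- cbind(dat$Age, dat$Age^2, dat$Age^3,
           sapply(c(40, 60, 80), tpower, x = dat$Age))
fit.tp <- lm(dat$mu ~ X)
# Fitted values should match those of the bs() regression with the
# same knots, since both bases span the same cubic-spline space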
Smoothing splines
$$SS_h(g) = \sum_{i=1}^{N} (Y_i - g(X_i))^2 + h \int_{X_{\min}}^{X_{\max}} (g''(u))^2\, du$$
- P-splines use all data points as knots and penalize for smoothness
- The penalty term is usually controlled by one parameter, the smoothing parameter $h$
- One can also define periodic splines

The solution (which defines the smoothing spline)
$$\hat{g} = \underset{g}{\operatorname{argmin}}\ SS_h(g)$$
is a cubic spline with knots located at the design points $X_1, \ldots, X_N$.
The cubic spline approach assumes the existence of two continuous derivatives, which defines a roughness penalty in the integral term: the larger $g''$, the more jittery $g$.

library(car)
attach(Prestige)
plot(income, prestige)
inc.100 <- seq(min(income), max(income), len=100)
mod.lo.inc <- loess(prestige ~ income, span=.7, degree=1)
pres <- predict(mod.lo.inc, data.frame(income=inc.100))
ssp <- smooth.spline(income, prestige, df=3.85)
lines(inc.100, pres, lty=2, lwd=2)
lines(ssp, lwd=2, col='blue')

R implementations
There are many, many implementations of nonparametric regression methods in R, including:
- lowess for simple regression, i.e. g(x) = g1(x1)
- loess for local polynomial regression
- smooth.spline for fitting simple regression smoothing splines
- ksmooth for Nadaraya-Watson kernel regression
- the sm library for local regression, local likelihood and pdf estimation
- the gss library for generalized smoothing splines and regression models
- locfit, gam and mgcv also come up regularly

4.3 – Graduation

Dealing with rough data (simulated life data, D=585)
(Figure: crude force of mortality (Males) against age, 30-90 years.)

Or with unbalanced information (Irish Males, 2005-07)
(Figure: central rates of mortality and log-rates, Irish Males 2005/7, ages 0-100.)

Fitting mortality rates
Objective: to fit a curve of mortality rates from either dense information (e.g. a life table from a population census) or a small, patchy sample (e.g. late-life characteristics for pensioners).
Steps:
- Obtain observed mortality rates
- Fit some functional model to them
- Forecast late-life, age-derived features, or future generations
Challenges:
- High variability due to small sample information
- Missing information for specific ages
- Parametric models may be ill-suited (late-life)

Some standard smoothers
- Running mean / median
- Linear time-invariant filters
- Kernel smoothing
- Loess / polynomial regression
- Splines
- ... and parametric models?

Running mean/median
- Write out an algorithmic formula for the running mean (a minimal sketch follows below)
- What calibration must be done? What must be controlled?
- What advantage(s)/disadvantage(s) do these methods have?
(Figure: crude force of mortality (Males) against age 30-90, with empirical, 5-MA, 9-MA and 10-MA overlays.)
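A minimal sketch of a centred running mean is given below; the window half-width k is the quantity to calibrate, and boundary handling is one of the things to control:

# Centred running mean of window width 2k + 1; endpoints are
# returned as NA (boundary handling left to the user)
run.mean <- function(y, k) {
  stats::filter(y, rep(1 / (2 * k + 1), 2 * k + 1), sides = 2)
}
# e.g. a 5-point moving average of crude rates: run.mean(mu, k = 2)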
Running mean/median
cf. example on 2000-2010 USA mortality rates...
(Figure: observed death rates (USA): log death rate by age, 0-100, for years 2000-2010.)

Linear time-invariant filters
- Algorithmic formula for an LTI filter?
- What calibration must be done? What must be controlled?
- What advantage(s)/disadvantage(s) does this method have?

Kernel smoothing
- Write out an algorithmic formula for a kernel smoother
- What calibration must be done? What must be controlled?
- What advantage(s)/disadvantage(s) do these methods have?

Polynomial regression
- Write out an algorithmic formula for a polynomial smoother
- What calibration must be done? What must be controlled?
- What advantage(s)/disadvantage(s) do these methods have?

Question:
1. Apply lowess() to both the USA 2000 log-mortality rates and the (non-log) crude mortality rates from the Life dataset, and visualize the outputs. Adjust the smoother span to successively 2/3, 1/3 and 0.1, and compare outputs.
2. Compute the MSE for each smooth, for both datasets.
3. Based on the above, describe the main steps of a grid-search approach for selecting an appropriate value for the span.

Assessment of graduation
Typical issues:
- Lack of goodness-of-fit, either overall or at specific locations
- Consistent bias
- Inconsistency in overall shape
Comparing smoothing procedures: use tests, e.g. the χ²-test.

Assessment of graduation / comparisons
χ²-test
- $H_0$: graduated estimates $\{\mu_x\}$ are true
- Assumes independent numbers of deaths at different ages
- Assuming $D_x \sim \mathcal{N}(E^c_x \mu^0_x, E^c_x \mu^0_x)$ (Poisson-Normal approximation):
$$\text{Actual deaths} - \text{Expected deaths} = D_x - E^c_x \mu^0_x$$
($E^c_x$ is the central exposed-to-risk at age $x$ nearest birthday)
$$z_x = \frac{D_x - E^c_x \mu^0_x}{\sqrt{E^c_x \mu^0_x}} \sim \mathcal{N}(0, 1) \quad \text{(under } H_0\text{)}$$
and
$$\xi = \sum_{i=1}^{m} z_{x_i}^2 \sim \chi^2_m, \qquad m = \text{number of age groups}$$
(see the numerical sketch at the end of this section)

Plenty of methods yielding smooth curves...
- The choice of method should be data-driven (and user-driven?)
- Parametric modelling is a form of smoothing; ex: fit a Makeham model to a rough mortality curve...
- Exponential smoothing (cf. time series) is another adaptive procedure
- Weighted functional polynomials
- Etc.

Mortality rates projections
- IoFA: in-house method for mortality projection
- Cf. implementation of the 2009 model for mortality rate projections
- Office for National Statistics population mortality data
- Use of an age-cohort P-spline early on in the processing step
- Recent CMI documents [7] illustrate concerns that current projection methods are not statistical in nature
- Lee-Carter typically used for projection of smoothed data
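As a closing numerical sketch of the χ²-test above, assuming illustrative vectors Dx (observed deaths), Ecx (central exposed-to-risk) and mu0 (graduated rates) over m age groups:

# Chi-squared graduation test under the Poisson-Normal approximation
graduation.chisq <- function(Dx, Ecx, mu0) {
  zx <- (Dx - Ecx * mu0) / sqrt(Ecx * mu0)  # standardized deviations z_x
  xi <- sum(zx^2)                           # xi ~ chi^2_m under H0
  m  <- length(Dx)
  c(statistic = xi, p.value = pchisq(xi, df = m, lower.tail = FALSE))
}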