
Liyuan Xing, Gleb Sizov, Odd Erik Gundersen
Time Series Forecasting
stand in the present and forecast the future

Lecture 3
Introduction to time series forecasting (Liyuan)
Data exploration by time series graphics (Liyuan)
Statistical methods (Liyuan)
• Time series decomposition
• Exponential smoothing
• ARIMA models
• Time series regression models
Evaluation (Liyuan)
• (Nested) cross validation
• Forecast error types and measures

Lecture 7
Machine learning methods (Gleb)
• Statistical methods vs machine learning, M4 competition
• Recurrent neural networks
• Hybrid models
• Global vs local models
Uncertainty and forecast distribution (Liyuan)
• Prediction interval, quantile
• Quantile models
Case study: Projects in TrønderEnergi (Odd Erik)
• Wind power production forecasting
• Grid loss forecasting

Machine learning methods
• Statistical methods vs machine learning, M4 competition
• Recurrent neural networks
• Hybrid models
• Global vs local models

Uncertainty and probability forecast
Point forecast
• Statistical methods
  • Regression, ETS, ARIMA
• Machine learning methods
  • NN, DL, Regression trees, SVR
Prediction interval, quantile
Probability forecast
• Parametric models
  • Johnson’s Su, Skew-t distribution
• Quantile models
  • Regression models, Deep learning, Tree-based models

Motivation
Why is a point forecast NOT enough, and why is uncertainty estimation important?
• Real-world problems often extend beyond predicting means
  • capacity planning problems
    • Example: capacity in Google’s data centers to support spikes in compute workloads
    • Forecasting the capacity to support up to the 95th or higher percentiles of our forecasts
    • It is more important to get the upper quantiles correct than to actually get the point forecasts correct
  • supply chain planning
  • price prediction for trading
  • …
Prediction intervals are just as important as the point forecast itself
• The difference in prediction intervals results in two very different forecasts
• The second forecast calls for much higher capacity reserves to allow for the possibility of a large increase in demand
• The further ahead we forecast, the more uncertain we are

Uncertainty
The thing we are trying to forecast is unknown (or we would not be forecasting it), and so we can think of it as a random variable
• Example: the total sales for next month could take a range of possible values, and until we add up the actual sales at the end of the month, we don’t know what the value will be. So until we know the sales for next month, it is a random quantity
• Predictions are never absolute, and it is imperative to know the potential variations
At least four sources of uncertainty in forecasting using time series models
• The random error term
• The parameter estimates
• The choice of model for the historical data
• The continuation of the historical data generating process into the future

Predictions are themselves random variables with distribution
Point forecasts
• Only represent the expected prediction
• Do not model uncertainty
• The middle of the range of possible values the random variable could take
Probability forecasts
• Represent the prediction distribution
• Model the uncertainty of the forecasting error
• A prediction interval giving a range of values the random variable could take with relatively high probability

Probability forecasting methods
Parametric models
• The forecast is given by a full parameterization of the probability distribution of the random variable p, defined by its cumulative distribution function (CDF) F(p)
• Limited by the distribution assumption
Quantile models
• F(p) is approximated by building quantile models qα(θ, x)
• The quantile qα of p is the value below which p falls with probability α, i.e. α = F(qα)
• More general: no assumptions needed
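
To make the definition α = F(qα) concrete, here is a minimal sketch (assuming Python with NumPy and SciPy; the synthetic data and variable names are ours) contrasting a parametric quantile, obtained by inverting a fitted normal CDF, with an empirical quantile that needs no distributional assumption:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    p = rng.normal(loc=0.0, scale=2.0, size=1000)  # samples of the random variable p

    alpha = 0.95

    # Parametric model: fit a normal distribution and invert its CDF F.
    mu, sigma = p.mean(), p.std()
    q_parametric = norm.ppf(alpha, loc=mu, scale=sigma)

    # Quantile model (empirical): order statistics, no distribution assumption.
    q_empirical = np.quantile(p, alpha)

    # Check the definition alpha = F(q_alpha): ~95% of samples fall below the quantile.
    print(q_parametric, q_empirical, (p <= q_empirical).mean())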

Prediction interval
A prediction interval is an estimate of an interval [Qα, Q1−α] in which a future observation will fall with a certain probability α, 0 < α < 1

Forecast error follows a normal distribution
• The prediction intervals for normal distributions are easily calculated from the ML estimates of the expectation μ and the standard deviation σ
• α = 80%: [μ − 1.28σ, μ + 1.28σ]
• α = 95%: [μ − 1.96σ, μ + 1.96σ]
• Normality is assumed by linear regression, ETS, ARIMA

Forecast error is not normally distributed, but uncorrelated
• Bootstrapped prediction interval
• Assume future errors will be similar to past errors, so replace the future error by sampling from the collection of errors we have seen in the past (i.e., the residuals)
• Compute prediction intervals by calculating percentiles

Quantiles
Quantiles are a generalization of prediction intervals, and no assumption about the distribution is needed
• Example: the 90% prediction interval equals the interval [q5, q95] of quantiles
Cut points
• dividing the range of a probability distribution into continuous intervals with equal probabilities
• dividing the observations in a sample in the same way
A quantile is the value below which a fraction of observations in a group falls
• Example: a prediction for quantile 0.9 should over-predict 90% of the time

Empirical quantiles
Empirical quantile distribution
• A step function that jumps up by 1/n at each of the n data points
Point prediction plus quantile function of errors
• 1. Consider past point forecasts at hour h
• 2. Compute the historical forecasting errors
• 3. Compute the empirical quantile distribution of the errors
• 4. Quantile function of price at hour h

Quantile loss
[Figure: the quantile loss weights, i.e. how under-estimates (positive error) and over-estimates (negative error) are weighted]
np.maximum(q * e, (q - 1) * e), e = y - y_hat
For true values y, predicted values y_hat, and a desired quantile q
• Quantile 50% weights under-estimates equally to over-estimates
• The closer the desired quantile is to 0%, the more this loss function penalizes estimates that are above the true value, i.e. it assigns more loss to over-estimates/negative error than to under-estimates/positive error
• The closer the desired quantile gets to 100%, the more the loss function penalizes estimates that are below the true value, i.e. it assigns more loss to under-estimates/positive error than to over-estimates/negative error
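
Expanding the np.maximum expression above into a runnable sketch (NumPy; the function name quantile_loss and the toy numbers are ours) that confirms how the penalty shifts with q:

    import numpy as np

    def quantile_loss(q, y, y_hat):
        # Pinball / quantile loss: max(q * e, (q - 1) * e) with e = y - y_hat
        e = y - y_hat
        return np.maximum(q * e, (q - 1) * e)

    y = np.array([10.0, 10.0])
    y_hat = np.array([8.0, 12.0])  # one under-estimate, one over-estimate

    print(quantile_loss(0.5, y, y_hat))  # [1.  1. ] -> both errors weighted equally
    print(quantile_loss(0.9, y, y_hat))  # [1.8 0.2] -> under-estimates hurt more
    print(quantile_loss(0.1, y, y_hat))  # [0.2 1.8] -> over-estimates hurt more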
This quantile loss can be used to calculate prediction intervals for
• Regression, neural networks, tree-based models

Linear quantile regression
OLS regressions minimize the squared-error loss function to predict a mean
• Prediction intervals are calculated based on standard errors and the inverse normal CDF
Quantile regressions minimize the quantile loss in predicting a certain quantile
• Optimizing this loss function results in an estimated linear relationship between yi and xi where a portion of the data, α, lies below the line and the remaining portion, 1 − α, lies above the line
https://towardsdatascience.com/quantile-regression-from-linear-models-to-trees-to-deep-learning-af3738b527c3

Deep learning and quantiles
Deep learning vs traditional machine learning
• Large and deep neural networks
• Feature learning, such as automatic feature extraction
Quantiles are predicted in DL by passing the quantile loss function instead of RMSE or MAE
• Keras: each quantile must be trained separately
• TensorFlow: can leverage patterns common to the quantiles
• Co-learning across the quantiles in its predictions, where the model learns a common kink rather than separate ones for each quantile
https://towardsdatascience.com/quantile-regression-from-linear-models-to-trees-to-deep-learning-af3738b527c3

Quantile regression on gradient boosting
Using a second-order approximation of the quantile loss function
• Tree boosting vs gradient boosting in the splitting procedure
• every (dimension, cutoff) pair vs a single pass per dimension (next-largest cutpoint)
• Point forecasts vs forecast distribution
Implemented differently, but all support explicit quantile prediction
• Scikit-learn’s implementation GradientBoostingRegressor
• LightGBM
• http://jmarkhou.com/lgbqr/
• Xgboost
• https://towardsdatascience.com/regression-prediction-intervals-with-xgboost-428e0a018b
• Catboost
https://towardsdatascience.com/quantile-regression-from-linear-models-to-trees-to-deep-learning-af3738b527c3

Bootstrapping
Doesn’t explicitly predict quantiles, but treats each model as a possible value and calculates quantiles using the empirical CDF (a minimal sketch follows this list)
• 1. Generate datasets obtained via resampling with replacement
• Sample splits, blocked bootstrap
• 2. Estimate a point forecast model for each dataset or time series
• Tree in random forest, ETS
• 3. Use the models to estimate quantiles of the model errors
• Model and parameter uncertainty
• 4. Use these to estimate quantiles of the process error
• Random error term
• Quantile function of price at hour h
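
A minimal sketch of the four bootstrapping steps above (Python/NumPy; the synthetic series and the deliberately simple mean-forecast model are our illustrative assumptions, not the models used in the lecture):

    import numpy as np

    rng = np.random.default_rng(42)
    y = rng.normal(100.0, 5.0, size=200)  # historical series (synthetic)

    n_boot = 500
    point_forecasts = np.empty(n_boot)
    for b in range(n_boot):
        # 1. Generate a dataset by resampling the history with replacement.
        sample = rng.choice(y, size=y.size, replace=True)
        # 2. Fit a (deliberately simple) point forecast model per dataset:
        #    here just the sample mean as the one-step-ahead forecast.
        point_forecasts[b] = sample.mean()

    # 3. The spread of the bootstrapped forecasts captures model/parameter uncertainty.
    # 4. Add resampled residuals to also capture the random (process) error term.
    residuals = y - y.mean()
    simulated = point_forecasts + rng.choice(residuals, size=n_boot, replace=True)

    # Quantiles of the simulated values give an empirical forecast distribution.
    print(np.quantile(simulated, [0.05, 0.5, 0.95]))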
Which did best?
[Figures: method comparison on small datasets with 1 feature and on large datasets with 12 features]

Error measures
Proper scoring rules for probabilistic forecasting

Pinball loss function
L(qa, y) = (1 − a/100) · (qa − y) if y < qa, and (a/100) · (y − qa) if y ≥ qa
• y is the observation used for forecast evaluation
• a = 1, 2, ..., 99, and q1, q2, ..., q99 are the 1st, 2nd, ..., 99th percentiles
• To evaluate the full predictive densities, this score L is then averaged over all target quantiles for all time periods over the forecast horizon (see the evaluation sketch at the end of this section)
• A lower score indicates a better forecast
Other error measures: MAE, RMSE, MAPE, sMAPE, MASE, K-S statistic

Continuous ranked probability score (CRPS)
CRPS(F, x) = ∫ (F(y) − 𝟙(y − x))² dy
• x is the observation
• F is the CDF associated with an empirical probabilistic forecast
• 𝟙 is the Heaviside step function, a step function along the real line that attains
  • the value 1 if the real argument is positive or zero
  • the value 0 otherwise

Case study: Projects in TrønderEnergi
• Wind power production forecasting
• Grid loss forecasting
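
As referenced in the error measures slide above, a minimal sketch of the pinball-loss evaluation (Python with NumPy and SciPy; the observation, the N(9, 2) forecast distribution, and the helper name pinball_loss are our illustrative assumptions):

    import numpy as np
    from scipy.stats import norm

    def pinball_loss(y, q_a, a):
        # Pinball loss for observation y and the a-th percentile forecast q_a
        alpha = a / 100.0
        return alpha * (y - q_a) if y >= q_a else (1 - alpha) * (q_a - y)

    y = 10.0                    # observed value (synthetic)
    levels = np.arange(1, 100)  # a = 1, 2, ..., 99

    # Hypothetical percentile forecasts q1..q99 taken from a N(9, 2) distribution:
    q = norm.ppf(levels / 100.0, loc=9.0, scale=2.0)

    # Average the score over all target quantiles; a lower score is better.
    score = np.mean([pinball_loss(y, q_a, a) for q_a, a in zip(q, levels)])
    print(score)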