
Lead Research Scientist, Financial Risk Quantitative Research, SS&C Algorithmics Adjunct Professor, University of Toronto
MIE1624H – Introduction to Data Science and Analytics Lecture 5 – Modeling and Regressions
University of Toronto, February 8 and 15, 2022


 Modeling

▪ Simplified representation or abstraction of reality
▪ Captures the essence of a system without unnecessary details
▪ Models are tailored for specific types of problems
▪Models help us understand the world – Prediction (What if?)
– Optimization (What’s best?)
– Clustering (How similar?)
▪ Models are often much easier, faster, and cheaper to experiment with than the real system

Models and reality
[Diagram: a model is a simplified abstraction of reality that captures the essence of the problem; calculations take place in the analyst's world, and interpretation maps the results back to the "real" world]
From Monahan, G., "Management Decision Making", Cambridge University Press, 2000

Why do we model for decision making?
▪ Building a model forces detailed examination of, and thought about, a problem
– structures our thinking
– we must articulate our assumptions and preconceived notions
– model building may illuminate a solution without actually using the model
▪Searching for general insights
– form of the relationship between the key variables involved in a decision
– importance of various parameters on decisions
▪ Looking for specific numeric answers to a decision-making problem
– If we add one lab tech between 7 a.m. and 3 p.m., how much reduction can we expect in test turnaround time?
▪Find the best way to do something
– Which routing schedule minimizes our delivery costs?

Types of models
▪ Physical – cars, buildings
▪ Diagrams – flow chart, decision trees, influence diagrams
▪ Statistical – regression equation, probability distribution
▪ Mathematical – queuing model, scheduling model
▪ Computer simulation
▪ Computational – neural networks, genetic algorithms

A 7 step (idealized) modeling process
▪ Define the problem – "exploring the mess"
▪ Observe the system / collect data
▪ Formulate model(s)
– much "art and craft"
▪ Verify/validate the model and use it for prediction and exploration of the system being modeled
▪ Use the model to help select among alternatives
▪ Present results to decision makers
▪ Implement the solution and evaluate outcomes

 Predictive Maintenance

Predictive maintenance – before big data
◼ Wind turbines are big and expensive machines, so keeping them running smoothly helps keep their operational costs down. To do that, you need to be able to anticipate failures in heavy and expensive parts like the gearbox, generator and main shaft.
◼ A physical model explicitly describes the turbine design using detailed knowledge of its physical characteristics. A physical model has to be calibrated by an expert.
◼ Preventive maintenance saves money:
➢ Shorter downtime and less lost production
➢ Better planning of people and materials
➢ Cheaper repairs
Source: Algoritmica, http://www.algoritmica.nl

Predictive maintenance – big data era
◼ In the big data era wind turbines have an array of sensors that measure
temperatures, pressures, voltages, currents, and blade angles.
◼ Data-driven approach: a model learns the relationship between the various sensor readings based on the training data. To create such a sensor model we apply machine learning, i.e. one or more algorithms that use a set of training examples to learn a predictive model. The model can be trained by a non-turbine expert. The model then calculates its predicted value and compares it with the actual sensor reading.
Source: Algoritmica, http://www.algoritmica.nl
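The data-driven approach above can be sketched in a few lines: a sensor model predicts one sensor's reading from the others, and a large residual between prediction and measurement flags a potential fault. The linear weights, sensor names, and threshold below are hypothetical illustration values, not from a real turbine.

```python
def predict_gearbox_temp(ambient_temp, rotor_rpm, power_kw):
    # Hypothetical learned linear sensor model: predicts the gearbox
    # temperature from the other sensor readings.
    return 0.8 * ambient_temp + 0.01 * rotor_rpm + 0.02 * power_kw

def check_sensor(actual, predicted, threshold=5.0):
    """Flag a reading whose residual exceeds the threshold."""
    return abs(actual - predicted) > threshold

pred = predict_gearbox_temp(ambient_temp=15.0, rotor_rpm=1200.0, power_kw=800.0)
print(check_sensor(actual=42.0, predicted=pred))  # healthy reading -> False
print(check_sensor(actual=60.0, predicted=pred))  # anomalous reading -> True
```

In practice the weights would be learned from training data, as the slide describes, rather than written by a turbine expert.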

Environmental risk management

Model Examples

Pit stop analytics
F1 analytics based on past experience
Calculations showed that time spent changing tires and refilling the tank was more than offset by the improved performance of the car on the track.
1. Softer tires stuck to the track better during turns than their harder cousins, though they wore out more quickly.
2. Less gas in the tank translated into a lighter, and therefore faster, car.

Fitting room analytics
Source: Adme

Shortest path or most beautiful path?

Sports analytics

Personalized health analytics – 23andMe

Linear Regression – Statistical Perspective

Linear regression – statistical perspective
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”

Linear regression – statistical perspective
How is cab fare calculated?
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”

Linear regression – statistical perspective
Put it in a formula
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”

Linear regression – statistical perspective
We can do this by hand
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”

Linear regression – statistical perspective
Why regress?
What if we don’t know the rates?
Regression estimates the unknown rates
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”
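A minimal sketch of the idea on the cab-fare slides above: given observed (distance, fare) pairs, ordinary least squares recovers the unknown base fare and per-km rate. The fares below are synthetic, generated from a hypothetical base fare of 3.25 and a rate of 1.75 per km.

```python
distances = [2.0, 5.0, 8.0, 10.0]      # trip distances in km
fares = [6.75, 12.00, 17.25, 20.75]    # observed total fares

n = len(distances)
mean_x = sum(distances) / n
mean_y = sum(fares) / n

# Closed-form simple linear regression: slope = cov(x, y) / var(x)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(distances, fares))
sxx = sum((x - mean_x) ** 2 for x in distances)
rate = sxy / sxx
base_fare = mean_y - rate * mean_x

print(round(base_fare, 2), round(rate, 2))  # recovers 3.25 and 1.75
```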

Linear regression – statistical perspective
Things to remember
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”

Linear regression – statistical perspective
Hypothesis testing
Null hypothesis: no linear relationship exists between independent variable x and dependent variable y
a) no linear relationship exists (H0)
b) true relationship is not linear
c) linear relationship exists (H0 rejected)
d) a higher order model may be needed

Linear regression – statistical perspective
Distribution of errors

Linear regression – statistical perspective
Applications
Do good looking professors get higher teaching evaluations?
Hamermesh, D.S. and A. Parker (2005) “Beauty in the Classroom: Instructors’ Pulchritude and Putative Pedagogical Productivity”, Economics of Education Review, August 2005, pp. 5-16
Source: “Getting Started with Data Science: Making Sense of Data with Analytics”

Linear regression – statistical perspective
Applications
Regression analysis in the Freakonomics books and in econometrics

Linear Regression – Data Science Perspective

Linear regression
▪ Predict a value of a given continuous variable based on the values of other variables, assuming a linear or nonlinear model of dependency
▪ Virtually endless applications:
o Election outcomes
o Future product revenues or commodity prices
o Wind velocity
✓Both predictive and explanatory power
y = c0 + c1 x1 + … + cn xn
Spend = 7.4 + 0.37 * Income
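The fitted equation above can be turned into a one-line predictor, illustrating both its predictive use (score a new customer) and its explanatory use (read the slope as marginal spend per unit of income). The coefficients 7.4 and 0.37 come from the slide; the income values below are hypothetical.

```python
def predicted_spend(income):
    # Intercept 7.4: baseline spend at zero income.
    # Slope 0.37: each extra unit of income adds 0.37 units of spend.
    return 7.4 + 0.37 * income

for income in (50, 100, 200):
    print(income, predicted_spend(income))
```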

Linear regression – usage
Two major categories where regression analysis can be leveraged
1. To predict, estimate or forecast the values: linear regression can be used to fit a predictive model to an observed data set of y and x values
▪ The model is fit to a set of known values using “training” data set and validated using a holdout “test” sample
▪ Once the model is built it can be leveraged to predict the y values for the records where only x values are available
o Estimate customer spend (scoring the universe)
o Estimate future product demand (forecasting)
2. To quantify the strength of the relationship between y and the xi. To assess which xj has a strong relationship or whether a particular xk has a statistically significant relationship with a target variable
▪ The model is fit to a set of known values using “training” data set and validated using a holdout “test” sample, then the relationship between the target variable y and predictors xi is evaluated.
o Did the patients who received treatment (actual medication) show statistically significant improvement vs. patients receiving a placebo?
o What was the biggest driver in customer response (TV ad, banner, direct mail, email)?

Estimation / prediction
▪ Can we predict the CO2 emission of a car without testing it?
▪ The CO2 emission of a car depends on the engine size, class, model, make, cylinders, and fuel consumption of that car; a prediction model estimates its expected CO2 emission from these attributes.
What is prediction? Prediction is similar to classification, but the predicted values are
continuous / numerical / ordered
How does it work?
1. Split your data into training and test set
2. Construct a model using training set
3. Evaluate your model using test set
4. Use model to predict unknown value
Prediction example
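The four steps above can be sketched on a synthetic data set: predict a car's CO2 emission from engine size. The data, the linear form, and all numbers are made up for illustration; the model is fitted with numpy's least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(0)
engine_size = rng.uniform(1.0, 5.0, size=40)
co2 = 120 + 25 * engine_size + rng.normal(0, 5, size=40)  # synthetic "truth"

# 1. Split your data into a training set and a test set
idx = rng.permutation(40)
train, test = idx[:30], idx[30:]

# 2. Construct a model using the training set
X_train = np.column_stack([np.ones(train.size), engine_size[train]])
theta, *_ = np.linalg.lstsq(X_train, co2[train], rcond=None)

# 3. Evaluate your model using the test set (mean absolute error)
X_test = np.column_stack([np.ones(test.size), engine_size[test]])
mae = np.mean(np.abs(X_test @ theta - co2[test]))

# 4. Use the model to predict an unknown value (a hypothetical 2.4 L engine)
new_pred = theta[0] + theta[1] * 2.4
print(round(mae, 2), round(new_pred, 1))
```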

Estimation / prediction
Historical data showing details of past cars, including cylinders, engine size, consumption, CO2
Predict expected CO2 emissions for a new car

[Table: historical car records – engine size and other attributes – split into a training set, a test/evaluation set, and a prediction set]

Prediction
▪ Algorithms:
– Regression
• Simple regression
• Multiple regression
• Linear regression
• Non-linear regression
– k-nearest neighbor methods
– Neural networks
– Support Vector Regression
▪ What is regression analysis?
– Simple regression analysis (one independent variable)
– Multiple regression analysis (multiple independent variables)
▪ What are data science applications?
– Marketing: sales forecasting
– Psychology: satisfaction analysis
– …

Simple linear regression
◼ Target value is expected to be a linear combination of the input variables (straight line)
◼ Regression computes i from data to minimize squared error to ‘fit’ the data
a single predictor response variable

Linear regression – ordinary least squares (OLS)
[Scatter plot: CO2 emission vs. engine size, with the fitted least-squares line]

Multiple linear regression
• y  dependent / target / predicted / response variable
• x1, x2, …  independent / input / feature / predictor variable • 1, 2, …  coefficients of dependent variables
• 0  intercept
• 1 > 0  positive association
• β1 < 0 – negative association
• β1 = 0 – no association
Many nonlinear functions can be transformed into the above form

Non-linear regression
▪ Some nonlinear models can be modeled by a polynomial function
▪ A polynomial regression model can be transformed into a linear regression model
▪ For example, y = β0 + β1 x + β2 x^2 + β3 x^3 is converted to linear regression with new variables x1 = x, x2 = x^2, x3 = x^3: y = β0 + β1 x1 + β2 x2 + β3 x3

▪ Measure predictor accuracy – measure how far off the predicted value (ŷ) is from the actual known value (y)
▪ A loss function measures the error between y and the predicted value ŷ
– Absolute error: |y − ŷ|
– Squared error: (y − ŷ)^2
▪ Test error (generalization error) – the average loss over the test set
– Mean Absolute Error
– Mean Squared Error (Residual Sum of Squares / m)
– Relative Absolute Error
– Relative Squared Error
– R2 = 1 − Relative Squared Error

Regression analysis: R2 and variable selection
▪ Goodness of fit in linear regression models is generally measured using R2
▪ R2 measures how well the regression line approximates the real data points; it also gives the percent of variance in the data explained by the regression model
o If the value is close to 1, the model fits perfectly and explains all variance
o If the value is close to 0, the model does not fit the data and/or does not explain any variance
▪ Variable preparation:
o Interval variables can be binned or bucketed in order to capture nonlinear relationships
o Categorical variables must be converted into binary vectors.
Data samples must be large enough to accommodate all degrees of freedom
▪ Variable selection (LASSO algorithm)

Linear Regression – Machine Learning Perspective

Linear regression – data mining and machine learning perspective
◼ Input (features):
❑ n – number of features
❑ x^(j) – input (features) of the j-th training example
❑ x_i^(j) – value of feature i in the j-th training example
◼ Hypothesis: h_θ(x) = θ0 + θ1 x1 + … + θn xn = θ^T x
◼ Parameters: θ = (θ0, θ1, …, θn)
◼ Cost function: J(θ) = (1/2m) Σ_{j=1}^{m} (h_θ(x^(j)) − y^(j))^2
◼ Optimization: min_θ J(θ)

Linear regression – data mining and machine learning perspective
◼ Optimization: min_θ J(θ)
◼ Solution algorithms:
◼ Non-linear optimization methods, e.g., iterative algorithms such as gradient descent, Newton and quasi-Newton algorithms, etc.
◼ Linear algebra (normal equations), i.e., solving (X^T X) θ = X^T y
◼ Solving the normal equations: solve this system of linear equations for θ

Linear regression – data mining and machine learning perspective
◼ Solving the normal equations (m examples, n features):
◼ Notation: X is the m × (n+1) design matrix (first column all ones), y is the m-vector of targets
◼ Solution: θ = (X^T X)^{−1} X^T y
◼ Solution algorithm: solve the system of linear equations (X^T X) θ = X^T y for the unknown θ

Logistic Regression – Machine Learning Perspective

Other types of regression analysis
Quantile regression
▪ Ordinary least squares regression approximates the conditional mean of the response variable, while quantile regression estimates the conditional median or other quantiles of the response variable
▪ This is very helpful in the case of skewed data (e.g., income distribution in the US) or for dealing with outliers without suppressing them
Logistic regression
▪ Logistic regression is used to predict a categorical target variable
▪ Most often a variable with a binary outcome
➢ Logit and Probit regressions can also be used to predict a binary outcome.
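The normal-equations route mentioned above can be shown in a few lines: θ solves the linear system (XᵀX)θ = Xᵀy. The data below is synthetic, with a column of ones supplying the intercept term.

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])            # first column = intercept term
y = np.array([6.0, 8.0, 10.0, 12.0])  # exactly y = 4 + 2*x

# Solve (X^T X) theta = X^T y directly with linear algebra.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # recovers intercept 4 and slope 2
```

For large n, solving the system with `np.linalg.solve` (or a least-squares routine) is preferred over explicitly inverting XᵀX.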
While the underlying distributions are different, all three models will produce rather similar outcomes
▪ It is frequently used to estimate the probability of an event
o Bank customer defaulting on a loan
o Customer responding to a marketing promotion
o Spam or not-spam email
o Malignant or benign tumor

Classification
◼ Dataset: {(x^(j), y^(j))}, j = 1, …, m
◼ Targets: y^(j)
◼ Features: x^(j)
◼ Parameters: θ
◼ Classifier: h_θ(x)
◼ Predictions: ŷ = h_θ(x)
◼ In classification we are trying to predict discrete targets
◼ In the two-class problem, y can be equal to 0 (“negative class”, e.g., spam) or 1 (“positive class”, e.g., not spam)
◼ Example classification problem – classify different flowers using measurements of the flower: Iris setosa (A), Iris versicolor (B)

Classification
◼ Features are numerical attributes
◼ Good features can be used to predict targets (different classes)
◼ Sometimes we can separate different classes with a linear decision boundary
◼ Usually more features can give better performance
[Figure: Fisher's Iris data – sepal length, sepal width, petal length, petal width – with a linear decision boundary between classes]

Linear regression vs. logistic regression
◼ Linear regression predicts real values
◼ Logistic regression predicts values in the range of 0 to 1

Using linear regression for classification
◼ Threshold the classifier output at 0.5:
❑ If h_θ(x) ≥ 0.5, predict “y = 1”
❑ If h_θ(x) < 0.5, predict “y = 0”
◼ What to do if h_θ(x) < 0 or h_θ(x) > 1? We want 0 ≤ h_θ(x) ≤ 1
Source: “Machine Learning”

Logistic regression – hypothesis representation
◼ We want 0 ≤ h_θ(x) ≤ 1:
❑ h_θ(x) = θ^T x – hypothesis for linear regression
❑ h_θ(x) = g(θ^T x) – hypothesis for logistic regression
◼ Sigmoid function / logistic function: g(z) = 1 / (1 + e^(−z))
use the logistic function to approximate the threshold function;
one important reason is that the logistic function is differentiable
◼ We need to fit parameters θ
Source: “Machine Learning”
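The sigmoid function above is a one-liner: it squashes any real input into (0, 1) and, unlike a hard threshold, is differentiable everywhere.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))          # 0.5, exactly at the decision boundary
print(sigmoid(6) > 0.99)   # large positive inputs saturate toward 1
print(sigmoid(-6) < 0.01)  # large negative inputs saturate toward 0
```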

Logistic regression – hypothesis representation
◼ h_θ(x) = estimated probability that y = 1 on input x
❑ e.g., tell a patient that there is a 70% chance of the tumor being malignant
◼ h_θ(x) = P(y = 1 | x; θ) – probability that y = 1, given x, parameterized by θ
◼ Suppose:
❑ predict “y = 1” if h_θ(x) ≥ 0.5
❑ predict “y = 0” if h_θ(x) < 0.5
Source: “Machine Learning”

Logistic regression – decision boundary
❑ predict “y = 1” if h_θ(x) ≥ 0.5, i.e., if θ^T x ≥ 0
❑ predict “y = 0” if h_θ(x) < 0.5, i.e., if θ^T x < 0
◼ The set of points where θ^T x = 0 is the decision boundary
Source: “Machine Learning”


Logistic regression – cost function
◼ Input (training set):
❑ n – number of features
❑ m – number of examples
❑ x^(j) – input (features) of the j-th training example
❑ x_i^(j) – value of feature i in the j-th training example
◼ Hypothesis: h_θ(x) = g(θ^T x) = 1 / (1 + e^(−θ^T x))
◼ Parameters: θ
◼ Cost function: J(θ)
◼ Optimization: min_θ J(θ)
the squared-error cost function that we used for linear regression is non-convex when combined with the sigmoid hypothesis
Source: “Machine Learning”

Logistic regression – cost function
◼ Cost function:
Cost(h_θ(x), y) = −log(h_θ(x)) if y = 1, and −log(1 − h_θ(x)) if y = 0
◼ More compact form of the cost function:
J(θ) = −(1/m) Σ_{j=1}^{m} [ y^(j) log h_θ(x^(j)) + (1 − y^(j)) log(1 − h_θ(x^(j))) ]
Source: “Machine Learning”
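The compact cross-entropy cost can be computed directly from its definition. The predictions h and labels y below are hypothetical; note that confident, mostly-correct predictions incur a lower cost than uncertain ones.

```python
import math

def cross_entropy(y, h):
    """J = -(1/m) * sum( y*log(h) + (1-y)*log(1-h) )."""
    m = len(y)
    return -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
                for yi, hi in zip(y, h)) / m

y = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # confident, mostly correct predictions
bad = [0.6, 0.5, 0.4, 0.5]   # uncertain predictions

print(cross_entropy(y, good) < cross_entropy(y, bad))  # True
```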

Logistic regression – optimization problem
◼ Cost function: J(θ); in statistics this cost function can be derived using the principle of maximum likelihood estimation
◼ Optimization to fit parameters θ: min_θ J(θ)
◼ Solution algorithms:
◼ Non-linear optimization methods, e.g., iterative algorithms such as gradient descent, Newton and quasi-Newton algorithms
◼ To make predictions given a new x, we output h_θ(x)
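A minimal gradient-descent fit of logistic regression, as a sketch of the iterative algorithms mentioned above. The gradient of the cross-entropy cost is Xᵀ(h − y)/m; the tiny 1-D data set, learning rate, and iteration count are all illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny synthetic 1-D problem: intercept column plus one feature.
X = np.array([[1, -2.0], [1, -1.0], [1, 1.0], [1, 2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

theta = np.zeros(2)
alpha, m = 0.5, len(y)
for _ in range(2000):
    h = sigmoid(X @ theta)
    theta -= alpha * X.T @ (h - y) / m  # gradient descent step

pred = (sigmoid(X @ theta) >= 0.5).astype(int)
print(pred)  # [0 0 1 1]
```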

Logistic regression – multiclass classification
◼ One-vs-all (one-vs-rest)
❑ Train a logistic regression classifier h_θ^(k)(x) for each class k to predict the probability that y = k
❑ On a new input x, to make a prediction, pick the class k that maximizes h_θ^(k)(x)
◼ Email tagging – Primary, Social, Promotions, Forums
◼ Classify different flowers using measurements of the flower:
Iris setosa (A), Iris versicolor (B), Iris virginica (C)
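One-vs-all can be sketched end to end: train one binary logistic classifier per class (class k vs. the rest) with gradient descent, then predict by taking the class with the highest probability. The three tiny 2-D classes below are synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_binary(X, y, alpha=0.5, iters=2000):
    """Fit one binary logistic classifier by gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * X.T @ (sigmoid(X @ theta) - y) / len(y)
    return theta

# Three tiny synthetic classes placed at the corners of a triangle.
points = np.array([[0.0, 2.0], [0.2, 1.8],      # class 0
                   [-2.0, -1.0], [-1.8, -1.2],  # class 1
                   [2.0, -1.0], [1.8, -1.2]])   # class 2
labels = np.array([0, 0, 1, 1, 2, 2])
X = np.column_stack([np.ones(len(points)), points])

# One classifier per class: class k vs. the rest.
thetas = [fit_binary(X, (labels == k).astype(float)) for k in range(3)]
probs = np.column_stack([sigmoid(X @ t) for t in thetas])
pred = probs.argmax(axis=1)  # pick the class with the highest probability
print(pred)
```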

Multi-class classification algorithm (one-vs-all) – e-mail tagging
[Diagram: the four-class tagging problem is split into four binary classification problems, one per class –
Binary classification problem #1: Primary vs. Social/Promotions/Forums
Binary classification problem #2: Social vs. Primary/Promotions/Forums
Binary classification problem #3: Promotions vs. Primary/Social/Forums
Binary classification problem #4: Forums vs. Primary/Social/Promotions
Each binary classifier outputs the probability of its own class (in the example: pA = 0.1, pB = 0.3, pC = 0.4, pD = 0.2), and the class with the highest probability is chosen.]
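The final step of one-vs-all on the e-mail tagging example above is a simple argmax over the four class probabilities; the values follow the slide's example.

```python
# Per-class probabilities from the slide's one-vs-all example.
class_probs = {"Primary": 0.1, "Social": 0.3, "Promotions": 0.4, "Forums": 0.2}

# Pick the tag with the highest probability.
predicted_tag = max(class_probs, key=class_probs.get)
print(predicted_tag)  # Promotions
```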

