Introduction Types of analysis Analytics techniques Software Data Considerations
CORPFIN 2503 – Business Data Analytics: Introduction to data analytics
Week 1: July 26th, 2021
£ius CORPFIN 2503, Week 1 1/57
Introduction
Types of analysis
Analytics techniques
Considerations
Introduction
The purpose of business analytics is to improve the profitability and overall performance of a business.
Business analytics can help businesses in:
• detecting credit card fraud
• identifying potential customers
• analyzing or predicting profitability per customer
• getting new business insights and understanding business performance
• improving operations.
All these processes use a huge amount of data, information technology (IT) infrastructure, and modern quantitative techniques.
Introduction II
Any application of business analytics involves:
• a considerable amount of effort in defining the problem and the methodology to solve it
• data collection
• data cleansing
• model building
• model validation
• conducting the analysis
• interpretation of results, and
• making policy recommendations.
It is an iterative process, and the models might need to be built several times before they are finally accepted.
Types of analysis
• Simulations
• Modeling
• Optimization
• Data mining
Simulations
Simulations are computerized mathematical techniques that try to imitate real-world systems, processes, or scenarios.
E.g., predicting stock prices, firm defaults, or the weather.
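As a rough illustration of the idea, the sketch below simulates 1,000 possible one-year paths for a stock price and summarizes the simulated end-of-year prices. The starting price (100), the daily drift (0.0003), the daily volatility (0.01), and the data set name work.sim are assumptions chosen only for illustration; they do not come from the course material.

```sas
/* Minimal sketch: simulate 1,000 one-year stock price paths.          */
/* All parameter values below are illustrative assumptions.            */
data work.sim;
    call streaminit(2503);              /* fix the seed for reproducibility */
    do path = 1 to 1000;
        price = 100;                    /* assumed starting price */
        do day = 1 to 250;              /* assumed 250 trading days */
            /* daily log return = drift + volatility * standard normal shock */
            price = price * exp(0.0003 + 0.01 * rand('normal'));
        end;
        output;                         /* keep only the end-of-year price */
    end;
    keep path price;
run;

/* Summarize the distribution of simulated end-of-year prices */
proc means data=work.sim mean std p5 p95;
    var price;
run;
```

The spread between the 5th and 95th percentiles gives a feel for the range of outcomes the simulation considers plausible under these assumptions.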
Simulations II
A group of researchers simulated the FIFA 2018 tournament 100,000 times and came to the conclusion that Spain is the most likely winner, followed by Germany and Brazil. They used the following factors:
• the FIFA ranking
• each country’s population and their gross domestic product
• bookmakers’ odds
• how many of the national team players play together in a club
• the player’s average age, and
• how many Champions Leagues they’ve won.
Neither Spain, Germany, nor Brazil made it to the semi-finals...
Modeling
A model is merely the mathematical logic and concepts that go into a computer program and that, along with the associated data, represent a real-world system.
Models can be used to analyze the effect of different components and predict system behavior.
E.g., models that are used to forecast macro-economic variables (such as GDP, unemployment, or inflation).
Optimization
Optimizations are computer simulations run in order to optimize (i.e., to minimize or maximize) a mathematical function, subject to a given set of constraints.
E.g., maximization of the working time of a machine, while keeping the maintenance costs below a certain level.
Data mining
Data mining is the process of looking for interesting patterns in the data.
Types of data mining:
• characterization (characteristics of a certain group of people/firms etc.)
• discrimination (comparison of different groups of people/firms etc.)
• association analysis (analysing the impact of a on b)
• predictive analysis
• clustering
• deviation analysis (finding the differences between the expected and actual values).
Analytics techniques
In this course, we will cover:
• Visual analytics and correlations
• Linear regression analysis
• Logit and probit models
• Multinomial logit models
• Monte-Carlo simulations
• Time series forecasting
• Text analytics.
Covariance and correlation
Covariance provides a measure of the strength of the linear relation between two variables.
If it is positive (negative), the two variables are positively (negatively) related to each other.
Correlation (ρ) is standardized covariance. ρ ∈ [−1, 1].
The closer ρ is to either −1 or 1, the stronger the correlation (linear relationship) between the variables.
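Written out, the standardization is ρ = Cov(X, Y) / (σ_X σ_Y). A minimal sketch of how both measures could be obtained in SAS is shown below, using PROC CORR with the COV option. The data set work.returns, the variable names Stock and Market, and the return values are hypothetical and serve only as an illustration.

```sas
/* Hypothetical monthly returns for a stock and the market */
/* (illustrative values only, not real data)               */
data work.returns;
    input Stock Market;
    datalines;
0.021 0.015
-0.013 -0.008
0.034 0.022
0.005 0.011
-0.027 -0.019
0.016 0.009
;
run;

/* Pearson correlations are reported by default;   */
/* the COV option adds the covariance matrix.      */
proc corr data=work.returns cov;
    var Stock Market;
run;
```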
Linear regressions
Linear regression analysis is an econometric technique of analyzing the impact of one or more factors (a.k.a. independent variables) on the variable of interest (a.k.a. dependent variable).
Ordinary least squares (OLS), or linear least squares, is the most popular type of regression.
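A minimal sketch of an OLS regression in SAS, assuming a hypothetical data set work.returns in which Stock is the dependent variable and Market is the single independent variable (names and values are illustrative only):

```sas
/* Hypothetical monthly returns (illustrative values only) */
data work.returns;
    input Stock Market;
    datalines;
0.021 0.015
-0.013 -0.008
0.034 0.022
0.005 0.011
-0.027 -0.019
0.016 0.009
;
run;

/* OLS regression of Stock on Market:                     */
/* dependent variable on the left of the equals sign,     */
/* independent variable(s) on the right.                  */
proc reg data=work.returns;
    model Stock = Market;
run;
quit;
```

The output reports the estimated intercept and slope together with the R-squared of the fitted line.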
Regressions: US stocks monthly returns (2003-’07)
Fitted line: y = 0.1884x + 0.0012, R² = 0.181
[Figure: plot with fitted regression line; labels shown: CRSP, S&P 500, St. dev.]
Logit and probit models
Logit and probit models are similar to OLS regressions but their dependent variable can only have two values: 0 or 1.
Problems that can be analyzed:
• determinants of firm defaults
• factors of M&A targets
• decisions of monetary authority
• corporate decisions: security issues, underwriter/advisor/auditor switching. . .
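As a sketch of the first problem in the list, the code below fits a logit and a probit model of firm default on leverage with PROC LOGISTIC. The data set work.defaults, the variables Default (0/1) and Leverage, and all values are hypothetical and purely illustrative.

```sas
/* Hypothetical firm-level data: Default = 1 if the firm defaulted */
/* (illustrative values only)                                       */
data work.defaults;
    input Default Leverage;
    datalines;
0 0.20
0 0.35
1 0.40
0 0.45
1 0.55
0 0.60
1 0.70
1 0.85
;
run;

/* Logit model: event='1' asks SAS to model the probability of default */
proc logistic data=work.defaults;
    model Default(event='1') = Leverage;
run;

/* Probit model: same specification with a probit link */
proc logistic data=work.defaults;
    model Default(event='1') = Leverage / link=probit;
run;
```

Without event='1' (or the DESCENDING option), PROC LOGISTIC models the probability of the lower ordered value, here Default = 0, which flips the signs of the coefficients.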
Multinomial logit models
Multinomial logit models are similar to logit models but their dependent variable has more than two values (e.g., 0, 1, and 2).
Problems that can be analyzed:
• corporate decision
• security/fund selection
• modeling optimal choice out of a few possibilities.
Forecasting
Time series forecasting is a simple form of forecasting technique, wherein some data points are available over regular time intervals of days, weeks, or months.
If some patterns can be identified in the historical data, it is possible to project those patterns into the future as a forecast.
Forecasting: Example
Extending the straight line:
[Figure: oil price (in USD pb) over time, with a straight line extended as a forecast]
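One simple way to "extend the straight line" is to regress the price on a time index and let SAS compute fitted values for future periods. The sketch below assumes a hypothetical data set work.oil with a time index t and a Price variable; the prices are made-up illustrative numbers, and the last two rows have a missing price so that only a forecast is produced for them.

```sas
/* Hypothetical oil prices (illustrative values only).            */
/* Rows with t = 9 and t = 10 are future periods: Price is        */
/* missing (.), so they are not used to estimate the trend.       */
data work.oil;
    input t Price;
    datalines;
1 62.0
2 63.5
3 64.1
4 66.0
5 67.2
6 68.9
7 70.3
8 71.5
9 .
10 .
;
run;

/* Fit a linear trend; the OUTPUT statement stores predicted values, */
/* including for rows where the dependent variable is missing.       */
proc reg data=work.oil noprint;
    model Price = t;
    output out=work.forecast p=Trend;
run;
quit;

proc print data=work.forecast;
run;
```

For t = 9 and t = 10 the Trend column holds the extrapolated values, that is, the forecast.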
Text analytics
Text analysis helps to:
• compute word frequencies, distributions, and patterns
• analyze sentiment.
Examples from IBM 2017 annual report.
Text analytics: Word frequency
Word      Length  Count  Weighted Percentage (%)  Similar Words
…         …       …      …                        companies, company
million   7       719    1.12                     million, millions
years     5       684    1.07                     year, years
percent   7       614    0.96                     percent
2017      4       604    0.94                     2017
income    6       563    0.88                     income, incomes
…         …       …      …                        tax, taxed, taxes, taxing
…         …       …      …                        operate, operates, operating, operation, operational, operations
Text analytics: Word cloud
Software
Popular software in data analysis:
• Excel
• Stata
• R
• Matlab
• SPSS.
Software II
The choice of software might depend on a number of factors:
• Which analytics techniques will be used and how frequently they will be used
• Existing organizational processes
• Budgetary constraints
• Comfort level of the company in using open source software
• The size of data to be handled
• The sophistication of graphics and presentation required in the project
• How the current data is organized and how comfortable the team is in handling data.
SAS
SAS is an extremely powerful data analysis tool:
• capable of dealing with very large files (file size is limited by the hard drive)
• can do lots of things
• popular among large financial corporations and other entities dealing with large datasets.
Disadvantages of SAS:
• rather steep learning curve
• expensive.
Basic advice for using SAS
1. Always save your SAS code (*.sas)
2. Use comments:
• /* Comment */ or
• * Comment;
3. Don’t give up on debugging your code. Google.com is your good friend.
Data
Data can be:
• qualitative:
  • firm location (city name)
  • day of a week
• quantitative:
  • exchange rate
  • oil price
  • etc.
Obtaining data in a usable format is the first step in any model-building process.
It is important to understand the format and content of the raw data:
• Does data need to be cleaned?
• Does data need to be coded (e.g., gender)?
• Does the data need to be reduced to a manageable form for analysis purposes?
The data sourcing, extraction, transformation, and cleansing may eat up to 70 percent of total hours made available to a business analytics project.
Big data refers to data sets of large volumes that are beyond the limits of commonly used desktop database and analytical applications.
• CERN’s Large Hadron Collider Data Centre processes on average one petabyte (one million gigabytes) of data per day.
• Facebook, Google, and Walmart generate data in petabytes every day.
Data handling
A research project has the following stages:
1. Identification of the problem =⇒ research question:
• should be important
2. Hypothesis development (not always applicable)
3. Data collection
4. Data analysis
5. Interpretation of results
6. Policy recommendation
Data handling II
The quality of the data matters: garbage in, garbage out.
The quality of the results and subsequent recommendations depends on the quality of the data used in the analysis.
One should select good sources of data.
Afterwards, one should handle and process data carefully.
Important considerations: Data frequency
Data frequency:
• Annual
• Intra-day (tick, 1 sec., etc.)
• etc.
Important considerations: Time period
Time period:
• 1 month
• 1 quarter
• 1 year
• 10 years etc.
Data frequency and time period are related:
• The higher the frequency, the shorter the time period.
Important considerations: Duplicates
Due to data recording and/or reporting issues, sometimes databases might contain duplicate (i.e., identical) observations.
They should be removed before the analysis.
Importing the data
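SAS code to import the data:
/* Create the data set work.sample from in-line data.            */
/* The $ sign marks Company and Recommendation as character      */
/* variables; Analyst is read as a numeric analyst ID.           */
data work.sample;
input Company $ Analyst Recommendation $;
datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;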
Example of duplicates
Company  Analyst  Recommendation
IBM      112232   BUY
Apple    736352   BUY
Ford     929191   SELL
HP       929277   HOLD
IBM      112232   BUY
Amazon   48483    HOLD
IBM      32125    HOLD
Important considerations: Duplicates II
SAS code to sort the data:
/* Sort work.sample in place, in ascending order of Company */
PROC SORT data=work.sample;
BY Company;
RUN;
Example of duplicates II
Company  Analyst  Recommendation
Amazon   48483    HOLD
Apple    736352   BUY
Ford     929191   SELL
HP       929277   HOLD
IBM      112232   BUY
IBM      112232   BUY
IBM      32125    HOLD
Important considerations: Duplicates II
SAS code to remove duplicates:
/* Keep only the first observation for each value of Company */
PROC SORT data=work.sample nodupkey;
BY Company;
RUN;
Option NODUPKEY deletes those observations with duplicate BY variable (in our case: Company) values.
Example of duplicates III
Company  Analyst  Recommendation
Amazon   48483    HOLD
Apple    736352   BUY
Ford     929191   SELL
HP       929277   HOLD
IBM      112232   BUY
IBM      112232   BUY
IBM      32125    HOLD
Important considerations: Duplicates II
SAS code to remove duplicates:
/* Drop observations that are identical to the previous observation */
PROC SORT data=work.sample nodup;
BY Company;
RUN;
Option NODUP deletes duplicated observations.
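A possible variation, sketched below under a few assumptions: PROC SORT overwrites the input data set unless OUT= is given, and NODUP compares each observation only with the one immediately before it, so identical rows that are not adjacent after sorting by Company alone could survive. Sorting by all variables (BY _ALL_) with NODUPKEY avoids both issues, and DUPOUT= keeps the removed rows for inspection. The output data set names work.unique and work.dups are hypothetical.

```sas
/* Re-create the sample data from the slides so the example is self-contained */
data work.sample;
    input Company $ Analyst Recommendation $;
    datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;

/* Remove fully identical observations:                          */
/*   OUT=    writes the result to a new data set (input intact)  */
/*   DUPOUT= stores the observations that were removed           */
/*   BY _ALL_ uses every variable as a sort key                  */
proc sort data=work.sample out=work.unique dupout=work.dups nodupkey;
    by _all_;
run;
```

With the sample data, work.unique should contain six observations and work.dups the single repeated row (IBM 112232 BUY), while work.sample itself is left unchanged.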
Example of duplicates III
Company  Analyst  Recommendation
Amazon   48483    HOLD
Apple    736352   BUY
Ford     929191   SELL
HP       929277   HOLD
IBM      112232   BUY
IBM      112232   BUY
IBM      32125    HOLD
Important considerations: Missing observations
Sometimes some observations have missing values.
For example:
Company  Analyst  Recommendation
Amazon   48483    HOLD
Apple    .        BUY
Ford     929191
IBM      112232
IBM      112232
IBM      32125
HP       929277
Important considerations: Missing observations II
Solutions:
• Remove observations with missing values
• Replace them with other values:
  • group average (not popular in finance research but might be an acceptable practice in other fields)
  • etc.
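Both solutions can be sketched in SAS as follows. The data set work.firms, the grouping variable Industry, the numeric variable ROA, and all values are hypothetical and used only for illustration. The second step relies on PROC SQL remerging the group mean back onto each row, which SAS reports with a note in the log.

```sas
/* Hypothetical data with missing ROA values (illustrative values only) */
data work.firms;
    input Industry $ ROA;
    datalines;
Mining 0.08
Mining .
Mining 0.12
Retail 0.05
Retail 0.03
Retail .
;
run;

/* Solution 1: remove observations with a missing value */
data work.complete;
    set work.firms;
    if missing(ROA) then delete;
run;

/* Solution 2: replace a missing value with the group (industry) average. */
/* COALESCE returns ROA when it is not missing, else the group mean.      */
proc sql;
    create table work.filled as
    select Industry,
           coalesce(ROA, mean(ROA)) as ROA
    from work.firms
    group by Industry;
quit;
```

In this example the missing Mining value would be replaced by 0.10 and the missing Retail value by 0.04, the averages of the non-missing values in each group.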
Important considerations: Outliers
Outliers are extreme observations. E.g.:
• greater than μ+3σ or smaller than μ−3σ or
• the observations below the 1st percentile and the observations above the 99th percentile.
Outliers are likely to appear due to errors in reporting or in recording.
Important considerations: Outliers II
Consider the following stock prices on 15 July 2019:
• ICLD – $0.00060
• INTC – $49.92
• MSFT – $138.90
• AAPL – $203.30
• GOOG – $1,144.90
• BRK-A – $321,093.00
ICLD and BRK-A can be considered outliers in the analysis even though there is no error in recording or reporting their stock prices.
Important considerations: Outliers III
Possible ways to deal with outliers:
• Do nothing, if the outlier is really an important piece of information.
• Winsorize the data. The values of the outliers are changed to less extreme values.
• For example, 1% and 99% winsorization (or top 1% and bottom 1%) means that the values smaller than the 1st percentile are set to the 1st percentile, and values above the 99th percentile are set to the 99th percentile.
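A minimal sketch of 1% and 99% winsorization in SAS: PROC UNIVARIATE computes the two percentiles, and a data step then caps each value at those bounds. The data set names (work.raw, work.pct, work.wins), the variable x, and the simulated input data are assumptions made only for illustration.

```sas
/* Hypothetical input: 1,000 simulated values (illustrative only) */
data work.raw;
    call streaminit(1);
    do i = 1 to 1000;
        x = rand('normal');
        output;
    end;
run;

/* Compute the 1st and 99th percentiles of x.                    */
/* PCTLPRE=p with PCTLPTS=1 99 names the results p1 and p99.     */
proc univariate data=work.raw noprint;
    var x;
    output out=work.pct pctlpts=1 99 pctlpre=p;
run;

/* Attach the percentiles to every row and cap x at the bounds */
data work.wins;
    if _n_ = 1 then set work.pct;   /* p1 and p99 are retained for all rows */
    set work.raw;
    /* values below p1 become p1, values above p99 become p99;   */
    /* missing values are left missing                            */
    if not missing(x) then x_w = min(max(x, p1), p99);
    drop p1 p99;
run;
```

The original variable x is kept next to the winsorized x_w so that the effect of the adjustment can be checked.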
Important considerations: Outliers IV
Other possible ways to deal with outliers:
• Truncate (trim) the data. The outliers are removed.
• For example, 1% and 99% truncation (or top 1% and bottom 1%) means that the values smaller than the 1st percentile and above the 99th percentile are removed.
• Change the scale of variable using natural logarithms (for positive values only).
Important considerations: Outliers V
Stock prices after logarithmic transformation:
• ICLD – $0.00060 =⇒ −7.42
• INTC – $49.92 =⇒ 3.91
• MSFT – $138.90 =⇒ 4.93
• AAPL – $203.30 =⇒ 5.31
• GOOG – $1,144.90 =⇒ 7.04
• BRK-A – $321,093.00 =⇒ 12.68
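The transformation can be reproduced with the LOG function, which returns the natural logarithm in SAS. The sketch below uses the stock prices listed above; the data set names work.prices and work.logprices and the variable name LogPrice are hypothetical.

```sas
/* Stock prices from the slide (15 July 2019) */
data work.prices;
    input Ticker $ Price;
    datalines;
ICLD 0.00060
INTC 49.92
MSFT 138.90
AAPL 203.30
GOOG 1144.90
BRK-A 321093.00
;
run;

/* Natural logarithm of the price; defined for positive values only */
data work.logprices;
    set work.prices;
    if Price > 0 then LogPrice = log(Price);
    format LogPrice 8.2;
run;

proc print data=work.logprices;
run;
```

After the transformation the prices span roughly −7.4 to 12.7 instead of 0.0006 to 321,093, so the extreme values are far less dominant.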
Important considerations: Fiscal vs calendar year
If we analyze firm-level data, we should pay attention to the beginning and end of the firm’s fiscal year.
In some cases, the firm’s fiscal year will coincide with the calendar year (January 1 – December 31).
In other cases it will not. For example, BHP’s fiscal year runs July 1 – June 30.