Introduction Types of analysis Analytics techniques Software Data Considerations
CORPFIN 2503 – Business Data Analytics:
Introduction to data analytics
£ius
Week 1: July 26th, 2021
£ius CORPFIN 2503, Week 1 1/57
Outline
Introduction
Types of analysis
Analytics techniques
Software
Data
Considerations
Introduction
The purpose of business analytics is to improve the profitability and
overall performance of a business.
Business analytics can help businesses in:
• detecting credit card fraud
• identifying potential customers
• analyzing or predicting profitability per customer
• gaining new business insights and understanding business
performance
• improving operations
• etc.
All these processes use a huge amount of data, information
technology (IT) infrastructure, and modern quantitative techniques.
Introduction II
Any application of business analytics involves:
• a considerable amount of effort in defining the problem and
the methodology to solve it
• data collection
• data cleansing
• model building
• model validation
• conducting the analysis
• interpretation of results, and
• making policy recommendations.
It is an iterative process, and the models might need to be built
several times before they are finally accepted.
Types of analysis
• Simulations
• Modeling
• Optimization
• Data mining
Simulations
Simulations are computerized mathematical techniques that try to
imitate real-world systems, processes, or scenarios.
E.g., predicting stock prices, firm defaults, or the weather.
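As a minimal sketch of a simulation, the SAS step below draws 10,000 possible one-year-ahead stock prices. It assumes a log-normal price model; the starting price, drift, volatility, seed, and the data set name work.sim_prices are all hypothetical values chosen for illustration.

```sas
/* Monte Carlo sketch: simulate one-year-ahead stock prices.            */
/* s0, mu and sigma are assumed values, not estimates from real data.   */
data work.sim_prices;
    call streaminit(2503);              /* fixed seed for reproducibility */
    s0    = 100;                        /* assumed current price          */
    mu    = 0.08;                       /* assumed annual drift           */
    sigma = 0.20;                       /* assumed annual volatility      */
    do path = 1 to 10000;
        z = rand('normal');             /* standard normal shock          */
        price_1y = s0 * exp((mu - 0.5 * sigma**2) + sigma * z);
        output;
    end;
    keep path price_1y;
run;

/* Summarize the simulated distribution of prices. */
proc means data=work.sim_prices mean std p5 p95;
    var price_1y;
run;
```

The spread between the 5th and 95th percentiles gives a rough idea of the range of outcomes implied by the assumed parameters.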
Simulations II
Example:
A group of researchers simulated the FIFA 2018 tournament
100,000 times and came to the conclusion that Spain is the most
likely winner, followed by Germany and Brazil. They used the
following factors:
• the FIFA ranking
• each country’s population and their gross domestic product
• bookmakers’ odds
• how many of the national team players play together in a club
• the players’ average age, and
• how many Champions Leagues they’ve won.
Neither Spain, Germany, nor Brazil made it to the semi-finals...
Modeling
A model is merely the mathematical logic and concepts that go
into a computer program and that, along with the associated data,
represent a real-world system.
Models can be used to analyze the effect of different components
and predict system behavior.
E.g., models that are used to forecast macroeconomic variables
(such as GDP, unemployment, or inflation).
Optimization
Optimization techniques are computer-based procedures that optimize
(i.e., minimize or maximize) a mathematical function, subject to a
given set of constraints.
E.g., maximization of the working time of a machine, while keeping
the maintenance costs below a certain level.
Data mining
Data mining is the process of looking for interesting patterns in
the data.
Types of data mining:
• characterization (characteristics of a certain group of
people/firms etc.)
• discrimination (comparison of different groups of people/firms
etc.)
• association analysis (analyzing the impact of a on b)
• predictive analysis
• clustering
• deviation analysis (finding the differences between the
expected and actual values).
Analytics techniques
In this course, we will cover:
• Visual analytics and correlations
• Linear regression analysis
• Logit and probit models
• Multinomial logit models
• Monte-Carlo simulations
• Time series forecasting
• Text analytics.
Covariance and correlation
Covariance and correlation provide a measure of the strength of the
linear relation between two variables.
If positive (negative), the two variables are positively (negatively)
related to each other.
Correlation (ρ) is standardized covariance.
ρ ∈ [−1, 1].
The closer ρ is to either −1 or 1, the stronger the correlation (linear
relationship) between the variables.
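A short sketch of how covariance and correlation might be computed in SAS. The data set work.xy is simulated here purely for illustration: y is built so that its population correlation with x is 0.8, and the sample estimate should be close to that value.

```sas
/* Simulated illustration data (hypothetical, not course data). */
data work.xy;
    call streaminit(2503);
    do i = 1 to 200;
        x = rand('normal');
        y = 0.8 * x + 0.6 * rand('normal');   /* population corr(x, y) = 0.8 */
        output;
    end;
run;

/* The COV option prints the covariance matrix in addition to correlations. */
proc corr data=work.xy cov;
    var x y;
run;
```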
Corr=0
[Figure: plot of two variables with correlation 0.]
Corr=1
[Figure: plot of two variables with correlation 1.]
Corr=−1
[Figure: plot of two variables with correlation −1.]
Linear regressions
Linear regression analysis is an econometric technique for
analyzing the impact of one or more factors (a.k.a. independent
variables) on the variable of interest (a.k.a. dependent variable).
Ordinary least squares (OLS), or linear least squares, is the most
popular type of regression.
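A minimal sketch of an OLS regression in SAS using PROC REG. The data are simulated under the assumption y = 0.5 + 2x + noise, so the estimated intercept and slope should be close to 0.5 and 2; the data set and variable names are hypothetical.

```sas
/* Simulated illustration data: y = 0.5 + 2*x + noise (assumed model). */
data work.reg_data;
    call streaminit(2503);
    do i = 1 to 200;
        x = rand('normal');
        y = 0.5 + 2 * x + rand('normal');
        output;
    end;
run;

/* OLS: dependent variable on the left, independent variable(s) on the right. */
proc reg data=work.reg_data;
    model y = x;
run;
quit;
```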
Regressions: US stocks monthly returns (2003-’07)
[Figure: return (y-axis) plotted against standard deviation (x-axis), with CRSP, S&P 500, and DJI 30 labeled. Fitted line: y = 0.1884x + 0.0012, R² = 0.181.]
Logit and probit models
Logit and probit models are similar to OLS regressions but their
dependent variable can only have two values: 0 or 1.
Problems that can be analyzed:
• determinants of firm defaults
• factors of M&A targets
• decisions of monetary authority
• corporate decisions: security issues,
underwriter/advisor/auditor switching...
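A sketch of how a logit and a probit model might be estimated with PROC LOGISTIC. The data set work.firms_sim and the variables is_default and leverage are hypothetical; the defaults are simulated under an assumed logit relationship.

```sas
/* Simulated illustration data: default probability rises with leverage. */
data work.firms_sim;
    call streaminit(2503);
    do firm = 1 to 500;
        leverage   = rand('uniform');
        p          = 1 / (1 + exp(-(-3 + 4 * leverage)));  /* assumed true model */
        is_default = rand('bernoulli', p);                 /* 1 = default, 0 = no default */
        output;
    end;
    drop p;
run;

/* Logit model: models the probability that is_default = 1. */
proc logistic data=work.firms_sim;
    model is_default(event='1') = leverage;
run;

/* Probit model: same specification with a probit link. */
proc logistic data=work.firms_sim;
    model is_default(event='1') = leverage / link=probit;
run;
```

Specifying event='1' makes the model describe the probability of default rather than of non-default.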
Multinomial logit models
Multinomial logit models are similar to logit models but their
dependent variable has more than two values (e.g., 0, 1, and 2).
Problems that can be analyzed:
• corporate decisions
• security/fund selection
• modeling optimal choice out of a few possibilities.
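One possible way to estimate a multinomial logit model in SAS is PROC LOGISTIC with a generalized logit link. The sketch below simulates a three-valued choice (0, 1, 2) from a single hypothetical explanatory variable, size; the coefficients used in the simulation are assumptions for illustration.

```sas
/* Simulated illustration data with three possible choices: 0, 1, 2. */
data work.choices;
    call streaminit(2503);
    do id = 1 to 600;
        size  = rand('normal');
        e1    = exp( 0.5 + 1.0 * size);   /* assumed odds of choice 1 vs choice 0 */
        e2    = exp(-0.5 - 1.0 * size);   /* assumed odds of choice 2 vs choice 0 */
        denom = 1 + e1 + e2;
        /* rand('table', ...) returns 1, 2 or 3; subtract 1 to get 0, 1 or 2 */
        choice = rand('table', 1 / denom, e1 / denom, e2 / denom) - 1;
        output;
    end;
    keep id size choice;
run;

/* Generalized logit: one set of coefficients per non-reference choice. */
proc logistic data=work.choices;
    model choice(ref='0') = size / link=glogit;
run;
```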
Forecasting
Time series forecasting is a simple forecasting technique that uses
data points observed at regular time intervals (e.g., days, weeks,
or months).
If some patterns can be identified in the historical data, it is
possible to project those patterns into the future as a forecast.
Forecasting: Example
Extending the straight line:
[Figure: oil price (in USD per barrel, y-axis from 60 to 74) by date (x-axis from 1/05/2018 to 22/06/2018).]
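Extending a straight line can be sketched in SAS by regressing the price on a time index and asking for predicted values. The prices below are made-up illustrative numbers, not the series plotted above; the two rows with a missing price are the future periods to forecast.

```sas
/* Made-up illustrative prices; t is a simple time index.            */
/* Rows with a missing price (.) are future periods to be forecast.  */
data work.oil;
    input t price;
    datalines;
1 66.2
2 66.9
3 67.1
4 67.8
5 68.4
6 .
7 .
;
run;

/* Fit a linear trend on the observed rows. Rows with a missing price   */
/* are excluded from the fit but still receive a predicted value.       */
proc reg data=work.oil noprint;
    model price = t;
    output out=work.oil_fc predicted=price_hat;
run;
quit;

proc print data=work.oil_fc;
run;
```

The values of price_hat for t = 6 and t = 7 are the trend-line forecasts.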
Text analytics
Text analysis helps to:
• compute word frequencies, distributions, and patterns
• analyze sentiment.
Examples from IBM’s 2017 annual report.
Text analytics: Word frequency
Word    Length  Count  Weighted percentage (%)  Similar words
comp    7       1067   1.67                     companies, company
millio  7       719    1.12                     million, millions
years   5       684    1.07                     year, years
perce   7       614    0.96                     percent
2017    4       604    0.94                     2017
incom   6       563    0.88                     income, incomes
taxes   5       527    0.82                     tax, taxed, taxes, taxing
opera   10      501    0.78                     operate, operates, operating, operation, operational, operations
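A word-frequency count like the one above can be sketched in base SAS by splitting text into words and tabulating them. The sentence below is made up for illustration (it is not taken from the IBM report), and this sketch does not group similar words together as the table above does.

```sas
/* Split a made-up sentence into one lowercase word per observation. */
data work.words;
    length text $200 word $40;
    text = "Income increased and operating income increased while taxes decreased";
    do i = 1 to countw(text, ' ');
        word = lowcase(scan(text, i, ' '));
        output;
    end;
    keep word;
run;

/* Tabulate the words, most frequent first. */
proc freq data=work.words order=freq;
    tables word / nocum;
run;
```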
Text analytics: Word cloud
Software
Popular software in data analysis:
• Excel
• SAS
• Stata
• R
• Matlab
• SPSS.
Software II
The choice of software might depend on a number of factors:
• Which analytics techniques will be used and how frequently they
will be used
• Existing organizational processes
• Budgetary constraints
• Comfort level of the company in using open source software
• The size of data to be handled
• The sophistication of graphics and presentation required in the
project
• How the current data is organized and how comfortable the
team is in handling data.
SAS
SAS is an extremely powerful data analysis tool:
• fast
• capable of handling very large files (file size is limited by the
hard drive)
• supports a wide range of data management and statistical procedures
• popular among large financial corporations and other entities
dealing with large datasets.
Disadvantages of SAS:
• rather steep learning curve
• expensive.
Basic advice for using SAS
1. Always save your SAS code (*.sas)
2. Use comments:
• /* Comment */ or
• * Comment;
3. Don’t give up on debugging your code. Google.com is your
friend.
Data
Data can be:
• qualitative:
• firm location (city name)
• day of a week
• etc.
• quantitative:
• exchange rate
• oil price
• etc.
Data II
Obtaining data in a usable format is the first step in any
model-building process.
It is important to understand the format and content of the raw
data:
• Does the data need to be "cleaned"?
• Does the data need to be coded (e.g., gender)?
• Does the data need to be reduced to a manageable form for
analysis purposes?
Data sourcing, extraction, transformation, and cleansing may
consume up to 70 percent of the total hours available to a business
analytics project.
Big Data
Big data refers to data sets of large volumes that are beyond the
limits of commonly used desktop database and analytical
applications.
Examples:
• CERN’s Large Hadron Collider Data Centre processes on
average one petabyte (one million gigabytes) of data per day.
• Facebook, Google, and Walmart generate data in petabytes
every day.
Data handling
A research project has the following stages:
1. Identification of the problem ⇒ research question:
• should be important
2. Hypothesis development (not always applicable)
3. Data collection
4. Data analysis
5. Interpretation of results
6. Policy recommendation
Data handling II
The quality of the data matters: "garbage in, garbage out."
The quality of results and subsequent recommendations depends on
the quality of data used in the analysis.
One should select good sources of data.
Afterwards, one should handle and process data carefully.
Important considerations: Data frequency
Data frequency:
• Annual
• Monthly
• Daily
• Intra-day (tick, 1 sec., etc.)
• etc.
Important considerations: Time period
Time period:
• 1 month
• 1 quarter
• 1 year
• 10 years etc.
Data frequency and time period are related:
• The higher the frequency, the shorter the time period.
Important considerations: Duplicates
Due to data recording and/or reporting issues, sometimes
databases might contain duplicate (i.e., identical) observations.
They should be removed before the analysis.
Importing the data
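SAS code to import the data:
/* Create the data set work.sample from the in-line data below. */
data work.sample;
/* "$" marks Company and Recommendation as character variables. */
input Company $ Analyst Recommendation $;
datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;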
Example of duplicates
Company Analyst Recommendation
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
Important considerations: Duplicates II
SAS code to sort the data:
PROC SORT data=work.sample;
BY Company;
RUN;
Example of duplicates II
Company Analyst Recommendation
Amazon 48483 HOLD
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
IBM 112232 BUY
IBM 32125 HOLD
Important considerations: Duplicates III
SAS code to remove duplicates:
PROC SORT data=work.sample nodupkey;
BY Company;
RUN;
Option NODUPKEY keeps only the first observation for each value of
the BY variable (in our case: Company) and deletes the rest.
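Example of duplicates III
Company Analyst Recommendation
Amazon 48483 HOLD
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
IBM 112232 BUY (removed)
IBM 32125 HOLD (removed)
Only the first IBM observation is kept. IBM 32125 HOLD is removed even
though it is not an exact duplicate, because it has the same Company value.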
Important considerations: Duplicates IV
SAS code to remove duplicates:
PROC SORT data=work.sample nodup;
BY Company;
RUN;
Option NODUP (an alias of NODUPRECS) deletes observations that are
identical to the preceding observation in every variable.
Example of duplicates IV
Company Analyst Recommendation
Amazon 48483 HOLD
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
IBM 112232 BUY (removed)
IBM 32125 HOLD
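An alternative sketch for removing only exact duplicates: sort by all variables with NODUPKEY, write the cleaned data to a new data set, and keep the removed rows for inspection. It assumes work.sample from the import step earlier; the output names work.sample_unique and work.sample_dups are hypothetical.

```sas
/* Sort by every variable so that exact duplicates share the same key.  */
/* OUT= leaves work.sample unchanged; DUPOUT= stores the removed rows.  */
proc sort data=work.sample
          out=work.sample_unique
          dupout=work.sample_dups
          nodupkey;
    by _all_;
run;
```

Writing to a new data set instead of overwriting the input makes it easier to check which observations were dropped.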
Important considerations: Missing observations
Sometimes some observations have missing values.
For example:
Company Analyst Recommendation
Amazon 48483 HOLD
Apple . BUY
Ford 929191 SELL
IBM 112232 BUY
IBM 112232 BUY
IBM 32125 .
HP 929277 HOLD
Important considerations: Missing observations II
Solutions:
• Remove observations with missing values
• Replace them with other values:
• 0
• group average (not popular in finance research but might be
an acceptable practice in other fields)
• etc.
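The first solution (removing observations with missing values) could be sketched as follows. The data set mirrors the example table with missing values shown earlier; the data set names are hypothetical.

```sas
/* Data mirroring the example table; "." marks a missing value. */
data work.sample_missing;
    input Company $ Analyst Recommendation $;
    datalines;
Amazon 48483 HOLD
Apple . BUY
Ford 929191 SELL
IBM 112232 BUY
IBM 112232 BUY
IBM 32125 .
HP 929277 HOLD
;
run;

/* Keep only the observations with no missing Analyst or Recommendation. */
data work.sample_complete;
    set work.sample_missing;
    if missing(Analyst) or missing(Recommendation) then delete;
run;
```

The MISSING function works for both numeric and character variables, so the same test covers Analyst and Recommendation.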
Important considerations: Outliers
Outliers are extreme observations. E.g.:
• observations greater than µ + 3σ or smaller than µ − 3σ, or
• observations below the 1st percentile or
above the 99th percentile.
Outliers are often caused by errors in reporting or in
recording.
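A minimal sketch of flagging observations outside µ ± 3σ. The variable x and the data set names are hypothetical, and the data are simulated for illustration.

```sas
/* Simulated illustration data. */
data work.obs;
    call streaminit(2503);
    do i = 1 to 1000;
        x = rand('normal', 10, 2);
        output;
    end;
run;

/* Step 1: compute the sample mean and standard deviation of x. */
proc means data=work.obs noprint;
    var x;
    output out=work.stats mean=mu std=sigma;
run;

/* Step 2: attach mu and sigma to every row and flag extreme values. */
data work.flagged;
    if _n_ = 1 then set work.stats(keep=mu sigma);
    set work.obs;
    outlier = (not missing(x)) and (abs(x - mu) > 3 * sigma);  /* 1 = outlier */
run;
```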
Important considerations: Outliers II
Consider the following stock prices on 15 July 2019:
• ICLD – $0.00060
• INTC – $49.92
• MSFT – $138.90
• AAPL – $203.30
• GOOG – $1,144.90
• BRK-A – $321,093.00
ICLD and BRK-A can be treated as outliers in the analysis,
even though there is no error in recording or reporting their stock prices.
Important considerations: Outliers III
Possible ways to deal with outliers:
• Do nothing, if the outlier is genuinely an important piece of
information.
• Winsorize the data. The values of the outliers are changed to
less extreme values.
• For example, 1% and 99% winsorization (or top 1% and
bottom 1%) means that the values smaller than the 1st
percentile are set to the 1st percentile, and values above the
99th percentile are set to the 99th percentile.
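The 1% and 99% winsorization described above might be implemented along these lines. The variable ret and the data set names are hypothetical, and the returns are simulated for illustration.

```sas
/* Simulated illustration data: 1,000 hypothetical returns. */
data work.returns;
    call streaminit(2503);
    do i = 1 to 1000;
        ret = rand('normal', 0.01, 0.05);
        output;
    end;
run;

/* Step 1: compute the 1st and 99th percentiles (stored as p1 and p99). */
proc univariate data=work.returns noprint;
    var ret;
    output out=work.pctl pctlpts=1 99 pctlpre=p;
run;

/* Step 2: set values below p1 to p1 and values above p99 to p99. */
data work.returns_w;
    if _n_ = 1 then set work.pctl;
    set work.returns;
    ret_w = ret;
    if not missing(ret) then ret_w = min(max(ret, p1), p99);
run;
```

Keeping the original ret next to the winsorized ret_w makes it easy to compare results with and without the adjustment.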
Important considerations: Outliers IV
Other possible ways to deal with outliers:
• Truncate (trim) the data. The outliers are removed.
• For example, 1% and 99% truncation (or top 1% and bottom
1%) means that the values smaller than the 1st percentile and
above the 99th percentile are removed.
• Change the scale of variable using natural logarithms (for
positive values only).
Important considerations: Outliers V
Stock prices after logarithmic transformation:
• ICLD – $0.00060 ⇒ −7.42
• INTC – $49.92 ⇒ 3.91
• MSFT – $138.90 ⇒ 4.93
• AAPL – $203.30 ⇒ 5.31
• GOOG – $1,144.90 ⇒ 7.04
• BRK-A – $321,093.00 ⇒ 12.68
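The transformation above can be reproduced with the LOG function, which returns the natural logarithm in SAS. The data set name work.stock_prices and the variable names are hypothetical; the prices are those listed above.

```sas
/* Stock prices from the slide; log_price is the natural logarithm. */
data work.stock_prices;
    input ticker $ price;
    if price > 0 then log_price = log(price);   /* defined for positive values only */
    datalines;
ICLD 0.00060
INTC 49.92
MSFT 138.90
AAPL 203.30
GOOG 1144.90
BRK-A 321093.00
;
run;

proc print data=work.stock_prices;
    format log_price 8.2;
run;
```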
Important considerations: Fiscal vs calendar year
If we analyze firm-level data, we should pay attention to the
beginning and end of the firm’s fiscal year.
In some cases, the firm’s fiscal year will coincide with the calendar year
(January 1 to December 31).
In other cases, it will not. For example, BHP’s fiscal year runs from July 1 to
June 30.
Variable coding
Some variables need to be coded.
For example, if a quantitative analysis needs to assess the
impact of gender, one should code the variable gender:
• 1 if gender is female and
• 0 if gender is male.
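A sketch of this coding in a SAS data step. The data set and the names in it are made up for illustration; the comparison returns 1 when true and 0 when false.

```sas
/* Made-up illustration data with gender stored as text. */
data work.people;
    input name $ gender $;
    datalines;
Ann female
Bob male
Cat female
;
run;

/* Code gender as a 0/1 variable: 1 if female, 0 if male. */
data work.people_coded;
    set work.people;
    if missing(gender) then female = .;              /* keep missing as missing */
    else female = (lowcase(gender) = 'female');
run;
```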
Variable scaling
Firm-specific variables can be scaled by assets or sales in order to
control for the size effect.
E.g., you want to analyze profitability. Instead of looking at net
profit in $, one should probably consider looking at ROA (net profit
over assets) or profit margin (net profit over sales).
Distribution
Many tests and analytical methods assume a normal distribution.
However, some variables are not normally distributed.
For example, stock prices and gross returns are commonly modeled as
log-normally distributed.
⇒ Take the natural logarithm.
Merging several databases
In many cases, one needs to merge several databases.
In most cases, one needs to merge different files by security name
(firm name, ticker code, commodity etc.) and date (e.g., fiscal
year, calendar year).
One should make sure that definitions of variables used to merge
the databases (such as firm name and fiscal year) are consistent
across databases.
Duplicate observations (if any) should be removed prior to merging.
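A minimal sketch of such a merge by ticker and fiscal year. Both data sets are made up for illustration, and all names are hypothetical. Each file is first sorted by the merge keys with duplicate keys removed, and only matched observations are kept.

```sas
/* Two made-up files sharing the keys ticker and fyear. */
data work.price_data;
    input ticker $ fyear price;
    datalines;
AAA 2019 10.5
AAA 2020 11.2
BBB 2020 4.8
;
run;

data work.account_data;
    input ticker $ fyear assets;
    datalines;
AAA 2019 200
AAA 2020 230
CCC 2020 75
;
run;

/* Sort both files by the merge keys and drop duplicate keys. */
proc sort data=work.price_data nodupkey;
    by ticker fyear;
run;

proc sort data=work.account_data nodupkey;
    by ticker fyear;
run;

/* Keep only the ticker-year pairs that appear in both files. */
data work.merged;
    merge work.price_data(in=in_p) work.account_data(in=in_a);
    by ticker fyear;
    if in_p and in_a;
run;
```

Here only the two AAA observations appear in both files, so work.merged contains two rows; dropping the subsetting IF would keep unmatched rows with missing values instead.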
Required reading
Konasani, V. R. and Kadre, S. (2015). "Practical Business
Analytics Using SAS: A Hands-on Guide": chapters 1, 2, 3, and 4.