Introduction Types of analysis Analytics techniques Software Data Considerations

CORPFIN 2503 – Business Data Analytics:
Introduction to data analytics

£ius

Week 1: July 26th, 2021

£ius CORPFIN 2503, Week 1 1/57


Outline

Introduction

Types of analysis

Analytics techniques

Software

Data

Considerations


Introduction
The purpose of business analytics is to improve the profitability and
overall performance of a business.

Business analytics can help businesses with:

• detecting credit card fraud
• identifying potential customers
• analyzing or predicting profitability per customer
• gaining new business insights and understanding business
performance
• improving operations
• etc.

All these processes rely on large amounts of data, information
technology (IT) infrastructure, and modern quantitative techniques.


Introduction II

Any application of business analytics involves:

• considerable effort in defining the problem and
the methodology to solve it

• data collection
• data cleansing
• model building
• model validation
• conducting the analysis
• interpretation of results, and
• making policy recommendations.

It is an iterative process: the models might need to be built
several times before they are finally accepted.


Types of analysis

• Simulations
• Modeling
• Optimization
• Data mining


Simulations

Simulations are computerized mathematical techniques that try to
imitate real-world systems, processes, or scenarios.

E.g., predicting stock prices, firm defaults, or the weather.
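A minimal SAS sketch of a simulation: the DATA step below generates Monte Carlo paths of a stock price, assuming the price follows geometric Brownian motion. The starting price, drift, volatility, horizon, number of paths, seed, and the dataset name work.sim are hypothetical choices for illustration only.

```sas
/* Hypothetical Monte Carlo simulation of a stock price under        */
/* geometric Brownian motion; every parameter below is an assumption */
%let s0     = 100;   /* starting price                  */
%let mu     = 0.08;  /* annual drift                    */
%let sigma  = 0.20;  /* annual volatility               */
%let ndays  = 252;   /* trading days per simulated year */
%let npaths = 1000;  /* number of simulated paths       */

data work.sim(keep=path price);
  call streaminit(2503);                 /* fix the seed for reproducibility */
  dt = 1 / &ndays;
  do path = 1 to &npaths;
    price = &s0;
    do day = 1 to &ndays;
      z = rand('normal');                /* standard normal shock */
      price = price * exp((&mu - 0.5 * &sigma**2) * dt + &sigma * sqrt(dt) * z);
    end;
    output;                              /* keep the end-of-year price of each path */
  end;
run;

/* Summarize the simulated distribution of end-of-year prices */
proc means data=work.sim n mean std p5 p95;
  var price;
run;
```

Under these assumptions the mean simulated price should be close to 100 × e^0.08 ≈ 108, and the 5th and 95th percentiles give a rough range of outcomes.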


Simulations II

Example:

A group of researchers simulated the 2018 FIFA World Cup
100,000 times and concluded that Spain was the most
likely winner, followed by Germany and Brazil. They used the
following factors:

• the FIFA ranking
• each country's population and gross domestic product
• bookmakers' odds
• how many of the national team players play together in a club
• the players' average age, and
• how many Champions League titles the players have won.

None of Spain, Germany, or Brazil made it to the semi-finals...


Modeling

A model is the mathematical logic and concepts that go into a
computer program and, together with the associated data,
represent a real-world system.

Models can be used to analyze the effect of different components
and to predict system behavior.

E.g., models used to forecast macroeconomic variables
(such as GDP, unemployment, or inflation).


Optimization

Optimization uses computer algorithms to optimize (i.e.,
minimize or maximize) a mathematical function, subject to a
given set of constraints.

E.g., maximizing the working time of a machine while keeping
the maintenance costs below a certain level.
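The machine example can be written as a constrained maximization. The symbols are introduced here for illustration and are not from the slides: T is the machine's working time, C(T) its maintenance cost as a function of working time, and \bar{C} the cost ceiling.

```latex
\max_{T \ge 0} \; T
\quad \text{subject to} \quad
C(T) \le \bar{C}
```

For instance, with a hypothetical cost function C(T) = 100 + 5T and a ceiling \bar{C} = 600, the constraint gives 5T ≤ 500, so the optimal working time is T* = 100.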


Data mining

Data mining is the process of looking for interesting patterns in
the data.

Types of data mining:

• characterization (characteristics of a certain group of
people, firms, etc.)
• discrimination (comparison of different groups of people,
firms, etc.)
• association analysis (analyzing how a is associated with b)
• predictive analysis
• clustering
• deviation analysis (finding the differences between the
expected and actual values).


Analytics techniques

In this course, we will cover:

• Visual analytics and correlations
• Linear regression analysis
• Logit and probit models
• Multinomial logit models
• Monte-Carlo simulations
• Time series forecasting
• Text analytics.


Covariance and correlation

Covariance and correlation measure the linear relation between
two variables.

If they are positive (negative), the two variables are positively
(negatively) related.

Correlation (ρ) is standardized covariance.
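For reference, the standard definitions are given below; μ_X and σ_X (the mean and standard deviation of X), x̄ (the sample mean), and ρ̂ (the sample estimate of ρ) are notation introduced here.

```latex
\operatorname{Cov}(X, Y) = E\big[(X - \mu_X)(Y - \mu_Y)\big],
\qquad
\rho = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \, \sigma_Y},
\qquad
\hat{\rho} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Dividing the covariance by σ_X σ_Y removes the units of X and Y, which is why ρ always lies between −1 and 1.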

ρ ∈ [−1, 1].

The closer ρ is to either −1 or 1, the stronger the correlation (linear
relationship) between the variables.
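In SAS, correlations can be computed with PROC CORR. A minimal sketch, assuming a small hypothetical dataset work.returns with two made-up monthly return series, ret_a and ret_b:

```sas
/* Hypothetical monthly returns of two assets (made-up numbers) */
data work.returns;
  input ret_a ret_b;
  datalines;
0.010 0.012
-0.020 -0.015
0.030 0.025
0.005 0.001
-0.010 -0.004
0.020 0.018
;
run;

/* PEARSON prints correlations, COV adds the covariance matrix,
   and OUTP= saves both to a dataset */
proc corr data=work.returns pearson cov outp=work.corr_out;
  var ret_a ret_b;
run;
```

For these made-up numbers the correlation is about 0.98, i.e., a strong positive linear relation.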


Corr = 0

[Figure: two variables with a correlation of 0]


Corr = 1

[Figure: two variables with a correlation of 1]


Corr = −1

[Figure: two variables with a correlation of −1]


Linear regressions

Linear regression analysis is an econometric technique for
analyzing the impact of one or more factors (a.k.a. independent
variables) on the variable of interest (a.k.a. dependent variable).

Ordinary least squares (OLS), or linear least squares, is the most
popular type of regression.
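A minimal OLS sketch in SAS using PROC REG. The dataset work.ads and its variables advertising and sales are hypothetical, with made-up numbers.

```sas
/* Hypothetical data: advertising spend (x) and sales (y) */
data work.ads;
  input advertising sales;
  datalines;
1 12
2 15
3 21
4 24
5 28
6 33
;
run;

/* OLS regression of sales on advertising;
   OUTEST= saves the estimated coefficients */
proc reg data=work.ads outest=work.est;
  model sales = advertising;
run;
quit;
```

For these numbers the estimated slope is 4.2 (and the intercept about 7.47): each additional unit of advertising is associated with 4.2 more units of sales.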


Regressions: US stocks monthly returns (2003-’07)

[Figure: monthly return (Return) plotted against its standard deviation (St. dev.) for CRSP, S&P 500, and DJI 30 stocks, with the fitted line y = 0.1884x + 0.0012 and R² = 0.181]


Logit and probit models

Logit and probit models are similar to OLS regressions, but their
dependent variable can take only two values: 0 or 1.
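Formally, both models map a linear index into a probability between 0 and 1; they differ only in the mapping. The notation is introduced here: x_i are the explanatory variables of observation i, β the coefficients, and Φ the standard normal cumulative distribution function.

```latex
\text{Logit:}\quad \Pr(y_i = 1 \mid x_i) = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}},
\qquad
\text{Probit:}\quad \Pr(y_i = 1 \mid x_i) = \Phi(x_i'\beta)
```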

Problems that can be analyzed include:

• determinants of firm defaults
• factors that make firms M&A targets
• decisions of the monetary authority
• corporate decisions: security issues,
underwriter/advisor/auditor switching...
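A minimal sketch of a default model in SAS using PROC LOGISTIC, assuming a hypothetical dataset work.firms with a 0/1 variable default_flag and a leverage ratio (all numbers are made up):

```sas
/* Hypothetical firm-level data: default_flag = 1 if the firm
   defaulted, 0 otherwise; leverage = debt / assets (made-up numbers) */
data work.firms;
  input default_flag leverage;
  datalines;
0 0.10
0 0.20
1 0.25
0 0.30
0 0.35
1 0.40
0 0.45
1 0.55
1 0.60
1 0.70
;
run;

/* Logit model: EVENT='1' makes SAS model Pr(default_flag = 1);
   otherwise it would model the lower-ordered value, 0 */
proc logistic data=work.firms outest=work.logit_est;
  model default_flag(event='1') = leverage;
run;

/* Probit version of the same model */
proc logistic data=work.firms;
  model default_flag(event='1') = leverage / link=probit;
run;
```

A positive coefficient on leverage means that more leveraged firms are estimated to be more likely to default.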


Multinomial logit models

Multinomial logit models are similar to logit models, but their
dependent variable can take more than two values (e.g., 0, 1, and 2).

Problems that can be analyzed include:

• corporate decisions
• security/fund selection
• modeling the optimal choice among a few alternatives.
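In SAS, a multinomial logit can be fitted with PROC LOGISTIC and LINK=GLOGIT. A minimal sketch, assuming a hypothetical financing-choice dataset work.financing (0 = no issue, 1 = debt, 2 = equity) with a made-up size variable:

```sas
/* Hypothetical data: financing choice (0 = no issue, 1 = debt,
   2 = equity) and firm size (e.g., log of assets); values are made up */
data work.financing;
  input choice size;
  datalines;
0 1.2
0 1.5
1 1.6
2 1.8
1 2.0
0 2.2
2 2.4
1 2.8
2 3.0
0 3.1
1 3.3
2 3.5
1 3.9
2 4.1
;
run;

/* LINK=GLOGIT fits a multinomial (generalized) logit;
   REF='0' makes "no issue" the base category */
ods output ParameterEstimates=work.mlogit_est;
proc logistic data=work.financing;
  model choice(ref='0') = size / link=glogit;
run;
```

With three categories, the model estimates one intercept and one size coefficient for each non-base category (debt and equity), each relative to "no issue".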


Forecasting

Time series forecasting is a simple forecasting technique that
uses data points observed at regular time intervals, such as
days, weeks, or months.

If some patterns can be identified in the historical data, it is
possible to project those patterns into the future as a forecast.
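A simple way to "extend the straight line" in SAS is to fit a linear trend on a time index and let PROC REG predict the future periods. A minimal sketch, assuming a hypothetical price series work.series (made-up values) in which the periods to forecast have a missing price:

```sas
/* Hypothetical price series indexed by period t; price is missing
   for the three future periods we want to forecast */
data work.series;
  input t price;
  datalines;
1 64.2
2 65.0
3 65.1
4 66.3
5 67.0
6 67.4
7 .
8 .
9 .
;
run;

/* Observations with a missing dependent variable are excluded from
   estimation but still receive predicted values in the OUTPUT data set */
proc reg data=work.series noprint;
  model price = t;
  output out=work.forecast p=price_hat;
run;
quit;
```

For these numbers the fitted trend rises by about 0.66 per period, so the forecast for t = 7 is roughly 68.2. A straight-line trend is only sensible when the historical pattern is close to linear.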


Forecasting: Example

Extending the straight line:

[Figure: Oil price (in USD pb) from 1/05/2018 to 22/06/2018]


Text analytics

Text analytics helps to:

• compute word frequencies, distributions, and patterns
• analyze sentiment.

Examples from IBM's 2017 annual report.
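A basic word-frequency count can be sketched in SAS by splitting text into words and counting them with PROC FREQ. The two text lines and the dataset names below are hypothetical, made up for illustration, and are not taken from IBM's report.

```sas
/* Hypothetical text lines (made up) */
data work.text;
  infile datalines truncover;
  length line $200;
  input line $char200.;
  datalines;
Operating income rose, and operating margins improved.
Income taxes increased.
;
run;

/* Split each line into lower-case words at blanks and punctuation */
data work.words(keep=word);
  set work.text;
  length word $50;
  do i = 1 to countw(line, ' ,.;:()');
    word = lowcase(scan(line, i, ' ,.;:()'));
    output;
  end;
run;

/* Count each word, most frequent first */
proc freq data=work.words order=freq;
  tables word / out=work.word_freq nocum;
run;
```

Here "operating" and "income" each appear twice. This sketch does no stemming, so, unlike the "Similar Words" grouping in the table, "tax" and "taxes" would be counted as different words.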


Text analytics: Word frequency

Word        Length  Count  Weighted Percentage (%)  Similar Words
company          7   1067                     1.67  companies, company
million          7    719                     1.12  million, millions
years            5    684                     1.07  year, years
percent          7    614                     0.96  percent
2017             4    604                     0.94  2017
income           6    563                     0.88  income, incomes
taxes            5    527                     0.82  tax, taxed, taxes, taxing
operations      10    501                     0.78  operate, operates, operating,
                                                    operation, operational, operations


Text analytics: Word cloud


Software

Popular software in data analysis:

• Excel
• SAS
• Stata
• R
• Matlab
• SPSS.


Software II

The choice of software might depend on a number of factors:

• Which analytics techniques are to be used and how frequently
they will be used
• Existing organizational processes
• Budgetary constraints
• The company's comfort level with open-source software
• The size of the data to be handled
• The sophistication of graphics and presentation required in the
project
• How the current data is organized and how comfortable the
team is in handling data.


SAS

SAS is an extremely powerful data analysis tool:

• fast
• capable of dealing with very large files (file size is limited only
by the hard drive)
• versatile: it covers a wide range of analytical tasks
• popular among large financial corporations and other entities
dealing with large datasets.

Disadvantages of SAS:

• rather steep learning curve
• expensive.


Basic advice for using SAS

1. Always save your SAS code (*.sas)

2. Use comments:
• /* Comment */ or
• * Comment;

3. Don't give up on debugging your code. Google.com is your
friend.


Data

Data can be:

• qualitative:
• firm location (city name)
• day of the week
• etc.

• quantitative:
• exchange rate
• oil price
• etc.


Data II

Obtaining data in a usable format is the first step in any
model-building process.

It is important to understand the format and content of the raw
data:

• Does the data need to be "cleaned"?
• Does the data need to be coded (e.g., gender)?
• Does the data need to be reduced to a manageable form for
analysis purposes?

Data sourcing, extraction, transformation, and cleansing may
take up to 70 percent of the total hours available for a business
analytics project.


Big Data

Big data refers to data sets whose volume is beyond the
limits of commonly used desktop database and analytical
applications.

Examples:

• CERN's Large Hadron Collider Data Centre processes on
average one petabyte (one million gigabytes) of data per day.
• Facebook, Google, and Walmart generate petabytes of data
every day.


Data handling

A research project has the following stages:

1. Identification of the problem =⇒ research question:
• should be important

2. Hypothesis development (not always applicable)

3. Data collection

4. Data analysis

5. Interpretation of results

6. Policy recommendation


Data handling II

The quality of the data matters: "garbage in, garbage out."

The quality of the results and subsequent recommendations depends on
the quality of the data used in the analysis.

One should select good sources of data.

Afterwards, one should handle and process data carefully.


Important considerations: Data frequency

Data frequency:

• Annual
• Monthly
• Daily
• Intra-day (tick, 1 sec., etc.)
• etc.


Important considerations: Time period

Time period:

• 1 month
• 1 quarter
• 1 year
• 10 years etc.

Data frequency and time period are related:

• Typically, the higher the frequency, the shorter the available time period.


Important considerations: Duplicates

Due to data recording and/or reporting issues, sometimes
databases might contain duplicate (i.e., identical) observations.

They should be removed before the analysis.


Importing the data

SAS code to import the data:

/* Read the sample data inline; $ marks character variables */
data work.sample;
  input Company $ Analyst Recommendation $;
  datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;


Example of duplicates

Company Analyst Recommendation

IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD


Important considerations: Duplicates II

SAS code to sort the data:

PROC SORT data=work.sample;
  BY Company;
RUN;


Example of duplicates II

Company Analyst Recommendation

Amazon 48483 HOLD
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
IBM 112232 BUY
IBM 32125 HOLD


Important considerations: Duplicates III

SAS code to remove duplicates:

PROC SORT data=work.sample nodupkey;
  BY Company;
RUN;

Option NODUPKEY keeps only the first observation for each value of the BY
variable (in our case: Company) and deletes the others. Note that here it
also deletes IBM 32125 HOLD, which is not a duplicate record.


Example of duplicates III

Company Analyst Recommendation

Amazon 48483 HOLD
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
IBM 112232 BUY (deleted)
IBM 32125 HOLD (deleted)


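Important considerations: Duplicates IV

SAS code to remove duplicates:

PROC SORT data=work.sample nodup;
  BY Company;
RUN;

Option NODUP deletes an observation if all of its variable values are
identical to those of the preceding observation, so only adjacent
duplicates are removed.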


Example of duplicates IV

Company Analyst Recommendation

Amazon 48483 HOLD
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
IBM 112232 BUY (deleted)
IBM 32125 HOLD
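Because NODUP only compares each record with the preceding one, a more robust pattern is to sort BY _ALL_ with NODUPKEY, so a record is dropped only if it matches another record on every variable. In the sketch below, the output dataset names work.sample_clean and work.sample_dups are hypothetical; the DUPOUT= option writes the deleted records to a separate dataset so they can be inspected.

```sas
/* Re-create the sample data used above */
data work.sample;
  input Company $ Analyst Recommendation $;
  datalines;
IBM 112232 BUY
Apple 736352 BUY
Ford 929191 SELL
HP 929277 HOLD
IBM 112232 BUY
Amazon 48483 HOLD
IBM 32125 HOLD
;
run;

/* BY _ALL_ + NODUPKEY: drop a record only if it matches another
   record on every variable; DUPOUT= keeps the dropped rows */
proc sort data=work.sample out=work.sample_clean
          nodupkey dupout=work.sample_dups;
  by _all_;
run;
```

The cleaned dataset keeps six records (including IBM 32125 HOLD), and the single duplicate IBM 112232 BUY ends up in work.sample_dups.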


Important considerations: Missing observations

Some observations may have missing values.

For example:

Company Analyst Recommendation

Amazon 48483 HOLD
Apple . BUY
Ford 929191 SELL
IBM 112232 BUY
IBM 112232 BUY
IBM 32125 .
HP 929277 HOLD


Important considerations: Missing observations II

Solutions:

• Remove observations with missing values
• Replace them with other values:

• 0
• group average (not popular in finance research but might be
an acceptable practice in other fields)
• etc.
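The first solution, removing incomplete observations, can be sketched in SAS with the CMISS function, which counts missing values of both numeric and character variables. The dataset names work.recs and work.recs_complete are hypothetical; the records reproduce the example table, where a period marks a missing value.

```sas
/* Example records; a period marks a missing value */
data work.recs;
  input Company $ Analyst Recommendation $;
  datalines;
Amazon 48483 HOLD
Apple . BUY
Ford 929191 SELL
IBM 112232 BUY
IBM 112232 BUY
IBM 32125 .
HP 929277 HOLD
;
run;

/* Delete every observation with at least one missing value */
data work.recs_complete;
  set work.recs;
  if cmiss(Company, Analyst, Recommendation) > 0 then delete;
run;
```

The Apple and IBM 32125 records are dropped, leaving five observations.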


Important considerations: Outliers

Outliers are extreme observations, e.g.:

• observations greater than µ + 3σ or smaller than µ − 3σ, or
• observations below the 1st percentile or above the 99th
percentile.

Outliers often appear due to errors in reporting or in
recording.
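The µ ± 3σ rule can be sketched in SAS by computing the sample mean and standard deviation with PROC MEANS and flagging observations outside the band. The data are hypothetical: 1,000 simulated daily returns plus one deliberately mis-recorded value; the dataset and variable names are made up for illustration.

```sas
/* Hypothetical data: 1,000 simulated returns plus one
   deliberately mis-recorded value */
data work.rets;
  call streaminit(2503);
  do id = 1 to 1000;
    ret = rand('normal', 0, 0.01);
    output;
  end;
  id = 1001; ret = 0.50; output;   /* a likely recording error */
run;

/* Sample mean (mu) and standard deviation (sigma) */
proc means data=work.rets noprint;
  var ret;
  output out=work.stats mean=mu std=sigma;
run;

/* Flag observations outside mu +/- 3 sigma (1 = outlier) */
data work.rets_flagged;
  if _n_ = 1 then set work.stats(keep=mu sigma);
  set work.rets;
  if not missing(ret) then
    outlier = (ret > mu + 3*sigma or ret < mu - 3*sigma);
run;
```

Note that the outlier itself inflates σ, so with very small samples this rule may fail to flag anything.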


Important considerations: Outliers II

Consider the following stock prices on 15 July 2019:

• ICLD – $0.00060
• INTC – $49.92
• MSFT – $138.90
• AAPL – $203.30
• GOOG – $1,144.90
• BRK-A – $321,093.00

ICLD and BRK-A can be considered outliers in the analysis, even
though there is no error in recording or reporting their stock prices.


Important considerations: Outliers III

Possible ways to deal with outliers:

• Do nothing, if the outlier is really an important piece of
information.

• Winsorize the data. The values of the outliers are changed to
less extreme values.

• For example, 1% and 99% winsorization (or top 1% and
bottom 1%) means that values smaller than the 1st
percentile are set to the 1st percentile, and values above the
99th percentile are set to the 99th percentile.
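A minimal sketch of 1%/99% winsorization in SAS: PROC UNIVARIATE computes the cutoffs, and a DATA step caps the values. The simulated returns and the names work.rets, work.cutoffs, and ret_w are hypothetical.

```sas
/* Hypothetical data: 1,000 simulated returns */
data work.rets;
  call streaminit(2503);
  do id = 1 to 1000;
    ret = rand('normal', 0, 0.01);
    output;
  end;
run;

/* 1st and 99th percentiles, stored as variables p1 and p99 */
proc univariate data=work.rets noprint;
  var ret;
  output out=work.cutoffs pctlpts=1 99 pctlpre=p;
run;

/* Cap values below p1 at p1 and above p99 at p99.
   MIN/MAX ignore missing values, so missing returns are skipped
   explicitly to avoid turning them into p1 */
data work.rets_w;
  if _n_ = 1 then set work.cutoffs;
  set work.rets;
  if not missing(ret) then ret_w = min(max(ret, p1), p99);
run;
```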


Important considerations: Outliers IV

Other possible ways to deal with outliers:

• Truncate (trim) the data. The outliers are removed.
• For example, 1% and 99% truncation (or top 1% and bottom
1%) means that values smaller than the 1st percentile and
values above the 99th percentile are removed.
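Truncation uses the same percentile cutoffs as winsorization but deletes the extreme observations instead of capping them. A minimal SAS sketch on hypothetical simulated returns (the dataset names are made up):

```sas
/* Hypothetical data: 1,000 simulated returns */
data work.rets;
  call streaminit(2503);
  do id = 1 to 1000;
    ret = rand('normal', 0, 0.01);
    output;
  end;
run;

/* 1st and 99th percentiles, stored as variables p1 and p99 */
proc univariate data=work.rets noprint;
  var ret;
  output out=work.cutoffs pctlpts=1 99 pctlpre=p;
run;

/* Drop observations outside [p1, p99]; missing values are kept
   here because SAS treats missing as smaller than any number */
data work.rets_t;
  if _n_ = 1 then set work.cutoffs;
  set work.rets;
  if not missing(ret) and (ret < p1 or ret > p99) then delete;
run;
```

With 1,000 distinct values and SAS's default percentile definition, 10 observations are removed from each tail, leaving 980.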

• Change the scale of the variable using natural logarithms (for
positive values only).


Important considerations: Outliers V

Stock prices after logarithmic transformation:

• ICLD – $0.00060 =⇒ −7.42
• INTC – $49.92 =⇒ 3.91
• MSFT – $138.90 =⇒ 4.93
• AAPL – $203.30 =⇒ 5.31
• GOOG – $1,144.90 =⇒ 7.04
• BRK-A – $321,093.00 =⇒ 12.68
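The transformation above can be reproduced with the SAS LOG function, which returns the natural logarithm (and a missing value for zero or negative arguments). The dataset name work.prices is hypothetical; the prices are those listed on the slide.

```sas
/* Stock prices from the slide; LOG() is the natural logarithm */
data work.prices;
  input ticker $ price;
  log_price = log(price);
  datalines;
ICLD 0.00060
INTC 49.92
MSFT 138.90
AAPL 203.30
GOOG 1144.90
BRK-A 321093.00
;
run;

proc print data=work.prices noobs;
  format log_price 8.2;
run;
```

On the log scale the prices span roughly −7.4 to 12.7 instead of 0.0006 to 321,093, which greatly reduces the influence of ICLD and BRK-A.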


Important considerations: Fiscal vs calendar year

If we analyze firm-level data, we should pay attention to the
beginning and end of the firm's fiscal year.

In some cases, a firm's fiscal year coincides with the calendar year
(January 1 – December 31).

In other cases, it does not. For example, BHP's fiscal year runs from
July 1 to June 30.
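When a firm's fiscal year ends on June 30, each date has to be mapped to the right fiscal year before firm-level data are aggregated or merged. A minimal SAS sketch, assuming the fiscal year is named after the calendar year in which it ends (so July 1, 2020 to June 30, 2021 is FY2021); the dataset and variable names are hypothetical.

```sas
/* Assumed convention: the fiscal year is named after the calendar
   year in which it ends (1 July 2020 - 30 June 2021 = FY2021) */
data work.bhp_dates;
  input obs_date :ddmmyy10.;
  fyear = year(obs_date) + (month(obs_date) >= 7);  /* July-Dec belong to next FY */
  format obs_date date9.;
  datalines;
15/03/2021
30/06/2021
01/07/2021
;
run;
```

Here the first two dates fall in FY2021 and the third in FY2022. Databases may use different naming conventions, so check each source's definition before merging.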


Variable coding

Some variables need to be coded.

For example, if a quantitative analysis needs to capture the
impact of gender, one should code the variable gender as:

• 1 if gender is female and
• 0 if gender is male.
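A minimal SAS sketch of this coding, assuming a hypothetical dataset in which gender is stored as the character codes 'F' and 'M'; the new numeric variable female is also a made-up name.

```sas
/* Hypothetical data with gender stored as 'F' or 'M' */
data work.people;
  input id gender $;
  /* Dummy variable: 1 if female, 0 if male, missing otherwise */
  if gender = 'F' then female = 1;
  else if gender = 'M' then female = 0;
  datalines;
1 F
2 M
3 F
;
run;
```

Leaving unexpected codes as missing, rather than silently coding them as 0, makes data errors visible.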


Variable scaling

Firm-specific variables can be scaled by assets or sales in order to
control for the size effect.

E.g., to analyze profitability, instead of looking at net profit in $,
one should consider ROA (net profit over assets) or profit margin
(net profit over sales).
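A small worked example in SAS with two hypothetical firms (made-up figures, in the same currency units) shows why scaling matters:

```sas
/* Hypothetical firms; figures are made up */
data work.firms;
  input firm $ net_profit assets sales;
  roa    = net_profit / assets;   /* return on assets */
  margin = net_profit / sales;    /* profit margin    */
  datalines;
A 10 200 100
B 50 2000 800
;
run;
```

Firm B earns five times more profit in dollars, but its ROA (2.5%) is half of firm A's (5%), and its profit margin (6.25%) is also lower than A's (10%).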


Distribution

Many tests and analytical methods assume a normal distribution.

However, some variables are not normally distributed.

For example, gross stock returns (1 + r) are, in general, modeled as
log-normally distributed.

=⇒ Take the natural logarithm, ln(1 + r).


Merging several databases

In many cases, one needs to merge several databases.

In most cases, one needs to merge different files by security name
(firm name, ticker code, commodity, etc.) and date (e.g., fiscal
year, calendar year).

One should make sure that the definitions of the variables used to merge
the databases (such as firm name and fiscal year) are consistent
across databases.

Duplicate observations (if any) should be removed prior to merging.
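A minimal SAS sketch of a firm-year merge, assuming two hypothetical datasets keyed by firm and fiscal year (firm names and values are made up). PROC SORT with NODUPKEY both sorts the files, as MERGE with BY requires, and removes repeated firm-years before merging.

```sas
/* Hypothetical firm-year datasets; names and values are made up */
data work.fundamentals;
  input firm $ fyear assets;
  datalines;
FIRM1 2020 100
FIRM1 2021 110
FIRM2 2021 900
;
run;

data work.ratings;
  input firm $ fyear rating $;
  datalines;
FIRM1 2021 A
FIRM2 2021 AA
FIRM2 2021 AA
;
run;

/* Sort by the merge keys and drop repeated firm-years */
proc sort data=work.fundamentals nodupkey;
  by firm fyear;
run;

proc sort data=work.ratings nodupkey;
  by firm fyear;
run;

/* Keep only firm-years that appear in both files */
data work.merged;
  merge work.fundamentals(in=in_f) work.ratings(in=in_r);
  by firm fyear;
  if in_f and in_r;
run;
```

Without the NODUPKEY step, the repeated FIRM2 2021 rating would produce two rows for that firm-year in the merged file.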


Required reading

Konasani, V. R. and Kadre, S. (2015). "Practical Business
Analytics Using SAS: A Hands-on Guide": chapters 1, 2, 3, and 4.
