Assignment 5¶
For this assignment, you will do an exploratory data analysis for stock prices. The focus is on improving visualization skills.
NOTE: This assignment is very challenging because it will introduce a lot of new concepts and have you looking things up on your own (I’ll point you in the right direction) to try to solve the tasks issued.
We’ll compare bank stocks and see how they changed over time.
Get the Data¶
In this section, you will learn how to use pandas to directly read data from Yahoo finance using pandas!
First, you need to start with the proper imports, which are already laid out for you here.
*Note: You’ll need to install pandas-datareader for this to work! Pandas datareader allows you to read stock information directly from the internet Use these links for install guidance (conda install pandas-datareader OR pip install pandas-datareader)
The Imports¶
Already filled out for you.
In [34]:
from pandas_datareader import data, wb
import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Data¶
You need to get data using pandas datareader. You will get stock information for the following banks:
• Bank of America
• CitiGroup
• Goldman Sachs
• JPMorgan Chase
• Morgan Stanley
• Wells Fargo
Figure out how to get the stock data from Jan 1st 2006 to current date for these banks. Set each bank to be a separate dataframe, with the variable name for that bank being its ticker symbol. This will involve a few steps:
1. Use datetime to set start and end datetime objects.
2. Figure out the ticker symbol for each bank.
3. Figure out how to use datareader to grab info on the stock.
Use this documentation page for hints and instructions (it should just be a matter of replacing certain values. Use yahoo finance as a source, for example:
# Bank of America
BAC = data.DataReader(“BAC”, ‘yahoo’, start, end)
In [36]:
start =
end =
In [3]:
# Bank of America
BAC =
# CitiGroup
C =
# Goldman Sachs
GS =
# JPMorgan Chase
JPM =
# Morgan Stanley
MS =
# Wells Fargo
WFC =
In [4]:
WFC
Out[4]:
High
Low
Open
Close
Volume
Adj Close
Date
2006-01-03
31.975000
31.195000
31.600000
31.900000
11016400.0
21.045347
2006-01-04
31.820000
31.365000
31.799999
31.530001
10870000.0
20.801256
2006-01-05
31.555000
31.309999
31.500000
31.495001
10158000.0
20.778162
2006-01-06
31.775000
31.385000
31.580000
31.680000
8403800.0
20.900213
2006-01-09
31.825001
31.555000
31.674999
31.674999
5619600.0
20.896914
…
…
…
…
…
…
…
2019-11-11
54.139999
53.730000
53.790001
54.049999
10659400.0
54.049999
2019-11-12
54.240002
53.619999
53.750000
54.220001
15187600.0
54.220001
2019-11-13
53.900002
53.209999
53.810001
53.290001
16846800.0
53.290001
2019-11-14
53.560001
53.099998
53.220001
53.490002
16491400.0
53.490002
2019-11-15
53.939999
53.430000
53.779999
53.799999
15225300.0
53.799999
3493 rows × 6 columns
Create a list of the ticker symbols (as strings) in alphabetical order. Call this list: tickers. (this is done for you)
In [6]:
tickers = [‘BAC’, ‘C’, ‘GS’, ‘JPM’, ‘MS’, ‘WFC’]
Use pd.concat to concatenate the bank dataframes together to a single data frame called bank_stocks. Set the keys argument equal to the tickers list. Also pay attention to what axis you concatenate on. Google “pandas concat”
In [7]:
Set the column name levels (this is filled out for you):
In [8]:
bank_stocks.columns.names = [‘Bank Ticker’,’Stock Info’]
Check the head of the bank_stocks dataframe.
In [8]:
Out[8]:
Bank Ticker
BAC
C
…
MS
WFC
Stock Info
Open
High
Low
Close
Volume
Open
High
Low
Close
Volume
…
Open
High
Low
Close
Volume
Open
High
Low
Close
Volume
Date
2006-01-03
46.92
47.18
46.15
47.08
16296700
490.0
493.8
481.1
492.9
1537660
…
57.17
58.49
56.74
58.31
5377000
31.60
31.98
31.20
31.90
11016400
2006-01-04
47.00
47.24
46.45
46.58
17757900
488.6
491.0
483.5
483.8
1871020
…
58.70
59.28
58.35
58.35
7977800
31.80
31.82
31.36
31.53
10871000
2006-01-05
46.58
46.83
46.32
46.64
14970900
484.4
487.8
484.0
486.2
1143160
…
58.55
58.59
58.02
58.51
5778000
31.50
31.56
31.31
31.50
10158000
2006-01-06
46.80
46.91
46.35
46.57
12599800
488.8
489.0
482.0
486.2
1370250
…
58.77
58.85
58.05
58.57
6889800
31.58
31.78
31.38
31.68
8403800
2006-01-09
46.72
46.97
46.36
46.60
15620000
486.0
487.4
483.0
483.9
1680740
…
58.63
59.29
58.62
59.19
4144500
31.68
31.82
31.56
31.68
5619600
5 rows × 30 columns
EDA¶
Let’s explore the data a bit! Before continuing, I encourage you to check out the documentation on Multi-Level Indexing and Using .xs. Reference the solutions if you can not figure out how to use .xs(), since that will be a major part of this project.
What is the max Close price for each bank’s stock throughout the time period?
In [9]:
Out[9]:
Bank Ticker
BAC 54.900002
C 564.099976
GS 273.380005
JPM 130.380005
MS 89.300003
WFC 65.930000
dtype: float64
Create a new empty DataFrame called returns. This dataframe will contain the returns for each bank’s stock. returns are typically defined by:
$$r_t = \frac{p_t – p_{t-1}}{p_{t-1}} = \frac{p_t}{p_{t-1}} – 1$$
In [10]:
We can use pandas pct_change() method on the Close column to create a column representing this return value. Create a for loop that goes and for each Bank Stock Ticker creates this returns column and set’s it as a column in the returns DataFrame.
In [11]:
Out[11]:
BAC Return
C Return
GS Return
JPM Return
MS Return
WFC Return
Date
2006-01-03
NaN
NaN
NaN
NaN
NaN
NaN
2006-01-04
-0.010620
-0.018462
-0.013812
-0.014183
0.000686
-0.011599
2006-01-05
0.001288
0.004961
-0.000393
0.003029
0.002742
-0.001110
2006-01-06
-0.001501
0.000000
0.014169
0.007046
0.001025
0.005874
2006-01-09
0.000644
-0.004731
0.012030
0.016242
0.010586
-0.000158
Create a pairplot using seaborn of the returns dataframe. Hint: [1:]
In [12]:
Out[12]:

Using this returns DataFrame, figure out on what dates each bank stock had the best and worst single day returns. You should notice that 4 of the banks share the same day for the worst drop, did anything significant happen that day? Hint: Google “idxmin and idxman
In [14]:
# Worst Drop (4 of them on Inauguration day)
Out[14]:
BAC Return 2009-01-20
C Return 2009-02-27
GS Return 2009-01-20
JPM Return 2009-01-20
MS Return 2008-10-09
WFC Return 2009-01-20
dtype: datetime64[ns]
In [15]:
# Best Single Day Gain
Out[15]:
BAC Return 2009-04-09
C Return 2008-11-24
GS Return 2008-11-24
JPM Return 2009-01-21
MS Return 2008-10-13
WFC Return 2008-07-16
dtype: datetime64[ns]
Take a look at the standard deviation of the returns, which stock would you classify as the riskiest over the entire time period? Which would you classify as the riskiest for the year 2018?
In [18]:
# Citigroup is riskiest for entire period.
Out[18]:
BAC Return 0.032286
C Return 0.033885
GS Return 0.023020
JPM Return 0.024493
MS Return 0.033308
WFC Return 0.026649
dtype: float64
In [20]:
# Very similar risk profiles, but Morgan Stanley is riskiest for 2018.
C:\Users\TestUser\Miniconda3\envs\data501\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
“””Entry point for launching an IPython kernel.
Out[20]:
BAC Return 0.015616
C Return 0.015093
GS Return 0.016175
JPM Return 0.013879
MS Return 0.017187
WFC Return 0.014804
dtype: float64
Create a distplot using seaborn of the 2018 returns for Morgan Stanley
In [21]:
C:\Users\TestUser\Miniconda3\envs\data501\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
“””Entry point for launching an IPython kernel.
Out[21]:

Create a distplot using seaborn of the 2008 returns for CitiGroup
In [19]:
/Users/marci/anaconda/lib/python3.5/site-packages/statsmodels/nonparametric/kdetools.py:20: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
y = X[:m/2+1] + np.r_[0,X[m/2+1:],0]*1j
Out[19]:

More Visualization¶
A lot of this project will focus on visualizations. Feel free to use any of your preferred visualization libraries to try to recreate the described plots below, seaborn, matplotlib, plotly and cufflinks, or just pandas.
Imports¶
In [22]:
# Plotly Method Imports
import plotly
import cufflinks as cf
cf.go_offline()
Create a line plot showing Close price for each bank for the entire index of time. (Hint: Try using a for loop, or use .xs to get a cross section of the data.)
In [23]:
Out[23]:

Moving Averages¶
Let’s analyze the moving averages for these stocks in the year 2008.
Plot the rolling 30 day average against the Close Price for Bank Of America’s stock for the year 2008 Hint: Use ix,rolling and mean
In [24]:
Out[24]:

Create a heatmap of the correlation between the stocks Close Price.
In [25]:
Out[25]:

You will use cufflinks library to create some Technical Analysis plots.
Use .iplot(kind=’candle) to create a candle plot of Bank of America’s stock from Jan 1st 2018 to Jan 1st 2019.
In [29]:
C:\Users\TestUser\Miniconda3\envs\data501\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
Use .ta_plot(study=’sma’) to create a Simple Moving Averages plot of Morgan Stanley for the year 2018.
In [31]:
C:\Users\TestUser\Miniconda3\envs\data501\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated
Use .ta_plot(study=’boll’) to create a Bollinger Band Plot for Bank of America for the year 2018.
In [32]:
C:\Users\TestUser\Miniconda3\envs\data501\lib\site-packages\ipykernel_launcher.py:1: FutureWarning:
.ix is deprecated. Please use
.loc for label based indexing or
.iloc for positional indexing
See the documentation here:
http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated