Assignment 3: Replicating De Bondt and Thaler (1985) “Does the Stock Market Overreact?”¶
• Name: PUT YOUR NAME HERE
• Student ID: PUT YOUR ID HERE
• Section: PUT YOUR SECTION (1 or 2)
Grading: This is a pass or fail assignment worth 5% towards your final grade.
You must provide a clean code. Make sure the grader can follow each step of your analysis. Put comments to describe what you are doing (Using MARKDOWN or using #). Remove any lines of code that is not necessary. If you struggle on a question, that is OK, as long you show all the steps that you have taken and describe what is not working. If your code is not clean and it is hard for the grader to understand what you are doing, you will fail.
SUBMISSION of ASSIGNMENT
You must print your code output. Save the Jupyter code in PDF. To do so, click on File->Download as->PDF. Or you can do, File->Print Preview.
— Remove all the sample code. Keep your code only. Make sure you code works fully, from start to finish with no error message. In other words, if you close Jupyter, re-open your code, and press the PLAY button, does your code work fully with no error message?
Objectives:
Research in experimental psychology has suggested that, in violation of Baye’s rule, most people “overreact” to unexpected and dramatic news events. The question then arises whether such behavior matters at the market level.
• At every 3 year interval, on January, you will buy the worst performing stocks of the last 3 years and sell the best performing stocks and hold these stocks for 3 years.
• The paper is in Quercus under assignment 3.
———————————————————————————————————–¶
Sample code¶
In [1]:
# load the necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# this code is to be able to see the plot in the Jupyter notebook
%matplotlib inline
In [2]:
# load crsp (different file from last time. data starts in 1925 to 2018 for all US stocks)
crsp = pd.read_csv(‘CRSP_Monthly_1925_2018_DeBondtThaler_Sample-1.csv.gz’)
crsp[‘date’] = pd.to_datetime(crsp[‘date’])
crsp[‘date’] = crsp[‘date’]+pd.offsets.MonthEnd(0)
crsp.index= crsp[‘date’] # set the date as index
In [3]:
# notice the columns VWRETD and EWRETD, these are two other variables of market return that we will use
crsp.head(5)
Out[3]:
PERMNO
date
TICKER
RET
SHROUT
vwretd
ewretd
sprtrn
date
1986-02-28
10000
1986-02-28
OMFGA
-0.257143
3680.0
0.072500
0.060381
0.071489
1986-03-31
10000
1986-03-31
OMFGA
0.365385
3680.0
0.053885
0.047192
0.052794
1986-04-30
10000
1986-04-30
OMFGA
-0.098592
3793.0
-0.007903
0.016140
-0.014148
1986-05-31
10000
1986-05-31
OMFGA
-0.222656
3793.0
0.050844
0.036194
0.050229
1986-06-30
10000
1986-06-30
OMFGA
-0.005025
3793.0
0.014246
0.008087
0.014110
In [4]:
# load the fama-french data
ff = pd.read_csv(‘F-F_Research_Data_Factors_Monthly-1.csv’)
# we have to format the date
ff[‘date’] = pd.to_datetime(ff.date, format=’%Y%m’)
# again, let’s push the date to the end of the calendar month
ff[‘date’] = ff[‘date’]+pd.offsets.MonthEnd(0)
# The issue with the fama-french data is that the returns are in %!!! So we have to divide by 100
for i in [‘Mkt-RF’, ‘SMB’, ‘HML’, ‘RF’]:
ff[i] = ff[i]/100
# This time we have to rename Mkt-RF variable for EXMKT
ff = ff.rename(columns={‘Mkt-RF’:’EXMKT’})
ff = ff.set_index(‘date’)
In [5]:
# load the sorting variable
sort_var = pd.read_csv(‘CRSP_Sort_Variable_1925_2018_DeBondtThaler_Sample.csv.gz’)
sort_var[‘Formation_Date’] = pd.to_datetime(sort_var[‘Formation_Date’])
In [6]:
# the variables in sort_var are:
# PERMNO (firm_id),
# Total_Return_Decile (rank, 0=worst performing stock, 9=best performing stock in the last 3 years)
# Total_3YR_RET, this is the total 3 year return for which we have sorted on (creating 10 decile groups)
# Formation_Date, the date that we will begin forming porfolio based on Total_return_Decile
sort_var.head(8)
Out[6]:
PERMNO
Total_Return_Decile
Total_3YR_RET
Formation_Date
0
10006
4
-0.077260
1930-01-01
1
10014
9
5.095246
1930-01-01
2
10022
6
0.350485
1930-01-01
3
10030
7
0.601047
1930-01-01
4
10057
9
2.514131
1930-01-01
5
10073
6
0.243249
1930-01-01
6
10102
9
1.991851
1930-01-01
7
10137
9
2.827394
1930-01-01
In [7]:
# create a list of dates on which we will form a portfolio – once every 3 years!
dates = pd.date_range(start=’1933-01-01′, end=’2015-12-31′, freq=’12MS’)
dates = dates[::3]
# OR
# dates = pd.date_range(start=’1933-01-01′, end=’2015-12-31′, freq=’36MS’)
In [8]:
dates # these are the dates when we form portfolios
Out[8]:
DatetimeIndex([‘1933-01-01’, ‘1936-01-01’, ‘1939-01-01’, ‘1942-01-01’,
‘1945-01-01’, ‘1948-01-01’, ‘1951-01-01’, ‘1954-01-01’,
‘1957-01-01’, ‘1960-01-01’, ‘1963-01-01’, ‘1966-01-01’,
‘1969-01-01’, ‘1972-01-01’, ‘1975-01-01’, ‘1978-01-01’,
‘1981-01-01’, ‘1984-01-01’, ‘1987-01-01’, ‘1990-01-01’,
‘1993-01-01’, ‘1996-01-01’, ‘1999-01-01’, ‘2002-01-01’,
‘2005-01-01’, ‘2008-01-01’, ‘2011-01-01’, ‘2014-01-01′],
dtype=’datetime64[ns]’, freq=’36MS’)
In [9]:
# that’s the function that creates an equal-weighted portfolio for the set of chosen stocks
def getPortfolio(crsp, sort_var, rank, yr): # the input are the CRSP and sort_var dataframe, the ranking, and year
# select the stocks we want, based on year, and decile ranking
stock = sort_var[(sort_var[‘Formation_Date’].dt.year == yr) &
(sort_var[‘Total_Return_Decile’] == rank)][‘PERMNO’].values
# from crsp keep the three years of data that we need to create our 3 year portfolio
port = crsp[str(yr)+’-01-01′:str(yr+2)+’-12-31′].copy()
# from the 3 year range of returns, select the stocks that we have chosen (based on their rank)
port = port[port[‘PERMNO’].isin(stock)]
# pivot the table to get returns as values and each columns is a company (PERMNO)
port = port.pivot(columns=’PERMNO’, values=’RET’, index=’date’)
# fill the missing values as 0 — no return
port = port.fillna(0)
# calculate the gross returns
port = port+1
# invest 1 dollar in each stocks
port.loc[pd.to_datetime(str(yr)+’-01-01′)] = 1
# sort the date index
port = port.sort_index()
# get the cumulative return of each stock
port = port.cumprod()
# total portfolio value
port[‘Port_Value’] = port.sum(axis=1)
# get the return of the portfolio
if rank == 9:
name = ‘Winner’
else:
name = ‘Loser’
port[name+’_Return’] = port[‘Port_Value’].pct_change() # get the portfolio return
# choose the column we want and kill the first row (no return value because it is the first observation)
port = port[[name+’_Return’]][1:]
# here I create a column that simply denotes each row from 1 to 36 (in months)
port[‘time’] = range(1, 37)
# create a column with the year that we form the portfolio
port[‘formation_year’] = yr
return port
In [10]:
# let’s go get the return of our portfolios
winner_portfolio = pd.DataFrame() # store the winning portfolios in this dataframe
loser_portfolio = pd.DataFrame() # store the losing portfolios in this dataframe
In [11]:
# Loop on the dates
for dt in dates:
year = dt.year
# get the winner stocks
winner = getPortfolio(crsp, sort_var, 9, year) # we choose the rank 9 for best performing stocks
winner_portfolio = pd.concat([winner_portfolio, winner]) # store the output
# get the loser stocks
loser = getPortfolio(crsp, sort_var, 0, year) # we choose the rank 0 for worst performing stocks
loser_portfolio = pd.concat([loser_portfolio, loser]) # store the output
In [12]:
# merge the winner and the loser returns
portfolio = winner_portfolio.join(loser_portfolio[‘Loser_Return’])
# calculate the long-short return
portfolio[‘LongShort_Return’] = portfolio[‘Loser_Return’]-portfolio[‘Winner_Return’]
In [13]:
# join the fama french dataframe
portfolio = portfolio.join(ff)
In [14]:
# plot the long-short return
portfolio[‘LongShort_Return’].plot()
Out[14]:

In [15]:
# 12 month moving average of the long-short return to better look at the trend
ax = pd.Series.rolling(portfolio[‘LongShort_Return’], 12).mean().plot()
ax.axhline(y=0, color=’k’)
Out[15]:

In [16]:
# join the value- and equal-weighted market return from the CRSP files. I prefer those over Fama French
mkt = crsp[[‘vwretd’, ‘ewretd’]].drop_duplicates().sort_index()
portfolio = portfolio.join(mkt)
In [17]:
# calculate the Abnormal Return for the winner and lower portfolios
portfolio[‘Winner_Abnormal_Return’] = portfolio[‘Winner_Return’]-portfolio[‘ewretd’]
portfolio[‘Loser_Abnormal_Return’] = portfolio[‘Loser_Return’]-portfolio[‘ewretd’]
In [18]:
# Get cumulative abnormal return (CAR) for both portfolios
portfolio[‘CAR_Winner’] = portfolio.groupby([‘formation_year’])[‘Winner_Abnormal_Return’].cumsum()
portfolio[‘CAR_Loser’] = portfolio.groupby([‘formation_year’])[‘Loser_Abnormal_Return’].cumsum()
In [19]:
# get the mean CAR on each month (remember that “time” is a variable from 1 to 36 to represent a month since formation)
mean_car = portfolio.groupby(‘time’)[[‘CAR_Winner’, ‘CAR_Loser’]].mean()
mean_car.loc[0] = 0 # add a row of 0 so that when I plot, the line starts at 0
mean_car = mean_car.sort_index() # make sure to sort the index
mean_car.head(3)
Out[19]:
CAR_Winner
CAR_Loser
time
0
0.000000
0.000000
1
-0.038107
0.093064
2
-0.035943
0.089206
In [20]:
# plot the average CAR
plt.figure()
mean_car.plot()
plt.title(‘Average CAR’)
plt.xlabel(‘Month since portfolio formation’)
plt.xlim([0,36])
# save figure to pdf
# plt.savefig(‘Average_CAR.pdf’)
Out[20]:
(0, 36)

Questions¶
In [39]:
# load the necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf
# this code is to be able to see the plot in the Jupyter notebook
%matplotlib inline
In [40]:
# load crsp (different file from last time. data starts in 1925 to 2018 for all US stocks)
crsp = pd.read_csv(‘CRSP_Monthly_1925_2018_DeBondtThaler_Sample-1.csv.gz’)
crsp[‘date’] = pd.to_datetime(crsp[‘date’])
crsp[‘date’] = crsp[‘date’]+pd.offsets.MonthEnd(0)
crsp.index= crsp[‘date’] # set the date as index
In [41]:
# load the fama-french data
ff = pd.read_csv(‘F-F_Research_Data_Factors_Monthly-1.csv’)
# we have to format the date
ff[‘date’] = pd.to_datetime(ff.date, format=’%Y%m’)
# again, let’s push the date to the end of the calendar month
ff[‘date’] = ff[‘date’]+pd.offsets.MonthEnd(0)
# The issue with the fama-french data is that the returns are in %!!! So we have to divide by 100
for i in [‘Mkt-RF’, ‘SMB’, ‘HML’, ‘RF’]:
ff[i] = ff[i]/100
# This time we have to rename Mkt-RF variable for EXMKT
ff = ff.rename(columns={‘Mkt-RF’:’EXMKT’})
ff = ff.set_index(‘date’)
In [42]:
# load the sorting variable
sort_var = pd.read_csv(‘CRSP_Sort_Variable_1925_2018_DeBondtThaler_Sample.csv.gz’)
sort_var[‘Formation_Date’] = pd.to_datetime(sort_var[‘Formation_Date’])
(1) Provide the description statistics on the long-short portfolio return. Simply by looking at the average returns, would you be tempted to invest in such strategy?¶
In [43]:
# create a list of dates on which we will form a portfolio – once every 3 years!
dates = pd.date_range(start=’1933-01-01′, end=’2015-12-31′, freq=’12MS’)
dates = dates[::3]
# OR
# dates = pd.date_range(start=’1933-01-01′, end=’2015-12-31′, freq=’36MS’)
In [44]:
# that’s the function that creates an equal-weighted portfolio for the set of chosen stocks
def getPortfolio(crsp, sort_var, rank, yr): # the input are the CRSP and sort_var dataframe, the ranking, and year
# select the stocks we want, based on year, and decile ranking
stock = sort_var[(sort_var[‘Formation_Date’].dt.year == yr) &
(sort_var[‘Total_Return_Decile’] == rank)][‘PERMNO’].values
# from crsp keep the three years of data that we need to create our 3 year portfolio
port = crsp[str(yr)+’-01-01′:str(yr+2)+’-12-31′].copy()
# from the 3 year range of returns, select the stocks that we have chosen (based on their rank)
port = port[port[‘PERMNO’].isin(stock)]
# pivot the table to get returns as values and each columns is a company (PERMNO)
port = port.pivot(columns=’PERMNO’, values=’RET’, index=’date’)
# fill the missing values as 0 — no return
port = port.fillna(0)
# calculate the gross returns
port = port+1
# invest 1 dollar in each stocks
port.loc[pd.to_datetime(str(yr)+’-01-01′)] = 1
# sort the date index
port = port.sort_index()
# get the cumulative return of each stock
port = port.cumprod()
# total portfolio value
port[‘Port_Value’] = port.sum(axis=1)
# get the return of the portfolio
if rank == 9:
name = ‘Winner’
else:
name = ‘Loser’
port[name+’_Return’] = port[‘Port_Value’].pct_change() # get the portfolio return
# choose the column we want and kill the first row (no return value because it is the first observation)
port = port[[name+’_Return’]][1:]
# here I create a column that simply denotes each row from 1 to 36 (in months)
port[‘time’] = range(1, 37)
# create a column with the year that we form the portfolio
port[‘formation_year’] = yr
return port
In [45]:
# let’s go get the return of our portfolios
winner_portfolio = pd.DataFrame() # store the winning portfolios in this dataframe
loser_portfolio = pd.DataFrame() # store the losing portfolios in this dataframe
In [46]:
# Loop on the dates
for dt in dates:
year = dt.year
# get the winner stocks
winner = getPortfolio(crsp, sort_var, 9, year) # we choose the rank 9 for best performing stocks
winner_portfolio = pd.concat([winner_portfolio, winner]) # store the output
# get the loser stocks
loser = getPortfolio(crsp, sort_var, 0, year) # we choose the rank 0 for worst performing stocks
loser_portfolio = pd.concat([loser_portfolio, loser]) # store the output
In [47]:
# merge the winner and the loser returns
portfolio = winner_portfolio.join(loser_portfolio[‘Loser_Return’])
# calculate the long-short return
portfolio[‘LongShort_Return’] = portfolio[‘Loser_Return’]-portfolio[‘Winner_Return’]
In [48]:
# join the fama french dataframe
portfolio = portfolio.join(ff)
In [49]:
# plot the long-short return
portfolio[‘LongShort_Return’].plot()
Out[49]:

In [50]:
print(‘I would invest in such strategy because the average retunr is positive.’)
I would invest in such strategy because the average retunr is positive.
(2) Take the long-short (LS) return and run a CAPM regression and report your results. That is:¶
$R^{LS}_t = \alpha + \beta [R_{M,t}-RF_t] + \varepsilon_t$
$R_{M,t}$ is variable ewretd (equal-weighted market return) and RF is the excess return
What does your alpha (intercept) of the regression say about the viability of the trading strategy? Is the alpha statistically different at the 95% confidence level?
NOTE: For the return of the long-short portfolio, you don’t do excess long-short return. In other words you don’t do $R^{LS}_t-RF_t$. Can you guess why?
In [51]:
# join the value- and equal-weighted market return from the CRSP files. I prefer those over Fama French
mkt = crsp[[‘vwretd’, ‘ewretd’]].drop_duplicates().sort_index()
portfolio = portfolio.join(mkt)
In [52]:
portfolio[‘market’] = portfolio.ewretd-portfolio.RF
In [59]:
reg = smf.ols(‘LongShort_Return~market’, data=portfolio).fit()
reg.summary()
Out[59]:
OLS Regression Results
Dep. Variable:
LongShort_Return
R-squared:
0.167
Model:
OLS
Adj. R-squared:
0.166
Method:
Least Squares
F-statistic:
201.3
Date:
Mon, 04 Mar 2019
Prob (F-statistic):
8.57e-42
Time:
06:41:38
Log-Likelihood:
1373.2
No. Observations:
1008
AIC:
-2742.
Df Residuals:
1006
BIC:
-2733.
Df Model:
1
Covariance Type:
nonrobust
coef
std err
t
P>|t|
[0.025
0.975]
Intercept
0.0029
0.002
1.445
0.149
-0.001
0.007
market
0.4220
0.030
14.190
0.000
0.364
0.480
Omnibus:
845.991
Durbin-Watson:
1.862
Prob(Omnibus):
0.000
Jarque-Bera (JB):
43285.621
Skew:
3.482
Prob(JB):
0.00
Kurtosis:
34.339
Cond. No.
15.2
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
In [ ]:
print(‘the alpha is positive, so the strategy is viable’)
In [ ]:
print(‘
In [ ]:
(3) Redo (2), but also include HML and SMB as control variables (with ewretd variable as market return). How does your alpha change? What about the alphas statistical significance level?¶
This is the regression
$R^{LS}_t = \alpha + \beta [R_{M,t}-RF_t] + \gamma HML_t + \delta SMB_t +\varepsilon_t$
Note: The HML and SMB factors are also long-short portfolio. Therefore, we do not substract the risk-free rate.
In [60]:
# Test the 3 factor model on AAPL (the dataframe aapl)
reg = smf.ols(‘LongShort_Return~market+SMB+HML’, data=portfolio).fit()
In [61]:
# to look at the regression output, you do
reg.summary()
Out[61]:
OLS Regression Results
Dep. Variable:
LongShort_Return
R-squared:
0.361
Model:
OLS
Adj. R-squared:
0.359
Method:
Least Squares
F-statistic:
189.3
Date:
Mon, 04 Mar 2019
Prob (F-statistic):
2.86e-97
Time:
06:42:26
Log-Likelihood:
1507.2
No. Observations:
1008
AIC:
-3006.
Df Residuals:
1004
BIC:
-2987.
Df Model:
3
Covariance Type:
nonrobust
coef
std err
t
P>|t|
[0.025
0.975]
Intercept
0.0015
0.002
0.860
0.390
-0.002
0.005
market
0.0258
0.037
0.701
0.483
-0.046
0.098
SMB
0.9229
0.074
12.462
0.000
0.778
1.068
HML
0.7566
0.057
13.194
0.000
0.644
0.869
Omnibus:
640.039
Durbin-Watson:
1.858
Prob(Omnibus):
0.000
Jarque-Bera (JB):
18148.998
Skew:
2.427
Prob(JB):
0.00
Kurtosis:
23.213
Cond. No.
46.3
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
(4) Redo (3), but this time, get the alpha at each decade, from 1930 to today.¶
Plot your alpha over time.
hints
• Do a loop over decades and store the alpha from the regression in a list [alpha1, alpha2, …].
• Remember that you can select a subset of a dataframe (based on dates) by doing for example portfolio[‘1930-01-01′:’1939-12-31’] to get the data from January 1930 to December 1939
• To get the alpha from the regression you do reg.params[‘Intercept’] if you store your regression results in a variable called “reg”.
In [ ]: