CS代考 FF25 excess returns

Empirical Methods in Finance Homework 7: Solution
Prof. . Lochstoer TA:
February 27, 2022
Principal Component Analysis

Copyright By PowCoder代写 加微信 powcoder

Download the 48 industry portfolio data (monthly) from ’s web site. Use the data from 1960 through 2021. Use the value-weighted returns. You may drop the industries that have missing values and are reported as −99.99. Also, download the 3 Fama-French factors from his web site. Use the monthly risk-free rate series provided by French in the same FF factor dataset to compute excess returns on these 48 portfolios.
1. Get the eigenvalues for the sample variance-covariance matrix of the excess returns to the 48 industries. Plot the fraction of variance explained by each eigenvalue in a bar plot.
Suggested solution:
We first subtract risk free rate from industry returns to get the excess returns. Then we get the variance-covariance matrix of the industry excess returns and perform eigen- value decomposition of this variance-covariance matrix to get the eigenvalues (λs) and eigenvectors. The fraction of total variance explained by the i-th eigenvalue is λi/􏰂j λj. Figure 1 plots the fraction of variance explained by each eigenvalue.
2. Choose the 3 first (largest) principal components.
(a) How much of the total variance do these 3 factors explain?
Suggested solution:
The first 3 PCs explain 69.57% of the total variance of the industry excess returns.

Figure 1: Fraction of variance explained by each eigenvalue Table 1: Summary statistics for the first three factors
Mean PC2 0.22
PC3 -0.42.2
σ12 1034.84
Correlation
ρ12 ρ13 ρ23 0 0 0
Give the mean sample return to these 3 factor portfolios, their standard deviation, and correlation.
Suggested solution:
The i-th principal component is given by: yt = e′irt, where ei is the i-th eigenvector and rt is the time-series of the industry excess returns. We get the loadings for each factor from the corresponding eigenvector and then calculate the mean return, variance and correlation of the first 3 PCs given in Table 1. Note that the variance of the i-th PC is equal to the i-th eigenvalue λi and the correlations are 0.
Consider a multi-factor model of returns using these three factors as pricing fac- tors. Plot the predicted return from this model for all the industries versus the realized average industry returns over the sample. That is, estimate the betas of

each industry with respect to these factors, get the expected excess returns as
Eˆ [Re ] = βˆ′Eˆ [F ] , (1)
where Ft are the factor returns and Eˆ [Ft] is estimated as the sample average of each factor. This expected return will be on your x-axis. The sample average of each industry’s excess return will be on the y-axis. Add a 45 degree line to this plot.
[You can get factor loadings (betas) from the eigenvectors. Or, if you like, you can run the time-series regression of each industry’s return on the 3 factors. The result is the same.]
Suggested solution:
The first three eigenvectors are the betas for the three PC factors. We multiply the betas by their corresponding factors to get the predicted return:
rt =β1PC1t +β2PC2t +β3PC3t
Figure 2 plots the predicted against the realized industry excess returns. Since most of the dots are not lining up on the 45 degree line, this model does a poor job explaining industry excess returns.
Figure 2: Predicted vs. realized average industry excess returns

(d) Give the implied cross-sectional R2 of the plot in c). That is, calculate: var 􏰅R ̄act − Rˆpred􏰆
R2 =1− 􏰉􏰊 cross-section Var R ̄act
where R ̄act is the N × 1 vector of average industry excess returns. Rˆpred is the N × 1 vector of predicted industry excess returns from the 3-factor model. Suggested solution:
The implied cross-sectional R2 is −0.65. This tells us that the three largest PCs explaining the variation in industry returns are most likely not priced (lambda = 0) and pricing errors from this model are quite large. Also recall from Table 1 the mean returns of the second and third factors are not significantly different from zero. This is also evident from the poor cross-sectional fit in Figure 2.
3. Now, download the 25 FF portfolios sorted on size and book-to-market, same sample period.
(a) Get the eigenvalues for the sample variance-covariance matrix of the excess re- turns to these 25 F-F portfolios. Plot the fraction of variance explained by each eigenvalue in a bar plot.
Suggested solution:
We use the value-weighted 25 FF portfolios and repeat the same exercise as before. Figure 3 shows the fraction of variance that is explained by each PC.
Figure 3: Fraction of variance explained by each eigenvalue

Table 2: Cumulative proportion of variance explained by the fist 3 PCs for 25 FF portfolio
PC1 PC2 PC3 83.47% 89.80% 93.13%
(b) Given (a), how many factors does do you reckon you need to explain average returns to the 25 F-F portfolios?
Suggested solution:
From Figure 3, we can see that first three PCs should be able to explain most of the variation in data. Table 2 reports the cumulative proportion of variance explained by the first 3 factors. The first 3 PCs explain more than 93% of the excess returns on these portfolios. Also, since these portfolios are sorted based on size and book-to-market, from Fama and French (1992, 1993), we know that three factors (MktRf, SMB, and HML) can explain the cross-section of size/book- to-market portfolios. So, I would choose 3 factors to explain average returns to the 25 FF portfolios.
Figure 4 shows the predicted against realized excess returns for the 25 FF port- folios using first three PCs as pricing factors. The cross-sectional R2 is 0.54. We get a much better cross-sectional fit than the 3-factor model for the industry portfolios above.

Figure 4: Predicted vs. realized average FF25 excess returns
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
##################
# Industry Portfolio #
##################
library(data.table)
library(ggplot2)
library(zoo)
library(lubridate)
library(sandwich)
rm(list=ls())
options(max.print=999999)
FF <- read.csv("F-F_Research_Data_Factors.CSV") industry <- read.csv("48_Industry_Portfolios.csv") setnames(FF,c("X","Mkt.RF"),c("date","MktRF")) FF$year=substr(FF$date,1,4) FF=subset(FF,as.numeric(FF$year)>=1960)
FF=subset(FF,as.numeric(FF$year)<=2019) FF$year=NULL # get industry returns industry <- fread("48_Industry_Portfolios.csv") setnames(industry,"V1","date") industry$year=substr(as.character(industry$date),1,4) industry=subset(industry,as.numeric(industry$year)>=1960)
industry=subset(industry,as.numeric(industry$year)<=2019) industry$year=NULL # dropping the industries that have missing values and are reported as -99.99 industry <- as.data.frame(industry) while (i<=ncol(industry)){ if(industry[1,i]==-99.99){industry[,i]=NULL }else{ i=i+1} FF<-as.data.table(FF) industry = as.data.table(industry) setkey(FF,date) setkey(industry,date) reg <- merge(FF,industry) ind_exret <- reg[,lapply(.SD, function(x) x-RF), .SDcols=-c("MktRF","SMB","HML","RF"),by=date] setkey(ind_exret,date) ind_exret_nodate <- copy(ind_exret) ind_exret_nodate[,date:=NULL] # eigenvalue decomposition Sigma <- cov(ind_exret_nodate) eigs <- eigen(Sigma) # fraction of variance explained by the first 3 PCs # using eigenvalues sum(eigs$values[1:3])/sum(eigs$values) # another way: PCA PCA<-prcomp(ind_exret_nodate) PCA_ind_summ <- summary(PCA) imp <- summary(PCA)$importance[2,] sum(imp[1:3])/sum(imp) names(imp)[1:9] <- c("PC01","PC02","PC03","PC04","PC05","PC06","PC07","PC08","PC09") prop_var <- data.table(PC=names(imp),imp=imp) # Fraction of variance explained by each PC ggplot(prop_var, aes(x=PC, y=imp)) + geom_bar(stat="identity", fill="steelblue", position=position_dodge()) + theme_bw() + xlab("") + ylab("Fraction of variance explained") + theme(axis.text.x = element_text(angle = 90, vjust = .5)) + theme(axis.text.x = element_text(size=14,face="plain"), axis.text.y = element_text(size=16,face="plain"), axis.title.x = element_text(size=16,face="plain"), axis.title.y = element_text(size=16,face="plain"), legend.text = element_text(size=9,face="plain"),legend.position = c(0.9,0.2)) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(legend.margin=margin(t=-0.5, r=0, b=0, l=0, unit="cm")) names_ind <- names(ind_exret_nodate) ind_exret_nodate <- as.matrix(ind_exret_nodate) # calculate the first 3 PCs PC1 <- ind_exret_nodate %*% PCA$rotation[,1] PC2 <- ind_exret_nodate %*% PCA$rotation[,2] PC3 <- ind_exret_nodate %*% PCA$rotation[,3] PC1_3 <- cbind(PC1,PC2,PC3) # summary stats for the first 3 PCs # covaraince matrix of PC1-3 cov(PC1_3) # sample mean of PC1-3 colMeans(PC1_3) # std errors std_err <- function(x) sd(x)/sqrt(length(x)) std_errs <- apply(PC1_3, 2, std_err) (t_stat <- colMeans(PC1_3)/std_errs) # Plot industry loadings on the first 3 PCs PCs <- data.table(indx=1:43,PC1=PCA$rotation[,1],PC2=PCA$rotation[,2], PC3=PCA$rotation[,3]) ggplot(PCs) + geom_line(aes(x=indx,y=PC1,color = "PC1"),size=1.2) + geom_line(aes(x=indx,y=PC2,color = "PC2"),size=1.2) + geom_line(aes(x=indx,y=PC3,color = "PC3"),size=1.2) + theme_bw() + xlab("Industry") + ylab("") + #ggtitle("Industry Loadings on PCs") + scale_colour_manual("",breaks = c("PC1", "PC2","PC3"), values = c("PC1"= rgb(0,0.447,0.741), "PC2"=rgb(0.85,0.325,0.098), "PC3"=rgb(0.494,0.184,0.5560)),labels = c("PC1","PC2","PC3")) + theme(title = element_text(size=16,face="bold"), axis.text.x = element_text(size=16,face="plain"), axis.text.y = element_text(size=16,face="plain"), axis.title.x = element_text(size=16,face="plain"), axis.title.y = element_text(size=16,face="plain"), legend.text = element_text(size=12,face="plain"),legend.position = c(0.9,0.2))+ theme(legend.key.size = unit(0.4, "in"),legend.key.height = unit(0.2, "in"), legend.key.width = unit(0.4, "in")) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(), legend.background = element_rect(fill=alpha(0.4))) # Compute the predicted return from 3 PCs PC1_3 <- t(PC1_3) predict <- PCA$rotation[,1:3]%*%PC1_3 avgPredict <- apply(predict,1,mean) avgActual <- apply(ind_exret_nodate,2,mean) dt_ind <- data.table(predicted_ret=avgPredict, realized_ret=avgActual, names=names_ind) # plot predicted vs. realized returns ggplot(dt_ind,aes(x=predicted_ret,y=realized_ret))+geom_point(size = 2) + xlab("Predicted") + ylab("Realized") + geom_abline(color="blue", size = .6) + theme_bw() + scale_x_continuous(limits = c(0,1), breaks = seq(0,1,by = .2)) + scale_y_continuous(limits = c(0,1), breaks = seq(0,1,by = .2)) + geom_text(aes(label=names),hjust=.5, vjust=-0.4, size = 4.5) + theme(axis.text.x = element_text(size=14,face="plain"), axis.text.y = element_text(size=16,face="plain"), axis.title.x = element_text(size=16,face="plain"), axis.title.y = element_text(size=16,face="plain"), legend.text = element_text(size=9,face="plain"),legend.position = c(0.9,0.2)) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(legend.margin=margin(t=-0.5, r=0, b=0, l=0, unit="cm")) (Rsquared <- 1-var(avgActual-avgPredict)/var(avgActual)) ################## # Fama-French 25 # ################## FF25 <- fread("25_Portfolios_5x5 3.CSV") setnames(FF25,c("date","S1B1","S1B2","S1B3","S1B4","S1B5", "S2B1","S2B2","S2B3","S2B4","S2B5", "S3B1","S3B2","S3B3","S3B4","S3B5", "S4B1","S4B2","S4B3","S4B4","S4B5", "S5B1","S5B2","S5B3","S5B4","S5B5")) FF25[,":="(year=date%/%100, month=date%%100)] FF25[,date:=as.yearmon(paste(year,month,sep="-"))] FF25[,":="(year=NULL,month=NULL)] FF25 <- FF25[year(date) %in% 1960:2019] FF[,":="(year=date%/%100, month=date%%100)] FF[,date:=as.yearmon(paste(year,month,sep="-"))] FF[,":="(year=NULL,month=NULL)] setkey(FF,date) setkey(FF25,date) reg <- merge(FF,FF25) FF25_exret <- reg[,lapply(.SD, function(x) x-RF), .SDcols=-c("MktRF","SMB","HML","RF"),by=date] setkey(FF25_exret,date) FF25_exret_nodate <- copy(FF25_exret) FF25_exret_nodate[,date:=NULL] # eigenvalue decomposition Sigma_FF25 <- cov(FF25_exret_nodate) eigs_FF25 <- eigen(Sigma_FF25) # fraction of variance explained by the first 3 PCs # using eigenvalues sum(eigs_FF25$values[1:3])/sum(eigs_FF25$values) PCA_FF25 <- prcomp(FF25_exret_nodate) imp_FF25 <- summary(PCA_FF25)$importance[2,] sum(imp_FF25[1:3])/sum(imp_FF25) # Cumulative variance explained by each PC imp_FF25 <- summary(PCA_FF25)$importance[2,] imp_FF25[1]/sum(imp_FF25) sum(imp_FF25[1:2])/sum(imp_FF25) sum(imp_FF25[1:3])/sum(imp_FF25) sum(imp_FF25[1:4])/sum(imp_FF25) names(imp_FF25)[1:9] <- c("PC01","PC02","PC03","PC04","PC05","PC06","PC07","PC08","PC09" prop_var_FF25 <- data.table(PC=names(imp_FF25),imp=imp_FF25) # Fraction of variance explained by each PC ggplot(prop_var_FF25, aes(x=PC, y=imp)) + geom_bar(stat="identity", fill="steelblue", position=position_dodge()) + theme_bw() + xlab("") + ylab("Fraction of variance explained") + theme(axis.text.x = element_text(angle = 90, vjust = .5)) + theme(axis.text.x = element_text(size=14,face="plain"), axis.text.y = element_text(size=16,face="plain"), axis.title.x = element_text(size=16,face="plain"), axis.title.y = element_text(size=16,face="plain"), legend.text = element_text(size=9,face="plain"),legend.position = c(0.9,0.2)) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(legend.margin=margin(t=-0.5, r=0, b=0, l=0, unit="cm")) # Compute the predicted return from 4 PCs # calculate the first 3 PCs names_FF25 <- names(FF25_exret_nodate) FF25_exret_nodate <- as.matrix(FF25_exret_nodate) # calculate the first 3 PCs PC1_FF25 <- FF25_exret_nodate %*% PCA_FF25$rotation[,1] PC2_FF25 <- FF25_exret_nodate %*% PCA_FF25$rotation[,2] PC3_FF25 <- FF25_exret_nodate %*% PCA_FF25$rotation[,3] PC1_3_FF25 <- cbind(PC1_FF25,PC2_FF25,PC3_FF25) PC1_3_FF25 <- t(PC1_3_FF25) predict_FF25 <- PCA_FF25$rotation[,1:3]%*%PC1_3_FF25 avgPredict_FF25 <- apply(predict_FF25,1,mean) avgActual_FF25 <- apply(FF25_exret_nodate,2,mean) dt_FF25 <- data.table(predicted_ret=avgPredict_FF25, realized_ret=avgActual_FF25, names=names_FF25) # plot predicted vs. realized returns ggplot(dt_FF25,aes(x=predicted_ret,y=realized_ret))+geom_point(size = 2) + xlab("Predicted") + ylab("Realized") + geom_abline(color="blue", size = .6) + theme_bw() + scale_x_continuous(limits = c(0,1.2), breaks = seq(0,1.2,by = .2)) + scale_y_continuous(limits = c(0,1.2), breaks = seq(0,1.2,by = .2)) + geom_text(aes(label=names),hjust=.5, vjust=-0.4, size = 4.5) + theme(axis.text.x = element_text(size=14,face="plain"), axis.text.y = element_text(size=16,face="plain"), axis.title.x = element_text(size=16,face="plain"), axis.title.y = element_text(size=16,face="plain"), legend.text = element_text(size=9,face="plain"),legend.position = c(0.9,0.2)) + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + theme(legend.margin=margin(t=-0.5, r=0, b=0, l=0, unit="cm")) # R^2 FF25 Rsquared_FF25 <- 1-var(avgActual_FF25-avgPredict_FF25)/var(avgActual_FF25) 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com