news and the market
Linking the news to market dynamics¶
In today’s exercise, our goal is to link the news to market dynamics. Market dynamics can definitely influence the news, but is the reverse true? We’ll take an exploratory angle to this problem using data from two sources:
NYTimes provides the news data by giving us the metadata on all of the articles.
Alpha Vantage will provide the market data approximated by the Vanguard S&P 500 ETF
The collective datasets are too big for Ed to handle so you’ll need to work on your local environment for this exercise.
SMALL DATA WARNING! We only have one month of data from NYTimes, so don’t take the results too seriously.¶
Task 1 – read and plot¶
Please read in the file voo_full.csv from Resources on Ed, subject to the following constraints:
You should use pandas to read in the file.
You should look into the argument parse_dates in pandas.read_csv to make sure the column corresponding to dates is read in as a datetime64 object. (Let’s not set the dates to the index for now)
Plot the time series of the close values against the dates using seaborn. Make sure your x-axis is labeled as Dates and your y-axis is labeled as daily close price.
The plot should have revealed an interesting change in the price at a particular date (that cannot be explained by the pandemic). Please find the answer in the description of the Vanguard S&P 500 ETF.
For simplicity, let’s keep only the data after this change. Hint: you can compare to a specific date like 2020/03/20 by creating a datetime value like pandas.to_datetime("2020-03-20").
import pandas as pd
import seaborn as sns
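The reading and filtering steps of Task 1 can be sketched as follows. This is a minimal illustration on an inline stand-in for voo_full.csv: the column names `timestamp` and `close`, the sample values, and the cutoff date are all assumptions, so check the real file's header and the ETF description before reusing any of them.

```python
import io

import pandas as pd

# Hypothetical stand-in for voo_full.csv; the real column names may differ.
csv_text = """timestamp,close
2020-03-18,220.5
2020-03-19,221.0
2020-03-20,211.7
2020-03-23,205.4
"""

# parse_dates converts the named column to datetime64 while reading.
# The dates stay in a regular column rather than becoming the index.
voo = pd.read_csv(io.StringIO(csv_text), parse_dates=["timestamp"])

# Keep only the rows strictly after a cutoff date (example date only).
cutoff = pd.to_datetime("2020-03-20")
voo_recent = voo[voo["timestamp"] > cutoff].reset_index(drop=True)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the path to voo_full.csv.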
Task 2 – understanding the monthly news metadata¶
There’s a file called nyt_arch_2022_9.csv under Resources on Ed. This file contains the metadata for articles published in September 2022 to date. Each row is an “article” on NYTimes. Several columns have the prefix kw-; each one indicates whether the article was tagged with that particular keyword by NYTimes. The keywords in this list were chosen by popularity.
Please answer the following:
What are the dimensions of this CSV?
Which article(s) has/have the most keywords? Hint: 'hello'.startswith('hell')
Which news_desk type has the most articles?
Which keyword(s) is/are the most common?
Which keyword(s) is/are the most common among the articles whose news_desk value is “Business”?
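One possible way to approach these questions is sketched below on a toy data frame. The `headline` column, the keyword names, and the assumption that the kw- columns hold 0/1 flags are all hypothetical; only `news_desk` and the kw- prefix come from the task description.

```python
import pandas as pd

# Toy stand-in for nyt_arch_2022_9.csv; column names other than
# news_desk and the kw- prefix are assumptions.
articles = pd.DataFrame({
    "headline": ["a", "b", "c", "d"],
    "news_desk": ["Business", "Business", "Sports", "Business"],
    "kw-economy": [1, 1, 0, 1],
    "kw-politics": [1, 0, 0, 1],
    "kw-tennis": [1, 0, 1, 0],
})

# Dimensions: (number of rows, number of columns).
n_rows, n_cols = articles.shape

# Select the keyword columns by their prefix.
kw_cols = [c for c in articles.columns if c.startswith("kw-")]

# Number of keywords per article, keeping every article tied for the maximum.
kw_per_article = articles[kw_cols].sum(axis=1)
most_tagged = articles[kw_per_article == kw_per_article.max()]

# The news desk with the most articles.
top_desk = articles["news_desk"].value_counts().idxmax()

# Most common keyword(s) overall, allowing for ties.
kw_totals = articles[kw_cols].sum()
top_keywords = kw_totals[kw_totals == kw_totals.max()].index.tolist()

# Most common keyword(s) among Business articles only.
business = articles[articles["news_desk"] == "Business"]
business_totals = business[kw_cols].sum()
top_business = business_totals[business_totals == business_totals.max()].index.tolist()
```

Comparing against the maximum, rather than calling `idxmax` alone, keeps all tied articles or keywords, which matters because the questions allow for more than one answer.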
Task 3 – aggregating the data¶
To compare the daily market data to the news data, we must first aggregate the news data such that each row corresponds to a daily record. Please continue using nyt_arch_2022_9.csv, and create an aggregated data frame that contains the date and the total count for each keyword on that day.
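The aggregation could look roughly like the sketch below. The timestamp column name `pub_date` is an assumption, as are the keyword names and values; the idea is simply to truncate each timestamp to its calendar day and sum the keyword flags within each day.

```python
import pandas as pd

# Toy article-level data; pub_date is an assumed column name.
articles = pd.DataFrame({
    "pub_date": pd.to_datetime([
        "2022-09-01 08:00", "2022-09-01 17:30", "2022-09-02 09:15",
    ]),
    "kw-economy": [1, 1, 0],
    "kw-politics": [0, 1, 1],
})
kw_cols = [c for c in articles.columns if c.startswith("kw-")]

# Truncate timestamps to midnight so that articles published on the
# same day share one key, then sum the keyword flags per day.
daily_kw = (
    articles
    .assign(date=articles["pub_date"].dt.normalize())
    .groupby("date", as_index=False)[kw_cols]
    .sum()
)
```

Using `dt.normalize()` keeps the `date` column as datetime64, which makes the later join against the market dates straightforward.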
Task 4 – joining the data together¶
Join the news data and the market data together by date so we have the total count of keywords each day aligned with the daily close price.
Are there any days missing in either data frame (starting from the earliest date in each dataset)? How could we find out? For the questions below, we will ignore this issue!
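A rough sketch of the join and of one way to look for missing days is shown below. Both data frames are toy examples with an assumed shared column called `date`, and `missing_days` is a hypothetical helper written for this illustration, not a pandas function.

```python
import pandas as pd

# Toy market and daily news data; column names and values are assumptions.
market = pd.DataFrame({
    "date": pd.to_datetime(["2022-09-01", "2022-09-02", "2022-09-06"]),
    "close": [363.0, 359.1, 357.7],
})
daily_kw = pd.DataFrame({
    "date": pd.to_datetime(
        ["2022-09-01", "2022-09-02", "2022-09-03", "2022-09-06"]
    ),
    "kw-economy": [2, 0, 1, 3],
})

# An inner join keeps only the dates present in both data frames.
joined = market.merge(daily_kw, on="date", how="inner")


def missing_days(dates):
    """Return calendar days between the first and last date that are absent."""
    full = pd.date_range(dates.min(), dates.max(), freq="D")
    return full.difference(pd.DatetimeIndex(dates))


missing_market = missing_days(market["date"])
missing_news = missing_days(daily_kw["date"])
```

Comparing each date column against a complete daily range built with `pandas.date_range` is only one option; an outer merge with `indicator=True` would instead show which dates appear in one data frame but not the other.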
Find the keyword with the highest correlation to the daily close price, then plot the time series of this keyword’s frequency. How would you interpret what the correlation is picking up? Is this what we care about?
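Finding the keyword most correlated with the close price might be sketched as follows, using `DataFrame.corrwith` on a toy joined data frame. The data are invented so that one keyword tracks the price exactly; real results will be far noisier.

```python
import pandas as pd

# Toy joined data; kw-economy is constructed to move with the close price.
joined = pd.DataFrame({
    "date": pd.date_range("2022-09-01", periods=5, freq="D"),
    "close": [100.0, 101.0, 103.0, 102.0, 105.0],
    "kw-economy": [1, 2, 4, 3, 6],
    "kw-tennis": [5, 1, 4, 2, 3],
})
kw_cols = [c for c in joined.columns if c.startswith("kw-")]

# Pearson correlation of each keyword column with the close price.
corr_with_close = joined[kw_cols].corrwith(joined["close"])

# Keyword with the highest correlation.
highest = corr_with_close.idxmax()
```

This picks the largest signed correlation; applying `.abs()` before `idxmax()` would instead pick the strongest relationship in either direction.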
Please use your joined data frame for this task
Please calculate the daily difference in closing price and store this in a separate column. E.g. if the close price was 100 on day 2 and 99 on day 1, the daily difference on day 2 should be 1 and this is not defined for day 1. Please choose a sensible variable name. Hint: pandas.Series.diff
Plot this new variable (no need to label your axes)
Now find the keyword with the strongest correlation again but with the daily difference in closing price.
To be useful, we may want to “forecast” the closing price tomorrow using keywords from the day before. Find the keyword with the strongest correlation between the daily difference in closing price and the keyword frequency the day before. Hint: pandas.Series.shift
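The differencing and lagging steps can be sketched as below. The column name `close_diff` and the toy values are assumptions chosen for illustration. Note that `shift(1)` moves values down by one row, so "the day before" here means the previous row, which is only the previous calendar day if no dates are missing.

```python
import pandas as pd

# Toy joined data, sorted by date so that diff and shift follow time order.
joined = pd.DataFrame({
    "date": pd.date_range("2022-09-01", periods=6, freq="D"),
    "close": [100.0, 99.0, 101.0, 104.0, 103.0, 106.0],
    "kw-economy": [1, 3, 4, 0, 4, 2],
    "kw-tennis": [2, 2, 3, 1, 1, 2],
}).sort_values("date")
kw_cols = [c for c in joined.columns if c.startswith("kw-")]

# Daily difference: today's close minus the previous row's close.
# The first row has no predecessor, so its value is NaN.
joined["close_diff"] = joined["close"].diff()

# Same-day correlation between keyword counts and the price change.
same_day_corr = joined[kw_cols].corrwith(joined["close_diff"])

# Lag the keywords by one row so each price change is paired with
# the keyword counts from the previous row.
lagged_kw = joined[kw_cols].shift(1)
lagged_corr = lagged_kw.corrwith(joined["close_diff"])

# Strongest relationship in either direction.
strongest_lagged = lagged_corr.abs().idxmax()
```

Rows containing NaN (the first row after `diff` and `shift`) are dropped pairwise by `corrwith`, so no explicit `dropna` is needed here.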