hadoop sql代写:CSE 3244: Data Management in the Cloud Lab 3: Simulating a “buy and hold” strategy

CSE 3244: Data Management in the Cloud
Lab 3: Simulating a “buy and hold” strategy

Instructor: Spyros Blanas, blanas.2@osu.edu TA: Kalyan Khandrika, khandrika.1@osu.edu

This lab asks you to simulate the performance of a “buy and hold” investment strategy on the financial dataset that was used in the previous lab. We are interested in the value of a hypothetical $10,000 investment on January 16, 2015 every day between January 16, 2015 and June 17, 2016. We make a few simplifying assumptions:

  • One can own and trade a fractional number of shares.
  • There are no transaction costs to buy or sell.
  • Taxes do not impact performance.
  • All trades on the same day happen at the closing price for that day.

    You can submit either Impala SQL or MapReduce code, or combinations of either. If submitting a SQL query, make sure you include the SQL statements that create and populate the table in your submission.

    Hint 1: Using SQL is easier, but crafting the SQL query will require some thought. It is recommended to start from isolated examples and then synthesize these into more complex queries. Feel free to create temporary tables as needed.

    Hint 2: Some SQL operations naturally introduce NULLs for non-existing values. One can convert every NULL in a column to zero using the COALESCE(column_name, 0) SQL function.

    0. [Optional] Download the cleaned files

    You need to use a clean version of the dataset for this lab. You can either use your own files from lab #2, or find a cleaned version of the dataset in the following URL:

    http://cse.osu.edu/~sblanas/3244/findata-clean.zip

    1. Compute the daily value of the investment, ignoring dividends

    Look at the SPY price file and the “Close” column for each date. Find the price of the security on January 16, 2015. Find the number of shares you can buy for the $10,000 starting investment at that day’s closing price. Multiply the quantity by the closing price for every day between January 16, 2015 until June 17, 2016 to obtain the value of the investment every day.

Deliverable #1 [30 points]

Please write and submit an Impala SQL query or a MapReduce program that returns a table with every date between January 16, 2015 and June 17, 2016, along with the investment value for each day. Name the file “Deliverable1.sql” if submitting an Impala SQL query, or “Deliverable1.java” if submitting a MapReduce program.

2. Compute the daily value of the investment, assuming dividends accumulate

A dividend is a sum of money a company pays quarterly to its shareholders. Look at the SPY dividends file for the dividend amount per share and the date of the dividend. Assume that the dividends accumulate in the account as cash and add them to the daily investment value between January 16, 2015 until June 17, 2016.

Deliverable #2 [35 points]

Please write and submit an Impala SQL query or a MapReduce program that returns a table with every date between January 16, 2015 and June 17, 2016, along with the investment value for each day that includes dividends. Assume dividends accumulate as cash in the account. Name the file “Deliverable2.sql” if submitting an Impala SQL query, or “Deliverable2.java” if submitting a MapReduce program.

3. Compute the daily value of the investment, assuming dividends are reinvested

One can choose to reinvest the dividends, which means to purchase as many shares of SPY as possible on the day of the dividend payment. Now the number of shares is no longer fixed throughout the entire period, but it increases on every dividend payment date because of the reinvestment.

Deliverable #3 [35 points]

Please write and submit an Impala SQL query or a MapReduce program that returns a table with every date between January 16, 2015 and June 17, 2016, along with the investment value for each day that includes dividends. Assume dividends are reinvested in the security by purchasing as many shares as possible at the closing price on every dividend payment date. Name the file “Deliverable3.sql” if submitting an Impala SQL query, or “Deliverable3.java” if submitting a MapReduce program.

Hint 3: First construct the table with the number of shares held each day. Use this table to calculate the final answer.

Bonus #1 [20 points]

Plot the three tables in one figure, using a solid line of different color for each table. Time should be shown on the horizontal axis, and investment value on the vertical axis. Include a legend and axis titles. Save the plot as “figure.png” to upload on Carmen.

Bonus #2 [20 points]

Repeat all steps with INTC instead of SPY. Use the prefix “Bonus” in the filename. (That is, deliverable #1 for INTC should be named “BonusDeliverable1.sql”).