Business Analytics Coding Semester 203 – Assignment 2
Information
• Due Date – Mon 07/12/2020 (Week 13) 11:59pm
• Percentage of Grade – 40%
• This assignment has a total of 100 marks
• For this assignment, you are required to complete several tasks and present your solutions
in a Jupyter Notebook
• Data required for this assignment is on iLearn in the housing.csv, BHP.csv, ANZ.csv, CBA.csv,
RIO.csv and WES.csv files
• Make sure that all plots have appropriate axes values, axis labels, titles and legends
• You are required to submit your solutions to this assignment via iLearn
• When submitting, please provide the raw Jupyter Notebook (i.e. a .ipynb file) and the
Jupyter Notebook downloaded as an HTML file
• Save your Jupyter Notebook file as DTSC-100_A2_Firstname_Lastname_SID.ipynb
• Save your HTML file as DTSC-100_A2_Firstname_Lastname_SID.html
Part 1 (40 marks)
Tasks 1-5 focus on housing data from the 1990 California census, with each row representing a census block group, which is the smallest geographical unit for which sample data is provided by the U.S. Census Bureau. A census block group usually has a population of about 600 to 3,000 people. The data is sourced from Kaggle (see footnote 1) and is provided in the housing.csv file on iLearn. It consists of the following fields:
• longitude – a measure of how far west a house is; values are negative for California, so a more negative value is farther west
• latitude – a measure of how far north a house is; a higher value is farther north
• housing_median_age – median age of a house within a block in years; a lower number is a
newer building
• total_rooms – total number of rooms within a block
• total_bedrooms – total number of bedrooms within a block
• population – total number of people residing within a block
• households – total number of households, a group of people residing within a home unit, for
a block
• median_income – median income for households within a block of houses (measured in tens
of thousands of US Dollars)
• median_house_value – median house value for households within a block (measured in US
Dollars)
• ocean_proximity – location of the block with respect to the ocean
1 https://www.kaggle.com/camnugent/california-housing-prices

Task 1 (10 marks)
• Display the structure of the housing data and calculate descriptive statistics for the numerical columns (mean, median, standard deviation, maximum and minimum).
• How many rows and columns are in this dataset?
• Create a frequency table showing the unique values of the ocean proximity variable and the
number of times these values occur in the dataset.
• Create a pie chart showing the breakdown of the categories in the ocean proximity variable.
Display the percentage of each category on the pie chart. Interpret the plot.
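The steps in this task could be approached along the following lines. This is only a sketch: the small data frame is a made-up stand-in for housing.csv (in the notebook it would presumably come from `pd.read_csv("housing.csv")`), and the variable names are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for housing.csv; the values are illustrative only.
housing = pd.DataFrame({
    "median_income": [8.3, 7.2, 5.6, 3.8, 2.1, 4.0],
    "median_house_value": [452600, 352100, 341300, 226700, 98000, 180500],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "INLAND",
                        "INLAND", "INLAND", "NEAR OCEAN"],
})

housing.info()                    # structure: columns, dtypes, non-null counts
n_rows, n_cols = housing.shape    # number of rows and columns

# Descriptive statistics for the numerical columns only.
stats = housing.select_dtypes("number").agg(
    ["mean", "median", "std", "max", "min"])
print(stats)

# Frequency table of the categorical column.
freq = housing["ocean_proximity"].value_counts()
print(freq)

# Pie chart with the percentage of each category written on the wedges.
fig, ax = plt.subplots()
ax.pie(freq.to_numpy(), labels=freq.index.tolist(), autopct="%1.1f%%")
ax.set_title("Census blocks by ocean proximity")
```

In a Jupyter Notebook the figure is displayed automatically at the end of the cell; in a plain script, `plt.show()` would be needed.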
Task 2 (5 marks)
• How many values are missing from the total bedrooms variable? Print out the first 5 rows in the housing data with missing values for this variable.
• Create a violin plot showing how the distribution of median house value differs across the location of the block with respect to the ocean. Interpret the plot.
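One possible way to count missing values and draw a violin plot is sketched below. The data frame is again a made-up stand-in for housing.csv, and the plot uses plain matplotlib; seaborn's `violinplot` would be an equally reasonable choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for housing.csv; the values are illustrative only.
housing = pd.DataFrame({
    "total_bedrooms": [129.0, np.nan, 190.0, np.nan, 280.0, 213.0],
    "median_house_value": [452600, 358500, 352100, 98000, 120000, 341300],
    "ocean_proximity": ["NEAR BAY", "NEAR BAY", "NEAR BAY",
                        "INLAND", "INLAND", "INLAND"],
})

# Number of missing values, and the first rows where the value is missing.
n_missing = housing["total_bedrooms"].isna().sum()
missing_rows = housing[housing["total_bedrooms"].isna()].head(5)
print(n_missing)
print(missing_rows)

# Collect one array of house values per ocean_proximity category.
labels, values = [], []
for name, series in housing.groupby("ocean_proximity")["median_house_value"]:
    labels.append(name)
    values.append(series.to_numpy())

# One violin per category, with the median marked.
fig, ax = plt.subplots()
ax.violinplot(values, showmedians=True)
ax.set_xticks(range(1, len(labels) + 1))
ax.set_xticklabels(labels)
ax.set_xlabel("Ocean proximity")
ax.set_ylabel("Median house value (USD)")
ax.set_title("Median house value by ocean proximity")
```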
Task 3 (5 marks)
• Add a new column (age_category) to the housing data that classifies each block into one of five categories depending upon the median age of a house:
o 0-10 years – Cat1
o 10-20 years – Cat2
o 20-30 years – Cat3
o 30-40 years – Cat4
o >40 years – Cat5
• Print out the median age of a house and the age category for the first 10 rows in the dataset
• Create a grouped bar chart showing the breakdown of the age_category variable for each
category in the ocean proximity variable (i.e. put ocean proximity on the x-axis and age_category in the bars). Interpret the plot.
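A binning approach such as `pd.cut` may be convenient here. The sketch below uses a made-up stand-in for housing.csv and assumes that a boundary age belongs to the lower category (for example, an age of exactly 10 falls in Cat1); the task leaves that choice open.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for housing.csv; the values are illustrative only.
housing = pd.DataFrame({
    "housing_median_age": [5, 10, 15, 41, 52, 30, 21],
    "ocean_proximity": ["INLAND", "INLAND", "NEAR BAY", "NEAR BAY",
                        "NEAR BAY", "INLAND", "INLAND"],
})

# Assumption: bins are closed on the right, so 10 -> Cat1, 20 -> Cat2, etc.
housing["age_category"] = pd.cut(
    housing["housing_median_age"],
    bins=[0, 10, 20, 30, 40, np.inf],
    labels=["Cat1", "Cat2", "Cat3", "Cat4", "Cat5"],
    include_lowest=True,
)
print(housing[["housing_median_age", "age_category"]].head(10))

# Count blocks for every (ocean_proximity, age_category) pair.
counts = (
    housing.groupby(["ocean_proximity", "age_category"], observed=False)
    .size()
    .unstack(fill_value=0)
)

# Grouped bar chart: ocean proximity on the x-axis, one bar per category.
fig, ax = plt.subplots()
counts.plot(kind="bar", ax=ax)
ax.set_xlabel("Ocean proximity")
ax.set_ylabel("Number of census blocks")
ax.set_title("Age category by ocean proximity")
ax.legend(title="Age category")
```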
Task 4 (10 marks)
• Investigate the relationship between total bedrooms (x-axis) and median house value (y-axis) by creating a scatter plot with a linear regression line through it and a joint plot. Comment on what the plots say about this relationship.
• Filter the housing data so it only contains census blocks with a median income above $60,000 USD and a median house age less than 5 years. Print the first 5 rows of this filtered dataset. How many census blocks are in this filtered dataset?
• Randomly sample 500 rows from the housing dataset and store them in a variable. Print out the first 10 rows of this dataset.
• Create a scatter plot showing the relationship between total rooms (x-axis) and median house value (y-axis) for the randomly sampled data, colouring points on the chart based on a census block’s median income. Interpret the plot.
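The filtering, sampling and coloured scatter plot could be sketched as follows. The data frame is a made-up stand-in for housing.csv, the sample size is reduced to 5 because the stand-in is tiny, and `random_state` is an optional choice that makes the sample reproducible. Note that median_income is recorded in tens of thousands of US Dollars, so $60,000 corresponds to a value of 6.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for housing.csv; the values are illustrative only.
housing = pd.DataFrame({
    "housing_median_age": [2, 4, 30, 3, 45, 12, 4, 25],
    "total_rooms": [1500, 2200, 900, 3100, 700, 1800, 4000, 1200],
    "median_income": [7.5, 5.0, 8.1, 6.2, 2.4, 6.5, 9.0, 3.3],
    "median_house_value": [410000, 250000, 480000, 390000,
                           95000, 330000, 500000, 160000],
})

# Combine both conditions with & (each condition needs its own brackets).
filtered = housing[(housing["median_income"] > 6)
                   & (housing["housing_median_age"] < 5)]
print(filtered.head(5))
print(len(filtered))          # number of census blocks in the filtered data

# Random sample (n=500 on the real data; 5 here because the stand-in is small).
sample = housing.sample(n=5, random_state=1)
print(sample.head(10))

# Scatter plot with points coloured by median income.
fig, ax = plt.subplots()
points = ax.scatter(sample["total_rooms"], sample["median_house_value"],
                    c=sample["median_income"], cmap="viridis")
fig.colorbar(points, ax=ax, label="Median income (tens of thousands USD)")
ax.set_xlabel("Total rooms")
ax.set_ylabel("Median house value (USD)")
ax.set_title("House value vs total rooms, coloured by income")
```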

Task 5 (10 marks)
• Calculate the number of people per household for each census block. Print out the top ten census blocks with the largest number of people per household. What do you notice?
• Plot a histogram of the number of people per household and display its mean and median as straight and dashed lines respectively on the plot. Include the actual values of the mean and median in the legend.
• Create the same histogram, but this time only include census blocks with fewer than 10 people per household. Which plot do you prefer and why?
• Across the entire housing dataset, compare the data for the ten census blocks with the lowest median income against the data for the ten census blocks with the highest median income.
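A rough sketch of the derived column, the annotated histogram and the lowest/highest comparison is given below. The data frame is a made-up stand-in for housing.csv, the column name people_per_household is an illustrative choice, and the stand-in has only five rows, so the "top ten" calls simply return every row.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up stand-in for housing.csv; the values are illustrative only.
housing = pd.DataFrame({
    "population": [300, 1200, 900, 5000, 800],
    "households": [100, 400, 450, 10, 200],
    "median_income": [8.3, 2.1, 5.6, 1.5, 3.9],
})

# People per household for each census block.
housing["people_per_household"] = housing["population"] / housing["households"]
top = housing.nlargest(10, "people_per_household")
print(top)

mean_pph = housing["people_per_household"].mean()
median_pph = housing["people_per_household"].median()

# Histogram with the mean as a solid line and the median as a dashed line.
fig, ax = plt.subplots()
ax.hist(housing["people_per_household"], bins=20)
ax.axvline(mean_pph, color="red", linestyle="-",
           label=f"Mean = {mean_pph:.2f}")
ax.axvline(median_pph, color="black", linestyle="--",
           label=f"Median = {median_pph:.2f}")
ax.set_xlabel("People per household")
ax.set_ylabel("Number of census blocks")
ax.set_title("Distribution of people per household")
ax.legend()

# Restricting the data before plotting gives the second histogram.
below_10 = housing[housing["people_per_household"] < 10]

# Lowest and highest median income blocks (10 each on the real data).
lowest = housing.nsmallest(2, "median_income")
highest = housing.nlargest(2, "median_income")
print(lowest.describe())
print(highest.describe())
```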
Part 2 (25 marks)
This part of the assignment involves repeatedly simulating an automated (not interactive) game of Snakes and Ladders and calculating the probability of winning the game. Simulations are a common way to assess the probability of an event happening in real life. In this game, players start at step 1 and win if they get past step 100 within 30 moves. There are no snakes or ladders for the first ten steps. After this, each step has a 15% chance of being a ladder and a 20% chance of being a snake. Ladders take players forward a random number of steps between 1 and 10 and snakes take players back a random number of steps between 1 and 10. The steps which are ladders/snakes, along with their number of steps forward/back, should be determined before the start of a game. For each move, players roll a die and move forward the number of steps shown on the die (i.e. 1-6 steps). If a player lands on a ladder/snake, they go forward/back the appropriate number of steps before their next move.
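The rules above could be sketched with the standard library alone, as shown below. Several points are assumptions rather than part of the specification: "past step 100" is read as a position strictly greater than 100, ladders and snakes can occur on steps 11 to 100, a ladder or snake is applied once per move (no chaining if the player lands on another one), and the function names are illustrative.

```python
import random


def make_board(rng):
    """Return a dict mapping a step to its offset (+ladder / -snake)."""
    board = {}
    for step in range(11, 101):          # no snakes/ladders on steps 1-10
        roll = rng.random()
        if roll < 0.15:                  # 15% chance of a ladder
            board[step] = rng.randint(1, 10)
        elif roll < 0.35:                # next 20% chance of a snake
            board[step] = -rng.randint(1, 10)
    return board


def play_game(rng, max_moves=30):
    """Play one game on a freshly generated board; True means a win."""
    board = make_board(rng)
    position = 1
    for _ in range(max_moves):
        position += rng.randint(1, 6)        # roll the die
        position += board.get(position, 0)   # apply ladder/snake, if any
        if position > 100:                   # assumption: strictly past 100
            return True
    return False


def simulate(n_games=1000, seed=None):
    """Run n_games simulations and tally the outcomes."""
    rng = random.Random(seed)
    results = {"Won": 0, "Lost": 0}
    for _ in range(n_games):
        results["Won" if play_game(rng) else "Lost"] += 1
    return results


results = simulate(1000, seed=0)
print(results)
print("Estimated probability of winning:", results["Won"] / 1000)
```

Passing a seed makes the run reproducible; leaving it as `None` gives a different result on each run, which is closer to a fresh simulation.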
• Simulate the Snakes and Ladders game described above 1000 times. At the start of each game, the board should be reset (i.e. steps which are ladders/snakes should be recalculated).
• The results of each game should be recorded in a dictionary with two keys, Won and Lost, which store the number of games won and lost respectively (i.e. when a game is won, add one to the value of the Won key). Print out this dictionary after the 1000 simulations.
• Report the probability of winning based on the 1000 simulations.
Part 3 (35 marks)
A trader is developing a new trading strategy to apply on the stock market. He has collected stock price data to backtest his strategy (see how it performs on historical data to get an idea of how it will perform in real life). His strategy involves buying whenever the 5-period moving average of a stock’s daily closing price crosses its 20-period moving average and selling when it crosses back over. The strategy buys and sells at the opening price of the day after a buy/sell signal (i.e. if the 5-period moving average crosses the 20-period moving average on 01/01/2018, it will buy/sell at the opening price on 02/01/2018). The trader has asked for your help in evaluating the strategy on the stock data on iLearn. This data consists of the daily stock prices for ANZ, CBA, BHP, RIO and WES from January 2002 to August 2018. Data for each stock is provided in a separate CSV file. Each file contains the following fields:
o Date – date of the trading day
o Open – opening price of stock that day
o High – highest price of stock that day
o Low – lowest price of stock that day
o Close – closing price of stock that day
o Volume – amount of stock that was traded that day
o Member – whether the stock is a member of ASX200
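The crossover rule described above might be detected as in the sketch below. It assumes that a buy signal means the short moving average moving from at-or-below to above the long one, and a sell signal means the reverse; the function and column names (`add_crossover_signals`, `ma_fast`, `ma_slow`, `buy_signal`, `sell_signal`) are illustrative. The example uses short windows and made-up closing prices so the result can be checked by hand.

```python
import pandas as pd


def add_crossover_signals(prices, fast=5, slow=20):
    """Add moving averages and boolean crossover signals to a price frame."""
    out = prices.copy()
    out["ma_fast"] = out["Close"].rolling(fast).mean()
    out["ma_slow"] = out["Close"].rolling(slow).mean()

    # True on days where the fast average sits above the slow average.
    above = out["ma_fast"] > out["ma_slow"]
    was_above = above.shift(1, fill_value=False)

    # Assumption: a cross only counts once both averages exist on
    # consecutive days, so the first valid day is never a signal.
    valid = out["ma_slow"].notna() & out["ma_slow"].shift(1).notna()

    out["buy_signal"] = valid & above & ~was_above
    out["sell_signal"] = valid & ~above & was_above
    return out


# Made-up closing prices, with short windows for readability.
prices = pd.DataFrame({"Close": [10, 9, 8, 9, 11, 12, 9, 8, 7]})
signals = add_crossover_signals(prices, fast=2, slow=3)
print(signals)
```

With the default `fast=5` and `slow=20`, the same function gives the 5-period and 20-period averages used by the strategy.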
Task 1 (10 marks)
• Create a line plot using different colours to show the closing prices of the stock data over time.
• Calculate and display the correlations between the closing prices of the stock data. Create a heatmap showing these correlations. Comment on what the numbers and plot say about the correlation between the stocks.
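One way to approach this task is to put the closing prices into a single wide data frame, with one column per stock, and work from there. The sketch below uses made-up prices for three of the stocks; on the real data each column would presumably come from the Close field of the corresponding CSV file. The heatmap is drawn with plain matplotlib, although seaborn's `heatmap` would also work.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up closing prices; one column per stock, indexed by date.
dates = pd.date_range("2018-01-01", periods=5, freq="D")
closes = pd.DataFrame({
    "ANZ": [28.0, 28.4, 28.1, 28.9, 29.3],
    "CBA": [78.0, 78.9, 78.2, 79.5, 80.1],
    "BHP": [30.0, 29.5, 30.2, 29.8, 29.1],
}, index=dates)

# Line plot: pandas gives each column its own colour and legend entry.
fig1, ax1 = plt.subplots()
closes.plot(ax=ax1)
ax1.set_xlabel("Date")
ax1.set_ylabel("Closing price")
ax1.set_title("Daily closing prices")

# Pairwise Pearson correlations between the closing prices.
corr = closes.corr()
print(corr)

# Heatmap of the correlation matrix, annotated with the values.
fig2, ax2 = plt.subplots()
image = ax2.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax2.set_xticks(range(len(corr.columns)))
ax2.set_xticklabels(corr.columns)
ax2.set_yticks(range(len(corr.index)))
ax2.set_yticklabels(corr.index)
for i in range(len(corr)):
    for j in range(len(corr)):
        ax2.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")
fig2.colorbar(image, ax=ax2, label="Correlation")
ax2.set_title("Correlation of closing prices")
```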
Task 2 (15 marks)
• Calculate 5-period and 20-period moving averages for each stock. Print the last 5 values of each moving average for each stock.
• Create a separate line plot for each stock from 2008 – 2010, with the stock’s opening price, 5-period moving average and 20-period moving average displayed using different colours.
• Record the buy and sell transactions for each stock based on the strategy above and print out the buying/selling dates/prices for the first 10 transactions of each stock. Here’s some
additional information that might be helpful:
o There are no equity constraints (i.e. the trader is able to make a trade every time the
strategy conditions are met, regardless of how much money he has gained/lost
previously and how many other stocks he is currently holding).
o If a trade is open when the data for a stock ends, it should be closed (i.e. stocks sold)
on the last day that data is available for. For example, if a trade is open for BHP on
10/08/2018 (last day of data) it should be closed at the open price on that day.
o There are no transaction costs.
o 1000 stocks are purchased at a time.
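The transaction rules above could be turned into a simple loop, sketched below. The input is assumed to be a data frame for one stock with Date and Open columns plus two boolean columns, `buy_signal` and `sell_signal`, marking the days on which the moving averages crossed; those two column names and the function name are hypothetical. The signals in the example are made up to show the forced close on the last day.

```python
import pandas as pd


def record_transactions(df, quantity=1000):
    """Return a data frame of BUY/SELL transactions for one stock."""
    transactions = []
    holding = False
    last = len(df) - 1

    # A signal on the last day has no following day to trade on.
    for i in range(last):
        if df["buy_signal"].iloc[i] and not holding:
            action = "BUY"
        elif df["sell_signal"].iloc[i] and holding:
            action = "SELL"
        else:
            continue
        holding = action == "BUY"
        # Trades execute at the opening price of the following day.
        transactions.append({
            "Date": df["Date"].iloc[i + 1],
            "Action": action,
            "Price": df["Open"].iloc[i + 1],
            "Quantity": quantity,
        })

    # Close any trade still open at the open price of the last day.
    if holding:
        transactions.append({
            "Date": df["Date"].iloc[last],
            "Action": "SELL",
            "Price": df["Open"].iloc[last],
            "Quantity": quantity,
        })
    return pd.DataFrame(transactions)


# Made-up prices and signals for illustration.
example = pd.DataFrame({
    "Date": pd.date_range("2018-08-03", periods=6, freq="D"),
    "Open": [10, 11, 12, 13, 14, 15],
    "buy_signal": [False, True, False, False, True, False],
    "sell_signal": [False, False, False, True, False, False],
})
tx = record_transactions(example)
print(tx.head(10))
```

In this example the second buy executes on the final day and is closed at the same open price, giving a zero-profit trade; whether such trades should be kept is a judgment call.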
Task 3 (10 marks)
• Create a data frame containing information about all of the trades the strategy made (i.e. trades for every stock should be in the one data frame). For this task, a trade is a pair of buy and sell transactions. For example, if I buy 1000 shares in Westpac and then sell them three days later, I have made one trade, which is comprised of two transactions (buying the shares and then selling them). Print the first 10 rows and last 10 rows of the data frame. The data frame should have the following columns:

o Stock – name of the stock
o Open_Date – starting date of the trade
o Close_Date – closing date of the trade
o Open_Price – buying price of the trade
o Close_Price – selling price of the trade
o Quantity – trade quantity (hint – this should always be 1000 – see Task 2)
o Profit – profit of the trade
• Calculate descriptive statistics (sum/total, number of trades, mean, median, standard deviation, maximum and minimum) of the trading profits for each stock, as well as the total trading profits.
• Comment on the performance of the strategy, including which stock performed the best and whether you would recommend this strategy to the trader.
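Pairing transactions into trades and summarising the profits might look like the sketch below. The transaction log is made up, and its layout (Stock, Date, Action and Price columns) is a hypothetical one. The pairing assumes the log is sorted by stock and date and strictly alternates between BUY and SELL for each stock.

```python
import pandas as pd

# Made-up transaction log; layout and values are illustrative only.
transactions = pd.DataFrame({
    "Stock": ["BHP", "BHP", "BHP", "BHP", "WES", "WES"],
    "Date": pd.to_datetime(["2018-01-02", "2018-01-10", "2018-02-01",
                            "2018-02-09", "2018-01-05", "2018-01-15"]),
    "Action": ["BUY", "SELL", "BUY", "SELL", "BUY", "SELL"],
    "Price": [30.0, 32.5, 33.0, 31.0, 40.0, 41.0],
})

# Assumption: the n-th BUY row pairs with the n-th SELL row.
buys = transactions[transactions["Action"] == "BUY"].reset_index(drop=True)
sells = transactions[transactions["Action"] == "SELL"].reset_index(drop=True)

trades = pd.DataFrame({
    "Stock": buys["Stock"],
    "Open_Date": buys["Date"],
    "Close_Date": sells["Date"],
    "Open_Price": buys["Price"],
    "Close_Price": sells["Price"],
    "Quantity": 1000,
})
trades["Profit"] = (trades["Close_Price"] - trades["Open_Price"]) * trades["Quantity"]
print(trades.head(10))
print(trades.tail(10))

# Descriptive statistics of the profit for each stock, plus the overall total.
summary = trades.groupby("Stock")["Profit"].agg(
    ["sum", "count", "mean", "median", "std", "max", "min"])
total_profit = trades["Profit"].sum()
print(summary)
print("Total profit:", total_profit)
```

A stock with a single trade has an undefined standard deviation, which pandas reports as NaN.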