CS544 Module 2 Assignment
Part1) Probability – 20 points
Suppose that in a particular state, among the registered voters, 35% are democrats, 45% are republicans, and the rest are independents. Suppose that a ballot question is whether or not to impose a wealth tax on millionaires. Suppose that 75% of democrats, 20% of republicans, and 50% of independents favor the wealth tax. If a person chosen at random favors the wealth tax, what is the probability that the person is i) a democrat, ii) a republican, iii) an independent? Show the solutions with the calculations without using R. Then, verify your answers with the bayes function provided in the code samples.
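The verification step can be sketched in base R as follows. This is a minimal sketch only: bayes_sketch is a hypothetical stand-in written for illustration, not the bayes function from the course code samples, whose exact signature is not shown here. The 0.20 prior for independents is derived from "the rest" (100% minus 35% minus 45%).

```r
# Prior probabilities of party affiliation (from the problem statement;
# independents are "the rest": 1 - 0.35 - 0.45 = 0.20)
prior <- c(democrat = 0.35, republican = 0.45, independent = 0.20)

# Likelihoods: P(favors wealth tax | party)
likelihood <- c(democrat = 0.75, republican = 0.20, independent = 0.50)

# Hypothetical stand-in for the course's bayes function (assumption)
bayes_sketch <- function(prior, likelihood) {
  joint <- prior * likelihood   # P(party and favors the tax)
  joint / sum(joint)            # divide by the total probability P(favors)
}

posterior <- bayes_sketch(prior, likelihood)
print(round(posterior, 4))
```

The three posterior values should sum to 1, which is a quick sanity check on the hand calculation as well.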
Part2) Random Variables – 30 points
a) Consider the experiment of rolling a pair of dice. Using R, show how you would define a random variable for the absolute value of the difference of the two rolls.
b) Using the above result, what is the probability that the two rolls differ by exactly 2? What is the probability that the two rolls differ by at most 2? What is the probability that the two rolls differ by at least 3? Use the Prob function as shown in the code samples.
c) Show the marginal distribution of the above random variable (using R).
d) Using R, add another random variable to the above probability space using a user-defined function. The random variable is TRUE if the two rolls differ by exactly 2, and FALSE otherwise. Making use of this random variable, what is the probability that the two rolls differ by exactly 2? Show also the marginal distribution for this random variable.
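As a cross-check on the results from the Prob function, the same probability space can be built with base R alone. The sketch below is an assumption-laden illustration, not the approach from the code samples: the names S, U, V, and differs_by_2 are chosen here for illustration, and the dice are assumed fair.

```r
# Sample space for two fair dice: 36 equally likely outcomes
S <- expand.grid(X1 = 1:6, X2 = 1:6)
S$probs <- rep(1 / nrow(S), nrow(S))

# Random variable U: absolute difference of the two rolls
S$U <- abs(S$X1 - S$X2)

# Event probabilities: sum the probabilities of the matching outcomes
p_exactly_2  <- sum(S$probs[S$U == 2])
p_at_most_2  <- sum(S$probs[S$U <= 2])
p_at_least_3 <- sum(S$probs[S$U >= 3])

# Marginal distribution of U
marginal_U <- aggregate(probs ~ U, data = S, FUN = sum)

# Second random variable V from a user-defined function (hypothetical name)
differs_by_2 <- function(x1, x2) abs(x1 - x2) == 2
S$V <- differs_by_2(S$X1, S$X2)
marginal_V <- aggregate(probs ~ V, data = S, FUN = sum)

print(marginal_U)
print(marginal_V)
```

Since "at most 2" and "at least 3" are complementary events, their probabilities should add up to 1.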
Part3) Functions – 20 points
Using a for loop, write your own R function, evensum(data), that returns the sum of all the even values in the given numeric data vector.
Now, without using any loop, write your own R function, evensum2(data), that returns the sum of all the even values in the given numeric data vector.
Test both functions with sample data.
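One possible shape for the two functions is sketched below. It assumes the input is a numeric vector of whole numbers with no missing values; handling of NA or non-integer values is left out.

```r
# Loop version: accumulate each value that is divisible by 2
evensum <- function(data) {
  total <- 0
  for (x in data) {
    if (x %% 2 == 0) {
      total <- total + x
    }
  }
  total
}

# Vectorized version: select the even values with a logical index, then sum
evensum2 <- function(data) {
  sum(data[data %% 2 == 0])
}

sample_data <- 1:10
print(evensum(sample_data))
print(evensum2(sample_data))
```

Both calls should print the same value for any input; comparing the two on several vectors, including an empty one, is a simple way to test them.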
Sample output:
Part4) R – 30 points
Initialize the Dow Jones Industrials daily closing data as shown below:
dow <- read.csv('http://people.bu.edu/kalathur/datasets/DJI_2018.csv', stringsAsFactors = FALSE)
Provide the simplest R code and output for all of the following. The code should work for any given data.
a) Use the diff function to calculate the differences between consecutive values.
Insert the value 0 at the beginning of these differences. Add this result as the DIFFS column of the data frame.
b) How many days did the Dow close higher than its previous day value? How many days did the Dow close lower than its previous day value?
c) Show the subset of the data where there was a gain of at least 500 points from its previous day value.
d) Show the above result by adding the previous day value to the result in c) as below.
e) Provide the solution to compute the longest gaining streak of at least 200 points in the data. Show the data for that longest gaining streak. Hint: Use the rle function provided by R.
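The diff and rle steps can be illustrated on a small made-up data frame. Everything about the data here is an assumption: the ten closing values are invented, and the column names Date and Close are guesses at the layout of the real CSV, so adjust them to match the actual file. The streak is read as consecutive days that each gained at least 200 points, and the sketch assumes at least one such day exists.

```r
# Hypothetical closing values; the real data comes from the CSV above
dow <- data.frame(
  Date  = paste0("day", 1:10),
  Close = c(24000, 24250, 24500, 24400, 24950,
            25200, 25450, 25700, 25600, 25650),
  stringsAsFactors = FALSE
)

# Day-over-day differences, padded with 0 for the first day
dow$DIFFS <- c(0, diff(dow$Close))

# Count of days closing higher / lower than the previous day
days_up   <- sum(dow$DIFFS > 0)
days_down <- sum(dow$DIFFS < 0)

# Days with a gain of at least 500 points
big_gains <- subset(dow, DIFFS >= 500)

# Run-length encode the "gained at least 200" flag to find streaks
runs      <- rle(dow$DIFFS >= 200)
true_runs <- which(runs$values)                        # runs of TRUE only
best      <- true_runs[which.max(runs$lengths[true_runs])]
end_row   <- cumsum(runs$lengths)[best]                # last row of the run
start_row <- end_row - runs$lengths[best] + 1          # first row of the run
streak    <- dow[start_row:end_row, ]

print(big_gains)
print(streak)
```

Because the row positions are recovered from cumsum(runs$lengths) rather than hard-coded, the same steps apply to a data frame of any length.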
Submission:
Create a folder, CS544_HW2_lastName and place the following files in this folder.
Provide all R code in a single file, CS544_HW2_lastName.R. Clearly mark each subpart of each question and add appropriate comments.
Provide the corresponding outputs in a single Word document, CS544_HW2_lastName.doc.
Archive the folder (CS544_HW2_lastName.zip). Upload the zip file to the Assignments section of Blackboard.