Assignment 1: Python for Analytics, Winter
• Covers lectures 1-3.
• Due: Feb 5th by 6pm.
• Points will be deducted if:
▪ Problems are not completed.
▪ Portions of problems are not completed.
▪ Third-party modules were used when the question specified not to do so.
▪ The problem was solved in a very inefficient manner. For instance, copying and pasting the same block of code 10 times instead of using a for loop or using a for loop when a comprehension would work.
▪ The assignment is submitted late. Each day late will result in a 10% penalty.
▪ A problem is not attempted or is left blank. This will result in 0 points for the problem and an additional 5 point deduction.
Question 1 (10 points)
If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23.
Find the sum of all the multiples of 3 or 5 below 1000.
Do this using 2 methods.
• Explicitly testing whether 3 or 5 is a divisor inside the comprehension.
• Using a user-defined function (UDF) called inside the comprehension to test whether 3 or 5 is a divisor.
Print the resulting sum each time.
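As a minimal sketch of the two approaches, the example below uses the small case from the problem statement (multiples below 10, which should sum to 23). The helper name is_multiple_of_3_or_5 and the variable names are illustrative assumptions, not something the assignment prescribes.

```python
def is_multiple_of_3_or_5(n):
    """Return True if 3 or 5 divides n evenly (hypothetical helper name)."""
    return n % 3 == 0 or n % 5 == 0

limit = 10  # small case from the problem statement

# Method 1: test divisibility directly inside the comprehension
inline_sum = sum([n for n in range(1, limit) if n % 3 == 0 or n % 5 == 0])

# Method 2: delegate the test to the helper function
udf_sum = sum([n for n in range(1, limit) if is_multiple_of_3_or_5(n)])

print(inline_sum)  # 23
print(udf_sum)     # 23
```

Changing limit to 1000 applies the same logic to the full problem.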
Question 2 (10 points)
The code snippet below downloads daily closing prices for SPY, an ETF that tracks the S&P 500 index.
Using a for loop and enumerate, create a list of 5-period moving averages. For instance, the first 5 values of the list are [321.56, 319.12, 320.34, 319.44, 321.14] and their average is 320.32, so the first entry in the new list would be 320.32.
Write your own user-defined function (UDF) to calculate the mean and use it in the for loop.
Print the last 5 items of the list and the sum of the list of 5-period moving averages.
In [89]:
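import yfinance as yf
# Note: the result of yf.download is overwritten by the Ticker object on the next line
SPY = yf.download('SPY')
SPY = yf.Ticker('SPY')
# Daily closing prices for the requested date range, as a plain Python list
spy_lst = SPY.history(start="2020-01-01", end="2020-02-01")["Close"].tolist()
print(spy_lst[0:5])
print(sum(spy_lst[0:5])/5)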
[*********************100%***********************] 1 of 1 completed
[321.56, 319.12, 320.34, 319.44, 321.14]
320.32
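The windowing idea can be sketched on a short made-up list (not SPY prices) with a 3-period window. This sketch assumes each window starts at index i and looks forward, which matches the example above where the first entry is the mean of the first 5 values. The names mean, prices and window are illustrative.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

prices = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # made-up data, not SPY prices
window = 3

moving_averages = []
for i, _ in enumerate(prices):
    # Only average when a full window of `window` values starts at index i
    if i + window <= len(prices):
        moving_averages.append(mean(prices[i:i + window]))

print(moving_averages)  # [2.0, 3.0, 4.0, 5.0, 6.0]
```

A list of 7 values yields 5 averages for a 3-period window, since the last two positions cannot start a full window.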
Question 3 (10 points)
Consider the list of transactions, where each inner list is a line item on a receipt. For instance, the first inner list ["A", "item_a"] indicates that "A" bought "item_a". Iterate over the list and return a dictionary where each key is a user id (A, B, C, D) and each value is a list of the items that user bought.
The desired output for "A" can be seen in sample_dct.
Do not include the first item in the list, ["User", "Item"], which can be considered a header.
Be sure your solution is scalable, meaning it should work for a list of transactions of any size.
Print the dictionary out at the end.
In [13]:
transactions = [
    ["User", "Item"],
    ["A", "item_a"],
    ["B", "item_a"],
    ["C", "item_a"],
    ["C", "item_b"],
    ["C", "item_c"],
    ["B", "item_c"],
    ["D", "item_b"],
    ["D", "item_b"]
]
sample_dct = {
    "A": ["item_a"]
}
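One way to group rows by key, shown here on a tiny made-up list rather than the transactions above, is dict.setdefault. The names rows and purchases are illustrative, and this is only one of several reasonable approaches.

```python
rows = [
    ["User", "Item"],   # header row, skipped by the slice below
    ["X", "item_1"],
    ["Y", "item_1"],
    ["X", "item_2"],
]

purchases = {}
for user, item in rows[1:]:
    # setdefault creates an empty list the first time a user is seen
    purchases.setdefault(user, []).append(item)

print(purchases)  # {'X': ['item_1', 'item_2'], 'Y': ['item_1']}
```

Because the loop makes a single pass and never hard-codes a user id, it works unchanged for any number of rows or users.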
Question 4 (10 points)
A string can be sliced just like a list, using bracket notation. Find the 3 consecutive digits, and their index positions, that have the greatest sum in the number 35240553124353235435234214323451282192551204321 (assigned to a in the cell below).
As an example, the string "1324" has 2 three-digit windows, 132 and 324. The first sums to 6 and the latter sums to 9. Thus the 3 numbers would be [3,2,4] and the indices would be [1,2,3].
Print out the 3 numbers, the 3 indices where they occur and their sum.
In [14]:
sample = "1324"
# results should be
numbers = [3,2,4]
max_val = 9
index_vals = [1,2,3]
In [15]:
a = "35240553124353235435234214323451282192551204321"
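The sliding-window idea can be sketched on the "1324" sample. The variable names best_sum and best_start are illustrative, and keeping the earliest window when two windows tie is an assumption, since the problem does not say how to break ties.

```python
sample = "1324"
window = 3

best_sum = -1
best_start = 0
for start in range(len(sample) - window + 1):
    # Slice the string like a list, then convert each character to an int
    digits = [int(ch) for ch in sample[start:start + window]]
    if sum(digits) > best_sum:  # strict '>' keeps the earliest window on ties
        best_sum = sum(digits)
        best_start = start

numbers = [int(ch) for ch in sample[best_start:best_start + window]]
index_vals = list(range(best_start, best_start + window))

print(numbers, index_vals, best_sum)  # [3, 2, 4] [1, 2, 3] 9
```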
Question 5 (15 points)
The sum of the squares of the first ten natural numbers is
• 1^2 + 2^2 + … + 10^2 = 385
The square of the sum of the first ten natural numbers is
• (1 + 2 + … + 10)^2 = 3025
Hence the difference between the sum of the squares of the first ten natural numbers and the square of the sum is
• 3025 – 385 = 2640.
Write a function, or collection of functions, that finds the difference between the square of the sum and the sum of the squares of the numbers from 1 to n. To solve the problem you have to find:
• the sum of the squares
• the square of the sum
• the difference
This can be broken up into smaller functions, with one function making use of the smaller ones, or all be done in one function.
Add an assert statement to your function, to make sure the input is a positive int.
Test the function using n=12 and n=8 and print the results.
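A possible decomposition into small functions is sketched below and checked against the n=10 example from the problem statement. The function names are illustrative assumptions, and the assert shown is only one way to validate the input.

```python
def sum_of_squares(n):
    """1^2 + 2^2 + ... + n^2"""
    return sum(i ** 2 for i in range(1, n + 1))

def square_of_sum(n):
    """(1 + 2 + ... + n)^2"""
    return sum(range(1, n + 1)) ** 2

def square_sum_difference(n):
    # Reject non-integers and values below 1 before doing any work
    assert isinstance(n, int) and n > 0, "n must be a positive integer"
    return square_of_sum(n) - sum_of_squares(n)

print(square_sum_difference(10))  # 2640
```

Note that assert statements are skipped when Python runs with the -O flag, so this style of check suits coursework better than production input validation.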
Question 6 (20 points)
Make a function, or group of functions, to find outlier datapoints in a list. Outliers should be identified using the standard deviation, and the user should have some ability to control the outlier threshold. For instance, setting the threshold to 2 or 3 standard deviations from the mean should be possible. Note, to solve this problem you will have to:
• find the mean of a list
• find the standard deviation of a list
• convert the list to z-scores using (x-mean)/std
• find the indices of where the outlier datapoints are, using the z-scores
• return the outlier values and the indices they occur at.
Test your function(s) using the stock price data for TSLA below. Keep the same date range as coded below.
The standard deviation can be calculated as follows (https://numpy.org/doc/stable/reference/generated/numpy.std.html):
• std = sqrt(mean(abs(x - x.mean())**2))
Print out the average, standard deviation, outliers and the index position of where the outliers occur.
Again, this can be done in one big function or a collection of smaller ones that are then used inside a final function to find the outliers.
NOTE: ASIDE FROM CHECKING WORK, THE ONLY PIECE OF IMPORTED CODE TO BE USED IS sqrt from math and the imported data from yfinance.
In [73]:
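import yfinance as yf
from math import sqrt
# Note: the result of yf.download is overwritten by the Ticker object on the next line
TSLA = yf.download('TSLA')
TSLA = yf.Ticker('TSLA')
# Daily closing prices for the requested date range, as a plain Python list
tsla_lst = TSLA.history(start="2019-01-01", end="2020-04-01")["Close"].tolist()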
[*********************100%***********************] 1 of 1 completed
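The steps listed for this question can be sketched on a small made-up list (not TSLA prices). The function names, the default threshold of 2.0, and the choice to flag values whose absolute z-score exceeds the threshold (both tails) are assumptions for illustration. The standard deviation follows the population formula given above.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def std(values):
    """Population standard deviation: sqrt(mean(abs(x - mean)**2))."""
    mu = mean(values)
    return sqrt(mean([abs(x - mu) ** 2 for x in values]))

def find_outliers(values, threshold=2.0):
    """Return (outlier values, their indices) where |z-score| > threshold."""
    mu, sigma = mean(values), std(values)
    zscores = [(x - mu) / sigma for x in values]
    indices = [i for i, z in enumerate(zscores) if abs(z) > threshold]
    return [values[i] for i in indices], indices

data = [10, 10, 10, 10, 10, 10, 10, 10, 10, 100]  # made-up data
outliers, positions = find_outliers(data, threshold=2.0)
print(mean(data), std(data))  # 19.0 27.0
print(outliers, positions)    # [100] [9]
```

This sketch divides by the standard deviation without checking for zero, so a list of identical values would raise ZeroDivisionError.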
Question 7 (25 points)
Make a class to profile a list of data. Include methods that do the following:
• Initializes the object and creates an attribute, data, that holds a list of data.
• Converts the list to z-scores. Note that nothing is returned here; instead, the result is bound to the instance using self and overwrites the value of the data attribute.
• Returns an n-period moving average.
• Returns an n-period cumulative sum.
• Returns the standard deviation.
• Returns the range as a tuple.
• Returns the mean.
• Returns the median.
The standard deviation can be calculated as follows (https://numpy.org/doc/stable/reference/generated/numpy.std.html):
• std = sqrt(mean(abs(x - x.mean())**2))
Zscore conversions can be checked using the below:
• https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.zscore.html
Test a few of your methods on the SPY data from yfinance above.
NOTE: ASIDE FROM CHECKING WORK, THE ONLY PIECE OF IMPORTED CODE TO BE USED IS sqrt from math and the imported data from yfinance.
In [17]:
from math import sqrt
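The difference between methods that return a value and a method that rebinds self.data can be sketched with a partial class. The class name ListProfiler, the method names, and the reading of "range" as a (min, max) tuple are assumptions for illustration. The moving average, cumulative sum and median methods are not shown.

```python
from math import sqrt

class ListProfiler:  # hypothetical class name
    def __init__(self, data):
        self.data = list(data)  # copy so the caller's list is not modified

    def mean(self):
        return sum(self.data) / len(self.data)

    def std(self):
        """Population standard deviation of self.data."""
        mu = self.mean()
        return sqrt(sum(abs(x - mu) ** 2 for x in self.data) / len(self.data))

    def to_zscores(self):
        """Overwrite self.data with z-scores; deliberately returns nothing."""
        mu, sigma = self.mean(), self.std()
        self.data = [(x - mu) / sigma for x in self.data]

    def data_range(self):
        # Assumes "range" means the (min, max) pair
        return (min(self.data), max(self.data))

profile = ListProfiler([2, 4, 4, 4, 5, 5, 7, 9])  # made-up data
print(profile.mean(), profile.std())  # 5.0 2.0
profile.to_zscores()
print(profile.data_range())  # (-1.5, 2.0)
```

Because to_zscores replaces self.data, every method called afterwards operates on the z-scores rather than the original values.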