
Always explore your data
We have three files that simulate a potential A/B test measuring clicks and views.


sim_ab_test_assignment.csv
sim_ab_test_clicks.csv
sim_ab_test_views.csv

Load the files into sensible variable names using numpy.

import numpy as np

clicks = np.loadtxt("./sim_ab_test_clicks.csv", delimiter=",")
views = np.loadtxt("./sim_ab_test_views.csv", delimiter=",")
groups = np.loadtxt("./sim_ab_test_assignment.csv", delimiter=",", dtype=str)
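
The loading step can be tried without the files. The sketch below is a minimal illustration that uses made-up in-memory text in place of the real CSVs. The names `clicks_csv`, `groups_csv`, `demo_clicks`, and `demo_groups` are hypothetical, and the "False" label is an assumption about how the control group is encoded.

```python
import io
import numpy as np

# Hypothetical in-memory stand-ins for the CSV files (3 users x 2 columns).
clicks_csv = io.StringIO("0,1\n1,0\n0,0\n")
groups_csv = io.StringIO("True\nFalse\nTrue\n")

# Numeric files parse to float64 by default.
demo_clicks = np.loadtxt(clicks_csv, delimiter=",")

# The assignment labels are text, so they need dtype=str.
demo_groups = np.loadtxt(groups_csv, delimiter=",", dtype=str)

# Comparing against the string "True" gives a boolean mask for row selection.
is_treat = demo_groups == "True"

print(demo_clicks.dtype, demo_clicks.shape)
print(demo_groups, is_treat)
```

Loading the labels as strings means the group test has to compare against the text "True" rather than the Python boolean `True`.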

Examine the data a bit

What are the dimensions of each CSV? The answer should raise some questions. What are our guesses?

Can we look at some sample values in the data? Try looking at the first few rows and columns.
Try looking at the last few rows and columns.

print(clicks.shape)
print(views.shape)
print(groups.shape)

(1000, 14)
(1000, 14)

clicks[:3,:5]

array([[0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1.]])
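
The same slicing works from the other end with negative indices. This is a minimal sketch on a small made-up array (`demo`, a hypothetical name) rather than the real `clicks` data, so the printed values say nothing about the actual files.

```python
import numpy as np

# Hypothetical stand-in for the clicks array: 6 users (rows) x 8 columns.
demo = np.arange(48).reshape(6, 8)

# First 3 rows and first 5 columns, as in the cell above.
head = demo[:3, :5]

# Last 3 rows and last 5 columns, using negative indices.
tail = demo[-3:, -5:]

print(head.shape, tail.shape)
print(tail)
```

On the real data the equivalent call would be `clicks[-3:, -5:]`.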

It's important to track a few summary statistics that will tell us if the data is bad, such as totals, missing values, or negative counts.

Calculate some of these statistics
Use print() to print out a human readable message that includes these statistics.

clicks.sum() / clicks.shape[1]

266.2142857142857
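
One way to turn such checks into a human-readable message is sketched below. The toy array and the `summarize` helper are hypothetical, chosen only so that one NaN and one negative value show up. They are not taken from the real files.

```python
import numpy as np

# Hypothetical toy data, not the real CSV contents: 4 users x 3 columns.
demo_clicks = np.array([[0., 1., 0.],
                        [2., 0., np.nan],
                        [0., -1., 1.],
                        [1., 0., 0.]])

def summarize(name, arr):
    """Return a dict of basic data-quality statistics for a 2D array."""
    return {
        "name": name,
        "shape": arr.shape,
        "n_nan": int(np.isnan(arr).sum()),
        "n_negative": int((arr < 0).sum()),  # NaN < 0 is False, so NaNs are not counted here
        "total": float(np.nansum(arr)),      # nansum skips NaNs; plain sum() would return nan
    }

stats = summarize("demo_clicks", demo_clicks)
print(f"{stats['name']}: shape={stats['shape']}, "
      f"{stats['n_nan']} NaN values, {stats['n_negative']} negative values, "
      f"total={stats['total']:.0f}")
```

Counting NaNs separately matters because a single NaN turns a plain `sum()` into NaN, which would hide every other statistic built on it.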

is_neg_clicks = clicks < 0
print(is_neg_clicks.shape)
print(is_neg_clicks[:4,:4])

(1000, 14)
[[False False False False]
 [False False False False]
 [False False False False]
 [False False False False]]

# a single NaN makes the whole sum NaN
np.array([np.nan,1,2,0]).sum()

What kind of mathematical notation would you use to describe our data?

Now use that notation to express (no code)

total clicks over all users / total views over all users
Average click-through rate (CTR)

Now write the code to do these calculations (don't use a for-loop!)

Total over total

clicks.sum() / views.sum()

0.05291929346282728

Average CTR

def get_avg_ctr(clicks, views):
    # per-user totals: sum across the columns of each row
    user_clicks = np.apply_along_axis(sum, 1, clicks)
    user_views = np.apply_along_axis(sum, 1, views)
    ctrs = user_clicks / user_views
    return np.sum(ctrs) / len(ctrs)

user_views = np.apply_along_axis(sum, 1, views)

Let's compare the 2 metrics:

Remove one of the most active members from the data and re-calculate both metrics.
How would you compare the metric before/after the removal? Calculate something and print() it out.
Which metric would you recommend?

is_most_active = user_views == np.max(user_views)
is_most_active.sum()

not_most_active = np.logical_not(is_most_active)
chill_views = views[not_most_active, :]
chill_views.shape

chill_clicks = clicks[not_most_active, :]
chill_clicks.shape

Calculate the recommended metric for each of the treatment groups.

is_treat = groups == "True"
clicks_treat = clicks[is_treat, :]
views_treat = views[is_treat, :]
clicks_control = clicks[np.logical_not(is_treat), :]
views_control = views[np.logical_not(is_treat), :]

clicks_treat.sum() / views_treat.sum()

0.10317675807765408

clicks_control.sum() / views_control.sum()

0.05014607835792943
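
One possible notation for the two metrics: let c_ij and v_ij be the clicks and views of user i in column j, with N users. Total over total is then (sum_i sum_j c_ij) / (sum_i sum_j v_ij), and average CTR is (1/N) * sum_i (sum_j c_ij / sum_j v_ij). The sketch below compares both metrics before and after dropping the most active user. It runs on made-up toy arrays, not the real data, and the function names are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 4 users (rows) x 3 columns. The last user is far more active.
demo_clicks = np.array([[1., 0., 1.],
                        [0., 1., 0.],
                        [0., 0., 1.],
                        [5., 5., 5.]])
demo_views = np.array([[2., 2., 4.],
                       [2., 3., 5.],
                       [4., 4., 2.],
                       [100., 100., 100.]])

def total_over_total(clicks, views):
    """Total clicks over all users divided by total views over all users."""
    return clicks.sum() / views.sum()

def avg_ctr(clicks, views):
    """Mean of per-user click-through rates (assumes every user has at least one view)."""
    return (clicks.sum(axis=1) / views.sum(axis=1)).mean()

# Boolean mask that drops the user(s) with the most total views.
user_views = demo_views.sum(axis=1)
keep = user_views != user_views.max()

tot_before = total_over_total(demo_clicks, demo_views)
tot_after = total_over_total(demo_clicks[keep], demo_views[keep])
avg_before = avg_ctr(demo_clicks, demo_views)
avg_after = avg_ctr(demo_clicks[keep], demo_views[keep])

print(f"total over total: before={tot_before:.4f}, after={tot_after:.4f}, "
      f"relative change={(tot_after - tot_before) / tot_before:+.1%}")
print(f"average CTR:      before={avg_before:.4f}, after={avg_after:.4f}, "
      f"relative change={(avg_after - avg_before) / avg_before:+.1%}")
```

In this toy example the total-over-total ratio moves much more than the average CTR when the heaviest user is dropped. Whether the same holds for the real data has to be checked by running the comparison on `clicks` and `views`.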