
Experiment Design & Analysis


version = "REPLACE_PACKAGE_VERSION"

Experiment Design and Analysis
School of Information, University of Michigan
What are experiments?

Experimental Design

Lab vs. Field Experiments

Online Field Experiments

The objective of this part is to:
Apply theory of experiment design and knowledge of analysis techniques to real experiment data.

Resources:
StatsModels

We recommend using a Python library called StatsModels for data analysis.

Dataset used in this part: First-Price Auction data (CSV file)

Source for dataset: Chen, Y., et al. Sealed bid auctions with ambiguity: Theory and experiments. (2007).

import pandas as pd
import statsmodels.api as sm
import statsmodels.stats.api as sms
from scipy import stats
# You may or may not use all of the above libraries, and that is OK!
data = pd.read_csv('assets/Part1_data.csv')  # Data for this part

Suppose subjects were randomly assigned to two treatment groups. We want to know whether the randomization was properly applied to these groups. In other words, we want to know whether the demographic composition of the participants differs between the two treatments.

To determine whether the randomization worked, for each of the two treatments, modify the following stats_calculator function so that it takes the data dataframe as input and tabulates the mean, standard deviation, minimum, and maximum of the following variables: female, age, number of siblings, white, asian, african american, hispanic, and other ethnicities. (6 points)
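As a rough illustration of the kind of balance table described above, the sketch below computes the four statistics per treatment on a small made-up dataframe. The toy values and the helper name `summarize_by_treatment` are assumptions for illustration only; the treatment labels and column names mirror the assignment's variables, but the real data must be loaded from the provided CSV.

```python
import pandas as pd

# Hypothetical toy data: values are made up, only the column names
# follow the variables named in the assignment.
toy = pd.DataFrame({
    "treatment": ["k1_8_exp_lot"] * 3 + ["k1_8_lot_exp"] * 3,
    "female": [1, 0, 1, 0, 1, 1],
    "age": [20, 22, 25, 19, 23, 24],
})


def summarize_by_treatment(df, variables):
    """Return mean, std, min and max of each variable within each treatment."""
    summary = df.groupby("treatment")[variables].agg(["mean", "std", "min", "max"])
    return summary.round(2)


balance_table = summarize_by_treatment(toy, ["female", "age"])
print(balance_table)
```

If randomization worked, the per-treatment means of each demographic variable should look similar; large gaps would be a reason to test the difference formally.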

import numpy as np

def stats_calculator(provided_data):
    """
    Write the function so that it fills in the mean, standard deviation,
    minimum and maximum of the following variables: Female, Age,
    Number of siblings, White, Asian, African American, Hispanic, and Other ethnicities.

    It should return a dataframe with these calculations based on the partially-completed dataframe below.
    """

Your function should return a dataframe with each of the variables and their completed statistics. Check that it does:

stats_calculator(data)

   variable   mean   std. dev.  max  min
0  female     0.64   0.48       1    0
1  age        22.51  3.49       31   18
2  siblings   1.64   1.2        5    0
3  white      0.47   0.5        1.0  0.0
4  asian      0.27   0.44       1.0  0.0
5  african    0.11   0.31       1.0  0.0
6  hispanic   0.08   0.25       1.0  0.0
7  other      0.08   0.27       1    0

"""Check that the function above outputs the (rounded) statistics"""
assert stats_calculator(data).iloc[0, 1] == 0.64, "Part A #1 female mean value differs"
assert stats_calculator(data).iloc[1, 2] == 3.49, "Part A #1 age std. dev value differs"

"""Hidden test Part A: Check that the function above outputs the (rounded) statistics"""
# Hidden tests

"""Part A: Check that the function above outputs the (rounded) statistics"""
# Hidden tests

Part B (4 points)

We can also use a more objective measure to identify if our treatment groups were properly randomized.

Using a t-test (make sure you use the correct type of t-test) and the data dataframe again, analyze the differences between the two treatment groups (k1_8_exp_lot and k1_8_lot_exp) for the female, age, and hispanic demographic variables by completing the following objective_randomization function. (4 points)

Round any calculations to the hundredth decimal. Do not use percentages.
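Because the two treatment groups consist of different subjects, one natural candidate is an independent two-sample t-test rather than a paired test. The sketch below runs `scipy.stats.ttest_ind` on two synthetic samples; the sample sizes, means, and seed are made up, and whether to assume equal variances (the default) or use Welch's variant (`equal_var=False`) is a judgment call you should make for the real data.

```python
import numpy as np
from scipy import stats

# Synthetic "age" samples for two hypothetical treatment groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=22.5, scale=3.5, size=40)
group_b = rng.normal(loc=22.5, scale=3.5, size=40)

# Independent two-sample t-test assuming equal variances (the scipy default).
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Welch's t-test, which does not assume equal variances.
welch_t, welch_p = stats.ttest_ind(group_a, group_b, equal_var=False)

print(round(float(t_stat), 2), round(float(p_value), 2))
print(round(float(welch_t), 2), round(float(welch_p), 2))
```

A large p-value means the test found no evidence that the groups differ on that variable, which is what a successful randomization should usually produce.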

def objective_randomization(provided_data):
    """
    Complete the function that takes the provided data and runs a t-test on the
    female, age, and hispanic demographic variables between the two treatments
    and outputs the results in the following partially-completed dataframe.
    Round your results to the nearest hundredth.
    Tip: you can choose to use either the statsmodels stats library or the scipy stats library to calculate the t-statistic and p-value.
    """

Your function should return a dataframe with each of the variables and their completed t-statistic and p-value across the treatments.

Check that it does:

"""Check that the function above outputs the required statistics"""
result = objective_randomization(data)
assert abs(result.iloc[0, 1]) == 0.23, "checking the value of the female t-statistic"

"""Part B #1: Check that the function above outputs the required statistics"""
# Hidden tests

The objective of this part is to:
Apply theory of experiment design and knowledge of analysis techniques to real experiment data.

Resources:
StatsModels, Scipy.stats, Numpy

We recommend using two Python libraries called StatsModels and Scipy.stats for data analysis. For this dataset, we'll be using Numpy as well.

Optional Reading: Holt, C. A., & Laury, S. K. Risk Aversion and Incentive Effects. (2002).

Datasets used in this part:

Trust data (CSV file)
First-Price Auction data (CSV file). Source for dataset: Chen, Y., et al. Sealed bid auctions with ambiguity: Theory and experiments. (2007).

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms
from scipy import stats
# You may or may not use all of the above libraries, and that is OK!

trust_data = pd.read_csv('assets/Part2_data1.csv')  # Trust Game data for this part
fpa_data = pd.read_csv('assets/Part2_data2.csv')  # First-price auction data for this part (the same dataset as in the previous part)

# View the readme files for these datasets (they include an explanation of the variable names)
!cat assets/Part2_data1_readme.md
!cat assets/Part2_data2_readme.md

# View a snippet of each csv file
trust_data.head()
fpa_data.head()

### Assignment Topic: Data analysis of a laboratory experiment on trust

### Background:
We provide data files from laboratory experiments conducted at the University of Michigan.

Subjects are grouped in pairs; within each pair, one subject is assigned the role of investor and the other the role of recipient.

– The investor holds a set amount of money and can choose to give any fraction of that amount to the recipient – or none.

– The amount given is multiplied by a set amount and the recipient can choose to give any fraction of the multiplied amount back to the investor – or none.

The data given was collected from an experiment involving the Trust Game and it contains decisions from the “investors” and “recipients.”

The Trust Game data has the following variables:
– Period: period in which the game was held
– group #: pair the player was in
– player #: order or role the player had
– player role: first if investor, second if recipient
– decision type: INVEST if investor, RETURN if recipient
– decision: amount invested or returned
– payoff: if investor – amount kept + amount returned, if recipient – (3 * amount received) – amount returned
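
The payoff rule in the list above can be sketched as a small function. The endowment of 10 and the example amounts are hypothetical; the multiplier of 3 follows the payoff description, and the function name is an assumption for illustration.

```python
def trust_game_payoffs(endowment, invested, returned, multiplier=3):
    """Return (investor_payoff, recipient_payoff) for one investor-recipient pair."""
    if not 0 <= invested <= endowment:
        raise ValueError("invested must be between 0 and the endowment")
    received = multiplier * invested  # amount the recipient holds after multiplication
    if not 0 <= returned <= received:
        raise ValueError("returned must be between 0 and the multiplied amount")
    investor_payoff = (endowment - invested) + returned  # amount kept + amount returned
    recipient_payoff = received - returned               # multiplied amount - amount returned
    return investor_payoff, recipient_payoff


# Hypothetical example: endowment of 10, investor sends 4, recipient returns 6.
print(trust_game_payoffs(endowment=10, invested=4, returned=6))  # (12, 6)
```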

### Assignment Topic: Data analysis of a laboratory experiment on first-price auction

### Background:
We provide data files from a laboratory experiment conducted at the University of Michigan.

There are ten experimental sessions, with eight subjects per session. In this context, subjects are tasked with completing auction and lottery (Holt-Laury 2002) tasks in two orders.

– In five of the ten sessions, subjects first complete a lottery task, followed by 30 rounds of auctions.
– In the other five sessions, subjects first complete 30 rounds of auctions, followed by a lottery task.

At the end of each session, subjects complete a demographics survey.

The data has the following variables:
– treatment: the treatment received by the subject
– session: the session in which the data was collected in the experiment
– period: the period in the session (multiple periods per session)
– subject: unique identifier for each subject
– disttype: distribution type, high or low
– highdist: if the subjects’ values are from the high distribution type
– lowdist: if the subjects’ values are from the low distribution type
– group: group number
– v: value of the subjects’ object
– b: how much the subjects bid for the object
– highbid: the highest bid in the group
– lowbid: the lowest bid in the group
– buy: whether subjects win the auction
– buy_yes: if subjects win the auction
– buy_no: if subjects do not win the auction
– profit: profit for the period
– cumprof: accumulated profit from the first period
– timeb: time the bid is submitted
– new: new data indicator
– lottery_profit: profit from the Holt-Laury lottery
– choice1-10: choosing lottery A results in a value of 1; choosing lottery B results in a value of 2
– error: if subjects do not follow the pattern that they switch from lottery A to B and continue to choose B
– ra: risk-attitude
– ra_adj: censored risk-attitude
– ra1-5: if ra_adj=4, ra1=1 … if ra_adj=8, ra5=1
– pclab: PC lab number
– gender: gender
– male: male
– female: female
– ethnic: ethnicity
– white, asian, african, hispanic, native, other: 1 if category applies
– age: age
– siblings: number of siblings
– personality: optimistic, pessimistic, or neither
– optim, pessim, neither: 1 if applies
– emotions: emotions
– anger, anxiety, confusion, contentment, fatigue, happiness, irritation, moodswings, withdrawal: 1 if applies
– major: major field type
– sdmajor: self-declared major
– major1: math or statistics major
– major2: science or engineering major
– major3: economics or business major
– major4: other soc. science major
– major5: humanities or other major
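
To make the error variable in the list above more concrete, the sketch below flags a sequence of ten lottery choices in which a subject returns to lottery A after having chosen lottery B. This is one plausible reading of the description; the dataset's own coding rule may differ in edge cases, and the function name is an assumption.

```python
def has_switching_error(choices):
    """Return True if lottery A (1) is chosen after lottery B (2) has been chosen."""
    seen_b = False
    for choice in choices:
        if choice == 2:
            seen_b = True
        elif choice == 1 and seen_b:
            return True  # switched back from B to A
    return False


consistent = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]    # switches once from A to B
inconsistent = [1, 1, 2, 1, 2, 2, 2, 2, 2, 2]  # returns to A after choosing B

print(has_switching_error(consistent))    # False
print(has_switching_error(inconsistent))  # True
```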

treatment session period subject disttype highdist lowdist group v b … irritation moodswings withdrawal major sdmajor major1 major2 major3 major4 major5
0 k1_8_exp_lot 061018_1 1 1 Low 0 1 1 48 40 … 0 0 0 2 Electrical Engineering – Signal Processing (st… 0 1 0 0 0
1 k1_8_exp_lot 061018_1 1 2 High 1 0 4 76 15 … 0 0 0 4 public health 0 0 0 1 0
2 k1_8_exp_lot 061018_1 1 3 High 1 0 3 73 53 … 0 0 0 5 german and film and video studies 0 0 0 0 1
3 k1_8_exp_lot 061018_1 1 4 High 1 0 4 74 51 … 1 0 0 5 spanish 0 0 0 0 1
4 k1_8_exp_lot 061018_1 1 5 Low 0 1 1 72 52 … 0 1 0 2 Engineering 0 1 0 0 0

5 rows × 72 columns

Part A (3 points)
For the Trust Game, subjects are grouped in pairs, with one subject assigned the role of investor and the other the role of recipient. Let's examine the correlation between the amounts the investors invest and the amounts the recipients return. Complete the function below to return the correlation coefficient. (3 points)

Round any calculations to the hundredth decimal. Do not use percentages.
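As a minimal sketch of the numpy call involved, the example below computes a Pearson correlation coefficient for two short made-up arrays. In the real trust data, investor and recipient decisions sit in separate rows, so they would first need to be separated by role and aligned pair by pair; that step is not shown here, and the array names are assumptions.

```python
import numpy as np

# Made-up amounts for five hypothetical investor-recipient pairs.
invested = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
returned = np.array([1.0, 5.0, 6.0, 12.0, 14.0])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the correlation between the two inputs.
corr_matrix = np.corrcoef(invested, returned)
r = round(corr_matrix[0, 1], 2)

print(r, type(r))
```

Rounding a `np.float64` with the built-in `round` keeps the `np.float64` type, which matters for the type check below.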

def inv_rec_corrcoef(provided_data):
    """
    Later in this problem set, you will be modeling OLS regressions on your data. For now, we'll calculate just
    the correlation coefficient using numpy. If needed, refer to the numpy documentation linked above.
    """
    # YOUR CODE HERE
    # raise NotImplementedError()
    # return answer

Your function should return a float with the correct coefficient value. Check that it does:

assert type(inv_rec_corrcoef(trust_data)) == np.float64

"""Part A #1: Checking value of correlation coefficient"""
# Hidden tests

Part B (4 points)
For the first-price auctions experiment, there are ten experimental sessions, with eight subjects per session. In this context, subjects are tasked with completing auction and lottery (Holt-Laury 2002) tasks in two orders. In five of the ten sessions, subjects first complete a lottery task, followed by 30 rounds of auctions. In the other five sessions, subjects first complete 30 rounds of auctions, followed by a lottery task. At the end of each session, subjects complete a demographics survey. The data sets extract the first period auction data for each treatment.

In this case, say that the control for the first-price auction experiment is the order in which subjects complete the lottery task followed by the auction task (k1_8_lot_exp) and the outcome variable we want to measure is the bid-value ratio (b/v).

Using differences-in-means, what is the average treatment effect for the first-price auction experiment? (4 points)

Round any calculations to the hundredth decimal. Do not use percentages.
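A difference-in-means estimate can be sketched on a tiny made-up dataframe as follows. The bid and value numbers are invented; the treatment labels and the `bidval_ratio` column name follow the assignment. The sign convention used here (treatment mean minus control mean) is an assumption you should check against the expected output.

```python
import pandas as pd

# Made-up bids (b) and values (v) for four hypothetical subjects.
toy = pd.DataFrame({
    "treatment": ["k1_8_lot_exp", "k1_8_lot_exp", "k1_8_exp_lot", "k1_8_exp_lot"],
    "v": [50, 80, 40, 100],
    "b": [40, 60, 30, 90],
})

# Outcome variable: bid-value ratio.
toy["bidval_ratio"] = toy["b"] / toy["v"]

group_means = toy.groupby("treatment")["bidval_ratio"].mean()
control_mean = group_means["k1_8_lot_exp"]  # lottery first, then auctions (control)
treated_mean = group_means["k1_8_exp_lot"]  # auctions first, then lottery

# Average treatment effect as a simple difference in group means.
ate = round(treated_mean - control_mean, 2)
print(ate)
```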

def ate_fpa_payoff(provided_data):
    """
    Write the function to manually check the differences in means of bid-value ratios across the different groups explained above.
    To do this, please create a new dataframe column called 'bidval_ratio' in the provided data.
    Your function should output a dataframe with the following columns: 'lot_auc_mean', 'auc_lot_mean', 'diff in means'.
    """
    # YOUR CODE HERE

Your function should return a dataframe with the correct values and columns. Check that it does:

assert isinstance(ate_fpa_payoff(fpa_data), pd.core.frame.DataFrame), "checking your data is in a dataframe"

assert ate_fpa_payoff(fpa_data).columns[0] == 'lot_auc_mean', "checking df column names"
assert ate_fpa_payoff(fpa_data).columns[1] == 'auc_lot_mean', "checking df column names"
assert ate_fpa_payoff(fpa_data).columns[2] == 'diff in means', "checking df column names"

"Part B: lot_auc_mean value"
# Hidden tests

"Part B: auc_lot_mean value"
# Hidden tests

Part C (12 points)
Continuing with the fpa_data dataset from the last part, we would expect subjects to bid a certain fraction of their value in a first-price sealed bid auction depending on their risk attitudes (e.g., risk neutral, risk averse). Let's explore what effect gender has on bid-value ratios when controlling for risk attitude. This time, let's calculate this average treatment effect using an ordinary least-squares regression.

Using the fpa_data dataframe and an ordinary least-squares regression model, complete the ols_riskgender_on_bidvalue function to evaluate how subjects' risk attitudes and gender (in the form of the female variable) affect their bid/value ratio. For now, we'll just return a summary view of our data. (2 points)

Round any calculations to the hundredth decimal. Do not use percentages.

def ols_riskgender_on_bidvalue(provided_data):
    """
    The easiest way to evaluate how subjects' risk attitudes and gender affect their bid/value ratio is to run an OLS linear
    regression on your fpa_data dataframe. Use the statsmodels library to run an OLS linear regression, and return the summary
    view of your results.
    """
    # >>> Y = duncan_prestige.data['income']
    # >>> X = duncan_prestige.data['education']
    # >>> X = sm.add_constant(X)
    # >>> model = sm.OLS(Y,X)
    # >>> results = model.fit()
    # >>> results.params

Your function should return a summary view of your results. Check that it does:

Now, modify the ols_riskgender_on_bidvalue function to access the model’s coefficients (parameters) and associated p-values, instead of printing out the entire summary view. For now, we won’t worry about rounding. (3 points)

def ols_riskgender_on_bidvalue(provided_data):
    """
    The easiest way to evaluate how subjects' risk attitudes and gender affect their bid/value ratio is to run an OLS linear
    regression on your fpa_data dataframe. Use the statsmodels library to run an OLS linear regression, and this time return
    the coefficients and the p-values for your model.
    """

Your function should return a raw tuple of your results in pandas Series form. Check that it does:

ols_riskgender_on_bidvalue(fpa_data)

"checking your return value is a tuple of type pandas series"
assert isinstance(ols_riskgender_on_bidvalue(fpa_data)[0], pd.core.series.Series)
assert isinstance(ols_riskgender_on_bidvalue(fpa_data)[1], pd.core.series.Series)

"checking the value of 'const' for both values"
# Hidden tests

Now, let’s make our results more readable. Let’s modify our function once again to this time create a dataframe that has the coefficients and p-values for the control variables and constant, rounding to the nearest hundredth decimal. (4 points)

def ols_riskgender_on_bidvalue_df(provided_data):
    """
    This function should use the results of the ols_riskgender_on_bidvalue function above to output a dataframe
    that has the coefficients and p-values for the control variables and constant.
    The dataframe should have the following columns: 'variable', 'coefficient', and 'p-value'
    """
    # define your parameters for your model and the p-values, then fill in the rest of the function below.

    # YOUR CODE HERE

Your function should return a dataframe of your results. Check that it does:

ols_riskgender_on_bidvalue_df(fpa_data)

"""Part C: Check the dataframe outputs the correct p-values from OLS model"""
# Hidden tests

"""Part C: Check the dataframe outputs the correct coefficients from OLS model"""
# Hidden tests

4. Does removing the risk attitudes variable from the model significantly change how gender contributes to bid/value ratios? Complete the ols_gender_on_bidvalue_df function to assess this. Part of the function has already been completed for you. (3 points)

Round any calculations to the hundredth decimal. Do not use percentages.

def ols_gender_on_bidvalue_df(provided_data):
    """
    Complete the function that takes the provided data and creates an OLS model that determines the effect of
    gender (using the female variable) on subjects' bid/value ratios. It should output (by filling in the missing values)
    a dataframe that has the coefficients for the control variables and intercept.
    """
    # assign your X and Y variables, and define your parameters and pvalues. Then, fill in the rest of the function below.
    # YOUR CODE HERE
    raise NotImplementedError()

Your function should return a dataframe with each of the variables and their completed coefficient and p-value for the OLS model.

Check that it does:

ols_gender_on_bidvalue_df(fpa_data)

assert ols_gender_on_bidvalue_df(fpa_data).iloc[0, 2] == 0, "checking the const p-value value"

"""Check that the dataframe outputs the correct values from the OLS model"""
# Hidden tests

Power & Sample Sizes

Randomization – Blocking & Clustering

Differences-in-Differences

The objective of this part is to:
Apply theory of experiment design and knowledge of analysis techniques to real experiment data.

Resources:
StatsModels and Scipy.stats

We recommend using two Python libraries called StatsModels and Scipy.stats for data analysis.

Datasets used for this part:

MovieLens Data: Part3_data.csv Source for dataset: Chen, Y. et al. Social Comparisons and Contributions to Online Communities: A Field Experiment on MovieLens. (2010).

import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms
import math
from scipy import stats
from statsmodels.stats.po
