Math 558 Lecture #30-31
Balanced Incomplete Block Design: Application 1, Randomized Response Procedure¹
In surveys with sensitive questions, participants tend to avoid giving honest responses to the questions they feel uncomfortable with. We can use a BIBD to estimate the proportion of people in each sensitive category. A procedure given by Raghavarao is as follows.
Suppose there are m sensitive categories and we want to estimate the population proportion $\pi_i$ for each of the m sensitive categories. Let t be the total number of questions in the questionnaire, with m sensitive questions and t − m unrelated, non-sensitive questions. All t questions have binary responses.
¹Raghavarao, p. 87.
Application 1
Form blocks of size k such that not all questions in a block are sensitive. Divide the participants into groups so that the number of k-subsets equals the number of groups, with all groups of the same size. Since all questions have binary responses, we code them as 0 (No) and 1 (Yes). Each respondent in the ith group reports only the sum of their coded responses to the questions in that group's block, without giving the individual Yes/No answers. This ensures the anonymity of the responses to a great extent.
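To make the sum-reporting mechanism concrete, here is a minimal Python sketch; the block of two questions, the group size, and the "yes" probabilities are hypothetical, chosen only for illustration.

import random

# Illustrative only: one group answering a block of k = 2 questions, where each
# respondent reports just the sum of their 0/1 coded answers.
block = ["sensitive question", "unrelated question"]
true_prob = {"sensitive question": 0.10, "unrelated question": 0.60}

def respondent_sum(block, true_prob):
    """Code each answer as 0 (No) or 1 (Yes) and report only the total."""
    return sum(int(random.random() < true_prob[q]) for q in block)

group_size = 5
sums = [respondent_sum(block, true_prob) for _ in range(group_size)]
print("reported sums:", sums)                 # individual Yes/No answers stay hidden
print("group mean:", sum(sums) / group_size)  # estimates the sum of the two proportions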
Let $\bar{Y}_j$ be the mean of all responses by the participants in the jth group, where the jth group responds to the jth subset (denoted by $S_j$). Then
$$E(\bar{Y}_j) = \sum_{l \in S_j} \pi_l, \qquad j = 1, 2, \ldots, b,$$
where $E(\bar{Y}_j)$ is the expected value of $\bar{Y}_j$ and $\pi_l$ is the proportion of Yes responses for question category $l$, $l = 1, 2, \ldots, t$. The problem is to estimate all t proportions, for the m sensitive and the t − m non-sensitive questions. In the design theory literature this problem is known as the spring balance weighing design problem without bias (Mood 1946, Raghavarao 1971). The balanced incomplete block design suggested for this problem has parameters
$$t = b = 4g - 1, \qquad r = k = 2g, \qquad \lambda = g.$$
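As a quick check of these parameters, the following Python sketch (mine, not part of the notes) tests whether a given list of blocks forms a BIBD and reports $(t, b, r, k, \lambda)$; for g = 1 the blocks are simply all 2-subsets of three questions, as in the example below.

from itertools import combinations

def bibd_parameters(blocks):
    """Return (t, b, r, k, lam) if the blocks form a BIBD, otherwise None."""
    treatments = sorted({x for blk in blocks for x in blk})
    t, b = len(treatments), len(blocks)
    block_sizes = {len(blk) for blk in blocks}
    if len(block_sizes) != 1:
        return None                      # blocks must all have the same size k
    k = block_sizes.pop()
    replications = {sum(x in blk for blk in blocks) for x in treatments}
    concurrences = {sum(p in blk and q in blk for blk in blocks)
                    for p, q in combinations(treatments, 2)}
    if len(replications) == 1 and len(concurrences) == 1:
        return t, b, replications.pop(), k, concurrences.pop()
    return None                          # not equireplicate or not balanced

# g = 1: the blocks are all 2-subsets of the three questions A, B, C.
blocks = [("A", "B"), ("A", "C"), ("B", "C")]
print(bibd_parameters(blocks))           # (3, 3, 2, 2, 1): t = b = 3, r = k = 2, lambda = 1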
Let the n respondents be divided into b = 4g − 1 groups, each with u participants, so that

$$n = (4g - 1)\,u.$$

Each of the u respondents in the jth group reports the sum of their 2g binary responses to the jth subset (block), $j = 1, 2, \ldots, 4g - 1$. Consider the following questionnaire for a job satisfaction survey in a large company. The purpose is to estimate the proportion of workers who are satisfied with the working conditions at the company.
A. I am satisfied with the work conditions. No (0), Yes (1)
B. I eat my lunch in the company cafeteria. No (0), Yes (1)
C. I exercise at least once a week. No (0), Yes (1)
For this design t = 4g − 1 = 3 = b (so g = 1), giving $\lambda = 1$ and r = k = 2. We randomly choose 15 employees and form three groups of size 5 each, so each subset of questions (block of size k = 2) is administered to 5 participants and their responses are averaged. Fictitious data are given in the following table.
Subset (block)   Responses        Mean ($\bar{Y}_j$)   Variance ($s_j^2$)
A, B             0, 1, 1, 2, 2    1.2                  0.7
A, C             0, 0, 1, 1, 1    0.6                  0.3
B, C             1, 1, 2, 2, 2    1.6                  0.3
Let $\pi_i$ be the proportion of "yes" responses for the ith item, i = A, B, C. Then
$$\hat{\pi}_A + \hat{\pi}_B = \bar{Y}_1 = 1.2, \qquad \hat{\pi}_A + \hat{\pi}_C = \bar{Y}_2 = 0.6, \qquad \hat{\pi}_B + \hat{\pi}_C = \bar{Y}_3 = 1.6.$$
Solving these equations simultaneously we get
$$\hat{\pi}_A = \frac{\bar{Y}_1 + \bar{Y}_2 - \bar{Y}_3}{2} = 0.1,$$

$$\hat{\pi}_B = \frac{\bar{Y}_1 - \bar{Y}_2 + \bar{Y}_3}{2} = 1.1.$$

Since $\hat{\pi}_B > 1$ we can take $\hat{\pi}_B = 1$.

$$\hat{\pi}_C = \frac{-\bar{Y}_1 + \bar{Y}_2 + \bar{Y}_3}{2} = 0.5.$$
Each estimator is half of a sum or difference of the three independent group means, and each mean is based on 5 responses, so the estimated variances are

$$\mathrm{Var}(\hat{\pi}_A) = \mathrm{Var}(\hat{\pi}_B) = \mathrm{Var}(\hat{\pi}_C) = \frac{1}{4} \cdot \frac{s_1^2 + s_2^2 + s_3^2}{5}.$$
From these data we estimate that only 10% of the workers are satisfied with the work conditions.
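For reference, here is a short Python sketch that reproduces these computations from the fictitious responses in the table; the array layout and the use of NumPy are my own choices, not part of the notes.

import numpy as np

# Fictitious responses from the table, one row per subset (block).
responses = np.array([[0, 1, 1, 2, 2],    # block {A, B}
                      [0, 0, 1, 1, 1],    # block {A, C}
                      [1, 1, 2, 2, 2]])   # block {B, C}

y_bar = responses.mean(axis=1)            # group means: [1.2, 0.6, 1.6]
s2 = responses.var(axis=1, ddof=1)        # sample variances: [0.7, 0.3, 0.3]

# pi_A + pi_B = Ybar_1,  pi_A + pi_C = Ybar_2,  pi_B + pi_C = Ybar_3
M = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1]])
pi_hat = np.linalg.solve(M, y_bar)        # [0.1, 1.1, 0.5]
pi_hat = np.clip(pi_hat, 0.0, 1.0)        # truncate pi_B at 1, as in the notes

var_pi = (1 / 4) * s2.sum() / 5           # common variance estimate, 0.065
print(pi_hat, var_pi)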
Application 2
Balanced incomplete cross-validation²
Cross-validation is an important tool for validating a fitted statistical model. In this method we divide the total set of observations into two parts: one part is used to fit (construct) the model and the other is used to validate it. This helps us quantify the predictive ability of the model. Suppose we have n observations in our data set. We use $n_{co}$ observations to construct the model and $n_{va}$ observations to validate it. Let $Y_{(2)}$ be the vector of the $n_{va}$ validation responses and $\hat{Y}_{(2)}$ be the vector of predicted responses from the model constructed using the $n_{co}$ observations.
²Raghavarao, p. 87.
Then the average squared prediction error is
$$\frac{\bigl(Y_{(2)} - \hat{Y}_{(2)}\bigr)'\bigl(Y_{(2)} - \hat{Y}_{(2)}\bigr)}{n_{va}}.$$
This average squared prediction error can be calculated for all $\binom{n}{n_{va}}$ possible splits, or for a smaller number of them. The model with the smallest average prediction error is selected.
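As a concrete illustration, here is a minimal Python sketch of the average squared prediction error for a single construction/validation split; ordinary least squares is assumed as the model, and the data and index sets are made up.

import numpy as np

def aspe(X, y, construct_idx, validate_idx):
    """Fit OLS on the construction set and return the average squared
    prediction error on the validation set."""
    beta, *_ = np.linalg.lstsq(X[construct_idx], y[construct_idx], rcond=None)
    resid = y[validate_idx] - X[validate_idx] @ beta
    return (resid @ resid) / len(validate_idx)

# One arbitrary split of 20 simulated observations: 15 to construct, 5 to validate.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=20)
print(aspe(X, y, np.arange(15), np.arange(15, 20)))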
Shao (1993) (given in Raghavarao, p. 90) suggested splitting the n observations by using a balanced incomplete block design, and called this procedure balanced incomplete cross-validation. The method is as follows. Suppose there exists a BIBD with $t = n$, $k = n_{va}$, and parameters b, r, $\lambda$. Each subset (block) is used as the validation data set, and its complement gives the data set for model construction. The mean of the prediction errors computed over the blocks is the cross-validation estimate. The variance of this estimate can be expressed as a function of an arbitrary cross-validation design, and it can be shown that balanced incomplete block designs give a smaller variance than any other cross-validation design.
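The sketch below is my reading of this procedure, not code from the notes: the validation blocks of a BIBD with t = n and k = n_va are supplied explicitly (here the seven blocks of the (7, 7, 3, 3, 1) design on n = 7 observations), each block serves as the validation set, its complement as the construction set, and the block-wise average squared prediction errors are averaged.

import numpy as np

def balanced_incomplete_cv(X, y, validation_blocks, fit, predict):
    """Average the squared prediction error over the blocks of a BIBD:
    each block is the validation set, its complement the construction set."""
    n = len(y)
    errors = []
    for block in validation_blocks:
        va = np.array(sorted(block))
        co = np.array(sorted(set(range(n)) - set(block)))
        model = fit(X[co], y[co])
        resid = y[va] - predict(model, X[va])
        errors.append((resid @ resid) / len(va))
    return float(np.mean(errors))

# Blocks of a BIBD with t = b = 7, r = k = 3, lambda = 1 (the Fano plane),
# so that n = 7 and n_va = 3.
blocks = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
          {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(7), rng.normal(size=7)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=7)

ols_fit = lambda Xc, yc: np.linalg.lstsq(Xc, yc, rcond=None)[0]
ols_predict = lambda beta, Xv: Xv @ beta
print(balanced_incomplete_cv(X, y, blocks, ols_fit, ols_predict))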