MS 5217: Homework 2

Due on Wednesday, September 30th.

• NBER birthweight data
The National Bureau of Economic Research maintains detailed birth records for the United States. The data set under consideration consists of information on single-child, live births to mothers between the ages of 18 and 45 in the year 1997. The goal is to characterize the relationship between the distribution of birth weights and various measured attributes of the mother and child. Specifically, the included features are:

• BirthWt: weight of baby at birth in grams
• Boy: male baby
• Married: mother's marital status
• Black: mother's race
• Age: mother's age
• HighSchool: mother has high school diploma
• SomeCollege: mother has done college course work
• College: mother has college degree
• NoPrenatal: mother had no prenatal care
• PrenatalSecond: mother had prenatal care starting in second trimester
• PrenatalThird: mother had prenatal care starting in third trimester
• NonSmoker: mother is non-smoker
• Cigarettes: number of cigarettes smoked per day
• Weightgain: weight gained over course of pregnancy in pounds


Answer the following questions about the provided data set.
• What is the probability of a randomly selected birth weight being less than 2.5kg?

• What fraction of mothers are black?

• What fraction of mothers are non-black? (Can you answer this question without any additional consultation of the data?)

• What is the probability of a birth weight less than 2.5kg, given that the mother is black?

• What is the probability of a birth weight less than 2.5kg, given that the mother is non-black?

• Use the law of total probability, together with the answers to the previous four questions, to confirm your answer to the first question:
Pr(low weight | black) Pr(black) + Pr(low weight | non-black) Pr(non-black) = Pr(low weight).

• Make and overlay density plots of BirthWeight for the following four groups of mothers: black smokers, non-black smokers, black non-smokers, and non-black non-smokers.

• Make box plots for BirthWeight for mothers of each age in the data set.
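
The law-of-total-probability check above can be sketched numerically. This is a minimal Python illustration, assuming the data have been loaded into two parallel lists: `birth_wt` (grams) and `black` (0/1 indicators). Those names, the helper functions, and the toy values are hypothetical and do not come from the NBER file.

```python
def mean(values):
    """Average of a list of numbers; for 0/1 indicators this is a fraction."""
    return sum(values) / len(values)


def total_probability_check(birth_wt, black, threshold=2500):
    """Return (direct, recombined) estimates of Pr(low weight).

    birth_wt: birth weights in grams; black: 0/1 indicator per birth.
    Both argument names are assumptions made for this illustration.
    """
    low = [1 if w < threshold else 0 for w in birth_wt]
    p_black = mean(black)
    p_nonblack = 1 - p_black  # complement rule, no second pass over the data
    p_low_given_black = mean([flag for flag, b in zip(low, black) if b == 1])
    p_low_given_nonblack = mean([flag for flag, b in zip(low, black) if b == 0])
    direct = mean(low)
    recombined = (p_low_given_black * p_black
                  + p_low_given_nonblack * p_nonblack)
    return direct, recombined


# Toy values for illustration only (not real data).
toy_wt = [2300, 3400, 3100, 2450, 3600, 2900, 2400, 3300]
toy_black = [1, 0, 0, 1, 0, 1, 0, 0]
direct, recombined = total_probability_check(toy_wt, toy_black)
print(direct, recombined)
```

The two returned numbers should agree up to floating-point rounding, because the conditional fractions are weighted by exactly the group shares they were computed on.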


• Bid-Ask spread versus Market Capitalization
Read the Investopedia primer Understanding Stock Splits, which is available as a PDF on the course website (stock-split.pdf). The bidAsk.txt file on the course website uses data taken from the paper Securities Market Efficiency, also available on the course website. For each of 2,464 stocks listed on India’s National Stock Exchange, the data record the market capitalization (price times shares outstanding) and the bid-ask spread (the difference between the price at which a buyer wants to buy and the price at which a seller wants to sell) on a single day in January 2001.
• Make a scatter plot of bid-ask spread and market cap.

• What is the correlation between bid-ask spread and market cap?

• What is the correlation between bid-ask spread and market cap after making log transformations of both variables? Can you explain why the answer is different from the previous answer? Make a scatter plot of the transformed variables.

• Find the best linear predictor of the log-spread, using log-market cap as a predictor. Plot this line on the scatterplot. Does it appear to be a decent predictor?

• Create two new data columns by segmenting log market capitalization and log bid-ask spread each into four disjoint bins, each with 25% of the total observations. This discretizes the continuous observations.

• The two new discretized data vectors each have levels 1 through 4. Compute the distribution of market cap level (1 through 4) given the spread level (1 through 4). Based on these tables, are these two random variables independent? How did we already know this from answering earlier questions?
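
The steps in this problem (correlation, log transformation, best linear predictor, quartile binning, conditional distribution) can be sketched in plain Python. This is only an illustration under stated assumptions: the layout of bidAsk.txt is not described here, so the sketch works on two hypothetical lists, `cap` and `spread`, filled with made-up toy values. The helper names are also hypothetical.

```python
import math
from collections import Counter


def correlation(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_linear_predictor(x, y):
    """Least-squares (intercept, slope) for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def quartile_levels(values):
    """Levels 1..4 assigned by rank, so each bin holds about 25% of the data.

    Ties are broken by position, which is a simplification.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    levels = [0] * n
    for rank, i in enumerate(order):
        levels[i] = 1 + (4 * rank) // n
    return levels


def conditional_distribution(target, given):
    """Pr(target level | given level) as a nested dict {given: {target: prob}}."""
    joint = Counter(zip(given, target))
    totals = Counter(given)
    return {g: {t: joint[(g, t)] / totals[g] for t in sorted(set(target))}
            for g in sorted(totals)}


# Toy values for illustration only (not taken from bidAsk.txt).
cap = [5.0, 12.0, 40.0, 150.0, 600.0, 2500.0, 9000.0, 40000.0]
spread = [4.0, 3.1, 2.2, 1.6, 0.9, 0.7, 0.4, 0.2]

log_cap = [math.log(v) for v in cap]
log_spread = [math.log(v) for v in spread]

r_raw = correlation(cap, spread)
r_log = correlation(log_cap, log_spread)
intercept, slope = best_linear_predictor(log_cap, log_spread)

cap_level = quartile_levels(log_cap)
spread_level = quartile_levels(log_spread)
table = conditional_distribution(cap_level, spread_level)
print(r_raw, r_log, intercept, slope)
print(table)
```

If the two discretized variables were independent, every row of `table` would be close to the marginal distribution of the market cap level, which is roughly 25% per level by construction.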

• CEO compensation
USA Today publishes annual data on CEO compensation. This problem asks you to analyze the 2010 dataset, found on the website in the file ceo.txt. In addition to overall compensation, the data disaggregates compensation by the categories salary, bonus and stock.
• Create three new variables representing the proportion of total compensation due to salary, bonus and stock for each executive in the dataset.

• Plot histograms or density plots (either side-by-side or overlaid) for each of the proportion variables. Summarize these plots in words.

• Create a scatterplot of total CEO compensation against annual stock return. Find the best linear predictor and overlay it against the data. What is the correlation between CEO compensation and stock return? How do you interpret this?

• Repeat the process in the previous step using different subsets of 80% of the original data (compensation/stock return pairs). How frequently (if ever) does the corresponding line of best fit have a positive slope? Does this process give you more faith in your conclusion about the observed relationship between compensation and stock performance?
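
The repeated 80% subsetting in the last question can be sketched as follows. This is one possible reading, assuming the subsets are drawn at random without replacement and that stock return is the predictor. The function names, the number of repetitions, the seed, and the toy values are all hypothetical choices, not part of the assignment or of ceo.txt.

```python
import random


def ls_slope(x, y):
    """Slope of the least-squares line for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fraction_positive_slopes(x, y, n_rep=1000, frac=0.8, seed=0):
    """Share of random subsets (without replacement) whose fitted slope is > 0."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    k = int(round(frac * n))
    positive = 0
    for _ in range(n_rep):
        idx = rng.sample(range(n), k)
        if ls_slope([x[i] for i in idx], [y[i] for i in idx]) > 0:
            positive += 1
    return positive / n_rep


# Toy values for illustration only (not taken from ceo.txt).
stock_return = [-0.30, -0.12, -0.05, 0.02, 0.08, 0.15, 0.21, 0.33, 0.40, 0.55]
compensation = [9.1, 4.2, 12.5, 3.3, 7.8, 5.1, 15.0, 2.9, 8.4, 6.0]

share = fraction_positive_slopes(stock_return, compensation)
print(share)
```

A share near 0 or 1 suggests the sign of the fitted slope is stable across subsets; a share near one half suggests the sign depends heavily on which observations happen to be included.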


• Twenty-five individuals in a classroom are asked aloud, in turn, what their political party affiliation is; 16 answer Democrat and 9 answer Republican. There is a concern that the lack of anonymity could have affected the accuracy of the responses. To test the hypothesis that the responses were provided independently of one another, we count the number of times that adjacent responses differ. In the observed sequence of responses
(D, D, D, D, R, R, R, R, R, D, D, D, D, D, R, D, D, D, R, R, R, D, D, D, D)
this number is 6. By randomly shuffling these 25 observations, we get a distribution for the number of “switches” under the null hypothesis that the ordering of the responses was random, which has the following quantiles:

Quantile   0.5%   1%   2.5%   5%   10%   50%   60%   95%   97.5%   99%   99.5%
Switches      6    6      7    8     9    12    12    15      16    17      17

• Is the p-value of the observed data under the hypothesis of independent responses above or below 5%? Explain why.

• Do you reject the null hypothesis of independent responses at the 1% level?

• If friends tend to share political affiliation and friends also tend to sit together, would the above test be able to answer the question of whether or not honest responses were elicited?
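
The shuffling procedure described in this problem can be sketched in Python. This is a minimal illustration, not the code used to produce the quantile table: the number of shuffles, the seed, and the choice to report a lower-tail fraction (shuffles with at most as many switches as observed) are assumptions made here.

```python
import random

# Observed sequence of responses, copied from the problem statement.
observed = list("DDDDRRRRRDDDDDRDDDRRRDDDD")


def count_switches(seq):
    """Number of adjacent pairs whose responses differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def shuffle_distribution(seq, n_shuffles=20000, seed=0):
    """Switch counts for random reorderings of seq (the null distribution)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    work = list(seq)
    counts = []
    for _ in range(n_shuffles):
        rng.shuffle(work)
        counts.append(count_switches(work))
    return counts


obs = count_switches(observed)
null_counts = shuffle_distribution(observed)

# Lower-tail fraction: how often a random ordering has as few switches
# as the observed one (few switches means like responses cluster together).
lower_tail = sum(1 for c in null_counts if c <= obs) / len(null_counts)
print(obs, lower_tail)
```

Comparing `lower_tail` with the quantile table gives the same information in a different form: the table reports the switch count at fixed tail probabilities, while the sketch reports the tail probability at the observed switch count.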
