Take Home Assessment 2021
Tengyao Wang 2021-03-12

Contents
Rules for the take home assessment
1 Data description
2 Individual component
  2.1 Task
  2.2 What to submit
3 Group component
  3.1 Group Tasks
  3.2 What to submit

Rules for the take home assessment
The deadline for submission is 16:00 Thursday 29 Apr 2021
Please read the following carefully before proceeding to your take home assessment.
General
• Your Take Home Assessment (THA) this year has both an individual component and a group component.
• For the individual component, each student will submit a single R program file via the Submit your take home assessment (individual component) Moodle link.
• For your group component, each group will submit, via the Submit your take home assessment (group component) Moodle link, one PDF file containing your report and one R program file. Each group member must click their respective “Submit” buttons in order for the group’s submission to be successful and final.
Plagiarism
• The Turn-It-In® plagiarism detection system may be used to scan both your individual and group submissions for evidence of plagiarism and collusion.
• For your individual component, each of you will work by yourself and you are not allowed to discuss your work with anyone else, including your group members.
• For the group component, you will work together within your group and the usual plagiarism and collusion regulations do not apply to this form of interaction. However, they do apply to collusion with other groups or plagiarism of work from other groups or from other sources.
• Any plagiarism will normally result in zero marks for all students involved, and may also mean that your overall examination mark is recorded as non-complete. Guidelines as to what constitutes plagiarism may be found in Departmental Student Handbooks. The relevant excerpt from the Statistical Science handbook is also posted on Moodle.
Grading
• Your THA grade is the sum of your individual component and group component marks. All members of a group will be awarded the same mark for the group component of the assignment, except in exceptional circumstances (e.g. a member of a group did not contribute to the project).
• Your individual component will be marked out of 10. I will be looking for the correctness of your code, the readability of your program and elegance/efficiency of your implementation.
• Your group component will be marked out of 40, with allocation as follows:
– 30 marks for the written report. I will be looking for clarity of writing/figures/tables, appropriate selection of materials, soundness of statistical reasoning and ability to explain your findings in non-technical language.
– 10 marks for the accompanying program. I will be looking for the correctness of your code, the readability of your program and the elegance/efficiency of your implementation.
• Late submission will incur a penalty unless there are extenuating circumstances (e.g. medical) supported by appropriate documentation. Penalties are set out in the latest editions of the Statistical Science Department student handbooks, available from the departmental web pages.
• Failure to submit this THA may mean that your overall examination mark is recorded as non-complete, i.e., you will not obtain a pass for the course.
• I may ask you to come and discuss your output with me.
• You will receive, via Moodle, feedback on your work and a provisional grade. Grades are provisional until confirmed by the Statistics Examiners’ Meeting in summer 2021.

Chapter 1
Data description
The data for the Take Home Assessment, available under the Data Files folder in the Take Home Assessment section on Moodle, contain information on about 6000 cells in a biological experiment. You are given three files: cell_data.csv, cell_data_additional.csv and donor_info.csv. [1]
• In cell_data.csv, each row stores relevant information for a cell and has five comma-separated values:
– UID: this is the unique identifier for the cell.
– X1 and X2: these are two characteristic values of the cell. They are statistics that summarise the genetic information in each cell as its two most informative numbers. [2]
– CellType: this indicates whether the cell is a B cell (B) or a T cell (T). You don’t need to worry about the biology here; just treat them as two different categories of cells.
– Donor: the identifier of the donor from whom the cell was taken.
• cell_data_additional.csv contains information for some additional cells:
– UID: this is the unique identifier for the cell.
– X1 and X2: these are two characteristic values of the cell, defined in the same way as in cell_data.csv.
• In donor_info.csv, each row has three comma-separated values:
– Donor: the identifier of the donor. This field contains all the values in the Donor field of the cell_data.csv file (and some other donors not present in the cell_data.csv file).
– Age: age of the donor, measured in years.
– Gender: gender of the donor, Female or Male.
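The file layouts above can be illustrated with a minimal sketch in base R. It assumes each CSV file has a header row using the field names listed above (the brief does not say so explicitly), and the toy rows below are invented stand-ins rather than real data.

```r
# Toy stand-ins for two of the files; the values are invented for illustration.
# Assumption: the real files have a header row with these field names.
cell_csv <- "UID,X1,X2,CellType,Donor
c1,0.5,1.2,B,D1
c2,-1.0,0.3,T,D1
c3,2.1,-0.7,T,D2"
donor_csv <- "Donor,Age,Gender
D1,34,Female
D2,58,Male
D3,41,Male"

# In a real program these would be read.csv("cell_data.csv") and
# read.csv("donor_info.csv"), with the files in the working directory.
cells  <- read.csv(text = cell_csv, stringsAsFactors = FALSE)
donors <- read.csv(text = donor_csv, stringsAsFactors = FALSE)

# Attach donor age and gender to each cell. Donors with no cells in the
# cell data (here D3) do not appear in the merged result.
cells_with_donor <- merge(cells, donors, by = "Donor")
str(cells_with_donor)
```

Merging on Donor gives one row per cell with the donor's Age and Gender alongside, which is a convenient shape for comparisons across donors.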
[1] Data used in this Take Home Assessment are adapted from https://science.sciencemag.org/content/367/6480/eaay3224.
[2] The exact way in which they were computed is irrelevant in this assignment. However, if you are interested, these numbers were obtained using a technique called UMAP, and you can learn more about UMAP and play with it at https://pair-code.github.io/understanding-umap/.

Chapter 2
Individual component
For this individual component, you need to use cell_data.csv and cell_data_additional.csv. Your program must be based solely on these two data files. Do not introduce other data into your work. Also, you do not need to investigate the source of the data further.
2.1 Task
The cell types of the cells in cell_data_additional.csv are unknown. Your task is to predict their cell types using the data in cell_data.csv.
Specifically, you should write a program, named impute.R, satisfying the following:
• Your program should read in cell_data.csv and cell_data_additional.csv in the current working directory.
• For each cell in cell_data_additional.csv, your program should look for the cell that is most similar to it in cell_data.csv and assign it the same cell type. Here, by ‘most similar’ we mean the cell with the smallest Euclidean distance to it in terms of the (X1, X2) values (i.e. the two characteristic values).
• Your program should output a file named cell_type_predicted.txt to the current working directory. Each line of the output file cell_type_predicted.txt should contain a single letter B or T, corresponding to the predicted cell types of the cells in cell_data_additional.csv appearing in the same order.
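The ‘most similar cell’ rule above is a nearest-neighbour lookup. The following base R sketch shows the idea on invented toy data; the helper name nearest_cell_type is hypothetical, and this is only one of several reasonable ways to organise the computation.

```r
# Hypothetical helper: for each row of `new`, return the CellType of the
# row of `train` with the smallest Euclidean distance in (X1, X2).
nearest_cell_type <- function(train, new) {
  vapply(seq_len(nrow(new)), function(i) {
    # The squared distance has the same minimiser as the distance itself,
    # so the square root can be skipped.
    d2 <- (train$X1 - new$X1[i])^2 + (train$X2 - new$X2[i])^2
    as.character(train$CellType[which.min(d2)])
  }, character(1))
}

# Invented toy data, not the assessment data.
train <- data.frame(X1 = c(0, 0, 5, 5), X2 = c(0, 1, 5, 6),
                    CellType = c("B", "B", "T", "T"))
new <- data.frame(X1 = c(0.2, 4.8), X2 = c(0.4, 5.5))

predicted <- nearest_cell_type(train, new)
print(predicted)
```

Note that which.min() returns the first minimiser, so if two labelled cells are exactly equidistant the one appearing earlier in the file is used. The brief does not specify a tie-breaking rule, so this is an assumption of the sketch.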
2.2 What to submit
Please submit a single R program file named impute.R. The program should
• Be clearly laid out and well commented throughout.
• Assume that all data files are in the working directory, i.e., there should be no setwd() command or reference to directories.
• Not use any packages that are not automatically loaded when R starts, i.e., it should not use a library() command.
• Output a file named cell_type_predicted.txt to the working directory when run (I will run source('impute.R') in the working directory). The output file itself should not be included in your submission.
• Be anonymous, i.e., there should be no mention of your name anywhere in your submission.
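As a small illustration of the output requirement, the sketch below writes one letter per line to a file in the working directory using only base R. The predictions are invented placeholders.

```r
# Invented predictions for illustration only.
predicted <- c("B", "T", "T", "B")

# A bare file name resolves against the current working directory,
# so no setwd() call or directory path is needed.
out_file <- "cell_type_predicted.txt"
writeLines(predicted, out_file)

# Read the file back as a sanity check: one letter per line, same order.
check <- readLines(out_file)
stopifnot(identical(check, predicted), all(check %in% c("B", "T")))
```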

Chapter 3
Group component
In this group component, you will need to use cell_data.csv and donor_info.csv. As a group, you will describe and analyse all the data in these two files by completing the tasks below. Your analysis must be based solely on the data given to you. Do not introduce other data into your work. Also, you do not need to investigate the source of the data further.
3.1 Group Tasks
1. Using techniques such as summary statistics and plots, describe the cell data in your report. Your description should include both univariate and multivariate analysis.
2. A biologist claims that there is a difference in the ratio between T cells and B cells among male versus female donors.
(a) Construct a contingency table for the number of T cells and B cells for different genders.
(b) Perform an appropriate statistical test (at 5% significance level) of the biologist’s claim.
(c) Write one to two sentences to explain your findings in non-technical terms.
3. This biologist claims that there is a monotonic relationship between the age of the donor and B cell proportion in their cells.
(a) Compute the percentage of B cells in each donor’s cells in the dataset.
(b) Perform an appropriate statistical test (at 5% significance level) of the biologist’s claim.
(c) Write one to two sentences to explain your findings in non-technical terms.
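The mechanics of tasks 2 and 3 can be sketched in base R as follows. The data are invented, and the two tests shown (Pearson's chi-squared test and Spearman's rank correlation test) are only one possible choice each; the brief leaves the choice of ‘appropriate’ test to the group. The chi-squared test in particular treats cells as independent observations, which is an assumption worth examining when cells are clustered within donors.

```r
# Invented toy data: one row per cell, with donor gender and age attached.
cells <- data.frame(
  Donor    = rep(c("D1", "D2", "D3", "D4", "D5"), each = 10),
  CellType = c(rep(c("B", "T"), c(2, 8)), rep(c("B", "T"), c(3, 7)),
               rep(c("B", "T"), c(5, 5)), rep(c("B", "T"), c(6, 4)),
               rep(c("B", "T"), c(8, 2))),
  Gender   = rep(c("Female", "Male", "Female", "Male", "Female"), each = 10),
  Age      = rep(c(25, 33, 47, 52, 68), each = 10)
)

# Task 2(a): cell counts by gender and cell type.
tab <- table(cells$Gender, cells$CellType)
print(tab)

# Task 2(b), one option: Pearson's chi-squared test of independence.
gender_test <- chisq.test(tab)

# Task 3(a): percentage of B cells for each donor.
pct_b <- as.vector(100 * tapply(cells$CellType == "B", cells$Donor, mean))
# Each donor has a single age, so unique() returns one value per donor.
age <- as.vector(tapply(cells$Age, cells$Donor, unique))

# Task 3(b), one option: Spearman's rank correlation test, which targets
# monotonic rather than strictly linear association.
age_test <- cor.test(age, pct_b, method = "spearman")

print(c(gender_p = gender_test$p.value, age_p = age_test$p.value))
```

Each p-value would then be compared with 0.05 and the conclusion stated in plain language for parts (c).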
3.2 What to submit
Please submit two files for your group component:
1. A PDF report named report.pdf. The report should be consistent with the following:
• You must use the Microsoft Word template provided on the Moodle page for your report and not change its font, font sizes or margins. If the template has been changed, up to 4% of marks can be lost and I will reformat the document to the template standard, to which the following point will apply.
• The report must not be longer than 2 pages of A4 paper, including figures. I will not mark any content beyond the page limit. Note that this doesn’t mean that you should aim to fill all the space available to you. Writing more text doesn’t necessarily get you more marks.
• In addition, all groups must submit an additional cover page (so that the total number of pages submitted is three) where each group member briefly describes their contribution to the project.
– You will need to agree this in your groups before submitting the report.
– If all group members agree that everyone contributed equally, then it is sufficient to write a single sentence to that effect on the third page, or alternatively you are very welcome to describe your own personal contribution to the project.
– Note that I will not mark this page, nor allocate different marks to different group members based on this. The purpose is to encourage you all to be mindful about contributing to this piece of group work.
– If a group reports that one or more of their members is not contributing fairly, please contact me by email in the first instance BEFORE SUBMISSION of the report.
• The report must be capable of being read on its own, i.e., it should not refer to the R program but just contain data/plots from the program’s output.
• Please save your report as a PDF file from Microsoft Word (FILE, Export, Create PDF/XPS). It must be written in clear comprehensible English with readable and well-labelled figures.
• Your report should be anonymous, i.e., there should be no mention of group members’ names anywhere in your submission.
2. An R program named analysis.R. Your R program should satisfy the following:
• It should be clearly laid out and well commented.
• You can assume that all data files are in the working directory, i.e., there should be no setwd() command or reference to directories.
• It may use non-standard packages (remember to include a library() command to load them) and you can assume that they are installed on my computer.
• It should create an output file named output.txt, containing only the statistics you use in your report. Your program may investigate other things but the output file should contain all the information you use in your report. The output file itself should not be included in your submission. Instead, I will run your program using the source() function in R to generate your output file.
• It should create a .pdf image file for each plot (or set of plots) that you use in your report, and no others. Name the image files fig1.pdf, fig2.pdf, . . . , following the same order in which they appear in your report. Do not submit the figures. They should be created when I use the source() function to run your program.
• It should be anonymous, i.e., there should be no mention of group members’ names anywhere in your submission.
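The output requirements above can be met with base R alone. The sketch below is a minimal illustration, assuming a single invented statistic and a single invented figure; a real analysis.R would write whatever statistics and plots the report actually uses.

```r
# Invented values for illustration only.
x <- c(2.1, 3.4, 1.9, 4.2, 3.3)

# Redirect printed output to output.txt in the working directory.
sink("output.txt")
cat("Mean of x:", round(mean(x), 3), "\n")
print(summary(x))
sink()  # restore printing to the console

# Open a PDF device, draw the first figure, then close the device so
# that the file is actually written to disk.
pdf("fig1.pdf", width = 6, height = 4)
hist(x, main = "Histogram of x", xlab = "x")
dev.off()
```

Because both files are created with bare file names, they appear in the working directory when the script is run with source(), and neither needs to be submitted.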