2. Run the code in the ForProb2.R file associated with this HW. In this simulation:
(a) How many genes, on average, are differentially expressed between the treated samples (treatedsamps) and the untreated samples (untreatedsamps)?
(b) Check the dispersion estimates and any warnings from DESeq. Are they what you expect? Why or why not? Produce a plot of the dispersion estimates using the plotDispEsts command, as in lecture.
(c) Now run the code many times to obtain many estimates of sensitivity and specificity, and produce a boxplot of these estimates.
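For part (c), each simulation run yields one sensitivity and one specificity value, computed from which genes were simulated as differentially expressed and which ones DESeq calls significant. The sketch below shows that bookkeeping in Python rather than R. It assumes a gene is "called" when its (adjusted) p-value is below a cutoff `alpha` (0.05 here is an assumption, not taken from ForProb2.R), and it treats a missing p-value (`None`, playing the role of R's `NA`) as "not called". The names `sens_spec`, `is_true_de`, and `boxplot_stats` are hypothetical.

```python
import statistics


def sens_spec(is_true_de, pvalues, alpha=0.05):
    """Return (sensitivity, specificity) for one simulation run.

    is_true_de: list of bools, True if the gene was simulated as DE.
    pvalues: p-values in the same gene order; None means untested (NA in R).
    alpha: assumed significance cutoff for calling a gene DE.
    """
    tp = fn = fp = tn = 0
    for truth, p in zip(is_true_de, pvalues):
        called = p is not None and p < alpha  # untested genes are not called DE
        if truth and called:
            tp += 1  # true positive
        elif truth:
            fn += 1  # missed a truly DE gene
        elif called:
            fp += 1  # called a null gene DE
        else:
            tn += 1  # correctly left a null gene alone
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


def boxplot_stats(values):
    """Min, lower quartile, median, upper quartile, max of repeated estimates.

    Uses linear-interpolation quartiles, which can differ slightly from the
    hinges that R's boxplot() draws.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


if __name__ == "__main__":
    truth = [True, True, False, False, False]
    pvals = [0.01, 0.2, 0.03, 0.5, None]
    print(sens_spec(truth, pvals))
```

In use, you would call `sens_spec` once per run, collect the sensitivities and specificities in two lists, and summarize (or plot) each list. For the toy input above, one of the two true DE genes is called (sensitivity 0.5) and one of the three null genes is falsely called (specificity 2/3).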
3. Recall that the RNA-seq data come from a comparative analysis of two mouse strains, C57BL/6J versus DBA/2J.
(a) Check for the presence of overdispersion relative to the Poisson distribution. Report results as you see fit, e.g. with a figure or a table with explanations and/or captions.
(b) Is the overdispersion caused by excess zeroes? If so, produce a plot or table to demonstrate this point.
(c) Examine the pdata variable and note that experiment.number indicates an experimental batch, which may affect your results in the previous parts.
(d) Construct a new design matrix that includes factor variables to handle experiment.number. How many factor variables are necessary?
(f) Rerun edgeR with a model that includes the experiment.number factor variables as well as the strain covariate. Using a threshold of p < 0.01, how many genes show a significant strain effect? Choose a gene that was significant in the strain-only model but NOT significant for strain after adjusting for experiment.number, and use an informative figure to show and explain what may be happening in this scenario.
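For parts (a) and (b) of question 3, two per-gene moment checks are a natural starting point. Under a Poisson model the variance equals the mean, so a variance-to-mean ratio well above 1 signals overdispersion; under a negative binomial model the variance is `m + phi * m**2`, which gives a method-of-moments dispersion `phi = (v - m) / m**2`. For zeros, a Poisson with mean `m` predicts a zero fraction of `exp(-m)`, and a negative binomial predicts `(1 + phi * m) ** (-1 / phi)`. The Python sketch below (the homework itself uses R) computes these for one gene. It assumes equal library sizes; with real data you would first scale counts by size factors. The function name and dictionary keys are hypothetical.

```python
import math
import statistics


def overdispersion_summary(counts):
    """Moment-based Poisson checks for one gene's counts across samples.

    Assumes equal library sizes (normalize by size factors first in practice).
    Needs at least two samples for the sample variance.
    """
    m = statistics.fmean(counts)
    v = statistics.variance(counts)  # sample variance, n - 1 denominator
    ratio = v / m if m > 0 else float("nan")
    # NB: Var = m + phi * m^2  =>  phi = (v - m) / m^2, clipped at 0 (no overdispersion)
    phi = max((v - m) / m ** 2, 0.0) if m > 0 else float("nan")
    obs_zero = sum(c == 0 for c in counts) / len(counts)
    pois_zero = math.exp(-m)  # P(X = 0) for Poisson(m)
    if m > 0 and phi > 0:
        nb_zero = (1 + phi * m) ** (-1 / phi)  # P(X = 0) for NB(mean m, dispersion phi)
    else:
        nb_zero = pois_zero  # phi = 0 reduces NB to Poisson
    return {
        "mean": m,
        "variance": v,
        "var_to_mean": ratio,
        "nb_dispersion": phi,
        "obs_zero_frac": obs_zero,
        "poisson_zero_frac": pois_zero,
        "nb_zero_frac": nb_zero,
    }


if __name__ == "__main__":
    print(overdispersion_summary([0, 0, 10, 20]))
```

Applied gene by gene, these values can be plotted (for example, variance against mean on log scales with the Poisson line `variance = mean`) or tabulated. One way to probe whether zeros drive the overdispersion is to compare the variance-to-mean ratios of genes that have no zero counts against those that do: if genes without any zeros are still overdispersed, zeros alone cannot explain it.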
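For part (d), the standard way to encode a categorical batch variable in a regression design is treatment (reference) coding: with an intercept in the model, a factor with k levels needs k - 1 indicator columns, because the first level is absorbed into the intercept. The Python sketch below builds such a matrix with the same column layout that R's `model.matrix(~ strain + factor(batch))` would produce, taking the first sorted level of each factor as the baseline. The sample labels, batch values, and the `design_matrix` name are hypothetical and do not come from pdata.

```python
def design_matrix(strain, batch):
    """Treatment-coded design: intercept, strain indicators, batch indicators.

    The first sorted level of each factor is the baseline (absorbed by the
    intercept), so a factor with k levels contributes k - 1 columns.
    """
    strain_levels = sorted(set(strain))
    batch_levels = sorted(set(batch))
    columns = ["(Intercept)"]
    columns += [f"strain{lvl}" for lvl in strain_levels[1:]]
    columns += [f"batch{lvl}" for lvl in batch_levels[1:]]
    rows = []
    for s, b in zip(strain, batch):
        row = [1]  # intercept
        row += [int(s == lvl) for lvl in strain_levels[1:]]  # non-baseline strains
        row += [int(b == lvl) for lvl in batch_levels[1:]]  # non-baseline batches
        rows.append(row)
    return columns, rows


if __name__ == "__main__":
    cols, X = design_matrix(
        ["C57BL/6J", "C57BL/6J", "DBA/2J", "DBA/2J"],  # hypothetical samples
        [1, 2, 1, 3],  # hypothetical batch labels
    )
    print(cols)
    for row in X:
        print(row)
```

With three hypothetical batch labels, the sketch produces two batch columns. If every sample of one strain came from a single batch, the strain and batch columns would be collinear and the strain effect could not be separated from the batch effect, which is one thing to look for in part (f).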