The Chi-squared distribution
We have seen that body height, for example, tends to follow a normal distribution. The functions that generate random numbers from such a distribution, calculate quantiles, and calculate probabilities are rnorm(), qnorm(), and pnorm(), respectively. For the Chi-squared distribution, we use rchisq(), qchisq(), and pchisq().
Different probability distributions require different parameters, e.g. the normal distribution requires a mean and a standard deviation, the Chi-squared distribution only requires the degrees of freedom. To illustrate this, plot a histogram of 1000 random numbers that follow a Chi-squared distribution with one degree of freedom.
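As a minimal sketch of the r/q/p naming pattern and of the different parameters each distribution needs, the snippet below uses made-up values (a mean of 170 cm, a standard deviation of 10 cm, and 4 degrees of freedom are assumptions chosen purely for illustration):

```r
set.seed(42)  # arbitrary seed, only so the random draws are reproducible

# r*: random numbers. The normal distribution needs a mean and a standard
# deviation (170 and 10 are made-up body-height values in cm) ...
heights <- rnorm(5, mean = 170, sd = 10)
# ... whereas the Chi-squared distribution needs only the degrees of freedom.
chi_draws <- rchisq(5, df = 4)

# q*: the quantile that belongs to a given cumulative probability.
q_norm <- qnorm(0.975, mean = 0, sd = 1)
q_chi  <- qchisq(0.975, df = 4)

# p*: the cumulative probability up to a given value (the inverse of q*).
p_norm <- pnorm(q_norm, mean = 0, sd = 1)
p_chi  <- pchisq(q_chi, df = 4)

print(c(q_norm = q_norm, p_norm = p_norm, q_chi = q_chi, p_chi = p_chi))
```

Feeding a quantile from q*() back into the matching p*() function returns the probability you started with, which is a quick way to check that you are using the right pair.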
In a Chi-squared distribution with 3 degrees of freedom, between which two values will the central 90 % of the values fall? Note that this question is analogous to the ones we had in earlier labs using the normal distribution!
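The general approach for a central interval can be sketched as follows: cut off equal probability in each tail and ask the q*() function for the two limits. The example assumes a standard normal distribution and, as an arbitrary illustration, a Chi-squared distribution with 10 degrees of freedom.

```r
# Central 90 % interval: leave 5 % in each tail.
probs <- c(0.05, 0.95)

# Standard normal distribution: the limits are symmetric around 0.
normal_limits <- qnorm(probs, mean = 0, sd = 1)

# Chi-squared distribution (df = 10 is an arbitrary choice for illustration).
chisq_limits <- qchisq(probs, df = 10)

# Check: the probability mass between the two limits should be 0.9.
coverage <- diff(pchisq(chisq_limits, df = 10))

print(normal_limits)
print(chisq_limits)
print(coverage)
```

Unlike the normal limits, the Chi-squared limits are not symmetric around the mean, because the distribution is skewed and cannot take negative values.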
Remember the Poisson distribution?
The Poisson distribution requires only one parameter, ‘lambda’, which represents both the mean and the variance at the same time. Using rpois(), create 10 random numbers that follow a Poisson distribution with lambda = 3.
What is the characteristic of the variable you have just created? (Go back to the week two lecture slides and check the types of variables)
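The statement that lambda is both the mean and the variance can be checked by simulation. This is only a sketch: the sample size of 10000 and lambda = 7 are arbitrary choices, and with a small sample the two estimates can differ noticeably from lambda.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# A large Poisson sample (lambda = 7 is an arbitrary illustration).
x <- rpois(10000, lambda = 7)

# Both estimates should be close to lambda.
sample_mean <- mean(x)
sample_var  <- var(x)

print(c(mean = sample_mean, variance = sample_var))
```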
Is a value of 23 or larger a rare occurrence in a population of Poisson-distributed numbers with lambda = 20? Calculate _how_ rare it is and comment.
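A tail probability of the form "k or larger" needs some care with a discrete distribution, because ppois(k, ...) already includes k itself. The sketch below uses illustrative values (lambda = 5 and k = 10 are assumptions, not the numbers from the question) and computes the same probability in three ways.

```r
lambda <- 5   # illustrative value
k      <- 10  # illustrative threshold

# P(X >= k) = 1 - P(X <= k - 1). Note the k - 1: ppois(k) would include k.
p_tail_a <- 1 - ppois(k - 1, lambda = lambda)

# The same probability using the upper tail directly.
p_tail_b <- ppois(k - 1, lambda = lambda, lower.tail = FALSE)

# Cross-check by summing the individual probabilities from k upwards
# (200 is far enough into the tail that the remainder is negligible here).
p_tail_c <- sum(dpois(k:200, lambda = lambda))

print(c(p_tail_a, p_tail_b, p_tail_c))
```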
The Chi-squared test (1)
This test is applied when your response variable is a COUNT and you have one CATEGORICAL explanatory variable with two or more categories (levels).
The null hypothesis H₀ states that there is no association (or correlation) between the variables.
The test relies on comparing the observed counts in each category with the expected (modelled) counts. The modelled counts for each cell of the table are calculated as per formula in the slides.
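As a sketch of how the expected counts can be obtained in R, the example below uses the standard rule "row total times column total divided by grand total", which is assumed here to match the formula in the slides. The counts are hypothetical and are not the data from this lab.

```r
# Hypothetical observed counts: 2 groups x 2 outcomes.
observed <- matrix(c(30, 20,
                     10, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(group   = c("A", "B"),
                                   outcome = c("yes", "no")))

# Expected count per cell = row total * column total / grand total.
# outer() builds the table of all row-total x column-total products.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

print(observed)
print(expected)
```

The expected table has the same row and column totals as the observed table; only the distribution of counts within the table differs.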
You want to find out whether sex relates to the number of bullying cases in a company. You collected the following data and summarised it in a contingency table:
sex             bullied   not bullied   row totals
-------------   -------   -----------   ----------
MALE                 23           589          612
FEMALE                9           321          330
COLUMN TOTALS        32           910          942
Create a data frame based on this contingency table. Note that this does _not_ correspond to the long format we normally use (for good reasons).
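The difference between the two layouts can be sketched with a small hypothetical data set (the group names, outcome names, and counts below are made up). The wide layout mirrors the contingency table, the long layout has one row per combination, and xtabs() converts the long layout back into a table.

```r
# Wide (contingency-table) layout: one row per group, one column per outcome.
wide <- data.frame(group = c("A", "B"),
                   yes   = c(30, 10),
                   no    = c(20, 40))

# The count columns as a matrix, the form that chisq.test() accepts.
counts <- as.matrix(wide[, c("yes", "no")])
rownames(counts) <- wide$group

# The same data in long format: one row per group/outcome combination.
long <- data.frame(group   = rep(wide$group, times = 2),
                   outcome = rep(c("yes", "no"), each = nrow(wide)),
                   count   = c(wide$yes, wide$no))

# xtabs() turns the long format back into a contingency table.
tab <- xtabs(count ~ group + outcome, data = long)

print(counts)
print(long)
print(tab)
```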
Now perform a Chi-squared test, once manually (using R), and once using the appropriate R function. Refer to the slides for the formula to calculate the Chi-squared value.
Compare your results and interpret them. Conclude with a full sentence reporting the test statistic, degrees of freedom and the associated _p_-value of your test.
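When comparing a manual calculation with chisq.test(), keep in mind that for 2 x 2 tables the function applies Yates' continuity correction by default, so the two results only agree if you set correct = FALSE. The sketch below shows this on hypothetical counts, using the standard Pearson statistic (the sum of (observed - expected)^2 / expected over all cells), which is assumed to be the formula given in the slides.

```r
# Hypothetical 2 x 2 table of observed counts (not the data from this lab).
observed <- matrix(c(30, 20,
                     10, 40),
                   nrow = 2, byrow = TRUE)

# Manual calculation.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
chi_sq   <- sum((observed - expected)^2 / expected)
df       <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value  <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in test without the continuity correction: matches the manual result.
res_plain <- chisq.test(observed, correct = FALSE)

# Built-in test with the default settings: Yates' correction is applied.
res_yates <- chisq.test(observed)

print(c(manual = chi_sq, df = df, p = p_value))
print(res_plain)
print(res_yates)
```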
The Chi-squared test (2)
Red crossbills (_Loxia curvirostra_) have crossed bills that act like diagonal pliers, enabling them to spread the scales of conifer cones and extract the seeds with their tongue. The chirality (or ‘handedness’) of the bill-crossing is not uniform: the tip of the upper bill can lie either to the right or to the left of the lower bill (https://en.wikipedia.org/wiki/Red_crossbill).
You hypothesise that right- and left-billed birds occur at a 1:1 ratio and conduct a field survey during which you observe 1732 right-billed and 1865 left-billed crossbills. In this example, you only have a single binary variable, so your null hypothesis will simply be ‘left- and right-billed birds occur at a 1:1 ratio’. Are there significantly more left-billed crossbills? Construct a contingency table, conduct the test, and interpret the result in a complete sentence.
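With a single categorical variable, chisq.test() can be given a plain vector of counts; by default it then tests against equal probabilities for all categories. The sketch below uses hypothetical counts and category names, and repeats the calculation by hand as a check.

```r
# Hypothetical counts for one variable with two categories.
counts <- c(typeA = 480, typeB = 520)

# For a vector of counts, the default null hypothesis is equal proportions.
res <- chisq.test(counts)

# Manual check: under a 1:1 ratio each category expects half of the total.
expected <- sum(counts) * c(0.5, 0.5)
chi_sq   <- sum((counts - expected)^2 / expected)
p_value  <- pchisq(chi_sq, df = length(counts) - 1, lower.tail = FALSE)

print(res)
print(c(manual = chi_sq, p = p_value))
```

The test itself has no direction: it only tells you whether the counts deviate from the expected ratio, and you read the direction of the deviation from the counts.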
A real world scenario
You are studying a viral disease. You have collected information on how many males and females got infected in a sample of 4138 people. You also collected information on the viral count in blood samples (a continuous variable) of those that got infected (males and females).
In this study, what question could you answer using a Chi-squared test?
What question could you answer using a two-sample (t- or Wilcoxon) test?
What have I learnt from this lab?
I got that, what’s more?
For the crossbill example above, imagine a different scenario, in which you have reason to believe that the right/left crossing arrangement occurs at a 1:2 ratio, and in your survey you count 344 right-billed and 656 left-billed specimens. Consult the chisq.test() help file and read through the list of arguments to work out how to incorporate your prior beliefs.
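As a sketch of testing against unequal expected proportions, the example below uses a different, hypothetical case with three categories and an assumed 1:2:1 ratio (the category names and counts are made up). The expected proportions go into the p argument of chisq.test().

```r
# Hypothetical counts for three categories, expected at a 1:2:1 ratio.
counts <- c(AA = 240, Aa = 510, aa = 250)
ratio  <- c(1, 2, 1)

# p must sum to 1, so convert the ratio into proportions ...
res <- chisq.test(counts, p = ratio / sum(ratio))

# ... or let chisq.test() do the rescaling.
res_rescaled <- chisq.test(counts, p = ratio, rescale.p = TRUE)

print(res$expected)
print(res)
```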
Draw two simulated Poisson distributions (as histograms): one for counting cars that pass Queen Street every minute between 2 and 3 am, and one for counting cars that pass the Harbour Bridge northbound every minute between 5 and 6 pm. What observations do you make?
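One possible starting point is sketched below. The two lambda values (2 and 60 cars per minute) are pure guesses made for illustration, so substitute rates you find plausible.

```r
set.seed(7)  # arbitrary seed, only for reproducibility

# Assumed rates in cars per minute; both values are guesses.
quiet <- rpois(1000, lambda = 2)   # quiet street at night
busy  <- rpois(1000, lambda = 60)  # busy bridge at rush hour

# Two histograms side by side, with one bar per integer count.
op <- par(mfrow = c(1, 2))
hist(quiet, breaks = seq(-0.5, max(quiet) + 0.5, by = 1),
     main = "lambda = 2", xlab = "cars per minute")
hist(busy, breaks = seq(min(busy) - 0.5, max(busy) + 0.5, by = 1),
     main = "lambda = 60", xlab = "cars per minute")
par(op)  # restore the previous plot layout
```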