Statistical Models for Biology
BIO 415 and BIO 514
What is this class about?
• The application of statistical analysis to biological data
• An introduction to statistical inference
• A deduction is any statement about nature that derives from a theory (often via mathematical reasoning).
• An induction is any statement coming from observations that could change our theories about how nature works.
The structure of scientific reasoning
[Figure: Deduction leads from theories about the (unknown) real world to observations (data); induction (or inference) leads from observations back to theories.]
Deduction and induction applied to senescence
• Theory: The Rate of Living theory predicts an inverse relation between metabolic rate and lifespan.
• Observation: Large animals live longer, and they also have lower metabolic rates.
• Observation: Once body size is accounted for, metabolic rate is a poor predictor of lifespan.
• Theory: Evolutionary Senescence Theory predicts that animals with high predation threat senesce faster.
Evolutionary Senescence Theory can be tested by comparing populations of the same species living in different environments.
Deduction: Opossums under higher predation threat senesce faster. This conclusion is deduced from Evolutionary Senescence Theory.
[Figure: Virginia opossum. Mainland opossums have many predators; Sapelo Island opossums have few predators.]
Induction: Measure the average lifespan of 30 island and 30 mainland opossums.
• Mainland average lifespan: 48 months
• Island average lifespan: 54 months
Question 1
On the mainland (many predators), average lifespan is 48 months.
On the island (few predators), average lifespan is 54 months.
Based on these data, is lifespan shorter when predation threat is high?
A. Yes
B. No
Randomness in observations makes induction difficult
• A deterministic process always produces the same outcome (rare to non-existent in nature).
• A random process is not perfectly predictable (typical in nature).
The average lifespan of a sample of opossums results from a random process
• Even if Evolutionary Senescence Theory is incorrect, the island and mainland averages will not always be equal.
• Even if Evolutionary Senescence Theory is correct, the two averages will not always differ in the predicted direction.
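A short simulation can make the two bullets above concrete. This is a sketch under hypothetical assumptions, not the study's data: individual lifespans are drawn from a normal distribution with mean 50 months and SD 12 months (both values invented for illustration), with 30 opossums per site. The helper names `sample_average` and `repeated_differences` are likewise hypothetical.

```python
import random
import statistics

# Hypothetical model: lifespans (months) are normal with these invented values.
TRUE_MEAN = 50.0
TRUE_SD = 12.0
N_PER_SITE = 30  # sample size per site, as in the lecture's example


def sample_average(rng, n, mean, sd):
    """Average lifespan of n opossums drawn at random from the assumed model."""
    return statistics.fmean(rng.gauss(mean, sd) for _ in range(n))


def repeated_differences(n_repeats, effect=0.0, seed=1):
    """Island-minus-mainland differences in sample averages.

    effect is a hypothetical true extra lifespan (months) on the island;
    effect=0 means predation has no effect on senescence.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_repeats):
        island = sample_average(rng, N_PER_SITE, TRUE_MEAN + effect, TRUE_SD)
        mainland = sample_average(rng, N_PER_SITE, TRUE_MEAN, TRUE_SD)
        diffs.append(island - mainland)
    return diffs


# No true effect: the two averages still differ from one repeat to the next.
no_effect = repeated_differences(2000)
# A true 3-month effect (hypothetical): some repeats still show the island shorter.
with_effect = repeated_differences(2000, effect=3.0)

print("no effect, first five:", [round(d, 1) for d in no_effect[:5]])
print("true effect, fraction with island shorter:",
      sum(d < 0 for d in with_effect) / len(with_effect))
```

Under these assumptions the first line shows nonzero differences even though nothing separates the sites, and the second shows that a real effect is sometimes hidden by chance.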
A good answer to Question 1 can only be given using the methods of probability and statistics taught in this class.
In the presence of randomness…
• Deductions are statements about probability, i.e., quantitative measures of our certainty that an event will occur.
• Induction relies on statistics: the method of saying something about the real world based on observations influenced by randomness.
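One concrete form of induction is estimating a parameter, such as mean lifespan, together with a measure of how far randomness could have moved the estimate. The sketch below uses the standard error of the mean, s/√n. The 30 lifespans are simulated from a hypothetical normal model (mean 50 months, SD 12 months), not real opossum data, and `estimate_mean` is a hypothetical helper name.

```python
import math
import random
import statistics


def estimate_mean(sample):
    """Return the sample mean and its standard error, s / sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return mean, se


# Simulated (not real) lifespans for 30 opossums under invented model values.
rng = random.Random(42)
sample = [rng.gauss(50.0, 12.0) for _ in range(30)]
mean, se = estimate_mean(sample)
print(f"estimated mean lifespan: {mean:.1f} months (SE {se:.1f})")
```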
The structure of scientific reasoning
[Figure: Deduction (probability) leads from theories about the (unknown) real world to observations (data); induction or inference (statistics) leads from observations back to theories.]
Outline of statistical reasoning
Two hypotheses:
• Hypothesis 1: Predation has no effect on senescence.
• Hypothesis 2: Higher predation leads to faster senescence.
Should we reject Hypothesis 1 in favor of Hypothesis 2?
Question: Do animals that experience high levels of predation evolve shorter lifespans than animals that do not?
Answer it by studying the effect of predation on opossum lifespan: measure the difference in average lifespan between 30 mainland and 30 island opossums.
[Figure: Mainland opossums have many predators; Sapelo Island opossums have few predators.]
Before gathering data, make a deduction
• Step 1: Assume (temporarily) that the simple hypothesis (no effect of predation on senescence) is true.
• Step 2: Under this simple hypothesis, calculate the probabilities of all possible outcomes for the data.
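Steps 1 and 2 can be sketched with a Monte Carlo simulation, one of several ways to carry out this probability calculation; the lecture does not say which method produced its figures. All model inputs are hypothetical: lifespans are normal with mean 50 months and SD 12 months at both sites, 30 animals per site, and outcomes are grouped into 2-month bins.

```python
import math
import random
import statistics
from collections import Counter

# Step 1 (hypothetical model): predation has no effect, so island and mainland
# lifespans share ONE distribution. Mean 50, SD 12 months are invented values.
MEAN, SD, N = 50.0, 12.0, 30
BIN_WIDTH = 2.0  # months per outcome bin


def simulate_null_differences(n_sims, seed=0):
    """Step 2: island-minus-mainland differences in average lifespan,
    simulated many times under the no-effect hypothesis."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_sims):
        island = statistics.fmean(rng.gauss(MEAN, SD) for _ in range(N))
        mainland = statistics.fmean(rng.gauss(MEAN, SD) for _ in range(N))
        diffs.append(island - mainland)
    return diffs


def outcome_probabilities(diffs, width=BIN_WIDTH):
    """Approximate probability of each outcome, keyed by lower bin edge."""
    counts = Counter(math.floor(d / width) * width for d in diffs)
    return {edge: c / len(diffs) for edge, c in sorted(counts.items())}


null_diffs = simulate_null_differences(10_000)
for edge, p in outcome_probabilities(null_diffs).items():
    print(f"[{edge:+5.1f}, {edge + BIN_WIDTH:+5.1f}) months: {p:.3f}")
```

The printed table is a rough text version of a probability distribution of possible outcomes, centered near zero because no effect was assumed.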
[Figure: Results of probability calculations (assuming no difference in lifespan). The horizontal axis runs from "Mainland opossums live longer" to "Island opossums live longer".]
We gather data: average island lifespan is 6 months longer than average mainland lifespan.
[Figure: The same probability distribution, with the observed lifespan difference (6 months) marked.]
If there is no difference in senescence, what is the probability of seeing a difference this great or greater?
The shaded area (the part of the distribution at or beyond the observed difference) gives the probability.
The probability is quite low: only 2.9%
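The shaded area can also be computed directly. This sketch assumes each site's average lifespan is approximately normal, so under no effect the difference of two averages has mean 0 and standard error σ√(2/n). The slides do not report σ, so the 12 months used here is a hypothetical value, and the output should not be read as reproducing the 2.9% figure; the function name `upper_tail_probability` is also illustrative.

```python
import math


def upper_tail_probability(observed_diff, sd, n_per_group):
    """P(difference >= observed_diff) when there is no true difference.

    Assumes the difference of two independent group means is normal with
    mean 0 and standard error sd * sqrt(2 / n_per_group).
    """
    se = sd * math.sqrt(2.0 / n_per_group)
    z = observed_diff / se
    # Area of the standard normal beyond z: the "shaded area".
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# sd = 12 months is a hypothetical value; the lecture does not give one.
p = upper_tail_probability(observed_diff=6.0, sd=12.0, n_per_group=30)
print(f"P(difference >= 6 months | no effect) = {p:.3f}")
```

A larger assumed SD widens the distribution and enlarges the shaded area, so the conclusion depends on that deduction step.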
A difference of 6 months is a surprising result if there is no true difference in senescence.
• How surprising? There is only a 2.9% chance of a difference this great or greater.
• If this is too surprising, you can reject the hypothesis of no difference in favor of the hypothesis of longer island lifespans.
• You might be wrong!
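The decision above can be sketched as a rule: reject "no difference" when the tail probability (the p-value) falls below a chosen threshold α. The threshold of 0.05 is a common convention assumed here, not one stated in the lecture, and the test treats the lifespan SD as known (a one-sided z-test) to keep the code short; model values are hypothetical. Simulating many studies in which predation truly has no effect shows how often the rule is wrong (a Type I error): about a fraction α of the time.

```python
import math
import random
import statistics

ALPHA = 0.05  # conventional threshold, assumed for illustration


def one_sided_p_value(island, mainland, sd):
    """Upper-tail p-value for 'island mean > mainland mean', treating the
    lifespan SD as known (z-test; an assumption made for brevity)."""
    diff = statistics.fmean(island) - statistics.fmean(mainland)
    se = sd * math.sqrt(1.0 / len(island) + 1.0 / len(mainland))
    return 0.5 * math.erfc((diff / se) / math.sqrt(2.0))


def reject_no_effect(p_value, alpha=ALPHA):
    """Reject 'no effect' in favor of 'island lives longer' if p < alpha."""
    return p_value < alpha


def false_rejection_rate(n_sims=4000, mean=50.0, sd=12.0, n=30, seed=3):
    """Fraction of simulated studies that reject 'no effect' even though
    both sites truly share the same mean lifespan (hypothetical values)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        island = [rng.gauss(mean, sd) for _ in range(n)]
        mainland = [rng.gauss(mean, sd) for _ in range(n)]
        if reject_no_effect(one_sided_p_value(island, mainland, sd)):
            rejections += 1
    return rejections / n_sims


print(reject_no_effect(0.029))  # the slide's 2.9% falls below 0.05
print(false_rejection_rate())   # near ALPHA: wrongly rejects about 5% of the time
```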
Two key points
• Statistical inference is not possible without first carrying out a deductive probability calculation.
• No statistical inference is completely certain. Randomness might lead us to a wrong conclusion.
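Randomness can also mislead in the other direction: a real effect can go undetected (a Type II error). The sketch below computes the probability of failing to reject "no difference" when island opossums truly live `effect` months longer, for a one-sided z-test with a known SD. The SD of 12 months, 30 animals per site, and α = 0.05 are hypothetical values, not results from the study, and `miss_probability` is an illustrative name.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def miss_probability(effect, sd, n_per_group, alpha=0.05):
    """P(fail to reject 'no difference') when islands truly add `effect` months.

    One-sided z-test with a known SD (an assumption made for simplicity).
    """
    se = sd * math.sqrt(2.0 / n_per_group)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)  # about 1.645 when alpha = 0.05
    # Under the true effect, the test statistic is normal with mean effect / se
    # and SD 1; we fail to reject whenever it lands below z_crit.
    return STD_NORMAL.cdf(z_crit - effect / se)


for effect in (0.0, 3.0, 6.0, 9.0):
    miss = miss_probability(effect, sd=12.0, n_per_group=30)
    print(f"true effect {effect:.0f} months: P(miss) = {miss:.2f}")
```

Under these assumptions a true effect of 3 months would be missed most of the time, while larger effects are missed less often.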
Summary
• Statistics uses observations of random phenomena to make inferences about the real world.
• Statistics requires first deducing probabilities on the basis of some theory about the real world.
• No statistical inference is 100% certain.
Course objectives and administration
Learning goals
• Use probability and statistics to reason about the natural world.
• Perform statistical analyses using the software environment R.
• Use data to estimate parameters that characterize biological phenomena of interest.
• Perform rigorous tests of hypotheses about parameters.
• Choose the most appropriate statistical method for a data analysis problem.
• Write complete and concise reports of the results of statistical analyses.
• Recognize and avoid practices that lead to inaccurate statistical claims.
Course components
• Lectures: STPV 324; recordings will be posted on Canvas.
• Labs: LSE 236
Class communication
• Email questions to me or TAs
• Office hours in person or via Zoom
• Canvas:
  • Receive and turn in assignments
  • Take quizzes and final exam
  • View lecture slides and recordings
Exams and quizzes
• Five online quizzes throughout the semester.
• One online final exam.
• Two take-home exams (only one for BIO 514).
Independent project (BIO 514 only)
• Statistically analyze data of your choice to answer a biological question.
• Apply the skills learned in class.
• Mostly done in the latter half of the semester.
Recommended textbook
• The Analysis of Biological Data, Whitlock & Schluter. Any edition is OK (1st, 2nd, or 3rd).
• Reading assignments posted on Canvas.
• Two copies on reserve in Noble.
Assessment: BIO 415
Component            Points
Lab reports          400 (40 per report)
Quizzes              200 (40 per quiz)
Take-home exam 1     125
Take-home exam 2     125
Final exam           100
Polls                50 (~0.5 per question)
Total                1000
Assessment: BIO 514
Component            Points
Lab reports          400 (40 per report)
Quizzes              200 (40 per quiz)
Take-home exam 1     110
Independent project  150
Final exam           90
Polls                50 (~0.5 per question)
Total                1000
R: A software environment for statistics and graphics
• Advantages of R
  • Free.
  • Works on all platforms (Mac, Windows, Linux…).
  • Can do virtually any statistical procedure.
  • Powerful graphics.
  • Script-based: easy to save, edit, and reuse analyses.
• Challenges of R
  • Script-based: must learn the R language.
Office hours
• Can attend in person (ISTB1 304)…
• …or via Zoom (link on Canvas site)
• See Canvas for specific dates and times
• TAs will also hold regular office hours
Things to do by next week
• Read the syllabus (and take the quiz).
• Register for iClicker.
• Download and install R and RStudio.
• Complete the first day survey.
• Labs start next week.