
Introduction
(Module 1)
Statistics (MAST20005) & Elements of Statistics (MAST90058) Semester 2, 2022
1 Subject information


2 Review of probability
3 Descriptive statistics
4 Basic data visualisations
Aims of this module
• Brief information about this subject
• Brief revision of some prerequisite knowledge (probability)
• Introduce some basic elements of statistics, data analysis and visualisation
1 Subject information
What is statistics?
Let’s see some examples. . .
• Weather forecasts: Bureau of Meteorology
• Poll aggregation: FiveThirtyEight, The Guardian
• Climate change modelling: Australian Academy of Science
• Discovery of the Higgs boson (the ‘God Particle’): van Dyk (2014)
• Smoking leads to lung cancer: Doll & Hill (1945)
• A/B testing for websites: Google and 41 shades of blue
Wei’s examples
• Astronomical time-series classification
• Causal inference
Damjan’s examples
• Genome-wide association studies
• Web analytics
• Skin texture image analysis
• Wedding ‘guestimation’

Goals of statistics
• Answer questions using data
• Evaluate evidence
• Optimise study design
• Make decisions
And, importantly:
• Clarify assumptions
• Quantify uncertainty
Why study statistics?
“The best thing about being a statistician is that you get to play in everyone’s backyard.” — John W. Tukey (1915–2000)
“I keep saying the sexy job in the next ten years will be statisticians…The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades…” — Hal Varian, Google’s Chief Economist, Jan 2009
The best job
U.S. News Best Business Jobs in 2022:
1. Medical and Health Services Manager
2. Financial Manager
3. Statistician
CareerCast (recruitment website) best jobs of 2021:
1. Data Scientist
2. Genetic Counselor
3. Statistician
Subject overview
Statistics (MAST20005), Elements of Statistics (MAST90058)
These subjects introduce the basic elements of statistical modelling, statistical computation and data analysis. They demonstrate that many commonly used statistical procedures arise as applications of a common theory. They are an entry point to further study of both mathematical and applied statistics, as well as broader data science.
Students will develop the ability to fit statistical models to data, estimate parameters of interest and test hypotheses. Both classical and Bayesian approaches will be covered. The importance of the underlying mathematical theory of statistics and the use of modern statistical software will be emphasised.
Joint teaching
MAST20005 and MAST90058 share the same lectures but have separate tutorials and lab classes. The teaching and assessment material for both subjects will overlap significantly.
Subject website (LMS)
• Full information is on the subject website, available through the Learning Management System (LMS).
• Only a brief overview is covered in these notes. Please read all of the info on the LMS as well.
• New material (e.g. problem sets, assignments, solutions) and announcements will appear regularly on the LMS.

• This subject introduces basic statistical computing and programming skills.
• We make extensive use of the R statistical software environment.
• Knowledge of R will be essential for some of the tutorial problems, assignment questions and will also be examined.
• We will use the RStudio program as a convenient interface with R.
Staff contacts
Subject coordinator / Lecturer: Dr …
Lecturer: Dr …
Tutorial coordinator: Dr Rekha …
See the LMS for details of consultation times
Discussion forum
• Access via the LMS
• Post any general questions on the forum
• Do not send them by email to staff
• You can answer each others’ questions
• Staff will also help to answer questions
Student representatives
Student representatives assist the teaching staff to ensure good communication and feedback from students. See the LMS to find the contact details of your representatives.
What is Data Science?
Data science is a ‘team sport’
Read more at: Data science is inclusive

How to succeed in statistics / data science?
• Get experience with real data
• Develop your computational skills, learn R
• Understand the mathematical theory
• Collaborate with others, use the discussion forum
This subject is challenging
• It is mathematical
– Manipulating equations
– Calculus
– Probability
• But the ‘real’ world also matters
– Context can ‘trump’ mathematics
– More than one correct answer
– Often uncertain about the answer
In 2017: 341 students
60% Bachelor of Commerce
24% Bachelor of Science
6% Master of Science (Bioinformatics)
10% 8 other degrees/categories
What are your strengths and weaknesses?
Get extra help
• Your classmates
• Discussion forum
• Consultation times
• Textbooks
1. Log in to the discussion forum
2. Install RStudio on your computer
3. Start reading the R introduction and reference guide before week 2
The best way to learn statistics is by solving problems and ‘getting your hands dirty’ with data.
We encourage you to attend all lectures, tutorials and computer labs to get as much practice and feedback as possible. Good luck!

2 Review of probability
Why probability?
• It forms the mathematical foundation for statistical models and procedures
• Let’s review what we know already…
Random variables (notation)
• Random variables (rvs) are denoted by uppercase letters: X, Y , Z, etc.
• Outcomes, or realisations, of random variables are denoted by corresponding lowercase letters: x, y, z, etc.
Distribution functions
• The cumulative distribution function (cdf) of X is
F(x) = Pr(X ≤ x),  −∞ < x < ∞
• Pr(X > x) = 1 − F(x) is called a tail probability of X
• F(x) increases to 1 as x → ∞ and decreases to 0 as x → −∞
• If the rv has a certain distribution with pdf f (or pmf p), we write
X ∼ f  (or X ∼ p)
Example: Unemployment duration
A large group of individuals have recently lost their jobs. Let X denote the length of time (in months) that any particular individual will stay unemployed. It was found that this was well-described by the following pdf:
f(x) = (1/2) e^(−x/2),  x ≥ 0,
f(x) = 0,  otherwise.

Clearly, f(x) ≥ 0 for any x and the total area under the pdf is:
Pr(−∞ < X < ∞) = ∫₀^∞ (1/2) e^(−x/2) dx = (1/2) [−2 e^(−x/2)]₀^∞ = 1.

The probability that a person in the population finds a new job within 3 months is:

Pr(0 ≤ X ≤ 3) = ∫₀³ (1/2) e^(−x/2) dx = (1/2) [−2 e^(−x/2)]₀³ = 1 − e^(−3/2) ≈ 0.777.

Example: Received calls

The number of calls received by an office in a given day, X, is well-represented by a pmf with the following expression:

p(x) = e^(−5) 5^x / x!,  x ∈ {0, 1, 2, ...},

where x! = 1 · 2 ··· (x − 1) · x and 0! = 1. For example,

Pr(X = 1) = e^(−5) · 5 = 0.03368
Pr(X = 3) = e^(−5) 5³ / (3 · 2 · 1) = 0.1403

To show that p(x) is a pmf we need to show that Σ_{x=0}^∞ p(x) = p(0) + p(1) + p(2) + ··· = 1. Since the Taylor series expansion of e^z is Σ_{i=0}^∞ z^i / i!, we can write

Σ_{x=0}^∞ e^(−5) 5^x / x! = e^(−5) e^5 = 1.

Moments and variance

• The expected value (or the first moment) of a rv is denoted by E(X) and

E(X) = Σ_x x p(x)  (discrete rv)
E(X) = ∫_{−∞}^∞ x f(x) dx  (continuous rv)

• Higher moments, μ_k = E(X^k), for k ≥ 1, can be obtained by

E(X^k) = Σ_x x^k p(x)  (discrete rv)
E(X^k) = ∫_{−∞}^∞ x^k f(x) dx  (continuous rv)

• More generally, for a function g(x) we can compute

E(g(X)) = Σ_x g(x) p(x)  (discrete rv)
E(g(X)) = ∫_{−∞}^∞ g(x) f(x) dx  (continuous rv)

Letting g(x) = x^k gives the moments.

• The variance of X is defined by

var(X) = E{(X − E(X))²}

and the standard deviation of X is sd(X) = √var(X)

• “Computational” formula: var(X) = E(X²) − {E(X)}²

Basic properties of expectation and variance

• For any rv X and constant c,

E(cX) = c E(X),  var(cX) = c² var(X)

• For any two rvs X and Y,

E(X + Y) = E(X) + E(Y)

• For any two independent rvs X and Y,

var(X + Y) = var(X) + var(Y)

• More generally,

var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

where cov(X, Y) is the covariance between X and Y

Covariance

• Definition of covariance:

cov(X, Y) = E{(X − E(X))(Y − E(Y))}

• Specifically, for the continuous case,

cov(X, Y) = ∫_{−∞}^∞ ∫_{−∞}^∞ (x − E(X))(y − E(Y)) f(x, y) dx dy

where f(x, y) is the bivariate pdf for the pair (X, Y).

• “Computational” formula: cov(X, Y) = E{(X − E(X))(Y − E(Y))} = E(XY) − E(X) E(Y)

• If cov(X, Y) > 0 then X and Y are positively correlated
• If cov(X, Y) < 0 then X and Y are negatively correlated
• If cov(X, Y) = 0 then X and Y are uncorrelated

Correlation

• The correlation between X and Y is defined as:

ρ = cor(X, Y) = cov(X, Y) / (sd(X) sd(Y)),  −1 ≤ ρ ≤ 1

• When ρ = ±1 then X and Y are perfectly correlated

Moment generating functions

• A moment generating function (mgf) of a rv X is

M_X(t) = E(e^(tX)),  t ∈ (−∞, ∞)

• It enables us to generate moments of X by differentiating at t = 0:

M′_X(0) = E(X),  M_X^(k)(0) = E(X^k),  k ≥ 1

• The mgf uniquely determines a distribution. Hence, knowing the mgf is the same as knowing the distribution.
• If X and Y are independent rvs,

M_{X+Y}(t) = E{e^(t(X+Y))} = E{e^(tX)} E{e^(tY)} = M_X(t) M_Y(t)

i.e. the mgf of the sum is the product of individual mgfs.

Bernoulli distribution

• X takes on the values 1 (success) or 0 (failure)
• X ∼ Be(p) with pmf

p(x) = p^x (1 − p)^(1−x),  x ∈ {0, 1}

• Properties:

E(X) = p
var(X) = p(1 − p)
M_X(t) = p e^t + 1 − p

Binomial distribution

• X ∼ Bi(n, p) with pmf

p(x) = (n choose x) p^x (1 − p)^(n−x),  x ∈ {0, 1, ..., n}

• Properties:

E(X) = np
var(X) = np(1 − p)
M_X(t) = (p e^t + 1 − p)^n

Poisson distribution

• X ∼ Pn(λ) with pmf

p(x) = e^(−λ) λ^x / x!,  x ∈ {0, 1, ...}

• Properties:

E(X) = var(X) = λ
M_X(t) = e^(λ(e^t − 1))

• It arises as an approximation to Bi(n, p). Letting λ = np gives

p(x) = (n choose x) p^x (1 − p)^(n−x) ≈ e^(−λ) λ^x / x!

as n → ∞ and p → 0.

Uniform distribution

• X ∼ Unif(a, b) with pdf

f(x) = 1 / (b − a),  x ∈ (a, b)

• Properties:

E(X) = (a + b) / 2
var(X) = (b − a)² / 12
M_X(t) = (e^(tb) − e^(ta)) / (t(b − a))

• If b = 1 and a = 0, this is known as the uniform distribution over the unit interval.
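The worked pmf examples above can be cross-checked with R's built-in distribution functions. A minimal sketch (the values in the comments come from the hand calculations; the Bi(1000, 0.005) numbers are an illustrative choice to show the Poisson approximation):

# Check the worked Poisson example against R's built-in pmf.
dpois(1, lambda = 5)           # 0.03368, matches Pr(X = 1) above
dpois(3, lambda = 5)           # 0.1403, matches Pr(X = 3) above
sum(dpois(0:100, lambda = 5))  # ~1, consistent with p(x) being a pmf

# Poisson approximation to the binomial: large n, small p, lambda = n * p.
n <- 1000; p <- 0.005
dbinom(3, size = n, prob = p)  # exact Bi(n, p) probability
dpois(3, lambda = n * p)       # Pn(lambda) approximation, very close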
Exponential distribution

• X ∼ Exp(λ) with pdf

f(x) = λ e^(−λx),  x ∈ [0, ∞)

• It approximates the “time until first success” for independent Be(p) trials every Δt units of time, with p = λΔt
• Properties:

E(X) = 1/λ
var(X) = 1/λ²
M_X(t) = λ / (λ − t)

• It is famous for being the only continuous distribution with the memoryless property:

Pr(X > y + x | X > y) = Pr(X > x),  x ≥ 0, y ≥ 0.
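The memoryless property is easy to verify numerically with pexp, the Exp(λ) cdf in R. A small sketch (λ = 1/5, x = 2 and y = 3 are arbitrary illustrative choices):

# Numerical check of the memoryless property for X ~ Exp(lambda).
lambda <- 1/5
x <- 2; y <- 3
lhs <- (1 - pexp(y + x, rate = lambda)) / (1 - pexp(y, rate = lambda))
rhs <- 1 - pexp(x, rate = lambda)   # Pr(X > x)
c(lhs, rhs)                         # both equal exp(-lambda * x) = 0.6703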
Normal distribution
• X ∼ N(μ, σ²) with pdf

f(x) = (1 / (σ√(2π))) e^(−(x−μ)²/(2σ²)),  x ∈ (−∞, ∞),  μ ∈ (−∞, ∞),  σ > 0

• It is important in applications because of the Central Limit Theorem (CLT)
• Properties:

E(X) = μ
var(X) = σ²
M_X(t) = e^(tμ + t²σ²/2)

• When μ = 0 and σ = 1 we have the standard normal distribution.
• If X ∼ N(μ, σ²), then

Z = (X − μ) / σ ∼ N(0, 1)
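Standardisation means the normal cdf in R can be used either with explicit mean and sd arguments or on the standardised value. A quick sketch (μ = 10, σ = 2, x = 13 are arbitrary illustrative numbers):

# If X ~ N(mu, sigma^2) then Z = (X - mu)/sigma ~ N(0, 1),
# so the two cdf evaluations below must agree.
mu <- 10; sigma <- 2; x <- 13
pnorm(x, mean = mu, sd = sigma)  # Pr(X <= 13)
pnorm((x - mu) / sigma)          # Pr(Z <= 1.5) = 0.9332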

Quantiles

Let X be a continuous rv. The pth quantile of its distribution is a number π_p such that

p = Pr(X ≤ π_p) = F(π_p) = ∫_{−∞}^{π_p} f(x) dx.

In other words, the area under f(x) to the left of π_p is p.

• π_p is also called the (100p)th percentile
• The 50th percentile (0.5 quantile) is the median, denoted by m = π_{0.5}
• The 25th and 75th percentiles are the first and third quartiles, denoted by q1 = π_{0.25} and q3 = π_{0.75}
Example: Weibull distribution

The time X until failure of a certain product has the pdf

f(x) = (3x²/4³) e^(−(x/4)³),  x ∈ (0, ∞).

The cdf is

F(x) = 1 − e^(−(x/4)³),  x ∈ (0, ∞).

Then π_{0.3} satisfies 0.3 = F(π_{0.3}). Therefore,

1 − e^(−(π_{0.3}/4)³) = 0.3  ⇒  ln(0.7) = −(π_{0.3}/4)³  ⇒  π_{0.3} = 4(−ln 0.7)^(1/3) = 2.84.
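This pdf is the Weibull distribution with shape 3 and scale 4, so the hand calculation can be checked against R's built-in quantile function:

# 0.3 quantile of the Weibull(shape = 3, scale = 4) distribution.
qweibull(0.3, shape = 3, scale = 4)  # 2.8376
4 * (-log(0.7))^(1/3)                # same value, from the formula above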
Law of Large Numbers (LLN)
Consider a collection X1, ..., Xn of independent and identically distributed (iid) random variables with E(X) = μ < ∞. Then, with probability 1, we have:

X̄ = (1/n) Σ Xi → μ,  as n → ∞.

The LLN ‘guarantees’ that long-run averages behave as we expect them to.

Central Limit Theorem (CLT)

Consider a collection X1, ..., Xn of iid rvs with E(X) = μ < ∞ and var(X) = σ² < ∞. Let

X̄ = (1/n) Σ Xi.

Then (X̄ − μ) / (σ/√n) follows a N(0, 1) distribution as n → ∞.

This is an extremely important theorem! It provides the ‘magic’ that will make statistical analysis work.

Let X1, ..., X25 be iid rvs where Xi ∼ Exp(λ = 1/5). Recall that E(X) = 1/λ = 5. Thus, the LLN implies

X̄ → E(X) = 5.

Moreover, since var(X) = 1/λ² = 25, we have

X̄ ≈ N(1/λ, 1/(nλ²)) = N(5, 5²/25) = N(5, 1).

Is n = 25 large enough?

A simulation exercise

Generate B = 1000 samples of size n. For each sample compute x̄:

Sample 1: x1^(1), ..., xn^(1) → x̄^(1)
Sample 2: x1^(2), ..., xn^(2) → x̄^(2)
...
Sample B: x1^(B), ..., xn^(B) → x̄^(B)

Then represent the distribution of {x̄^(b), b = 1, ..., B} by a histogram; the continuous curve overlaid is the normal N(5, 5²/n) distribution prescribed by the CLT (see the R sketch after the challenge problem below).

[Figure: histograms of the sample means for n = 1, 5, 25, 100, each with the N(5, 5²/n) density overlaid]

The distribution of X̄ approaches the theoretical distribution (CLT). Moreover, it will be more and more concentrated around μ (LLN). To see this, note that var(X̄) = σ²/n → 0 as n → ∞.

Challenge problem

Let X1, X2, ..., X25 be iid rvs with pdf f(x) = ax³ where 0 < x < 2.
1. What is the value of a?
2. Calculate E(X1) and var(X1).
3. What is an approximate value of Pr(X̄ < 1.5)?
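The simulation exercise above can be run along the following lines; a minimal R sketch (the seed is an arbitrary choice, for reproducibility):

# B = 1000 samples of size n from Exp(1/5); histogram of the sample
# means compared with the N(5, 5^2/n) density prescribed by the CLT.
set.seed(1)
B <- 1000
n <- 25
xbar <- replicate(B, mean(rexp(n, rate = 1/5)))
hist(xbar, freq = FALSE, main = "Sample means, n = 25")
curve(dnorm(x, mean = 5, sd = 5 / sqrt(n)), add = TRUE, col = "red")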
3 Descriptive statistics

Statistics: the big picture

Example: Stress and cancer

• An experiment gives independent measurements on 10 mice
• Mice are divided into control and stress groups
• The biologist considers two different proteins:
– Vascular endothelial growth factor C (VEGFC)
– Prostaglandin-endoperoxide synthase 2 (COX2)

Mouse  Group    VEGFC    COX2
1      Control  0.96718  14.05901
2      Control  0.51940   6.92926
3      Control  0.73276   0.02799
4      Control  0.96008   6.16924
5      Control  1.25964   7.32697
6      Stress   4.05745   6.45443
7      Stress   2.41335  12.95572
8      Stress   1.52595  13.26786
9      Stress   6.07073  55.03024
10     Stress   5.07592  29.92790

Data & sampling

• The data are numbers: x1, x2, ..., xn
• The model for the data is a random sample, that is, a sequence of iid rvs: X1, X2, ..., Xn. This model is equivalent to random selection from a hypothetical infinite population.
• The goal is to use the data to learn about the distribution of the random variables (and, therefore, the population).
• A statistic T = φ(X1, ..., Xn) is a function of the sample and its realisation is denoted by t = φ(x1, ..., xn).
• Note: the word “statistic” can be used to refer to both the realisation, t, as well as the random variable, T. Sometimes we need to be more specific about which one is meant.
• A statistic has two purposes:
– Describe or summarise the sample — descriptive statistics
– Estimate the distribution generating the sample — inferential statistics
• A statistic can be both descriptive and inferential; it depends on how you wish to use/interpret it (see later)
• We now introduce some commonly used descriptive statistics…

Moment statistics

Sample mean = x̄ = (1/n) Σ xi
Sample variance = s² = (1/(n − 1)) Σ (xi − x̄)²
Sample standard deviation = s = √s²

These are ‘sample’ or ‘empirical’ versions of moments of a random variable. Empirical means ‘derived from the data’.

For the VEGFC data: s² = Σ (xi − x̄)² / (n − 1) = 3.98761 and s = √3.98761 = 1.9969.

Order statistics

Arrange the sample x1, ..., xn in order of increasing magnitude and define:

x(1) ≤ x(2) ≤ ··· ≤ x(n)

Then x(k) is the kth order statistic. Special cases:
• x(1) is the sample minimum
• x(n) is the sample maximum

For the example data,

x(1) = 0.52, x(2) = 0.73, ..., x(10) = 6.07

What is x(3.25)? Let it be 0.25 of the way from x(3) to x(4):

x(3.25) = x(3) + 0.25 · (x(4) − x(3)) = 0.96 + 0.25 · (0.97 − 0.96) = 0.9625

In other words, define it via linear interpolation.

Exercise: verify that x(7.75) = 3.6480

Why do this? It allows us to define…

Sample quantiles

General definition (‘Type 7’ quantiles):

π̂p = x(k), where k = 1 + (n − 1)p

Special cases:

Sample median = π̂0.5 = x(5.5) = (1.26 + 1.53)/2 = 1.395
Sample 1st quartile = π̂0.25 = x(3.25) = 0.9625
Sample 3rd quartile = π̂0.75 = x(7.75) = 3.6480

π̂0.25 and π̂0.75 contain about 50% of the sample between them.

Interquartile range = π̂0.75 − π̂0.25 = 2.685

Note: Type 7 quantiles are the default in R, but there are many alternatives! Don’t worry too much about the differences between them. We will discuss this in a bit more detail later in the semester.

Some descriptive statistics in R

> x <- round(VEGFC, digits = 2)
> x
[1] 0.97 0.52 0.73 0.96 1.26 4.06 2.41 1.53 6.07 5.08
> sort(x) # order statistics
[1] 0.52 0.73 0.96 0.97 1.26 1.53 2.41 4.06 5.08 6.07
> summary(x) # sample mean & sample quantiles
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.5200 0.9625 1.3950 2.3590 3.6475 6.0700
> var(x) # sample variance
[1] 3.98761
> sd(x) # sample standard deviation
[1] 1.9969
> IQR(x) # interquartile range
[1] 2.685
Frequency statistics
Can also define empirical versions of the pdf, pmf and cdf.
Will see in the next section…

4 Basic data visualisations
Box plot

Graphical summary of data from a single variable

Main box: π̂0.25, π̂0.5, π̂0.75
‘Whiskers’: x(1), x(n) (but R does something more complicated, see tutorial problems)
Convenient way of comparing data from different groups

Example: VEGFC (Stress vs Control)

[Figure: box plots of the measurements for the Control and Stress groups]

Scatter plot

For comparing data from two variables (usually continuous)

[Figure: scatter plot of the two variables]
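A sketch of how such plots can be produced in R, assuming the mice data are available as vectors VEGFC and COX2 and a factor Group (names taken from the table above; Group as a factor is an assumption):

# Box plots of VEGFC by group, and a scatter plot of the two proteins.
boxplot(VEGFC ~ Group)  # compare Control vs Stress
plot(VEGFC, COX2)       # two continuous variables against each other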

Empirical cdf
The sample cdf, or empirical cdf, is defined as

F̂(x) = (1/n) Σ_{i=1}^n I(xi ≤ x)

where I(·) is the indicator function (I(xi ≤ x) has value 1 if xi ≤ x and value 0 if xi > x).
For example, for the previous data,

F̂(2) = (1/10) Σ_{i=1}^{10} I(xi ≤ 2) = 6/10 = 0.6

[Figure: plot of ecdf(VEGFC)]
It has the form of a discrete cdf. However, it will approximate the cdf of a continuous variable if the sample size is large. The following diagram shows cdfs based on n = 50 and n = 200 observations sampled from a standard normal distribution, N(0, 1).
[Figure: empirical cdfs for samples of size n = 50 (left) and n = 200 (right) from N(0, 1)]
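In R, the empirical cdf is available via ecdf(), which returns a step function that can be evaluated and plotted directly; a small sketch using the VEGFC data from before:

# Empirical cdf of the VEGFC measurements.
Fhat <- ecdf(VEGFC)
Fhat(2)     # 0.6, matching the calculation above
plot(Fhat)  # step-function plot of the empirical cdf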
Empirical pmf

If the underlying variable is discrete we use the pmf corresponding to the sample cdf F̂:

p̂(x) = (1/n) Σ_{i=1}^n I(xi = x)

For example, the following shows p̂(x) for a sample of size n = 15 from Pn(5) (left) and the true pmf p(x) of Pn(5) (right).

[Figure: empirical pmf p̂(x) from n = 15 observations (left) and the true pmf p(x) of Pn(5) (right)]
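A sketch of this comparison in R (the seed is arbitrary; table() over the sample gives the observed relative frequencies):

# Empirical pmf from a Poisson sample vs the true Pn(5) pmf.
set.seed(2)
x <- rpois(15, lambda = 5)
table(x) / length(x)     # p-hat(x): relative frequency of each value
dpois(0:10, lambda = 5)  # true pmf over a grid of values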

Histograms and smoothed pdfs
If the underlying variable is continuous we would prefer to obtain an approximation of the pdf. There are several approaches that can be used:
1. Histogram, f̂_h (h is the bin length). First divide the entire range of values into a series of small intervals (bins) and then count how many values fall into each interval. For an interval [a, b), where b − a = h, draw a rectangle with height:

f̂_h(x) = (1/(hn)) Σ_{i=1}^n I(a ≤ xi < b),  x ∈ [a, b)

2. Smoothed pdf, f̂_h (h is the ‘bandwidth parameter’), formed by averaging smooth ‘bumps’ centred at each observation:

f̂_h(x) = (1/(hn)) Σ_{i=1}^n K((x − xi)/h),  h > 0,

where K is a smooth kernel function.
[Figure: histogram with true density (solid black curve) and smoothed pdf (red dashed curve)]
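In R, hist() with freq = FALSE gives the density-scaled histogram f̂_h, and density() computes a smoothed pdf. A minimal sketch on simulated data (the sample, seed and bandwidth are arbitrary illustrative choices):

# Density-scaled histogram with a smoothed pdf and the true density.
set.seed(3)
x <- rnorm(200)
hist(x, freq = FALSE)                              # histogram estimate
lines(density(x, bw = 0.5), col = "red", lty = 2)  # smoothed pdf
curve(dnorm(x), add = TRUE)                        # true N(0, 1) density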

Quantile-quantile plots
