STAT 513/413: Lecture 1 Some first things
(An extended course outline, and then an immediate start)

Computational Statistics or Statistical Computing?
Computational Statistics: this may mean various things, and in Rizzo's textbook it means pretty much the right thing; but to some minds it may suggest that it is all about obtaining the results of various statistical methods (regression, ANOVA, etc.) in some computing environment, with packaged software
This is not what this course intends to do. Its proper name is
Statistical Computing
meaning: the algorithmic and computational components useful in implementing statistical and similar methods
So, once again: the course will not cover how to obtain results of this or that statistical method: this is appropriately addressed in topical courses (regression, design and analysis of experiments, sample survey, survival analysis,…)

Topics: nothing beyond what is in the Calendar
• a quick refresher of what students were supposed to learn in a computing course they once took
• possibly taking it to the next, “higher” level, as introduced by languages/environments like Matlab or S: characterized by the use of vector and matrix algebra, and widely present in professional statistical computation nowadays
• the use of “random” numbers in algorithms, that is, an introduction to simulation (Monte Carlo) technology
• methods of solving equations and optimization problems: general, and then with certain specifics in the statistical context
• introduction to convex optimization
• …
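As a small illustration of the “higher”, vector-oriented style mentioned in the topics above, here is a sketch (the variable names are only illustrative) computing the squares of 1, …, 10 once with an explicit loop and once with vector operations:

```r
n <- 10

# Loop style: fill a preallocated vector one element at a time
sq_loop <- numeric(n)          # vector of n zeros
for (k in 1:n) {
  sq_loop[k] <- k^2
}

# Vector style: the arithmetic is applied to the whole vector at once
sq_vec <- (1:n)^2

identical(sq_loop, sq_vec)     # TRUE: same values, same type (double)
sum(sq_vec)                    # 385
```

The vector version says what is computed rather than how to step through it; this is the style the course aims for.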

STAT 513: A graduate course…
… for Statistics and related orientations (…)
Is there a need for a “remedial” computing course?
Yesss!
And then, if others also want to take it, sure, why not
Provided that everybody is familiar with probability and knows a bit about statistics, at the level of a typical introductory course…
…and has also already completed some computing course in the past, so that this is not their first contact with computing
… albeit it may be the first time they get into so-called scientific computing

The evolution of calendar descriptions
STAT 512 – Techniques of Mathematics for Statistics
Introduction to mathematical techniques commonly used in theoretical Statistics, with applications. Applications of diagonalization results for real symmetric matrices, and of continuity, differentiation, Riemann-Stieltjes integration and multivariable calculus to the theory of Statistics including least squares estimation, generating functions, distribution theory. Prerequisite: consent of Department.
STAT 513 – Computational Statistics → Statistical Computing
Introduction to contemporary computational culture: reproducible coding, literate programming. Monte Carlo methods: random number generation, variance reduction, numerical integration, statistical simulations. Optimization: line search, gradient descent, Newton-Raphson, and their specifics in the statistical context like the method of scoring, EM algorithm. Fundamentals of convex optimization with constraints. Prerequisites: Consent of the Department.
STAT 413 – Computational Statistics → Statistical Computing
Introduction to contemporary computational culture: reproducible coding, literate programming. Monte Carlo methods: random number generation, variance reduction, numerical integration, statistical simulations. Optimization: line search, gradient descent, Newton-Raphson, and their specifics in the statistical context like the method of scoring, EM algorithm. Fundamentals of convex optimization with constraints. Prerequisites: STAT 265 or equivalent and one of CMPUT 174 or 272.

The textbook

The chosen computing environment
Starting low: machine code, assembly language
Getting higher: Fortran, Pascal
Best of both worlds: C(++), Java, Python, …
Getting even higher: Matlab, Lisp, S, …
We particularly like the last ones, because of the scientific-computing character mentioned above: the onus is not on “industrial-strength” software, but rather on obtaining results quickly, but reliably

R
R started as an open source implementation of S; now they say it is a system for statistical computation and graphics based on S.
(For more information, see Rizzo 1.2)
And why R, and not…? See, for instance,
Grolemund and Wickham, https://r4ds.had.co.nz/introduction.html (incidentally, there is a HUGE number of texts on R, many freely available)
And R is available, for free, on any platform
Homework: get all the following (ASAP):

Necessities
• Access to computer(s) onto which one can get all of the below, e.g. your own laptop
• Internet access (with Google within reach)
• A working installation of R,
  possibly with a GUI (like the one for OS X on Mac)
  or within an IDE (like RStudio, or ESS within Emacs)
• A programming editor of your choice:
  e.g. Emacs… or RStudio?
  or Vim? Atom? Notepad? TextEdit? Gedit? (all free…)
  a useful feature: a formatting extension for code (like ESS for Emacs)
• A document editor of your choice: e.g. TeX (LaTeX)
  or Microsoft Word? Google Drive?

Modi operandi in R: command line
• command line: you type commands, which are then immediately executed; if there is an error, you simply retype the last command, or a few preceding ones
What is sought: a picture of 20 points roughly following a line;
x coordinates are sampled from the uniform distribution on (0,10), y coordinates depend linearly on x, plus a normal error
> for (k in 1:20)
+ x[k]=runif(1,0,10)
Error: object 'x' not found
> x=1
> for (k in 1:20)
+ x[k]=runif()
Error in runif() : argument "n" is missing, with no default
> for (k in 1:20)
+ x[k]=runif(1,0,10)
> y=0
> for (k in 1:20)
+ y[k]=2+3*x[k]+rnorm(1)
> plot(x,y)

In other words…
…we fooled around until we obtained something like this
[Figure: scatterplot of y against x, the 20 points roughly following a line]
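For comparison, a minimal sketch of the same task without any loops: runif() and rnorm() can generate all 20 values in one call (their first argument is the number of values wanted). The intercept 2 and slope 3 are those from the session above; the set.seed() call, with an arbitrary seed, is an addition that makes the random numbers reproducible.

```r
set.seed(1)                        # arbitrary seed, for reproducibility

n <- 20
x <- runif(n, min = 0, max = 10)   # 20 draws from the uniform distribution on (0, 10)
y <- 2 + 3 * x + rnorm(n)          # linear in x, plus standard normal error

plot(x, y)
```

Three lines of computation, no initialization tricks like x=1 or y=0, and nothing to retype.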

Modi operandi in R continued: script
• script: already a “program”, consisting of several commands in a file; one edits it with a programming editor and then executes its contents as a whole; if there is an error, you have to correct it by editing the file, and then execute it again
Execution can be done by the command source() in R,
or an IDE (Integrated Development Environment) may streamline this
A convenient (at least for some) way of creating scripts is to take a command line session and edit it. The result of this, the file my.R, may look as follows:
x=1
> y=0
for (k in 1:20)
+ x[k]=runif(1,0,10)
+ y[k]=2+3*x[k]+rnorm(1)
plot(x,y)
You then do the following in R…
> source("my.R")

… and see what happens
> source("my.R")
Error in file(filename, "r", encoding = encoding) :
cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
cannot open file 'my.R': No such file or directory
Oops! This has to be corrected… the file my.R has to be placed right, that is, in R's working directory
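A relative file name like "my.R" is looked up in R's working directory. Here is a minimal sketch of how to check this before calling source(); the path in the commented setwd() line is a placeholder for wherever you actually saved the file.

```r
getwd()                          # the directory where R looks for "my.R"
file.exists("my.R")              # FALSE means source("my.R") fails as above
list.files(pattern = "\\.R$")    # R scripts present in the working directory

# Either move my.R into getwd(), or change the working directory, e.g.
# setwd("path/to/your/folder")   # placeholder path
```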
> source("my.R")
Error in source("my.R") : my.R:2:1: unexpected '>'
1: x=1
2: >
^
Errors, errors… but after a while, you get to something like this
x=1
y=0
for (k in 1:20)
x[k]=runif(1,0,10)
y[k]=2+3*x[k]+rnorm(1)
plot(x,y)

… but it won’t work as desired
(Try it if you don’t see why)
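Spoiler for the exercise: without braces, the body of for is only the first statement after it. The y[k] line therefore runs just once, after the loop has finished, when k equals 20. The sketch below makes this visible (set.seed() is added only for reproducibility):

```r
set.seed(1)
x = 1
y = 0
for (k in 1:20)
  x[k] = runif(1, 0, 10)          # the only statement inside the loop
y[k] = 2 + 3 * x[k] + rnorm(1)    # runs once, after the loop, with k == 20

k                 # 20: the loop variable keeps its last value
length(y)         # 20, but...
sum(is.na(y))     # 18: y[2], ..., y[19] were never assigned
```

So y holds only y[1] = 0 and y[20]. plot(x, y) silently drops the pairs with missing y and shows just two points.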
After perhaps a longer while, you will finally come to this
x=1
y=0
for (k in 1:20) {
x[k]=runif(1,0,10)
y[k]=2+3*x[k]+rnorm(1) }
plot(x,y)
which means you are ready to submit, aren’t you?

And this is why we introduced STAT 513…
… so that we don’t have to look at such creations any more

Any questions?
What needs to be done:
1. If you need to learn the basics of R, study Chapter 1 of Rizzo, and possibly also some online source (and do it right away, for the next lecture; at the very least, you need to understand the code I wrote in today's lecture).
We will have some introduction to R, but a rather “advanced” one,
talking at the same time about what good code should look like (which may influence your choice of a programming editor)
Test script: sum all 1/k for k running from 1 to, say, 1000000 (Do you know from calculus what it ought to be?)
2. If you need to refresh your probability and statistics basics, study Chapter 2 of Rizzo, say for next week…
… because, after all, the course will not be about coding and similar mundane things, but about the real ones: algorithms
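One possible take on the test script from item 1 is sketched below, with n = 1000000 as suggested and two versions: a loop and a vectorized one. As for calculus: the harmonic series diverges, but its partial sums grow like log(n) + γ, where γ ≈ 0.5772 is the Euler-Mascheroni constant. For n = 1000000 the sum should therefore be about 14.39. The variable names are only illustrative.

```r
n <- 1000000

# Loop version: accumulate the sum term by term
s_loop <- 0
for (k in 1:n) {
  s_loop <- s_loop + 1 / k
}

# Vectorized version: build the vector of terms and sum it
s_vec <- sum(1 / (1:n))

# Asymptotic approximation: log(n) + Euler-Mascheroni constant
euler_gamma <- 0.5772156649
asymp <- log(n) + euler_gamma

c(loop = s_loop, vectorized = s_vec, asymptotic = asymp)
```

The loop and the vectorized sum agree up to rounding error. The vectorized one is shorter and, in R, typically much faster.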