Lecture 1 – GGR376
Spatial Data Science II
Dr. Adams
About Me
Dr. Matthew Adams
Office: DV3261
md.adams@utoronto.ca
Office Hours: By Appointment
Course Details
GGR376: Spatial Data Science II
Lectures:
1:10pm to 3:00pm (Wednesday)
Labs/Tutorials:
– PRA0101, 9:10am to 11:00am (Thursday)
– PRA0102, 11:10am to 1:00pm (Thursday)
– PRA0103, 3:10pm to 5:00pm (Thursday)
Course Description
This course builds on spatial data analysis and quantitative methods introduced in GGR276, and aims to provide a broad study of advanced statistical methods and their use in a spatial context in physical, social, and environmental sciences. The course covers theories, methods, and applications geared towards helping students develop an understanding of the important theoretical concepts in spatial data analysis, and gain practical experience in the application of spatial statistics to a variety of physical, social and environmental problems using advanced statistical software. [24L, 12P]
Course Text
This course uses readings that are highlighted in the course outline.
What is statistics?
The practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.
What is my goal in this course?
I want you to be able to use data to answer questions.
Course Objectives I
Further develop your ability to write R code
a <- 42
A <- a * 2  # R is case sensitive: A and a are different objects
print(a)
## [1] 42
print(A)
## [1] 84
Course Objectives II
Develop an understanding of data visualization using the grammar of graphics.
Apply the grammar of graphics using ggplot2 in R (Wickham 2010)
DEMO: Plot EPA fuel efficiency data on the next slide
hwy = highway miles per gallon
displ = engine displacement (litres), the total volume of all the cylinders in an engine
Basic Scatter Plot
[Figure: base R scatter plot of hwy (y-axis) against displ (x-axis)]
Better Scatter Plot (ggplot2)
[Figure: ggplot2 scatter plot of hwy (y-axis) against displ (x-axis)]
Code Comparison
# Load ggplot2 first: it provides both ggplot() and the mpg data set
library(ggplot2)

# Plot 1, Basic Scatter Plot (base R graphics)
plot(x = mpg$displ, y = mpg$hwy, ylab = "hwy", xlab = "displ")

# Plot 2, Better Scatter Plot (ggplot2)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
Modify the Aesthetics: Include car class information
[Figure: scatter plot of hwy (y-axis) against displ (x-axis), with points distinguished by car class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]
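One way to produce a plot like the one above is to map the class variable to an aesthetic inside aes(). This is a sketch that assumes class is mapped to point colour; the slide does not show its code, and shape would be another possible choice.

```r
# Requires the ggplot2 package, which also supplies the mpg data set
library(ggplot2)

# Map displacement to x, highway mpg to y, and car class to point colour
class_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))

print(class_plot)
```

Because class is inside aes(), ggplot2 builds the legend automatically; a fixed colour for all points would instead go outside aes().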
Course Objectives III
Learn how to select an appropriate statistical approach
Understand the challenges that occur when modelling data
Explore cognitive bias and how it may affect your analysis (Haselton, Nettle, and Andrews 2015)
Car Model
## [1] "Fit a linear model"
## lm(formula = hwy ~ displ, data = mpg)
## [1] "Here are the coefficients"
## Estimate Std. Error Pr(>|t|)
## (Intercept) 35.697651 0.7203676 2.123519e-125
## displ -3.530589 0.1945137 2.038974e-46
## [1] "Explanatory ability"
## [1] 0.5850056
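Output like the above can be produced with a few lines of modelling code. This is a sketch, since the slide does not show its code; treating "Explanatory ability" as the adjusted R² is an assumption (summary() also reports the unadjusted multiple R²).

```r
# Requires the ggplot2 package, which supplies the mpg data set
library(ggplot2)

# Fit a simple linear model: highway mpg as a function of engine displacement
car_model <- lm(hwy ~ displ, data = mpg)

# Keep the estimate, standard error and p-value columns of the coefficient table
coef_table <- summary(car_model)$coefficients[, c("Estimate", "Std. Error", "Pr(>|t|)")]
print(coef_table)

# Adjusted R-squared: share of variation in hwy explained by displ,
# penalized for the number of predictors
print(summary(car_model)$adj.r.squared)
```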
Course Marks
Figure 2: How are you graded?
Course Outline
Review Outline
Software
This course uses free and open-source software (FOSS):
R: a statistical programming language
RStudio: a development environment for R
Installing R
Go to https://cran.r-project.org/
Figure 3: CRAN Download Page
Installing RStudio
After downloading R, go to https://www.rstudio.com/products/rstudio/download/#download
Figure 4: RStudio Download Page
R Cheatsheets
A set of cheatsheets (reference guides) is available on the RStudio website.
Bring the Base R cheat sheet to each class.
R Cheatsheets Link: https://www.rstudio.com/resources/cheatsheets/
Email
Use only your University of Toronto e-mail address (…@mail.utoronto.ca)
Include the course code (GGR376) in the subject line
Include your full name and student number in the body of the e-mail.
Please read the course handouts and check the course website before e-mailing a question, to make sure that it has not already been answered.
If your question is answered in a document, you will receive an email starting: “Review Course Documents”
Extensions
All requests for extensions are handled by the Department of Geography.
Details are on course outline.
Neither I nor a TA can grant you an extension.
Figure 5: Questions before moving on?
Pre-test (15 minutes)
At the end of the Pre-test, please pass all your papers to the front.
Why you should love Statistics | Alan Smith
Think you’re good at guessing stats? Guess again. Whether we consider ourselves math people or not, our ability to understand and work with numbers is terribly limited, says data visualization expert Alan Smith.
Review (Mostly)
Statistics
Statistics encompasses the following with data:
collection
data acquisition
data munging (cleaning)
analysis
interpretation
presentation
organization
Figure 6: Gut Feelings
Population and Sample I
Population: the totality of the set of interest (e.g., the people in a country)
Sample: a subgroup of the population (e.g., the respondents to a survey)
Population and Sample II
Let’s visualize the difference.
[Figure: The Population, points plotted by xCord (x-axis, 0 to 1000) and yCord (y-axis, 0 to 1000)]
Here is a sample.
[Figure: The Sample, points plotted by xCord (x-axis) and yCord (y-axis)]
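The population and sample figures can be mimicked with a small simulation. This is a hypothetical sketch: the population size (1000 points), the uniform 0 to 1000 coordinates, the normally distributed values column, and the sample size of 30 are all assumptions for illustration, not the data used on the slides.

```r
set.seed(376)  # make the random draws reproducible

# Hypothetical population: 1000 locations with a measured value at each
population <- data.frame(
  xCord  = runif(1000, min = 0, max = 1000),
  yCord  = runif(1000, min = 0, max = 1000),
  values = rnorm(1000, mean = 50, sd = 12)
)

# Draw a simple random sample of 30 rows without replacement
sample_rows <- sample(nrow(population), size = 30)
sampleA <- population[sample_rows, ]

# Compare the population mean (a parameter) with the sample mean (an estimate)
mean(population$values)
mean(sampleA$values)
```

Rerunning with a different seed gives a different sample and a slightly different sample mean, which is the variability that inferential statistics has to account for.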
Basic Analysis Steps
1. View summary statistics
2. Basic plots
3. Define my question
4. Conduct an appropriate analysis
What may we be interested in knowing?
Basic summary statistics?
summary(sampleA)[,c(1:3)]
##      xCord           yCord           values     
##  Min.   : 41.0   Min.   : 29.0   Min.   :26.00  
##  1st Qu.:181.2   1st Qu.:382.5   1st Qu.:41.00  
##  Median :402.5   Median :548.0   Median :51.00  
##  Mean   :413.4   Mean   :566.1   Mean   :51.20  
##  3rd Qu.:639.2   3rd Qu.:832.5   3rd Qu.:63.25  
##  Max.   :953.0   Max.   :954.0   Max.   :74.00  
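Each row of the summary() output can also be computed on its own. The sketch below uses a short hypothetical vector (not the slide's sampleA data) to show the base R functions behind the six numbers; summary() computes its quartiles with quantile()'s default method (type 7).

```r
# Hypothetical measurements
x <- c(26, 38, 41, 47, 51, 55, 63, 66, 74)

min(x)                                   # smallest value
quantile(x, probs = c(0.25, 0.5, 0.75))  # 1st quartile, median, 3rd quartile
mean(x)                                  # arithmetic mean
max(x)                                   # largest value

summary(x)  # reports the same six statistics in one call
```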
Basic Plots: Histogram
[Figure: histogram with values on the x-axis and count on the y-axis]
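A histogram like this one can be drawn with base R's hist() (ggplot2's geom_histogram() is the layered equivalent). The sketch uses a hypothetical vector of values; with plot = FALSE, hist() returns the bin breaks and counts instead of drawing, which shows exactly what each bar represents. The bin width of 10 is an assumption.

```r
# Hypothetical values, spanning roughly the 20 to 80 range of the slide's axis
values <- c(26, 33, 38, 41, 44, 47, 50, 51, 52, 55, 58, 61, 63, 66, 70, 74)

# Compute the bins without drawing; breaks every 10 units from 20 to 80
h <- hist(values, breaks = seq(20, 80, by = 10), plot = FALSE)

h$breaks  # bin boundaries
h$counts  # number of values falling in each bin

# Draw the histogram with the same bins
hist(values, breaks = seq(20, 80, by = 10),
     xlab = "values", ylab = "count", main = "Histogram of values")
```

By default each bin includes its right endpoint, so a value of exactly 50 is counted in the 40 to 50 bar.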
Next steps
Identify a statistical model capable of testing our hypothesis or modelling our interest.
For example:
t-test
ANOVA
Kruskal-Wallis test
Linear regression model
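Each of the approaches listed has a base R function. The sketch below runs them on the PlantGrowth data set that ships with R (plant weights under a control and two treatments); the data set is chosen only for illustration and is not part of the lecture. Choosing among them depends on the question and on the assumptions each test makes.

```r
data(PlantGrowth)  # built-in: weight and group (ctrl, trt1, trt2)

# t-test: compare mean weight between two groups only
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))
t_result <- t.test(weight ~ group, data = two_groups)

# ANOVA: compare mean weight across all three groups
anova_result <- aov(weight ~ group, data = PlantGrowth)

# Kruskal-Wallis test: rank-based alternative to one-way ANOVA
kw_result <- kruskal.test(weight ~ group, data = PlantGrowth)

# Linear regression model with group as a categorical predictor
lm_result <- lm(weight ~ group, data = PlantGrowth)

t_result$p.value
summary(anova_result)
kw_result$p.value
coef(lm_result)
```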
Major Branches of Statistics
We can separate our statistical techniques into two themes:
1. Descriptive Statistics
Methods for organizing and summarizing the data.
2. Inferential Statistics
Methods of analysis based on probability theory.
Descriptive Statistics
An important step in your analysis because it provides a chance to learn about the data.
Graphs, charts, and tables
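Tables are often the first descriptive summary. A minimal sketch using the mtcars data set that ships with R (chosen only for illustration; its mpg column is unrelated to the ggplot2 mpg data set): table() counts observations per category and tapply() summarizes a numeric variable within each category.

```r
data(mtcars)  # built-in: 32 cars, including cyl (cylinders) and mpg

# Frequency table: number of cars with 4, 6 and 8 cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Mean fuel efficiency (miles per gallon) within each cylinder group
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)
print(round(mpg_by_cyl, 1))
```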
Inferential Statistics
Inference: a conclusion reached on the basis of evidence and reasoning.
Make conclusions about a population from a sample of observations
Hypothesis testing
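As a small worked example of drawing a conclusion about a population from a sample, a 95% confidence interval for the population mean is the sample mean ± t × s / sqrt(n), where s is the sample standard deviation, n the sample size, and t the 97.5th percentile of the t distribution with n − 1 degrees of freedom. The sketch computes this by hand and checks it against t.test(); the sample values are hypothetical.

```r
# Hypothetical sample of 12 observations
x <- c(48, 52, 55, 41, 60, 47, 53, 58, 44, 50, 62, 49)

n <- length(x)
x_bar <- mean(x)                 # sample mean, the estimate of the population mean
s <- sd(x)                       # sample standard deviation
t_crit <- qt(0.975, df = n - 1)  # critical t value for 95% confidence

# Confidence interval: estimate +/- critical value * standard error
ci_by_hand <- x_bar + c(-1, 1) * t_crit * s / sqrt(n)
ci_by_hand

# t.test() reports the same interval
ci_t_test <- t.test(x, conf.level = 0.95)$conf.int
ci_t_test
```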
Cognitive Bias 1:
During WWII, the Center for Naval Analyses collected data on where bullet holes appeared on planes returning from missions.
Take a few minutes and write down how you would use this information to reinforce planes.
Figure 7: Example of bullet holes
Survivorship Bias
Homework
Reading 1:
Sundali, J., & Croson, R. (2006). Biases in casino betting: The hot hand and the gambler’s fallacy. Judgment and Decision Making, 1(1), 1.
References
Haselton, Martie G., Daniel Nettle, and Paul W. Andrews. 2015. “The Evolution of Cognitive Bias.” doi:10.1002/9780470939376.ch25.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28. doi:10.1198/jcgs.2009.07098.