Lecture 1 – GGR376
Spatial Data Science II
Dr. Adams

About Me
Dr. Matthew Adams
Office: DV3261
E-mail: md.adams@utoronto.ca
Office Hours: By Appointment

Course Details
GGR376: Spatial Data Science II
Lectures:
1:10pm to 3:00pm (Wednesday)
Labs/Tutorials:
– PRA0101, 9:10am to 11:00am (Thursday)
– PRA0102, 11:10am to 1:00pm (Thursday)
– PRA0103, 3:10pm to 5:00pm (Thursday)

Course Description
This course builds on spatial data analysis and quantitative methods introduced in GGR276, and aims to provide a broad study of advanced statistical methods and their use in a spatial context in physical, social, and environmental sciences. The course covers theories, methods, and applications geared towards helping students develop an understanding of the important theoretical concepts in spatial data analysis, and gain practical experience in application of spatial statistics to a variety of physical, social and environmental problems using advanced statistical software. [24L, 12P]

Course Text
This course uses readings that are highlighted in the course outline.

What is statistics?
The practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a representative sample.

What is my goal in this course?
I want you to be able to use data to answer questions.

Course Objectives I
• Further develop your ability to write R code
a <- 42
A <- a * 2  # R is case sensitive
print(a)
## [1] 42
print(A)
## [1] 84

Course Objectives II
• Develop an understanding of data visualization using the grammar of graphics.
• Apply the grammar of graphics using ggplot2 in R (Wickham 2010)
• DEMO: Plot EPA fuel efficiency data on the next slide
  – hwy = highway miles per gallon
  – displ = engine displacement (litres)
    – Total volume of all the cylinders in an engine

Basic Scatter Plot
[Figure: base R scatter plot of hwy (y-axis) against displ (x-axis)]

Better Scatter Plot (ggplot2)
[Figure: ggplot2 scatter plot of hwy (y-axis) against displ (x-axis)]

Code Comparison
# The mpg data set ships with ggplot2, so load the package before either plot
library(ggplot2)
# Plot 1, Basic Scatter Plot
plot(x = mpg$displ, y = mpg$hwy, ylab = "hwy", xlab = "displ")
# Plot 2, Better Scatter Plot
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

Modify the Aesthetics: Include car class information
[Figure: scatter plot of hwy (y-axis) against displ (x-axis) with a legend for class: 2seater, compact, midsize, minivan, pickup, subcompact, suv]

Course Objectives III
• Learn how to select an appropriate statistical approach
• Understand the challenges that occur when modelling data
• Explore cognitive bias and how it may affect your analysis (Haselton, Nettle, and Andrews 2015)

Car Model
## [1] "Fit a linear model"
## lm(formula = hwy ~ displ, data = mpg)
## [1] "Here are the coefficients"
##              Estimate Std. Error      Pr(>|t|)
## (Intercept) 35.697651  0.7203676 2.123519e-125
## displ       -3.530589  0.1945137  2.038974e-46
## [1] "Explanatory ability"
## [1] 0.5850056
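
The class plot and the model output above could be reproduced along the following lines. This is only a sketch, assuming the ggplot2 package (which provides the mpg data) is installed. Two details are assumptions and not stated on the slides: that class is mapped to point colour, and that "Explanatory ability" refers to the adjusted R-squared of the model.

```r
library(ggplot2)  # provides the mpg data set and the plotting functions

# Map car class to point colour (assumed aesthetic; the slide shows only a legend)
class_plot <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, colour = class))
print(class_plot)

# Fit a linear model of highway mileage on engine displacement
car_model <- lm(hwy ~ displ, data = mpg)

# Coefficient table: estimate, standard error, and p-value columns
coef_table <- summary(car_model)$coefficients[, c("Estimate", "Std. Error", "Pr(>|t|)")]
print(coef_table)

# Adjusted R-squared (assumed to be the "explanatory ability" on the slide)
adj_r2 <- summary(car_model)$adj.r.squared
print(adj_r2)
```

The negative displ coefficient says that, on average, highway mileage falls as engine displacement grows.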

Course Marks
Figure 2: How are you graded?

Course Outline
Review Outline

Software
This course uses free and open-source software (FOSS):
• R: a statistical programming language
• RStudio: a development environment for R

Installing R
Go to https://cran.r-project.org/
Figure 3: CRAN Download Page

Installing RStudio
After downloading R, go to https://www.rstudio.com/products/rstudio/download/#download
Figure 4: RStudio Download Page

R Cheatsheets
• A set of cheatsheets (reference guides) is available on the RStudio website.
• Bring the Base R cheatsheet to each class.
R Cheatsheets Link: https://www.rstudio.com/resources/cheatsheets/

Email
• Only use your University of Toronto e-mail address (...@mail.utoronto.ca)
• Include the course code (GGR376) in the subject line
• Include your full name and student number in the body of the e-mail.
Please read the course handouts and check the course website before e-mailing a question, to make sure that it has not already been answered.
• If your question is answered in a document, you will receive an e-mail starting: “Review Course Documents”

Extensions
• All requests for extensions are handled by the Department of Geography.
• Details are on the course outline.
• Neither I nor a TA can grant you an extension.

Figure 5: Questions before moving on?

Pre-test (15 minutes)
At the end of the Pre-test, please pass all your papers to the front.

Why you should love Statistics | Alan Smith
Think you’re good at guessing stats? Guess again. Whether we consider ourselves math people or not, our ability to understand and work with numbers is terribly limited, says data visualization expert Alan Smith.

Review (Mostly)

Statistics
Statistics encompasses the following with data:
• collection
• data acquisition
• data munging (cleaning)
• analysis
• interpretation
• presentation
• organization

Figure 6: Gut Feelings

Population and Sample I
Population: the totality of the set of interest.
• People in a country
Sample: a subgroup of the population.
• The respondents to a survey

Population and Sample II
Let’s visualize the difference.
[Figure: The Population. Points plotted by xCord (x-axis) and yCord (y-axis), both axes running from 0 to 1000]

Population
Here is a sample.
[Figure: The Sample. Points plotted by xCord (x-axis) and yCord (y-axis)]
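
Drawing a sample from a population can be sketched in a few lines of R. Everything numeric here is hypothetical: the population size (5000), the sample size (30), and the distributions used to generate the points are assumptions for illustration, since the slides do not provide the underlying data. The column names follow the figures above.

```r
set.seed(376)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 5000 points scattered over a 1000 x 1000 area
population <- data.frame(
  xCord  = runif(5000, min = 0, max = 1000),
  yCord  = runif(5000, min = 0, max = 1000),
  values = rnorm(5000, mean = 50, sd = 12)
)

# Draw a simple random sample of 30 rows without replacement
sample_rows <- sample(nrow(population), size = 30)
sampleA <- population[sample_rows, ]

# The sample mean estimates, but rarely equals, the population mean
print(mean(population$values))
print(mean(sampleA$values))
```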

Basic Analysis Steps
1. View summary statistics
2. Basic plots
3. Define my question
4. Conduct an appropriate analysis

What may we be interested in knowing?
Basic summary statistics?
summary(sampleA)[, c(1:3)]
##      xCord           yCord           values
##  Min.   : 41.0   Min.   : 29.0   Min.   :26.00
##  1st Qu.:181.2   1st Qu.:382.5   1st Qu.:41.00
##  Median :402.5   Median :548.0   Median :51.00
##  Mean   :413.4   Mean   :566.1   Mean   :51.20
##  3rd Qu.:639.2   3rd Qu.:832.5   3rd Qu.:63.25
##  Max.   :953.0   Max.   :954.0   Max.   :74.00

Basic Plots: Histogram
[Figure: histogram of values (x-axis, 20 to 80) against count (y-axis, 0 to 6)]
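
The first two analysis steps (summary statistics, then a basic plot) might look like this in base R. The data frame below is a randomly generated stand-in for sampleA, because the real data are not included in the slides, so its numbers will not match the output shown earlier.

```r
set.seed(376)  # arbitrary seed so the sketch is reproducible

# Stand-in for the slides' sampleA (hypothetical values)
sampleA <- data.frame(
  xCord  = runif(30, min = 0, max = 1000),
  yCord  = runif(30, min = 0, max = 1000),
  values = round(rnorm(30, mean = 50, sd = 12))
)

# Step 1: summary statistics for the first three columns
print(summary(sampleA)[, 1:3])

# Step 2: a basic histogram of the measured values
value_hist <- hist(sampleA$values, main = "Histogram of values",
                   xlab = "values", ylab = "count")
```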

Next steps
• Identify a statistical model capable of testing our hypothesis or modelling our interest.
• For example:
  – T-Test
  – ANOVA
  – Kruskal-Wallis Test
  – Linear Regression Model
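
Each of the four example methods has a base R function. The sketch below runs them on data sets bundled with R (sleep, PlantGrowth, cars). These data sets are stand-ins chosen for illustration and are not part of the course material.

```r
# Two-sample t-test: does extra sleep differ between the two drug groups?
# (groups are treated as independent here, for illustration only)
t_result <- t.test(extra ~ group, data = sleep)

# One-way ANOVA: does mean plant weight differ across the three treatments?
anova_result <- aov(weight ~ group, data = PlantGrowth)

# Kruskal-Wallis test: a rank-based alternative to one-way ANOVA
kw_result <- kruskal.test(weight ~ group, data = PlantGrowth)

# Linear regression: stopping distance as a function of speed
lm_result <- lm(dist ~ speed, data = cars)

print(t_result$p.value)
print(summary(anova_result))
print(kw_result$p.value)
print(coef(lm_result))
```

Which function is appropriate depends on the question and on the data: for instance, the Kruskal-Wallis test is a common choice when the normality assumption behind ANOVA is doubtful.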

Major Branches of Statistics
We can separate our statistical techniques into two themes:
1. Descriptive Statistics
• Methods for organizing and summarizing data.
2. Inferential Statistics
• Methods of analysis based on probability theory.

Descriptive Statistics
An important step in your analysis because it provides a chance to learn about the data.
• Graphs
• Charts
• Tables

Inferential Statistics
Inference: a conclusion based on evidence and reasoning.
• Make conclusions about a population from a sample of observations
• Hypothesis testing
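
The contrast between the two branches can be illustrated on a single sample. In this sketch the 30 measurements are simulated, and the null value of 45 is an arbitrary choice for illustration.

```r
set.seed(376)  # arbitrary seed so the sketch is reproducible

# Hypothetical sample of 30 measurements
values <- rnorm(30, mean = 50, sd = 12)

# Descriptive: summarize the observations we actually have
sample_mean <- mean(values)
sample_sd <- sd(values)
print(c(mean = sample_mean, sd = sample_sd))

# Inferential: 95% confidence interval for the unknown population mean
ci <- t.test(values, conf.level = 0.95)$conf.int
print(ci)

# Hypothesis test: is the population mean different from 45 (arbitrary null value)?
test_45 <- t.test(values, mu = 45)
print(test_45$p.value)
```

The mean and standard deviation describe only the sample. The confidence interval and the test make a probabilistic statement about the population the sample came from.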

Cognitive Bias 1:
During WWII, the Center for Naval Analyses collected data on where bullet holes were found on returning planes.
Take a few minutes and write down how you would use this information to reinforce planes.
Figure 7: Example of bullet holes

Survivorship Bias
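
A small simulation can show how looking only at returning planes distorts the picture. All numbers below are hypothetical: the four sections, the equal hit probabilities, and the return probabilities are assumptions made for illustration, not historical data.

```r
set.seed(376)  # arbitrary seed so the sketch is reproducible
n_planes <- 10000

# Each plane takes one hit in a random section (equal probabilities, assumed)
sections <- c("engine", "fuselage", "wings", "tail")
hit <- sample(sections, n_planes, replace = TRUE)

# Assumed chance of returning after a hit in each section (hypothetical values)
p_return <- c(engine = 0.3, fuselage = 0.9, wings = 0.9, tail = 0.8)
returned <- runif(n_planes) < p_return[hit]

# Share of hits by section: all planes vs. only those that returned
all_share <- prop.table(table(hit))
returned_share <- prop.table(table(hit[returned]))
print(round(all_share, 2))
print(round(returned_share, 2))
```

Under these assumptions, engine hits are under-represented among the returning planes because planes hit there seldom come back. Reinforcing the areas where returning planes show the most holes would therefore protect the wrong places.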

Homework
Reading 1:
Sundali, J., & Croson, R. (2006). Biases in casino betting: The hot hand and the gambler’s fallacy. Judgment and Decision Making, 1(1), 1.

References
Haselton, Martie G., Daniel Nettle, and Paul W. Andrews. 2015. “The Evolution of Cognitive Bias.” doi:10.1002/9780470939376.ch25.
Wickham, Hadley. 2010. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics 19 (1): 3–28. doi:10.1198/jcgs.2009.07098.