SPATIAL ANALYSIS
Applied Analytics: Frameworks and Methods 2
Outline
■ Explain spatial data
■ Gather and represent data on a map
■ Analyze spatial data
Spatial Analysis
■ Using the location of observation units to visualize data, identify patterns, and conduct analysis
■ Draws on contributions from cartography, biology, ethological studies, ecological studies, epidemiology, statistics, econometrics, geographic information systems, computer science, and mathematics.
Roots of Spatial Analysis
■ John Snow’s Cholera Map
■ Plotting cholera cases on a map of London in 1854 helped point to a water pump as the source of the outbreak
Source: Wikipedia
Location Data
■ The amount of location data is growing rapidly, aided by data generated by
– Mobile devices: smartphone service providers, smartphone apps, and websites all gather location information.
– IoT devices: smartwatches, Fitbits, and robotic vacuum cleaners gather location data.
– Machines: ride-sharing services (e.g., Uber, Lyft) and self-driving cars (e.g., Tesla) gather location data.
Motivation for Examining Spatial Data
■ Possible questions addressed
– Does the spatial patterning of disease incidences give rise to the conclusion that they are clustered, and if so, are the clusters found related to factors such as age, relative poverty, or pollution sources?
– Given a number of observed soil samples, which part of a study area is polluted?
– Given scattered air quality measurements, how many people are exposed to high levels of black smoke or particulate matter (e.g. PM10), and where do they live?
– Do governments tend to compare their policies with those of their neighbors, or do they behave independently?
Motivation for Examining Spatial Data
■ For marketers, where a person lives reflects the lifestyle segment they belong to. For example, ESRI's Tapestry segments can be looked up by zip code.
■ The key to success in brick-and-mortar retail is location, location, and location
Source: ESRI
Motivation for Examining Spatial Data
■ Pattern of points or lines may yield deeper insights.
■ Visualization of 1.1 Billion Taxi Trips
Source: toddwschneider.com
SPATIAL DATA
Spatial Data
■ Spatial data is data about location
■ Geographic location is determined by a coordinate system consisting of longitude and latitude.
– Longitude: east-west position, measured in degrees from the prime meridian (-180 to 180)
– Latitude: north-south position, measured in degrees from the equator (-90 to 90)
■ Here is location data on six Airbnb rentals
– Longitudes marked W are negative
– Latitudes marked N are positive
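The sign convention above can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: the helper name `to_signed_degrees` and the sample coordinates are assumptions made for the example.

```python
def to_signed_degrees(value: float, hemisphere: str) -> float:
    """Convert an unsigned coordinate plus a hemisphere letter to signed decimal degrees."""
    hemisphere = hemisphere.strip().upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere: {hemisphere!r}")
    if value < 0:
        raise ValueError("value must be unsigned when a hemisphere letter is given")
    # West longitudes and south latitudes are negative; east and north are positive.
    sign = -1.0 if hemisphere in {"S", "W"} else 1.0
    return sign * value


# Hypothetical rental location (illustrative values only).
longitude = to_signed_degrees(73.99, "W")
latitude = to_signed_degrees(40.76, "N")
print(longitude, latitude)
```

Storing signed decimal degrees avoids carrying a separate hemisphere column, which is what most mapping tools expect.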
Types of spatial data
■ Point data
– Data associated with a point.
– E.g., house or apartment
■ Line data
– Data associated with an ordered sequence of points forming a line
– E.g., road, river
■ Polygon data
– Data associated with an area
– E.g., County, City, State, Country
■ Raster (or grid) data
– A grid with data at every location or cell. If each cell is colored, the grid will look like a heat map
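To make the four types concrete, here is a rough sketch of how each could be held in plain Python structures. The coordinates are invented and treated as planar (x, y) values, and the helpers `line_length` and `polygon_area` are illustrative names rather than functions from any spatial library.

```python
import math

# Point: a single (x, y) pair, e.g. one apartment.
point = (2.0, 1.0)

# Line: an ordered sequence of points, e.g. a road.
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# Polygon: a closed ring of points (first == last), e.g. a county boundary.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

# Raster: a grid with one value in every cell.
raster = [
    [1, 2, 3],
    [4, 5, 6],
]


def line_length(coords):
    """Sum of straight-line segment lengths along the line (planar coordinates)."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def polygon_area(ring):
    """Area of a closed ring using the shoelace formula (planar coordinates)."""
    twice_area = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:]))
    return abs(twice_area) / 2.0


print(line_length(line), polygon_area(polygon))
```

Points, lines, and polygons are all built from coordinate pairs; the raster differs in that position is implied by the row and column of each cell.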
Spatial Data Type
■ Data frames are not well suited to representing spatial data for the following reasons.
– For polygons, non-spatial data is repeated. For example, a state is defined by many coordinates but has only one number for population.
– Data frames work only when there is a uniform coordinate reference system. Differences in coordinate reference systems create challenges when merging different data sources.
■ Spatial data types
– Many types, including sp, sf, and raster
– sp has long been the most widely used, although sf is its more recent successor
– With its rich ecosystem of spatial analysis packages, R is one of the best tools to examine spatial data.
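The repetition problem for polygons can be seen in a small sketch. The state "A", its four boundary vertices, and the population figure are all invented for illustration, and `split_geometry_and_attributes` is a hypothetical helper, not a library function.

```python
# Flat table: one row per boundary vertex, so the population repeats on every row.
flat_rows = [
    {"state": "A", "x": 0.0, "y": 0.0, "population": 500},
    {"state": "A", "x": 1.0, "y": 0.0, "population": 500},
    {"state": "A", "x": 1.0, "y": 1.0, "population": 500},
    {"state": "A", "x": 0.0, "y": 1.0, "population": 500},
]


def split_geometry_and_attributes(rows, key, attribute):
    """Store the coordinates once per feature and the attribute once per feature."""
    geometries = {}
    attributes = {}
    for row in rows:
        geometries.setdefault(row[key], []).append((row["x"], row["y"]))
        attributes[row[key]] = row[attribute]
    return geometries, attributes


geometries, attributes = split_geometry_and_attributes(flat_rows, "state", "population")
repeated_copies = sum(1 for row in flat_rows if row["state"] == "A")
print(repeated_copies, attributes)
```

Dedicated spatial data types follow the second layout: geometry and attributes are kept apart and linked by feature, so each attribute is stored once.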
sp
■ S4 object
■ Objects have slots. The most important slots are
– @data – the attribute data (present in the ...DataFrame classes)
– @bbox – bounding box
– @coords (SpatialPoints, which have no hierarchy), @lines, @polygons – coordinate information
– @proj4string – coordinate reference system (projection)
■ Different types of Spatial objects based on what they describe
– Points: SpatialPoints
– Lines: SpatialLines
– Polygons: SpatialPolygons
■ Spatial Object + Data
– SpatialPointsDataFrame
– SpatialLinesDataFrame
– SpatialPolygonsDataFrame
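As a loose analogy only, the "spatial object + data" idea can be sketched with a Python dataclass. This is not how sp is implemented: the class name `PointsWithData`, its field names, and the sample rentals with their prices are assumptions chosen to echo the slots described above.

```python
from dataclasses import dataclass


@dataclass
class PointsWithData:
    """Loose Python analogue of a points-plus-attributes spatial object."""

    coords: list  # [(x, y), ...], one pair per feature
    data: list  # one attribute dict per feature, in the same order as coords
    proj4string: str = "+proj=longlat +datum=WGS84"

    def __post_init__(self):
        if len(self.coords) != len(self.data):
            raise ValueError("coords and data must have one entry per feature")

    @property
    def bbox(self):
        """Bounding box as ((xmin, xmax), (ymin, ymax)), derived from the coordinates."""
        xs = [x for x, _ in self.coords]
        ys = [y for _, y in self.coords]
        return (min(xs), max(xs)), (min(ys), max(ys))


# Hypothetical rentals: (longitude, latitude) pairs with made-up prices.
rentals = PointsWithData(
    coords=[(-73.99, 40.76), (-73.95, 40.78), (-74.00, 40.72)],
    data=[{"price": 120}, {"price": 95}, {"price": 150}],
)
print(rentals.bbox)
```

The point of the sketch is the separation of concerns: coordinates, attributes, extent, and coordinate reference system each live in their own named component of one object.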
sp: Projection
■ A projection is the method used to display the coordinate system and its data on a flat surface, such as paper or a screen.
■ Mathematical calculations convert coordinates on the curved surface of the earth to coordinates on a flat surface.
■ Since a curved surface cannot be flattened without some distortion, many map projections exist, each with different properties. Some preserve shape, while others preserve distance.
– ellipsoid: approximation of the shape of the earth
– datum: position of the ellipsoid relative to the earth
– Common datum and ellipsoid settings:
■ Global datasets (datum=WGS84, ellps=WGS84)
■ US datasets (datum=NAD83, ellps=GRS80)
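One way to see the "mathematical calculations" and the resulting distortion is the spherical Mercator projection used by many web maps. It is chosen here purely as an illustration and is not named in the material above; the function names are assumptions, and the radius is the WGS84 semi-major axis.

```python
import math

WGS84_SEMI_MAJOR_AXIS_M = 6378137.0


def web_mercator(lon_deg, lat_deg):
    """Spherical Mercator forward projection: degrees in, (x, y) in meters out."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = WGS84_SEMI_MAJOR_AXIS_M * lon
    y = WGS84_SEMI_MAJOR_AXIS_M * math.log(math.tan(math.pi / 4.0 + lat / 2.0))
    return x, y


def scale_factor(lat_deg):
    """Local stretch of the projection: map distances are inflated by 1 / cos(latitude)."""
    return 1.0 / math.cos(math.radians(lat_deg))


x, y = web_mercator(-73.988640, 40.757020)
print(round(x), round(y), round(scale_factor(60.0), 3))
```

Mercator keeps local shapes and angles but stretches distances more and more toward the poles, which is the kind of trade-off every projection has to make.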
Raster Data
■ A raster is a spatial (geographic) data structure that divides a region into rectangles called 'cells' (or 'pixels'), each of which can store one or more values
■ This data structure is also referred to as a 'grid' and is often contrasted with 'vector' data, which is used to represent points, lines, and polygons (Hijmans 2018)
■ A RasterLayer object represents single-layer (single-variable) raster data.
■ RasterStack and RasterBrick objects contain multiple layers
■ Another way to store spatial data
■ matrix + grid information + coordinate reference system
■ raster package; S4 objects, like sp
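The "matrix + grid information + coordinate reference system" recipe can be sketched as follows. This is a simplified illustration, not the raster package's implementation: the class `Grid`, its fields, the square-cell assumption, and the sample values are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Grid:
    """Minimal raster: a matrix of values plus the information needed to locate it."""

    values: list  # rows of cell values; row 0 is the northernmost row
    xmin: float  # left edge of the grid
    ymax: float  # top edge of the grid
    cell_size: float  # width and height of each (square) cell
    crs: str = "+proj=longlat +datum=WGS84"

    def cell_index(self, x, y):
        """Return the (row, col) of the cell containing the point (x, y)."""
        col = int((x - self.xmin) // self.cell_size)
        row = int((self.ymax - y) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise IndexError("point lies outside the grid")
        return row, col

    def value_at(self, x, y):
        """Look up the value stored in the cell containing (x, y)."""
        row, col = self.cell_index(x, y)
        return self.values[row][col]


grid = Grid(values=[[1, 2, 3], [4, 5, 6]], xmin=0.0, ymax=2.0, cell_size=1.0)
print(grid.value_at(2.5, 0.5))
```

Because cell positions follow from the origin and cell size, a raster never stores coordinates per cell, which is what makes it compact for continuous surfaces.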
VISUALIZE SPATIAL DATA
Visualizing Spatial Data
■ Address: 241 W 42nd St, New York, NY 10036
■ Longitude: -73.988640 (or about 73.99 W)
■ Latitude: 40.757020 (or about 40.76 N)
■ Longitude: x-coordinate
■ Latitude: y-coordinate
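Because longitude is the x-coordinate, many formats list it first. As one illustration, GeoJSON (a format not covered in this material, used here only as an example) stores a point as [longitude, latitude]. The helper name `geojson_point` is an assumption.

```python
import json


def geojson_point(longitude, latitude, properties=None):
    """Build a GeoJSON Feature for one point; coordinates are ordered [x, y]."""
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be within [-180, 180]")
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be within [-90, 90]")
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [longitude, latitude]},
        "properties": properties or {},
    }


feature = geojson_point(
    -73.988640,
    40.757020,
    {"address": "241 W 42nd St, New York, NY 10036"},
)
print(json.dumps(feature, indent=2))
```

Spoken and written addresses often give latitude first, so it is worth checking the expected order whenever coordinates move between tools.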
Approach to Visualizing Spatial Data
■ The key to representing data on a map is to think of it as adding layers of data onto a static map. For instance, one could add data on houses to a map of an area
– Map
– Additional geographic detail
– Data Layer
– More data layers …
Visualizing Spatial Data
■ ggplot2
– Useful for plotting data stored in data frames
– However, the plotted data lacks geographic context
■ ggmap
– Complements ggplot2 by providing geographic detail
– ggmap downloads maps from web services (e.g., Google Maps and OpenStreetMap) and adds them as a layer to plots
Plotting Spatial Data: tmap
■ library(tmap) offers a flexible, layer-based, and easy-to-use approach to creating thematic maps, such as choropleths and bubble maps.
■ Designed to work with spatial objects
■ Like ggplot2, a plot is built up in layers
ANALYZE SPATIAL DATA
Analyze Spatial Data
■ Assumes nearby georeferenced units are associated in some way
■ Point pattern analysis
– Identification of spatial clusters or groups of observations
– Explanation for the clusters
– Changes in spatial pattern over time
■ Area patterns
■ Interpolation and geostatistics
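Two of the analyses listed above can be sketched with the standard library alone. The first function computes a nearest-neighbor ratio (the Clark-Evans ratio, without edge correction) as one simple test for clustering; the second performs inverse distance weighted (IDW) interpolation. Both are illustrative choices rather than the methods prescribed here, the coordinates are treated as planar, and all sample points and values are invented.

```python
import math


def nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the value expected
    for a completely random pattern of the same density (no edge correction)."""
    n = len(points)
    if n < 2 or area <= 0:
        raise ValueError("need at least two points and a positive area")
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected


def idw(samples, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y) from (sx, sy, value) samples."""
    numerator = 0.0
    denominator = 0.0
    for sx, sy, value in samples:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return float(value)  # exactly on a sample location: use its value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one sample is required")
    return numerator / denominator


# A regular 4 x 4 grid of points versus four tightly packed points, both in a 4 x 4 area.
regular = [(i + 0.5, j + 0.5) for i in range(4) for j in range(4)]
clustered = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
grid_ratio = nearest_neighbor_ratio(regular, area=16.0)
cluster_ratio = nearest_neighbor_ratio(clustered, area=16.0)

# Two hypothetical pollution measurements; estimate the value halfway between them.
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
midpoint_estimate = idw(samples, 1.0, 0.0)
print(grid_ratio, cluster_ratio, midpoint_estimate)
```

A ratio below 1 suggests clustering, a ratio near 1 a random pattern, and a ratio above 1 a dispersed or regular pattern. IDW encodes the assumption that nearby units are related by giving closer samples larger weights.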
Summary
■ This module addressed the following topics
– Explain spatial data
– Gather and represent data on a map
– Analyze spatial data