Dynamic Cities
ENVS 257/279: Practical 4 March 2020
Introduction
Interaction (or flow) data tell us about who moves where – key examples are migration (how many people have moved between one area and another over one year?) and commuting (how many people live in a given area but work within another specific area).
This practical makes use of employment and commuting data provided as outputs from the 2011 Census, and it introduces methods for visualising commuting flows between small areas within Liverpool. It also examines the determinants of commuting – for example, what draws people from residential areas to areas of employment? Obvious factors might include distance (flows are likely to be larger between places which are close together) and the availability of jobs in destination locations (that is, areas of employment).
Using a method called Poisson regression, you will model the relationship between flows and origin (area of residence) and destination (area of employment).
Your role here is that of a transport planner seeking to consider how well existing transport infrastructure caters for workers in the Liverpool region.
This exercise is based on RStudio. It entails the use of two kinds of data:
1. Flow (or interaction) data.
2. Area data (numbers of people who are in employment or unemployed).
Flow data represent movements of people, goods, resources, etc. between places. For some background on the kinds of flow data we will be working with, see: (http://census.ukdataservice.ac.uk/get-data/flow-data.aspx).
Project Tasks
You will write a short executive report (max. 500 words) that will answer the following questions:
1. What are the main commuting patterns in Liverpool?
2. Where are flows largest? What might this suggest?
3. How does the geography of employment/unemployment relate to flows?
4. How important are distance, origin population, destination population (employed people) and other variables (e.g., unemployment) in explaining commuting flows between the areas of Liverpool?
In addition to answering these questions in your report you should present, as a minimum, the following information:
1. A map of percentages of unemployed people.
2. A map showing commuting flows between MSOAs in Liverpool (with your own flow thresholds).
3. A table summarising Poisson regression outputs.
Overall, your report should not have more than 6 maps or charts. Maps can be generated within R or QGIS.
The submission deadline for this summative assignment is Monday, 20 April 2020, before 2:00pm. Only electronic submission via VITAL is required.
How to start a new script for your exercise.
A script is a text file in which you type the commands you want RStudio to run, so that you can run, re-run and modify them as you work through the exercise. To create a new script, go to:
1. File > New File > R Script.
2. Then go to File > Save As and give the new file a sensible name such as ‘Dynamic Cities Script’. Save it to your M: drive folder for this module.
3. The script is saved with an extension of “.R” – for example, “Dynamic Cities Script.R”.
4. To execute the code in a script click the Run button.
Data.
[1]. Download Data. (a). Downloading Flow data. We are going to look at commuting across small areas within Liverpool. The areas we will use are called Middle Layer Super Output Areas (MSOAs). In 2011, there were 7201 MSOAs in England and Wales with a mean average population of 7787.
The flow data can be downloaded from the CIDER (Centre for Interaction Data Estimation and Research) website as follows:
1. Open (http://cider.census.ac.uk/) in your web browser.
2. Click on wicid – the flexible query builder. Then click Login using Shibboleth. Login with University of Liverpool. Enter your University of Liverpool MWS username and password. Click Continue.
3. Click on the Data tab. Click Select by dataset and table. Select Commuting and journey to education data, then select 2011.
4. Select 2011 (SWS) MSOA/SOA/Intermediate Zone [Location of usual residence and place of work by sex] – WU01EW – Open
5. Select All usual residents aged 16 and over in employment the week before the census.
6. Click on the Geography tab. Select Select or edit origins. Click on List selection, and click on England and Wales Middle Super Output Areas (MSOA), Northern Ireland Super Output Areas (SOA), Scotland Intermediate Zones 2011.
7. Click on the icon below Do you want to select all areas that fall within another larger area?. Select UK Local Authorities (merged) 2011. Click Confirm that you wish to proceed using this geography.
8. Click Next Page several times until you reach Liverpool. Click on the button next to 267/Liverpool. Click Add chosen areas.
9. Click select geography destinations. Click Copy selection and then click Set the destinations to be the same as the currently selected origins.
10. Click Produce output, then click Run your query. Next, click Continue to Output pages.
11. Select Tabular output. Under Origin and Destination labels select Change. Change the 1st label setting to ONS/GSS area code and click Change labels. Set the Output layout to Origin-destination pair list. Under Output format select Comma separated values. Then, click Preview Output and Download.
12. Click Download output file. Change the filename to LiverpoolTTW.csv and then click Download now.
13. Save the file to your working directory.
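The ‘Origin-destination pair list’ layout chosen in step 11 stores one row per origin-destination pair. As a minimal sketch with made-up area codes and flow values (none of them taken from the Census download), the following shows how such a list relates to an origin-by-destination matrix:

```r
# Hypothetical origin-destination pair list: one row per (origin, destination) pair
od <- data.frame(
  Origin      = c("A", "A", "B", "B"),
  Destination = c("A", "B", "A", "B"),
  FlowOD      = c(120, 30, 45, 200)
)

# The same information arranged as an origin-by-destination matrix
od_matrix <- xtabs(FlowOD ~ Origin + Destination, data = od)
od_matrix

# Row sums: commuters living in each area; column sums: commuters working in each area
rowSums(od_matrix)
colSums(od_matrix)
```

The diagonal of the matrix holds the internal flows (people who live and work in the same area), which the practical removes later before mapping.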
(b). Downloading Employment counts for areas. You will now download employment data for each
MSOA. These data will be downloaded from the Infuse website.
1. Visit the website (http://infuse.mimas.ac.uk/).
2. Click on 2011 Census data > Topics.
3. From the list of Topics on the left hand side of the screen, select Economic Activity.
4. You should click Select under the first of the 40 topic combinations: Age, Economic activity. Then, click Next.
5. In the Categories window click the box next to Age 16 to 74, then expand Economically active.
6. Next, expand the fields indicated below and tick the boxes next to:
• Employee > Part-time.
• Employee > Full-time.
• Self-employed with employees > Part-time.
• Self-employed with employees > Full-time.
• Self-employed without employees > Part-time.
• Self-employed without employees > Full-time.
• Unemployed.
Click Add.
7. You should then have seven category combinations. Click Next.
8. In the Geography window expand Local Authorities (click on “-” symbol to expand the categories). Expand Liverpool and select Middle Super Output Areas and Intermediate Zones (61 areas). Then, click Add > Next.
9. Click Get the data, followed by Download data. Save the data to your working directory.
10. When viewed in Excel, the file should have information in columns A to L, with 63 rows.
(c). Downloading Area data. To map the data we have downloaded, we need boundary files for our MSOAs. Area data can be downloaded from (http://edina.ac.uk/ukborders/), but to make life easier for you, the area data you need are provided on VITAL > Learning Resources > Practical > LiverpoolMSOA.zip.
1. Download the file, unzip the downloaded data, and place the contents inside your working directory.
[2]. Importing data into R. (a). Load and prepare employment data pertaining to each MSOA.
In RStudio, you must first set your working directory. For example:
#You have to specify your own path below:
setwd("M:/Kush/Teaching/E257/2020/Practicals/Practical4")
Read the file in for use later on (making sure your filename is correctly specified within read.csv):
#Read csv file, skipping the first two rows:
lpoolemploy <- read.csv("Data_AGE_ECOACT_UNIT.csv", header = FALSE, skip = 2)
We now wish to rename columns to something more concise, as shown in the table below.
Old Name | New Name
CDU-ID | ID
GEO_CODE | geocode
GEO_LABEL | label
GEO_TYPE | type
GEO_TYP2 | typeid
Age : Age 16 to 74 - Economic activity : Economically active Employee Part-time - Unit : Persons | employeePT
Age : Age 16 to 74 - Economic activity : Economically active Employee Full-time - Unit : Persons | employeeFT
Age : Age 16 to 74 - Economic activity : Economically active Self-employed with employees Part-time - Unit : Persons | selfempwithPT
Age : Age 16 to 74 - Economic activity : Economically active Self-employed with employees Full-time - Unit : Persons | selfempwithFT
Age : Age 16 to 74 - Economic activity : Economically active Self-employed without employees Part-time - Unit : Persons | selfempnoPT
Age : Age 16 to 74 - Economic activity : Economically active Self-employed without employees Full-time - Unit : Persons | selfempnoFT
Age : Age 16 to 74 - Economic activity : Economically active Unemployed - Unit : Persons | unemploy
#Now rename the columns:
colnames(lpoolemploy) <- c("ID","geocode","label","type","typeid","employeePT",
"employeeFT","selfempwithPT","selfempwithFT","selfempnoPT",
"selfempnoFT","unemploy")
#Select only the first 12 columns (as 13th contains no data):
lpoolemploy <- lpoolemploy[, 1:12]
#IMPORTANT: Check that the order of new names matches the columns in your CSV dataframe.
You now need to compute the percentage of people who are unemployed, with the sum of employed and unemployed persons as the denominator. The unemployment percentages will be mapped and analysed later.
#First, compute the total number of employed people (you will need this later as well as at this stage):
lpoolemploy$employed <- lpoolemploy$employeePT+lpoolemploy$employeeFT+
lpoolemploy$selfempwithPT+lpoolemploy$selfempwithFT+
lpoolemploy$selfempnoPT+lpoolemploy$selfempnoFT
#Then the total employed and unemployed people:
lpoolemploy$total <- lpoolemploy$employed+lpoolemploy$unemploy
#Next, compute the percentage unemployed:
lpoolemploy$unemployPC <- (lpoolemploy$unemploy/lpoolemploy$total)*100
#Now inspect the data:
head(lpoolemploy)
##      ID   geocode         label
## 1 11246 E02001347 Liverpool 001
## 2 11247 E02001348 Liverpool 002
## 3 11248 E02001349 Liverpool 003
## 4 11249 E02001350 Liverpool 004
## 5 11250 E02001351 Liverpool 005
## 6 11251 E02001352 Liverpool 006
##                                               type typeid employeePT employeeFT
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ        798       2253
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ        870       2621
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ        780       1883
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ        931       1966
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ        821       1750
##   selfempwithPT selfempwithFT selfempnoPT selfempnoFT unemploy employed total
## 1            13            64          78         187      352     3257  3609
## 2            14            37          81         183      326     3366  3692
## 3             6            57          77         228      351     3859  4210
## 4            13            53          60         183      415     2972  3387
## 5            12            50          53         157      533     3169  3702
## 6             1            49          72         165      475     2858  3333
##   unemployPC
## 1   9.753394
## 2   8.829902
## 3   8.337292
## 4  12.252731
## 5  14.397623
## 6  14.251425
(b). Load and inspect Travel-to-Work (TTW) data. Read csv file, skipping the first four rows:
lpoolttw <- read.csv("LiverpoolTTW.csv", header = FALSE, skip = 4)
Next, preview the file by looking at the first six rows.
head(lpoolttw)
##          V1        V2  V3
## 1 E02001370 E02001347  15
## 2 E02001404 E02001347   7
## 3 E02001360 E02001347  37
## 4 E02001348 E02001347 161
## 5 E02001365 E02001347  37
## 6 E02001376 E02001347  19
Again, we want to re-label the columns with more concise names as shown in the table below:
Old Name | New Name
Origins: England and Wales Middle Super Output Areas (MSOA), Northern Ireland Super Output Areas (SOA), Scotland Intermediate Zones 2011 | Origin
Destinations: England and Wales Middle Super Output Areas (MSOA), Northern Ireland Super Output Areas (SOA), Scotland Intermediate Zones 2011 | Destination
[Total] All usual residents aged 16 and over in employment the week before the census | FlowOD
colnames(lpoolttw) <- c("Origin","Destination","FlowOD")
Check that this has worked by again looking at the first six rows of the data.
head(lpoolttw)
##      Origin Destination FlowOD
## 1 E02001370   E02001347     15
## 2 E02001404   E02001347      7
## 3 E02001360   E02001347     37
## 4 E02001348   E02001347    161
## 5 E02001365   E02001347     37
## 6 E02001376   E02001347     19
Finally, we will summarise the variables to understand how the flow data values are distributed.
summary(lpoolttw)
##        Origin        Destination       FlowOD
##  E02001347:  61   E02001347:  61   Min.   :  1.00
##  E02001353:  61   E02001350:  61   1st Qu.:  6.00
##  E02001354:  61   E02001351:  61   Median : 13.00
##  E02001359:  61   E02001354:  61   Mean   : 32.57
##  E02001362:  61   E02001355:  61   3rd Qu.: 30.00
##  E02001363:  61   E02001358:  61   Max.   :722.00
##  (Other)  :3272   (Other)  :3272   NA's   :2
The two “NAs” indicate rows at the end of the data which were textual information included in the downloaded file. You don’t need to worry about these for now. Note that the Minimum, 1st Quartile and Median FlowOD values are 1, 6, and 13 respectively. In other words, most flows are small.
(c). Load and plot MSOA boundaries. Now we will read the MSOA data into a type of object called a SpatialPolygonsDataFrame.
library("rgdal")
## Loading required package: sp
## rgdal: version: 1.4-8, (SVN revision 845)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 2.2.3, released 2017/11/20
## Path to GDAL shared files: C:/Users/kthakar/Documents/R/win-library/3.6/rgdal/gdal
## GDAL binary built with GEOS: TRUE
## Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
## Path to PROJ.4 shared files: C:/Users/kthakar/Documents/R/win-library/3.6/rgdal/proj
## Linking to sp version: 1.3-2
lpoolMSOA <- readOGR("LiverpoolMSOA.shp")
## OGR data source with driver: ESRI Shapefile
## Source: "M:\Kush\Teaching\E257\2020\Practicals\Practical4\LiverpoolMSOA.shp", layer: "LiverpoolMSOA"
## with 61 features
## It has 3 fields
We can see what the SpatialPolygonsDataFrame looks like using the plot() function.
#A basic plot of the MSOAs SpatialPolygonsDataFrame
plot(lpoolMSOA)
Now that we have a base map of MSOAs in Liverpool, we can consider mapping commuting flows between areas.
Saving Your Work. If you want to save your workspace, so you can open it up later on, you can use the save.image function.
#This will save all objects currently loaded in the workspace:
save.image(file = "ttw.RData")
Re-loading your Work. You can re-load the objects contained in your saved file at any time (e.g. after closing R) using the ‘load’ command. The file with extension ‘.RData’ is stored in the working directory defined earlier. Remember that you will need to set your working directory again before trying to load your script and data - otherwise R will not know where to find your RData file.
#Re-load Data
load("ttw.RData")
(d). Importing the MSOA boundaries. As the next step, we need to work with the Liverpool MSOAs shapefile (vector polygon boundaries) downloaded earlier. We will need to make use of functions which are not contained in the base installation of R. These functions are contained in external ‘packages’ written by a community of developers.
First, they must be installed on your computer, and then loaded into the work session. Moreover, note that you need to re-load these packages from your PC library each time you begin or return to a programming session in R. You need to load the packages ‘maptools’, ‘rgeos’ and ‘stringr’. These packages include functions for reading and manipulating GIS data.
#Choose the CRAN mirror where the packages are going to be downloaded from
options("repos" = c(CRAN = "http://www.stats.bris.ac.uk/R/"))
#Download / install the packages
install.packages(c("sp", "maptools", "stringr", "rgeos"), dependencies = TRUE)
#Load libraries
library("sp")
library("maptools")
## Checking rgeos availability: TRUE
## rgeos version: 0.5-2, (SVN revision 621)
## GEOS runtime version: 3.6.1-CAPI-1.10.1
## Linking to sp version: 1.3-2
## Polygon checking: TRUE
First, we need to extract several elements from the LiverpoolMSOA shapefile attribute table. We do this using SpatialPointsDataFrame. To map flows between zones we also need the centroids of the MSOAs, and this is achieved using the coordinates command. MSOAXY is then joined to the lpoolttw and to the lpoolemploy data. We need to join these three files twice: once linking geo_code (in MSOAXY) to Destination in lpoolttw to create Distttw1, and then geocode (in lpoolemploy) to Destination (in the new file Distttw1) to create Distttw2. Then the same steps are repeated for Origin to create Distttw3 and Distttw4.
library("stringr")
library("rgeos")
MSOAXY <- SpatialPointsDataFrame(coordinates(lpoolMSOA), data=as(lpoolMSOA, "data.frame")[c("geo_code")])
#So, first we'll join to Destination and then the output from this to Origin:
Distttw1 <- merge(lpoolttw,MSOAXY,by.x="Destination",by.y="geo_code")
Distttw2 <- merge(lpoolemploy,Distttw1,by.x="geocode",by.y="Destination",all=TRUE)
head(Distttw2)
##     geocode    ID         label
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
##                                               type typeid employeePT employeeFT
## 1 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
## 2 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
## 3 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
## 4 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
## 5 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
## 6 Middle Super Output Areas and Intermediate Zones MSOAIZ        829       2086
##   selfempwithPT selfempwithFT selfempnoPT selfempnoFT unemploy employed total
## 1            13            64          78         187      352     3257  3609
## 2            13            64          78         187      352     3257  3609
## 3            13            64          78         187      352     3257  3609
## 4            13            64          78         187      352     3257  3609
## 5            13            64          78         187      352     3257  3609
## 6            13            64          78         187      352     3257  3609
##   unemployPC    Origin FlowOD coords.x1 coords.x2
## 1   9.753394 E02001370     15  338686.2    396978
## 2   9.753394 E02001348    161  338686.2    396978
## 3   9.753394 E02001365     37  338686.2    396978
## 4   9.753394 E02001404      7  338686.2    396978
## 5   9.753394 E02001360     37  338686.2    396978
## 6   9.753394 E02001371     24  338686.2    396978
#And now the equivalent for Origin:
Distttw3 <- merge(Distttw2,MSOAXY,by.x="Origin",by.y="geo_code")
Distttw4 <- merge(lpoolemploy,Distttw3,by.x="geocode",by.y="Origin",all=TRUE)
head(Distttw4)
##     geocode  ID.x       label.x
## 1 E02001347 11246 Liverpool 001
## 2 E02001347 11246 Liverpool 001
## 3 E02001347 11246 Liverpool 001
## 4 E02001347 11246 Liverpool 001
## 5 E02001347 11246 Liverpool 001
## 6 E02001347 11246 Liverpool 001
##                                             type.x typeid.x employeePT.x
## 1 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 2 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 3 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 4 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 5 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
## 6 Middle Super Output Areas and Intermediate Zones   MSOAIZ          829
##   employeeFT.x selfempwithPT.x selfempwithFT.x selfempnoPT.x selfempnoFT.x
## 1         2086              13              64            78           187
## 2         2086              13              64            78           187
## 3         2086              13              64            78           187
## 4         2086              13              64            78           187
## 5         2086              13              64            78           187
## 6         2086              13              64            78           187
##   unemploy.x employed.x total.x unemployPC.x geocode.y  ID.y       label.y
## 1        352       3257    3609     9.753394 E02001350 11249 Liverpool 004
## 2        352       3257    3609     9.753394 E02001401 11299 Liverpool 055
## 3        352       3257    3609     9.753394 E02001366 11265 Liverpool 020
## 4        352       3257    3609     9.753394 E02001376 11275 Liverpool 030
## 5        352       3257    3609     9.753394 E02001382 11280 Liverpool 036
## 6        352       3257    3609     9.753394 E02001361 11260 Liverpool 015
##                                             type.y typeid.y employeePT.y
## 1 Middle Super Output Areas and Intermediate Zones   MSOAIZ          780
## 2 Middle Super Output Areas and Intermediate Zones   MSOAIZ          781
## 3 Middle Super Output Areas and Intermediate Zones   MSOAIZ          923
## 4 Middle Super Output Areas and Intermediate Zones   MSOAIZ          803
## 5 Middle Super Output Areas and Intermediate Zones   MSOAIZ          835
## 6 Middle Super Output Areas and Intermediate Zones   MSOAIZ          848
##   employeeFT.y selfempwithPT.y selfempwithFT.y selfempnoPT.y selfempnoFT.y
## 1         1883              13              53            60           183
## 2         2275              17             105            96           186
## 3         2355              17              64            90           236
## 4         1572              18              62            75           118
## 5         2215              16              92            76           234
## 6         1863              10              45            70           170
##   unemploy.y employed.y total.y unemployPC.y FlowOD coords.x1.x coords.x2.x
## 1        415       2972    3387    12.252731     50    340083.4    396003.7
## 2        232       3460    3692     6.283857     13    342699.6    385411.4
## 3        582       3685    4267    13.639559     11    338261.9    392714.7
## 4        635       2648    3283    19.342065      9    336835.5    391054.2
## 5        200       3468    3668     5.452563      3    341247.0    389565.8
## 6        412       3006    3418    12.053833      4    337033.7    393676.8
##   coords.x1.y coords.x2.y
## 1    338686.2      396978
## 2    338686.2      396978
## 3    338686.2      396978
## 4    338686.2      396978
## 5    338686.2      396978
## 6    338686.2      396978
#Next, replace the column labels so that they are more descriptive:
colnames(Distttw4) <- c("Origin","OriginID","Originlabel","Origintype","OrigintypeID",
"OemployeePT","OemployeeFT","OselfempwithPT","OselfempwithFT",
"OselfempnoPT","OselfempnoFT","Ounemploy","Oemployed","Ototal",
"OunemployPC","Destination","DestinationID","Destinationlabel",
"Destinationtype","DestinationtypeID","DemployeePT","DemployeeFT",
"DselfempwithPT","DselfempwithFT","DselfempnoPT","DselfempnoFT",
"Dunemploy","Demployed","Dtotal","DunemployPC","FlowOD","Deast",
"Dnorth","Oeast","Onorth")
head(Distttw4)
##      Origin OriginID   Originlabel
## 1 E02001347    11246 Liverpool 001
## 2 E02001347    11246 Liverpool 001
## 3 E02001347    11246 Liverpool 001
## 4 E02001347    11246 Liverpool 001
## 5 E02001347    11246 Liverpool 001
## 6 E02001347    11246 Liverpool 001
##                                         Origintype OrigintypeID OemployeePT
## 1 Middle Super Output Areas and Intermediate Zones       MSOAIZ         829
## 2 Middle Super Output Areas and Intermediate Zones       MSOAIZ         829
## 3 Middle Super Output Areas and Intermediate Zones       MSOAIZ         829
## 4 Middle Super Output Areas and Intermediate Zones       MSOAIZ         829
## 5 Middle Super Output Areas and Intermediate Zones       MSOAIZ         829
## 6 Middle Super Output Areas and Intermediate Zones       MSOAIZ         829
##   OemployeeFT OselfempwithPT OselfempwithFT OselfempnoPT OselfempnoFT Ounemploy
## 1        2086             13             64           78          187       352
## 2        2086             13             64           78          187       352
## 3        2086             13             64           78          187       352
## 4        2086             13             64           78          187       352
## 5        2086             13             64           78          187       352
## 6        2086             13             64           78          187       352
##   Oemployed Ototal OunemployPC Destination DestinationID Destinationlabel
## 1      3257   3609    9.753394   E02001350         11249    Liverpool 004
## 2      3257   3609    9.753394   E02001401         11299    Liverpool 055
## 3      3257   3609    9.753394   E02001366         11265    Liverpool 020
## 4      3257   3609    9.753394   E02001376         11275    Liverpool 030
## 5      3257   3609    9.753394   E02001382         11280    Liverpool 036
## 6      3257   3609    9.753394   E02001361         11260    Liverpool 015
##                                    Destinationtype DestinationtypeID
## 1 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 2 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 3 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 4 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 5 Middle Super Output Areas and Intermediate Zones            MSOAIZ
## 6 Middle Super Output Areas and Intermediate Zones            MSOAIZ
##   DemployeePT DemployeeFT DselfempwithPT DselfempwithFT DselfempnoPT
## 1         780        1883             13             53           60
## 2         781        2275             17            105           96
## 3         923        2355             17             64           90
## 4         803        1572             18             62           75
## 5         835        2215             16             92           76
## 6         848        1863             10             45           70
##   DselfempnoFT Dunemploy Demployed Dtotal DunemployPC FlowOD    Deast   Dnorth
## 1          183       415      2972   3387   12.252731     50 340083.4 396003.7
## 2          186       232      3460   3692    6.283857     13 342699.6 385411.4
## 3          236       582      3685   4267   13.639559     11 338261.9 392714.7
## 4          118       635      2648   3283   19.342065      9 336835.5 391054.2
## 5          234       200      3468   3668    5.452563      3 341247.0 389565.8
## 6          170       412      3006   3418   12.053833      4 337033.7 393676.8
##      Oeast Onorth
## 1 338686.2 396978
## 2 338686.2 396978
## 3 338686.2 396978
## 4 338686.2 396978
## 5 338686.2 396978
## 6 338686.2 396978
Analysis. [1]. Mapping Flows
You will now map flows as lines with thicknesses indicating the size of the flows:
#Plot the MSOAs and then map the flows
plot(lpoolMSOA)
for (i in 1:length(Distttw4[, 1])) {lines(Distttw4 [i, c(34, 32)], Distttw4 [i, c(35, 33)], lwd=0.2 + Distttw4 [i,31]/100, col="blue")}
The map is almost unintelligible at present, so we will modify the flows to make the picture clearer. First, we’ll remove the internal flows (people who lived and worked within the same MSOA).
1. Compute the distances between the Origins and Destinations.
2. Retain only rows with distance > 0; this removes internal flows.
At this stage we create a new file, as we will want the rows with a distance of zero for a later stage.
#Compute the distances between the Origins and Destinations
Distttw4$Dist <- sqrt((Distttw4$Oeast-Distttw4$Deast)**2+(Distttw4$Onorth-Distttw4$Dnorth)**2)
#Retain rows where Dist > 0
Distttw0 <- Distttw4[ which(Distttw4$Dist > 0),]
#Principal flow threshold: retain rows where FlowOD > 250:
DistttwCut <- Distttw0[ which(Distttw0$FlowOD > 250),]
#Now plot the MSOAs and then map the flows
plot(lpoolMSOA)
for (i in 1:length(DistttwCut[, 1])) {lines(DistttwCut [i, c(34, 32)],
DistttwCut [i, c(35, 33)], lwd=0.2 + DistttwCut [i,31]/100, col="blue")}
That’s better! It is now possible to discern some major trends in commuting across Liverpool. Try some variants with other flow cutoffs (where DistttwCutTest is the new (output) file and Distttw0 is the input).
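One way to try other cutoffs is to wrap the filter in a small function. The sketch below is illustrative only: cut_flows() is a hypothetical helper, and toy_flows is a made-up stand-in for Distttw0 with invented coordinates and flows. It also uses base R's segments(), which draws all lines in a single call, as a possible alternative to the row-by-row loop.

```r
# Hypothetical helper (not part of the practical): keep rows with FlowOD above a cutoff
cut_flows <- function(flow_table, cutoff) {
  flow_table[flow_table$FlowOD > cutoff, ]
}

# Toy stand-in for Distttw0; coordinates and flows are made up for illustration
toy_flows <- data.frame(
  Oeast  = c(0, 0, 3000, 3000),
  Onorth = c(0, 0, 4000, 4000),
  Deast  = c(3000, 6000, 0, 6000),
  Dnorth = c(4000, 0, 0, 0),
  FlowOD = c(300, 40, 120, 700)
)

# Number of flows that survive each candidate cutoff
cutoffs   <- c(50, 100, 250)
surviving <- sapply(cutoffs, function(k) nrow(cut_flows(toy_flows, k)))
names(surviving) <- cutoffs
surviving

# Map the flows kept at one cutoff; segments() is vectorised over rows
DistttwCutTest <- cut_flows(toy_flows, 100)
plot(range(toy_flows$Oeast, toy_flows$Deast),
     range(toy_flows$Onorth, toy_flows$Dnorth),
     type = "n", asp = 1, xlab = "Easting", ylab = "Northing")
segments(DistttwCutTest$Oeast, DistttwCutTest$Onorth,
         DistttwCutTest$Deast, DistttwCutTest$Dnorth,
         lwd = 0.2 + DistttwCutTest$FlowOD / 100, col = "blue")
```

With the real data, toy_flows would be replaced by Distttw0 and the empty plot by plot(lpoolMSOA), so that the flows are drawn over the MSOA boundaries.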
[2]. Linking, Mapping and Analysing Employment Data. Next we will map the percentage of unemployed people. Firstly, you need to join the employment data to the MSOA areas so that the employment counts and percentages can be mapped:
lpoolMSOA@data=data.frame(lpoolMSOA@data,
lpoolemploy[match(lpoolMSOA@data$geo_code, lpoolemploy$geocode),])
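The join above relies on match(), which looks up each boundary code in the employment table and returns the row position where it is found. A small sketch with made-up codes and values (the data frames areas and employ are hypothetical) shows why this keeps the rows aligned even when the two tables are sorted differently:

```r
# Toy attribute table of a boundary layer (hypothetical area codes)
areas <- data.frame(geo_code = c("A3", "A1", "A2"))

# Toy employment table, deliberately in a different row order
employ <- data.frame(geocode    = c("A1", "A2", "A3"),
                     unemployPC = c(9.8, 8.8, 8.3))

# For each area, the row position of its code in the employment table
idx <- match(areas$geo_code, employ$geocode)
idx

# Reorder the employment rows to follow the boundary layer, then bind the columns
joined <- data.frame(areas, employ[idx, ])
joined

# Sanity checks: no unmatched areas (NA), and the two code columns agree row by row
stopifnot(!anyNA(idx), all(joined$geo_code == joined$geocode))
```

An NA in idx would indicate an area in the boundary file with no matching row in the employment table, which is worth checking before mapping.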
Now we’ll map the percentage of people who were unemployed in 2011 (unemployPC) and add a legend to the map. At this stage, we need two new libraries – RColorBrewer and classInt.
#Install classInt and RColorBrewer Libraries
install.packages("RColorBrewer", dependencies = TRUE)
install.packages("classInt", dependencies = TRUE)
#Load Libraries
library("RColorBrewer")
library("classInt")
We select a class interval scheme to display the values, and add a legend.
fj5 <- classIntervals(lpoolMSOA$unemployPC, n=5, style="fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
plot(lpoolMSOA, col=fj5Colours, pch=19)
legend("topleft", fill=attr(fj5Colours, "palette"), legend=names(attr(fj5Colours, "table")), bty="n")
[Map: percentage unemployed by MSOA. Legend classes: [4.214123,7.534738), [7.534738,11.57394), [11.57394,15.61473), [15.61473,22.35828), [22.35828,27.25926]]
To map unemployment rates, it’s useful to be able to round the values for inclusion in the legend. The first line of the code above can be modified to do this; we can also replace the “topleft” element of the legend call with x and y coordinates to position the legend more precisely (changing the x and y coordinates moves the legend); you will also add a scale bar and a north arrow.
#Select Interval Classes and Display Scheme
fj5 <- classIntervals(round(lpoolMSOA$unemployPC,digits=2), n = 5, style = "fisher")
pal <- grey.colors(4, 0.95, 0.55, 2.2)
fj5Colours <- findColours(fj5, pal)
#Plot Map
plot(lpoolMSOA, col=fj5Colours, pch=19)
#Add Legend
legend(x = 329500, y = 387500, fill=attr(fj5Colours, "palette"), legend=names(attr(fj5Colours, "table")), bty="n")
#Add a Scale Bar
SpatialPolygonsRescale(layout.scale.bar(), offset = c(331000, 381500), scale = 4000, fill = c("white", "black"), plot.grid = F)
text(331000, 381000, "0", cex = 0.6)
text(333000, 381000, "2", cex = 0.6)
text(335000, 381000, "4km", cex = 0.6)
#Add a North Arrow
SpatialPolygonsRescale(layout.north.arrow(1), offset= c(344000,395000), scale = 2000, plot.grid=F)
[Map: percentage unemployed by MSOA with scale bar (0, 2, 4km) and north arrow. Legend classes: [4.21,7.535), [7.535,11.575), [11.575,15.615), [15.615,22.355), [22.355,27.26]]
The six-figure numbers in the code specifying the Legend, Scale Bar and North Arrow are Eastings and Northings.
They are co-ordinates from the British National Grid and can be used to change the position of the legend, scale bar and north arrow features. To determine the coordinates of your preferred location for these features, type the following command:
locator()
Then click a point in the figure where you wish to place the Legend or Scale Bar. Then hit the ESC key. The coordinates of the location will now appear in the R console. Use these co-ordinates to modify the position of the Scale Bar and Legend by substituting them into the code above.
Note that you can export the lpoolMSOA file into shapefile format which can then be opened in QGIS. This is done with:
writeSpatialShape(lpoolMSOA,"lpmsoa")
This creates the output shapefile “lpmsoa.shp”. It is written to your working directory. If you prefer to use QGIS to make maps of MSOA area features you can do so following guidance provided in the previous practical exercises.
[3]. Spatial Patterning. One useful way of exploring the spatial patterning in data values (here, unemployment percentages) is to compute a statistic called the Moran’s I spatial autocorrelation coefficient. Positive values of I indicate clustering of similar values, negative values of I indicate clustering of dissimilar values and values close to zero indicate little or no spatial autocorrelation (a ‘random’ spatial pattern). First, we need to install and load the package, “spdep”:
#Download / install and then load spdep
install.packages("spdep")
#Load library
library("spdep")
## Loading required package: spData
## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`
## Loading required package: sf
## Linking to GEOS 3.6.1, GDAL 2.2.3, PROJ 4.9.3
Next we need to create a list indicating which MSOAs share boundaries with which other MSOAs. This is done using the command ‘poly2nb’. This must then be converted into a weights file using ‘nb2listw’. The weights file represents the spatial relationship between MSOAs: if an MSOA borders five other MSOAs then each neighbouring MSOA is given a weight of 0.2, since the weights for each set of neighbours of each MSOA sum to one and 1/5 = 0.2. With this weights file we can measure the average difference between unemployment percentages at each MSOA and at the neighbouring MSOAs. A value of I close to one would indicate that the unemployment percentage values at each MSOA tend to be similar to the unemployment percentage values at other MSOAs with which they share a border - thus, the values cluster.
MSOA.nb <- poly2nb(lpoolMSOA)
MSOA.wt <- nb2listw(MSOA.nb)
MSOA.wt
## Characteristics of weights list object:
## Neighbour list object:
## Number of regions: 61
## Number of nonzero links: 302
## Percentage nonzero weights: 8.116098
## Average number of links: 4.95082
##
## Weights style: W
## Weights constants summary:
##    n   nn S0       S1       S2
## W 61 3721 61 26.82893 251.0616
#Summarise the weights values
summary(unlist(MSOA.wt$weights))
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.1250  0.1667  0.1667  0.2020  0.2500  1.0000
Using the new weights file we can now compute Moran’s I:
moransUnemploy <- moran.test(lpoolMSOA$unemployPC, MSOA.wt)
moransUnemploy
##
## Moran I test under randomisation
##
## data: lpoolMSOA$unemployPC
## weights: MSOA.wt
##
## Moran I statistic standard deviate = 5.9407, p-value = 1.419e-09
## alternative hypothesis: greater
## sample estimates:
## Moran I statistic       Expectation          Variance
##       0.466154721      -0.016666667       0.006605424
Interpretation: The value of 0.466 indicates a moderate level of spatial clustering - that is, neighbouring unemployment percentages tend to be quite similar on average. The p-value of 1.419e-09 indicates that the clustering is highly significant (that is, it is unlikely to have occurred by chance) - values closer to zero indicate a higher significance level. Values such as 1.419e-09 may not mean much to you. This is scientific notation and is a convenient way of displaying very small or very large values. In the case of 1.419e-09, this means that the decimal point is moved 9 positions to the left, so 1.419e-09 = 0.000000001419.
[4]. Exploring the Determinants of Commuting. Next, we are going to explore how the distance between MSOAs and their populations are related to the size of commuting flows. We might expect flows to be larger between MSOAs which are close together. Flows to MSOAs which have large populations (and, by implication, more housing) may also be large. The relationships between variables such as flow size and distance can be analysed using regression. The regression approach you may be familiar with from Excel, for example, is called Ordinary Least Squares (OLS) regression. OLS regression is used to explore how a set of variables (‘independent variables’, for example, population size of MSOAs and distance between MSOAs) are related to a single variable (for example, flows between MSOAs). OLS is used here to explore how flows relate to distance, origin populations (number of employed persons), and destination populations. We run the OLS model with intra-MSOA (internal) flows removed. Research on commuting (and migration) flows suggests that we should transform the variables using logs (that is, they are put on a log scale; zeros can’t be logged, so we add 0.5 to each value):
lreg_ttw <- lm(log(FlowOD+0.5) ~ log(Dist+0.5) + log(Oemployed+0.5) + log(Demployed+0.5), data = Distttw0)
summary(lreg_ttw)
##
## Call:
## lm(formula = log(FlowOD + 0.5) ~ log(Dist + 0.5) + log(Oemployed +
##     0.5) + log(Demployed + 0.5), data = Distttw0)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -2.7790 -0.7238 -0.0659  0.5940  3.2668
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)
## log(Dist + 0.5)
## log(Oemployed + 0.5)  0.91814    0.08527  10.768  < 2e-16 ***
## log(Demployed + 0.5) -0.25642    0.08485  -3.022  0.00253 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.089 on 3571 degrees of freedom
## Multiple R-squared: 0.1972, Adjusted R-squared: 0.1965
## F-statistic: 292.4 on 3 and 3571 DF, p-value: < 2.2e-16
Interpretation: The estimate values (parameter estimates) for Dist, Oemployed and Demployed are the key figures here. The value for Dist is negative, while the value for Oemployed is positive and that for Demployed is negative. This indicates that Dist (distance) has a negative effect - as distance increases, flows decrease. This fits with what we’d expect. The positive value for Oemployed also corresponds to what we’d expect - commuting flows are likely to be larger FROM MSOAs with large populations. The negative value for Demployed seems less intuitive until we note that the employment data we are using are for area of residence and NOT area of workplace. Hence, there may not be a “Pull Factor” for areas with large numbers of employed residents - instead the “pull” is likely to be towards areas with large numbers of jobs.
It is possible to instead assess flows by MSOA of workplace (the area in which people work) rather than (as here) the MSOA of residence (where people live), but will work only with residence data in this exercise. The astrices at the end of the columns indicate that all of the parameter estimates are statistically significant. However, there is a problem with using OLS. Flows are counts which are unlikely to have a Normal (bell-shaped) distribution - there isn’t an equal proportion of very small and very large flows, with the bulk in the middle range. Instead, there are many small flows and a few large flows. OLS is designed to work with data which have a Normal distribution, and so its application to count data like commuting flows is not sensible.An alternative approach is Poisson regression, which is appropriate for flow data. We can apply it using the glm function. Again, we exclude internal flows. ## ## Call: ## glm(formula = FlowOD ~ Dist + Oemployed + Demployed, family = poisson(), ## data = Distttw0) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -10.316 -4.861 -2.780 0.079 49.810 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|)
Estimate Std. Error t value Pr(>|t|)
4.67385 0.99651 4.690 2.83e-06 ***
-0.86549 0.03134 -27.619 < 2e-16 *** #Re-Run Model using Poisson Regression preg_ttw <- glm(FlowOD ~ Dist + Oemployed + Demployed, data = Distttw0, family = poisson() summary(preg_ttw) 19 ) ## (Intercept) 4.262e+00 2.206e-02 193.21 <2e-16 *** ## Dist -1.520e-04 1.269e-06 -119.76 <2e-16 *** ## Oemployed 2.953e-04 5.139e-06 57.46 <2e-16 *** ## Demployed -3.459e-04 4.938e-06 -70.06 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for poisson family taken to be 1) ## ## Null deviance: 171205 on 3574 degrees of freedom ## Residual deviance: 147039 on 3571 degrees of freedom ## AIC: 162888 ## ## Number of Fisher Scoring iterations: 6 Interpretation: As for OLS, the parameter estimate for Dist is negative, that for Oemployed is positive and again that for Demployed is negative. There are still problems with applying Poisson regression but it is conceptually an improvement on OLS regression. We know that data on small flows are unreliable. So, try re-running the Poisson regression model with different (small) flow thresholds (following the steps outlined for mapping flows). Use thresholds of 5 and 10 and compare results to those using all data (that is, a threshold of zero). As a reminder, this can be done with (for the example of a cut-off of 5 people): ## ## Call: ## glm(formula = FlowOD ~ Dist + Oemployed + Demployed, family = poisson(), ## data = DistttwCut) ## ## Deviance Residuals: ## Min 1Q Median 3Q Max ## -10.950 -4.953 -2.910 0.152 49.087 ## ## Coefficients: ## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 4.584e+00 2.216e-02 206.92 <2e-16 *** ## Dist -9.288e-05 1.282e-06 -72.47 <2e-16 *** ## Oemployed 1.970e-04 5.249e-06 37.53 <2e-16 *** ## Demployed -3.718e-04 4.964e-06 -74.89 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## (Dispersion parameter for poisson family taken to be 1) ## ## Null deviance: 127232 on 2673 degrees of freedom ## Residual deviance: 114571 on 2670 degrees of freedom ## AIC: 127848 ## ## Number of Fisher Scoring iterations: 6 n()) DistttwCut <- Distttw0[ which(Distttw0$FlowOD > 5),]
preg_ttw5 <- glm(FlowOD ~ Dist + Oemployed + Demployed, data = DistttwCut, family = poisso summary(preg_ttw5) 20 Optional (Advanced): If you want to add additional variables (perhaps downloaded from Infuse), you could do this as follows (say, for a file named “lpoolNEW” with the same format as the employment data and with only two sets of counts): Then rename to indicate origins and join to the new Disttw5 file created above: Giving the final file Distttw6 (for Dist > 0) containing the flows, distances, employment data and the new counts.
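The threshold comparison described above (cut-offs of 0, 5 and 10) can be run in one pass rather than by editing the code three times. The following is a minimal sketch only: sim_flows is simulated, hypothetical data standing in for Distttw0, the helper fit_threshold is not part of the practical, and the distance unit is an assumption. To use it on the real data, pass Distttw0 in place of sim_flows.

```r
# Hypothetical stand-in for Distttw0: simulated origin-destination flows.
# Replace sim_flows with the real Distttw0 data frame in the practical.
set.seed(42)
n <- 500
sim_flows <- data.frame(
  Dist      = runif(n, 500, 15000),  # distance between areas (unit assumed)
  Oemployed = runif(n, 2000, 5000),  # employed residents at the origin
  Demployed = runif(n, 2000, 5000)   # employed residents at the destination
)
sim_flows$FlowOD <- rpois(n, lambda = exp(3 - 0.00015 * sim_flows$Dist +
                                            0.0003 * sim_flows$Oemployed))

# Fit the same Poisson model after dropping flows at or below a cut-off
fit_threshold <- function(cutoff, data) {
  subset_data <- data[data$FlowOD > cutoff, ]
  model <- glm(FlowOD ~ Dist + Oemployed + Demployed,
               data = subset_data, family = poisson())
  c(cutoff = cutoff, n = nrow(subset_data), coef(model))
}

# One row per cut-off: rows retained and the parameter estimates
threshold_results <- as.data.frame(
  t(sapply(c(0, 5, 10), fit_threshold, data = sim_flows))
)

# Multiplicative change in expected flow per 1000 units of distance
threshold_results$flow_ratio_per_1000 <- exp(1000 * threshold_results$Dist)

print(threshold_results)
```

Because a Poisson model works on the log scale, exp() of a coefficient gives the multiplicative change in the expected flow for a one-unit increase in that variable. Putting the three fits side by side makes it easier to see how much the estimates move as small flows are removed.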
Conclusion
The various analyses that you have prepared during this tutorial will be of assistance when compiling your report. You may have to piece together different parts of the code to achieve this, but examples of any code that you need to complete your report are contained in this document. You may also wish to make use of the knowledge developed in previous practicals.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Acknowledgements: Chris Lloyd, University of Liverpool.
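Code for the optional (advanced) step:

#Rename the columns to indicate destinations and join to the flow data
#(the join key must use the renamed column, Dgeocode)
colnames(lpoolNEW) <- c('DID','Dgeocode','Dlabel','Dtype','Dtypeid','Dcount1','Dcount2')
Distttw5 <- merge(lpoolNEW, Distttw0, by.x='Dgeocode', by.y='Destination', all=TRUE)
#Rename the columns to indicate origins and join to Distttw5
#(the join key must use the renamed column, Ogeocode)
colnames(lpoolNEW) <- c('OID','Ogeocode','Olabel','Otype','Otypeid','Ocount1','Ocount2')
Distttw6 <- merge(lpoolNEW, Distttw5, by.x='Ogeocode', by.y='Origin', all=TRUE)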
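It can help to try the rename-and-join pattern on a few rows before applying it to the full files. The sketch below is illustrative only: toy_new and toy_flows are hypothetical stand-ins for lpoolNEW and Distttw0, with made-up values and fewer columns than the real data.

```r
# Hypothetical toy inputs: three areas with two counts each, and four flows
toy_new <- data.frame(geocode = c("A", "B", "C"),
                      count1  = c(10, 20, 30),
                      count2  = c(1, 2, 3))
toy_flows <- data.frame(Origin      = c("A", "A", "B", "C"),
                        Destination = c("B", "C", "A", "A"),
                        FlowOD      = c(5, 7, 3, 9))

# Step 1: label the counts as destination attributes and join on Destination
dest_counts <- toy_new
colnames(dest_counts) <- c("Dgeocode", "Dcount1", "Dcount2")
toy_step1 <- merge(dest_counts, toy_flows,
                   by.x = "Dgeocode", by.y = "Destination", all = TRUE)

# Step 2: label the counts as origin attributes and join on Origin
orig_counts <- toy_new
colnames(orig_counts) <- c("Ogeocode", "Ocount1", "Ocount2")
toy_step2 <- merge(orig_counts, toy_step1,
                   by.x = "Ogeocode", by.y = "Origin", all = TRUE)

# Each flow now carries the counts for both its origin and its destination
print(toy_step2)
```

Note that merge() names the key column in the result after by.x, so the Destination and Origin columns of the flow table appear as Dgeocode and Ogeocode in the output. The by.x value must also match a column name that exists after the renaming.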