程序代写代做代考 SQL data science matlab Excel Introduction to information system

Introduction to information system

Data I/O in R

Bowei Chen

School of Computer Science

University of Lincoln

CMP3036M/CMP9063M

Data Science 2016 – 2017 Workshop

Today’s Objectives

• Study the following slides for Data I/O in R

• Do the exercises 1-8 (exercise 5-8 are important!!!)

• Do the additional exercises (which can help you understand the contents of

the last week’s lectures)

Sources of Data for R

Entering Data from Keyboard

Perhaps the simplest method of data

entry is from the keyboard. The edit()
function in R will invoke a text editor

that will allow you to enter your data

manually.

> mydata <- data.frame(age = numeric(0), + gender = character(0), + weight = numeric(0)) > mydata <- edit(mydata) Importing Data from Excel There are many R packages can allow you to import data from excel. For example: openxlsx XLConnect xlsx … Note that most of the advice is for pre-Excel 2007 spreadsheets and not the later .xlsx format. > install.packages(“openxlsx”)

> library(“openxlsx”)
> df <- + read.xlsx( + "PublicHealthEnglandDataTableDistrict.xlsx", + sheet = 1, + startRow = 1, + colNames = TRUE + ) Importing Data from Delimited Text File (1/2) You can import data from delimited text files using read.table() , a function that reads a file in table format and saves it as a data frame. where file is a delimited ASCII file , header is a logical value indicating whether the first row contains variable names (TRUE or FALSE), sep specifies the delimiter separating data values, and row.names is an optional parameter specifying one or more variables to represent row identifiers. > mydataframe <- read.table(file, header = logical_value, + sep = "delimiter", + row.names = "name") Importing Data from Delimited Text File (2/2) > df <- read.table(file, header = TRUE, sep = ",") > head(df)

Date Open High Low Close Volume Adj.Close

1 2016-03-04 581.07 581.40 571.07 575.14 3405100 575.14

2 2016-03-03 577.96 579.87 573.11 577.49 2736700 577.49

3 2016-03-02 581.75 585.00 573.70 580.21 4576900 580.21

4 2016-03-01 556.29 579.25 556.00 579.04 5014400 579.04

5 2016-02-29 554.00 564.81 552.51 552.52 4013400 552.52

6 2016-02-26 560.12 562.50 553.17 555.23 4858200 555.23

Importing Data from XML

> # install and load the necessary package

> install.packages(“XML”)

> library(XML)

> xml.url <- "http://www.w3schools.com/xml/plant_catalog.xml" > xmlfile <- xmlTreeParse(xml.url) > class(xmlfile)

[1] “XMLDocument” “XMLAbstractDocument”

> xmltop = xmlRoot(xmlfile)

> plantcat <- xmlSApply(xmltop, function(x) { xmlSApply(x, xmlValue) } ) > # Finally, get the data in a data-frame and have a look at the first rows and columns

> plantcat_df <- data.frame(t(plantcat),row.names = NULL) > plantcat_df[1:5,1:4]

Importing Data from R Package

> library(MASS)

> data()

> data(phones)

> phones

$year

[1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73

$calls

[1] 4.4 4.7 4.7 5.9 6.6 7.3 8.1 8.8 10.6 12.0 13.5 14.9 16.1

[14] 21.2 119.0 124.0 142.0 159.0 182.0 212.0 43.0 24.0 27.0 29.0

Importing Data from Other Sources

• Importing SPSS files into R

• Importing Stata files into R

• Importing SAS files into R

• Importing Minitab files into R

• Importing Matlab files into R

• …

Importing Data in RStudio (1/2)

Importing Data in RStudio (2/2)

Writing Data Frame into CSV or TXT Files

write.table(x, file = “”, append = FALSE, quote = TRUE, sep = ” “,

eol = “\n”, na = “NA”, dec = “.”, row.names = TRUE,

col.names = TRUE, qmethod = c(“escape”, “double”),

fileEncoding = “”)

write.csv(…)

write.csv2(…)

Useful Functions for Working with Data Objects (1/2)

Number of elements/components length(object)
Dimensions of an object dim(object)
Structure of an object str(object)
Class or type of an object class(object)
How an object is stored mode(object)
Names of components in an object names(object)
Combines objects into a vector c(object, object,…)
Combines objects as columns cbind(object, object, …)
Combines objects as rows rbind(object, object, …)
Prints the object object

Useful Functions for Working with Data Objects (2/2)

Lists the first part of the object head(object)
Lists the last part of the object tail(object)
Lists current objects ls()
Deletes one or more objects. rm(object, object, …)
Edits object and saves as new object newobject <- edit(object) drop Argument By default, subscripting operations reduce the dimensions of an array whenever possible. To avoid that, we can use the drop=FALSE argument > mat <- matrix(1:12, 3, 4, byrow = TRUE) > mat

[,1] [,2] [,3] [,4]

[1,] 1 2 3 4

[2,] 5 6 7 8

[3,] 9 10 11 12

> s1 <- mat[1,]; s1 [1] 1 2 3 4 > dim(s1)

NULL

> s2 <- mat[1,,drop=FALSE]; s2 [,1] [,2] [,3] [,4] [1,] 1 2 3 4 > dim(s2)

[1] 1 4

Combined Selection

Suppose we want to get all the columns
for which the element at the first row is
less than 3:

> mat <- matrix(1:12, 3, 4, byrow = TRUE) > mat

[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12

> mycols <- mat[1,] < 3; mycols [1] TRUE TRUE FALSE FALSE > mat[ , mycols, drop=FALSE]
[,1] [,2]
[1,] 1 2
[2,] 5 6
[3,] 9 10

Using SQL Statements in R to Manipulate Data Frames

# install the package

> install.packages(“sqldf”)

> library(sqldf)

> newdf <- sqldf("select * from mtcars where carb=1 order by mpg", row.names=TRUE) > newdf

mpg cyl disp hp drat wt qsec vs am gear carb

Valiant 18.1 6 225.0 105 2.76 3.46 20.2 1 0 3 1

Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1

Toyota Corona 21.5 4 120.1 97 3.70 2.46 20.0 1 0 3 1

Datsun 710 22.8 4 108.0 93 3.85 2.32 18.6 1 1 4 1

Fiat X1-9 27.3 4 79.0 66 4.08 1.94 18.9 1 1 4 1

Fiat 128 32.4 4 78.7 66 4.08 2.20 19.5 1 1 4 1

Toyota Corolla 33.9 4 71.1 65 4.22 1.83 19.9 1 1 4 1

References

• W. Venables, D. Smith, and the R Core Team (2015) An Introduction to R.

• P. Teetor (2011) R Cookbook. O’Reilly.

• J. Adler (2012) R in a Nutshell, 2nd Edition, O’Reilly

Exercise 1/8

1) Create a vector x represent numbers from 1 to 11

2) Save x into the x.RData file

3) Remove the object x from R workspace

4) Import the x.RData file into R and save it to x.

Please google it 

Exercise 2/8

1) Create the following data frame dfA

ID Case Number
1 case1 10
2 case2 20
3 case3 30

2) Save dfA into the dfA.csv file

Exercise 3/8

1) Download PublicHealthEnglandDataTableDistrict.xlsx from Blackboard

2) Import the dataset into R using openxlsx package

3) Show the first and last 20 lines of the dataset, respectively

4) Obtain the column names of the dataset

5) Create a new data frame which has the same column names of the

dataset and has the first and last 20 lines of the dataset

Exercise 4/8

1) Download the file AMZN.csv from Blackboard

2) Import the dataset into R

3) Show the class of all columns/fields

4) Create a new data frame where Open <= 570 and Close >= 550

5) Sort the data frame by High (in decreasing order)

6) Create another data frame where Close >= Open

Exercise 5/8

1) Import data from url “http://www.w3schools.com/xml/plant_catalog.xml”

2) Use the xmlTreePares function to parse xml file directly from the web

3) Use the xmlRoot function to access the top node

Exercise 6/8

1) Install the MASS package

2) Find Cars93 dataset

3) Extract all the records for the Volkswagen from the field Manufacturer

4) Order the extracted records (ascend) by Price and save it to a data frame

5) Write the data frame to Cars93FilteredData.csv

Exercise 7/8

1) Use SQL statements to manipulate data frame as required in Exercise 4/6

2) Write the data frame to Cars93FilteredData.RData

Exercise 8/8

1) Download the files iris1.csv and iris2.csv from Blackboard

2) Import these two files into R

3) Combine these two datasets into one data frame

4) Calculate the mean value of every columns

5) What will you do with missing values?

Additional Exercises

Well done if you’ve completed the exercises. Once you complete these

additional exercises, you can leave the workshop sessions 

Additional Exercise (1/2)

If a random variable 𝑋 follows a Poisson distribution, 𝑋~𝑃𝑜𝑖𝑠𝑠𝑜𝑛(𝜆), please
prove the following equations:

1) 𝑥=0
∞ 𝑒

−𝜆𝜆𝑥

𝑥!
= 1

2) 𝔼 X = 𝜆

3) 𝕍 𝑋 = 𝜆

Additional Exercise (2/2)

If a random variable 𝑋 follows a Binomial distribution, 𝑋~𝐵𝑖𝑛 𝑛, 𝑝 . Suppose
that 𝑛𝑝 = 𝜆, please prove the following equation

lim
𝑛→∞

𝑓 𝑥; 𝑛, 𝑝 = lim
𝑛→∞

𝑛

𝑥
𝑝𝑥 1 − 𝑝 𝑛−𝑥 =

𝜆𝑥

𝑥!
𝑒−𝜆 = 𝑓(𝑥; 𝜆)

Thank You!

bchen@lincoln.ac.uk

mailto:bchen@lincoln.ac.uk