DSCC 201/401
Tools and Infrastructure for Data Science
March 10, 2021

R
• Brief history and overview
• R interfaces
• Language syntax and examples
• Useful libraries

Objects in R
• Scalars and Characters: 1
• Vectors: c(1,2,3)
• Matrices: matrix(c(1,2,3,4,5,6),nrow=2,ncol=3)
• Lists (i.e. “heterogeneous vectors”): list(1,2,3,"hello",sqrt)
• Data Frames (“table” or “heterogeneous matrix”)
• Factors (“categorical data”)
• Functions (operations on objects)
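
The object types listed on this slide can be sketched in a few lines of R. The variable names and the contents of the data frame and factor are illustrative assumptions, not taken from the lecture.

```r
# Scalars and characters (in R these are vectors of length 1)
x <- 1
s <- "hello"

# Vector: all elements share one type
v <- c(1, 2, 3)

# Matrix: filled column by column by default
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# List: elements may have different types, including functions
l <- list(1, 2, 3, "hello", sqrt)

# Data frame: a table whose columns may differ in type (illustrative values)
df <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"))

# Factor: categorical data with a fixed set of levels (illustrative values)
f <- factor(c("low", "high", "low"))

# Function: an operation on objects
square <- function(a) a^2

class(v)    # "numeric"
dim(m)      # 2 3
levels(f)   # "high" "low"
l[[5]](16)  # calls sqrt(16), giving 4
square(3)   # 9
```

Note that `l[[5]]` retrieves the function stored in the list, which is why a list can be described as a heterogeneous vector.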

Scalars and Vectors
• R is a calculator (operations on scalars)
• Assignment of scalars and vectors
• Simple functions
• Sequence generators
• Operations on vectors
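
A minimal sketch of the topics on this slide, using only base R. The particular numbers are arbitrary choices for illustration.

```r
# R as a calculator (operations on scalars)
2 + 3 * 4   # 14
7 %/% 2     # 3 (integer division)
7 %% 2      # 1 (remainder)

# Assignment of scalars and vectors
a <- 5
v <- c(2, 4, 6, 8)

# Simple functions
sqrt(16)    # 4
length(v)   # 4

# Sequence generators
s1 <- 1:5                      # 1 2 3 4 5
s2 <- seq(0, 1, by = 0.25)     # 0.00 0.25 0.50 0.75 1.00
s3 <- rep(c(1, 2), times = 3)  # 1 2 1 2 1 2

# Operations on vectors are applied element by element
v * 2       # 4 8 12 16
v + a       # 7 9 11 13
sum(v)      # 20
mean(v)     # 5
v[v > 4]    # 6 8 (logical indexing)
```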

Matrices
• Matrix creation and representation
• Matrix operations
• Solving systems of linear equations
• Eigenvalues and eigenvectors
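
The following sketch walks through these four topics with base R functions. The matrix `A` and the right-hand side `b` are arbitrary values chosen for illustration.

```r
# Matrix creation: values fill column by column, so A = [2 1; 1 3]
A <- matrix(c(2, 1, 1, 3), nrow = 2, ncol = 2)
b <- c(3, 5)

# Matrix operations
At <- t(A)       # transpose
P  <- A %*% A    # matrix product
E  <- A * A      # element-wise product (not the matrix product)
Ai <- solve(A)   # inverse

# Solving the linear system A x = b
x <- solve(A, b)

# Eigenvalues and eigenvectors
e <- eigen(A)
e$values         # eigenvalues, largest first for a symmetric matrix
e$vectors        # eigenvectors stored as columns
```

A quick check of the results is to confirm that `A %*% x` reproduces `b`, and that `A %*% e$vectors[, 1]` equals `e$values[1] * e$vectors[, 1]` up to rounding.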

Lists and Data Frames
• Tags and values
• Simple tables
• Filtering data
• Reading and writing data
• Simple analysis of data in tables
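
A short sketch of these operations in base R. The list, the table contents, and the column names `city` and `temp` are hypothetical, and a temporary file stands in for a real data file.

```r
# Tags and values: a list whose elements are accessed by name
person <- list(name = "Ada", age = 36)
person$name       # "Ada"
person[["age"]]   # 36

# Simple table: a data frame with one character and one numeric column
df <- data.frame(
  city = c("A", "B", "C", "D"),
  temp = c(70, 82, 65, 90),
  stringsAsFactors = FALSE
)

# Filtering data: keep the rows that satisfy a condition
warm <- df[df$temp > 75, ]

# Reading and writing data (a temporary CSV file is used here)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)
df2 <- read.csv(path, stringsAsFactors = FALSE)

# Simple analysis of data in tables
mean(df2$temp)    # 76.75
summary(df2$temp)
```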

Exercise
• What is the average of the high temperatures in Rochester for Summer 2014 (all data)?
• What is the standard deviation of the low temperatures in Rochester for July 2014? How does this compare with the standard deviation of the high temperatures in the same month?
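
One possible approach to this exercise is sketched below. The course data set is not reproduced here, so the data frame `temps`, its column names (`date`, `high`, `low`), and all of its values are placeholders, not real Rochester measurements. With the real data, the table would typically be loaded with `read.csv()` instead.

```r
# Placeholder rows only: these are NOT the real Rochester measurements
temps <- data.frame(
  date = as.Date(c("2014-06-30", "2014-07-01", "2014-07-02", "2014-08-01")),
  high = c(80, 85, 83, 78),
  low  = c(60, 66, 62, 59)
)

# Average of the high temperatures over all rows
mean_high <- mean(temps$high)

# Restrict to July 2014 by formatting each date as year-month
july <- temps[format(temps$date, "%Y-%m") == "2014-07", ]

# Standard deviations of the low and high temperatures in July
sd_low_july  <- sd(july$low)
sd_high_july <- sd(july$high)
```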

Data Pre-Processing
• One of the most essential functions before data analysis can be performed
• Data pre-processing can be categorized into 4 main operations:
  • Data Cleaning
  • Data Integration
  • Data Transformation
  • Data Reduction

Data Pre-Processing: Data Cleaning
• Data often needs to be “cleaned” before it can be used for a useful analysis
• Examples of how data can be identified as “dirty”:
  • Incomplete data – missing values or missing attributes
  • Noisy data – contains obvious errors or many outliers
  • Inconsistent data – contains discrepancies in codes and letters
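
The three kinds of "dirty" data can be illustrated with a small base R sketch. The data frame `raw`, its columns, and the plausible-range rule for temperatures are assumptions made up for this example. Real cleaning rules depend on the data set.

```r
# Hypothetical raw data with one missing value, one obvious error,
# and inconsistent state codes
raw <- data.frame(
  id    = 1:6,
  temp  = c(71, NA, 68, 999, 73, 70),
  state = c("NY", "ny", " NY", "N.Y.", "NY", "PA"),
  stringsAsFactors = FALSE
)

# Incomplete data: count missing values per column, then drop those rows
colSums(is.na(raw))
clean <- raw[!is.na(raw$temp), ]

# Noisy data: drop values outside an assumed plausible range
clean <- clean[clean$temp >= -50 & clean$temp <= 130, ]

# Inconsistent data: strip non-letters and use upper case for the codes
clean$state <- toupper(gsub("[^A-Za-z]", "", clean$state))

clean
```

Dropping rows is only one option. Depending on the analysis, missing or erroneous values could instead be replaced, for example with a column mean.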

Data Pre-Processing: Data Integration
• Data often needs to be combined or integrated from multiple data sources before an analysis can be performed
• Examples of how data integration can be performed:
  • Combine data from different data sources into a common storage type, e.g. a CSV (comma-separated values) file
  • Perform a schema integration and add the data to a common data structure for analysis
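
As a minimal sketch of both examples, the code below aligns the schemas of two hypothetical sources, joins them with `merge()`, and writes the result to a CSV file. The data frames `stations` and `readings` and their column names are invented for illustration.

```r
# Two hypothetical sources that name the join key differently
stations <- data.frame(
  station_id = c(1, 2, 3),
  city       = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
readings <- data.frame(
  station = c(1, 1, 2, 4),
  temp    = c(70, 72, 65, 80)
)

# Schema integration: give the key column the same name in both sources
names(readings)[names(readings) == "station"] <- "station_id"

# Combine into one data structure; all.x = TRUE keeps readings
# that have no matching station (their city becomes NA)
combined <- merge(readings, stations, by = "station_id", all.x = TRUE)

# Store the integrated data in a common format (temporary CSV file here)
path <- tempfile(fileext = ".csv")
write.csv(combined, path, row.names = FALSE)
```

Unmatched rows such as station 4 show up as `NA` after the join, which is one way integration can feed back into data cleaning.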

Data Pre-Processing: Data Transformation
• Data often needs to be transformed so it is consistent and in the appropriate format for analysis
• Examples of data transformation:
  • Data Smoothing – Removing noise from the data
  • Data Normalization – Scaling data to fit a specified range
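
Both transformations can be sketched in base R. The moving average and min-max scaling shown here are just one common choice for each, picked as an assumption for illustration. The vector `x` and the helper `normalize()` are hypothetical.

```r
x <- c(2, 4, 9, 4, 6, 5, 10)

# Data smoothing: 3-point centered moving average.
# The first and last positions have no full window and become NA.
smoothed <- stats::filter(x, rep(1 / 3, 3), sides = 2)

# Data normalization: min-max scaling to the range [0, 1]
normalize <- function(v) (v - min(v)) / (max(v) - min(v))
scaled <- normalize(x)

as.numeric(smoothed)
scaled
```

For instance, the second smoothed value is the mean of 2, 4, and 9, and the smallest and largest entries of `x` map to 0 and 1 after scaling.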

Data Pre-Processing: Data Reduction
• Data can be reduced so that it still produces the same or a similar analytical result
• Advantages include working with a smaller data set or the ability to use different algorithms for analysis
• Examples of data reduction:
  • Data Aggregation – Only use the necessary features for the analysis from a very large data set
  • Data Discretization – Combine ranges of numerical or labeled data into common sets (e.g. “binning”)
  • Data Dimensionality Reduction – Decrease the number of variables needed to perform the analysis (e.g. principal component analysis)
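
The three kinds of reduction can be sketched with the `iris` data set that ships with R. The choice of data set, the bin boundaries, and the decision to keep two principal components are assumptions for illustration only.

```r
data(iris)

# Data aggregation: keep only the needed feature and summarize it per group
agg <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)

# Data discretization ("binning"): map numeric values to labeled ranges
bins <- cut(
  iris$Sepal.Length,
  breaks = c(4, 5, 6, 7, 8),
  labels = c("4-5", "5-6", "6-7", "7-8")
)
table(bins)

# Dimensionality reduction: principal component analysis on the
# four numeric columns, keeping the first two components
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
reduced <- pca$x[, 1:2]

# Proportion of variance explained by each component
summary(pca)$importance
```

The importance table is a practical guide for deciding how many components to keep.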