MIE1624H – Introduction to Data Science and Analytics
Lecture 2 – Python Programming
Python essentials
Variables and types
Operators and comparisons
Compound types – strings, tuples, lists and dictionaries
Functions
Files and the operating system
Control flow – conditional statements (if, elif, else), loops
Exception handling
Introduction to Pandas
Introduction to pandas data structures – DataFrame, index objects
Pandas essential functionality
Summarizing and computing descriptive statistics
Pivot tables in pandas
IPython notebooks
Introducing Python
Python is an interpreted language (not a compiled one)
=> you can run code incrementally, one statement at a time
Expressions and Values
Arithmetic Operators
Operator   Operation            Expression   English description
+          addition             11 + 56      11 plus 56
-          subtraction          23 - 52      23 minus 52
*          multiplication       4 * 5        4 multiplied by 5
**         exponentiation       2 ** 5       2 to the power of 5
/          division             9 / 2        9 divided by 2
//         integer division     9 // 2       9 divided by 2
%          modulo (remainder)
Arithmetic Operator Precedence – 1
When multiple operators are combined in a single expression, the operations are evaluated in order of precedence.
Precedence (highest to lowest)
** (exponentiation)
- (negation)
*, /, //, %
+ (addition), - (subtraction)
Arithmetic Operator Precedence – 2
>>> 10 % 3
>>> 25 + 30 / 6
>>> 5 + 4 ** 2
>>> 100 - 25 * 3 % 4
>>> 3 + 2 + 1 - 5 + 4 % 2 - 1 / 4 + 6
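The expressions above can be checked by evaluating them and comparing against fully parenthesized versions. This is a small sketch; the dictionary name results is only an illustrative choice.

```python
# Evaluate each expression from the slide; Python applies precedence for us.
results = {
    "10 % 3": 10 % 3,
    "25 + 30 / 6": 25 + 30 / 6,
    "5 + 4 ** 2": 5 + 4 ** 2,
    "100 - 25 * 3 % 4": 100 - 25 * 3 % 4,
    "3 + 2 + 1 - 5 + 4 % 2 - 1 / 4 + 6": 3 + 2 + 1 - 5 + 4 % 2 - 1 / 4 + 6,
}

for text, value in results.items():
    print(f"{text} = {value}")

# Adding explicit parentheses in precedence order does not change the value.
same_with_parentheses = (100 - 25 * 3 % 4) == (100 - ((25 * 3) % 4))
print(same_with_parentheses)
```

Operators on the same precedence level (such as * and %) are applied left to right, which is why 25 * 3 is computed before % 4.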
A type is a set of values and the operations that can be performed on those values.
Types int and float
int: integer
3, 4, 894, 0, -3, -18
float: floating point number (an approximation to a real number)
5.6, 7.342, 53452.0, 0.0, -89.34, -9.5
>>> 5 + 2 * 4
13
>>> 5.0 + 2 * 4
13.0
>>> 30 // 6
5
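A short sketch of how the operand types decide the result type. The variable names are illustrative only.

```python
a = 5 + 2 * 4      # only int operands, so the result is an int
b = 5.0 + 2 * 4    # one float operand makes the result a float
c = 30 // 6        # integer (floor) division of two ints gives an int
d = 30 / 6         # true division always gives a float in Python 3

for name, value in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, value, type(value).__name__)
```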
Type str – 1
A string literal is a sequence of characters. A string can
be made up of letters, numbers, and special characters.
Strings in Python start and end with a single quote (') or a double quote (").
If a string begins with a single quote, it must end with a single quote. The same applies to double-quoted strings.
The two types of quotes cannot be mixed!
>>> "Hello!"
'Hello!'
>>> 'how are you?'
'how are you?'
>>> 'short- and long-term'
'short- and long-term'
Type str – 2
Dictionaries – 1
Lists/Tuples/Strings are ordered collections
Dictionaries are mapped collections
– data is accessed using keys
– data is not accessed by position (since Python 3.7 a dictionary keeps insertion order, but look-ups still go through keys)
my_dictionary = {key_1: data_1, key_2: data_2, ..., key_n: data_n}
key_1 maps to data_1, key_2 maps to data_2, etc.
my_dict = {'Name': ' ', 'SN': 990632802, 'Status': 'Registered'}
A dictionary keeps track of associations between keys and values
– like a regular dictionary where a key is a word and the value is the definition of that word.
► This makes look-ups and other operations very easy
Dictionary values can be any Python object (standard objects or user-defined objects).
Dictionary keys must be immutable objects: strings, numbers, tuples, etc.
– no duplicate key is allowed
Dictionaries – Properties
cmp(dict1, dict2)   Compares elements of both dictionaries (Python 2 only; cmp was removed in Python 3)
len(dict)           Gives the total length of the dictionary, i.e., the number of items in the dictionary
str(dict)           Produces a string representation of a dictionary
Dictionary Functions
clear()                  Removes all elements of the dictionary
copy()                   Returns a shallow copy of the dictionary
fromkeys(seq, value)     Creates a new dictionary with keys from seq and values set to value
get(key, default=None)   Returns the value key maps to, or default if key is not in the dictionary
has_key(key)             Returns true if key is in the dictionary, false otherwise (Python 2 only; in Python 3 use: key in dict)
items()                  Returns the data in the dictionary as a list of tuples (key, value)
keys()                   Returns the keys in the dictionary as a list
update(dict2)            Adds dict2's key-value pairs
values()                 Returns the values in the dictionary as a list
In Python 3, items(), keys() and values() return view objects; wrap them in list() to get an actual list.
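The methods listed above can be tried out in Python 3 as follows. This is a minimal sketch: the inventory dictionary and its contents are made-up illustration data, and has_key is replaced by the in operator because has_key does not exist in Python 3.

```python
inventory = {"apples": 3, "pears": 0}            # hypothetical example data

count = inventory.get("plums", 0)                # missing key, so the default 0 is returned
has_apples = "apples" in inventory               # Python 3 replacement for has_key()
blank = dict.fromkeys(["a", "b"], 0)             # new dict with both keys set to 0
snapshot = inventory.copy()                      # shallow copy, unaffected by later updates

inventory.update({"plums": 7})                   # add another dictionary's key-value pairs
pairs = list(inventory.items())                  # views are wrapped in list() to get lists
keys = list(inventory.keys())
values = list(inventory.values())

inventory.clear()                                # remove every entry
print(count, has_apples, blank, snapshot, pairs, keys, values, inventory)
```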
Dictionary Methods
Form to define a dictionary:
my_dict = {key1: value1, key2: value2, …}
Form to look up a key’s value:
my_dict[key]
Form to add or update a key-value pair:
my_dict[key] = value
Forms to delete from a dictionary:
del my_dict['Name']   # remove entry with key 'Name'
my_dict.clear()       # remove all entries in my_dict
del my_dict           # delete entire dictionary
– my_dict is the name of the dictionary object.
– key is any immutable object (string, int, tuple).
– value is any Python object.
Example -1
>>> CO2_by_year = {1799: 1, 1800: 70, 1801: 74,
...                1802: 82, 1902: 215630, 2002: 1733297}
>>> # Look up the emissions for the given year
>>> CO2_by_year[1801]
74
>>> # Add another year to the dictionary
>>> CO2_by_year[1950] = 734914
>>> CO2_by_year
{1799: 1, 1800: 70, 1801: 74, 1802: 82, 1902: 215630, 2002: 1733297, 1950: 734914}
Example -2
>>> CO2_by_year[2009] = 1000000
>>> CO2_by_year[2000] = 10
>>> del CO2_by_year[1950]
>>> CO2_by_year
{1799: 1, 1800: 70, 1801: 74, 1802: 82, 1902: 215630, 2002: 1733297, 2009: 1000000, 2000: 10}
>>> 1950 in CO2_by_year
False
>>> len(CO2_by_year)
8
>>> for key in CO2_by_year:
...     print(key)
How can we iterate through the keys?
for k in d.keys():
How can we iterate through the values?
for v in d.values():
How can we iterate through the key-value pairs?
for key, value in d.items():
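The three loop forms can be sketched on a small dictionary as follows. The data reuses a few of the CO2 entries from the earlier example; the collecting lists are illustrative only.

```python
CO2_by_year = {1799: 1, 1800: 70, 1801: 74}

# Iterate through the keys.
years = []
for year in CO2_by_year.keys():
    years.append(year)

# Iterate through the values.
total = 0
for amount in CO2_by_year.values():
    total += amount

# Iterate through the key-value pairs.
lines = []
for year, amount in CO2_by_year.items():
    lines.append(f"{year}: {amount}")

print(years, total, lines)
```

Looping over the dictionary itself (for year in CO2_by_year) visits the keys as well, so .keys() is optional.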
Tuples
Are similar to lists, but cannot be changed
Useful when we want to group data together, in assignment statements
my_tuple = (1, 2, 3)
Sets
Like mathematical sets, they are unordered!
An element is either in the set or it is not, so membership checks are efficient.
>>> my_set = {1, 2, 3}
>>> x = [1, 2, 3]
>>> my_set = set(x)
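A brief sketch of the two properties described above: a tuple rejects item assignment, and a set keeps one copy of each element with fast membership tests. The names changed, x, y, z and duplicates are illustrative.

```python
my_tuple = (1, 2, 3)

# Tuples cannot be changed: item assignment raises TypeError.
changed = True
try:
    my_tuple[0] = 99
except TypeError:
    changed = False

# Tuples group values together in assignment statements.
x, y, z = my_tuple

# Building a set from a list drops duplicates.
duplicates = [1, 2, 2, 3, 3, 3]
my_set = set(duplicates)
contains_two = 2 in my_set
contains_ten = 10 in my_set

print(changed, x, y, z, my_set, contains_two, contains_ten)
```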
Assignment statements – 1
General form of an assignment statement: variable = expression
General rule for executing an assignment statement:
1. Evaluate the expression to the right of the = sign.
-produces a memory address of the value the expression evaluates to
2. Store the memory address in the variable on the left of the = sign.
Assignment statements – 2
>>> difference = 20
>>> double = 2 * difference
>>> double
40
>>> difference = 5
>>> double
40
The expression on the right of the = sign is evaluated to 20. The value 20 will be put at memory address id1.
The variable on the left of the = sign, difference, will refer to 20 by storing id1 in difference.
Assignment statements – 3
>>> double = 2 * difference
The expression on the right of the = sign, 2 * difference, is evaluated:
=> difference refers to the value 20
=> 2 * difference evaluates to 40.
The value 40 is put at memory address id2.
The variable on the left of the = sign, double, will refer to 40 by storing id2.
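The rule can be observed directly: once double has been assigned, rebinding difference does not affect it. A minimal sketch; alias and same_object are illustrative names that are not part of the slides.

```python
difference = 20
double = 2 * difference    # the right-hand side is evaluated now, giving 40
difference = 5             # difference now refers to a different value
alias = double             # alias stores the same reference as double
same_object = alias is double

print(difference, double, same_object)
```

The variable double keeps referring to 40 because an assignment stores the result of the expression, not the expression itself.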
Assignment statements – 4
>>> base = 20
>>> height = 12
>>> area = base * height / 2
>>> celsius = 22
>>> fahrenheit = celsius * 9/5 + 32
>>> fahrenheit
71.6
Assignment statements – 5
Strings can also be stored as variables.
>>> reminder_text = 'Buy groceries after work'
>>> reminder_text
'Buy groceries after work'
>>> str_var = 'Welcome to APS106'
>>> str_var
'Welcome to APS106'
>>> str_var = "What is 10 * (2 + 9)?"
>>> str_var
'What is 10 * (2 + 9)?'
Assignment statements – 6
Strings can also be stored as variables.
>>> reminder_text = 'Please buy groceries after work'
>>> reminder_text
'Please buy groceries after work'
>>> str_var = 'Welcome to APS106'
>>> str_var
'Welcome to APS106'
>>> str_var = "What is 10 * (2 + 9)?"
>>> str_var
'What is 10 * (2 + 9)?'
Augmented Assignment Operators
>>> number = 3
>>> number
3
>>> number = 2 * number
>>> number
6
>>> number = number * number
>>> number
36
>>> score = 50
>>> score
50
>>> score = score + 20
>>> score
70
Augmented Assignment Operators
Expression          Identical expression    English description
x = 7; x += 2       x = 7; x = x + 2        x refers to 9
x = 7; x -= 2       x = 7; x = x - 2        x refers to 5
x = 7; x *= 2       x = 7; x = x * 2        x refers to 14
x = 7; x /= 2       x = 7; x = x / 2        x refers to 3.5
x = 7; x //= 2      x = 7; x = x // 2       x refers to 3
x = 7; x %= 2       x = 7; x = x % 2        x refers to 1
x = 7; x **= 2      x = 7; x = x ** 2       x refers to 49
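A few rows of the table, written out as runnable statements, followed by a typical use of += to accumulate a running total. The scores list is made-up illustration data.

```python
x = 7
x += 2            # same as x = x + 2
after_add = x

x = 7
x /= 2            # true division: the result is a float
after_div = x

x = 7
x //= 2           # integer division: the result stays an int
after_floor = x

x = 7
x **= 2           # same as x = x ** 2
after_pow = x

# Accumulating a total is the most common use of augmented assignment.
total = 0
for score in [50, 20, 30]:
    total += score

print(after_add, after_div, after_floor, after_pow, total)
```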
What is a function?
A function is a block of organized (reusable) code that is
used to perform an activity.
In Python a function is implemented as a compound
statement.
Python has built-in functions, but programmers can also create their own user-defined functions.
Defining a Python Function – 1
The general form of a function definition:
def function_name(parameters):
    ['''function_docstring''']
    function_body
    [return [expression]]
A function block begins with the keyword def followed by the function name and parentheses ( ).
Parameters/arguments:
– 0 or more, separated by a comma, are placed within
the parentheses.
– variables whose values are supplied when the function is called
– by default, parameters have a positional behaviour
(exception: named arguments, which can be given in any order)
Defining a Python Function – 2
Function body: consists of one or more statements.
The function header ends with a colon (:), and the code block/body of every function is indented below it.
The first statement of a function can be an optional statement – the documentation string of the function, a.k.a. the docstring.
The statement return [expression] exits the function, optionally passing back a value to the caller.
=> A return statement with no arguments is the same as return None.
Defining a function only gives the function a name, declares its input parameters and specifies its behaviour, i.e., the instructions to be performed when the function is executed => but nothing gets executed yet!
Once the definition of a function is ready, the function can be executed by calling it from another function or directly from the Python prompt.
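The points above can be illustrated with two small functions. This is only a sketch: describe_box and log_message are hypothetical names invented for the example. It shows positional arguments, named arguments given in a different order, a default value, and a function without a return statement.

```python
def describe_box(width, height, label="box"):
    """(number, number, str) -> str

    Return a short description of a box with the given width and height.
    """
    return f"{label}: {width} x {height}"


def log_message(message):
    """(str) -> NoneType

    Print message. There is no return statement, so the call returns None.
    """
    print(message)


# Nothing has run so far; the definitions only gave the functions their names.
positional = describe_box(2, 3)                            # matched by position
named = describe_box(height=3, width=2, label="crate")     # named, any order
nothing = log_message("definitions are now being called")

print(positional, named, nothing)
```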
Using Functions
Calling a Function
The general form of a function call:
function_name(arguments)
Executing a function call:
– Evaluate the arguments
– Call the function, passing in the argument values
  – the instructions in the body of the function are carried out
Function Design Recipe – Six Steps
1. Pick a meaningful name: a short answer to 'What does the function do?'
2. Prepare the Type Contract and write the function header
– What are the parameter types?
– What type of value is returned?
– Pick meaningful parameter names: it is much easier to understand a function if the variables have names that reflect their meaning.
3. Prepare a few examples (function calls): 'What should the function do?'
4. Description: write a docstring describing the function. Mention every parameter in your description. Describe the return value.
5. Body: write the body of the function.
6. Test: run the examples designed in Step 3 to make sure they work as expected.
Applying the Design Recipe -1
The United States measures temperature in Fahrenheit and Canada measures it in Celsius. When travelling between the two countries it helps to have a conversion function. Write a function that converts from Fahrenheit to Celsius.
1. Pick a name: convert_to_celsius
2. Type Contract and Header (what the function will look like)
Type Contract
(number) -> number
Header
def convert_to_celsius(fahrenheit):
3. Examples
convert_to_celsius(32)
=> 0
convert_to_celsius(212)
=> 100
4. Description
Return the number of Celsius degrees equivalent to fahrenheit degrees.
5. Body
degrees = (fahrenheit - 32) * 5 / 9
return degrees
Applying the Design Recipe -2
Complete function definition
def convert_to_celsius(fahrenheit):
    ''' (number) -> number
    Return the celsius degrees equivalent to
    fahrenheit degrees.
    '''
    celsius = (fahrenheit - 32) * 5 / 9
    return celsius
6. Test – run the examples.
>>> convert_to_celsius(32)
0.0
>>> convert_to_celsius(212)
100.0
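The same six steps can be applied to the opposite conversion. In the sketch below, convert_to_fahrenheit is a hypothetical function that is not part of the slides. The Step 3 examples are placed in the docstring so that the standard doctest module can run them as the Step 6 test; this is one possible way to automate the test step, not something the recipe requires.

```python
import doctest


def convert_to_fahrenheit(celsius):
    """(number) -> number

    Return the number of Fahrenheit degrees equivalent to celsius degrees.

    >>> convert_to_fahrenheit(0)
    32.0
    >>> convert_to_fahrenheit(100)
    212.0
    """
    return celsius * 9 / 5 + 32


# Step 6: run the docstring examples; nothing is printed when they all pass.
doctest.run_docstring_examples(
    convert_to_fahrenheit,
    {"convert_to_fahrenheit": convert_to_fahrenheit},
)
```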
Calling functions within other function definitions -1
Let us write a function to convert from hours to seconds.
def convert_to_minutes(num_hours):
    """(number) -> number
    Return the number of minutes there are in num_hours
    """
    result = num_hours * 60
    return result
def convert_to_seconds(num_hours):
    """(number) -> number
    Return the number of seconds there are in num_hours
    """
    return convert_to_minutes(num_hours) * 60
>>> convert_to_minutes(2)
120
>>> convert_to_seconds(2)
7200
Calling functions within other function definitions -2
def convert_to_celsius(fahrenheit):
    ''' (number) -> number
    Return the number of celsius degrees
    equivalent to fahrenheit degrees.
    '''
    degrees = (fahrenheit - 32) * 5 / 9
    return degrees
def convert_to_kelvin(fahrenheit):
    ''' (number) -> number
    Return the number of kelvin degrees equivalent
    to fahrenheit degrees.
    '''
    kelvin = convert_to_celsius(fahrenheit) + 273.15
    return kelvin
>>> convert_to_kelvin(32)
273.15
Use Function Calls as Arguments to Other Functions
One triangle has a base of length 3.8 and a height of length 7.0 and a second triangle has a base of length 3.5 and a height of length 6.8. Find the area of the larger triangle.
The approach: pass calls to function triangle_area as arguments to built-in function max.
>>> max(triangle_area(3.8, 7.0), triangle_area(3.5, 6.8))
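The slides call triangle_area without showing its definition. The sketch below assumes the usual formula, base times height divided by 2; the body is an assumption, not the course's own code.

```python
def triangle_area(base, height):
    """(number, number) -> number

    Return the area of a triangle with the given base and height.
    Assumed definition: base * height / 2.
    """
    return base * height / 2


# Both calls are evaluated first; their results become the arguments of max.
larger = max(triangle_area(3.8, 7.0), triangle_area(3.5, 6.8))
print(larger)
```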
Modules
A module is a file containing Python definitions and statements.
The file name is the module name with the suffix .py appended.
Example: fibo.py module
# Fibonacci numbers module
def fib(n):    # write Fibonacci series up to n
    a, b = 0, 1
    while b < n:
        print(b, end=' ')
        a, b = b, a+b
def fib2(n):   # return Fibonacci series up to n
    result = []
    a, b = 0, 1
    while b < n:
        result.append(b)
        a, b = b, a+b
    return result
When needed just: import fibo
Opening Files
open(filename, mode)
(str, str) -> io.TextIOWrapper
opens the file filename in the same directory as the .py file
returns a file handle
mode can take several values:
r: open the file for reading
w: open the file for writing (erasing the content!)
a: open the file for writing, appending new information to the end of the file
Opening Files -2
To start using a file, given its filename, it has to be opened. (The name is a string.)
To open the file, use the function open()
myfile = open("story.txt", "r")
open() is a Python function
story.txt is the name of the file to be opened
myfile is a variable that is assigned the file object returned by open
r is a string indicating what we wish to do with the file.
Options for this string are "r", "w", "a", meaning read, write or append. The default is "r".
Note: writing to a file that already exists erases the existing content. Use append if you want to preserve the content.
Closing Files
myfile.close()
(NoneType) -> NoneType
myfile is the file object returned by open()
Reading Files
We call a file object that was opened for reading a reader.
Various ways to read from a reader:
1. Read lines one at a time from beginning to end:
for line in myfile:
    print(line)
2. Read everything in the file at once into a list of strings. Each element of str_ls is a line.
str_ls = myfile.readlines()   # Read the whole file into list str_ls.
print(str_ls)
3. Read everything in the file at once into a string:
s = myfile.read()   # Read the whole file into string s.
print(s)
4. Read a certain number of characters:
s = myfile.read(10)   # Read 10 characters into s.
print(s)
5. Read a line at a time:
s = myfile.readline()   # Read a line into s.
print(s)
s = myfile.readline()   # Read the next line into s.
print(s)
Reading Files – Recap
myfile.readline() – read 1 line from the file
myfile.read() – read the whole file into a single string
myfile.readlines() – read the whole file into a list, with each element being one line of text
myfile.readlines(n) – read the next n bytes of a file, rounded up to the end of a line
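The open modes and reading methods can be tried together on a temporary file. This is a sketch under a few assumptions: the file is created in a throwaway directory so nothing existing is overwritten, and the with statement (not covered in the slides) is used because it closes the file automatically at the end of the block.

```python
import os
import tempfile

# A throwaway location, so no existing story.txt is touched.
path = os.path.join(tempfile.mkdtemp(), "story.txt")

with open(path, "w") as myfile:          # "w" creates the file or erases its content
    myfile.write("first line\n")         # write() does not add newlines for us
    myfile.write("second line\n")

with open(path, "a") as myfile:          # "a" keeps the content and appends to it
    myfile.write("third line\n")

with open(path, "r") as myfile:
    first = myfile.readline()            # one line, including its newline
    rest = myfile.readlines()            # the remaining lines as a list of strings

with open(path) as myfile:               # the mode defaults to "r"
    whole = myfile.read()                # the whole file as a single string

with open(path) as myfile:
    stripped = [line.rstrip("\n") for line in myfile]   # line-by-line iteration

print(repr(first), rest, repr(whole), stripped)
```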
Reading CSV in Python
import csv
input_file = open(file_name)
reader = csv.reader(input_file)
for line in reader:
    # Read as from an ordinary file, but line is a list of strings (one per field)
    print(line)
Writing to a file
First we open a file to write, then we write the contents.
filehandle.write()
Just like printing, except you have to add your own newline characters.
Close your file.
Introduction to Pandas
Pandas is an open source Python library providing high performance data structures and analysis tools.
>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
Pandas Data Structures – Series
One-dimensional labeled array.
Holds any data type (integers, strings, floating point numbers, Python objects, etc.)
The axis labels are collectively referred to as the index.
>>> s = pd.Series(data, index=index)
data: a dictionary, an ndarray, or a scalar value (e.g., 11)
index: a list of axis labels
Series from an ndarray
>>> s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
>>> s
a    0.2735
b    0.6052
c   -0.1692
d    1.8298
e    0.5432
dtype: float64
>>> s.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
>>> pd.Series(np.random.randn(5))
0    0.3674
1   -0.8230
2   -1.0295
3   -1.0523
4   -0.8502
dtype: float64
Series from a dictionary
If an index is passed, the values in data corresponding to the labels in the index will be pulled out.
If no index is passed, an index will be constructed from the sorted keys of the dict, if possible (with pandas 0.23 or later on Python 3.6 or later, the dict's insertion order is used instead).
>>> d = {'a': 0., 'b': 1., 'c': 2.}
>>> pd.Series(d)
a    0.0
b    1.0
c    2.0
dtype: float64
>>> pd.Series(d, index=['b', 'c', 'd', 'a'])
b    1.0
c    2.0
d    NaN
a    0.0
dtype: float64
NOTE: NaN is the standard missing data marker used in pandas.
Series from a scalar value
If data is a scalar value, an index must be provided. The value will be repeated to match the length of index.
>>> pd.Series(5., index=['a', 'b', 'c', 'd', 'e'])
a    5.0
b    5.0
c    5.0
d    5.0
e    5.0
dtype: float64
Series Behaviour
Series acts very similarly to an ndarray, and is a valid argument to most NumPy functions.
>>> s.iloc[0]
0.27348116325673794
>>> s.iloc[:3]
a    0.2735
b    0.6052
c   -0.1692
dtype: float64
A Series is like a fixed-size dict in that you can get and set values by index label:
>>> s['a']
0.27348116325673794
>>> s['e'] = 12.
>>> s.get('a')
0.27348116325673794
DataFrame Objects
Two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled rows and columns.
Dictionary-like container for Series objects.
Arithmetic operations align on both row and column labels.
DataFrame – Parameters
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
data : numpy ndarray, dictionary (of Series, arrays, constants, or list-like objects), or DataFrame
index : Index or array-like to use for resulting frame.
columns : Index or array-like, labels to use for resulting frame.
dtype : data type (default None), to force; otherwise inferred.
copy : boolean (default False), to copy data from inputs.
>>> d = {'col1': ts1, 'col2': ts2}
>>> df1 = pd.DataFrame(data=d, index=index)
>>> df2 = pd.DataFrame(np.random.randn(10, 5))
>>> df3 = pd.DataFrame(np.random.randn(10, 5),
...                    columns=['a', 'b', 'c', 'd', 'e'])
numpy.random.randn returns a sample (or samples) from the "standard normal" distribution.
DataFrames from Series or dictionaries
The result index will be the union of the indexes of the various Series.
>>> d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
...      'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
>>> df = pd.DataFrame(d)
>>> pd.DataFrame(d, index=['d', 'b', 'a'])
   one  two
d  NaN  4.0
b  2.0  2.0
a  1.0  1.0
The row and column labels can be accessed, respectively, by accessing the index and columns attributes:
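A compact, self-contained sketch of building a DataFrame from a dictionary of Series and inspecting its labels. It assumes pandas is installed; the variable names row_labels, column_labels, missing and subset are illustrative only.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
    "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
}

# The row index is the union of the two Series indexes.
df = pd.DataFrame(d)
row_labels = list(df.index)
column_labels = list(df.columns)

# Column "one" has no entry for label "d", so pandas fills in NaN there.
missing = df["one"].isna()

# Passing index= selects and reorders rows by label.
subset = pd.DataFrame(d, index=["d", "b", "a"])

print(df)
print(row_labels, column_labels)
print(subset)
```

Alignment by label is the design point here: values are matched on their index labels, not on their position in the input.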
Note: when a particular set of columns is passed along with a dict of data,