Module2-InClass
David’s First Notebook
By: – University of Connecticut
This is me coding along with the professor and learning some Colab markdown basics.
I would like to thank and for much of the material herein.
# Put all import statements here
import pandas as pd
import numpy as np
Basic programming
We will begin with a quick overview of the programming/Python skills you will need for this course. This is not a programming/Python course, though, and you will not need advanced programming skills to create the business decision models we will see this semester.
Additional references:
https://www.w3schools.com/python/python_intro.asp
Mount your Google Drive
You will need to mount your Google Drive every time you open a Colab file. To do that, just left-click the folder icon on the upper left side of the screen.
Alternatively, you can use the following code:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
Import modules
Modules provide functionality that other people have written and that you can use in your own code. There are many modules out there, so it makes sense to import only those you will need. If you forget one, don't worry: you will get an error message (typically a NameError) telling you that the name you tried to use is not defined, and you can then import the missing module.
You can import modules on the go, but it is good practice to import everything right at the beginning, after mounting your drive.
# import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Printing values (or “Your first Python program!”)
Let's start with a very basic program that prints a sentence to the output screen. For that, you will use the function print(), which is very simple: it prints whatever you put between the parentheses. You can print a string (which should appear between quotation marks, " "), a number, or a variable (we will see variables below).
If you want to print several things on the same line, you can list them, separated by commas. Alternatively, you can concatenate them with +, but remember that, in this case, every piece must be a string. If you have a number, you can convert it into a string by using the function str().
Exercise: Play with the examples below, using different strings, integer values, fractional values, and several elements at once. Printing is super easy and super important, so get familiar with it.
print("z is: ", z)
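The two ways of printing described above (comma-separated items versus string concatenation) can be sketched as follows. The variable names and values (z, price, name) are made up for illustration only.

```python
# Hypothetical values chosen only for illustration
z = 3
price = 19.99
name = "David"

# Several items separated by commas: print() puts a space between them
print("z is:", z)
print(name, "paid", price, "dollars")

# Concatenation with + requires every piece to be a string
message = "z is: " + str(z)
print(message)

# Without str(), mixing a string and a number raises a TypeError
try:
    print("z is: " + z)
except TypeError as err:
    print("Concatenation failed:", err)
```

The comma form is usually the more forgiving one, because print() converts each item to text for you.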
Import modules
Sometimes we will want to use well-known functions that are not so easy to compute by hand, like the square root. There are Python libraries for that, so we just need to reuse them. To do so, you need to:
Import the library
Find out how to use the function you want (i.e., the parameters you need to provide to the function)
Exercise: Try to play with functions and variables in Python. There are several functions involving strings and numbers. Some examples:
Check if a string has a certain substring or not (e.g., "ball" is a substring of "football")
Try other mathematical functions (log, power, absolute value)
import math as mth
x = 9  # the value of x that appears in the list output further down
mth.sqrt(x)
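# you can also import every name from the math package at once, so that sqrt() can be called without a prefix
# (convenient in class, but a wildcard import can overwrite names you already defined)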
from math import *
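One possible way to approach the exercise above is sketched here. The strings and numbers are arbitrary examples; only the `in` operator, the built-in abs(), and functions from the standard math module are used.

```python
import math

# Substring check with the "in" operator
word = "football"
has_ball = "ball" in word
has_cat = "cat" in word
print(has_ball, has_cat)

# A few other mathematical functions
natural_log = math.log(math.e)   # natural logarithm
log_base_10 = math.log10(100)    # base-10 logarithm
power = math.pow(2, 10)          # 2 raised to the 10th power, returned as a float
absolute = abs(-7.5)             # abs() is built in, no import needed

print(natural_log, log_base_10, power, absolute)
```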
In most real-world problems, data is presented to us as lists (e.g., list of clients, list of values, list of products). Lists are very powerful in Python, and we will see them very often.
my_list = [1, 5, 3, "david", x]
print(my_list)
[1, 5, 3, 'david', 9]
my_list[0]
my_list[0] + my_list[1]
my_list[3] = my_list[0] + my_list[1]
print(my_list)
[1, 5, 3, 6, 9]
my_list[0] = my_list[0] + 3
print(my_list)
[4, 5, 3, 6, 9]
Some types of lists appear more frequently than others. It may be a good idea to check and practice with the examples below.
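As a rough sketch of list patterns that tend to come up often, the snippet below builds a list of consecutive integers, a list of repeated values, a list derived from another list, and a list that grows with append(). All names and values are illustrative assumptions, not part of the original notebook.

```python
# List of consecutive integers built from range()
first_ten = list(range(10))

# List with the same value repeated
zeros = [0] * 5

# List built from another list with a list comprehension
squares = [n ** 2 for n in first_ten]

# Empty list that grows with append()
names = []
names.append("david")
names.append("ana")

print(first_ten)
print(zeros)
print(squares)
print(names, len(names))
```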
Loops provide a systematic way to go through sequences of values, such as the elements of a list or a range of numbers. For that, you can use “for”, as presented below:
for i in range(10):
    print(sqrt(i))
0.0
1.0
1.4142135623730951
1.7320508075688772
2.0
2.23606797749979
2.449489742783178
2.6457513110645907
2.8284271247461903
3.0
What is the sum of the square roots of all of the numbers from 0 to 9?
# Option 1
all_square_roots = []
for i in range(10):
    all_square_roots.append(sqrt(i))
sum(all_square_roots)
19.30600052603572
print(all_square_roots)
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]
# Option 2: accumulation
partial_sum = 100  # starting value of the accumulator (start from 0 to reproduce Option 1)
for i in range(10):
    partial_sum += sqrt(i)
print(partial_sum)
119.3060005260357
# x += 3 is exactly the same thing as: x = x + 3
# What if we wanted the sum of the square roots of all numbers between 3 and 20
partial_sum = 0
for i in range(3, 21):
    partial_sum += sqrt(i)
print(partial_sum)
59.2517642490467
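The loop above uses range(3, 21) to cover the numbers 3 through 20. As a small sketch of why the second argument is 21 rather than 20: range() includes its start value but stops just before its stop value.

```python
# range(stop) starts at 0 and stops before `stop`
print(list(range(5)))

# range(start, stop) includes start but excludes stop,
# so range(3, 21) covers 3, 4, ..., 20
values = list(range(3, 21))
print(values[0], values[-1], len(values))

# An optional third argument sets the step size
every_other = list(range(0, 10, 2))
print(every_other)
```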
Nested for loops
# What is the sum of the square roots of the sum of all 100 pairs of numbers:
# [0,1,…,9]
# [0,1,…,9]
partial_sum = 0
for i in range(10):
    for j in range(10):
        print("i=",i," j=",j)
        partial_sum += sqrt(i+j)
print(partial_sum)
i= 0 j= 0
i= 0 j= 1
i= 0 j= 2
i= 0 j= 3
i= 0 j= 4
i= 0 j= 5
i= 0 j= 6
i= 0 j= 7
i= 0 j= 8
i= 0 j= 9
i= 1 j= 0
i= 1 j= 1
i= 1 j= 2
i= 1 j= 3
i= 1 j= 4
i= 1 j= 5
i= 1 j= 6
i= 1 j= 7
i= 1 j= 8
i= 1 j= 9
i= 2 j= 0
i= 2 j= 1
i= 2 j= 2
i= 2 j= 3
i= 2 j= 4
i= 2 j= 5
i= 2 j= 6
i= 2 j= 7
i= 2 j= 8
i= 2 j= 9
i= 3 j= 0
i= 3 j= 1
i= 3 j= 2
i= 3 j= 3
i= 3 j= 4
i= 3 j= 5
i= 3 j= 6
i= 3 j= 7
i= 3 j= 8
i= 3 j= 9
i= 4 j= 0
i= 4 j= 1
i= 4 j= 2
i= 4 j= 3
i= 4 j= 4
i= 4 j= 5
i= 4 j= 6
i= 4 j= 7
i= 4 j= 8
i= 4 j= 9
i= 5 j= 0
i= 5 j= 1
i= 5 j= 2
i= 5 j= 3
i= 5 j= 4
i= 5 j= 5
i= 5 j= 6
i= 5 j= 7
i= 5 j= 8
i= 5 j= 9
i= 6 j= 0
i= 6 j= 1
i= 6 j= 2
i= 6 j= 3
i= 6 j= 4
i= 6 j= 5
i= 6 j= 6
i= 6 j= 7
i= 6 j= 8
i= 6 j= 9
i= 7 j= 0
i= 7 j= 1
i= 7 j= 2
i= 7 j= 3
i= 7 j= 4
i= 7 j= 5
i= 7 j= 6
i= 7 j= 7
i= 7 j= 8
i= 7 j= 9
i= 8 j= 0
i= 8 j= 1
i= 8 j= 2
i= 8 j= 3
i= 8 j= 4
i= 8 j= 5
i= 8 j= 6
i= 8 j= 7
i= 8 j= 8
i= 8 j= 9
i= 9 j= 0
i= 9 j= 1
i= 9 j= 2
i= 9 j= 3
i= 9 j= 4
i= 9 j= 5
i= 9 j= 6
i= 9 j= 7
i= 9 j= 8
i= 9 j= 9
289.920123073269
# What is the sum of the square roots of the sums of all distinct pairs (i < j) of numbers from 0 to 9?
# First, see how starting the inner loop at i+1 visits each pair only once:
for i in range(4):
    print("I am in the first level loop with i = ", i)
    for j in range(i+1, 4):
        print("I am in the second level loop with j = ", j)
        #print(i,j)
I am in the first level loop with i = 0
I am in the second level loop with j = 1
I am in the second level loop with j = 2
I am in the second level loop with j = 3
I am in the first level loop with i = 1
I am in the second level loop with j = 2
I am in the second level loop with j = 3
I am in the first level loop with i = 2
I am in the second level loop with j = 3
I am in the first level loop with i = 3
partial_sum = 0
max_number = 9
for i in range(max_number + 1):
    #print("I am in the first level loop with i = ", i)
    for j in range(i+1, max_number + 1):
        #print("I am in the second level loop with j = ", j)
        #print(i,j)
        partial_sum += sqrt(i+j)
print(partial_sum)
131.30865764708363
# Suppose there are 50 students in this class. How many possible groups of size 3 are there?
n_students = 50
counter = 0
for i in range(n_students):
    for j in range(i+1, n_students):
        for k in range(j+1, n_students):
            counter += 1
print(counter)
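The triple loop above counts every unordered group of three students exactly once. As a cross-check, the same count can be obtained from the standard library; this is only a sketch of an alternative, using math.comb (available from Python 3.8) and itertools.combinations.

```python
from itertools import combinations
from math import comb

n_students = 50

# Closed-form count: "50 choose 3"
n_groups_formula = comb(n_students, 3)

# Enumerate every group of 3 explicitly and count them
n_groups_enumerated = sum(1 for _ in combinations(range(n_students), 3))

print(n_groups_formula, n_groups_enumerated)
```

Both numbers should agree with the counter produced by the nested loops.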
If statements
# What is the sum of the numbers from 0 to 4 added to the
# square root of the numbers from 5 to 9?
partial_sum = 0
for i in range(10):
    if i <= 4:
        partial_sum += i
    else:
        partial_sum += sqrt(i)
print(partial_sum)
23.15973615609375
# What is the sum of all of the even numbers from 0 to 9?
partial_sum = 0
for i in range(10):
    print("At the start of the for loop with i = ", i)
    if i%2 == 0:
        print("Inside the if condition")
        partial_sum += i
    print("Outside the if condition")
print(partial_sum)
At the start of the for loop with i = 0
Inside the if condition
Outside the if condition
At the start of the for loop with i = 1
Outside the if condition
At the start of the for loop with i = 2
Inside the if condition
Outside the if condition
At the start of the for loop with i = 3
Outside the if condition
At the start of the for loop with i = 4
Inside the if condition
Outside the if condition
At the start of the for loop with i = 5
Outside the if condition
At the start of the for loop with i = 6
Inside the if condition
Outside the if condition
At the start of the for loop with i = 7
Outside the if condition
At the start of the for loop with i = 8
Inside the if condition
Outside the if condition
At the start of the for loop with i = 9
Outside the if condition
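Once the step-by-step version above is clear, the same even-number sum can be written more compactly. The sketch below shows two equivalent options; the variable names are illustrative.

```python
# Sum of the even numbers from 0 to 9, using the same i % 2 == 0 test
even_sum = sum(i for i in range(10) if i % 2 == 0)

# Stepping through the range by 2 visits only the even numbers
even_sum_step = sum(range(0, 10, 2))

print(even_sum, even_sum_step)
```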
BostonHousing: General EDA Template
OPIM 5641: Business Decision Modeling - University of Connecticut
Try to fill in the blanks from memory every day (type it, don't just copy/paste, so that your fingers get used to the syntax!)
You may not need all of these commands every time, but you will use all of them often as you wrangle data.
An introduction to the Data Frame
Data frames are common objects that hold tabular data.
df = pd.read_csv("https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv")
Head, shape, column names, data types, missing values
See what you're working with - any missing values? Wrong data types? Strange values?
If there are missing values, you can a) delete or b) impute (constant value like 0 or -999, mean or median of column, forward fill, backfill, interpolate, etc.)
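The options listed above (delete or impute) can be sketched on a tiny data frame. The toy data and column names below are hypothetical and are not taken from the Boston housing file; the pandas calls used are isna(), dropna(), fillna(), median() and ffill().

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing values (not the Boston data)
toy = pd.DataFrame({
    "rooms": [6.5, np.nan, 7.1, 6.0],
    "tax":   [296, 242, np.nan, 222],
})

# Count missing values per column
print(toy.isna().sum())

# a) delete rows that contain any missing value
dropped = toy.dropna()

# b) impute: constant value, column median, or forward fill
filled_constant = toy.fillna(0)
filled_median = toy.fillna(toy.median())
filled_ffill = toy.ffill()

print(dropped.shape)
print(filled_median)
```

Which option is appropriate depends on the data; deleting rows is simplest but throws information away, while imputing keeps the rows at the cost of inventing values.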
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 11.9
506 rows × 14 columns
df.head(7)
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2
5 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7
6 0.08829 12.5 7.87 0 0.524 6.012 66.6 5.5605 5 311 15.2 395.60 12.43 22.9
# see the shape of the dataframe:
print("Number of rows: ", df.shape[0])
print("Number of columns: ", df.shape[1])
Number of rows: 506
Number of columns: 14
# just get column names:
df.columns
Index(['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax',
'ptratio', 'b', 'lstat', 'medv'],
dtype='object')
# look at the type of each column:
df.dtypes
crim float64
zn float64
indus float64
chas int64
nox float64
rm float64
age float64
dis float64
rad int64
tax int64
ptratio float64
b float64
lstat float64
medv float64
dtype: object
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 506 entries, 0 to 505
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 crim 506 non-null float64
1 zn 506 non-null float64
2 indus 506 non-null float64
3 chas 506 non-null int64
4 nox 506 non-null float64
5 rm 506 non-null float64
6 age 506 non-null float64
7 dis 506 non-null float64
8 rad 506 non-null int64
9 tax 506 non-null int64
10 ptratio 506 non-null float64
11 b 506 non-null float64
12 lstat 506 non-null float64
13 medv 506 non-null float64
dtypes: float64(11), int64(3)
memory usage: 55.5 KB
df.describe()
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv
count 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000 506.000000
mean 3.613524 11.363636 11.136779 0.069170 0.554695 6.284634 68.574901 3.795043 9.549407 408.237154 18.455534 356.674032 12.653063 22.532806
std 8.601545 23.322453 6.860353 0.253994 0.115878 0.702617 28.148861 2.105710 8.707259 168.537116 2.164946 91.294864 7.141062 9.197104
min 0.006320 0.000000 0.460000 0.000000 0.385000 3.561000 2.900000 1.129600 1.000000 187.000000 12.600000 0.320000 1.730000 5.000000
25% 0.082045 0.000000 5.190000 0.000000 0.449000 5.885500 45.025000 2.100175 4.000000 279.000000 17.400000 375.377500 6.950000 17.025000
50% 0.256510 0.000000 9.690000 0.000000 0.538000 6.208500 77.500000 3.207450 5.000000 330.000000 19.050000 391.440000 11.360000 21.200000
75% 3.677083 12.500000 18.100000 0.000000 0.624000 6.623500 94.075000 5.188425 24.000000 666.000000 20.200000 396.225000 16.955000 25.000000
max 88.976200 100.000000 27.740000 1.000000 0.871000 8.780000 100.000000 12.126500 24.000000 711.000000 22.000000 396.900000 37.970000 50.000000
round(df['indus'].mean(),2)
df['indus'].median()
df['indus'].var()
47.064442473682135
df['indus'].quantile(0.3)
df['indus'].quantile([0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9])
0.1 2.91
0.2 4.39
0.3 5.96
0.4 7.38
0.5 9.69
0.6 12.83
0.7 18.10
0.8 18.10
0.9 19.58
Name: indus, dtype: float64
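To see what these summary methods compute, it can help to try them on a series small enough to check by hand. The sketch below uses a made-up series of five numbers; note that pandas var() is the sample variance (it divides by n - 1), whereas NumPy's np.var() divides by n by default.

```python
import numpy as np
import pandas as pd

# Hypothetical series, small enough to verify by hand
s = pd.Series([1, 2, 3, 4, 5])

mean_value = s.mean()
median_value = s.median()
sample_var = s.var()                     # divides by n - 1
population_var = np.var(s.to_numpy())    # divides by n
q30 = s.quantile(0.3)                    # linear interpolation between data points

print(mean_value, median_value, sample_var, population_var, q30)
```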
np.sqrt(df['medv'])
0 4.898979
1 4.647580
2 5.890671
3 5.779273
4 6.016644
501 4.732864
502 4.538722
503 4.888763
504 4.690416
505 3.449638
Name: medv, Length: 506, dtype: float64
# record a new column that holds the square root of medv
df['sqrt_of_medv'] = np.sqrt(df['medv'])
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv sqrt_of_medv
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0 4.898979
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6 4.647580
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7 5.890671
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4 5.779273
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2 6.016644
… … … … … … … … … … … … … … … …
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4 4.732864
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6 4.538722
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9 4.888763
504 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0 4.690416
505 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 11.9 3.449638
506 rows × 15 columns
# Add a column which indicates if the medv is above its mean
df["medv_high"] = np.where(df["medv"] <= np.mean(df["medv"]), 0, 1)
crim zn indus chas nox rm age dis rad tax ptratio b lstat medv sqrt_of_medv medv_high
0 0.00632 18.0 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90 4.98 24.0 4.898979 1
1 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6 4.647580 0
2 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7 5.890671 1
3 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4 5.779273 1
4 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2 6.016644 1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
501 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4 4.732864 0
502 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6 4.538722 0
503 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9 4.