3/8/22, 7:32 PM Functions, Operations
Functions, Operations CSCI-UA.0479-001
Numpy Functions, Apply, and Map
In addition to basic arithmetic and comparison operators, there are some methods and
functions that allow for operations across all elements, rows, or columns in a DataFrame: →
apply to call a function on every row or column, or applymap to call a function on every element
numpy universal functions (ufuncs) – functions that operate on every element in an ndarray,
such as floor, abs, add, etc. (see the numpy docs for more)
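As a rough illustration of the ufunc bullet above, the sketch below passes a DataFrame straight to a couple of numpy ufuncs. The name rain_small is an assumption made for this example: it holds only the Atlanta and Seattle rows of the precipitation data used in these slides.

```python
import numpy as np
import pandas as pd

# rain_small is a hypothetical two-row subset of the slides' rain data
rain_small = pd.DataFrame([[3.94, 5.28, 3.90, 4.49],
                           [1.42, 0.63, 0.75, 1.65]],
                          index=['Atlanta', 'Seattle'],
                          columns=['Jun', 'Jul', 'Aug', 'Sept'])

# a ufunc is applied to every element; row and column labels are preserved
floored = np.floor(rain_small)

# ufuncs that take two arguments work element by element as well
plus_one = np.add(rain_small, 1)

print(floored)
print(plus_one)
```

The result of each call is still a DataFrame, so it can be chained with the other methods covered here.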
Sample Data
To go over apply and applymap, we’ll use two very small sample data sets →
Precipitation by month from www.usclimatedata.com
(data sourced from NOAA)
Wikipedia – List of the largest information technology companies
(data sourced from official earnings reports, stats sites, etc.)
Precipitation Data
Copy and paste the following into your notebook / interactive shell. →
import pandas as pd
rain = pd.DataFrame([[3.50, 4.53, 4.13, 3.98],
                     [7.91, 5.98, 6.10, 5.12],
                     [3.94, 5.28, 3.90, 4.49],
                     [1.42, 0.63, 0.75, 1.65]],
                    index=[' ', ' ', 'Atlanta', 'Seattle'],
                    columns=['Jun', 'Jul', 'Aug', 'Sept'])
https://cs.nyu.edu/courses/spring22/CSCI-UA.0479-001/_site/slides/python/pandas-func-ops.html?print 1/6
apply allows a function to be called on one-dimensional arrays by row or by column →
d.apply(fn, axis=a)
fn is the function to be applied to each one-dimensional array (each row or each column)
axis is a keyword argument specifying which axis to work across
if axis is 0 or index, it will work across rows (and consequently, the results will be per column)
if axis is 1 or columns, it will work across columns (results will be per row)
the function passed in should take a single argument, a Series
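To make the axis behavior above concrete, here is a small sketch. The helper count_above_four and the two-row frame rain_small (just the Atlanta and Seattle rows of the rain data) are names made up for this example.

```python
import pandas as pd

# rain_small is a hypothetical two-row subset of the slides' rain data
rain_small = pd.DataFrame([[3.94, 5.28, 3.90, 4.49],
                           [1.42, 0.63, 0.75, 1.65]],
                          index=['Atlanta', 'Seattle'],
                          columns=['Jun', 'Jul', 'Aug', 'Sept'])

def count_above_four(values):
    # values is a Series: one column when axis=0, one row when axis=1
    return int((values > 4).sum())

# axis=0 (or 'index'): the function sees each column, so results are per month
per_month = rain_small.apply(count_above_four, axis=0)

# axis=1 (or 'columns'): the function sees each row, so results are per city
per_city = rain_small.apply(count_above_four, axis=1)

print(per_month)
print(per_city)
```

The index of each result shows which way apply worked: months for axis=0, cities for axis=1.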
Technology Companies Data
Copy and paste the following into your notebook / interactive shell. →
import pandas as pd
d = [["$229.2", 2017, 123000, "$1100", "Cupertino, US"],
     ["$211.9", 2017, 320671, "$284", "Suwon, South Korea"],
     ["$177.8", 2017, 566000, "$985", "Seattle, US"],
     ["$154.7", 2017, 1300000, "$66", " City, Taiwan"],
     ["$110.8", 2017, 80110, "$834", "Mountain View, US"]]
comps = ["apple", "samsung", "amazon", "foxconn", "alphabet"]
cols = ["revenue", "fy", "employees", "mcap", "location"]
c = pd.DataFrame(d, index=comps, columns=cols)
Wait, What?
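The function passed in to apply is called with a Series representing a row or column. Let’s see →
rain.apply(lambda arg: str(type(arg)))
(remember that lambdas return the expression after the :)
Jun     <class 'pandas.core.series.Series'>
Jul     <class 'pandas.core.series.Series'>
Aug     <class 'pandas.core.series.Series'>
Sept    <class 'pandas.core.series.Series'>
dtype: object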
apply Continued
Using our rain data… let’s try to calculate the following: →
what is the total rainfall for each month for all cities combined?
rain.apply(lambda month: sum(month), axis=0)
# or axis='index'
# default is 0 anyway, so axis can be omitted
what is the total rainfall for each city during the summer (all 4 months)?
rain.apply(lambda city: sum(city), axis=1)
# or axis='columns'
DataFrame and Series Methods
HOLD ON! There are built in methods that work on rows / columns already! →
Let’s take a look at a few common ones that work on both DataFrames and Series (see DataFrames and Series docs for more):
sum (we just did this with apply)
reimplementing the solutions from the previous slide:
rain.sum(axis=0), rain.sum(axis=1)
mean – find the average rainfall for each city for all of the months in the DataFrame
rain.mean(axis="columns")
max and min – how much rainfall was there for each city during the least rainy month?
rain.min(axis='columns')
of course, there are others, like median, mode, etc.
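A quick way to convince yourself that the built-in methods match the apply versions is to compute both and compare. This sketch assumes a two-row frame named rain_small (the Atlanta and Seattle rows only), which is not part of the original slides.

```python
import numpy as np
import pandas as pd

# rain_small is a hypothetical two-row subset of the slides' rain data
rain_small = pd.DataFrame([[3.94, 5.28, 3.90, 4.49],
                           [1.42, 0.63, 0.75, 1.65]],
                          index=['Atlanta', 'Seattle'],
                          columns=['Jun', 'Jul', 'Aug', 'Sept'])

# per-city totals, once with apply and once with the built-in method
totals_apply = rain_small.apply(lambda city: sum(city), axis=1)
totals_builtin = rain_small.sum(axis=1)

# compare with a tolerance, since these are floating point sums
same_totals = np.allclose(totals_apply, totals_builtin)

# other reductions follow the same axis convention
medians = rain_small.median(axis='columns')
driest = rain_small.min(axis='columns')

print(same_totals)
print(medians)
print(driest)
```

The built-in versions are usually shorter to write and tend to be faster than passing a Python function to apply.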
So, Uh, Apply Again?
Where does that leave us with apply? Well, we can perform even more complicated
row/column calculations. →
For each city, show the difference between the rainiest and least rainy summer month (that is, what’s the difference between the max rain and the min rain?)
map, applymap
map and applymap call a function on every element in a Series and a DataFrame,
respectively (in pandas 2.1 and later, DataFrame.map is the preferred name for applymap).
What Series will this give back?
pd.Series(['ant', 'cat', 'bat']).map(lambda word: word + 's')
…and using a named function on every element in a DataFrame:
import numpy as np
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)
nums = pd.DataFrame(np.arange(9).reshape((3, 3)))
nums.applymap(factorial)
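The two calls above can be run end to end as in the sketch below. The variable elementwise is an assumption added here so the example works on either side of the pandas 2.1 rename: it picks DataFrame.map when that method exists and falls back to applymap otherwise.

```python
import numpy as np
import pandas as pd

# Series.map calls the function once per element
plurals = pd.Series(['ant', 'cat', 'bat']).map(lambda word: word + 's')

def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

nums = pd.DataFrame(np.arange(9).reshape((3, 3)))

# pick whichever element-wise DataFrame method this pandas version offers
elementwise = nums.map if hasattr(pd.DataFrame, 'map') else nums.applymap
facts = elementwise(factorial)

print(plurals.tolist())
print(facts)
```

The Series comes back with every word pluralized, and the DataFrame keeps its 3 by 3 shape with each number replaced by its factorial.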
map and applymap Practice
using our rainfall data, convert every number from inches to centimeters (1 inch is about 2.5
cm) and add cm as a label (it’s ok if all values are converted to strings)
rain.applymap(lambda inches: f'{inches * 2.5:.2f} cm')
using our tech company data, show the revenue column such that each dollar amount is
converted to an actual number (remove the dollar sign and convert to a numeric type)
c['revenue'].map(lambda revenue: float(revenue[1:]))
# note the use of map instead of applymap
# since this is a Series
(btw, if you want to actually set that conversion, you can use assignment:
c['revenue'] = c['revenue'].map(lambda revenue: float(revenue[1:])))
# for each city, the difference between the rainiest and least rainy month
rain.apply(lambda city: city.max() - city.min(), axis='columns')
A dataframe can be sorted by index or by values →
sort_index – sorts by row label lexicographically
sort_values – using the by keyword argument, sorts by a specific column
ascending=False – sorts in descending order for both methods above
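The sorting options above might look like this in practice. The frame tech is a made-up cut of the tech company data (three companies, with revenue already stored as numbers), used only to keep the example short.

```python
import pandas as pd

# tech is a hypothetical three-row cut of the tech company data,
# with revenue already converted to floats
tech = pd.DataFrame({'revenue': [229.2, 211.9, 177.8],
                     'employees': [123000, 320671, 566000]},
                    index=['apple', 'samsung', 'amazon'])

# sort by row label, lexicographically
by_label = tech.sort_index()

# sort by the values in one column, ascending by default
by_employees = tech.sort_values(by='employees')

# ascending=False reverses the order for either method
by_revenue_desc = tech.sort_values(by='revenue', ascending=False)
by_label_desc = tech.sort_index(ascending=False)

print(by_label)
print(by_revenue_desc)
```

Both methods return a new DataFrame and leave the original untouched, so assign the result if you want to keep the sorted order.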
Sorting Practice
Using our Tech Company data… →
1. show the data in ascending order based on revenue
c.sort_values(by='revenue')
2. add a column that shows revenue per employee, then show the company that has the most
revenue per employee (you can do this in multiple steps)
# convert revenue to float again
c['revenue'] = c['revenue'].map(lambda rev: float(rev[1:]))
# create a new column for revenue per employee
c['rev_emp'] = c['revenue'] * 1000000000 / c['employees']
# sort descending by rev_emp
c.sort_values(by='rev_emp', ascending=False)
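One possible way to finish the second exercise is to pull out the top company directly instead of reading it off the sorted table. The sketch below rebuilds only the revenue and employees columns under the assumed name tech, and uses idxmax, which is not mentioned in the slides.

```python
import pandas as pd

# tech holds only the two columns this exercise needs (name is hypothetical)
d = [["$229.2", 123000],
     ["$211.9", 320671],
     ["$177.8", 566000],
     ["$154.7", 1300000],
     ["$110.8", 80110]]
comps = ["apple", "samsung", "amazon", "foxconn", "alphabet"]
tech = pd.DataFrame(d, index=comps, columns=["revenue", "employees"])

# strip the dollar sign and convert to float, as in the steps above
tech["revenue"] = tech["revenue"].map(lambda rev: float(rev[1:]))
tech["rev_emp"] = tech["revenue"] * 1000000000 / tech["employees"]

# idxmax gives the row label of the largest value in a Series
top_company = tech["rev_emp"].idxmax()

# the same answer, taken from the first row of the descending sort
top_row = tech.sort_values(by="rev_emp", ascending=False).iloc[0]

print(top_company)
print(top_row)
```

Either approach works; idxmax skips the sort when only the single best row is needed.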
Built-In Methods Again
As you may have guessed, there are built-in methods that apply to all elements as well →
add – in some ways, the same as using +
but you can also pass in a Series and broadcast across rows or columns
round – for our rain data, let’s try to round all of the values to one decimal place
rain.round(1)
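The broadcasting behavior mentioned above can be sketched as follows. The frame rain_small (Atlanta and Seattle rows only) and the two adjustment Series are invented for this example and are not real corrections to the data.

```python
import pandas as pd

# rain_small is a hypothetical two-row subset of the slides' rain data
rain_small = pd.DataFrame([[3.94, 5.28, 3.90, 4.49],
                           [1.42, 0.63, 0.75, 1.65]],
                          index=['Atlanta', 'Seattle'],
                          columns=['Jun', 'Jul', 'Aug', 'Sept'])

# with a scalar, add behaves like the + operator
plus_one = rain_small.add(1)

# made-up per-city adjustments, matched against the row labels
per_city = pd.Series({'Atlanta': 1.0, 'Seattle': -0.5})
adjusted_by_city = rain_small.add(per_city, axis='index')

# made-up per-month adjustments, matched against the column labels
per_month = pd.Series({'Jun': 0.1, 'Jul': 0.2, 'Aug': 0.3, 'Sept': 0.4})
adjusted_by_month = rain_small.add(per_month, axis='columns')

# round every value to one decimal place
rounded = rain_small.round(1)

print(adjusted_by_city)
print(adjusted_by_month)
print(rounded)
```

The axis argument tells add which set of labels the Series should line up with: 'index' for row labels, 'columns' for column labels.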
Summary Statistics, Unique, Counts
Calling describe on your DataFrame yields some descriptive statistics →
count (number of non-NA values), min, max, mean, etc.
rain.describe() … c.describe()
note that depending on types, output will vary
To help get an overview of the values and labels you have, use unique and value_counts on a Series
the unique method will give back an array of the unique values in a Series
value_counts (a Series method, and also a top-level function, pd.value_counts, which is deprecated since pandas 2.1) will give back the counts of the distinct values in a Series
(btw, for a DataFrame, value_counts is the count of unique rows, ignoring rows with NA)
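Here is a brief sketch of these three methods. Both rain_small (the Atlanta and Seattle rows of the rain data) and countries (the country part of each company's location) are names and derived values assumed for this example.

```python
import pandas as pd

# rain_small is a hypothetical two-row subset of the slides' rain data
rain_small = pd.DataFrame([[3.94, 5.28, 3.90, 4.49],
                           [1.42, 0.63, 0.75, 1.65]],
                          index=['Atlanta', 'Seattle'],
                          columns=['Jun', 'Jul', 'Aug', 'Sept'])

# describe returns a DataFrame of statistics, one column per numeric column
stats = rain_small.describe()

# countries is a made-up Series holding the country of each company
countries = pd.Series(['US', 'South Korea', 'US', 'Taiwan', 'US'],
                      index=['apple', 'samsung', 'amazon', 'foxconn', 'alphabet'])

# unique keeps the distinct values in order of first appearance
distinct = list(countries.unique())

# value_counts returns a Series mapping each distinct value to its count
counts = countries.value_counts()

print(stats)
print(distinct)
print(counts)
```

Because value_counts sorts by count in descending order by default, the most common value appears first.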