3/8/22, 7:31 PM Pandas Indexing, Operations Review
Pandas Indexing, Operations Review CSCI-UA.0479-001
Whew. That was a lot of pandas we’ve been through.
The material was a little dry, so to review, let's try to put a little context around it:
Wikipedia – List of the largest information technology companies
(which in turn sourced its data from official earnings reports, other stats sites, etc.)
(we're just using it for practice)
Some of the interesting data it contains:
company name, revenue from 2017, number of employees, location
All The Data
Copy and paste the following into your notebook / interactive shell. →
this will create a DataFrame
…containing some slightly modified (compacted) data from the Wikipedia article mentioned previously
import pandas as pd

# one row per company; columns are revenue, fiscal year, employees,
# market cap, and location (dollar amounts are stored as strings)
d = [["$229.2", 2017, 123000, "$1100", "Cupertino, US"],
     ["$211.9", 2017, 320671, "$284", "Suwon, South Korea"],
     ["$177.8", 2017, 566000, "$985", "Seattle, US"],
     ["$154.7", 2017, 1300000, "$66", "City, Taiwan"],
     ["$110.8", 2017, 80110, "$834", "Mountain View, US"]]
comps = ["apple", "samsung", "amazon", "foxconn", "alphabet"]   # row labels (index)
cols = ["revenue", "fy", "employees", "mcap", "location"]      # column labels
c = pd.DataFrame(d, index=comps, columns=cols)
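A quick way to sanity-check what was just built: look at the shape, the row and column labels, and the column types. This is a minimal sketch on a two-row subset of the same data so it runs on its own. Note that revenue and mcap hold strings, not numbers, because of the "$".

```python
import pandas as pd

# Same layout as above, trimmed to two companies to keep the sketch short
d = [["$229.2", 2017, 123000, "$1100", "Cupertino, US"],
     ["$211.9", 2017, 320671, "$284", "Suwon, South Korea"]]
c = pd.DataFrame(d, index=["apple", "samsung"],
                 columns=["revenue", "fy", "employees", "mcap", "location"])

print(c.shape)          # (rows, columns) -> (2, 5)
print(list(c.index))    # row labels come from index=
print(list(c.columns))  # column labels come from columns=
print(c.dtypes)         # revenue and mcap are string columns because of the "$"
```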
https://cs.nyu.edu/courses/spring22/CSCI-UA.0479-001/_site/slides/python/pandas-review-arithmetic-index.html?print 1/5
Removing Columns
Looking at the data, fy (fiscal year) is the same throughout. We're also not going to use the
mcap (market cap) column →
Name two ways to remove these two (fy, mcap) columns in place:
use the del statement:
del c['mcap']
or use the .drop method:
c.drop('fy', axis=1, inplace=True)
# (the default is to return a copy, so
# pass the keyword argument inplace=True)
# btw, you can also use axis='columns'
Before removing anything, c should look like this (after both removals, only revenue, employees and location are left) →
          revenue    fy  employees   mcap            location
apple      $229.2  2017     123000  $1100       Cupertino, US
samsung    $211.9  2017     320671   $284  Suwon, South Korea
amazon     $177.8  2017     566000   $985         Seattle, US
foxconn    $154.7  2017    1300000    $66        City, Taiwan
alphabet   $110.8  2017      80110   $834   Mountain View, US
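To see the copy-versus-in-place difference side by side, here is a small sketch. It assumes a two-row version of the data, and `trimmed` and `c2` are hypothetical names used only for illustration. `drop(columns=[...])` returns a new DataFrame and leaves the original alone, while `del` and `inplace=True` change the frame they are applied to.

```python
import pandas as pd

c = pd.DataFrame(
    {"revenue": ["$229.2", "$211.9"], "fy": [2017, 2017],
     "employees": [123000, 320671], "mcap": ["$1100", "$284"]},
    index=["apple", "samsung"])

# Option 1: drop returns a new DataFrame; c itself is untouched
trimmed = c.drop(columns=["fy", "mcap"])

# Option 2: modify a frame in place, one column at a time
c2 = c.copy()
del c2["mcap"]
c2.drop("fy", axis="columns", inplace=True)

print(list(c.columns))        # ['revenue', 'fy', 'employees', 'mcap']
print(list(trimmed.columns))  # ['revenue', 'employees']
print(list(c2.columns))       # ['revenue', 'employees']
```

Reassigning the result (`c = c.drop(columns=[...])`) is a common alternative to `inplace=True`.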
Retrieving Values
Now let’s try getting some values out of this DataFrame. →
1. Only show me the employees column
c['employees']
2. What was Amazon’s revenue in 2017?
c['revenue']['amazon']
c.loc['amazon', 'revenue']
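Here is a small sketch comparing the lookups, assuming a two-company frame. Selecting one column gives back a Series indexed by company name. For a single value, the column-then-row chain, `.loc`, and `.at` (label-based scalar access) all land on the same cell.

```python
import pandas as pd

c = pd.DataFrame({"revenue": ["$229.2", "$177.8"], "employees": [123000, 566000]},
                 index=["apple", "amazon"])

emp = c["employees"]             # a Series indexed by company name
r1 = c["revenue"]["amazon"]      # column first, then row label
r2 = c.loc["amazon", "revenue"]  # row label and column label in one step
r3 = c.at["amazon", "revenue"]   # .at: scalar-only access by label
print(type(emp).__name__, r1, r2, r3)  # Series $177.8 $177.8 $177.8
```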
Adding Columns!?
Hm. It looks like we're missing some location data for the US-based companies. Let's add a
state column →
it's ok to have missing values for companies that don't have a state associated with them
google (alphabet) and apple are in CA, and amazon is in WA
hint: think about label alignment…
c['state'] = pd.Series({'apple': 'CA', 'amazon': 'WA', 'alphabet': 'CA'})
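Label alignment is what makes that one-liner work. This is a minimal sketch, assuming a three-company frame and a hypothetical `states` Series listed in a different order than the rows. Assigning a Series to a column matches values by index label, not by position, and any row with no matching label gets NaN.

```python
import pandas as pd

c = pd.DataFrame({"location": ["Cupertino, US", "Suwon, South Korea", "Seattle, US"]},
                 index=["apple", "samsung", "amazon"])

# Keyed by company name, in a different order than c's rows,
# and with no entry at all for samsung
states = pd.Series({"amazon": "WA", "apple": "CA"})
c["state"] = states  # matched on labels, not on positions

print(c)
# samsung has no label in the Series, so its state is NaN
```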
Arithmetic and Comparisons/Selections
1. Only show the employees column… but do it so that the amount is in hundreds of thousands (for example 200000 should be 2… any precision is ok)
c['employees'] / 100000
2. Show the companies that have fewer than 200,000 employees:
c[c['employees'] < 200000]
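Both answers are element-wise operations on a Series. Division produces new numbers, and a comparison produces a Series of booleans that can then be used as a row filter. This is a sketch using only the employees column. The `mid` example is an extra illustration, not from the slides: it combines two conditions with `&`, and each condition needs its own parentheses.

```python
import pandas as pd

c = pd.DataFrame({"employees": [123000, 320671, 566000, 1300000, 80110]},
                 index=["apple", "samsung", "amazon", "foxconn", "alphabet"])

in_100k = c["employees"] / 100000  # element-wise division -> new Series
mask = c["employees"] < 200000     # element-wise comparison -> Series of bools
small = c[mask]                    # keep only the rows where mask is True

# combining conditions: & for "and", with parentheses around each comparison
mid = c[(c["employees"] > 100000) & (c["employees"] < 600000)]

print(in_100k["apple"])   # 1.23
print(list(small.index))  # ['apple', 'alphabet']
print(list(mid.index))    # ['apple', 'samsung', 'amazon']
```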
Retrieving Values Continued
1. Show the revenue and location of rows apple through amazon
c[:3][['revenue', 'location']]
c.loc['apple':'amazon', ['revenue', 'location']]
# inclusive when using labels for slicing!
note that .iloc can do the same thing by position (revenue is column 0 and location is column 2 once fy and mcap are gone):
c.iloc[:3, [0, 2]]
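The three forms return the same rows because label slices include the end label, while positional slices stop before the end position. The sketch below assumes a four-company subset, with foxconn omitted for brevity.

```python
import pandas as pd

c = pd.DataFrame(
    {"revenue": ["$229.2", "$211.9", "$177.8", "$110.8"],
     "employees": [123000, 320671, 566000, 80110],
     "location": ["Cupertino, US", "Suwon, South Korea", "Seattle, US", "Mountain View, US"]},
    index=["apple", "samsung", "amazon", "alphabet"])

by_label = c.loc["apple":"amazon", ["revenue", "location"]]  # end label included
by_pos = c.iloc[0:3, [0, 2]]                                # end position excluded
by_slice = c[:3][["revenue", "location"]]                   # plain [] row slice is positional

print(by_label.equals(by_pos), by_label.equals(by_slice))  # True True
print(len(c.loc["apple":"amazon"]), len(c.iloc[0:2]))      # 3 2
```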
2. Only get the names of the companies... (or rather, how do you get the row labels?)
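One way to answer is to use the index attribute, which holds the row labels. The sketch below assumes a three-company frame. `c.index` is a pandas Index object, and `list(...)` turns it into a plain Python list. The column labels are available in the same way through `c.columns`.

```python
import pandas as pd

c = pd.DataFrame({"employees": [123000, 320671, 566000]},
                 index=["apple", "samsung", "amazon"])

labels = c.index        # an Index object holding the row labels
names = list(c.index)   # or a plain Python list
print(names)            # ['apple', 'samsung', 'amazon']
print(list(c.columns))  # ['employees']
```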
Speaking of NaN
If you have missing values, then it may make sense to fill them in with another default value
Use the .fillna method to do this (its first argument is the value that replaces NaN): c['state'] = c['state'].fillna('')
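A small before-and-after check, assuming a three-company state column with one missing value. `fillna` returns a new Series, so the result has to be assigned back for the change to stick.

```python
import numpy as np
import pandas as pd

c = pd.DataFrame({"state": ["CA", np.nan, "WA"]},
                 index=["apple", "samsung", "amazon"])

print(c["state"].isnull().sum())        # 1 missing value before filling
c["state"] = c["state"].fillna("")      # returns a new Series; assign it back
print(c["state"].isnull().sum())        # 0 afterwards
print(repr(c.loc["samsung", "state"]))  # ''
```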
Vectorized String Methods
In addition to the arithmetic operators we've used, we can also use vectorized string
operations on Series →
these methods are accessed through the .str attribute; some examples include:
str.upper: c['location'] = c['location'].str.upper()
str.split returns a Series of lists (one list per value it works on)
the elements of each list can be accessed through .str again (e.g. .str[-1] for the last element)
c['country'] = c['location'].str.split(',').str[-1]
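The steps above can be traced on three rows. This is a sketch, and `country_clean` is a hypothetical extra column. Splitting on "," leaves a leading space on the country part, so the sketch adds a `.str.strip()` step that is not part of the original slide.

```python
import pandas as pd

c = pd.DataFrame({"location": ["Cupertino, US", "Suwon, South Korea", "Seattle, US"]},
                 index=["apple", "samsung", "amazon"])

c["location"] = c["location"].str.upper()
parts = c["location"].str.split(",")           # each value becomes a list
c["country"] = parts.str[-1]                   # .str[-1]: last element of each list
c["country_clean"] = c["country"].str.strip()  # drop the space left after the comma

print(parts["samsung"])          # ['SUWON', ' SOUTH KOREA']
print(list(c["country_clean"]))  # ['US', 'SOUTH KOREA', 'US']
```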
One Last Selection
Now that we've added state... maybe we want to show only the companies that have a state
associated with it →
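Show all the companies that have a state (that is, whose state is not missing/NA/NaN):
c[c['state'].notnull()]
or the companies that don't have a state:
c[c['state'].isnull()]
(if NaN was already replaced with '' using .fillna, nothing is null any more, so compare against '' instead)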
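This is a sketch of both filters, assuming a state column where samsung and foxconn are missing. The last lines show the empty-string check that applies once the column has been filled.

```python
import numpy as np
import pandas as pd

c = pd.DataFrame({"state": ["CA", np.nan, "WA", np.nan, "CA"]},
                 index=["apple", "samsung", "amazon", "foxconn", "alphabet"])

has_state = c[c["state"].notnull()]  # rows whose state is present
no_state = c[c["state"].isnull()]    # rows whose state is NaN
print(list(has_state.index))  # ['apple', 'amazon', 'alphabet']
print(list(no_state.index))   # ['samsung', 'foxconn']

# after fillna(""), nothing is NaN, so test for the empty string instead
filled = c["state"].fillna("")
print(list(c[filled != ""].index))  # ['apple', 'amazon', 'alphabet']
```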
Rearranging
I'd like to rearrange the table a little bit: →
move apple to the end of the list
swap location and state
while we're at it, why don't we add another row for microsoft (with NaN values filled in); it's ok if it ends up at the end, after apple
c.reindex(index=[*(list(c.index)[1:]), 'apple', 'microsoft'],
columns=['revenue', 'employees', 'state', 'location'])
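A runnable sketch of the same reindex on three companies, where `new_rows`, `new_cols` and `r` are hypothetical names. Rows and columns come back in the order requested. A label that did not exist before (microsoft) is filled with NaN. `reindex` returns a new DataFrame, so assign the result if you want to keep it.

```python
import pandas as pd

c = pd.DataFrame(
    {"revenue": ["$229.2", "$211.9", "$177.8"],
     "employees": [123000, 320671, 566000],
     "location": ["Cupertino, US", "Suwon, South Korea", "Seattle, US"],
     "state": ["CA", None, "WA"]},
    index=["apple", "samsung", "amazon"])

new_rows = [*c.index[1:], "apple", "microsoft"]  # apple moves last; microsoft is new
new_cols = ["revenue", "employees", "state", "location"]

r = c.reindex(index=new_rows, columns=new_cols)  # returns a new DataFrame
print(r)
print(r.loc["microsoft"].isnull().all())  # True: a new label gets NaN everywhere
```

Because microsoft's employees value is NaN, that column is upcast from integers to floats in the result.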