3/8/22, 7:35 PM Num !
Num ! CSCI-UA.0479-001
Creating Arrays
numpy provides a multidimensional container for homogeneous (same type and size in
memory) types: ndarray (n-dimensional array). You can create an array with →
array called with a sequence (like a list, tuple, etc.)
ones, zeros called with an integer or tuple of ints (dimensions)
arange called with a start, stop and step
random.randn called with an arbitrary number of args as dimensions
Creating Arrays Examples
# both of these sequences result in equivalent arrays
np.array([[1, 1], [2, 2]])
np.array(((1, 1), (2, 2)))
# array([[1, 1],
# [2, 2]])
np.zeros((2, 5)) # array([[0., 0., 0., 0., 0.],
# [0., 0., 0., 0., 0.]])
np.arange(4, 12, 2) # array([ 4, 6, 8, 10])
np.random.randn(2, 3)
# array([[-0.41478999, -0.87304136, -0.23290474],
# [ 0.30277282, 0.44985592, 1.06013982]])
Describing an Array
https://cs.nyu.edu/courses/spring22/CSCI-UA.0479-001/_site/slides/python/numpy-basics.html?print 1/16
A Note on Types
numpy arrays can contain values of the following types →
there are a large number of types…
a set of these types uses a bit-width suffix to make the size of each element explicit
int64, uint64, float16, etc.
types can be abbreviated using single characters (U for unicode)
type of array is the widest type needed to hold its values
different arrays can have different types (but all elements of one array share a single dtype)
Describing an Array Examples
Given the following array, what will the ndim, shape and dtype properties be? →
arr = np.array([[[1, 1], [2, 2], [3, 3]],
[[4, 4], [5, 5], [6, 6]],
[[7, 7], [8, 8], [9, 9]]])
arr.ndim → 3
arr.shape → (3, 3, 2)
arr.dtype → dtype('int64')
The following properties can be used to get some information about your shiny, new array… →
ndim – number of dimensions
shape – a tuple containing the size of each dimension
think of this like nested lists…
1st element is outermost dimension
last element is the innermost: [[77], [88]] → (2, 1)
(can also be assigned a value to reshape)
dtype – the data type of the array
(inferred from values, or set explicitly via keyword arg, dtype="type name")
can convert type with astype
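The properties above can be checked directly. A minimal sketch, assuming numpy is imported as np; the variable names (col, as_float, as_int) are only for illustration:

```python
import numpy as np

col = np.array([[77], [88]])
print(col.ndim)    # number of dimensions
print(col.shape)   # size of each dimension, outermost first
print(col.dtype)   # inferred from the values (an integer type)

# dtype can be set explicitly with the dtype keyword argument...
as_float = np.array([[77], [88]], dtype="float64")

# ...or converted afterwards with astype, which returns a new array
as_int = as_float.astype("int64")
print(as_float.dtype, as_int.dtype)
```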
Shape Again
Remember, shape gives us the size of each dimension as a tuple, starting from the outermost dimension
You’ll often see the term axis, followed by a number, to address a specific dimension →
axis 0, axis 1, etc.
this describes the position of the dimension as given by .shape
for example, [[1, 2, 3], [4, 5, 6]]
.shape is (2, 3), so…
axis 0, rows, is 2
axis 1, columns, is 3
In higher dimensions, rows and columns are not going to be axes 0 and 1 (likely the last 2, instead!)
In lower dimensions, there are only columns, so axis 0 is columns, not rows!
What is the resulting .shape tuple for the following; describe what the tuple represents in natural language →
np.array([1, 2, 3])
(3,) – 3 columns
np.array([[1, 2, 3], [1, 2, 3]])
(2, 3) – 2 rows, 3 columns
np.array([
[[1, 2], [3, 4], [5, 6]],
[[7, 8], [9, 10], [11, 12]]
])
(2, 3, 2) – 2 “tables”, each with 3 rows and 2 columns
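To check these answers, the shapes can be printed directly. A small sketch; the names one_d, two_d and three_d are made up for this example:

```python
import numpy as np

one_d = np.array([1, 2, 3])
two_d = np.array([[1, 2, 3], [1, 2, 3]])
three_d = np.array([
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
])

for arr in (one_d, two_d, three_d):
    print(arr.ndim, arr.shape)

# axis numbers are positions in the shape tuple:
# for a 2-dimensional array, axis 0 is rows and axis 1 is columns
rows, cols = two_d.shape
```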
Yeah, so with that said…. We’ll be working with tabular data, so we’ll be sticking to 2 dimensions mostly.
When might higher dimensional data be needed, though (let's think through some scenarios)?
keeping track of historical tabular data (for example, people responding to the same survey questions over time)
image data as separate channels (a grid of red, grid for green, blue …)
…and of course, video (several images over time)
dealing with a large feature set for machine learning
About That Reshaping
You can change the dimensions and shape of an array by: →
assigning a tuple to the shape property
changes ndarray in place
…or calling reshape
accepts tuple as argument
returns new ndarray with specified shape
a = np.arange(9)
a.reshape((3, 3))
# gives back:
# array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
# (but a stays the same)
a.shape = (3, 3)
# changes a itself!
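The difference between the two approaches, as a runnable sketch (the names b and shape_after_reshape are only for illustration):

```python
import numpy as np

a = np.arange(9)

b = a.reshape((3, 3))          # gives back a new ndarray object
shape_after_reshape = a.shape  # a itself still has its old shape
print(shape_after_reshape, b.shape)

a.shape = (3, 3)               # changes a itself
print(a.shape)
```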
Array Arithmetic
Arithmetic operations behave differently based on the type of the other operand. For example:
If the other operand is a scalar (single value types like int, float, boolean, string, etc.), then the operation is performed on every element using the same scalar as the second operand (vectorization):
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr * 10)
[[10 20 30]
[40 50 60]]
Broadcasting
Multiplying an ndarray with a scalar is the simplest case of broadcasting.
Broadcasting is a fancy term for how numpy deals with arrays with different shapes.
Simple Broadcasting
Same shape or w/ scalar. What are the resulting arrays? →
# same shape: perform operation on elements in same positions
np.ones((2, 3)) + np.array([[1, 2, 3], [4, 5, 6]])
array([[2., 3., 4.],
[5., 6., 7.]])
# stretch scalar value out over all dimensions needed
# to create array of same dimensions ([[5, 5, 5], [5, 5, 5]])
np.ones((2, 3)) * 5
array([[5., 5., 5.],
[5., 5., 5.]])
only works when the arrays being used are compatible (more on that later)
provides a mechanism for vectorizing array operations by…
stretching out dimensions / shapes to make two arrays the same shape
no loops have to be written to apply operations to every array element
looping occurs in C instead of Python
no extra copies of data have to be made to do this
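To see what "no loops have to be written" means, here is a sketch that computes the same result twice: once with explicit Python loops and once with broadcasting. The names looped and vectorized are made up for this comparison:

```python
import numpy as np

arr = np.ones((2, 3))

# explicit Python loops over every element
looped = np.empty_like(arr)
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        looped[i, j] = arr[i, j] * 5

# broadcasting: the scalar is stretched to the shape of arr,
# and the looping happens inside numpy
vectorized = arr * 5

print(np.array_equal(looped, vectorized))
```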
Not So Simple Broadcasting
That was easy. …but what about different shapes? →
Broadcasting can only be performed if the dimensions, starting from the end, either →
are equal
or… one of them is 1
If the arrays have a different number of dimensions, left pad the shorter shape with 1s, and follow the rules above
Can the following shapes be made compatible? →
(2, 3, 2) and (2, 1, 2)
(2, 2, 3) and (3, 2)
(3,) and (4, 3)
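One way to work through these questions is to write the rules out by hand. The helper below, broadcast_shape, is a hypothetical function written for this example (it is not part of numpy); it is a sketch of the compatibility rules above, not numpy's actual implementation:

```python
from itertools import zip_longest


def broadcast_shape(s1, s2):
    """Return the broadcast shape of s1 and s2, or None if incompatible."""
    result = []
    # walk both shapes from the last dimension;
    # a missing dimension counts as 1 (the "left pad with 1" rule)
    for d1, d2 in zip_longest(reversed(s1), reversed(s2), fillvalue=1):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            return None  # sizes differ and neither is 1
    return tuple(reversed(result))


print(broadcast_shape((2, 3, 2), (2, 1, 2)))  # compatible
print(broadcast_shape((2, 2, 3), (3, 2)))     # not compatible: 3 vs 2 at the end
print(broadcast_shape((3,), (4, 3)))          # compatible after padding to (1, 3)
```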
Compatible, Now What?
If two arrays are compatible, how do we make the shape of both arrays match? →
left pad with 1 to make equal ndim
stretch out dimensions with size 1 by repeating elements
a1 = np.ones((3, 3)); a2 = np.array([1, 2, 3])
change shape of a2 from (3,) to (1, 3): [[1, 2, 3]]
repeat along new axis / dimension until size matches (repeat 3 times)
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
a1 + a2→[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
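The stretching step can be made visible with np.broadcast_to, which gives back a2 as it looks after being stretched to a1's shape. A short sketch (the names stretched and result are only for illustration):

```python
import numpy as np

a1 = np.ones((3, 3))
a2 = np.array([1, 2, 3])

# a2 as numpy treats it during a1 + a2: shape (3,) stretched to (3, 3)
stretched = np.broadcast_to(a2, a1.shape)
print(stretched)

result = a1 + a2
print(result)
```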
And Another One
What are the shape properties of a1 and a2? →
a1 = np.ones((2, 3, 2))
a2 = np.array([[[8, 9]], [[88, 99]]])
a1 + a2 # ????
a1.shape # (2, 3, 2)
a2.shape # (2, 1, 2)
How is a2 stretched along axis 1 to allow a1 + a2?
np.array([[[8, 9], [8, 9], [8, 9]],
[[88, 99], [88, 99], [88, 99]]])
And the result of a1 + a2 is:
[[[ 9., 10.], [ 9., 10.], [ 9., 10.]],
[[ 89., 100.], [ 89., 100.], [ 89., 100.]]]
Some More Examples
Want some more practice…? →
Now for Some Indexing
Works like you’d expect (again, think of nested lists):
a = np.array([[10, 11, 12], [13, 14, 15]])
Get the first element of a:
a[0] # array([10, 11, 12])
Now get the last item of the first sub array of a:
a[0][2] # 12 (also a[0][-1])
Alternatively, use a tuple: a[0, 2] or a[(0, 2)]
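All of these indexing forms side by side, as a sketch (variable names are made up for this example):

```python
import numpy as np

a = np.array([[10, 11, 12], [13, 14, 15]])

first_row = a[0]        # the first sub array
chained = a[0][2]       # index twice, like nested lists
with_tuple = a[0, 2]    # one index with a position per axis
explicit = a[(0, 2)]    # the same thing with the tuple written out
negative = a[0, -1]     # negative positions count from the end

print(first_row, chained, with_tuple, explicit, negative)
```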
arr = np.array([[1, 2, 3], [4, 5, 6]])
array([[ 2, 4, 6],
[ 8, 10, 12]])
arr + np.array([1, 2, 3])
array([[2, 4, 6],
[5, 7, 9]])
Reduced Dimensions
Note that when you index with fewer indexes than the array has dimensions, you get an array with fewer
dimensions, consisting of only the data in the remaining (higher-numbered) axes →
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
a[1] – only axis 0 is given, so resulting array is data from axis 1 and 2
array([[5, 6], [7, 8]])
a[1, 0] – both axis 0 and 1 are given, so resulting array is data from axis 2 only
array([5, 6])
(we already sort of do this intuitively)
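A quick way to watch the dimensions drop is to print ndim for each index. A sketch; the third index, (1, 0, 1), is an extra case added here to show what happens when every axis is given:

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

for index in [(1,), (1, 0), (1, 0, 1)]:
    selected = a[index]
    # each extra index removes one dimension from the result
    print(index, np.ndim(selected), selected)
```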
Assignment
We can use indexing to perform assignment, as with regular lists… but with some magic!
a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
a[0][0][0] = 135 – sets only that one element: [[[135, 2], [3, 4]], …]
a[0][0] = 135 – [[[135, 135], [3, 4]], [[5, 6], [7, 8]]]
both elements in a[0][0] set to 135
(repeat 135 along axis 2 at a[0][0])
a[0][0] = [99, 135] – [[[ 99, 135], [3, 4]], …]
a[0] = [987, 987] – [[[987, 987], [987, 987]], [[5, 6], …]
(repeat [987, 987] along axis 1 at a[0])
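The assignments above can be run one after another to watch the "magic" (broadcasting on assignment) happen. A sketch; the snapshots list is only there to record the array after each step:

```python
import numpy as np

a = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
snapshots = []

a[0][0][0] = 135        # one element
snapshots.append(a.tolist())

a[0][0] = 135           # scalar repeated along axis 2 at a[0][0]
snapshots.append(a.tolist())

a[0][0] = [99, 135]     # same shape as a[0][0], so assigned position by position
snapshots.append(a.tolist())

a[0] = [987, 987]       # [987, 987] repeated along axis 1 at a[0]
snapshots.append(a.tolist())

for snapshot in snapshots:
    print(snapshot)
```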
Again, kind of like working with lists… indexing into an array gives you a view into the array,
not a new sub array →
consequently, you’re not getting a copy back if you index
so if you perform assignment on the value that you get back after indexing, it changes the original array
a = np.ones((2, 3))
last_row = a[-1]
last_row[-1] = 456
#…what does a look like???
# a is now… array([[ 1., 1., 1.],
# [ 1., 1., 456.]])
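To confirm that indexing gives a view, np.shares_memory can be used. The sketch below also shows the copy method as one way to get an independent array; copy is not covered on this slide, so treat it as an aside:

```python
import numpy as np

a = np.ones((2, 3))

last_row = a[-1]                        # a view into a
last_row[-1] = 456                      # changes a as well
shares = np.shares_memory(a, last_row)  # True for a view

independent = a[-1].copy()              # an explicit copy does not share data
independent[0] = -1                     # a is left untouched by this

print(a)
print(shares)
```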
Same, but different… as usual.
This should be familiar… np.ones(5)[:2] → [1., 1.]
For Reference
Here’s a view of our array, a →
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8]],
[[ 9, 10, 11],
[12, 13, 14],
[15, 16, 17]],
[[18, 19, 20],
[21, 22, 23],
[24, 25, 26]],
[[27, 28, 29],
[30, 31, 32],
[33, 34, 35]]])
What are the rules for slicing syntax again? →
leave out value before colon :m start at beginning (0)
leave out value after colon n: end at end
leave out both : beginning to end
slices grab a range of elements along an axis
…but the crazy part is that you can have multiple slices in a single expression
a = np.arange(36).reshape((4, 3, 3))
a[1:3,:2,1:]
array([[[10, 11], [13, 14]],
[[19, 20], [22, 23]]])
Mixing Slices and Indexes
Care must be taken when mixing slices and indexes →
For example, the following indexing:
a = np.array([[2, 4], [6, 8], [10, 12], [14, 16]])
print(a[1:3, -1])
… can be interpreted as:
select the two rows at index 1 and 2 (2nd and 3rd row)
select the last element of each row
the result is an array containing the last element of 2nd and 3rd rows
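The care needed here is mostly about the shape of the result: an integer index removes its axis, while a slice keeps it. A sketch comparing the two (the names mixed and sliced are only for illustration):

```python
import numpy as np

a = np.array([[2, 4], [6, 8], [10, 12], [14, 16]])

mixed = a[1:3, -1]    # slice on axis 0, integer on axis 1: axis 1 is dropped
sliced = a[1:3, -1:]  # slices on both axes: the result stays 2-dimensional

print(mixed, mixed.shape)
print(sliced, sliced.shape)
```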
How’d We Slice That?
Let’s take a look at the slice in more detail.
a = np.arange(36).reshape((4, 3, 3))
a[1:3,:2,1:]
first… we can think of axis 0 as “table/panel”, axis 1 as row and axis 2 as col
so this says, only give me tables 1 and 2
and from those tables, I want the first 2 rows, and the last 2 columns
array([[[10, 11],
[13, 14]],
[[19, 20],
[22, 23]]])
Compared with Consecutive Indexes ([]’s)
With single integers as indexes, using consecutive []’s yields the same results:
a = np.array([[2, 4], [6, 8], [10, 12], [14, 16]])
print(a[0, 1] == a[0][1]) # True!
Slice and Assign
Based on what we’ve seen before, what will happen here? →
a = np.arange(36).reshape((4, 3, 3))
a[1:3,:2,1:] = 0
array([[[ 0, 1, 2],…
[[ 9, 0, 0],
[12, 0, 0],
[15, 16, 17]],
[[18, 0, 0],
[21, 0, 0],
[24, 25, 26]], …]])
Slicing Also a View
… ok, so here’s where numpy ndarray differs from list and other sequence types.
unlike sequences, ndarray slices give a view (rather than a new array) so assignment changes the original array!
However, the results are different when introducing slicing… with a = np.array([[2, 4], [6, 8], [10, 12], [14, 16]]), how are these two expressions different? →
print(a[2:, 0])
print(a[2:][0])
print(a[2:, 0]) # [10, 14] (first element of last two rows)
print(a[2:][0]) # [10, 12] (first row of last two rows)
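Splitting the second expression into two steps shows why the results differ. A sketch using the 4 by 2 array from the earlier slide (intermediate names are made up for this example):

```python
import numpy as np

a = np.array([[2, 4], [6, 8], [10, 12], [14, 16]])

# one index expression: slice axis 0 and pick position 0 on axis 1
one_step = a[2:, 0]

# two index expressions: slice axis 0 first...
last_two_rows = a[2:]
# ...then index axis 0 of that result
two_steps = last_two_rows[0]

print(one_step)   # first element of the last two rows
print(two_steps)  # first row of the last two rows
```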
Slicing / Indexing Practice
Give me multiple ways to retrieve the X’s from the following 3 × 3 arrays →
Boolean Selections
You can use a boolean list / array as an index as well!
for the axis that it’s used as an index on, it will include, positionally, everything that’s True, and exclude False
given a = np.arange(15).reshape(5, 3)…
and rows = [False, True, False, True, False]
using rows as the index for axis 0, only the rows in positions where there is a True value will be included
array([[ 3, 4, 5],
[ 9, 10, 11]])
The number of elements in the boolean list / array must be the same as the size of the axis you’re indexing
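The boolean selection described above, as a runnable sketch. The try / except at the end is an addition to show what happens when the list length does not match the axis size:

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
rows = [False, True, False, True, False]

selected = a[rows]   # keeps the rows at positions 1 and 3
print(selected)

# a boolean list of the wrong length is rejected
try:
    a[[True, False]]
    mismatch_error = None
except IndexError as err:
    mismatch_error = err
print(mismatch_error)
```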
1.
X X _
X X _
_ _ _
2.
_ _ X
_ _ X
_ _ X
3.
_ _ _
X X X
X X X
4.
_ _ _
_ X X
_ _ _
a[:2, :2] # 1
a[:, 2:] # 2
a[1:, :] # 3
a[1, 1:] # 4
(there are multiple ways to do each, and dimensions of the returned array may differ)
Mix and Match
Given this monstrosity (what’s it look like?) … →
a = np.arange(24).reshape((3, 4, 2))
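array([[[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7]],
[[ 8, 9],
[10, 11],
[12, 13],
[14, 15]],
[[16, 17],
[18, 19],
[20, 21],
[22, 23]]])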
Slicing and Boolean Selection
Using the previous slide and at least one boolean list to index, give back … →
the first two tables,
the 2nd and last row of both of those tables
the last element of each row
a[:2, [False, True, False, True], 1:]
a[:2, [False, True, False, True], -1] # (less dims)
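Both answers can be run to compare their shapes. A sketch; keep_rows is just a name for the boolean list used above:

```python
import numpy as np

a = np.arange(24).reshape((3, 4, 2))
keep_rows = [False, True, False, True]   # 2nd and last row

with_slice = a[:2, keep_rows, 1:]   # slice on the last axis keeps that axis
with_index = a[:2, keep_rows, -1]   # integer on the last axis drops it

print(with_slice.shape)
print(with_slice)
print(with_index.shape)
print(with_index)
```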
Not only can you use booleans, you can also use a list of integers as an index →
the integers specify which elements to include
and their order specifies the order in which to include the elements
passing in multiple lists allows you to essentially pick and choose specific elements!
For example, given a single column, a as [[0], [1], [2], [3]] …
to select just the last row twice, then the first row:
a[[-1, -1, 0]]
[[0, 0], [1, 1]]
Try these examples of fancy indexing… →
a = np.arange(9).reshape(3, 3) # [[0, 1, 2],
# [3, 4, 5],
# [6, 7, 8]]
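A few fancy indexing cases to try, as a sketch. The specific index lists are examples chosen here, and col is a made-up name for the single column array mentioned earlier:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# one list of integers: pick whole rows, in the order given
picked_rows = a[[2, 1]]

# one list per axis: pick single elements at (0, 2), (1, 1) and (2, 0)
picked_elements = a[[0, 1, 2], [2, 1, 0]]

# repeats are allowed: last row twice, then the first row
col = np.array([[0], [1], [2], [3]])
repeated = col[[-1, -1, 0]]

print(picked_rows)
print(picked_elements)
print(repeated)
```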
Transpose / Matrix Operations
Turn columns into rows or find the dot product →
a = np.arange(12).reshape((3, 4))
a.T # rows into columns (transpose)
m1 = np.arange(6).reshape(2, 3)
m2 = np.arange(6, 12).reshape(3, 2)
m1.dot(m2) # matrix dot product
# (sum of products of elements of rows from m1
# …and columns of m2)
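The same operations with their results worked out, as a sketch. The by_hand line is an addition that spells out one entry of the dot product:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
transposed = a.T                 # shape goes from (3, 4) to (4, 3)

m1 = np.arange(6).reshape(2, 3)      # [[0, 1, 2], [3, 4, 5]]
m2 = np.arange(6, 12).reshape(3, 2)  # [[6, 7], [8, 9], [10, 11]]
product = m1.dot(m2)

# entry [0, 0] is row 0 of m1 times column 0 of m2, summed up
by_hand = 0 * 6 + 1 * 8 + 2 * 10

print(transposed.shape)
print(product)
print(by_hand)
```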
Functions on Array Elements
numpy comes with built-in functions that work on every element in an array … some
examples include: →
unary functions
sqrt and square
floor and ceil
sum and mean *
etc.
binary functions
add
floor_divide
etc.
Check out the table in the book for others…
Note that these are all functions called from the numpy module … and they take either one or two arguments.
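A few of these functions in action, as a sketch. The input values are chosen here so the results are easy to check by eye:

```python
import numpy as np

a = np.array([1.0, 4.0, 9.0])
b = np.array([2.0, 2.0, 2.0])

# unary functions: one array in, applied to every element
roots = np.sqrt(a)
squares = np.square(a)
halves = a / b
floors = np.floor(halves)
ceils = np.ceil(halves)

# binary functions: two arrays in, applied position by position
sums = np.add(a, b)
quotients = np.floor_divide(a, b)

print(roots, squares, floors, ceils, sums, quotients)
```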
# what do we get with these indexes????
a[[0, 1, 2], [2, 1, 0]]
array([[6, 7, 8],
[3, 4, 5]])
array([2, 4, 6])
Ternary With where
Remember the ternary / one-line if-else? →
val1 if cond else val2
# in other languages cond ? val1 : val2
The numpy equivalent of a ternary operator is a function called where
argument 1 is condition
argument 2 is value to return if condition is True
argument 3 is value to return if condition is False
It gives back a new array with the values specified above.
sum, mean, std
These functions are a little different from the other functions in the previous slides →
these functions can be called on an instance of an array as well as on numpy
they’ll give back a single value
OR… they can take an axis keyword argument specifying which axis to aggregate along (for a 2-dimensional array, axis=0 gives one value per column)
a = np.arange(9).reshape(3, 3)
a.mean(axis=0)
array([3., 4., 5.])
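The different ways of calling these functions, side by side. A sketch; the variable names are only for illustration:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

overall = a.mean()            # no axis: a single value for the whole array
per_column = a.mean(axis=0)   # aggregate along axis 0: one value per column
per_row = a.mean(axis=1)      # aggregate along axis 1: one value per row

total = np.sum(a)             # called from the numpy module instead
spread = a.std()

print(overall, per_column, per_row, total, spread)
```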
What do you think this will give back →
a = np.arange(9).reshape(3, 3)
np.where(a < 5, 'YAS', 'OH NO')
array([['YAS', 'YAS', 'YAS'],
['YAS', 'YAS', 'OH NO'],
['OH NO', 'OH NO', 'OH NO']], dtype='
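The where call above, plus a second case where the values come from arrays and numbers rather than strings. The second call (capped) is an extra example added here:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# strings as the two values, as in the example above
labels = np.where(a < 5, 'YAS', 'OH NO')

# the values can also be arrays or numbers:
# keep elements below 5, replace everything else with 5
capped = np.where(a < 5, a, 5)

print(labels)
print(capped)
```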