01_Introduction-checkpoint

Introduction¶
Getting started with Jupyter notebooks¶
The majority of your work in this course will be done using Jupyter notebooks, so here we introduce some of the basics of the notebook system. If you are already comfortable using notebooks, or would just rather get on with some coding, feel free to skip straight to the exercises below.

Note: Jupyter notebooks are also known as IPython notebooks. The Jupyter system now supports languages other than Python, so the name was changed to make it more language agnostic; however, the name IPython notebook is still commonly used.

Jupyter basics: the server, dashboard and kernels¶
In launching this notebook you will have already come across two of the other key components of the Jupyter system – the notebook server and dashboard interface.

We began by starting a notebook server instance in the terminal by running

jupyter notebook

This will have begun printing a series of log messages to terminal output similar to

$ jupyter notebook
[I 08:58:24.417 NotebookApp] Serving notebooks from local directory: ~/mlpractical
[I 08:58:24.417 NotebookApp] 0 active kernels
[I 08:58:24.417 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/

The last message included here indicates the URL the application is being served at. The default behaviour of the jupyter notebook command is to open a tab in a web browser pointing to this address after the server has started up. The server can be launched without opening a browser window by running jupyter notebook --no-browser. This can be useful, for example, when running a notebook server on a remote machine over SSH. Descriptions of various other command options can be found by displaying the command help page using

jupyter notebook --help

While the notebook server is running it will continue printing log messages to the terminal it was started from. Unless you detach the process from the terminal session you will need to keep the session open to keep the notebook server alive. If you want to close down a running server instance from the terminal you can use Ctrl+C. This will bring up a message asking you to confirm that you wish to shut the server down. You can either enter y or skip the confirmation by hitting Ctrl+C again.

When the notebook application first opens in your browser you are taken to the notebook dashboard. This will appear something like this

The dashboard above is showing the Files tab, a list of files in the directory the notebook server was launched from. We can navigate into a sub-directory by clicking on a directory name and back up to the parent directory by clicking the .. link. An important point to note is that the top-most level that you will be able to navigate to is the directory you run the server from. This is a security feature, and generally you should try to limit the access the server has by launching it in the most deeply nested directory that still gives you access to all the files you need to work with.

As well as allowing you to launch existing notebooks, the Files tab of the dashboard also allows new notebooks to be created using the New drop-down on the right. It can also perform basic file-management tasks such as renaming and deleting files (select a file by checking the box alongside it to bring up a context menu toolbar).

In addition to opening notebook files, we can also edit text files such as .py source files, directly in the browser by opening them from the dashboard. The in-built text-editor is less-featured than a full IDE but is useful for quick edits of source files and previewing data files.

The Running tab of the dashboard gives a list of the currently running notebook instances. This can be useful to keep track of which notebooks are still running and to shutdown (or reopen) old notebook processes when the corresponding tab has been closed.

The notebook interface¶
The top of your notebook window should appear something like this:

The name of the current notebook is displayed at the top of the page and can be edited by clicking on the text of the name. Displayed alongside this is an indication of the last manual checkpoint of the notebook file. On-going changes are auto-saved at regular intervals; the check-point mechanism is mainly meant as a way to recover an earlier version of a notebook after making unwanted changes. Note the default system only currently supports storing a single previous checkpoint despite the Revert to checkpoint dropdown under the File menu perhaps suggesting otherwise.

As well as having options to save and revert to checkpoints, the File menu also allows new notebooks to be created in the same directory as the current notebook, a copy of the current notebook to be made, and the current notebook to be exported to various formats.

The Edit menu contains standard clipboard functions as well as options for reorganising notebook cells. Cells are the basic units of notebooks, and can contain formatted text like the one you are reading at the moment or runnable code as we will see below. The Edit and Insert drop-down menus offer various options for moving cells around the notebook, merging and splitting cells and inserting new ones, while the Cell menu allows running of code cells and changing cell types.

The Kernel menu offers some useful commands for managing the Python process (kernel) running in the notebook. In particular it provides options for interrupting a busy kernel (useful for example if you realise you have set a slow code cell running with incorrect parameters) and to restart the current kernel. This will cause all variables currently defined in the workspace to be lost but may be necessary to get the kernel back to a consistent state after polluting the namespace with lots of global variables or when trying to run code from an updated module and reload is failing to work.

To the far right of the menu toolbar is a kernel status indicator. When a dark filled circle is shown this means the kernel is currently busy and any further code cell run commands will be queued to happen after the currently running cell has completed. An open status circle indicates the kernel is currently idle.

The final row of the top notebook interface is the notebook toolbar which contains shortcut buttons to some common commands such as clipboard actions and cell / kernel management. If you are interested in learning more about the notebook user interface you may wish to run through the User Interface Tour under the Help menu drop down.

Markdown cells: easy text formatting¶
This entire introduction has been written in what is termed a Markdown cell of a notebook. Markdown is a lightweight markup language intended to be readable in plain-text. As you may wish to use Markdown cells to keep your own formatted notes in notebooks, a small sampling of the formatting syntax available is below (escaped mark-up on top and corresponding rendered output below that); there are many much more extensive syntax guides – for example this cheatsheet.

## Level 2 heading
### Level 3 heading

*Italicised* and **bold** text.

* bulleted
* lists

and

1. enumerated
2. lists

Inline maths $y = mx + c$ using [MathJax](https://www.mathjax.org/) as well as display style

$$ ax^2 + bx + c = 0 \qquad \Rightarrow \qquad x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

Level 2 heading¶
Level 3 heading¶
Italicised and bold text.

bulleted
lists

and

enumerated
lists

Inline maths $y = mx + c$ using MathJax as well as display maths

$$ ax^2 + bx + c = 0 \qquad \Rightarrow \qquad x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} $$

We can also directly use HTML tags in Markdown cells to embed rich content such as images and videos.

Code cells: in browser code execution¶
Up to now we have not seen any runnable code. An example of an executable code cell is below. To run it, first click on the cell so that it is highlighted, then either click the run button on the notebook toolbar, go to Cell > Run Cells or use the keyboard shortcut Ctrl+Enter.

In [1]:

from __future__ import print_function
import sys

print('Hello world!')
print('Alarming hello!', file=sys.stderr)
print('Hello again!')
'And again!'

Hello world!
Hello again!

Alarming hello!

Out[1]:

'And again!'

This example shows the three main components of a code cell.

The most obvious is the input area. This (unsurprisingly) is used to enter the code to be run, which will be automatically syntax highlighted.

To the immediate left of the input area is the execution indicator / counter. Before a code cell is first run this will display In [ ]:. After the cell is run this is updated to In [n]: where n is a number corresponding to the current execution counter, which is incremented whenever any code cell in the notebook is run. This can therefore be used to keep track of the relative order in which cells were last run. There is no fundamental requirement to run cells in the order they are organised in the notebook, though things will usually be more readable if you keep them roughly in order!

Immediately below the input area is the output area. This shows any output produced by the code in the cell. This is dealt with a little confusingly in the current Jupyter version. At the top any output to stdout is displayed. Immediately below that, output to stderr is displayed. All of the output to stdout is displayed together even if there has been output to stderr in between, as shown by the surprising ordering in the output here.

The final part of the output area is the display area. By default this will just display the returned output of the last Python statement as would usually be the case in a (I)Python interpreter run in a terminal. What is displayed for a particular object is by default determined by its special __repr__ method e.g. for a string it is just the quote enclosed value of the string itself.
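To make the role of __repr__ concrete, here is a minimal sketch. The Point class is a hypothetical example (it is not part of the course code); it only shows that the string returned by __repr__ is what the display area would show if an instance were the last expression in a cell.

```python
class Point(object):
    """Hypothetical class used only to illustrate __repr__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __repr__(self):
        # this string is what the notebook display area would show
        return 'Point(x={0}, y={1})'.format(self.x, self.y)


p = Point(1, 2)
# repr(p) calls p.__repr__(); in a notebook, ending a cell with just `p`
# would display the same text next to Out[n]:
shown = repr(p)
print(shown)
```

For a string, repr returns the quote-enclosed value, which is why the last line of the cell above appears with quotes in the display area.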

Useful keyboard shortcuts¶
There are a wealth of keyboard shortcuts available in the notebook interface. For an exhaustive list see the Keyboard Shortcuts option under the Help menu. We will cover a few of those we find most useful below.

Shortcuts come in two flavours: those applicable in command mode, active when no cell is currently being edited and indicated by a blue highlight around the current cell; those applicable in edit mode when the content of a cell is being edited, indicated by a green current cell highlight.

In edit mode of a code cell, two of the more generically useful keyboard shortcuts are offered by the Tab key.

Pressing Tab a single time while editing code will bring up suggested completions of what you have typed so far. This is done in a scope-aware manner, so for example typing a + [Tab] in a code cell will come up with a list of objects beginning with a in the current global namespace, while typing np.a + [Tab] (assuming import numpy as np has been run already) will bring up a list of objects in the root NumPy namespace beginning with a.
Pressing Shift+Tab once immediately after the opening parenthesis of a function or method will cause a tool-tip to appear with the function signature (including argument names and defaults) and its docstring. Pressing Shift+Tab twice in succession will cause an expanded version of the same tool-tip to appear, useful for longer docstrings. Pressing Shift+Tab four times in succession will cause the information to be displayed instead in a pager docked to the bottom of the notebook interface. The pager stays attached even when making further edits to the code cell, so it can be useful for keeping documentation visible when editing, e.g. to help remember the names of arguments to a function and their purposes.

A series of useful shortcuts available in both command and edit mode are [modifier]+Enter where [modifier] is one of Ctrl (run selected cell), Shift (run selected cell and select next) or Alt (run selected cell and insert a new cell after).

A useful command mode shortcut to know about is the ability to toggle line numbers on and off for a cell by pressing L which can be useful when trying to diagnose stack traces printed when an exception is raised or when referring someone else to a section of code.

Magics¶
There are a range of magic commands in IPython notebooks that provide helpful tools outside of the usual Python syntax. A full list of the built-in magic commands is given here; three that are particularly useful for this course are:

%%timeit Put at the beginning of a cell to time its execution and print the resulting timing statistics.
%precision Set the precision for pretty printing of floating point values and NumPy arrays.
%debug Activates the interactive debugger in a cell. Run after an exception has occurred to help diagnose the issue.
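Magics only work inside IPython and Jupyter. As a rough illustration of the kind of measurement %%timeit performs, the sketch below uses the standard-library timeit module directly. The sum_of_squares function is a hypothetical workload, and the repeat counts are arbitrary choices rather than the defaults the magic uses.

```python
import timeit


def sum_of_squares(n):
    """Hypothetical workload to time."""
    return sum(i * i for i in range(n))


# run the call 100 times per measurement and take 3 measurements
num_calls = 100
timings = timeit.repeat(lambda: sum_of_squares(1000),
                        number=num_calls, repeat=3)
# the minimum is usually the most reliable estimate of the run time
best_per_call = min(timings) / num_calls
print('best of 3: {0:.2e} s per call'.format(best_per_call))
```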

Plotting with matplotlib¶
When setting up your environment one of the dependencies we asked you to install was matplotlib. This is an extensive plotting and data visualisation library which is tightly integrated with NumPy and Jupyter notebooks.

When using matplotlib in a notebook you should first run the magic command

%matplotlib inline

This will cause all plots to be automatically displayed as images in the output area of the cell they are created in. Below we give a toy example of plotting two sinusoids using matplotlib to showcase some of the basic plot options. To see the output produced, select the cell and then run it.

In [2]:

# use the matplotlib magic to specify to display plots inline in the notebook
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

# generate a pair of sinusoids
x = np.linspace(0., 2. * np.pi, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# produce a new figure object with a defined (width, height) in inches
fig = plt.figure(figsize=(8, 4))
# add a single axis to the figure
ax = fig.add_subplot(111)
# plot the two sinusoidal traces on the axis, adjusting the line width
# and adding LaTeX legend labels
ax.plot(x, y1, linewidth=2, label=r'$\sin(x)$')
ax.plot(x, y2, linewidth=2, label=r'$\cos(x)$')
# set the axis labels
ax.set_xlabel('$x$', fontsize=16)
ax.set_ylabel('$y$', fontsize=16)
# force the legend to be displayed
ax.legend()
# adjust the limits of the horizontal axis
ax.set_xlim(0., 2. * np.pi)
# make a grid be displayed in the axis background
ax.grid(True)

Exercises¶
Today’s exercises are meant to allow you to get some initial familiarisation with the mlp package and how data is provided to the learning functions. Next week onwards, we will follow with the material covered in lectures.

If you are new to Python and/or NumPy and are struggling to complete the exercises, you may find it helps to go through this Stanford University tutorial by Justin Johnson first. There is also a derived Jupyter notebook by Volodymyr Kuleshov and Isaac Caswell which you can download from here. If you save this into your mlpractical/notebooks directory you should be able to open the notebook from the dashboard to run the examples.

Data providers¶
Open (in the browser) the mlp.data_providers module. Have a look through the code and comments, then move on to the exercises.

Exercise 1¶
The MNISTDataProvider iterates over input images and target classes (digit IDs) from the MNIST database of handwritten digit images, a common supervised learning benchmark task. Using the data provider and matplotlib we can for example iterate over the first couple of images in the dataset and display them using the following code:

In [3]:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import mlp.data_providers as data_providers

def show_single_image(img, fig_size=(2, 2)):
    fig = plt.figure(figsize=fig_size)
    ax = fig.add_subplot(111)
    ax.imshow(img, cmap='Greys')
    ax.axis('off')
    plt.show()
    return fig, ax

# An example for a single MNIST image
mnist_dp = data_providers.MNISTDataProvider(
    which_set='valid', batch_size=1, max_num_batches=2, shuffle_order=True)

for inputs, target in mnist_dp:
    show_single_image(inputs.reshape((28, 28)))
    print('Image target: {0}'.format(target))

Image target: [[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]

Image target: [[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]]

Generally we will want to deal with batches of multiple images i.e. batch_size > 1. As a first task:

Using MNISTDataProvider, write code that iterates over the first 5 minibatches of size 100 data-points.
Display each batch of MNIST digits in a $10\times10$ grid of images.

Notes:

Images are returned from the provider as tuples of NumPy arrays (inputs, targets). The inputs matrix has shape (batch_size, input_dim) while the targets array is of shape (batch_size,), where batch_size is the number of data points in a single batch and input_dim is the dimensionality of the input features.
Each input data-point (image) is stored as a 784 dimensional vector of pixel intensities normalised to $[0, 1]$ from initial integer values in $[0, 255]$. However, the original spatial domain is two dimensional, so before plotting you will need to reshape the one dimensional input arrays into two dimensional arrays (MNIST images have the same height and width dimensions).
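The normalisation and reshaping described in these notes can be sketched on synthetic data. The array below is random and merely stands in for a batch from the provider (an assumption made so the example runs without the mlp package); only the shapes and value ranges mirror the description above.

```python
import numpy as np

batch_size, input_dim = 4, 784
rng = np.random.RandomState(0)
# synthetic integer pixel values in [0, 255], standing in for raw MNIST data
raw = rng.randint(0, 256, size=(batch_size, input_dim))
# normalise the intensities to [0, 1]
inputs = raw / 255.
# reshape each 784-dimensional vector into a 28 x 28 image
images = inputs.reshape((batch_size, 28, 28))
print(images.shape)
```

The reshape is row-major, so element 28 of a flat vector becomes the first pixel of the second image row.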

In [4]:

def show_batch_of_images(img_batch, fig_size=(3, 3)):
    fig = plt.figure(figsize=fig_size)
    batch_size, im_height, im_width = img_batch.shape
    # calculate no. columns per grid row to give square grid
    grid_size = int(batch_size**0.5)
    # initialise empty array to tile image grid into
    tiled = np.empty((im_height * grid_size,
                      im_width * batch_size // grid_size))
    # iterate over images in batch + indexes within batch
    for i, img in enumerate(img_batch):
        # calculate grid row and column indices
        r, c = i % grid_size, i // grid_size
        # rows are indexed using the image height, columns using the width
        tiled[r * im_height:(r + 1) * im_height,
              c * im_width:(c + 1) * im_width] = img
    ax = fig.add_subplot(111)
    ax.imshow(tiled, cmap='Greys')
    ax.axis('off')
    fig.tight_layout()
    plt.show()
    return fig, ax

batch_size = 100
num_batches = 5

mnist_dp = data_providers.MNISTDataProvider(
    which_set='valid', batch_size=batch_size,
    max_num_batches=num_batches, shuffle_order=True)

for inputs, target in mnist_dp:
    # reshape inputs from batch of vectors to batch of 2D arrays (images)
    _ = show_batch_of_images(inputs.reshape((batch_size, 28, 28)))

Exercise 2¶
MNISTDataProvider currently returns a vector of integers as targets. Each element in this vector is the integer ID of the class that the corresponding data-point belongs to.

For training of neural networks a 1-of-K representation of multi-class targets is more useful. Instead of representing class identity by an integer ID, for each data point a vector of length equal to the number of classes is created, with all elements zero except for the element corresponding to the class ID, which is one.

For instance, given a batch of 5 integer targets [2, 2, 0, 1, 0] and assuming there are 3 different classes, the corresponding 1-of-K encoded targets would be

[[0, 0, 1],
[0, 0, 1],
[1, 0, 0],
[0, 1, 0],
[1, 0, 0]]
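One possible way to produce this encoding with NumPy integer indexing is sketched below. The free function one_of_k and its num_classes argument are hypothetical names chosen for illustration; this is not the skeleton to_one_of_k method from mlp/data_providers.py, whose signature may differ.

```python
import numpy as np


def one_of_k(int_targets, num_classes):
    """Hypothetical helper: convert integer class IDs to 1-of-K rows."""
    int_targets = np.asarray(int_targets)
    # start from an all-zero (batch_size, num_classes) array
    encoded = np.zeros((int_targets.shape[0], num_classes))
    # set a single one per row, at the column given by the class ID
    encoded[np.arange(int_targets.shape[0]), int_targets] = 1.
    return encoded


encoded = one_of_k([2, 2, 0, 1, 0], num_classes=3)
print(encoded)
```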

Implement the to_one_of_k method of MNISTDataProvider class.
Uncomment the overloaded next method, so the raw targets are converted to 1-of-K coding.
Test your code by running the cell below.

In [5]:

mnist_dp = data_providers.MNISTDataProvider(
    which_set='valid', batch_size=5, max_num_batches=5, shuffle_order=False)

for inputs, targets in mnist_dp:
    assert np.all(targets.sum(-1) == 1.)
    assert np.all(targets >= 0.)
    assert np.all(targets <= 1.)
    print(targets)

[[ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [ 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]]
[[ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [ 0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [ 0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [ 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]

Exercise 3¶
Here you will write your own data provider MetOfficeDataProvider that wraps weather data for south Scotland. A previous version of this data has been stored in the data directory for your convenience and skeleton code for the class is provided in mlp/data_providers.py.

The data is organised in the text file as a table, with the first two columns indexing the year and month of the readings and the following 31 columns giving daily precipitation values for the corresponding month. As not all months have 31 days, some of the entries correspond to non-existent days. These values are indicated by a non-physical value of -99.9.

You should read all of the data from the file (np.loadtxt may be useful for this) and then filter out the -99.9 values and collapse the table to a one-dimensional array corresponding to a sequence of daily measurements for the whole period data is available for. NumPy's boolean indexing feature could be helpful here.
A common initial preprocessing step in machine learning tasks is to normalise data so that it has zero mean and a standard deviation of one. Normalise the data sequence so that its overall mean is zero and standard deviation one.
Each data point in the data provider should correspond to a window of this contiguous data sequence, with the window length specified in the __init__ method as window_size. The model inputs are the first window_size - 1 elements of the window and the target output is the last element of the window. For example, if the original data sequence was [1, 2, 3, 4, 5, 6] and window_size=3 then the input, target pairs iterated over by the data provider should be

[1, 2], 3
[4, 5], 6

Extension: Have the data provider instead use overlapping windows of the sequence so that more training data instances are produced. For example, for the sequence [1, 2, 3, 4, 5, 6] the corresponding input, target pairs would be

[1, 2], 3
[2, 3], 4
[3, 4], 5
[4, 5], 6

Test your code by running the cell below.

In [6]:

batch_size = 3
for window_size in [2, 5, 10]:
    met_dp = data_providers.MetOfficeDataProvider(
        window_size=window_size, batch_size=batch_size,
        max_num_batches=1, shuffle_order=False)
    fig = plt.figure(figsize=(6, 3))
    ax = fig.add_subplot(111)
    ax.set_title('Window size {0}'.format(window_size))
    ax.set_xlabel('Day in window')
    ax.set_ylabel('Normalised reading')
    # iterate over data provider batches checking size and plotting
    for inputs, targets in met_dp:
        assert inputs.shape == (batch_size, window_size - 1)
        assert targets.shape == (batch_size, )
        ax.plot(np.c_[inputs, targets].T, '.-')
        ax.plot([window_size - 1] * batch_size, targets, 'ko')
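The filtering, normalising and windowing steps of Exercise 3 can be sketched on a tiny synthetic table. Everything here is an assumption made for illustration: the table has 3 daily columns rather than 31, the values are invented so that the filtered sequence is [1, 2, 3, 4, 5, 6], and the code is a standalone script rather than the MetOfficeDataProvider class. The windows are built from the unnormalised sequence so the result can be compared with the example pairs in the exercise text.

```python
import numpy as np

# synthetic table: year, month, then daily values (-99.9 marks missing days)
table = np.array([
    [2000., 1., 1., 2., 3.],
    [2000., 2., 4., 5., -99.9],
    [2000., 3., 6., -99.9, -99.9],
])
# drop the year and month columns and flatten row by row
daily = table[:, 2:].flatten()
# boolean indexing: keep only physical (non-negative) precipitation values
daily = daily[daily >= 0.]

# normalise to zero mean and unit standard deviation
normalised = (daily - daily.mean()) / daily.std()

window_size = 3
# non-overlapping windows: trim any remainder, then reshape
num_windows = daily.shape[0] // window_size
windows = daily[:num_windows * window_size].reshape(
    (num_windows, window_size))
inputs, targets = windows[:, :-1], windows[:, -1]

# overlapping windows (the extension): one window starting at each position
overlapping = np.array([daily[i:i + window_size]
                        for i in range(daily.shape[0] - window_size + 1)])
over_inputs, over_targets = overlapping[:, :-1], overlapping[:, -1]
print(inputs, targets)
print(over_inputs, over_targets)
```

In a real data provider the windows would be built from the normalised sequence instead of the raw one.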