Before you start working on the exercise¶
Use a Python version from 3.7 up to 3.9. Make sure not to use Python 3.10.
It is highly recommended to create a virtual environment for this course. You can find resources on how to create a virtual environment on the ISIS page of the course.
Make sure that no assertions fail or exceptions occur, otherwise points will be subtracted.
Use all the variables given to a function unless explicitly stated otherwise. If you are not using a variable, you are doing something wrong.
Read the whole task description before starting with your solution.
After you submit the notebook more tests will be run on your code. The fact that no assertions fail on your computer locally does not guarantee that you completed the exercise correctly.
Please submit only the notebook file with its original name. If you do not submit an .ipynb file, you will fail the exercise.
Edit only between YOUR CODE HERE and END YOUR CODE.
Verify that no syntax errors are present in the file.
Before uploading your submission, make sure everything runs as expected. First, restart the kernel (in the menubar, select Kernel → Restart) and then run all cells (in the menubar, select Cell → Run All).
import sys
if (3, 7) <= sys.version_info[:2] <= (3, 9):
    print("Correct Python version")
else:
    print(f"You are using a wrong version of Python: {'.'.join(map(str, sys.version_info[:3]))}")
Sheet 5: Rounding, Overflow, Linear Algebra¶
In this exercise sheet, we look at various sources of numerical overflow when executing Python and numpy code for large input values, and how to handle them efficiently, for example by using numpy special functions. There are other packages (e.g. Decimal) that can handle arbitrary precision, but they are very slow, so we tend not to use them.
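For instance (a short side demonstration, independent of the exercise code below): np.log1p and np.logaddexp compute log(1 + x) and log(exp(a) + exp(b)) in numerically safe ways.
import numpy as np

print(np.log(1 + 1e-20))             # 0.0: the tiny term is absorbed by the 1
print(np.log1p(1e-20))               # 1e-20, accurate for small x
print(np.logaddexp(1000.0, 1000.0))  # ~1000.6931, even though exp(1000) overflows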
import utils
import numpy as np
import itertools
import unittest
from typing import Union
from minified import max_allowed_loops, no_imports
t = unittest.TestCase()
Building a robust "softplus" nonlinear function (30 P)¶
The softplus function is defined as:
$$\mathrm{softplus}(x) = \log(1+\exp(x)).$$

It intervenes as an elementary computation in certain machine learning models such as neural networks. Plotting it gives the following curve,
where the function tends to zero for very negative input values and tends to the identity for very positive input values.
def softplus(z):
    return np.log(1 + np.exp(z))
We consider an input vector from the module utils containing varying values between 1 and 10000. We would like to apply the softplus function to all of its elements in an element-wise manner.
X = utils.softplus_inputs
We choose these large values in order to test whether the behavior of the function is correct in all regimes of the function, in particular, for very small or very large values. The code below applies the softplus function directly to the vector of inputs and then prints for all cases the input and the corresponding function output:
Y = softplus(X)
for x, y in zip(X, Y):
print(f"softplus({x:11.4f}) = {y:11.4f}")
For large input values, the softplus function returns inf whereas analysis of that function tells us that it should compute the identity. Let's now try to apply the softplus function one element at a time, to see whether the problem comes from numpy arrays:
for x in X:
    y = softplus(x)
    print(f"softplus({x:11.4f}) = {y:11.4f}")
Unfortunately, the result is the same. We observe that the function stops working somewhere between 700 and 800, even though the input was given in high precision float64.
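The cutoff is not arbitrary (a short aside, assuming standard IEEE 754 float64): the largest representable float64 is about 1.8e308, and exp(x) exceeds it once x passes log(1.8e308) ≈ 709.78.
print(np.exp(709.0))  # about 8.2e307, still finite
print(np.exp(710.0))  # inf (numpy also emits an overflow RuntimeWarning)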
Create an alternative function softplus_robust that applies to input scalars (int, float, etc.) and that correctly handles values that can be much larger than 1000 (e.g. billions or more). Your function can be written in Python directly and does not need numpy parallelization.
@no_imports
def softplus_robust(
    x: Union[float, np.float32, np.float64]
) -> Union[float, np.float32, np.float64]:
    """
    Numerically stable implementation of the softplus function. Will never
    overflow to infinity if the input is finite.

    Args:
        x (Union[float, np.float32, np.float64]): The number of which we
            want to calculate the softplus value
    Returns:
        Union[float, np.float32, np.float64]: softplus(x)
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # END YOUR CODE
# Verify your function
y_scalar = [softplus_robust(x) for x in X]
for x, y in zip(X, y_scalar):
    print("softplus(%11.4f) = %11.4f" % (x, y))
# the elements can be any of the three
t.assertIsInstance(y_scalar[0], (float, np.float32, np.float64))
t.assertAlmostEqual(softplus_robust(100000000), 100000000)
# This cell is for grading. Do not delete it
As we have seen in previous exercise sheets, the problem of functions that apply to scalars only is that they are less efficient than functions that apply to vectors directly. Therefore, we would like to handle the rounding issue directly at the vector level.
Create a new softplus function that applies to vectors and that has the desired behavior for large input values. Your function should be fast for large input vectors. This means that you cannot use loops (including list/dict/set comprehensions), nor the functions map, np.vectorize/np.fromiter/np.apply_along_axis, in order to solve this exercise.
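As a hint for the general pattern (a minimal sketch, not the exercise solution): numpy can evaluate piecewise definitions without loops by computing both branches element-wise and selecting entries with np.where.
# Example: evaluate f(x) = x if x >= 0 else -x (i.e. abs) without a loop
v = np.array([-2.0, -0.5, 0.0, 3.0])
result = np.where(v >= 0, v, -v)  # both branches are evaluated element-wise
print(result)                     # [2.  0.5 0.  3. ]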
@max_allowed_loops(0)
@no_imports
def softplus_robust_vec(X: "vector-like"):
    """
    Vectorized version of the numerically robust softplus function

    Args:
        X (vector-like): A vector (ndim=1) of values on which we want to
            apply the softplus function. It is not always a np.ndarray.
    Returns:
        np.ndarray: A vector (ndim=1) where ret[i] == softplus_robust(X[i])
    """
    # these are wrong!!!
    # return np.array([softplus_robust(x) for x in X])
    # return np.array(list(map(softplus_robust, X)))
    # return np.vectorize(softplus_robust)(X)
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # END YOUR CODE
# Verify your function
Y = softplus_robust_vec(X)
t.assertIsInstance(Y, np.ndarray)
t.assertEqual(Y.dtype, np.float64)
for tup in zip(X, Y):
    print("softplus(%11.4f) = %11.4f" % tup)
This is just a demonstration.
As long as your vectorized function is consistently faster than the
loop implementation, your solution should pass.
There are no concrete numbers about the speed-up, but as a reference
on our machines the vectorized code needs < 1ms and the one using a loop
requires > 20ms
RAND_INPUT = np.random.rand(10000)
print("Vectorized function needs…")
%timeit -r2 -n5 softplus_robust_vec(RAND_INPUT)
def vectorize_with_loop(X):
    # This is a wrong implementation
    return np.array([softplus_robust(x) for x in X])
Y_loop = vectorize_with_loop(X)
np.testing.assert_allclose(Y, Y_loop)
print("Python loops need…")
%timeit -r2 -n5 vectorize_with_loop(RAND_INPUT)
Computing a partition function (40 P)¶
In this exercise we want to explore a practical example where numerical stability can affect the results of our experiments.
We consider a discrete probability distribution of type
$$p(\boldsymbol{x};\boldsymbol{w}) = \frac{1}{Z(\boldsymbol{w})} \exp(\boldsymbol{x}^\top \boldsymbol{w})$$
where $\boldsymbol{x} \in \{-1,1\}^{10}$ is an observation, and $\boldsymbol{w} \in \mathbb{R}^{10}$ is a vector of parameters. The term $Z(\boldsymbol{w})$ is called the partition function and is chosen such that the probability distribution sums to 1. That is, the equation:
$$\sum_{\boldsymbol{x} \in \{-1,1\}^{10}} p(\boldsymbol{x};\boldsymbol{w}) = 1$$
must be satisfied. Below is a simple function that computes the log of the partition function $Z(\boldsymbol{w})$ for various choices of parameter vectors. The considered parameters (w_small, w_medium, and w_big) are increasingly large (and thus problematic), and can be found in the file utils.py.
Note: You don’t need an intuitive understanding of the above formulas. It is enough that you are able to implement them with Python/NumPy code.
def generate_all_observations() -> np.ndarray:
    """
    All x in {-1, 1}^10 (vectors with 10 elements where each element
    contains either -1 or 1)

    Returns:
        np.ndarray: All valid observations
    """
    return np.array(tuple(itertools.product([-1.0, 1.0], repeat=10)))
def calc_logZ(w: np.ndarray) -> float:
    """
    Calculates the log of the partition function Z

    Args:
        w (np.ndarray): A ten element vector (shape=(10,)) of parameters
    Returns:
        float: The log of the partition function Z
    """
    Z = np.sum(np.exp(generate_all_observations() @ w))
    return np.log(Z)
print(f"observations shape: {generate_all_observations().shape}")
print([round(w, 2) for w in utils.w_small], f"{calc_logZ(utils.w_small):.4f}", sep="\n", end="\n\n")
print([round(w, 2) for w in utils.w_medium], f"{calc_logZ(utils.w_medium):.4f}", sep="\n", end="\n\n")
print([round(w, 2) for w in utils.w_big], f"{calc_logZ(utils.w_big):.4f}", sep="\n", end="\n\n")
We can observe from these results that for parameter vectors with large values (e.g. utils.w_big), the exponential function overflows, and thus we do not obtain a correct value for the logarithm of Z.
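A standard remedy, used widely in machine learning code, is the log-sum-exp identity: shifting every exponent by the maximum makes all exponents non-positive, so np.exp can no longer overflow, and the shift is added back outside the logarithm:
$$\log \sum_{i} \exp(a_i) = \max_j a_j + \log \sum_{i} \exp\big(a_i - \max_j a_j\big)$$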
Implement an improved function calc_logZ_robust that avoids the overflow problem, and evaluates the partition function for the same parameters.
Verify that the finite outputs of your function are the same as the outputs of the naive implementation, and that for all test inputs your function computes finite values.
@no_imports
@max_allowed_loops(0)
def calc_logZ_robust(w: np.ndarray) -> np.ndarray:
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # END YOUR CODE
# Verify your function
logZ_small = calc_logZ_robust(utils.w_small)
logZ_medium = calc_logZ_robust(utils.w_medium)
logZ_big = calc_logZ_robust(utils.w_big)
print(f"{logZ_small:.4f}")
print(f"{logZ_medium:.4f}")
print(f"{logZ_big:.4f}")
t.assertAlmostEqual(logZ_small, calc_logZ(utils.w_small))
t.assertAlmostEqual(logZ_medium, calc_logZ(utils.w_medium))
R = calc_logZ_robust(utils.w_big)
t.assertTrue(np.isfinite(R))
t.assertTrue(24919 < R < 24920)
# This cell is for grading. Do not delete it
Evaluate the log-probability of the binary vectors generated by generate_all_observations, and return a np.ndarray of the indices (starting from 0) of those that have a probability greater than or equal to 0.001.
By calculating the log probability we "squish" the range of the function, making the function much more numerically stable.
Keep in mind that logarithmic functions are monotonically increasing. This property is very useful when we want to compare values but do not care about the absolute values themselves.
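For example (a small sketch of the idea with made-up numbers, not part of the exercise code): a probability threshold can be applied entirely in log space, since p >= tol holds exactly when log(p) >= log(tol).
logp_values = np.array([-2.0, -8.0, -6.5])  # hypothetical log-probabilities
tol = 0.001
mask = logp_values >= np.log(tol)           # np.log(0.001) is about -6.9078
print(np.flatnonzero(mask))                 # [0 2]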
@no_imports
@max_allowed_loops(0)
def important_indexes(w: np.ndarray, tol: float = 0.001) -> np.ndarray:
    """
    Calculates the indexes of important binary vectors for the
    parameter vector `w`. Important vectors are defined as those
    that have a probability greater than or equal to `tol`. The
    indices are returned in sorted order.

    Args:
        w (np.ndarray): The parameter vector of the partition function
        tol (float): The probability threshold
    Returns:
        np.ndarray: The indexes where the probability is greater than or equal to `tol`
    """
    logZ = calc_logZ_robust(w)
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # END YOUR CODE
# Verify your function
imp_idxs = important_indexes(utils.w_big)
print(f"important indexes -> {imp_idxs}")
t.assertEqual(len(imp_idxs), 24)
t.assertEqual(imp_idxs.dtype, int)
t.assertEqual(imp_idxs[0], 81)
t.assertEqual(imp_idxs[-1], 983)
Probability of generating data from a Gaussian model (30 P)¶
In this example we will explore a problem that has many sources of potential numerical instability. Our goal is to eliminate some of them so as to create a more stable function.
Consider a multivariate Gaussian distribution of mean vector m and covariance S. The probability associated to a vector x is given by:
$$p(\boldsymbol{x};(\boldsymbol{m},S)) = \frac{1}{\sqrt{(2\pi)^d \mathrm{det}(S)}} \exp \Big( - \frac12 (\boldsymbol{x}-\boldsymbol{m})^\top S^{-1} (\boldsymbol{x}-\boldsymbol{m})\Big)$$

We consider the calculation of the probability of observing a certain dataset

$$\mathcal{D} = (\boldsymbol{x}^{(1)},\dots,\boldsymbol{x}^{(N)})$$

assuming the data is generated according to a Gaussian distribution of fixed parameters $\boldsymbol{m}$ and $S$. Such probability density is given by the formula:

$$\log P(\mathcal{D};(\boldsymbol{m},S)) = \log \prod_{i=1}^N p(\boldsymbol{x}^{(i)};(\boldsymbol{m},S))$$

The function below implements this computation:
def logp(X, m, S):
    # Find the number of dimensions from the data vector
    d = X.shape[1]

    # Invert the covariance matrix
    Sinv = np.linalg.inv(S)

    # Compute the quadratic terms for all data points
    Q = -0.5 * (np.dot(X - m, Sinv) * (X - m)).sum(axis=1)

    # Exponentiate the quadratic terms
    Q = np.exp(Q)

    # Divide by the terms in the denominator
    P = Q / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

    # Take the product of the probabilities of all data points
    Pprod = np.prod(P)

    # Return the log-probability
    return np.log(Pprod)
Evaluation of this function for various datasets and parameters provided in the file utils.py gives the following probabilities:
try:
    print(f"{logp(utils.X1, utils.m1, utils.S1):.4f}")
    print(f"{logp(utils.X2, utils.m2, utils.S2):.4f}")
    print(f"{logp(utils.X3, utils.m3, utils.S3):.4f}")
except OverflowError as error:
    print("Exception happened when calculating")
    print(error)
This function is numerically unstable for multiple reasons. The product of many probabilities, the inversion of a large covariance matrix, and the computation of its determinant, are all potential causes for overflow. Thus, we would like to find a numerically robust way of performing each of these.
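Two numpy routines are worth knowing here (a brief illustration of the tools, not a complete solution): np.linalg.slogdet returns the sign and logarithm of the determinant directly, and np.linalg.solve applies $S^{-1}$ to a vector without forming the inverse explicitly.
# det(A) = 10**500 overflows float64, but its logarithm is easily representable
A = np.diag(np.full(500, 10.0))
print(np.linalg.det(A))             # inf
sign, logdet = np.linalg.slogdet(A)
print(sign, logdet)                 # 1.0, about 1151.29 (= 500 * log(10))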
Implement a numerically stable version of the function logp
Evaluate it on the same datasets and parameters as the function logp
@no_imports
@max_allowed_loops(0)
def logp_robust(X: np.ndarray, m: np.ndarray, S: np.ndarray):
    """
    Numerically robust implementation of the `logp` function

    Returns:
        float: The log probability density
    """
    # YOUR CODE HERE
    raise NotImplementedError("Replace this line with your code")
    # END YOUR CODE
# Verify your function
logp1 = logp_robust(utils.X1, utils.m1, utils.S1)
logp2 = logp_robust(utils.X2, utils.m2, utils.S2)
logp3 = logp_robust(utils.X3, utils.m3, utils.S3)
print(f"{logp1:.4f}")
print(f"{logp2:.4f}")
print(f"{logp3:.4f}")
outputs = np.array((logp1, logp2, logp3))
t.assertTrue(np.isfinite(outputs).all())
t.assertTrue(np.all(outputs < 0))
print("\nall log propabilities values below zero 😀")
t.assertTrue(-25 < logp1 < -24)
t.assertTrue(-2976 < logp2 < -2975)
t.assertTrue(-266011 < logp3 < -266010)
t.assertAlmostEqual(logp(utils.X1, utils.m1, utils.S1), logp1)