1. Writing loops and custom functions
2. Making a variety of plots in Matplotlib
3. Using several built-in NumPy functions
4. Defining functions that manipulate NumPy arrays
Part 1: Random Number Distributions and Properties
In this problem, we will look at distributions of random numbers and make functions that work with random numbers.
Question 1: Make sure you set up your notebook to produce matplotlib plots and import the right modules, as well as NumPy. Do that here.
In [ ]:
Generate an array $x$ that has $n=100$ random numbers that are uniformly distributed over the interval $[0,1)$. Look up how to use the uniform() function of the numpy.random module for this question.
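As a minimal sketch of the call described above (the seed value is an assumption added only so the run is repeatable; it is not part of the assignment):

```python
import numpy as np

# Hypothetical seed, used only to make this sketch reproducible.
np.random.seed(0)

n = 100
# uniform(low, high, size) draws from the half-open interval [low, high).
x = np.random.uniform(0.0, 1.0, n)

print(x.shape, x.min(), x.max())
```

Every value should be at least 0 and strictly less than 1.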
In [ ]:
Question 2: We want to divide the numbers in the interval $[0,1)$ into equally spaced sub-intervals. For example, say we want to take an array of 100 numbers that have values greater than or equal to 0 and less than 1. We might like to divide these numbers into 4 equally spaced half-open sub-intervals: 0 to 0.25; 0.25 to 0.5; 0.5 to 0.75; and 0.75 to 1.
Write a function named partition_array using loops and if statements that takes the array $x$ along with the number $S$ of sub-intervals as inputs and outputs (returns) a new list numentries_subintervals with $S$ entries. Each entry in numentries_subintervals will contain the number of values from x within each of the sub-intervals. The function will also output a second list that contains the mid points of each corresponding sub-interval. Test it on the array generated in Question 1, use $S=5$ subintervals, and print the function outputs to verify the results.
NOTE: You can assume that your input array $x$ will have values in the interval $[0,1)$.
Your results should look something like the following (your number of entries will vary slightly):
The number of entries in the subintervals are [15, 18, 19, 26, 22]
The subinterval mid points of the intervals are [0.1, 0.3, 0.5, 0.7, 0.9]
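One possible shape for such a function is sketched below. It is an illustration under the stated assumption that every value lies in $[0,1)$, not a reference solution. The bin edges are computed as k / S rather than k * (1 / S) so that the last edge is exactly 1.

```python
import numpy as np

def partition_array(x, S):
    """Count how many values of x fall in each of S equal sub-intervals of [0, 1)."""
    numentries_subintervals = [0] * S
    midpoints = [(k + 0.5) / S for k in range(S)]
    for value in x:
        for k in range(S):
            # Half-open sub-interval [k/S, (k+1)/S)
            if k / S <= value < (k + 1) / S:
                numentries_subintervals[k] += 1
                break
    return numentries_subintervals, midpoints

# Small hand-checkable example with S = 4
counts, mids = partition_array(np.array([0.0, 0.1, 0.25, 0.5, 0.99]), 4)
print(counts)  # [2, 1, 1, 1]
print(mids)    # [0.125, 0.375, 0.625, 0.875]
```

The counts should always sum to the length of the input array, which is a quick sanity check for any implementation.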
In [ ]:
Question 3: Now, write code to create a bar plot of the number of entries in the sub-intervals. The x coordinates of the bars are the mid points of the intervals and the heights are the total number of elements of the array $x$ residing in each interval (basically the results from your function above). Look up the function matplotlib.pyplot.bar. Set the widths of the bars to the width of each subinterval.
Repeat the experiments in Questions 1 and 2 and generate the bar plots for $S=20$ and $n=10^{2}$, $n=10^{3}$, and $n=10^{5}$ (three cases). Make sure to label all the x and y axes and give each plot an expressive title that indicates the value of $n$. Matplotlib contains the function subplot() that enables making multiple plots and arranging them together. Plot the three bar plots for the three values of $n$ together in a single row. Use the command plt.tight_layout() to arrange the subplots cleanly.
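The plotting layout described above might be sketched as follows. To keep the block self-contained, np.histogram stands in for the partition_array function from Question 2 (a swapped-in technique, not what the question asks you to use), and plt.subplots is used as one way to create a single row of axes.

```python
import numpy as np
import matplotlib.pyplot as plt

S = 20
edges = np.linspace(0.0, 1.0, S + 1)          # S + 1 equally spaced edges
midpoints = 0.5 * (edges[:-1] + edges[1:])    # center of each sub-interval
width = 1.0 / S                               # bar width = sub-interval width
sample_sizes = [10**2, 10**3, 10**5]

fig, axes = plt.subplots(1, len(sample_sizes), figsize=(12, 3.5))
for ax, n in zip(axes, sample_sizes):
    x = np.random.uniform(0.0, 1.0, n)
    # Stand-in for partition_array: counts per sub-interval
    counts, _ = np.histogram(x, bins=edges)
    ax.bar(midpoints, counts, width=width, edgecolor="black")
    ax.set_xlabel("Sub-interval midpoint")
    ax.set_ylabel("Number of entries")
    ax.set_title(f"Uniform samples, n = {n}")
plt.tight_layout()
```

In a notebook with inline plotting enabled, the figure renders when the cell finishes.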
In [ ]:
Question 4: What do you observe as the value of $n$ increases?
✎Put your answer here
Question 5: Write a function compute_average (from scratch) that computes and returns the average of the values in an array.
Then, generate arrays of different lengths $n$, where $n$ takes the values $[10, 10^{2}, 5\times 10^{2}, 10^{3}, 5 \times 10^{3}, 10^{4}, 5 \times 10^{4}, 10^{5}, 5 \times 10^{5}, 10^{6}, 5 \times 10^{6}]$. The arrays contain random numbers that are uniformly distributed over the interval $[0,1)$. In other words, loop through these values of $n$ and, for each one, make a uniformly distributed array of length $n$ and compute its average. Plot this average value versus $n$ using a log scale for the x-axis. Make sure to label the axes and give the plot a title. What do you observe as $n$ increases? Look up information about the law of large numbers.
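A from-scratch average can be sketched with a single loop, as below. The list of $n$ values is deliberately shortened so the pure-Python loop finishes quickly, and the seed is an assumption for repeatability. Plotting is left out.

```python
import numpy as np

def compute_average(values):
    """Return the arithmetic mean of values using an explicit loop."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total / count  # raises ZeroDivisionError for an empty input

np.random.seed(1)  # hypothetical seed for repeatability
for n in [10, 10**2, 10**3, 10**4, 10**5]:  # shortened list for this sketch
    x = np.random.uniform(0.0, 1.0, n)
    print(n, compute_average(x))
```

The expected value of a uniform distribution on $[0,1)$ is 0.5, which is the reference level to compare the printed averages against.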
In [ ]:
✎Write your observations of the behavior in the plot here
Part 2: Denoising 1D Data
In this problem, we will study methods for denoising simple one-dimensional (1D) signals that have been corrupted by random additive noise.
Question 6: We will consider a piece-wise constant 1D signal in this part. Assume the signal (a function $f(t)$) contains $S=10$ discrete sub-intervals of equal length. Each sub-interval contains 50 samples. This signal is given below in the variable signal acquired at sample locations sampling_locations (also provided below).
Plot the 1D signal twice: once with the regular plot command and once with the stem command (use subplots to place the two plots next to each other in a row). Label all axes and give the plots titles. The x-axis values should be the sample locations in $[0,10)$.
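A side-by-side line and stem plot could be laid out as in the sketch below. The signal here is a stand-in built with np.repeat; its levels are an assumption read off the provided array, and in the notebook you would use the provided signal and sampling_locations variables instead.

```python
import numpy as np
import matplotlib.pyplot as plt

num_samples = 500
# Assumed stand-in: one level per sub-interval, 50 samples each.
levels = [1, 6, 6, 1, 10, 3, 3, 5, 4, 9]
signal = np.repeat(levels, num_samples // len(levels))
sampling_locations = np.arange(0, 10, 10 / num_samples)

fig, (ax_line, ax_stem) = plt.subplots(1, 2, figsize=(10, 3.5))
ax_line.plot(sampling_locations, signal)
ax_line.set_title("Piece-wise constant signal (line plot)")
ax_stem.stem(sampling_locations, signal)
ax_stem.set_title("Piece-wise constant signal (stem plot)")
for ax in (ax_line, ax_stem):
    ax.set_xlabel("Sample location t")
    ax.set_ylabel("f(t)")
plt.tight_layout()
```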
In [ ]:
# Provided variables
num_samples = 500
num_subintervals = 10
signal = np.array([ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,
10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9,
9, 9, 9, 9, 9, 9, 9])
sampling_locations = np.arange(0, 10, 10/num_samples)  # sample locations in [0, 10), spaced 10/num_samples apart
In [ ]:
Question 7: Generate an array of the same length as the 1D signal, whose elements are drawn from a Gaussian (normal) distribution with mean zero and standard deviation $\sigma=3$. Look up the normal() function of the numpy.random module. Example syntax is
mu = 0 # mean
sigma = 3 # standard deviation
noise = np.random.normal(mu, sigma, n) # n is the number of samples to draw
Generate a new signal by adding the piece-wise constant signal from above and your new random Gaussian array (noise). Plot this new signal both as a line and stem plot (use subplot() again) over the sampled points. Label all axes and give the plots titles. What do you observe has happened to the signal?
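The addition step can be sketched as below. The clean signal is again a stand-in whose levels are assumed to mirror the provided array, and the seed is an assumption for repeatability. Plotting is omitted.

```python
import numpy as np

# Assumed stand-in for the provided piece-wise constant signal.
levels = [1, 6, 6, 1, 10, 3, 3, 5, 4, 9]
signal = np.repeat(levels, 50).astype(float)

mu, sigma = 0, 3          # mean and standard deviation of the noise
n = len(signal)           # one noise value per signal sample

np.random.seed(2)         # hypothetical seed for repeatability
noise = np.random.normal(mu, sigma, n)
noisy_signal = signal + noise   # element-wise addition

print(noisy_signal[:5])
print(noise.mean(), noise.std())
```

The sample mean and standard deviation of the noise should land close to 0 and 3, though not exactly on them.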
In [ ]:
Question 8: Think about how you can remove the noise in the signal using the ideas from Questions 4 and 5. Assume you know that the underlying signal is piece-wise constant and that you know the locations of the constant subintervals. Think in terms of averages and the law of large numbers. Describe your algorithm briefly below and explain why it would work.
✎ Put your answer here
Question 9: Write a function named signaldenoiser that takes the noisy 1D array and the number of subintervals as inputs and returns a denoised array. This function should average all the samples in each subinterval and return a denoised signal with each subinterval set to the average value in that subinterval. Test the function on the noisy signal generated in Question 7.
Plot the denoised signal and the original signal in the same plot (use different line colors for the different signals) and label the axes and give the plot a title. Assign plot legends to distinguish the two signals. How does the denoised signal compare to the original signal?
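A sketch of the averaging step is given below, assuming the array length is an exact multiple of the number of subintervals (the function raises an error otherwise). The clean signal and the seed are stand-in assumptions, and plotting is left out.

```python
import numpy as np

def signaldenoiser(noisy, num_subintervals):
    """Replace each equal-length subinterval of noisy by its average value."""
    noisy = np.asarray(noisy, dtype=float)
    if len(noisy) % num_subintervals != 0:
        raise ValueError("length must be a multiple of num_subintervals")
    samples_per_interval = len(noisy) // num_subintervals
    denoised = np.empty_like(noisy)
    for k in range(num_subintervals):
        start = k * samples_per_interval
        stop = start + samples_per_interval
        denoised[start:stop] = noisy[start:stop].mean()
    return denoised

print(signaldenoiser([1, 3, 5, 7], 2))  # [2. 2. 6. 6.]

# Stand-in noisy signal (assumed levels, hypothetical seed)
np.random.seed(4)
signal = np.repeat([1, 6, 6, 1, 10, 3, 3, 5, 4, 9], 50).astype(float)
noisy_signal = signal + np.random.normal(0, 3, len(signal))
denoised = signaldenoiser(noisy_signal, 10)
print(np.abs(denoised - signal).max())
```

The last line prints the largest deviation from the clean signal, which gives a single number to compare across experiments.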
When successful, your results should look something like the following (results may vary slightly):

In [ ]:
Question 10: Now, we would like to explore the impact of having fewer samples in the signal, which also means fewer samples available for denoising. Currently, the signal has 500 samples, with 50 samples for each of the 10 intervals. Take your noisy signal generated in Question 7 above and create a new signal with one half of the sampling rate. In other words, take your noisy signal and remove every other sample. This new signal will have 250 samples.
Prepare a set of plots with two rows and two columns. In the first subplot (top left), plot this new half-rate noisy signal. Then denoise it with your signaldenoiser function from Question 9. Below the noisy signal, make a plot that matches the plot from Question 9. These two plots will form the first column of the set of plots.
Likewise, take your noisy signal generated in Question 7 above and create a new signal with one tenth of the sampling rate. In other words, keep 1 out of every 10 samples. This new signal will have 50 samples. In the second subplot (top right), plot this new one-tenth-rate signal. Then denoise it with your signaldenoiser function from Question 9. Below the noisy signal, make a plot that matches the plot from Question 9. These two plots will form the second column of the set of plots.
Label all axes, and add titles and legends for all plots.
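Reducing the sampling rate can be done with strided slicing, as sketched below. The noisy signal is a stand-in (assumed levels, hypothetical seed). The final loop prints $\sigma/\sqrt{m}$, the standard deviation of an average of $m$ independent noise samples, as a rough guide to how much noise remains after averaging.

```python
import numpy as np

np.random.seed(3)  # hypothetical seed for repeatability
levels = [1, 6, 6, 1, 10, 3, 3, 5, 4, 9]  # assumed stand-in levels
noisy = np.repeat(levels, 50) + np.random.normal(0, 3, 500)

half_rate = noisy[::2]     # keep every other sample: 25 per interval
tenth_rate = noisy[::10]   # keep 1 of every 10 samples: 5 per interval
print(len(half_rate), len(tenth_rate))  # 250 50

sigma = 3
for m in (50, 25, 5):
    # Standard deviation of the mean of m independent samples
    print(m, sigma / np.sqrt(m))
```

Because each interval holds 50 samples and both strides divide 50 evenly, every interval stays contiguous after slicing, so the same number of subintervals (10) can be passed to the denoising function.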
When successful, your results should look something like the following (results may vary slightly):

In [1]:
Question 11: Which denoised signal is most accurate: the one with 50 samples per interval, 25 samples per interval, or 5 samples per interval? What do you observe about the denoising performance as $n$ decreases?
✎ Write your answer here
In [ ]: