
Deep Learning
By Majid Babaei

Convolutional Neural Network (CNN)

Overview of Keras
Keras runs on top of open source libraries like TensorFlow, Theano or Cognitive Toolkit (CNTK).
Theano is a Python library used for fast numerical computation.
TensorFlow is the most famous symbolic math library used for creating neural networks and deep learning models.

Architecture of Keras
Keras API can be divided into three main categories
• Model
• Sequential Model − a linear composition of layers
• Functional API
• Layers
• Core Layers
• Convolution Layers
• Pooling Layers
• Recurrent Layers
• Core Modules
• Activations module
• Loss module
• Optimizer module
• Regularizers module

Your First Deep Learning Project in Python with Keras Step-By-Step

Overview
There is not a lot of code required, but we are going to step over it slowly so that you will know how to create your own models in the future.
We take the following steps:
• Load Dataset
• Define Keras Model
• Compile Keras Model
• Fit Keras Model
• Evaluate Keras Model
• Make Predictions

A few requirements
• You have Python 2 or 3 installed
• You have SciPy (including NumPy) installed
• You have Keras and a backend (Theano or TensorFlow) installed

1. Load Dataset

What is our dataset?
• We are going to use the Pima Indians onset of diabetes dataset.
• This is a standard machine learning dataset from the UCI Machine Learning repository
• It describes patient medical record data and whether they had an onset of diabetes within five years.
• It is a binary classification problem
• All of the input variables that describe each patient are numerical
• This makes it easy to use directly with neural networks that expect numerical input and output values
• The dataset is available from here:
• Dataset CSV File (pima-indians-diabetes.csv)
• Dataset Details

Load Libraries
• Download the dataset and place it in your local working directory, the same location as your python file.
• The first step is to define the functions and classes we intend to use
• We will use the NumPy library to load our dataset and we will use two classes from the Keras library to define our model.
from keras.models import Sequential
from keras.layers import Activation, Dense
from numpy import loadtxt

1. Load Dataset
• We can split the array into two arrays by selecting subsets of columns using the standard NumPy slice operator.
• We can select the first 8 columns from index 0 to index 7 via dataset[:,0:8].
• We can then select the output column (the 9th variable) via dataset[:,8].
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
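As a quick sanity check (a small sketch; it assumes the CSV was saved in the working directory as described above), you can print the shapes of the two arrays:

# the Pima dataset has 768 rows: 8 input features and 1 output label per row
print(X.shape)  # expected: (768, 8)
print(y.shape)  # expected: (768,)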

• If this is new to you, you can learn more about array slicing and ranges in this link: How to Index, Slice and Reshape NumPy Arrays for Machine Learning in Python

2. Define Keras Model

2. Define Keras Model
• Models in Keras are defined as a sequence of layers.
• We create a Sequential model and add layers one at a time until we
are happy with our network architecture.
• The first thing to get right is to ensure the input layer has the right number of input features.
• This can be specified when creating the first layer with the input_dim argument and setting it to 8.
Why 8?!

• This is a very hard question!
• There are heuristics that we can use and often the best network structure is found through a process of trial and error experimentation.
• Generally, you need a network large enough to capture the structure of the problem.
How do we know the number of layers?

2. Define Keras Model
• In this example, we will use a fully-connected network structure with three layers.
• Fully connected layers are defined using the Dense class.
• We can specify the number of neurons or nodes in the layer as the first argument
• We can specify the activation function using the activation argument
• In this example, we will use the rectified linear unit activation function referred to as ReLU on the first two layers and the Sigmoid function in the output layer.

2. Define Keras Model
• It used to be the case that Sigmoid and Tanh activation functions were preferred for all layers. These days, better performance is achieved using the ReLU activation function.
• We use Sigmoid on the output layer to ensure our network output is between 0 and 1.

2. Define Keras Model
• We can piece it all together by adding each layer:
• The model expects rows of data with 8 variables (input_dim=8 )
• The first hidden layer has 12 nodes and uses the relu activation function.
• The second hidden layer has 8 nodes and uses the relu activation function.
• The output layer has one node and uses the sigmoid activation function.
# define the keras model
model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Note!
• The most confusing thing here is that the shape of the input to the model is defined as an argument on the first hidden layer.
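To double-check the layer shapes and parameter counts of the model just defined, a quick sketch (it assumes the model object created above):

# print one row per layer: output shape and number of trainable parameters
model.summary()
# expected parameter counts: 8*12+12 = 108, 12*8+8 = 104, 8*1+1 = 9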

3. Compile Keras Model

3. Compile Keras Model
• Now that the model is defined, we can compile it.
• Compiling the model uses efficient numerical libraries under the
covers (the so-called backend) such as Theano or TensorFlow.
• The backend automatically chooses the best way to represent the network for training and making predictions to run on your hardware, such as CPU or GPU or even distributed.

3. Compile Keras Model
• When compiling, we must specify some additional properties required when training the network.
• Remember training a network means finding the best set of weights to map inputs to outputs in our dataset.

What are additional properties we need to set when training the network?

Additional
properties
We must specify the loss function to use to evaluate a set of weights
We must specify the optimizer which is used to search through different weights for the network
Any optional metrics we would like to collect and report during training.

What do we set in our example?!
• We will use "cross entropy" as the loss function.
• This loss function is usually used for binary classification problems.
• It is defined in Keras as "binary_crossentropy".
• We will define the optimizer as the efficient stochastic gradient descent algorithm “adam”.
• This is a popular version of gradient descent
• it automatically tunes itself and gives good results in a wide range of problems.
• Finally, because it is a classification problem, we will collect and report the classification accuracy, defined via the metrics argument.

3. Compile Keras Model : Code
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

4. Fit Keras Model

4. Fit Keras Model
• We have defined our model and compiled it ready for efficient computation.
• Now it is time to execute the model on some data.
• We can train or fit our model on our loaded data by calling the fit() function.
• Training occurs over epochs and each epoch is split into batches.

4. Fit Keras Model: Epoch vs. Batch
• One epoch is comprised of one or more batches, based on the chosen batch size.
• During the training process, the model is fit for many epochs.
What are an Epoch and a Batch?!
• Epoch: one pass through all of the rows in the training dataset.
• Batch: one or more samples considered by the model within an epoch.
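For a concrete feel of these terms (a toy arithmetic sketch, assuming the 768-row Pima dataset and the epoch and batch-size settings used later in this example):

rows = 768         # examples in the Pima dataset
batch_size = 10    # samples processed before each weight update
epochs = 150       # full passes through the dataset

batches_per_epoch = -(-rows // batch_size)    # ceiling division: 77 batches per epoch
total_updates = batches_per_epoch * epochs    # 77 * 150 = 11,550 weight updates in total
print(batches_per_epoch, total_updates)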

4. Fit Keras Model: arguments
• The training process will run for a fixed number of iterations through the dataset
called epochs
• We can specify that using the epochs argument.
• We can also set the number of dataset rows that are considered within each epoch, called the batch size and set using the batch_size argument.

• These configurations can be chosen experimentally by trial and error.
• We train the model, so that it learns a good mapping of rows of input data to the output classification.
• The model will always have some error, but the amount of error will level out after some point for a given model configuration.
• This is called model convergence
How do we know the best values for epoch and batch size?

What do we set in our example?!
• For this problem, we will run for a small number of epochs (150) and use a relatively small batch size of 10.
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
This work happens on your CPU or GPU.

5. Evaluate Keras Model

5. Evaluate Keras Model
• We have trained our neural network on the entire dataset and we can evaluate the performance of the network on the same dataset.
• This will only give us an idea of how well we have modeled the dataset (e.g. train accuracy)
• But no idea of how well the algorithm might perform on new data.
We have done this for simplicity, but ideally, you could separate your data into train and test datasets for training and evaluation of your model.

5. Evaluate Keras Model
• We use the evaluate() function which returns two values.
• The first will be the loss of the model on the dataset
• The second will be the accuracy of the model on the dataset.
• We are only interested in reporting the accuracy, so we will ignore the loss value.

5. Evaluate Keras Model : Code
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

Let’s Tie It All Together

Tie It All Together
• You have just seen how you can easily create your first neural network model in Keras.
• Let’s tie it all together into a complete code example.
You can then run the Python file, keras_first_network.py

Running this example
• You should see a message for each of the 150 epochs printing the loss and accuracy, followed by the final evaluation of the trained model on the training dataset.
• Note, the accuracy of your model will vary.
• It takes about 10 seconds to execute on my workstation running on the CPU.
• Ideally, we would like the loss to go to zero and accuracy to go to 1.0.

6. Make Predictions

6. Make Predictions
• The number one question I get asked is:
“After I train my model, how can I use it to make predictions on new data?”

6. Make Predictions
• We can adapt this example and use it to generate predictions
• Making predictions is as easy as calling the predict() function on the model.
• We are using a sigmoid activation function on the output layer, so the predictions will be a probability in the range between 0 and 1.
• We can easily convert them into a binary prediction for this classification task by rounding them.

6. Make Predictions
• For example:
# make probability predictions with the model
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
• Alternately, we can call the predict_classes() function on the model to predict crisp classes directly, for example:
# make class predictions with the model
predictions = model.predict_classes(X)
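Note that predict_classes() has been removed in recent versions of Keras/TensorFlow. If your installation no longer has it, an equivalent (a sketch assuming the single sigmoid output used here) is to threshold the predicted probabilities yourself:

# make class predictions by thresholding the sigmoid probabilities at 0.5
predictions = (model.predict(X) > 0.5).astype(int)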

Run the code
Please open “keras_first_network_with_predictions.py” and run the script

Results
The input rows and predicted class values for the first 5 examples are printed and compared to the expected class values.
In fact, we would expect about 76.9% of the rows to be correctly predicted based on our estimated performance

2. Compilation

Step2: Compilation

Step2: Compilation
Before training a model, you need to configure the learning process, which is done via the compile method.
• It receives three arguments: an optimizer, a loss function, and a list of metrics.
• An optimizer. This could be the string identifier of an existing optimizer:
• optimizer='rmsprop' OR optimizer='adagrad' OR etc.
• A loss function. This is the objective that the model will try to minimize:
• loss='binary_crossentropy' OR loss='categorical_crossentropy' OR loss='mse' OR etc.
• A list of metrics. As with any classification problem, you need a metric to evaluate the model. This could be the string identifier of an existing metric or a custom metric function:
• metrics=['accuracy'] OR etc.
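Putting the three arguments together (a minimal sketch; the particular optimizer, loss, and metric below are just example choices, not a recommendation):

model.compile(optimizer='rmsprop',            # how the weights are searched/updated
              loss='binary_crossentropy',     # the objective the model minimizes
              metrics=['accuracy'])           # extra metrics reported during training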

Step2: Compilation : What are Optimization Algorithms?
For a useful mental model, you can think of a hiker trying to get down a mountain with a blindfold on.
It's impossible to know which direction to go in, but there's one thing they can know:
whether they are going down (making progress) or going up (losing progress).

Step2: Compilation : What are Optimization Algorithms?
Eventually, if they keep taking steps that lead them downwards, they will reach the base.

Step2: Compilation : What are Optimization Algorithms?
Similarly, it's impossible to know what your model's weights should be right from the start.
But with some trial and error guided by the loss function, you can get there eventually.

Step2: Compilation – An Optimizer
• Optimization algorithms help us minimize the loss function.
• The weights (w) and the bias (b) values of the neural network are its learnable parameters, which are used in computing the output values.
• They are updated in the direction of the optimal solution, minimizing the loss, by the network's training process.
We use various optimization algorithms to calculate optimum values for the weights (w) and the bias (b).

An Optimizer : Gradient Descent

Step2: Compilation – An Optimizer : Gradient Descent
This algorithm is used across all types of Machine Learning for optimization. It's fast, robust, and flexible.
Gradients represent what a small change in a weight or parameter would do to the loss function
The gradient of a function at a point represents its slope at the point.

Step2: Compilation – An Optimizer : Gradient Descent
Here's how Gradient Descent works:
• #1: It calculates what a small change in each individual weight would do to the loss function (i.e. which direction should the hiker walk in)
• #2: It adjusts each individual weight based on its gradient (i.e. take a small step in the determined direction)
Keep doing steps #1 and #2 until the loss function gets as low as possible
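As a toy illustration of steps #1 and #2 (a sketch only, not how Keras implements it: it minimizes a one-parameter quadratic loss with a fixed learning rate):

# gradient descent on loss(w) = (w - 3)^2, whose minimum is at w = 3
w = 0.0
learning_rate = 0.1
for step in range(50):
    gradient = 2 * (w - 3)               # step #1: slope of the loss at the current w
    w = w - learning_rate * gradient     # step #2: small step in the downhill direction
print(round(w, 4))                       # close to 3.0 after 50 steps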

Local Minima
◎One situation that you might experience in optimization is getting stuck on local minima
◎ When dealing with high dimensional data sets (lots of variables) it’s possible you’ll find an area where it seems like you’ve reached the lowest possible value for your loss function, but it’s really just a local minimum.
In the hiker example, this is like finding a small valley within the mountain you're climbing down. It requires climbing up!
How to avoid getting stuck on local minima?

Step2: Compilation – An Optimizer : Gradient Descent
◎To avoid getting stuck in local minima in deep learning, we need to make sure that we are using the proper learning rate.
What Is the Learning Rate?

Learning Rate
• The amount that the weights are updated during training is referred to as the step size or the “learning rate.”
• The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between 0.0 and 1.0.
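In Keras, the learning rate is set when constructing the optimizer object instead of passing the optimizer as a string (a sketch; 0.01 is just an example value, and very old Keras versions name the argument lr rather than learning_rate):

from keras.optimizers import Adam

# pass an optimizer instance so the learning rate can be controlled explicitly
model.compile(loss='binary_crossentropy',
              optimizer=Adam(learning_rate=0.01),
              metrics=['accuracy'])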

Effect of Learning Rate
• It controls the rate or speed at which the model learns.
• When the learning rate is too large, gradient descent can inadvertently increase rather than decrease the training error.
• When the learning rate is too small, training is not only slower, but may become permanently stuck with a high training error.

High vs. Low learning rate

How to Configure Learning Rate
◎The learning rate may, in fact, be the most important hyperparameter to configure for your model.
Unfortunately, we cannot analytically calculate the optimal learning rate for a given model on a given dataset. Instead, a good (or good enough) learning rate must be discovered via trial and error.

But! Some tips
◎A traditional default value for the learning rate is 0.1 or 0.01, and this may represent a good starting point on your problem.
◎Diagnostic plots can be used to investigate how the learning rate impacts the rate of learning and learning dynamics of the model.
◎One example is to create a line plot of loss over training epochs during training, as in the sketch below. How can we interpret such a plot?!
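A minimal sketch of producing such a plot (it assumes matplotlib is installed and reuses the fit() call from earlier, which returns a History object recording the loss after every epoch):

import matplotlib.pyplot as plt

history = model.fit(X, y, epochs=150, batch_size=10)

plt.plot(history.history['loss'])   # training loss recorded at the end of each epoch
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training loss over epochs')
plt.show()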

What does a learning curve in ML look like?
Generally, a learning curve is a plot that shows time or experience on the x-axis and learning or improvement on the y-axis.

How to use Learning Curves to Diagnose Machine Learning Model Performance

Using Learning Curves
• Learning curves are widely used as a diagnostic tool in machine learning for algorithms that learn from a training dataset incrementally.
• They can be used to diagnose problems with learning, such as:
• An underfit model
• An overfit model
• Whether the training and validation datasets are suitably representative

We will discover learning curves and how they can be used to diagnose the learning and generalization behavior of machine learning models.

The metric used to evaluate learning
It could be maximizing, meaning that better scores (larger numbers) indicate more learning. An example would be classification accuracy.
It could be minimizing, such as loss or error whereby better scores (smaller numbers) indicate more learning and a value of 0.0 indicates that the training dataset was learned perfectly and no mistakes were made.

Keep this in mind
During the training of a machine learning model, the current state of the model at each step can be evaluated.

• Evaluation on the training dataset gives an idea of how well the model is "learning."
• Evaluation on the test dataset gives an idea of how well the model is "generalizing."

It is common to create dual learning curves for a machine learning model during training.
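A sketch of how such dual curves can be produced in Keras (assuming matplotlib is available; validation_split=0.33 is an arbitrary example holdout fraction):

import matplotlib.pyplot as plt

# hold out 33% of the rows as a validation set while fitting
history = model.fit(X, y, validation_split=0.33, epochs=150, batch_size=10)

plt.plot(history.history['loss'], label='train loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.legend()
plt.show()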

In some cases, it is also common to create learning curves for multiple metrics; for example:
• The model may be optimized according to loss scores.
• The model's performance may be evaluated using classification accuracy.

Now that we are familiar with the use of learning curves, let's look at some common shapes observed in learning curve plots.

The shape and dynamics of a learning curve are important!
• They can be used to diagnose the behavior of a machine learning model.
• There are three common dynamics that you are likely to observe in learning curves:
• Underfit
• Overfit
• Good Fit
We will take a closer look at each with examples!

Underfit Learning Curves
• Underfitting refers to a model that cannot learn the training dataset.
• A plot of learning curves shows underfitting if:
• The training loss remains flat regardless of training.
• The training loss continues to decrease until the end of training.

Underfit Learning Curves
The model does not have a suitable capacity for the complexity of the dataset.

Underfit Learning Curves
Training loss is decreasing and continues to decrease at the end of the plot.
This indicates that the model is capable of further learning and possible further improvements

Overfit Learning Curves
• Overfitting refers to a model that has learned the training dataset too well, including the statistical noise or random fluctuations in the training dataset.
• The problem with overfitting is that the more specialized the model becomes to the training data, the less well it is able to generalize to new data.
• A plot of learning curves shows overfitting if:
• The plot of training loss continues to decrease with experience while the plot of validation loss decreases to a point and begins increasing again.

Overfit Learning Curves
The inflection point in validation loss may be the point at which training could be halted as experience after that point shows the dynamics of overfitting.
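In Keras, halting at that point can be automated with the EarlyStopping callback (a sketch; the patience value of 10 epochs is an arbitrary example):

from keras.callbacks import EarlyStopping

# stop when the validation loss has not improved for 10 consecutive epochs
stopper = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
model.fit(X, y, validation_split=0.33, epochs=500, batch_size=10, callbacks=[stopper])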

Good Fit Learning Curves
• A good fit is the goal of the learning algorithm and exists between an overfit and an underfit model.
• The loss of the model will almost always be lower on the training dataset than the validation dataset.
• This means that we should expect some gap between the train and validation loss learning curves. This gap is referred to as the “generalization gap.”

Good Fit Learning Curves
• A plot of learning curves shows a good fit if:
• The plot of training loss decreases to a point of stability while the plot of validation loss decreases to a point of stability and has a small gap with the training loss.

Good Fit Learning Curves
The example plot here demonstrates a case of a good fit.
Continued training of a good fit will likely lead to an overfit.

Diagnosing Unrepresentative Datasets
Learning curves can also be used to diagnose properties of a dataset and whether it is relatively representative.
An unrepresentative dataset means a dataset that may not capture the statistical characteristics relative to another dataset, such as a train and a validation dataset.
This can commonly occur if the number of samples in a dataset is too small, relative to another dataset.

Two common cases
• There are two common cases that could be observed:
• The training dataset is relatively unrepresentative.
• The validation dataset is relatively unrepresentative.

Unrepresentative Train Dataset
• It means that the training dataset does not provide sufficient information to learn the problem.
• This may occur if the training dataset has too few examples as compared to the validation dataset.
• This situation can be identified by a learning curve for training loss that shows improvement and similarly a learning curve for validation loss that shows improvement, but a large gap remains between both curves.

Unrepresentative Train Dataset
Both curves are improving but a large gap remains between both curves

Unrepresentative Validation Dataset
• It means that the validation dataset does not provide sufficient information to evaluate the ability of the model to generalize.
• This may occur if the validation dataset has too few examples as compared to the training dataset.
• This case can be identified by a learning curve for training loss that looks like a good fit (or other fits) and a learning curve for validation loss that shows noisy movements around the training loss.

Unrepresentative Validation Dataset
It shows noisy movements around the training loss

Unrepresentative Validation Dataset
It may also be identified by a validation loss that is lower than the training loss.
It indicates that the validation dataset may be easier for the model to predict than the training dataset.
