Deep Learning and Text Analytics
References:
• General introduction
▪ http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
• Word vector:
▪ https://code.google.com/archive/p/word2vec/
• Keras tutorial
▪ https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
• CNN
▪ http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
1. Agenda
• Introduction to neural networks
• Word/Document Vectors (vector representation of words/phrases/paragraphs)
• Convolutional neural network (CNN)
• Application of CNN in text classification
2. Introduction to Neural Networks
• A neural network is a computational model inspired by the way biological neural networks in the human brain process information.
• Neural networks have been widely applied in speech recognition, computer vision and text processing
2.1. Single Neuron

$$h_{W,b}(x)=f(w_1x_1+w_2x_2+w_3x_3+b)$$
• Basic components:
▪ input ($X$): $[x_1, x_2, x_3]$
▪ weight ($W$): $[w_1, w_2, w_3]$
▪ bias: $b$
▪ activation function: $f$
• Different activation functions:
▪ Sigmoid (logistic function): takes a real-valued input and squashes it to range [0,1]. $$f(z)=\frac{1}{1+e^{-z}}$$, where $z=w_1x_1+w_2x_2+w_3x_3+b$
▪ Tanh (hyperbolic tangent): takes a real-valued input and squashes it to the range [-1, 1]. $$f(z)=tanh(z)=\frac{e^z-e^{-z}}{e^z+e^{-z}}$$
▪ ReLU (Rectified Linear Unit): $$f(z)=max(0,z)$$
▪ Softmax (normalized exponential function): a generalization of the logistic function. If $z=[z_1, z_2, …, z_k]$ is a $k$-dimensional vector, $$f(z)_{j}=\frac{e^{z_j}}{\sum_{i=1}^k{e^{z_i}}}, \quad j=1,…,k$$
◦ $f(z)_{j} \in [0,1]$
◦ $\sum_{j=1}^k {f(z)_{j}} =1$
◦ $f(z)_{j}$ is treated as the probability of component $j$, i.e. a probability distribution over $k$ possible outcomes
◦ e.g. in multi-class classification, softmax gives a probability for each class
Note: unlike sigmoid, the softmax outputs sum to 1.
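For illustration, here is a minimal NumPy sketch of the four activation functions above (NumPy only; these helper functions are not part of the Keras exercises later in the notes):

import numpy as np

def sigmoid(z):
    # squashes any real value into (0, 1)
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # squashes any real value into (-1, 1)
    return np.tanh(z)

def relu(z):
    # keeps positive values, sets negative values to 0
    return np.maximum(0, z)

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([1.0, 2.0, -1.0])
print(sigmoid(z))        # elementwise, each value in (0, 1)
print(tanh(z))           # elementwise, each value in (-1, 1)
print(relu(z))           # [1. 2. 0.]
print(softmax(z))        # a probability distribution
print(softmax(z).sum())  # 1.0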
2.2. Neural Network Model
• A neural network is composed of many simple neurons, so that the output of a neuron can be the input of another
• The sample neural network model has 3 input nodes, 3 hidden units, and 1 output unit
▪ input layer: the leftmost layer
▪ output layer: the rightmost layer (produces the target, i.e. the prediction or classification)
▪ bias units: indicated by “+1” node
▪ hidden layer: the middle layer of nodes 
• $W$, $x$, and $b$ are usually represented as arrays (i.e. vectorized); see the NumPy sketch at the end of this section
▪ $w_{ij}^{(l)}$: the weight associated with the link from unit $j$ in layer $l$ to unit $i$ in layer $l+1$
▪ $W^{(1)} \in \mathbb{R}^{3\times 3}$, $W^{(2)} \in \mathbb{R}^{1\times 3}$, $b^{(1)} \in \mathbb{R}^{3\times 1}$, $b^{(2)} \in \mathbb{R}^{1\times 1}$
▪ Note $W^{(l)}x$ is the dot product between $W^{(l)}$ and $x$, i.e. $W^{(l)} \cdot x$
• If a neural network contains more than 1 hidden layer, it’s called a deep neural network (deep learning)
• Training a neural network model means finding $W$ and $b$ that optimize some cost function, given training samples $(X,Y)$, where $X$ and $Y$ can be multi-dimensional
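To make the vectorized notation concrete, the following minimal NumPy sketch runs a forward pass through the 3-3-1 network described above, using sigmoid activations and randomly initialized weights (illustration only):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, -0.2, 0.1])   # 3 input values

W1 = np.random.randn(3, 3)       # weights from the input layer to the hidden layer
b1 = np.random.randn(3)          # biases of the 3 hidden units
W2 = np.random.randn(1, 3)       # weights from the hidden layer to the output unit
b2 = np.random.randn(1)          # bias of the output unit

a2 = sigmoid(W1.dot(x) + b1)     # hidden-layer activations, shape (3,)
h = sigmoid(W2.dot(a2) + b2)     # network output h_{W,b}(x), shape (1,)
print(h)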
2.3. Cost function
• Training set: m samples denoted as $(X,Y)={(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), …, (x^{(m)}, y^{(m)})}$
• A typical cost function: mean_squared_error
▪ Sum of square error: $J(W,b;x,y)=\frac{1}{2}||h_{W,b}(x)-y||^2$
▪ Regularization (square of each weight, or L2): $\sum_{i, j, l}(w_{ij}^{(l)})^2$. An important mechanism to prevent overfitting
▪ Cost function: $$J(W,b)=\frac{1}{m}\sum_{i=1}^m{\left(\frac{1}{2}||h_{W,b}(x^{(i)})-y^{(i)}||^2\right)}+ \frac{\lambda}{2}\sum_{i, j, l}(w_{ij}^{(l)})^2$$, where $\lambda$ is the regularization coefficient
• Other popular cost functions
▪ Cross-entropy cost
◦ Let’s assume a single neuron with sigmoid activation function 
◦ Let $\widehat y=h_{W,b}(x)$, the prediction of true value $y$. $\widehat y, y \in [0,1]$.
◦ Then the cross-entropy cost is defined as: $$J=-\frac{1}{m}\sum_{i=1}^m{\left[y_i\ln{\widehat y_i}+(1-y_i)\ln{(1-\widehat y_i)}\right]}$$
◦ What makes cross-entropy a good cost function?
◦ It's non-negative
◦ If the neuron's output $\widehat y$ is close to the actual value $y$ (0 or 1) for all training inputs, the cross-entropy will be close to zero
• For comparison between “Sum of Square error” and “Cross-entropy cost”, read http://neuralnetworksanddeeplearning.com/chap3.html
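For illustration, a minimal NumPy sketch of both cost functions on a small batch of made-up predictions (no regularization term):

import numpy as np

y = np.array([1, 0, 1, 1])              # true labels
y_hat = np.array([0.9, 0.2, 0.7, 0.4])  # predicted probabilities h_{W,b}(x)

# sum-of-squares (mean squared error) cost
mse = np.mean(0.5 * (y_hat - y) ** 2)

# cross-entropy cost
cross_entropy = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(mse)
print(cross_entropy)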
2.4. Gradient Descent
• An optimization algorithm used to find the values of the parameters ($W, b$) that minimize the cost function $J(W,b)$
• It is best used when the parameters cannot be calculated analytically (e.g. using linear algebra) and must be searched for by an optimization algorithm 
• resource: https://www.analyticsvidhya.com/blog/2017/03/introduction-to-gradient-descent-algorithm-along-its-variants/
• It uses derivatives of cost function to determine the direction to move the parameter values in order to get a lower cost on the next iteration
• Procedure (a toy one-parameter sketch follows at the end of this section):
1. initialize $W$ with random values
2. given samples (X,Y) as inputs, calculate the derivatives of the cost function with respect to every parameter $w_{ij}^{(l)}$, i.e. $\frac{\partial{J}}{\partial{w_{ij}^{(l)}}}$
3. update the parameters by $(w_{ij}^{(l)})'=w_{ij}^{(l)}-\alpha*\frac{\partial{J}}{\partial{w_{ij}^{(l)}}}$, where $\alpha$ is the learning rate
4. repeat steps 2-3 until $w_{ij}^{(l)}$ converges
• Learning rate $\alpha$
▪ It’s critical to pick the right learning rate. Big $\alpha$ or small $\alpha$?
▪ $\alpha$ may need to be adapted as learning unfolds
• Challenges of Gradient Descent
▪ It is expensive to compute $\frac{1}{m}\sum_i^m{(\frac{1}{2}||h_{W,b}(x_i)-y_i||^2)}$ for all samples in each round
▪ It is difficult to compute $\frac{\partial{J}}{\partial{w_{ij}^{(l)}}}$ if a neural network has many layers
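To make the update rule in step 3 concrete, here is a toy one-parameter sketch that minimizes $J(w)=(w-3)^2$, where the derivative can be written down directly:

# minimize J(w) = (w - 3)**2 with gradient descent
w = 0.0               # initial value
alpha = 0.1           # learning rate
for step in range(100):
    grad = 2 * (w - 3)        # dJ/dw
    w = w - alpha * grad      # update rule: w' = w - alpha * dJ/dw
print(w)                      # converges toward the minimum at w = 3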
2.5. Stochastic Gradient Descent
• Estimate the cost function (and its gradient) using a subset of randomly chosen training samples (a mini-batch) instead of the entire training set
• Procedure:
1. pick a randomly selected mini-batch, train with them and update $W, b$,
2. repeat step (1) with another randomly selected mini-batch until the training set is exhausted (i.e. complete an epoch),
3. start over with another epoch until $W, b$ converge
• Hyperparameters (parameters that control the learning of $W, b$)
▪ Batch size: the number of samples selected for each iteration
▪ Epochs: one epoch is one complete pass through the whole training set. Usually many epochs are needed before $W, b$ converge
▪ e.g. if your sample size is 1000 and your batch size is 200, how many iterations are needed for one epoch?
▪ e.g. if you set the number of epochs to 5, how many times in total do you update $W, b$?
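For example (working through the questions above): with 1000 samples and a batch size of 200, one epoch takes 1000 / 200 = 5 iterations; with 5 epochs, $W, b$ are therefore updated 5 × 5 = 25 times in total.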
2.6. Backpropagation Algorithm — The efficient way to calculate gradients (i.e. partial derivatives)
An intuitive video: https://www.youtube.com/watch?v=tIeHLnjs5U8
Forward propagation: input signals pass through each layer, multiplied by the layer's weights.
Backpropagation: the error is propagated back through each layer in proportion to the respective weights, and the weights are updated according to their attributed error in order to reduce it.
• Algorithm:
1. perform a feedforward pass, computing the activations for layers $L_2$, $L_3$, and so on, up to the output layer
2. for output layer $n$,
$\delta^{(n)} = \frac{\partial}{\partial z^{(n)}} J(W,b; x, y) = \frac{\partial}{\partial z^{(n)}} \frac{1}{2} \left\|y - h_{W,b}(x)\right\|^2 = - (y - a^{(n)}) \cdot f'(z^{(n)})$
3. for $l=n-1, n-2, …, 2$,
$ \delta^{(l)} = \left((W^{(l)})^T \delta^{(l+1)}\right) \cdot f'(z^{(l)})$
4. Compute the desired partial derivatives, which are given as:
$ \frac{\partial}{\partial W_{ij}^{(l)}} J(W,b; x, y) = a^{(l)}_j \delta_i^{(l+1)}$
$\frac{\partial}{\partial b_{i}^{(l)}} J(W,b; x, y) = \delta_i^{(l+1)}$
• Example:
▪ $\delta^{(3)} = \frac{\partial}{\partial z^{(3)}} J(W,b; x, y) = (a^{(3)} - y) \cdot f'(z^{(3)})$
▪ $ \delta^{(2)} = \left((W^{(2)})^T \delta^{(3)}\right) \cdot f'(z^{(2)})$
▪ $ \frac{\partial}{\partial W_{12}^{(2)}} J(W,b; x, y) = a^{(2)}_2 \delta_1^{(3)}$
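Putting the four steps together, here is a minimal NumPy sketch of one backpropagation pass for the 3-3-1 sigmoid network from the example, with random weights and a single training sample (illustration only):

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

np.random.seed(0)
x = np.array([0.5, -0.2, 0.1])   # one training sample
y = np.array([1.0])              # its label
W1 = np.random.randn(3, 3)
b1 = np.random.randn(3)
W2 = np.random.randn(1, 3)
b2 = np.random.randn(1)

# 1. feedforward pass
z2 = W1.dot(x) + b1
a2 = sigmoid(z2)
z3 = W2.dot(a2) + b2
a3 = sigmoid(z3)                           # a3 = h_{W,b}(x)

# 2. delta of the output layer
delta3 = (a3 - y) * sigmoid_prime(z3)

# 3. backpropagate the delta to the hidden layer
delta2 = W2.T.dot(delta3) * sigmoid_prime(z2)

# 4. partial derivatives of the cost
grad_W2 = np.outer(delta3, a2)   # dJ/dW2, shape (1, 3)
grad_b2 = delta3
grad_W1 = np.outer(delta2, x)    # dJ/dW1, shape (3, 3)
grad_b1 = delta2
print(grad_W1)
print(grad_W2)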
2.7. Hyperparameters
• Hyperparameters are parameters that control the learning of $w, b$ (our learning target)
• Summary of hyperparameters:
▪ Network structure:
◦ number of hidden layers
◦ number of neurons of each layer
◦ activation function of each layer
▪ Learning rate ($\alpha$)
▪ Regularization coefficient ($\lambda$)
▪ Mini-batch size
▪ Epochs
• For detailed explanation, watch: https://www.coursera.org/learn/neural-networks-deep-learning/lecture/TBvb5/parameters-vs-hyperparameters
3. Develop your First Neural Network Model with Keras
• Keras:
▪ high-level library for neural network models
▪ It wraps the efficient numerical computation libraries Theano and TensorFlow
• Why Keras:
▪ Simple to get started and keep going
▪ Written in Python and highly modular; easy to expand
▪ Built-in modules for some sophisticated neural network models
• Installation
▪ pip install keras (or pip install keras --upgrade if you already have it) to install the latest version
▪ pip install theano
▪ pip install tensorflow
▪ pip install np-utils
• Basic procedure
1. Load data
2. Define model
3. Compile model
4. Fit model
5. Evaluate model
3.1. Basic Keras Modeling Constructs
• Sequential model: linear stack of layers
• Layers
▪ Dense: a fully connected layer, in which each neuron is connected to every neuron in the next layer
▪ Embedding
▪ Convolution
▪ MaxPooling
▪ …
• Cost (loss) functions
▪ mean_squared_error
▪ binary_crossentropy
▪ categorical_crossentropy
▪ …
• Optimizer (i.e. optimization algorithm)
▪ SGD (Stochastic Gradient Descent): fixed learning rate in all iterations
▪ Adagrad: adapts the learning rate to the parameters, performing larger updates for infrequent, and smaller updates for frequent parameters
▪ Adam (Adaptive Moment Estimation): computes adaptive learning rates for each parameter.
• Metrics
▪ accuracy: the ratio of correctly predicted samples to the total number of samples
▪ precision/recall/f1 through the sklearn package
▪ Example: using the confusion matrix below,
◦ acc: (90+85)/200 = 87.5%
◦ prec: 90/(90+15) ≈ 85.7%
◦ recall: 90/(90+10) = 90%

            Predicted T    Predicted F
Actual T    90             10
Actual F    15             85
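As a quick check of the numbers above, the same metrics can be computed with the sklearn package; the sketch below simply rebuilds the toy confusion matrix as label arrays (T encoded as 1, F as 0):

import numpy as np
from sklearn import metrics

# rebuild the toy confusion matrix: 90 TP, 10 FN, 15 FP, 85 TN
y_true = np.array([1]*90 + [1]*10 + [0]*15 + [0]*85)
y_pred = np.array([1]*90 + [0]*10 + [1]*15 + [0]*85)

print(metrics.accuracy_score(y_true, y_pred))    # (90+85)/200 = 0.875
print(metrics.precision_score(y_true, y_pred))   # 90/(90+15) ~ 0.857
print(metrics.recall_score(y_true, y_pred))      # 90/(90+10) = 0.90
print(metrics.f1_score(y_true, y_pred))          # ~ 0.878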
3.2. Example
• Example: build a simple neural network model to predict diabetes using “Pima Indians onset of diabetes database” at http://archive.ics.uci.edu/ml/datasets/Pima+Indians+Diabetes
▪ Columns 1-8: variables
▪ Column 9: class variable, 0 or 1
• A sequential model with 4 layers
▪ each node is a tensor, a function of multidimensional arrays
◦ Input (L1)
◦ L2 (hidden layer, dense)
◦ L3 (hidden layer, dense)
◦ Output (dense)
▪ the model is a tensor graph (computation graph)
Training a deep learning model is a very empirical process. You may need to tune the hyperparameters in many iterations
In [2]:
# set up interactive shell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
In [3]:
# Exercise 3.1. Load data
import numpy as np
import pandas as pd
# Load data
data=pd.read_csv("../../../dataset/pima-indians-diabetes.csv", header=None)
data.head()
data[8].value_counts()
X=data.values[:,0:8]
y=data.values[:,8]
X.shape
Out[3]:
   0    1   2   3    4     5      6   7  8
0  6  148  72  35    0  33.6  0.627  50  1
1  1   85  66  29    0  26.6  0.351  31  0
2  8  183  64   0    0  23.3  0.672  32  1
3  1   89  66  23   94  28.1  0.167  21  0
4  0  137  40  35  168  43.1  2.288  33  1
Out[3]:
0 500
1 268
Name: 8, dtype: int64
Out[3]:
(768, 8)
In [4]:
# Exercise 3.2. Create Model
# a sequential model is a linear stack of layers
from keras.models import Sequential
# in a dense layer, each neuron is connected to
# every neuron in the next layer
from keras.layers import Dense
# import the L2 regularizer
from keras.regularizers import l2
# fix random seed for reproducibility
np.random.seed(7)
# set lambda (regularization coefficient)
lam=0.01
# create a sequential model
model = Sequential()
# add a dense layer with 12 neurons, 8 input variables,
# rectifier activation function (relu),
# and L2 regularization
# how many parameters in this layer? (8 inputs x 12 neurons + 12 biases = 108)
model.add(Dense(12, input_dim=8, \
                activation='relu', \
                kernel_regularizer=l2(lam), \
                name='L2'))
# add another hidden layer with 8 neurons
model.add(Dense(8, activation='relu', \
                kernel_regularizer=l2(lam), \
                name='L3'))
# add the output layer with sigmoid activation function
# to return a probability
model.add(Dense(1, activation='sigmoid', \
                name='Output'))
# compile the model using the binary cross-entropy cost function,
# the adam optimizer, and accuracy as the metric
model.compile(loss='binary_crossentropy', \
              optimizer='adam', \
              metrics=['accuracy'])
Using TensorFlow backend.
In [5]:
# Exercise 3.3. Check model configuration
model.summary()
# Show the model in a computation graph
# it needs pydot and graphviz
# don’t worry if you don’t have them installed
#from keras.utils import plot_model
#plot_model(model, to_file='model.png')
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
L2 (Dense)                   (None, 12)                108
_________________________________________________________________
L3 (Dense)                   (None, 8)                 104
_________________________________________________________________
Output (Dense)               (None, 1)                 9
=================================================================
Total params: 221
Trainable params: 221
Non-trainable params: 0
_________________________________________________________________
In [6]:
# Exercise 3.4. Fit Model
# train the model with mini-batches of size 32
# for 150 epochs (how many iterations is that in total?)
# keep 25% of the samples for testing
# (train_test_split shuffles the data before splitting by default)
# shuffle the training data in each epoch (shuffle=True)
# store the fitting history in the variable "training"
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split(X, y, \
    test_size=0.25, random_state=123)
training=model.fit(X_train, y_train, \
    validation_data=[X_test, y_test], \
    shuffle=True, epochs=150, \
    batch_size=32, verbose=2)
Train on 576 samples, validate on 192 samples
Epoch 1/150
0s – loss: 5.0593 – acc: 0.6615 – val_loss: 4.8492 – val_acc: 0.6146
Epoch 2/150
0s – loss: 3.3136 – acc: 0.5955 – val_loss: 3.3641 – val_acc: 0.5208
Epoch 3/150
0s – loss: 1.8646 – acc: 0.5434 – val_loss: 1.5383 – val_acc: 0.5000
Epoch 4/150
0s – loss: 1.2648 – acc: 0.5799 – val_loss: 1.2490 – val_acc: 0.5312
Epoch 5/150
0s – loss: 1.0285 – acc: 0.6042 – val_loss: 1.0567 – val_acc: 0.5417
Epoch 6/150
0s – loss: 0.9132 – acc: 0.6181 – val_loss: 0.9434 – val_acc: 0.5990
Epoch 7/150
0s – loss: 0.8696 – acc: 0.6441 – val_loss: 0.9042 – val_acc: 0.5781
Epoch 8/150
0s – loss: 0.8435 – acc: 0.6476 – val_loss: 0.8860 – val_acc: 0.6094
Epoch 9/150
0s – loss: 0.8111 – acc: 0.6632 – val_loss: 0.8955 – val_acc: 0.6458
Epoch 10/150
0s – loss: 0.8007 – acc: 0.6892 – val_loss: 0.8708 – val_acc: 0.6406
Epoch 11/150
0s – loss: 0.7828 – acc: 0.6823 – val_loss: 0.8381 – val_acc: 0.6146
Epoch 12/150
0s – loss: 0.7680 – acc: 0.6892 – val_loss: 0.8299 – val_acc: 0.6667
Epoch 13/150
0s – loss: 0.7591 – acc: 0.6649 – val_loss: 0.8488 – val_acc: 0.6667
Epoch 14/150
0s – loss: 0.7544 – acc: 0.6684 – val_loss: 0.8606 – val_acc: 0.6458
Epoch 15/150
0s – loss: 0.7568 – acc: 0.6493 – val_loss: 0.8552 – val_acc: 0.6615
Epoch 16/150
0s – loss: 0.7284 – acc: 0.6962 – val_loss: 0.8074 – val_acc: 0.6615
Epoch 17/150
0s – loss: 0.7116 – acc: 0.6944 – val_loss: 0.7963 – val_acc: 0.6510
Epoch 18/150
0s – loss: 0.7320 – acc: 0.6736 – val_loss: 0.7672 – val_acc: 0.6979
Epoch 19/150
0s – loss: 0.7166 – acc: 0.6927 – val_loss: 0.7678 – val_acc: 0.6771
Epoch 20/150
0s – loss: 0.6963 – acc: 0.7014 – val_loss: 0.8036 – val_acc: 0.6510
Epoch 21/150
0s – loss: 0.6984 – acc: 0.6997 – val_loss: 0.7668 – val_acc: 0.6615
Epoch 22/150
0s – loss: 0.6900 – acc: 0.6892 – val_loss: 0.7656 – val_acc: 0.6927
Epoch 23/150
0s – loss: 0.6944 – acc: 0.6944 – val_loss: 0.7571 – val_acc: 0.6927
Epoch 24/150
0s – loss: 0.6788 – acc: 0.7240 – val_loss: 0.7424 – val_acc: 0.6875
Epoch 25/150
0s – loss: 0.6835 – acc: 0.7083 – val_loss: 0.7465 – val_acc: 0.6771
Epoch 26/150
0s – loss: 0.6804 – acc: 0.7031 – val_loss: 0.7307 – val_acc: 0.6875
Epoch 27/150
0s – loss: 0.6696 – acc: 0.7153 – val_loss: 0.7549 – val_acc: 0.6719
Epoch 28/150
0s – loss: 0.6664 – acc: 0.7153 – val_loss: 0.7315 – val_acc: 0.6927
Epoch 29/150
0s – loss: 0.6671 – acc: 0.7205 – val_loss: 0.7204 – val_acc: 0.6927
Epoch 30/150
0s – loss: 0.6704 – acc: 0.7240 – val_loss: 0.7361 – val_acc: 0.6823
Epoch 31/150
0s – loss: 0.6568 – acc: 0.6997 – val_loss: 0.7233 – val_acc: 0.6510
Epoch 32/150
0s – loss: 0.6670 – acc: 0.7066 – val_loss: 0.7498 – val_acc: 0.6562
Epoch 33/150
0s – loss: 0.6735 – acc: 0.7049 – val_loss: 0.7177 – val_acc: 0.6667
Epoch 34/150
0s – loss: 0.6619 – acc: 0.7170 – val_loss: 0.7313 – val_acc: 0.6771
Epoch 35/150
0s – loss: 0.6530 – acc: 0.7240 – val_loss: 0.7219 – val_acc: 0.6771
Epoch 36/150
0s – loss: 0.6521 – acc: 0.7188 – val_loss: 0.7241 – val_acc: 0.6510
Epoch 37/150
0s – loss: 0.6441 – acc: 0.7205 – val_loss: 0.7191 – val_acc: 0.6875
Epoch 38/150
0s – loss: 0.6477 – acc: 0.7135 – val_loss: 0.7236 – val_acc: 0.6667
Epoch 39/150
0s – loss: 0.6564 – acc: 0.7205 – val_loss: 0.7389 – val_acc: 0.6562
Epoch 40/150
0s – loss: 0.6481 – acc: 0.7257 – val_loss: 0.6948 – val_acc: 0.6927
Epoch 41/150
0s – loss: 0.6360 – acc: 0.7309 – val_loss: 0.7153 – val_acc: 0.6719
Epoch 42/150
0s – loss: 0.6455 – acc: 0.7170 – val_loss: 0.7369 – val_acc: 0.6771
Epoch 43/150
0s – loss: 0.6384 – acc: 0.7257 – val_loss: 0.6997 – val_acc: 0.6719
Epoch 44/150
0s – loss: 0.6322 – acc: 0.7083 – val_loss: 0.6917 – val_acc: 0.6979
Epoch 45/150
0s – loss: 0.6355 – acc: 0.7292 – val_loss: 0.6912 – val_acc: 0.6771
Epoch 46/150
0s – loss: 0.6354 – acc: 0.7326 – val_loss: 0.6916 – val_acc: 0.7083
Epoch 47/150
0s – loss: 0.6310 – acc: 0.7309 – val_loss: 0.6927 – val_acc: 0.6771
Epoch 48/150
0s – loss: 0.6320 – acc: 0.7274 – val_loss: 0.6830 – val_acc: 0.6979
Epoch 49/150
0s – loss: 0.6312 – acc: 0.7274 – val_loss: 0.6861 – val_acc: 0.7031
Epoch 50/150
0s – loss: 0.6297 – acc: 0.7292 – val_loss: 0.6802 – val_acc: 0.6927
Epoch 51/150
0s – loss: 0.6278 – acc: 0.7309 – val_loss: 0.6845 – val_acc: 0.6823
Epoch 52/150
0s – loss: 0.6388 – acc: 0.7170 – val_loss: 0.6919 – val_acc: 0.6771
Epoch 53/150
0s – loss: 0.6280 – acc: 0.7274 – val_loss: 0.6768 – val_acc: 0.7188
Epoch 54/150
0s – loss: 0.6256 – acc: 0.7309 – val_loss: 0.6970 – val_acc: 0.6927
Epoch 55/150
0s – loss: 0.6247 – acc: 0.7344 – val_loss: 0.6751 – val_acc: 0.6979
Epoch 56/150
0s – loss: 0.6322 – acc: 0.7274 – val_loss: 0.6698 – val_acc: 0.7188
Epoch 57/150
0s – loss: 0.6285 – acc: 0.7205 – val_loss: 0.6795 – val_acc: 0.7031
Epoch 58/150
0s – loss: 0.6189 – acc: 0.7257 – val_loss: 0.6728 – val_acc: 0.7292
Epoch 59/150
0s – loss: 0.6324 – acc: 0.7240 – val_loss: 0.6797 – val_acc: 0.6979
Epoch 60/150
0s – loss: 0.6202 – acc: 0.7222 – val_loss: 0.6867 – val_acc: 0.6615
Epoch 61/150
0s – loss: 0.6246 – acc: 0.7240 – val_loss: 0.6813 – val_acc: 0.6875
Epoch 62/150
0s – loss: 0.6407 – acc: 0.7170 – val_loss: 0.6909 – val_acc: 0.6979
Epoch 63/150
0s – loss: 0.6157 – acc: 0.7344 – val_loss: 0.6670 – val_acc: 0.7031
Epoch 64/150
0s – loss: 0.6126 – acc: 0.7431 – val_loss: 0.6904 – val_acc: 0.6979
Epoch 65/150
0s – loss: 0.6313 – acc: 0.7153 – val_loss: 0.6980 – val_acc: 0.6979
Epoch 66/150
0s – loss: 0.6324 – acc: 0.7309 – val_loss: 0.6988 – val_acc: 0.6823
Epoch 67/150
0s – loss: 0.6252 – acc: 0.7240 – val_loss: 0.7038 – val_acc: 0.6875
Epoch 68/150
0s – loss: 0.6343 – acc: 0.7031 – val_loss: 0.7461 – val_acc: 0.6719
Epoch 69/150
0s – loss: 0.6297 – acc: 0.7222 – val_loss: 0.6771 – val_acc: 0.6875
Epoch 70/150
0s – loss: 0.6124 – acc: 0.7292 – val_loss: 0.6736 – val_acc: 0.6979
Epoch 71/150
0s – loss: 0.6108 – acc: 0.7326 – val_loss: 0.6626 – val_acc: 0.7188
Epoch 72/150
0s – loss: 0.6161 – acc: 0.7292 – val_loss: 0.6931 – val_acc: 0.6771
Epoch 73/150
0s – loss: 0.6172 – acc: 0.7292 – val_loss: 0.6713 – val_acc: 0.6875
Epoch 74/150
0s – loss: 0.6330 – acc: 0.7222 – val_loss: 0.6812 – val_acc: 0.6823
Epoch 75/150
0s – loss: 0.6161 – acc: 0.7101 – val_loss: 0.6711 – val_acc: 0.6979
Epoch 76/150
0s – loss: 0.6118 – acc: 0.7361 – val_loss: 0.6559 – val_acc: 0.7240
Epoch 77/150
0s – loss: 0.6214 – acc: 0.7309 – val_loss: 0.6808 – val_acc: 0.6927
Epoch 78/150
0s – loss: 0.6147 – acc: 0.7292 – val_loss: 0.6919 – val_acc: 0.6927
Epoch 79/150
0s – loss: 0.6113 – acc: 0.7378 – val_loss: 0.6780 – val_acc: 0.7083
Epoch 80/150
0s – loss: 0.6074 – acc: 0.7309 – val_loss: 0.6666 – val_acc: 0.7031
Epoch 81/150
0s – loss: 0.6065 – acc: 0.7188 – val_loss: 0.6658 – val_acc: 0.6875
Epoch 82/150
0s – loss: 0.6040 – acc: 0.7240 – val_loss: 0.6729 – val_acc: 0.7292
Epoch 83/150
0s – loss: 0.6068 – acc: 0.7361 – val_loss: 0.6480 – val_acc: 0.7188
Epoch 84/150
0s – loss: 0.6026 – acc: 0.7170 – val_loss: 0.6618 – val_acc: 0.6927
Epoch 85/150
0s – loss: 0.6018 – acc: 0.7500 – val_loss: 0.6971 – val_acc: 0.6927
Epoch 86/150
0s – loss: 0.6025 – acc: 0.7361 – val_loss: 0.6715 – val_acc: 0.6927
Epoch 87/150
0s – loss: 0.6167 – acc: 0.7240 – val_loss: 0.6541 – val_acc: 0.7240
Epoch 88/150
0s – loss: 0.5975 – acc: 0.7344 – val_loss: 0.6523 – val_acc: 0.7135
Epoch 89/150
0s – loss: 0.5937 – acc: 0.7378 – val_loss: 0.6522 – val_acc: 0.7031
Epoch 90/150
0s – loss: 0.5991 – acc: 0.7309 – val_loss: 0.6552 – val_acc: 0.7031
Epoch 91/150
0s – loss: 0.6020 – acc: 0.7413 – val_loss: 0.6607 – val_acc: 0.7188
Epoch 92/150
0s – loss: 0.5946 – acc: 0.7500 – val_loss: 0.6442 – val_acc: 0.7083
Epoch 93/150
0s – loss: 0.5994 – acc: 0.7396 – val_loss: 0.6494 – val_acc: 0.7292
Epoch 94/150
0s – loss: 0.5981 – acc: 0.7292 – val_loss: 0.6527 – val_acc: 0.7083
Epoch 95/150
0s – loss: 0.5905 – acc: 0.7292 – val_loss: 0.6458 – val_acc: 0.7135
Epoch 96/150
0s – loss: 0.6068 – acc: 0.7153 – val_loss: 0.6397 – val_acc: 0.6979
Epoch 97/150
0s – loss: 0.6007 – acc: 0.7431 – val_loss: 0.6575 – val_acc: 0.7135
Epoch 98/150
0s – loss: 0.5993 – acc: 0.7309 – val_loss: 0.6612 – val_acc: 0.6979
Epoch 99/150
0s – loss: 0.5968 – acc: 0.7361 – val_loss: 0.6945 – val_acc: 0.6875
Epoch 100/150
0s – loss: 0.6143 – acc: 0.7153 – val_loss: 0.7618 – val_acc: 0.6719
Epoch 101/150
0s – loss: 0.6133 – acc: 0.7188 – val_loss: 0.7340 – val_acc: 0.6615
Epoch 102/150
0s – loss: 0.6163 – acc: 0.7240 – val_loss: 0.6751 – val_acc: 0.6927
Epoch 103/150
0s – loss: 0.6000 – acc: 0.7292 – val_loss: 0.6524 – val_acc: 0.7240
Epoch 104/150
0s – loss: 0.5820 – acc: 0.7483 – val_loss: 0.6641 – val_acc: 0.6979
Epoch 105/150
0s – loss: 0.5875 – acc: 0.7431 – val_loss: 0.6409 – val_acc: 0.7188
Epoch 106/150
0s – loss: 0.5877 – acc: 0.7483 – val_loss: 0.6656 – val_acc: 0.6979
Epoch 107/150
0s – loss: 0.5886 – acc: 0.7396 – val_loss: 0.6414 – val_acc: 0.7240
Epoch 108/150
0s – loss: 0.5852 – acc: 0.7517 – val_loss: 0.6776 – val_acc: 0.6927
Epoch 109/150
0s – loss: 0.5874 – acc: 0.7326 – val_loss: 0.6422 – val_acc: 0.7292
Epoch 110/150
0s – loss: 0.5895 – acc: 0.7326 – val_loss: 0.6443 – val_acc: 0.7344
Epoch 111/150
0s – loss: 0.5970 – acc: 0.7274 – val_loss: 0.6330 – val_acc: 0.7344
Epoch 112/150
0s – loss: 0.5849 – acc: 0.7448 – val_loss: 0.6344 – val_acc: 0.7396
Epoch 113/150
0s – loss: 0.5887 – acc: 0.7378 – val_loss: 0.6380 – val_acc: 0.7292
Epoch 114/150
0s – loss: 0.5780 – acc: 0.7500 – val_loss: 0.6352 – val_acc: 0.7344
Epoch 115/150
0s – loss: 0.5822 – acc: 0.7431 – val_loss: 0.6349 – val_acc: 0.7448
Epoch 116/150
0s – loss: 0.5830 – acc: 0.7535 – val_loss: 0.6630 – val_acc: 0.7240
Epoch 117/150
0s – loss: 0.5825 – acc: 0.7465 – val_loss: 0.6478 – val_acc: 0.7083
Epoch 118/150
0s – loss: 0.5763 – acc: 0.7396 – val_loss: 0.6330 – val_acc: 0.7396
Epoch 119/150
0s – loss: 0.5828 – acc: 0.7344 – val_loss: 0.6487 – val_acc: 0.7240
Epoch 120/150
0s – loss: 0.5796 – acc: 0.7448 – val_loss: 0.6265 – val_acc: 0.7240
Epoch 121/150
0s – loss: 0.5779 – acc: 0.7292 – val_loss: 0.6412 – val_acc: 0.7292
Epoch 122/150
0s – loss: 0.5806 – acc: 0.7361 – val_loss: 0.6348 – val_acc: 0.7188
Epoch 123/150
0s – loss: 0.5721 – acc: 0.7465 – val_loss: 0.6369 – val_acc: 0.7448
Epoch 124/150
0s – loss: 0.5964 – acc: 0.7309 – val_loss: 0.6450 – val_acc: 0.7396
Epoch 125/150
0s – loss: 0.6017 – acc: 0.7309 – val_loss: 0.6416 – val_acc: 0.7135
Epoch 126/150
0s – loss: 0.5778 – acc: 0.7465 – val_loss: 0.6459 – val_acc: 0.7188
Epoch 127/150
0s – loss: 0.5842 – acc: 0.7326 – val_loss: 0.6368 – val_acc: 0.7188
Epoch 128/150
0s – loss: 0.5762 – acc: 0.7396 – val_loss: 0.6342 – val_acc: 0.7240
Epoch 129/150
0s – loss: 0.5702 – acc: 0.7483 – val_loss: 0.6270 – val_acc: 0.7240
Epoch 130/150
0s – loss: 0.5782 – acc: 0.7535 – val_loss: 0.6417 – val_acc: 0.7344
Epoch 131/150
0s – loss: 0.5777 – acc: 0.7465 – val_loss: 0.6312 – val_acc: 0.7344
Epoch 132/150
0s – loss: 0.5683 – acc: 0.7431 – val_loss: 0.6350 – val_acc: 0.7135
Epoch 133/150
0s – loss: 0.5839 – acc: 0.7465 – val_loss: 0.6499 – val_acc: 0.7500
Epoch 134/150
0s – loss: 0.5690 – acc: 0.7587 – val_loss: 0.6203 – val_acc: 0.7240
Epoch 135/150
0s – loss: 0.5646 – acc: 0.7483 – val_loss: 0.6249 – val_acc: 0.7292
Epoch 136/150
0s – loss: 0.5668 – acc: 0.7517 – val_loss: 0.6174 – val_acc: 0.7240
Epoch 137/150
0s – loss: 0.5839 – acc: 0.7378 – val_loss: 0.6230 – val_acc: 0.7396
Epoch 138/150
0s – loss: 0.5701 – acc: 0.7500 – val_loss: 0.6202 – val_acc: 0.7188
Epoch 139/150
0s – loss: 0.5625 – acc: 0.7639 – val_loss: 0.6377 – val_acc: 0.7135
Epoch 140/150
0s – loss: 0.5769 – acc: 0.7465 – val_loss: 0.6280 – val_acc: 0.7188
Epoch 141/150
0s – loss: 0.5673 – acc: 0.7483 – val_loss: 0.6165 – val_acc: 0.7344
Epoch 142/150
0s – loss: 0.5766 – acc: 0.7378 – val_loss: 0.6260 – val_acc: 0.7500
Epoch 143/150
0s – loss: 0.5837 – acc: 0.7396 – val_loss: 0.6243 – val_acc: 0.7344
Epoch 144/150
0s – loss: 0.5758 – acc: 0.7465 – val_loss: 0.6236 – val_acc: 0.7344
Epoch 145/150
0s – loss: 0.5659 – acc: 0.7604 – val_loss: 0.6435 – val_acc: 0.7292
Epoch 146/150
0s – loss: 0.5703 – acc: 0.7413 – val_loss: 0.6663 – val_acc: 0.7188
Epoch 147/150
0s – loss: 0.5810 – acc: 0.7448 – val_loss: 0.6782 – val_acc: 0.7135
Epoch 148/150
0s – loss: 0.5669 – acc: 0.7500 – val_loss: 0.6448 – val_acc: 0.7240
Epoch 149/150
0s – loss: 0.5653 – acc: 0.7500 – val_loss: 0.6155 – val_acc: 0.7448
Epoch 150/150
0s – loss: 0.5763 – acc: 0.7396 – val_loss: 0.6117 – val_acc: 0.7292
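Since the fitting history was stored in the variable "training", an optional way to check whether the model is still improving or has started to overfit is to plot the loss curves; a minimal matplotlib sketch, assuming the 'loss'/'val_loss' history keys shown in the log above:

# optional: visualize the training history captured above
import matplotlib.pyplot as plt

plt.plot(training.history['loss'], label='train loss')
plt.plot(training.history['val_loss'], label='validation loss')
plt.xlabel('epoch')
plt.ylabel('binary cross-entropy loss')
plt.legend()
plt.show()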
In [7]:
# Exercise 3.5. Get prediction and performance
from sklearn import metrics
# evaluate the model using the test samples
scores = model.evaluate(X_test, y_test)
print("\n%s: %.2f%%" % (model.metrics_names[1], \
    scores[1]*100))
# get predictions (probabilities)
predicted=model.predict(X_test)
print(predicted[0:5])
# reshape the 2-dimensional array to 1 dimension
predicted=np.reshape(predicted, -1)
# set the prediction to 1 or 0 based on the probability
predicted=np.where(predicted>0.5, 1, 0)
# calculate the performance report
print(metrics.classification_report(y_test, predicted, \
    labels=[0,1]))
32/192 [====>…………………….] – ETA: 0s
acc: 72.92%
[[0.7468275 ]
[0.44640124]
[0.72950506]
[0.265513 ]
[0.11131728]]
             precision    recall  f1-score   support

          0       0.77      0.80      0.79       119
          1       0.65      0.62      0.63        73

avg / total       0.73      0.73      0.73       192