Machine Learning II
Lecture 13 – Unsupervised learning and DNN
Unsupervised learning
What is unsupervised learning?
Unsupervised learning is a branch of machine learning that learns from data that has not been labeled, classified or categorized.
Example 1: A company has hired a data scientist and asked him: “Please use all the data we have about our customers and tell us insights about them that we don’t already know. We really want to use data science to improve our business.”
Example 2: A bank wants to segment its customers so that it can recommend the right products to each group.
www.analyticsvidhya.com
Applications
Recommendation system: By learning the users’ purchase history, a clustering model can segment users by similarities, helping you find like-minded users or related products.
Biology: sequence clustering algorithms attempt to group biological sequences that are somehow related; for example, proteins can be clustered according to their amino acid content.
Image and video analysis: clustering divides images or videos into groups based on their similarities.
In a medical database, each patient may have a distinct real-valued measure for specific tests (e.g., glucose, cholesterol). Clustering patients first may help us understand how binning should be done on real-valued features to reduce feature sparsity and improve accuracy on classification tasks such as survival prediction of cancer patients.
General use case: generating a compact summary of the data for classification, pattern discovery, and hypothesis generation and testing.
www.analyticsvidhya.com
K-Means
www.analyticsvidhya.com
To start, let’s review how we can build an unsupervised model with K-means.
On MNIST, this gives a clustering accuracy of about 53%.
from sklearn.cluster import KMeans
from scipy.optimize import linear_sum_assignment
from keras.datasets import mnist
import numpy as np

# Load MNIST and flatten each 28x28 image into a 784-dimensional vector scaled to [0, 1]
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x = np.concatenate((x_train, x_test))
y = np.concatenate((y_train, y_test))
x = x.reshape((x.shape[0], -1))
x = np.divide(x, 255.)

# 10 clusters, one per digit
# (the original n_jobs=4 argument has been removed from recent scikit-learn versions)
n_clusters = 10
kmeans = KMeans(n_clusters=n_clusters, n_init=20)

# Train K-Means and assign a cluster to every image
y_pred_kmeans = kmeans.fit_predict(x)

# Clustering accuracy (stands in for the tutorial's custom metrics.acc helper):
# match each cluster to its best true label with the Hungarian algorithm
def cluster_acc(y_true, y_pred):
    w = np.zeros((n_clusters, n_clusters), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)
    return w[row, col].sum() / y_true.size

print(cluster_acc(y, y_pred_kmeans))
Autoencoders
https://blog.keras.io/building-autoencoders-in-keras.html
“Autoencoding” is a data compression algorithm where the compression and decompression functions are
1) data-specific: which means that they will only be able to compress data similar to what they have been trained on. This is different from, say, the MPEG-2 Audio Layer III (MP3) compression algorithm, which only holds assumptions about “sound” in general, but not about specific types of sounds. An autoencoder trained on pictures of faces would do a rather poor job of compressing pictures of trees, because the features it would learn would be face-specific.
2) lossy: which means that the decompressed outputs will be degraded compared to the original inputs (similar to MP3 or JPEG compression). This differs from lossless arithmetic compression.
3) learned automatically from examples rather than engineered by a human.
Autoencoders
https://blog.keras.io/building-autoencoders-in-keras.html
To build an autoencoder, you need three things:
An encoding function,
A decoding function,
A distance function (“loss” function)
An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise.” (Wikipedia)
MNIST example
https://blog.keras.io/building-autoencoders-in-keras.html
Make the autoencoder:
# Build the autoencoder, plus separate encoder and decoder models
from keras.layers import Input, Dense
from keras.models import Model

# this is the size of our encoded representations
# 32 floats -> compression factor of 24.5, assuming the input is 784 floats
encoding_dim = 32

# this is our input placeholder
input_img = Input(shape=(784,))
# "encoded" is the encoded representation of the input
encoded = Dense(encoding_dim, activation='relu')(input_img)
# "decoded" is the lossy reconstruction of the input
decoded = Dense(784, activation='sigmoid')(encoded)

# this model maps an input to its reconstruction
autoencoder = Model(input_img, decoded)

# this model maps an input to its encoded representation
encoder = Model(input_img, encoded)
# this model maps an encoded representation back to a reconstruction
encoded_input = Input(shape=(encoding_dim,))
decoder = Model(encoded_input, autoencoder.layers[-1](encoded_input))

autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
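The slide jumps straight to evaluation, so here is a minimal sketch of the missing data-preparation and training step, following the Keras blog post cited above and the 50 epochs mentioned on the next slide:
from keras.datasets import mnist

# Flatten the 28x28 images into 784-dimensional vectors scaled to [0, 1];
# the labels are not needed because the autoencoder reconstructs its own input
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape((len(x_train), -1)).astype('float32') / 255.
x_test = x_test.reshape((len(x_test), -1)).astype('float32') / 255.

# Train the autoencoder to reconstruct its inputs
autoencoder.fit(x_train, x_train,
                epochs=50, batch_size=256, shuffle=True,
                validation_data=(x_test, x_test))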
MNIST example
https://blog.keras.io/building-autoencoders-in-keras.html
Evaluate the model
After 50 epochs, the autoencoder reaches a fairly stable train/test loss of about 0.09. We can visualize the reconstructed inputs and the encoded representations using Matplotlib.
# encode and decode some digits
# note that we take them from the *test* set
encoded_imgs = encoder.predict(x_test)
decoded_imgs = decoder.predict(encoded_imgs)

# use Matplotlib
import matplotlib.pyplot as plt

n = 10  # how many digits we will display
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i].reshape(28, 28))
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
MNIST example: Deeper model
https://blog.keras.io/building-autoencoders-in-keras.html
Let’s add more layers to the autoencoder:
# deeper model
input_img = Input(shape=(784,))
encoded = Dense(128, activation='relu')(input_img)
encoded = Dense(64, activation='relu')(encoded)
encoded = Dense(32, activation='relu')(encoded)
decoded = Dense(64, activation='relu')(encoded)
decoded = Dense(128, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
autoencoder.fit(x_train, x_train, epochs=50, batch_size=256, shuffle=True, validation_data=(x_test, x_test))

# encode and decode some digits from the *test* set;
# rebuild the encoder for the new model, and use the full autoencoder for the reconstructions
encoder = Model(input_img, encoded)
encoded_imgs = encoder.predict(x_test)
decoded_imgs = autoencoder.predict(x_test)
Ensemble model
https://www.dlology.com/blog/how-to-do-unsupervised-clustering-with-keras/
Another approach is to use the encoder output (the compressed representation) as the input to the K-means model.
A Student's t-distribution kernel can then measure how close each embedded point is to the cluster centroids, as in Deep Embedded Clustering (DEC).
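As a minimal sketch of the first step (assuming the trained encoder and the flattened, scaled MNIST arrays x_train and x_test from the earlier slides), the clustering could look like this:
from sklearn.cluster import KMeans
import numpy as np

# Compress the images with the trained encoder, then cluster the low-dimensional codes
features = encoder.predict(np.concatenate((x_train, x_test)))
kmeans = KMeans(n_clusters=10, n_init=20)
cluster_labels = kmeans.fit_predict(features)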
Regression
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/
We will work on the problem of regression using a Kaggle example
House Prices: Advanced Regression Techniques
Ask a home buyer to describe their dream house, and they probably won’t begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition’s dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
Example- House Prices
https://www.kaggle.com/c/house-prices-advanced-regression-techniques/
SalePrice – the property’s sale price in dollars. This is the target variable that you’re trying to predict.
MSSubClass: The building class
MSZoning: The general zoning classification
LotFrontage: Linear feet of street connected to property
LotArea: Lot size in square feet
Street: Type of road access
Alley: Type of alley access
LotShape: General shape of property
Condition2: Proximity to main road or railroad (if a second is present)
HouseStyle: Style of dwelling
YearBuilt: Original construction date
YearRemodAdd: Remodel date
Electrical: Electrical system
SaleCondition: Condition of sale
…
Example- House Prices
First we need to read the data into Python and drop the NA values.
We can use describe() to learn more about the data.
If we investigate the data more closely, we will see that:
Number of numerical columns with no NaN values: 25
Number of non-numerical columns with no NaN values: 20
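A minimal sketch of these steps, assuming the Kaggle training file is named train.csv:
import pandas as pd

# Read the Kaggle training data (file name assumed)
train = pd.read_csv('train.csv')

# Summary statistics of the numerical columns
print(train.describe())

# Count the columns that contain no missing values, split by dtype
no_nan = train.columns[train.notna().all()]
numeric = train[no_nan].select_dtypes(include='number').columns
print('Numerical columns with no NaN values:', len(numeric))
print('Non-numerical columns with no NaN values:', len(no_nan) - len(numeric))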
Example- Numerical columns distributions
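The figure on this slide shows the distributions of the numerical columns; a sketch of how such a plot could be produced (assuming histograms and the train DataFrame from the previous slide):
import matplotlib.pyplot as plt

# Histograms of every numerical column in one grid of subplots
train.select_dtypes(include='number').hist(bins=30, figsize=(16, 12))
plt.tight_layout()
plt.show()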
Variables correlation
How many of the features are correlated?
It looks like about 15 of them are.
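One common way to inspect this is a correlation heatmap; here is a sketch, assuming the train DataFrame and using seaborn (not shown on the slide) for the plot:
import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of the numerical features, drawn as a heatmap
corr = train.select_dtypes(include='number').corr()
plt.figure(figsize=(14, 12))
sns.heatmap(corr, cmap='coolwarm', center=0)
plt.show()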
DNN model
Now is the time to design the model
Use ‘relu’ as the activation function for the hidden layers.
Use a ‘normal’ initializer as the kernel_initializer.
Define the output layer with only one node.
Use ‘linear’ as the activation function for the output layer.
from keras import Sequential
from keras.layers import Dense

NN_model = Sequential()

# The Input Layer:
NN_model.add(Dense(128, kernel_initializer='normal', input_dim=train.shape[1], activation='relu'))

# The Hidden Layers:
NN_model.add(Dense(256, kernel_initializer='normal', activation='relu'))
NN_model.add(Dense(256, kernel_initializer='normal', activation='relu'))
NN_model.add(Dense(256, kernel_initializer='normal', activation='relu'))

# The Output Layer:
NN_model.add(Dense(1, kernel_initializer='normal', activation='linear'))

# Compile the network:
NN_model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mean_absolute_error'])
NN_model.summary()
DNN model- Save the model
The following code saves the best model while it is being trained.
Each time the validation loss improves, the training log reports that a new checkpoint file has been saved.
Now we can choose the checkpoint with the lowest validation loss. Remember the format of the file name: Weights-{epoch}--{val_loss}.hdf5
from keras.callbacks import ModelCheckpoint

# Define a checkpoint that saves the best weights seen so far
checkpoint_name = 'address/to/file/Weights-{epoch:03d}--{val_loss:.5f}.hdf5'
checkpoint = ModelCheckpoint(checkpoint_name, monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
callbacks_list = [checkpoint]

# Train the model (keep the returned history so we can plot the loss later)
hist = NN_model.fit(train, target, epochs=500, batch_size=32, validation_split=0.2, callbacks=callbacks_list)
DNN model- Load the best model
Let’s load the best model, which in this run appears to be Weights-180--17805.73619.hdf5.
Then we can start making predictions:
# Load the weights file of the best model (the best checkpoint)
weights_file = 'address/to/file/Weights-180--17805.73619.hdf5'
NN_model.load_weights(weights_file)

# Re-compile the network after loading the weights
NN_model.compile(loss='mean_absolute_error', optimizer='adam', metrics=['mean_absolute_error'])

# Make predictions
predictions = NN_model.predict(test)
Analyze the Loss
We can plot the loss to see how it changes over the epochs.
It seems that by around epoch 200 the model had stopped improving. How can we stop the model from over-training?
import matplotlib.pyplot as plt

plt.style.use('ggplot')

def plot_history(history):
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    x = range(1, len(loss) + 1)
    plt.figure(figsize=(12, 5))
    plt.plot(x, loss, 'b', label='Training loss')
    plt.plot(x, val_loss, 'r', label='Validation loss')
    plt.title('Training and validation loss')
    plt.legend()
    plt.show()

plot_history(hist)
Early stopping
We can use the following code for early stopping.
With early stopping, training finishes after about 79 epochs.
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop training once val_loss has not improved for 30 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=30)
checkpoint = ModelCheckpoint('Models/Weights-{epoch:03d}--{val_loss:.5f}.hdf5', monitor='val_loss', save_best_only=True)
callbacks = [early_stop, checkpoint]
hist = NN_model.fit(train, target, epochs=500, batch_size=32, validation_split=0.2, callbacks=callbacks)
What else can we control?
ReduceLROnPlateau
factor: factor by which the learning rate will be reduced. new_lr = lr * factor
patience: number of epochs with no improvement in the monitored quantity after which the learning rate will be reduced.
mode: one of {auto, min, max}. In min mode, lr will be reduced when the quantity monitored has stopped decreasing; in max mode it will be reduced when the quantity monitored has stopped increasing; in auto mode, the direction is automatically inferred from the name of the monitored quantity.
min_delta: threshold for measuring the new optimum, to only focus on significant changes.
cooldown: number of epochs to wait before resuming normal operation after lr has been reduced.
min_lr: lower bound on the learning rate.
CSVLogger
Callback that streams epoch results to a csv file.
keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=10, verbose=0, mode='auto', min_delta=0.0001, cooldown=0, min_lr=0)
keras.callbacks.CSVLogger(filename, separator=',', append=False)
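As a sketch of how these callbacks could be added to the earlier training call (the file name training_log.csv is only an example):
from keras.callbacks import ReduceLROnPlateau, CSVLogger

# Halve the learning rate after 10 stagnant epochs, and log every epoch to a CSV file
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=10, min_lr=1e-6)
csv_logger = CSVLogger('training_log.csv')

hist = NN_model.fit(train, target, epochs=500, batch_size=32, validation_split=0.2,
                    callbacks=[early_stop, checkpoint, reduce_lr, csv_logger])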
Assignment 7
We used an argument named patience in the early-stopping code. Change this number to 20 and see how the number of epochs changes. Based on your findings, what does patience do?
We trained a regression model but did not analyze its performance. Suggest a method for evaluating it. Note that since we are not dealing with categorical data, we cannot report accuracy.
Note: You must submit your code along with your answers.