COMP5046 Assignment 1¶
Make sure you change the file name with your unikey.
Readme¶
If there is anything the user should note, please mention it here.
If you are planning to implement the program in an Object Oriented Programming style, please use the OOP section at the end of this notebook.
Visualising the comparison of different results is a good way to justify your decisions.
1 – Data Preprocessing¶
1.1. Download Dataset¶
In [0]:
# Code to download file into Colaboratory:
!pip install -U -q PyDrive
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
id = '1vF3FqgBC1Y-RPefeVmY8zetdZG1jmHzT'
downloaded = drive.CreateFile({'id': id})
downloaded.GetContentFile('imdb_train.csv')
id = '1XhaV8YMuQeSwozQww8PeyiWMJfia13G6'
downloaded = drive.CreateFile({'id': id})
downloaded.GetContentFile('imdb_test.csv')
import pandas as pd
df_train = pd.read_csv("imdb_train.csv")
df_test = pd.read_csv("imdb_test.csv")
reviews_train = df_train['review'].tolist()
sentiments_train = df_train['sentiment'].tolist()
reviews_test = df_test['review'].tolist()
sentiments_test = df_test['sentiment'].tolist()
print("Training set number:", len(reviews_train))
print("Testing set number:", len(reviews_test))
1.2. Preprocess data¶
You are required to describe which data preprocessing techniques were conducted with justification of your decision.
For the reviews, I implement five preprocessing steps. The first step is case folding (lower-casing) the text: the first word of a sentence is usually capitalised, and if the original text were used directly, these capitalised words would be treated as different tokens from their lower-case forms. The second step is to remove punctuation, which carries little meaning on its own. The third step is to tokenize each review, since I need a list of individual tokens. The fourth step is to remove stopwords: stopwords contribute little to the main meaning of a sentence, so removing them helps to extract the core content. The last step is lemmatisation, which maps inflected forms to a common base form so that they are not treated as distinct words. For the sentiment labels, I encode the unique labels into numeric values.
In [0]:
# Please comment your code
import re
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
nltk.download('stopwords')
from nltk.corpus import stopwords as sw
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
# Remove punctuation
def remove_punctuation_re(x):
    x = re.sub(r'[^\w\s]', '', x)
    return x
reviews_train = [remove_punctuation_re(s) for s in reviews_train]
reviews_test = [remove_punctuation_re(s) for s in reviews_test]
# Case folding (Decapitalization)
reviews_train = [s.lower() for s in reviews_train]
reviews_test = [s.lower() for s in reviews_test]
# Tokenization
reviews_train = [word_tokenize(s) for s in reviews_train]
reviews_test = [word_tokenize(s) for s in reviews_test]
# Remove stopwords
stop_words = set(sw.words('english'))
reviews_train_ns = []
for tokens in reviews_train:
    filtered_sentence = [w for w in tokens if w not in stop_words]
    reviews_train_ns.append(filtered_sentence)
reviews_test_ns = []
for tokens in reviews_test:
    filtered_sentence = [w for w in tokens if w not in stop_words]
    reviews_test_ns.append(filtered_sentence)
# Lemmatisation
lemmatizer = WordNetLemmatizer()
reviews_train_le = []
for tokens in reviews_train_ns:
    lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens]
    reviews_train_le.append(lemma_sentence)
reviews_test_le = []
for tokens in reviews_test_ns:
    lemma_sentence = [lemmatizer.lemmatize(w) for w in tokens]
    reviews_test_le.append(lemma_sentence)
# Label encoding
from sklearn.preprocessing import LabelEncoder
import numpy as np
labels = np.unique(sentiments_train)
lEnc = LabelEncoder()
lEnc.fit(labels)
label_train_n = lEnc.transform(sentiments_train)
label_test_n = lEnc.transform(sentiments_test)
numClass = len(labels)
print(labels)
print(lEnc.transform(labels))
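As a quick optional sanity check, the cell below prints the first processed review and its encoded label, using only the variables defined above.
In [0]:
# Optional sanity check: inspect one processed review and its encoded label
print(reviews_train_le[0][:20])                      # first 20 tokens of the first cleaned review
print(sentiments_train[0], '->', label_train_n[0])   # original label vs. encoded label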
In [0]:
del df_train, df_test, reviews_train, reviews_test, reviews_train_ns, reviews_test_ns, sentiments_train, sentiments_test
In [0]:
2 – Model Implementation¶
2.1. Word Embeddings¶
You are required to describe which model was implemented (i.e. Word2Vec with CBOW, FastText with SkipGram, etc.) with justification of your decision
In this section, I choose to implement Word2Vec with SkipGram in PyTorch. First, although FastText can be trained faster and can deal with words that are not in the training vocabulary, it loses the sequence information between words, so I choose Word2Vec. Secondly, CBOW learns to predict a word from its context; in other words, it maximises the probability of the target word given the surrounding words, and this turns out to be a problem for rare words. For example, given the context "yesterday was a really [...] day", a CBOW model will tell you that the missing word is most probably "beautiful" or "nice". A word like "delightful" gets much less attention from the model, because CBOW is designed to predict the most probable word, and rare words are smoothed over by the many examples containing more frequent words. The skip-gram model, on the other hand, is designed to predict the context: given the word "delightful", it must understand it and tell us that there is a high probability that the context is "yesterday was a really [...] day", or some other relevant context. With skip-gram, "delightful" does not have to compete with "beautiful"; instead, each "delightful" + context pair is treated as a new observation. Since the training set for this project is not very large and contains many rare words (the word frequency distribution is shown in the figure below), SkipGram is likely to perform better than CBOW.
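To make the skip-gram idea concrete, the following sketch (illustrative only, using a toy sentence rather than the assignment data) lists the (target, context) pairs that a window size of 1 produces.
In [0]:
# Illustrative sketch: (target, context) pairs for a toy sentence with window size 1
toy = "yesterday was a really delightful day".split()
pairs = []
for i, target in enumerate(toy):
    for j in (i - 1, i + 1):        # one word to the left and one to the right
        if 0 <= j < len(toy):
            pairs.append((target, toy[j]))
print(pairs)   # e.g. ('delightful', 'really'), ('delightful', 'day'), ...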
2.1.1. Data Preprocessing for Word Embeddings¶
You are required to describe which preprocessing techniques were used with justification of your decision.
Important: if you are going to use the code from the Lab 3 word2vec preprocessing, please note that word_list = list(set(word_list)) has randomness, so to make sure word_list is the same every time you run it, you can put word_list.sort() after that line of code.
In this section, I use a frequency distribution plot to visualise word frequencies and then remove the less frequent (and less meaningful) words, which makes the model more stable and reduces the running time.
In [0]:
# Please comment your code
# Plot the unique word frequency distribution
import matplotlib
import matplotlib.pyplot as plt
dict1 = {}
for sentence in reviews_train_le:
    for key in sentence:
        dict1[key] = dict1.get(key, 0) + 1
for sentence in reviews_test_le:
    for key in sentence:
        dict1[key] = dict1.get(key, 0) + 1
print(len(dict1))
freq = []
for key, values in dict1.items():
    freq.append(values)
print(freq)
plt.figure(0)
plt.xlabel('Word Frequency')
plt.ylabel('Number of Unique Words')
plt.title('Word Frequency Distribution')
plt.hist(freq, bins=20)
freq_100 = []
for i in freq:
    if i <= 100:
        freq_100.append(i)
plt.figure(1)
plt.xlabel('Word Frequency')
plt.ylabel('Number of Unique Words')
plt.title('Word Frequency (Less than 100) Distribution')
plt.hist(freq_100, bins=20)
In [0]:
del freq, freq_100
In [0]:
# Remove less frequent words
reviews_train_clean = []
reviews_test_clean = []
for tokens in reviews_train_le:
    temp = []
    for token in tokens:
        if dict1[token] >= 6:
            temp.append(token)
    reviews_train_clean.append(temp)
for tokens in reviews_test_le:
    temp = []
    for token in tokens:
        if dict1[token] >= 6:
            temp.append(token)
    reviews_test_clean.append(temp)
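The effect of this filter on the vocabulary can be checked directly from the frequency dictionary (an optional check, using the dict1 counts built above).
In [0]:
# Optional check: vocabulary size before and after the frequency filter
print('Unique words before filtering:', len(dict1))
print('Unique words kept (freq >= 6):', sum(1 for v in dict1.values() if v >= 6))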
In [0]:
del reviews_train_le, reviews_test_le, dict1
In [0]:
import numpy as np
# Get the word sequence list
word_sequence = []
for tokens in reviews_train_clean:
    for token in tokens:
        word_sequence.append(token)
for tokens in reviews_test_clean:
    for token in tokens:
        word_sequence.append(token)
# Get a vocabulary list of the unique words
word_list = sorted(list(set(word_sequence)))
voc_size = len(word_list)
print(voc_size)
# Make a dictionary so that we can reference the index of each unique word
word_dict = {w: i for i, w in enumerate(word_list)}
# Make window-size-1 skip-gram pairs
# e.g. "he likes cat"
# -> (he, [likes]), (likes, [he, cat]), (cat, [likes])
# -> (he, likes), (likes, he), (likes, cat), (cat, likes)
skip_grams = []
for i in range(1, len(word_sequence) - 1):
    # (target, context) : (target, [target index - 1, target index + 1])
    target = word_dict[word_sequence[i]]
    context = [word_dict[word_sequence[i - 1]], word_dict[word_sequence[i + 1]]]
    # skip-grams: (target, context[0]), (target, context[1]), ...
    for w in context:
        skip_grams.append([target, w])
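An optional check of the generated pairs: the cell below prints how many (target, context) pairs were produced and decodes one of them back to words.
In [0]:
# Optional check: number of skip-gram pairs and one decoded example
print('Number of (target, context) pairs:', len(skip_grams))
t, c = skip_grams[0]
print('Sample pair:', word_list[t], '->', word_list[c])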
In [0]:
del word_sequence
In [0]:
# Prepare a random batch from the skip-gram pairs: the full pair list is too large to train on at once, so we randomly sample a mini-batch
def prepare_batch(data, size):
    random_inputs = []
    random_labels = []
    random_index = np.random.choice(range(len(data)), size, replace=False)
    for i in random_index:
        input_temp = [0] * voc_size
        input_temp[data[i][0]] = 1        # one-hot encoding of the target word
        random_inputs.append(input_temp)  # target
        random_labels.append(data[i][1])  # context word
    return np.array(random_inputs), np.array(random_labels)
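A small usage example for prepare_batch (optional): the inputs are one-hot vectors of dimension voc_size and the labels are context-word indices.
In [0]:
# Optional usage example: draw a tiny batch and check its shapes
sample_inputs, sample_labels = prepare_batch(skip_grams, 4)
print(sample_inputs.shape)   # expected: (4, voc_size) one-hot target words
print(sample_labels.shape)   # expected: (4,) context-word indices
del sample_inputs, sample_labels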
2.1.2. Build Word Embeddings Model¶
You are required to describe how hyperparameters were decided with justification of your decision.
I set the learning rate to 0.01. If the learning rate is too large, the gradient descent process becomes unstable; if it is too small, training takes a long time or may get trapped in a local minimum. The batch size is best chosen as a power of two; after experimenting, I choose 1024. If the batch size is too small, the model is not trained properly, while a very large batch size makes training slow. The embedding size is usually chosen in the range of 50 to 300; because the vocabulary is large and the available RAM is limited, I choose 50. According to the figure below, the cost does not change much once the number of epochs exceeds 90, so I choose 90 as the total number of epochs.
In [0]:
# Please comment your code
# Hyperparameters we set
word_learning_rate = 0.005
batch_size =1024
embedding_size = 20
word_epochs = 30
In [0]:
# Import the packages for the word embeddings model and set the device (GPU if available, otherwise CPU)
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.metrics import accuracy_score
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
In [0]:
# SkipGram model for Word2Vec
class SkipGram(nn.Module):
    def __init__(self):
        super(SkipGram, self).__init__()
        self.linear1 = nn.Linear(voc_size, embedding_size, bias=False)
        self.linear2 = nn.Linear(embedding_size, voc_size, bias=False)
    def forward(self, x):
        hidden = self.linear1(x)
        out = self.linear2(hidden)
        return out
word_emb_model = SkipGram().to(device)
criterion = nn.CrossEntropyLoss()
optimiser = optim.Adam(word_emb_model.parameters(), lr=word_learning_rate)
epoch_num = []
loss_num = []
2.1.3. Train Word Embeddings Model¶
In [0]:
for epoch in range(word_epochs):
    inputs, embedding_labels = prepare_batch(skip_grams, batch_size)
    inputs_torch = torch.from_numpy(inputs).float().to(device)
    labels_torch = torch.from_numpy(embedding_labels).to(device)
    # 1. zero the gradients
    word_emb_model.train()
    optimiser.zero_grad()
    # 2. forward propagation
    outputs = word_emb_model(inputs_torch)
    # 3. calculate loss
    loss = criterion(outputs, labels_torch)
    # 4. back propagation
    loss.backward()
    optimiser.step()
    print('Epoch: %d, loss: %.4f' % (epoch + 1, loss.item()))
    epoch_num.append(epoch)
    loss_num.append(loss.item())
plt.xlabel('Number of Epochs')
plt.ylabel('Loss')
plt.title('Word Embeddings')
plt.plot(epoch_num, loss_num)
plt.show()
2.1.4. Save Word Embeddings Model¶
In [0]:
# Please comment your code
torch.save(word_emb_model, 'word_emb_model.pt')
In [0]:
del word_emb_model
2.1.5. Load Word Embeddings Model¶
In [0]:
# Please comment your code
word_emb_model = torch.load('word_emb_model.pt')
word_emb_model.eval()
# Get word embedding list
weight1 = word_emb_model.linear1.weight
word_embeddings = weight1.detach().T.cpu().numpy()
print(word_embeddings.shape)
In [0]:
def get_embeddings(corpus, word_embeddings):
    out = {}
    for i in corpus:
        out[i] = word_embeddings[word_dict[i]].tolist()
    return out
In [0]:
word_emb_dict = get_embeddings(word_list, word_embeddings)
In [0]:
del word_embeddings, weight1, word_emb_model
In [0]:
len(word_emb_dict)
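As an optional qualitative check of the learned embeddings, cosine similarity can be used to list the nearest neighbours of a probe word. The probe word 'good' is only an example and assumes it survived the frequency filtering; any frequent word can be substituted.
In [0]:
# Optional qualitative check: nearest neighbours by cosine similarity
def nearest_words(word, emb_dict, k=5):
    if word not in emb_dict:
        return []
    words = list(emb_dict.keys())
    vecs = np.array(list(emb_dict.values()))
    v = np.array(emb_dict[word])
    sims = vecs @ v / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(v) + 1e-8)
    order = np.argsort(-sims)
    return [words[i] for i in order[1:k+1]]   # skip the probe word itself
print(nearest_words('good', word_emb_dict))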
2.2. Character Embeddings¶
2.2.1. Data Preprocessing for Character Embeddings¶
You are required to describe which preprocessing techniques were used with justification of your decision.
In [0]:
# Please comment your code
word_len_list = [len(s) for s in word_list]
print(max(word_len_list))
print(sum([1 for l in word_len_list if l<=15])/len(word_len_list))
plt.hist(word_len_list, bins = 20)
In [0]:
# Please comment your code
# We have the following character instances
word_len = 15
char_list = []
word_list_pad = []
for word in word_list:
    if len(word) > word_len:
        # Truncate words longer than 15 characters
        word_list_pad.append(word[:word_len])
    else:
        # Pad shorter words with the '@' character
        for j in range(word_len - len(word)):
            word += "@"
        word_list_pad.append(word)
for word in word_list_pad:
    for char in word:
        char_list.append(char)
char_arr = sorted(list(set(char_list)))
# one-hot encoding and decoding
num_dic = {n: i for i, n in enumerate(char_arr)}
dic_len = len(num_dic)
# Make a batch with sequence data for input and output
def make_batch(seq_data):
    input_batch = []
    target_batch = []
    for idx, seq in enumerate(seq_data):
        input_data = [num_dic[n] for n in seq]
        # the target is the word embedding of the corresponding word
        target = word_emb_dict[word_list[idx]]
        # convert input to one-hot encoding.
        # if input is [3, 4, 4]:
        # [[ 0, 0, 0, 1, 0, 0, 0, ... 0]
        #  [ 0, 0, 0, 0, 1, 0, 0, ... 0]
        #  [ 0, 0, 0, 0, 1, 0, 0, ... 0]]
        input_batch.append(np.eye(dic_len)[input_data])
        target_batch.append(target)
    return input_batch, target_batch
def char_batch(inputs, targets, size):
    random_inputs = []
    random_labels = []
    random_index = np.random.choice(range(len(inputs)), size, replace=False)
    for i in random_index:
        random_inputs.append(inputs[i])
        random_labels.append(targets[i])
    return random_inputs, random_labels
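A quick optional shape check for the character batches: each word becomes a (word_len, dic_len) one-hot matrix, and its target is the corresponding word embedding.
In [0]:
# Optional shape check on a tiny slice of the padded word list
sample_in, sample_tg = make_batch(word_list_pad[:3])
print(np.array(sample_in).shape)   # expected: (3, 15, dic_len) one-hot character matrices
print(np.array(sample_tg).shape)   # expected: (3, embedding_size) word-embedding targets
del sample_in, sample_tg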
2.2.2. Build Character Embeddings Model¶
You are required to describe how hyperparameters were decided with justification of your decision.
In [0]:
# Please comment your code
# Hyperparameters
char_learning_rate = 0.1
char_batch_size = 1024
n_hidden = 20
char_epochs = 60
# Number of time steps for the RNN
n_step = 3
# Number of inputs (dimension of each input vector, i.e. the size of the character one-hot encoding)
n_input = dic_len
# Output dimension (matches the word embedding size, since the character model regresses onto the word embeddings)
n_class = 20
2.2.3. Train Character Embeddings Model¶
In [0]:
# Please comment your code
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.lstm = nn.LSTM(n_input, n_hidden, batch_first=True, bidirectional=True, dropout=0.2)
        self.linear = nn.Linear(n_hidden*2, n_class)
    def forward(self, sentence):
        # h_n of shape (num_layers * num_directions, batch, hidden_size): the hidden state for t = seq_len
        lstm_out, (h_n, c_n) = self.lstm(sentence)
        # concatenate the last hidden states from the two directions
        hidden_out = torch.cat((h_n[0, :, :], h_n[1, :, :]), 1)
        z = self.linear(hidden_out)
        return z, hidden_out
# Move the model to the GPU (or CPU if no GPU is available)
char_emb_model = Net().to(device)
# Loss function and optimizer
char_criterion = nn.MSELoss()
char_optimizer = optim.Adam(char_emb_model.parameters(), lr=char_learning_rate)
epoch_num = []
loss_num = []
In [0]:
# Preparing input
char_input, char_target = make_batch(word_list_pad)
In [0]:
del word_list_pad
In [0]:
for epoch in range(char_epochs):
    char_input_batch, char_target_batch = char_batch(char_input, char_target, char_batch_size)
    # Convert inputs into tensors and move them to the device
    char_input_batch_torch = torch.from_numpy(np.array(char_input_batch)).float().to(device)
    char_target_batch_torch = torch.from_numpy(np.array(char_target_batch)).float().to(device)
    # Set the flag to training
    char_emb_model.train()
    # forward + backward + optimize
    char_optimizer.zero_grad()
    outputs, _ = char_emb_model(char_input_batch_torch)
    loss = char_criterion(outputs, char_target_batch_torch)
    loss.backward()
    char_optimizer.step()
    # Set the flag to evaluation, which will 'turn off' the dropout
    char_emb_model.eval()
    outputs, _ = char_emb_model(char_input_batch_torch)
    # Evaluation loss
    loss = char_criterion(outputs, char_target_batch_torch)
    print('Epoch: %d, loss: %.5f' % (epoch + 1, loss.item()))
    epoch_num.append(epoch)
    loss_num.append(loss.item())
plt.xlabel('Number of Epochs')
plt.ylabel('Loss')
plt.title('Character Embeddings')
plt.plot(epoch_num, loss_num)
plt.show()
2.2.4. Save Character Embeddings Model¶
In [0]:
# Please comment your code
torch.save(char_emb_model, 'char_emb_model.pt')
In [0]:
del char_emb_model
2.2.5. Load Character Embeddings Model¶
In [0]:
# Please comment your code
char_emb_model = torch.load('char_emb_model.pt')
char_emb_model.eval()
# Get character embedding list
char_input = torch.from_numpy(np.array(char_input)).float().to(device)
_, hidden_state = char_emb_model(char_input)
char_embeddings = hidden_state.detach().cpu().numpy()
print(char_embeddings.shape)
In [0]:
char_emb_dict = get_embeddings(word_list, char_embeddings)
In [0]:
del char_embeddings
In [0]:
len(char_emb_dict)
2.3. Sequence model¶
2.3.1. Apply/Import Word Embedding and Character Embedding Model¶
You are required to describe how hyperparameters were decided with justification of your decision.
In [0]:
# Plot sequence length distribution
len_list = [len(s) for s in reviews_train_clean]
print(sum([1 for l in len_list if l<=400])/len(len_list))
plt.hist(len_list, bins = 20)
In [0]:
# Padding
seq_length = 300
def add_padding(corpus, seq_length):
    output = []
    for sentence in corpus:
        if len(sentence) > seq_length:
            # Truncate long reviews
            output.append(sentence[:seq_length])
        else:
            # Pad short reviews with the '<pad>' token
            for j in range(seq_length - len(sentence)):
                sentence.append('<pad>')
            output.append(sentence)
    return output
reviews_train_pad = add_padding(reviews_train_clean, seq_length)
reviews_test_pad = add_padding(reviews_test_clean, seq_length)
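A toy illustration of the padding/truncation behaviour (optional; the token lists below are made up and are not taken from the dataset).
In [0]:
# Toy illustration of add_padding: long reviews are truncated, short ones are padded
toy_corpus = [['good', 'movie'], ['a', 'very', 'long', 'review', 'indeed']]
print(add_padding(toy_corpus, 4))
# expected: [['good', 'movie', '<pad>', '<pad>'], ['a', 'very', 'long', 'review']]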
In [0]:
del reviews_train_clean, reviews_test_clean
In [0]:
# Add embeddings for the '<pad>' token (zero vectors of the matching dimensions)
word_emb_dict['<pad>'] = [0] * embedding_size
char_emb_dict['<pad>'] = [0] * (n_hidden * 2)
# Get embeddings: concatenate the word embedding and the character embedding for each token
def get_seq_embeddings(corpus, word_emb_dict, char_emb_dict):
    out = []
    for sentence in corpus:
        out_temp = []
        for word in sentence:
            out_temp.append(word_emb_dict[word] + char_emb_dict[word])
        out.append(out_temp)
    return np.array(out)
In [0]:
# Generate embeddings for training set
train_seq_emb = get_seq_embeddings(reviews_train_pad, word_emb_dict, char_emb_dict)
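An optional dimension check: each review should now be a (seq_length, word embedding size + character embedding size) matrix.
In [0]:
# Optional check: (number of training reviews, seq_length, combined embedding dimension)
print(train_seq_emb.shape)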
In [0]:
del reviews_train_pad
In [0]:
# Generate embeddings for test set
test_seq_emb = get_seq_embeddings(reviews_test_pad, word_emb_dict, char_emb_dict)
In [0]:
del reviews_test_pad, word_emb_dict, char_emb_dict
2.3.2. Build Sequence Model¶
You are required to describe how hyperparameters were decided with justification of your decision.
In [0]:
# Please comment your code
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from sklearn.metrics import accuracy_score
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
In [0]:
# prepare random batch
def seq_batch(data, label, size):
    random_inputs = []
    random_labels = []
    random_index = np.random.choice(range(len(data)), size, replace=False)
    for i in random_index:
        random_inputs.append(data[i])   # review embedding sequence
        random_labels.append(label[i])  # sentiment label
    return np.array(random_inputs), np.array(random_labels)
In [0]:
# Hyperparameters
# Input dimension = word embedding size + character embedding size (2 * 20 from the bidirectional character model)
n_input = embedding_size + 2 * 20
n_hidden = 128
n_class = numClass
seq_epochs = 100
seq_learning_rate = 4e-4
seq_batch_size = 512
2.3.3. Train Sequence Model¶
In [0]:
# Please comment your code
class SeqNet(nn.Module):
    def __init__(self):
        super(SeqNet, self).__init__()
        self.lstm = nn.LSTM(n_input, n_hidden, num_layers=2, batch_first=True, bidirectional=True, dropout=0.2)
        self.linear = nn.Linear(n_hidden*2, n_class)
    def forward(self, sentence):
        # h_n of shape (num_layers * num_directions, batch, hidden_size)
        lstm_out, (h_n, c_n) = self.lstm(sentence)
        # concatenate the last hidden states from the two directions of the final layer
        hidden_out = torch.cat((h_n[-2, :, :], h_n[-1, :, :]), 1)
        z = self.linear(hidden_out)
        log_output = F.log_softmax(z, dim=1)
        return log_output, hidden_out
seq_model = SeqNet().to(device)
seq_criterion = nn.NLLLoss()
seq_optimizer = optim.Adam(seq_model.parameters(), lr=seq_learning_rate)
In [0]:
from sklearn.metrics import f1_score
epoch_num = []
f1_val = []
for epoch in range(seq_epochs):
    seq_input_batch_torch, seq_target_batch_torch = seq_batch(train_seq_emb, label_train_n, seq_batch_size)
    seq_input_batch_torch = torch.from_numpy(seq_input_batch_torch).float().to(device)
    seq_target_batch_torch = torch.from_numpy(seq_target_batch_torch).view(-1).to(device)
    seq_model.train()
    seq_optimizer.zero_grad()
    outputs, _ = seq_model(seq_input_batch_torch)
    loss = seq_criterion(outputs, seq_target_batch_torch)
    loss.backward()
    seq_optimizer.step()
    seq_model.eval()
    outputs, _ = seq_model(seq_input_batch_torch)
    if epoch % 10 == 9:
        loss = seq_criterion(outputs, seq_target_batch_torch)
        _, predicted = torch.max(outputs, 1)
        acc = accuracy_score(predicted.cpu().numpy(), seq_target_batch_torch.cpu().numpy())
        print('Epoch: %d, loss: %.5f, train_acc: %.2f' % (epoch + 1, loss.item(), acc))
    if epoch % 100 == 99:
        # Evaluate on the test set in batches of 2500 reviews
        test_input_torch = torch.from_numpy(np.array(test_seq_emb)).float().to(device)
        test_target_torch = torch.from_numpy(label_test_n).float().to(device)
        preds = torch.empty(25000)
        for test_batch in range(10):
            test_input_batch_torch = test_input_torch[2500*test_batch:2500*(test_batch+1)]
            test_target_batch_torch = test_target_torch[2500*test_batch:2500*(test_batch+1)]
            outputs, _ = seq_model(test_input_batch_torch)
            _, predicted = torch.max(outputs, 1)
            preds[2500*test_batch:2500*(test_batch+1)] = predicted.cpu()
        f1 = f1_score(np.array(label_test_n), preds.cpu().numpy(), average='weighted')
        epoch_num.append(epoch)
        f1_val.append(f1)
print('Finished Training')
In [0]:
2.3.4. Save Sequence Model¶
In [0]:
# Please comment your code
torch.save(seq_model, 'seq_net.pt')
2.3.5. Load Sequence Model¶
In [0]:
# Please comment your code
seq_model = torch.load('seq_net.pt')
seq_model.eval()
3 – Evaluation¶
(Please show your empirical evidence)
3.1. Performance Evaluation¶
You are required to provide the table with precision, recall, f1 of test set.
In [0]:
# Please comment your code
test_input_torch = torch.from_numpy(np.array(test_seq_emb)).float().to(device)
test_target_torch = torch.from_numpy(label_test_n).float().to(device)
preds = torch.empty(25000)
# Evaluate the test set in batches of 2500 reviews
for test_batch in range(10):
    test_input_batch_torch = test_input_torch[2500*test_batch:2500*(test_batch+1)]
    test_target_batch_torch = test_target_torch[2500*test_batch:2500*(test_batch+1)]
    outputs, _ = seq_model(test_input_batch_torch)
    _, predicted = torch.max(outputs, 1)
    preds[2500*test_batch:2500*(test_batch+1)] = predicted.cpu()
from sklearn.metrics import classification_report, f1_score
print(classification_report(np.array(label_test_n), preds.cpu().numpy(), digits=4))
f1 = f1_score(np.array(label_test_n), preds.cpu().numpy(), average='weighted')
print('Weighted average f1-score: %.4f' % f1)
3.2. Hyperparameter Testing¶
You are required to draw a graph (y-axis: f1, x-axis: epoch) for the test set and explain the optimal number of epochs based on the learning rate you have already chosen.
In [0]:
# Please comment your code
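A minimal sketch for the required graph, assuming epoch_num and f1_val were populated by the training loop in 2.3.3. Note that the loop above only evaluates the test set every 100 epochs, so with seq_epochs = 100 there is a single point; evaluating more frequently (e.g. every 10 epochs) gives a more informative curve.
In [0]:
# Plot the weighted f1-score on the test set against the training epoch
plt.xlabel('Number of Epochs')
plt.ylabel('Weighted F1-score (test set)')
plt.title('Sequence Model: F1 vs. Epoch')
plt.plot(epoch_num, f1_val)
plt.show()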
Object Oriented Programming codes here¶
You can use multiple code snippets. Just add more if needed
In [0]:
# If you used OOP style, use this section