
CS 570: Machine Learning
Problem Set 5: Due April 19
For this assignment, we’ll revisit the SPAM Email dataset with Keras.
1. (5 points) First Steps: Begin by reading the Keras Text: Word Embeddings tutorial1. The tutorial demonstrates two main ideas: (1) creating a simple model that uses word embeddings for classification; and (2) acquiring these embeddings (potentially for future use). Run the code from the word embeddings tutorial in your own notebook (you don't have to do the last step of visualizing the embeddings, but feel free to try it out).
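As a reference point, a minimal version of the tutorial's embedding classifier might look like the sketch below. The vocabulary size, sequence length, and embedding dimension are placeholder values for illustration, not ones prescribed by the tutorial.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Placeholder hyperparameters; in practice these come from your tokenizer.
vocab_size = 10000
sequence_length = 100
embedding_dim = 16

# Token ids -> trainable embeddings -> average pool -> dense -> sigmoid.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(sequence_length,)),
    layers.Embedding(vocab_size, embedding_dim),
    layers.GlobalAveragePooling1D(),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

The embedding layer here is trained from scratch; the tutorial's visualization step extracts its learned weights afterward.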
2. (5 points) Next Step: Next, read through the Keras blog post on using Convolutional Neural Networks for text classification2. You may need to do some minor work to make the tutorial run (full disclosure: I have not done this step; I went directly to the next problem).
3. (5 points) Load in the text from the SPAM data set and create a histogram showing the distribution of message lengths. Additionally, print the maximum, mean, and median message lengths. Note that the maximum message length is quite long.
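One way to sketch this step is shown below. The sample messages are made up for illustration; in your notebook, replace them with the actual SPAM messages loaded from the data set.

```python
import statistics
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt

# Hypothetical sample; load the real SPAM messages here instead.
messages = [
    "WINNER!! You have been selected for a free prize",
    "Are we still on for lunch today?",
    "URGENT: claim your reward now",
]

lengths = [len(m.split()) for m in messages]  # message length in words
print("max:   ", max(lengths))
print("mean:  ", statistics.mean(lengths))
print("median:", statistics.median(lengths))

plt.hist(lengths, bins=50)
plt.xlabel("message length (words)")
plt.ylabel("count")
plt.savefig("lengths.png")
```

Counting in words (rather than characters) keeps the statistics consistent with the padded length you choose in question 4.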
4. (10 points) Convolutional Neural Network: Use the architecture from question 2 (the CNN) to train on the spam detection problem. You'll need to pick a length for the representation of each document; use 2x the mean document length (in words, rounded to the nearest 1000). Use the 50-dimensional GloVe embeddings, and turn training off on the embedding layer (as in the tutorial). Document your performance after 10 epochs.
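Following the blog post's architecture, a sketch might look like the following. Here `vocab_size`, `maxlen`, and `embedding_matrix` are stand-ins you would build from your tokenizer and the GloVe 50-dimension file, and the final layer is adapted to a single sigmoid unit for binary spam/ham output (the blog post uses a softmax over 20 newsgroup classes).

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholders: build these from your tokenizer and the GloVe 50-d file.
vocab_size = 10000
maxlen = 2000  # stand-in for 2x your mean document length
embedding_matrix = np.random.rand(vocab_size, 50)  # stand-in for GloVe weights

model = tf.keras.Sequential([
    tf.keras.Input(shape=(maxlen,)),
    layers.Embedding(
        vocab_size, 50,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),  # embeddings frozen, as in the tutorial
    layers.Conv1D(128, 5, activation="relu"),
    layers.MaxPooling1D(5),
    layers.Conv1D(128, 5, activation="relu"),
    layers.MaxPooling1D(5),
    layers.Conv1D(128, 5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary spam/ham output
])
model.compile(optimizer="rmsprop",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

With `trainable=False`, the GloVe weights stay fixed and only the convolutional and dense layers learn during the 10 epochs.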
5. (10 points) Compare the performance of the CNN to a simple network that just uses the GloVe embeddings. For this step, you'll need an architecture that has similarities to both the CNN and the architecture from question 1. Specifically, you'll need an input layer and embedding layer as in question 4, but these should feed into a GlobalAveragePooling1D layer (as in question 1) followed by two Dense layers of sizes 50 and 1.
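The description above can be sketched as follows; as in question 4, the vocabulary size, padded length, and embedding matrix are placeholders for the values you derive from your own preprocessing.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Placeholders, as in question 4.
vocab_size = 10000
maxlen = 2000
embedding_matrix = np.random.rand(vocab_size, 50)  # stand-in for GloVe 50-d

# Frozen embedding as in question 4, average pooling as in question 1,
# then Dense(50) and Dense(1).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(maxlen,)),
    layers.Embedding(
        vocab_size, 50,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        trainable=False),
    layers.GlobalAveragePooling1D(),
    layers.Dense(50, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy"])
```

Because the embeddings are frozen, only the two Dense layers train here, which makes this a useful baseline against the much larger CNN.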
6. (5 points) Performance on Q4 was probably not ideal. Mine wasn't! Try dropping the padded length from 2x the mean document length to something notably smaller, or try modifying the CNN model. Can you achieve 94 percent accuracy or better on the test data? Provide an explanation for why you think your changes were important.
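Shortening the padded length only requires changing `maxlen` when you pad your tokenized messages; a tiny sketch with hypothetical token ids:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Hypothetical tokenized messages; in your notebook these come from the
# tokenizer fitted on the SPAM text.
sequences = [[4, 12, 7, 9, 3], [8, 2]]

# Try a maxlen much smaller than 2x the mean document length.
padded = pad_sequences(sequences, maxlen=3,
                       padding="post", truncating="post")
print(padded)  # each row padded or truncated to length 3
```

With `truncating="post"`, long messages keep their opening words, which is often where spam signals (e.g. "WINNER", "URGENT") appear; that is one plausible line of explanation to test in your write-up.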
Submit your code via Blackboard.
1 https://www.tensorflow.org/tutorials/text/word_embeddings
2 https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html