INM432 Big Data
Coursework Part 2
Machine Learning in the Cloud
This coursework is about using Google Cloud for machine learning with the Cloud ML framework. It can be done individually or in pairs (recommended).
Preparation:
Run the following Cloud ML example:
https://cloud.google.com/blog/products/gcp/how-to-classify-images-with-tensorflow-using-google-cloud-machine-learning-and-cloud-dataflow
The relevant code is available here:
https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/flowers
It is an application of the well-known Inception model to a relatively small "Flowers" dataset (3600 images, 5 classes).
Task 1: (40%) Modify the source code so that it uses a larger dataset. You can use the Coastline dataset, which is available on Google Cloud (gs://tamucc_coastline/, ~10000 images), with some explanations here: https://codelabs.developers.google.com/codelabs/scd-coastline/index.html?index=..%2F..cloud-quest-scientific-data#2
If you feel adventurous, you can use the Cartoon Set (100,000 images with labels, https://google.github.io/cartoonset/download.html) or part of the extended OpenImages dataset (478,000 images in total across 6,000+ categories, available at https://storage.googleapis.com/openimages/web/extended.html).
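As a starting point for switching datasets, the following is a minimal sketch of how training and evaluation manifests could be generated for a new image collection. It assumes that the preprocessing step reads CSV lines of the form "image URI,label" together with a list of label names; check the sample's own documentation for the exact format it expects. The bucket path, file names and class names below are hypothetical placeholders, not objects from any of the datasets above.

```python
import csv
import hashlib
import io


def assign_split(uri, eval_percent=10):
    """Deterministically place an image in 'train' or 'eval' based on its URI."""
    digest = hashlib.md5(uri.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return "eval" if bucket < eval_percent else "train"


def build_manifests(labelled_uris, eval_percent=10):
    """Return (train_csv, eval_csv, labels) as text for a list of (uri, label) pairs."""
    buffers = {"train": io.StringIO(), "eval": io.StringIO()}
    writers = {
        name: csv.writer(buf, lineterminator="\n") for name, buf in buffers.items()
    }
    labels = sorted({label for _, label in labelled_uris})
    for uri, label in labelled_uris:
        writers[assign_split(uri, eval_percent)].writerow([uri, label])
    return buffers["train"].getvalue(), buffers["eval"].getvalue(), labels


# Hypothetical object names and labels, for illustration only.
examples = [
    ("gs://my-bucket/images/img_%04d.jpg" % i, "class_a" if i % 2 else "class_b")
    for i in range(200)
]
train_csv, eval_csv, labels = build_manifests(examples)
```

Hashing the URI rather than shuffling keeps the split stable across runs, so an image never moves between the training and evaluation sets when the manifests are regenerated.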
Task 2: (30%) Modify the project so that it uses different server/cluster configurations, with and without GPUs. Measure the training time for each configuration and document how the metrics develop during training using TensorBoard.
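The bookkeeping for comparing training times can be sketched as follows. This is only an illustration: the configuration names are hypothetical, and the sleep calls stand in for real training runs. For jobs that run in the cloud, the timestamps reported for the job itself are likely a better source than a local timer.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(config_name):
    """Record the wall-clock duration of the enclosed block under config_name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[config_name] = time.perf_counter() - start


def speedup(baseline, candidate):
    """How many times faster the candidate run was than the baseline run."""
    return timings[baseline] / timings[candidate]


# Placeholder workloads stand in for real training runs.
with timed("cpu_only"):
    time.sleep(0.02)
with timed("single_gpu"):
    time.sleep(0.01)

for name, seconds in sorted(timings.items()):
    print("%-12s %.3f s" % (name, seconds))
```

Reporting the ratio from speedup("cpu_only", "single_gpu") alongside the raw times makes configurations easier to compare in the report.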
Task 3: (30%) Explore the effect of dropout on both the Flowers dataset and your larger dataset.
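To make clear what the dropout rate controls, here is a minimal, framework-independent sketch of inverted dropout on a list of activations. It is not the code used in the flowers sample; the function name and parameters are chosen for illustration only.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        # At evaluation time the activations pass through unchanged.
        return list(activations)
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]


rng = random.Random(0)
units = [1.0] * 10000
for rate in (0.0, 0.25, 0.5):
    out = dropout(units, rate, rng=rng)
    zeroed = sum(1 for v in out if v == 0.0) / len(out)
    mean = sum(out) / len(out)
    print("rate=%.2f zeroed=%.3f mean=%.3f" % (rate, zeroed, mean))
```

Because the surviving units are divided by the keep probability, the expected activation stays the same for every rate, which is why no rescaling is needed at evaluation time.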
Task 4 for pairs: (40%) Implement data augmentation, based on the method in lab 9, and evaluate its effect.
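The general idea of data augmentation can be sketched with two common transformations, a random horizontal flip and a random crop. This is a generic illustration on a tiny image stored as a list of pixel rows; it is not the method from lab 9, and the function names are chosen for illustration only.

```python
import random


def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    height, width = len(image), len(image[0])
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(image, crop_h, crop_w, rng=random):
    """Flip with probability 0.5, then take a random crop."""
    if rng.random() < 0.5:
        image = horizontal_flip(image)
    return random_crop(image, crop_h, crop_w, rng)


# A tiny 4x4 single-channel "image" with distinct pixel values.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sample = augment(image, 3, 3, rng=random.Random(0))
```

Augmentation is applied to training images only, so the evaluation set stays unchanged and the effect can be measured on the same data with and without it.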
More detailed information will follow for the tasks. Write a short report on the implementation and interpret the effects you observe. Submit your code and outputs.