Homework Assignment #5 (Data Visualization)
There is an opening in the Data Visualization Team at Wholoo.com. To pre-screen potential candidates, the team lead invited all current interns to submit solutions to the following data visualization exercises. The DataViz team is a Python matplotlib shop, but they also like working with seaborn (https://seaborn.pydata.org), a Python visualization library that builds upon and extends matplotlib. You can use either “pure” matplotlib or seaborn in your solutions.
1. Deconstruct/reconstruct a graph exercise. The following Kaggle blog post tries to answer some interesting questions about movies using R ggplot2 data visualization: https://www.kaggle.com/gsdeepakkumar/imdb-database-visualisation-analysis. Your task is to take plots from sections 9 and 11 of this blog post (“What content has got maximum user reviews over the year?” and “How do the IMDB Scores vary by category?”) and re-plot the same data in a better way. More specifically,
a. for section 9 plot, plot the data without bar plots. Comment on why your choice of graphics shows the data in a more digestible form;
b. for section 11 plot, keep the box plots or some other variation of this type of plot, but significantly clean up the presentation of the plot.
Don’t try to use the same data that the author of the blog post used. Instead, use the small MovieLens dataset from https://grouplens.org/datasets/movielens/. You dealt with these data in homework assignment #2, but this time you should be able to load the data directly into a pandas DataFrame. The MovieLens data don’t have a content rating (PG13, PG, R, etc.), so instead of this categorical variable, use genre (Action, Adventure, etc.). The blog post questions for sections 9 and 11 therefore become “How many MovieLens reviews did each genre get in each year?” and “How do MovieLens reviews vary by genre?”
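As a starting point for the data preparation, here is a minimal sketch of one way to get from the raw files to a per-genre, per-year table. It assumes the small MovieLens release provides movies.csv (movieId, title, genres, with genres pipe-separated) and ratings.csv (userId, movieId, rating, timestamp in Unix seconds), and it treats the year of the rating timestamp as the "year" in the question; both are assumptions you should check against the files you download. The inline rows are made up for illustration, and the helper name ratings_by_genre is hypothetical.

```python
import io

import pandas as pd

# Made-up sample rows that mimic the assumed layout of movies.csv and ratings.csv.
MOVIES_CSV = """movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children
2,Heat (1995),Action|Crime|Thriller
3,Sabrina (1995),Comedy|Romance
"""

RATINGS_CSV = """userId,movieId,rating,timestamp
1,1,4.0,964982703
1,2,4.5,964981247
2,1,3.0,1445714835
2,3,2.5,1445714885
3,2,5.0,1305696483
"""


def ratings_by_genre(movies, ratings):
    """Return one row per (rating, genre) pair with the rating year attached."""
    # Split the pipe-separated genre string and give each genre its own row.
    per_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")
    # Attach genres to ratings; a movie with k genres contributes k rows per rating.
    merged = ratings.merge(per_genre[["movieId", "genre"]], on="movieId")
    # Assumption: "year" means the year in which the rating was submitted.
    merged["year"] = pd.to_datetime(merged["timestamp"], unit="s").dt.year
    return merged


# With the real data, replace the StringIO objects with the paths to the CSV files.
movies = pd.read_csv(io.StringIO(MOVIES_CSV))
ratings = pd.read_csv(io.StringIO(RATINGS_CSV))

long_df = ratings_by_genre(movies, ratings)

# Years as rows, genres as columns, number of ratings as values.
counts = long_df.groupby(["year", "genre"]).size().unstack(fill_value=0)
```

The wide counts table is a convenient input for a non-bar plot in part (a), and the long long_df table (columns rating and genre) is the shape that box-plot style functions in matplotlib or seaborn typically expect for part (b). Note that a movie listed under several genres is counted once per genre, which is a design choice worth mentioning in your comments.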
2. Copy the masters exercise. Reproduce the first two plots of the following FiveThirtyEight article: https://fivethirtyeight.com/features/fandango-movies-ratings/. The data used in the article can be found at https://github.com/fivethirtyeight/data/tree/master/fandango.
3. Another copy the masters exercise. Reproduce the plot from the following FiveThirtyEight article: https://fivethirtyeight.com/features/every-guest-jon-stewart-ever-had-on-the-daily-show/. The data used in the article is available here: https://github.com/fivethirtyeight/data/tree/master/daily-show-guests
What to submit:
● Please write up your solution in the form of a Jupyter notebook. For portions of the homework where you are asked to provide comments, put these in markdown cells.
● Upload to Canvas both
– your .ipynb notebook (remove any personal information such as your database password!)
– and a .html file (created from File → Download as → HTML in Jupyter) containing all the graphics, comments, etc.