AugustReferralExercise 01

your ID here

Table of Contents
0. Setup the document
1. Read and Investigate the Data
2. Cluster Recipes
3. Investigate Clusters
4. Conclusion

0. Setup the document
Clear the workspace
Load all the packages you need for this notebook here.
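A minimal setup sketch for the two steps above. The package list is an assumption inferred from the functions used later in the notebook (read_csv comes from readr, which tidyverse attaches, and fviz_cluster comes from factoextra); adjust it to whatever your own solution needs.

```r
# Clear the workspace: remove every object from the global environment.
rm(list = ls())

# Assumed package list, inferred from read_csv() and fviz_cluster() used later.
pkgs <- c("tidyverse", "factoextra")

# Check which of the packages are installed before attaching them.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach the packages that are available.
for (p in pkgs[installed]) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
```

Any package reported as not installed can be added with install.packages() before re-running the chunk.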
1. Read and Investigate the Data
Read in the data.
recipes <- read_csv('./recipes.csv')
## -- Column specification --------------------------------------------------------
##   recipe_id = col_double(),
##   ingredient = col_character(),
##   cuisine = col_character()
Make sure the ID variable is a character or factor, not numeric.
# your code here
What variables are in the dataset? What data type are they?
# your code here
How many unique ingredients are there?
# your code here
What are the top five most frequent ingredients?
# your code here
Create a frequency bar chart showing the distribution of ingredients.
# your code here
Optional: Create a dot plot of the top 5% of ingredients.
# your code here
2. Cluster Recipes
Use this Jaccard similarity function to compare the sets of ingredients for each pair of recipes.
source('jaccard_matrix.R')
# your code here
Use kmeans to cluster the recipes. First determine the appropriate number of centroids to use. Compute the total within-cluster sum of squares using between one and twenty clusters. Then plot the results. Use the elbow method to determine the number of clusters.
# your code here
Use your chosen number of clusters to cluster the recipes. Use a high nstart to ensure a good quality solution.
# your code here
Use fviz_cluster to visualize the clusters. What patterns do you see?
# your code here
3. Investigate Clusters
Merge the clusters back into the recipe dataset so that each recipe has a new variable with the cluster ID.
# your code here
What are the top 5 ingredients and cuisines in each cluster?
# your code here
# your code here
4. Conclusion
What patterns did the cluster analysis and PCA from fviz_cluster reveal?
What further cleaning or processing of the data do you think is necessary?
What other visualizations or analyses could you perform to explain the patterns that you see in the data?
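The clustering steps in sections 2 and 3 can be sketched in base R on a toy dataset. This is an illustration under stated assumptions, not the course's solution: the contents of jaccard_matrix.R are not shown in the exercise, so jaccard_pair below is a hypothetical stand-in, and the toy data frame is made up to mimic the long format of recipes.csv. With only four toy recipes the elbow scan runs over one to three clusters rather than one to twenty.

```r
# Toy data in the same long format as recipes.csv (values invented for illustration).
toy <- data.frame(
  recipe_id  = c("r1", "r1", "r1", "r2", "r2", "r3", "r3", "r3", "r4", "r4"),
  ingredient = c("salt", "garlic", "olive oil",
                 "salt", "garlic",
                 "sugar", "butter", "flour",
                 "sugar", "butter"),
  stringsAsFactors = FALSE
)

# Jaccard similarity of two sets: size of the intersection over size of the union.
# Hypothetical stand-in for the function sourced from jaccard_matrix.R.
jaccard_pair <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# One ingredient set per recipe, then a pairwise similarity matrix.
ingredient_sets <- split(toy$ingredient, toy$recipe_id)
ids <- names(ingredient_sets)
sim <- outer(
  seq_along(ids), seq_along(ids),
  Vectorize(function(i, j) jaccard_pair(ingredient_sets[[i]], ingredient_sets[[j]]))
)
dimnames(sim) <- list(ids, ids)

# Elbow method: total within-cluster sum of squares for each candidate k.
set.seed(1)
max_k <- nrow(sim) - 1
wss <- vapply(
  seq_len(max_k),
  function(k) kmeans(sim, centers = k, nstart = 25)$tot.withinss,
  numeric(1)
)
plot(seq_len(max_k), wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")

# Cluster with the chosen k (2 for the toy data) and a high nstart.
km <- kmeans(sim, centers = 2, nstart = 25)

# Merge the cluster ID back onto every row of the long-format data.
toy$cluster <- unname(km$cluster[toy$recipe_id])
```

Each row of the similarity matrix serves as the feature vector for one recipe, so recipes that share ingredients with the same neighbours end up close together. On the real data, the same pattern applies with recipe_id converted to character first, since the cluster vector is looked up by name.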