AugustReferralExercise 01
your ID here
Table of Contents
0. Setup the document
1. Read and Investigate the Data
2. Cluster Recipes
3. Investigate Clusters
4. Conclusion
0. Setup the document
Clear the workspace
Load all the packages you need for this notebook here.
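A minimal setup sketch. The package choices are assumptions based on the functions used later in this notebook (read_csv, dplyr verbs, ggplot2, fviz_cluster); adjust to match your own setup.

```r
# Clear the workspace so objects from earlier sessions don't leak in
rm(list = ls())

# Assumed packages: tidyverse for reading, wrangling, and plotting;
# factoextra for fviz_cluster later in the notebook
library(tidyverse)
library(factoextra)
```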
1. Read and Investigate the Data
Read in the data.
recipes <- read_csv('./recipes.csv')
## -- Column specification --------------------------------------------------------
## recipe_id = col_double(),
## ingredient = col_character(),
## cuisine = col_character()
Make sure the ID variable is a character or factor, not numeric.
# your code here
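One way the conversion might look, sketched with dplyr. The inline tibble is a stand-in for the real data read from recipes.csv.

```r
library(dplyr)

# Stand-in rows for the real data from read_csv('./recipes.csv')
recipes <- tibble(
  recipe_id  = c(1, 1, 2),
  ingredient = c("salt", "garlic", "basil"),
  cuisine    = c("italian", "italian", "italian")
)

# recipe_id is an arbitrary label, not a quantity, so store it as character
recipes <- recipes %>% mutate(recipe_id = as.character(recipe_id))
```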
What variables are in the dataset? What data type are they?
# your code here
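A sketch of inspecting the columns and their types; str() is the base-R option and glimpse() the tidyverse one. The tibble here stands in for the real data.

```r
library(dplyr)

recipes <- tibble(  # stand-in for read_csv('./recipes.csv')
  recipe_id  = c("1", "1", "2", "2", "3", "3"),
  ingredient = c("salt", "garlic", "salt", "basil", "soy sauce", "garlic"),
  cuisine    = c("italian", "italian", "italian", "italian", "chinese", "chinese")
)

str(recipes)      # compact overview: one line per column with its type
glimpse(recipes)  # tidyverse equivalent, shows types and first values
```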
How many different unique ingredients are there?
# your code here
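Counting distinct ingredients might look like this (toy rows stand in for the real data):

```r
library(dplyr)

recipes <- tibble(  # stand-in for read_csv('./recipes.csv')
  recipe_id  = c("1", "1", "2", "2", "3", "3"),
  ingredient = c("salt", "garlic", "salt", "basil", "soy sauce", "garlic"),
  cuisine    = c("italian", "italian", "italian", "italian", "chinese", "chinese")
)

# n_distinct() counts unique values; length(unique(...)) is equivalent
n_unique <- n_distinct(recipes$ingredient)
n_unique
```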
What are the top five most frequent ingredients?
# your code here
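A count-and-slice sketch for the most frequent ingredients (toy rows stand in for the real data, where the real top five will differ):

```r
library(dplyr)

recipes <- tibble(  # stand-in for read_csv('./recipes.csv')
  recipe_id  = c("1", "1", "2", "2", "3", "3"),
  ingredient = c("salt", "garlic", "salt", "basil", "soy sauce", "garlic"),
  cuisine    = c("italian", "italian", "italian", "italian", "chinese", "chinese")
)

# count() tallies rows per ingredient; sort = TRUE puts the biggest first
top5 <- recipes %>%
  count(ingredient, sort = TRUE) %>%
  slice_head(n = 5)
top5
```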
Create a frequency bar chart showing the distribution of ingredients.
# your code here
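A ggplot2 sketch: fct_infreq() orders the bars by frequency, and coord_flip() keeps long ingredient names readable. The toy rows stand in for the real data.

```r
library(dplyr)
library(ggplot2)
library(forcats)

recipes <- tibble(  # stand-in for read_csv('./recipes.csv')
  recipe_id  = c("1", "1", "2", "2", "3", "3"),
  ingredient = c("salt", "garlic", "salt", "basil", "soy sauce", "garlic"),
  cuisine    = c("italian", "italian", "italian", "italian", "chinese", "chinese")
)

# Bars ordered most-frequent first; flipped so labels run down the y-axis
p <- ggplot(recipes, aes(x = fct_rev(fct_infreq(ingredient)))) +
  geom_bar() +
  coord_flip() +
  labs(x = "Ingredient", y = "Count")
p
```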
Optional: Create a dot plot of the top 5% of ingredients.
# your code here
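For the optional dot plot, one approach is slice_max() with prop = 0.05 to keep the top 5% of ingredients by count, then plot counts as points. On this tiny stand-in table the 5% slice is empty; on the real data it keeps the head of the distribution.

```r
library(dplyr)
library(ggplot2)

recipes <- tibble(  # stand-in for read_csv('./recipes.csv')
  recipe_id  = c("1", "1", "2", "2", "3", "3"),
  ingredient = c("salt", "garlic", "salt", "basil", "soy sauce", "garlic"),
  cuisine    = c("italian", "italian", "italian", "italian", "chinese", "chinese")
)

top_tail <- recipes %>%
  count(ingredient, sort = TRUE) %>%
  slice_max(n, prop = 0.05, with_ties = FALSE)  # top 5% of ingredients

p <- ggplot(top_tail, aes(x = n, y = reorder(ingredient, n))) +
  geom_point() +
  labs(x = "Count", y = "Ingredient")
```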
2. Cluster Recipes
Use this Jaccard similarity function to compare the ingredient sets of each pair of recipes.
source('jaccard_matrix.R')
# your code here
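The idea behind the sourced script, illustrated for a single pair of recipes: Jaccard similarity is the size of the intersection of the two ingredient sets divided by the size of their union. (The actual interface of jaccard_matrix.R is not shown here; this standalone function is an illustration, and the script presumably applies the same measure to every pair of recipes to build a matrix.)

```r
# Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|
jaccard <- function(a, b) {
  length(intersect(a, b)) / length(union(a, b))
}

# One shared ingredient ("salt") out of three distinct ones -> 1/3
sim <- jaccard(c("salt", "garlic"), c("salt", "basil"))
sim
```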
Use kmeans to cluster the recipes. First determine an appropriate number of centroids: compute the total within-cluster sum of squares for one to twenty clusters, plot the results, and use the elbow method to choose the number of clusters.
# your code here
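An elbow-method sketch on synthetic numeric features; with the recipes you would instead feed in the representation built from the Jaccard matrix. tot.withinss shrinks as k grows, and the "elbow" where the drop flattens suggests a k.

```r
set.seed(1)
# Synthetic stand-in features (200 observations, 4 dimensions)
features <- matrix(rnorm(200 * 4), ncol = 4)

# Total within-cluster sum of squares for k = 1..20
wss <- sapply(1:20, function(k) {
  kmeans(features, centers = k, nstart = 10)$tot.withinss
})

plot(1:20, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```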
Use your chosen number of clusters to cluster the recipes. Use a high nstart to ensure a good quality solution.
# your code here
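A sketch of the final fit, again on synthetic stand-in features. The k = 4 here is hypothetical (use whatever your elbow plot suggested); nstart = 25 reruns kmeans from 25 random initialisations and keeps the best solution.

```r
set.seed(42)
features <- matrix(rnorm(200 * 4), ncol = 4)  # stand-in features

# Hypothetical k from the elbow plot; high nstart guards against a poor
# local optimum from one unlucky random start
km <- kmeans(features, centers = 4, nstart = 25)
table(km$cluster)  # cluster sizes
```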
Use fviz_cluster to visualize the clusters. What patterns do you see?
# your code here
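A sketch of the visualization, assuming the factoextra package: fviz_cluster projects the data onto the first two principal components and colours the points by cluster.

```r
library(factoextra)

set.seed(42)
features <- matrix(rnorm(200 * 4), ncol = 4)  # stand-in features
km <- kmeans(features, centers = 4, nstart = 25)

# PCA-based 2-D view of the clustering
p <- fviz_cluster(km, data = features)
p
```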
3. Investigate Clusters
Merge the clusters back into the recipe dataset so that each recipe has a new variable with the cluster ID.
# your code here
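One way to attach the labels, sketched with a join. km$cluster has one label per distinct recipe, in the order the recipes entered kmeans, so build a lookup table keyed by recipe_id first. The tibble and cluster vector below are stand-ins for the real data and fit.

```r
library(dplyr)

recipes <- tibble(  # stand-in for the real recipes data
  recipe_id  = c("1", "1", "2", "3"),
  ingredient = c("salt", "garlic", "basil", "soy sauce"),
  cuisine    = c("italian", "italian", "italian", "chinese")
)

cluster_labels <- c(1, 2, 1)  # stand-in for km$cluster

# One row per distinct recipe, matching the order used for clustering
assignments <- tibble(
  recipe_id = unique(recipes$recipe_id),
  cluster   = cluster_labels
)

recipes <- recipes %>% left_join(assignments, by = "recipe_id")
```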
What are the top 5 ingredients and cuisines in each cluster?
# your code here
# your code here
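A count/group/slice sketch for both questions, on stand-in rows that already carry a cluster column: tally within each cluster, then keep the top five rows per cluster.

```r
library(dplyr)

recipes <- tibble(  # stand-in for the merged data with cluster labels
  recipe_id  = c("1", "1", "2", "3"),
  ingredient = c("salt", "garlic", "basil", "soy sauce"),
  cuisine    = c("italian", "italian", "italian", "chinese"),
  cluster    = c(1, 1, 2, 1)
)

# Top 5 ingredients within each cluster
top_ingredients <- recipes %>%
  count(cluster, ingredient, sort = TRUE) %>%
  group_by(cluster) %>%
  slice_head(n = 5)

# Same idea for cuisines
top_cuisines <- recipes %>%
  count(cluster, cuisine, sort = TRUE) %>%
  group_by(cluster) %>%
  slice_head(n = 5)
```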
4. Conclusion
What patterns did the cluster analysis and PCA from fviz_cluster reveal?
What further cleaning or processing of the data do you think is necessary?
What other visualizations or analyses could you perform to explain the patterns that you see in the data?