CIS 415 – Assignment 5
This assignment is individual effort.
Problem Definition
In this Assignment, we are going to use Amazon Product Co-purchase data to make Book Recommendations using Social Network Analysis.
This assignment has three objectives:
• Review Python concepts to read and manipulate data and get it ready for analysis
• Apply Social Network Analysis concepts to Build and Analyze Graphs
• Apply concepts in Text Processing, Social Network Analysis and Recommendation Systems to make a product recommendation
We will be using the Amazon Meta-Data Set maintained on the SNAP site. This data set comprises product and review metadata on 548,552 different products. The data was collected in 2006 by crawling the Amazon website. You can view the data by double-clicking on the file amazon-meta.txt that’s been included in SocialNetworkAnalysis.zip. The following information is available for each product in this dataset:
• Id: Product id (number 0, …, 548551)
• ASIN: Amazon Standard Identification Number.
The Amazon Standard Identification Number (ASIN) is a 10-character alphanumeric unique identifier assigned by Amazon.com for product identification. You can look up products by ASIN using the following link: https://www.amazon.com/product-reviews/
• title: Name/title of the product
• group: Product group. The product group can be Book, DVD, Video or Music.
• salesrank: Amazon Salesrank
The Amazon sales rank represents how a product is selling in comparison to other products in its primary
category. The lower the rank, the better a product is selling.
• similar: ASINs of co-purchased products (people who buy X also buy Y)
• categories: Location in product category hierarchy to which the product belongs (separated by |, category id in [])
• reviews: Product review information: total number of reviews, average rating, as well as individual customer
review information including time, user id, rating, total number of votes on the review, total number of helpfulness votes (how many people found the review to be helpful)
The first step is to use the transformed data provided to you in the SocialNetworkAnalysis.zip file with file names amazon-books.txt and amazon-books-copurchase.edgelist to make Book Recommendations.
You have been provided with a Python script called AnalyzeAmazonBooks.py that’s been included in SocialNetworkAnalysis.zip. This script takes the “amazon-books.txt” and “amazon-books-copurchase.edgelist” files as input, and performs the following steps:
• Read amazon-books.txt data into the amazonBooks Dictionary
• Read amazon-books-copurchase.edgelist into the copurchaseGraph Structure
• We then assume a User has purchased a Book with ASIN=0805047905. The question then is, how do we make other Book Recommendations to this User, based on the Book copurchase data that we have? We could potentially take ALL books that were ever copurchased with this book and recommend all of them. However, the Degree Centrality of Nodes in a Product Co-Purchase Network can typically be large. We should therefore come up with a better strategy. Let’s take the following approach:
First we examine the metadata associated with the Book that the User is looking to purchase (ASIN=0805047905), including Title, SalesRank, TotalReviews, AvgRating, DegreeCentrality, and ClusteringCoefficient. We notice that this Book has a DegreeCentrality of 216 – which means 216 other Books were copurchased with this Book by other Customers. So yes, it would indeed make sense to come up with a better strategy of recommending copurchased Books.
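To make the two graph measures mentioned here concrete, the following is a minimal sketch on a made-up four-node graph. It assumes, as the text does, that DegreeCentrality is the raw count of co-purchased neighbours, and it uses a plain dictionary of neighbour sets rather than whatever graph structure the provided script actually uses. The names toy_graph, degree, and clustering_coefficient are hypothetical.

```python
from itertools import combinations

# Hypothetical toy co-purchase graph (undirected): node -> set of neighbours.
toy_graph = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}


def degree(graph, node):
    """Number of distinct books co-purchased with `node`."""
    return len(graph[node])


def clustering_coefficient(graph, node):
    """Fraction of pairs of neighbours of `node` that are connected to each other."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours there are no pairs to compare.
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


print(degree(toy_graph, "A"), clustering_coefficient(toy_graph, "A"))
```

In this toy graph, "A" has three neighbours and only one of the three possible neighbour pairs ("B" and "C") is connected, so its clustering coefficient is 1/3.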
So now, let’s consider the Ego Network (depth 1) of the Book that the User is looking to purchase (ASIN=0805047905). This is essentially ALL of the Books that have ever been co-purchased with the book under consideration.
Recall that the Edge Weight between any two Nodes (Book ASINs) in our copurchaseGraph is the Category Similarity between the two Nodes (Book ASINs) connected by the Edge. So we can actually use the Island method to get rid of copurchased Books with a very low degree of category similarity. We pick a threshold of 0.5, and create a Trimmed Ego Network.
We can then consider the Copurchased Books (Nodes or ASINs) that are still connected to the Book that the User is looking to purchase (ASIN=0805047905) in the Trimmed Ego Network. We can then sort these copurchased books in descending order by their AvgRatings, and recommend the Top 3.
We can examine the metadata associated with the top recommended books, including Title, SalesRank, TotalReviews, AvgRating, DegreeCentrality, and ClusteringCoefficient. We find that they are all pretty good matches.
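The approach described above (Ego Network, Island method, sort by AvgRating) can be sketched on toy data as follows. This is not the code in AnalyzeAmazonBooks.py; it is an illustration under several assumptions: the graph is stored as a dictionary of weighted neighbours, edges with a weight equal to the threshold are kept, and each book's metadata is a dictionary with an "AvgRating" key. All ASINs, weights, ratings, and function names here are made up.

```python
def ego_network(graph, center):
    """Depth-1 ego network: `center`, its direct neighbours, and the edges among them."""
    nodes = {center} | set(graph[center])
    return {u: {v: w for v, w in graph[u].items() if v in nodes} for u in nodes}


def trim_edges(graph, threshold):
    """Island method: keep only edges whose weight is at least `threshold` (assumption: >=)."""
    return {
        u: {v: w for v, w in nbrs.items() if w >= threshold}
        for u, nbrs in graph.items()
    }


def recommend(graph, books, center, threshold=0.5, top_n=3):
    """Rank the books still connected to `center` after trimming, best AvgRating first."""
    trimmed = trim_edges(ego_network(graph, center), threshold)
    candidates = trimmed[center]
    ranked = sorted(candidates, key=lambda asin: books[asin]["AvgRating"], reverse=True)
    return ranked[:top_n]


# Hypothetical toy data: edge weights stand in for category similarity.
toy_graph = {
    "X": {"A": 0.9, "B": 0.2, "C": 0.6, "D": 0.5},
    "A": {"X": 0.9, "C": 0.7},
    "B": {"X": 0.2},
    "C": {"X": 0.6, "A": 0.7},
    "D": {"X": 0.5, "E": 0.8},
    "E": {"D": 0.8},
}
toy_books = {
    "X": {"AvgRating": 4.0},
    "A": {"AvgRating": 4.0},
    "B": {"AvgRating": 5.0},
    "C": {"AvgRating": 4.5},
    "D": {"AvgRating": 3.0},
    "E": {"AvgRating": 5.0},
}

print(recommend(toy_graph, toy_books, "X", top_n=2))
```

Note that "B" has the highest rating but is dropped because its similarity to "X" (0.2) is below the threshold, and "E" is never considered because it is not a direct neighbour of "X".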
Requirements for this Assignment
Step 1
Complete the following steps:
1) Download and unzip the SocialNetworkAnalysis.zip file from our course site
2) Make sure that the python script AnalyzeAmazonBooks.py is in the root directory of your unzipped file
a. Root directory or folder means that the Python scripts for this assignment need to run in the same directory or folder (not another directory or folder) where the unzipped files amazon-books.txt, amazon-books-copurchase.edgelist, and amazon-meta.txt are located.
b. Failure to have the Python scripts for this assignment in the correct directory or folder, as described in step 1.c.i will result in the following Python error upon execution: FileNotFoundError: [Errno 2] No such file or directory: ‘./amazon-books.txt’
c. The best way to ensure you are running the Python scripts in the correct root directory or folder location is to do the following:
i. Close Spyder (this is important)
ii. Use File Explorer (Windows) or Finder (if you installed Spyder locally on your Mac) to navigate to the directory or folder where you unzipped the file SocialNetworkAnalysis.zip.
iii. Ensure that this directory or folder contains the files AnalyzeAmazonBooks.py, amazon-meta.txt, amazon-books.txt, and amazon-books-copurchase.edgelist.
iv. Now open Spyder. In Spyder, go to the File menu, then Open menu, navigate to the directory or folder where you unzipped your SocialNetworkAnalysis.zip file, and select the AnalyzeAmazonBooks.py script. Then click on Open or Ok.
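If you want to confirm the folder setup before running the script, a quick check along the following lines may help. It is a minimal sketch, not part of the provided script: it assumes the script opens its input files relative to the current working directory (as the './amazon-books.txt' path in the error message suggests), and the names REQUIRED_FILES and missing_files are hypothetical.

```python
from pathlib import Path

# Input files the provided script reads, per the assignment description.
REQUIRED_FILES = (
    "amazon-books.txt",
    "amazon-books-copurchase.edgelist",
)


def missing_files(folder, required=REQUIRED_FILES):
    """Return the names in `required` that are not present as files in `folder`."""
    folder = Path(folder)
    return [name for name in required if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(Path.cwd())
    if missing:
        print("Missing from", Path.cwd(), ":", ", ".join(missing))
    else:
        print("All required input files found in", Path.cwd())
```

If any file is reported missing, the working directory is probably not the folder where you unzipped SocialNetworkAnalysis.zip.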
3) Save a copy of the AnalyzeAmazonBooks.py script under a new name that follows the naming convention AnalyzeAmazonBooks-[ASURITE id].py. Make sure that this new script is saved in the root directory or folder of your unzipped SocialNetworkAnalysis.zip file, as described in step 1.c above.
Step 2
Read, understand, and execute the AnalyzeAmazonBooks.py script and ensure you can see the Top 3 Recommendations
for ASIN=0805047905.
Step 3
Using Spyder, open your other script named AnalyzeAmazonBooks-[ASURITE id].py and update it to do the following:
a. Make Top 5 Recommendations for a Buyer who is purchasing ASIN = 0812580036.
b. List the Top 5 Recommendations and associated Metadata
c. Recall that once we had trimmed the Ego Network, we considered the Copurchased Books that are still connected to the Book that the User is looking to purchase in the Trimmed Ego Network. We then sorted these copurchased books in descending order by their AvgRatings, and recommended the Top 3. Is there some other data and/or logic we could have used to pick the Top Book Recommendations? Would you implement a different sort for recommendations? Would you implement a different logic on the script to narrow your recommendations? What would that be? Briefly document your new logic for later use in this assignment.
d. Now update this script to use your new logic (step 3.c above) and make Top 5 Recommendations for ASIN = 0812580036.
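As an illustration of what "different logic" could look like, the sketch below ranks candidates by a damped average rating, so that a book with a single 5-star review does not outrank a book with hundreds of strong reviews. This is only one possible idea and is not the expected answer. The prior_mean and prior_weight values are arbitrary assumptions, the function names are hypothetical, and the metadata is assumed to be a dictionary with "AvgRating" and "TotalReviews" keys.

```python
def damped_rating(avg_rating, total_reviews, prior_mean=3.0, prior_weight=10):
    """Pull ratings backed by few reviews toward `prior_mean` (both prior values are assumptions)."""
    return (avg_rating * total_reviews + prior_mean * prior_weight) / (
        total_reviews + prior_weight
    )


def rank_by_damped_rating(candidates, books, top_n=5):
    """Sort candidate ASINs by damped rating, highest first, and keep the first `top_n`."""
    def score(asin):
        meta = books[asin]
        return damped_rating(meta["AvgRating"], meta["TotalReviews"])

    return sorted(candidates, key=score, reverse=True)[:top_n]


# Hypothetical toy metadata.
toy_books = {
    "A": {"AvgRating": 5.0, "TotalReviews": 1},
    "B": {"AvgRating": 4.5, "TotalReviews": 200},
    "C": {"AvgRating": 4.0, "TotalReviews": 50},
}

print(rank_by_damped_rating(["A", "B", "C"], toy_books))
```

Sorting by AvgRating alone would put "A" first; with the damped score, "A" drops to last because its rating rests on one review.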
Recommendations
I would not recommend only implementing a simple change such as changing the Island threshold value here and submitting your assignment with this change being your only update to this script, as this will receive a low grade for this assignment. My expectations are that you will be using our concepts learned in class to implement a meaningful change in your updated AnalyzeAmazonBooks-[ASURITE id].py script. Several ideas for a meaningful change have been provided in step 3 above for your consideration. Attend our office hours, should you have any questions.
Submission for this Assignment
Submit the following for this Assignment:
• For Step 1 above, you will submit one file as follows:
o Create a Word document file named Step1-[ASURITE].docx
o In this file, paste a screenshot (File Explorer for Windows, Finder for Mac OS) that clearly shows the following files: AnalyzeAmazonBooks.py, AnalyzeAmazonBooks-[ASURITE id].py, amazon-meta.txt, amazon-books.txt, and amazon-books-copurchase.edgelist
o Submit your document named Step1-[ASURITE].docx with the required screenshot outlined in this document section as part of your submission [1 point]
• For Step 2 above, you will submit one file as follows:
o Create a Word document file named Step2-[ASURITE].docx
o In this file, list the Top 3 Recommendations for a Buyer who is purchasing ASIN=0805047905.
o Submit your document named Step2-[ASURITE].docx with the top 3 recommendations for ASIN=0805047905 as part of your submission [1 point]
• For Step 3 above, you will submit two files as follows:
o File 1 – Submit your updated script file named AnalyzeAmazonBooks-[ASURITE id].py. This script will have in its Python implementation your new updated logic (Step 3.c above). Consider the following:
Note that your updated script named AnalyzeAmazonBooks-[ASURITE id].py will be run as-is in the correct root folder directory with files amazon-meta.txt, amazon-books.txt, and amazon-books-copurchase.edgelist
Points will be deducted if your updated script does not execute and/or does not provide the top 5 recommendations for ASIN = 0812580036 based on the updated logic (step 3.c above) implementation in your script named AnalyzeAmazonBooks-[ASURITE id].py
Submit your script AnalyzeAmazonBooks-[ASURITE id].py as part of your submission [1 point]
o File 2 – Submit a Word document file named Step3-[ASURITE].docx with the following two elements:
Brief Description of alternate/enhanced logic to make Top Recommendations from the Trimmed Ego Network (step 3.c above). [1 point]
Based on your newly executed logic, list Top 5 Recommendations for a Buyer who is purchasing ASIN = 0812580036. The Top Recommendations listed here should match the output from the updated script named AnalyzeAmazonBooks-[ASURITE id].py. [1 point]
Assignment Submission Requirements
Some things to keep in mind as you code:
• Make your code readable – for instance, use meaningful variable names and comments.
• Make your code elegant – for instance, balance the number of variables you introduce – too many or too few
make your code difficult to debug, read, and maintain.
• Make your output readable and user-friendly
Once you have written the script, save it using the naming convention described above, then submit it by uploading the Python script itself. Note: upload the actual script – DO NOT attach a screenshot of the script!
The submitted script will be run as-is for grading using the Anaconda Spyder with Python 3.7.0+ installation required for the class. Your submission will receive zero points for your overall submission grade if you:
• Use scripts not created, executed, and/or edited using Anaconda Spyder with Python 3.7.0+ as required for this course
• Do not use the Python Template Script and/or Python Data provided in the SocialNetworkAnalysis.zip file for this Assignment
Your submission will have points deducted for scripts that:
• are difficult to read/follow
• don’t compile/run
• don’t have all the various pieces of code required
• have hard-coded values instead of using variables
• have logical errors
• don’t result in the expected output
• don’t have user-friendly output
• don’t follow the software principles in the Zen of Python Developers Guide located here:
https://www.python.org/dev/peps/pep-0020/
• don’t run efficiently following the practices and lecture examples demonstrated in our lectures and problem
sets (located in the course site)
Review the latest Syllabus document posted in our course site for additional Assignment requirements.