Midterm: Recommender System for Movies¶
(Note: This midterm assignment will have hidden test cases)
In this project, you will implement a recommender system for your classmates, professor and TAs based on the movie survey we have conducted. The movie preference file is at ./data/movie_preference.csv
Recommender System¶
The objective of a recommender system is to recommend relevant items to users, based on their preferences. Recommender systems are prevalent in the digital space. For example, when you go shopping on Amazon, you notice that Amazon is recommending products on the front page before you even type anything in the search box. Similarly, when you go on YouTube, the top bar of YouTube is typically “videos recommended to you.” All these features are based on recommender systems.
What item to recommend to which user is arguably the most important business decision in many digital platforms. For instance, YouTube cannot control the videos its users upload to it. It cannot control which videos users like to watch either. Moreover, since watching videos is free, YouTube cannot control the behavior of its users by changing the price of its items. It does not have inventory either, since each video can be viewed any number of times. In this case, what could YouTube control? Or in other words, what differentiates a good video streaming service from a bad one? The answer is its recommender system.
Types of Recommender Systems¶
There are three types of recommender systems.
Popularity-based Recommendation¶
The most obvious system is popularity-based recommendation. In this case, we recommend to a user the most popular items that the user has not previously consumed. In the movie setting, we will recommend the movie that most users have watched and liked. In other words, this system utilizes the “wisdom of the crowds.” It usually provides good recommendations for most people. Since it is easy to implement, the popularity-based recommendation system is used as a baseline. Note: this system is not personalized. If two consumers have not watched Movie A, and Movie A is the most popular one, both of them will be recommended Movie A, no matter how different these two consumers are.
Content-based Recommendation¶
This recommender system leverages the data on a customer’s historical actions. It first uses available data to identify a set of features that describes an item (for example, for movies, we can use the movie’s director, main actor, main actress, genre, etc. to describe the movie). When a user comes in, the system will recommend the movie that is closest, in terms of these features, to the movies that the user has watched and liked. For instance, if a user likes action movies from Nolan the most, this system will recommend another action movie from Nolan that this user has not watched. Note: we will not implement this system in this project since it requires knowledge about supervised learning. We may come back to this topic at the end of this semester.
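To make the idea concrete, here is a minimal sketch of content-based matching. The hand-written feature sets and the helper name `closest_unwatched` are hypothetical and only for illustration; a real system would derive features from data, and this is not part of the assignment.

```python
# Hypothetical hand-written features; a real system would derive these from data.
features = {
    "Inception": {"Nolan", "action", "sci-fi"},
    "The Dark Knight": {"Nolan", "action", "crime"},
    "Coco": {"animation", "family"},
    "La La Land": {"musical", "romance"},
}


def closest_unwatched(liked, watched):
    """Return the unwatched movie sharing the most features with the liked ones."""
    # Pool the features of every movie the user liked into one profile.
    profile = set().union(*(features[m] for m in liked))
    candidates = [m for m in features if m not in watched]
    # Score each candidate by how many features it shares with the profile.
    return max(candidates, key=lambda m: len(features[m] & profile))


print(closest_unwatched(liked={"Inception"}, watched={"Inception", "Coco"}))
# The Dark Knight
```

Counting shared features is only one possible notion of "closest"; it is used here because it needs no training step.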
Collaborative Filtering Recommendation¶
The last type of recommender system is called collaborative filtering. This approach uses the memory of previous users’ interactions to compute users’ similarities based on the items they have interacted with (user-based approach) or to compute items’ similarities based on the users that have interacted with them (item-based approach).
A typical example of this approach is User Neighbourhood-based CF, in which the top-N similar users (usually computed using Pearson correlation) for a user are selected first. The items that are liked by these users are then used to identify the best candidate to recommend to the current user.
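The neighbour-selection step can be sketched as follows. This is a rough illustration on made-up data, not the method used later in this assignment (which relies on cosine similarity); the names `pearson`, `top_n_neighbours`, and the toy `ratings` are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_n_neighbours(ratings, user, n):
    """Return the n users whose rating vectors correlate most with `user`."""
    others = [u for u in ratings if u != user]
    others.sort(key=lambda u: pearson(ratings[user], ratings[u]), reverse=True)
    return others[:n]


# Hypothetical toy data: 1 = like, -1 = dislike, 0 = not watched.
ratings = {
    "Ann": [1, 1, -1, 0],
    "Bob": [1, 1, -1, 1],
    "Cat": [-1, -1, 1, 0],
    "Dan": [0, 1, 0, -1],
}
print(top_n_neighbours(ratings, "Ann", 2))  # ['Bob', 'Dan']
```

The items liked by the selected neighbours would then be pooled to pick a candidate for the current user; that second step is omitted here.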
Read-in the preference file¶
The first exercise is to read in the movie preference CSV file (you need to use a relative path).
You must return two things:
A dictionary where the key is username and the value is a vector of (-1, 0, 1) that indicates the user’s preference across movies (in the order of the csv file). Note that 1 encodes a “like” and -1 encodes a “dislike”. A zero means that the user has not watched that movie yet.
A list of strings that contains movie names. (The order of movie names should be the same as the order in the original csv file)
Note 1: Your result should exactly match the results from the assert statements. This means you should pay attention to extra space, newline, etc.
Note 2: If there are two records with the same name, use the first record from the person.
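As a rough sketch of the parsing pattern, the snippet below reads a tiny in-memory table. The column layout (a name column followed by one column per movie) and the assumption that the cells already hold 1, 0, or -1 are hypothetical; the real ./data/movie_preference.csv may be laid out and encoded differently. The helper name `parse_preferences` is also an assumption.

```python
import csv
import io

# Hypothetical miniature of the survey file; the real column layout and
# value encoding in ./data/movie_preference.csv may differ.
sample = io.StringIO(
    "name,Inception,Coco\n"
    "Jake,1,-1\n"
    "Naveed,-1,0\n"
    "Jake,0,0\n"  # duplicate name: this later record should be ignored
)


def parse_preferences(handle):
    """Parse a preference table into (movies, preference)."""
    reader = csv.reader(handle)
    header = next(reader)
    movies = header[1:]  # every column after the name is a movie title
    preference = {}
    for row in reader:
        name, votes = row[0], [int(v) for v in row[1:]]
        # setdefault keeps the first record when a name repeats.
        preference.setdefault(name, votes)
    return movies, preference


movies, preference = parse_preferences(sample)
print(movies)      # ['Inception', 'Coco']
print(preference)  # {'Jake': [1, -1], 'Naveed': [-1, 0]}
```

For a file on disk, the same helper could be given a handle from `open(path, newline="")`, which is the opening mode the `csv` module recommends.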
def read_in_movie_preference():
    """Read the movie data and return the list of movie
    names and the preference dictionary."""
    preference = {}
    movies = []
    # YOUR CODE HERE
    raise NotImplementedError()
    return [movies, preference]

[movies, preference] = read_in_movie_preference()
assert len(movies) == 20

[movies, preference] = read_in_movie_preference()
assert movies == ['The Shawshank Redemption', 'The Godfather',
                  'The Dark Knight', 'Star Wars: The Force Awakens',
                  'The Lord of the Rings: The Return of the King',
                  'Inception', 'The Matrix', 'Avengers: Infinity War',
                  'Interstellar', 'Spirited Away', 'Coco', 'The Dark Knight Rises',
                  'Braveheart', 'The Wolf of Wall Street', 'Gone Girl', 'La La Land',
                  'Shutter Island', 'Ex Machina', 'The Martian', 'Kingsman: The Secret Service']

[movies, preference] = read_in_movie_preference()
assert preference[" "] == [1, 1, 1, 1, 1, 1, 1, 1, -1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1]
assert preference[" "] == [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1]
Popularity-based Ranking¶
Compute the ranking of most popular movies¶
Your next task is to take the movie preference data and compute the popularity score of each movie. To do so, first count the total number of likes in the entire dataset, across all movies and all users. Then count the total number of dislikes in the same way.
Let A be the total number of likes and B the total number of dislikes in the entire dataset. The popularity score of a movie is then defined as Num_of_People_Like_the_Movie - (A / B) * Num_of_People_Dislike_the_Movie
(We use A/B to normalize the weights of likes and dislikes because the rarer type of reaction deserves more weight. For example, if a typical movie gets on average 100 likes and no dislikes, a single dislike conveys a much stronger message about a movie’s quality than a single like.)
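The arithmetic can be checked on a small made-up dataset. The sketch below is illustrative only: the user names, the three-movie table, and the helper name `popularity` are assumptions, and it presumes at least one dislike exists so that A / B is defined.

```python
# Hypothetical toy data: 1 = like, -1 = dislike, 0 = not watched.
movies = ["Inception", "Coco", "The Dark Knight"]
preference = {
    "Ana": [1, -1, 0],
    "Ben": [1, 0, 1],
    "Cho": [1, 1, -1],
    "Dev": [1, 1, 0],
}


def popularity(movies, preference):
    """Return (scores, ranking, A, B) for the score defined above."""
    votes = list(preference.values())
    likes = [sum(v[i] == 1 for v in votes) for i in range(len(movies))]
    dislikes = [sum(v[i] == -1 for v in votes) for i in range(len(movies))]
    total_likes, total_dislikes = sum(likes), sum(dislikes)  # A and B
    weight = total_likes / total_dislikes  # assumes at least one dislike
    scores = {m: likes[i] - weight * dislikes[i] for i, m in enumerate(movies)}
    # Sort movie names from highest to lowest score.
    ranking = sorted(scores, key=scores.get, reverse=True)
    return scores, ranking, total_likes, total_dislikes


scores, ranking, A, B = popularity(movies, preference)
print(A, B)     # 7 2
print(ranking)  # ['Inception', 'Coco', 'The Dark Knight']
```

Here A / B = 3.5, so Coco scores 2 - 3.5 * 1 = -1.5 and The Dark Knight scores 1 - 3.5 * 1 = -2.5, while Inception keeps its 4 likes unpenalized.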
Your function should return:
A dictionary where the keys are movie names and the values are the corresponding movie popularity scores.
A list of movie names sorted in descending order of popularity. For example, if ‘The Shawshank Redemption’ is the second most popular movie, the second element in the list should be ‘The Shawshank Redemption’.
A and B as defined above.
Note: You may want to use prior functions to help you read data inside this function
def movies_popularity_ranking():
    movie_popularity = {}
    movie_popularity_rank = []
    total_likes = 0
    total_dislikes = 0
    # YOUR CODE HERE
    raise NotImplementedError()
    return movie_popularity, movie_popularity_rank, total_likes, total_dislikes

movie_popularity, movie_popularity_rank, total_likes, total_dislikes = movies_popularity_ranking()
assert total_likes == 1300
assert total_dislikes == 236

movie_popularity, movie_popularity_rank, total_likes, total_dislikes = movies_popularity_ranking()
assert round(movie_popularity["The Shawshank Redemption"], 2) == 66.98
assert round(movie_popularity["Avengers: Infinity War"], 2) == 14.86

movie_popularity, movie_popularity_rank, total_likes, total_dislikes = movies_popularity_ranking()
assert movie_popularity_rank == ['The Shawshank Redemption',
                                 'Inception',
                                 'Kingsman: The Secret Service',
                                 'The Wolf of Wall Street',
                                 'The Matrix',
                                 'Avengers: Infinity War',
                                 'The Dark Knight Rises',
                                 'Interstellar',
                                 'The Dark Knight',
                                 'The Martian',
                                 'Spirited Away',
                                 'The Godfather',
                                 'Braveheart',
                                 'La La Land',
                                 'Shutter Island',
                                 'Gone Girl',
                                 'The Lord of the Rings: The Return of the King',
                                 'Ex Machina',
                                 'Star Wars: The Force Awakens']
Recommendation¶
You will now implement a popularity-based recommendation function. This function takes in a user’s name. It returns a string representing the name of a movie that satisfies the following three conditions:
The user has not watched this movie.
This movie has the best popularity score (among those that are not watched by the user).
This movie has a higher popularity score than the average of the popularity scores of the movies that this user has watched (the average is computed over all movies watched by the user, regardless of whether they were liked by the user or not).
If the user name does not exist, this function should return “Invalid user.”
If the user has watched all movies, this function should return “Unfortunately, no new movies for you.”
If the unwatched movies all have lower popularity scores than the average score of the movies watched by this user, this function should return “Unfortunately, no new movies for you.”
Note: Again, you may want to use prior functions to help you read data and rank movies inside this function
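The selection rules above can be sketched on toy inputs as follows. The helper name `recommend_popular`, the three-movie scores, and two edge-case choices are assumptions: a candidate whose score merely equals the watched average is rejected (condition 3 asks for a strictly higher score), and the average check is skipped for a user who has watched nothing. Looking up the name and returning “Invalid user.” is left out of this helper.

```python
def recommend_popular(votes, movies, scores):
    """Pick the best-scoring unwatched movie if it beats the watched average."""
    fallback = "Unfortunately, no new movies for you."
    watched = [m for m, v in zip(movies, votes) if v != 0]
    unwatched = [m for m, v in zip(movies, votes) if v == 0]
    if not unwatched:
        return fallback
    best = max(unwatched, key=lambda m: scores[m])
    if watched:
        # Average over everything watched, liked or disliked.
        average = sum(scores[m] for m in watched) / len(watched)
        if scores[best] <= average:  # assumption: ties are rejected
            return fallback
    return best


# Hypothetical scores for a three-movie catalogue.
movies = ["Inception", "Coco", "The Dark Knight"]
scores = {"Inception": 4.0, "Coco": -1.5, "The Dark Knight": -2.5}

print(recommend_popular([0, 1, -1], movies, scores))  # Inception
print(recommend_popular([1, 0, 0], movies, scores))   # Unfortunately, no new movies for you.
```

In the second call the best unwatched movie is Coco (-1.5), which does not beat the watched average of 4.0, so the fallback string is returned.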
def Recommendation(name):
    recommended_movie = ""
    # YOUR CODE HERE
    raise NotImplementedError()
    return recommended_movie

assert Recommendation(" ") == 'Inception'
assert Recommendation("Nobody") == 'The Shawshank Redemption'
assert Recommendation(" ") == 'Kingsman: The Secret Service'
assert Recommendation("Test Student 2") == 'Invalid user.'
Cosine Similarity¶
Let us now use collaborative filtering to find a good recommendation.
To do so, we need the cosine similarity between users, based on their movie preferences. Again, we can use the preference file we used in Section 2. The file represents each person by a vector of (0, 1, -1). Cosine similarity in our case is the dot product of the two preference vectors divided by the product of the magnitudes of the two preference vectors. In other words, if person A has preference vector A, and person B has preference vector B, their cosine similarity is equal to
$$ \frac{A \cdot B}{\|A\| \|B\|} = \frac{\sum_{i=1}^n A_iB_i}{\sqrt{\sum_{i=1}^nA_i^2}\sqrt{\sum_{i=1}^nB_i^2}}$$
If a person has not watched any movies, then the cosine similarity between this person and any other person is defined as 0. For more information on cosine similarity, see the Wikipedia article on cosine similarity.
As an example, let the following two vectors represent Naveed’s and Jake’s preferences over 3 movies.
          Inception   Coco   The Dark Knight
Jake          1        -1          0
Naveed       -1         0          1
In this case, Naveed and Jake’s cosine similarity is equal to
$$ \frac{1 \times (-1) + (-1) \times 0 + 0 \times 1}{\sqrt{1^2+(-1)^2+0^2} \times \sqrt{(-1)^2+0^2+1^2}} = \frac{-1}{2} = -0.5$$
Your task is to write a similarity function that takes in two names and returns the cosine similarity between these two users. If one or both names do not exist in the database, return 0.
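The worked example can be reproduced with a few lines of code. This is a minimal sketch on raw vectors, not the required `Similarity` function: the helper name `cosine` is an assumption, and the name lookup is left out.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity of two vectors, defined as 0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a person who has watched nothing has similarity 0
    return dot / (norm_a * norm_b)


jake = [1, -1, 0]
naveed = [-1, 0, 1]
print(round(cosine(jake, naveed), 2))  # -0.5
```

The zero-norm guard implements the convention stated above for people who have not watched any movies, and also avoids a division by zero.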
def Similarity(name_1, name_2):
    """Given two names and preference, get the similarity
    between two people"""
    cosine = 0
    # YOUR CODE HERE
    raise NotImplementedError()
    return cosine

assert round(Similarity("Test Student", "Nobody"), 2) == 0.17
assert round(Similarity("Test Student", "DJZ2"), 2) == -0.27
assert round(Similarity("Test Student", "Test Student 2"), 2) == 0
Movie Soulmate¶
Your next task is to find the movie soulmate for a person. To find a person’s movie soulmate, you will compute the cosine similarity between this person and every other person in the dataset. You will then return the person who has the highest cosine similarity with the focal person. If two people have the same cosine similarity with the focal person, break the tie by name length (the person with the shorter name is the soulmate). If the focal person does not exist in the database, return an empty string as the soulmate name.
Your function will return two things:
the name of the soulmate
the largest cosine similarity
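The search with its tie-break can be sketched on toy data as below. The names `cosine` and `find_soulmate`, the toy `preference` dictionary, and the choice to return the stub's initial values ("" and -100) for an unknown name are assumptions for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_soulmate(name, preference):
    """Return (soulmate, similarity); shorter names win exact ties."""
    others = [p for p in preference if p != name]
    if name not in preference or not others:
        return "", -100  # assumption: mirror the stub's initial values
    focal = preference[name]
    # Highest similarity first; among equals, the shortest name first.
    best = min(others, key=lambda p: (-cosine(focal, preference[p]), len(p)))
    return best, cosine(focal, preference[best])


# Hypothetical toy data: 1 = like, -1 = dislike, 0 = not watched.
preference = {
    "Ana": [1, 1, 0],
    "Benjamin": [1, 1, 0],
    "Cho": [1, 1, 0],
    "Dev": [-1, 0, 1],
}
soulmate, similarity = find_soulmate("Ana", preference)
print(soulmate, round(similarity, 2))  # Cho 1.0
```

Benjamin and Cho are equally similar to Ana, so the shorter name is chosen. If two tied names also had the same length, this sketch would keep whichever appears first in the dictionary.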
def Movie_Soul_Mate(name):
    """Given a name, get the person that has the highest cosine
    similarity with this person."""
    soulmate = ""
    cosine_similarity = -100
    # YOUR CODE HERE
    raise NotImplementedError()
    return soulmate, cosine_similarity

soulmate, cosine_similarity = Movie_Soul_Mate("Q")
assert soulmate == ' '
assert round(cosine_similarity, 2) == 0.75

soulmate, cosine_similarity = Movie_Soul_Mate("Test Student")
assert soulmate == ' '
assert round(cosine_similarity, 2) == 0.80

soulmate, cosine_similarity = Movie_Soul_Mate(" ")
assert soulmate == 'Yuchen'
assert round(cosine_similarity, 2) == 0.81
Memory-based Collaborative Filtering Recommendation¶
Now after finding a person’s movie soulmate, we can construct a (very preliminary) collaborative filtering recommendation. In our recommendation system, for a focal person, we first find his or her soulmate. We then find all the movies that he/she has not watched but the soulmate has watched and liked. Among all of these movies, we recommend the movie with the highest popularity score defined in Section 3.1.
If the user name does not exist, this function should return “Invalid user.”
If the person has watched all the movies, return “Unfortunately, no new movies for you.”
If there are no movies that are watched and liked by the soulmate but not watched by the focal person, then return the movie (or string) that should be returned in Section 3.2.
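The candidate-filtering step can be sketched as follows, working directly on vote vectors. The helper name `recommend_from_soulmate`, the toy scores, and the `fallback` parameter (standing in for whatever the popularity-based function would return) are assumptions; name validation and the soulmate search are left out.

```python
def recommend_from_soulmate(votes, soulmate_votes, movies, scores, fallback):
    """Recommend the most popular movie the soulmate liked and the user has not seen."""
    if 0 not in votes:
        return "Unfortunately, no new movies for you."
    candidates = [
        m
        for m, mine, theirs in zip(movies, votes, soulmate_votes)
        if mine == 0 and theirs == 1  # unwatched by the user, liked by the soulmate
    ]
    if not candidates:
        return fallback  # defer to the popularity-based recommendation
    return max(candidates, key=lambda m: scores[m])


# Hypothetical scores for a three-movie catalogue.
movies = ["Inception", "Coco", "The Dark Knight"]
scores = {"Inception": 4.0, "Coco": -1.5, "The Dark Knight": -2.5}

print(recommend_from_soulmate([0, 0, 1], [-1, 1, 1], movies, scores, "Inception"))
# Coco
```

Inception is the more popular unwatched movie here, but the soulmate disliked it, so only Coco qualifies as a candidate.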
def Recommendation2(name):
    recommended_movie = ""
    # YOUR CODE HERE
    raise NotImplementedError()
    return recommended_movie

assert Recommendation2("Test Student") == 'Inception'
assert Recommendation2("Test Student Long Name") == 'The Shawshank Redemption'