代写代考 COMP3308/3608, Lecture 13a

COMP3308/3608, Lecture 13a
ARTIFICIAL INTELLIGENCE
Applications of Artificial Intelligence – Recommender Systems
, COMP3308/3608 AI, week 13a, 2022 1

Copyright By PowCoder代写 加微信 powcoder

• Introduction
• Collaborative filtering approaches
• Content-based approaches
, COMP3308/3608 AI, week 13a, 2022 2

Recommender Systems
Recommender Systems (RS) recommend items to people Which digital camera (mobile phone, washing machine, etc) should I buy?
What is the best holiday for me?
Which movie should I rent?
Which web sites will I find interesting?
Which book should I buy for my next holiday?
Which degree and university are the best for my future?
, COMP3308/3608 AI, week 13a, 2022 3

Motivation
• WWW has become the primary source of information
• It has a huge content -> need to reduce the information
• RS help users to find products, services and information, by predicting their relevance
, COMP3308/3608 AI, week 13a, 2022 4

Three Main Approaches
1) Collaborative Filtering
• “Show me books that are popular among my peers”
• Uses the preferences of the community and not the item’s features
2) Content-Based
• “Show me more of the type of books I like”
• Uses only the item’s features and not the preferences of the
3) Hybrid – combinations of collaborative filtering and content-based
, COMP3308/3608 AI, week 13a, 2022 5

Collaborative Filtering
, COMP3308/3608 AI, week 13a, 2022 6

Collaborative Filtering (CF)
The most prominent and widely used approach Well understood, many variations
Main idea: use the “wisdom of the crowd” Input:
a matrix of user–item ratings, i.e. assumes that the target user U had rated some of the items
a predicted rating for a new (unseen) item indicating how much U will like it
a list of predicted items (top-N list) sorted in decreasing order based on their scores
, COMP3308/3608 AI, week 13a, 2022 7

Example – Movielens
• User U rates some movies: actual
• The system uses U’s ratings, together with the ratings of other users, to give recommendations to U:
predicted ratings
, COMP3308/3608 AI, week 13a, 2022 8

Main CF Methods
• Nearest neighbour
• user-to-user – uses the similarity between users
• item-to-item – uses the similarity between items
• Advanced
• Matrix factorization – maps users and items to a joint factor
• Probabilistic
• Clustering-based
• Using association rules
, COMP3308/3608 AI, week 13a, 2022 9

User-to-User CF
• Makes predictions using the similarity between users
• Finds like-minded users who can complement each other’s
Given: a target user Alice and an item 𝑖 not yet seen by Alice
• Find (neighbourhood) N – a set of users (nearest neighbors), who liked the same items as Alice in the past and who have rated item 𝑖
• Combine the ratings of the users in N (e.g. take the average) to predict if Alice will like item 𝑖
• Do this for all items Alice has not seen and recommend the best-rated
, COMP3308/3608 AI, week 13a, 2022 10

• Given: a matrix of ratings of Alice and other users
• Goal: Find if Alice will like or dislike Item5, which Alice has not
rated (seen) yet
Questions:
1) How do we measure similarity between users?
2) How many neighbors should we consider?
3) How do we generate a prediction from the neighbors’ ratings?
, COMP3308/3608 AI, week 13a, 2022 11

Prediction for Alice
1) Correlation as a similarity (distance measure)
2) 2-Nearest-Neighbour – closest neighbours are User1 and User3 3) Average of neighbors’ ratings:
=> P(Alice, item5) = (3+4)/2=3.5
sim = 0.85
sim = 0.00
sim = 0.70
sim = -0.79
COMP3308/3608 AI, week 13a, 2022 12 12

Similarity Measure – Correlation
• Pearson correlation coefficient
• Most popular in user-based collaborative filtering
σ𝒑 ∈𝑷(𝒓𝒂,𝒑 − 𝒓ത𝒂)(𝒓𝒃,𝒑 − 𝒓ത𝒃)
σ 𝒓 −𝒓ത 𝟐 σ 𝒓 −𝒓ത 𝟐 𝒑∈𝑷𝒂,𝒑𝒂 𝒑∈𝑷𝒃,𝒑𝒃
𝒂, 𝒃 – users
𝑷 -setofitems,ratedbyboth𝒂and𝒃 𝒓𝒂,𝒑 – rating of user 𝒂 for item 𝒑
𝒓ത𝒂 – average rating of user 𝒂
, COMP3308/3608 AI, week 13a, 2022 13

Combining the Ratings – Various Ways
• How many neighbors?
• use a similarity threshold or fixed number of neighbors
• How to combine the ratings of the neighbors?
• Simple average
• Weighed average (by the similarity between the neighbour and target user)
• Weighed normalised adjusted average – commonly used:
Prediction for weighing by the similarity Is b’s rating of 𝑝 higher or
user a for item p between a & b lower than b’s average 𝒑𝒓𝒆𝒅 𝒂,𝒑 =𝒓𝒂 +σ𝒃∈𝑵𝒔𝒊𝒎 𝒂,𝒃 ∗(𝒓𝒃,𝒑 −𝒓𝒃) rating?
a’s average σ𝒃 ∈𝑵 𝒔𝒊𝒎 𝒂, 𝒃
normalization
• Adjusted = considers rating differences between users
, COMP3308/3608 AI, week 13a, 2022 14

Improving the Prediction – Variations
• Not all neighbor ratings are equally valuable
• E.g. agreement on commonly liked items is not so informative as
agreement on controversial items
• Solution: Give more weight to items that have a higher variance
• Number of co-rated items
• Reduce the weight when the number of co-rated items is low
• Case amplification
• Higher weight to very similar neighbors, i.e. similarity close to 1
, COMP3308/3608 AI, week 13a, 2022 15

Item-to-Item CF
• Makes predictions using the similarity between items
• Main idea:
• Find items that are similar to the new item (item5) based on the rankings • Use Alice’s ratings for these items to predict her rating for the new item
e.g. P(Alice, item5) =(5+4)/2=4.5
Recommendation for Alice: Alice will like item 5, as she bought and liked
items 1 and 4, and item 5 is similar to items 1 and 4 based other people’s
, COMP3308/3608 AI, week 13a, 2022 16

Similarity Measure – Cosine
• Most popular in item-based collaborative filtering
• Ratings are seen as vector in n-dimensional space
• Cosine similarity is the angle between the vectors
𝒂∙𝒃 𝒂 ∗|𝒃|
• Adjusted cosine similarity
• Considers also the average user ratings to transform the original
• 𝑈: set of users who have rated both items 𝑎 and 𝑏
σ𝒖∈𝑼(𝒓𝒖,𝒂 − 𝒓𝒖)(𝒓𝒖,𝒃 − 𝒓𝒖)
σ 𝒓−𝒓𝟐σ 𝒓−𝒓𝟐 𝒖∈𝑼 𝒖,𝒂 𝒖 𝒖∈𝑼 𝒖,𝒃 𝒖
, COMP3308/3608 AI, week 13a, 2022 17

Combining the Ratings
• Common prediction function – weighted prediction: 𝒑𝒓𝒆𝒅𝒖,𝒑 =σ𝒊∈𝒓𝒂𝒕𝒆𝒅𝑰𝒕𝒆𝒎(𝒖)𝒔𝒊𝒎𝒊,𝒑 ∗𝒓𝒖,𝒊
σ𝒊∈𝒓𝒂𝒕𝒆𝒅𝑰𝒕𝒆𝒎(𝒖) 𝒔𝒊𝒎 𝒊, 𝒑
, COMP3308/3608 AI, week 13a, 2022 18

Can we Precompute the Similarities?
• Rating matrix: a large number of items and a small number of ratings per user
• User-to-user collaborative filtering
• similarity between users is unstable (changes considerably when only a few ratings are changing (e.g. added) because of the small number of commonly ranked items)
• => pre-computing the similarities leads to poor performance
• Item-to-item collaborative filtering
• similarity between items is more stable
• we can pre-compute the item-to-item similarity and the nearest neighbours
• prediction involves lookup for these values and computing the weighed sum (Amazon does this)
, COMP3308/3608 AI, week 13a, 2022 19

Explicit vs Implicit Ratings
, COMP3308/3608 AI, week 13a, 2022 20

Explicit Ratings
• Explicitly provided by the user
• Most commonly used scales: 1 to 5 or 1 to 7 on Likert scale
• Explicit ratings are reliable but it is often difficult to obtain a sufficient number of ratings
• Small number of available ratings => sparse rating matrices => poor recommendation quality
• How to encourage users to rate more items?
, COMP3308/3608 AI, week 13a, 2022 21

Implicit Ratings
• Collected automatically by the web shop or application in which the recommender system is embedded
• When a customer buys an item, many recommender systems interpret this behaviour as a positive rating
• Clicks, page views, time spent on some page, demo downloads, etc. • Main problem – might not be reliable
• One cannot be sure whether the user behaviour is correctly interpreted
• E.g. a user might not like all the books he bought or might have bought a book for someone else
• Typically used in conjunction with the explicit ratings
, COMP3308/3608 AI, week 13a, 2022 22

Advanced CF Methods
, COMP3308/3608 AI, week 13a, 2022 23

SVD-based CF
• Idea: Generate predictions faster by reducing the dimensionality of the rating matrix
• Based on Singular Value Decomposition (SVD) for dimensionality reduction of the rating matrix
• SVD is widely used for dimensionality reduction – e.g. feature selection in ML, image compression, efficient indexing and retrieval in IR (Latent Semantic Analysis)
, COMP3308/3608 AI, week 13a, 2022 24

• Any n x m matrix X (n>=m) can be written as the product of 3
U – n x m orthogonal matrix
Vt – the transpose of matrix V, where V is a m x m orthogonal matrix
 – m x m diagonal matrix containing the singular values along the main diagonal, sorted in increasing order
• X is the original data
• V defines a new set of axes which provide information about the variance in data (the 1st axis is the most important – it goes in the direction of the highest variance, the 2nd axis is orthogonal to the 1st and goes in the direction of the second highest variance, etc)
• U is the transformed data
• We can approximate X by retaining only the most important axes, those with the largest singular values, and project the data on them -> new features, smaller number
• In our example (slide 28), we calculate 𝑈, 𝑉, and Σ but retain only the 2 most important axes by taking only the first 2 columns of 𝑈 and the first two rows of 𝑉t
, COMP3308/3608 AI, week 13a, 2022 25

Without data reduction
SVD – Graphical Representation
Withdata m k k m
COMP3308/3608 AI, week 13a, 2022 26

Compression Ratio
• Space needed before SVD: n x m
• Space needed after SVD: n*k + k + k*m
• The first k columns of U (n dimensional vectors), the k singular vectors and the first k columns of V (m dimensional vectors) => total = k(n+1+m)
• Compression ratio
r=nk+k+km nm
• For n >> m >k, this ratio is approximately k/m
, COMP3308/3608 AI, week 13a, 2022 27

SVD-Based Recommendation
new features (factors)
Calculating the prediction for user u and item i=EPL:
= 3 + 0.84 = 3.84
Predictions are generated faster on the low dimensional data
M =U  VT kkkk
rˆ =r +U (Alice) VT(EPL) uiuk kk
, COMP3308/3608 AI, week 13a, 2022 28

Matrix Factorization
• An extension of SVD-based recommendation
• The rating matrix is sparse and the standard SVD cannot be directly
• Earlier systems filled in the missing data and then applied SVD – data distortion, inaccurate predictions
• Current approach
• Use only the available ratings
• Formulate the problem of finding U and V as an optimisation problem (minimizing the reqularized squared error on the set of known ratings)
• Use stochastic gradient descent to solve it
• Matrix factorization is the state-of-the-art approach for CF
Y. Koren, R. Bell and C. Volinsky (2009). Matrix Factorization Techniques for Recommender Systems. IEEE Computer, Volume 42, Issue 8, p.30-37.
, COMP3308/3608 AI, week 13a, 2022 29

Probabilistic CF
• We can use Naïve Bayes
• What is the probability that Alice will give rating 1 to item 5, given her
previous ratings?
• Evidence E (Alice’s previous ratings) – 5 pieces:
E=item1=1, item2=3, item3=3, item4=2
• 5 hypotheses H (possible ratings for item5): H1: item5=1, H2: item5=2,…, H5: item5=5
• Assumption: the previous ratings (5 pieces of E) are independent of each other given a hypothesis:
, COMP3308/3608 AI, week 13a, 2022 30
P(H | E) = P(E | H)P(H) P(E)
P(H | E) =
P(Ei |H)P(H)

Probabilistic CF
Estimate the right hand side from the data for the other users:
𝑷 𝒊𝒕𝒆𝒎𝟓=𝟏𝑬
=𝑷 𝑰𝒕𝒆𝒎𝟏=𝟏𝑰𝒕𝒆𝒎𝟓=𝟏 ×𝑷 𝑰𝒕𝒆𝒎𝟐=𝟑𝑰𝒕𝒆𝒎𝟓=𝟏
×𝑷 𝑰𝒕𝒆𝒎𝟑=𝟑𝑰𝒕𝒆𝒎𝟓=𝟏 ×𝑷 𝑰𝒕𝒆𝒎𝟒=𝟐𝑰𝒕𝒆𝒎𝟓=𝟏 =𝟐×𝟏×𝟏×𝟏≈𝟎.𝟏𝟐𝟓 𝟐𝟐𝟐𝟐
𝑷 𝒊𝒕𝒆𝒎𝟓=𝟐𝑬
=𝑷 𝑰𝒕𝒆𝒎𝟏=𝟏𝑰𝒕𝒆𝒎𝟓=𝟐 ×𝑷 𝑰𝒕𝒆𝒎𝟐=𝟑𝑰𝒕𝒆𝒎𝟓=𝟐
×𝑷 𝑰𝒕𝒆𝒎𝟑=𝟑𝑰𝒕𝒆𝒎𝟓=𝟐 ×𝑷 𝑰𝒕𝒆𝒎𝟒=𝟐𝑰𝒕𝒆𝒎𝟓=𝟐 =𝟎×⋯×⋯×⋯=𝟎 𝟎
𝑷 𝒊𝒕𝒆𝒎𝟓=𝟑𝑬 =… 𝑷 𝒊𝒕𝒆𝒎𝟓=𝟒𝑬 =… 𝑷 𝒊𝒕𝒆𝒎𝟓=𝟓𝑬 =…
Select the hypothesis (rating for item 5) with highest probability
, COMP3308/3608 AI, week 13a, 2022 31

Clustering-Based CF
• Cluster the users based on their ratings
• Various distance measures and clustering algorithms
• Each cluster is represented with its centroid, a vector whose i-th component is the average rating of all users in the cluster for item i
• To make a recommendation for a new user U, find the closest centroid to U’s ratings and use the predictions of the users in this cluster
COMP3308/3608 AI, week 13a, 2022 32

Using Association Rules
• Association rules are used to analyse purchasing behaviour of customers
– Customers who buy X and Y, also buy Z
• Rule quality measures:
support(X−Y)= #(XY) n
confidence(X−Y)= #(XY) #X
, COMP3308/3608 AI, week 13a, 2022 33

Association Rules-Based CF
Transform 5-point ratings into binary
• e.g. rating = 1, if it is above the user’s average score and 0 otherwise
Mine rules such as Item1 → Item5 • support=2/4
• confidence=2/2 (without Alice)
To make recommendations for Alice:
• Left-hand side – find “relevant” rules based on Alice’s transactions (the above rule will be relevant as Alice bought Item1)
• Right-hand side – find items not already bought by Alice and sort them based on the rules’ confidence values
Different variations possible
• dislike statements, user associations, etc.
, COMP3308/3608 AI, week 13a, 2022 34

CF – Strengths and Weaknesses
• Strengths
• Minimum knowledge engineering – requires only a rating
matrix and no knowledge about the features of the products • Weaknesses
• Requires user community
• Requires sufficient number of co-rated items
• Sparse rating matrix
• Cold start problem – users need to rate items before a recommendation can be made
• Does not provide explanation for the recommendation
, COMP3308/3608 AI, week 13a, 2022 35

Content-Based Recommenders
, COMP3308/3608 AI, week 13a, 2022 36

Content-Based Recommender
• Uses information about the items and not about the user community
• e.g. recommend fantasy novels to people who liked fantasy novels in the past
• Has its roots in Information Retrieval
• What we need:
• Information about the content of the items (e.g. for movies: genre, leading actors, director, awards, etc.)
• Information about what the user likes (user preferences, also called user profile) – explicit (e.g. movie rankings by the user) or implicit
• Task: recommend items that match the user preferences
, COMP3308/3608 AI, week 13a, 2022 37

Classification Task
• Content-based recommendation can be formulated as a classification/regression task
• Using the rated data as a training set, build a classifier, for each user, that predicts the rating of new items
Item’s features
class: like/dislike
Leading actors
Keywords or Movie review
The lives of others
Drama, triller
Koch, , ̈he
Oscar and 66 other awards
In 1984 East Berlin, an agent of the secret police, conducting surveillance on a writer and his lover, …
The Spanish apartment
Comedy, drama, romance
France, Spain
Romain Duris, ,…
A French student moves into an apartment in Barcelona with six other European students. Together, they speak the international language of love and friendship. …
The sea inside
Drama, biography
Javier Bardem, Belén Rueda, ̃as
Oscar and 61 other awards
The real-life story of , who …

Mike Myers,
A 1960s hipster secret agent is brought out of cryofreeze …
, COMP3308/3608 AI, week 13a, 2022 38

Representing Text Documents
• Content-based approaches are mainly used for recommending text- based products, e.g. web pages, news, etc.
• The task becomes text classification
• Standard representation of a document: a vector of tf-idf values for
each selected word
• Each document is represented as a vector of words, each word is
represented with its tf-idf value (weight)
• tf (term frequency) part: measures how often a term (word) appears a document, assumes that important terms appear more often
• idf (inverse document frequency) part: aims to reduce the weight of terms that appear in all documents
• Instead of single words, bigrams and n-grams can be used • An example of bigram: “machine learning”
, COMP3308/3608 AI, week 13a, 2022 39

Example of tf-idf Representation
• tf part – term frequency:
• Each document is a count vector in N 𝒗
“Antony” appears 73 times in the document
Antony and

Each document is represented as a vector 𝒗 with dimension 𝒗 = 𝟕
Example from http://informationretrieval.org
, COMP3308/3608 AI, week 13a, 2022 40

Example of tf-idf Representation (2)
• tf-idf weights
• The idf part penalizes terms that appear in many documents

Antony and Cleopatra

The Tempest
1.21 6.1 000

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com