COMP20008 Elements of Data Processing

Recommender Systems
School of Computing and Information Systems
@University of Melbourne 2022


Why recommender systems?
• Scarcity to Abundance
• Internet changed shopping behaviours
• Online business is heavily dependent on recommender systems.
[Figure: the long tail. Popularity (y-axis) against items sorted by popularity (x-axis).]

Why recommender systems?
• The Long Tail by Chris Anderson: “In 1988, a British mountain climber named Joe Simpson wrote a book called ‘Touching the Void’, a harrowing account of near death in the Peruvian Andes. It got good reviews but, only a modest success, it was soon forgotten. Then, a decade later, a strange thing happened. Jon Krakauer wrote ‘Into Thin Air’, another book about a mountain-climbing tragedy, which became a publishing sensation. Suddenly Touching the Void started to sell again”.
• “A lot of times, people don’t know what they want until you show it to them” – Steve Jobs

Recommender systems – examples
• LinkedIn
• Facebook
• Twitter
• YouTube
• Netflix
[Screenshot: a book page for The Martian with a “Customers Who Bought This Item Also Bought” list. Titles shown include The Revenant: A Novel of Revenge, The Life We Bury, Ready Player One: A Novel, The Boys in the Boat and The 5th Wave.]
[Screenshot: https://www.netflix.com/Kids with “Top Picks for Kids” and “Recently watched” rows.]

Recommender systems
• “75% of what people watch is from some sort of recommendation” (Netflix)
• “If I have 3 million customers on the web, I should have 3 million stores on the web.” (Amazon CEO)
Movie recommender systems
• finding best matched movies,
• reducing search times and frustration.

Recommender systems – How it works
• An online system where many users interact with many items.
• Each user has a profile
• Users rate items
• Explicitly: give a score
• Implicitly: web usage mining, e.g. time spent viewing the item
• The system does the rest. How?

Popularity based recommendation
• Show popular items.
• Which item is popular?
• Simple but not personalised.
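One possible answer to “which item is popular?” is to count how many ratings each item has received and break ties by mean rating. The sketch below assumes ratings arrive as `(user, item, score)` tuples; that layout, the function name `rank_by_popularity` and the toy data are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict

def rank_by_popularity(ratings, min_count=1):
    """Rank items by number of ratings, breaking ties by mean rating.

    ratings: iterable of (user, item, score) tuples (hypothetical layout).
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for _user, item, score in ratings:
        counts[item] += 1
        totals[item] += score
    items = [i for i in counts if counts[i] >= min_count]
    # Most-rated first; among equally rated items, higher mean rating first.
    return sorted(items, key=lambda i: (counts[i], totals[i] / counts[i]), reverse=True)

# Hypothetical toy data.
toy_ratings = [
    ("ann", "Batman", 4), ("bob", "Batman", 5), ("cat", "Batman", 3),
    ("ann", "Titanic", 5), ("bob", "Titanic", 2),
    ("cat", "Inception", 5),
]
top_items = rank_by_popularity(toy_ratings)
print(top_items)
```

Every user receives the same ranking, which is why the approach is simple but not personalised.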


Collaborative filtering
• Collaborative filtering: making predictions about a user’s missing data according to the collective behaviour of many other users
• Look at users’ collective behaviour (e.g. ratings)
• Active user history
• Combine!
• Item-based collaborative filtering (Item-Item)
• User-based collaborative filtering (User-User)

Collaborative filtering – A framework
• A set of $m$ users $U$ and a set of $n$ items $I$
• An $m \times n$ interaction matrix or rating matrix $R$ (one row per user, one column per item)
• A rating function $r: U \times I \to \mathbb{R}$
• Find the unknown ratings $r_{ui}$

Item-based method: Intuition
People like things similar to other things they like.
• Search for similarities among items
• Many users like both Batman and Superman → the two movies are similar.
• Similarity is collective similarity in ratings by many users.
• Recommend items similar to those rated by the target user.
• Superman and Batman are similar.
• If Peter liked Batman, then recommend Superman to Peter.

Item based collaborative filtering
Three questions to address:
• How to measure item similarities?
• How to find similar items?
• How to combine ratings of these items?

Q1: Measure item-similarity
Example: similarity between item $i_1$ and item $i_2$.
• Euclidean distance with mean imputation
• Impute each item’s missing ratings with the mean of its known ratings:
• $\mathrm{mean}(i_1) = \frac{3 + 3 + 2}{3} = 2.7$
• $\mathrm{mean}(i_2) = 3.75$
• After imputation: $i_1 = (3, 2.7, 2.7, 3, 2, 2.7, 2.7)$ and $i_2 = (3.75, 3.5, 3, 3.5, 4, 4, 4.5)$
• Similarity score based on Euclidean distance:
$$\mathrm{sim}(i_1, i_2) = \frac{1}{1 + d(i_1, i_2)} \quad \text{where} \quad d(i_1, i_2) = \sqrt{\sum_{u} (r_{u,i_1} - r_{u,i_2})^2}$$
• $d(i_1, i_2) = \sqrt{(3 - 3.75)^2 + (2.7 - 3.5)^2 + (2.7 - 3)^2 + (3 - 3.5)^2 + (2 - 4)^2 + (2.7 - 4)^2 + (2.7 - 4.5)^2} = 3.24$
• $\mathrm{sim}(i_1, i_2) = \frac{1}{1 + 3.24} = 0.24$
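The Q1 calculation can be sketched in a few lines of Python. The rating columns below are read off the worked distance terms (missing ratings written as `None`); the function names are illustrative assumptions. Note that the code keeps the unrounded mean 8/3 for $i_1$, so it gives a distance of about 3.28 and a similarity of about 0.23, whereas the slide rounds the mean to 2.7 first and obtains 3.24 and 0.24.

```python
import math

def impute_with_mean(ratings):
    """Replace None entries with the mean of the known ratings."""
    known = [r for r in ratings if r is not None]
    mean = sum(known) / len(known)
    return [mean if r is None else r for r in ratings]

def euclidean_similarity(a, b):
    """Return (distance, similarity) with similarity = 1 / (1 + distance)."""
    a, b = impute_with_mean(a), impute_with_mean(b)
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return distance, 1.0 / (1.0 + distance)

# Ratings of the two items by seven users, as implied by the worked example.
item_1 = [3, None, None, 3, 2, None, None]
item_2 = [None, 3.5, 3, 3.5, 4, 4, 4.5]

distance, similarity = euclidean_similarity(item_1, item_2)
print(round(distance, 2), round(similarity, 2))
```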

Q2: How to find similar items?
• We have an answer to Q1: for item $i_1$, we have the similarities between it and the other items: $\mathrm{sim}(i_1, i_2), \ldots, \mathrm{sim}(i_1, i_6)$.
• The target user $a$ has rated some items.
• Choose a number $k$ and find the $k$ most similar items to $i_1$ for user $a$.
• Let $k = 3$. Which 3 items?
• Items $i_2$, $i_6$ and $i_5$
• Similarities: 0.48, 0.35 and 0.33
• Ratings: 4, 3, 3

Q3: How to combine ratings of similar items?
• Predict the rating of item $i_1$ for user $a$
• From Q1 and Q2, we get:
• For user $a$, the 3 ($k = 3$) most similar items to $i_1$: $i_2$, $i_6$, $i_5$
• The ratings of these 3 items by user $a$: 4, 3, 3
• Rating = weighted average over the ratings of the 3 most similar items
• $r_{a,i_1} = \frac{0.48 \times 4 + 0.33 \times 3 + 0.35 \times 3}{0.48 + 0.33 + 0.35} = 3.41$
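Q2 and Q3 together amount to “keep the k most similar items the user has rated, then take a similarity-weighted average”. The sketch below uses the similarities 0.48, 0.33 and 0.35 and the ratings 4, 3 and 3 from the worked example; the dictionary layout and the name `predict_rating` are assumptions made for illustration.

```python
def predict_rating(similarities, user_ratings, k):
    """Weighted average of the user's ratings over the k most similar rated items."""
    # Only items the user has actually rated can contribute.
    rated = [
        (sim, user_ratings[item])
        for item, sim in similarities.items()
        if item in user_ratings
    ]
    top_k = sorted(rated, reverse=True)[:k]
    total_weight = sum(sim for sim, _ in top_k)
    return sum(sim * rating for sim, rating in top_k) / total_weight

# Similarities of other items to i1, and user a's ratings of those items.
sims_to_i1 = {"i2": 0.48, "i5": 0.33, "i6": 0.35}
ratings_by_a = {"i2": 4, "i5": 3, "i6": 3}

prediction = predict_rating(sims_to_i1, ratings_by_a, k=3)
print(round(prediction, 2))  # 3.41
```

If the user has rated none of the similar items, the denominator is zero and this sketch raises `ZeroDivisionError`; a real system would need a fallback such as a popularity-based recommendation.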

Item-based Collaborative Filtering – Algorithm
• Phase 1 – For each item j,
• Compute similarities between j and other items.
similarity: e.g. Euclidean distance with mean imputation.
• Batch, Off-line
• Phase 2 – Predict the rating of item $j$ by user $a$ based on the $k$ most similar items (among items rated by $a$)
• Predicted rating = weighted average over the ratings of the $k$ most similar items:
$$r_{a,j} = \frac{\sum_{i} \mathrm{sim}(i, j) \times r_{a,i}}{\sum_{i} \mathrm{sim}(i, j)}$$
where the sums run over the $k$ items most similar to $j$ that user $a$ has rated.

Item-based filtering – Practice example
• Predict $r_{a,j}$ ($a$ = Tim; $j$ = Inception)
Phase 1 – offline: similarities between Inception and other movies
sim(Inception, Titanic) = 0.48 (d = 1.08)
sim(Inception, Batman) = 0.24 (d = 3.24)
sim(Inception, Superman) = 0.20 (d = 3.89)
sim(Inception, The Martian) = 0.33 (d = 2.05)
sim(Inception, Jurassic World) = 0.34 (d = 1.97)

Item-based filtering – Practice example cont.
Phase 2 – online:
• Select the 3 most similar items ($k = 3$) w.r.t. (Tim, Inception): Titanic (0.48), Jurassic World (0.34) and The Martian (0.33)
• Weighted average over the ratings of the 3 most similar items:
$$r_{a,j} = \frac{0.48 \times 3 + 0.33 \times 3.5 + 0.34 \times 3}{0.48 + 0.33 + 0.34} = 3.14$$

Item-based collaborative filtering – Summary
• Item similarity computation is offline, so it is efficient at runtime.
• Developed by Amazon; suited to situations where #users >> #items
• What do we do with new items? (Cold-start problem)
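The offline phase can be sketched as a batch job that fills in a table of pairwise item similarities, using Euclidean distance with mean imputation and similarity 1 / (1 + d). The rating data below is hypothetical toy data (one list per item, one position per user), and the function names are illustrative assumptions rather than anything prescribed by the slides.

```python
import math
from itertools import combinations

def impute_with_mean(column):
    """Replace None entries with the mean of the known ratings."""
    known = [r for r in column if r is not None]
    mean = sum(known) / len(known)
    return [mean if r is None else r for r in column]

def item_similarity_table(item_columns):
    """Offline phase: similarity 1 / (1 + Euclidean distance) for every item pair."""
    filled = {item: impute_with_mean(col) for item, col in item_columns.items()}
    table = {item: {} for item in filled}
    for a, b in combinations(filled, 2):
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(filled[a], filled[b])))
        table[a][b] = table[b][a] = 1.0 / (1.0 + d)
    return table

# Hypothetical toy data: ratings by four users, None = not rated.
toy_columns = {
    "Inception": [5, 4, None, 1],
    "Titanic": [5, None, 4, 1],
    "Batman": [1, 2, 5, None],
}
table = item_similarity_table(toy_columns)
print(table["Inception"])
```

A brand-new item has no known ratings, so `impute_with_mean` would divide by zero for it. That is the cold-start problem in miniature: without ratings there is nothing to compute similarities from.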


User-based collaborative filtering: Intuition
People like things liked by other people with similar taste.
• Search for similarities among users
• Two users, Jane and Bob, tend to like the same movies; they have similar taste in movies.
• Recommend items liked by users similar to the target user.
• Jane and Bob have similar rating behaviours (taste).
• If Jane liked Batman, then recommend Batman to Bob.
• Mathematically similar to item-based methods.

User-based method
Q1: How to measure similarity?
Q2: How to find similar users?
Q3: How to combine?

Q1: How to measure similarity between users
• Euclidean distance with mean imputation
• mean($u_1$) = 18.1
• mean($u_2$) = 14.1
• Mean imputation for $u_1$ and $u_2$
• Compute the Euclidean distance between the resulting rows:
$$d(u_1, u_2) = \sqrt{(17 - 8)^2 + (18.1 - 14.1)^2 + (20 - 14.1)^2 + (18 - 17)^2 + (17 - 14)^2 + (18.5 - 17.5)^2} \approx 11.9$$
• Convert the distance into a similarity (high similarity for low distance, low similarity for high distance):
$$\mathrm{sim}(u_1, u_2) = \frac{1}{1 + d(u_1, u_2)}$$

User-based: Q2: How to find similar users? Q3: How to combine ratings?
With respect to user $a$ and item $i$:
• Choose the $k$ most similar users who have rated item $i$.
• The predicted rating is the weighted average of the ratings of item $i$ from the top-$k$ similar users.
• Mathematically similar to the item-based method.
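A minimal end-to-end sketch of the user-based method follows: compare rating rows instead of rating columns, keep the k most similar users who rated the item, and average their ratings weighted by similarity. The toy data, the row layout and all function names are hypothetical, and the similarity 1 / (1 + d) is carried over from the item-based case as an assumption.

```python
import math

def impute_with_mean(row):
    """Replace None entries with the mean of the known ratings."""
    known = [r for r in row if r is not None]
    mean = sum(known) / len(known)
    return [mean if r is None else r for r in row]

def user_similarity(row_a, row_b):
    """Similarity 1 / (1 + Euclidean distance) between two mean-imputed rows."""
    a, b = impute_with_mean(row_a), impute_with_mean(row_b)
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

def predict_user_based(rows, target, item_index, k):
    """Weighted average of item ratings from the k users most similar to target."""
    neighbours = [
        (user_similarity(rows[target], row), row[item_index])
        for user, row in rows.items()
        if user != target and row[item_index] is not None
    ]
    top_k = sorted(neighbours, reverse=True)[:k]
    return sum(s * r for s, r in top_k) / sum(s for s, _ in top_k)

# Hypothetical toy data: one row per user, one position per item.
toy_rows = {
    "Jane": [5, 4, 1, 5],
    "Bob": [5, 4, 1, None],
    "Eve": [1, 2, 5, 1],
}
bob_prediction = predict_user_based(toy_rows, "Bob", item_index=3, k=1)
print(round(bob_prediction, 2))
```

With k = 1 only Jane, whose ratings match Bob's on the first three items, contributes, so the prediction equals her rating of the fourth item.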

User-based method
• Mathematically similar to Item-based method.
• However:
• Item-based performs better in many practical cases: movies, books, etc.
• User preferences are dynamic, whereas item similarities are relatively static.
• User-based methods therefore need a high update frequency for offline-calculated information.
• Sparsity problem with the user-based method.
• No recommendation for new users
• Scalability issues
• As the number of users increases, it becomes more costly to find similar users.
• Offline clustering of users

Scale-up search of k-similar users
• Offline step

Options for Q1: Similarity metrics
• Item-item: considers the similar items
• User-user: considers the similar users
• We looked at Euclidean-distance-based similarity.
• The other two popular similarity measures are:
• Cosine similarity
• Pearson correlation (centered cosine similarity)
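Both alternative measures can be sketched briefly. The centered variant below subtracts each vector's mean over its known ratings and treats missing ratings as 0 after centering; that is one common convention and is an assumption here, as are the function names and the toy vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen for all-zero vectors
    return dot / (norm_a * norm_b)

def centered_cosine_similarity(a, b):
    """Subtract each vector's mean over known ratings; missing (None) becomes 0."""
    def center(v):
        known = [r for r in v if r is not None]
        mean = sum(known) / len(known)
        return [0.0 if r is None else r - mean for r in v]
    return cosine_similarity(center(a), center(b))

# Hypothetical rating vectors, None = not rated.
likes_same = centered_cosine_similarity([5, 4, None, 1], [4, 5, 1, None])
likes_opposite = centered_cosine_similarity([5, 4, None, 1], [1, None, 5, 4])
print(likes_same > 0, likes_opposite < 0)
```

Centering means that a rating below a rater's own average counts as negative evidence, so opposite tastes produce a negative similarity rather than a small positive one.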

Content based recommender systems
• Content-based approach is an alternative to collaborative filtering methods.
• Content-based method uses pre-determined features of an item to recommend similar items.
• Real applications combine multiple approaches into a hybrid approach.

Content based recommender systems
Feature vector of an item:
• Create a feature vector from an item’s attributes.
• For content-based methods, we need rich content for items
• Author, title, actors, directors, metadata, gender, friends, …
• Pandora (Music Genome Project): musicians labelled each piece of music with 400+ attributes.
• Others: users create profiles when signing up.
Feature vector of a user:
• Weighted average of the feature vectors of (rated) items.
Recommend items:
• Similarity based on the feature vectors
• Cosine similarity of the item’s and the user’s feature vectors.
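The three steps above (item feature vectors, a user profile as a weighted average, ranking by cosine similarity) can be sketched as follows. The genre features, the ratings and every function name are hypothetical, and weighting by the raw rating is only one possible choice of weights.

```python
import math

def user_profile(item_features, user_ratings):
    """Rating-weighted average of the feature vectors of the items the user rated."""
    total = sum(user_ratings.values())
    size = len(next(iter(item_features.values())))
    profile = [0.0] * size
    for item, rating in user_ratings.items():
        for idx, value in enumerate(item_features[item]):
            profile[idx] += rating * value / total
    return profile

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def recommend(item_features, user_ratings):
    """Rank unseen items by similarity between their features and the user profile."""
    profile = user_profile(item_features, user_ratings)
    unseen = [i for i in item_features if i not in user_ratings]
    return sorted(unseen, key=lambda i: cosine_similarity(profile, item_features[i]), reverse=True)

# Hypothetical features: [sci-fi, romance, superhero]
features = {
    "The Martian": [1, 0, 0],
    "Titanic": [0, 1, 0],
    "Inception": [1, 0, 0],
    "Batman": [0, 0, 1],
    "Superman": [0, 0, 1],
}
tims_ratings = {"The Martian": 5, "Titanic": 1, "Batman": 2}
ranking = recommend(features, tims_ratings)
print(ranking)
```

No other user's ratings are involved, so an item with a feature vector can be recommended even if nobody has rated it yet.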

Content-based method
• Content-based recommendation
• Relies on the properties of items: needs solid item profiles.
• These can be hard to choose
• Transparency
• Independent of other users’ ratings
• Sufficient item descriptions → no new-item problem.
• Can recommend unpopular items to users
• Over-specialisation
• Tends to recommend items similar to those already seen by the user
• Still has the new-user problem

Recommender Methods Summary
• Popularity based – no personalisation
• Collaborative filtering
• Personalisation
• Item-item filtering is efficient compared to user based for large user space
• Has cold-start problem for new items and new users
• Content based recommendation
• Build feature vector from attributes of items
• No new-item issue, but still has the new-user issue
• Other advanced methods: Multi-armed bandit method, deep learning, and reinforcement learning…

Measurement for system performance
• Withhold some known ratings as test data
• A simple method is RMSE (root mean squared error) between predicted and actual ratings
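RMSE is the square root of the mean squared difference between predicted and withheld ratings. A minimal sketch, with hypothetical numbers standing in for real test data:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical predictions against withheld (actual) ratings.
error = rmse([3.4, 3.1, 4.0], [3.0, 3.5, 4.0])
print(round(error, 3))
```

Lower is better, and an RMSE of 0 means every withheld rating was predicted exactly.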
