EECS 485 Lecture 16
Link Analysis
Copyright By PowCoder代写 加微信 powcoder
Learning Objectives
• Build a link graph from web pages and explain what insights it gives about web pages
• Define PageRank on a graphical and mathematical level • Predict the outcomes of the PageRank algorithm on
example link graphs
• Describe and use the HITS algorithm, including identifying the root and base sets of nodes in a graph for a query
Link-based ranking
Document importance
• Search for “Python threads”
• Which document is more likely to be what I’m looking for?
• docs.python.org
• alex.oonutrition.ru
• How can we measure importance?
How do people indicate something is important?
• Talk about it • Reference it • Upvote it
• Link to it
Link graph
Importance in graphs
Say hi to partner
Importance in graphs
• Which node(s) are the most important? • How would you measure it?
Using the link graph for importance
• If lots of pages link to a page, it is important
• If an important page links to a page, that page also is important
• PageRank
PageRank concepts
• Algorithm that made Google the best search engine
Idea behind PageRank
• Humans know better than computers which pages are important
• Humans indicate importance through links • Like citations on an academic paper
PageRank model
• Assume someone starts on a random web page • They click a random link on that page
• They do this over and over
• For each page on the Internet, how often do they visit that page?
PageRank model example
eecs485.org docs.python.org
umich.edu en.wikipedia.org
• If many pages link to you, random person will be at your site more often
• If you’re popular, you will make the pages you link to more popular
Random visits
eecs485.org docs.python.org
umich.edu en.wikipedia.org
PageRank Formula
PageRank algorithm
Example link graph
Initial Page Rank
• Initialize to 1/N
P R ( A ) = (1 – d ) + d å P R ( I i ) N i C(Ii)
Exercise Graph 1
• Highest PR • Lowest PR • # iterations
Exercise Graph 2
• Highest PR • Lowest PR • # iterations
Combining PageRank and text-based ranking
• Define a weight for PageRank as % of total • Use tf-idf for rest
• You will do this in P5
HITS algorithm
HITS concept
• Account for both links in and out of a node • Authority score: value of content
• Hub score: value of links
HITS formula
• Authority score = sum of hub scores of pages that link to me
• Hub score = sum of authority scores of pages I link to
HITS algorithm
• Start with a root set
• Build a base set
• Ignore all nodes and edges not in base set
• Repeat until converged:
• Compute all authority scores
• Compute all hub scores • Normalize
• Authority scores for root set used for ranking
HITS outcomes
HITS outcomes
Exercise – root set [1]
Exercise – root set [4, 5]
Combining HITS with text-based ranking
• Put the best text-based results as root set
• Combine HITS authority score with weight (like for PageRank)
Learning Objectives
• Build a link graph from web pages and explain what insights it gives about web pages
• Define PageRank on a graphical and mathematical level • Predict the outcomes of the PageRank algorithm on
example link graphs
• Describe and use the HITS algorithm, including identifying the root and base sets of nodes in a graph for a query
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com