程序代写 EECS 485 Lecture 16

EECS 485 Lecture 16
Link Analysis

Copyright By PowCoder代写 加微信 powcoder

Learning Objectives
• Build a link graph from web pages and explain what insights it gives about web pages
• Define PageRank on a graphical and mathematical level • Predict the outcomes of the PageRank algorithm on
example link graphs
• Describe and use the HITS algorithm, including identifying the root and base sets of nodes in a graph for a query

Link-based ranking

Document importance
• Search for “Python threads”
• Which document is more likely to be what I’m looking for?
• docs.python.org
• alex.oonutrition.ru
• How can we measure importance?

How do people indicate something is important?
• Talk about it • Reference it • Upvote it
• Link to it

Link graph

Importance in graphs

Say hi to partner

Importance in graphs
• Which node(s) are the most important? • How would you measure it?

Using the link graph for importance
• If lots of pages link to a page, it is important
• If an important page links to a page, that page also is important

• PageRank

PageRank concepts

• Algorithm that made Google the best search engine

Idea behind PageRank
• Humans know better than computers which pages are important
• Humans indicate importance through links • Like citations on an academic paper

PageRank model
• Assume someone starts on a random web page • They click a random link on that page
• They do this over and over
• For each page on the Internet, how often do they visit that page?

PageRank model example
eecs485.org docs.python.org
umich.edu en.wikipedia.org

• If many pages link to you, random person will be at your site more often
• If you’re popular, you will make the pages you link to more popular

Random visits
eecs485.org docs.python.org
umich.edu en.wikipedia.org

PageRank Formula

PageRank algorithm

Example link graph

Initial Page Rank
• Initialize to 1/N

P R ( A ) = (1 – d ) + d å P R ( I i ) N i C(Ii)

Exercise Graph 1
• Highest PR • Lowest PR • # iterations

Exercise Graph 2
• Highest PR • Lowest PR • # iterations

Combining PageRank and text-based ranking
• Define a weight for PageRank as % of total • Use tf-idf for rest
• You will do this in P5

HITS algorithm

HITS concept
• Account for both links in and out of a node • Authority score: value of content
• Hub score: value of links

HITS formula
• Authority score = sum of hub scores of pages that link to me
• Hub score = sum of authority scores of pages I link to

HITS algorithm
• Start with a root set
• Build a base set
• Ignore all nodes and edges not in base set
• Repeat until converged:
• Compute all authority scores
• Compute all hub scores • Normalize
• Authority scores for root set used for ranking

HITS outcomes

HITS outcomes

Exercise – root set [1]

Exercise – root set [4, 5]

Combining HITS with text-based ranking
• Put the best text-based results as root set
• Combine HITS authority score with weight (like for PageRank)

Learning Objectives
• Build a link graph from web pages and explain what insights it gives about web pages
• Define PageRank on a graphical and mathematical level • Predict the outcomes of the PageRank algorithm on
example link graphs
• Describe and use the HITS algorithm, including identifying the root and base sets of nodes in a graph for a query

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com