程序代写代做代考 Recommender Systems

Recommender Systems

Social Network Analysis
Centrality II
Robin Burke
DePaul University
Chicago, IL

1

Centrality correlation
Normally centrality values are correlated
high degree nodes living the middle of the network
But not always

2

Dolphins

3

Outliers?
Low degree Low closeness Low betweenness
High degree ✖
High closeness ✖
High betweenness ✖

4

Centrality correlation
Centrality measures are usually correlated
important nodes are usually important no matter how you measure
But
sometimes they are not
those differences can be interesting

5

Eigenvector centrality
What if your centrality depends on your neighbor’s centrality?
Centrality as a relational concept
having to do with being associated with other central individuals

6

Example: Dining
Ellen has the same in-degree (2) but her in-links come from two zero-degree nodes.
Maxine has low in-degree (2) but her in-links come from Eva (6) and Anna (3)

Basic idea
We have a recursive definition of centrality
you’re central if your friends are central
x’=Ax
If we repeat this over and over
x(t) = Atx(0)
We know from linear algebra that the largest eigenvector dominates At
as t goes to infinity

8

Eigenvector centrality
A solution to the equation is
x(t) → Cv1
as t →∞
where v1 is the largest eigenvector of A
We can normalize however we want
usually sum to 1
doesn’t matter because we are interested in comparing nodes

9

Dining network again
(directed EV centrality)

10

Undirected EV centrality

Eigenvector centrality
kind of a generalization of degree
a node with high eigenvector centrality is important IF
being connected to the popular nodes is valuable
and the more the better
no “attention” effect
goes beyond immediate neighbors
who are the friends of friends?

12

Problems with eigenvector centrality
Does not always work very well on directed graphs
A node with no in-degree by definition has 0 centrality
Acyclic graphs
all eigenvalues zero

13

Application
Use eigenvector centrality
if there is “reflected glory”
having popular friends matters
Like degree
more is always better

14

PageRank
Named after Larry Page
one of the founders of Google
How to rank web pages?
“reflected glory” yes
but some page may be indiscriminate linkers
want to count links more heavily
if there are fewer of them

15

Problem: Adversarial IR
NASA,
mars,
rover,
photo
NASA,
mars,
rover,
photo
NASA,
mars,
rover,
photo
NASA,
mars,
rover,
photo

16

Random walk model
At time 0
choose a random node in the network
A time 1
choose a random edge from this node
Continue indefinitely
Repeat for different starting nodes
How often is node n encountered?
this is our measure of centrality

17

Sounds hard
Lots of trials over the network
Lots of random walks, etc.
But
linear algebra to the rescue

18

Matrix definition
We can use this “random surfer” model
Derive a matrix-based definition of PageRank
which we can solve

19

Matrix definition
Instead of an adjacency matrix
we need a “transition probability” matrix
Technically, a Markov model
for each node,
equal probability of moving to all linked nodes
New adjacency matrix
each row is divided by degree

20

Weight = probability = 1/d
A
C
1/2
1/2
1
1/3
1/3
1/3
1

D
B
A
C
D
B

21

Markov model
A model of a process that can be in multiple states
Transitions between states are defined by links
the transitions are probabilistic
Always add up to 1
leaving any node

22

After k steps?
vk =(NT)kv0
We are interested (like before)
in convergence
So, at what vector v* does
v* =(NT)v*
there no scaling factor
because probability must sum to 1

23

Eigenvectors
In a Markov matrix
Only one real positive non-zero eigenvalue = 1
That’s our (Basic) PageRank

24

Problem
Random surfer might get “stuck”

No way out…

25

Fix
Add a random jump probability
Each iteration
small probability of jumping to a random node anywhere in the network
if not, then random surf
choose an edge randomly from the current node
This is enough to avoid sinks

26

New equation
vt = Pvt-1
where P has entries

α+β=1
β is the random jump probability
Solve as before
eigenvalues of new matrix
largest eigenvector = PageRank vector

27

Comparison (Dining)

28

PageRank (directed)

PageRank (undirected)

PageRank
Similar to eigenvector centrality
accounts for “attention” effect
1 friend in 50 < 1 friend in 5 Makes sense in information networks models connected edges A node is central IF if it connected to other central nodes who are selective Very effective for adversarial IR 31 Different centrality measures What to use? It depends Measures generally correlated so maybe it doesn’t matter? not quite 32 Degree Assumption direct benefits A node gets a benefit from each connected neighbor Corollaries node does not get a benefit from indirect connections there’s no limit 33 Dining network 34 Centralities out-degree uninformative in-degree prestige – who’s “cool”? betweenness not well-defined in a directed graph might tell us something about communication but links might be aspirational closeness not well-defined in a directed graph might say something about influence who is “in” with the “cool” crowd? 35 Key point Our understanding of the network gives meaning to the types of centrality You can’t make use of a centrality measure without having a theory about what the network is for 36 Dolphin data Nodes of degree 7 Are these equally important to the network? Jonah SN100 Feather 37 It depends What is our theory of dolphin social behavior? mutual aid hunting or defensive behavior communication pod-wide conversations? resource sharing caring for young dominance hierarchy kinship What effect are we interested in learning about? 38 Krackhardt friend data 39 In-degree vs out-degree Correlation low: 0.12 40 Out-degree Subject to acquisition method how were subjects asked for friends’ names? “at least two?” somebody named all but 3 individuals High out degree function of social aspiration “everyone is my friend?” 41 In-degree More like a normal distribution peak around 6 Who is nominated as a friend? not reciprocal very common finding High in-degree function of prestige (like dining data) 42 Take home lesson Centrality measures like any metric Meaningful relative to the kind of network the meaning of an edge what is communicated over edges how they were gathered 43 Note on weights Edge weights can be interpreted differently by different centrality metrics In igraph Closeness, betweenness Treat weight as distance Edges with high weights are more distant Contribute less to centrality Eigenvector, PageRank Treat weight as tie strength Edges with high weights contribute more to centrality Note on weights, cont’d If your network has an edge attribute called “weight” As in Homework 3 Must use the “weights” parameter To overrides the normal interpretation of the metric Homework 3 Bipartite projection Weights are strength of association Closeness and betweenness would interpret incorrectly Solution Use weights=NA Metric will ignore edge weight attribute Homework 3 A character-to-character projection From Shakespeare’s Antony and Cleopatra Characters are connected by an edge If they speak in the same scene Weight indicates how many scenes Example Krackhardt network Friend and advice networks Degree correlation Comparing in-degrees 47 M = 0 1 1 0 0 0 0 1 1 1 0 1 0 0 1 0 ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥ ⎥ ⎥ M= 0110 0001 1101 0010 é ë ê ê ê ê ù û ú ú ú ú N = 0 12 1 2 0 0 0 0 1 1 3 1 3 0 1 3 0 0 1 0 ⎡ ⎣ ⎢ ⎢ ⎢ ⎢ ⎢ ⎢ ⎤ ⎦ ⎥ ⎥ ⎥ ⎥ ⎥ ⎥ N= 0 1 2 1 2 0 0001 1 3 1 3 0 1 3 0010 é ë ê ê ê ê ê ê ù û ú ú ú ú ú ú pt, j =α(mi, j / degout (i))+β p t,j =a(m i,j /deg out (i))+b /docProps/thumbnail.jpeg