Recommender Systems
Social Network Analysis
Centrality II
Robin Burke
DePaul University
Chicago, IL
1
Centrality correlation
Normally centrality values are correlated
high degree nodes living the middle of the network
But not always
2
Dolphins
3
Outliers?
Low degree Low closeness Low betweenness
High degree ✖
High closeness ✖
High betweenness ✖
4
Centrality correlation
Centrality measures are usually correlated
important nodes are usually important no matter how you measure
But
sometimes they are not
those differences can be interesting
5
Eigenvector centrality
What if your centrality depends on your neighbor’s centrality?
Centrality as a relational concept
having to do with being associated with other central individuals
6
Example: Dining
Ellen has the same in-degree (2) but her in-links come from two zero-degree nodes.
Maxine has low in-degree (2) but her in-links come from Eva (6) and Anna (3)
Basic idea
We have a recursive definition of centrality
you’re central if your friends are central
x’=Ax
If we repeat this over and over
x(t) = Atx(0)
We know from linear algebra that the largest eigenvector dominates At
as t goes to infinity
8
Eigenvector centrality
A solution to the equation is
x(t) → Cv1
as t →∞
where v1 is the largest eigenvector of A
We can normalize however we want
usually sum to 1
doesn’t matter because we are interested in comparing nodes
9
Dining network again
(directed EV centrality)
10
Undirected EV centrality
Eigenvector centrality
kind of a generalization of degree
a node with high eigenvector centrality is important IF
being connected to the popular nodes is valuable
and the more the better
no “attention” effect
goes beyond immediate neighbors
who are the friends of friends?
12
Problems with eigenvector centrality
Does not always work very well on directed graphs
A node with no in-degree by definition has 0 centrality
Acyclic graphs
all eigenvalues zero
13
Application
Use eigenvector centrality
if there is “reflected glory”
having popular friends matters
Like degree
more is always better
14
PageRank
Named after Larry Page
one of the founders of Google
How to rank web pages?
“reflected glory” yes
but some page may be indiscriminate linkers
want to count links more heavily
if there are fewer of them
15
Problem: Adversarial IR
NASA,
mars,
rover,
photo
NASA,
mars,
rover,
photo
NASA,
mars,
rover,
photo
NASA,
mars,
rover,
photo
16
Random walk model
At time 0
choose a random node in the network
A time 1
choose a random edge from this node
Continue indefinitely
Repeat for different starting nodes
How often is node n encountered?
this is our measure of centrality
17
Sounds hard
Lots of trials over the network
Lots of random walks, etc.
But
linear algebra to the rescue
18
Matrix definition
We can use this “random surfer” model
Derive a matrix-based definition of PageRank
which we can solve
19
Matrix definition
Instead of an adjacency matrix
we need a “transition probability” matrix
Technically, a Markov model
for each node,
equal probability of moving to all linked nodes
New adjacency matrix
each row is divided by degree
20
Weight = probability = 1/d
A
C
1/2
1/2
1
1/3
1/3
1/3
1
D
B
A
C
D
B
21
Markov model
A model of a process that can be in multiple states
Transitions between states are defined by links
the transitions are probabilistic
Always add up to 1
leaving any node
22
After k steps?
vk =(NT)kv0
We are interested (like before)
in convergence
So, at what vector v* does
v* =(NT)v*
there no scaling factor
because probability must sum to 1
23
Eigenvectors
In a Markov matrix
Only one real positive non-zero eigenvalue = 1
That’s our (Basic) PageRank
24
Problem
Random surfer might get “stuck”
No way out…
25
Fix
Add a random jump probability
Each iteration
small probability of jumping to a random node anywhere in the network
if not, then random surf
choose an edge randomly from the current node
This is enough to avoid sinks
26
New equation
vt = Pvt-1
where P has entries
α+β=1
β is the random jump probability
Solve as before
eigenvalues of new matrix
largest eigenvector = PageRank vector
27
Comparison (Dining)
28
PageRank (directed)
PageRank (undirected)
PageRank
Similar to eigenvector centrality
accounts for “attention” effect
1 friend in 50 < 1 friend in 5
Makes sense in information networks
models connected edges
A node is central IF
if it connected to other central nodes
who are selective
Very effective for adversarial IR
31
Different centrality measures
What to use?
It depends
Measures generally correlated
so maybe it doesn’t matter?
not quite
32
Degree
Assumption
direct benefits
A node gets a benefit from each connected neighbor
Corollaries
node does not get a benefit from indirect connections
there’s no limit
33
Dining network
34
Centralities
out-degree
uninformative
in-degree
prestige – who’s “cool”?
betweenness
not well-defined in a directed graph
might tell us something about communication
but links might be aspirational
closeness
not well-defined in a directed graph
might say something about influence
who is “in” with the “cool” crowd?
35
Key point
Our understanding of the network
gives meaning to the types of centrality
You can’t make use of a centrality measure
without having a theory about what the network is for
36
Dolphin data
Nodes of degree 7
Are these equally important to the network?
Jonah
SN100
Feather
37
It depends
What is our theory of dolphin social behavior?
mutual aid
hunting or defensive behavior
communication
pod-wide conversations?
resource sharing
caring for young
dominance hierarchy
kinship
What effect are we interested in learning about?
38
Krackhardt friend data
39
In-degree vs out-degree
Correlation low: 0.12
40
Out-degree
Subject to acquisition method
how were subjects asked for friends’ names?
“at least two?”
somebody named all but 3 individuals
High out degree
function of social aspiration
“everyone is my friend?”
41
In-degree
More like a normal distribution
peak around 6
Who is nominated as a friend?
not reciprocal
very common finding
High in-degree
function of prestige (like dining data)
42
Take home lesson
Centrality measures
like any metric
Meaningful relative to
the kind of network
the meaning of an edge
what is communicated over edges
how they were gathered
43
Note on weights
Edge weights can be interpreted differently by different centrality metrics
In igraph
Closeness, betweenness
Treat weight as distance
Edges with high weights are more distant
Contribute less to centrality
Eigenvector, PageRank
Treat weight as tie strength
Edges with high weights contribute more to centrality
Note on weights, cont’d
If your network has an edge attribute called “weight”
As in Homework 3
Must use the “weights” parameter
To overrides the normal interpretation of the metric
Homework 3
Bipartite projection
Weights are strength of association
Closeness and betweenness would interpret incorrectly
Solution
Use weights=NA
Metric will ignore edge weight attribute
Homework 3
A character-to-character projection
From Shakespeare’s Antony and Cleopatra
Characters are connected by an edge
If they speak in the same scene
Weight indicates how many scenes
Example
Krackhardt network
Friend and advice networks
Degree correlation
Comparing in-degrees
47
M =
0 1 1 0
0 0 0 1
1 1 0 1
0 0 1 0
⎡
⎣
⎢
⎢
⎢
⎢
⎤
⎦
⎥
⎥
⎥
⎥
M=
0110
0001
1101
0010
é
ë
ê
ê
ê
ê
ù
û
ú
ú
ú
ú
N =
0 12
1
2 0
0 0 0 1
1
3
1
3 0
1
3
0 0 1 0
⎡
⎣
⎢
⎢
⎢
⎢
⎢
⎢
⎤
⎦
⎥
⎥
⎥
⎥
⎥
⎥
N=
0
1
2
1
2
0
0001
1
3
1
3
0
1
3
0010
é
ë
ê
ê
ê
ê
ê
ê
ù
û
ú
ú
ú
ú
ú
ú
pt, j =α(mi, j / degout (i))+β
p
t,j
=a(m
i,j
/deg
out
(i))+b
/docProps/thumbnail.jpeg