Recommender Systems
Social Network Analysis
Centrality I
Robin Burke
DePaul University
Chicago, IL
1
Project
Getting started on the project
Form groups (ASAP)
Proposal (due in 2 weeks)
2
Project Proposal example
Name(s): Robin Burke
Brief description of network / data set: Bi-partite affiliation network of professors at DePaul University and the University-level committees on which they have served from 2001-2011.
Data source: Does not exist, but it would be cool if it did
Network size (if known) nodes / edges: 357 faculty / 87 committees, 8,342 edges
Edge relation(s): Faculty member -> committee served
Vertex attributes: Professors: ID (anonymized), age, sex, ethnicity, race, college as of 2011, years at DePaul (integer 0-35), rank,
Committees: ID, name, standing, elected, size, date of creation (2001-1-1 to 2011-12-31), date of dissolution (2001-1-1 to 2011-12-31)
Edge attributes: Date of start of committee service (2001-1-1 to 2011-12-31)
Analysis question(s): Who are the influential faculty members with respect to university committees? How are these individuals distributed by college?
Is there a correlation between academic rank and committee service?
How are committee assignments distributed by college and by demographic factors?
Are there clusters of individual who co-serve on the same committees?
What committees are linked by co-membership? Does this association change over time?
Notes: End of service data was not available is most cases. I am still working to calculate this from other sources.
3
Ideas
Links to sources network data sets on D2L
Examples
Included on D2L are two projects from previous quarters
To give you some ideas
To show what an excellent project looks like
4
Outline
Terminology
Paths
Components
Linear Algebra review
Centrality
Degree
Betweenness
Closeness
Break
Eigenvector and related measures
Eigenvector centrality
PageRank
Example
5
Movement on a graph
On a graph you can move from node to node
via edges
Might be actual movement
messages in a communication network
Or it might be more implicit
social influence
A
B
C
D
E
6
Walk
Sequence of edges
such that the node ending each step is the beginning of the next
B-A-C-A is a walk…
Length
# of edges
A
B
C
D
E
7
Random walk
A random walk is a sequence of edges
chosen randomly at each vertex
Turns out to be a very important construct
more later tonight
8
Path
A walk where no vertex appears more than once
B-A-C-A is not a path
B-A-C-D is a path
A
B
C
D
E
9
Cycle
A walk where the beginning and endings nodes are repeated
but no others
B-A-B is a cycle
B-A-C-A is not a cycle
B-A-C-D-B is a cycle
A
B
C
D
E
10
Cyclic / acyclic graphs
Distinction for directed graphs
Cyclic graph is one with cycles
Acyclic has no cycles
A
B
C
D
E
11
Shortest path / geodesic
There may be multiple paths from one node to another
B-A
B-A (length 1)
B-D-C-A (length 3)
B-D-C-D-C-A
not a path
Shortest path aka
geodesic
A
B
C
D
E
12
No path
It is possible that there is no path between two nodes
D-E no path
length =
A
B
C
D
E
13
Diameter
length of the longest geodesic
maximum distance inside the network
Disconnected graph?
technically should be ∞ (or undefined)
sometimes it is the maximum diameter of any component
14
Intuition
Shortest path from A to other nodes
A-C is the diameter
longest of these shortest paths
A
C
B
15
What is the diameter of this network?
https://www.polleverywhere.com/multiple_choice_polls/WXs8i7umaRBv5XB
16
Connected
A network is connected
if there are is a path between all the nodes in a network
A
B
C
D
E
A
B
C
D
E
17
Component
A component is a connected group of maximal size
If the network is connected
there is one component
Otherwise there are multiple components
A
B
C
D
E
A
B
C
D
E
18
Giant component
Real-world networks often have a “giant component”
which is 80-90% of the network all in one component
19
Bow-tie structure of the web
20
Directed network
Here there is no path from A to E
but there is one from E to A
We have a new term
“strongly-connected”
A
B
C
D
E
21
Strongly-connected
A directed network is strongly-connected
if there is a path from every node to every other
Note that this requires
cyclic graph
A
B
C
D
E
22
Weakly-connected
A directed network is weakly-connected
if it becomes connected when all edges are made undirected
A
B
C
D
E
23
Strongly-connected components
A weakly-connected network
is made up of strongly-connected components
chunks of the network that are strongly connected
A
B
C
D
E
24
If a graph is directed and acyclic, what is the longest path it can have?
https://www.polleverywhere.com/multiple_choice_polls/hDcJNEUhuC2CACR
25
In igraph
decompose()
Returns the components of a graph
List format
Remember [[1]] type indexing
Can specify strong / weak
26
Linear Algebra Review
27
Adjacency Matrix
A
B
C
D
E
28
Linear Algebra Review
An adjacency matrix is a square matrix
usually binary valued (0, 1)
Results from linear algebra very relevant
for working with networks
29
Matrix multiplication
Equals graph traversal
A
B
C
D
E
30
Start at A
[1 0 0 0 0] M = [0 1 1 0 1]
all 1 step paths from A
[0 1 1 0 1] M = [3 0 0 2 0]
all 2 step paths from A
[3 0 0 2 0] M = [0 5 5 0 3]
all 3 step paths from A
Note
A to A – no three step paths
means there are no triangles starting at A
31
Transformations
Transformation by the adjacency matrix
represents paths in the network
Mk = paths of length k
What about M*?
arbitrary k
32
Special case
Periodicity
if the network is periodic
no interesting answer to this question
A
C
D
33
Otherwise
M* will represent the strongest “tendency” of the matrix
That might be an interesting thing to know!
34
Matrices as transformations
Let M be a nxn square matrix
like an adjacency matrix
Let v be a n dimensional vector
Mv=w
where w is another n dimensional vector
(Yes, I switched sides for the multiplication. This is equivalent to using the transpose of the matrix.)
Or defining “from” as the columns and “to” as the rows
35
Eigenvectors
if Mv=λv
then v = eigenvector of M
λ = eigenvalue of M
36
Eigenvalues
An nxn matrix has at most n eigenvalues
det(A- λI)=0
given an n-degree polynomial that defines the eigenvalues
Sometimes there are multiple eigenvectors with the same λ
“defective” matrices
Symmetric matrix
has real eigenvalues
non-symmetric matrices might have complex eigenvalues
37
Ordering eigenvalues
We can sort the eigenvectors by eigenvalue
talk about the 1st eigenvector of a matrix, etc.
38
Decomposition
Assume that M has n distinct eigenvalues
true in most practical graph applications
Let Q be a square matrix of the eigenvectors
MQ=QΛ
where Λ is the matrix with the corresponding eigenvalues on the diagonal
Means
MQQ-1=QΛQ-1
M =QΛQ-1
So the matrix M can be expressed in terms of just its eigenvalues and eigenvectors
39
Powers of M
Mk = (QΛQ-1)k
= Q ΛkQ-1
Meaning that the largest eigenvalue will dominate
as k -> infinity
40
Steady state tendency
Leading eigenvector
corresponding to largest eigenvalue
Represents
where paths through the network
will tend to stay as they get longer and longer
Interesting way of ranking nodes
which are more or less probable?
We will come back to this insight!
41
Centrality
42
The Question
What are the most important nodes in a network?
“Important” is vague
will vary
by network type
by analysis task
43
Dolphin Network
44
The Answer
Basic idea
apply a node-level metric
rank nodes
What metrics to apply?
45
Three basic types of centrality
Different measures capture different ideas of what makes a node important
Here X > Y by different measures
degree
betweenness
closeness
46
Degree
We have talked about this before
# of incident edges
High degree makes a node important IF
immediate connections are important
47
Example: Dolphin network
48
Degree distribution
49
Degree centrality
Only counts immediate neighbors
Each neighbor equally valuable
Many variants
weighted
in-degree
out-degree
more exotic extensions
50
Brokerage
Not captured by degree
51
Brokerage
The extent to which one individual is an intermediary between different groups
who otherwise lack direct connections
52
Constraint / Structural holes
My dog ate it.
No homework today!
53
Intuition: how many pairs of individuals would have to go through you
in order to reach one another in the minimum number of hops?
Capturing brokerage
54
Betweenness
10 Pairs
A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E
How many involve C? (without ending at C)
How many involve D?
How many involve E?
A
B
C
D
E
Answer: 4
Answer: 3
Answer: 0
55
But
How to account for multiple shortest paths?
C and D “share”
divide by the number of shortest paths
A
B
C
D
E
56
Betweenness examples
A
B
C
E
D
why do C and D each have betweenness 1?
They are both on shortest paths for pairs (A,E), and (B,E), and so must share credit:
½+½ = 1
57
Dolphin Network
Low degree but high betweenness
58
Betweenness distribution
59
Computation
Betweenness is kind of slow to compute
especially if you do it in a naive way
O(n3)
We have to count every path between every pair of nodes
best algorithm is O(n2)
Compare computing degree = O(n)
With very large networks
might not be able to compute this
might prefer measures of lower complexity
igraph has bounded betweenness
only consider paths of a certain length
reduces the computational cost greatly
60
Betweenness
A node with high betweenness is important IF
the purpose of the network is communication or the flow of things
and the person benefits from the flow
61
Closeness
What if it’s not so important to have many direct friends?
Or be “between” others
But one still wants to be in the “middle” of things, not too far from the center
62
Importance without brokerage
63
Closeness
Average distance to other nodes
This gives a strange result
smaller is more important
So to match other metrics
1 / (average distance)
Now the range is 0 to 1
close to zero means far from most nodes
64
Closeness examples
A
B
C
E
D
65
Dolphin Network
66
Disconnected graphs
What to do if there is no path between node A and node B?
igraph
Treats path as if it were longer than the longest possible path
|N|
We need to do this because a singleton node should not have high closeness
Should be lowest possible
igraph will give a warning if the graph is not (strongly) connected
67
Closeness
A node with high closeness is important IF
reaching others in a small number of steps is important
68
Break
69
ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
0
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
1
0
0
1
1
0
1
1
0
Y
X
Y
X
Y X
Y
X
Y
X
Y
X
Y X
Y
X
/docProps/thumbnail.jpeg