程序代写代做代考 Excel algorithm Recommender Systems

Recommender Systems

Social Network Analysis
Centrality I
Robin Burke
DePaul University
Chicago, IL

1

Project
Getting started on the project
Form groups (ASAP)
Proposal (due in 2 weeks)

2

Project Proposal example
Name(s): Robin Burke
Brief description of network / data set: Bi-partite affiliation network of professors at DePaul University and the University-level committees on which they have served from 2001-2011.
Data source: Does not exist, but it would be cool if it did
Network size (if known) nodes / edges: 357 faculty / 87 committees, 8,342 edges
 
Edge relation(s): Faculty member -> committee served
Vertex attributes: Professors: ID (anonymized), age, sex, ethnicity, race, college as of 2011, years at DePaul (integer 0-35), rank,
Committees: ID, name, standing, elected, size, date of creation (2001-1-1 to 2011-12-31), date of dissolution (2001-1-1 to 2011-12-31)
 
Edge attributes: Date of start of committee service (2001-1-1 to 2011-12-31)
 
Analysis question(s): Who are the influential faculty members with respect to university committees? How are these individuals distributed by college?
Is there a correlation between academic rank and committee service?
How are committee assignments distributed by college and by demographic factors?
Are there clusters of individual who co-serve on the same committees?
What committees are linked by co-membership? Does this association change over time?

Notes: End of service data was not available is most cases. I am still working to calculate this from other sources.
 

3

Ideas
Links to sources network data sets on D2L
Examples
Included on D2L are two projects from previous quarters
To give you some ideas
To show what an excellent project looks like

4

Outline
Terminology
Paths
Components
Linear Algebra review
Centrality
Degree
Betweenness
Closeness
Break
Eigenvector and related measures
Eigenvector centrality
PageRank
Example

5

Movement on a graph
On a graph you can move from node to node
via edges
Might be actual movement
messages in a communication network
Or it might be more implicit
social influence
A
B
C
D
E

6

Walk
Sequence of edges
such that the node ending each step is the beginning of the next
B-A-C-A is a walk…
Length
# of edges
A
B
C
D
E

7

Random walk
A random walk is a sequence of edges
chosen randomly at each vertex
Turns out to be a very important construct
more later tonight

8

Path
A walk where no vertex appears more than once
B-A-C-A is not a path
B-A-C-D is a path
A
B
C
D
E

9

Cycle
A walk where the beginning and endings nodes are repeated
but no others
B-A-B is a cycle
B-A-C-A is not a cycle
B-A-C-D-B is a cycle

A
B
C
D
E

10

Cyclic / acyclic graphs
Distinction for directed graphs
Cyclic graph is one with cycles
Acyclic has no cycles
A
B
C
D
E

11

Shortest path / geodesic
There may be multiple paths from one node to another
B-A
B-A (length 1)
B-D-C-A (length 3)
B-D-C-D-C-A
not a path
Shortest path aka
geodesic

A
B
C
D
E

12

No path
It is possible that there is no path between two nodes
D-E no path
length = 
A
B
C
D
E

13

Diameter
length of the longest geodesic
maximum distance inside the network
Disconnected graph?
technically should be ∞ (or undefined)
sometimes it is the maximum diameter of any component

14

Intuition
Shortest path from A to other nodes
A-C is the diameter
longest of these shortest paths

A
C

B

15

What is the diameter of this network?
https://www.polleverywhere.com/multiple_choice_polls/WXs8i7umaRBv5XB
16

Connected
A network is connected
if there are is a path between all the nodes in a network

A
B
C
D
E
A
B
C
D
E

17

Component
A component is a connected group of maximal size
If the network is connected
there is one component
Otherwise there are multiple components
A
B
C
D
E
A
B
C
D
E

18

Giant component
Real-world networks often have a “giant component”
which is 80-90% of the network all in one component

19

Bow-tie structure of the web

20

Directed network
Here there is no path from A to E
but there is one from E to A
We have a new term
“strongly-connected”

A
B
C
D
E

21

Strongly-connected
A directed network is strongly-connected
if there is a path from every node to every other
Note that this requires
cyclic graph

A
B
C
D
E

22

Weakly-connected
A directed network is weakly-connected
if it becomes connected when all edges are made undirected
A
B
C
D
E

23

Strongly-connected components
A weakly-connected network
is made up of strongly-connected components
chunks of the network that are strongly connected
A
B
C
D
E

24

If a graph is directed and acyclic, what is the longest path it can have?
https://www.polleverywhere.com/multiple_choice_polls/hDcJNEUhuC2CACR
25

In igraph
decompose()
Returns the components of a graph
List format
Remember [[1]] type indexing
Can specify strong / weak

26

Linear Algebra Review

27

Adjacency Matrix

A
B
C
D
E

28

Linear Algebra Review
An adjacency matrix is a square matrix
usually binary valued (0, 1)
Results from linear algebra very relevant
for working with networks

29

Matrix multiplication
Equals graph traversal

A
B
C
D
E

30

Start at A
[1 0 0 0 0] M = [0 1 1 0 1]
all 1 step paths from A
[0 1 1 0 1] M = [3 0 0 2 0]
all 2 step paths from A
[3 0 0 2 0] M = [0 5 5 0 3]
all 3 step paths from A
Note
A to A – no three step paths
means there are no triangles starting at A

31

Transformations
Transformation by the adjacency matrix
represents paths in the network
Mk = paths of length k
What about M*?
arbitrary k

32

Special case
Periodicity
if the network is periodic
no interesting answer to this question

A
C
D

33

Otherwise
M* will represent the strongest “tendency” of the matrix
That might be an interesting thing to know!

34

Matrices as transformations
Let M be a nxn square matrix
like an adjacency matrix
Let v be a n dimensional vector
Mv=w
where w is another n dimensional vector
(Yes, I switched sides for the multiplication. This is equivalent to using the transpose of the matrix.)
Or defining “from” as the columns and “to” as the rows

35

Eigenvectors
if Mv=λv
then v = eigenvector of M
λ = eigenvalue of M

36

Eigenvalues
An nxn matrix has at most n eigenvalues
det(A- λI)=0
given an n-degree polynomial that defines the eigenvalues
Sometimes there are multiple eigenvectors with the same λ
“defective” matrices
Symmetric matrix
has real eigenvalues
non-symmetric matrices might have complex eigenvalues

37

Ordering eigenvalues
We can sort the eigenvectors by eigenvalue
talk about the 1st eigenvector of a matrix, etc.

38

Decomposition
Assume that M has n distinct eigenvalues
true in most practical graph applications
Let Q be a square matrix of the eigenvectors
MQ=QΛ
where Λ is the matrix with the corresponding eigenvalues on the diagonal
Means
MQQ-1=QΛQ-1
M =QΛQ-1
So the matrix M can be expressed in terms of just its eigenvalues and eigenvectors

39

Powers of M
Mk = (QΛQ-1)k
= Q ΛkQ-1
Meaning that the largest eigenvalue will dominate
as k -> infinity

40

Steady state tendency
Leading eigenvector
corresponding to largest eigenvalue
Represents
where paths through the network
will tend to stay as they get longer and longer
Interesting way of ranking nodes
which are more or less probable?
We will come back to this insight!

41

Centrality

42

The Question
What are the most important nodes in a network?
“Important” is vague
will vary
by network type
by analysis task

43

Dolphin Network

44

The Answer
Basic idea
apply a node-level metric
rank nodes
What metrics to apply?

45

Three basic types of centrality
Different measures capture different ideas of what makes a node important
Here X > Y by different measures

degree
betweenness
closeness

46

Degree
We have talked about this before
# of incident edges
High degree makes a node important IF
immediate connections are important

47

Example: Dolphin network

48

Degree distribution

49

Degree centrality
Only counts immediate neighbors
Each neighbor equally valuable
Many variants
weighted
in-degree
out-degree
more exotic extensions

50

Brokerage
Not captured by degree

51

Brokerage
The extent to which one individual is an intermediary between different groups
who otherwise lack direct connections

52

Constraint / Structural holes

My dog ate it.
No homework today!

53

Intuition: how many pairs of individuals would have to go through you
in order to reach one another in the minimum number of hops?

Capturing brokerage

54

Betweenness
10 Pairs
A-B, A-C, A-D, A-E, B-C, B-D, B-E, C-D, C-E, D-E
How many involve C? (without ending at C)
How many involve D?
How many involve E?

A
B
C
D
E

Answer: 4
Answer: 3
Answer: 0
55

But
How to account for multiple shortest paths?
C and D “share”
divide by the number of shortest paths

A
B
C
D
E

56

Betweenness examples
A
B
C
E
D
why do C and D each have betweenness 1?
They are both on shortest paths for pairs (A,E), and (B,E), and so must share credit:
½+½ = 1

57

Dolphin Network

Low degree but high betweenness

58

Betweenness distribution

59

Computation
Betweenness is kind of slow to compute
especially if you do it in a naive way
O(n3)
We have to count every path between every pair of nodes
best algorithm is O(n2)
Compare computing degree = O(n)
With very large networks
might not be able to compute this
might prefer measures of lower complexity
igraph has bounded betweenness
only consider paths of a certain length
reduces the computational cost greatly

60

Betweenness
A node with high betweenness is important IF
the purpose of the network is communication or the flow of things
and the person benefits from the flow

61

Closeness
What if it’s not so important to have many direct friends?
Or be “between” others
But one still wants to be in the “middle” of things, not too far from the center

62

Importance without brokerage

63

Closeness
Average distance to other nodes
This gives a strange result
smaller is more important
So to match other metrics
1 / (average distance)
Now the range is 0 to 1
close to zero means far from most nodes

64

Closeness examples

A
B
C
E
D

65

Dolphin Network

66

Disconnected graphs
What to do if there is no path between node A and node B?
igraph
Treats path as if it were longer than the longest possible path
|N|
We need to do this because a singleton node should not have high closeness
Should be lowest possible
igraph will give a warning if the graph is not (strongly) connected

67

Closeness
A node with high closeness is important IF
reaching others in a small number of steps is important

68

Break

69

ú
ú
ú
ú
ú
ú
û
ù
ê
ê
ê
ê
ê
ê
ë
é
0
0
0
0
1
0
0
1
1
0
0
1
0
0
1
0
1
0
0
1
1
0
1
1
0

Y

X

Y
X

Y X

Y
X

Y

X

Y
X

Y X

Y
X

/docProps/thumbnail.jpeg