Recommender Systems
Social Network Analysis
Density
Robin Burke
DePaul University
Chicago, IL
1
Outline
Project
Density
Local transitivity
Global transitivity
Assortativity
Break
Community detection
Example
2
Data milestone
You do not have to turn in your data
honor code
But you do have to gather the data you want to use
and report on it
Show HTML output from R
showing the loading of the network
Print a summary with the network details
In other words
A call to read_graph() followed by
A call to summary()
Network structure
Trying to understand the “shape” of a network
Is tightly-woven or loose?
Is it strung-out or compact?
A number of measures try to get at this idea
4
Density
Density is a measure of how many edges are in a graph
as a function how many there could be
# total edges / # possible edges
5
Possible edges
Undirected
n(n-1)/2 = nC2
You are choosing all possible pairs
without replacement
Directed
there are twice as many edges
A => B and B => A both count
n(n-1)
Think about the cells in the adjacency matrix
6
Density
n nodes, m edges
in igraph
edge_density()
7
Alternatively
Average degree = c = 2m / n
because each edge has two ends
the sum of all degrees must = 2m
This can be useful for proofs
8
Poll Title: What is the density of a network with 10 nodes and 15 edges?
https://www.polleverywhere.com/multiple_choice_polls/J5L1D6BIjUeGpGj
9
Sparse graph
Mathematical definition
sparse means that ρ → 0 as n →∞
dense means that ρ → k as n →∞
In other words
sparse means that new nodes don’t add enough new connections
to keep the density from always shrinking
10
Sparsity criterion
Suppose we add nodes to a network
In order for to stay the same
assume j is very large
ratio of mk/mk+j is > j2/k2
basically we have to add edges at a rate proportional to j2
Also =c/n-1
constant density means that average degree has to go up proportional to n
11
Network sparsity
Social networks in the modern world are rarely dense
maximum degree = Dunbar’s number = 150?
Also true for other types of networks
geography, physical limitations, diminishing returns ,etc.
Density of 5% is a lot
projections of bipartite networks will be higher
12
Sparsity vs clustering
Social networks are sparse
but they are “clumpy”
regions where lots of edges
Several measures to capture this
13
Local transitivity
Also called clustering coefficient
Calculate what fraction of a node’s neighbors are themselves connected
Alternatively
the density of the network around a given node
not including the node
More formally
Let Na = {nodes n such that a-n exists)
G(Na) = subgraph with only these nodes
(G(Na)) = local transitivity
14
Example
A has four neighbors
B, E, C, D
6 possible pairs
only 1 linked
1/6 = 0.1667
C has two neighbors
they are connected
1/1 = 1.0
15
In R
transitivity(gr, type=“local”)
How to handle nodes with no neighbors?
isolates=“zero” standard solution
otherwise, dividing by 0
16
Compare
random network vs social network
average local clustering
dolphin: 0.26
random: 0.11
17
Clustering distribution
Note: many zeros in random network
more than 50% of the nodes have no neighbors that know each other
in Dolphin network, only about ¼
One node in random network with 1.0
this is a node of degree 2 with these neighbors connected
doesn’t quite capture the “clumpy” notion
18
Sized by local transitivity
19
Local transitivity
Measures “local density” around a node
Node by node measure of local graph structure
Biased towards low degree nodes
20
Poll Title: If a node has degree 5 and local transitivity of 0.2, how many pairs of neighbors are connected?
https://www.polleverywhere.com/multiple_choice_polls/UnYZURomDZeXihq
21
Global Transitivity
Fraction of all 2-part triangles that are complete
More formally
Let T = {triples of nodes (a, b, c)}
Let Tp=T such that
a-b and b-c
Let Tt = subset of Tp such that
c-a edges also exist
transitivity = |Tt|/|Tp|
22
Global Transitivity
Enumerate all of the possible triples
ABC, ABD, ABE, ABF, ABG
ACB, ACD, ACE, ACF, ACG
etc,
There are 7P3 = 210 total
Of these only some have two sides
ACD
ABG
AEF
total = 13
How many closed = 3
ACD
CDA
DAC
Transitivity = 3/13 0.23
23
In igraph
transitivity(gr, type=“global”)
dolphin: 0.31
random: 0.09
24
Local vs global
Consider this network
add more and more triangles
Avg. local transitivity -> 1
Global transitivity -> 0
25
Transitivity / Clustering
Local transitivity / clustering coefficient
is good for identifying locally dense areas
gives a distribution
can be deceptive when degree is low
Global transitivity
“transitivity”
gives a single overall number
more useful for comparing networks
26
High transitivity
Social networks have high clustering and transitivity
compared to random networks of the same density
Transitive closure
people tend to introduce their friends
mutual friends may share attributes
27
Examples vs random
Transitivity
Prison friendships
.31 (MacRae 60) vs .0134
co‐authorships
.15 math (Grossman 02) vs .00002,
.09 biology (Newman 01) vs .00001,
.19 econ (Goyal et al 06) vs .00002,
www
.11 for web links (Adamic 99) vs .0002
28
Assortativity
Is there some vertex feature that predicts ties?
29
Homophily
General tendency in social networks
Similar individuals more likely to have ties
not that dissimilar individuals don’t connect
but probability is lower
“birds of a feather flock together”
Quantify this idea
for different aspects of similarity
30
Parable of the Polygons
http://ncase.me/polygons/
Assortativity
Suppose nodes have feature f or ~f
Suppose we have four nodes
a,b,c,d
a, b have feature f
c, d have feature ~f
If p(ab) > p(ac) or p(ad)
and p(cd) > p(ca) or p(cb)
we would say that the network is assortative
with respect to feature f
32
Assortativity
What is the baseline?
If there is only one node with feature f
then no possibility of edges between f nodes
Have to take into account
the distribution of the feature
the number of edges
33
Modularity
Standard measure
also used for community detection
Nodes with the feature of interest
form a group or class
The idea
probability of in-group edges
higher than probability that you would get if the edges were random
but the network had the same degree distribution
Range (-1, 1)
Negative value means “anti-assortative”
shared feature lowers chance of connection
34
In-group edges
Notation
ci = class of vertex i
define δ(ci,cj) = 1 if the classes are the same
Kronecker delta function
Number of in-group edges
Adjacency matrix formulation
35
Random edges
How would in-group edges change if edges were random?
keep the node degrees the same
Vertex i has degree ki
There are 2m edge ends
Given an i,j pair
what is the probability that there is an edge
in a randomly rewired network
Probability of an edge end attached to i
ki/2m
kj edge ends attached to j
kj chances to make this happen
Probability of an i,j edge
36
Expected in-group edges
Expected number of edges between nodes of the same class
To get an edge probability
divide by m = number of edges
37
Modularity
The difference between actual # of in-group edges
and the random expectation
Alternatively
define modularity matrix B
38
Multi-class
This idea can be extended to multiple classes
slightly more complex calculation
Same basic idea
assortativity_nominal()
39
Example
Assortativity of the dolphin network by sex
assortativity_nominal(dolphin,
V(dolphin)$Sex)
0.32
definitely assortative
40
Poll Title: This network is assortative with respect to gender.
https://www.polleverywhere.com/multiple_choice_polls/nSZnoDKKNm88AFD
41
Assortativity by scalars
Scalar values exist on a scale
We can talk about assortativity along a scale
the farther apart on the scale
the less likely to have an edge
Solution: compute the covariance of the scalar value
over all edges
note this can be complex with directed networks
different values for in and out edges
42
Result
Very similar to the nominal calculation
instead of the delta function
use the actual feature values xi, xj
In R, assortativity
43
Degree Assortativity
Sometimes researchers just call this “assortativity”
Question
do high degree individuals associate?
or disassociate?
assortativity.degree in igraph
44
Poll Title: What will happen if you use assortativity instead of assortativity.nominal on a categorical attribute?
https://www.polleverywhere.com/multiple_choice_polls/89rypR79Oa8Hpwi
45
Assortativity
Lets you measure how a vertex feature predicts edge formation
Assortative network is one where similars attract
Q is positive
Disassortative network is where opposites attract
Q is negative
46
Example
Undirected (mutual) version of Krackhardt friend graph
on dept (nominal)
0.284
on age (scalar)
-0.038
on tenure (scalar)
-0.302
on degree (scalar)
-0.178
47
Break
48
)
1
(
2
#
#
–
=
=
n
n
m
possible
edges
r
1
)
1
(
2
–
=
–
=
n
c
n
n
m
r
)
1
(
2
–
=
k
k
m
k
k
r
)
1
)(
(
2
–
+
+
=
+
+
j
k
j
k
m
j
k
j
k
r
δ(ci,cj )
∈G
∑
d(c
i
,c
j
)
ÎG
å
1
2
Aijδ(ci,cj )
i, j
∑
1
2
A
ij
d(c
i
,c
j
)
i,j
å
kik j
2m
k
i
k
j
2m
1
2
kik j
2m
δ(ci,cj )
i, j
∑
1
2
k
i
k
j
2m
d(c
i
,c
j
)
i,j
å
1
2m
kikj
2m
δ(ci,cj )
i, j
∑
1
2m
k
i
k
j
2m
d(c
i
,c
j
)
i,j
å
Q =
1
2m
Ai, j −
kik j
2m
⎡
⎣
⎢
⎤
⎦
⎥
i, j
∑ ⋅δ(ci,cj )
Q=
1
2m
A
i,j
–
k
i
k
j
2m
é
ë
ê
ù
û
ú
i,j
å
×d(c
i
,c
j
)
Bi, j = Ai, j −
kik j
2m
⎡
⎣
⎢
⎤
⎦
⎥
B
i,j
=A
i,j
–
k
i
k
j
2m
é
ë
ê
ù
û
ú
Q =
1
2m
Bi, jδ(ci,cj )
i, j
∑
Q=
1
2m
B
i,j
d(c
i
,c
j
)
i,j
å
Q =
1
2m
Ai, j −
kik j
2m
⎡
⎣
⎢
⎤
⎦
⎥
i, j
∑ ⋅ xix j
Q=
1
2m
A
i,j
–
k
i
k
j
2m
é
ë
ê
ù
û
ú
i,j
å
×x
i
x
j