Recommender Systems
Social Network Analysis
Random Networks
Robin Burke
DePaul University
Chicago, IL
Thanks to Professor Matthew Jackson for his notes on this topic.
1
Announcements
Due tonight
Project data
Next week
First visualization
First visualization
A network visualization of your project data in Gephi
Could be a subset if the data is large
Aim
Convey node importance
Choose appropriate layout and mappings
Two Powerpoint slides
visualization
one paragraph description
Submit
to D2L
Cool result
Online dating reduces homophily
Ortega and Hergovich, 2017
Random networks
Why do we study random networks?
the same reason that statisticians study random variables
If we want to know if something is not happening by chance
“is significant”
we have to know what chance looks like
Previous examples
transitivity
“what is the transitivity of a random network of the same size?”
reciprocity
“is a social network more reciprocal than a random network?”
assortativity
“what if the edges were random?”
What is a random network?
Static model
collection of nodes
edges randomly placed
Dynamic model
nodes arrive
linked to existing network with random connections
etc.
Erdos-Renyi Random Network
Specify n, p
Start with n nodes
Go through all possible pairs of nodes
n(n-1)/2 for undirected network
Create edges with probability p
In R
erdos.renyi.game(n, p, mode=“gnp”)
also erdos.renyi.game(n, m, mode=“gnm”)
place m edges randomly
ER Random Network
Simplest network specification
only n and p
Formed the basis for mathematical study of networks
for most of the 20th century
Examples
ER n=50, p=0.02
ER n=50, p=0.05
ER n=50, p=0.1
Characteristics of random networks
Sparse networks
somewhat stringy
Dense networks
disordered looking
Giant component “tipping point”
as p increases
probability of all nodes being one component → 1
tipping point: p ≈ 1/n
Properties of social networks
High clustering / transitivity
transitive closure
not in random networks
Skewed degree distributions
A few highly connected individuals
not in random networks
High closeness
“six degrees of separation”
Are social networks “special” in this?
Small world hypothesis
People in social networks
have much higher closeness than you might expect
We don’t expect “six degrees of separation”
but we see it
is this a significant effect?
Small world hypothesis
Restatement
The average path length in a random network is large
O(n), maybe
As opposed to O(log(n))
If this is true
then the shorter path lengths in social networks are “interesting”
Some preliminaries
The network can’t be too dense
then it is easy to achieve low average path length
p = 1.0?
The network can’t be too sparse
no connected component
We will talk about large networks
Conditions
d(n) >= (1+ε)log(n) for some ε>0
This makes the network dense enough for connectivity
d(n)/n → 0
This makes the network sparse enough
Sequences of networks
Theorem
If the conditions are met
d(n) >= (1+ε)log(n) for some ε>0
d(n)/n → 0
Then
for large n, average path length is proportional to log(n) / log(d)
Restatement
If the conditions are met
d(n) >= (1+ε)log(n) for some ε>0
d(n)/n → 0
Then
as n → ∞
Simpler than a random graph
A regular structure
Each node has degree d
except leaves
Cayley tree
Path lengths
1 step: d nodes
2 step: d(d-1)
3 step: d(d-1)2
…
k steps: roughly dk
Diameter
Diameter is 2k
How many nodes?
d+d(d-1)+… d(d-1)k-1
d((d-1)k-1)/(d-2)
roughly (d-1)k
(d-1)k=n
k on the order of log(n)/log(d)
Cayley Tree:
Diameter
Diameter is 2k
How many nodes?
d+d(d-1)+… d(d-1)k-1
d((d-1)k-1)/(d-2)
roughly (d-1)k
(d-1)k=n
k on the order of log(n)/log(d)
But random graph?
degree of nodes not identical
BUT we can show that
the fraction of nodes with nearly average degree
approaches 1
This is the reason for the bound
E[d] > (1+ ε)log(n)
Also have to deal with edges pointing “backwards”
We can show
Probability that a node has degree close to average
(Jackson’s book has details)
Pr(d/3 ≤ di ≤ 3d) ≥ 1 – e-d
For nodes of all degrees
Pr(d/3 ≤ all degrees ≤ 3d) ≥ (1 – e-d)n
Use our substitution
Pr(d/3 ≤ all degs ≤ 3d) ≥ (1 – 1/n1+ ε)n
exp(-n-ε) → 1
From Chebyshev inequality
n is large so 1/n ε is small.
So
If d(n) > (1+ε)log(n) then
Pr(d/3 ≤ all degrees ≤ 3d) → 1
A corollary
log(n)/log(3d) < k < log(n)/log(d/3)
k is on the order of log(n)/log(d)
What about doubling back?
Remember we are expanding outward
factor of d more nodes each step
After k steps
dk nodes reached
n-dk nodes unreached
If k < log(n)/log(d) then
n-dk much bigger than dk
most nodes are not reached until the last step
think about n=106, d=10
Diameter vs ave. distance
Since most of the nodes are at maximum distance
average distance is the same order as diameter
(again, for large n)
Small world hypothesis
Even in a random network, we should expect
a small diameter
a small average path length
These properties are not special to social networks
properties of networks that are sparse, but not too sparse
If a random graph does not show the small world property, meaning that diameter grows as a function of N rather than log(N), it means
https://www.polleverywhere.com/multiple_choice_polls/7mTvyamY8otPlSx
29
ER graph
Model
n nodes
edges added with probability p
What does this mean for the degree distribution?
we’ve seen that it has a different shape than many social netw
Degree distribution
Binomial distribution
# of heads when we flip a coin k times
with probability of heads = p
k = n-1
other nodes for any given node
# of trials to add an edge
d = degree
Poisson distribution
For large n and small p
can be approximated with the Poisson distribution
ER random graphs
sometimes called “Poisson random graphs”
ER (50, 0.02)
ER (50, 0.05)
ER (50, 0.1)
Prediction
Networks will have a degree distribution centered around the average
few low degree nodes
few high degree nodes
Doesn’t match
WWW degree distribution
Albert, Jeong, Barabasi 1999
Not Poisson
This finding is repeated in many other networks
bibliographic citation
email networks
online social networks
We need another model
Growing random network
Nodes arrive over time
Natural heterogeneity
older nodes have more edges
We can parameterize
Simple model
One node is “born” at a time
Forms m links to existing nodes
with equal probability
Like ER
but we’re not dealing with all the nodes at once
Start with an m node clique
(the math is easier!)
In R
sample_growing(n, m)
Growing (50, 1)
Growing (50, 2)
Degree distribution
At time 0
first nodes n0 each have degree m-1
At time 1
node n1 arrives makes m connections
n0 have degree m
At time 2
node n2 arrives degree m
each other node has a m/m+1 chance of getting another edge
At time k
node k arrives degree m
each other node has a m/(m+k) chance of getting an edge
Probabilities vary over time
Degree distribution
Expected degree for node i born at
m < i < t is
m + m/(i+1) + m/(i+2) + .. + m/t
Approximately
m(1+log(t/i)) <- Harmonic numbers
For any d
nodes that have degree less than d at time t
m(1+log(t/i)) < d
Degree distribution by time
Degree time 100
How many with degree < 35
20(1+log(100)/i) < 35
i > 100 e-(35-20)/20=47.2
Distribution
Nodes with expected degree < d at t
Must be born recently
i > t e-(d-m)/m
We want the degree distribution at time t
how many nodes have i > t e-(d-m)/m
t – t e-(d-m)/m
divide by t to get a fraction
Ft(d) = 1 – e-(d-m)/m
Distribution of expected degree
Not the same as the actual distribution
Need to argue that for large t
we approximate the smooth curve
within some bounds
See Jackson’s book
A growing random network has a skewed degree distribution because
https://www.polleverywhere.com/multiple_choice_polls/FKs0glFUgrBVObB
50
Break
AveDist(n)∝ log(n)
log(d(n))
→1
AveDist(n)µ
log(n)
log(d(n))
®1
n
k
⎛
⎝
⎜
⎞
⎠
⎟ pk (1− p)n−k
n
k
æ
è
ç
ö
ø
÷
p
k
(1-p)
n-k
(n−1)!
d!(n− d −1)!
pd (1− p)n−d−1
(n-1)!
d!(n-d-1)!
p
d
(1-p)
n-d-1
(n−1)d
d!
pde−(n−1)p
(n-1)
d
d!
p
d
e
-(n-1)p
/docProps/thumbnail.jpeg