Recommender Systems
Social Network Analysis
Random Networks 2
Robin Burke
DePaul University
Chicago, IL
1
Characterizing the degree distribution
Mean field approximation
Another way to solve
Turn the discrete situation
into a continuous process
Apply calculus!
Mean field approximation
We want an equation for the expected degree of node i
write a differential equation
We know the rate of change
m/t
We know the starting condition
di(i) = m
ddi(t)/dt = m/t
Solution
di(t) = m + m log(t/i)
This should be familiar
m(1+log(t/i)
Arrive at the same answer
cumulative distribution of expected degree
Ft(d) = 1 – e-(d-m)/m
To get a probability density
take the derivative
e-(d-m)/m
What does this mean?
Compared to Poisson distribution
an exponential distribution is more skewed to the left
Also has a fatter tail
But the tail drops off (exponentially) at the high end
Also called
log-linear distribution
is linear in semi-log space
log(d) = -(d-m)/m
Power law network
One more model
Same growing network
But now edges are added to a node
in proportion to its existing degree
higher degree = greater chance of new edges
preferential attachment
In R
game_pa(50, m=3)
Model
Newborn nodes form m links to existing network
tm links total
total degree is 2tm
Probability of attaching to node i
di(t)/2tm
the percentage of the total degree taken up by node i
Mean Field Approximation
How do degrees change?
Initial condition
di(i)=m
Solution
Compare to growing random
Distribution of degree
Expected degree for node i born at m < i < t
di(t) = m(t/i)1/2
Nodes with degree less than d
m(t/i)1/2 < d
Solve for i
i > t m2/d2
Ft(d) = (t-tm2/d2)/t = 1-m2/d2
Density function
f(d) = 2 m2/d3
Power law
f(d) = 2 m2/d3
log(f(d)) = log(2 m2)-3log(d)
Linear in log-log space
Slope = 3
because of the 1/2t factor in our original equation
WWW degree distribution
Albert, Jeong, Barabasi 1999
Slope ≈1.8
Warning!
The eye is a very bad guide to fitting in log-log space.
Right side has many more observations than left
Using a linear model on the transformed data is also wrong
For the same reason
Fit_power_law
fit_power_law in igraph can be used to fit a power law distribution
and to determine the goodness of fit
Skewed degree distributions
Not found in random networks
Log-normal distribution
Found in networks where nodes age
Power law distribution
Found in networks with preferential attachment
Including many social networks
Why does this matter?
understanding the degree distribution
gives insight into possible mechanisms of formation
Social Network Analysis
CUG Tests
Robin Burke
DePaul University
Chicago, IL
17
Hypothesis testing on network data
Network data has two big problems for statistical hypothesis testing
Not normally distributed
power law effects
Quantities are not independent
the degree of node i may not be independent of the degree of node j
Many other node metrics
Need non-parametric methods
no assumptions about the distributions being sampled
generative / Monte Carlo approaches
generate a large number of graphs randomly
compare their properties to the known data
Null hypothesis
known statistic arises from chance
Example: Krackhardt data
assortativity of friendship network wrt tenure
are people friendly with those that were hired at the same time?
Slightly not
evidence of competition?
due to chance?
Example
Approach
Generate all possible networks with the same individuals but with the links randomly assigned
If all of these networks have similar assortativity value
then we can’t reject the null hypothesis
CUG Test
Conditional Uniform Graph test
The idea
Supply some “conditioning parameters”
n (number of nodes)
m (number of edges)
nam (dyad census)
null, assymetric, mutual
Generate random graphs with those characteristics
Test a value of each network
See how this compares with your data
igraph note
CUG testing is not implemented in igraph
I’ve written a utility that is similar to the SNA version
Only supports node and density (nodes + edges) conditioning
mycugtest
Returns an object compatible with cug.test routines
Also print.cug.test and plot.cug.test
Example
mycugtest(kfriend,
assortativity,
directed=TRUE, cmode=”edges”,
types1=V(kfriend)$tenure)
Look at the assortativity of 1000 random graphs with the same number of nodes and edges
wrt tenure at the company
network
function
function
args
conditioning type
Plot
plot.cug.test(kfriend.cug)
Printing
> print.cug.test(kfriend)
Univariate Conditional Uniform Graph Test
Conditioning Method: edges
Graph Type: directed
Diagonal Used: FALSE
Replications: 1000
Observed Value: -0.09456003
Pr(X>=Obs): 0.676
Pr(X<=Obs): 0.324
What % of the networks had a value to the left of observed? (or ==)
What % had a value to the right of observed? (or ==)
Conclusion
Tenure (years at the company) is not correlated with friendship ties
Managers do not seem to form friendships with people hired at the same time
This is what one would expect if the ties were randomly distributed
The assortativity by tenure of the Krackhardt friend network is not distinguishable from a random network by a CUG test. This means the friend network is essentially a random network.
https://www.polleverywhere.com/multiple_choice_polls/1VrHfFZSxlelxpu
28
Another CUG test: Transitivity
Pr(X>=Obs): 0.036
Pr(X<=Obs): 0.964
More transitive than almost all random networks
What about department?
assortativity_nominal
because department doesn’t have a scalar interpretation
dept = 4 is not closer to 3 than 1
CUG test: Department
So
Friend network is governed much more by the department where people work
Limitations
CUG testing is kind of weak
We know that social networks are not like random networks
not surprising if our networks would be significantly different
Other tests
ERGM
asks if a model of tie creation fits the data
kind of like regression
next week
QAP
asks if the properties of a network are inherent in its structure
or in the placement of nodes in it
Also next week
Example
ddi (t)
dt
=
mdi (t)
2tm
=
di (t)
2t
dd
i
(t)
dt
=
md
i
(t)
2tm
=
d
i
(t)
2t
di (t) =m
t
i
⎛
⎝
⎜
⎞
⎠
⎟
1
2
d
i
(t)=m
t
i
æ
è
ç
ö
ø
÷
1
2
/docProps/thumbnail.jpeg