Mining Social Networks
COMPSCI 753 Kaiqi Zhao
Overview
§ Social network analysis
§ Social network introduction § Community detection
§ Influence maximization
1
Social networks
[McAuley, Leskovec: Discovering social circles in ego networks, 2012]
2
Terminologies
§ Given an undirected graph 𝐺(𝑉, 𝐸), define the following terms:
§ Neighbor 𝑁(𝑣) : the set of nodes connect to 𝑣 – 𝑁(𝑣) = {𝑢|(𝑢,𝑣) ∈ 𝐸}
§ Degree 𝑑! of a node 𝑣: the number of neighbors of 𝑣 § Distance 𝑑𝑖𝑠𝑡(𝑠, 𝑡): the shortest path length from 𝑠 to 𝑡
§ Eccentricity of a node 𝑣 ∈ 𝑉: 𝑒𝑐𝑐 𝑣 = max 𝑑𝑖𝑠𝑡(𝑣, 𝑢) “∈$
§ Radius 𝑟: the minimum eccentricity, min 𝑒𝑐𝑐(𝑣) !∈$
§ Diameter 𝑑: the maximum eccentricity, max 𝑒𝑐𝑐(𝑣) !∈$
3
Observation 1 – Small diameter
§ Real-world networks tend to have small diameter
Data Source: https://snap.stanford.edu/data/ 4
Observation 2 – Power law degree distribution
§ Minority of the nodes have large degree § Most have few neighbors
Core-Periphery Structure: A social network has a densely connected core.
Intuition: The rich gets richer, winners take all, small advantages are magnified to a critical mass, new ideas that get attention become “viral”.
Degree distribution on Flickr
5
Strong ties and weak ties
§ Strong ties: Close and frequent social contact
§ Weak ties: More casual and distinct social contact
Triadic Closure: The strength of a tie can be indicated by the number of common neighbors of the two ends of the tie.
All friends Close friends
6
Clustering coefficient
§ Measure the overall strength of the ties of a node § The clustering coefficient CC(u) of node 𝑢 is:
𝐶𝐶(𝑢) =#𝑜𝑓𝑒𝑑𝑔𝑒𝑠𝑏𝑒𝑡𝑤𝑒𝑒𝑛𝑡h𝑒𝑛𝑒𝑖𝑔h𝑏𝑜𝑟𝑠𝑜𝑓𝑢 =| 𝑣,𝑤 ∈ 𝐸 :𝑣,𝑤 ∈ 𝑁(𝑢)|
# 𝑜𝑓 𝑛𝑜𝑑𝑒 𝑝𝑎𝑖𝑟𝑠 𝑖𝑛 𝑡h𝑒 𝑛𝑒𝑖𝑔h𝑏𝑜𝑟𝑠 𝑜𝑓 𝑢 |𝑁 𝑢 | 2
where 𝑁 𝑢 – Neighbors of 𝑢
Intuition: How likely the friends of 𝑢 are also friends!
7
Example
8
Triangle enumeration problem
§ The computation of the clustering coefficient boils down to triangle enumeration:
§ # of edges between the neighbors of 𝑣 = # of ∆ on 𝑣 §#ofcommonfriendsof(𝑢,𝑣) =#of∆on(𝑢,𝑣)
§ Triangle Enumeration Problem
§ Given a graph 𝐺(𝑉, 𝐸) with 𝑛 nodes and 𝑚 edges, a triangle is a set of
three concatenated edges (𝑎, 𝑏), (𝑏, 𝑐), (𝑐, 𝑎).
§ Report each triangle in the graph exactly once.
9
Bruteforce Approaches
§ Let the graph have 𝑛 nodes and 𝑚 edges. § 𝑛 ≤ 𝑚 ≤ 𝑛>.
§ Approach 1: Combination — Choose 3 nodes from 𝑛 nodes and see if there are edges connecting the three nodes.
§ Complexity: 𝑂(𝑛?).
§ Approach 2: Consider any edge (𝑣, 𝑤) ∈ 𝐸 and any node 𝑢 and see
if(𝑢,𝑤)∈𝐸and 𝑢,𝑣 ∈𝐸. § Complexity: 𝑂(𝑚𝑛)
10
Heavy Hitters
§ Heavy hitter – a node with degree at least 𝑚. §Therecanbenomorethan2 𝑚heavyhitters.
§ A heavy-hitter triangle – The three nodes are all heavy hitters.
§ Identifying heavy hitters
§ Look at all edges and count the degrees of all nodes § Complexity — 𝑂(𝑚)
11
Triangle enumeration
§ Find all the heavy hitters
§ Enumerate heavy hitter triangles
§ For each edge 𝑒(𝑢, 𝑣) ∈ 𝐸
§ If any end of the edge, e.g., 𝑢, is not heavy hitter
Complexity
𝑂(𝑚) 𝑂(𝑚 𝑚)
𝑂(𝑚 𝑚)
Overall: 𝑂(𝑚!.#)
§ For each neigbhor 𝑤 of 𝑢, i.e., 𝑤 ∈ 𝑁(𝑢)
§ if 𝑤 ∈ 𝑁(𝑣), return (𝑢, 𝑤, 𝑣) as a triangle
𝑂( 𝑚)
12
Triangle enumeration
Optimality — For any graph with 𝑛 nodes and 𝑚 edges, no algorithm can do significantly better than 𝑂(𝑚B.D)
Note: 𝑂(𝑚!.#) can never be worse than the two obvious algorithms with: 𝑂(𝑛$) and 𝑂(𝑚𝑛).
13