—
title: “Lab 2 Solution”
author: “Robin Burke”
date: “Thursday, Oct 26, 2017”
output: html_document
—
“`{r setup, include=FALSE}
library(knitr)
#path <- "C:\\Users\\rburke4\\Box Sync\\2017_cFall\\csc495Bx\\lab\\lab2-soln"
path <- "/Users/robinburke/Box Sync/2018_bSpring/csc495Bx/lab/lab2-soln"
setwd(path)
read_chunk("lab2-soln.R")
knitr::opts_chunk$set(echo = TRUE)
```
## Exponential Random Graph Modeling
In this lab, we work with the romantic relationship data
compiled from the TV show "Grey's Anatomy" to answer questions
about the different levels of sexual activity among men and
women on the show.
Thanks to Benjamin Lind for the data and examples.
### Loading libraries
```{r C1, results="hide", warning=FALSE, message=FALSE}
```
### Setting path and loading utilities and the network
```{r C2}
```
### Showing the network
```{r C3}
```
## Convert the graph to a ```network``` object
Use the ```asNetwork``` function from the intergraph library.
```{r C4}
```
## Model 1: Edges and sex
Construct an ERGM model with the edges term and assortativity
by sex. Note that the network is almost exclusively heterosexual
and there are no male-male edges at all, so the "base" has to
be either (1,3) or (2,3). I'm choosing (1,3) so that the
parameter will be the count of heterosexual edges. This is going
to be highly correlated with the total # of edges, but we have
to have the term in there, otherwise the model will make edges
indiscriminately.
```{r C5}
```
Plot a simulated version of the network
```{r C6}
```
### Goodness of fit
```{r C7}
```
### Question 8
What is the _conditional_ probability of a
heterosexual edge?
+1 edges count and +1 F.M edges count
```{r C8}
```
About 9%.
## Model 2: Add monogamy term
We see in the goodness of fit data that the model does not
match the network in terms of the number of degree 1 nodes.
So, we add a term to constraint the model to account for these
nodes. Note that base=-2 is the same as base=c(1,3).
```{r C9}
```
### Diagnostics
This fit requires MCMC, so we look at the diagnostics
```{r C10}
```
Not enough mixing of the model. Geweke statistics are poor.
### GOF
```{r C11}
```
Degree distribution is much better.
## Model 3: Model 2 with more computation
Try to improve the fit by
adding more burn-in time and longer interval between samples.
```
# Not run
ga.m3 <- ergm(ga.net~edges+nodemix("sex", base=-2)+degree(1),
control=control.ergm(MCMC.burnin=50000,
MCMC.interval=5000))
setwd("/Users/rdburke/Dropbox/2015_cFall/csc495/labs/lab2-soln")
save(ga.m3, file="lab2-m3.Rdata")
```
Load saved model fit
```{r C12load}
```
### Diagnostics.
Note that the Geweke statistics are better but not
great. Probably we should increase the burn-in even more, but
we won't, in interest of time.
```{r C13}
```
### GOF
```{r C14}
```
## Model 4: Adding age difference
This model is good as far as it goes, but we note that there
is often a tendency to homophily in terms of age in romantic
partners. So, we will try to capture that with an absdiff term.
This creates a statistic for the network which is the sum of
all the absolute differences in ages in all edge pairs. We
already know that longer computation time is required, so
we just keep the ```control``` parameters the same.
```
# Not run
ga.m4 <- ergm(ga.net~edges+nodemix("sex", base=c(0,1,3)) +
degree(1) + absdiff("birthyear"),
control=control.ergm(MCMC.burnin=50000,
MCMC.interval=5000))
setwd("/Users/rdburke/Dropbox/2015_cFall/csc495/labs/lab2-soln")
save(ga.m4, file="lab2-m4.Rdata")
```
```{r C15load}
```
### Diagnostics
```{r C16}
```
Note that burn-in diagnostics for birthyear are not that great and that drags the whole thing down. More computation would help.
### Goodness of fit
```{r C17}
```
## Model 5: Differentiating monogamy by sex
AIC went down with the addition of age difference. So, our
model seems pretty good. Let's now try to get at our original
question: how to men and women differ in terms of the number
of partners? We will break up the degree 1 term by sex. I'm
also increasing the sample size to try to get better significance
in the results.
```
# Not run
ga.m5 <- ergm(ga.net~edges+nodemix("sex", base=-2) +
degree(1, "sex")+ absdiff("birthyear"),
control=control.ergm(MCMC.burnin=100000, MCMC.interval=5000,
MCMC.samplesize=2048))
setwd("/Users/rdburke/Dropbox/2015_cFall/csc495/labs/lab2-soln")
save(ga.m5, file="lab2-m5.Rdata")
```
```{r C18load}
```
### Diagnostics
```{r C19}
```
### Goodness of fit
```{r C20}
```
Simulated version of the M5 network
```{r C21}
```
This model isn't perfect. It might make sense to add additional
fitting time. We could explore alternatives to the birthyear
term or adding in race, which is a feature available in this
data.
For the purposes of this lab, we'll stop here and see what
we've learned.
## Analyzing monogamy by sex
We want to look at the probabilities of different kinds of
edges to understand how the network is organized by sex.
### Model coefficients
```{r C22}
```
### Build data frame
For each male character, there are three possibilities:
Row 1 (M0): Starts singleton, ends up with 1 edge. In this
case, the change score for parameter deg1.sexM is +1 because we
get an additional degree 1 male node that we weren't counting
before.
Row 2 (M1): Starts with 1 edge, ends up with 2 edges. The
change score for deg1.sexM is -1 because a degree 1 node is
removed from the count.
Row 3 (M2+): Starts with 2 or more edges, ends with 3 or more edges. The change score for deg1.sexM is zero.
Similar for women. So there are nine possible cases for edge formation.
```{r C23}
```
### Compute conditional probability and sort
```{r C24}
```
### Analysis:
Note this analysis ignores the temporal dimension of the show since the relationships are not all existing at the same time. We do not have data about when relationships begin and end. So, the creation of an M1/F1 edge does not necessarily mean that the partners are in existing relationships when they get involved; the prior relationship could be existing or could be not.
Unattached partners getting together are the most common.
In this network, women with one edge are more likely to connect to otherwise single men
than the other way around (M0/F1 > M1/F0). Also, women with multiple edges are more likely to connect to single men than the other way around (M0/F2+ > M2+/F0).
However, M2+/F1 > M1/F2+ indicating that men who already have mutiple partners are better at attracting partnered women.
We could summarize this as saying that, in this network, partnered women are more likely to have flings with unattached partners, and that partnered men are more likely to go for women who already have partners. (Again with the temporal issue noted.)