---
title: "Homework 2 (CSC 495)"
author: "sheng huo"
date: "04/08/2018"
output: html_document
---
## Working with DBpedia data
### Bipartite networks and ego networks
```{r}
library(knitr)
setwd("/Users/vagrant/Documents/tasks-2018/wechat-fanofhollowings/R-2700-4-19/R-4-19/hwk2")
read_chunk("hwk2.R")
knitr::opts_chunk$set(echo = TRUE)
```
### Step 1: Load libraries
```{r C1, results="hide", warning=FALSE, message=FALSE}
```
### Step 2: Load the data
Load a bipartite network built from the DBpedia co-starring data.
For the purposes of this assignment, the data was gathered starting
from the actor Johnny Depp.
The data is in edgelist and attribute form, so we have to
shape it into a network first. The two files are:

- `depp_edges.csv`
- `depp_nodes.csv`
```{r, C2}
```
### Step 3: Convert to a graph
Use `graph.data.frame` to convert the data to graph form.
```{r, C3}
```
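A minimal sketch of Steps 2 and 3, assuming small hypothetical stand-ins for the two CSV files (the node names and the column names `actor`, `movie`, `decade` and `budget` are illustrative; the real files may differ). It uses `graph_from_data_frame()`, the current igraph name for `graph.data.frame()`: the first column of the vertex table supplies the names used in the edge table, and the remaining columns become vertex attributes.

```r
library(igraph)

# Hypothetical stand-ins for depp_edges.csv and depp_nodes.csv
# (in the assignment these would come from read.csv()).
edges <- data.frame(
  actor = c("Actor1", "Actor1", "Actor2", "Actor2", "Actor3"),
  movie = c("MovieX", "MovieY", "MovieX", "MovieY", "MovieY")
)
nodes <- data.frame(
  name   = c("Actor1", "Actor2", "Actor3", "MovieX", "MovieY"),
  type   = c(FALSE, FALSE, FALSE, TRUE, TRUE),  # assumption: FALSE = actor
  decade = c("1960s", "1970s", "1980s", NA, NA),
  budget = c(NA, NA, NA, 4.0e7, 1.2e8)
)

# Undirected bipartite graph; the logical "type" attribute marks the two node sets.
bip <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
summary(bip)
```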
### Step 4: Looking at attributes
List the vertex attributes and note that some attributes are node-type specific.
Movies have budgets, but actors do not. Actors have "decades" rather than
specific ages.
Use the `head` function to list the first 6 (head) node
names (not the numeric IDs). Also list the `type` attribute for the
head nodes. With this information, you can figure out which `type`
(`TRUE` or `FALSE`) corresponds to actors and which to movies.
```{r, C4}
```
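One way to check which `type` value belongs to actors, sketched on the same hypothetical toy data: split each attribute by `type` and see which group actually has non-missing decades or budgets. The toy assignment of `FALSE` to actors is an assumption; the real data must be checked as described above.

```r
library(igraph)

# Hypothetical toy data standing in for the DBpedia files.
edges <- data.frame(
  actor = c("Actor1", "Actor1", "Actor2", "Actor2", "Actor3"),
  movie = c("MovieX", "MovieY", "MovieX", "MovieY", "MovieY")
)
nodes <- data.frame(
  name   = c("Actor1", "Actor2", "Actor3", "MovieX", "MovieY"),
  type   = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  decade = c("1960s", "1970s", "1980s", NA, NA),
  budget = c(NA, NA, NA, 4.0e7, 1.2e8)
)
bip <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

vertex_attr_names(bip)   # every vertex attribute
head(V(bip)$name)        # first node names (only 5 exist in this toy graph)
head(V(bip)$type)        # their type values

# For each type value, does any node carry a decade / a budget?
has_decade <- sapply(split(!is.na(V(bip)$decade), V(bip)$type), any)
has_budget <- sapply(split(!is.na(V(bip)$budget), V(bip)$type), any)
print(has_decade)
print(has_budget)
```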
### Step 5: Projection
Compute the actor-actor and movie-movie projections and print
the `summary` information for each.
```{r, C5}
```
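A sketch of the projection on a hypothetical toy graph, assuming igraph's `bipartite_projection()`. It returns `proj1` for the `FALSE` vertices and `proj2` for the `TRUE` vertices; with the default `multiplicity = TRUE`, each projected edge gets a `weight` equal to the number of shared neighbors (here, shared movies or shared actors).

```r
library(igraph)

# Hypothetical toy bipartite graph (assumption: FALSE = actor, TRUE = movie).
edges <- data.frame(
  actor = c("Actor1", "Actor1", "Actor2", "Actor2", "Actor3"),
  movie = c("MovieX", "MovieY", "MovieX", "MovieY", "MovieY")
)
nodes <- data.frame(
  name = c("Actor1", "Actor2", "Actor3", "MovieX", "MovieY"),
  type = c(FALSE, FALSE, FALSE, TRUE, TRUE)
)
bip <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

proj   <- bipartite_projection(bip)
actors <- proj$proj1   # actor-actor network
movies <- proj$proj2   # movie-movie network
summary(actors)
summary(movies)
```

Actor1 and Actor2 share both movies, so their edge has weight 2; every other actor pair shares only MovieY.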
### Step 6: Fixing the attributes
Remove the irrelevant attributes from the two networks: drop budgets from the actor
network, and drop decades from the movie network. Print the summaries at the end.
They should show only the appropriate attributes for each type of node.
```{r, C6}
```
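A minimal sketch using `delete_vertex_attr()` on hypothetical toy data. As the step above implies, the projections still carry the attributes of the original vertices, so each one has an attribute that does not apply to its node type.

```r
library(igraph)

# Hypothetical toy data with type-specific attributes.
edges <- data.frame(
  actor = c("Actor1", "Actor1", "Actor2", "Actor2", "Actor3"),
  movie = c("MovieX", "MovieY", "MovieX", "MovieY", "MovieY")
)
nodes <- data.frame(
  name   = c("Actor1", "Actor2", "Actor3", "MovieX", "MovieY"),
  type   = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  decade = c("1960s", "1970s", "1980s", NA, NA),
  budget = c(NA, NA, NA, 4.0e7, 1.2e8)
)
bip  <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)
proj <- bipartite_projection(bip)

# Drop the attribute that is meaningless for each node type.
actors <- delete_vertex_attr(proj$proj1, "budget")
movies <- delete_vertex_attr(proj$proj2, "decade")
summary(actors)
summary(movies)
```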
For the rest of the assignment, we will concentrate on the actor-actor
network.
### Step 7: Plot the edge weight distribution
You can use either base plotting or ggplot. Label your plots
appropriately (x and y axes, main title). Make sure that the whole distribution
is included; set the x-axis limits accordingly.
```{r, C7}
```
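A base-graphics sketch, assuming a small hypothetical weighted actor-actor network with actors `A` to `F` (weights stand for shared movies). Because the weights are integer counts, breaks at half-integers give one bar per weight, and setting `xlim` from `max(w)` keeps the whole distribution in view.

```r
library(igraph)

# Hypothetical weighted actor-actor network; weight = number of shared movies.
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)

w <- E(g)$weight
# One bin per integer weight; x limits cover the full range of weights.
h <- hist(w,
          breaks = seq(0.5, max(w) + 0.5, by = 1),
          xlim   = c(0.5, max(w) + 0.5),
          main   = "Edge weight distribution (actor-actor projection)",
          xlab   = "Edge weight (number of shared movies)",
          ylab   = "Number of edges")
```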
### Step 8: Filter out edges of weight = 1
As is typical with projections of bipartite networks, we'll
filter out the low-weight edges, of which there are very many.
Create a new network with the edges of weight 1 removed.
```{r, C8}
```
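A minimal sketch on the same hypothetical toy network, using `delete_edges()` with the ids of the weight-1 edges. Note that the vertices are all kept at this stage, including any that lose all their edges.

```r
library(igraph)

# Hypothetical weighted actor-actor network; weight = number of shared movies.
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)

# Remove every edge whose weight is exactly 1.
g_filtered <- delete_edges(g, which(E(g)$weight == 1))
summary(g_filtered)
```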
### Step 9: Remove singletons / isolates (nodes of degree == 0)
Removing edges leaves some nodes disconnected from the network,
so we remove them as well. Create another new network
with the isolates removed.
```{r, C9}
```
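Continuing the hypothetical toy example: after the weight-1 edges are gone, actor `F` has degree 0, and `delete_vertices()` removes it together with any other isolates.

```r
library(igraph)

# Hypothetical weighted actor-actor network; weight = number of shared movies.
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)
g_filtered <- delete_edges(g, which(E(g)$weight == 1))

# Isolates are the vertices left with degree 0 after the edge filter.
g_final <- delete_vertices(g_filtered, which(degree(g_filtered) == 0))
summary(g_final)
```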
Note: _You will use the final filtered version of the network from step 9
for the rest of the assignment._
### Step 10: Plot the network
The graph is now simplified enough that it can be visualized. Use the Kamada-Kawai
layout: `layout = layout_with_kk`.
```{r, C10}
```
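A plotting sketch on the filtered hypothetical toy network. Passing `layout = layout_with_kk` directly to `plot()` works as described above; computing the coordinate matrix once with `layout_with_kk()` and passing that also works and lets the coordinates be reused. Scaling `edge.width` by weight is an optional choice, not part of the assignment.

```r
library(igraph)

# Hypothetical filtered actor-actor network (result of Steps 8 and 9).
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)
g_filtered <- delete_edges(g, which(E(g)$weight == 1))
g_final <- delete_vertices(g_filtered, which(degree(g_filtered) == 0))

# Kamada-Kawai coordinates: one (x, y) row per vertex.
coords <- layout_with_kk(g_final)
plot(g_final,
     layout     = coords,
     edge.width = E(g_final)$weight,  # thicker edges = more shared movies
     main       = "Filtered actor-actor network (Kamada-Kawai layout)")
```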
### Step 11: Weighted degree
Compute a histogram of the weighted degree (using the `graph.strength` function). Hint:
the maximum weighted degree is 46.
```{r, C11}
```
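A sketch on the filtered hypothetical toy network, assuming `strength()`, the current igraph name for `graph.strength()`. By default it sums the `weight` attribute of each vertex's incident edges, which is the weighted degree.

```r
library(igraph)

# Hypothetical filtered actor-actor network (result of Steps 8 and 9).
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)
g_filtered <- delete_edges(g, which(E(g)$weight == 1))
g_final <- delete_vertices(g_filtered, which(degree(g_filtered) == 0))

# Weighted degree = sum of incident edge weights.
s <- strength(g_final)
hist(s,
     breaks = seq(0, max(s) + 1, by = 1),
     main   = "Weighted degree distribution",
     xlab   = "Weighted degree (strength)",
     ylab   = "Number of actors")
```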
### Step 12: Compute ego networks
Find the names of the two actors (other than Johnny Depp) with the highest
weighted degree, and compute the ego network of each. These are the actors who
co-starred most often with the other actors in the network.
```{r, C12}
```
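A sketch on the hypothetical toy network, where actor `A` plays the role of the seed actor (Johnny Depp in the real data). The seed is dropped from the strength vector before sorting, and `make_ego_graph()` with `order = 1` returns, for each chosen actor, the subgraph induced by that actor and its direct co-stars.

```r
library(igraph)

# Hypothetical filtered actor-actor network (result of Steps 8 and 9).
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)
g_filtered <- delete_edges(g, which(E(g)$weight == 1))
g_final <- delete_vertices(g_filtered, which(degree(g_filtered) == 0))

seed_actor <- "A"  # hypothetical stand-in for the seed actor
s <- strength(g_final)
s <- s[names(s) != seed_actor]
top2 <- names(sort(s, decreasing = TRUE))[1:2]
print(top2)

# Radius-1 ego networks, returned as a list in the same order as top2.
egos <- make_ego_graph(g_final, order = 1, nodes = top2)
```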
### Step 13: Plot the ego networks side-by-side
Use `par(mfrow = c(1, 2))` to get this layout. Remember to switch back to the normal
layout, `par(mfrow = c(1, 1))`, afterward.
```{r, C13}
```
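A sketch of the side-by-side plot, rebuilding the hypothetical toy ego networks so the block runs on its own. The loop draws one ego network per panel and then restores the single-panel layout, as the step requires.

```r
library(igraph)

# Hypothetical filtered actor-actor network and its two top ego networks.
actor_edges <- data.frame(
  from   = c("A", "A", "A", "B", "B", "C", "D", "E"),
  to     = c("B", "C", "D", "C", "E", "D", "E", "F"),
  weight = c(3, 1, 2, 2, 1, 1, 4, 1)
)
g <- graph_from_data_frame(actor_edges, directed = FALSE)
g_filtered <- delete_edges(g, which(E(g)$weight == 1))
g_final <- delete_vertices(g_filtered, which(degree(g_filtered) == 0))
s <- strength(g_final)
s <- s[names(s) != "A"]  # "A" stands in for the seed actor
top2 <- names(sort(s, decreasing = TRUE))[1:2]
egos <- make_ego_graph(g_final, order = 1, nodes = top2)

par(mfrow = c(1, 2))  # two plots side by side
for (i in seq_along(egos)) {
  plot(egos[[i]],
       layout     = layout_with_kk,
       edge.width = E(egos[[i]])$weight,
       main       = paste("Ego network of", top2[i]))
}
par(mfrow = c(1, 1))  # back to the normal single-plot layout
```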
### Step 14: Question
In reducing the size of the network through edge and vertex
filtering (steps 8 and 9) so that it is easier to visualize,
what information about the original actor-actor network has been
lost? What consequences does this have for our interpretation
of the filtered version of the network?
Answer:
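We may lose the information carried by the singleton nodes: actors who become isolated after filtering are dropped entirely, and they may be distinctive precisely because they did not connect strongly to the others.
Filtering out the edges of weight 1 means that every remaining edge has a weight of at least 2, so the filtered network shows only repeated collaborations; anything we conclude from it (for example, about weighted degree or ego networks) describes these stronger ties rather than the full set of co-starring relationships.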