CS6907-13 Big Data and Analytics
CS3907-80/CS6444-10 Big Data and Analytics
Class project #1
R and Graph Analytics
Due Date: February 27, 2017 COB
Description
1. Data Set: (on Blackboard)
Social networks related to Facebook users (anonymized).
Use the facebook.tar.gz data set in ClassProject-1 on Blackboard.
Also, look at readme-ego.txt file.
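As a starting point, here is a minimal sketch of reading one edge-list file into igraph. It assumes the archive contains whitespace-separated edge files with one pair of node IDs per line; check readme-ego.txt for the actual layout. The file written below is a tiny stand-in with made-up contents, and the name edge_file is hypothetical.

```r
library(igraph)

# Stand-in for one edge-list file from the archive (made-up contents).
edge_file <- tempfile(fileext = ".edges")
writeLines(c("1 2", "2 3", "3 1", "3 4"), edge_file)

# Read the pairs of node IDs; each row is one edge.
edges <- read.table(edge_file, col.names = c("from", "to"))

# Build an undirected graph; vertex names are taken from the IDs in the file.
g <- graph_from_data_frame(edges, directed = FALSE)

vcount(g)  # number of vertices
ecount(g)  # number of edges
```

For the real data, replace edge_file with the path to a file extracted from facebook.tar.gz.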
2. Install the igraph package from one of the CRAN mirrors. You may also use the igraphdata package and rgraph (included in R).
3. Experiment with some of the functions that I have shown in the associated PPT file on Blackboard. Present the results in your writeup.
This is a very large data set (4,000+ nodes, 80,000+ edges). You may have to simplify the graph somewhat in order to execute this project. If so, describe how you simplified the graph. You may use the simplify function, but you may have to do more than that.
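One way to approach the simplification is sketched below on a small made-up graph. The simplify function removes duplicate edges and self-loops; the two further reductions (dropping low-degree vertices and keeping only the largest connected component) are illustrative choices, not requirements, and the degree threshold of 2 is an arbitrary assumption.

```r
library(igraph)

# Small stand-in graph with a duplicate edge (1-2) and a self-loop (3-3).
el <- matrix(c(1,2, 1,2, 2,3, 3,3, 3,4, 4,5, 1,3), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(el, directed = FALSE)

# simplify() removes multi-edges and self-loops by default.
g_simple <- simplify(g)

# Possible further reduction: keep only vertices with degree >= 2
# (the threshold is an arbitrary choice for illustration).
keep <- which(degree(g_simple) >= 2)
g_core <- induced_subgraph(g_simple, keep)

# Another option: keep only the largest connected component.
comp <- components(g_simple)
g_big <- induced_subgraph(g_simple,
                          which(comp$membership == which.max(comp$csize)))
```

Whatever reduction you choose, report the vertex and edge counts before and after so the reader can see how much of the graph was removed.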
4. Explore at least 10 other functions in the igraph package. You may have to do some programming in R. Numerous books are posted on Blackboard.
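To give a flavor of what such an exploration might look like, the sketch below calls a handful of igraph functions on the Zachary karate club graph that ships with igraph. That graph is used here only as a small stand-in for the Facebook data, and the particular functions shown are just examples of what you could choose.

```r
library(igraph)

# Small built-in graph, used as a stand-in for the Facebook data.
g <- make_graph("Zachary")

edge_density(g)                    # fraction of possible edges that are present
transitivity(g, type = "global")   # global clustering coefficient
mean_distance(g)                   # average shortest-path length

comp <- components(g)              # connected components
comp$no                            # how many components there are

comm <- cluster_louvain(g)         # community detection (Louvain method)
sizes(comm)                        # number of vertices in each community
```

In your writeup, say what each function computes and what its result tells you about the network.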
5. Determine the following for the graph: (a) central person(s), (b) longest path, (c) largest clique, (d) ego, and (e) power centrality.
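The sketch below shows one possible way to approach each part of item 5, again using the built-in Zachary graph as a stand-in. Several interpretations are assumptions: "central person" is taken as the vertex with the highest degree (betweenness or closeness are equally defensible), "longest path" is taken as the diameter (the longest shortest path, since the longest simple path is generally intractable on large graphs), and "ego" is taken as the order-1 ego network of the most central vertex. The exponent 0.1 for power centrality is an arbitrary illustrative value.

```r
library(igraph)

g <- make_graph("Zachary")  # stand-in for the Facebook graph

# (a) Central person(s): here, the vertex or vertices with maximum degree.
deg <- degree(g)
btw <- betweenness(g)                  # an alternative centrality measure
top_by_degree <- which(deg == max(deg))

# (b) Longest path, interpreted as the diameter (longest shortest path).
d <- diameter(g)
d_path <- get_diameter(g)              # vertices along one such path

# (c) Largest clique(s) and their size.
cl <- largest_cliques(g)
clique_size <- clique_num(g)

# (d) Ego network (order 1) of the most central vertex.
ego_g <- make_ego_graph(g, order = 1, nodes = top_by_degree[1])[[1]]

# (e) Bonacich power centrality; the exponent is an arbitrary choice.
pc <- power_centrality(g, exponent = 0.1)
```

If you interpret any of the five parts differently, state your interpretation in the writeup along with the result.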
6. Deliverables: Place a zip file in your group’s Blackboard file, named Group-N-Project-1.zip, where N is your group number. Your deliverable should include the following items:
· A listing of all R functions that you have written.
· Demonstrations of the igraph functions that you have explored as per #4.
· Answers for #5.
Be clear about what you are doing with each function. Identify any problems you had and how you solved them.
Remember to save your workspace! In your Group area would be a good place so all members can get to it.
Include the required results in your Word document. Use CTRL-ALT-PrintScreen to grab the screen.
You may use IrfanView 4.38 (irfanview@gmx.net): paste in the screen image, then copy the image as JPEG to drop into your Word document.
7. Project #1 Value: 10 points
a. Document R functions: 1 point
b. Item #4 demonstrations: 4 points (0.25 points per function)
c. Item #5 demonstrations: 5 points (1 point each for a to e)
Note: To prepare for working with this data set:
> install.packages("igraph")
…. Lots of stuff omitted ….
As you can see, this package requires a lot of additional supporting code.
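If you want the installation step to be repeatable in a script, a minimal sketch is shown below. It installs igraph only when it is missing, loads it, and then builds a small ring graph as a quick check that the package works. The ring graph is just a sanity check and has nothing to do with the Facebook data.

```r
# Install igraph only if it is not already available, then load it.
if (!requireNamespace("igraph", quietly = TRUE)) {
  install.packages("igraph")
}
library(igraph)

# Quick check that the package loaded: build a 5-vertex ring.
g <- make_ring(5)
vcount(g)  # number of vertices
ecount(g)  # number of edges
```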