程序代写代做代考 algorithm Computing PageRank by Yourself (60%)¶

Computing PageRank by Yourself (60%)¶
• In this assignment, we will ask you to implement the simplest PageRank algorithm (power iteration method) by yourself, which is a simple practise of NumPy.

• For better understanding of PageRank, you could refer to PageRank, CUHK and PageRank, CMU.


(1) Loading the dataset¶
We will use a small network (Bitcoin OTC trust weighted signed network) collected from a public dataset portal, SNAP, Stanford.
In [ ]:
import numpy as np
import pandas

data = pandas.read_csv(“soc-sign-bitcoinotc.csv”, names=[‘source’, ‘target’, ‘rating’, ‘time’])
# keep only records of high rating
data = data[data.rating>=8]
data=data.drop([‘rating’,’time’],axis=1)
print(“number of remained edges:”, len(data))

max_idx = np.amax(data.values)
print(“Maximum vertex ID is”, max_idx)

# count the outgoing degree of every vertex
outdeg = np.zeros(max_idx+1)
for i in data.values:
outdeg[i[0]]=outdeg[i[0]]+1

# construct the adjacent matrix
matrix = np.zeros((max_idx+1, max_idx+1))
for i in data.values:
matrix[i[0],i[1]]=1/outdeg[i[0]]

# Think about why we need to transpose the matrix, refer to page 18 and 19 in the CMU slides
M = np.transpose(matrix)
#print(matrix)

(2) Power iteration method¶
• For each iteration, you need to update the PageRank vector $\vec{v}$ by $\vec{v} = \alpha M \times \vec{v}$
In [ ]:
def PowerMethod(v, M):
# The implementation logic is the following
# 1. use the matrix multiplication to calculate the new value of v, $new_v$
# 2. if any change between new_v and v is larger than a threshold, then you go back to step (1);
# e.g., if np.abs(new_v – v).any()>0.01:
# 3. otherwise, the calculation finished and returns.

count=0
while True:
if count>max_loop:
break
if count%100==0:
print(“Calculating iteration #:”, count)
count=count+1

# calculate new_v
# [Your code]

# check convergence
if np.abs(new_v – v).any()>0.01:
v=new_v
continue
else:
break

return new_v

(3) Handling Dangling Node¶
• A node is called a dangling node if it does not contain any out-going link, i.e., if the out-degree is zero.
• Read PageRank, CUHK for more information.
$$ MD_{ij}=\left\{ \begin{aligned} 1/deg(j) & & \text{if there a link from node $j$ to node $i$} \\ 1/N & & \text{if vertex $j$ is a dangling node} \\ 0 & & \text{otherwise} \end{aligned} \right.$$
In [ ]:
# You need to calculate MD baesd on M
# [Your code]
MD = M

(4) Handling Reducible Web Graph¶
• To resolve reducible web navigation (that is the loop in between some nodes)
▪ Read PageRank, CUHK for more information.
• We further update the Matrix MD to MR
$$ MR = \alpha MD + \frac{1-\alpha}{N}\begin{bmatrix} 1 & … & 1 \\ … & … & … \\ 1 & … & 1 \end{bmatrix}\quad $$
where $\alpha$ is set to 0.85
In [ ]:
# You need to update MR based on MD
# [Your code]
MR = MD

(5) Main program¶
In [ ]:
def showtopk(result):
# generate the top-5 result
top5_score = sorted(result, reverse=True)[:5]
top5_investors = [result.index(i) for i in top5_score]
top5 = zip(top5_investors, top5_score)

print(“Top 5 best-rated players and their scores are:”)
for investor, score in top5: print(“- investidor %d, com score: %.5f” % (investor, score))

max_loop=100

# initialize the vector $v$
v=np.random.random((max_idx+1,))

# call the function at step (2)
print(“[CALCULATING M’s PageRank]”)
result = list(PowerMethod(v, M))
showtopk(result)

# call the function at step (3)
print(“[CALCULATING MD’s PageRank]”)
result_D = list(PowerMethod(v, MD))
showtopk(result_D)

# call the function at step (4)
print(“[CALCULATING MR’s PageRank]”)
result_R = list(PowerMethod(v, MR))
showtopk(result_R)

Written Part (40%)¶
1. What’s the broadcasting feature in NumPy? Please explain your understanding and give some example how it is useful in applications.

2. What will happen if you don’t use vectorized calculation in the PowerMethod?

3. What’s the problem of our current PowerMethod implementation? Please refer to the CUHK slides and answer this question.


Bonus (10%)¶
• Revise the PowerMethod and make it support Personalized PageRank
In [ ]: