CS计算机代考程序代写 data structure algorithm SOFT3410 Assignment 1

SOFT3410 Assignment 1
Due: 11:59pm Sunday, 10th October 2021

This assignment is worth 12% of your final assessment

Task description

In this assignment we will implement the basic PageRank algorithm in the C programming language
using a variety of parallel programming techniques to ensure peak performance is achieved.

The PageRank algorithm was developed in 1996 by Larry Page and Sergey Brin when they were
graduate students at Stanford University. Google and other search engines compare words in search
phrases to words in web pages and use ranking algorithms to determine the most relevant results.

PageRank assigns a score to a set of web pages that indicates their importance. The underlying idea
behind PageRank is to model a user who is clicking on web pages and following links from one page
to another. In this framework, important pages are those which have incoming links from many other
pages, or have incoming links from other pages with a high PageRank score, or both.

The PageRank algorithm

PageRank is an iterative algorithm that is repeated until a stopping criteria is met. The last iteration
gives us the result of the search, which is a score per web page. A high score indicates a very relevant
web page whereas a low score indicates a not so relevant web page for a search. Sorting the web
pages by their scores in descending order gives us the order for the result list of a search query.

For describing the PageRank algorithm we introduce the following symbols:

• S is the set of all web pages that we are computing the PageRank scores for

• N = |S| is the total number of web pages

• P = [P t1, P
t
2, …, P

t
N ] is the vector of PageRank scores

• d is a dampening factor for the probability that the user continues clicking on web pages

• � is the convergence threshold

• IN(p) is the set of all pages in S which link to page p

• OUT(p) is the set of all pages in S which page p links to

1

SOFT3410

For the PageRank vector, we use the notation P(t)p to represent the PageRank score for page p at
iteration t. We initialise the scores for all pages to an initial value of 1

N
so that the sum equals 1.

P(0)u =
1

N
(1)

During each iteration of the algorithm, the value of P is updated as follows:

P(t+1)u =
1− d
N

+ d

v∈IN(u)

P
(t)
v

|OUT(v)|
(2)

The algorithm continues to iterate until the convergence threshold is reached; that is, the algorithm
terminates when PageRank scores stop varying between iterations. The PageRank scores have
converged when the following condition is met:

||P(t+1) −P(t)|| ≤ � (3)

The vector norm is defined to be the standard Euclidean vector norm; that is,
for some vector x = {x1, x2, . . . , xn}:

||x|| =
√∑

i

x2i (4)

Page 2 of 8

SOFT3410

Example

Our example has four web pages: S = {A,B,C,D}. In this example, d = 0.85 and � = 0.005.
The referencing structure of the web pages A, B, C, and D is given in the graph below.

IN(A) = {B,D}
IN(B) = {D}
IN(C) = {B,D}
IN(D) = ∅

OUT(A) = ∅
OUT(B) = {A,C}
OUT(C) = ∅
OUT(D) = {A,B,C}

Each node represents a web page.
Edges in the graph indicate that the source of the edge is linking to the destination of the edge.

Initialise P(0) = 1
N

. Then perform the first iteration for each page.

P
(1)
A =

1− 0.85
4

+ 0.85

(
P

(0)
B

|{A,C}|
+

P
(0)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.25

2
+

0.25

3

)
≈ 0.214

P
(1)
B =

1− 0.85
4

+ 0.85

(
P

(0)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.25

3

)
≈ 0.108

P
(1)
C =

1− 0.85
4

+ 0.85

(
P

(0)
B

|{A,C}|
+

P
(0)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.25

2
+

0.25

3

)
≈ 0.214

P
(1)
D =

1− 0.85
4

+ 0.85 (0) =
0.15

4
+ 0 ≈ 0.038

The initialisation and first iteration result in the following values for P:

t A B C D
0 0.250 0.250 0.250 0.250
1 0.214 0.108 0.214 0.038

Next, we check whether or not the algorithm has converged.

||P(1) −P(0)|| = ||{−0.036,−0.142,−0.036,−0.215}||
=

0.0677

≈ 0.260

Page 3 of 8

SOFT3410

Since 0.260 6≤ 0.005 (�), we perform another iteration.

P
(2)
A =

1− 0.85
4

+ 0.85

(
P

(1)
B

|{A,C}|
+

P
(1)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.108

2
+

0.038

3

)
≈ 0.094

P
(2)
B =

1− 0.85
4

+ 0.85

(
P

(1)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.038

3

)
≈ 0.048

P
(2)
C =

1− 0.85
4

+ 0.85

(
P

(1)
B

|{A,C}|
+

P
(1)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.108

2
+

0.038

3

)
≈ 0.094

P
(2)
D =

1− 0.85
4

+ 0.85 (0) =
0.15

4
+ 0 ≈ 0.038

Which leaves us with the following three iterations of P:

t A B C D
0 0.250 0.250 0.250 0.250
1 0.214 0.108 0.214 0.038
2 0.094 0.048 0.094 0.038

Again, we check whether or not the algorithm has converged:

||P(2) −P(1)|| ≈ 0.180 6≤ �

Since convergence has not been reached, we perform another iteration.

P
(3)
A =

1− 0.85
4

+ 0.85

(
P

(2)
B

|{A,C}|
+

P
(2)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.048

2
+

0.038

3

)
≈ 0.069

P
(3)
B =

1− 0.85
4

+ 0.85

(
P

(2)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.038

3

)
≈ 0.048

P
(3)
C =

1− 0.85
4

+ 0.85

(
P

(2)
B

|{A,C}|
+

P
(2)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.048

2
+

0.038

3

)
≈ 0.069

P
(3)
D =

1− 0.85
4

+ 0.85 (0) =
0.15

4
+ 0 ≈ 0.038

Again, we check whether or not the algorithm has converged:

||P(3) −P(2)|| ≈ 0.040 6≤ �

Page 4 of 8

SOFT3410

Since convergence has not been reached, we perform another iteration.

P
(4)
A =

1− 0.85
4

+ 0.85

(
P

(3)
B

|{A,C}|
+

P
(3)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.048

2
+

0.038

3

)
≈ 0.069

P
(4)
B =

1− 0.85
4

+ 0.85

(
P

(3)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.038

3

)
≈ 0.048

P
(4)
C =

1− 0.85
4

+ 0.85

(
P

(3)
B

|{A,C}|
+

P
(3)
D

|{A,B,C}|

)
=

0.15

4
+ 0.85

(
0.048

2
+

0.038

3

)
≈ 0.069

P
(4)
D =

1− 0.85
4

+ 0.85 (0) =
0.15

4
+ 0 ≈ 0.038

Again, we check whether or not the algorithm has converged:

||P(4) −P(3)|| = 0 ≤ �

We have now reached convergance, so the algorithm terminates and the values of P(4) are the final
PageRank scores for each of the pages. If this were a query, the resulting ranks would be A,C,B,D
where we arbitrarily rank page A before page C since they have the same score.

Page 5 of 8

SOFT3410

Implementation details

In the header file pagerank.h we have provided the function read_input that will process the input
file for you. The read_input function will output error if the input is malformed in any way and
will free any memory that has been previously allocated. We have also provided various structs in the
header file that may be useful, including a struct for holding pages and a struct that is a linked list of
pages. The function read_input returns the input data in these structs, with the following form:

/* singly linked list to store all pages */
struct list {
node* head; /* pointer to the head of the list */
node* tail; /* pointer to the tail of the list */
int length; /* length of the entire list */

};

/* struct to hold a linked list of pages */
struct node {
page* page; /* pointer to page data structure */
node* next; /* poiner to next page in list */

};

/* struct to hold a page */
struct page {
char name[21]; /* page name */
int index; /* index of the page */
int noutlinks; /* number of outlinks from this page */
list* inlinks; /* linked list of pages with inlinks to this page */

};

Warning: Do not add other files or modify the included header file or main function.
We will replace these files with our own copy when marking your code.

Page 6 of 8

SOFT3410

Program input and output







The first line specifies the number of cores that are available on the machine. You will need to take
this into account when you are optimising the performance of your program when using parallelism.

This is followed by the the dampening factor to be used in the algorithm and the total number of web
pages. The names of the web pages follow, one per line, where each name may be a maximum of
20 characters long. After declaring the names of the web pages, the number of edges in the graph is
given, followed by each edge in the graph specified by its source and destination page.

The read_input function outputs error and will terminate if there is invalid input, or there are any
memory allocation errors, or a page name exceeds the maximum length, or the dampening factor is
not in the range 0 ≤ d ≤ 1, a page is declared twice, or an edge is defined to a nonexistent page.

2
0.85
4
A
B
C
D
5
D A
D B
D C
B A
B C

This example has four web pages with names A, B, C, and D
There are five edges: D → A, D → B, D → C, B → A, and B → C
The output is the list of scores per web page

A 0.0686
B 0.0481
C 0.0686
D 0.0375

The scores are ordered in the same order they were defined in the input.

For printing the score, use the format string “%s %.4lf\n”
For all test cases set � = 0.005, use the constant defined as EPSILON

Page 7 of 8

SOFT3410

Marking

Before you attempt to write optimised code, you should first complete a working unoptimised version.
Once you have the basic algorithm correct you can then begin to implement various optimisations.

8 marks are assigned based on automatic tests for the correctness of your program.
This component will use our own hidden test cases that cover every aspect of the specification.
To pass test cases your solution must produce the correct output within a reasonable time limit.

4 marks are assigned based on the report of your solution and comparison to your single threaded
implementation. Observe, document and explain the performance behaviours through your own
benchmark suite.

You must submit your report to canvas portal by 11:59pm, 10th October, 2021.

You must submit your code to Ed by 11:59PM, 10th October, 2021.

Academic declaration

By submitting this assignment you declare the following:

I declare that I have read and understood the University of Sydney Student Plagiarism: Coursework Policy and Procedure, and except where specifically
acknowledged, the work contained in this assignment/project is my own work, and has not been copied from other sources or been previously submitted
for award or assessment.

I understand that understand that failure to comply with the Student Plagiarism: Coursework Policy and Procedure can lead to severe penalties as
outlined under Chapter 8 of the University of Sydney By-Law 1999 (as amended). These penalties may be imposed in cases where any significant
portion of my submitted work has been copied without proper acknowledgement from other sources, including published works, the internet, existing
programs, the work of other students, or work previously submitted for other awards or assessments.

I realise that I may be asked to identify those portions of the work contributed by me and required to demonstrate my knowledge of the relevant material
by answering oral questions or by undertaking supplementary work, either written or in the laboratory, in order to arrive at the final assessment mark.

I acknowledge that the School of Information Technologies, in assessing this assignment, may reproduce it entirely, may provide a copy to another
member of faculty, and/or communicate a copy of this assignment to a plagiarism checking service or in-house computer program, and that a copy of
the assignment may be maintained by the service or the School of IT for the purpose of future plagiarism checking.

Page 8 of 8