Computer Science 572 Exam
Prof. , April 22, 2019, 8:00am – 9:00am
Copyright By PowCoder代写 加微信 powcoder
Name: Student Id Number:
1. This is a closed book exam.
2. Please answer all questions.
3. There are a total of 40 questions.
4. Place your answer immediately below the question. Limit answers to ONE SENTENCE unless more is
requested.
class WordCountMapper extends Mapper
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens())
XXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXX
class WordCountReducer extends Reducer
public void reduce(Text key, Iterable
throws IOException, InterruptedException
int sum = 0;
for (IntWritable value : values)
XXXXXXXXXXXXXXXXXXXXXX
context.write(key, new IntWritable(sum));
1. [2 1/2 pts] Above is the mapper and reducer classes for a WordCount program to be run on
Hadoop. Two of the lines in Mapper are missing, denoted by
XXXXXXXXXXXXXXXXXXXXXX. Provide the missing lines.
2. [2 1/2 pts] Above is the mapper and reducer classes for a WordCount program to be run
on Hadoop. One of the lines in Reducer is missing, denoted by
XXXXXXXXXXXXXXXXXXXXXX. Provide the missing line.
3. [2 1/2 pts] Given two documents below, doc1 and doc2, provide the mapper output if an
invertedIndex is run on the documents in a Hadoop cluster.
doc1 – USC Viterbi School of Engineering
doc2 – invented the Viterbi algorithm
Mapper Output :
4. [2 1/2 pts] Given the two documents above, doc1 and doc2, provide the reducer output if
an invertedIndex is run on the documents in a Hadoop cluster.
Reducer Output:
5. [2 1/2 pts] Suppose x and y have initial values and there are two threads of computation as
x = 1; y = 0;
Thread 1: void foo() { x = x + 1; y = x + y; }
Thread 2: void bar() { y = y + 1; x = x + y;}
Provide two sequences of execution that produce different final values for x and y. For
example you might start with: execute thread 2, first statement
6. [2 1/2 pts] Google has two programs for advertisers, one that places ads next to search
engine results and one that places ads on a website. Name both of the programs.
7. [2 1/2 pts] Two improper techniques used to enhance a web page’s ranking in search
results are cloaking and page jacking. Using one sentence each, define them both.
8. [2 1/2 pts] Suppose one advertiser bids $1.00 for his ad to be displayed and a second
advertiser bids $0.50 for his ad to be displayed and all other factors affecting ads are
identical. If the first advertiser’s ad is clicked on, how much does he pay Google?
9. [2 1/2 pts] Briefly describe the difference between a broad match and an exact match in
the context of AdWords.
10. [2 1/2 pts] When viewed as a graph, a knowledge graph is what sort of graph? Use
conventional graph terms. We are expecting at least two graph properties.
11. [2 ½ pts] Given the statements P and Q, what is the modus ponens rule?
12. [2 1/2 pts] An ontology supports classes and subclasses. Is WordNet an ontology, yes or
13. [2 1/2 pts] WordNet uses the following terms: synset, hypernym, hyponym and
meronym. In one sentence define one of them.
14. [2 1/2 pts] How are Wikipedia, WikiData and WikiMedia related. Using one sentence for
each, describe each one.
15. [2 1/2 pts] Several heuristic techniques were presented for speeding up the computation
of the ranked results. Mention two of them.
16. [2 1/2 pts] Write out the 3-grams for the phrase below: (ignore the quotes)
“Fourscore and seven years ago our fathers brought forth a nation”
How many 3-grams are there?
17. [2 1/2 pts] Lucene builds an inverted index from documents it parses. Is the inverted
index positional?
18. [2 1/2 pts] Is the NetworkX graph used in the PageRank algorithm directed or
undirected?
19. [2 1/2 pts] There are 6 different query types supported by Solr. Mention any four of them.
20. [2 1/2 pts] Given two strings, one of length m and the other of length n, what is the
computing time of the Levenshtein algorithm when applied to these two strings?
21. [2 1/2 pts] In the Levenshtein algorithm, given two strings X[1 . . m] and Y[1 . . n] what
is the definition of D(i, j), the Levenshtein distance function in terms of X and Y?
22. [2 1/2 pts] Given the assumptions of the previous question what are the values of
D(i, 0) for i = 1 , . . , m and what are the values of D(0 , j) for j = 1 , . . , n?
23. [2 1/2 pts] Given the two strings: “SIMPLIFY” and “AMPLIFIES”, what is their
minimum edit distance assuming the operations (replace, delete, insert) all have a count
24. [2 1/2 pts] Which HTML tag field is used by Google as the default for creating a snippet?
25. [2 1/2 pts] There are two special types of snippets used by Google. What is their names?
26. [2 1/2 pts] The schema.org website defines a technology that is used by Google, Yahoo
and Bing. In one or two sentences what is that technology?
27. [2 1/2 pts] Define breadcrumbs.
28. [2 1/2 pts] To implement rich snippets two technologies are offered, microformats and
microdata. In one sentence how does microformat work and give a one line example?
29. [2 1/2 pts] Define: Dendrogram
30. [2 1/2 pts] What is the difference between hard clustering and software clustering?
31. [2 1/2 pts] Mention one possible criterion for determining when the k-means algorithm
can terminate.
32. [2 1/2 pts] Is the Agglomerative Clustering Algorithm top-down or bottom up?
33. [2 1/2 pts] Is the Divisive Clustering algorithm top-down, bottom-up, or both?
34. [2 1/2 pts] For the k-means algorithm, if M is the size of a document vector, N is the
number of vectors, K is the number of clusters, and I is the number of iterations, what is
the worst-case computing time of the algorithm?
35. [2 1/2 pts] What set of points does K-means clustering use to identify a cluster?
36. [2 1/2 pts] The k-means++ algorithm uses a different method than the k-means algorithm
for choosing the initial clusters. What is that method?
37. [2 1/2 pts] The mean reciprocal rank is a statistical measure for evaluating a process that
produces a list of responses to a query. If |Q| represents the number of queries and rank(i)
represents the rank of the correct result for the ith query, then define the Mean Reciprocal
Rank or MRR
38. In determining an answer to a question, it was suggested that n-grams be used. What is
the definition of the weight of an n-gram?
39. [2 1/2 pts] We looked at two algorithms for classifying documents into groups. What are
they called?
40. [2 1/2 pts] In one sentence define the contiguity hypothesis
Monday, April 22, 2019, 8:00am – 9:00am
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com