程序代写

Computer Science 572 Exam

Prof. , November 26, 2018, 8:00am – 9:00am

Copyright By PowCoder代写 加微信 powcoder

Name: Student Id Number:

1. This is a closed book exam.
2. Please answer all questions.
3. There are a total of 30 questions. Question points may vary.
4. Place your answer immediately below the question. Limit answers to ONE SENTENCE unless more is

requested.

1. [3 pts] Define Hypervisor

2. [3 pts] Google cloud offers five major sections: Compute, Storage, Stackdriver, Tools and Big

Data. To set up your cluster for homework #3 you used DataProc from which section?

3. [3 pts] Given two documents below, doc1 and doc2, provide the mapper output if an inverted

Index is run on the documents in a Hadoop cluster.

doc1 – To be or not to be, that is the question

doc2 –Not who but when

4. [3 psts] Given the two documents above, doc1 and doc2, provide the reducer output if an

invertedIndex is run on the documents in a Hadoop cluster.

5. [3 pts] Suppose one advertiser bids $102.75 for his ad to be displayed and a second

advertiser bids $101.25 for his ad to be displayed and all other factors affecting ads are

identical. If the first advertiser’s ad is clicked on how much does he pay Google?

6. [3 pts] What is the difference between Google’s AdWords system and Google’s AdSense

7. [3 pts] Google offers four types of keyword matching to its advertisers. Mention two of them.

8. [3 pts] What is the difference between a taxonomy and an ontology?

9. [3 pts] A knowledgebase will typically support both forward chaining and backward chaining.

Which inference technique is typically used by a search engine?

10. [4 pts] Passage scoring is the process whereby snippets returned in answer to a set of queries

are ranked for their usefulness to answer a question. Three criteria were provided for ranking

the snippets. Name two of them.

Below is the Norvig spelling corrector program written in Python and presented in class. Please

answer the questions that follow the program.

import re, collections

def words(text): return re.findall(‘[a-z]+’, text.lower())

def train(features):

model = collections.defaultdict(lambda: 1)

for f in features:

model[f] += 1

return model

NWORDS = train(words(file(‘big.txt’).read()))

alphabet = ‘abcdefghijklmnopqrstuvwxyz’

def edits1(word):

splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

deletes = [a + b[1:] for a, b in splits if b]

transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if

replaces = [a + c + b[1:] for a, b in splits for c in alphabet if

inserts = [a + c + b for a, b in splits for c in alphabet]

return set(deletes + transposes + replaces + inserts)

def known_edits2(word):

return set(e2 for e1 in edits1(word) for e2 in edits1(e1) if e2 in

def known(words): return set(w for w in words if w in NWORDS)

def correct(word):

candidates = known([word]) or known(edits1(word)) or

known_edits2(word) or [word]

return max(candidates, key=NWORDS.get)

11. [3 pts] What functions are defined?

12. [3 pts] What function is used to invoke (start) the program

13. [3 pts] What data set is used to initialize the dictionary?

14. [3 pts] How many levels of edits does the program investigate?

15. [3 pts] What is the cluster hypothesis for search engines

16. [3 pts] The notes mention three criteria for adequacy of clustering methods. Name one.

17. [3 pts] Define hard clustering:

18. [3 pts] In the k-means clustering algorithm, the means refers to computing the average of a set

of points. In one sentence, when in the algorithm is the average of a set of points computed?

19. [3 pts] If m is the size of the vector, n is the number of vectors (items), k is the number of

clusters, and i is the number of iterations, what is the computing time for the k-means

algorithm?

20. [3 pts] Given the two strings: “satisfactory” and “satisfying”, what is their minimum edit distance

assuming the operations (replace, delete, insert) all have a count of 1?

21. [3 pts] In one sentence describe what the WordNet system provides.

22. [3 pts] Given N as the number of documents, is the time to train the documents according to the

Rocchio method O(N), O(N log N) or O(N2)?

23. [3 pts] Given 100 documents divided into three clusters where cluster 1 has 10 related

documents, cluster 2 has 5 related documents and cluster 3 has 10 related documents, what is

the Purity Index of this clustering?

24. [4 pts] There are three criteria that define a good clustering algorithm, describe two:

25. [4 pts] Featured snippets are Google’s attempt to answer the query right on the search results

page. There are 3 types of featured snippets. Mention any two of them.

26. [4 pts] There are four different approaches mentioned in class for evaluating clustering

algorithms. Mention any two of them.

27. [4 pts] What are long tailed keywords.

28. [4 pts] When viewed as a graph, a knowledge graph is what sort of graph? Use conventional

graph terms. We are expecting at least two graph properties.

29. [6 pts] There are 6 different query types supported by Solr. Mention any four of them.

30. [4 pts] There are 2 approaches discussed in class to handle non-word spelling error correction.

What are they?

Monday, November 26, 2018, 8:00am – 9:00am

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com