EECS 485 Final Exam Spring 2021 SOLUTION
This is a 120 minute, open-note exam.
Allowed Resources
● You may use any notes or other resources, including online resources.
Copyright By PowCoder代写 加微信 powcoder
● You may use a compiler, IDE, or other programming tools.
Collaboration/Assistance is Prohibited
● You are NOT allowed to collaborate with anyone else to answer these questions.
● You are NOT allowed to solicit assistance from anyone to answer these questions.
● For example, reading existing posts on stackoverflow.com is OK, but posting or reading
new questions seeking help from others on this exam is prohibited.
You are to abide by the University of Michigan/Engineering honor code. To receive a grade,
please sign below to signify that you have kept the honor code pledge.
I have neither given nor received aid on this exam, nor have I concealed any violations of the Honor Code.
Signature: Name: Uniqname: UMID:
_________________________________________ _________________________________________ _________________________________________ _________________________________________
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
Multiple Choice (40 points)
Q1. [HONOR CODE]
2. When would a single process, multi-threaded program outperform a multi-process program?
A. ( ) A. A multi-tab web browser; runtime errors in one tab leave other tabs intact.
B. ( ) B. A multi-window web browser; each window can run Javascript code
simultaneously.
C. (X)C.AGFSmasterservercapableofhandlingsimultaneousconnections;the chunk data is large and stored in program memory.
D. ()D.AGFSmasterservercapableofhandlingsimultaneousconnections;thechunk data is large and stored in a SQL database.
E. ()E.Noneoftheabove
3. Which of the following is true about threads, processes, and sockets in Python?
A. ( ) A. A single thread can only create one socket at a time.
B. ( ) B. Threads cannot execute in parallel because Python is interpreted, not compiled.
C. ()C.Twothreadscancommunicatebymodifyingasharedvariable,butnotbyusinga
D. (X)D.Twoprocessescancommunicatebyusingasocket,butnotbymodifyinga shared variable.
E. ( ) E. More than one of the above is true.
4. Which of the following is true about TCP and UDP?
A. ( ) A. Establishing a UDP connection takes more steps than TCP.
B. (X)B.TCPadaptstonetworkcongestionbutUDPdoesnot.
C. ()C.Packetswithidenticalsource,destination,andtimestampsmaytravelthrough different IP routers over UDP, but will always travel through the same IP routers over TCP.
D. ()D.ATCPACKwillnevergetlostinacongestednetwork,butaUDPACKcould.
E. ( ) E. It is better to use TCP for DNS than UDP because it is important that a DNS packet
does not get lost.
5. Justin is trying to log in to canvas.umich.edu but it is taking a long time to load. What could be the cause? Assume that the canvas.umich.edu server is in the BBB.
A. ( ) A. A firewall on Justin’s computer is preventing him from connecting to a DNS nameserver.
B. ( ) B. Justin’s computer has a poorly implemented TCP congestion control system.
C. ()C.Justin’sIPaddressisblockedbytheUniversity.
D. ()D.Theumich.eduDNSservercachewasunexpectedlyinvalidatedandnotupdated.
E. (X) E. Any of the above.
6. Drew’s YouTube documentary about chickens has only 100 views while his Python tutorial has millions. What is a likely explanation?
A. ( ) A. Content-based filtering makes it more difficult for fans of niche video categories like agricultural documentaries to discover similar videos.
B. (X)B.AlltheviewsonDrew’schickenvideoarefromviewerswhodonottypically
watch chicken videos and who accessed it directly from his channel page, so
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
K-nearest-neighbors would not recommend the chicken video to a typical chicken
enthusiast.
C. ()C.User-basedfilteringwouldrecommendthechickenvideotoauserwhohasnever watched a video about chickens before if they also watch a lot of Python coding tutorials, but content-based filtering would not.
D. ()D.Morethanoneoftheabove
E. ( ) E. None of the above.
7. Which of the following is true about Lucky485: a search engine that always returns a single match? Assume that any valid query has at least two relevant results.
A. ( ) A. The precision will always be 100%.
B. ( ) B. The recall will always be 100%.
C. ()C.Theprecisionwillalwaysbe<100%.
D. (X)D.Therecallwillalwaysbe<100%.
E. ( ) E. None of the above.
8. Which of the following observations about information retrieval influenced PageRank?
A. ( ) A. An uncommon word is a better signifier of document similarity than a common
B. ( ) B. If two documents are considered as vectors in n-dimensional space, two similar
documents have vectors with a small angle between them.
C. (X)C.Awebpagethatislinkedtobyotherimportantwebpagesisitselfmore important.
D. ()D.Awebpagethatlinkstootherimportantwebpagesisitselfmoreimportant.
E. ()E.Alloftheabove.
9. Consider the following three documents:
1: “I am going home to eat dinner soon with Will.” 2: “The chef will make dinner soon.”
3: “If you are going home please text me.”
Which of the following is true? Assume case insensitive and no stopwords.
A. ( ) A. The Jaccard similarity score between Document 1 and Document 2 with shingle
size 1 is less than the Jaccard similarity score between Document 1 and Document 3
with shingle size 1.
B. ( ) B. The Jaccard similarity score between Document 1 and Document 2 with shingle
size 1 is less than the Jaccard similarity score between Document 2 and Document 3
with shingle size 1.
C. ()C.TheJaccardsimilarityscorebetweenDocument1andDocument2withshingle
size 2 is less than the Jaccard similarity score between Document 1 and Document 3
with shingle size 2.
D. ()D.TheJaccardsimilarityscorebetweenDocument1andDocument2withshingle
size 2 is less than the Jaccard similarity score between Document 2 and Document 3
with shingle size 2.
E. (X) E. Documents 1 and 2 have the highest Jaccard similarity with both shingle
size 1 and shingle size 2.
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
10. A search engine uses only cosine similarity of tf-idf scores to produce search results. The top hit for “Pineapple” is currently the Company. How can Justin modify his website to increase its relevance in searches for the word “Pineapple”?
A. (X)A.Createapagethatconsistsofjusttheword“pineapple”repeated1million times.
B. ( ) B. Move all the content from subpages to the home page.
C. ()C.HavetheDolePineappleCompanyincludelinkstoJustin’scompany’swebsiteon
the Dole homepage.
D. ()D.IncludelinkstotheDolePineappleCompany’swebsiteonJustin’scompany’s
E. ( ) E. Ask the search engine to index many of Dole’s rival pineapple companies’ web
11. Consider a web link graph:
Which is true about PageRank on the graph after convergence? Assume initialization and damping factor as in lecture.
A. ( ) A. PR(D) > PR(C) > PR(A) > PR(B).
B. (X)B.PR(D)>PR(C)>PR(A)=PR(B).
C. ()C.PR(D)>PR(C)>PR(B)>PR(A).
D. ()D.PR(D)>PR(C)=PR(A)=PR(B).
E. ( ) E. PR(A) > PR(B) > PR(C) > PR(D).
12. Which of the following is true about web crawling:
A. ( ) A. No search engine crawls the dark web.
B. ( ) B. Web crawlers look to a site’s robots.txtto see what pages they are allowed to index
C. ()C.Webpagesdetectawebcrawler’sIPaddressandlimitthenumberofrequestsit
can make per second to their site
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
D. (X)D.BandC
E. ( ) E. None of the above
13. Maryam runs a static website and hosts it on premises in the US. There is an increase in global traffic. What is the best solution?
A. ( ) A. Rent Infrastructure as a Service hosting
B. ( ) B. Rent Platform as a Service databases
C. ()C.Buymorephysicalservers
D. (X)D.HostherwebsiteonaContentDeliveryNetwork
E. ( ) E. C and D are both equally good options
14. A DNS cache contains only entries to the root server and to .edu. How many DNS lookups are required to resolve www.puppies-and-kittens.blogspot.com?
B. ( ) B. 2
E. ()E.Noneoftheabove
The correct answer is actually E, or 5. This calculation doesn’t take into account the www subdomain.
15. A distributed network database has 2 servers. All queries starting from A-M are failing, but queries starting with N-Z return correctly. Which of the CAP theorem properties most likely apply?
A. (X)A.Consistentandpartition-tolerant,butnotavailable
B. ( ) B. Consistent and available, but not partition-tolerant
C. ()C.Availableandpartition-tolerant,butnotconsistent
D. ()D.EitherAorB
E. ()E.EitherBorC
16. What is the best distributed network database choice for a social media site that expects a large volume of new posts?
A. ( ) A. sqlite
B. ()B.mySQL
C. (X)C.postgreSQL
D. ()D.bothsqliteandmySQLareequallygoodoptions
E. ( ) E. You should not use a relational database management system for this site.
17. Justin is bidding in an auction for a camera online. Here is what the users see during the auction:
Patrick: $210 Spongebob: $220 Justin: $230 Squidward: $240 Patrick: $260 Squidward: $270 Justin: $300
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
Winner: for: $300
What type of auction is this?
A. ( ) A. Open proxy auction
B. (X)B.Englishstyleopenauction
C. ()C.Sealedbid2ndpriceauction
D. ()D.Sealedbid1stpriceauction
E. ( ) E. Dutch style open auction
18. Which of the following is true about Blockchain and Bitcoin?
A. ( ) A. Bitcoin is a decentralized fiat system that is built using blockchain
B. ( ) B. Any user that owns 31% or more of the bitcoin servers can create a fake
blockchain of transactions
C. ()C.Blockchainisthebestsolutionforproblemsinvolvingmultiplewriterswhodonot
trust each other
D. ()D.BlockchainminingiscomputationallyexpensivetopreventinflationofBitcoin
E. (X)E.AandD
19. Which of the following is a reason to use Bitcoin over an altcoin?
A. ( ) A. Anonymous transactions
B. ( ) B. Smart contracts
C. ()C.Lessexpensivemining
D. ()D.Incentivesformining
E. (X) E. None of the above
20. Which is true about Tor?
A. ( ) A. It is impossible to identify which users are browsing through Tor
B. (X)B.Ina3-nodeTorrelay,ausercanbede-anonymizedifanattackercontrols
both the entry and exit nodes that a user’s request goes through
C. ()C.TheTorbrowsercanonlyaccesswebpagesonthedeepanddarkweb
D. ()D.Duetotheonionrouting,Tor’snetworkspeedisfasterthanotherbrowsers
E. ()E.Noneoftheabove
21. Maryam wants to watch Avatar: The Last Airbender on Netflix but it’s not available for
streaming in the US. She knows it is available in Germany. Which of the following could she use to access the show?
A. ()A.AVPNproxylocatedintheUS
B. ( ) B. A VPN proxy located in Germany
C. ()C.ATorbrowserwithanentrynodeinGermany
D. ()D.ATorbrowserwithanexitnodeinGermany
E. (X)E.BorD
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
Free Response 1: Distributed Document Lookup (30 Points)
Implement a distributed document index server. Like your Project 5, clients make requests to a search server, which forwards them to an index server. Unlike your Project 5, the index is segmented by document. The index parent server makes concurrent requests to index child servers using threads. Below is an example of this architecture with three child servers. In this you are implementing the parensterver.
The parent server main thread listens on a TCP socket for search queries. Upon receiving a query, it creates a new thread to handle the request. A message from the search server to the parent index server looks like this:
“query”: “Michigan football players”,
The thread that handles a search query makes concurrent requests to each child server using the Python requests library. These requests are slow, so make each request in a separate thread. Assume the child servers are all implemented correctly and will not fail. A message from the parent index server to a child index server looks like this:
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
“query”: “Michigan football players”,
A child server returns response data like this:
“query”:”Michigan football players”, “results”:[
“docid”:4850,
“score”:0.87
“docid”:1817,
“score”:0.92
Another child server might return a response like this:
“query”:”Michigan football players”, “results”:[
“docid”:2900,
“score”:0.23
Once all the child servers respond, the parent server combines the results it receives from the child servers and responds to the search server on the original TCP socket. Do not worry about
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
sorting the results, and assume any document is indexed by at most 1 child server. For example:
“query”:”Michigan football players”, “results”:[
“docid”:4850,
“score”:0.87
“docid”:1817,
“score”:0.92
“docid”:2900,
“score”:0.23
The Parent server class is defined and the constructor is shown below.
class Parent:
def listen_for_queries(self):
# Implement in Part 1
def handle_query(self, query, clientsocket):
# Implement in Part 2
def request_from_child(self, child_url, query, responses): # Implement in Part 3
def make_response(self, responses, query, clientsocket): # Implement in Part 4
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
def __init__(self, host, port, child_urls): self.child_urls = child_urls # Child server URLs self.host = host # Parent server host self.port = port # Parent server port self.listen_for_queries()
You may assume that all standard libraries are imported.
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
1: Listen for queries on a TCP socket. Continuously accept queries, creating a new thread running handle_queryto handle each query. This function returns nothing. Hint: sock.accept()returns a tuple containing a socket that can both send and receive messages.
def listen_for_queries(self):
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) sock.bind((self.addr, self.port))
sock.listen()
while True:
clientsocket, _ = sock.accept()
message_chunks = []
while True:
data = clientsocket.recv(4096)
if not data:
message_chunks.append(data)
message_bytes = b”.join(message_chunks) message_str = message_bytes.decode(‘utf-8’) msg = json.loads(message_str)
t = threading.Thread(target=self.handle_query, \ args=(msg[‘query’], clientsocket))
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
2: Handle a single query by forwarding it to each child server concurrently. Create a new thread runningrequest_from_chilfdor each request to each child server. Callmake_response after every child server has responded. This function returns nothing.
def handle_query(self, query, clientsocket): responses = []
req_threads = []
for child_url in self.child_urls:
t = threading.Thread(target=self.request_from_child, args=( child_url, query, responses
req_threads.append(t)
[thread.join() for thread in req_threads] self.make_response(responses, query, clientsocket)
3: Request from parent to a single child server. Modify theresponseslist object passed in as a parameter and add in the response from this child server. This function returns nothing.
def request_from_child(self, child_url, query, responses):
resp = requests.get(child_url, data={‘query’: query}) responses.append(resp.json()[‘results’])
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
4: Combine each child response into a single response. This function should be called at the end of your handle_queryfunction after every child has responded, it should use a TCP socket to send the final response back to the search server. This function returns nothing.
def make_response(self, responses, query, clientsocket):
responses = [result for results in responses \
for result in results]
message = {
“query”: query,
“results”: responses
message_str = json.dumps(message) message_bytes = str.encode(message_str) clientsocket.sendall(message_bytes) clientsocket.close()
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
Free Response 2: Vacation Destination Picker (30 Points)
Write a two-stage MapReduce pipeline that could be used by a vacation recommender system. The input is user rating and cost info. The output is grouped by season, average flight cost, and user ratings.
Similar to how you implemented Project 5, write your Map and Reduce code in Python 3 and use
the Hadoop Streaming Interface with key-value pairs separated by a tab (\t) character.
Remember to read input from stdin and write output to stdout.NOTE: Solutions that only use one stage of the MapReduce pipeline will not receive any points.
The input will be comma separated values in the following format:
● username: string
● destination: string
● flight_cost: int (between 50 – 10000)
● season: string (“summer”, “fall”, “winter”, “spring”)
● rating: float (between 0.0 – 5.0)
myounus, Venice, 510, winter, 4.5
jmapple, Tokyo, 619, spring, 4.8
awdeorio, Seoul, 610, spring, 4.0
jag, Seoul, 811, summer, 3.9
mickeymouse, Seoul, 611, spring, 4.2
flynnrider, Tokyo, 923, summer, 4.9
chickenlittle, Venice, 712, fall, 3.5
spongebob, Dallas, 251, summer, 2.0
The final output displays the destinations from each season ranked from best to worst match in
the following format:
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/
● average_cost_per_seasoshnouldbecalculatedbasedontheaveragecostofa destination within 1 sea. son
● average_ratingshouldtakeintoaccountalluserratingsregardlessofwhatseason they visited in.
NOTE: Both average_cost_per_season and average_rating should be float types not round them during your calculations.
Example output:
Spring, Seoul, 610.5, 4.03
Spring, Tokyo, 619, 4.85
Summer, Dallas, 251, 2.00
Summer, Seoul, 811, 4.03
Fall, Venice, 712, 4.00
Winter, Venice, 510, 4.00
Notice how Seoul shows up both in Spring and Summer withdifferent average flighatndcosts the same average r.ating
## Sorting Entries: Equation
Use the following to print your final output in the correct order. Normalize your average cost for each destination per season to the maximum flight cost, and normalize your average rating for each destination to the maximum rating. Here are the normalization equations for one destination:
𝑎𝑣𝑔𝐶𝑜𝑠𝑡𝑑− 𝑚𝑖𝑛(𝑐𝑜𝑠𝑡)
Norm_cost = 𝑚𝑎𝑥(𝑐𝑜𝑠𝑡) − 𝑚𝑖𝑛(𝑐𝑜𝑠𝑡) Norm_rating =
𝑎𝑣𝑔𝑅𝑎𝑡𝑖𝑛𝑔 𝑑− 𝑚𝑖𝑛(𝑟𝑎𝑡𝑖𝑛𝑔) 𝑚𝑎𝑥(𝑟𝑎𝑡𝑖𝑛𝑔) − 𝑚𝑖𝑛(𝑟𝑎𝑡𝑖𝑛𝑔)
Once you have calculated the normalization factors for both the cost and the rating, use the following equation to find the best matches. This match_factor value should be used to sort your output in descending owridtheinr each season:
Match_factor = (0.3 * 𝑛𝑜𝑟𝑚𝑅𝑎𝑡𝑖𝑛𝑔) + (0.7 * 𝑛𝑜𝑟𝑚𝐶𝑜𝑠𝑡) 2
You may assume that all standard libraries are imported.
Licensed under Creative Commons At
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com