CS代考 EECS 485 Final Exam Fall 2021 SOLUTION

EECS 485 Final Exam Fall 2021 SOLUTION
This is a 120 minute, open-note exam.
Allowed resources
● You may use any notes or other resources, including online resources. ● You may use a compiler, IDE, or other programming tools.

Copyright By PowCoder代写 加微信 powcoder

Collaboration or assistance is prohibited
● You are NOT allowed to collaborate with anyone else to answer these questions.
● You are NOT allowed to solicit assistance from anyone to answer these questions.
● For example, reading existing posts on stackoverflow.com is OK, but posting or reading new questions seeking help from others on this exam is prohibited.
You are to abide by the University of Michigan/Engineering honor code. To receive a grade, please sign below to signify that you have kept the honor code pledge.
I have neither given nor received aid on this exam, nor have I concealed any violations of the Honor Code.
Signature: _________________________________________ Name: _________________________________________ Uniqname: _________________________________________ UMID: _________________________________________
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
Multiple Choice (40 points)
1. (Placeholder for consistent numbering with GradeScope.)
2. Noah wants to open a bakery, so in order to create the ultimate bread, he builds a multi-threaded web crawler for recipes. What is most likely to cause his crawler to redundantly visit some pages in multiple threads?
A. ◯ Threads have separate stacks, so they do not know which URLs have been visited in other threads.
B. ◯ Threads share a heap, so they retrieve URLs from the same object in memory.
C. ◯ThethreadsinthebreadcrawlerpopURLsfromaglobalqueuewherepopisnotan
atomic operation.
D. ◯Thebreadcrawlerisalsomulti-process,andeachprocesshasthesamedata.
E. ◯ Some of the threads are terminated by the OS before they can complete their tasks.
3. Which of the following is FALSE about threads, processes, and sockets?
A. ◯ Sockets must be bound to a single port.
B. ◯ It is impossible for a user-level thread to run outside of a process.
C. ◯Thesamesocketcanbeusedforbothsendingandreceivingdata.
D. ◯Thread-safemulti-threadedprogrammingcanbeachievedwithoutusinglocks.
E. ◯ Processes can only share data using networking.
4. Imagine we change Project 4 so that TCP messages include the message length. What is now possible as a consequence of this change?
A. ◯ The Manager can listen for heartbeats and perform fault tolerance in the same thread.
B. ◯ The Manager no longer needs to assume that Workers cleanly close TCP connections.
C. ◯TheManagerorWorkersmightencounterConnectionRefusedErrors.
D. ◯TheManagermightaccidentallysendthewrongtaskstothewrongWorkers.
E. ◯ The server can easily be adapted to execute multiple MapReduce jobs in parallel.
5. Which of the following is the LEAST likely explanation for why it might take longer to access a particular web page today than it did yesterday?
EECS 485 Fall 2021 2/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
A. ◯ The packets that are sent as part of the HTTP request are routed to different regions of the planet than they were yesterday.
B. ◯ There is significantly more network congestion.
C. ◯Theclienthastoomanyprocessesrunning.
D. ◯Apoweroutagehascausedalotofrouterstoshutdown.
E. ◯ The client is watching a live-streamed video, which uses UDP instead of TCP.
6. Ariel is looking for new video games, so she inputs her favorite games to a recommender system that uses content-based filtering. Unfortunately, the results are too similar to games she’s played before. What would be the WORST solution to this problem?
A. ◯ The website could use hybrid filtering instead of content-based filtering.
B. ◯ The recommender system could use a better classification algorithm.
C. ◯Moreuserscouldaddtheirfavoritegamestothewebsite.
D. ◯Thewebsitecouldaddmoredetailedcharacteristicstoeachgameinitsdatabase.
E. ◯ Ariel could add a more diverse range of games to her list.
of the following is TRUE about a search engine that returns just one result?
A. ◯ Mean reciprocal rank would be the best metric to evaluate the search engine.
B. ◯ The precision would never increase if the search engine returned two results instead
C. ◯Kendall’sTaucouldnotbeusedasmetricforthistypeofsearchengine.
D. ◯AandC.
E. ◯ All of the above.
8. What is the minimum number of links that can be added to this graph to achieve the following PageRank ordering: PR(A) = PR(B) > PR(C) > PR(D)? The existing links cannot be removed. Assume the damping factor is 0.85.
EECS 485 Fall 2021 3/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
E. ◯ None of the above
9. Which is FALSE regarding the PageRank and HITS algorithms?
A. ◯ We can improve project 5 with the HITS algorithm by only adding a preprocessing step without modification to the Search or Index server.
B. ◯ HITS and PageRank are influenced by the words and links on a webpage.
C. ◯HITSandPageRankrelyontheentirecorpusofdocumentsforcomputation.
D. ◯HITSandPageRankdonotperformwellwithsinknodes.
E. ◯ All of the above.
10. (1) Compute the Jaccard similarity using 2-shingles between the documents below. (2) Compute the estimated Jaccard similarity using MinHash with the following hash function h(x) = len(x). For example, h(“chicken”) = 7.
Document 1: I enjoy working on the EECS 485 projects
Document 2: I enjoy watching the EECS 485 lectures
A. ◯5/10;1
B. ◯3/10;0
C. ◯5/10;0
D. ◯3/10;1
E. ◯1/10;1
11. Which of the following is TRUE about scaling a web search engine and Project 5?
A. ◯ Block sort-based indexing would improve the efficiency of our MapReduce pipeline in project 5.
B. ◯ If an Index server fails in project 5, we would be unable to access 1/3 of our documents.
C. ◯Ifwesegmentedtheinvertedindexbyterminproject5,onlythepipelinewould need to change.
D. ◯Ourimplementationhandlesafrequentlychangingdatasetwell. Fall 2021 4/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________ E. ◯ None of the above.
12. Insta485 is looking to decrease the latency in fetching data for users’ feeds. How can they achieve this?
A. ◯ Buying more storage for their existing servers.
B. ◯ Sharding the database by username and creating a copy of each shard.
C. ◯UtilizingaCDNtodisplaymediacontentsuchasvideosandphotos.
D. ◯Addingmorecolumnstotheirpoststabletoallowformorespecializedandspecific
13. Jacob and Ali build a live soccer game score tracking app. Uptime is more important than up-to-date scores. The user base is large and global. What is the best design for the database?
A. ◯ Unified PostgresSQL database server on AWS.
B. ◯ Partitioned RDBMS database.
C. ◯PartitionedNoSQLdatabase.
D. ◯SQLitedatabaseonAWSEC2.
E. ◯ Any of the above.
14. How many DNS lookups are required to resolve to visit eecs485.org, umich.edu/computer-science, and google.com? Assume the nameserver cache only contains the IP of the umich.edu DNS server and that the initial request from the client to the local nameserver does NOT count.
A. ◯4 B. ◯6 C. ◯8 D. ◯10 E. ◯12
15. Which of the following is an example of containerization? A. ◯ A Windows machine running WSL2
B. ◯ A Linux machine with 1 GB of RAM running 300 Chrome tabs
C. ◯AMacOSmachinerunning3instancesofaWindowsvirtualmachine
D. ◯AnAmazonEC2instancerunningtheinsta485webserver
E. ◯ None of the above
Fall 2021 5/15
The correct answer is actually 7. All answers were accepted.
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
16. Which application is the most suitable for a system that prioritizes consistency and also makes multiple copies of its database?
A. ◯ A website that documents and presents a lot of important/sensitive information to the general public
B. ◯ A social media platform that expects millions of reads/writes per day
C. ◯Alivestreamingplatformthatshowcasesalivefeedofcomments/messagesposted
by users every second
D. ◯Apopularvideosharingplatformthattracksthenumberofviews,likes,and
comments for each video
17. Which is an example of when Blockchain should be used?
A. ◯ Calorie Intake: Counting the calories you intake everyday
B. ◯ Video Game Logging: Your Xbox logging every day that you play Fifa (a soccer
video game)
C. ◯SongTracking:KeepingtrackoftheI /
D. ◯SellingDigitalArt:Amarketplacefordigitalartworkforcreatorstoproveownership of products
E. ◯ Workout Tracker: Your apple watch tracking each step you take throughout the day
18. Grace is bidding for a new mechanical keyboard. The seller receives these bids, in the
following order:
What type of auction is this? A. ◯ Proxy Bid
Fall 2021 6/15
Jacob: $80
DeOrio: $65
Grace: $90
Jared: $65
Nanki: $30
Alice: $55
Grace wins the auction and pays the seller $80.
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
B. ◯ Sealed-Bid, Second Price
C. ◯Sealed-Bid,FirstPrice
D. ◯EnglishAuction
E. ◯ Dutch Auction
19. Which Bitcoin miner is most likely to control the next block?
A. ◯ Whoever has been a miner for longer
B. ◯ The miner with the higher computational power
C. ◯Theminerwiththemostdevices
D. ◯Theminerthatholdsthemostcryptocurrency
E. ◯ All of the above
20. What is the most significant disadvantage of the blockchain proof-of-work?
A. ◯ Wasted electricity
B. ◯ Insufficient reward for miners
C. ◯Minersmustconstantlymonitortheirnodeswhenmakingablock
D. ◯Minersgetverydirty
E. ◯ Miners don’t know when their proof-of-work is complete
21. Which of the following is correct about Tor encryption?
Fall 2021 7/15
A. ◯ Only the first relay node knows the source IP address.
B. ◯ Every node can know its previous connection’s IP address.
C. ◯OnlythelastrelaynodeknowsthedestinationIPaddress.
D. ◯YoucouldbreakanonymitybysurveillingALLnodesintheTorcircuit.
E. ◯ All of the above
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
Free Response 1: Mine485 (30 points)
Implement a Manager for a distributed bitcoin mining pool. The Manager will communicate over TCP with a variable number of Workers.
Each time the Manager receives a new TCP connection, it should handle all future communication on the returned `clientsocket` object in a **new thread**.
The Manager should communicate on the same clientsocket for all incoming and outgoing connections with a given Worker for the lifetime of the program, without ever closing the socket. You can assume that all incoming connections are from valid Workers.
For sending and receiving to occur on the same socket, the Manager needs to receive a sentinel value from the Worker to indicate the end of a message. For this problem, assume the Worker will send to indicate the end of its message.
Worker Registration
The first time a Worker communicates with a Manager, it will send a register message:
“message_type”: “register”,
Task Delegation
The Manager and Workers will attempt to validate a block+nonce combination. Use `self.get_task_msg` to get a properly formatted task message to send to the worker. The first message should be sent immediately after a `register` is received.
The Worker, which is already implemented, will attempt to validate the block+nonce combination and return a message indicating the result:
“message_type”: “task_result”,
“result”: “success” or “failure”
“blockid”: int,
“nonce”: int,
Message Handling
EECS 485 Fall 2021 8/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
Upon receiving a success response from a Worker, the Manager should call `self.record_and_advance`with the successful nonce, before sending that Worker another task. Upon receiving a failure response, the Manager should just send the Worker another task. Assume that `Manager::record_and_advance()` and `Manager::get_task_msg()` are both implemented correctly to avoid race conditions.
Implement the empty sections of the Manager class definition below. You may not create any additional member variables or any global variables:
class Manager():
send_msg_to_worker(self, clientsocket, message_in): “””Send a message to a worker.”””
# Assume implemented correctly
# Sends message to a worker on clientsocket
# Takes message_in: a python dictionary
# Takes clientsocket: a socket already connected to a worker
get_task_msg(self, last_blockid):
“””Form a response message for a worker.”””
# Assume implemented correctly
# Takes in last_blockid:
# If last_blockid is -1,
# Returns a correctly formatted response message to send to the worker
# detailing the task for the worker to process
record_and_advance(self, valid_nonce):
“””Record the valid nonce for this block and set the next block.””” # Assume implemented correctly
# Takes in valid_nonce: The valid nonce that is associated with
the current block
the id associated the last block that
a particular worker was working on.
assumes the worker has not seen a block before
(22.1) Implement the Manager constructor below to listen for incoming connections. Each time the Manager receives a new TCP connection, it should handle all future communication on the returned `clientsocket` object in a **new thread**.
def __init__(self, port):
self.worker_threads = []
EECS 485 Fall 2021 9/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
self.shutdown = False
while not self.shutdown:
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listen_sock:
listen_sock.setsockopt(
socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
listen_sock.bind((“localhost”, port)) listen_sock.listen()
while True:
clientsocket, _ = listen_sock.accept()
except socket.timeout:
self.worker_threads.append(
threading.Thread(
target=self.communicate_with_worker,
args=(clientsocket,)
self.worker_threads[-1].start()
(22.2) Implement the communicate_with_worker function below. The Manager should communicate on the same clientsocket for all incoming and outgoing connections with a given Worker for the lifetime of the program, without ever closing the socket. Hint: data that is received via recv() will be a stream of bytes. To convert a Python string ‘example’to bytes representation, do b’example’.
After receiving the message from a Worker, the Manager should call self.handle_message to handle the corresponding message.
def communicate_with_worker(self, clientsocket):
“””Communicate with a worker on a socket that is always open.””” with clientsocket:
while not self.shutdown:
message_chunks = []
while True:
data = clientsocket.recv(4096)
except socket.timeout:
if data[-1] ==
EECS 485 Fall 2021 10/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
message_chunks.append(data[:-1])
message_chunks.append(data)
# Decode UTF8 and parse JSON data message_bytes = b”.join(message_chunks) message_str = message_bytes.decode(“utf-8”) try:
message_dict = json.loads(message_str) except json.JSONDecodeError:
self.handle_message(message_dict, clientsocket)
(22.3) Implement the handle_messagefunction below, which handles the message in message_dict and then communicates with the worker using the same clientsocket.
def handle_message(self, message_dict, clientsocket):
# clientsocket still open, send back a message
# Register message
if message_dict[“message_type”] == “register”:
response = self.get_task_msg(-1) self.send_msg_to_worker(clientsocket, response)
# Success response
elif message_dict[“message_type”] == “task_result” \
and message_dict[“result”] == “success”: self.record_and_advance(message_dict[“nonce”]) response = self.get_task_msg(message_dict[“blockid”]) self.send_msg_to_worker(clientsocket, response)
elif message_dict[“message_type”] == “task_result” \ and message_dict[“result”] == “failure”:
response = self.get_task_msg(message_dict[“blockid”]) self.send_msg_to_worker(clientsocket, response)
EECS 485 Fall 2021 11/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
Free Response 2: Spotify485 Wrapped (30 points)
Write a MapReduce pipeline for a Spotify recommender system. The pipeline will output the top three artists for each user’s favorite genre.
Similarly to how you implemented Project 5, write your Map and Reduce code in Python3 and use the Hadoop Streaming Interface with key-value pairs separated by a tab (\t) character. Read input from stdin and write output to stdout. You must use all outlined stages to receive credit. You must use itertools.groupby to implement the reduce stages.
You will implement a two-stage MapReduce pipeline, where the input to map2.pyis made up of the output from reduce1a.py and the output of reduce1b.py . You will implement map1a.p,yreduce1a.py, map2.p,yand reduce2.p;ymap1b.pyand reduce1b.pyare already implemented for you. Notethatthelineformatfromreduce1a.py and reduce1b.py maydiffer.Theoutputfrom reduce1a.py andreduce1b.py willbescrambled and passed as the input to map2.py.
For example, if the outputs from reduce1a and reduce1b are:
reduce1a output: Line A
reduce1b output: Line C
Then one potential input to map2.pycould look like: Line A
map1a.pywill receive the following as input with items separated by a comma. The input represents listening data, where one row represents a listening session of one user listening to one song. The duration represents the number of seconds the song was listened to during that session.
, , , ,
● user_id: int
● song_name: string
EECS 485 Fall 2021 12/15
Licensed under Creative Commons Attribution-NonCommercial 4.0 License. https://creativecommons.org/licenses/by-nc/4.0/

uniqname: _________________________
● duration: int (ex: 180, 135, 210)
● genre: string (ex: “Pop”, “Rock”, “R&B”)
● artist: string (ex: “Usher”, “Adele”, “ ”)
Example Input:
18367, Hello, 135, Pop, Adele
18367, Hello, 180, Pop, Adele
18367, All Too Well, 31, Pop, 15398, Skate, 172, R&B, Silk Sonic 15398, Hello, 150, Pop, Adele
25302, Grenade, 200, Pop,
Reduce1b Output
Assume that the top 3 artists for each genre are output from reduce1b.py in the following format: \t , ,
Final Output
The final output should be the top three artists in each user’s favorite genre. Output one line per user. Separate each item with a comma.
, , , ,
Example Output:
18367, Pop, , Adele, 25302, Pop, , Adele, 15398, R&B, The Weeknd, Silk Sonic, SZA
Ranking Strategy
A user’s favorite genre is determined by the genre they have listened to for the longest time. If a user has listened to multiple genres for the same amount of time, you may arbitrarily select any most-listened-to genre

程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com