CS代考 Data C100/C200, Final Spring 2022

Data C100/C200, Final Spring 2022
Student ID:
Exam Room:
Name and SID of left neighbor: Name and SID of right neighbor:

@berkeley.edu
Instructions:
This final exam consists of 160 points spread out over 14 questions and the Honor Code and must be completed in the 170 minute time period ending at 10:00, unless you have accommodations supported by a DSP letter.
Note that some questions have circular bubbles to select a choice. This means that you should only select one choice. Other questions have boxes. This means you should select all that apply. Please shade in the box/circle to mark your answer.
Q0 [1 Pt]: Honor Code
As a member of the UC Berkeley community, I act with honesty, integrity, and respect for others. I am the person whose name is on the exam and I completed this exam in accordance with the Honor Code.
Signature:

Data C100/C200 Final, Page 2 of 28 SID: This page has been intentionally left blank.

Data C100/C200 Final, Page 3 of 28 SID:
1 Coo. . . You Want in That? [15 pts]
Brewster, a coffee barista, has opened a new cafe ́ called “The Roost.” You are tasked with helping him manage his daily sales. You are provided with a pandas DataFrame roost of today’s cus- tomer transactions, the first 6 rows of which are shown below. Note that customers can only buy one drink at a time.
(a) [3 Pts] “The Roost” sells three drinks, but only two are listed in the roost DataFrame. It seems Brewster one-hot encoded the column representing the drink the customer bought. However, he accidentally dropped one of the columns. Update roost with the missing col- umn representing the third drink, “ with .”
After running your code, the first 6 rows of the updated roost DataFrame should be:
roost[‘ with ‘] =
___________________________________________________________

Data C100/C200 Final, Page 4 of 28 SID:
(b) [3 Pts] Suppose we want to build a linear model that predicts the column you just added (“ with ”):
yˆ = θ 1 x 1 + θ 2 x 2 + θ 3 x 3
Let features x1 be Price, x2 be , x3 be Iced Tea, and yˆ be your pre- dictionof with .Whatwouldbetheoptimalparameter estimates θ = 􏰍θ1, θ2, θ3􏰎T such that the model predicts yˆ exactly equal to the “ with ” column?
⃝ A. 􏰍1/2,
⃝ B. 􏰍1/20,
⃝ C. 􏰍1/2,
⃝ D. 􏰍1/20,
1/20, 1/20􏰎T −1/2, −1/2􏰎T
−1/2, −1/2􏰎T 1/20, 1/20􏰎T
⃝ E. 􏰍1/2, ⃝ F. 􏰍1/10, ⃝ G. 􏰍1/2,
1/20, −1/2,
−1/2􏰎T 1/10􏰎T
(c) [4 Pts] Suppose Brewster wants to transform the one-hot encoded data into a new DataFrame that has just three columns: the customer name Customer, drink price Price, and drink name Drink. The resulting DataFrame roost transactions should as follows:
Complete the following skeleton code. (Hint: Documentation for melt is on the following page.)
df = roost.melt([__(i)__])
df = __(ii)__.drop(columns=[‘value’])
df = df.rename(columns={‘variable’:’Drink’})
roost_transactions = df.reset_index(drop=True)
(i): ________________________________________________________
(ii): ________________________________________________________

Data C100/C200 Final, Page 5 of 28 SID: Here is a refresher on the Pandas melt method, paraphrased from Project 2A:
(d) [5 Pts] Finally, Brewster needs your help to construct a pivot table from roost transactions that looks similar to the below table. The resulting pivot table should contain a column for each type of drink, where each record now represents a unique customer and how much they spent on each type of item.
roost_transactions.pivot_table(__D__).fillna(0) Fill in the blank for D:

Data C100/C200 Final, Page 6 of 28 SID:
Excellent Purchase! (…Purchase!) [15 pts]
[6 Pts] Isabelle and have opened up competing stores on an island. They’ve each collected data, and they have asked you to help determine who is the better salesperson.
Isabelle and Tom collect data in two separate tables; each table has one record for every resident on the island, as well as how much that resident spent at that store in a particular day. You are provided two tables: one for Isabelle’s store (Isabelle) and one for Tom’s store (Tom). The first five rows of each table are shown below:

Construct a SQL query below that returns just one row that contains the total number of wins Isabelle got. Someone gets a ‘win’ if they get a resident to spend more than the other person. For example, Apollo spent more with Isabelle than Tom, so that would represent one win for Isabelle. For the first five rows, Isabelle has a total of 3 wins, since Apollo, Maddie, and Cally spent more money at Isabelle’s store.
Construct this SQL query using the skeleton code below. The result for the first five rows is shown to the right.
SELECT _(i)_ AS isabelle_wins
FROM __(ii)__ INNER JOIN Isabelle
(i): _______________________________________________________
(ii): _______________________________________________________
(iii): _______________________________________________________
[9 Pts] Isabelle and Tom also jointly own an online shopping catalogue. Residents can pur- chase items from the catalogue only if they have a bank account.
Pelly, the island’s accountant, maintains a SQL table Account that contains all owners of bank accounts and the monetary amount in each account. Isabelle and Tom also have a shared

Data C100/C200 Final, Page 7 of 28 SID:
SQL table Transaction of attempted transactions from customers through their online catalogue. Both tables are shown below:
Construct a SQL query below that returns the output table to the right, which contains a col- umn approved that indicates if a customer has enough money stored in their bank account to complete all the transactions they are at- tempting: 1 (True), 0 (False), or NULL/None (has a bank account, but did not make any transactions).
• All residents with bank accounts should be in the output table.
Transaction
• If a customer attempts to make a transaction but they don’t have a bank account, they should not be included in the output table (e.g., Goldie is not included).
• The input tables have no NULL/None values in them. Construct this SQL query using the skeleton code below.
SELECT a.resident_id, a.resident_name, _(i)_ AS approved
FROM Account AS a _(ii)_ JOIN Transaction AS t
(i): _______________________________________________________ (ii): ⃝ INNER ⃝ LEFT ⃝ RIGHT ⃝ OUTER ⃝ CROSS
(iii): _______________________________________________________
_______________________________________________________

Data C100/C200 Final, Page 8 of 28 SID:
3 ’s Visualization Trawler [8pts]
You have just bought some visualizations from the shady visualization seller, . However, there are some flaws in the visualizations he’s sold you.
While we have some clear answers in mind, we will be lenient when grading this question. As long as you correctly identify some flawed aspect of the image and explain your answer thoroughly, you will receive full credit. Answers such as ”there is no flaw” or answers about the underlying data (and not the image itself) will receive no credit. Please keep your answers concise—nothing more than a couple of sentences.
Note that the titles for these plots are given directly above the image—answers such as “bad title” or “missing title” will also receive no credit.
(a) [4 Pts]
List two aspects of this plot that are incorrect or misleading.

Data C100/C200 Final, Page 9 of 28 SID:
(b) [4 Pts] Below, “Console Type” refers to PC, Playstation and Xbox; and “Video Game Genre” refers to RPGs, Action, and Shooter.
Describe two flaws with this plot.

Data C100/C200 Final, Page 10 of 28 SID:
4 Dr. How Do You Spell That? [8 pts]
You are working with a British researcher, Dr. Who, to file a lab report in the Data 100 Research Journal. Unfortunately, the journal only accepts American versions of words, but Dr. Who’s lab report uses British words. For example “colour” is a British spelling of the word “color”.
(a) [3 Pts] First, determine the output of the re.findall statement below.
>>> sentences = [
“Please analyse the catalysers.”,
“You can’t psychoanalyse a lysed protein.”,
>>> pattern = r”\w\wyse”
>>> [re.findall(pattern, s) for s in sentences]
…[[‘alyse’, ‘alyse’], ______X______]
What result will be in the blank marked with an X? ⃝ A. []
⃝ B. [‘alyse’]
⃝ C. [‘lyse’]
⃝ D. [‘alyse’, ‘lyse’] ⃝ E. [‘lyse’, ‘lyse’]
(b) [5 Pts] Now you try to identify some British spelling idiosyncrasies in Dr. Who’s writing. Write a regular expression that will find all words that contain “our”, excluding words with 0 or 1 letters before “our”. In other words, your regular expression should find “colour”, and “favourite”, but should not find “our”, or “dour”, as they have 0 and 1 letter preceding the “our”, respectively. Unlike part (a), you want the results of your findall to be the entire word, not just some of the letters, as shown in the example below.
>>> sentences = [
“Our favourite colour is blue.”,
“I am four hours away from the harbour.”,
“I am enamoured with our tour of the arbour.”
>>> pattern = r”__B__”
>>> [re.findall(pattern, s) for s in sentences]
…[[‘favourite’, ‘colour’], [‘harbour’], [‘enamoured’, ‘arbour’]]
Fill in the blank for B such that the above code works correctly.
pattern = r”________________________________”

Data C100/C200 Final, Page 11 of 28 SID:
5 Back to the Future [3 pts]
You successfully sneaked in a survey on KPop groups and a survey on cats vs dogs on this semester’s Data 100 exams! Let’s do a math problem on the result of the survey.
(a) [3 Pts] Recall the definition of a multinomial probability from lecture:
If we are drawing at random with replacement n times, from a population broken into
three separate categories (where p1 + p2 + p3 = 1):
• Category 1, with proportion p1 of the individuals. • Category 2, with proportion p2 of the individuals. • Category 3, with proportion p3 of the individuals.
Then, the probability of drawing k1 individuals from Category 1, k2 individuals from Category 2, and k3 individuals from Category 3 (where k1 + k2 + k3 = n) is:
n! pk1 pk2 pk3 k1!k2!k3! 1 2 3
From the original results of your survey, you learn that 14% of Data 100 students are BTS fans and 24% of Data 100 students are Blackpink fans and the rest are fans of neither. Suppose you randomly sample with replacement 99 students from the class. What is the probability that the students are evenly distributed between the three different groups? Please leave your answer as an expression; there is no need to fully calculate it out.
(b) [0 Pts] Which of the surveys did you prefer? Select your favorite between the two. ⃝ A. BTS/Blackpink survey from Midterm 1.
⃝ B. Cats vs Dogs Drawing survey from Midterm 2.
⃝ C. Neither.

Data C100/C200 Final, Page 12 of 28 SID:
6 Election Perfection [22 pts]
Ariel has recently joined a data science company in Phosphorus Valley to perform data analysis for their elections. To start off, Ariel’s first task is to create a linear model that performs one task: The model must perfectly predict the winners of the local election that has 2 candidates given their vote share.
Ariel was provided the results of the past 150 elections, a sample of which is shown below. As- sume for this question that the largest vote share wins the election (i.e., strictly more than 50%), and that there are never any ties (i.e., Candidate 0 and 1 will never both get 50% of the vote share).
Candidate 0 0.35 0.12 0.51 0.65 0.40
Candidate 1 0.65 0.88 0.49 0.35 0.60
Winner Candidate 1 Candidate 1 Candidate 0 Candidate 0 Candidate 1
0.35 0.12
0.65 0.88
0  1 
Above, Ariel has constructed a design matrix
vote share and the second feature represents
output vector Y, which represents the election winners (1: Candidate 1; 0: Candidate 0). Help Ariel predict the election winners given the design matrix X and your linear modeling toolkit.
(a) [5 Pts] Suppose you limit the training data matrix to two arbitrary training points and con- struct the 2×2 data matrix X2 and the corresponding election winners Y2. Fill in the blank:
If we train a linear regression model fit to the training data X2 and Y2, we can _________ achieve zero training loss.
⃝ A. always ⃝ B. sometimes ⃝ C. never Justify your answer:
 0.51 0.65 0.40
 0.49 0.35 0.60
X, where the first feature represents Candidate 0’s Candidate 1’s vote share in each election, and an

Data C100/C200 Final, Page 13 of 28 SID:
(b) [4 Pts] Ariel decides to improve the above model by training on the entire dataset of 150 past elections. However, Ariel decides to use ordinary least squares (OLS) with a a design matrix X ̃, which is the X matrix with an added intercept feature, to compute θˆ using the normal equation: θˆ = (X ̃T X ̃)−1X ̃T Y. For your convenience, a sample of X ̃ and Y are shown below.
1 0.35 0.65
1 0.12 0.88
1 0.51 0.49 X ̃ =    
1 0.65 0.35 10.400.60
… … …
0    
0  1 
Which of the following are true about this modeling approach?
□ A.X ̃TYisofsize(150,3).
□ B.X ̃TYisavector.
□ C. Ariel cannot achieve perfect accuracy nor zero loss on the training set using this approach.
□ D. Using L2 regularization with λ > 0 is always a better choice than OLS for the dataset above.
(c) [4 Pts] Ariel decides to switch to a logistic regression model and also reverts to the original 2-feature design matrix X, with no bias term. A sample of X and Y is shown below:
0.35 0.65
0.12 0.88
Which of the following is true about Ariel’s modeling approach using logistic regression with the constructed design matrix?
□ A. The design matrix is not full column rank.
□ B. Logistic regression outputs the probability that Candidate 1 wins the election.
□ C. There is no uniquely optimal parameter value θˆ for logistic regression with this design matrix using binary cross-entropy loss.
□ D. Logistic regression would usually be used for this binary classification task.
0.51 0.49 
0.40 0.60  1 
0.35 Y=0 … … …
 1   0 

Data C100/C200 Final, Page 14 of 28 SID:
(d) [5Pts] Regardlessofyourpreviousanswer,supposethatArielfitsalogisticregressionmodel to the data in part (c). Provide an optimal parameter θˆ such that we can achieve the maximum training accuracy possible for this data. What is the maximum possible training accuracy? Justify your answer.
Maximum possible
Optimal θ: _______________ training accuracy: ____________
Justification:
(e) [4 Pts] One of Ariel’s friends from Data 100 mentions that decision trees almost always achieve perfect accuracy, at least on the training dataset! As their final modeling approach, Ariel fits a decision tree to the 2-feature design matrix X and output Y as described in part (c).
What is the height of the optimal decision tree if we use weighted node cross-entropy loss? Justify your answer. Define the ”height” of a decision tree as the maximum number of yes/no questions (a.k.a. splits) that can possibly be asked before arriving at a prediction.
Depth of optimal decision tree: _______________ Justification:

Data C100/C200 Final, Page 15 of 28 SID:
7 Error Fn: Broken [21 pts]
Alex has compiled a list of broken loss functions and requests your help to diagnose drawbacks.
For each of the loss functions L below, assume a linear model yˆ = fθ(x) = θT x. Then, the optimal parameter θˆ is a real-valued vector that minimizes the empirical risk n1 􏰒ni=1 L(yi, yˆi) with respect to θ on a n-point dataset, where datapoint i has input xi and output yi.
(a) [3 Pts] Which of the following are true of the following cubic loss function L1 when we minimize the associated empirical risk with respect to θ?
L 1 ( y , yˆ ) = ( y − yˆ ) 3
□ A. There are infinitely many optimal parameters for this loss function.
□ B. Gradient descent on this loss function diverges, i.e. approaches one or more parameters of infinite magnitude.
□ C. This loss function is non-convex.
□ D. This loss function is non-differentiable on at least one point in its domain.
(b) [3 Pts] Which of the following are true of the following negative squared loss function L2 when we minimize the associated empirical risk with respect to θ?
L 2 ( y , yˆ ) = − ( y − yˆ ) 2
□ A. There are infinitely many optimal parameters for this loss function.
□ B. Gradient descent on this loss function diverges, i.e. approaches one or more parameters of infinite magnitude.
□ C. This loss function is non-convex.
□ D. This loss function is non-differentiable on at least one point in its domain.
(c) [3 Pts] Which of the following are true of the following loss function L3 that calculates accu- racy when we minimize the associated empirical risk with respect to θ?
􏰊0 y̸=yˆ L3(y,yˆ)= 1 y=yˆ
□ A. There are infinitely many optimal parameters for this loss function.
□ B. Gradient descent on this loss function diverges, i.e. approaches one or more
parameters of infinite magnitude.
□ C. This loss function is non-convex.
□ D. This loss function is non-differentiable on at least one point in its domain.

Data C100/C200 Final, Page 16 of 28 SID: Alex now provides you with the sample dataset shown to the right, with
one feature x and response y.
(d) [6 Pts] Suppose we use a simple linear regression model with
an intercept term, i.e., yˆ = θ0 + θ1x with the loss function L4
(shown right). What is the optimal θˆ = (θˆ , θˆ ) ∈ R2 that mini- 01
mizes the empirical risk n1 􏰒ni=1 L4(yi, yˆi) on the sample dataset? Justify your answer.
36 4 25 9 16 16
(e) [6 Pts] Suppose we use a constant model, i.e., yˆ = θ with the loss function L5 (shown right). What is the optimal θˆ ∈ R that mini- mizes the empirical risk n1 􏰒ni=1 L5(yi, yˆi) on the sample dataset? Justify your answer.
L5(y, yˆ) = (yˆ −
L 4 ( y , yˆ ) = | yˆ | − | y |
θˆ : ___________ θˆ : ___________ 01
θ: ___________

Data C100/C200 Final, Page 17 of 28 SID:
8 The English Alphabet Goes α, |B|, C(ross-validation) [12 pts]
Jordan is using stochastic gradient descent to optimize a model with 1,000 parameters that can detect vehicles and pedestrians in images. However, Jordan is having trouble choosing hyperpa- rameters such as the learning rate α and the batch size |B|.
(a) [4 Pts] Complete the code snippet below to complete a function that performs num iter stochastic gradient descent updates given a dataset X and y. Assume the functions get sample and grad func have already been defined.
def get_sample(X, y, B):
Returns a random sample with replacement of X and y,
each of size B.
# Implementation not shown
def grad_func(X, y):
Returns the gradient of the empirical risk with respect to
theta given the dataset X and y.
The returned output is a np.ndarray of length 1000.
# Implementation not shown
def sgd_fit(X, y, alpha, B, num_iter=50000):
theta = np.zeros(1000)
for i in range(num_iter):
X_batch, y_batch = get_sample(X, y, B)
grad = grad_func(_______________, _______________)
theta = __________________________________________
return theta
(b) [2 Pts] As α increases beyond its optimal value, SGD will likely oscillate . Fill in the blank with the appropriate choice of less/more.
(c) [2Pts] As|B|increases,SGDwilllikelyoscillate
choice of less/more. ⃝ A. less
⃝ B. more .Fillintheblankwiththeappropriate

Data C100/C200 Final, Page 18 of 28 SID:
(d) [2 Pts] Suppose Jordan employs k-fold cross-validation to compute optimal values of the
hyperparameters α and |B|. Which of the following is true?
□ A. k-fold cross-validation will not yield useful information for s

程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts