Homework #3 (Problem Set)
CSE 3521/5521
Autumn 2021
Due: Thursday, October 14 by 11:59pm
Due: Saturday, October 16 by 11:59pm – No Late Submissions
Worth 75 points
I. Preparation
1. Read/study the assigned reading sections from AI: A Modern Approach, 3rd edition (Russell and Norvig)
2. Read/study the assigned reading sections from PRML: Pattern Recognition and machine learning
II. Collaboration & Piazza
Please read the course policy on Academic Misconduct in the syllabus. You may discuss the problems with your classmates at a high level only. Though, you must formulate your own solution without any help from others or third party sources.
In your solution, you need to list with whom you have discussed for each problem (please do so in the first page).
Do not post any part of your solution or spoilers on Piazza.
III. Guidelines
You will place your solutions to this portion of the homework (Problem Set) into a single PDF file named HW3_problems_name_dotnumber.pdf (e.g., HW3_problems_shareef_1.pdf).
It is highly suggested that you directly type your solutions and save them as a PDF file. If you write your solutions on paper and scan it you run the risk that the TA won’t be able to read your solution. In this case the TA will not give credit for any portion that is not readable.
You should write down the derivation of your answers, and highlight your answers clearly!
IV. Problems
Logical Agents – Propositional Logic [15 points]
1. State each sentence below as a Propositional Logic (not First Order Logic) sentence. Clearly write the atomic sentence that you assign to each symbol you define. The collection of all symbols assigned to atomic sentences is called the interpretation function. For example, if you want the symbol P to mean “Jane is strong”, then the interpretation function will include P = “Jane is strong”.
a. “Emily is tall and John has short hair and Jack is not smart”
(2 points) Write the propositional logic sentence
(2 points) Write the interpretation function
b. “If Jack did not take a shower then Jack is not clean”
(2 points) Write the propositional logic sentence. To receive full credit, your sentence must not contain the negation operator (¬).
(2 points) Write the interpretation function
2. (3 points) Use the enumeration method (also called model checking) to show that the following propositional logic sentence is valid. In other words, use a truth table.
(P ( Q) ( (¬ Q ( ¬ P)
3. (4 points) Given the following KB (Knowledge Base) of facts, prove S ( R
(1) ¬ P
(2) P ( (Q ( R)
(3) T ( ¬ Q
(4) T ( S
For each step of your proof: a) continue the numbering given in the KB, b) write the new sentence, c) write the line number(s) of the sentences you will use for this step, and d) write the name of the logical equivalence (see Figure 7.11 in the Russell and Norvig text) or inference rule (see the text and lecture notes).
For example if you want to prove Q given the following KB:
(1) P ( Q
(2) P
then your proof could be:
(3) ¬ P ( Q Using (1); implication elimination
(4) Q Using (2) and (3); Unit resolution
Logical Agents – First Order Logic [5 points]
1. (5 points) Using the following predicate definitions:
Person(x): x is a person
Evil(x): x is evil
Likes(x, y): x likes y
write the following statement as a First Order Logic sentence using a “Universe of Discourse”
“Someone likes everyone who is not evil”
Note: For the rest of the problems below, please perform rounding to your results after the second decimal number, unless stated otherwise. For example, 1.245 becomes 1.25 and -1.228 becomes -1.23.
Bag of words (BoW) and distances [17 points]
1. Given a dictionary { ”capital”: 1, ”state”: 2, ”team”: 3, ”basketball”: 4, ”hockey”: 5, ”professional”: 6, ”bank”: 7}, derive the BoW representation (i.e., a 7-dimensional vector) for the following three sentences without normalization:
(3 points) A: [“sacramento“’ “is “a“ “state“ “capital“ “and“ “it“ “has“ “a“ “professional“ “basketball“ “team“]
(3 points) B: [“Columbus“ “is“ “a“ “state“ “capital“ “and“ “it“ “has“ “a“ “professional“ “hockey“ “team“ “but“ “no“ “professional“ “basketball“ “team“]
(3 points) C: [“the“ “capital“ “bank“ “at“ “Cincinnati“ “has“ “a“ “professional“ “team“]
Note that, if a word shows up multiple times in the sentence, you should count it multiple times in the BoW representation. In other words, the representation can contain numbers that are not binary.
You MUST answer this question correctly before moving on to the next two questions in this section since their answers are built upon this question. If your answers for this question are wrong, your will not earn any points for your answers for the next two questions.
2. (4 points) L1 distance: Compute the L1 distance between A and B (see above); compute the L1 distance between A and C (see above).
3. (4 points) L1 normalization: Perform L1 normalization to the BoW representations of A, B, and C first, and then compute the L1 distance between A and B, and between A and C.
4. (0 points) There is no credit for this question. Please answer this for yourself. Do you see some disadvantages of vanilla BoW representations? For example, without 2-gram or higher-gram, the representation cannot tell the different meanings of “capital”. With some tokens ignored in the dictionary (like “no”), we cannot tell if Columbus has a basketball team or not. Even with “no” in the dictionary, we cannot simply tell if Columbus does not have a hockey team or a basketball team.
Histogram and Parzen window [16 points]
Figure 1 shows two datasets, each with 8 data instances.
1. (4 points) Given intervals [0, 1), [1, 2), [2, 3), [3, 4), and [4, 5)}, first construct the L1-normalized histograms for A and B. You can write each histogram as a 5-dimensional vector, where the first/second/third/fourth/ fifth element corresponds to the interval [0, 1)/[1, 2)/[2, 3)/[3, 4)/[4, 5). Then, compute the L1 distance between the L1-normalized histograms of A and B.
2. (4 points) Given intervals [0.5, 1.5), [1.5, 2.5), [2.5, 3.5), and [3.5, 4.5), first construct the L1-normalized histograms for A and B. You can write each histogram as a 4-dimensional vector, where the first/second/third/ fourth element corresponds to [0.5, 1.5)/[1.5, 2.5)/[2.5, 3.5)/[3.5, 4.5). Then, compute the L1 distance between the L1-normalized histograms of A and B.
3. (4 points) Given intervals [0.5, 1), [1, 1.5), [1.5, 2), [2, 2.5), [2.5, 3), [3, 3.5), [3.5, 4), and [4, 4.5), first construct the L1-normalized histograms for A and B. You can write each histogram as a 8-dimensional vector, following the ascending order of the intervals. Then, compute the L1 distance between the L1-normalized histograms of A and B.
4. (0 points) There is no credit for this question. Do you see that the histogram representation is sensitive to the bin (interval) locations and sizes?
5. (4 points) Now for dataset A, consider a kernel function (see Figure 2 below)
please compute the probability density p(u) for seeing a value u in the future:
(a) u = 1.5
(b) u = 2.5
The definition of p(u) is
where N is the total number of training data instances in A and un is the n-th data instance.
6. (0 points) There is no credit for this question.Do you see that to perform kernel density estimation for a value u, you have to keep all the training data?
Covariance, z-score, whitening, and PCA [22 points]
Figure 3 gives a dataset of two dimensions: { x1 = [20, 5]T, x2 = [8, -2] T, x3 = [-6, -3] T, x4 = [6, 4] T }.
1. (4 points) Compute the covariance matrix E of Figure 3. Here, we use E to denote a covariance matrix to prevent any confusion.
C[d, d’] =
where μ is the 2-dimensional mean vector of the four instances. Please make sure you divide by N (i.e., the number of data instances, 4) rather than N – 1.
2. (5 points) Compute the vectors {z1, z2, z3, z4} after applying the Z-score transformation to the dataset in Figure 3. Denote by μ the 2-dimensional mean vector of the four instances and by σd the standard deviation of the d-th dimension, the formula of the Z-score is
zn[d] = (xn[d] – μ[d]) / σd
Make sure you divide by N (i.e., the number of data instances, 4) rather than N – 1 in computing the standard deviation σd. That is, you should use the population standard deviation as shown in the slides, not the sample standard deviation.
3. (5 points) What is the 2-dimensional mean vector and what is the standard deviation of each dimension of the resulting { z1, z2, z3, z4}?
Figure 4 gives a dataset of two dimensions: {x1 = [20, -5]T, x2 = [8, 2] T, x3 = [-6, 3] T, x4 = [6, -4] T }.
The mean vector μ is [7, -1] T. The covariance E is .
E-0.5 = .
4. (4 points) Apply whitening to the dataset in Figure 4. The formula for whitening is zn[d] = E-0.5 (xn – μ).
What are the resulting vectors { z1, z2, z3, z4}?
5. (4 points) Apply PCA to the dataset in Figure 4 without reducing the dimensionality. That is, you are to construct the 2-by-2 matrix W = [w1, w2], where w1 and w2 are two 2-dimensional eigenvectors (the two eigenvectors are not the same value). L2-normalize each vector. You may use python or other software to compute W.
V. Carmen submission
Submit this portion of Homework #3 (Problem Set) into the drop box for the problem set portion. You will submit the programming portion into a different drop box.