Problem 1: 10 points
Artificial Intelligence Final Exam May 18, 2020
Consider the following set of propositional formulas:
P => (Q V ~R)
Copyright By PowCoder代写 加微信 powcoder
(Q V R) => W
(~Q ^ ~X) => R
(W V X) => ~P.
A. Convert these to CNF.
1. ~P V Q V ~R
3a. ~Q V R
3b. Q V ~R
6. Q V X V R
7. ~W V ~P
8. ~X V ~P
B. Give a trace of the execution of the Davis-Putnam algorithm. When a choice point is reached, choose the first unbound atom alphabetically, and try “TRUE” before “FALSE”.
1. ~P V Q V ~R
3a. ~Q V R
3b. Q V ~R
6. Q V X V R
7. ~W V ~P
8. ~X V ~P
No easy cases. Try P=TRUE. Delete 2. Delete ~P from 1, 7, 8
1. Q V ~R
3a. ~Q V R
3b. Q V ~R
6. Q V X V R
7 and 8 are singleton clauses. Set W=FALSE, X=FALSE. Delete 7 and 8
Delete W from 4 and 5, X from 6
1. Q V ~R
3a. ~Q V R
3b. Q V ~R
4 and 5 are singleton clauses. Set Q=False, R=False. Delete 1, 3a, 3b, 4, 5
Delete Q and R from 6. 6 is now the null clause
Back to S0. Try P=FALSE. Delete 1, 7, 8. Delete P from 2.
3a. ~Q V R
3b. Q V ~R
6. Q V X V R
2 is a singleton clause. Set W=TRUE. Delete 2, 4, 5.
3a. ~Q V R
3b. Q V ~R
6. Q V X V R
X is a pure literal. Set X=TRUE. Delete 6
3a. ~Q V R
3b. Q V ~R
No easy cases. Try Q=TRUE. Delete 3b. Delete ~Q from 3a.
R is a pure literal. Set R=TRUE. Delete 3a. Success!
Answer: P=FALSE, Q=TRUE, R=TRUE, W=TRUE, X=TRUE.
Problem 2: 10 points
Consider the following planning problem. You have a collection of tasks, each of which has a time requirement and a value. The tasks are structured in a DAG; if there is an arc from u to v, then u must be executed before v. You can only execute one task at a time. You have an overall deadline D and a target value V . The problem is to find a sequence of tasks that conform to the constraints
and that achieves the total target value within the deadline. For instance in the graph below, if the deadline is 13 and the target value is 12 then A,B,E is one solution and B,C is another solution.
In all parts of this problem, your answer should be formulated in terms of a general undirected graph G, not in terms of this particular example.
A. Describe a state space that would be suitable for an exhaustive blind search. Your description should specify (a) what a state is; (b) what are the successors to a given state; (c) what is the start state; (d) how are goal states recognized.
Answer: A state is a sequence of tasks satisfying the precedence constraints and the deadline constraint. The successor to state S adds at the end an execution of some task whose prerequisites are all in S and that fits in the deadline. The start state is the empty schedule. A goal state is one that exceeds the target value.
A more efficient solution would be to use the fact that once you have completed a collection of tasks, the order in which you did them no longer matters. For instance, in the above network, if you have finished tasks A and B, it doesn’t matter going forward whether you did A first and then B or B first and then A.
In effect, this redefines a state as a set of tasks rather than a sequence of tasks. If you then use the same successor relation — add any task whose preconditions are satisfied and who fits in the deadline — the state space becomes a DAG rather than a tree.
One solution is to keep track of the sets of tasks you’ve seen by hashing them (for instance, you can use a key which is the list of tasks in alphabetical order). Here you would be saving time at the cost of memory.
A better solution is to systematize the search so that it generates each state only once. You can do this by adoping the following rule: Carry out a topological sort of the network of tasks to get a total ordering T. Then impose the additional rule that tasks can be added to state S only if it is later in T than all the tasks already in S. Once that is done, there is only one way to generate a state, and the state space is a tree again, but now a tree with no more than 2n states
B. Let n be the number of tasks. In the state space in (A), give upper bounds on the depth of the space, on the branching factor, and on the size of the space.
Answer: Depth ≤ n. Branching ≤ n (n is achieved only if there are no precedence constraints. Space size: n! (likewise, achieved only if there are no precedence constraints).
Which search strategy would be most suitable: depth-first search, breadth-first search, or iterative deepening?
Answer: Iterative deepening, since the depth of the shallowest goal node cannot be predicted. Problem 3: 15 points
Suppose that we modify problem 2 so that all of the tasks have value 1, and the times are all small integers. Thus, the target value is just the number of tasks being executed. We can then express this as a problem in propositional satisfiability using the following kinds of atoms:
Seq(T,I) — Task T will be the Ith to be executed. I ranges from 1 to the target value. Start(T,J) — Task T will start executing at time J. J ranges from 0 to the deadline.
For example, using the DAG and times shown in problem 2 with a deadline of 12 and a target value of 3, one solution would be [A,B,D]. The encoding of this would be that the atoms Seq(A,1), Seq(B,2), Seq(D,3), Start(A,0), Start(B,5) Start(D,9) would be true and the remaining atoms would be false.
Describe the types of constraints that you could use to encode an instance of the problem in propo- sitional logic. Illustrate each type of constraint with one such constraint from the above example.
Please note: The encoding must be in propositional logic, not in predicate calculus. You should not have any quantifiers.
1. The number of tasks executed is at least the target value, so for each value I between 1 and the target value, one of the tasks is executed at time I.
For each value v between 1 and the target, Seq(T1, v)) ∨ … ∨ Seq(Tk, v).
In this example, with v = 1 that would be:
Seq(A,1) ∨ Seq(B,1) ∨ Seq(C,1) ∨ Seq(D,1) ∨ Seq(E,1) ∨ Seq(F,1).
2. No two tasks are executed at the same point in the sequence.
For each value v between 1 and the target, for each pair of tasks Ti and Tj, ¬[Seq(T1, v)) ∧ Seq(Tj, v)].
In this example, one such constraint would be ¬[Seq(C,2) ∧ Seq(E,2)]
3. No task starts at two different times.
For each task Ti, for each pair of times p ̸= q, ¬[Start(Ti,p) ∧ Start(Ti, q)]. In this example, one such constraint would be ¬[Start(C,5) ∧ Start(C,7)].
4. A task T cannot start at a time that is greater than the deadline minus T.length. In this example, one such constraint would be ¬Start(C,5).
5. If T is the ith task in the sequence and starts at time E, then the i + 1st task will begin at time E + T.length. (If E + T.length is greater then the deadline, then this can be omitted, since the left hand side is ruled out by constraint 4.)
For any tasks TI , Tj and values v and w,
Seq(Ti,v) ∧ Seq(Tj,v+1) ∧ Start(Ti,w) ⇒ Start(Tj,w+Ti.length).
If w + Ti.length is greater than the dealine, then this can be omitted, since the left side of the implication is impossible by constraint (4).
Inthisexample,onesuchconstraintwouldbeSeq(C,2) ∧ Seq(A,3) ∧ Start(C,2) ⇒ Start(A,11).
6. The first task starts at time 0.
For each task T, Seq(T,1) ⇒ Start(T,0).
In this example, one such constraint would be Seq(A,1) ⇒ Start(A,0)
7. The tasks observe the precedence constraints. If there is an arc from task T to task U then for any times Y < X + T.length, ¬[Time(T,X) ∧ Time(U,Y)].
In this example, one such constraint would be ¬[Start(A,5) ∧ Start(D,3)].
Problem 4: 20 points
Let domain U be a collection of people and paintings. Let L be a first-order language with the following symbols:
P(p,x) — Person p painted painting x. D(x,p) — Painting x depicts person p. C(p1,p2) — Person p1 is a child of person p2. O(p,x) — Person p owns painting x.
A(x) — x is an acrylic painting. F,M — Fred, Mary.
A. Express the following in L:
i. Each of Fred’s children has painted a self-portrait.
Answer: ∀p C(p,F) ⇒ ∃x D(x,p) ∧ P(p,x).
ii. Mary has never painted an acrylic painting.
Answer: ¬∃x P(M,x) ∧ A(x).
iii. At least one of Mary’s children has painted a picture that depicts all of Fred’s parents.
Answer: ∃p,x C(c,M) ∧ P(c,x) ∧ ∀pa C(F,pa) ⇒ D(x,pa).
iv. Mary owns an acrylic painting of each of her children (not necessarily a single painting that depicts them all; possibly a collection of different paintings).
Answer: ∀p C(p,M) ⇒ ∃x A(x) ∧ O(M,x) ∧ D(x,p)
B. Consider the following inference:
v. All of the paintings that Fred owns were painted by his children.
∀x O(F,x) → ∃c C(c,F) ∧ P(c,x).
vi. None of Fred’s children have ever painted in acrylics.
¬∃c,x C(c,F) ∧ P(c,x) ∧ A(x).
vii. Fred does not own any paintings in acrylics.
¬∃x O(F,p) ∧ A(x).
Give a resolution proof of vii from v and vi. You should show all the clauses involved and the
resolutions. You need not show the intermediate steps in converting predicate calculus to CNF.
Converting (v) to CNF gives the clauses:
1. ~O(F,x) V C(Sk1(x),F).
2. ~O(F,x) V P(Sk1(x),x).
Converting (vi) to CNF give the clause
3. ~C(c,F) V ~P(c,x) V ~A(x).
Converting the negation of vii to CNF gives the clauses.
4. O(F,Sk2)
5. A(Sk2).
Resolving 4 with 1 using the substitution { x -> Sk2} gives
6. C(Sk1(Sk2),F).
Resolving 4 with 2 using the substitution { x -> Sk2} gives
7. P(Sk1(Sk2),Sk2).
Resolving 6 with 3 using the substitution { c->Sk1(Sk2) } gives
8. ~P(Sk1(Sk2),x) V ~A(x).
Resolving 8 with 7 using the substitution { x->Sk2 } gives
9. ~A(Sk2).
Resolving 9 with 5 gives the null clause.
Problem 5: 15 points
Let us consider a third variant of the scheduling problem in problem 2. In this variant, each task has a fixed value, but the time required by a task is probabilistic, and follows a stated distribution. For instance, a problem specification might state that Time(A) is a random variable, with the distribution Prob(Time(A)=4) = 0.75; Prob(Time(A)=8) = 0.25. Assume that random variables Time(X) for the different tasks are all independent.
Consider the very simple task network below with the following probability distribution
Prob(Time(A)=4) = 3/4. Prob(Time(A)=8) = 1/4 Prob(Time(B)=2) = 1/2 Prob(Time(B)=4) = 1/2 Prob(Time(C)=4) = 1.0
At each stage you have the option of executing any of the tasks whose prerequisites are satisfied or of stopping. If you stop before the deadline then the payoff received is the value of the tasks you have executed. The deadline is 11. If you exceed the deadline, then your payoff is −20, and you forfeit the value of the tasks executed.
A. Draw, BUT DO NOT EVALUATE, the decision tree for this problem.
B. Suppose that you decide to execute A and then, regardless of the outcome, execute B. Let E be the event that you did not run over the deadline. What is P(Time(A)=4 | E)?
Answer By Bayes’ Law P(Time(A)=4 | E) =
P(E|Time(A)=4) · P(Time(A)=4)/ [P(E|Time(A)=4) · P(Time(A)=4) +P(E|Time(A)=8) · P(Time(A)=8)] =
(1 · 3/4) /[(1 · 3/4) + (1/4 · 1/2) = 6/7.
C. Suppose that there are n tasks; that each random variable Time(X) has k possible values; and that generally at each stage there are q possible executable tasks to choose from. Estimate the size of the decision tree.
Answer: knqn (Since the option to stop always puts an end to branching, it need not be included in the estimate.)
D. Depending on the relation of the deadline to the task length, your answer to part (B) above may be much too pessimistic. Consider the following two extreme cases.
i. Suppose that the deadline is so short that you cannot possibly finish more than m tasks, where m ≪ n. How large is the decision tree? Answer: km · qm
ii. Suppose that the deadline is so late that you cannot possibly overrun it. Give a simple algorithm for computing a strategy. (Hint: do not construct, or think about, a decision tree.)
Answer: If you can’t overrun the deadline, then you may as well plan to execute all the tasks. Do a topological sort, and execute the tasks in that order.
Problem 6: 15 points
Consider the problem of learning a recurrence equation, given a sequence of natural numbers
For instance:
Given: The sequence 2, 4, 8, 16, 32, 64.
Infer: The formula X0 = 2, Xi = 2Xi−1, i = 0…5.
Given: The sequence 1, 1, 2, 3, 5, 8, 13, 21.
Infer: The formula X0 = 1, X1 = 1, Xi = Xi−1 + Xi−2, i = 0..7.
Given: The sequence 1, 4, 10, 25, 61, 148.
Infer: The formula X0 = 1, X1 = 4, Xi = 1 + 2Xi−1 + Xi−2, i = 0 . . . 5.
Given: The sequence 1, 2, 3, 5, 9, 15, 25, 43.
Infer: The formula X0 = 1, X1 = 2, X3 = 3, Xi = Xi−1 + 2Xi−3, i = 0 . . . 7
A. Describe how this kind of learning can be characterized in terms of MDL (minimum description length) learning.
Assume that in every case what is being learned is an equation of the form Xi = a0 + a1Xi−1 + . . . + ak Xi−k , with initial conditions X0 = c0 . . . Xk−1 = ck , together with the length of the sequence. Assume that all the ai are non-negative integers.
Assume (slightly unrealistically) that it is OK simply to represent numbers in binary; e.g. 5 can be represented as 101. Thus, a number n can be represented in ⌈log2(n + 1)⌉ bits. (⌈x⌉ is x rounded up to the nearest integer.)
Answer: The encoded form requires recording the numbers n, k, a0 . . . ak, and c0 . . . ck−1. Once the sequence becomes long enough, the encoding of these 2k + 3 (generally small) numbers will be much shorter than the n (generally large) numbers in the sequence.
B. Using the encoding that you’ve described in part (A), how many terms of the Fibonacci sequence (the second sequence above) before the encoded form is shorter than the raw data?
Answer: The encoding requires 2 bits for k, 1 bit each for c0 = 1,c1 = 1,a0 = 1,a1 = 1,a2 = 1 and ⌈log2(n + 1)⌉ bits for n.
Comparing the number of bits needed for the raw data |D| to the number needed for the encoded data |E|we get
0−n 0-2 0-3 0-4 0-5 F(n) 2 3 5 8
|D| 4 6 9 13 |E| 9 9 10 10
So once you’ve reached the sequence 1, 1, 2, 3, 5, 8, it is reasonable to infer the recurrence equation.
C. Suppose that now we modify this so that the equation is satisfied with most of the time, but some fraction of the time, randomly, there are off-by-one errors in either direction. For instance, Given: 1, 1, 2, 4, 6, 10, 15, 26, 41, 67
Infer: the Fibonacci recurrence X0 = 1, X1 = 1, Xi = Xi−1 + Xi−2, with off-by-one errors at i= 3, 6, and 7.
Describe how the MDL account in (A) can be modified to allow this kind of inference. (You do not have to work out the number of bits in this example; just describe what you would do in general.)
Answer: Augment the representation in (A) with the list of 0’s, 1’s, and −1’s corresponding to the errors in the series relative to the recurrence relations. If errors are rare, then one can save more
space by using a Shannon encoding of the list; but even if 0, 1, and −1 are equally probable, the encoding of this list will in general be much shorter than the actual numbers in the sequenece.
Problem 7: 15 points
Consider the problem in Problem 6.C of a recurrence equation with random off-by-one errors. Sup- pose that someone proposes the following stochastic generative model:
The parameters of the model consist of
a. Values n, which can be any integer, and k, which is an integer less than or equal to n. b. A recurrence equation Xi = a0 + a1Xi−1 + … + akXi−k, where k can be arbitrary. c. Initial conditions X0 = c0 . . . Xk−1 = ck−1.
d. A random variable Z with a fixed distribution over domain {−1, 0, 1}
The model then generates the output values v0 . . . vn−1 using the following algorithm.
for(i=0..k−1)vi =ci; for (i = k..n − 1) {
d = random sample of P;
vi =d+a0 +a1 ·vi−1 +…+ak ·vi−k }
return v0 . . . vn−1
A. If one assumes that all parameter settings are equiprobable, and one is simply looking for the
value of the parameters that maximizes P (Data|P arameters) then this overfits badly. Explain why.
Answer: Ifyouchoosethemodelwithk=n,andwithci =vi,andP(Z=0)=1,then P (Data|P arameters) = 1. Then the recurrence equation can be chosen arbitrarily, since it is never used. But this model obviously has no predictive value for values greater than k − 1.
In fact, you can do this in a more parsimonious way:
1. Choose k = (n − 1)/2.
2. Set c0 = v0 …ck−1 = vk−1.
3. The recurrence equation then gives a linear equation in variables a0 . . . ak for each of the values vk . . . vn−1. If you solve this set of k + 1 equations in k + 1 unknowns, you have val- ues of c0 …ck−1,a0 …ak that fit the data exactly, so again you can take P(Z = 0) = 1, and P (Data|P arameters) = 1.
This does give predictions, but since it is fitting n parameters to n values, it is very likely to overfit.
B. However, if you assume an appropriate prior distribution over parameter settings, and instead maximize P (Data|P arameters) · P (P arameters), then one can avoid that problem. Explain how that might be done.
Answer: You need to penalize heavily for large values of k; for instance, take the probability of a model to be proportional to c−k . In that case, for a long enough sequence, the probability of a short recurrence equation and some particular sequence of 0’s, 1’s, and −1’s for Z will be larger than the probability of a model with a large value of k that directly fits the data.
For any sequence of 1s, 0s, and −1s of length n, if you set the probability distribution of Z to be the frequency of the value in the sequence, then the probability of the sequence is at least 3−n. Therefore, if you choose the probability of a model with parameter k to be c−k where c > 3, you will avoid the first overfitting scenario in the answer to part (A) for large enough n; and if you require c > 6, you will avoid the second overfitting scenario.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com