Tutorial questions: embedded agents
1. A purely reactive agent is defined by the function Agr : E → Ac.
A standard agent is defined by the function Ags : RE → Ac.
(Recall: E is the set of all possible environment states; Ac is the set of all available actions; RE is the set of all possible runs over E and Ac that end with an environment state.)
(a) Is the following statement true? “For every purely reactive agent, there is a behaviourally equivalent standard agent.”
If true, explain how you can define a standard agent that is behaviourally equivalent to any given purely reactive agent.
If not true, give an example of a purely reactive agent and explain why it is not possible to define a behaviourally equivalent standard agent.
(b) Is the following statement true? “For every standard agent, there is a behaviourally equivalent purely reactive agent.”
If true, explain how you can define a purely reactive agent that is behaviourally equivalent to any given standard agent.
If not true, give an example of a standard agent and explain why it is not possible to define a behaviourally equivalent purely reactive agent.
(Recall: Two agents Ag1 and Ag2 are behaviourally equivalent if, for any environment Env, R(Ag1, Env) = R(Ag2, Env).)
Answer
(a) “For every purely reactive agent, there is a behaviourally equivalent standard agent.”
This is true!
A purely reactive agent takes as input the current environment state, and decides what to do based on that.
A standard agent takes as input the current run in the environment, and decides what to do based on that.
We can define a standard agent so that, whatever the current run is, it performs the action that the purely reactive agent will perform when in the last state of that run. They will then behave equivalently.
Formally:
Let Agr : E → Ac be a purely reactive agent.
We can define a behaviourally equivalent standard agent Ags : RE → Ac as follows:
For any run r = (e0,α0,…,αu−1,eu) ∈ RE, Ags(r) = Agr(eu).
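As a minimal illustration of this construction (a Python sketch; the function names and example states are our own, not part of the question), the wrapper simply evaluates the reactive agent on the last state of the run:

# Sketch of the construction above (illustrative names only).
# A run is represented as a tuple (e0, a0, e1, ..., eu) ending with a state.

def make_standard(reactive_agent):
    """Wrap a purely reactive agent Agr : E -> Ac into a standard agent
    Ags : RE -> Ac that applies Agr to the last state of the run."""
    def standard_agent(run):
        last_state = run[-1]          # eu, the state the run ends with
        return reactive_agent(last_state)
    return standard_agent

# Example with made-up states and actions:
reactive = lambda state: "cool" if state == "hot" else "heat"
standard = make_standard(reactive)
assert standard(("cold", "heat", "hot")) == reactive("hot")  # same action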

(b) “For every standard agent, there is a behaviourally equivalent purely reactive agent.”
This is false!
Here is a counter example.
Let Ags be a standard agent, in an environment Env with states E = {e0, e1, e2} and possible actions α0 and α1, such that
• Ags((e0)) = α0,
• Ags((e0, α0, e0)) = α0,
• Ags((e0, α0, e1)) = α0,
• Ags((e0, α0, e0, α0, e1)) = α0, and
• Ags((e0, α0, e1, α0, e1)) = α1.
Let:
• τ((e0, α0)) = {e0, e1},
• τ((e0, α0, e0, α0)) = {e1},
• τ((e0, α0, e1, α0)) = {e1},
• τ((e0, α0, e0, α0, e1, α0)) = {e2}, and
• τ((e0, α0, e1, α0, e1, α1)) = {e2}.
Thus we have:
• (e0, α0, e0, α0, e1, α0, e2) ∈ R(Ags, Env), and
• (e0, α0, e1, α0, e1, α1, e2) ∈ R(Ags, Env).
Since there are only two possible actions, any purely reactive agent Agr we design must satisfy either Agr(e1) = α0 or Agr(e1) = α1.
Assume Agr(e1) = α0. Then it cannot be the case that:
(e0, α0, e1, α0, e1, α1, e2) ∈ R(Agr, Env)
Assume instead Agr(e1) = α1. Then it cannot be the case that:
(e0, α0, e0, α0, e1, α0, e2) ∈ R(Agr, Env)
Either way, R(Ags, Env) ≠ R(Agr, Env), and hence the agents are not behaviourally equivalent.
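The same point can be checked mechanically. The following Python sketch (purely illustrative; the encoding of states, actions, and τ as strings and dicts is our own) enumerates the runs each candidate reactive agent can generate and shows that, whichever action it fixes for e1, at least one of the two runs of Ags above is missing:

# Illustrative check of the counterexample (ad hoc encoding, not part of the
# formal argument). States e0, e1, e2; actions a0, a1.

# History-dependent transition function tau: a run ending in an action is
# mapped to the set of possible next states.
tau = {
    ("e0", "a0"): {"e0", "e1"},
    ("e0", "a0", "e0", "a0"): {"e1"},
    ("e0", "a0", "e1", "a0"): {"e1"},
    ("e0", "a0", "e0", "a0", "e1", "a0"): {"e2"},
    ("e0", "a0", "e1", "a0", "e1", "a1"): {"e2"},
}

# The two runs of Ags identified above.
target_runs = {
    ("e0", "a0", "e0", "a0", "e1", "a0", "e2"),
    ("e0", "a0", "e1", "a0", "e1", "a1", "e2"),
}

def runs(agent, run=("e0",)):
    """All maximal runs of a purely reactive agent (a dict state -> action)."""
    step = run + (agent[run[-1]],)
    if step not in tau:                 # no transition defined: the run ends
        return {run}
    result = set()
    for next_state in tau[step]:
        result |= runs(agent, step + (next_state,))
    return result

# Whichever action the reactive agent fixes for e1, it fails to generate at
# least one of the two target runs of Ags.
for e1_action in ("a0", "a1"):
    reactive = {"e0": "a0", "e1": e1_action, "e2": "a0"}
    print(e1_action, "misses:", target_runs - runs(reactive))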

2. Rock-paper-scissors is a 2-player game where each agent simultaneously selects from one of three moves: rock (R), paper (P), or scissors (S). The outcome is determined as follows:
• R beats S;
• S beats P;
• P beats R;
• all other combinations result in a draw.
The set of actions available to the agent is Ac = {R, P, S}.
The utility of winning is 1, the utility of losing is −1, and the utility of a draw is 0.
If we consider a single round of the game we can define the possible environment states as
{e0, RR, RS, RP, SR, SS, SP, PR, PS, PP}, where:
• e0 is the initial state before game play has begun, and
• each two-letter combination XY represents the state in which the agent has played X and its opponent has played Y (e.g., SP means the state in which the agent played scissors and its opponent played paper).
Assume our opponent (the “environment”) plays:
• P with probability 0.4,
• S with probability 0.5, and
• R with probability 0.1.
Thus for any α ∈ Ac:
• τ((e0,α)) ∈ {RR,RS,RP,SR,SS,SP,PR,PS,PP},
• the probability that τ((e0,α)) ∈ {PP,RP,SP} is 0.4,
• the probability that τ((e0, α)) ∈ {PS, RS, SS} is 0.5, and
• the probability that τ((e0, α)) ∈ {PR, RR, SR} is 0.1.
We can define a non-deterministic stochastic agent Agst : RE → 2^(Ac × [0,1])
such that (α,p) ∈ Agst(r) means that p is the probability the agent will perform action α as the next action of the run r, and the following conditions hold:
for all r ∈ RE:
• if (αi, pi) ∈ Agst(r) and (αi, pj) ∈ Agst(r), then pi = pj (i.e., there can be at most one occurrence of each action); and
• Σ_{(αi, pi) ∈ Agst(r)} pi = 1 (i.e., the sum of the probabilities associated with all the possible actions must equal 1).
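As an aside, one simple way to realise this representation in code (a Python sketch; the dict encoding and function names are our own) is to hold Agst(r) as a mapping from actions to probabilities, which automatically gives each action at most one probability, and to check that the probabilities sum to 1 before sampling:

import random

# Illustrative encoding: Agst(r) as a dict {action: probability}.

def check_distribution(dist, tol=1e-9):
    """Enforce the second condition: probabilities are valid and sum to 1."""
    assert all(0.0 <= p <= 1.0 for p in dist.values())
    assert abs(sum(dist.values()) - 1.0) < tol

def sample_action(dist):
    """Draw the next action according to the distribution."""
    actions, probs = zip(*dist.items())
    return random.choices(actions, weights=probs, k=1)[0]

agst_r = {"R": 0.2, "P": 0.3, "S": 0.5}   # a possible value of Agst(r)
check_distribution(agst_r)
print(sample_action(agst_r))               # e.g. "S"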
Specify an optimal stochastic agent Agst for this problem (i.e., one that maximises the expected utility).

Answer
Specify an optimal stochastic agent for this problem (i.e., one that maximises the expected utility).
Winning states (utility 1) are: PR, RS, SP. Drawing states (utility 0) are: RR, PP, SS. Losing states (utility −1) are: RP, SR, PS.
We can specify the agent in terms of variables p and r, such that Agst((e0)) = {(P, p), (R, r), (S, 1 − p − r)}; we then need to find the values of p and r that maximise our expected utility.
Expected utility
= u(win) · P(win | Agst, Env) + u(lose) · P(lose | Agst, Env) + u(draw) · P(draw | Agst, Env)
= 1 · (p · 0.1 + r · 0.5 + (1 − p − r) · 0.4)
  + (−1) · (r · 0.4 + p · 0.5 + (1 − p − r) · 0.1)
  + 0 · (r · 0.1 + p · 0.4 + (1 − p − r) · 0.5)
= 0.1p + 0.5r + 0.4 − 0.4p − 0.4r − (0.4r + 0.5p + 0.1 − 0.1p − 0.1r)
= 0.1p + 0.5r + 0.4 − 0.4p − 0.4r − 0.4r − 0.5p − 0.1 + 0.1p + 0.1r
= 0.3 + (0.1 − 0.4 − 0.5 + 0.1)p + (0.5 − 0.4 − 0.4 + 0.1)r
= 0.3 − 0.7p − 0.2r
Since the expected utility is decreasing in both p and r, the optimal strategy is p = 0 and r = 0 (i.e., Agst((e0)) = {(S, 1)}: always play scissors).
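As a sanity check on the algebra (a Python sketch; the names and the brute-force enumeration are ours, not part of the answer), the expected utility can be computed directly from the outcome table and compared with the closed form 0.3 − 0.7p − 0.2r:

# Illustrative check of the expected-utility derivation (ad hoc names).
OPPONENT = {"P": 0.4, "S": 0.5, "R": 0.1}   # environment's mixed strategy
BEATS = {"R": "S", "S": "P", "P": "R"}      # X beats BEATS[X]

def utility(agent_move, opponent_move):
    if BEATS[agent_move] == opponent_move:
        return 1        # win
    if BEATS[opponent_move] == agent_move:
        return -1       # lose
    return 0            # draw

def expected_utility(p, r):
    """Agent plays P with probability p, R with r, S with 1 - p - r."""
    agent = {"P": p, "R": r, "S": 1 - p - r}
    return sum(agent[a] * OPPONENT[o] * utility(a, o)
               for a in agent for o in OPPONENT)

# Matches the closed form 0.3 - 0.7p - 0.2r on some sample points ...
for p, r in [(0, 0), (1, 0), (0, 1), (0.2, 0.3)]:
    assert abs(expected_utility(p, r) - (0.3 - 0.7 * p - 0.2 * r)) < 1e-9

# ... and is maximised at p = r = 0 (always play S), giving 0.3.
print(expected_utility(0, 0))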