RDDLSim -- A simulator for the relational dynamic influence diagram
language (RDDL).
Protocol for client/server interaction in RDDLSim.
NOTE: This is a modified version of the PROTOCOL file from
the PPDDL mdpsim simulator: http://www.tempastic.org/mdpsim/
Original authors (John Asmuth, Michael Littman, Hakan Younes)
===
A session looks like this:
  client: session-request
  server: session-init
  --loop (until round limit or session time limit reached)
    client: round-request
    server: round-init
    --loop (until trial termination criteria or session time limit reached)
      server: state
      client: action
    --
    server: end-round
  --
  server: end-session
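The message order above can be sketched as a small Python function. This is
only an illustration of the sequencing, not part of RDDLSim: the function name
session_trace and its parameters are hypothetical, it assumes every round runs
a fixed number of turns, and it assumes no time limit is hit. It lists message
names only and does not model the wire format.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (sender, message name)


def session_trace(num_rounds: int, turns_per_round: int) -> List[Message]:
    """Return the ordered message names for one complete session.

    Hypothetical helper: assumes a fixed turn count per round and that
    neither the round limit nor the session time limit cuts a trial short.
    """
    trace: List[Message] = [
        ("client", "session-request"),
        ("server", "session-init"),
    ]
    for _ in range(num_rounds):           # outer loop: one pass per round
        trace.append(("client", "round-request"))
        trace.append(("server", "round-init"))
        for _ in range(turns_per_round):  # inner loop: one pass per turn
            trace.append(("server", "state"))
            trace.append(("client", "action"))
        trace.append(("server", "end-round"))
    trace.append(("server", "end-session"))
    return trace
```

Calling session_trace(1, 2) and printing each pair as "sender: name" gives one
round with two state/action exchanges, in the same order as the outline.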
NOTES REGARDING SERVER SESSION TIME LIMITS (currently set at 1080000 ms = 18 min):
(1) All times are specified in milliseconds.
(2) If the time limit is reached during a trial, clients should expect an "end-round"
message in place of a "state" message. If that "end-round" message reports
time-left <= 0, clients should expect an "end-session" message immediately
after it. Note that time-left entries for "end-round" and "end-session"
have been added to the message formats below.
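A client might apply these timing rules roughly as follows. This is a sketch
under stated assumptions, not RDDLSim code: the function names are
hypothetical, and rounds_left is assumed to be a count the client tracks
itself (for example from the rounds-left entry of round-init).

```python
SESSION_TIME_LIMIT_MS = 1_080_000  # limit quoted in the notes above


def ms_to_minutes(ms: int) -> float:
    """Convert a millisecond count to minutes."""
    return ms / 1000 / 60


def client_should_act(message_name: str) -> bool:
    """During a trial: act on "state", stop the trial on "end-round"."""
    if message_name == "state":
        return True
    if message_name == "end-round":
        # The server may send this in place of "state" when time runs out.
        return False
    raise ValueError(f"unexpected message during a trial: {message_name}")


def after_end_round(time_left_ms: int, rounds_left: int) -> str:
    """Decide what the client does once an end-round message arrives."""
    if time_left_ms <= 0 or rounds_left <= 0:
        # Time or round limit reached: the server closes the session next.
        return "expect end-session"
    return "send round-request"
```

The conversion confirms the figure in the heading: ms_to_minutes(1_080_000)
evaluates to 18.0.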
NOTES REGARDING RDDL2 AND OBJECT NOTATION WITH $ PREFIX
(1) RDDL2 allows an optional $ prefix on objects: it is required to disambiguate
object references in expressions and optional elsewhere. Client/server object
references are never ambiguous, so the server omits the $ in the messages it sends.
A client may optionally use a $ prefix for objects when communicating with the server.
(2) All clients written for the IPPC 2011 (when $'s were not used) should work without
modification with the latest versions of the RDDL2 client/server.
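Because the $ prefix is optional on the client side and absent in server
messages, a client that compares object names may want to normalize them
first. The helper below is a minimal sketch; normalize_object_name and the
object name x1 are hypothetical and not part of RDDLSim.

```python
def normalize_object_name(name: str) -> str:
    """Drop one optional leading '$' so '$x1' and 'x1' compare equal."""
    return name[1:] if name.startswith("$") else name


def same_object(a: str, b: str) -> bool:
    """Compare two object references, ignoring the optional '$' prefix."""
    return normalize_object_name(a) == normalize_object_name(b)
```

With this, same_object("$x1", "x1") is true, so names written with a $ by the
client can be matched against the unprefixed names sent by the server.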
NOTES REGARDING THE INITIAL STATE
(1) When a round starts, the server immediately sends a state message.
(2) In a fully observed domain, the state message contains the values of the
state-fluents, expressed as observed-fluents. Fluents whose value is the default
may be omitted from the message.
(3) In a partially observed domain, the initial state message contains no
observed-fluents, because observations can only be computed after the first action
has been taken: observations are functions of both the new and the previous
state. In partially observed domains, clients should ignore the
initial state message.
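The two cases above can be sketched on the client side as follows. This is an
illustration only: the function initial_state, the dictionary layout, and the
fluent name alive are hypothetical, and the sketch assumes the client already
knows each fluent's default value (for example from the domain description).

```python
from typing import Dict, Optional, Tuple

# (fluent name, argument tuple) -> value; a layout chosen for this sketch.
FluentKey = Tuple[str, Tuple[str, ...]]
Fluents = Dict[FluentKey, object]


def initial_state(observed: Fluents, defaults: Fluents,
                  fully_observed: bool) -> Optional[Fluents]:
    """Interpret the first state message of a round.

    Fully observed: fluents omitted by the server take their default value.
    Partially observed: the initial message is ignored, so return None.
    """
    if not fully_observed:
        return None
    state = dict(defaults)   # start from the defaults ...
    state.update(observed)   # ... and overwrite with what the server sent
    return state
```

For example, with defaults {("alive", ("x1",)): False, ("alive", ("x2",)): False}
and an initial message that only reports ("alive", ("x1",)) as True, the
resulting state has x1 true and x2 false in a fully observed domain.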
CLIENT MESSAGES:
-Session request
session-request => “
name => “
problem => “
input-language => “<input-language>” “rddl”|“pddl” “
-Round request
round-request => “
execute-policy => “
-Action spec
action => “
act => “
name => “
arg => “
value => “
-Resource request
resource-request => “
SERVER MESSAGES:
-Session init
session-init => “
task => “
sessionID => “
numrounds => “
timeallowed => “
-Round init
round-init => “
round => “
time-left => “
rounds-left => “
-State (and turn response)
state => “
“
turn-num => “
time-left => “
immediate-reward => “
observed-fluent => “
fluent-name => “
fluent-arg => “
fluent-value => “
no-observed-fluents => “
-End round
end-round => “
instance-name => “
client-name => “
round-num => “
round-reward => “
turns-used => “
time-left => “
immediate-reward => “
-End session
end-session => “
instance-name => “
total-reward => “
rounds-used => “
client-name => “
session-id => “
time-left => “
-Resource notification
resource-notification => “
time-left => “
memory-left => “