
LPRPG-P: Reasoning with Preferences


“LPRPG-P: Relaxed Plan Heuristics for Planning with Preferences.” A. J. Coles and A. I. Coles. ICAPS 2011.

Tracking Preference Satisfaction in Search
Preferences have corresponding automata:
Automated translation from LTL into automata.
At each state maintain an automaton corresponding to each preference.
Each starts in its initial position and is updated according to the initial state.
Each time an action is applied (i.e. a state is expanded):
Update each automaton whose transition condition from its current position fires (see the sketch below).
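As a rough illustration of this bookkeeping (not the planner's actual code; all class and field names here are assumptions), one automaton per preference could be maintained and advanced as follows:

class PreferenceAutomaton:
    def __init__(self, name, transitions, initial, violation_cost):
        self.name = name
        self.transitions = transitions   # {(position, condition): next_position}
        self.position = initial          # each automaton starts in its initial position
        self.violation_cost = violation_cost

    def update(self, state):
        # Fire a transition from the current position if its condition
        # holds in the given state.
        for (pos, condition), nxt in self.transitions.items():
            if pos == self.position and condition(state):
                self.position = nxt
                break

def on_state_expanded(successor_state, automata):
    # Called each time an action is applied: every preference automaton is
    # updated according to the resulting state.
    for automaton in automata:
        automaton.update(successor_state)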

Search in LPRPG-P
Preference violation cost of a state PVC(S):
Sum of violation costs of all automata that are in E-Vio;
Search until a solution is found:
i.e. a plan that satisfies all the hard goals.
Continue searching once a solution is found:
But prune all states where PVC(S) > the cost of the best solution found so far (see the sketch below).
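A minimal sketch of PVC(S) and the pruning test, reusing the assumed automaton fields from the sketch above (the E-Vio label and attribute names are assumptions):

E_VIO = "E-Vio"   # assumed label for the eternally-violated automaton position

def pvc(automata):
    # Preference violation cost of a state: the sum of the violation costs
    # of all automata that are currently in E-Vio.
    return sum(a.violation_cost for a in automata if a.position == E_VIO)

def should_prune(automata, best_solution_cost):
    # After the first solution (all hard goals satisfied) is found, keep
    # searching but discard states that are already more costly than it.
    return best_solution_cost is not None and pvc(automata) > best_solution_cost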

The Challenge: Heuristic Guidance
Planning heuristics generally focus on finding a path to the goal in a relaxed problem.
Relaxed problem ignores delete (negative) effects of actions.
Typically these heuristics look for a short path to the goal.
If we have a problem with no hard goals and only preferences/soft goals, the RPG heuristic value is zero for every state!
We need guidance about how to satisfy preferences and how difficult it will be to do so.

Relaxed Planning Graph (RPG) Example
Initial state: at A.
Goal state: at E.

[Figure: the RPG for this example, built layer by layer (fact layers 0-3, action layers 1-3), with Navigate actions between locations A, B, C, D and E appearing as their preconditions become reachable.]
Relaxed Plan Extraction
Select a Goal;
Select achiever;
Add Preconditions to Goals.
(preference p1 (always (not (at B))))

[Figure: the same RPG, annotated for relaxed plan extraction with preference p1.]

What do we Need?
Termination Criterion:
Stopping graph construction as soon as the goals appear is insufficient.

A mechanism for selecting the right achiever:
The earliest achiever is not always best;
An arbitrary choice could miss something good.

Effectively we need to track knowledge about preferences whilst building the RPG.

A Preference Aware RPG
At each fact layer we maintain a set of preferences for each fact.
These are the preferences that are violated in achieving the fact at this layer;
For facts that appear in the current state this is empty;
The preference violation set for a newly appearing fact is that of the action that achieves it, and the union of those for the action’s preconditions.
If a new path to a fact is discovered, keep whichever of its existing and new sets has the lower cost;
Otherwise the set at that layer is the set from the previous layer.

Now we can build the RPG to the point at which:
No new actions appear and;
No preference violation sets are changing.

This means we have to build the RPG to more layers, but in return we can generate potentially longer relaxed plans that satisfy preferences (a sketch of the propagation follows).
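The propagation rule above might look roughly like this (data structures and names are assumptions, and costs are summed as described on the next slide):

def violation_cost(pref_set, pref_costs):
    # Cost of a violation set = sum of the costs of the preferences in it.
    return sum(pref_costs[p] for p in pref_set)

def next_fact_layer(prev_facts, actions, action_violations, pref_costs):
    # prev_facts: {fact: violation set} at the previous fact layer.
    # action_violations: {action: preferences violated by applying it here}.
    # Returns the {fact: violation set} mapping for the next fact layer.
    next_facts = dict(prev_facts)              # by default, carry sets forward
    for action in actions:
        if not all(p in prev_facts for p in action.preconditions):
            continue                           # not yet applicable (relaxed)
        # Set for the action: its own violations plus those of its preconditions.
        act_set = set(action_violations.get(action, set()))
        for p in action.preconditions:
            act_set |= prev_facts[p]
        for fact in action.add_effects:
            new_fact = fact not in next_facts
            cheaper = (not new_fact and
                       violation_cost(act_set, pref_costs) <
                       violation_cost(next_facts[fact], pref_costs))
            if new_fact or cheaper:
                # Newly appearing fact, or a cheaper path to an existing one.
                next_facts[fact] = act_set
    return next_facts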

Preference Aware RPG
Initially violations empty;
Build forward generating sets;
Violation set cost = sum cost of preferences in it.
(preference p1 (always (not (at B))))

[Figure: the same RPG with preference violation sets; NavigateAB and NavigateBA are annotated with {P1}, the other actions' sets are empty.]

Being More Specific
Earlier we said: “The preference violation set for a newly appearing fact is that of the action that achieves it, and the union of those for the action’s preconditions.”

We didn’t define the preference violation set for an action; it depends on:
The preference type; The trajectory so far; Whether that preference is already violated.

Solution: back to automata.

Optimistic Automaton Positions
In PDDL 3 there is always a ‘best’ position in each automaton, so we can maintain the best possible position at each RPG layer (the further from E-Vio the better).
In general LTL we’d have to maintain all possible positions.
An action A violates a preference P when applied at layer l if it activates a transition from the (all) automaton position(s) at l:
to E-Vio (e.g. sometime-before, at-most-once);
to Unsat, where the facts required to activate the transition back to a Sat state are not yet in the RPG (e.g. sometime-after); see the sketch below.
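A hedged sketch of this test (the transition and condition_to_sat accessors are hypothetical, not LPRPG-P's API):

UNSAT, E_VIO = "Unsat", "E-Vio"   # assumed position labels

def violates(action, automaton, layer_position, facts_in_rpg):
    # Would applying `action` from the optimistic position at this layer
    # violate the preference tracked by `automaton`?
    nxt = automaton.transition(layer_position, action)   # hypothetical accessor
    if nxt == E_VIO:
        # e.g. sometime-before, at-most-once: unrecoverable violation.
        return True
    if nxt == UNSAT:
        # e.g. sometime-after: a violation only if the facts needed to move
        # back to a Sat position have not yet appeared in the RPG.
        recovery = automaton.condition_to_sat(nxt)        # hypothetical accessor
        return not all(f in facts_in_rpg for f in recovery)
    return False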

Reappearing Actions
If an action appears later in the RPG with a lower preference violation cost:
Add it again with the precondition that made the automaton transition;
Propagate to its effects and actions that rely on them by building further action layers until no further change in violation sets.

Then continue building RPG as normal.
Termination criterion: no new facts/actions appearing and preference violation sets unchanging (see the sketch below).
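One way the re-adding step could be organised, purely as an illustrative sketch (all structure names are assumptions):

def cheaper(new_set, old_set, pref_costs):
    # Is the new violation set strictly cheaper than the old one?
    cost = lambda s: sum(pref_costs[p] for p in s)
    return old_set is None or cost(new_set) < cost(old_set)

def readd_if_cheaper(action, new_set, action_sets, pref_costs, dirty_facts):
    # Reappearing action: if it can now be applied with a lower-cost violation
    # set, record it again and mark its effects so that further action layers
    # are built until no violation set changes.
    if cheaper(new_set, action_sets.get(action), pref_costs):
        action_sets[action] = new_set
        dirty_facts.update(action.add_effects)
    return dirty_facts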

Solution Extraction I: Goals
Not all preference problems have hard goals, so we need to decide what goals to achieve:
Some soft goals are explicit in the problem (at-end);
Others are implicit in currently Unsat preferences, e.g. (sometime-after a b), (sometime p);
For goal generation, look to the automata: for any automaton that is currently Unsat, add the condition on its transition to Sat as a goal in the RPG (see the sketch below);
Again, more generally, a disjunction over paths to Sat (in practice pick one).

If the transition does not appear in the RPG then the preference is E-Vio, not Unsat.
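For illustration only (the Unsat label and the condition_to_sat accessor are assumptions carried over from the earlier sketch), goal generation from the automata could look like:

UNSAT = "Unsat"   # assumed label

def generate_goals(hard_goals, automata, facts_in_rpg):
    # Goals for extraction: the hard goals plus, for every automaton still in
    # Unsat, the condition on its transition back to Sat (one path in practice).
    goals = set(hard_goals)
    for a in automata:
        if a.position != UNSAT:
            continue
        condition = a.condition_to_sat(a.position)   # hypothetical accessor
        if all(f in facts_in_rpg for f in condition):
            goals |= set(condition)
        # Otherwise the transition never appears in the RPG, so the
        # preference is effectively E-Vio rather than Unsat.
    return goals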

Solution Extraction (II): RP Generation

This now needs to be a bit more sophisticated:
Add preconditions/goals at the earliest layer at which they appear with their lowest-cost violation set;
This must be before the layer at which the action for which they are a precondition is applied (see the extraction sketch below).

We must also think about which achievers to use:
Select the achiever that caused this fact’s cost to be updated at this layer (recorded during graph building).
This is the achiever that caused its cost to decrease, and is thus on the lowest cost path.

There’s no need to worry about adding extra goals: e.g. if we had (sometime-before a b), we don’t need to add b as a goal when we choose a in extraction, because the action that doesn’t break that preference already has b as a precondition.
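A backwards-extraction sketch over the preference-aware RPG, under assumed data structures (fact_layers, best_achiever and pref_costs are names invented for this illustration):

def earliest_cheapest_layer(fact, fact_layers, pref_costs, before=None):
    # Earliest layer (strictly before `before`, if given) at which `fact`
    # appears with its lowest-cost violation set.
    cost = lambda s: sum(pref_costs[p] for p in s)
    limit = len(fact_layers) if before is None else before
    layers = [i for i in range(limit) if fact in fact_layers[i]]
    best = min(cost(fact_layers[i][fact]) for i in layers)
    return min(i for i in layers if cost(fact_layers[i][fact]) == best)

def extract_relaxed_plan(goals, fact_layers, best_achiever, pref_costs):
    # fact_layers[i][fact]   -> violation set of `fact` at fact layer i
    # best_achiever[i][fact] -> action recorded during graph building as the
    #                           one that lowered this fact's cost at layer i
    plan = []
    agenda = [(g, earliest_cheapest_layer(g, fact_layers, pref_costs))
              for g in goals]
    while agenda:
        fact, layer = agenda.pop()
        if layer == 0:
            continue   # holds in the current state, nothing to achieve
        action = best_achiever[layer][fact]   # achiever on the lowest-cost path
        plan.append((layer, action))
        for pre in action.preconditions:
            # Preconditions become goals at the earliest layer at which they
            # appear with their lowest-cost set, before this action's layer.
            agenda.append((pre, earliest_cheapest_layer(pre, fact_layers,
                                                        pref_costs, before=layer)))
    return sorted(plan, key=lambda step: step[0])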


Preference Aware Relaxed Plan Extraction
Select a Goal;
Select achiever;
Add Preconditions to Goals.

[Figure: the preference-aware RPG from the previous slides, used here for preference-aware relaxed plan extraction.]

LPRPG-P: Comparison to Other Planners

“LPRPG-P: Relaxed Plan Heuristics for Planning with Preferences.” A. J. Coles and A. I. Coles. ICAPS 2011.

State of the art:
HPlan-P (propositional only);
MIPS-XXL (everything);
SGPlan 5.1 (competition, generic);
Control: FF solving the hard goals, or the empty plan if there are no hard goals.

Domains (IPC 2006):
Pathways (Chemical Reaction Synthesis);
Trucks (Logistics);
Storage (Warehouse Organisation);
Travelling Purchaser Problem (similar to Travelling Salesman);
Martian Rover Operations.

IPC Score: C*/C, where C* is the cost of the best known solution and C is the cost of the plan found: C = C* gives a score of 1; C > C* gives a score below 1.
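As a quick illustration of that scoring (treating unsolved instances as scoring 0 is the usual IPC convention, assumed here rather than stated on the slide):

def ipc_score(best_known_cost, plan_cost):
    # IPC quality score C*/C: matching the best known cost scores 1,
    # worse plans score proportionally less, unsolved instances score 0.
    if plan_cost is None:
        return 0.0
    return best_known_cost / plan_cost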