
Classical Planning
Non-Forward Search


Planning as SAT
6CCS3AIP – Artificial Intelligence Planning
Dr Tommy Thompson

FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS

Hello and welcome to this chapter as part of the non-forward search segment of the module. In this video we’re going to discuss another approach to planning that deviates from traditional forward search, and that is planning as a Boolean satisfiability (SAT) problem.

Planning Formalisms
Throughout this module, we’ve stuck to some key concepts of how planning is formulated and explored.
Set-Theoretic Representation (PDDL)
State Transition Model
Forward Search

But there have been alternatives to how we explore a planning problem.

Idea: What if we transform a planning problem into another problem type for which there are already efficient solvers?


Throughout the module we’re running on the primary notion that we have a set-theoretic representation of the world – with states comprised of predicates that give us snapshot views of the world. Plus we then have our state transition model that shows us how states transform based on actions performed on them and we rely on these principles to then drive us through the state space through a forward search algorithm such as breadth-first, depth-first, best-first, A*, EHC, and so on.

But there are alternatives to this approach. We already have the likes of RPG and of course GraphPlan from which it was devised, and we have other variations of planning in other chapters of this segment of the module such as PoP and HTN planning. But we’re actually overlapping with something that is covered in the segment on SAS+ planning, where if we re-encode the problem into another formulation, can we then solve it that way?

Boolean Satisfiability Problems (SAT)
Encoding a problem in such a way that there exists a valid interpretation.

Create a Boolean expression comprised of variables with operators applied to them.

If we can find a valid combination of values for this expression, we can consider that SAT problem to be satisfiable.


Boolean satisfiability problems, often also referred to as propositional satisfiability problems – which I will now refer to by their abbreviation, SAT – are a form of deductive reasoning whereby we attempt to encode a problem into a more succinct expression – in this case, as the name implies, a Boolean expression (or propositional logic formula). This means we have a series of variables with operators applied to them, and we aim to assign values to specific variables such that the overall formula evaluates to true.

In the event the formula can evaluate to true, we consider the SAT problem to be satisfiable, given there is a valid configuration of variable values. However, in the event that is not possible, then we consider the formula to be unsatisfiable.

So, I’ve provided some very trivial examples here on the right-hand side. The first, F = A AND NOT B, is easily satisfiable: if we make A true and B false, then the conjunction means that F will be true.
However, the second SAT problem is not satisfiable. Why is this? Well, it’s asking for a valid combination where A AND NOT A returns true. This is simply impossible, since if A is true, NOT A is false, and vice versa. This formula will always evaluate to false.
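To make these two toy formulas concrete, here is a minimal brute-force satisfiability check (a sketch in Python, not part of the original slides) that simply tries every truth assignment – exponential in the number of variables, but enough to confirm the examples above:

```python
from itertools import product

def satisfiable(variables, formula):
    """Try every truth assignment; return a satisfying one, or None."""
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None

# F1 = A AND NOT B: satisfiable, e.g. A=True, B=False
print(satisfiable(["A", "B"], lambda v: v["A"] and not v["B"]))

# F2 = A AND NOT A: unsatisfiable, no assignment works
print(satisfiable(["A"], lambda v: v["A"] and not v["A"]))
```

Real SAT solvers of course do far better than enumeration, but the satisfiable/unsatisfiable distinction is exactly this.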

Now for those of you who studied Introduction to Artificial Intelligence, you will remember the idea of solving constraint satisfaction problems, or CSPs: the idea that we have tight constraints on the configuration of variables such that we need to find a valid configuration that satisfies our criteria. SAT solving is one form of constraint satisfaction, and as mentioned in that class, it is possible for a planning problem to be formulated as a constraint satisfaction problem – and that’s really what we’re focussing on here.

Why Use SAT for Planning?
SAT solving is a broad and rich research domain in and of itself.

Typically used in software verification, theorem proving, model checking, pattern generation etc.

There are existing SAT solvers established within this community that can be adopted.

So how do we reformulate a planning problem into a SAT problem?


So why would you then use SAT as a means to solve a planning problem? Well, the main reason is that they have significant overlaps. In both cases you’re trying to find assignments of variables that yield a positive/true outcome over a given function. In the context of planning, we’re trying to find a valid configuration of variables at different stages of the planning process such that it yields a true outcome – meaning there is a combination of states that, when put in the right order, satisfies the constraints of the actions in the planning domain and enables us to transition from the initial state of the problem to the goal.

And SAT is a fully formed and heavily explored area of computer science in its own right. There are – much like planning – a variety of different SAT solvers that are approaching the problem in different ways. And it is a problem area used in situations such as software verification and model checking. SAT solving is a research field in and of itself, so why not try and exploit it.

But now that I have hopefully convinced you it’s a good idea, let’s start to look at how we formulate the planning problem as a SAT problem.

Planning as a SAT Problem
Problem: Our planning formulation P = (Σ, s_I, s_g)

Idea: Propositional variables for each state variable.
Create unit clauses that specify…
The initial state.
The goal state.
Create clauses to describe how variables can change between two time points t and t+1.

Solution: Create a SAT formula Φ such that…
If Φ is satisfiable, then a plan exists for the planning problem P
Every possible solution for Φ results in a different valid plan for P


Okay, so how are we now going to actually do this? How does it work? Well, if we have this planning problem P, which is comprised of the state transition system Σ, the initial state s_I and the goal state s_g, we’re going to create clauses that express the variables that hold in the initial state s_I, as well as what variables are assigned in the goal state.

After that, we also need clauses that describe how the variables can possibly change – these are of course our actions – denoting how a given variable will change between two time points t and t+1.
Once we have these clauses, the trick is to find a SAT formula Φ, which we then aim to satisfy.

If we can satisfy Φ, then we know that a plan exists for the planning problem P.
But also, every possible satisfiable solution to Φ will – when processed back into a planning problem – provide us with a different valid plan for the original problem.

SAT Encoding: Variables
State Variables
A set of propositional variables V that encode the state after i steps of the plan.
There is a finite number of steps of the plan, up to a limit T: the planning horizon

Action Variables
A set of propositional variables A that encode the actions applied at the i-th step of the plan.


So we’re going to need to find a way to encode the information from each of the states as well as the actions, the clauses we mentioned on the previous slide.
We will have a set of propositional variables V where each variable, denoted v_i, encodes a piece of information about the current state at a specific stage of the plan, i.
In order for this to work, we have to assume a finite number of steps of the plan, up to a limit T. This limit is our planning horizon. We’re going to gradually increment the horizon over time, allowing the planning process to actually occur.

Meanwhile, we will also provide a separate set of variables that represent the actions – denoted A – which will show actions that are applied at a particular step of the plan, again denoted as i.
Now it’s important to note the ranges within which these exist. The state encodings run from 0 up to T, which allows us to show v_0, the initial state, up to v_T, the most recent state as a result of the most recent action. However, the actions run only from 1 to T, given a_1 is the first action set applied from v_0, while a_T is the last action set before v_T. So if you have a planning horizon of 5, then you will encode 6 state sets and 5 action sets.
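That variable bookkeeping can be sketched as follows (a hypothetical helper, not from the slides), tagging each fact and action name with the time step it refers to:

```python
def make_variables(facts, actions, horizon):
    """State variables at steps 0..T, action variables at steps 1..T."""
    state_vars = [f"{f}@{t}" for t in range(horizon + 1) for f in facts]
    action_vars = [f"{a}@{t}" for t in range(1, horizon + 1) for a in actions]
    return state_vars, action_vars

facts = ["at(r,A)", "at(r,B)"]
actions = ["move(r,A,B)", "move(r,B,A)"]
state_vars, action_vars = make_variables(facts, actions, 5)

# Horizon 5: six "state sets" (steps 0..5) and five "action sets" (1..5)
print(len(state_vars) // len(facts), len(action_vars) // len(actions))
```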

State Variables
Our planning formulation

Initial State
Unit clauses encoding the initial state.

Goal State
Unit clauses encoding the goal state.


Now if we consider this in the context of our planning formulation, we’re going to – in the case of the initial state – create propositional variables for all of the facts that are true at timestep 0 in the initial state, but also all of the facts that are not true in the initial state. This is important, given we’re not relying on the closed-world assumption here; we’re making sure to expressly communicate whether facts are true or false at this particular timestep.

Then for the goal state, we’re going to express what facts are true at the horizon point T. Here – like usual – we’re only interested in the facts that are critical to the goal state. No extraneous facts are required for this formalism.
Now – you may have already observed that if we’re trying to encode information at horizon point T, how do we know where that is? Essentially, T is going to be an estimate of goal distance, is it not? After all, if we have a plan of 5 actions, then the goal state is only satisfiable at a horizon of 5, right? Unless we consider concurrent actions at a given time point.

Well, as we’re going to see later in this chapter, we’re going to use an iterative deepening search to find it. So we will set T to be 1 to start with and then see if the SAT problem is satisfiable at that point. But we’re going to get to that in a bit; we still have a lot of ground to cover before we look at whether or not this is solvable.
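The iterative deepening over the horizon can be sketched as a simple loop; here `encode` and `solve` are hypothetical stand-ins for a real encoder and SAT solver:

```python
def plan(encode, solve, max_horizon=30):
    """Grow the horizon T until the encoding becomes satisfiable."""
    for T in range(1, max_horizon + 1):
        model = solve(encode(T))   # encode the problem at horizon T
        if model is not None:      # satisfiable: a plan of length T exists
            return T, model
    return None                    # no plan found within max_horizon

# Toy stand-ins: pretend the goal is only reachable at horizon >= 3.
result = plan(lambda T: T, lambda f: "model" if f >= 3 else None)
print(result)
```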

State Variables (Example)
Simple Gripper domain problem.
Two locations, one box to move from A to B.

Encoding for T = 1.

Initial State

Goal State


So if we consider a simple example, where we have the Gripper domain, with one robot, two free grippers and it’s trying to move boxes from one room to another. I have set this up so we only have one box just to make it nice and easy for us to process.

In this instance, the initial state is that the robot r is at location A, the box b1 is at location A, and that the left and right grippers are free.

For the SAT encoding at T = 1, we encode the initial state as all of the facts that are true at timestep 0, and we denote the timestep in each of these statements. So in this case, we show in the middle of the slide that r is at loc A, b1 is at loc A, and the left and right grippers are free at time step 0. But also, that r is not in loc B and that b1 is not in loc B.

Meanwhile, when we then look to encode the goal, which is for b1 to be in room B, the encoding is much simpler to express.
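These unit clauses can be generated mechanically. A sketch (helper names are assumptions; a clause here is just a (variable, truth-value) pair):

```python
def initial_clauses(true_facts, all_facts):
    """Unit clauses at step 0: every fact is asserted true or false,
    since we are not relying on the closed-world assumption."""
    return [(f"{f}@0", f in true_facts) for f in all_facts]

def goal_clauses(goal_facts, horizon):
    """Unit clauses at the horizon: only goal-relevant facts appear."""
    return [(f"{f}@{horizon}", True) for f in goal_facts]

all_facts = ["at(r,A)", "at(r,B)", "at(b1,A)", "at(b1,B)",
             "free(left)", "free(right)"]
init = ["at(r,A)", "at(b1,A)", "free(left)", "free(right)"]

for clause in initial_clauses(init, all_facts):
    print(clause)                  # e.g. ('at(r,B)@0', False)
print(goal_clauses(["at(b1,B)"], 1))
```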

But we still have a lot to do here, next we still need to express how we encode the actions of the planner.

SAT Encoding: Actions
For all actions that exist in the domain, we need to encode the preconditions and the add and delete effects of the action.

But, we need to do it for each time step up to the planning horizon T.

Sub formulas for encoding preconditions, add/delete effects


So now that we know how to encode states, what about actions, let’s look at the formalism for this….

Now, not unlike any other action formalism, we need to encode the preconditions of the actions as well as the add and delete effects. However – and this is the big thing here – this time we’re going to do it for each time step up to the planning horizon.

So as we can see here, it’s all of the facts that exist at a given timestep i, which is at most one less than the horizon – given that the effects of the action will apply one time step later.
We list all of the facts that must be true at timestep i, followed by what facts are true at time step i+1 thanks to the add effects, as well as the facts that are no longer true as a result of the delete effects of the action.

Actions (Example)
Gripper Actions (valid in initial state)
Pickup Box b using Gripper g on Robot r in Location l.
Move Robot r from Location l1 to Location l2.

SAT Encoding (at time step 0)


Now if we return to the gripper problem, we have two actions we can execute in the initial state: pickup, where we grab a box with one of the robot’s grippers, or the move action, which allows us to move from one location to another.

So as we can see here in this SAT encoding, we can identify that for a pickup action at time step 0, we need the box to be in the location at timestep 0 and for the gripper to be free. But subsequently, on the next time step, we have the gripper holding the box, the box is no longer in the location, and the gripper is no longer free.

The same applies to our move action. We identify what facts are true at time step 0 – in this case, that the robot is at the location – and the post effects, where we add the new fact at time step 1 that the robot is in the new location, and of course the old fact is no longer true at the next time step.
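The precondition/effect encoding just described can be sketched as clause generation. Here a literal is a (variable, polarity) pair, and each clause is a disjunction encoding one implication a@t → lit; the helper name and literal convention are assumptions, not the slides’ notation:

```python
def action_clauses(name, pre, add, delete, t):
    """CNF clauses for one ground action at step t: taking the action
    implies its preconditions at t, its add effects at t+1, and the
    negation of its delete effects at t+1."""
    not_a = (f"{name}@{t}", False)   # left-hand side of each implication
    clauses = []
    for p in pre:
        clauses.append([not_a, (f"{p}@{t}", True)])      # a -> p@t
    for e in add:
        clauses.append([not_a, (f"{e}@{t+1}", True)])    # a -> e@(t+1)
    for d in delete:
        clauses.append([not_a, (f"{d}@{t+1}", False)])   # a -> NOT d@(t+1)
    return clauses

# pickup(b1, left, r, A) at time step 0, as in the slide's example
for c in action_clauses("pickup(b1,left,r,A)",
                        pre=["at(b1,A)", "free(left)"],
                        add=["holding(left,b1)"],
                        delete=["at(b1,A)", "free(left)"], t=0):
    print(c)
```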

So this covers a lot of the information we require, but there are still a couple of things missing. Namely, we’re not capturing information about how actions contradict one another – which would help us reduce the solution space. Also, while the actions help us encode how information transitions from one time step to another, what about the information that doesn’t change between time steps? For example, in the case of the pickup action, the location of the robot didn’t change between time steps, but I didn’t encode that information in time step 1. Also, in the case of the move action, the box isn’t mentioned at all, but I should be able to express that box b1 is still in location A at time step 1.

This is what is commonly referred to as The Framing Problem

The Framing Problem
We need a way to capture information that does not change between time steps t and t+1.

Two approaches to solving this…
Classical Frame Axioms
State which facts are not affected by each action.
Must enumerate all fact/action pairs where the fact is not affected by the action.
Relies on one action being executed per time-step so axioms can be applied.

Explanatory Frame Axioms
Instead of listing what facts are not changed, explain why a fact might have changed between two time steps.
i.e. if a fact changes between t and t+1, then an action at step t must have caused it.


The Framing Problem is the situation where we also need to capture the information that doesn’t change between time steps. Otherwise we’re barrelling towards a misunderstanding as information is not propagated forward to future timesteps because no action induced a change. This is actually an increasingly challenging issue to address as problems get bigger.
How do we model that the information is not changing between states?

There are two common approaches for how to address this: the classical frame axiom formulation and the explanatory frame axiom approach. I will be focussing more on the explanatory approach, but I wanted to briefly introduce you to both concepts.

First of all, in the classical frame axiom approach, we simply have to state which facts are not affected by each action.
But of course this means we need to list, for every fact that isn’t affected by an action at a given time step, that this fact should persist into the next time step. This actually has two problems: first, we need to sit there and add all of that knowledge; second, we need at least one action to be executed per time step, otherwise we don’t know what holds true about a fact in the next time step. And if you have a planning problem where two actions could be executed on the same time step, then you will wind up running the same fact through two frame axioms.

Meanwhile, explanatory axioms go a different route. Instead of listing all the information that doesn’t change, we list the ways in which a piece of information could have changed between timesteps – and more precisely, we identify what action would have caused that information to change as we moved between time steps…

Explanatory Frame Axioms
Enumerate the set of actions that could result in a fact changing between time steps.


So with explanatory frame axioms, we’re going to look at how a fact changes between two time steps.
In this case, if we look at the actions from the example, we identify that if the fact that r1 is at location B is false at time step 0 but true at time step 1, then that must mean the robot was part of a move action at time step 0 – specifically, the move action taking r1 from loc A to loc B.

Now this is still rather verbose, given we’d still need to list all of the possibilities that could emerge at a given time step. Hence you’re going to have a lot of facts changing between time steps, especially as the branching factor of the state space increases.
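An explanatory frame axiom pair for one fact can be sketched like this (same (variable, polarity) literal convention as before; `adders`/`deleters` are the actions whose add/delete effects touch the fact):

```python
def explanatory_frame_axioms(fact, adders, deleters, t):
    """If `fact` flips between t and t+1, some action at step t must
    explain it. In CNF: (f@t OR NOT f@(t+1) OR adder actions) covers the
    fact becoming true; the mirror clause covers it becoming false."""
    became_true = ([(f"{fact}@{t}", True), (f"{fact}@{t+1}", False)]
                   + [(f"{a}@{t}", True) for a in adders])
    became_false = ([(f"{fact}@{t}", False), (f"{fact}@{t+1}", True)]
                    + [(f"{a}@{t}", True) for a in deleters])
    return [became_true, became_false]

# If at(r,B) becomes true between steps 0 and 1, move(r,A,B) must have run.
for clause in explanatory_frame_axioms("at(r,B)",
                                       adders=["move(r,A,B)"],
                                       deleters=["move(r,B,A)"], t=0):
    print(clause)
```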

However, this approach actually enables parallelism in the planning process, given two actions could be executed in parallel if their preconditions hold at time step t and their effects don’t conflict. You can identify such conflicts by checking whether an effect of one action appears within the explanatory frame axioms of another action.

But, that said, parallelism is also going to be an issue, given you may end up with actions that hold valid preconditions but negate each other’s effects on execution. So how do we capture this?

Exclusion Axioms
Returning to the notion of mutual exclusion, we identify what actions cannot occur at the same time.

Complete Exclusion Axiom: only one action at a time.

Conflict Exclusion Axiom: prevents invalid actions on the same timestep.
More on this topic, partial-order planning, in another chapter.


So we need to return to this idea of mutual exclusion once again – a topic that we saw when we looked at SAS+ planning and more recently in GraphPlan – where in this instance we’re going to identify actions that cannot be executed on the same time step.

So we have what are known as complete exclusion axioms, and these will help us identify which actions cannot be executed on the same time step. These constraints will essentially enforce a total-order plan, whereby if we list every possible exclusion axiom, then only one action can be executed at any time step. This also makes sense in the context of our gripper example, given that none of the actions of picking up a box or moving the robot can be executed on the same time step.
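Complete exclusion axioms are just a pairwise “at most one action per step” constraint, which can be sketched as:

```python
from itertools import combinations

def complete_exclusion(actions, t):
    """For every pair of actions a, b at step t, emit the clause
    (NOT a@t OR NOT b@t), so no two actions can share a time step."""
    return [[(f"{a}@{t}", False), (f"{b}@{t}", False)]
            for a, b in combinations(actions, 2)]

acts = ["pickup(b1,left,r,A)", "pickup(b1,right,r,A)", "move(r,A,B)"]
print(len(complete_exclusion(acts, 0)))   # 3 actions -> 3 pairwise clauses
```

Note the quadratic growth: n actions produce n(n-1)/2 clauses per time step, which is one reason conflict exclusion (only excluding pairs that actually conflict) is attractive.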

However, there is also the possibility of using conflict exclusion axioms, whereby two actions conflict if either their preconditions contradict or one action’s preconditions are not consistent with the other’s effects. This would allow us to solve plans in a partial-order fashion, which is a topic covered in another chapter.
