Classical Planning
Building Heuristics from RPG
6CCS3AIP – Artificial Intelligence Planning
Dr Tommy Thompson
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Hello and welcome to this chapter on classical planning. In this chapter we’re going to look at how we can expand on the RPG heuristics to come up with two new heuristics that help explore the search space in different ways.
Domain Independent Heuristics
Building heuristics that can adapt to different domains/problems.
Not reliant on specific information about the problem.
We analyse aspects of the search and planning process to find potential heuristics.
As we’ve discussed throughout this segment, we’re dealing with the idea of domain independent heuristics. We want to find a way to calculate quick and useful pieces of information in our heuristics such that we can move through the search space effectively.
Delete Relaxation
Delete Relaxation
Estimate cost to the goal by removing negative effects of actions.
i.e. any PDDL effect that removes a fact is no longer considered.
Example: FreeCell Solitaire
Free cells, tableau positions remain available after moving cards onto them.
Cards remain movable and remain valid targets for other cards after moving cards on top of them.
To this end, we’ve looked at delete relaxation, whereby we remove the negative effects from our actions. By removing the negative effects, we create a simplified and unrealistic version of the problem, known as the relaxed problem. We know that solving the relaxed problem – while usually easier – is a good metric for estimating whether we’re moving towards the solution.
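As a small illustration of the idea above, here is a minimal, hypothetical sketch of delete relaxation, assuming STRIPS-style actions with precondition, add, and delete sets. The FreeCell-flavoured action name and fact names are invented for illustration.

```python
from collections import namedtuple

# A toy STRIPS-style action: preconditions, add effects, delete effects.
Action = namedtuple("Action", ["name", "pre", "add", "delete"])

def relax(action):
    """Delete-relax an action: keep preconditions and add effects,
    drop the delete list entirely. Once a fact becomes true in the
    relaxed problem, it stays true."""
    return Action(action.name, action.pre, action.add, frozenset())

# Invented FreeCell-style move: stacking a card normally makes the card
# underneath no longer clear; the relaxed version forgets that effect.
move = Action("move-c5-onto-c6",
              pre=frozenset({"clear-c5", "clear-c6"}),
              add=frozenset({"on-c5-c6"}),
              delete=frozenset({"clear-c6"}))

relaxed = relax(move)
```

After relaxation, `relaxed.delete` is empty, so `clear-c6` is never removed: in the relaxed FreeCell, cards remain valid targets even after other cards are placed on top of them, exactly as the slide describes.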
Relaxed Planning Graph Heuristic
Relaxed Planning Graph (RPG) heuristic creates a simple layered approach to exploring the state space.
Inspired by the GraphPlan system
GraphPlan covered later in this module.
For reference, see (Blum & Furst, 1995)
The relaxed solution plan
h(s) := ∑i |Oi|
where each Oi is the set of actions selected in parallel at time step i, and m is the number of the first fact layer containing all goals.
Blum, A. L., & Furst, M. L. (1995). Fast planning through planning graph analysis.
In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI95), pp. 1636
And this was of course the basis of our RPG heuristic: we calculate the relaxed plan solution by enumerating fact and action layers until we find the goal, then iterate backwards from that point to see how many sets of actions exist at each time step. The number of relaxed planning layers gives us an admissible heuristic, even if the number of actions itself isn’t always admissible.
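The layered construction and backward extraction described above can be sketched in a few lines. This is an illustrative toy, not FF’s implementation: fact layers are grown under delete relaxation until the goals appear, then a relaxed plan is extracted backwards and its actions counted. The two-step domain at the bottom is invented.

```python
def rpg_heuristic(state, goals, actions):
    """actions: iterable of (name, preconditions, add_effects),
    with preconditions/effects given as frozensets of facts."""
    layers = [frozenset(state)]          # fact layer 0
    applicable = []                      # actions applicable at each layer
    while not goals <= layers[-1]:
        facts = layers[-1]
        usable = [a for a in actions if a[1] <= facts]
        new_facts = facts | {f for a in usable for f in a[2]}
        if new_facts == facts:
            return float("inf")          # goals unreachable, even relaxed
        applicable.append(usable)
        layers.append(frozenset(new_facts))
    # Backward pass: for each outstanding goal, pick an achiever at the
    # first layer where it appears, then add its preconditions as subgoals.
    h, needed = 0, set(goals)
    for i in range(len(layers) - 1, 0, -1):
        to_achieve = {g for g in needed if g not in layers[i - 1]}
        chosen = set()
        for g in to_achieve:
            for a in applicable[i - 1]:
                if g in a[2]:
                    chosen.add(a)
                    break
        h += len(chosen)                 # |Oi| for this time step
        for a in chosen:
            needed |= a[1]
        needed -= to_achieve
    return h

# Invented two-step domain: a enables b; b enables both goals d and e.
acts = [("a-to-b", frozenset({"a"}), frozenset({"b"})),
        ("b-to-d", frozenset({"b"}), frozenset({"d"})),
        ("b-to-e", frozenset({"b"}), frozenset({"e"}))]
```

On this domain `rpg_heuristic({"a"}, {"d", "e"}, acts)` selects one action in the first step and two in parallel in the second, giving a heuristic value of 3.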
What Else Can We Gain From Relaxation?
Relaxed Planning Graph (RPG) heuristic gives us useful information from the relaxed problem.
The relaxed solution is one useful piece of information we can grab from the planning graph.
But is there any other useful information we can gather here?
But what else can we do with this information? The relaxed planning graphs visualise this breakdown of information as we go from the initial set of facts to the goal facts. At present we’re only interested in focussing on the number of action layers and the number of actions within those layers. But there is a lot of information in here that we could still use, so let’s look at some additional heuristics that we can pull out of a relaxed planning problem.
What Else Can We Gain From Relaxation?
One thing we have ignored throughout is the cost of actions involved.
If we factor the costs of actions in the planning graph, we can learn more about how close a given state is to the goal.
This works even in state spaces with uniform action costs.
Let’s consider two heuristics: hAdd and hMax.
So one thing we have completely ignored to-date is the notion of the costs of actions. If you recall from the earlier work in areas such as A* search, we’re interested in how expensive an action is.
A* calculates the g value which is the cost of reaching this state based on the path taken, as well as the heuristic value.
But everything we’ve looked at since is only interested in the heuristic value. If we can still use those costs inside the heuristic, then they will still prove useful to us.
Now you might be wondering why we don’t just use the costs as is, but they’re no longer accurate in the relaxed problem. But still, we can gain some useful information from them in a relaxed planning graph. Even in circumstances where the action costs are uniform – meaning they all have the same value.
To this end, we’ll take a look at two specific heuristics: hAdd and hMax – two heuristics that aim to calculate how expensive it is to reach a given state variable.
Understanding Costs
Consider the cost of the actions in order to satisfy a fact in our fact layers.
If the facts are already achieved, the value is 0.
If the facts are not achieved, the value is ∞
So let’s look at our planning graph in a slightly different way. And start thinking about the actions we require to take in order for the facts to become true, as well as the cost of those actions along the way.
If we’re in a situation where all the facts are currently achieved, such as the initial fact layer, it has a value of 0, given we didn’t have to execute any actions to achieve those facts.
But if there is a fact that isn’t satisfied yet, it has a value of infinity. Because we haven’t reached it yet and don’t know how expensive it will be to do so.
Understanding Costs
Consider the cost of the actions in order to satisfy a fact in our fact layers.
If the facts are already achieved, the value is 0.
If the facts are not achieved, the value is ∞
Image Credit: “Heuristics for Planning”, Keyder & Bonet, ICAPS 2009
[a = 0, b = ∞, c = ∞, d = ∞, e = ∞, G = ∞]
So let’s consider this different approach, whereby we visualise all of the facts that can become true during the relaxed planning process. Between each of these facts are the costs of the actions that achieve them.
Our goal G is to have both d and e true. So it’s important to factor in how expensive each of those actions is, as this tells us how close we are to the goal based on the facts that are emerging and the costs it takes us to achieve them.
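The initialisation described above is a one-liner. A hypothetical sketch, using the fact names from the slide:

```python
INF = float("inf")

def initial_costs(all_facts, achieved):
    """Facts already true cost 0; facts not yet achieved start at
    infinity, since we do not yet know how expensive they are."""
    return {f: (0 if f in achieved else INF) for f in all_facts}

# Reproduces the slide's starting table for state {a}.
costs = initial_costs({"a", "b", "c", "d", "e"}, achieved={"a"})
```

This yields the slide’s starting table [a = 0, b = ∞, c = ∞, d = ∞, e = ∞]; the propagation step that turns these into finite values is where hAdd and hMax differ.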
hAdd (additive) heuristic
hAdd = We sum the value of all costs to reach a given fact.
Inadmissible, given its assumption of goal independence.
[a = 0, b = 2, c = 7, d = 6, e = 6, G = 12]
Image Credit: “Heuristics for Planning”, Keyder & Bonet, ICAPS 2009
So let’s consider hAdd, sometimes referred to as the additive heuristic, where we add up the costs of all actions required to get us to the point we are at now. This can be computed using Dijkstra’s algorithm, whereby we sum the cost of actions and maintain the lowest total costs. Then for the final goal set, we simply add up the total costs reached to get there.
Hence while a is 0, b is 2, given that to reach this fact there was an action with a cost of 2. c has a value of 7, given it’s the sum of the original cost of 2 to achieve b, plus the extra action with a cost of 5. Note that both d and e are lower here, given that the other actions we could execute after reaching b enable us to achieve d and e with a value of 6, instead of 8 if we went via c. And the goal facts G are satisfied when both d and e are satisfied, hence it takes the sum of those two values for an overall value of 12.
In essence, what we’re doing here is adding up the heuristic values of all the preconditions required to achieve the goal set. But, and this is the main part, we’re not adding them up admissibly.
This approach more closely reflects the cost of satisfying each individual goal and all of its preconditions, but the knock-on effect is that the heuristic is now inadmissible.
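The Dijkstra-style roll-out mentioned above can be sketched as a simple fixpoint computation. This is an illustrative sketch, not FF’s implementation, and the action triples below are invented to reproduce the slide’s cost table.

```python
INF = float("inf")

def h_add(state, goals, actions):
    """actions: iterable of (cost, preconditions, add_effects).
    Propagates fact costs to a fixpoint, where achieving a fact costs
    the action's own cost plus the SUM of its preconditions' costs."""
    cost = {}
    for c, pre, add in actions:
        for f in pre | add:
            cost.setdefault(f, 0 if f in state else INF)
    for g in goals:
        cost.setdefault(g, 0 if g in state else INF)
    changed = True
    while changed:
        changed = False
        for c, pre, add in actions:
            pre_cost = sum(cost[p] for p in pre)
            if pre_cost == INF:
                continue
            for f in add:
                if c + pre_cost < cost[f]:
                    cost[f] = c + pre_cost
                    changed = True
    # hAdd: treat the goals as independent and sum their costs,
    # which is why shared actions get counted more than once.
    return sum(cost[g] for g in goals)

# Invented costs matching the slide: a->b costs 2, b->c costs 5,
# b->d and b->e cost 4 each.
acts = [(2, frozenset({"a"}), frozenset({"b"})),
        (5, frozenset({"b"}), frozenset({"c"})),
        (4, frozenset({"b"}), frozenset({"d"})),
        (4, frozenset({"b"}), frozenset({"e"}))]
```

On this data `h_add({"a"}, {"d", "e"}, acts)` yields 6 + 6 = 12, matching the slide, even though a real relaxed plan reuses the a-to-b action and costs only 10.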
hAdd (additive) heuristic
hAdd = We sum the value of all costs to reach a given fact.
Inadmissible, given its assumption of goal independence.
[G = (d,e)] relaxed plan cost is 10
Image Credit: “Heuristics for Planning”, Keyder & Bonet, ICAPS 2009
Now we know this is inadmissible, because if we were to let FF solve it, it would generate the plan that we see here, where the total cost is actually 10.
Given the nature of the heuristic, it means that we’re counting the actions several times over when we calculate the heuristic value.
Now this works fine, if we run on the assumption that the goals are completely independent of one another.
But in most planning situations, they’re not: you will typically work towards satisfying one goal while working on another, so there are a lot of shared actions. Even in the example you see here, the action that achieves fact b from fact a is shared by the paths to both d and e, hence its cost is counted twice. This is why, in this particular example, hAdd proves to be inadmissible.
Ultimately, it’s no longer admissible because actions are counted several times over when computing our heuristic value.
hMax heuristic
hMax = The most expensive path to reach a given fact.
Admissible, given it is based on the most expensive path to achieve G.
When costs are uniform (i.e. all equal to 1), hMax is simply the number of RPG layers.
[a = 0, b = 2, c = 7, d = 6, e = 6, G = 6]
Image Credit: “Heuristics for Planning”, Keyder & Bonet, ICAPS 2009
Now let’s consider this another way, what if we only calculate the most expensive path to each fact? This is the hMax heuristic, where we’re only interested in just how expensive it was to reach that goal state. Or in other words, we’re simply counting the max of all the heuristic values of the goal preconditions.
So we calculate as before, only this time when we reach the goal set and they’re satisfied, we simply pick the most expensive of the paths that reached our satisfied goal set. Hence in this case, the value is 6.
Now this means that hMax is an admissible heuristic: achieving all of the goals can never cost less than achieving the most expensive one, so it is a guaranteed lower bound. But the thing is, it’s not terribly informative. It doesn’t tell us all that much about how far we are from the goal. One of the reasons for this is that there is an implicit assumption here that the goals – or rather the actions required to satisfy them – interweave to the point that, by basing the estimate on the most expensive path, it is a safe bet that this is how long it will take us to satisfy all of the goals.
And that’s a problem, as it is not all that accurate, especially once you have independent goals. What do we mean by that? Well, if we have independent goals, it means that the actions we need to achieve all of the goals don’t really overlap with one another. So even in this trivial case, it’s relying on the assumption that all of our facts lie on the same path. Hence it’s still reasonably informative for us here.
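A minimal sketch of hMax, again with action triples invented to reproduce the slide’s table. The only substantive differences from an additive-style fixpoint are taking the MAX over an action’s precondition costs and the max over the goal facts.

```python
INF = float("inf")

def h_max(state, goals, actions):
    """actions: iterable of (cost, preconditions, add_effects).
    An action's precondition cost is the MAX over its preconditions,
    and the heuristic value is the max over the goal facts: the
    cheapest way to achieve the most expensive goal."""
    cost = {}
    for c, pre, add in actions:
        for f in pre | add:
            cost.setdefault(f, 0 if f in state else INF)
    for g in goals:
        cost.setdefault(g, 0 if g in state else INF)
    changed = True
    while changed:
        changed = False
        for c, pre, add in actions:
            pre_cost = max((cost[p] for p in pre), default=0)
            if pre_cost == INF:
                continue
            for f in add:
                if c + pre_cost < cost[f]:
                    cost[f] = c + pre_cost
                    changed = True
    return max(cost[g] for g in goals)

# Invented costs matching the slide: a->b costs 2, b->c costs 5,
# b->d and b->e cost 4 each.
acts = [(2, frozenset({"a"}), frozenset({"b"})),
        (5, frozenset({"b"}), frozenset({"c"})),
        (4, frozenset({"b"}), frozenset({"d"})),
        (4, frozenset({"b"}), frozenset({"e"}))]
```

Here `h_max({"a"}, {"d", "e"}, acts)` returns 6: both goals cost 6 to reach, and taking the maximum never overestimates the true cost of achieving them all, which is why the estimate stays admissible.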
And that wraps up this video on additional heuristics that we can derive from our relaxed planning problem. Thanks for watching and we’ll see you in the next one.