Classical Planning
RPG Heuristic in the FF Planner
6CCS3AIP – Artificial Intelligence Planning
Dr Tommy Thompson
FACULTY OF NATURAL & MATHEMATICAL SCIENCES
DEPARTMENT OF INFORMATICS
Hello and welcome to this third chapter on classical planning. In this chapter we’re going to look at how to use the RPG heuristic as part of a planning system.
Domain Independent Heuristics
Building heuristics that can adapt to different domains/problems.
Not reliant on specific information about the problem.
We analyse aspects of the search and planning process to find potential heuristics.
As we discussed in the previous chapter, we want domain-independent heuristics: a means of evaluating any given state without hand-crafted, problem-specific knowledge. We also want to compute these estimates in polynomial time, given that planning itself is at least NP-hard. So how do we reach this point?
Delete Relaxation
Delete Relaxation
Estimate cost to the goal by removing negative effects of actions.
i.e. any PDDL effect that removes a fact is no longer considered.
Example: FreeCell Solitaire
Free cells, tableau positions remain available after moving cards onto them.
Cards remain movable and remain valid targets for other cards after moving cards on top of them.
So we considered the idea of solving relaxed planning problems. We do this by ignoring the delete effects of actions, which gives us what are, fundamentally, broken planning problems, but ones we can solve much faster than the originals.
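The effect of the relaxation can be sketched in a few lines of Python. This is a minimal illustration only: the Action class and the fact names are invented for the example, and are not FF's actual representation.

```python
# Illustrative STRIPS-style action with explicit delete effects; the
# class and fact names are invented, not FF's actual data structures.
class Action:
    def __init__(self, name, pre, add, delete):
        self.name = name
        self.pre = frozenset(pre)        # preconditions
        self.add = frozenset(add)        # positive (add) effects
        self.delete = frozenset(delete)  # negative (delete) effects

def apply(state, action):
    """Ordinary STRIPS application: delete effects remove facts."""
    return (state - action.delete) | action.add

def apply_relaxed(state, action):
    """Delete-relaxed application: the set of true facts only ever grows."""
    return state | action.add

# Moving a card onto a free cell normally consumes the cell; under the
# relaxation the cell stays available, as in the FreeCell example.
move = Action("move-card-to-cell",
              pre={"card movable", "cell free"},
              add={"card in cell"},
              delete={"cell free"})
s = frozenset({"card movable", "cell free"})
print(sorted(apply(s, move)))          # 'cell free' is gone
print(sorted(apply_relaxed(s, move)))  # 'cell free' is kept
```

Because relaxed application never removes facts, repeated application reaches a fixpoint, which is what makes relaxed problems solvable in polynomial time.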
Relaxed Planning Graph Heuristic
Relaxed Planning Graph (RPG) heuristic creates a simple layered approach to exploring the state space.
Inspired by the GraphPlan system
GraphPlan covered later in this module.
For reference, see (Blum & Furst, 1995)
The relaxed solution plan
Where each O_i is the set of actions selected in parallel at time step i, and m is the number of the first fact layer containing all goals:
h(s) := ∑_{i=0}^{m−1} |O_i|
Blum, A. L., & Furst, M. L. (1995). Fast planning through planning graph analysis.
In Proceedings of the 14th International Joint Conference on Artificial Intelligence (IJCAI95), pp. 1636
And this was of course the basis of our RPG heuristic: we calculate a relaxed solution plan by enumerating fact and action layers until we find the goal, then iterate backwards from that point to count the sets of actions selected at each time step. Note that the extracted relaxed plan is not guaranteed to be the shortest one, so the resulting estimate is not admissible in general; the optimal relaxed plan length (h+) would be admissible, but computing it is itself NP-hard.
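The two phases can be illustrated with a minimal Python sketch: build fact layers forward until all goals appear, then extract a relaxed plan backwards and count the selected actions. The Action tuple, function names, and toy domain are all invented for this example; FF's real implementation is considerably more optimised.

```python
from collections import namedtuple

# Illustrative STRIPS-without-deletes action: name, preconditions, add effects.
Action = namedtuple("Action", ["name", "pre", "add"])

def rpg_heuristic(state, goals, actions):
    """Return the relaxed-plan length h(s), or None if the goals are
    unreachable even under the relaxation."""
    # Forward phase: grow fact layers F_0, F_1, ... until goals are in F_m.
    fact_layers = [frozenset(state)]
    action_layers = []
    while not goals <= fact_layers[-1]:
        applicable = [a for a in actions if a.pre <= fact_layers[-1]]
        new_facts = fact_layers[-1].union(*(a.add for a in applicable))
        if new_facts == fact_layers[-1]:
            return None                      # fixpoint reached, goals missing
        action_layers.append(applicable)
        fact_layers.append(new_facts)
    m = len(fact_layers) - 1

    # Backward phase: place each goal at the first layer where it appears,
    # pick one achieving action per goal, and add that action's
    # preconditions as subgoals at their own first layers.
    def first(f):
        return min(i for i in range(m + 1) if f in fact_layers[i])
    goals_at = {i: set() for i in range(m + 1)}
    for g in goals:
        goals_at[first(g)].add(g)
    selected = [set() for _ in range(m)]     # O_0 ... O_{m-1}
    for i in range(m, 0, -1):
        for g in goals_at[i]:
            achiever = next(a for a in action_layers[i - 1] if g in a.add)
            selected[i - 1].add(achiever)
            for p in achiever.pre:
                goals_at[first(p)].add(p)    # subgoals at layer 0 need no action
    return sum(len(layer) for layer in selected)

# Toy domain: p enables q, q enables r.
a1 = Action("get-q", frozenset({"p"}), frozenset({"q"}))
a2 = Action("get-r", frozenset({"q"}), frozenset({"r"}))
print(rpg_heuristic(frozenset({"p"}), frozenset({"r"}), [a1, a2]))  # 2
```

The returned value is exactly the sum ∑|O_i| from the slide: one action at each of the two time steps in the toy domain.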
Domain-Independent Planning: FF
FF is a forward-chaining heuristic search-based planner
FF uses the Relaxed Planning Graph (RPG) heuristic to guide search
This involves finding a plan from the current state S which achieves the goals G but ignores the delete effects of each action
The length of this plan is used as a heuristic value for the state S
J. Hoffmann and B. Nebel 2001 “The FF Planning System: Fast Plan Generation through Heuristic Search” In Journal of AI Research vol 14.
But how do we then apply this in a planner? Well, the Fast-Forward (FF) planning system from 2001 utilises the RPG heuristic as a means to guide its search.
So here we will take a look at how the planner works, how the RPG heuristic is adopted, and the other issues that arise along the way.
Domain-Independent Planning: FF
FF is a forward-chaining heuristic search-based planner
FF uses the Relaxed Planning Graph (RPG) heuristic to guide search.
FF uses both local and systematic search
Enforced hill-climbing (EHC)
Upon termination, enforced hill-climbing either outputs a solution plan or reports that it has failed
If enforced hill-climbing fails, best-first search is invoked
J. Hoffmann and B. Nebel 2001 “The FF Planning System: Fast Plan Generation through Heuristic Search” In Journal of AI Research vol 14.
Relaxed GRAPHPLAN
Enforced Hill-climbing
Goal Distance
Helpful Actions
Solution / “Fail”
Task specification
FF is a forward-search planner that uses the RPG heuristic, searching outwards from the initial state.
Now as discussed, the RPG only provides the heuristic, so how does the search itself work? By default FF runs what is known as enforced hill-climbing, where it aggressively pursues the next best state it finds, with the qualification of 'best' of course being determined by the RPG heuristic. However, this aggressive behaviour can cause problems, so FF also uses best-first search in the event that EHC cannot find a solution. Let's walk through how this works…
Enforced Hill Climbing
Basic idea: lower heuristic = better.
Always try to find states with the best heuristic value seen so far:
Start with best = h(S_init); aim to reach best = 0.
1. Expand a state S.
2. If some successor state S’ has h(S’) < best: set best = h(S’), move to S’, and return to Step 1.
3. If no such successor is found: run breadth-first search until one is found, then return to Step 1.
Enforced hill-climbing runs on the idea that it always seeks an improvement on the current state's heuristic value.
It expands the current state, evaluates the successor states, and takes the first state it finds with a better h-value, making that the new current state.
If no such state can be found, it runs breadth-first search until it finds one, which, as you would expect, is a lot slower and less efficient.
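The loop just described can be sketched as follows, assuming illustrative `h(state)` and `successors(state)` callbacks supplied by the caller; these names and interfaces are invented, not FF's own.

```python
from collections import deque

def ehc(init, h, successors):
    """Enforced hill-climbing: repeatedly breadth-first search from the
    current state for the first strictly better state; finding one at
    depth 1 is ordinary hill-climbing, deeper means escaping a plateau."""
    plan, state, best = [], init, h(init)
    while best > 0:
        frontier = deque([(state, [])])
        seen = {state}
        found = None
        while frontier and found is None:
            s, path = frontier.popleft()
            for act, s2 in successors(s):
                if s2 in seen:
                    continue
                seen.add(s2)
                if h(s2) < best:
                    found = (s2, path + [act])       # strictly better: take it
                    break
                frontier.append((s2, path + [act]))  # equal or worse: plateau
        if found is None:
            return None   # EHC fails; FF would now fall back to best-first
        state, segment = found
        plan += segment
        best = h(state)
    return plan

# Toy example: integer states, the goal is 3, h is distance to the goal.
print(ehc(0, lambda s: abs(3 - s), lambda s: [("inc", s + 1), ("dec", s - 1)]))
# ['inc', 'inc', 'inc']
```

Note the greedy commitment: once a better state is found, everything else on the frontier is thrown away, which is exactly why EHC is fast but incomplete.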
FF – EHC in Practice
So if we start with this first state with a value of 9, it will then find the next child state with a better value, hence this one here with 8. But then it has a problem: it can't find one with a better value. This is what is known as a plateau, given no states in proximity appear to be better than where we currently are in the search space. So it runs breadth-first search until it finds this state with a value of 7 a few layers down, then resumes as normal. Interestingly, there is an issue here: it is possible that this particular path doesn't yield the desired outcome, and that the goal state actually exists in the other subtree that we omitted right at the beginning…
Problems with EHC
EHC can sometimes fail to find a solution
The RPG heuristic can lead it down dead ends
When EHC fails, FF resorts to systematic best-first search from the initial state
FF maintains a priority queue of states (the state with the lowest h(S) value is stored first)
The front state in the queue is expanded each time
The successors are added to the queue, and so on (loop)
Searching on a plateau is expensive
Systematic search is carried out on the part where the heuristic is worst
Given it is running purely on the heuristic, it is possible for FF using enforced hill-climbing to run down dead ends. This is why best-first search is employed as a backup from the initial state. However, this is pretty expensive, given the search effectively has to start all over again. Similarly, breadth-first search on a plateau is also expensive, given that algorithm is entirely uninformed and will explore every option.
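The best-first fallback described on the slide can be sketched with a standard priority queue, again using illustrative `h` and `successors` callbacks rather than FF's real interfaces.

```python
import heapq
import itertools

def best_first(init, h, successors):
    """Greedy best-first search: a priority queue ordered by h(S),
    always expanding the front (lowest-h) state and enqueuing its
    successors, looping until a goal (taken here as h = 0) is popped."""
    tie = itertools.count()   # tie-breaker so the heap never compares states
    queue = [(h(init), next(tie), init, [])]
    seen = {init}
    while queue:
        hs, _, state, plan = heapq.heappop(queue)
        if hs == 0:
            return plan       # goal reached
        for act, s2 in successors(state):
            if s2 not in seen:
                seen.add(s2)
                heapq.heappush(queue, (h(s2), next(tie), s2, plan + [act]))
    return None               # whole space exhausted without a solution

# Toy example: integer states, the goal is 3, h is distance to the goal.
print(best_first(0, lambda s: abs(3 - s),
                 lambda s: [("inc", s + 1), ("dec", s - 1)]))
# ['inc', 'inc', 'inc']
```

Unlike EHC, nothing is ever discarded: worse-looking states stay in the queue, which is what makes this systematic search complete on finite state spaces, at the cost of speed.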
EHC: Pros and Cons
EHC is fast:
Greedy algorithm takes better states when found;
Can find solutions quickly.
EHC is incomplete:
Suppose the solution was actually down one of those paths that we discarded (or never even generated) because another state looked better.
FF is complete:
If EHC fails then start again from the initial state using best-first search.
Slower but complete.
Despite all this, the one benefit is that EHC is pretty fast and can potentially find a solution very quickly. But it’s also incomplete. However, the application of best-first search from the initial state ensures that FF itself is still complete, even if the resulting solution takes a lot longer to find.
Extra State Pruning: Helpful Actions
So far, when expanding a state S, we have considered all applicable actions when generating successors;
However, not all of them are interesting
e.g. unloading a package that has just been loaded
In FF, extra search guidance is obtained from the RPG heuristic:
Anything that could achieve a goal in g(1) is a helpful action
Or, in other words, the first actions in the relaxed plan, and others like them.
C on A from T
C on B from T
B on A from T
B on T from C
A on B from T
A on C from T
There are some additional considerations FF makes when expanding states, given it doesn't always make sense to generate a successor for every applicable action. As mentioned back in chapter 5, we're trying to find the best actions to take as we move through the search space, but it would also help if we could clearly identify actions that are not going to be useful.
FF applies some additional state pruning, given it can pull this information from the RPG heuristic. For example, actions that can achieve a goal in goal layer 1 are denoted as helpful actions, given they appear to be moving us towards the relaxed plan solution, which in turn can move us towards the actual solution.
Hence FF can restrict the successors it generates to only those produced by helpful actions. This does lead to an issue in that, while search becomes even faster, it becomes even less complete than it was before. FF is utilising the information from the RPG heuristic as a means to speed up the search.
However, if this doesn't work, EHC can attempt to run again without the helpful-actions constraint to see if that yields any desirable state.
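Helpful-action pruning can be sketched as a simple filter, assuming the backward RPG pass has already produced the set of subgoals required at fact layer 1. All names here are illustrative, not FF's actual code.

```python
from collections import namedtuple

# Illustrative action: name, preconditions, add effects (deletes ignored).
Action = namedtuple("Action", ["name", "pre", "add"])

def helpful_actions(state, actions, subgoals_layer_1):
    """Keep only actions applicable in `state` whose add effects
    intersect the facts the relaxed plan needs at fact layer 1."""
    return [a for a in actions
            if a.pre <= state and a.add & subgoals_layer_1]

# Blocks-world flavoured toy: both stacks are applicable, but only the
# one serving the relaxed plan's first subgoal counts as helpful.
s = frozenset({"clear A", "clear B"})
acts = [Action("stack-A-on-B", frozenset({"clear A", "clear B"}),
               frozenset({"A on B"})),
        Action("stack-B-on-A", frozenset({"clear A", "clear B"}),
               frozenset({"B on A"}))]
print([a.name for a in helpful_actions(s, acts, frozenset({"A on B"}))])
# ['stack-A-on-B']
```

Only the filtered list is used to generate successors, which is where both the speed-up and the extra incompleteness come from.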
Classical Planning
RPG Heuristic in the FF Planner
6CCS3AIP – Artificial Intelligence Planning
Dr Tommy Thompson
This was a short albeit important topic to cover, given we will see enforced hill-climbing return at a later point in the module. Several other planning systems use either the RPG heuristic or EHC as part of their procedure, so getting an idea of how it all works now is useful. Thanks for watching and we'll see you in the next video.