
Classical Planning
Dual Heuristics & Local Search Optimisation


6CCS3AIP – Artificial Intelligence Planning
Dr Tommy Thompson


Hello and welcome to this chapter on classical planning. In this chapter we’re going to look at two ideas that seem simple on paper: using multiple heuristics at the same time, and improving the enforced hill-climbing algorithm used as part of some planning systems.

Combining heuristics
Q: What if we had two heuristics, h1(S) and h2(S), both estimating the distance and/or cost to the goal?

A: Could have two open lists, one for each heuristic, and alternate between them
Expand node from open list 1
Expand node from open list 2
Expand node from open list 1…

Evaluate successors with both heuristics and put on both open lists


So previously we’ve always used one heuristic: either a simple state calculation, delete relaxations, or even landmarks. In each case we’re using a different type of information to help us search.
But what about using two of them at the same time? This could help us explore and assess the state space in two different ways at once.
We evaluate the successors of a given state against both heuristics, add them to two open lists, and then alternate between those lists; a sketch of the scheme follows below.
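As an illustration, here is a minimal sketch of the dual open-list scheme in Python. This is not Röger and Helmert’s implementation; the h1, h2, successors, and is_goal callbacks are hypothetical placeholders for problem-specific code.

```python
import heapq
import itertools

def dual_heuristic_search(initial, is_goal, successors, h1, h2):
    """Greedy best-first search alternating between two open lists.

    h1, h2, successors and is_goal are hypothetical callbacks; states
    must be hashable. This only illustrates the alternation scheme.
    """
    tie = itertools.count()  # unique tie-breaker so heapq never compares states
    open1, open2 = [], []    # one open list per heuristic
    heapq.heappush(open1, (h1(initial), next(tie), initial))
    heapq.heappush(open2, (h2(initial), next(tie), initial))
    closed = set()
    turn = 0                 # alternation counter

    while open1 or open2:
        # Expand from open list 1, then open list 2, then 1 again...
        use_first = (turn % 2 == 0 and open1) or not open2
        turn += 1
        _, _, state = heapq.heappop(open1 if use_first else open2)
        if state in closed:
            continue         # already expanded via the other open list
        closed.add(state)
        if is_goal(state):
            return state
        for succ in successors(state):
            # Evaluate each successor with both heuristics,
            # and place it on both open lists.
            heapq.heappush(open1, (h1(succ), next(tie), succ))
            heapq.heappush(open2, (h2(succ), next(tie), succ))
    return None              # exhausted both open lists without reaching a goal
```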

“Combining Heuristic Estimators for Satisficing Planning”,
G. Röger and M. Helmert, Proc. ICAPS 2010.


As we can see from this paper, presented at the 2010 International Conference on Automated Planning and Scheduling, the idea was evaluated by alternating between three different heuristics:
The FF heuristic (i.e. the RPG heuristic)
The causal graph heuristic used in the Fast Downward planner
The context-enhanced additive (cea) heuristic from the authors’ earlier work in 2008

Each of these heuristics performs well in some circumstances and poorly in others. But we can see the alternation configuration, where the planner alternates between the heuristics, and as a result its performance is more consistent across the board.

What is the second heuristic doing?
A mixture of two things:

If one heuristic is on a plateau:
Plateau successors put on both open lists;
The second open list will order them by the second heuristic, so it will guide search on the plateau

In any case – diversification by avoiding expanding just the states one heuristic likes.


Now why is this second heuristic being employed, and when is it useful?
It helps in the event we hit a plateau, a topic we discussed back when looking at Enforced Hill Climbing as part of the FF planner.
If we hit a plateau, whereby there’s no obvious next state to explore because the first heuristic has stagnated at the same value, we can use the second heuristic to help us find a way out of the plateau.

Or we could just diversify our exploration process by alternating and expanding from both when we wish.

“A Comparison of Knowledge-Based GBFS Enhancements and Knowledge-Free Exploration.”
Valenzano, Sturtevant, Schaeffer and Xie, Proc. ICAPS 2014


And as we can see from this publication at ICAPS 2014, which ran the FF planner against a series of benchmarks, adding an alternative heuristic, or even just periodic random exploration of states when hitting plateaus, can yield significant performance improvements. So let’s begin to explore this in more detail.

Local Search
In four steps:

1. Start with a state S
2. Expand S, generating successors
3. S’ = one of these successors
4. S = S’ and repeat

Heuristics are used at (3)
e.g. choose the best successor
Known as gradient descent (or hill climbing).


But first, it’s important to acknowledge that EHC, as we saw back in FF, is a form of what is known as Local Search.
The premise is rather straightforward: we start with a given state, expand it to generate all of the successors, and find the best one.
We then transition to that best successor and repeat until we reach the goal.

Now the notion of the ‘best’ successor is what’s governed by the heuristic. We’re not maintaining an open list this time like, say, A* does, because we’re only interested in the small, local optimisations we can make near where we are; a minimal sketch appears below.
This process is often referred to as hill climbing in state-space search, but it’s essentially a form of gradient descent, given we’re taking steps based on the heuristic value as a means to reach a local minimum of the heuristic landscape.
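As a reference point, here is a minimal sketch of plain hill climbing (not EHC itself, which adds breadth-first plateau search on top). The successors and h callbacks are hypothetical placeholders for problem-specific code.

```python
def hill_climb(initial, is_goal, successors, h):
    """Greedy local search: always move to the best-looking successor.

    No open list is maintained; we only ever make local moves.
    Returns the goal state if reached, or None if search gets stuck.
    """
    state = initial
    while not is_goal(state):
        succs = list(successors(state))
        if not succs:
            return None           # dead end: no applicable actions
        best = min(succs, key=h)  # the heuristic picks the 'best' successor
        if h(best) >= h(state):
            return None           # plateau or local minimum: we are stuck
        state = best              # commit to the move and repeat
    return state
```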

The Trouble with Heuristics
The RPG heuristic is not perfect
Few are, otherwise the problem would be easy!
… or the heuristic would be very expensive to compute

Making the right move doesn’t always lower the heuristic value:
It can stay the same; or,
It can go higher.

Or, worse, the move with the best heuristic value can be a really bad decision:
With local search, this can lead to no solution being found.


Now if we return to the RPG heuristic from earlier in this series, we know it’s not perfect. It gives us a relaxed-plan solution, which isn’t always reflective of the actual problem.
But as discussed already, we’re looking for something that computes in polynomial time, because otherwise the heuristic would be too expensive and would slow the whole process down.

Now there are problems with this: if we commit to an action, which may well be the right action in this situation, it might not change the heuristic value in the direction we hope. It could go up, it could go down, or it might not change at all.

And as we saw in FF, there are going to be cases where taking the state with the best heuristic value actually works out to be a bad thing, given we may wind up on a plateau or in a dead end; hence the use of best-first search as a fallback.

Snags with EHC
Sometimes can fail to find a solution:
The heuristic can lead it to dead ends

If EHC fails, FF resorts to systematic best-first search from the start:
Priority queue of states (lowest h(S) first)
Expand the next state each time
Add successors to the queue, and loop

Also, searching on plateaux is expensive:
Systematic search on the part where the heuristic is worst


As we discussed earlier in the module, Enforced Hill Climbing isn’t going to work well in certain situations.
If the heuristic leads us into a dead end, we have to rely on best-first search as a fallback. This isn’t ideal, given we’re going back to the initial state: we can’t rely on EHC not to have jeopardised the search process by committing us to a dead end. A sketch of the fallback follows below.

In addition, relying on systematic search to cross plateaux is also not going to help, given it can be quite expensive.
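For reference, here is a minimal sketch of that best-first fallback, with a priority queue ordered by h(S), lowest first. The callbacks are hypothetical, and real planners also track plans and use more careful duplicate detection.

```python
import heapq
import itertools

def best_first_search(initial, is_goal, successors, h):
    """Systematic best-first search: expand the lowest-h state each time."""
    tie = itertools.count()              # tie-breaker for the heap
    frontier = [(h(initial), next(tie), initial)]
    seen = {initial}
    while frontier:
        _, _, state = heapq.heappop(frontier)   # lowest h(S) first
        if is_goal(state):
            return state
        for succ in successors(state):
            if succ not in seen:                # avoid revisiting states
                seen.add(succ)
                heapq.heappush(frontier, (h(succ), next(tie), succ))
    return None                                 # no solution found
```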

The Search ‘Landscape’

A local minimum – the right step seems worse

A plateau – all heuristic values the same, the right step doesn’t seem better

[Figure: heuristic value plotted against the steps in the plan; initial state at the top left, goal state at the bottom right.]


To help visualise some of the problems here, we can look at the landscape of the search as it rolls out.

Ideally, we want the situation on the left: we start at the initial state with a high heuristic value, continue to find new states with lower values, and carry on until we ultimately reach the goal.

But there are two really nasty situations that we either wish to avoid or need a means to get out of.
First of all, there are the aforementioned plateaux: regions where the heuristic values are all the same, so no step in any direction seems like an improvement. The plateau might be short, it might be long, but we need to find a way to get out of it.

Secondly, and arguably worse, are what we call local minima. Here we’ve reached a state whose heuristic value is lower than that of every nearby option, so every available step appears to make things worse, and yet the state isn’t the goal. Better options do exist further afield, so we want a way to escape or avoid these situations.

EHC, Revisited


To help visualise this, let’s look at this example and how EHC will attempt to solve it.
This is a good example of what happens when we discover a local minimum: the RPG heuristic fails at this point and, as a result, we need to rely on breadth-first search to find the next best possible state that gets us out of that situation.

This works fine here, and you may be forgiven for thinking that’s all we need: just rely on something like breadth-first search when in situations with local minima.
But put simply, this solution does not scale.


We could find ourselves in a situation where we explore a couple of states with no problem, then rely on breadth-first search over a plateau of tens, hundreds, maybe even thousands of states, only to come out the other end at another state, and then drop back into breadth-first search on a plateau several more times before reaching the goal.

Yes, we’re finding the goal, but it’s painfully sub-optimal. In situations like this the system is mostly running breadth-first search rather than enforced hill climbing; the heuristic is having very little impact on the performance of the search at this point.

Identidem – an alternative to EHC
Replace systematic search on plateaux with local search with restarts.

As in EHC: record the best state seen so far.
When expanding a state S at the root of a plateau:
Choose one successor, at random, even if it’s not better than the best seen so far (it won’t be, as we’re on a plateau);
If more than d steps since the best state: jump back to the best and search from there again:
d is the probe depth bound
“A New Local-Search Algorithm for Forward-Chaining Planning.”
Coles, Fox and Smith, Proc. ICAPS 2007.


So let’s find a way to improve the search process when on a plateau. One approach, which my colleague Dr Coles helped establish, is Identidem: a process that attempts to get out of plateaux and local minima by using local search with what are known as restarts.

We operate the same as before, only this time, when a plateau is found, we choose one of the successor states at random, exploring down a path to a certain depth. In the event that we do not reach a new improving state within a certain number of steps, known as the probe depth bound, we restart from the best state seen and search down a different path. The probe depth bound can increase over time if no improving state is found, but is set to an initial value at the start. A sketch of this probing scheme follows.
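Here is a minimal sketch of the probing idea, under stated assumptions: the successors and h callbacks are hypothetical, and the restart and bound-increase policy (probes_per_bound, max_bound) is illustrative rather than Identidem’s exact scheme.

```python
import random

def probe(best_state, successors, h, depth_bound):
    """One random probe from the root of a plateau.

    Follow randomly chosen successors for at most depth_bound steps;
    return the first state strictly better than best_state, else None.
    """
    h_best = h(best_state)
    state = best_state
    for _ in range(depth_bound):
        succs = list(successors(state))
        if not succs:
            return None                  # dead end: abandon this probe
        state = random.choice(succs)     # random step, even if not better
        if h(state) < h_best:
            return state                 # escaped the plateau
    return None                          # bound exceeded: jump back and restart

def escape_plateau(best_state, successors, h,
                   initial_bound=2, max_bound=16, probes_per_bound=5):
    """Restarting local search: retry probes, deepening the bound if needed."""
    bound = initial_bound
    while bound <= max_bound:
        for _ in range(probes_per_bound):
            better = probe(best_state, successors, h, bound)
            if better is not None:
                return better
        bound += 1                       # no luck at this depth: probe deeper
    return None                          # give up; a fallback search takes over
```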

probe depth bound set to 2
probe depth bound increased to 3


The probe depth is important: we want a way to extend the search when needed, but also to limit how deep we probe at any given time, since we don’t want to just conduct a deep random walk into the search tree.

So here in this example we’ve reached a plateau and we search down one path, then another, and another. It doesn’t work out, so we increase the probe depth bound from 2 to 3, and upon running it again, it works out much better.


And while ultimately this winds up looking markedly similar to the normal behaviour of EHC, the big difference is that instead of the very expensive breadth-first search, we’re now relying on this more structured local search through the plateau. In the best case this cuts down on excessive exploration that doesn’t yield improvement, and in the worst case it proves no better than breadth-first search.

Local Search: Two key terms
Diversification – try lots of different things, to try to brute-force around the weaknesses of the heuristic

Intensification – explore lots of options around a given state (e.g. near to where the goal might be)

Local search is a balancing act
Long probes = good for diversification
Short probes = good for intensification


So when we’re using local search like this, there are two aspects we need to balance:
Diversification: we want to try lots of different things, to hopefully find new options that get us around the weaknesses of the heuristic.
Intensification: focus more on the areas that may yield the most value.

For this type of situation, we want to use different types of probes in the search space to suit our needs.
Long probes better enable diversification, while short probes favour intensification.

If you’ve ever studied any machine learning, notably reinforcement learning, this has a lot of parallels with the exploration/exploitation trade-off discussed in that field: we want to explore our options, but also home in on known areas where we can yield good performance.

Different Neighbourhoods – FF
When expanding a state, S, a neighbourhood function is used.
The simplest neighbourhood: all states reachable from S by applying a single action.
Disadvantage: not all successors are interesting to consider

In FF, when performing EHC: the helpful actions neighbourhood is a subset of these:
Those applicable actions that achieve a fact needed at the first step of the relaxed plan from S to the goal give an interesting subset of actions


It’s important to recognise that the neighbourhoods that emerge in local search, i.e. the areas of the state space we consider as we expand each state, can be defined quite differently.

When running this in FF, we can generate the simplest possible neighbourhood of states: all states reachable from S by a single action.
The problem here is that not all of these successors are going to be interesting.
As mentioned earlier in this series, FF uses helpful actions, the actions that appear at the first step of the relaxed plan computed at this state. These can be more useful, given they give us a more interesting subset to explore; a small sketch of the filtering follows.
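As a small illustration, helpful actions can be thought of as a filter over the applicable actions. This sketch assumes a hypothetical representation: action objects with an add_effects set, and a first_step_facts set holding the facts the relaxed plan achieves in its first step (FF computes these from the RPG).

```python
def helpful_actions(applicable_actions, first_step_facts):
    """Keep only applicable actions that contribute to the relaxed plan.

    applicable_actions: actions applicable in S, each with an
    add_effects set of facts (hypothetical representation).
    first_step_facts: facts the relaxed plan needs at its first step.
    """
    return [a for a in applicable_actions
            if a.add_effects & first_step_facts]  # non-empty intersection
```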

Different Neighbourhoods – Identidem
Identidem only ever chooses a single successor:
Many are evaluated, only one is chosen.

Evaluating successors carries a cost;
Solution: The random subset neighbourhood:
Pick a subset of the neighbourhood, only evaluate those.

Ties in well with restarts: different subset chosen each time a state is visited


However, when we run this in Identidem, we only ever choose one successor, even though many may be evaluated first.
Evaluating successors carries a cost, so instead we can evaluate only a random subset of the neighbourhood each time Identidem does its local exploration; a sketch follows below.
This ties in well with the restarts, given a different subset can be chosen each time a state is visited, helping us diversify where we explore on each iteration.
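A minimal sketch of the random-subset neighbourhood; the subset size k and the successors callback are hypothetical choices for illustration.

```python
import random

def random_subset_neighbourhood(state, successors, k=5):
    """Sample at most k successors of state to evaluate.

    A fresh subset is drawn on each call, which combines naturally with
    restarts: revisiting a state can explore different options.
    """
    succs = list(successors(state))
    return random.sample(succs, min(k, len(succs)))
```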

Successor Selection
In EHC, the ‘best’ successor was always chosen;
Imperfect heuristic -> wrong decisions -> dead ends.
In Identidem: roulette selection used:

Make a random choice, biased by heuristic values

[Figure: roulette wheel with segments for the state with lowest h(S’), the second-best state, and the third-best state; lower h(S’) gives a larger segment.]


Once this evaluation is completed, we use a roulette-wheel selection process: think of a wheel that we spin to select a successor, where the size of the segment a given state occupies is biased by its heuristic value, with the lowest value having the largest segment. Having three to pick from at random has proven to work quite effectively, given we still achieve some diversification since randomness is still being employed. A minimal sketch follows.
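A minimal sketch of roulette selection, assuming a hypothetical weighting of 1 / (1 + h(S’)); the exact bias Identidem uses may differ, but the idea is that lower heuristic values get larger segments.

```python
import random

def roulette_select(candidates, h):
    """Pick one successor at random, biased towards lower h values."""
    weights = [1.0 / (1.0 + h(s)) for s in candidates]  # lower h => bigger segment
    return random.choices(candidates, weights=weights, k=1)[0]

# Usage: spin the wheel over, say, the three best-looking successors.
# best_three = sorted(subset, key=h)[:3]
# chosen = roulette_select(best_three, h)
```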

Does Identidem perform better than FF?
Short answer – yes!
Slightly longer answer – most of the time:
Systematic search better in some domains
Sometimes the parameters need changing:
Random sample size, depth bound, whether to use roulette selection…
“A New Local-Search Algorithm for Forward-Chaining Planning.”
Coles, Fox and Smith, Proc. ICAPS 2007.


And to come back to my point earlier, does this actually work out better than FF?
In short? Yes.
But in truth, only most of the time: systematic search is better in some domains, and sometimes the parameters need changing, such as the random sample size, the depth bound, or whether to use roulette selection.

Systematic Search vs Local Search
Systematic search:
Good for ‘puzzles’ with lots of bad moves (keeping one successor = probably a wrong move)
Good at proving optimality.
Provably complete – will find a solution if one exists (or die trying)

Local search:
Generally quicker at finding some solution
Using it repeatedly usually gets near-optimal solutions fairly quickly, but no proof
Incomplete, but can always run systematic search afterwards (e.g. EHC then Best-First)


In summary, we’re seeing the benefits and drawbacks of employing two different types of search. Systematic search is useful for proving the optimality of a solution, as well as guaranteeing we find a solution if one exists. Hence it can be very good in situations like puzzles, where if we hit a plateau full of bad decisions, we can feel our way out of that space.

Meanwhile, local search like that seen in Identidem is generally quicker at finding a solution that gets us out of a plateau, but it is incomplete: we can’t guarantee it’s going to find a solution at all. Hence, with the likes of EHC, we use best-first search as a fallback.

Classical Planning
Dual Heuristics & Local Search Optimisation
6CCS3AIP – Artificial Intelligence Planning
Dr Tommy Thompson


And with that we wrap up this section on dual heuristics and local search optimisation. Be sure to consult the additional material provided over in KEATS and we’ll catch you in the next session.
