PowerPoint Presentation
Improving LPRPG-P Using Distance-Cost Pairs
” Searching for Good Solutions in Goal-Dense Search Spaces.” A. J. Coles and A. I. Coles. ICAPS 2013.
Copyright By PowCoder代写 加微信 powcoder
Distance to go, cost to go
Distance to go, cost to go
In theory, what is the ‘distance to go’?
(i.e. minimum number of actions to call something a goal state)
Distance to go, cost to go
In theory, what is the cost to go for this state?
(Equivalent to asking ‘which forgo actions must be applied?’)
Distance to go, cost to go
d(S) = 0 → by calling this a goal state right now, and applying no more actions
c(S) = 0 → by applying actions to meet all the goals, so paying no preference costs
What we want: distance–cost pairs:
4 actions will get down to cost 20
8 actions will get down to cost 10
10 actions will get down to cost 7…
Building block: the relaxed plan
Start by pretending all goals are hard goals
What are these actions for?
To meet the actual hard goals (if any)
To meet some preference
Record this when building the relaxed plan.
Annotations
‘Colour code’ with the reason each action is in the plan (H = hard goals)
Distance-cost pairs for previous slide
5 actions = cost 3
7 actions = cost 2
11 actions = cost 1
12 actions = cost 0
The question on none of your minds
Why did you not have the distance cost pair 9:2 (meet the hard goals and p0)?
Why did you not have the distance cost pair 10:1 (meet the hard goals and p0,p1)?
Is even better than the 11:1 on the previous slide
Answer: trying all combinations of preferences would give many more distance-cost pairs, but take ages. Compromise: use a greedy algorithm. Start with just the hard goals and then add preferences one at a time:
Greedy by length: add the preference that requires the fewest actions to be added to the relaxed plan in order to satisfy it (minimise distance to go);
Greedy by cost: add the preference that reduces the cost of the resulting relaxed plan by the most (i.e. the most expensive one that is still violated).
How do we use these?
Now we have a trade-off in search:
Want to explore good states that are close to satisfying all the preferences (high distance, low cost);
But also prioritise reaching better states quickly (low distance, high-er cost).
Solution, dual open-list search:
One open-list sorted by h(all) the length of the relaxed plan to satisfy all preferences;
One open-list sorted by h(