
CMPUT 366 F20: Real-time Heuristic Search
Vadim Bulitko
September 22, 2020

Lecture Outline
Real-time Heuristic Search
two optional research papers posted on eClass

Problems with Search Algorithms So Far
All search algorithms so far were off-line and bird's-eye: the agent sits in place, computes a solution in its head, and then blindly executes the computed solution
Problems?
per-action time limit (e.g., driving)
needs to know the whole search graph a priori
dynamic world (e.g., other agents)

Agent-Centred Search
search is centred on an agent
agent interleaves planning and plan execution (i.e., acting)
agent updates its knowledge of the world as it moves about
agent can access the world around its current state
agent has bounded memory (less than the world)

Real-time Heuristic Search
The problem with time-bounded A* is that it loses completeness when its planning time is bounded:
inaccuracies in the heuristic can lead the search astray
An alternative?
Complement planning with (heuristic) learning

Acting Greedily
Until in a goal state do:
1. look around
2. compute the f of each neighbour
3. move to the lowest-f neighbour
Hill-climbing
What is the problem?
incomplete: gets stuck in dead ends (heuristic depressions)

Learning Real-time A* (LRTA*) [Korf 1990]
Until in a goal state do:
1. look at the immediate neighbours
2. compute the f of each neighbour
3. update the h of the current state with the lowest f among the neighbours
4. move to the lowest-f neighbour
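The loop above can be sketched in Python; `neighbors`, `cost`, and the `h` dictionary are illustrative names for this sketch, not from any particular codebase:

```python
def lrta_star(neighbors, cost, h, start, goal, max_steps=10_000):
    """One LRTA* trial: act greedily on f = cost + h, learning h along the way.

    `neighbors(s)` yields successor states, `cost(s, t)` is the edge cost,
    and `h` is a dict holding the heuristic values being learned.
    """
    path, s = [start], start
    for _ in range(max_steps):
        if s == goal:
            return path
        # f-value of each neighbour under the current learned heuristic
        f = {t: cost(s, t) + h.get(t, 0) for t in neighbors(s)}
        best = min(f, key=f.get)
        # learning step: raise h(s) to the lowest neighbouring f-value
        h[s] = max(h.get(s, 0), f[best])
        path.append(best)  # move to the lowest-f neighbour
        s = best
    return None  # step budget exhausted


# toy graph with a dead end 'a': the agent scrubs, raises h, then escapes
adj = {'s': ['a', 'b'], 'a': ['s'], 'b': ['s', 'g'], 'g': ['b']}
path = lrta_star(lambda u: adj[u], lambda u, v: 1, {'b': 3}, 's', 'g')
```

Because `h` is mutated in place, running the trial again on the same problem starts from the learned heuristic, which is what makes repeated runs converge.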
In teams, trace LRTA* on the following problem:

LRTA*
Complete on finite, safely explorable graphs with positive edge costs
admissibility of the heuristic
cannot cycle without an increase of the heuristic

LRTA*
Complete on finite, safely explorable graphs with positive edge costs
Repeated runs on the same problem converge to an optimal solution
Similar to getting better at commuting

LRTA*
Satisfies:
search is centred on an agent
agent interleaves planning and plan execution (i.e., acting)
agent updates its knowledge of the world as it moves about
agent can access the world around its current state
agent has bounded memory (less than the world)
Shortcomings?
substantial state re-visitation

Scrubbing
Heuristic too low (i.e., overly optimistic)? The agent will go to that state
But will raise its heuristic only by a little:
assume that cost(s, s′) = cost(s′, s)
hnew(s) ← cost(s, s′) + h(s′)
h(s′) and hold(s) are at most cost(s, s′) apart
hnew(s) and hold(s) are at most 2 · cost(s, s′) apart
Thus there will be many re-visits [Sturtevant and Bulitko 2016]
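A minimal numeric check of the bound above, using made-up values for a unit-cost edge and a consistent heuristic:

```python
# Illustrative numbers (assumptions, not from the paper): a symmetric
# unit-cost edge s—s' and a heuristic satisfying consistency.
cost = 1.0
h_s_prime = 3.0          # h(s')
h_old = 2.1              # h_old(s), within cost(s, s') of h(s')
h_new = cost + h_s_prime # LRTA*-style update of h(s) through s'

# consistency: h(s') and h_old(s) are at most cost(s, s') apart
assert abs(h_s_prime - h_old) <= cost
# hence a single visit raises h(s) by at most 2 * cost(s, s')
assert h_new - h_old <= 2 * cost
```

Since each visit raises the heuristic by at most 2 · cost(s, s′), escaping a deep heuristic depression can require many visits to the same states.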

Expendable State Marking [Sharon, Sturtevant, and Felner 2013]
Detect when a state is (locally) expendable
Remove it from the search graph (together with all edges going to/from it)
Local + conservative + independent of agent’s experience
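One way such a local expendability test could look, sketched with BFS. This is an assumption-level illustration, not the exact rule of Sharon, Sturtevant, and Felner (2013): here a state is deemed expendable if its neighbours can still reach one another without passing through it.

```python
from collections import deque

def locally_expendable(s, neighbors):
    """Conservative sketch: s may be removed if all of its neighbours
    remain mutually reachable in the graph without s. `neighbors(u)`
    is an illustrative callable yielding adjacent states."""
    nbrs = list(neighbors(s))
    if len(nbrs) <= 1:
        return True  # dead end or isolated state (assuming s is not the goal)
    # BFS from one neighbour, avoiding s, until every neighbour is reached
    target = set(nbrs)
    seen, frontier = {nbrs[0]}, deque([nbrs[0]])
    while frontier and not target <= seen:
        u = frontier.popleft()
        for v in neighbors(u):
            if v != s and v not in seen:
                seen.add(v)
                frontier.append(v)
    return target <= seen
```

On a 4-cycle a–b–c–d, state a is expendable (b and d stay connected via c); on a path a–b–c, state b is not, since removing it disconnects a from c.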

Introduction to Real-time Heuristic Search (RTHS)
Heuristic search on a graph
pathfinding
general planning [Orkin 2006]
Real-time heuristic search [Korf 1990]
agent-centred search [Koenig 2001]
select the most-promising neighbour
make heuristic locally smooth
move to the most-promising neighbour
repeat until reaching the goal
Initial heuristic is poor
updated on the fly, from experience
local updates
makes the solution better over time

Problem with RTHS
State revisits (scrubbing)
common in heuristic depressions
sometimes unavoidable due to conservative updates to maintain heuristic consistency [Sturtevant and Bulitko 2016]
Consequently, RTHS is not used much in video games

Related Work
Pruning of vertices from the graph while preserving connectivity
single state [Sharon, Sturtevant, and Felner 2013]
entire closed list [Hernandez et al. 2017]
Reduces graph size → less scrubbing
Permanent and binary (pruned / not pruned)
Not based on agent's experience
Agent gets lost in a castle due to its heuristic inaccuracy [Bulitko 2017]. The agent should remember that fact but cannot prune the castle entrance (as it may lie on the only path to the goal)

Our Approach: Anxious RTHS
Make revisited states less attractive to the agent but still accessible (i.e., not pruned permanently)
How (un)attractive a state is depends on the agent’s experience with it
Inspiration from psychology
State re-visitation raises the agent’s level of anxiety
While anxious, the agent raises the heuristic more aggressively (i.e., makes the states unappealing)
Anxiety subsides over time

Algorithmic Details
Algorithm 1: Anxious Real-time Heuristic Search
input: search problem (G, c, s0, sg, h); anxiety control parameters α, ∆a, Lmax; expendable-pruning switch E
output: path (s0, s1, …, sT), sT = sg
1: t ← 0, ht ← h, a ← 0
2: while st ≠ sg do
3:     if L(st) / ht(st) ≥ Lmax then
4:         L(st) ← 0
5:         a ← max{0, a − ∆a}
6:     a ← a + α · L(st) / ht(st)
7:     ht+1(st) ← max{ ht(st), a + min_{s∈N(st)} (c(st, s) + ht(s)) }
8:     if E and ht+1(st) > ht(st) and E(st) then
9:         G ← G \ {st}
10:    st+1 ← arg min_{s∈N(st)} (c(st, s) + ht(s))
11:    t ← t + 1
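A Python sketch of Algorithm 1 under stated assumptions: `neighbors`, `cost`, and `expendable` are illustrative callables, and since the listing does not show where the learning record L is incremented, this sketch assumes L(s) accumulates each heuristic increase at s.

```python
from collections import defaultdict

def anxious_rths(neighbors, cost, h, s, goal, alpha=1.0, delta_a=1.0,
                 L_max=0.25, expendable=None, max_steps=10_000):
    """Sketch of Anxious RTHS. With alpha = 0 it reduces to plain LRTA*."""
    removed = set()                       # states pruned by lines 8-9
    L = defaultdict(float)                # per-state learning record (assumed)
    a, path = 0.0, [s]
    for _ in range(max_steps):
        if s == goal:
            return path
        hs = max(h.get(s, 0.0), 1e-9)     # guard against division by zero
        if L[s] / hs >= L_max:            # line 3: memory span exceeded
            L[s] = 0.0                    # line 4: forget the learning record
            a = max(0.0, a - delta_a)     # line 5: anxiety subsides
        a += alpha * L[s] / hs            # line 6: revisits raise anxiety
        f = {t: cost(s, t) + h.get(t, 0.0)
             for t in neighbors(s) if t not in removed}
        nxt = min(f, key=f.get)
        h_new = max(h.get(s, 0.0), a + f[nxt])   # line 7: anxious update
        L[s] += h_new - h.get(s, 0.0)            # assumed L bookkeeping
        if expendable and h_new > h.get(s, 0.0) and expendable(s):
            removed.add(s)                # lines 8-9: prune expendable state
        h[s] = h_new
        path.append(nxt)                  # line 10: greedy move
        s = nxt
    return None                           # step budget exhausted
```

The anxiety term a inflates the learning step on line 7, so frequently revisited states become unappealing faster than under the conservative LRTA* update, while staying accessible unless explicitly pruned.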

Empirical Evaluation: Problem Set
started with the common 493298 problems on 342 MovingAI video-game maps [Sturtevant 2012]
ran LRTA* with lookahead of 1, excluded all problems where its suboptimality was below 5 — too easy
282064 reasonably hard problems left
also created a set of 563 problems by first selecting 1000 of the 493298 and then pruning out all problems where LRTA* had suboptimality below 5

Empirical Evaluation: Parameter Tuning
Tried 567 combinations of parameters controlling the anxiety mechanism:
anxiety rate α ∈ {1, 50, 100, 500, 2000, 3000, 4000, 5000, 10000}
anxiety decay ∆a ∈ {0, 1, 3, 5, 10, 15, 20}
memory span Lmax ∈ {0.005, 0.07, 0.13, 0.19, 0.25, 0.31, 0.38, 0.44, 0.5}
on the 563-problem set
ran each of the 567 combinations once with expendable state pruning on (E = true) and once with it off (E = false)

Empirical Evaluation: Parameter Tuning Results

Empirical Evaluation: Portability of Tuned Parameters
Here [9] is a high-performance algorithm found by Bulitko (2016)

Discussion
Anxiety parameters tuned for a small set performed well on a larger set:
the combination of anxiety and marking expendable states won against either of them individually
LRTA* with anxiety and marking expendable states outperformed even an algorithm that uses other building blocks, including:
depression avoidance [Hernández and Baier 2012]
weighted learning [Rivera, Baier, and Hernández 2015; Bulitko 2016b; Bulitko 2016c]
This win is only in terms of the geometric mean; the external algorithm [9] still wins on the arithmetic mean: 32.64 ± 58.00 versus 146.04 ± 7691.16

When to Be Anxious?
On the 563-problem set, using anxiety only on problems where it helps reduces mean suboptimality from 13.03 ×/÷ 3.27 to 11.83 ×/÷ 3.01
Benefit of anxiety: the difference between the solution suboptimality without anxiety and the solution suboptimality with anxiety
Can we turn anxiety on/off per problem as needed?

Controlling Anxiety per Problem
Ran LRTA*+E (second row in Table II) and LRTA*+E with anxiety (last row in Table II) on each of the 282064 problems
Removed 189 problems whose benefit was more than 2 standard deviations away from the mean as outliers
Clustered the remaining benefit values into k = 3 clusters using k-means:
Cluster 1: 268060 problems, mean benefit of anxiety was 11.8
Cluster 2: 12032 problems, mean benefit of anxiety was 383.4
Cluster 3: 1783 problems, mean benefit of anxiety was −546.1
Discarded cluster 1
Visualized problems from clusters 2 and 3 à la recent work [Sigurdson and Bulitko 2017]
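The 1-D clustering step can be sketched as follows; the deterministic initialisation and the toy benefit values are illustrative, not the experiment's actual setup:

```python
def kmeans_1d(values, k=3, iters=50):
    """Plain 1-D k-means (deterministic spread initialisation, k >= 2),
    as one way to cluster per-problem 'benefit of anxiety' values."""
    vals = sorted(values)
    # spread the initial centers across the value range
    centers = [vals[(len(vals) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # recompute each center as the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# made-up benefit values shaped like the three clusters on the slide
benefits = [-550, -540, 10, 11, 12, 13, 14, 380, 390]
centers, clusters = kmeans_1d(benefits, k=3)
```

With well-separated values like these, the three recovered means mirror the slide's pattern: a large cluster of small benefits, a cluster of large positive benefits, and a small cluster where anxiety hurts.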

Controlling Anxiety per Problem
Trained AlexNet [Krizhevsky, Sutskever, and Hinton 2012] on images in the two clusters
Test binary classification accuracy reached 87.5%
Used trained AlexNet to turn anxiety on and off on a per-problem basis
Resulting suboptimality was slightly worse than leaving anxiety always on with LRTA*+E

Current Limitations & Future Work
Generalize anxiety for lookahead greater than 1 with LSS LRTA* [Koenig and Sun 2009]
Compare anxious learning with:
depression avoidance [Hernández and Baier 2012]
weighted learning [Rivera, Baier, and Hernández 2015; Bulitko 2016c]
lateral learning [Bulitko and Sampley 2016]
Add anxious learning to the set of building blocks and evolve the best algorithm automatically [Bulitko 2016a]
Consider other factors that can raise anxiety (e.g., the number of state revisits)
Test if anxiety in RTHS yields more human-like behaviour [Bulitko 2017]
in tactical pathfinding (e.g., with influence maps)

Summary
A new learning mechanism for RTHS
Benefits over expendable state pruning
Potential to yield more human-like behaviour

Bibliography I
Bulitko, Vadim (2016a). “Evolving Real-time Heuristic Search Algorithms”. In: Proceedings of the Fifteenth International Conference on the Synthesis and Simulation of Living Systems (ALIFE XV), pp. 108–115.
– (2016b). “Per-map Algorithm Selection in Real-time Heuristic Search”. In: Proceedings of the Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), (in press).
– (2016c). “Searching for Real-time Search Algorithms”. In: Proceedings of the Symposium on Combinatorial Search (SoCS), pp. 121–122.
– (2017). “Effects of Self-knowledge: Once Bitten Twice Shy”. In: Proceedings of the Experimental AI in Games (EXAG) Workshop at the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pp. 26–33.
Bulitko, Vadim and Alexander Sampley (2016). “Weighted Lateral Learning in Real-time Heuristic Search”. In: Proceedings of the Symposium on Combinatorial Search, (in press).
Hernández, Carlos and Jorge A. Baier (2012). “Avoiding and Escaping Depressions in Real-Time Heuristic Search”. In: Journal of Artificial Intelligence Research 43, pp. 523–570.
Hernandez, Carlos et al. (2017). “Online Bridged Pruning for Real-Time Search with Arbitrary Lookaheads”. In: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pp. 510–516.

Bibliography II
Koenig, Sven (2001). “Agent-Centered Search”. In: Artificial Intelligence Magazine 22.4, pp. 109–132.
Koenig, Sven and Xiaoxun Sun (2009). “Comparing Real-Time and Incremental Heuristic Search for Real-Time Situated Agents”. In: Journal of Autonomous Agents and Multi-Agent Systems 18.3, pp. 313–341.
Korf, R.E. (1990). “Real-time heuristic search”. In: Artificial Intelligence 42.2–3, pp. 189–211.
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E Hinton (2012). “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems. Vol. 25, pp. 1097–1105.
Orkin, Jeff (2006). “Three states and a plan: the AI of FEAR”. In: Game Developers Conference, p. 4.
Rivera, Nicolas, Jorge A. Baier, and Carlos Hernández (2015). “Incorporating weights into real-time heuristic search”. In: Artificial Intelligence 225, pp. 1–23. DOI: 10.1016/j.artint.2015.03.008. URL: http://dx.doi.org/10.1016/j.artint.2015.03.008.
Sharon, Guni, Nathan R. Sturtevant, and Ariel Felner (2013). “Online Detection of Dead States in Real-Time Agent-Centered Search”. In: Proceedings of the Symposium on Combinatorial Search, pp. 167–174. URL: http://www.aaai.org/ocs/index.php/SOCS/SOCS13/paper/view/7226.

Bibliography III
Sigurdson, Devon and Vadim Bulitko (2017). “Deep Learning for Real-time Heuristic Search Algorithm Selection”. In: Proceedings of the Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pp. 108–114.
Sturtevant, N. R. (2012). “Benchmarks for Grid-Based Pathfinding”. In: Transactions on Computational Intelligence and AI in Games 4.2, pp. 144 –148. URL: http://web.cs.du.edu/~sturtevant/papers/benchmarks.pdf.
Sturtevant, Nathan and Vadim Bulitko (2016). “Scrubbing During Learning In Real-time Heuristic Search”. In: Journal of Artificial Intelligence Research, (in press).