
CS 561a: Introduction to Artificial Intelligence

This time: informed search
Informed search:
Use heuristics to guide the search
Best first
A*
Heuristics
Hill-climbing
Simulated annealing

Best-first search
Idea:
use an evaluation function for each node; estimate of “desirability”
expand most desirable unexpanded node.

Implementation:
QueueingFn = insert successors in decreasing order of desirability

Special cases:
greedy search
A* search
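
As a concrete illustration, here is a minimal Python sketch of this scheme (the names goal_test, successors, and h stand for problem-specific callables assumed for this sketch, not anything defined in the slides). Lower f means more desirable; greedy search and A* fall out as the two choices of f shown at the end.

```python
import heapq
from itertools import count

def best_first_search(start, goal_test, successors, f):
    """Generic best-first search: always expand the unexpanded node
    with the best (lowest) f value.

    successors(state) yields (next_state, step_cost) pairs;
    f(state, g) scores a node from its state and path cost g so far.
    """
    tie = count()  # tie-breaker so heapq never has to compare states
    frontier = [(f(start, 0.0), next(tie), 0.0, start, [start])]
    best_g = {}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path
        if state in best_g and best_g[state] <= g:
            continue  # already reached this state at least as cheaply
        best_g[state] = g
        for nxt, cost in successors(state):
            g2 = g + cost
            heapq.heappush(frontier, (f(nxt, g2), next(tie), g2, nxt, path + [nxt]))
    return None

# Greedy search: f(n) = h(n)        -> lambda n, g: h(n)
# A* search:     f(n) = g(n) + h(n) -> lambda n, g: g + h(n)
```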

Romania with step costs in km

[Map of Romania with step costs in km; straight-line distances to Bucharest (e.g., hSLD = 374, 329, 253) are shown next to the cities.]

Greedy search
Estimation function:
h(n) = estimate of cost from n to goal (heuristic)

For example:
hSLD(n) = straight-line distance from n to Bucharest

Greedy search expands first the node that appears to be closest to the goal, according to h(n).

Properties of Greedy Search
Complete?

Time?

Space?

Optimal?

Properties of Greedy Search
Complete? No – can get stuck in loops,
e.g., Iasi -> Neamt -> Iasi -> Neamt -> …
Complete in finite space with repeated-state checking.

Time? O(b^m), but a good heuristic can give dramatic improvement

Space? O(b^m) – keeps all nodes in memory

Optimal? No.

A* search
Idea: avoid expanding paths that are already expensive

evaluation function: f(n) = g(n) + h(n) with:
g(n) – cost so far to reach n
h(n) – estimated cost to goal from n
f(n) – estimated total cost of path through n to goal

A* search uses an admissible heuristic, that is,
h(n) ≤ h*(n) where h*(n) is the true cost from n.
For example: hSLD(n) never overestimates actual road distance.

Theorem: A* search is optimal
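
A worked example: A* on a small fragment of the Romania map (step costs and hSLD values as in the AIMA textbook; only a few of the roads are included here for brevity).

```python
import heapq

# A fragment of the Romania map (step costs in km, per the AIMA text).
GRAPH = {
    'Arad':           [('Sibiu', 140), ('Timisoara', 118)],
    'Sibiu':          [('Fagaras', 99), ('Rimnicu Vilcea', 80)],
    'Rimnicu Vilcea': [('Pitesti', 97)],
    'Pitesti':        [('Bucharest', 101)],
    'Fagaras':        [('Bucharest', 211)],
    'Timisoara':      [],
    'Bucharest':      [],
}
# Straight-line distances to Bucharest (the admissible heuristic hSLD).
H_SLD = {'Arad': 366, 'Sibiu': 253, 'Timisoara': 329, 'Fagaras': 176,
         'Rimnicu Vilcea': 193, 'Pitesti': 100, 'Bucharest': 0}

def astar(start, goal):
    """A*: expand the frontier node minimizing f(n) = g(n) + h(n)."""
    frontier = [(H_SLD[start], 0, start, [start])]  # (f, g, city, path)
    best_g = {}
    while frontier:
        f, g, city, path = heapq.heappop(frontier)
        if city == goal:
            return path, g
        if city in best_g and best_g[city] <= g:
            continue
        best_g[city] = g
        for nxt, cost in GRAPH[city]:
            g2 = g + cost
            heapq.heappush(frontier, (g2 + H_SLD[nxt], g2, nxt, path + [nxt]))
    return None, float('inf')

print(astar('Arad', 'Bucharest'))
# (['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'], 418)
```

Note that greedy search would commit to Fagaras (hSLD 176 beats Rimnicu Vilcea's 193) and find the longer 450 km route; A* recovers the optimal 418 km route.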

A* search
A* search uses an admissible heuristic, that is,
h(n) ≤ h*(n) where h*(n) is the true cost from n.

Theorem: A* search is optimal

Note: A* is also optimal if the heuristic is consistent, i.e.,

h(N) ≤ c(N, P) + h(P) for every node N and each successor P of N,
where c(N, P) is the actual cost from N to P.

(a consistent heuristic is admissible (by induction), but the converse is not always true)
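
Since consistency is just a per-edge inequality, it is easy to check mechanically. A small sketch, reusing the GRAPH and H_SLD dictionaries from the A* example above:

```python
def is_consistent(graph, h):
    """Check h(N) <= c(N, P) + h(P) on every edge (monotonicity).

    graph: dict mapping state -> list of (successor, step_cost) pairs;
    h:     dict mapping state -> heuristic value.
    """
    return all(h[n] <= cost + h[p]
               for n, edges in graph.items()
               for p, cost in edges)

# e.g., with the Romania fragment above: is_consistent(GRAPH, H_SLD) -> True
```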

Optimality of A* (standard proof)
Suppose some suboptimal goal G2 has been generated and is in the queue. Let n be an unexpanded node on a shortest path to an optimal goal G1. Then:

f(G2) = g(G2)   since h(G2) = 0
      > g(G1)   since G2 is suboptimal
      ≥ f(n)    since h is admissible

so f(G2) > f(n), and A* will expand n (and eventually reach G1) before it ever selects G2 for expansion.

Optimality of A* (more useful proof)
Sketch: with a consistent heuristic, f never decreases along any path (see the pathmax lemma below), so A* expands nodes in order of nondecreasing f value; the first goal node it selects for expansion must therefore be optimal.

f-contours
What do the contours look like when h(n) = 0?

Properties of A*
Complete?

Time?

Space?

Optimal?

Properties of A*
Complete? Yes, unless there are infinitely many nodes with f ≤ f(G)

Time? Exponential in [(relative error in h) x (length of solution)]

Space? Keeps all nodes in memory

Optimal? Yes – cannot expand the contour f_{i+1} until contour f_i is finished

Proof of lemma: pathmax
If h is admissible but not consistent, f can decrease along a path. The pathmax update restores monotonicity: when generating a successor n' of n, use

f(n') = max( f(n), g(n') + h(n') )

This is sound because any solution path through n' also passes through n, so the parent's f value remains a valid lower bound at the child.
Admissible heuristics
E.g., for the 8-puzzle:

h1(n) = number of misplaced tiles
h2(n) = total Manhattan distance (sum of the distances of each tile from its goal position)
Relaxed Problem

Admissible heuristics can be derived from the exact solution cost of a relaxed version of the problem.

If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the shortest solution.

If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the shortest solution.
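
For illustration, minimal Python versions of both relaxed-problem heuristics (the 3x3 layout and the particular GOAL ordering are assumptions made for this sketch):

```python
# State: tuple of 9 entries, 0 = blank, read left-to-right, top-to-bottom.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def h1(state):
    """Misplaced tiles: exact solution cost if a tile could move anywhere."""
    return sum(1 for i, tile in enumerate(state)
               if tile != 0 and tile != GOAL[i])

def h2(state):
    """Manhattan distance: exact cost if a tile could move to any adjacent square."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        goal_i = GOAL.index(tile)
        total += abs(i // 3 - goal_i // 3) + abs(i % 3 - goal_i % 3)
    return total

start = (7, 2, 4, 5, 0, 6, 8, 3, 1)   # an example state
print(h1(start), h2(start))           # h1 <= h2 <= true solution cost
```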

This time
Iterative improvement
Hill climbing
Simulated annealing

Iterative improvement
In many optimization problems, path is irrelevant;
the goal state itself is the solution.

Then, state space = space of “complete” configurations.
Algorithm goal:
– find optimal configuration (e.g., TSP), or,
– find configuration satisfying constraints
(e.g., n-queens)

In such cases, can use iterative improvement algorithms: keep a single “current” state, and try to improve it.

Iterative improvement example: vacuum world

Simplified world: 2 locations, each of which may or may not contain dirt,
and each of which may or may not contain the vacuuming agent.
Goal of agent: clean up the dirt.

If path does not matter, do not need to keep track of it.

Iterative improvement example: n-queens
Goal: Put n chess-game queens on an n x n board, with no two queens on the same row, column, or diagonal.

Here, goal state is initially unknown but is specified by constraints that it must satisfy.

Hill climbing (or gradient ascent/descent)
Iteratively maximize the “value” of the current state by replacing it with the successor state of highest value, for as long as this is possible.
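
A minimal steepest-ascent sketch in Python (value and successors are assumed problem-specific callables); the random-restart variant hints at one standard remedy for the local-extremum problem discussed below:

```python
def hill_climb(state, value, successors):
    """Steepest-ascent hill climbing: move to the best successor until
    no successor improves on the current state."""
    while True:
        best = max(successors(state), key=value, default=None)
        if best is None or value(best) <= value(state):
            return state  # local maximum -- possibly not the global one
        state = best

def hill_climb_restarts(random_state, value, successors, restarts=25):
    """Random-restart hill climbing: one cheap way to escape local maxima."""
    return max((hill_climb(random_state(), value, successors)
                for _ in range(restarts)), key=value)
```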

Hill climbing
Note: minimizing a “value” function v(n) is equivalent to maximizing –v(n),

thus both notions are used interchangeably.

Notion of “extremization”: find extrema (minima or maxima) of a value function.

Hill climbing
Problem: depending on initial state, may get stuck in local extremum.

Minimizing energy
Let’s now change the formulation of the problem a bit, so that we can employ a new formalism:
– let’s compare our state space to that of a physical system that is subject to natural interactions,
– and let’s compare our value function to the overall potential energy E of the system.

On every update, we have ΔE ≤ 0.

Minimizing energy
Hence the dynamics of the system tend to move E toward a minimum.

We stress that there may be different such states — they are local minima. Global minimization is not guaranteed.

Local Minima Problem
Question: How do you avoid this local minimum?

[Figure: from a starting point, local search follows the descent direction into a local minimum; a barrier separates it from the global minimum, which local search cannot cross.]

Boltzmann machines

The Boltzmann Machine of Hinton, Sejnowski, and Ackley (1984) uses simulated annealing to escape local minima.

To motivate their solution, consider how one might get a ball-bearing traveling along a curve with two valleys to “probably end up” in the deepest minimum. The idea is to shake the box “about h hard,” where h is the height of the barrier between the valleys: the ball is then more likely to go from the shallow minimum D to the deep minimum C than from C to D. So, on average, the ball should end up in C’s valley.

Consequences of the Occasional Ascents

Desired effect: helps escape the local optima.

Adverse effect: might pass the global optimum after reaching it
(easy to avoid by keeping track of the best-ever state).

Simulated annealing: basic idea
From current state, pick a random successor state;

If it has better value than current state, then “accept the transition,” that is, use successor state as current state;

Otherwise, do not give up, but instead flip a coin and accept the transition with a given probability (which is lower the worse the successor is).

So, with some non-zero probability, we sometimes accept a transition that “un-optimizes” the value function a little.
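
Putting these rules together, a minimal Python sketch (random_successor and schedule are assumed callables; the exp(dE/T) acceptance rule anticipates the Boltzmann distribution discussed below):

```python
import math
import random

def simulated_annealing(state, value, random_successor, schedule, steps=100000):
    """Maximize value(state); 'bad' moves (dE < 0) are accepted with
    probability exp(dE / T), which shrinks as the temperature T is lowered."""
    best = state
    for k in range(steps):
        T = schedule(k)
        if T <= 0:
            break
        nxt = random_successor(state)
        dE = value(nxt) - value(state)
        if dE > 0 or random.random() < math.exp(dE / T):
            state = nxt  # accept: always if better, sometimes if worse
        if value(state) > value(best):
            best = state  # keep track of the best-ever state
    return best
```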

Demo

Boltzmann’s statistical theory of gases

In the statistical theory of gases, the gas is described not by a deterministic dynamics, but rather by the probability that it will be in different states.

The 19th-century physicist Ludwig Boltzmann developed a theory that gives the probability distribution over the states of the gas at a given temperature (at thermal equilibrium, every small region of the gas has the same average kinetic energy).

Hinton, Sejnowski and Ackley’s idea was that this distribution might also be used to describe neural interactions, with the temperature T reinterpreted as a small noise term (the neural analog of the random thermal motion of molecules). While their results primarily concern optimization using neural networks, the idea is more general.

Boltzmann distribution
At thermal equilibrium at temperature T, the Boltzmann distribution gives the relative probability that the system will occupy state A vs. state B as:

P(A) / P(B) = exp( (E(B) - E(A)) / T ) = exp( -E(A)/T ) / exp( -E(B)/T )

where E(A) and E(B) are the energies associated with states A and B.


Simulated annealing
Kirkpatrick et al. 1983:

Simulated annealing is a general method for making escape from local minima likely, by allowing jumps to higher-energy states.

The analogy here is with the process of annealing used by a craftsman in forging a sword from an alloy.

He heats the metal, then slowly cools it as he hammers the blade into shape.
If he cools the blade too quickly the metal will form patches of different composition;
If the metal is cooled slowly while it is shaped, the constituent metals will form a uniform alloy.

Simulated annealing in practice
set T
optimize for given T
lower T (see Geman & Geman, 1984)
repeat

Geman & Geman (1984): if T is lowered sufficiently slowly (with respect to the number of iterations used to optimize at a given T), simulated annealing is guaranteed to find the global minimum.

Caveat: this algorithm has no end. Geman & Geman’s schedule decreases T as 1 / log(number of iterations), so T never actually reaches zero; it may therefore take an unbounded amount of time to find the global minimum.
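
Two illustrative cooling schedules for the sketch above (the constants c, T0, and alpha are arbitrary choices for this sketch, not values from the lecture):

```python
import math

def geman_schedule(k, c=10.0):
    """Logarithmic schedule in the spirit of Geman & Geman (1984):
    T_k = c / log(k + 2), shifted so T is finite at k = 0. Converges to the
    global minimum for suitable c, but T shrinks so slowly it never ends."""
    return c / math.log(k + 2)

def geometric_schedule(k, T0=10.0, alpha=0.99):
    """A faster, common practical alternative (no convergence guarantee)."""
    return T0 * alpha ** k
```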

Simulated annealing algorithm
Idea: Escape local extrema by allowing “bad moves,” but gradually decrease their size and frequency.

Note: the goal here is to maximize E.

Simulated annealing algorithm
Idea: Escape local extrema by allowing “bad moves,” but gradually decrease their size and frequency.

Algorithm when the goal is to minimize E.

Note on simulated annealing: limit cases
Boltzmann distribution: accept a “bad move” with ΔE < 0 (goal is to maximize E) with probability P(ΔE) = exp(ΔE / T)

If T is large: ΔE < 0, ΔE/T is negative and small in magnitude, so exp(ΔE/T) is close to 1: accept bad moves with high probability (random walk).

If T is near 0: ΔE < 0, ΔE/T is negative and large in magnitude, so exp(ΔE/T) is close to 0: accept bad moves with low probability (deterministic up-hill).

Genetic Algorithms

How do you find a solution in a large complex space?
Ask an expert?
Adapt existing designs?
Trial and error?

What has traditionally been done in the heat exchanger world is to take a best “guess” at a design with the help of an expert. Traditional heat exchanger designs are usually fully mathematically described. This initial guess can then be used as a basis, and various parameters are “tweaked”. The performance of the heat exchanger can be recalculated to see if the modified design is an improvement on the original one. Surely there must be a better way? There is: we don’t need to look very far to find examples of optimisation in Nature.

Example: Traveling Sales Person (TSP)
Classic example: you have N cities; find the shortest route such that your salesperson will visit each city once and return. This problem is known to be NP-hard. As a new city is added to the problem, computation time of the classic solution increases exponentially, O(2^n) … (as far as we know).

[Figure: a Texas salesperson’s route through Dallas, Houston, San Antonio, Austin, and Mos Eisley. Is this the shortest path???]

What if we create a whole bunch of random salespeople, see how well they do, and pick the best one(s)?

Salesperson A
Houston -> Dallas -> Austin -> San Antonio -> Mos Eisley
Distance Traveled 780 Km
Salesperson B
Houston -> Mos Eisley -> Austin -> San Antonio -> Dallas
Distance Traveled 820 Km
Salesperson A is better (more fit) than salesperson B.
Perhaps we would like salespeople to be more like A and less like B.

Question:
do we want to just keep picking random salespeople like this and keep testing them?


Represent the problem like a DNA sequence

Each DNA sequence is an encoding of a possible solution to the problem. The order of the cities in the genes is the order of the cities the TSP will take.

DNA – Salesperson A: San Antonio | Dallas | Mos Eisley | Houston | Austin
DNA – Salesperson B: Dallas | Houston | Mos Eisley | San Antonio | Austin
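
A minimal encoding of this in Python. The pairwise distances below are invented purely for illustration (Mos Eisley is not on any real map), so only the structure matters:

```python
import random

CITIES = ['Dallas', 'Houston', 'San Antonio', 'Austin', 'Mos Eisley']

# Hypothetical pairwise distances in km (made up for this sketch).
DIST = {}
def set_dist(a, b, d):
    DIST[a, b] = DIST[b, a] = d
set_dist('Dallas', 'Houston', 362);  set_dist('Dallas', 'San Antonio', 440)
set_dist('Dallas', 'Austin', 290);   set_dist('Dallas', 'Mos Eisley', 500)
set_dist('Houston', 'San Antonio', 317); set_dist('Houston', 'Austin', 266)
set_dist('Houston', 'Mos Eisley', 550);  set_dist('San Antonio', 'Austin', 129)
set_dist('San Antonio', 'Mos Eisley', 610); set_dist('Austin', 'Mos Eisley', 580)

def tour_length(chromosome):
    """Total length of the round trip encoded by the gene order."""
    return sum(DIST[chromosome[i], chromosome[(i + 1) % len(chromosome)]]
               for i in range(len(chromosome)))

def random_salesperson():
    """A random chromosome: one permutation of the cities."""
    tour = CITIES[:]
    random.shuffle(tour)
    return tour

def fitness(chromosome):
    """Shorter tours = fitter individuals."""
    return 1.0 / tour_length(chromosome)
```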

Once the fitness has been assigned, pairs of chromosomes representing heat exchanger designs can be chosen for mating.

The higher the fitness, the greater the probability of the design being selected. Consequently, some of the weaker population members do not mate at all, whilst superior ones are chosen many times.

It is even statistically possible for a member to be chosen to mate with itself. This has no advantage, as the offspring will be identical to the parent.

Ranking by Fitness:

Here we’ve created three different salespeople. We then checked to see how
far each one has to travel. This gives us a measure of “Fitness”

Note: we need to be able to measure fitness in polynomial time, otherwise we are in trouble.

[Figure: three salespeople ranked by fitness; the one that travels the shortest distance is the most fit.]
Once the population has been formed (either randomly in the initial generation, or by mating in subsequent generations), each population member needs to be assessed against the desired properties – such a rating is called a “fitness”.

The design parameters represented by the zeros and ones in the binary code of each chromosome are fed into the mathematical model describing the heat exchanger. The output parameters for each design are used to give the fitness rating. A good design has a high fitness value, and a poor design a lower value.

Let’s breed them!
We have a population of traveling salespeople. We also know their fitness, based on how long their trips are. We want to create more, but we don’t want to create too many.
We take the notion that the salespeople who perform better are closer to the optimal salesperson than the ones that performed more poorly. Could the optimal salesperson be a “combination” of the better salespeople?
We create a population of salespeople as solutions to the problem.
How do we actually mate a population of data???
Crossover:
Exchanging information by swapping some part of the representation between two parents.

Once we have found the best salespeople, we will in a sense mate them. We can do this in several ways. Better salespeople should mate more often and poor salespeople should mate less often.
Salespeople – City DNA

Parent 1 (salesperson A): F A B | E C G D
Parent 2 (salesperson B): D E A | C G B F
Child 1 (salesperson C):  F A B | C G B F
Child 2 (salesperson D):  D E A | E C G D

The mating process is analogous to crossover carried out in living cells.

A pair of binary strings is used. A site along the length of the string is chosen randomly. In this example it is shown between the 6th and 7th bits, but it could be anywhere.

Both members of the pair are severed at that site, and their latter portions are exchanged. Two parents form two children, and these two “daughter” designs become members of the population for the next generation.

This process takes place for each pair selected, so the new population has the same number of members as the previous generation.

Crossover Bounds (Houston, we have a problem)
Not all crossed pairs are viable. We can only visit a city once.
Different GA problems may have different bounds.


Parents:
San Antonio | Dallas | Mos Eisley | Houston | Austin
Dallas | Houston | Austin | San Antonio | Mos Eisley

Children (naive crossover after the 2nd gene):
Dallas | Houston | Mos Eisley | Houston | Austin – visits Houston twice
San Antonio | Dallas | Austin | San Antonio | Mos Eisley – visits San Antonio twice

Not viable!!

TSP needs some special rules for crossover
Many GA problems also need special crossover rules.
Since each genetic sequence contains all the cities in the tour, crossover is a swapping of travel order.
Remember that crossover also needs to be efficient.

Parents:
San Antonio | Dallas | Mos Eisley | Houston | Austin
Dallas | Mos Eisley | Houston | San Antonio | Austin

Children (travel order swapped; every city still visited exactly once):
San Antonio | Dallas | Houston | Austin | Mos Eisley
Dallas | Houston | Austin | San Antonio | Mos Eisley

Viable!
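
One standard repair that always produces viable tours is order crossover (OX). This sketch is one common variant, not necessarily the exact rule used in the lecture:

```python
import random

def order_crossover(p1, p2):
    """Order crossover (OX): copy a random slice from parent 1, then fill
    the remaining positions with parent 2's cities in parent-2 order.
    Every city appears exactly once, so the child is always a viable tour."""
    n = len(p1)
    i, j = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[i:j] = p1[i:j]                        # inherited slice from p1
    fill = [c for c in p2 if c not in child]    # p2's order for the rest
    for k in range(n):
        if child[k] is None:
            child[k] = fill.pop(0)
    return child
```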

What about local extrema?
With just crossover breeding, we are constrained to gene sequences that are combinations of our current population.
Introduce random effects into our population (sketched below):
Mutation – Randomly twiddle the genes with some probability.
Cataclysm – Kill off n% of your population and create fresh new salespeople if it looks like you are reaching a local minimum.
Annealing of Mating Pairs – Accept the mating of suboptimal pairs with some probability.
Etc…
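
Minimal sketches of the first two of these for permutation genomes (the probabilities and fractions are arbitrary illustration values):

```python
import random

def mutate(chromosome, p=0.05):
    """Mutation for permutation genomes: swap two cities with probability p
    (plain bit-twiddling would break the visit-each-city-once constraint)."""
    if random.random() < p:
        i, j = random.sample(range(len(chromosome)), 2)
        chromosome[i], chromosome[j] = chromosome[j], chromosome[i]
    return chromosome

def cataclysm(population, random_individual, frac=0.5):
    """Kill off the worst fraction of the population and replace them with
    fresh random individuals (population is assumed sorted best-first)."""
    keep = population[:int(len(population) * (1 - frac))]
    return keep + [random_individual() for _ in range(len(population) - len(keep))]
```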
In summation: The GA Cycle

Fitness -> Selection -> Crossover -> Mutation -> New Population -> back to Fitness

(see, e.g., Wikipedia on Fitness
proportionate selection)
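
Tying the cycle together with fitness-proportionate selection, reusing fitness, order_crossover, and mutate from the sketches above (population size and structure are assumptions of this sketch):

```python
import random

def roulette_select(population, fitnesses):
    """Fitness-proportionate (roulette-wheel) selection: the probability of
    being picked is fitness / total fitness, so fitter members mate more
    often -- and a member can even be paired with itself."""
    return random.choices(population, weights=fitnesses, k=2)

def ga_step(population):
    """One trip around the GA cycle, using the TSP sketches above."""
    fits = [fitness(p) for p in population]
    next_gen = []
    while len(next_gen) < len(population):
        p1, p2 = roulette_select(population, fits)
        next_gen.append(mutate(order_crossover(p1, p2)))
    return next_gen
```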

Summary of the previous steps of the model.

Populations are continuously produced, going round the outer loop of this diagram, until the desired amount of optimisation has been achieved.

Demo


GA and TSP: the claims
Can solve for over 3500 cities (still took over 1 CPU-year).
Maybe holds the record.
Will get within 2% of the optimal solution.
This means that it’s not a solution per se, but is an approximation.

GA Discussion
We can apply the GA solution to any problem where we can represent the problem’s solution (even very abstractly) as a string.

We can create strings of:
Digits
Labels
Pointers
Code Blocks – This creates new programs from strung-together blocks of code. The key is to make sure the code can run.
Whole Programs – Modules or complete programs can be strung together in a series. We can also re-arrange the linkages between programs.

The last two are examples of Genetic Programming

Things to consider
How large is your population?
A large population will take more time to run (you have to test each member for fitness!).
A large population will cover more bases at once.

How do you select your initial population?
You might create a population of approximate solutions. However, some approximations might start you in the wrong position with too much bias.
How will you crossbreed your population?
You want to crossbreed and select for your best specimens.
Too strict: you will tend towards local minima
Too lax: your problem will converge more slowly
How will you mutate your population?
Too little: your problem will tend to get stuck in local minima
Too much: your population will fill with noise and not settle.


GA is a good “no clue” approach to problem solving
GA is superb if:
Your space is loaded with lots of weird bumps and local minima.
GA tends to spread out and test a larger subset of your space than many other types of learning/optimization algorithms.

You don’t quite understand the underlying process of your problem space.
NO I DON’T: What makes the stock market work??? Don’t know? Me neither! Stock market prediction might thus be good for a GA.
YES I DO: Want to make a program to predict people’s height from personality factors? This might be a Gaussian process and a good candidate for statistical methods which are more efficient.

You have lots of processors
GA’s parallelize very easily!

Why not use GA?
Creating generations of samples and crossbreeding them can be resource intensive.
Some problems may be better solved by a general gradient descent method which uses less resource.
However, resource-wise, GA is still quite efficient (no computation of derivatives, etc).

In general if you know the mathematics, shape or underlying process of your problem space, there may be a better solution designed for your specific need.
Consider Kernel Based Learning and Support Vector Machines?
Consider Neural Networks?
Consider Traditional Polynomial Time Algorithms?
Etc.


More demos: motorcycle design

Summary
Best-first search = general search, where the minimum-cost nodes (according to some measure) are expanded first.

Greedy search = best-first with the estimated cost to reach the goal as a heuristic measure.
– Generally faster than uninformed search
– not optimal
– not complete.

A* search = best-first with measure = path cost so far + estimated path cost to goal.
– combines advantages of uniform-cost and greedy searches
– complete, optimal and optimally efficient
– space complexity still exponential

Hill climbing and simulated annealing: iteratively improve on current state
– lowest space complexity, just O(1)
– risk of getting stuck in local extrema (unless following proper simulated annealing schedule)

Genetic algorithms: parallelize the search problem
