Evolutionary Algorithms
Dr Matthew Science University of Auckland
Overview of the basics of evolution
Copyright By PowCoder代写 加微信 powcoder
An evolutionary algorithm for designing a paper airplane
Terminology: phenotype, genotype, fitness, fitness landscape, selection, mutation
Pseudocode for a GA.
Details of genotypes, methods for selection, mutation, crossover etc.
Natural Selection
When you have
– a population of individuals
– that reproduce in a way that offspring are similar
to their parents (known as “heritable
reproduction”)
– and where there is variety in population (in
terms of how well they can reproduce)
Then a process called “natural selection” will take place:
Individuals that are better at successfully reproducing will have more offspring, and so the traits of those individuals will become increasingly common in the population as time passes
A key term here is “fitness” which is used as a shorthand to describe how good an individual is at reproducing.
– Can they survive long enough to get to the age where they can reproduce?
– Avoid being eaten
– Find enough food / water etc.
– Can they find a mate (if they reproduce
sexually)?
– Can they support their offspring long
enough that they survive?
Natural vs. artificial evolution
In natural evolution fitness is an abstraction of a wide variety of factors that contribute to how likely an organism is to reproduce.
These include phenotypic traits, environment, luck, other organisms (conspecifics or predators or prey or competitors), etc.
But in artificial evolution fitness is specified by a person. You specify what it is to be fit!
– breeding
– genetic algorithms
A genetic algorithm to design a paper airplane…
Evolving a paper airplane
1. Generate 20 random sequences of folding instructions
2. Fold each piece of paper according to instructions written on them
3. Throw them all out of the window
4. Pick up the ones that went furthest, look at the instructions
5. Produce 20 new pieces of paper, writing on each bits of sequences from parent
pieces of paper, occasionally making a mistake.
6. Repeat from (2) on.
Fold TL to BR towards you Fold horizontal middle away Fold vertical middle towards
Fold TR to BL towards you Fold horizontal middle away Fold vertical middle away
1. Generate 20 random sequences of folding instructions
2. Fold each piece of paper according to instructions written on them
3. Throw them all out of the window
4. Pick up the ones that went furthest, look at the instructions
5. Produce 20 new pieces of paper, writing on each bits of sequences from parent
pieces of paper, occasionally making a mistake.
6. Repeat from (2) on.
1. Generate 20 random sequences of folding instructions
2. Fold each piece of paper according to instructions written on them
3. Throw them all out of the window
4. Pick up the ones that went furthest, look at the instructions
5. Produce 20 new pieces of paper, writing on each bits of sequences from parent
pieces of paper, occasionally making a mistake.
6. Repeat from (2) on.
Generation 1
Generation 2
Generation 3 Generation 4
55 59 99 99 99 99 99
Q: What would happen without mutation?
A: The population converges (variety is lost). If we assume that there is no randomness in measuring fitness, the population becomes homogenous — all 9s.
with mutation
Random mutations occur, maintaining variation in the population.
We saw in the previous slide that the population CONVERGED. At the end, everyone was a ‘9’.
Mutations counteract convergence, maintaining variety and producing novelty in the population, which allows adaptation to continue.
Most mutations are deleterious (cause a decrease in fitness), but rarely, they are advantageous (they increase fitness).
Only when there are mutations, can fitness increase above the maximum fitness of the initial population.
Generation 1
Generation 2
Generation 3
Generation 4
With mutation No mutation
Terminology
How good a particular solution is at solving the problem at hand.
A specification of solution. The genotype can be copied and mutated.
An instantiation of the genotype. i.e. an attempt to solve a problem. A phenotype can be evaluated (its fitness can be measured)
1. Generate 20 random sequences of folding instructions
2. Fold each piece of paper according to instructions written on them
3. Throw them all out of the window
4. Pick up the ones that went furthest, look at the instructions
5. Produce 20 new pieces of paper, writing on each bits of sequences from parent
pieces of paper, occasionally making a mistake.
6. Repeat from (2) on.
Genotype & Phenotype
1. Generate 20 random sequences of folding instructions
2. Fold each piece of paper according to instructions written on them
3. Throw them all out of the window
4. Pick up the ones that went furthest, look at the instructions
5. Produce 20 new pieces of paper, writing on each bits of instruction sequences from
parent pieces of paper, occasionally making a mistake.
6. Repeat from (2) on.
the genotype can be copied and mutated
the phenotype can be evaluated for fitness
(sometimes there is essentially no difference between the two)
More examples…
You could be…
…evolving for optimal timetables
genotype: string encoding timetable allocations phenotype: schedule
fitness: (negative) number of clashes
…or for optimal aircraft wing design
genotype: listing various wing dimensions phenotype: actual wing (or simulation thereof..) fitness: formula based on lift/drag/cost of wing
GA pseudocode
Mutation Selection Evaluating fitness
The most common genotype formats are…
A sequence of symbols from a finite alphabet. This kind of genotype is most directly comparable to the nucleotides of DNA..
…AAGAGCTTGTGAAGAGTC… In GA’s you might see binary strings..
..01010010111010110101111101111…
A sequence of real values.
e.g.: [0.0151, 0.4, 0.942…].
These values are often constrained to lie in a certain range, e.g. between 0 and 1, but they don’t always have to be.
Trees etc.
There are many of alternative possibilities. Trees are also commonly used in genetic programming (GP), where the genotype is a tree structure that can be translated into a program (the phenotype).
Remember…
– genotypes (easy to copy & mutate)
– phenotypes (can be evaluated for
For binary strings…
For binary encodings a good (very approximate) rule of thumb is to have 1 mutation per bit. So each time you generate an offspring, each bit has a 1/N chance of being flipped where N is the number of binary genes.
For real numbers…
For real encodings the approach is often quite different form of ¡°creep mutation,¡± where you mutate every gene by a small amount. I use a value selected from a Gaussian distribution with a standard deviation of 2% of the total allowed range of the value.
Limiting the range of genes
Mutation can cause genes to leave their allowed range. How can you best ensure that this doesn’t happen?
Three possible solutions are the following (presuming an allowed gene value between 0 and 1):
Which of these is best to use can depend upon the problem at hand, but generally bouncing and wrapping are seen as better than clamping, as clamping increases the number of mutations that produce gene values at the limits, biasing the mutation.
Q: Wrapping can also have negative effects, can you imagine what they might be?
1.06 ¡ú 1 1.06 ¡ú 0.94 1.06 ¡ú 0.06
Fitness landscapes
Q: What is the problem with having too low a mutation rate?
Q: What is the problem with having too high a mutation rate?
Q: When using creep mutation of real valued genes, why a mutation amount from a Gaussian distribution better than selecting from a flat distribution?
Q: Why when mutating a bitstring is it better to mutate each bit with p=1/N, rather than selecting one bit at random to flip?
Q: What exactly is the disadvantage of “wrapping” gene values to keep them within bounds?
The fitness landscape is a useful concept that can make it easier to think about evolutionary algorithms…
To understand the concept of a fitness landscape, let’s first look at the concept of a “search space”
Searching parameter space
We can think about evolution as a search process. In this way of thinking evolution is searching for a set
of parameters (gene values) that maximise fitness.
We can thus conceive of the “search space” or “parameter space” as the set of all parameters.
The mutation operator implies a measurement of distance in this space.
On the right is a diagram of the search space for a genotype with 2 binary genes, X and Y.
Each line represents a single mutation.
Every possible point in the search space has a fitness associated with it.
Usually we don’t know what it is (if we did, we wouldn’t have to run the evolutionary algorithm, we would just pick the system with highest fitness!
For a 2D search space, we can imagine the fitness as another dimension (coming off the screen), and the job of the GA is to climb the tallest hill in this fitness landscape.
This is actually a bit easier to see in a genotype of two continuous genes…
Topics Covered
Discrete-time Dynamical Systems Continuous-time Dynamical Systems
Formalisms
– Difference Equations
– Differential Equations
– Fixed points
– Oscillatory Dynamics
Visualizations
– timeseries
– bifurcation diagrams
– cobweb plots
– Exponential population growth (discrete-time) – Logistic growth (discrete-time)
– Lotka-Volterra predator/prey dynamics
(continuous time)
– Elementary 1D Cellular Automata
Topics Covered
Algorithms
– Simulating trajectories in discrete-time dynamical systems (iterative maps)
– Euler integration
– Root finding (bisection method, Newton’s Method)
– Runge-Kutta Integration
– Comparing the accuracy of RK4 and Euler Integration
– Genetic Algorithm basics
Model Building
– Detail is not always desired
– The role of assumptions in model building
– Not all CompBio is focussed on “big data” or complex models.
– Examples of going from an idea to a model of that idea (carrying capacity)
– Sanity checking models
– Back and forth between models and reality
Topics Covered
Game Theory
– Prisoner’s Dilemma and other games
– Payoff matrices
– Extensions of mathematical games in ways that demand computational
– Programmatic expression of game-playing strategies
– Visualizations
– Monte-carlo (Evolution-inspired) sampling of effective strategies in
– (briefly) Evolutionarily Stable Strategies
– Spatially Distributed Models (Games)
Test & Exam
I try to ask questions that are conceptual rather than requiring rote knowledge.
You should be able to explain (in English) how the algorithms work and be able to reconstruct them in pseudocode.
You should be able to answer the kinds of questions I specified in the ‘homework’.
You should be able to recognise indicators of the different kinds of limit sets we discussed.
You should be able to answer questions like those asked in the lab exercises and assessments.
Rote knowledge (e.g. the weighting of the k terms in the RK4 algorithm) are of low importance rather than–say–the general idea behind how the algorithm works and
BIOSCI 702
For those of you that are considering staying on for Honours, and find these topics interesting, you may be interested in BIOSCI 702, which looks at other methods for modelling biological systems.
– Dynamical Modelling
– Agent Based Modelling
– Molecular Modelling. (How do biological
molecules such as proteins fold into useful, functional shapes?)
If you are interested, email me to learn more 🙂
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com