The ECJ Owner’s Manual
A User Manual for the ECJ Evolutionary Computation Library
Sean Luke
Department of Computer Science, George Mason University
Manual Version 23
June 15, 2015
Where to Obtain ECJ
http://cs.gmu.edu/~eclab/projects/ecj/
Copyright 2010–2015 by Sean Luke. Thanks to Carlotta Domeniconi.
Get the latest version of this document or suggest improvements here:
http://cs.gmu.edu/~eclab/projects/ecj/
This document is licensed under the Creative Commons Attribution-No Derivative Works 3.0 United States License, except for those portions of the work licensed differently as described in the next section. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA. A quick license summary:
• You are free to redistribute this document.
• You may not modify, transform, translate, or build upon the document except for personal use.
• You must maintain the author’s attribution with the document at all times.
• You may not use the attribution to imply that the author endorses you or your document use.
This summary is just informational: if there is any conflict in interpretation between the summary and the actual license, the actual license always takes precedence.
Contents
1 Introduction 7
   1.1 About ECJ 7
   1.2 Overview 9
   1.3 Unpacking ECJ and Using the Tutorials 15
      1.3.1 The ec Directory, the CLASSPATH, and jar files 15
         1.3.1.1 The ec/display Directory: ECJ’s GUI 15
         1.3.1.2 The ec/app Directory: Demo Applications 15
      1.3.2 The docs Directory 16
         1.3.2.1 Tutorials 16

2 ec.Evolve and Utility Classes 17
   2.1 The Parameter Database 18
      2.1.1 Inheritance 19
      2.1.2 Kinds of Parameters 20
      2.1.3 Namespace Hierarchies and Parameter Bases 22
      2.1.4 Parameter Files in Jar Files 24
      2.1.5 Accessing Parameters 24
      2.1.6 Debugging Your Parameters 26
      2.1.7 Building a Parameter Database from Scratch 28
   2.2 Output 30
      2.2.1 Creating and Writing to Logs 30
      2.2.2 Quieting the Program 32
      2.2.3 The ec.util.Code Class 32
         2.2.3.1 Decoding the Hard Way 33
         2.2.3.2 Decoding the Easy Way 34
   2.3 Checkpointing 35
      2.3.1 Implementing Checkpointable Code 37
   2.4 Threads and Random Number Generation 38
      2.4.1 Random Numbers 38
      2.4.2 Selecting Randomly from Distributions 41
      2.4.3 Thread-Local Storage 43
      2.4.4 Multithreading Support 43
   2.5 Jobs 44
   2.6 The ec.Evolve Top-level 45
   2.7 Integrating ECJ with other Applications or Libraries 47
      2.7.1 Control by ECJ 47
      2.7.2 Control by another Application or Library 51

3 ec.EvolutionState and the ECJ Evolutionary Process 53
   3.1 Common Patterns 55
      3.1.1 Setup 55
      3.1.2 Singletons and Cliques 55
      3.1.3 Prototypes 55
      3.1.4 The Flyweight Pattern 56
      3.1.5 Groups 56
   3.2 Populations, Subpopulations, Species, Individuals, and Fitnesses 57
      3.2.1 Making Large Numbers of Subpopulations 59
      3.2.2 How Species Make Individuals 60
      3.2.3 Reading and Writing Populations and Subpopulations 61
      3.2.4 About Individuals 62
         3.2.4.1 Implementing an Individual 63
      3.2.5 About Fitnesses 65
   3.3 Initializers and Finishers 67
      3.3.1 Population Files and Subpopulation Files 69
   3.4 Evaluators and Problems 69
      3.4.1 Problems 71
      3.4.2 Implementing a Problem 71
   3.5 Breeders 72
      3.5.1 Breeding Pipelines and Breeding Sources 74
      3.5.2 Selection Methods 76
         3.5.2.1 Implementing a Simple Selection Method 77
         3.5.2.2 Standard Classes 77
      3.5.3 Breeding Pipelines 80
         3.5.3.1 Implementing a Simple Breeding Pipeline 81
         3.5.3.2 Standard Utility Pipelines 83
      3.5.4 Setting up a Pipeline 85
         3.5.4.1 A Genetic Algorithm Pipeline 85
         3.5.4.2 A Genetic Programming Pipeline 87
   3.6 Exchangers 88
   3.7 Statistics 88
      3.7.1 Creating a Statistics Chain 91
      3.7.2 Tabular Statistics 91
      3.7.3 Quieting the Statistics 94
      3.7.4 Implementing a Statistics Object 94
   3.8 Debugging an Evolutionary Process 96

4 Basic Evolutionary Processes 101
   4.1 Generational Evolution 101
      4.1.1 The Genetic Algorithm (The ec.simple Package) 103
      4.1.2 Evolution Strategies (The ec.es Package) 105
   4.2 Steady-State Evolution (The ec.steadystate Package) 107
      4.2.1 Steady State Statistics 112
      4.2.2 Producing More than One Individual at a Time 112

5 Representations 115
   5.1 Vector and List Representations (The ec.vector Package) 115
      5.1.1 Vectors 116
         5.1.1.1 Initialization 117
         5.1.1.2 Crossover 118
         5.1.1.3 Multi-Vector Crossover 121
         5.1.1.4 Mutation 121
         5.1.1.5 Heterogeneous Vector Individuals 127
      5.1.2 Lists 129
         5.1.2.1 Utility Methods 129
         5.1.2.2 Initialization 130
         5.1.2.3 Crossover 130
         5.1.2.4 Mutation 131
      5.1.3 Arbitrary Genes: ec.vector.Gene 132
   5.2 Genetic Programming (The ec.gp Package) 134
      5.2.1 GPNodes, GPTrees, and GPIndividuals 136
         5.2.1.1 GPNodes 137
         5.2.1.2 GPTrees 137
         5.2.1.3 GPIndividual 138
         5.2.1.4 GPNodeConstraints 138
         5.2.1.5 GPTreeConstraints 138
         5.2.1.6 GPFunctionSet 138
      5.2.2 Basic Setup 139
         5.2.2.1 Defining GPNodes 140
      5.2.3 Defining the Representation, Problem, and Statistics 141
         5.2.3.1 GPData 142
         5.2.3.2 KozaFitness 143
         5.2.3.3 GPProblem 144
         5.2.3.4 GPNode Subclasses 145
         5.2.3.5 Statistics 147
      5.2.4 Initialization 148
      5.2.5 Breeding 152
      5.2.6 A Complete Example 159
      5.2.7 GPNodes in Depth 162
      5.2.8 GPTrees and GPIndividuals in Depth 166
         5.2.8.1 Pretty-Printing Trees 167
         5.2.8.2 GPIndividuals 170
      5.2.9 Ephemeral Random Constants 170
      5.2.10 Automatically Defined Functions and Macros 173
         5.2.10.1 About ADF Stacks 176
      5.2.11 Strongly Typed Genetic Programming 179
         5.2.11.1 Inside GPTypes 184
      5.2.12 Parsimony Pressure (The ec.parsimony Package) 185
   5.3 Grammatical Evolution (The ec.gp.ge Package) 187
      5.3.1 GEIndividuals, GESpecies, and Grammars 188
         5.3.1.1 Strong Typing 189
         5.3.1.2 ADFs and ERCs 190
      5.3.2 Translation and Evaluation 190
      5.3.3 Printing 192
      5.3.4 Initialization and Breeding 193
      5.3.5 Dealing with GP 194
      5.3.6 A Complete Example 194
         5.3.6.1 Grammar Files 196
      5.3.7 How Parsing is Done 196
   5.4 Push (The ec.gp.push Package) 197
      5.4.1 Push and GP 199
      5.4.2 Defining the Push Instruction Set 200
      5.4.3 Creating a Push Problem 201
      5.4.4 Building a Custom Instruction 202
   5.5 Rulesets and Collections (The ec.rule Package) 203
      5.5.1 RuleIndividuals and RuleSpecies 204
      5.5.2 RuleSets and RuleSetConstraints 204
      5.5.3 Rules and RuleConstraints 207
      5.5.4 Initialization 209
      5.5.5 Mutation 209
      5.5.6 Crossover 211

6 Parallel Processes 213
   6.1 Distributed Evaluation (The ec.eval Package) 213
      6.1.1 The Master 213
      6.1.2 Slaves 215
      6.1.3 Opportunistic Evolution 217
      6.1.4 Asynchronous Evolution 217
      6.1.5 The MasterProblem 219
      6.1.6 Noisy Distributed Problems 222
   6.2 Island Models (The ec.exchange Package) 223
      6.2.1 Islands 223
      6.2.2 The Server 225
         6.2.2.1 Synchronicity 226
      6.2.3 Internal Island Models 226
      6.2.4 The Exchanger 228

7 Additional Evolutionary Algorithms 231
   7.1 Coevolution (The ec.coevolve Package) 231
      7.1.1 Coevolutionary Fitness 231
      7.1.2 Grouped Problems 232
      7.1.3 One-Population Competitive Coevolution 234
      7.1.4 Multi-Population Coevolution 236
         7.1.4.1 Parallel and Sequential Coevolution 238
         7.1.4.2 Maintaining Context 239
      7.1.5 Performing Distributed Evaluation with Coevolution 240
   7.2 Spatially Embedded Evolutionary Algorithms (The ec.spatial Package) 241
      7.2.1 Implementing a Space 242
      7.2.2 Spatial Breeding 243
      7.2.3 Coevolutionary Spatial Evaluation 244
   7.3 Particle Swarm Optimization (The ec.pso Package) 245
   7.4 Differential Evolution (The ec.de Package) 249
      7.4.1 Evaluation 249
      7.4.2 Breeding 249
         7.4.2.1 The DE/rand/1/bin Operator 251
         7.4.2.2 The DE/best/1/bin Operator 251
         7.4.2.3 The DE/rand/1/either-or Operator 252
   7.5 Multiobjective Optimization (The ec.multiobjective Package) 253
      7.5.0.4 The MultiObjectiveFitness class 253
      7.5.0.5 The MultiObjectiveStatistics class 255
      7.5.1 Selecting with Multiple Objectives 256
         7.5.1.1 Pareto Ranking 256
         7.5.1.2 Archives 257
      7.5.2 NSGA-II (The ec.multiobjective.nsga2 Package) 257
      7.5.3 SPEA2 (The ec.multiobjective.spea2 Package) 258
   7.6 Meta-Evolutionary Algorithms 259
      7.6.1 The Two Parameter Files 259
      7.6.2 Defining the Parameters 262
      7.6.3 Statistics and Messages 264
      7.6.4 Populations Versus Generations 265
      7.6.5 Using Meta-Evolution with Distributed Evaluation 265
      7.6.6 Customization 267
   7.7 Resets (The ec.evolve Package) 268
Chapter 1
Introduction
The purpose of this manual is to describe practically every feature of ECJ, an evolutionary computation toolkit. It’s not a good choice of reading material if your goal is to learn the system from scratch. It’s very terse, boring, and long, and not organized as a tutorial but rather as an encyclopedia. Instead, I refer you to ECJ’s four tutorials and various other documentation that comes with the system. But when you need to know about some particular gizmo that ECJ has available, this manual is where to look.
1.1 About ECJ
ECJ is an evolutionary computation framework written in Java. The system was designed for large, heavyweight experimental needs and provides tools implementing many popular EC algorithms and conventions, with a particular emphasis on genetic programming. ECJ is free open-source software with a BSD-style academic license (AFL 3.0).
ECJ is now well over ten years old and is a mature, stable framework which has (fortunately) exhibited relatively few serious bugs over the years. Its design has readily accommodated many later additions, including multiobjective optimization algorithms, island models, master/slave evaluation facilities, coevolution, steady-state and evolution strategies methods, parsimony pressure techniques, and various new individual representations (for example, rule-sets). The system is widely used in the genetic programming community and is reasonably popular in the EC community at large. I myself have used it in some thirty or forty publications.
A toolkit such as this is not for everyone. ECJ was designed for big projects and to provide many facilities, and this comes with a relatively steep learning curve. We provide tutorials and many example applications, but this only partly mitigates ECJ’s imposing nature. Further, while ECJ is extremely “hackable”, the initial development overhead for starting a new project is relatively large. As a result, while I feel ECJ is an excellent tool for many projects, other tools might be more apropos for quick-and-dirty experimental work.
Why ECJ was Made ECJ’s primary inspiration comes from lil-gp [18], to which it owes much. Homage to lil-gp may be found in ECJ’s command-line facility, how it prints out messages, and how it stores statistics. Work on ECJ commenced in Fall 1998 after experiences with lil-gp in evolving simulated soccer robot teams [6]. This project involved heavily modifying lil-gp to perform parallel evaluations, a simple coevolutionary procedure, multiple threading, and strong typing. Such modifications made it clear that lil-gp could not be further extended without considerable effort, and that it would be worthwhile developing an “industrial-grade” evolutionary computation framework in which GP was one of a number of orthogonal features. I intended ECJ to provide at least ten years of useful life, and I believe it has performed well so far.
Figure 1.1 Top-Level Loop of ECJ’s SimpleEvolutionState class, used for basic generational EC algorithms. Various sub-operations are shown occurring before or after the primary operations. The full population is revised each iteration.
1.2 Overview
ECJ is a general-purpose evolutionary computation framework which attempts to permit as many valid combinations as possible of individual representation and breeding method, fitness and selection procedure, evolutionary algorithm, and parallelism.
Top-level Loop ECJ hangs the entire state of the evolutionary run off of a single instance of a subclass of EvolutionState. This enables ECJ to serialize out the entire state of the system to a checkpoint file and to recover it from the same. The EvolutionState subclass chosen defines the kind of top-level evolutionary loop used in the ECJ process. We provide two such loops: a simple generational loop with optional elitism, and a steady-state loop.
Figure 1.1 shows the top-level loop of the simple generational EvolutionState. The loop iterates between breeding and evaluation, with an optional “exchange” period after each. Statistics hooks are called before and after each period of breeding, evaluation, and exchanging, as well as before and after initialization of the population and “finishing” (cleaning up prior to quitting the program).
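The ordering is easiest to see written out. The following is a toy illustration of that loop, not ECJ’s actual code: all of the names are invented, and the comments indicate which ECJ operator (and which statistics hooks) would run at each point.

// A toy, self-contained illustration of the loop order shown in Figure 1.1.
// None of these classes or method names are ECJ's; they exist only to make the sequence concrete.
public class GenerationalLoopSketch {
    static int generation = 0;
    static final int MAX_GENERATIONS = 3;

    public static void main(String[] args) {
        double[] population = initialize();              // Initializer (plus pre/post-initialization statistics)
        while (true) {
            evaluate(population);                        // Evaluator (plus pre/post-evaluation statistics)
            if (foundIdeal(population) || generation >= MAX_GENERATIONS) break;
            // pre-breeding exchange would happen here (plus its statistics hooks)
            population = breed(population);              // Breeder (plus pre/post-breeding statistics)
            // post-breeding exchange would happen here (plus its statistics hooks)
            generation++;                                // then optionally write a checkpoint
        }
        // Finisher (plus pre-finishing statistics), then shut down the Exchanger and Evaluator
        System.out.println("Done after " + generation + " generations.");
    }

    static double[] initialize() { return new double[] { 0.1, 0.5, 0.9 }; }

    static void evaluate(double[] pop) { /* a real system would assign fitnesses here */ }

    static boolean foundIdeal(double[] pop) {
        for (int i = 0; i < pop.length; i++) if (pop[i] >= 1.0) return true;
        return false;
    }

    static double[] breed(double[] pop) {
        double[] next = pop.clone();                     // copy rather than modify the old population
        for (int i = 0; i < next.length; i++) next[i] = Math.min(1.0, next[i] + 0.1);  // stand-in for selection and variation
        return next;
    }
}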
Breeding and evaluation are handled by singleton objects known as the Breeder and Evaluator respectively. Likewise, population initialization is handled by an Initializer singleton, and finishing is done by a Finisher. Exchanges after breeding and after evaluation are handled by an Exchanger. The particular versions of these singleton objects are determined by the experimenter, though we provide versions which perform common tasks. For example, we provide a traditional-EA SimpleEvaluator, a steady-state EA SteadyStateEvaluator, a “single-population coevolution” CompetitiveEvaluator, and a multi-population coevolution MultiPopCoevolutionaryEvaluator, among others. There are likewise custom breeders and initializers for different functions. The Exchanger provides an opportunity for other hooks, notably internal and external island models. For example, post-breeding exchange might allow external immigrants to enter the population, while emigrants might leave the population during post-evaluation exchange. These singleton operators comprise most of the high-level “verbs” in the ECJ system, as shown in Figure 1.2.
Parameterized Construction ECJ is unusually heavily parameterized: practically every feature of the system is determined at runtime from a parameter. Parameters define the classes of objects, the specific subobjects they hold, and all of their initial runtime values. ECJ does this through a bootstrap class called Evolve, which loads a ParameterDatabase from runtime parameter files at startup. Using this database, Evolve constructs the top-level EvolutionState and tells it to “setup” itself. EvolutionState in turn calls subsidiary classes (such as Evaluator) and tells them to “setup” themselves from the database. This procedure continues down the chain until the entire system is constructed.
State Objects In addition to “verbs”, EvolutionState also holds “nouns” — the state objects representing the things being evolved. Specifically, EvolutionState holds exactly one Population, which contains some N (typically 1) Subpopulations. Multiple Subpopulations permit experiments in coevolution, internal island models, etc. Each Subpopulation holds some number of Individuals and the Species to which the Individuals belong. Species is a flyweight object for Individual: it provides a central repository for things common to many Individuals so they don’t have to each contain them in their own instances.
While running, numerous state objects must be created, destroyed, and recreated. As ECJ only learns the specific classes of these objects from the user-defined parameter file at runtime, it cannot simply construct them using Java’s new operator. Instead such objects are created by constructing a prototype object at startup time, and then using this object to stamp out copies of itself as often as necessary. For example, Species contains a prototypical Individual. When new Individuals must be created for a given Subpopulation, they are copied from the Subpopulation’s Species and then customized. This allows different Subpopulations to use different Individual representations.
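As an illustration, here is a minimal, self-contained sketch of this prototype pattern. The classes and names are invented for the sketch (ECJ’s real Species and Individual classes are considerably richer), but the copy-then-customize idea is the same.

// A toy illustration of the prototype pattern described above.  The class and method
// names here are invented for this sketch; ECJ's real Species and Individual classes
// differ in detail but follow the same idea.
public class PrototypeSketch {
    // stands in for an Individual representation
    static class ToyIndividual implements Cloneable {
        double[] genome;
        ToyIndividual(int size) { genome = new double[size]; }
        public ToyIndividual clone() {
            try {
                ToyIndividual copy = (ToyIndividual) super.clone();
                copy.genome = genome.clone();       // deep-copy the genome
                return copy;
            } catch (CloneNotSupportedException e) { throw new RuntimeException(e); }
        }
    }

    // stands in for a Species: holds a prototype and stamps out customized copies of it
    static class ToySpecies {
        ToyIndividual prototype;
        ToySpecies(ToyIndividual prototype) { this.prototype = prototype; }
        ToyIndividual newIndividual() {
            ToyIndividual ind = prototype.clone();  // copy the prototype...
            for (int i = 0; i < ind.genome.length; i++)
                ind.genome[i] = (i + 1) * 0.1;      // ...then customize it (ECJ would randomize it here)
            return ind;
        }
    }

    public static void main(String[] args) {
        ToySpecies species = new ToySpecies(new ToyIndividual(5));   // the prototype set up at startup
        System.out.println(java.util.Arrays.toString(species.newIndividual().genome));
    }
}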
In keeping with its philosophy of orthogonality, ECJ defines Fitnesses separately from Individuals (representations), and provides both single-objective and multi-objective Fitness subclasses. In addition to holding a prototypical Individual, a Species also holds the prototypical Fitness to be used with that kind of Individual.
Figure 1.2 Top-Level operators and utility facilities in EvolutionState, and their relationship to certain state objects.
Breeding A Species holds a prototypical breeding pipeline which is cloned by the Breeder and used per-thread to breed individuals and form the next-generation population. Breeding pipelines are tree structures in which a node filters incoming Individuals from its child nodes and hands them to its parent. The leaf nodes in the tree are SelectionMethods, which simply choose Individuals from the old subpopulation and hand them off. There exist SelectionMethods which perform tournament selection, fitness proportional selection, truncation selection, etc. Nonleaf nodes in the tree are BreedingPipelines, many of which copy and modify their received Individuals before handing them to their parent nodes. Some BreedingPipelines are representation-independent: for example, MultiBreedingPipeline asks for Individuals from one of its children at random according to some probability distribution. But most BreedingPipelines act to mutate or cross over Individuals in a representation-dependent way. For example, the GP CrossoverPipeline asks for one Individual from each of its two children, which must be genetic programming Individuals, performs subtree crossover on those Individuals, then hands them to its parent.
A tree-structured breeding pipeline allows for a rich assortment of experimenter-defined selection and breeding processes. Further, ECJ’s pipeline is copy-forward: BreedingPipelines must ensure that they copy Individuals before modifying them or handing them forward, if they have not already been copied. This guarantees that new Individuals are copies of old ones in the population, and furthermore that multiple pipelines may operate on the same Subpopulation in different threads without the need for locking. ECJ may apply multiple threads to parallelize the breeding process without the use of Java synchronization at all.
Evaluation The Evaluator performs evaluation of a population by passing one or (for coevolutionary evaluation) several Individuals to a Problem subclass which the Evaluator has cloned off of its prototype.
Figure 1.3 Top-Level data objects used in evolution.
Figure 1.4 A typed genetic programming parse tree.
Evaluation, too, may be done in multithreaded fashion with no locking, using one Problem per thread. Individuals may also undergo repeated evaluation in coevolutionary Evaluators of different sorts.
In most projects using ECJ, the primary task is to construct an appropriate Problem subclass. The task of the Problem is to assess the fitness of the Individual(s) and set its Fitness accordingly. Problem classes also report if the ideal Individual has been discovered.
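To make this concrete, here is a rough sketch of what such a Problem tends to look like, written against my understanding of ECJ’s ec.simple and ec.vector classes (SimpleProblemForm, SimpleFitness, and DoubleVectorIndividual). Treat the exact signatures as assumptions; the chapters on Problems and the tutorials give the authoritative versions.

import ec.EvolutionState;
import ec.Individual;
import ec.Problem;
import ec.simple.SimpleFitness;
import ec.simple.SimpleProblemForm;
import ec.vector.DoubleVectorIndividual;

// Rough sketch of a fitness function: maximize the sum of the genome values.
// Written against my understanding of ECJ's ec.simple API; verify the signatures
// against your ECJ version.
public class SumProblem extends Problem implements SimpleProblemForm {
    public void evaluate(EvolutionState state, Individual ind, int subpopulation, int threadnum) {
        if (ind.evaluated) return;                          // don't re-evaluate needlessly
        DoubleVectorIndividual i = (DoubleVectorIndividual) ind;
        double sum = 0;
        for (int x = 0; x < i.genome.length; x++) sum += i.genome[x];
        // set the fitness; the third argument flags whether this individual is ideal
        // (older ECJ versions took a float rather than a double here)
        ((SimpleFitness) ind.fitness).setFitness(state, sum, false);
        ind.evaluated = true;
    }
}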
Utilities In addition to its ParameterDatabase, ECJ also uses a checkpointable Output convenience facility which maintains various streams, repairing them after checkpoint. Output also provides for message logging, retaining in memory all messages during the run, so that on checkpoint recovery the messages are printed out again as before. Other utilities include population distribution selectors, searching and sorting tools, etc.
The quality of a random number generator is important for a stochastic optimization system. As such, ECJ’s random number generator was the very first class written in the system: it is a Java implementation of the highly respected Mersenne Twister algorithm [12] and is the fastest such implementation available. Since ECJ’s release, the ECJ MersenneTwister and MersenneTwisterFast classes have found their way into a number of unrelated public-domain systems, including the popular NetLogo multiagent simulator [25]. MersenneTwisterFast is also shared with ECJ’s sister software, the MASON multiagent simulation toolkit [8].
Representations and Genetic Programming ECJ allows you to specify any genome representation you like. Standard representation packages in ECJ provide functionality for vectors of all Java data types; arbitrary-length lists; trees; and collections of objects (such as rulesets).
ECJ is perhaps best known for its support of “Koza”-style tree-structured genetic programming representations. ECJ represents these individuals as forests of parse-trees, each tree equivalent to a single Lisp s-expression. Figure 1.4 shows a parse-tree for a simple robot program, equivalent to the Lisp s-expression (if (and on-wall (tick> 20)) (* (ir 3) 6) 2.3). In C this might look like (onWall && tick > 20) ? ir(3) * 6 : 2.3.
This notionally says “If I’m on the wall and my tick-count is greater than 20, then return the value of my third infrared sensor times six, else return 2.3”. Such parse-trees are typically evaluated by executing their programs in a test environment, and modified via subtree crossover (swapping subtrees among individuals) or various kinds of mutation (replacing a subtree with a randomly-generated one, perhaps).
ECJ allows multiple subtrees for various experimental needs: Automatically Defined Functions (ADFs — a mechanism for evolving subroutine calls [4]), or parallel program execution, or evolving teams of programs. Along with ADFs, ECJ provides built-in support for Automatically Defined Macros (ADMs) [20] and Ephemeral Random Constants (ERCs [3], such as the numbers 20, 3, 6, and 2.3 in Figure 1.4).
Genetic programming trees are constructed out of a “primordial soup” of function templates (such as on-wall or 2.3). Early forms of genetic programming were typeless: though such templates had a predefined arity (number of arguments), any node could be connected to any other. Many genetic programming needs require more constraints than this. For example, the node if might expect a boolean value in its first argument, and integers or floats in the second and third arguments, and return a float when evaluated. Similarly and might take two booleans as arguments and return a boolean, while * would take ints or floats as arguments and return a float.
Such types are often associated with the kinds of data passed from node to node, but they do not have to be. Typing might be used to constrain certain nodes to be evaluated in groups or in a certain order: for example, a function type-block might insist that its first argument be of type foo and its second argument be of type bar to make certain that a foo node be executed before a bar node.
ECJ permits a simple static typing mechanism called set-based typing, which is suitable for many such tasks. In set-based typing, the return type and argument types of each node are each defined to be sets of type symbols (for example, {bool}, {foo, bar, baz}, or {int, float}). The desired return type for the tree’s root is similarly defined. A child node is permitted to fit into the argument slot of a parent node if the child node’s return type and the type of that argument slot in the parent are compatible. We define types to be compatible if their set intersection is nonempty (that is, they share at least one type symbol).
Set-based typing is sufficient for the typing requirements found in many programming languages, including ones with type hierarchies. It allows, among other things, for nodes such as ∗ to accept either integers or floats. However there are considerable restrictions on the power of set-based typing. It’s often useful for the return type of a node to change based on the particular nodes which have plugged into it as arguments. For example, ∗ might be defined as returning a float if at least one of its arguments returns floats, but returning an integer if both of its arguments return integers. if might be similarly defined not to return a particular type, but to simply require that its return type and the second and third argument types must all match. Such “polymorphic” typing is particularly useful in situations such as matrix multiplication, where the operator must place constraints on the width and height of its arguments and the final returned matrix. In this example, it’s also useful to have an infinite number of types (perhaps to represent matrices of varying widths or heights).
ECJ does not support polymorphic typing out of the box simply because it is difficult to implement many if not most common tree modification and generation algorithms using polymorphic typing: instead, set-based typing is offered to handle as many common needs as can be easily done.
Out of the Box Capabilities ECJ provides support out-of-the-box for a bunch of algorithm options:
• Generational algorithms: (μ, λ) and (μ + λ) Evolution Strategies, the Genetic Algorithm, Genetic Programming variants, Grammatical Evolution, PushGP, and Differential Evolution
• Steady-State evolution
• Parsimony pressure algorithms
• Spatially-embedded evolutionary algorithms
• Random restarts
• Multiobjective optimization, including the NSGA-II and SPEA2 algorithms.
• Cooperative, 1-Population Competitive, and 2-Population Competitive coevolution.
• Multithreaded evaluation and breeding.
• Parallel synchronous and asynchronous Island Models spread over a grid of computers.
• Internal synchronous Island Models within a single ECJ process.
• Massively parallel generational fitness evaluation of individuals on remote slave machines.
• Asynchronous Evolution, a version of steady-state evolution with massively parallel fitness evaluation on remote slave machines.
• Opportunistic Evolution, where remote slave machines run their own mini-evolutionary processes for a while before sending individuals back to the master process.
• Meta-Evolution
• A large number of selection and breeding operators
ECJ also has a GUI, though in truth I nearly universally use the command-line.
Idiosyncrasies ECJ was developed near the introduction of Java and so has a lot of historical idiosyncrasies.¹ Some of them exist to this day because of conservatism: refactoring is disruptive. If you code with ECJ, you’ll definitely have to get used to one or more of the following:
• No generics at all, few iterators or enumerators, no Java features beyond 1.4 (including annotations), and little use of the Java Collections library. This is part historical, and part my own dislike of Java’s byzantine generics implementation, but it’s mostly efficiency. Generics are very slow when used with basic data types, as they require boxing and unboxing. The Java Collections library is unusually badly written in many places internally: and anyway, for speed we tend to work directly with arrays.
• Hand-rolled socket code. With one exception (optional compression), ECJ’s parallel facility doesn’t rely on other libraries.
• ECJ loads nearly every object from its parameter database. This means that you’ll rarely see the new keyword in ECJ, nor any constructors. Instead ECJ’s usual “constructor” method is a method called setup(…), which sets up an object from the database.
• A proprietary logging facility. ECJ was developed before the existence of java.util.logging. Partly out of conservatism, I am hesitant to rip up all the pervasive logging just to use Sun’s implementation (which isn’t very good anyway).
• A parameter database derived from Java’s old java.util.Properties list rather than XML. This is historical of course. But seriously, do I need a justification to avoid XML?
• Mersenne Twister random number generator. java.util.Random is grotesquely bad, and systems which use it should be shunned.
• A Makefile. ECJ was developed before Ant and I’ve personally never needed it.

¹ It used to have a lot more — I’ve been weeding out ones that I think are unnecessary nowadays!
1.3 Unpacking ECJ and Using the Tutorials
ECJ comes as a single tarball, ecj.tar.gz, or as a ZIP file, ecj.zip. After unpacking this, you’re left with one directory called ecj.
In the ecj directory you’ll find several items:
• A top-level README file, which should be self-explanatory in its importance.
• ECJ’s LICENSE file, which describes the primary license (AFL 3.0, a BSD-style academic license).
• A CHANGES log, which lists all past changes to all versions (including the latest).
• A Makefile. ECJ does not use Ant, and in fact you can compile ECJ very straightforwardly by simply compiling all the java files in the ec directory. But we provide a helpful Makefile which will compile ECJ and do various other useful tasks.
• The docs directory. This contains most of the ECJ documentation.
• The start directory. This contains various scripts for starting up ECJ: though in truth we rarely use them.
• The ec directory. This contains ECJ proper. ec is the top-level package for ECJ.
1.3.1 The ec Directory, the CLASSPATH, and jar files
The ec directory is ECJ’s top-level package. Every subdirectory is a subpackage, and most of them are headed by helpful README files which describe the contents of the directory. Most packages contain not only Java files and class files but also parameter files and occasional data files: ECJ was designed originally for the class files to be compiled and stored right alongside the Java files in these directories, though it can be used with the separate-build-area approach taken by IDEs like Eclipse.
Because ec is the top-level package, you can compile ECJ, more or less, by just sticking its parent directory (the ecj directory) in your CLASSPATH. You will also need to add certain jar files in order to compile ECJ’s distributed evaluation and island model facilities, and its GUI. You can get these jar files from the ECJ main website (http://cs.gmu.edu/~eclab/projects/ecj/). Note that none of these libraries is required. For example, if the libraries for the distributed evaluator and island model are missing, ECJ will compile but will complain if you try to run those packages with compression turned on (a feature of the packages). The GUI library is optional to ECJ, so if you don’t install its libraries, you can still compile ECJ by just deleting the ec/display directory.
1.3.1.1 The ec/display Directory: ECJ’s GUI
This directory contains ECJ’s GUI. It’s in a state of disrepair and I suggest you do not use it. ECJ is really best as a command line program. In fact, as mentioned above, you can simply delete the directory and ECJ will compile just fine.
1.3.1.2 The ec/app Directory: Demo Applications
This directory contains all the demo applications. We have quite a number of demo applications, many sharing the same subdirectories. Read the provided README file for some guidance.
1.3.2 The docs Directory
This directory contains all top-level documentation of ECJ except for the various README files scattered throughout the package. The index.html file provides the top-level entry point to the documentation.
The documentation includes:
• Introduction to parameters in ECJ
• Class documentation
• ECJ’s four tutorials and post-tutorial discussion. The actual tutorial code is located in the ec/app directory.
• An (old) overview of ECJ
• An (old) discussion of ECJ’s warts
• Some (old) graph diagrams of ECJ’s structure
• This manual
1.3.2.1 Tutorials
ECJ has four tutorials which introduce you to the basics of coding on the system. I strongly suggest you go through them before continuing through the rest of this manual. They are roughly:
1. A simple GA to solve the MaxOnes problem with a boolean representation.
2. A GA to solve an integer problem, with a custom mutation pipeline.
3. An evolution strategy to solve a floating-point problem, with a custom statistics object and reading and writing populations.
4. A genetic programming problem, plus some elitism.
As should be obvious from the rest of this manual, this barely scratches the surface of ECJ. No mention is given of parallelism, differential evolution, coevolution, multiobjective optimization, list and ruleset representations, grammatical encoding, spatial embedding, etc. But it’ll get you up to speed.
Chapter 2
ec.Evolve and Utility Classes
ECJ is big. Let us begin.
ECJ’s entry point is the class ec.Evolve. This class is little more than bootstrapping code to set up the ECJ system, construct basic datatypes, and get things going.
To run an ECJ process, you fire up ec.Evolve with certain runtime arguments.
java ec.Evolve -file myParameterFile.params -p param=value -p param=value (etc.)
ECJ sets itself up entirely using a parameter file. To this you can add additional command-line parame- ters which override those found in the parameter file. More on the parameter file will be discussed starting in Section 2.1.
For example, if you were presently in the ecj directory, you could do this:

java ec.Evolve -file ec/app/ecsuite/ecsuite.params
This all assumes that the parameter file is a free-standing file in your filesystem. But it might not be: you might want to start up from a parameter file stored within a Jar file (for example if your ECJ library is bundled up into a Jar file like ecj.jar). To do this you can specify the parameter file as a file resource relative to the .class file of a class (a-la Java’s Class.getResource(…) method):
java ec.Evolve -from myParameterFile.params -at relative.to.Classname -p param=value (etc.)

… for example:
java ec.Evolve -from ecsuite.params -at ec.app.ecsuite.ECSuite
You can also say:
java ec.Evolve -from myParameterFile.params -p param=value (etc.)
In which case ECJ will assume that the class is ec.Evolve. In this situation, you’d probably need to specify the parameter file as a path away from ec.Evolve (which is in the ec directory), for example:

java ec.Evolve -from app/ecsuite/ecsuite.params
(Note the missing ec/…). See Section 2.1 for more discussion about all this.
ECJ can also restart from a checkpoint file it created in a previous run:
java ec.Evolve -checkpoint myCheckpointFile.gz

Checkpointing will be discussed in Section 2.3.
Last but not least, if you forget this stuff, you can always type this to get some reminders:
java ec.Evolve -help
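All of these invocations go through ec.Evolve’s main() method, so you can also launch a run from your own Java code by handing main() the same arguments. This is just a sketch of one way to do it; see Section 2.7 for the full story on integrating ECJ with other applications or libraries.

// Launching ECJ from your own code by calling ec.Evolve's main() directly,
// with the same arguments you would pass on the command line.
public class LaunchECJ {
    public static void main(String[] args) {
        ec.Evolve.main(new String[] {
            "-file", "ec/app/ecsuite/ecsuite.params",   // same parameter file as the example above
            "-p", "generations=100"                     // a command-line parameter override
        });
    }
}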
The purpose of ec.Evolve is to construct an ec.EvolutionState instance, or load one from a checkpoint file; then get it running; and finally clean up. The ec.EvolutionState class actually performs the evolutionary process. Most of the stuff ec.EvolutionState holds is associated with evolutionary algorithms or other stochastic optimization procedures. However there are certain important utility objects or data which are created by ec.Evolve prior to creating the ec.EvolutionState, and are then stored into ec.EvolutionState after it has been constructed. These objects are:
• The Parameter Database, which holds all the parameters ec.EvolutionState uses to build and run the process.
• The Output, which handles logging and writing to files.
• The Checkpointing Facility to create checkpoint files as the process continues.
• The Number of Threads to use, and the Random Number Generators, one per thread.
• A simple declaration of the Number of Jobs to run in the process.
The remainder of this chapter discusses each of these items. It’s not the most exciting of topics, but it’s important for understanding the rest of the ECJ process.
2.1 The Parameter Database
To build and run an experiment in ECJ, you typically write three things:
• (In Java) A problem which evaluates individuals and assigns fitness values to them.
• (In Java) Depending on the kind of experiment, various components from which individuals can be constructed — for example, for a genetic programming experiment, you’ll need to define the kinds of nodes which can be used to make up the individual’s tree.
• (In one or more Parameter Files) Various parameters which define the kind of algorithm you are using, the nature of the experiment, and the makeup of your populations and processes.
Let’s begin with the third item. Parameters are the lifeblood of ECJ: practically everything in the system is defined by them. This makes ECJ highly flexible; but it also adds complexity to the system.
ECJ loads parameter files and stores them into the ec.util.ParameterDatabase object, which is available to nearly everything. Parameter files are an extension of the format used by Java’s old java.util.Properties class. Parameter files usually end in “.params”, and contain parameters one to a line. Parameter files may also contain blank (all-whitespace) lines, which are ignored, and also lines which start with “#”, which are considered comments and also ignored. An example comment:
# This is a comment
The parameter lines in a parameter file typically look like this:

parameter.name = parameter value
A parameter name is a string of non-whitespace characters except for “=”. After this comes some optional whitespace, then an “=”, then some more optional whitespace (actually, you can omit the “=”, but it’s considered bad style). A parameter value is a string of characters, including whitespace, except that all whitespace is trimmed from the front and end of the string. Notice the use of a period in the parameter name. It’s quite a common convention to use periods in various parameter names in ECJ. We’ll get to why in a second.
Here are some legal parameter lines:

generations = 400
pop.subpop.0.size = 1000
pop.subpop = ec.Subpopulation

Here are some illegal parameter lines:

generations
= 1000
pop subpop = ec.Subpopulation

2.1.1 Inheritance
Parameter files may be set up to derive from one or more other parameter files. Let’s say you have two parameter files, a.params and b.params. Both are located in the same directory. You can set up a.params to derive from b.params by adding the following line as the very first line in the a.params file:
parent.0 = b.params
This says, in effect: “include in me all the parameters found in the b.params file, but any parameters I myself declare will override any parameters of the same name in the b.params file.” Note that b.params may itself derive from some other file (say, c.params). In this case, a.params receives parameters from both (and parameters in b.params will likewise override ones of the same name in c.params).
Let’s say that b.params is located inside a subdirectory called foo. Then the line will look like this:

parent.0 = foo/b.params
Notice the forward slash: ECJ was designed on UNIX systems. Likewise, imagine if b.params was stored in a sibling directory called bar: then we might say:
parent.0 = ../bar/b.params
You can also define absolute paths, UNIX-style:
parent.0 = /tmp/myproject/foo.params
Long story short: parameter files are declared using traditional UNIX path syntax.
A parameter file can also derive from multiple parent parameter files, by including each at the beginning of the file, with consecutive numbers, like this:
parent.0 = b.params
parent.1 = yo/d.params
parent.2 = ../z.params
This says in effect: “first look in a.params for the parameter. If you can’t find it there, look in b.params and, ultimately, all the files b.params derives from. If you can’t find it in any of them, look in d.params and all the files it derives from. If you can’t find it in any of them, look in z.params and all the files it derives from. If you’ve still not found the parameter, give up.”
This is essentially a depth-first search through a tree or DAG, with parents overriding their children (the files they derive from) and earlier siblings overriding later siblings. Note that this multiple inheritance scheme is not the same as C++ or Lisp/CLOS, which use a distance measure!
Parent parameter files can be explicit files on your file system (as shown above) or they can be files located in JAR files, etc. But how do you refer to a file inside a JAR file? It’s easy: refer to it using a class relative path (see the next Section, 2.1.2), which defines the path relative to the class file of some class. For example, suppose you’re creating a parameter file whose parent is ec/app/ant/ant.params, but you’re not using ECJ in its unpacked form — rather, it’s bundled up into a JAR file. Thus ec/app/ant/ant.params is archived in that JAR file. Since this file is right next to ec/app/ant/Ant.class — the class file for the ec.app.ant.Ant class — you can refer to it as:
parent.0 = @ec.app.ant.Ant ant.params
If your parameter file is already in a JAR file, and it uses ordinary relative path names to refer to its parents (like ../z.params), these will be interpreted as other files in the archived file system inside that JAR file. To escape the JAR file you have to use an absolute path name, such as
parent.0 = /tmp/foo.params
It’s pretty rare to need that though, and hardly good style. The whole point of JAR files is to encapsulate functionality into one package.
Overriding the Parameter File When you fire up ECJ, you point it at a single parameter file, and you can provide additional parameters at the command-line, like this:
java ec.Evolve -file parameterFile.params -p command-line-parameter=value \ -p command-line-parameter=value …
Furthermore, your program itself can submit parameters to the parameter database, though it’s very unusual to do so (a short code sketch of this appears after the list below). When a parameter is requested from the parameter database, here’s how it’s looked up:
1. If the parameter was declared by the program itself, this value is returned.
2. Else if the parameter was provided on the command line, this value is returned.
3. Else the parameter is looked up in the provided parameter file and all derived files using the inheritance ordering described earlier.
4. Else the database signals failure.
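Here is a small code sketch of that lookup order, using the ec.util.ParameterDatabase and ec.util.Parameter classes covered later in Sections 2.1.5 and 2.1.7. The constructor and the set(…) call reflect my understanding of that API, and myParameterFile.params is a hypothetical file assumed to contain the line generations = 400.

import java.io.File;
import ec.util.Parameter;
import ec.util.ParameterDatabase;

// A sketch of the lookup order: program-supplied parameters beat command-line
// parameters, which beat the parameter files.  The constructor and set(...) calls
// reflect my understanding of ec.util.ParameterDatabase; see Section 2.1.7 for the
// full story on building databases by hand.
public class LookupOrderSketch {
    public static void main(String[] args) throws Exception {
        // load a parameter file, plus command-line-style arguments containing -p overrides
        ParameterDatabase db = new ParameterDatabase(
            new File("myParameterFile.params"),              // suppose this file says generations = 400
            new String[] { "-p", "generations=1000" });      // the command line overrides it
        Parameter p = new Parameter("generations");
        System.out.println(db.getString(p, null));           // prints 1000

        db.set(p, "50");                                     // the program itself overrides everything
        System.out.println(db.getString(p, null));           // prints 50
    }
}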
2.1.2 Kinds of Parameters
ECJ supports the following kinds of parameters:
• Numbers. Either long integers or double floating-point values. Examples:
generations = 500
tournament.size = 3.25
minimum-fitness = -23.45e15
• Arbitrary Strings trimmed of whitespace. Example:

crossover-type = two-point
• Booleans. Any value except for “false” (case-insensitive) is considered to be true. It’s best style to use lower-case “true” and “false”. The first two of these examples are false and the second two are true:
print-params = false
die-a-painful-death = fAlSe
pop.subpop.0.perform-injections = true
quit-on-run-complete = whatever
• Class Names. Class names are defined as the full class name of the class, including the package. Example:
pop.subpop.0.species = ec.gp.GPSpecies
• File or Resource Path Names. Paths can be of four types.
– Absolute paths, which (in UNIX) begin with a “/”, stipulate a precise location in the file system.
– Relative paths, which do not begin with a “/”, are defined relative to the parameter file in which the parameter was located. If the parameter file was an actual file in the filesystem, the relative path will also be considered to point to a file. If the parameter file was in a jar file, then the relative path will be considered to point to a resource inside the same jar file relative to the parameter file location. You’ve seen relative paths already used for derived parameter files.
– Execution relative paths are defined relative to the directory in which the ECJ process was launched. Execution relative paths look exactly like relative paths except that they begin with the special character “$”.
– Class relative paths define a path relative to the class file of a class. They have two parts: the class in question, and then the path to the resource relative to it. If the class is stored in a Jar file, then the path to the resource will also be within that Jar file. Otherwise the path will point to an actual file. Class relative paths begin with “@”, followed by the full class name, then spaces or tabs, then the relative path.
Examples of all four kinds of paths:
stat.file = $out.stat
eval.prob.map-file = ../dungeon.map
temporary-output-file = /tmp/output.txt
image = @ec.app.myapp.MyClass images/picture.png
• Arrays. ECJ doesn’t have direct support for loading arrays, but has a convention you should be made aware of. It’s common for arrays to be loaded by first stipulating the number of elements in the array, then stipulating each array element in turn, starting with 0. The parameter used for the number of elements differs from case to case. Note the use of periods prior to each number in the following example:
gp.fs.0.size = 6
gp.fs.0.func.0 = ec.app.ant.func.Left
gp.fs.0.func.1 = ec.app.ant.func.Right
gp.fs.0.func.2 = ec.app.ant.func.Move
gp.fs.0.func.3 = ec.app.ant.func.IfFoodAhead
gp.fs.0.func.4 = ec.app.ant.func.Progn2
gp.fs.0.func.5 = ec.app.ant.func.Progn3
The particulars vary. Here’s another, slightly different, example:
exch.num-islands = 8
exch.island.0.id = SurvivorIsland
exch.island.1.id = GilligansIsland
exch.island.2.id = FantasyIsland
exch.island.3.id = TemptationIsland
exch.island.4.id = RhodeIsland
exch.island.5.id = EllisIsland
exch.island.6.id = ConeyIsland
exch.island.7.id = TreasureIsland
Anyway, you get the idea.
2.1.3 Namespace Hierarchies and Parameter Bases
ECJ has lots of parameters, and by convention organizes them in a namespace hierarchy to maintain some sense of order. The delimiter for paths in this hierarchy is — you guessed it — the period.
The vast majority of parameters are used by one Java object or another to set itself up immediately after it has been instantiated for the first time. ECJ has an important convention which uses the namespace hierarchy to do just this: the parameter base. A parameter base is essentially a path (or namespace, what have you) in which an object expects to find all of its parameters. The prefix for this path is typically the parameter name by which the object itself was loaded.
For example, let us consider the process of defining the class to be used for the global population. This class is found in the following parameter:
pop = ec.Population
ECJ looks for this parameter, expects a class (in this case, ec.Population), loads the class, and creates one instance. It then calls a special method (setup(…), we’ll discuss it later) on this class so it can set itself up from various parameters. In this case, ec.Population needs to know how many subpopulations it will have. This is defined by the following parameter:
pop.subpops = 2
ec.Population didn’t know that it was supposed to look in pop.subpops for this value. Instead, it only knew that it needed to look in a parameter called subpops. The rest (in this case, pop) was provided to ec.Population as its parameter base: the text to be prepended — plus a period — to all parameters that ec.Population needs in order to set itself up. It’s not a coincidence that the parameter base also happened to be the very parameter which defined ec.Population in the first place. This is by convention.
Armed with the fact that it needs to create an array of two subpopulations, ec.Population is ready to load the classes for those two subpopulations. Let’s say that for our experiment we want them to be of different classes. Here they are:
pop.subpop.0 = ec.Subpopulation
pop.subpop.1 = ec.app.myapp.MySpecialSubpopulation
The two classes are loaded and one instance is created of each of them. Then setup(…) is called on each of them. Each subpopulation looks for a parameter called size to tell it how many individuals will be in that subpopulation. Since each of them is provided with a different parameter base, they can have different sizes:
pop.subpop.0.size = 100
pop.subpop.1.size = 512
Likewise, each of these subpopulations needs a “species”. Presuming that the species are different classes, we might have:
pop.subpop.0.species = ec.vector.VectorSpecies
pop.subpop.1.species = ec.gp.GPSpecies
These species objects themselves need to be set up, and when they do, their parameter bases will be pop.subpop.0.species and pop.subpop.1.species respectively. And so on.
Now imagine that we have ten subpopulations, all of the same class (ec.Subpopulation), and all but the first one have the exact same size. We’d wind up having to write silly stuff like this:
pop.subpop.0.size = 1000
pop.subpop.1.size = 500
pop.subpop.2.size = 500
pop.subpop.3.size = 500
pop.subpop.4.size = 500
pop.subpop.5.size = 500
pop.subpop.6.size = 500
pop.subpop.7.size = 500
pop.subpop.8.size = 500
pop.subpop.9.size = 500
That’s a lot of typing. Though I am saddened to report that ECJ’s parameter files do require a lot of typing, at least the parameter database facility offers an option to save our fingers somewhat in this case. Specifically, when the ec.Subpopulation class sets itself up each time, it actually looks in not one but two path locations for the size parameter: first it tacks on its current base (as above), and if there’s no parameter at that location, then it tries tacking on a default base defined for its class. In this case, the default base for ec.Subpopulation is the prefix ec.subpop. Armed with this we could simply write:
ec.subpop.size = 500
pop.subpop.0.size = 1000
When ECJ looks for subpopulation 0’s size, it’ll find it as normal (1000). But when it looks for subpopulation 1 (etc.), it won’t find a size parameter in the normal location, so it’ll look in the default location, and use what it finds there (500). Only if there’s no parameter to be found in either location will ECJ signal an error.
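From the Java side, this two-location lookup typically lives inside a class’s setup(…) method. The sketch below shows a hypothetical Subpopulation-like class, not ECJ’s real ec.Subpopulation: it uses the getInt(…) accessor described in Section 2.1.5, and the state.output.fatal(…) call reflects my understanding of how ECJ classes report missing parameters.

import ec.EvolutionState;
import ec.util.Parameter;

// A sketch of how a component reads a parameter from its base and its default base.
// The class is hypothetical; ECJ's real ec.Subpopulation does something very similar.
public class MySubpopulationLikeThing implements ec.Setup {
    public static final String P_SIZE = "size";
    public int size;

    // the default base described above: ec.subpop
    public Parameter defaultBase() { return new Parameter("ec").push("subpop"); }

    public void setup(final EvolutionState state, final Parameter base) {
        // look in <base>.size first, then in ec.subpop.size; getInt returns 0 here
        // (minValue - 1) if the value is missing from both places or is less than 1
        size = state.parameters.getInt(base.push(P_SIZE), defaultBase().push(P_SIZE), 1);
        if (size < 1)
            state.output.fatal("Subpopulation size must be >= 1.",
                base.push(P_SIZE), defaultBase().push(P_SIZE));
    }
}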
It’s important to note that if a class is loaded from a default parameter, this doesn’t mean that the default parameter will become its parameter base: rather, the original expected location will continue to be the base. For example, imagine if both of our Species objects were the same class, and we had defined them using the default base. That is, instead of
pop.subpop.0.species = ec.vector.VectorSpecies
pop.subpop.1.species = ec.vector.VectorSpecies
…we simply said
ec.subpop.species = ec.vector.VectorSpecies
When the species for subpopulation 0 is loaded, its parameter base is not going to be ec.subpop.species. Instead, it will still be pop.subpop.0.species. Likewise, the parameter base for the species of subpopulation 1 will still be pop.subpop.1.species.
Keep in mind that all of this is just a convention. You can use periods for whatever you like ultimately. And there exist a few global parameters without any base at all. For example, the number of generations is defined as
generations = 200
…and the seed for the random number generator for the fourth thread is
seed.3 = 12303421
…even though there is no object set up with the seed parameter, and hence no object has seed as its parameter base. Random number generators are among the rare objects in ECJ which are not specified from the parameter file.
2.1.4 Parameter Files in Jar Files
Parameter files don’t have to be just in your file system: they can be bundled up in jar files. If a parameter file is being read from a jar file, its parents will generally be assumed to come from the same jar file as well, so long as they’re relative paths (that is, they don’t start with “/” in UNIX).
So how do you point to a parameter file in a jar file to get things rolling? You can run ECJ like this:
java ec.Evolve -from parameterFile.params -at relative.class.Name …
This instructs ECJ to look for the .class file of the class relative.class.Name, be it in the file system or in a Jar file. Once ECJ has found it, it looks for the path parameterFile.params relative to this file. You can omit the classname, which causes ECJ to assume that the class in question is ec.Evolve. For example, to run the Ant demo from ECJ (in a Jar file or unpacked into the file system), you could say:
java ec.Evolve -from app/ant/ant.params
Notice it does not say ec/app/ant/ant.params, which is probably what you’d expect if you used “-file” rather than “-from”. This is because ECJ goes to the ec/Evolve.class file, then from there it searches for the parameter file. The path of the parameter file relative to the ec/Evolve.class file is app/ant/ant.params.
There are similar rules regarding file references (such as parent references) within a parameter file. Let’s say that your parameter file is inside a jar file. If you say something like:
parent.0 = ../path/to/the/parent.params
… then ECJ will look around inside the same Jar file for this file, rather than externally in the operating system’s file system or in some other Jar file.
You can escape this, however. For example, once your parameter file is inside a Jar file, you can still define a parent in another Jar file, or in the file system, if you know another class file it’s located relative to. You just need to specify another class for ECJ to start at, and a path relative to it, like this:
parent.0 = @ec.foo.AnotherClass relative/path/to/the/parent.params
See the next section for more explanation of that text format.
Last but not least, once your parameter file is in a Jar file, you can refer to a parent in the file system if you use an absolute path (that is, one which, in UNIX anyway, starts with “/”). For example:
parent.0 = /Users/sean/ecj/ec/parent.params
Absolute path names aren’t very portable and aren’t recommended.
2.1.5 Accessing Parameters
Parameters are looked up in the ec.util.ParameterDatabase class, and parameter names are specified using the ec.Parameter class. The latter is little more than a cover for Java strings. To create the parameter pop.subpop.0.size, we say:
Parameter param = new Parameter("pop.subpop.0.size");
Of course, usually we don’t want to just make a direct parameter, but rather want to construct one from a parameter base and the remainder. Let’s say our base (pop.subpop.0) is stored in the variable base, and we want to look for size. We do this as:
Parameter param = base.push("size");
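As a further example, here is roughly what the “find the class, instantiate it, and call setup(…) on it” convention from Section 2.1.3 looks like when written out by hand, using the getInstanceForParameterEq(…) method documented in the table below. This is just a hedged sketch; ECJ’s own classes normally do this for you internally, and their error handling differs.

import ec.EvolutionState;
import ec.Population;
import ec.util.Parameter;

// A sketch of loading a component from a class-name parameter and setting it up,
// using the accessor methods documented below.  ECJ's own classes normally do this
// internally; this is just to show the convention in one place.
public class LoadPopulationSketch {
    public static Population loadPopulation(EvolutionState state) {
        Parameter base = new Parameter("pop");   // pop = ec.Population (or a subclass)
        Population pop = (Population)
            state.parameters.getInstanceForParameterEq(base, null, Population.class);
        pop.setup(state, base);                  // the parameter that named it becomes its base
        return pop;
    }
}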
Here are some common ec.util.ParameterDatabase methods. Note that all of them look in two places to find a parameter value. This is what we use to handle “standard” and “default” bases. Typically you’d pass in the parameter in its standard location, and also (as the “default parameter”) the parameter with its default base. You can pass in null for either, and it’ll get ignored.
ec.util.ParameterDatabase Methods
public boolean exists(Parameter parameter, Parameter default)
If either parameter exists in the database, return true. Either parameter may be null.
public String getString(Parameter parameter, Parameter default)
Look first in parameter, then failing that, in default parameter, and return the result as a String, else null if not found. Either parameter may be null.
public File getFile(Parameter parameter, Parameter default)
Look first in parameter, then failing that, in default parameter, and return the result as a File, else null if not found. Either parameter may be null. Important Note. You should generally only use this method if you are writing to a file. Otherwise it's best to use getResource(…).
public InputStream getResource(Parameter parameter, Parameter default)
Look first in parameter, then failing that, in default parameter, and open an InputStream to the result, else null if not found. Either parameter may be null. Important Note. This is distinguished from getFile(…) in that the object doesn’t have to be a file in the file system: it can for example be a location in a jar file. If the parameter specifies an absolute path or an execution relative path, then a file in the file system will be opened. If the parameter specifies a relative path, and the parameter database was itself loaded as a file rather than a resource (in a jar file say), then a file will be opened, else a resource will be opened in the same jar file as the parameter file. You can also specify a resource path directly.
public Object getInstanceForParameterEq(Parameter parameter, Parameter default, Class superclass)
Look first in parameter, then failing that, in default parameter, to find a class. The class must have superclass as a superclass, or can be the superclass itself. Instantiate one instance of the class using the default (no-argument) constructor, and return the instance. Throws an ec.util.ParamClassLoadException if no class is found.
public Object getInstanceForParameter(Parameter parameter, Parameter default, Class superclass)
Look first in parameter, then failing that, in default parameter, to find a class. The class must have superclass as a superclass, but may not be superclass itself. Instantiate one instance of the class using the default (no-argument) constructor, and return the instance. Throws an ec.util.ParamClassLoadException if no class is found.
public boolean getBoolean(Parameter parameter, Parameter default, boolean defaultValue)
Look first in parameter, then failing that, in default parameter, and return the result as a boolean, else defaultValue if not found or not a boolean. Either parameter may be null.
public int getIntWithDefault(Parameter parameter, Parameter default, int defaultValue)
Look first in parameter, then failing that, in default parameter, and return the result as an int, else defaultValue if not found or not an int. Either parameter may be null.
public int getInt(Parameter parameter, Parameter default, int minValue)
Look first in parameter, then failing that, in default parameter, and return the result as an int, else minValue−1 if not found, not an int, or < minValue. Either parameter may be null.
public int getIntWithMax(Parameter parameter, Parameter default, int minValue, int maxValue)
Look first in parameter, then failing that, in default parameter, and return the result as an int, else minValue−1 if not found, not an int, < minValue, or > maxValue. Either parameter may be null.
public long getLongWithDefault(Parameter parameter, Parameter default, long defaultValue)
Look first in parameter, then failing that, in default parameter, and return the result as a long, else defaultValue if not found or not a long. Either parameter may be null.
public long getLong(Parameter parameter, Parameter default, long minValue)
Look first in parameter, then failing that, in default parameter, and return the result as a long, else minValue−1 if not found, not a long, or < minValue. Either parameter may be null.
public long getLongWithMax(Parameter parameter, Parameter default, long minValue, long maxValue)
Look first in parameter, then failing that, in default parameter, and return the result as a long, else minValue−1 if not found, not a long, < minValue, or > maxValue. Either parameter may be null.
public float getFloatWithDefault(Parameter parameter, Parameter default, float defaultValue)
Look first in parameter, then failing that, in default parameter, and return the result as a float, else defaultValue if not found or not a float. Either parameter may be null.
public float getFloat(Parameter parameter, Parameter default, float minValue)
Look first in parameter, then failing that, in default parameter, and return the result as a float, else minValue−1 if not found, not a float, or < minValue. Either parameter may be null.
public float getFloatWithMax(Parameter parameter, Parameter default, float minValue, float maxValue)
Look first in parameter, then failing that, in default parameter, and return the result as a float, else minValue−1 if not found, not a float, < minValue, or > maxValue. Either parameter may be null.
public double getDoubleWithDefault(Parameter parameter, Parameter default, double defaultValue)
Look first in parameter, then failing that, in default parameter, and return the result as a double, else defaultValue if not found or not a double. Either parameter may be null.
public double getDouble(Parameter parameter, Parameter default, double minValue)
Look first in parameter, then failing that, in default parameter, and return the result as a double, else minValue−1 if not found, not a double, or < minValue. Either parameter may be null.
public double getDoubleWithMax(Parameter parameter, Parameter default, double minValue, double maxValue)
Look first in parameter, then failing that, in default parameter, and return the result as a double, else minValue−1 if not found, not a double, < minValue, or > maxValue. Either parameter may be null.
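For example, here is a minimal sketch of how these methods are typically called inside a setup(…) method, using both the standard base and the default base. The parameter name "size" and the error message are placeholders for illustration, and the sketch assumes the enclosing class is an ECJ object with a defaultBase() method:

public void setup(final EvolutionState state, final Parameter base)
    {
    Parameter def = defaultBase();    // the default base for this object
    int size = state.parameters.getInt(base.push("size"), def.push("size"), 1);
    if (size < 1)    // getInt(...) returned minValue - 1, so the parameter was missing or bad
        state.output.fatal("Size must be an integer >= 1",
            base.push("size"), def.push("size"));
    }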
2.1.6 Debugging Your Parameters
Your ECJ experiment is loading and running, but how do you know you didn’t make a mistake in your parameters? How do you know ECJ is using the parameters you stated rather than some default values? If you include the following parameter in your collection:
print-params = true
…then ECJ will print out all the parameters which were used or tested for existence. For example, you might get things like this printed out:
!P: pop.subpop.0.file
P: pop.subpop.0.species = ec.gp.GPSpecies
1Because these two techniques use the subpopulations in different ways, they cannot be used together (a rare situation in ECJ).
[Figure: class diagram relating Fitness (prototype), Breeding Pipeline (0..n sources), and Selection Method]
Usually ECJ's population is an instance of the class ec.Population and its subpopulations are instances of the class ec.Subpopulation. Both of these are Groups. Let's say that there's a single subpopulation, which must contain 100 individuals. We can express this as follows:
pop = ec.Population
pop.subpops = 1
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 100
Obviously further subpopulations would be pop.subpop.1, pop.subpop.2, etc. The population is found in an instance variable in the EvolutionState:
public Population population;
The Population is little more than an array of Subpopulations. To get Subpopulation 0, with the EvolutionState being state, you'd say:
Subpopulation theSubpop = state.population.subpops[0];
Subpopulations themselves contain arrays of individuals. To get Individual 15 of Subpopulation 0, you’d say:
Individual theIndividual = state.population.subpops[0].individuals[15];
In addition to an array of individuals, each subpopulation contains a species which defines the individuals used to fill the subpopulation, as well as their fitness and the means by which they are modified. Subpopulations also contain some basic parameters for creating initial individuals, though the procedure is largely handled by Species.2 We'll get to creation and modification later.
Species have an odd relationship to Individuals and to Subpopulations. First recall the Flyweight pattern in Section 3.1.4. Individuals are related to a common Species using the Flyweight pattern: they use Species to store a lot of common information (how to modify themselves, for example). Ordinarily you'd think that the Subpopulation would be a good place for this storage. However different Subpopulations can share the same Species. This allows you to, for example, have one Species guide an entire evolutionary run that might have twenty Subpopulations in it.3 The species of Subpopulation 0 may be found here:
Species theSpecies = state.population.subpops[0].species;
A Species contains three major elements: first, the prototypical Individual for Subpopulations which use that Species. Recall that Individuals are Prototypes and new ones are formed by cloning from a prototypical individual held in reserve. This “queen bee” individual, so to speak, is found here:
Individual theProto = state.population.subpops[0].species.i_prototype;
A Species also contains a prototypical Fitness object. In ECJ fitnesses are separate from individuals. Individuals define the candidate solution, and Fitnesses define how well it has performed. Like Individuals, Fitnesses are also Prototypes. The prototypical Fitness for Subpopulation 0 may be found here:
Fitness theProtoFitness = state.population.subpops[0].species.f_prototype;
The Species class you pick is usually determined by the kind of Individual you pick, that is, by the kind of representation of your solution. You define the class of the Species for Subpopulation 0, and its prototypical Fitness and prototypical Individual, as follows. For example, let’s make Individuals which are arrays of integers, and a simple Fitness common to many evolutionary algorithms:
2You might be asking: if Species are responsible for making individuals, why are Subpopulations involved at all? A very good question indeed.
3Granted, this isn’t very common.
pop.subpop.0.species = ec.vector.IntegerVectorSpecies
pop.subpop.0.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
By way of explanation, IntegerVectorIndividual, along with various other "integer" vector individuals like LongVectorIndividual, ShortVectorIndividual, and ByteVectorIndividual, requires an IntegerVectorSpecies. And ec.simple.SimpleFitness is widely used for techniques such as Genetic Algorithms or Evolution Strategies. The prototypical Individual is never assigned a Fitness (it's null). But once assembled in a Subpopulation, each Individual has its very own Fitness. To get the Fitness of individual 15 in Subpopulation 0, you'd say:
Fitness theFitness = state.population.subpops[0].individuals[15].fitness;
Last, a Species contains a prototypical Breeding Pipeline to modify individuals. We’ll get to that in Section 3.5.
Since they’re Prototypes, Individuals, Fitnesses, and Species all have default bases. We’ll talk about the different kinds of Individuals, Fitnesses, and Species later, plus various default bases for them.
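Putting these access patterns together, here is a minimal sketch (assuming you're somewhere with access to the EvolutionState, here called state) which walks the whole Population and prints each Individual's fitness:

for(int s = 0; s < state.population.subpops.length; s++)
    for(int i = 0; i < state.population.subpops[s].individuals.length; i++)
        {
        Individual ind = state.population.subpops[s].individuals[i];
        state.output.message("Subpop " + s + " Ind " + i +
            " Fitness: " + ind.fitness.fitness());
        }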
3.2.1 Making Large Numbers of Subpopulations
Let’s say you’re doing an evolutionary experiment (perhaps coevolution, see Section 7.1) which involves 100 Subpopulations. It’s going to get very tiresome to repeat…
pop = ec.Population
pop.subpops = 100
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 100
pop.subpop.0.species = ec.vector.IntegerVectorSpecies
pop.subpop.0.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
….
pop.subpop.1 = ec.Subpopulation
pop.subpop.1.size = 100
pop.subpop.1.species = ec.vector.IntegerVectorSpecies
pop.subpop.1.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.1.species.fitness = ec.simple.SimpleFitness
….
pop.subpop.2 = ec.Subpopulation
pop.subpop.2.size = 100
pop.subpop.2.species = ec.vector.IntegerVectorSpecies
pop.subpop.2.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.2.species.fitness = ec.simple.SimpleFitness
….
… and so on some 100 times. Even with the help of ECJ’s default parameters, you’ll still be typing an awful lot. Population has a simple mechanism to make this easier on you: the parameter…
pop.default-subpop = 0
This says that if you do not specify a Subpopulation in parameters, ECJ will assume its parameters are identical to those of Subpopulation 0. Thus you could simply say:
pop = ec.Population
pop.subpops = 100
pop.default-subpop = 0
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 100
pop.subpop.0.species = ec.vector.IntegerVectorSpecies
pop.subpop.0.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
…
… and be done with it. Note that you can always specify a Subpopulation specially. For example, suppose all of your Subpopulations were exactly like Subpopulation 0 except for Subpopulation 19. You can say:
pop = ec.Population
pop.subpops = 100
pop.default-subpop = 0
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 100
pop.subpop.0.species = ec.vector.IntegerVectorSpecies
pop.subpop.0.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
…
pop.subpop.19 = ec.Subpopulation
pop.subpop.19.size = 25
pop.subpop.19.species = ec.vector.FloatVectorSpecies
pop.subpop.19.species.ind = ec.vector.DoubleVectorIndividual
pop.subpop.19.species.fitness = ec.simple.SimpleFitness
…
Note that even though Subpopulation 19 shares the same fitness class as the others, we still had to specify it. It's an all-or-nothing proposition: either you say nothing about that particular Subpopulation, or you say everything.
3.2.2 How Species Make Individuals
Species have two ways to create new individuals: from scratch, or reading from a stream. To generate an individual from scratch, you can call (in ec.Species):
ec.Species Methods
public Individual newIndividual(EvolutionState state, int thread)
Returns a brand new, randomized Individual.
The default implementation of this method simply clones an Individual from the prototype and returns it. Subclasses of Species override this to randomize the Individual in a fashion appropriate to its representation.
Another way to create an individual is to read it from a binary or text stream. ec.Species provides two methods for this:
ec.Species Methods
public Individual newIndividual(EvolutionState state, LineNumberReader reader) throws IOException
Produces a new individual read from the stream.
public Individual newIndividual(EvolutionState state, DataInput input) throws IOException
Produces a new individual read from the given DataInput.
These methods create Individuals by cloning the prototype, then calling the equivalent readIndividual(…) method in ec.Individual. See Section 3.2.4 for more information on those methods.
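For example, here is a minimal sketch of generating a fresh random Individual for Subpopulation 0 from scratch (assuming an EvolutionState named state and thread 0):

Species species = state.population.subpops[0].species;
Individual ind = species.newIndividual(state, 0);    // clone the prototype and randomize it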
3.2.3 Reading and Writing Populations and Subpopulations
Populations and Subpopulations have certain predefined methods for reading and writing, which you should know how to use. If you subclass Population or Subpopulation (relatively rare) you may need to reimplement these methods. Population’s methods are:
public void printPopulationForHumans(EvolutionState state, int log);
public void printPopulation(EvolutionState state, int log);
public void printPopulation(EvolutionState state, PrintWriter writer);
public void readPopulation(EvolutionState state, LineNumberReader reader) throws IOException;
public void writePopulation(EvolutionState state, DataOutput output) throws IOException;
public void readPopulation(EvolutionState state, DataInput input) throws IOException;
Subpopulation's methods are nearly identical. In Subpopulation:
public void printSubpopulationForHumans(EvolutionState state, int log);
public void printSubpopulation(EvolutionState state, int log);
public void printSubpopulation(EvolutionState state, PrintWriter writer);
public void readSubpopulation(EvolutionState state, LineNumberReader reader) throws IOException;
public void writeSubpopulation(EvolutionState state, DataOutput output) throws IOException;
public void readSubpopulation(EvolutionState state, DataInput input) throws IOException;
These methods employ similar methods in ec.Individual to print out, or read, Individuals. Those methods are discussed next in Section 3.2.4.
The first Population method, printPopulationForHumans(…), prints an entire population to a log in a form pleasing to the human eye. It begins by printing out the number of subpopulations, then prints each Subpopulation index and calls printSubpopulationForHumans(…) on each Subpopulation in turn. printSubpopulationForHumans(…) then prints out the number of individuals, then for each Individual it prints the Individual index, then calls printIndividualForHumans(…) to print the Individual. Overall, the output looks something like this:
Number of Subpopulations: 1
Subpopulation Number: 0
Number of Individuals: 1000
Individual Number: 0
Evaluated: T
Fitness: 0.234
-4.97551104730313 -1.7220830524609632 1.7908415218297096
2.3277606156190496 3.5616099573877404 -3.8002895023118617
Individual Number: 1
Evaluated: T
Fitness: 4.91235
3.1033182498148575 -3.613847679151146 -0.562978505270439
-2.860926011046968 1.9007479097991151 -3.051348823625001
…
The next two Population methods, both named printPopulation(…), print an entire population to a log in a form that can be (barely) read by humans but can also be read back in perfectly by ECJ, resulting in identical Populations. These operate similarly to printPopulationForHumans(…), except that various data types are emitted using ec.util.Code (Section 2.2.3).
Number of Subpopulations: i1|
Subpopulation Number: i0|
Number of Individuals: i1000|
Individual Number: i0|
Evaluated: F
Fitness: f0|0.0|
i6|d4600627607395240880|0.3861348728170766|d4616510324226321041|4.284844300646584|
d4614576621171274054|3.2836854885228233|d4616394543356495435|4.182010230653371|
Individual Number: i1|
Evaluated: F
Fitness: f0|0.0|
i6|d4603775819114015296|0.6217914592919627|d4612464338011645914|2.345643329183969|
d-4606767824441912859|-4.368233761797886|d4616007477858046134|3.919113503960115|
…
The Population method readPopulation(…, LineNumberReader) can read in this mess to produce a Population. It in turn does its magic by calling the equivalent method in Subpopulation.
The last two methods, writePopulation(…) and readPopulation(…, DataInput), read and write Populations (or Subpopulations) to binary files.
3.2.4 About Individuals
Individuals have four basic parts:
• The Individual's fitness.
public Fitness fitness;
• The Individual’s species.
public Species species;
• Whether the individual has been evaluated and had its Fitness set to a legal value yet.4
public boolean evaluated;
• The representation of the Individual. This could be anything from an array to a tree structure — representations of course vary and are defined by subclasses. We’ll talk about them later.
3.2.4.1 Implementing an Individual
For many purposes you can just use one of the standard “off-the-rack” individuals — vector individuals, genetic programming tree individuals, ruleset individuals — but if you need to implement one yourself, here are some methods you need to be aware of. First off, Individuals are Prototypes and must override the clone() method to deep-clone themselves, including deep-cloning their representation and their Fitness, but not their Species (which is just pointer-copied). Individuals must also implement the setup(…), and defaultBase() methods. Additionally, Individuals have a number of methods which either should or must be overridden. Let’s start with the “must override” ones:
public abstract int hashCode();
public abstract boolean equals(Object individual);
These two standard Java methods enable hashing by value, which allows Subpopulations to remove duplicate Individuals. hashCode() must return a hashcode for an individual based on the value of its representation. equals(…) must return true if the Individual is identical to the other object (which in ECJ will always be another Individual).
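For example, a hypothetical class MyIntIndividual whose genotype is an int[] named genome (an illustrative sketch, not ECJ's actual IntegerVectorIndividual code) might implement these two methods like so:

public int hashCode()
    {
    int hash = this.getClass().hashCode();     // so different classes hash differently
    for(int i = 0; i < genome.length; i++)
        hash = (hash * 31 + 17) ^ genome[i];   // fold each gene into the hash
    return hash;
    }

public boolean equals(Object individual)
    {
    if (individual == null || !(getClass().equals(individual.getClass()))) return false;
    return java.util.Arrays.equals(genome, ((MyIntIndividual)individual).genome);
    }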
The next two methods are optional and may not be appropriate depending on your representation:
public long size();
public double distanceTo(Individual other);
size() returns an estimate of the size of the individual. The only hard-and-fast rule is that 0 is the smallest possible size (and the default returned by the method). Size information is largely used by the ec.parsimony package (Section 5.2.12) to apply one of several parsimony pressure techniques.
distanceTo(…) returns an estimate of the distance, in some metric space, of the Individual to some other Individual of the same type. In the future this method may be used for various crowding or niching methods. At present no package uses it, though all vector individuals implement it. The default implementation returns 0 if the other Individual is identical, else Double.POSITIVE_INFINITY.
Last come a host of functions whose purpose is to read and write individuals. You’ve seen this pattern before in Section 3.2.3. Some of these are important to implement; others can wait if you’re in a hurry to get your custom Individual up and running.
public void printIndividualForHumans(EvolutionState state, int log);
public void printIndividual(EvolutionState state, int log);
public void printIndividual(EvolutionState state, PrintWriter writer);
public void readIndividual(EvolutionState state, LineNumberReader reader) throws IOException;
public void writeIndividual(EvolutionState state, DataOutput output) throws IOException;
public void readIndividual(EvolutionState state, DataInput input) throws IOException;
These six methods only need to be overridden in certain situations, and in each case there’s another method which is typically overridden instead. Here’s what they do:
4Why isn't this in the Fitness object? Another excellent question.
• printIndividualForHumans(…) prints an individual, whether it's been evaluated, and its fitness, to a log in a way that's pleasing and useful for real people to read. Rather than override this method, you should probably instead override this method:
public String genotypeToStringForHumans();
… which should return the representation of the individual in a human-pleasing fashion. Or, since genotypeToStringForHumans() by default just calls toString(), you can just override:
public String toString();
Overriding one or both of these methods is pretty important: otherwise Statistics objects will largely be printing your individuals as gibberish. Here’s a typical output of these methods:
Evaluated: T
Fitness: 0.234
-4.97551104730313 -1.7220830524609632 1.7908415218297096
2.3277606156190496 3.5616099573877404 -3.8002895023118617
• Both printIndividual(…) methods print an individual, and its fitness, out in a way that can be perfectly read back in again with readIndividual(…), but which can also be parsed by humans with some effort. Rather than override this method, you probably should instead override this method:
public String genotypeToString();
This method is important to implement only if you intend to write individuals out to files in such a way that you can load them back in later. If you don't implement it, toString() will be used, which probably won't be as helpful. It returns a String which can be parsed back in again by the next method. Note that you need to write an individual out so that it can be perfectly read back in again as an identical individual. How do you do this? ECJ's classes by default all use the aging and idiosyncratic ec.util.Code package, developed long ago for this purpose, but which still works well. See Section 2.2.3, and the sketch after this list.
Here’s a typical output of these methods (note the use of ec.util.Code):
Evaluated: F
Fitness: f0|0.0|
i6|d4600627607395240880|0.3861348728170766|d4616510324226321041|4.284844300646584|
d4614576621171274054|3.2836854885228233|d4616394543356495435|4.182010230653371|
• readIndividual(…, LineNumberReader) reads an individual, and its fitness, in from a LineNumberReader. The stream of text being read is assumed to have been generated by printIndividual(…). Rather than override this method, you should probably instead override this method:
protected void parseGenotype(EvolutionState state, LineNumberReader reader) throws IOException;
This modifies the existing Individual’s genotype to match the genotype read in from the reader. The genotype will have been written out using printIndividual(…). You only need to override this method if you plan on reading individuals in from files (by default the method just throws an error).
• The last two methods (writeIndividual(…) and readIndividual(…, DataInput)) read and write an individual, including its representation, fitness, and evaluated flag, in a purely binary fashion to a stream. Don't write the Species. It's probably best instead to override the following methods to just read and write the genotype:
public void writeGenotype(EvolutionState state, DataOutput output) throws IOException;
public void readGenotype(EvolutionState state, DataInput input) throws IOException;
These methods are probably only important to implement if you plan on using ECJ's distributed facilities (distributed evaluator, island models). The default implementations of these methods simply throw exceptions.
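As promised above, here is a minimal sketch of genotypeToString() for the same hypothetical MyIntIndividual with an int[] genome, using the ec.util.Code facility described in Section 2.2.3 (assuming ec.util.Code is imported; this illustrates the approach and is not ECJ's actual code):

public String genotypeToString()
    {
    StringBuilder s = new StringBuilder();
    s.append(Code.encode(genome.length));      // first encode the genome length
    for(int i = 0; i < genome.length; i++)
        s.append(Code.encode(genome[i]));      // then encode each gene
    return s.toString();
    }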
3.2.5 About Fitnesses
Fitnesses are separate from Individuals, and various Fitnesses can be used depending on the demands of the evolutionary algorithm. The most common Fitness is ec.simple.SimpleFitness, which represents fitness as a single number from negative infinity to positive infinity, where larger values are “fitter”. Certain selection methods (notably fitness proportionate selection) require that the fitness be non-negative; and ideally between 0 and 1 inclusive.
There are other Fitness objects. For example, there are various multiobjective fitnesses (see Section 7.5), in which the fitness value is not one but some N numbers, and either higher or lower may be better depending on the algorithm. Other Fitnesses, like the one used in genetic programming (Section 5.2), maintain a primary Fitness statistic and certain auxiliary ones.
You probably won’t need to implement a Fitness object. But you may need to use some of the meth- ods below. Fitnesses are Prototypes and so must implement the clone() (as a deep-clone), setup(…), and defaultBase() methods. Fitness has four additional required methods:
public abstract double fitness();
public abstract boolean isIdealFitness();
public abstract boolean equivalentTo(Fitness other);
public abstract boolean betterThan(Fitness other);
The first method, fitness(), should return the fitness cast into a value from negative infinity to positive infinity, where higher values are better. This is used largely for fitness-proportionate and similar selection methods. If there is no appropriate mechanism for this, you’ll need to fake it. For example, multiobjective fitnesses might return the maximum or sum over their various objectives.
The second method, isIdealFitness(), returns true if the fitness in question is the best possible. This is largely used to determine if it’s okay to quit. It’s fine for this method to always return false if you so desire. The third and fourth methods compare against another fitness object, of the same type. The first returns
true if the two Fitnesses are in the same equivalence class: that is, neither is fitter than the other. For simple fitnesses, this is just equality. For multiobjective fitnesses this is Pareto-nondomination of one another. The second method returns true if the Fitness is superior to the one provided in the method. For simple fitnesses, this just means fitter. For multiobjective fitnesses this implies Pareto domination.
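For instance, a hypothetical maximizing Fitness subclass (a sketch only, not ECJ's actual SimpleFitness code) might implement the two comparison methods like this:

public boolean equivalentTo(Fitness other)    // same equivalence class: neither is fitter
    { return other.fitness() == this.fitness(); }

public boolean betterThan(Fitness other)      // strictly fitter than the other
    { return this.fitness() > other.fitness(); }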
Fitnesses also have similar printing facilities to Individuals:5
5Starting to get redundant? Sorry about that.
public void printFitnessForHumans(EvolutionState state, int log);
public void printFitness(EvolutionState state, int log);
public void printFitness(EvolutionState state, PrintWriter writer);
public void readFitness(EvolutionState state, LineNumberReader reader) throws IOException;
public void writeFitness(EvolutionState state, DataOutput output) throws IOException;
public void readFitness(EvolutionState state, DataInput input) throws IOException;
As usual: the first method, printFitnessForHumans(…), prints a Fitness in a way pleasing for humans to read. It simply prints out the result of the following method (which you should override instead if you ever need to):
public String fitnessToStringForHumans();
The default implementation of fitnessToStringForHumans() simply calls:
public String toString();
The next two methods, both named printFitness(…), print a Fitness in a way that can be (barely) read by humans, and can be read back in by ECJ to produce an identical Fitness to the original. These methods just print out the result of the following method (which you should override instead if you ever need to):
public String fitnessToString();
The default implementation of this method calls toString(), which is almost certainly wrong. But all the standard Fitness subclasses implement it appropriately using the ec.util.Code tools (Section 2.2.3).
The method readFitness(…, LineNumberReader) reads into ECJ a Fitness written by these last two printers. Finally, the last two methods, writeFitness(…) and readFitness(…, DataInput), read and write the Fitness in a binary fashion. The default implementations of these methods throw an error, but all standard subclasses of Fitness implement them properly.
Fitnesses have two auxiliary variables:
public ArrayList trials = null;
public Individual[] context = null;
These variables are used by coevolutionary processes (see Section 7.1) to keep track of the trials (in the form of java.lang.Doubles) used to compute the Fitness value, and to maintain the context (other collaborating Individuals) which produced the best result represented by the Fitness. Outside of coevolution they're presently unused: leave them null and they won't be printed.
Fitnesses have three hooks which can be used to merge multiple Fitness values into one, where appropriate (for example, this doesn't make much sense for multiobjective fitnesses). Though these could be used to assemble a Fitness over multiple trials, coevolution instead uses the mechanism above, which preserves contextual information (see Section 7.1). One method, setToMeanOf(…), is unimplemented in the Fitness class proper, though it's been implemented in common ECJ subclasses. If you make your own Fitness object you might ultimately want to implement this method if appropriate, but it's not necessary in most cases. The other two methods call setToMeanOf(…) internally.
ec.Fitness Methods
public void setToMeanOf(EvolutionState state, Fitness[] fitnesses)
Sets the fitness to the mean of the provided fitness values. By default this method is unimplemented and generates an error. Common subclasses (like SimpleFitness and KozaFitness) override this method and implement it. Other classes, such as MultiobjectiveFitness and its subclasses, do not, since there is no notion of a "mean" in that context. You do not have to implement this utility method in most situations.
public void setToMedianOf(EvolutionState state, Fitness[] fitnesses)
Sets the fitness to the median of the provided fitness values. This method calls setToMeanOf(…) in its implementation.
public void setToBestOf(EvolutionState state, Fitness[] fitnesses)
Sets the fitness to the best of the provided fitness values. This method calls setToMeanOf(…) in its implementation.
3.3 Initializers and Finishers
The Initializer is called at the beginning of an evolutionary run to create the initial population. The Finisher is called at the end of a run to clean up. In fact, it's very rare to use any Finisher other than ec.simple.SimpleFinisher, which does nothing at all. So nearly always you'll have this:
finish = ec.simple.SimpleFinisher
Initializers vary largely based on representation, but not for the reason you think. Initializers generally don’t need to know anything about the representation of an individual in order to construct it. Instead, certain representations require a lot of pieces which need to be in a central repository (they’re Cliques). For example, the genetic programming facility (Section 5.2) has various types, function sets, tree constraints, node constraints, etc. It’s not in ECJ’s style to store these things as static variables because of the difficulty it presents for serialization. Instead ECJ needed a global object to hold them, and Initializers were chosen for that task. It’s probably not been the smartest of decisions: Finishers (which have historically had little purpose) could have been recruited to the job, or some generic type repository perhaps. As it stands, Initializers aren’t an optimal location, but there it is.6
Unless you’re doing genetic programming (ec.gp) or using the ec.rule package, you’ll probably use a ec.simple.SimpleInitializer:
init = ec.simple.SimpleInitializer
ECJ’s generational7 initialization procedure goes like this:
1. The EvolutionState asks the Initializer to build a Population by calling:
population = state.initializer.initialPopulation(state, 0);
The 0 is thread index 0: this portion of the code is single-threaded.
2. The Initializer then creates and sets up a Population by calling the following on itself. It then tells the Population to populate itself with individuals:
Population pop = setupPopulation(state, 0);
pop.populate(state, 0);
Why break this out? Because there are a few EvolutionState subclasses which don’t want to populate the population immediately or at all — they just want to set it up. For example, steady state evolution sets up a Population but may only gradually fill it with initial population members. In this case, the steady state system will just call setupPopulation(…) directly, bypassing initialPopulation(…).
6This makes it problematic to have both a “rule” representation and a genetic programming representation in the same run without a little hacking, since both require their own Initializer. Perhaps this might be remedied in the future.
7ECJ's Steady State evolution mechanism has a different initialization procedure. See Section 4.2 for more information.
3. The Population's default populate(…) method is usually straightforward: it calls populate(…) in turn on each Subpopulation in the Population's subpopulation array.
Alternatively, the Population can read an entire population from a file. This is determined by (as usual) a parameter! If the Population should be read in from the file /tmp/population.in, the parameter setting would be:
pop.file = /tmp/population.in
The Population will read Subpopulations, and ultimately Individuals, from this file by calling its readPopulation(…, LineNumberReader) method.
4. If the Population has not read from a file, it will call populate(…) on each of its Subpopulations. A Subpopulation’s populate(…) method usually works like this. First, it determines if it should create new individuals from scratch or if it should fill its array by reading Individuals from a file. If the individuals are to be generated from scratch (the most common case by far), Subpopulation generates new individuals using the standard newIndividual(…) method in ec.Species (see Section 3.2.2). ECJ can also check to make sure that the Subpopulation does not produce duplicate individuals while generating from scratch, if you set the following parameter (in this case, in Subpopulation 0):
pop.subpop.0.duplicate-retries = 100
The default value is no retries. This says that if the Subpopulation creates a duplicate individual, it will try up to 100 times to replace it with a new, original individual. After that it will give up and use the duplicate individual.
You can also read Subpopulations directly from files, in a procedure similar to how it’s done for Population. If Subpopulation 0 should be read in from the file /tmp/subpopulation.in, the parameter setting would be:
pop.subpop.0.file = /tmp/subpopulation.in
Subpopulations will try to read individuals from files using readSubpopulation(…, LineNumberReader). If the number of individuals in the file is greater than the size of the Subpopulation, then the Subpopulation will be resized to match the file. If the number of individuals in the file is less than the size of the Subpopulation, then the Subpopulation will do one of three things:
• Truncate the Subpopulation to the size of the file. This is the default when reading from a file, but if you want to be explicit, it's specified like so:
pop.subpop.0.extra-behavior = truncate
• Wrap copies of the file's individuals repeatedly into the Subpopulation. For example, if the file had individuals A, B, and C, and the Subpopulation was of size 8, then it'd be filled with A, B, C, A, B, C, A, B. This is particularly useful if you want to fill a Subpopulation with copies of a single individual stored in the file (see the combined example after this list). This is specified like so:
pop.subpop.0.extra-behavior = wrap
• Fill the remainder of the Subpopulation with random individuals (see below). This is specified like so:
pop.subpop.0.extra-behavior = fill
These options aren’t available if you’re reading the whole Population from a file: it always truncates its Subpopulations appropriately. Note that if you’re reading the Population from a file, you can’t simultaneously read one of its Subpopulations from a file — that wouldn’t make any sense.
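As mentioned above, here is a hypothetical combined configuration, using only the parameters just described, which loads Subpopulation 0 from a file and wraps the file's individuals to fill out the Subpopulation:

pop.subpop.0.file = /tmp/subpopulation.in
pop.subpop.0.extra-behavior = wrap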
3.3.1 Population Files and Subpopulation Files
If you write out a population using printPopulation(…), the resulting file or print-out typically starts with a declaration of the number of subpopulations, followed by a declaration of a subpopulation number, then the number of individuals in that subpopulation, then the individuals one by one. After this come the declaration of the next subpopulation number, and the number of individuals in that subpopulation, then those individuals. And so on. It looks like this:
Number of Subpopulations: i3|
Subpopulation Number: i0|
Number of Individuals: i1024|
… [the individuals] …
Subpopulation Number: i1|
Number of Individuals: i512|
… [the individuals] …
Subpopulation Number: i2|
Number of Individuals: i2048|
… [the individuals] …
But ECJ doesn’t read in entire populations on initialization. Instead if you want to initialize your population from a file, you do so on a per-subpopulation basis, as in the parameters:
pop.subpop.0.file = myfile.in
A subpopulation file like this usually just has the number of individuals for the subpopulation, followed by the individuals:
Number of Individuals: i512|
… [the individuals] …
You can typically edit a subpopulation file out of a population file with some judicious typing: the relevant text is between the relevant "Subpopulation Number:" lines.
In the example above, there are three subpopulations, because of the line
Number of Subpopulations: i3|
This “i3|” oddity is due to use of ECJ’s Code package (Section 2.2.3). The “i” means “integer”, the “3” is the value, and the “|” is a separator. Likewise subpopulation 0 starts with “i0|”.
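For example, a quick sketch (assuming ec.util.Code is imported) of how such a token would be generated in Java:

String token = Code.encode(3);    // produces the String "i3|"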
3.4 Evaluators and Problems
ECJ evaluates (assesses the fitness of) Individuals in a Population by passing it to an ec.Evaluator. Various evolutionary algorithms and other stochastic search algorithms have their own special kinds of Evaluators. Evaluators perform this fitness assessment by cloning one or more Problems, discussed in the next Section, and asking these Problems to evaluate the individuals on their behalf. Evaluators hold the prototypical Problem here:
public Problem p_problem;
This problem is loaded from parameters. For example, to specify that we will use the Artificial Ant Problem to test our genetic programming Individuals, we’d say:
eval.problem = ec.app.ant.Ant
The basic Evaluator is ec.simple.SimpleEvaluator. This class evaluates a Population first by determining how many threads to use. To use four threads (for example), we say:
evalthreads = 4
The default value is a single thread.
Recall from Section 2.4 that this will require at least four random number generator seeds, for example:
seed.0 = 1234
seed.1 = -503812
seed.2 = 992341
seed.3 = -16723
When evaluating a Population, ec.simple.SimpleEvaluator will construct N Problems cloned from the Problem prototype, and assign one to each thread. Then, for each Subpopulation, the Evaluator will use these threads to evaluate the individuals in the Subpopulation. By default SimpleEvaluator simply breaks each Subpopulation into N even chunks and assigns each chunk to a different thread and its Problem. This enables the Population to be evaluated in parallel.
The problem with this approach to parallelism is that it's not fine-grained: if some individuals take much longer to evaluate than others, some threads will sit around idle waiting for another thread to finish its chunk. You can fix this by specifying the chunk size, all the way down to chunks of a single individual each. When a thread has finished its chunk, it will request another chunk to work on, and once it has exhausted all the chunks in a Subpopulation, it'll grab chunks from the next Subpopulation. For example, the extreme of fine-grained parallelism would be:
eval.chunk-size = 1
The disadvantage of a small chunk size is that it involves a lot of locking to get each chunk. This is a small but significant overhead: so we suggest using the default (large automatic chunks) unless your evaluations are costly and of high variance in evaluation time.
Another disadvantage of a nonstandard chunk size is that threads run at different speeds and grab chunks asynchronously: as a result, which thread evaluates which individual is no longer deterministic, and different runs with the same seeds could produce different results if evaluation is stochastic.
Of course, most often you probably won't do parallelism at all: you'll just have a single thread (that is, N = 1). In this case you have one further option: you can avoid cloning the Problem each time by setting the following parameter to false:
eval.clone-problem = false
If false, then the same Problem instance (the Prototype, in fact) will be used again and again. Obviously, this is only allowed if there's a single evaluation thread. And steady-state evolution (via ec.steadystate.SteadyStateEvaluator) does not support it.
The idea of not cloning the Problem each time is due to Brian Olsen, a GMU PhD Student.
Certain Evaluator methods are required. The primary method an Evaluator must implement is
public abstract void evaluatePopulation(EvolutionState state);
This method must take the Population (that is, state.population) and evaluate all the individuals in it in the fashion expected by the stochastic search algorithm being employed. Additionally, an Evaluator must implement the method
public abstract boolean runComplete(EvolutionState state);
… which returns true if the Evaluator believes the process has reached a terminating state. Typically this is done by scanning through the Population and determining if any of the Individuals have ideal fitnesses. If you don’t want to be bothered, it’s fine to have this method always return false.
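For example, here is a minimal sketch of such a runComplete(…) implementation, which just scans the Population for an Individual with an ideal fitness:

public boolean runComplete(EvolutionState state)
    {
    for(int s = 0; s < state.population.subpops.length; s++)
        for(int i = 0; i < state.population.subpops[s].individuals.length; i++)
            {
            Individual ind = state.population.subpops[s].individuals[i];
            if (ind.evaluated && ind.fitness.isIdealFitness())
                return true;
            }
    return false;
    }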
3.4.1 Problems
Evaluators assess the fitness of individuals typically by creating one or more Problems and handing them chunks of Subpopulations to evaluate. There are two ways that an Evaluator can ask a Problem to perform evaluation:
• For each Individual, the Evaluator can call the Problem’s evaluation method. This method varies depending on the kind of Problem. Problems which adhere to ec.simple.SimpleProblemForm — by far the most common situation — use the following method:
public void evaluate(EvolutionState state, Individual ind,
int subpopulation, int threadnum);
When this approach is taken, the Problem must assign a fitness immediately during the evaluate(…) method. In practice, ECJ doesn’t do this all that much.
• The more common approach allows a Problem to perform fitness evaluation in bulk. In this approach, the Evaluator will first call the following method once:
public void prepareToEvaluate(EvolutionState state, int thread);
This signals to the Problem that it must prepare itself to begin evaluating a series of Individuals, and then afterwards assign fitness to all of them. Next the Evaluator calls the Problem’s evaluation method for each Individual, typically using the method evaluate(…) as before. Finally, the Evaluator calls this method:
public void finishEvaluating(EvolutionState state, int thread);
Using this approach, the Problem is permitted to delay assigning fitness to Individuals until finishEvalu- ating(…) is called.
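For example, here is a rough sketch of the calling sequence an Evaluator might use for one chunk. The variables problem, chunkStart, chunkSize, subpop, subpopNum, and threadnum are all hypothetical; this is not ECJ's actual SimpleEvaluator code:

problem.prepareToEvaluate(state, threadnum);
for(int i = chunkStart; i < chunkStart + chunkSize; i++)
    ((SimpleProblemForm)problem).evaluate(state,
        subpop.individuals[i], subpopNum, threadnum);
problem.finishEvaluating(state, threadnum);    // fitnesses must be assigned by now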
When ECJ is preparing to exit, various Statistics objects sometimes construct a Problem in order to re-evaluate the fittest Individual of the run, solely so that the evaluation can print out useful information about how the Individual operates. This special version of evaluation is done with the following ec.simple.SimpleProblemForm method:
public void describe(EvolutionState state, Individual ind, int subpopulation,
int threadnum, int log);
Note that ECJ will not call prepareToEvaluate(…) before describe(…), nor call finishEvaluating(…) after it. When this method is called, the expectation is that the individual will be evaluated for the purpose of writing out interesting descriptive information to the log. For example, a fit Artificial Ant agent might show the map of the trail it produces as it wanders about eating pellets of food. If you prefer, you don't have to implement this method, and in fact many Problems don't. The default version (in ec.Problem) does nothing at all.
Problem is a Prototype, and so it must implement the clone() (as a deep-clone), setup(…), and defaultBase() methods. Problem's "default" default base is problem, though in truth it is very rarely used.
3.4.2 Implementing a Problem
Commonly the only method a Problem needs to implement is the evaluate(…) method. For example, let's imagine that our Individuals are of the class ec.vector.IntegerVectorIndividual, discussed in Section 5.1. The genotype for IntegerVectorIndividual is little more than an array of integers. Let us presume that the fitness of these individuals is defined as the product of their integers.
The example below does five basic things:
1. If the individual has already been evaluated, we don't bother evaluating it again. It's possible you might want to evaluate it anyway (if you had a dynamically changing fitness function, for example).
2. We do a sanity check: if the individual is of the wrong type, we issue an error.
3. We compute the product of the values in the genome.
4. We set the fitness to that product, and test to see if the fitness is optimal (in this case, if it's equal to Double.POSITIVE_INFINITY).
5. We set the individual’s evaluated flag.
The implementation is pretty straightforward:
package ec.app.myapp;
import ec.*;
import ec.simple.*;
import ec.vector.*;
public class MyProblem extends Problem implements SimpleProblemForm
{
public void evaluate(EvolutionState state, Individual ind,
int subpopulation, int thread)
    {
    if (ind.evaluated) return;                         // (1) don't re-evaluate
    if (!(ind instanceof IntegerVectorIndividual))     // (2) sanity check
        state.output.fatal("Whoa! It's not an IntegerVectorIndividual!!!");
    int[] genome = ((IntegerVectorIndividual)ind).genome;
    double product = 1.0;
    for(int x = 0; x < genome.length; x++)             // (3) compute the product
        product = product * genome[x];
    // (4) set the fitness, and declare it ideal if it's as high as possible
    ((SimpleFitness)(ind.fitness)).setFitness(state, product,
        product == Double.POSITIVE_INFINITY);
    ind.evaluated = true;                              // (5) mark as evaluated
    }
}
The next method:
public abstract boolean produces(EvolutionState state, Population newpop,
int subpopulation, int thread);
… returns true if the BreedingSource believes it can validly produce Individuals of the type described by the given Species, that is, by newpop.subpops[subpopulation].species. This is basically a sanity check. At the minimum, the BreedingSource should call this method on each of its sources and return false if any of them return false.
Last, we have the hook…
public void preparePipeline(Object hook);
You don’t have to implement this at all. ECJ does not call this method nor implement it in any of its BreedingSources beyond the default implementation (which in BreedingPipeline calls the method in turn on each of its sources). This method simply exists in the case that you need a way to communicate with all the methods of a BreedingPipeline at some unusual time.
3.5.2 SelectionMethods
Selection Methods by default implement the typicalIndsProduced() method to return SelectionMethod.INDS_PRODUCED (that is, 1).
Furthermore, the default implementation of the produces method,
public abstract boolean produces(EvolutionState state, Population newpop,
int subpopulation, int thread);
…just returns true. But you may wish to use this method to check to make sure that your SelectionMethod knows how to work with the kind of Fitnesses found in the given subpopulation, that is, state.population.subpops[subpopulation].species.f_prototype.
The default implementations of prepareToProduce(…) and finishProducing(…) do nothing at all; though some kinds of SelectionMethods, such as Fitness Proportionate Selection (ec.select.FitProportionateSelection), use prepareToProduce(…) to prepare probability distributions based on the Subpopulation in order to select properly.
SelectionMethods are sometimes called upon not to produce an Individual but to provide an index into a subpopulation where the Individual is located — perhaps to kill that Individual and place another Individual in its stead. To this end, SelectionMethods have an alternative form of the produce(…) method:
public abstract int produce(int subpopulation, EvolutionState state, int thread);
This method must return the index of the selected individual in the Subpopulation given by
state.population.subpops[subpopulation].individuals;
3.5.2.1 Implementing a Simple SelectionMethod
Implementing a SelectionMethod can be as simple as overriding the “alternative” form of the produce(…) method. You don’t have to implement the “standard” form of produce(…) because its default implementation calls the alternative form and handles the rest of the work for you.
To select an individual at random from a Subpopulation, you could simply implement the “alternative” form to return a random number between 0 and the size of the subpopulation in question:
public int produce(int subpopulation, EvolutionState state, int thread)
    {
    return state.random[thread].nextInt(
        state.population.subpops[subpopulation].individuals.length);
    }
You'll want to always implement the alternative form of produce(…). But in some cases you may wish to also reimplement the "standard" form of produce(…) for some special reason.8 It's a bit more involved but not too hard. We start by determining how many individuals we'll produce, defaulting to 1:
public int produce(int min, int max, int start, int subpopulation, Individual[] inds,
    EvolutionState state, int thread)
    {
    int n = 1;                                // default number to produce
    if (n > max) n = max;
    if (n < min) n = min;
    // sketch of the rest: select n individuals and place them into the array
    for(int q = 0; q < n; q++)
        inds[start + q] = state.population.subpops[subpopulation].individuals[
            produce(subpopulation, state, thread)];
    return n;
    }
base.size = 2
base.pick-worst = false
By default, pick-worst is false, so the second parameter is redundant here. TournamentSelection’s default base is select.tournament.
• ec.select.BestSelection gathers the best or worst N individuals in the population. It then uses a tournament selection of size T to select, restricted to just those N. The tournament selection procedure works just like ec.select.TournamentSelection. If the N worst individuals were gathered, then the tournament will pick the worst in the tournament.
This could be used in various ways. Continuing the example above, to use a value of T = 2, selecting among the best 15 individuals in the population (say), we could say:
base.n = 15
base.size = 2
base.pick-worst = false
We could also use this to always pick the single worst individual in the population:
base.n = 1
base.size = 1
base.pick-worst = true
Or we could also use this to pick randomly among the best 100 individuals in the population, in a kind of poor-man’s (μ, λ) Evolution Strategy (see Section 4.1.2):
base.n = 100
base.size = 1
base.pick-worst = false
Speaking of Evolution Strategies, you could also do a kind of poor-man’s (μ + λ) as well by including those top 100 individuals as elites:
base.n = 100
base.size = 1
base.pick-worst = false
breed.elite.0 = 100
If you don’t like specifying n as a fixed value, you also have the option of specifying it as a fraction of the population:
base.n-fraction = 0.1
You can still use this to do a poor-man’s (μ + λ) because elitism can likewise be defined this way:
base.n-fraction = 0.1
base.size = 1
base.pick-worst = false
breed.elite.0 = 0.1
• Finally, ec.select.MultiSelection is a special version of a SelectionMethod with N other SelectionMethods as sources. Each time it must produce an individual, it picks one of these SelectionMethods at random (using certain probabilities) and has it produce the Individual instead. To set up MultiSelection with two sources, TournamentSelection (chosen 60% of the time) and FitnessProportionateSelection (chosen 40% of the time), you’d say:
base.num-selects = 2
base.select.0 = ec.select.TournamentSelection
base.select.0.prob = 0.60
base.select.1 = ec.select.FitProportionateSelection
base.select.1.prob = 0.40
MultiSelection's default base is select.multiselect.
3.5.3 BreedingPipelines
BreedingPipelines (ec.BreedingPipeline) take Individuals from sources, typically modify them in some way, and hand them off. Some BreedingPipelines are mutation or crossover operators; others are more mundane utility pipelines. BreedingPipelines specify the required number of sources they use with the following method:
public abstract int numSources();
This method must return a value >= 0, or it can return the value BreedingPipeline.DYNAMIC_SOURCES, which indicates that the BreedingPipeline can vary its number of sources, and that the user must specify the number of sources with a parameter like this:
base.num-sources = 3
Note: if you use BreedingPipeline.DYNAMIC_SOURCES, then in the BreedingPipeline's setup(…) method you will probably want to check that the number of sources specified by the user is acceptable. You can do this by checking sources.length. For example, the user can specify 0 as the number of sources, which for most pipelines makes little sense.
At any rate, the user specifies each source with a parameter. For example, to stipulate sources 0, 1, and 2, you might say:
base.source.0 = ec.select.TournamentSelection
base.source.1 = ec.select.TournamentSelection
base.source.2 = ec.select.GreedyOverselection
One trick available to you is to state that a source is the same source as a previous one using a special value called same. For example, in the example above two TournamentSelection operators are created. But if you said the following instead:
base.source.0 = ec.select.TournamentSelection
base.source.1 = same
base.source.2 = ec.select.GreedyOverselection
…then sources 0 and 1 will be the exact same object. At any rate, the sources are then stored in the following instance variable:
public BreedingSource[] sources;
Unlike SelectionMethods, BreedingPipelines guarantee a copy-forward protocol: any Individual produced by a BreedingPipeline will be unique to that thread. The protocol is simple: if a BreedingPipeline requests an Individual from a source, and that source is a SelectionMethod, the BreedingPipeline will copy the Individual and modify and hand off the copy. But if the source is another BreedingPipeline, the BreedingPipeline will not copy the Individual but instead will just modify it directly and hand it off. What's the point of this? It enables multiple BreedingPipelines, one per thread, to be attached to an old Population and have all of them selecting Individuals out of that Population, modifying them, and generating new Individuals without the need for any locking.
Some BreedingPipelines, like crossover pipelines, have a very specific number of children they produce by default (the value returned by typicalIndsProduced()). However many others (mutation operators, etc.) simply return whatever Individuals they receive from their sources. For these, BreedingPipeline has a default implementation of typicalIndsProduced() which should work fine: it simply calls typicalIndsProduced() on all of its sources, and returns the minimum. This computation is done via a simple utility function, minChildProduction(), one of two such methods which might be useful to you:
public int minChildProduction();
public int maxChildProduction();
BreedingPipeline has default implementations of the produces(…), prepareToProduce(…), finishProducing(…), and preparePipeline(…) methods, all of which call the same methods on the BreedingPipeline's children.
One final option common to most BreedingPipelines which make modifications (mutation, crossover): you can specify the probability that the pipeline will operate at all, or if Individuals will simply be passed through. For example, let’s say you’re using a crossover pipeline of some sort, which creates two children from its sources, then crosses them over and returns them. If you state:
base.likelihood = 0.8
…then with a 0.8 probability crossover will occur as normal. But with a 0.2 probability, two Individuals from the sources will simply be copied and returned, with no crossover occurring.
3.5.3.1 Implementing a Simple BreedingPipeline
To implement a BreedingPipeline, at a minimum, you’ll need to override two methods: numSources() and produce(…). numSources() is easy. Just return the number of sources your BreedingPipeline requires, or BreedingPipeline.DYNAMIC_SOURCES if the number can be anything the user specifies (0 or greater). For example, to make a mutation pipeline, we probably want a single source, which we’ll extract Individuals from and mutate:
public int numSources() { return 1;}
If you have chosen to return BreedingPipeline.DYNAMIC_SOURCES, you probably want to double-check that the number of sources the user has specified in his parameter file is valid for your pipeline. You can do this in setup(…). For example, the example below verifies that the value is not zero:
public void setup(final EvolutionState state, final Parameter base)
    {
    super.setup(state, base);
    Parameter def = defaultBase();
    if (sources.length == 0) // uh oh
        state.output.fatal("num-sources must be > 0 for MyPipeline",
            base.push(P_NUMSOURCES), def.push(P_NUMSOURCES));
    ...
    }
Similarly if you have certain unusual constraints on the nature of your sources (that they are certain classes, for example), you can double-check that in setup(…) too.
Now we need to implement the produce(…) method. Most mutation procedures ask their sources to produce some number of Individuals for them (however many the source prefers), and then mutate those Individuals and return them. We can ask our source to produce Individuals like this:
public int produce(int min, int max, int start, int subpopulation, Individual[] inds,
EvolutionState state, int thread)
{
int n = sources[0].produce(min,max,start,subpopulation,inds,state,thread);
The source has taken the liberty of filling slots inds[start] … inds[start+n-1] with min ≤ n ≤ max Individuals. Next we need to decide whether to bother mutating at all, based on the likelihood parameter. We’ll use the following utility method to help us:
ec.BreedingPipeline Methods
public int reproduce(int n, int start, int subpopulation, Individual[] inds, EvolutionState state, int thread,
boolean produceChildrenFromSource)
If produceChildrenFromSource is true, extracts n Individuals from Source 0 of the BreedingPipeline, and places them in locations inds[start] … inds[start+n-1], else works with the existing Individuals in those slots. If Source 0 is a SelectionMethod, the individuals in inds[start] … inds[start+n-1] are replaced with clones. Then n is returned.
We do a coin-flip to determine whether or not to use this method to just clone some individuals from the Source or to go ahead and mutate them. If the former, since our Source has already produced the children, we call this method, passing in false for the last argument:
// should we bother mutating at all, or just reproduce?
if (!state.random[thread].nextBoolean(likelihood))
return reproduce(n, start, subpopulation, inds, state, thread, false);
At this point we’re committed to mutating the n Individuals. First we need to clone them if Source 0 (our only Source) was a SelectionMethod, since it doesn’t clone them for us:
// clone the individuals if necessary
if (!(sources[0] instanceof BreedingPipeline))
for(int q = start; q < n + start; q++)
    inds[q] = (Individual)(inds[q].clone());
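Putting these pieces together, a complete skeleton for such a pipeline might look roughly like this. It’s only a sketch: the class name MyMutationPipeline, the default parameter base, and the mutate(…) helper are made up for illustration, while numSources(), sources, likelihood, and reproduce(…) are the facilities described above.

package ec.app.myapp;

import ec.*;
import ec.util.*;

// A sketch only: the class name and the mutate(...) helper are hypothetical
public class MyMutationPipeline extends BreedingPipeline
    {
    public Parameter defaultBase() { return new Parameter("my-mutation"); }  // arbitrary default base

    public int numSources() { return 1; }  // we draw Individuals from a single source

    public int produce(int min, int max, int start, int subpopulation,
        Individual[] inds, EvolutionState state, int thread)
        {
        // ask our source to fill slots inds[start] ... inds[start+n-1]
        int n = sources[0].produce(min, max, start, subpopulation, inds, state, thread);

        // should we bother mutating at all, or just reproduce?
        if (!state.random[thread].nextBoolean(likelihood))
            return reproduce(n, start, subpopulation, inds, state, thread, false);

        // clone the individuals if our source was a SelectionMethod
        if (!(sources[0] instanceof BreedingPipeline))
            for(int q = start; q < n + start; q++)
                inds[q] = (Individual)(inds[q].clone());

        // now modify them in place and hand them off
        for(int q = start; q < n + start; q++)
            mutate(inds[q], state, thread);

        return n;
        }

    // hypothetical helper: problem-specific mutation would go here
    void mutate(Individual ind, EvolutionState state, int thread)
        {
        ind.evaluated = false;
        }
    }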
This is the String array. We continue with:
database = Evolve.loadParameterDatabase(args);
BeanShell responds with the ParameterDatabase:
<{} : ({} : ({eval=ec.simple.SimpleEvaluator, pop.subpop.0=ec.Subpopulation,
quit-on-run-complete=true, generations=1000,
pop.subpop.0.species.pipe.source.0=ec.vector.breed.VectorCrossoverPipeline,
pop.subpop.0.species.min-gene=-5.12, eval.problem=ec.app.ecsuite.ECSuite,
state=ec.simple.SimpleEvolutionState, pop.subpop.0.species.mutation-type=gauss,
pop=ec.Population, pop.subpop.0.duplicate-retries=2, select.tournament.size=2,
pop.subpops=1, pop.subpop.0.species.mutation-stdev=0.01,
pop.subpop.0.species.pipe=ec.vector.breed.VectorMutationPipeline,
pop.subpop.0.species.max-gene=5.12, pop.subpop.0.species.pipe.source.0.source.1=same,
pop.subpop.0.species.pipe.source.0.source.0=ec.select.TournamentSelection,
pop.subpop.0.species=ec.vector.FloatVectorSpecies, breed=ec.simple.SimpleBreeder,
pop.subpop.0.species.mutation-prob=1.0, pop.subpop.0.species.genome-size=100,
pop.subpop.0.species.crossover-type=one, finish=ec.simple.SimpleFinisher,
parent.0=../../ec.params, init=ec.simple.SimpleInitializer,
pop.subpop.0.species.ind=ec.vector.DoubleVectorIndividual,
pop.subpop.0.species.fitness=ec.simple.SimpleFitness, pop.subpop.0.size=1000,
eval.problem.type=rastrigin, stat=ec.simple.SimpleStatistics,
exch=ec.simple.SimpleExchanger, stat.file=$out.stat} : ({checkpoint-modulo=1,
evalthreads=1, checkpoint=false, breedthreads=1, checkpoint-prefix=ec, seed.0=time})))>
Now we initialize the EvolutionState from the database:
state = Evolve.initialize(database, 0);
This causes ECJ to start printing to the screen something along these lines:
| ECJ
| An evolutionary computation system (version 19)
| By Sean Luke
| Contributors: L. Panait, G. Balan, S. Paus, Z. Skolicki, R. Kicinger, E. Popovici,
| K. Sullivan, J. Harrison, J. Bassett, R. Hubley, A. Desai, A. Chircop,
| J. Compton, W. Haddon, S. Donnelly, B. Jamil, and J. O’Beirne
| URL: http://cs.gmu.edu/~eclab/projects/ecj/
| Mail: ecj-help@cs.gmu.edu
| (better: join ECJ-INTEREST at URL above)
| Date: July 10, 2009
| Current Java: 1.6.0_20 / Java HotSpot(TM) 64-Bit Server VM-16.3-b01-279
| Required Minimum Java: 1.4
Threads: breed/1 eval/1
Seed: 1853290822
Besides this output, BeanShell also displays the EvolutionState it is returning. Now we fire up the EvolutionState to initialize the first population:
state.startFresh();
… and get back:
Setting up
Initializing Generation 0
The first population has been created. Let’s look at the first Individual:
state.population.subpops[0].individuals[0];
This just produces a terse, not very helpful description of the object. Instead, let’s have it printed:
state.population.subpops[0].individuals[0].printIndividualForHumans(state, 0);
This produces something more useful:
Evaluated: F
Fitness: 0.0
-4.3846934361930945 4.051323475292111 2.750742781209575 -2.1599970035296088
3.5139838195638236 -4.326431483145531 -1.5799722524229094 -4.64489169381555
-4.809825694271426 -0.6969239813124668 -4.322411553562226 4.8723307904232565
2.8978088843319947 -4.311437772193992 -1.556903048013028 2.876699531303326
-1.5461627480422133 -3.406470106152458 0.3510231690045371 -1.26870148662141
-2.9943682283832675 -1.1321325429409796 -4.780798908878881 -2.789054768098288
2.7957975471728034 -2.4529277934521363 0.06864524959557006 -2.807030901927618
-3.817734647565329 3.0018199187738803 3.893346256074625 -4.1700250768556355
-3.3035366716916714 -1.5300889532287534 -1.2924365390313826 2.6878356877535623
4.344108056131552 1.0732802812225044 1.804809997034555 0.6627493849916508
1.6556742582736854 -3.8324177646471913 0.2901815515514814 -0.5045301890375606
-2.755111883054377 -4.057309896490254 -2.097059222061862 -2.062611078568839
3.676980437590175 3.4010063830636517 5.001876654997903 2.3637174851440808
-2.3242430228722846 -0.2027490501614988 4.948796285958214 3.645393286308912
-0.9981883696957627 -2.4911201811073296 2.281601570422807 -3.0028177298996583
-0.6949487749058276 2.4115725052273005 2.2705630820859133 3.8198793397976756
3.927188087275849 3.5439728479577974 4.195897069928313 4.064291914283307
-1.6071055662352376 -0.45138576561254506 0.5382601925283925 2.2824947546503687
-0.0837300863613164 -2.4997930740673895 0.06696037058102089 1.782243737261787
-3.390249634178219 4.669336185081783 2.371290190775591 1.8743739255868377
0.13349732700681827 2.808175830805574 -4.2297879656940705 -0.5781599273148448
3.4174595199606577 2.5509508748123793 0.9574470878471297 1.181916131827328
3.3128918249657184 3.5085201808925843 -0.8921840350705308 -4.016933626993176
2.5591127486976983 1.580181276449899 -0.6102226049991097 1.0644092417475743
0.5897983455130262 2.5504671849586904 -2.230897886457403 1.8133759722806326
Now we’re getting somewhere. You could print out all the individuals like this:
for(int i = 0; i < state.population.subpops[0].individuals.length; i++ )
state.population.subpops[0].individuals[i].printIndividualForHumans(state, 0);
... but I won’t torture you with what comes out. Instead, notice that the Individual has not been evaluated yet — just initialized with the Population. We go through one evolutionary loop — evaluating the individuals and breeding a new Population — by calling
state.evolve();
This produces:
Subpop 0 best fitness of generation: Fitness: -1493.534
<2>
The first line was printed to the screen. The second line indicates what state.evolve() returned. 2 is the value of EvolutionState.R_NOTDONE, indicating that the process needs to be evolved more.
Let’s look at that first individual again:
state.population.subpops[0].individuals[0].printIndividualForHumans(state, 0);
…yielding…
Evaluated: F
Fitness: -1806.1432
-1.5595820609909528 -2.9941135630034292 -3.188550961391961 0.8673223056511647
2.4308132097811472 -3.6298006589453533 -4.62193495641744 1.7381186900517611
-3.2707539202577953 -2.8517369832386144 -4.701099579700639 1.1683479248633841
0.10118833856168477 2.7982137159130787 -1.3673458253800685 -4.548719487000453
-1.7852252742508177 -1.662999422245311 -4.891889992368657 2.0689413066938824
4.64815452362056 4.03620579726471 -2.6065781548997413 2.8384398494616585
-1.6231723965539844 -0.19641152832494305 -0.8025430015631594 -4.337733534634894
-4.259188069209607 0.0974585410674078 4.878006291864429 4.187577755641656
3.9507153065207605 -3.3456633008586922 3.7163666200189596 -0.7581028665673978
4.28299933455259 1.8522464455693997 4.4324032846812935 -1.3209545115697914
4.239911043319335 -2.7741200087506352 -3.181419981396656 -0.4574562816089688
3.9209870275697982 0.31049605413333237 -0.46868091240064014 -4.570530964131764
-0.9126484738704782 3.6348709305820153 -1.800821491837854 -0.8548399118205554
-1.6874962921883667 2.628667604603462 0.060377157894385663 3.194354857448187
1.2106237734207714 -3.477534436566739 1.919326547065771 -3.74880517912247
4.076653684533312 -2.9153006121227034 -2.4460232838375973 0.6128610868842217
0.7785108819209824 -1.213371979065718 3.2441049504290587 -1.352037820951835
1.151316091162472 0.3915293759690397 -0.15229424767569708 1.8192706794904545
-3.057866603248519 -3.2217378304635926 3.7963181147558447 1.9609441782591566
2.1365399986514815 -0.7608502832241196 -1.2202190662246202 -3.2592371482282956
-2.612971504172355 3.1496849987738167 -5.083084415090031 -4.243405086300351
3.8516939433487387 -4.87008846122508 1.012854792831603 3.77728764346906
2.843506550933032 4.705462097924235 1.4291349248648448 3.8398215224809875
1.1776568359195472 -4.784524531392207 2.765230136436807 -2.6521295800350555
-2.271480494878218 -2.018481022639772 -2.2536397207045686 -1.5048357519436404
It’s a different Individual: the next generation one to be exact. Notice that although it has a “fitness”, in fact it’s not been evaluated: the “fitness” is just nonsense cloned from a previous Individual.
So how do you see a population with fitness-evaluated Individuals? Generational ECJ EvolutionState processes assess the fitness of a Population, then create a new Population and throw away the old one. You can hold onto the old Population pretty easily. Just do this:
p = state.population;
state.evolve();
Now p holds the old Population, filled with now-evaluated Individuals, and state.population holds the next-generation Population, which hasn’t been evaluated yet. For example, if you say:
p.subpops[0].individuals[0].printIndividualForHumans(state, 0);
… you will get back something like this:
Evaluated: T
Fitness: -1779.4391
-1.5595820609909528 -2.9941135630034292 -3.188550961391961 0.8673223056511647
2.4308132097811472 -3.6298006589453533 -4.62193495641744 1.7381186900517611
-3.2707539202577953 -2.8517369832386144 -4.701099579700639 1.1683479248633841
0.10118833856168477 2.7982137159130787 -1.3673458253800685 -4.548719487000453
-1.7852252742508177 -1.662999422245311 -4.891889992368657 2.0689413066938824
4.64815452362056 4.03620579726471 -2.6065781548997413 2.8384398494616585
-1.6231723965539844 -0.19641152832494305 -0.8025430015631594 -4.337733534634894
-4.259188069209607 0.0974585410674078 4.878006291864429 4.187577755641656
3.9507153065207605 -3.3456633008586922 3.7163666200189596 -0.7581028665673978
4.28299933455259 1.8522464455693997 4.4324032846812935 -1.3209545115697914
4.239911043319335 -2.7741200087506352 -3.181419981396656 -0.4574562816089688
3.9209870275697982 0.31049605413333237 -0.46868091240064014 -4.570530964131764
-0.9126484738704782 3.6348709305820153 -1.800821491837854 -0.8548399118205554
-1.6874962921883667 2.628667604603462 0.060377157894385663 3.194354857448187
1.2106237734207714 -3.477534436566739 1.919326547065771 -3.74880517912247
4.076653684533312 -2.9153006121227034 -2.4460232838375973 0.6128610868842217
0.7785108819209824 -1.213371979065718 3.2441049504290587 -1.352037820951835
1.151316091162472 0.3915293759690397 -0.15229424767569708 1.8192706794904545
-3.057866603248519 -3.2217378304635926 3.7963181147558447 1.9609441782591566
2.1365399986514815 -0.7608502832241196 -1.2202190662246202 -3.2592371482282956
-2.612971504172355 3.1496849987738167 -5.083084415090031 -4.243405086300351
3.8516939433487387 -4.87008846122508 1.012854792831603 3.77728764346906
2.843506550933032 4.705462097924235 1.4291349248648448 3.8398215224809875
1.1776568359195472 -4.784524531392207 2.765230136436807 -2.6521295800350555
-2.271480494878218 -2.018481022639772 -2.2536397207045686 -1.5048357519436404
Notice that this Individual has been evaluated. Its Fitness is valid. For more on debugging ECJ, see Section 2.1.6.
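Incidentally, nothing about this procedure is specific to BeanShell: you can drive ECJ the same way from ordinary Java code using exactly the calls shown above. Here’s a minimal sketch (the class name RunECJ is made up, and the Evolve.cleanup(…) call at the end is an assumption about the ec.Evolve utility class rather than something demonstrated above):

package ec.app.myapp;

import ec.*;
import ec.util.*;

public class RunECJ
    {
    public static void main(String[] args)
        {
        // args holds the usual command-line arguments, e.g. -file myapp.params
        ParameterDatabase database = Evolve.loadParameterDatabase(args);
        EvolutionState state = Evolve.initialize(database, 0);

        state.startFresh();     // set up and create the initial Population

        // keep going until evolve() reports something other than R_NOTDONE
        int result = EvolutionState.R_NOTDONE;
        while(result == EvolutionState.R_NOTDONE)
            result = state.evolve();

        // print the first Individual of the final Population, as before
        state.population.subpops[0].individuals[0].printIndividualForHumans(state, 0);

        Evolve.cleanup(state);  // assumed: shuts down logs and the like
        }
    }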
Chapter 4
Basic Evolutionary Processes
4.1 Generational Evolution
ECJ is most commonly used for generational evolution: where a whole Population is evaluated, then updated, at a time. There are a number of packages which use generational evolution, but the two most common are the ec.simple package, which does Genetic Algorithm style generational evolution, and the ec.es package which does Evolution Strategies.
Generations and Evaluations Generational Evolution, of course, has generations. The maximum number of generations to run can be selected in one of two ways. First, you could explicitly state the desired maximum number of generations:
generations = 100
This will cause generational evolution to evaluate 100 generations’ worth of individuals, including the initial generation (generation 0). It will also cause the EvolutionState.numGenerations variable to be set to this value. Alternatively, you can define the run length in terms of evaluations:
evaluations = 10000
This will cause the EvolutionState.numEvaluations variable to be set to this value. After the initial Population has been created, but before it has been evaluated, ECJ will determine the number of generations to run based on this number. It’s done as follows:
1. Let p be the total number of individuals in the initial Population, including all Subpopulations.
2. If evaluations is less than p, it is set to p.
3. Else if p does not divide evenly into evaluations, then evaluations is reduced to the largest smaller value which p divides evenly into.
4. The number of generations is set to evaluations divided by p.
This mechanism allows us to create, in parameters, a trade-off of generations versus population size
which is useful in some later methods, such as Meta-EAs (Section 7.6).
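As a concrete illustration (the numbers here are made up): suppose the Population consists of a single Subpopulation of 1000 Individuals, so p = 1000, and you say:

pop.subpop.0.size = 1000
evaluations = 10500

Since 1000 does not divide evenly into 10500, evaluations is first reduced to 10000, and ECJ then runs for 10000 / 1000 = 10 generations.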
Figure 4.1 Top-Level Loop of ECJ’s SimpleEvolutionState class, used for basic generational EC algorithms. Various sub-operations are shown occurring before or after the primary operations. The full population is revised each iteration. A repeat of Figure 1.1.
4.1.1 The Genetic Algorithm (The ec.simple Package)
We’ve pretty much covered everything in the ec.simple package throughout Section 3. But just a quick
reminder:
• ec.simple.SimpleEvolutionState subclasses ec.EvolutionState to provide the generational top-level loop shown in Figure 4.1. Each generation the entire Population is handed to the Evaluator, then the Breeder. The class adds no new parameters beyond those defined in EvolutionState.
• ec.simple.SimpleBreeder subclasses ec.Breeder to provide multithreaded breeding and elitism. SimpleBreeder was discussed at length in Section 3.5.
• ec.simple.SimpleEvaluator subclasses ec.Evaluator to provide multithreaded evaluation. SimpleEvaluator adds no new parameters beyond those defined in Evaluator, and was discussed at length in Section 3.4.
• ec.simple.SimpleFitness subclasses ec.Fitness to provide a simple fitness consisting of a double floating- point number, where higher fitness values are preferred. SimpleFitness also holds a boolean flag indicating whether the fitness assigned is the optimal fitness. SimpleFitness adds no new parameters beyond those defined in Fitness, and was discussed at length in Section 3.2.5.
• ec.simple.SimpleProblemForm defines the kind of methods which must be implemented by Problems used by a SimpleEvaluator, and was discussed at length in Section 3.4.1. As a reminder, the two methods defined by SimpleProblemForm are evaluate(…), which evaluates an individual and sets its fitness; and describe(…), which evaluates an individual solely for the purpose of writing a detailed description about the individual’s performance out to a stream. They look like this:
public void evaluate(EvolutionState state, Individual ind,
int subpopulation, int threadnum);
public void describe(EvolutionState state, Individual ind,
int subpopulation, int threadnum, int log);
• ec.simple.SimpleInitializer subclasses ec.Initializer to create the initial Population. SimpleInitializer adds no new parameters beyond those defined in Initializer, and was discussed at length in Section 3.3.
• ec.simple.SimpleFinisher subclasses ec.Finisher and does nothing at all. SimpleFinisher was discussed at length (so to speak) in Section 3.3.
• ec.simple.SimpleExchanger subclasses ec.Exchanger and does nothing at all. SimpleExchanger was mentioned in Section 3.6.
• ec.simple.SimpleStatistics subclasses ec.Statistics and outputs the best-of-generation individual each generation, plus the best-of-run individual at the end. SimpleStatistics was discussed at length in Section 5.2.3.5.
• ec.simple.SimpleShortStatistics subclasses ec.Statistics and gives numerical statistics about the progress of the generation. SimpleShortStatistics was also discussed at length in Section 3.7.2.
• ec.simple.SimpleDefaults implements ec.DefaultsForm and provides the package default parameter base.
An Example  Let’s put these together to do a simple genetic algorithm. We start with the basic parameters:
# Threads and Seeds
evalthreads = 1
breedthreads = 1
seed.0 = time
# Checkpointing
checkpoint = false
checkpoint-modulo = 1
checkpoint-prefix = ec
Next a basic generational setup:
# The basic setup
state = ec.simple.SimpleEvolutionState
init = ec.simple.SimpleInitializer
finish = ec.simple.SimpleFinisher
exch = ec.simple.SimpleExchanger
breed = ec.simple.SimpleBreeder
eval = ec.simple.SimpleEvaluator
stat = ec.simple.SimpleStatistics
pop = ec.Population
# Basic parameters
generations = 200
quit-on-run-complete = true
pop.subpops = 1
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 1000
pop.subpop.0.duplicate-retries = 0
breed.elite.0 = 0
stat.file = $out.stat
We’ll use Individuals of the form ec.vector.IntegerVectorIndividual, discussed later in Section 5.1. This is
not much more than a cover for a one-dimensional array of integers:
# Representation
pop.subpop.0.species = ec.vector.IntegerVectorSpecies
pop.subpop.0.species.ind = ec.vector.IntegerVectorIndividual
pop.subpop.0.species.genome-size = 100
For fitness, we’ll use SimpleFitness:
# Fitness
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
In Section 3.5.4 we laid out a simple Genetic Algorithm Pipeline:
# Pipeline
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.VectorCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = same
select.tournament.size = 2
Because they are so common, Vector pipelines are unusual in that they define certain probabilities in the Species rather than in the Pipeline, mostly for simplicity. We haven’t discussed these yet (we’ll get to them in Section 5.1), but here’s one possibility:
pop.subpop.0.species.crossover-type = one
pop.subpop.0.species.mutation-prob = 0.01
In Section 3.4.2 we defined a simple Problem in which the fitness of an IntegerVectorIndividual was the product of the integers in its genome. Let’s use it here.
package ec.app.myapp;
import ec.*;
import ec.simple.*;
import ec.vector.*;
public class MyProblem extends Problem implements SimpleProblemForm
{
public void evaluate(EvolutionState state, Individual ind,
    int subpopulation, int thread)
    {
    if (ind.evaluated) return;
    if (!(ind instanceof IntegerVectorIndividual))
        state.output.fatal("Whoa!  It's not an IntegerVectorIndividual!!!");
    int[] genome = ((IntegerVectorIndividual)ind).genome;
    double product = 1.0;
    for(int x = 0; x < genome.length; x++)
        product = product * genome[x];
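The evaluate(…) method would then wrap up by storing the product as the Individual’s fitness and marking it evaluated, roughly along the following lines (a sketch: the exact SimpleFitness.setFitness(state, fitness, isIdeal) call shown here is an assumption rather than something spelled out above):

    // assumed: store the product as the fitness, then mark the Individual as evaluated
    ((SimpleFitness)(ind.fitness)).setFitness(state, product, false);
    ind.evaluated = true;
    }
}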
2If you don’t like defining μ this way, you can alternatively define μ as a fraction of λ:
es.mu-fraction = 0.25
(μ + λ) This algorithm differs from (μ, λ) in that, after creating the children, the μ parents join the λ children to form the next generation Population. Thus the next generation is μ + λ in size. Again, the initial population can be any size (traditionally I think it’s μ + λ). The MuPlusLambdaBreeder subclasses MuCommaLambdaBreeder and adds no new parameters, though you’d change the Breeder of course:
breed = ec.es.MuPlusLambdaBreeder
(μ + λ) permits different maximum values of μ than (μ, λ) does. In (μ + λ), the largest legal value of μ is λ: if μ > λ, you will receive a warning and μ will be set to λ.
It’s common in Evolution Strategies to use a mutation-only pipeline. Here’s one:
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.es.ESSelection
Last but not least, ec.es.ESDefaults provides the package default parameter base.
Example  We build off of the example shown in Section 4.1.1, so let’s use that file:
parent.0 = ga.params
Next, let’s override some parameters to use Evolution Strategies:
breed = ec.es.MuCommaLambdaBreeder
es.mu.0 = 10
es.lambda.0 = 100
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.es.ESSelection
1For fun, note the relationships between these techniques and the options provided in ec.select.BestSelection (Section 3.5.2.2).
Evolution Strategies also often uses a floating-point array representation. The Genetic Algorithm example in Section 4.1.1 used an integer array representation. We could change it to an array of doubles like this:
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.ind = ec.vector.DoubleVectorIndividual
IntegerVectorIndividual has a simple default mutator: randomizing the integers. This is why the mutation-prob is set low (to 0.01). Since we’re using floating-point values, let’s change the mutation type to gaussian mutation with a standard deviation of 0.01, happening 100% of the time:
pop.subpop.0.species.mutation-type = gauss
pop.subpop.0.species.mutation-stdev = 0.01
pop.subpop.0.species.mutation-prob = 1.0
Since we’re using a different representation, we need to change our Problem a bit:
package ec.app.myapp;
import ec.*;
import ec.simple.*;
import ec.vector.*;
public class MySecondProblem extends Problem implements SimpleProblemForm
{
public void evaluate(EvolutionState state, Individual ind,
    int subpopulation, int thread)
    {
    if (ind.evaluated) return;
    if (!(ind instanceof DoubleVectorIndividual))
        state.output.fatal("Whoa!  It's not a DoubleVectorIndividual!!!");
    double[] genome = ((DoubleVectorIndividual)ind).genome;
    double product = 1.0;
    for(int x = 0; x < genome.length; x++)
        product = product * genome[x];
}
public boolean equals(Object other) {
return (other != null && other instanceof TrigGene &&
((TrigGene)other).x == x && ((TrigGene)other).y == y);
}
public String printGeneToStringForHumans() { return ">" + x + " " + y; }
public String printGeneToString() {
    return ">" + Code.encode(x) + " " + Code.encode(y);
}
public void readGeneFromString(String string, EvolutionState state) {
    string = string.trim().substring(1); // get rid of the ">"
    DecodeReturn dr = new DecodeReturn(string);
    Code.decode(dr); x = dr.d; // no error checking
    Code.decode(dr); y = dr.d;
}
public void writeGene(EvolutionState state, DataOutput out) throws IOException {
out.writeDouble(x); out.writeDouble(y);
}
public void readGene(EvolutionState state, DataInput in) throws IOException {
x = in.readDouble(); y = in.readDouble();
}
}
5.2 Genetic Programming (The ec.gp Package)
The ec.gp package is far and away the most developed and tested package in ECJ. ECJ was largely developed in order to support this package, and much of our existing literature is based on it.
ECJ’s genetic programming package uses “Koza-style” tree structures [3, 4] which represent the parse trees of Lisp s-expressions. For an introduction to genetic programming, see [16]. Much of ECJ’s approach
Figure 5.1 Data objects common to tree-based “Koza-style” genetic programming Individuals.
Figure 5.2 Two example genetic programming parse trees. At top is a single ec.gp.GPTree instance, which holds onto a single ec.gp.GPNode designated the root of the tree. GPNodes form the tree itself, and so have a parent and zero or more children. The parent of the root is the GPTree object itself. Leaf nodes, denoted with dotted ovals, are traditionally called terminals, and non-leaf nodes, including the root, are traditionally called nonterminals. Normally GPNodes have fixed arity. That is, all if-food-ahead GPNodes will always have two children, and all cos nodes will always have one child, etc.
to GP is inspired by lil-gp [18], an earlier C-based GP system. However lil-gp and many other GP systems pack the parse trees into arrays to save memory. ECJ does not: the parse trees are stored as tree structures in memory. This is much more wasteful of memory but it is faster to evaluate and far easier to manipulate.
GP’s top-level class is an Individual called ec.gp.GPIndividual. GPIndividual holds an array of GP trees, held by ec.gp.GPTree objects. Each GP tree is a tree of ec.gp.GPNode objects. One GPNode, the root of the tree, is held by the GPTree.
GPIndividual, GPTree, and GPNode are all Prototypes, and furthermore they all adhere to the fly- weight pattern (Section 3.1.4). GPIndividual’s flyweight relationship is with a Species (of course), called ec.gp.GPSpecies. GPTrees have a flyweight relationship with subclasses of ec.gp.GPTreeConstraints. GPNodes have a flyweight relationship with subclasses of ec.gp.GPNodeConstraints.
GP’s tree nodes are typed, meaning that they can have certain constraints which specify which nodes may serve as children of other nodes. These types are defined by an abstract class called ec.gp.GPType, of which there are two concrete subclasses, ec.gp.GPAtomicType and ec.gp.GPSetType.
The primary function of GPSpecies is to build new GPIndividuals properly. The primary function of GPTreeConstraints is to hold onto the function set (ec.gp.GPFunctionSet) for a given tree. This is a set of prototypical GPNodes, copies of which are used to construct the tree in question. GPTreeConstraints also contains typing information for the tree root. The primary purpose of GPNodeConstraints is to provide typing and arity information for various GPNodes.
5.2.1 GPNodes, GPTrees, and GPIndividuals
Figure 5.2 shows two example trees of GPNodes (shown as ovals). The top of each tree is a GPTree, and directly under it is the root GPNode. As can be seen from the figure, each GPNode has both a parent and zero or more children; and each GPTree has exactly one child. Both GPNodes and GPTrees implement ec.gp.GPNodeParent, and can serve as parents of other GPNodes (the root has the GPTree as its parent).
5.2.1.1 GPNodes
A basic GPNode consists of four items:
public GPNodeParent parent;
public GPNode children[];
public byte argposition;
public byte constraints;
The parent should be self-explanatory. The children[] array holds the children of the GPNode. Leaf nodes in a GP tree (traditionally called terminals) are permitted to either have a zero-length array or a null value for children[].
The argposition is the position of the node in its parent’s children[] array. The root’s argposition is 0. Last, the constraints is a tag which refers to the GPNode’s GPNodeConstraints object. It’s a byte rather than a full pointer to save a bit of space: GPNodes make up by far the bulk of memory in a genetic programming experiment. You can get the GPNodeConstraints by calling the following GPNode method:
public final GPNodeConstraints constraints(GPInitializer initializer);
Why the GPInitializer? Because GPNodeConstraints, GPTreeConstraints, GPTypes, and GPFunctionSets are all accessible via the Initializer, which must be a GPInitializer.3 More on that later. Assuming you have access to the EvolutionState (probably called state), you can call this function like this:
GPNodeConstraints constraints = myGPNode.constraints((GPInitializer)(state.initializer));
You will make various subclasses of GPNode to define the kinds of functions which may appear in your genetic programming tree.
5.2.1.2 GPTrees
Unlike GPNode, which is liberally subclassed, you’ll rarely subclass GPTree. The ec.gp.GPTree class holds onto the root GPNode here:
public GPNode child;
Each GPTree also has a backpointer to the GPIndividual which holds it:
public GPIndividual owner;
GPTree also has a pointer to its GPTreeConstraints object. Like GPNode, GPTree uses a byte rather than a full pointer.4
public byte constraints;
Just like GPNode, you can access GPTree’s constraints using this function:
public final GPTreeConstraints constraints(GPInitializer initializer);
…which is typically called like this:
3It wasn’t a good decision to use the Initializer in this fashion, and one day we may change it to something else.
4This is mostly historic: GPTree doesn’t fill nearly as much memory as GPNode and so doesn’t really need this tight reference approach.
GPTreeConstraints constraints = myGPTree.constraints((GPInitializer)(state.initializer));
5.2.1.3 GPIndividual
The GPIndividual contains an array of GPTrees. In most cases, this array has a single GPTree in it:
public GPTree[] trees;
5.2.1.4 GPNodeConstraints
The GPNodeConstraints contains several data elements shared by various GPNodes:
public byte constraintNumber;
public GPType returntype;
public GPType[] childtypes;
public String name;
public double probabilityOfSelection;
public GPNode zeroChildren[] = new GPNode[0];
The first element is obvious: it’s the number of the constraints object which the GPNode objects point to. The next two items hold the return type and child types of the node: more on that later. Specifically, the return type of a child in slot 0 must be compatible with the child type declared for slot 0. For now what matters is that you can determine the expected number of children of a GPNode from the length of the childtypes array.
The name variable holds the name of the GPNodeConstraints (not the GPNodes which refer to them): we’ll define some in the next section. The probabilityOfSelection variable holds an auxiliary variable used by certain tree-building operators. Last, zeroChildren[] holds a blank, zero-length GPNode array which terminals are free to use in lieu of null for their children[].
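For example, a one-liner along these lines retrieves a node’s expected arity (assuming, as before, that you have the EvolutionState in a variable called state and a node called myGPNode):

// expected number of children of myGPNode, looked up via its GPNodeConstraints
int arity = myGPNode.constraints((GPInitializer)(state.initializer)).childtypes.length;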
5.2.1.5 GPTreeConstraints
The GPTreeConstraints contains data elements shared by GPTrees:
public byte constraintNumber;
public GPType treetype;
public String name;
public GPNodeBuilder init;
public GPFunctionSet functionset;
The first element is again obvious. The treetype variable declares the GPType for the tree as a whole: the return type of the root must be compatible with this type. The name works similarly to the one in GPNodeConstraints.
The last two variables are critical. The init variable holds the algorithm used to generate trees or subtrees for this GPTree. We will discuss tree builders later. Last, the functionset variable holds the function set for this tree: all GPNodes appearing in this GPTree must be cloned from this function set.
5.2.1.6 GPFunctionSet
The GPFunctionSet contains a name (like GPNodeConstraints and GPTreeConstraints) and a set of GPNodes, clones of which may appear in the GPTree. This set is stored in various hash tables and arrays to make lookup easy for different common queries (such as “give me all terminals”) or (“give me all nodes whose return type is foo”). Usually you don’t need to access this class directly: instead, we’ll set up the function set using parameters.
5.2.2 Basic Setup
Now let’s work towards setting up a GP problem. We begin by defining the GPIndividual and GPSpecies. Usually, we’ll just use those classes directly:
pop.subpop.0.species = ec.gp.GPSpecies
pop.subpop.0.species.ind = ec.gp.GPIndividual
Let’s presume for now that we just want a single tree per GPIndividual. This is the usual case. The tree itself is typically defined by the GPTree class unless we’re doing something odd. We say:
pop.subpop.0.species.ind.numtrees = 1
pop.subpop.0.species.ind.tree.0 = ec.gp.GPTree
Different trees can have different GPTreeConstraints objects, or share them. This is done by defining a set of GPTreeConstraints (which is a Clique, Section 3.1.2) and giving each member of the set a unique identifier. Then GPTrees identify with a given GPTreeConstraints by using that identifier.
Since we have only one tree, we really only need to create one GPTreeConstraints. We’ll call it “tc0”.
gp.tc.size = 1
gp.tc.0 = ec.gp.GPTreeConstraints
gp.tc.0.name = tc0
Note that as a Clique, GPTreeConstraints objects all have a global parameter base of gp.tc. Now we assign it to the tree:
pop.subpop.0.species.ind.tree.0.tc = tc0
A GPTreeConstraints object in turn holds onto the GPFunctionSet used to construct trees which identify with it. GPFunctionSet is also a clique. We’ll call the function set “f0”:
gp.fs.size = 1
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.name = f0
Note that as a Clique, GPFunctionSet objects all have a global parameter base of gp.fs. We now assign this function set to our tree constraints:
gp.tc.0.fset = f0
As to types: we’ll discuss typed GP later on. For now we’ll assume that there is a single atomic type which is used universally by everyone — that is, everything can connect with everything (it’s “typeless”). This is the classic GP scenario. GPTypes are also a clique: and they have a global parameter base of gp.type. We define zero GPSetTypes and one GPAtomicType (which we will name, for lack of a better word, “nil”) like this:
gp.type.a.size = 1
gp.type.a.0.name = nil
gp.type.s.size = 0
Our GPTreeConstraints object needs to define the GPType of the tree as a whole (the “root type”). To set it to our nil type, we’d say:
gp.tc.0.returns = nil
This means that the root GPNode of the tree must have its return type compatible with nil.
Last, we need to define some GPNodeConstraints. A GPNodeConstraints object describes three things about the GPNodes related to it via the Flyweight pattern:
• The number of children of the GPNode.
• The GPTypes with which the children of the GPNode must be consistent.
• The GPType of the GPNode’s return value, with which its parent must be consistent.
More on types later. But for now we’ll define a few GPNodeConstraints for nodes with zero, one, and two children. Since we only have one type, the types of all the children and the return type are all going to be nil. We’ll call these GPNodeConstraints nc0, nc1, and nc2.
gp.nc.size = 3
gp.nc.0 = ec.gp.GPNodeConstraints
gp.nc.0.name = nc0
gp.nc.0.returns = nil
gp.nc.0.size = 0
gp.nc.1 = ec.gp.GPNodeConstraints
gp.nc.1.name = nc1
gp.nc.1.returns = nil
gp.nc.1.size = 1
gp.nc.1.child.0 = nil
gp.nc.2 = ec.gp.GPNodeConstraints
gp.nc.2.name = nc2
gp.nc.2.returns = nil
gp.nc.2.size = 2
gp.nc.2.child.0 = nil
gp.nc.2.child.1 = nil
5.2.2.1 Defining GPNodes
Let’s imagine that we’re trying to create trees that consist of the following tree nodes (which we’ll create later): ec.app.myapp.X, ec.app.myapp.Y, ec.app.myapp.Mul, ec.app.myapp.Sub, ec.app.myapp.Sin. These nodes take 0, 0, 2, 2, and 1 children respectively and have no special types (we’ll use nil). We could add them to the function set like this:
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.size = 5
gp.fs.0.func.0 = ec.app.myapp.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = ec.app.myapp.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = ec.app.myapp.Mul
gp.fs.0.func.2.nc = nc2
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = ec.app.myapp.Sin
gp.fs.0.func.4.nc = nc1
Notice that we don’t state the number of children or the types explicitly: instead we state them implicitly by assigning the appropriate GPNodeConstraints object.
Figure 5.3 A simple GP tree representing the mathematical expression sin(x − y).
5.2.3 Defining the Representation, Problem, and Statistics
GP is more complex than most other optimization procedures because of its representation. When you create a GP problem, you have two primary tasks:
• Create the GPNodes with which a GPIndividual may be constructed
• Create a Problem which tests the GPIndividual
Let’s start with the first one. As an example, we’ll build a simple symbolic regression problem over two variables, X and Y. The GP tree can have GPNodes which subtract, multiply, and perform sine, just as was done earlier.5 This means we’ll need two terminals (X and Y), and three nonterminals (subtract and multiply, each of arity 2, that is, two children each; and sine, with arity 1).
Recall from Section 5.2.2 that our function set would look like this:
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.size = 5
gp.fs.0.func.0 = ec.app.myapp.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = ec.app.myapp.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = ec.app.myapp.Mul
gp.fs.0.func.2.nc = nc2
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = ec.app.myapp.Sin
gp.fs.0.func.4.nc = nc1
We need to make each of these classes. Each of these is a GPNode subclass with a single crucial method overridden:
public abstract void eval(EvolutionState state, int thread, GPData input,
ADFStack stack, GPIndividual individual, Problem problem);
This method is called when the GPNode is being executed in the course of executing the tree. Execution proceeds depth-first like the evaluation of a standard parse tree. For example, in order to compute the expression sin(x − y) (shown in GP form in Figure 5.3) we will call eval(…) on the Sin object, which will in turn call eval(…) on the Sub object. This will then call eval(…) on the X object, then on the Y object. X and Y will return their values. The Sub object will then subtract them and return the result, and finally Sin will return the sine of that.
5Ridiculously limited, but what did you expect? This is a demonstration example!
Execution doesn’t have to be just in terms of calling eval(…) on children, processing the results, and returning a final value. In fact, for some problems the return value may not matter at all, but simply which nodes are executed. In this case, likely the nodes themselves are doing things via side effects: moving a robot around, for example. Execution could also be tentative: an “if” node might evaluate one or another of its children and leave it at that. Execution could also be repetitive: you might make a “while” node which repeatedly evaluates a child until some test is true. Basically you can execute these nodes any way that might appear in a regular programming language.
5.2.3.1 GPData
The eval(…) method has several arguments, only two of which should be nonobvious: ec.gp.ADFStack and ec.gp.GPData. We will discuss ADFStack later in Section 5.2.10. The GPData object is a simple data object passed around amongst your GPNodes when they execute one another. It’s your opportunity to pass data from node to node. In the example above, it’s how the values are passed from the children to their parents: for example, it’s how the Sub node returns its value to the Sin node. It’s also possible that the parent needs to pass data to its child: and the GPData object can be used like that as well.
Typically a single GPData object is created and handed to the GPNodes, and then they hand it to one another during execution, reusing it. This avoids making lots of clones of a GPData object during execution. Your prototypical GPData instance is normally managed by the GPProblem (Section 5.2.3.3, coming up). We’ll see how to specify it in the parameters then.
In the simplest case, your nodes don’t need to pass any data to each other at all. For example, in the Artificial Ant problem, the nodes are simply executed in a certain order and don’t pass or return any data. In this case, you can simply use GPData itself: there is no need to specify a subclass.
More often, our GPData object needs to hold the return value from a child. If you are holding a simple piece of data (like a double or an int), you also just need to implement a single method, copyTo(…), which copies the data from your GPData object into another, then returns it:
public GPData copyTo(GPData other);
In this case, it’s simple:
package ec.app.myapp;
import ec.gp.*;
public class MyData extends GPData
{
public double val;
public GPData copyTo(GPData other)
    { ((MyData)other).val = val; return other; }
}
Now it might be the case that you need to hold more complex data. For example, what if you had an array of doubles? In this case you’d need to either clone or copy the data during the copyTo(…) operation. Additionally, GPData is a Prototype and so it needs to implement the clone() method as a deep clone. The default implementation just does a light clone. But with your array of doubles, you’d need to clone that. Altogether you might have something like this:
package ec.app.myapp;
import ec.gp.*;
public class MyData extends GPData
{
public double[] val = new double[15];
public GPData copyTo(GPData other)
{
System.arraycopy(val, 0, ((MyData)other).val, 0, val.length);
return other;
}
public Object clone()
{
MyData other = (MyData)(super.clone());
other.val = (double[])(val.clone());
return other;
}
}
The important thing to note is that when you perform copyTo(…) or clone(), the resulting other object should not be sharing any data in common with you except constant (immutable, read-only) data. Why the two methods? Certainly most things copyTo(…) performs could be done with clone(). The reason for copyTo(…) is entirely for efficiency when using Automatically Defined Functions (Section 5.2.10). Perhaps in the future we might obviate the need for its use.
Now that you’ve defined the GPData object, you’ll need to specify its use. This is done as follows:
eval.problem.data = ec.app.myapp.MyData
5.2.3.2 KozaFitness
You can use any fitness you like for GP. But it’s common to use a particular fitness function popularized by John Koza [3]. This fitness object contains a standardized fitness in which 0 is the ideal result and Infinity is worse than the worst possible result. Note that this isn’t yet a proper fitness as far as ec.Fitness is concerned. Instead, when asked for fitness, the class converts this to an adjusted fitness, in which 1 is the ideal result and 0 is worse than the worst possible result, using the function adjusted = 1/(1 + standardized). The adjusted fitness makes this a valid Fitness subclass. The GP fitness also has an auxiliary variable, hits, which originally was meant to indicate how many optimal subsolutions were discovered: it’s printed out in the statistics and used for nothing else; use it as you like. This fitness is set as:
pop.subpop.0.species.fitness = ec.gp.koza.KozaFitness
The standard fitness-setting function for this class is:
public final void setStandardizedFitness(EvolutionState state, double value);
You can get (or set) the hits as:
int hits = myKozaFitness.hits;
Note that though the adjusted fitness is returned by the fitness() method, and is thus used by selection methods such as ec.select.FitProportionateSelection, the standardized fitness is what’s used in the comparison methods betterThan() and equivalentTo(), as well as isIdealFitness(). This is because it’s possible to convert different standardized fitness values into the adjusted fitness and have them come out equal due to floating point inaccuracy in division.
5.2.3.3 GPProblem
GPProblem is the subclass of Problem which you will define for evaluating candidate GP solutions. GPProblems contain two variables:
public ADFStack stack;
public GPData input;
The ec.gp.ADFStack is the mechanism used to handle Automatically Defined Functions (or ADFs, see Section 5.2.10). You’ll usually not bother with this variable; it’s handled automatically for you.
However, the second variable, input, will be of considerable interest to you: it’s the GPData object for you to pass among your GPNodes. It’s automatically loaded via this parameter, as mentioned earlier in Section 5.2.3.1 (GPData):
eval.problem.data = ec.app.myapp.MyData
Your primary task, done during setup(…), will be to verify that the GPData object is of the subclass you’ll be using, along these lines:
// verify that our GPData is of the right class (or subclasses from it)
if (!(input instanceof MyData))
    state.output.fatal("GPData class must subclass from " + MyData.class,
base.push(P_DATA), null);
Then during evaluation (evaluate(…) or describe(…)) of a GPIndividual, you’ll use your copy of the input prototype and hand it to your top-level GPNode to evaluate.
With all that out of our hair, let’s construct the Problem. Let’s attempt to create a GP tree which closely matches a set of data we’ve created: we’ll generate the data from the function z = sin(x × y) − sin(x) − x × y, in the range [0, 1) for both x and y. What we’ll do is define n ⟨x, y⟩ data points up front, then evaluate our individuals. For each data point, we’ll set some global variables accessible by the X and Y GPNodes. The individual will return a value, which we’ll compare against the expected z result. The fitness will be the sum of squared differences.
package ec.app.myapp;
import ec.gp.*;
import ec.simple.*;
import ec.*;
import ec.gp.koza.*;
public class MyProblem extends GPProblem implements SimpleProblemForm {
final static int N = 20;
int current;
double[] Xs = new double[N]; // will be pointer-copied in clone(), which is okay
double[] Ys = new double[N]; // likewise
double[] Zs = new double[N]; // likewise
public void setup(EvolutionState state, Parameter base) {
super.setup(state, base);
// verify that our GPData is of the right class (or subclasses from it)
if (!(input instanceof MyData))
    state.output.fatal("GPData class must subclass from " + MyData.class,
        base.push(P_DATA), null);
// generate N random <x, y, z> data points
for(int i = 0; i < N; i++) {
double x, y;
Xs[i] = x = state.random[0].nextDouble();
Ys[i] = y = state.random[0].nextDouble();
Zs[i] = Math.sin(x * y) - Math.sin(x) - x * y;
}
}
public void evaluate(final EvolutionState state, Individual ind,
    int subpopulation, int threadnum) {
    if (!ind.evaluated) { // don't bother reevaluating
        MyData input = (MyData)(this.input);
        double sum = 0.0;
        // for each tuple, evaluate the individual and compare the result
        // against the expected z value.  For good measure reset the GPData
        // first, though in this example it's not necessary
        for(current = 0; current < N; current++) { // note: current is an instance variable
            input.val = 0;
            ((GPIndividual)ind).trees[0].child.eval(
                state, threadnum, input, stack, ((GPIndividual)ind), this);
            double diff = input.val - Zs[current];
            sum += diff * diff;
        }
        // set the fitness and the evaluated flag
        KozaFitness f = (KozaFitness)(ind.fitness);
        f.setStandardizedFitness(state, sum);
        f.hits = 0; // don't bother using this
        ind.evaluated = true;
    }
}
}
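To wire this up, the Problem, its GPData subclass, and the fitness would then be declared in the parameter file along these lines (using the parameters introduced above):

eval.problem = ec.app.myapp.MyProblem
eval.problem.data = ec.app.myapp.MyData
pop.subpop.0.species.fitness = ec.gp.koza.KozaFitness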
5.2.3.4 GPNode Subclasses
Now let’s implement our five GPNode subclasses. Each will implement toString() to print out the node name, and also eval(...) discussed earlier. It’s also common to implement the method checkConstraints() to do a final sanity-check on the node (whether it has the right number of children, etc.) but it’s not necessary, and we’ll omit it here. Instead we implement the simpler expectedChildren() method, which is called by the default checkConstraints() implementation. expectedChildren() simply returns the expected number of children to the node, or a negative number, which means “the number of children could be anything”. The value returned by expectedChildren() is checked against the type constraints of this node as a sanity check.
First the terminals:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class X extends GPNode {
    public String toString() { return "x"; }
    public int expectedChildren() { return 0; }
    public void eval(EvolutionState state, int thread, GPData input,
        ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        MyProblem prob = (MyProblem) problem;
        data.val = prob.Xs[prob.current]; // return current X value to parent
    }
}
... and...
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class Y extends GPNode {
    public String toString() { return "y"; }
    public int expectedChildren() { return 0; }
    public void eval(EvolutionState state, int thread, GPData input,
        ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        MyProblem prob = (MyProblem) problem;
        data.val = prob.Ys[prob.current]; // return current Y value to parent
    }
}
Next the Sine:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class Sin extends GPNode {
    public String toString() { return "sin"; }
    public int expectedChildren() { return 1; }
    public void eval(EvolutionState state, int thread, GPData input,
        ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        MyProblem prob = (MyProblem) problem;
        children[0].eval(state, thread, data, stack, individual, prob);
        data.val = Math.sin(data.val);
    }
}
Next the Multiply and Subtract:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class Mul extends GPNode {
    public String toString() { return "*"; }
    public int expectedChildren() { return 2; }
    public void eval(EvolutionState state, int thread, GPData input,
        ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        MyProblem prob = (MyProblem) problem;
        children[0].eval(state, thread, data, stack, individual, prob);
        double val1 = data.val;
        children[1].eval(state, thread, data, stack, individual, prob);
        data.val = val1 * data.val;
    }
}
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class Sub extends GPNode {
    public String toString() { return "-"; }
    public int expectedChildren() { return 2; }
    public void eval(EvolutionState state, int thread, GPData input,
        ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        MyProblem prob = (MyProblem) problem;
        children[0].eval(state, thread, data, stack, individual, prob);
        double val1 = data.val;
        children[1].eval(state, thread, data, stack, individual, prob);
        data.val = val1 - data.val;
    }
}
5.2.3.5 Statistics
The default statistics class for GP is SimpleStatistics. However, the GP package has a Statistics subclass designed for doing the same basic stuff as SimpleShortStatistics (Section 3.7.2), but with some extra GP-specific tree statistics. You can turn it on like this:
stat = ec.gp.koza.KozaShortStatistics
This statistics object has all the basic features of SimpleShortStatistics, including the do-time, do-size, do-subpops, and modulus parameters. Additionally it adds a new parameter which you can turn on like this:
stat.child.0.do-depth = true
This parameter enables tree depth information.
Beyond depth options, the full form of output differs from SimpleShortStatistics in that it reports per-tree information as well, like this:
1. The generation number
2. (If do-time is true) How long initialization took in milliseconds, or how long the previous generation took to breed to form this generation
3. (If do-time is true) How long evaluation took in milliseconds this generation
4. Once for each subpopulation...
(a) (If do-depth is true) Output of the form [a b c ...], representing the average depth of each tree a, b, c, etc. of an individual this generation for this subpopulation
(b) (If do-size is true) Output of the form [a b c ...], representing the average size of each tree a, b, c, etc. of an individual this generation for this subpopulation
(c) (If do-size is true) The average size of an individual this generation for this subpopulation
(d) (If do-size is true) The average size of an individual so far in the run for this subpopulation
(e) (If do-size is true) The size of the best individual this generation for this subpopulation
(f) (If do-size is true) The size of the best individual so far in the run for this subpopulation
(g) The mean fitness of the subpopulation for this generation
(h) The best fitness of the subpopulation for this generation
(i) The best fitness of the subpopulation so far in the run
5. (If do-depth is true) Output of the form [a b c ...], representing the average depth of each tree a, b,
c, etc. of an individual this generation
6. (If do-size is true) Output of the form [a b c ...], representing the average size of each tree a, b, c,
etc. of an individual this generation
7. (If do-size is true) The average size of an individual this generation
8. (If do-size is true) The average size of an individual so far in the run
9. (If do-size is true) The size of the best individual this generation
10. (If do-size is true) The size of the best individual so far in the run
11. The mean fitness of the entire population for this generation
12. The best fitness of the entire population for this generation
13. The best fitness of the entire population so far in the run
5.2.4 Initialization
To use GP we’ll need to define the initializer as a subclass of ec.gp.GPInitializer:
init = ec.gp.GPInitializer
ECJ has traditionally followed the lil-gp default for disallowing duplicates in the initial Population: if a duplicate is created, ECJ will try 100 times to create another non-duplicate Individual in its stead. If this fails, the last duplicate created will be allowed. We say this in the standard way:
pop.subpop.0.duplicate-retries = 100
To create trees, ECJ relies on a tree-creation algorithm in the form of an ec.gp.GPNodeBuilder, part of the GPTreeConstraints object. The GPNodeBuilder for GPTreeConstraints 0 is specified like this:
gp.tc.0.init = ec.gp.koza.HalfBuilder
ECJ provides quite a number of node builders in the ec.gp.koza and ec.gp.build packages. You request a tree with the following function:
public abstract GPNode newRootedTree(EvolutionState state, GPType type,
int thread, GPNodeParent parent, GPFunctionSet set,
int argposition, int requestedSize);
This method builds a tree of GPNodes whose root return type is compatible with type, attached to the given GPNodeParent, at position argposition, and built from clones of GPNodes in the function set set. The root node is returned. Some GPNodeBuilders will also produce a tree of the requested size (requestedSize); others ignore this value. You can also ask the GPNodeBuilder to pick its own tree size from a distribution specified by the user in parameters, by passing ec.gp.GPNodeBuilder.NOSIZEGIVEN for the size (this is the usual thing done by most initialization procedures).
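For example, here is a sketch of asking a tree’s own GPNodeBuilder for a brand new tree (assuming you have state, a thread number thread, and an existing GPTree called myGPTree in hand; the constraints(…) lookup is the one from Section 5.2.1.2):

GPInitializer initializer = (GPInitializer)(state.initializer);
GPTreeConstraints tc = myGPTree.constraints(initializer);

// build a new tree rooted directly under myGPTree: the root's argposition is 0,
// and NOSIZEGIVEN lets the builder pick a size from its own distribution
myGPTree.child = tc.init.newRootedTree(state, tc.treetype, thread,
    myGPTree, tc.functionset, 0, GPNodeBuilder.NOSIZEGIVEN);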
If you are using a GPNodeBuilder which generates trees of a certain size, and ec.gp.GPNodeBuilder.NOSIZEGIVEN is used (as usual), then you can specify a distribution of sizes in two ways. First, you can have the GPNodeBuilder pick a size uniformly from among a minimum and maximum size, for example:
gp.tc.0.init.min-size = 10
gp.tc.0.init.max-size = 20
Alternatively you can specify the distribution of sizes manually. To stipulate probabilities for sizes 1, 2, 3, 4, and 5, you’d say:
gp.tc.0.init.num-sizes = 5
gp.tc.0.init.size.0 = 0.2
gp.tc.0.init.size.1 = 0.1
gp.tc.0.init.size.2 = 0.2
gp.tc.0.init.size.3 = 0.25
gp.tc.0.init.size.4 = 0.25
ECJ has a whole bunch of GPNodeBuilder algorithms available to you. I wrote a shoot-out paper describing and comparing nearly all of these algorithms [9]. Here is the run-down:
• ec.gp.koza.FullBuilder generates full trees using Koza’s FULL algorithm. You cannot request a size. It requires a minimum and maximum depth, for example:
gp.tc.0.init = ec.gp.koza.FullBuilder
gp.tc.0.init.min-depth = 2
gp.tc.0.init.max-depth = 6
Alternatively:
gp.koza.full.min-depth = 2
gp.koza.full.max-depth = 6
• ec.gp.koza.GrowBuilder generates arbitrary trees depth-first using Koza’s GROW algorithm. You cannot request a size. It requires a minimum and maximum depth, for example:
gp.tc.0.init = ec.gp.koza.GrowBuilder
gp.tc.0.init.min-depth = 2
gp.tc.0.init.max-depth = 6
Alternatively:
gp.koza.grow.min-depth = 2
gp.koza.grow.max-depth = 6
• ec.gp.koza.HalfBuilder generates arbitrary trees depth-first using Koza’s RAMPED HALF-AND-HALF algorithm. You cannot request a size. This is nothing more than flipping a coin of probability growp to decide whether to use GROW or FULL. HalfBuilder is the default builder for creating GP trees in ECJ, but it’s not particularly good. It requires a minimum and maximum depth, and the probability of doing GROW, for example:
gp.tc.0.init = ec.gp.koza.HalfBuilder
gp.tc.0.init.min-depth = 2
gp.tc.0.init.max-depth = 6
gp.tc.0.init.growp = 0.5
Alternatively:
gp.koza.half.min-depth = 2
gp.koza.half.max-depth = 6
gp.koza.half.growp = 0.5
• ec.gp.build.PTC1 is a modification of GROW which guarantees that trees will be generated with a given expected (mean) size. You cannot request a size. Additionally, each terminal and nonterminal can specify its probability of being chosen from the function set as PTC1 constructs the tree. PTC1 requires an expected size and a maximum depth:
gp.tc.0.init = ec.gp.build.PTC1
gp.tc.0.init.expected-size = 10
gp.tc.0.init.max-depth = 6
Alternatively:
gp.build.ptc1.expected-size = 10
gp.build.ptc1.max-depth = 6
PTC1 requires that its function sets adhere to the interface ec.gp.build.PTCFunctionSetForm. This interface contains three tables of probabilities for your GPNodes to be selected:
public double[] terminalProbabilities(int type);
public double[] nonterminalProbabilities(int type);
public double[] nonterminalSelectionProbabilities(int expectedTreeSize);
The first function returns, for a given GPType number, a distribution of desired selection probabilities for terminals of that type. The order of the terminals is the same as the following array in GPFunctionSet:
public GPNode[type][] terminals;
The second function returns, for a given GPType number, a distribution of desired selection probabilities for nonterminals of that type. The order of the nonterminals is the same as the following array in
GPFunctionSet:
public GPNode[type][] nonterminals;
The final function returns, for a given desired tree size, the probability that a nonterminal (of a given GPType return type) should be selected over a terminal of the same GPType. This is only used by PTC1, not PTC2 below.
You don’t need to implement this interface: the ec.gp.build.PTCFunctionSet class does it for you:
gp.fs.size = 1
gp.fs.0 = ec.gp.build.PTCFunctionSet
gp.fs.0.name = f0
This function set computes all the above probabilities from user-specified probabilities as parameters. The probabilities are specified by each GPNodeConstraints object. Following the example we started in Section 5.2.2, we might state that the terminals X and Y (node constraints 0) should be picked with 0.5 probability each, and the nonterminals Mul and Sub (node constraints 2) and Cos (node constraints 1) should be picked with 0.4, 0.4, and 0.3 probability respectively:
gp.nc.0.prob = 0.5
gp.nc.1.prob = 0.3
gp.nc.2.prob = 0.4
What if you wanted Mul and Sub to have different probabilities? You’d need to create different GPNodeConstraints. For example, we could create a new, separate GPNodeConstraints for Sub:
gp.nc.size = 4
gp.nc.3 = ec.gp.GPNodeConstraints
gp.nc.3.name = nc3
gp.nc.3.returns = nil
gp.nc.3.size = 2
gp.nc.3.child.0 = nil
gp.nc.3.child.1 = nil
Now we assign it to Sub:
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc3
...and last change the probabilities of Sub and Mul to be different:
gp.nc.0.prob = 0.5
gp.nc.1.prob = 0.3
gp.nc.2.prob = 0.25
gp.nc.3.prob = 0.35
• ec.gp.build.PTC2 generates trees near to a desired size (which you request) by picking randomly from the current outer edge in the tree and adding a node. When the tree is large enough, all the remaining edge slots are filled with terminals. Additionally, each terminal and nonterminal can specify its probability of being chosen from the function set as PTC2 constructs the tree. PTC2 requires a desired size and a maximum depth:
gp.tc.0.init = ec.gp.build.PTC2
gp.tc.0.init.expected-size = 10
gp.tc.0.init.max-depth = 6
Alternatively:
gp.build.ptc2.expected-size = 10
gp.build.ptc2.max-depth = 6
Like PTC1, PTC2 requires that function sets adhere to the PTCFunctionSetForm interface. Just use PTCFunctionSet.
• ec.gp.build.RandomBranch generates trees near to a desired size (which you request) using the RANDOMBRANCH algorithm. Beyond the size distributions, this algorithm has no additional parameters.
• ec.gp.build.Uniform generates trees near to a desired size (which you request) using the UNIFORM algorithm, which selects trees of any tree size. You can select sizes either using the user distribution, or according to the natural distribution of tree sizes. To do the second, you’d say:
gp.tc.0.init = ec.gp.build.Uniform
gp.tc.0.init.true-dist = true
Alternatively:
gp.breed.uniform.true-dist = true
WARNING: This algorithm is complex and I fear it may be suffering from bit-rot. I have been told it’s not working properly any more but have not debugged it yet.
• ec.gp.build.RandTree (by Alexander Chircop) generates trees near to a desired size (which you request) using the RAND TREE algorithm, which selects trees distributed uniformly using Dyck words. No extra parameters are needed beyond the tree size selection. WARNING: I suspect this algorithm may have some bugs.
5.2.5 Breeding
ECJ has a large number of breeding pipeline operators for GP trees. This includes the most common operators used in GP (ec.gp.koza.Crossover, ec.gp.koza.Mutation), and several more found in the ec.gp.breed package.
Pipelines generally pick a single GPTree in a given GPIndividual in which to do mutation or crossover. In most cases you can lock down the specific GPTree, or let the pipeline choose it at random.
Once they’ve picked a GPTree, GP breeding operators often need to choose GPNodes in the tree in which to perform crossover, mutation, etc. To do this, they make use of an ec.gp.GPNodeSelector. A GPNodeSelector is a simple interface for picking nodes, consisting of the following methods:
public abstract void reset();
public abstract GPNode pickNode(EvolutionState s, int subpopulation,
int thread, GPIndividual ind, GPTree tree);
When a breeding pipeline needs to pick a node in a particular GPTree of a particular GPIndividual, it first will call reset() to get the GPNodeSelector to ready itself, then it will call pickNode(...) to select a node. If the breeding pipeline needs another node in the same tree, it can call pickNode(...) again as many times as necessary.
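To make the contract concrete, here is a hedged sketch (not part of ECJ) of a GPNodeSelector which simply picks uniformly from all nodes in the tree. The Prototype housekeeping (setup(...), defaultBase(), clone()) that the real interface also requires is filled in minimally, the parameter base name is made up, and tree.child as the root-node field is an assumption based on the ECJ source:

import ec.*;
import ec.gp.*;
import ec.util.*;

public class UniformNodeSelector implements GPNodeSelector {
    // Prototype housekeeping: no parameters of our own, nothing to clone deeply.
    public Parameter defaultBase() { return GPDefaults.base().push("uniform-node-select"); }
    public void setup(EvolutionState state, Parameter base) { }
    public Object clone() { return new UniformNodeSelector(); }  // stateless, so a fresh one will do

    public void reset() { }  // nothing cached between picks

    public GPNode pickNode(EvolutionState state, int subpopulation,
            int thread, GPIndividual ind, GPTree tree) {
        int n = tree.child.numNodes(GPNode.NODESEARCH_ALL);            // count every node in the tree
        int which = state.random[thread].nextInt(n);                   // pick one uniformly
        return tree.child.nodeInPosition(which, GPNode.NODESEARCH_ALL);
    }
}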
The standard GPNodeSelector is ec.gp.koza.KozaNodeSelector, which picks certain kinds of nodes with different probabilities. The kinds of nodes you can state probabilities for are: the root, nonterminals, terminals, and all nodes. The most common settings are (here as default parameters):
gp.koza.ns.terminals = 0.1
gp.koza.ns.nonterminals = 0.9
gp.koza.ns.root = 0.0
This says to pick terminals 10% of the time, nonterminals 90% of the time, the root (specifically) 0% of the time and any arbitrary node 0% of the time. The arbitrary-node percentage is whatever is left over from the other three percentages. (The root could still be picked, since it’s a nonterminal or a terminal — but it won’t be specially picked).
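For example (with made-up settings), if you set terminals to 0.2, nonterminals to 0.5, and root to 0.1, then an arbitrary node would be picked the remaining 1 − (0.2 + 0.5 + 0.1) = 0.2 of the time.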
Why might the breeding pipeline need to call pickNode(...) repeatedly? Most likely because the chosen GPNode has type constraint problems. For example, in order to do crossover between the subtrees rooted by two GPNodes, the nodes need to be type-compatible with one another’s parent nodes: otherwise the tree locations wouldn’t be valid. Pipelines with these issues will try some n times to pick compatible nodes; if they fail all n times, the parents are returned rather than the generated children.
Here are the breeding pipelines that come with ECJ. In each case, let’s presume that we’re placing the pipeline as the root pipeline of Subpopulation 0, parameter-wise:
• ec.gp.koza.CrossoverPipeline performs standard subtree crossover: it requests a GPIndividual from each of its two sources; then a tree is selected from each GPIndividual, then a node is selected in each tree, and finally the two subtrees rooted by those nodes are swapped. CrossoverPipeline has several parameters. The first four:
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.maxdepth = 17
pop.subpop.0.species.pipe.toss = false
% This one is undefined initially but you can define it:
% pop.subpop.0.species.pipe.maxsize = ...
This tells CrossoverPipeline that children may not have a depth which exceeds 17 (the common value). The pipeline will try just once to find type-valid and depth-legal crossover points before giving up and just returning the parents instead. This is the most common setting in Genetic Programming. If toss=true then only one child is returned — the other is thrown away. The default value for toss is false.
The CrossoverPipeline also can be set, via the maxsize parameter, to not generate children which exceed some number of nodes in any one tree. Initially this is unset, meaning that there is no maximum size.
pop.subpop.0.species.pipe.tree.0 = 0
pop.subpop.0.species.pipe.tree.1 = 0
This tells CrossoverPipeline that it should pick GPNodes in GPTree 0 of each individual. If either of these parameters is missing entirely, then CrossoverPipeline will pick that tree at random. At any rate, the GPTrees chosen must have the same GPTreeConstraints. Finally we have:
pop.subpop.0.species.pipe.ns.0 = ec.gp.koza.KozaNodeSelector
pop.subpop.0.species.pipe.ns.1 = same
This states that the GPNodeSelector for both GPIndividuals should be a KozaNodeSelector. You can state them independently or node selector 1 can be same.
The default parameter base versions for all of these would be:
gp.koza.xover.tries = 1
gp.koza.xover.maxdepth = 17
gp.koza.xover.toss = false
gp.koza.xover.tree.0
gp.koza.xover.tree.1
gp.koza.xover.ns = ec.gp.koza.KozaNodeSelector
Important note: the default version of the parameter for node selectors is just ns. There’s no ns.0 or ns.1.
• ec.gp.koza.MutationPipeline performs standard subtree mutation: it requests a GPIndividual from a single source; then a tree is selected; then a node is selected in that tree; and finally the subtree rooted by that node is replaced in its entirety by a randomly-generated tree.
MutationPipeline has many parameters similar to CrossoverPipeline:
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.maxdepth = 17
pop.subpop.0.species.pipe.tree.0 = 0
pop.subpop.0.species.pipe.ns = ec.gp.koza.KozaNodeSelector
% This one is undefined initially but you can define it:
% pop.subpop.0.species.pipe.maxsize = ...
Note that the node selector is just ns, not ns.0.
The replacing subtree is generated using a GPNodeBuilder. The standard GPNodeBuilder is a GrowBuilder with the following default values:
gp.koza.grow.min-depth = 5
gp.koza.grow.max-depth = 5
These are strange default values, but they’re the settings traditionally used in the original GP work. You stipulate the GPNodeBuilder as:
pop.subpop.0.species.pipe.build.0 = ec.gp.koza.GrowBuilder
Though GrowBuilder ignores size demands, if you replaced it with another builder such as PTC2, you could also optionally stipulate that the replacing subtree must be about the same size as the original subtree. Here’s the parameter:
pop.subpop.0.species.pipe.equal = true
The default setting is false.
The default parameter base versions for all of these would be:
gp.koza.mutate.tries = 1
gp.koza.mutate.maxdepth = 17
gp.koza.mutate.tree.0 = 0
gp.koza.mutate.ns = ec.gp.koza.KozaNodeSelector
gp.koza.mutate.build.0 = ec.gp.koza.GrowBuilder
gp.koza.mutate.equal = true
• ec.gp.breed.InternalCrossoverPipeline selects two GPNodes in the same GPIndividual, such that neither GPNode is in the subtree rooted by the other. The GPNodes may be in different GPTrees, or they may be in the same GPTree. It then swaps the two subtrees.
InternalCrossoverPipeline’s parameters are essentially identical to those in CrossoverPipeline, except for a missing maxsize parameter. For example:
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.maxdepth = 17
pop.subpop.0.species.pipe.tree.0 = 0
pop.subpop.0.species.pipe.tree.1 = 0
pop.subpop.0.species.pipe.ns.0 = ec.gp.koza.KozaNodeSelector
pop.subpop.0.species.pipe.ns.1 = same
The default parameter base versions for all of these would be:
gp.breed.internal-xover.tries = 1
gp.breed.internal-xover.maxdepth = 17
gp.breed.internal-xover.toss = false
gp.breed.internal-xover.tree.0
gp.breed.internal-xover.tree.1
gp.breed.internal-xover.ns = ec.gp.koza.KozaNodeSelector
Important note: just as is the case for CrossoverPipeline, the default version of the parameter for node selectors is just ns. There’s no ns.0 or ns.1.
• ec.gp.breed.MutatePromotePipeline selects a GPNode, other than the root, and replaces its parent (and its parent’s subtree) with the GPNode and its subtree. This was called the PromoteNode algorithm in [1] and is similar to the Deletion algorithm in [13].
MutatePromotePipeline’s parameters are pretty simple. Because its constraints are tighter, it doesn’t use a GPNodeSelector: instead it searches among all nodes in the tree to find one which is type-compatible with its parent. Thus its parameters are simply the number of times it tries before giving up and returning the original tree. Like previous methods, if the tree parameter doesn’t exist, a tree is picked at random (which is usually what you’d want anyway).
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions:
gp.breed.mutate-promote.tries = 1
gp.breed.mutate-promote.tree.0
• ec.gp.breed.MutateDemotePipeline selects a GPNode, then replaces the node with a new nonterminal. The old Node becomes a child of the new node at a random argument location, and the remaining child slots are filled with terminals. This was called the DemoteNode algorithm in [1] and is similar to the Insertion algorithm in [13].
MutateDemotePipeline is similar to MutatePromotePipeline in that it doesn’t use a GPNodeSelector: instead it searches among all nodes in the tree to find one which is type-compatible with its parent and which wouldn’t create a tree deeper than a maximum legal value, trying a certain number of times before giving up and returning the original tree. Like previous methods, if the tree parameter doesn’t exist, a tree is picked at random (which is usually what you’d want anyway).
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.maxdepth = 17
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions:
gp.breed.mutate-demote.tries = 1
gp.breed.mutate-demote.maxdepth = 17
gp.breed.mutate-demote.tree.0 = 0
• ec.gp.breed.MutateSwapPipeline selects a GPNode with at least two children, then selects two children of that node such that each is type-compatible with the other. Then it swaps the two subtrees rooted by those children.
MutateSwap’s parameters are simple because it doesn’t use a GPNodeSelector (the constraints are too complex). You simply specify the tree (or have one picked at random if none is specified) and the number of tries:
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions for all of these would be:
gp.breed.mutate-swap.tries = 1
gp.breed.mutate-swap.tree.0
• ec.gp.breed.MutateOneNodePipeline selects a GPNode, then replaces that node with a different node of the same arity and type constraints. This was called the OneNode algorithm in [1].
MutateOneNodePipeline uses a GPNodeSelector to pick the node. You also specify the tree number; if you don’t specify anything, one will be picked at random (which is usually what you’d want).
pop.subpop.0.species.pipe.ns.0 = ec.gp.koza.KozaNodeSelector
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions:
gp.breed.mutate-one-node.ns.0 = ec.gp.koza.KozaNodeSelector
gp.breed.mutate-one-node.tree.0 = 0
• ec.gp.breed.MutateAllNodesPipeline selects a GPNode, then replaces every node in the subtree rooted by that GPNode with a different node of the same arity and type constraints. This highly destructive operator was called the AllNodes algorithm in [1].
MutateAllNodesPipeline uses a GPNodeSelector to pick the GPNode. You also specify the tree number; if you don’t specify anything, one will be picked at random (which is usually what you’d want).
pop.subpop.0.species.pipe.ns.0 = ec.gp.koza.KozaNodeSelector
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions:
gp.breed.mutate-all-nodes.ns.0 = ec.gp.koza.KozaNodeSelector
gp.breed.mutate-all-nodes.tree.0 = 0
• ec.gp.breed.RehangPipeline is an oddball mutator of my own design meant to be highly destructive. It selects a nonterminal other than the root, and designates it the “new root”. It then picks a child subtree of this new root, which is disconnected from its parent. The new root becomes the root of the tree. The original parent of the new root becomes the new root’s child, filling the spot vacated by the disconnected subtree. The grandparent then fills the spot vacated by the parent, and so on, clear up to the root. Then finally the disconnected subtree fills the remaining spot. Figure 5.4 shows this procedure. There are two parameters, as usual:
Figure 5.4 Rehanging a tree. A new root is chosen at random from among the nonterminals except for the original root. Then a subtree of that new root is chosen at random and disconnected. The tree is then rehung as shown: the parent of the new root becomes its child; the grandparent becomes the parent’s child, and so on up to the root. The disconnected subtree then fills the remaining spot.
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions for all of these would be:
gp.breed.rehang.tries = 1
gp.breed.rehang.tree.0
Warning: Because of the complexity of its rehanging process, RehangPipeline ignores all typing information.
• ec.gp.breed.MutateERCPipeline works similarly to the Gaussian algorithm in [1]. The algorithm picks a random node in a random tree in the GPIndividual, then for every Ephemeral Random Constant (ERC) in the subtree rooted by that node, it calls mutateERC() on that ERC. ERCs are discussed later in Section 5.2.9. As usual, the two common parameters:
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.tree.0 = 0
The default parameter base versions for all of these would be:
gp.breed.mutate-erc.tries = 1
gp.breed.mutate-erc.tree.0
If you wished to mutate the ERCs in the entire tree, you could set the node selector parameters like this:
gp.breed.mutate-erc.ns.0 = ec.gp.koza.KozaNodeSelector
gp.breed.mutate-erc.ns.0.terminals = 0.0
gp.breed.mutate-erc.ns.0.nonterminals = 0.0
gp.breed.mutate-erc.ns.0.root = 1.0
• ec.gp.breed.SizeFairCrossoverPipeline implements the size fair and homologous crossover methods described in [5]. SizeFairCrossoverPipeline has many parameters in common with CrossoverPipeline, which should look familiar, something like:
pop.subpop.0.species.pipe.tries = 1
pop.subpop.0.species.pipe.maxdepth = 17
pop.subpop.0.species.pipe.toss = false
pop.subpop.0.species.pipe.ns.0 = ec.gp.koza.KozaNodeSelector
pop.subpop.0.species.pipe.ns.1 = same
% These are unset by default but you can set them to lock down
% the trees being crossed over
% pop.subpop.0.species.pipe.tree.0 = 0
% pop.subpop.0.species.pipe.tree.1 = 0
The default parameter base versions for all of these would be:
gp.breed.size-fair.tries = 1
gp.breed.size-fair.maxdepth = 17
gp.breed.size-fair.toss = false
gp.breed.size-fair.ns = ec.gp.koza.KozaNodeSelector
% These are unset by default but you can set them to lock down
% the trees being crossed over
% gp.breed.size-fair.tree.0 = ...
% gp.breed.size-fair.tree.1 = ...
By default SizeFairCrossoverPipeline will perform size-fair crossover, defined as follows. First, it finds a crossover point in the first parent in a fashion identical to CrossoverPipeline. But to find the crossover point in the second parent, it first computes the size s1 of the subtree in the first parent. It then calculates the size of every subtree in the second parent, and discards from consideration any subtree whose size s2 satisfies s2 > 1 + 2s1. From the remainder, it then counts the number of subtrees smaller than, equal to, and larger than s1 (call these n<, n= and n>), and likewise the mean sizes of the smaller and larger subtrees (call these μ< and μ>).
If n< = 0 or if n> = 0, then a random subtree is selected from among the subtrees exactly the same size as s1. Thus terminals are always crossed over with terminals. Otherwise, we decide on selecting an equal, smaller, or larger subtree with the following probabilities:
p= = 1/s1
p> = (1 − p=) / (n> × (1 + μ>/μ<))
p< = 1 − (p= + p>)
The idea is to select bigger or smaller subtrees randomly such that the mean stays the same. Once we have decided on bigger or smaller or equal subtrees, we then select a size uniformly from among the sizes of subtrees appearing in that set. If there is more than one subtree of the selected size, we select uniformly from among them.
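As a purely illustrative example (numbers made up for this paragraph): suppose the subtree chosen in the first parent has size s1 = 5, and among the second parent’s eligible subtrees we count n< = 10, n= = 3 and n> = 6, with mean sizes μ< = 2 and μ> = 8. Then p= = 1/5 = 0.2, p> = (1 − 0.2)/(6 × (1 + 8/2)) = 0.8/30 ≈ 0.027, and p< = 1 − (0.2 + 0.027) ≈ 0.773, so most of the time we will pick one of the smaller subtrees.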
Alternatively we can perform a kind of homologous crossover, if we turn on the following parameter:
pop.subpop.0.species.pipe.homologous = true
(or the default…)
gp.breed.size-fair.homologous = true
Homologous crossover is identical to size-fair crossover except for the final detail. Instead of selecting uniformly from among all subtrees of a chosen size, we instead select the one whose root is “closest” to the selected crossover point in the first parent. Here, distance between crossover points is defined as the depth at which their paths to the root begin to deviate from one another.
Size-fair and homologous crossover is due to Uday Kamath, a PhD student at GMU.
5.2.6 A Complete Example
Many of these initial parameters could have been entered simply by including the parameter file ec/gp/koza/koza.params. But we’ll go through them in detail. First some basic generational parameters:
# Threads and Seeds
evalthreads = 1
breedthreads = 1
seed.0 = time
# Checkpointing
checkpoint = false
checkpoint-modulo = 1
checkpoint-prefix = ec
# The basic setup
state = ec.simple.SimpleEvolutionState
finish = ec.simple.SimpleFinisher
exch = ec.simple.SimpleExchanger
breed = ec.simple.SimpleBreeder
eval = ec.simple.SimpleEvaluator
stat = ec.simple.SimpleStatistics
pop = ec.Population
pop.subpops = 1
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.duplicate-retries = 0
pop.subpop.0.size = 1024
breed.elite.0 = 0
stat.file = $out.stat
quit-on-run-complete = true
Genetic programming runs typically aren’t very long, and (for the time being) GP requires its own Initializer. We’ll also use KozaFitness. Following the lil-gp example, we’ll set the duplicate retries to 100:
init = ec.gp.GPInitializer
generations = 51
pop.subpop.0.species.fitness = ec.gp.koza.KozaFitness
pop.subpop.0.duplicate-retries = 100
For good measure, let’s attach KozaShortStatistics to the statistics chain. This isn’t standard in the koza.params file, but what the heck.
stat.num-children = 1
stat.child.0 = ec.gp.koza.KozaShortStatistics
stat.child.0.gather-full = true
stat.child.0.file = $out2.stat
Our initializer will work by using HalfBuilder to build trees. We define its parameters here:
# HalfBuilder
gp.koza.half.min-depth = 2
gp.koza.half.max-depth = 6
gp.koza.half.growp = 0.5
We begin by defining the tree constraints, node constraints, types, and function sets for the problem:
# Types
gp.type.a.size = 1
gp.type.a.0.name = nil
gp.type.s.size = 0
# Basic Function Set Parameters (more later)
gp.fs.size = 1
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.name = f0
# Tree Constraints
gp.tc.size = 1
gp.tc.0 = ec.gp.GPTreeConstraints
gp.tc.0.name = tc0
gp.tc.0.fset = f0
gp.tc.0.returns = nil
gp.tc.0.init = ec.gp.koza.HalfBuilder
# Node Constraints
gp.nc.size = 3
gp.nc.0 = ec.gp.GPNodeConstraints
gp.nc.0.name = nc0
gp.nc.0.returns = nil
gp.nc.0.size = 0
gp.nc.1 = ec.gp.GPNodeConstraints
gp.nc.1.name = nc1
gp.nc.1.returns = nil
gp.nc.1.size = 1
gp.nc.1.child.0 = nil
gp.nc.2 = ec.gp.GPNodeConstraints
gp.nc.2.name = nc2
gp.nc.2.returns = nil
gp.nc.2.size = 2
gp.nc.2.child.0 = nil
gp.nc.2.child.1 = nil
Now we define the GP elements of the Species and the Individual:
# Representation
pop.subpop.0.species = ec.gp.GPSpecies
pop.subpop.0.species.ind = ec.gp.GPIndividual
pop.subpop.0.species.ind.numtrees = 1
pop.subpop.0.species.ind.tree.0 = ec.gp.GPTree
pop.subpop.0.species.ind.tree.0.tc = tc0
Here’s a basic GP breeding pipeline:
# Pipeline
pop.subpop.0.species.pipe = ec.breed.MultiBreedingPipeline
pop.subpop.0.species.pipe.generate-max = false
pop.subpop.0.species.pipe.num-sources = 2
pop.subpop.0.species.pipe.source.0 = ec.gp.koza.CrossoverPipeline
pop.subpop.0.species.pipe.source.0.prob = 0.9
pop.subpop.0.species.pipe.source.1 = ec.breed.ReproductionPipeline
pop.subpop.0.species.pipe.source.1.prob = 0.1
For no good reason, we’ll define the selection methods, and various other parameters using the default parameter bases for CrossoverPipeline and ReproductionPipeline:
# Reproduction
breed.reproduce.source.0 = ec.select.TournamentSelection
# Crossover
gp.koza.xover.source.0 = ec.select.TournamentSelection
gp.koza.xover.source.1 = same
gp.koza.xover.ns.0 = ec.gp.koza.KozaNodeSelector
gp.koza.xover.ns.1 = same
gp.koza.xover.maxdepth = 17
gp.koza.xover.tries = 1
# Selection
select.tournament.size = 7
Since Crossover is using a node selector, let’s define some parameters for that:
# Node Selectors
gp.koza.ns.terminals = 0.1
gp.koza.ns.nonterminals = 0.9
gp.koza.ns.root = 0.0
Let’s presume that we have created the X, Y, Sin, Mul, and Sub methods described in Section 5.2.3. We’ll now hook them up.
# Our Function Set
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.size = 5
gp.fs.0.func.0 = ec.app.myapp.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = ec.app.myapp.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = ec.app.myapp.Mul
gp.fs.0.func.2.nc = nc2
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = ec.app.myapp.Sin
gp.fs.0.func.4.nc = nc1
Also we’re using the MyProblem and MyData classes defined in Section 5.2.3 as our Problem. Even though we’re not using ADFs (see Section 5.2.10), we need to define a few items here as well.
# Our Problem
eval.problem = ec.app.myapp.MyProblem
eval.problem.data = ec.app.myapp.MyData
gp.problem.stack = ec.gp.ADFStack
gp.adf-stack.context = ec.gp.ADFContext
Phew! That was a lot of parameters. Thankfully nearly all of them are already defined for you in ec/gp/koza/koza.params.
5.2.7 GPNodes in Depth
GPNode has a gazillion utility methods to assist various crossover, mutation, statistics, and tree-building operators in their tasks of making, breaking, printing, reading, writing and examining trees of GPNodes. Let’s look at some of them here, and divvy up the rest in later sections where they’re more appropriate.
First, the two abstract methods which you must override:
public String toString();
public abstract void eval(EvolutionState state, int thread, GPData input,
ADFStack stack, GPIndividual individual, Problem problem);
The first method prints out the node in a human-readable fashion, with no whitespace. Except for rare cases such as Ephemeral Random Constants (Section 5.2.9), this should be a single simple symbol like “cos” or “if-food-ahead”. The second method we introduced in Section 5.2.3, of course.
Sanity Checking ECJ can double-check to make sure that the number of children to a given node, as claimed in the parameter file, is in fact proper. Other sanity checks might include that the children and parent are of the right GPType, and so on. The general method for doing this sanity check, which you can override if you like, is:
public void checkConstraints(EvolutionState state, int tree,
GPIndividual typicalIndividual, Parameter individualBase);
This method is called after the prototypical GPNode is loaded into a function set, and throws an error if the sanity check fails. The primary purpose of this method is to allow Automatically Defined Functions (Section 5.2.10) a chance to make sure that everything is working. But you can override this method to do some checking of your own as well. If you do so, be sure to call super.checkConstraints(…).
The default implementation of checkConstraints(…) just calls a simpler method which I suggest you implement instead. This method is:
public int expectedChildren();
You can override this method to return the expected number of children to this kind of node, as the most commonly-needed sanity check. If you override it and provide this value, checkConstraints(…) will by default call this method, then double-check for you that the number of children attached is indeed the expected number. By default, expectedChildren() returns:
public static final int CHILDREN_UNKNOWN;
This indicates to checkConstraints(…) that expectedChildren() is unimplemented and so checkConstraints(…) won’t do anything at all by default. So you have three options here to help ECJ sanity-check that everything’s okay:
• Override expectedChildren() to return the expected number of children to the node. The default implementation of checkConstraints(…) will compare this result against the actual number of children.
• Override checkConstraints() to do your own more sophisticated sanity-checking.
• Do nothing. ECJ will do no sanity checking and will rely on the correctness of your parameter file (this
is often perfectly fine).
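For instance, here is a hedged sketch of the first option, in the spirit of the hypothetical two-argument Mul node used as a running example in this chapter (the MyData class and its val field are likewise assumptions carried over from that example):

package ec.app.myapp;
import ec.*;
import ec.gp.*;

public class Mul extends GPNode {
    public String toString() { return "*"; }

    // The sanity check: this node should always have exactly two children.
    public int expectedChildren() { return 2; }

    public void eval(EvolutionState state, int thread, GPData input,
            ADFStack stack, GPIndividual individual, Problem problem) {
        MyData d = (MyData)input;
        children[0].eval(state, thread, input, stack, individual, problem);
        double left = d.val;                                  // result of the first child
        children[1].eval(state, thread, input, stack, individual, problem);
        d.val = left * d.val;                                 // times the second child's result
    }
}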
Next come some methods which are usually overridden by Ephemeral Random Constants but rarely by any other kind of GPNode (the default implementation suffices):
ec.gp.GPNode Methods
public String name()
Returns the name of the GPNode. By default, this calls toString().
public int nodeHashCode()
Returns a hash code appropriate to your GPNode (hash by value, not by address).
public boolean nodeEquals(GPNode node)
Returns true if this node is identical to the given node.
public boolean nodeEquivalentTo(GPNode node)
Returns true if the two nodes are the same “kind” of node — usually meaning they could have been cloned from the same prototype node. The default form of this function returns true if the two nodes are the same class, have the same length child array, and have the same constraints. Often nodeEquals(…) and nodeEquivalentTo(…) may return the same thing, but in Ephemeral Random Constants, they often return different values. For example, two ERCs that are the same class, and have the same constraints, may hold different values (2.34 vs. 3.14 say). These ERCs would be equivalent to one another but not equal to one another. You’d rarely need to override this method.
public String toStringForHumans()
Writes the node to a String in a fashion readable by humans. The default version simply calls toString.
public void resetNode(EvolutionState state, int thread)
Randomizes the node. Ephemeral Random Constants randomize their internal values; other GPNodes typically do nothing.
Next, some test functions:
ec.gp.GPNode Methods
public int atDepth()
Returns the depth of the node (the root is at depth 0).
public int depth()
Returns the depth of the subtree rooted by the node (terminals have a subtree depth of 1).
public GPNodeParent rootParent()
Returns the parent of the root of the tree in which the GPNode resides. Though the method returns a GPNodeParent, this returned object should always be some kind of GPTree.
public boolean contains(GPNode subnode)
Returns true if subnode exists somewhere within the subtree rooted by the GPNode.
public int pathLength(int nodesearch)
Returns the sum of all paths from all nodes in the GPNode’s subtree to the GPNode itself. The nodesearch parameter allows us to restrict which nodes have paths included in the sum: only leaf nodes (terminals), non-leaf nodes (nonterminal), or all nodes:
public static final int GPNode.NODESEARCH_ALL;
public static final int GPNode.NODESEARCH_TERMINALS;
public static final int GPNode.NODESEARCH_NONTERMINALS;
public int numNodes(int nodesearch)
Returns the number of nodes in the subtree rooted by GPNode. The nodesearch parameter allows us to restrict which nodes are included in the total, using the constants above.
public int meanDepth(int nodesearch)
Returns the path length divided by the number of nodes. The nodesearch parameter allows us to restrict which nodes are included in the total, using the constants above.
public GPNode nodeInPosition(int p, int nodesearch)
Returns the pth node in the subtree rooted by the GPNode, using left-to-right depth-first search, and only considering those nodes specified by nodesearch, which can be one of:
public static final int GPNode.NODESEARCH_ALL;
public static final int GPNode.NODESEARCH_TERMINALS;
public static final int GPNode.NODESEARCH_NONTERMINALS;
public java.util.Iterator iterator(int nodesearch)
Returns a depth-first, left-to-right Iterator over all the nodes rooted by the GPNode. The nodesearch parameter allows us to restrict which nodes are iterated over, and may be one of:
public static final int GPNode.NODESEARCH_ALL;
public static final int GPNode.NODESEARCH_TERMINALS;
public static final int GPNode.NODESEARCH_NONTERMINALS;
public java.util.Iterator iterator()
Returns a depth-first, left-to-right Iterator over all the nodes rooted by the GPNode. Equivalent to iterator(GPNode.NODESEARCH_ALL).
Three methods permit even more flexibility in filtering exactly which GPNodes you want to consider, by using a ec.gp.GPNodeGatherer object. This object contains a single method, which you should override, and a single instance variable:
GPNode node; // used internally only
public abstract boolean test(GPNode thisNode);
You can ignore the variable: it’s used by GPNode’s recursion. As far as you are concerned, GPNodeGatherer provides a filter: override the test(…) method to specify whether certain GPNodes fit your needs. Armed with a GPNodeGatherer you’ve constructed, you can then call one of the following three GPNode methods:
ec.gp.GPNode Methods
public int numNodes(GPNodeGatherer gatherer)
Returns the number of nodes (filtered by the GPNodeGatherer) in the subtree rooted by the GPNode.
public GPNode nodeInPosition(int p, GPNodeGatherer gatherer)
Returns the pth node in the subtree rooted by the GPNode, using left-to-right depth-first search, and only considering those nodes filtered by the provided GPNodeGatherer.
public java.util.Iterator iterator(GPNodeGatherer gatherer)
Returns a depth-first, left-to-right Iterator over all the nodes rooted by the GPNode, filtered by the GPNodeGatherer.
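As a small hedged sketch (not from ECJ), here is a gatherer that matches only terminals which print themselves as "x", used both to iterate over such nodes and to count them under some root GPNode; the class name and the "x" test are made up for illustration:

import ec.gp.*;

public class GathererSketch {
    // Count the terminals named "x" under 'root', and also walk over them.
    static int countXTerminals(GPNode root) {
        GPNodeGatherer xOnly = new GPNodeGatherer() {
            public boolean test(GPNode thisNode) {
                // keep only zero-child nodes whose printed name is "x"
                return thisNode.children.length == 0 && "x".equals(thisNode.toString());
            }
        };
        java.util.Iterator iter = root.iterator(xOnly);   // depth-first, left-to-right over matches
        while (iter.hasNext()) {
            GPNode match = (GPNode)(iter.next());
            // ... do something with 'match' here if you like ...
        }
        return root.numNodes(xOnly);                      // total number of matching nodes
    }
}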
And now we come to rigamarole which should look familiar to you if you’ve trudged through the Vector chapter.
ec.gp.GPNode Methods
public int printNodeForHumans(EvolutionState state, int log)
Prints a node to a log in a fashion readable by humans. You don’t want to override this method: it calls toStringForHumans() by default — override that instead.
public int printNode(EvolutionState state, int log)
Prints a node to a log in a fashion readable by humans and also parsable by readNode(…). You don’t want to override this method: it calls toString() by default — override that instead.
public int printNode(EvolutionState state, PrintWriter writer)
Prints a node to a writer in a fashion readable by humans and also parsable by readNode(…). You don’t want to override this method: it calls toString() by default — override that instead.
public String toStringForError()
Writes a node to a string in a fashion useful for error messages. The default writes out the name and the tree the node is in, which works fine.
public GPNode readNode(DecodeReturn dret)
Generates a GPNode from the DecodeReturn via a light clone: children and parents are not produced. The default version clones the node, then reads a string from the DecodeReturn. This string should match toString() exactly. If not, returns null to indicate an error. Otherwise returns the GPNode. This default implementation should be fine in most cases, though Ephemeral Random Constants (Section 5.2.9) require a different procedure.
public void writeNode(EvolutionState state, DataOutput output) throws IOException
Writes the node, but not any of its children or parents, out to output.
public void readNode(EvolutionState state, DataInput input) throws IOException
Reads a node from input. Children and parents are not produced.
Last are a host of different ways of cloning a GPNode or tree. In most cases the default implementations work just fine:
ec.gp.GPNode Methods
public GPNode lightClone()
Light-clones a GPNode, including its children array, but not any children or parents.
public Object clone()
Deep-clones a GPNode, except for its parent. All children are cloned as well.
public final GPNode cloneReplacing(GPNode newSubtree, GPNode oldSubtree)
Deep-clones a GPNode, except that, if found within its cloned subtree, oldSubtree is replaced with a deep-cloned version of newSubtree.
public final GPNode cloneReplacing(GPNode[] newSubtrees, GPNode[] oldSubtrees)
Deep-clones a GPNode, except that, if found within its cloned subtree, each of the oldSubtrees is replaced with a clone of the corresponding member of newSubtrees.
public final GPNode cloneReplacingNoSubclone(GPNode newSubtree, GPNode oldSubtree)
Deep-clones a GPNode, except that, if found within its cloned subtree, oldSubtree is replaced with newSubtree (not a clone of newSubtree).
public final GPNode cloneReplacingAtomic(GPNode newNode, GPNode oldNode)
Deep-clones a GPNode, except that, if found within its cloned subtree, oldNode is replaced with newNode (not a clone of newNode).
public final GPNode cloneReplacingAtomic(GPNode[] newNodes, GPNode[] oldNodes)
Deep-clones a GPNode, except that, if found within its cloned subtree, each of the oldNodes is replaced with the corresponding member of newNodes (not a clone).
public final void replaceWith(GPNode newNode)
Replaces the GPNode with newNode right where it lives in its GPTree.
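As a hedged usage sketch (with trees[0].child as the path to the root node assumed from the ECJ source): to build a modified deep copy of an individual’s first tree in which one subtree is swapped for another, leaving both originals untouched, you might write:

import ec.gp.*;

public class CloneReplacingSketch {
    static GPNode copyWithReplacement(GPIndividual ind, GPNode oldSubtree, GPNode newSubtree) {
        GPNode root = ind.trees[0].child;                    // root node of the first tree
        return root.cloneReplacing(newSubtree, oldSubtree);  // deep copy with the swap applied
    }
}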
These are the primary public methods. There are plenty of other public methods, but they’re largely used internally and you’ll rarely need them.
5.2.8 GPTrees and GPIndividuals in Depth
Unlike GPNode, there’s nothing in a GPTree that you have to override or modify. It’s pretty rare to subclass GPTree, though it’s perfectly reasonable to do so. But there are a number of methods you should be aware of, many of which are probably very familiar by now. First, let’s cover the three non-familiar ones:
ec.gp.GPTree Methods
public int treeNumber()
Returns the position of the GPTree in its GPIndividual’s trees[] array. This is an O(n) operation — it works by scanning through the array until it finds the GPTree. If the tree is not found (which would indicate an error), then GPTree.NO_TREENUM is returned.
public final void verify(EvolutionState state)
An auxiliary debugging method which verifies many features of the structure of the GPTree and all of its GPNodes. This method isn’t called by ECJ but has proven useful in determining errors in GPTree construction by various tree building or breeding algorithms.
public void buildTree(EvolutionState state, int thread)
Builds a tree and attaches it to the GPTree, displacing the original, using the tree-generation algorithm defined for its GPTreeConstraints. No specific tree size is requested.
Next come cloning and tests for removing duplicates:
ec.gp.GPTree Methods
public boolean treeEquals(GPTree tree)
Returns true if the GPNodes which make up the GPTree are structured the same and equal in value to one another. Override this to refine the notion of equality if necessary.
public int treeHashCode()
Returns a hash code generated for the structure and makeup of the GPNodes in the GPTree. Override this to add additional hash information.
public GPTree lightClone()
Performs a light clone on the GPTree: the GPNodes are not cloned; instead the pointer to the root is simply copied.
public Object clone()
Performs a deep clone on the GPTree, including all of its GPNodes but not its parent GPIndividual.
Last, the standard methods for printing and reading:
ec.gp.GPTree Methods
public void printTreeForHumans(EvolutionState state, int log)
Prints the tree in a human-readable fashion to log.
public void printTree(EvolutionState state, int log)
Prints to log the tree in a fashion readable both by humans and also by readTree(…, LineNumberReader). By default this uses the Code package (Section 2.2.3).
public void printTree(EvolutionState state, PrintWriter writer)
Prints to writer the tree in a fashion readable both by humans and also by readTree(…, LineNumberReader). By default this uses the Code package.
public void readTree(EvolutionState state, LineNumberReader reader) throws IOException
Reads a tree produced by printTree(…). By default, this uses the Code package (Section 2.2.3).
public void writeTree(EvolutionState state, DataOutput output) throws IOException
Writes a tree to output in binary fashion such that it can be read by readTree(…, DataInput).
public void readTree(EvolutionState state, DataInput input) throws IOException
Reads a tree from input in binary fashion that had been written by writeTree(…).
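A hedged usage sketch: dumping an individual’s first tree to an already-established log, once in the pretty form and once in the form readTree(…) can parse back in (the log number is whatever you set up via ECJ’s Output; the class and method names here are made up):

import ec.*;
import ec.gp.*;

public class TreePrintingSketch {
    static void dump(EvolutionState state, GPIndividual ind, int log) {
        ind.trees[0].printTreeForHumans(state, log);  // pretty, but not re-readable
        ind.trees[0].printTree(state, log);           // readable by readTree(..., LineNumberReader)
    }
}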
5.2.8.1 Pretty-Printing Trees
GPTrees have a particular gizmo that’s not well known but is quite nice: you can print out GPTrees in one of (at present) four styles:
• Lisp (the default):
(* (+ x (- (% x x) (cos x))) (exp x))
• A style easily converted to C, C++, Java, or C#:
(x + ((x % x) - cos(x))) * exp(x)
To print out trees this way you’d use a parameter along these lines (notice the lower-case “c”):
pop.subpop.0.species.ind.tree.0.print-style = c
…or using the default parameter base:
gp.tree.print-style = c
Printing in C-style has two options. First, by default ECJ prints out two-child GPNodes as if they were operators “b a c” rather than as “a(b, c)”. This is what’s being done above. But if you’re not using mathematical operators and would prefer to see 2-child GPNodes as functions, you can do it like this:
pop.subpop.0.species.ind.tree.0.c-operators = false
…or using the default parameter base:
gp.tree.c-operators = false
This results in the following:
*(+(x, -(%(x, x), cos(x))), exp(x))
This doesn’t seem useful for the example here (Symbolic Regression) but for other problems it’s probably the right thing to do, particularly if all the GPNodes aren’t operators. Additionally, by default ECJ prints out zero-child GPNodes as constants, as in “a”, rather than as zero-argument functions, as in “a()”. If you’d prefer zero-argument functions, you might say:
pop.subpop.0.species.ind.tree.0.c-variables = false
…or using the default parameter base:
gp.tree.c-variables = false
This results in the following:
(x() + ((x() % x()) - cos(x()))) * exp(x())
Again, whether this will be useful to you is based on exactly what kind of code you’re emitting. ECJ does not at present have support for converting if-statements (such as the “if-food-ahead” node in Artificial Ant) into a brace format appropriate to C. But hopefully these options will help you get most of the ugly parsing work out of the way.
• .dot format: used by dot/GraphViz to produce high-quality trees and graphs. The code below produces the tree shown at left in Figure 5.5:
digraph g {
node [shape=rectangle];
n[label = "*"];
n0[label = "+"];
n00[label = "x"];
n0 -> n00;
n01[label = "-"];
n010[label = "%"];
n0100[label = "x"];
n010 -> n0100;
n0101[label = "x"];
n010 -> n0101;
n01 -> n010;
n011[label = "cos"];
n0110[label = "x"];
n011 -> n0110;
n01 -> n011;
n0 -> n01;
n -> n0;
n1[label = "exp"];
n10[label = "x"];
n1 -> n10;
n -> n1;
}
Figure 5.5 Auto-generated trees: in “.dot” format (left) and in LaTeX format (right)
To generate trees in .dot format, you’d say:
pop.subpop.0.species.ind.tree.0.print-style = dot
…or using the default parameter base:
gp.tree.print-style = dot
• LaTeX format, which emits the following code:
\begin{bundle}{\gpbox{*}}\chunk{\begin{bundle}{\gpbox{+}}\chunk{\gpbox{x}}
\chunk{\begin{bundle}{\gpbox{-}}\chunk{\begin{bundle}{\gpbox{%}}
\chunk{\gpbox{x}}\chunk{\gpbox{x}}\end{bundle}}\chunk{\begin{bundle}
{\gpbox{cos}}\chunk{\gpbox{x}}\end{bundle}}\end{bundle}}\end{bundle}}
\chunk{\begin{bundle}{\gpbox{exp}}\chunk{\gpbox{x}}\end{bundle}}\end{bundle}
To generate trees in LaTeX format, you’d say:
pop.subpop.0.species.ind.tree.0.print-style = latex
…or using the default parameter base:
gp.tree.print-style = latex
This code works with the LaTeX ecltree and fancybox packages to produce a tree. Note that you’ll have to replace the “%” with “\%” to make it legal LaTeX. The code works with the following boilerplate to produce the tree shown at right in Figure 5.5.
\documentclass[]{article}
\usepackage{epic} % required by ecltree and fancybox packages
\usepackage{ecltree} % to draw the GP trees
\usepackage{fancybox} % required by \Ovalbox
\begin{document}
% minimum distance between nodes on the same line
\setlength{\GapWidth}{1em}
% draw with a thick dashed line, very nice looking
\thicklines \drawwith{\dottedline{2}}
% draw an oval and center it with the rule. You may want to fool with the
% rule values, though these seem to work quite well for me. If you make the
% rule smaller than the text height, then the GP nodes may not line up with
% each other horizontally quite right, so watch out.
\newcommand{\gpbox}[1]{\Ovalbox{#1\rule[-.7ex]{0ex}{2.7ex}}}
% And now the code. Note that % has been replaced with \%
\begin{bundle}{\gpbox{*}}\chunk{\begin{bundle}{\gpbox{+}}\chunk{\gpbox{x}}
\chunk{\begin{bundle}{\gpbox{-}}\chunk{\begin{bundle}{\gpbox{\%}}
\chunk{\gpbox{x}}\chunk{\gpbox{x}}\end{bundle}}\chunk{\begin{bundle}
{\gpbox{cos}}\chunk{\gpbox{x}}\end{bundle}}\end{bundle}}\end{bundle}}
\chunk{\begin{bundle}{\gpbox{exp}}\chunk{\gpbox{x}}\end{bundle}}\end{bundle}
% Finally end the document
\end{document}
5.2.8.2 GPIndividuals
Okay, there’s not much in-depth here. GPIndividual implements all the standard Individual methods discussed in Section 3.2. Note that the distanceTo(…) method is not implemented. Two methods you might want to be aware of:
ec.gp.GPIndividual Methods
public final void verify(EvolutionState state)
An auxiliary debugging method which verifies many features of the structure of the GPIndividual and all of its GPTrees (and their GPNodes). This method isn’t called by ECJ but has proven useful in determining errors in GPTree construction by various tree building or breeding algorithms.
public long size()
By default, returns the number of nodes in all the trees held by the GPIndividual.
5.2.9 Ephemeral Random Constants
An Ephemeral Random Constant or ERC [3] is a special GPNode, usually a terminal, which represents a constant such as 3.14159 or true or 724 or the complex number 3.24 + 5i or “aegu”. Usually ERCs are used to add constants to programs which rely on math (such as Symbolic Regression).
An ERC needs to be able to do three things:
• Set itself to a random value when first created.
• Mutate to a new value when asked to do so.
• Stay fixed at that value all other times (as a constant).
In ECJ, ERCs are usually subclasses of ec.gp.ERC, an abstract superclass which provides basic functionality. For example, if we were building a subclass of ERC which represents a floating point numerical constant, we might add a single instance variable to hold its value:
package ec.app.myapp;
import ec.gp.*;
public class MyERC extends ERC {
public double value;
// other methods go here…
}
We’ll also probably need to implement some or most of the following methods to modify, read, write, or compare the ERC.
public String name();
public boolean nodeEquals(GPNode node);
public int nodeHashCode();
public void resetNode(EvolutionState state, int thread);
public void mutateERC(EvolutionState state, int thread);
public String toStringForHumans();
public String encode();
public boolean decode(DecodeReturn ret);
public void readNode(EvolutionState state, DataInput input) throws IOException
public void writeNode(EvolutionState state, DataOutput output) throws IOException
public abstract void eval(EvolutionState state, int thread, GPData input,
    ADFStack stack, GPIndividual individual, Problem problem);
Let’s go through these in turn:
public String name();
Unlike GPNode’s version of this method (which just calls toString()), ERC provides its own implementation. When an ERC prints itself out in various ways, it writes its name, then its value. By default the name is simply “ERC”, and if you have only one ERC type, you don’t have to override it. For example, an ERC printed out might be ERC[3.14159]. However if you have more than one class of ERCs, holding different kinds of values, you’ll want to distinguish them both for humans and for ECJ to read back in again. To do this, override name() to return a unique symbol for each kind of ERC. For example, you might just return “ERC1” versus “ERC2”, resulting in ERC1[3.14159] versus ERC2[921].
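If you do need this, the override is a one-liner; for instance (the class name here is hypothetical):
public String name() { return "MyERC"; }   // this ERC now prints out as, e.g., MyERC[0.73]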
public boolean nodeEquals(GPNode node);
public int nodeHashCode();
Override the first method to test for equality with the second node: it’s the same kind of ERC, the same class, has the same values, etc. Override the second method to provide a hash code for the ERC based on its type and the values it contains. By default you can avoid implementing this second method, and just implement the encode() method, discussed later. The default version of nodeHashCode() calls encode() and then hashes the String.
public void resetNode(EvolutionState state, int thread);
public void mutateERC(EvolutionState state, int thread);
Override the first method to entirely randomize the value of the ERC. Override the second method to mutate the value of the ERC when called to do so by the MutateERCPipeline. By default, mutateERC(…) just calls resetNode(…), which is probably not what you want. Instead the mutation will likely need to be a small deviation from the current value.
public String toStringForHumans();
public String encode();
public boolean decode(DecodeReturn ret);
public void readNode(EvolutionState state, DataInput input) throws IOException
public void writeNode(EvolutionState state, DataOutput output) throws IOException
As usual, override toStringForHumans() to provide a pretty version of the ERC for human consumption, used by GPNode’s printer functions. The default version just calls toString(). You probably want to write something prettier. The encode() and decode(…) methods are supposed to use the Code package (Section 2.2.3) to encode and decode the ERC in a reasonable fashion.
Finally, the readNode(…) and writeNode(…) methods, as usual, read and write the ERC in binary fashion. You only need to implement these methods if you’re planning on writing over the network (such as using distributed evaluation or island models). But they’re easy so why not? And of course, there’s the method to actually execute the ERC as code. This typically returns the ERC’s internal value (it’s a constant after all):
public abstract void eval(EvolutionState state, int thread, GPData input,
ADFStack stack, GPIndividual individual, Problem problem);
Note that ERCs by default override the expectedChildren() method to return 0 (they are usually terminals). If for some reason your ERC is not a terminal, you’ll want to override this method to return the right number of children.
Example Let’s create an ERC and add it to our existing example from Section 5.2.6. We’ll make an ERC which represents constants between 0.0 and 1.0, not including 1.0. Our mutator will add a little Gaussian noise to the node. Here’s a full class:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
import ec.util.*;
import java.io.*;

public class MyERC extends ERC {
    public double value;

    public String toStringForHumans() { return "" + value; }

    public String encode() { return Code.encode(value); }

    public boolean decode(DecodeReturn dret) {
        int pos = dret.pos;
        String data = dret.data;
        Code.decode(dret);
        if (dret.type != DecodeReturn.T_DOUBLE)  // uh oh! Restore and signal error.
            { dret.data = data; dret.pos = pos; return false; }
        value = dret.d;
        return true;
    }

    public boolean nodeEquals(GPNode node)
        { return (node.getClass() == this.getClass() && ((MyERC)node).value == value); }

    public void readNode(EvolutionState state, DataInput input) throws IOException
        { value = input.readDouble(); }

    public void writeNode(EvolutionState state, DataOutput output) throws IOException
        { output.writeDouble(value); }

    public void resetNode(EvolutionState state, int thread)
        { value = state.random[thread].nextDouble(); }  // uniform in [0.0, 1.0)

    public void mutateERC(EvolutionState state, int thread) {
        double v;
        do v = value + state.random[thread].nextGaussian() * 0.01;  // small Gaussian jiggle
        while (v < 0.0 || v >= 1.0);                                // stay inside [0.0, 1.0)
        value = v;
    }

    public void eval(EvolutionState state, int thread, GPData input, ADFStack stack,
            GPIndividual individual, Problem problem)
        { ((MyData)input).val = value; }
}
Now let’s set up the parameters to use it. We’ll change the function set. Our ERC is a terminal so it takes no arguments: we’ll use nc0 as its constraints.
# Our Function Set
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.size = 6
gp.fs.0.func.0 = ec.app.myapp.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = ec.app.myapp.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = ec.app.myapp.Mul
gp.fs.0.func.2.nc = nc2
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = ec.app.myapp.Sin
gp.fs.0.func.4.nc = nc1
gp.fs.0.func.5 = ec.app.myapp.MyERC
gp.fs.0.func.5.nc = nc0
… and we’re done.
5.2.10 Automatically Defined Functions and Macros
Automatically Defined Functions (ADFs) [4] are a standard way of creating some modularism in Genetic Programming. They define multiple trees in the GPIndividual and essentially define a function calling structure where certain trees can call other trees as subfunctions. ADFs are the primary reason why GPIndividual has multiple GPTrees.
The simplest kind of ADF is found in Figure 5.6. Here each ADF function is a terminal, and when it is evaluated, it simply evaluates the corresponding ADF tree, then returns the tree’s return value. The ADF function is a GPNode, an instance of ec.gp.ADF. It’s very rare to further specialize this class. Notice that calling is nested — an ADF can call another ADF and so on. However it’s not very common to have recursive calls because you’ll need to construct some kind of stopping base-case criterion to avoid infinite recursive loops.
ADFs add two more parameters to the standard GPNode suite. Let’s say we’re adding a zero-argument ADF. Beyond the node constraints, we also need to specify which tree will be called when the ADF is executed, and also a simple name for the ADF to distinguish it from other ADFs and GPNodes. Ideally this name should only have lowercase letters, numbers, and hyphens (that is, “Lisp-style”):
Figure 5.6 A GPIndividual with two no-argument (terminal) ADFs. The primary GP Tree has functions in its function set that can call the other two trees; in turn the ADF 0 tree has a function in its function set that can call the ADF 1 tree. The return values of the various ADF trees become the return values of their respective calling functions.
gp.fs.0.func.5 = ec.gp.ADF
gp.fs.0.func.5.nc = nc0
gp.fs.0.func.5.tree = 1
gp.fs.0.func.5.name = ADF0
It’s traditional that the first tree be called ADF0, the second ADF1, and so on. Since typically the first GP Tree in the array is the “main” tree, and the second tree is ADF0, this means that ADF0’s associated tree number is usually 1, and ADF1’s associated tree number is usually 2. If you don’t specify a name, ECJ will maintain this tradition by setting the name to “ADF” + (tree number − 1), which is usually right anyway. It’ll also issue a warning.
The name of the ADF (in this case “ADF0”) and its associated GP tree (in this case, tree 1) are stored in the ec.gp.ADF class like this:
public int associatedTree;
public String name;
The name() method returns the value of the name variable.
ADFs and ADFArguments override the checkConstraints(…) method to do a lot of custom advanced checking. If you override this method yourself, be sure to call super.checkConstraints(…) as well.
We’ll show how to set up the ADF tree itself in the Example below.
ADF functions can also take arguments. In Figure 5.7, we have an ADF with two arguments. The way this works is as follows: when an ADF function is called, we first evaluate its children, then hold their return values in storage. We then call the corresponding ADF tree. In that tree there may be one or more special terminal GPNodes, instances of ec.gp.ADFArgument, of two kinds: instances of the first kind, when evaluated, return the value of the first child, while instances of the second kind return the value of the second child. This enables the ADF tree to use arguments in its “function call”, so to speak.
Figure 5.7 A GPIndividual with one 2-argument ADF. The primary GP Tree has a function in its function set that can call the ADF tree. This function first evaluates its children, then executes the ADF tree. In the ADF tree there are two terminal functions (ARG 1 and ARG 2) which, when evaluated, return the values of the two children respectively. The return value of the ADF tree becomes the return value of the ADF function.
ADFArguments add one additional parameter: the child number associated with the argument. For example:
gp.fs.1.func.6 = ec.gp.ADFArgument
gp.fs.1.func.6.nc = nc0
gp.fs.1.func.6.arg = 0
gp.fs.1.func.6.name = ARG0
If you don’t specify a name, ECJ will set the name to “ARG” + arg number, which is usually right anyway. It’ll also issue a warning.
The ADFArgument’s name and argument number are stored in the ec.gp.ADFArgument class as:
public int argument;
public String name;
Again, the name() method returns the value of the name variable.
ADFArguments are always terminals, and so they override the expectedChildren() method to return 0.
ECJ also supports Automatically-Defined Macros (or ADMs), described in [20]. These differ from ADFs only in when the children are evaluated. When an ADM node is evaluated, its children are not evaluated first; rather the ADM immediately calls its associated tree. When an argument node in that tree (again, a terminal) is evaluated, we teleport back to the associated child and evaluate it right then and there, then return its value. Note that this means that children may never be evaluated; or can be evaluated multiple times, as shown in Figure 5.8.
ADMs are just like ADFs in their parameters:
Figure 5.8 A GPIndividual with one 2-argument ADM. The primary GP Tree has a function in its function set that can call the ADM tree. This function delays the evaluation of its children, and immediately executes the ADM tree. In the ADM tree there are two terminal functions (ARG 1 and ARG 2) which, when evaluated, evaluate (or, as necessary, re-evaluate) the original children to the ADM function and then return their values. Notice that in this example child #1 may be evaluated twice, and child #2, depending on whether there’s food ahead, may never be evaluated.
gp.fs.0.func.6 = ec.gp.ADM
gp.fs.0.func.6.nc = nc2
gp.fs.0.func.6.tree = 1
# This will be called “ADM1”
gp.fs.0.func.6.name = ADM1
If an ADF or ADM tree has arguments, it probably will require its own separate GPTreeConstraints, because it needs to have its own GPFunctionSet with those arguments defined. See the Example below.
5.2.10.1 About ADF Stacks
ADFs and ADMs are a bit complex. To do their magic, they need a special object called an ec.gp.ADFStack. This is actually two stacks of ec.gp.ADFContext objects which store the location and current return values of various children. These classes are almost never overridden: here are the standard (default) parameters for them:
gp.problem.stack = ec.gp.ADFStack
gp.adf-stack.context = ec.gp.ADFContext
You could have defined it this way…
eval.problem.stack = ec.gp.ADFStack
eval.problem.stack.context = ec.gp.ADFContext
… but there are advantages in using the default parameters, particularly when getting to Grammatical Evolution (Section 5.3).
Example ADFs and ADMs are fairly straightforward to implement but they can require a fair number of parameters. Continuing with the example started in Section 5.2.6 and extended in Section 5.2.9, let’s add a 2-argument ADF to the individual. This will require adding a second GPTree and its own GPTreeConstraints.
Let’s begin by modifying the GPFunctionSet of the original tree to include this ADF:
# Our Function Set
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.size = 7
gp.fs.0.func.0 = ec.app.myapp.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = ec.app.myapp.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = ec.app.myapp.Mul
gp.fs.0.func.2.nc = nc2
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = ec.app.myapp.Sin
gp.fs.0.func.4.nc = nc1
gp.fs.0.func.5 = ec.app.myapp.MyERC
gp.fs.0.func.5.nc = nc0
gp.fs.0.func.6 = ec.gp.ADF
gp.fs.0.func.6.nc = nc2
gp.fs.0.func.6.tree = 1
gp.fs.0.func.6.name = ADF1
Let’s create a second function set for our second (ADF) tree. This set will have all the same functions as the main tree, except for the ADF function (we don’t want to call ourselves recursively!). Instead we’ll add two ADFArgument nodes to represent the two children.
gp.fs.size = 2
# Our Second Function Set
gp.fs.1 = ec.gp.GPFunctionSet
gp.fs.1.name = f1
gp.fs.1.size = 8
gp.fs.1.func.0 = ec.app.myapp.X
gp.fs.1.func.0.nc = nc0
gp.fs.1.func.1 = ec.app.myapp.Y
gp.fs.1.func.1.nc = nc0
gp.fs.1.func.2 = ec.app.myapp.Mul
gp.fs.1.func.2.nc = nc2
gp.fs.1.func.3 = ec.app.myapp.Sub
gp.fs.1.func.3.nc = nc2
gp.fs.1.func.4 = ec.app.myapp.Sin
gp.fs.1.func.4.nc = nc1
gp.fs.1.func.5 = ec.app.myapp.MyERC
gp.fs.1.func.5.nc = nc0
gp.fs.1.func.6 = ec.gp.ADFArgument
gp.fs.1.func.6.nc = nc0
gp.fs.1.func.6.arg = 0
gp.fs.1.func.6.name = ARG0
gp.fs.1.func.7 = ec.gp.ADFArgument
gp.fs.1.func.7.nc = nc0
gp.fs.1.func.7.arg = 1
gp.fs.1.func.7.name = ARG1
Now we create a new GPTreeConstraints which uses this function set:
gp.tc.size = 2
# Our Second Tree Constraints
gp.tc.1 = ec.gp.GPTreeConstraints
gp.tc.1.name = tc1
gp.tc.1.fset = f1
gp.tc.1.returns = nil
gp.tc.1.init = ec.gp.koza.HalfBuilder
Next we add the second tree to the GPIndividual:
pop.subpop.0.species.ind.numtrees = 2
pop.subpop.0.species.ind.tree.1 = ec.gp.GPTree
pop.subpop.0.species.ind.tree.1.tc = tc1
The ADF stack and ADF context were already defined in the previous examples, but we’ll do it again here for clarity:
gp.problem.stack = ec.gp.ADFStack
gp.adf-stack.context = ec.gp.ADFContext
…and we’re done!
An important note: because the main GP Tree and the ADF have different function sets and thus different GPTreeConstraints, standard GP Crossover (see Section 5.2.5) won’t cross over the ADF tree with a main tree of some other individual or vice versa. But if, for example, your GPIndividual had two ADF trees that had the same GPTreeConstraints, they could get crossed over with arbitrary other ADF trees in other GPIndividuals.
5.2.11 Strongly Typed Genetic Programming
Sometimes it’s useful to constrain which GPNodes may serve as children of other GPNodes, or the root of the GPTree. For example, consider a GPNode called (if test then else). This node evaluates test, and based on its result (true or false) it either evaluates and returns then or else. Let’s presume that then and else (and if) return doubles. On the other hand, test is intended to return a boolean. So you’ll need to have some GPNodes in your function set which return doubles and others which return booleans; the (if …) node itself returns a double.
The problem is not that you have nodes which return different values — this is easily handled by hacking your GPData object. The problem is that you now have constraints on valid tree structures: you can’t plug a node which returns a double (say, (sin …)) into the test slot of your (if …) node, which is expecting a boolean.6 This is where strong typing comes in.
ECJ’s typing system is simple but sufficient for many common uses. It’s not as sophisticated as a full polymorphic typing system but it also doesn’t have the hair-raising complexity that such a system requires. ECJ is complex enough as it is thank you very much!
ECJ’s system is based on type objects, subclasses of the abstract superclass ec.gp.GPType, and nodes are allowed to connect as parent and child if their corresponding type objects are type compatible. ECJ’s type objects come in two kinds: atomic types and set types. An atomic type is just a single object (in fact, it’s theoretically just a symbol, or in some sense, an integer). A set type is a set of atomic types. Type compatibility is as follows:
• Two atomic types are compatible if they are the same.
• A set type is compatible with an atomic type if it contains the atomic type in its set.
• Two set types are compatible if their intersection is nonempty.
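To make these rules concrete, here is a minimal standalone sketch of the three compatibility tests, representing atomic types as plain ints and set types as Java Sets. This is purely illustrative and is not ECJ’s actual GPType API (that API is described in Section 5.2.11.1):

import java.util.Set;

// Illustrative sketch only: atomic types are ints, set types are Sets of ints.
public class TypeCompatibilitySketch {
    // Two atomic types are compatible if they are the same.
    static boolean compatible(int atomicA, int atomicB) {
        return atomicA == atomicB;
    }

    // A set type is compatible with an atomic type if it contains that atomic type.
    static boolean compatible(Set<Integer> setType, int atomic) {
        return setType.contains(atomic);
    }

    // Two set types are compatible if their intersection is nonempty.
    static boolean compatible(Set<Integer> setA, Set<Integer> setB) {
        for (Integer t : setA)
            if (setB.contains(t)) return true;
        return false;
    }
}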
Every GPNode is assigned a type object to represent the “return type” of the GPNode. Furthermore every nonterminal GPNode is assigned a type object for each of its children: this is called the “child type” or “argument type” of that particular child slot. Last, the GPTree itself is assigned a “root type”: a type for the root of the tree. Each GPTree in a GPIndividual can have a different root type. Here’s what must be true about any given GPTree:
• For any parent and child in a tree, with the child in slot C, the return type of the child must be compatible with the child type of the parent for slot C.
• The return type of the root GPNode must be compatible with the GPTree’s root type. This ensures that the tree returns a value of the type expected of it.
You can see an example of a GPTree with type constraints listed in Figure 5.9.
Every GPNodeBuilder and GP Breeding Pipeline must maintain the constraints guaranteed by typing. The issue is guaranteeing that if you replace one GPNode with another, the second GPNode will be legal. This is done primarily with the following two utility functions:
ec.gp.GPNode Methods
public GPType parentType(GPInitializer initializer)
If the GPNode’s parent is another GPNode, this returns the type of the parent’s child slot presently filled by the GPNode. If the GPNode’s parent is a GPTree, this returns the type of the tree’s root.
6Well, you could if you assumed that 0.0 was false and anything else was true. But this is a hack. The right way to do it is to constrain things properly.
Figure 5.9 A typed genetic programming parse tree. Each edge is labeled with two types. The “lower” type is the return type of the child node. The “upper” type is the type of the argument slot of the parent. For the child to fit into that particular argument slot, the types must be compatible. Types of the form “int, float” are set types. All others are atomic types. Note that the root of the tree plugs into a particular slot of the “tree” object (ec.gp.GPTree) , which itself has a slot type. A repeat of Figure 1.4.
public final boolean swapCompatibleWith(GPInitializer initializer, GPNode node)
Returns true if swapping the GPNode into the slot presently occupied by node is acceptable, type-wise.
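For instance, a custom breeding operator might use swapCompatibleWith(…) to vet a candidate swap before performing it. The sketch below assumes node1 and node2 are GPNodes sitting in existing, valid trees and that state is the current EvolutionState; the SwapCheckSketch class itself is just an illustration built on the two documented calls above:

import ec.EvolutionState;
import ec.gp.*;

// Sketch only: a helper a custom breeding operator might use to vet a candidate swap.
public class SwapCheckSketch {
    // Returns true if node1 may legally be placed into the slot node2 currently occupies.
    static boolean canSwap(EvolutionState state, GPNode node1, GPNode node2) {
        GPInitializer initializer = (GPInitializer)(state.initializer);
        return node1.swapCompatibleWith(initializer, node2);
    }
}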
Before you can assign types, you’ll need to define them. Each type is given a unique symbol. As an example, let’s begin by creating two atomic types, called (uninterestingly) “boolean” and “nil” (we say “nil” instead of “double” so we don’t have to redefine the GPTreeConstraints and GPNodeConstraints we defined earlier, which all use “nil” as their types). We’d say:
gp.type.a.size = 2
gp.type.a.0.name = boolean
gp.type.a.1.name = nil
We might also define a set type or two. For fun, let’s create a set type which contains both booleans and doubles. We’ll also have to stipulate the atomic types encompassed by the set type:
gp.type.s.size = 1
gp.type.s.0.name = boolean-or-nil
gp.type.s.0.size = 2
gp.type.s.0.member.0 = boolean
gp.type.s.0.member.1 = nil
What’s the point of set types, you might ask. Why not just atomic types? Set types are particularly useful for describing situations where a given GPNode can adapt to several different kinds of children in a given
slot. This is particularly useful for simulating notions of subtyping or subclassing. For example, a GPNode like (sin … ) might declare that its child can either be an “integer” or a “double”, by having its child type defined as a set type of “number” which encapsulates both integers and doubles.
What this typing facility cannot do is dynamically change types as necessary. For example, you cannot say that a GPNode like (+ … …) returns a double if either of its children is of type double, but if both of its children are of type integer, then it returns an integer.
You are also restricted to a finite number of types. For example, consider a GPNode called (matrix-multiply … …) which takes two children which return matrices. You cannot say that if the left child is an MxN matrix, and your right child is an NxP matrix, that the return type will be an MxP matrix. This is partly because you can’t define dynamic typing, but it’s also because M, N, and P can be any of an infinite number of numbers, resulting in an infinite number of types: specifying an infinite number of types would be hard on your fingers. There exist polymorphic typing systems for genetic programming but they’re fairly bleeding-edge. If you need things like this, I suggest instead to look at Grammatical Evolution (Section 5.3).
Example Let’s add some typing to the example we started in Section 5.2.6 and continued in Sections 5.2.9 and 5.2.10. We’ll add a boolean type as before, and a few functions which rely on it.
First the boolean type:
gp.type.a.size = 2
gp.type.a.0.name = boolean
gp.type.a.1.name = nil
gp.type.s.size = 1
gp.type.s.0.name = boolean-or-nil
gp.type.s.0.size = 2
gp.type.s.0.member.0 = boolean
gp.type.s.0.member.1 = nil
Next we need to say what kinds of nodes are permitted as the root of the tree. We do this by specifying the tree’s return type. For a node to be permitted as the root of the tree, it must have a return type compatible with this tree return type. Let’s say that we want our tree to only have root nodes which have a return type of nil (no boolean allowed):
gp.tc.size = 1
gp.tc.0 = ec.gp.GPTreeConstraints
gp.tc.0.name = tc0
gp.tc.0.fset = f0
# Here we define the return type of this GPTreeConstraints
gp.tc.0.returns = nil
As it so happens, this is what we already have by default, so putting it all here is a bit redundant.
Next we’ll need to modify the node constraints for each GPNode, so we specify the return type of each of our nodes and also the expected child types of their child slots. For the node constraints, let’s add three new GPNodeConstraints:
• A function which returns a double (nil), has three children, and the first child needs to be a boolean. The other two children are nil. This would be for things like the (if … … …) node.
• A function which takes two booleans and returns a boolean. This would be for functions like (nand … …).
• A function which takes two doubles and returns a boolean. This would be for functions like (> … …).
These three new GPNodeConstraints would be:
gp.nc.size = 6
# … first come the original node constraints, then these:
# Example: (if … … …)
gp.nc.3 = ec.gp.GPNodeConstraints
gp.nc.3.name = nc3
gp.nc.3.returns = nil
gp.nc.3.size = 3
gp.nc.3.child.0 = boolean
gp.nc.3.child.1 = nil
gp.nc.3.child.2 = nil
# Example: (nand … …)
gp.nc.4 = ec.gp.GPNodeConstraints
gp.nc.4.name = nc4
gp.nc.4.returns = boolean
gp.nc.4.size = 2
gp.nc.4.child.0 = boolean
gp.nc.4.child.1 = boolean
# Example: (> … …)
gp.nc.5 = ec.gp.GPNodeConstraints
gp.nc.5.name = nc5
gp.nc.5.returns = boolean
gp.nc.5.size = 2
gp.nc.5.child.0 = nil
gp.nc.5.child.1 = nil
Recall that our GPData looks like this:
package ec.app.myapp;
import ec.gp.*;
public class MyData extends GPData
{
public double val;
public GPData copyTo(GPData other)
{ ((MyData)other).val = val; return other; }
}
We could modify this to include a boolean data type as well, but we’ll just use the val variable to store both boolean and real-valued data. Before you go about that, re-read Section 5.2.3.1 (on GPData) and its discussion about clone() and copyTo(…).
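If you did prefer a separate boolean field rather than overloading val, the modification might look something like the sketch below. The bool field here is a hypothetical addition, and note that copyTo(…) must copy every field:

package ec.app.myapp;
import ec.gp.*;

public class MyData extends GPData
    {
    public double val;       // real-valued results
    public boolean bool;     // hypothetical extra field for boolean results

    public GPData copyTo(GPData other)
        {
        ((MyData)other).val = val;
        ((MyData)other).bool = bool;   // don't forget to copy the new field too
        return other;
        }
    }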
Now let’s define our three new GPNodes. First our If-statement:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class If extends GPNode {
    public String toString() { return "if"; }

    public void eval(EvolutionState state, int thread, GPData input,
            ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        children[0].eval(state, thread, data, stack, individual, problem);
        if (data.val != 0.0) // true
            children[1].eval(state, thread, data, stack, individual, problem);
        else
            children[2].eval(state, thread, data, stack, individual, problem);
        // the result is now stored in data
    }
}
Next, the nand node:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class Nand extends GPNode {
    public String toString() { return "nand"; }

    public void eval(EvolutionState state, int thread, GPData input,
            ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        children[0].eval(state, thread, data, stack, individual, problem);
        boolean left = (data.val != 0.0);
        children[1].eval(state, thread, data, stack, individual, problem);
        boolean right = (data.val != 0.0);
        data.val = !(left && right) ? 1.0 : 0.0;
    }
}
Next, the > node:
package ec.app.myapp;
import ec.*;
import ec.gp.*;
public class GreaterThan extends GPNode {
    public String toString() { return ">"; }

    public void eval(EvolutionState state, int thread, GPData input,
            ADFStack stack, GPIndividual individual, Problem problem) {
        MyData data = (MyData) input;
        children[0].eval(state, thread, data, stack, individual, problem);
        double left = data.val;
        children[1].eval(state, thread, data, stack, individual, problem);
        double right = data.val;
        data.val = (left > right) ? 1.0 : 0.0;
    }
}
Notice that these functions hijack the double value to store boolean information. This is okay because we know that the recipient of this information will understand it. How do we know? Because the typing constraints have made it impossible to be otherwise.
So let’s add these functions to the function set of our main GP Tree:
# Our Main Tree Function Set
gp.fs.0 = ec.gp.GPFunctionSet
gp.fs.0.size = 10
gp.fs.0.func.0 = ec.app.myapp.X
gp.fs.0.func.0.nc = nc0
gp.fs.0.func.1 = ec.app.myapp.Y
gp.fs.0.func.1.nc = nc0
gp.fs.0.func.2 = ec.app.myapp.Mul
gp.fs.0.func.2.nc = nc2
gp.fs.0.func.3 = ec.app.myapp.Sub
gp.fs.0.func.3.nc = nc2
gp.fs.0.func.4 = ec.app.myapp.Sin
gp.fs.0.func.4.nc = nc1
gp.fs.0.func.5 = ec.app.myapp.MyERC
gp.fs.0.func.5.nc = nc0
gp.fs.0.func.6 = ec.gp.ADF
gp.fs.0.func.6.nc = nc2
gp.fs.0.func.6.tree = 1
gp.fs.0.func.6.name = ADF1
gp.fs.0.func.7 = ec.app.myapp.If
gp.fs.0.func.7.nc = nc3
gp.fs.0.func.8 = ec.app.myapp.Nand
gp.fs.0.func.8.nc = nc4
gp.fs.0.func.9 = ec.app.myapp.GreaterThan
gp.fs.0.func.9.nc = nc5
…and we’re done!
Mixing ADFs, ADMs, and Typed GP A quick note. The return type of an ADF node must match the root type of its corresponding ADF tree. Additionally, the child type of a certain slot in an ADF node must match the return type of the corresponding ADFArgument.
5.2.11.1 Inside GPTypes
If you want to create a GPNodeBuilder or a GP Breeding Pipeline, you ought to understand GPTypes in a bit more detail.
ec.gp.GPType is an abstract superclass of ec.gp.GPAtomicType and ec.gp.GPSetType, which define the atomic and set types respectively. The two basic data elements in a GPType are:
public String name;
public int type;
The first variable, like GPFunctionSet, GPTreeConstraints, and GPNodeConstraints, holds the name of the type (as defined in the parameters). The second variable holds a uniquely-assigned integer for this type. The important feature for types is to determine whether they are type-compatible with one another. The compatibility function is this:
ec.gp.GPType Methods
public boolean compatibleWith(GPInitializer initializer, GPType type)
Returns true if this type is type-compatible with the given type.
GPAtomicTypes are simple: they are compatible with one another if their type integer is the same. A GPSetType instead is a set of GPAtomicTypes, stored in different ways for query convenience:
public Hashtable types_h;
public int[] types_packed;
public boolean[] types_sparse;
The first holds the GPAtomicTypes in the set, stored in a Hashtable. The second is an array of the GPAtomicTypes. And the third is an array of booleans, one for each GPAtomicType number, which is true if that GPAtomicType is a member of the set.
A GPSetType is compatible with a GPAtomicType if the GPSetType contains the GPAtomicType as an element. Two GPSetTypes are compatible with one another if their intersection is nonempty.
5.2.12 Parsimony Pressure (The ec.parsimony Package)
Genetic programming has a serious bloat problem: as evolution progresses, the sizes of the trees in the population tend to grow without bound. This is a problem that exists for various arbitrary-length representations (lists, graphs, rulesets, etc.) but genetic programming has studied it the most.
The most common simple way of keeping trees down is to make it illegal to produce a tree larger than a certain depth. For example, Koza’s standard rules, adhered to by the basic parameters in ECJ, stipulate that crossover and mutation operators may not produce a child which is deeper than 17 nodes [3], for example:
gp.koza.xover.tries = 1
gp.koza.xover.maxdepth = 17
Here if the crossover operation produces a child deeper than 17, it is not forwarded on; rather its (presumably smaller) parent is forwarded on in its stead. This is a fairly crude approach, but it’s fairly effective. Another approach — which can be done at the same time — is to modify the selection operator to favor smaller individuals. This notion is called parsimony pressure.
In the ec.parsimony package, ECJ has several SelectionMethods which select both based on fitness and on size (smaller size being preferred). These methods compute size based on GPNode’s size() function. Many of these selection methods were compared and discussed at length in [10]. Here they are:
• ec.parsimony.LexicographicTournamentSelection is a straightforward TournamentSelection operator, except that the fitter Individual is preferred (as usual), but when both Individuals have the same fitness, the smaller Individual is preferred. Parameters for this operator are basically the same as for TournamentSelection.
Let us presume that the SelectionMethod is the first source of the pipeline of Subpopulation 0. Then the basic parameters are the same as in TournamentSelection:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.LexicographicTournamentSelection
pop.subpop.0.species.pipe.source.0.size = 7
pop.subpop.0.species.pipe.source.0.pick-worst = false
Or using the default parameters:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.LexicographicTournamentSelection
select.lexicographic-tournament.size = 7
select.lexicographic-tournament.pick-worst = false
The problem with this method is that bloat control only comes into effect for problems where lots of fitness ties occur. This problem led to two modifications of the basic idea:
• ec.parsimony.BucketTournamentSelection is like LexicographicTournamentSelection, except that individuals are first placed into N classes (“buckets”) based on fitness. The subpopulation is first sorted by fitness. Then the bottom 1/N of the subpopulation is placed in the worst bucket, plus any individuals remaining in the subpopulation with the same fitness as the best individual in that bucket. Next the bottom 1/N of the remaining individuals are placed in the second worst bucket, plus any individuals remaining in the subpopulation with the same fitness as the best individual in that bucket. This continues until all the individuals are exhausted. BucketTournamentSelection then works like LexicographicTournamentSelection except that instead of comparing based on fitness, it compares based on the bucket the individual is in. The number of buckets is defined by num-buckets:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.BucketTournamentSelection
pop.subpop.0.species.pipe.source.0.size = 7
pop.subpop.0.species.pipe.source.0.pick-worst = false
pop.subpop.0.species.pipe.source.0.num-buckets = 10
Or using the default parameters:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.BucketTournamentSelection
select.bucket-tournament.size = 7
select.bucket-tournament.pick-worst = false
select.bucket-tournament.num-buckets = 10
• ec.parsimony.ProportionalTournamentSelection is like TournamentSelection, except that it either selects based on fitness or selects based on size. It determines which one to do by flipping a coin: with a certain probability (fitness-prob) the tournament is based on fitness, otherwise it is based on size. The parameters are:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.ProportionalTournamentSelection
pop.subpop.0.species.pipe.source.0.size = 7
pop.subpop.0.species.pipe.source.0.pick-worst = false
pop.subpop.0.species.pipe.source.0.fitness-prob = 0.9
Or using the default parameters:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.ProportionalTournamentSelection
select.proportional-tournament.size = 7
select.proportional-tournament.pick-worst = false
select.proportional-tournament.fitness-prob = 0.9
• ec.parsimony.DoubleTournamentSelection is actually two TournamentSelections in a row. In short, we do a TournamentSelection of tournament size N based on fitness: but the entrants to that tournament are not chosen uniformly at random from the subpopulation, but rather are the winners of N other tournament selections, each performed based on size. Alternatively, we can first do tournament selections on fitness, then have a final tournament on size.
Thus there are roughly twice as many parameters: ones describing the final tournament, and ones describing the initial (“qualifying”) tournaments.
pop.subpop.0.species.pipe.source.0 = ec.parsimony.DoubleTournamentSelection
# Final tournament
pop.subpop.0.species.pipe.source.0.size = 2
pop.subpop.0.species.pipe.source.0.pick-worst = false
# Qualifying tournaments
pop.subpop.0.species.pipe.source.0.size2 = 2
pop.subpop.0.species.pipe.source.0.pick-worst2 = false
# Make the qualifying tournament based on size
pop.subpop.0.species.pipe.source.0.do-length-first = true
Or using the default parameters:
pop.subpop.0.species.pipe.source.0 = ec.parsimony.DoubleTournamentSelection
# Final tournament
select.double-tournament.size = 7
select.double-tournament.pick-worst = false
# Qualifying tournaments
select.double-tournament.size2 = 7
select.double-tournament.pick-worst2 = false
# Make the qualifying tournament based on size
select.double-tournament.do-length-first = true
• ec.parsimony.TarpeianStatistics implements the “Tarpeian” parsimony pressure method [15]. This method identifies the individuals in the subpopulation with above-average size. Notice that this may not be half the subpopulation: it could be a very small number if they are very large and the others are very small. Then a certain proportion of these individuals, picked randomly, are assigned a very bad fitness, and their evaluated flags are set. This happens before evaluation, so the evaluation procedure doesn’t bother to evaluate those individuals further.
The Tarpeian method isn’t a selection procedure: it’s a fitness assignment procedure. As such it’s not implemented as a SelectionMethod but rather as a Statistics subclass which hooks into the evolutionary loop prior to evaluation.
Let’s say that TarpeianStatistics is the only child of our primary Statistics object. The parameters would look like this:
stat.num-children = 1
stat.child.0 = ec.parsimony.TarpeianStatistics
stat.child.0.kill-proportion = 0.2
5.3 Grammatical Evolution (The ec.gp.ge Package)
Grammatical Evolution (GE) [19] is an approach to building genetic programming trees by using a list representation to choose expansions from a grammar. The trees are then evaluated and tested. In ECJ the procedure for evaluating Individuals in GE works roughly like this.
• The representation is an arbitrarily long list of ints (see Section 5.1.2), implemented as a special subclass of IntegerVectorIndividual called ec.gp.ge.GEIndividual.
• The Fitness is typically a KozaFitness.
• The GEIndividual’s species is an ec.gp.ge.GESpecies.
• The GESpecies holds a grammar, loaded from a file and produced via an ec.gp.ge.GrammarParser, which we will interpret according to the ints in the Individual.
• To assess the fitness of a GEIndividual, we hand it to the GESpecies which interprets it according to the grammar, producing a GPIndividual.
• The GPIndividual is then evaluated in the normal ECJ fashion and its Fitness is set.
• We then transfer the Fitness to the GEIndividual.
We’d like to use plain-old GP test problems to make this as easy as possible. To pull this off we have to do a few minor hacks. First, we must insert the GE conversion process in-between the Evaluator and the GPProblem that the user wrote. To do this we have a special Problem class called ec.gp.ge.GEProblem, which is assigned as the Evaluator’s problem. The GPProblem is then set up as a subsidiary of the GEProblem. When the Evaluator wishes to evaluate an IntegerVectorIndividual, it calls the GEProblem, which converts it to a GPIndividual and then hands the GPIndividual to the GPProblem to evaluate. This is done as follows:
eval.problem = ec.gp.ge.GEProblem
eval.problem.problem = ec.app.myproblem.MyGPProblem
Note that GPProblems usually require auxiliary parameters (not the least of which is their GPData), so we’ll need to also say things like…
eval.problem.problem.data = ec.app.myproblem.MyGPData
… etc. You don’t need to define the ADFStack or ADFContext specially because the default parameters for them suffice:
gp.problem.stack = ec.gp.ADFStack
gp.adf-stack.context = ec.gp.ADFContext
5.3.1 GEIndividuals, GESpecies, and Grammars
The reason we use GEIndividual instead of just IntegerVectorIndividual is simple: when we print out the GEIndividual we wish to print out the equivalent GPIndividual as well. That’s the only thing GEIndividual does beyond being just an IntegerVectorIndividual.
The GEProblem doesn’t actually do the translation from list to tree. Instead, it calls on the GESpecies to do this dirty work. First, let’s set up the GESpecies, GEIndividual, GrammarParser, and Fitness of Subpopulation 0:
pop.subpop.0.species = ec.gp.ge.GESpecies
pop.subpop.0.species.parser = ec.gp.ge.GrammarParser
pop.subpop.0.species.ind = ec.gp.ge.GEIndividual
pop.subpop.0.species.fitness = ec.gp.koza.KozaFitness
GESpecies requires a grammar file for each tree in the GPIndividual. If your GPIndividual has two trees (say), you’ll need two grammar files, which are specified like this:
pop.subpop.0.species.file.0 = foo.grammar
pop.subpop.0.species.file.1 = bar.grammar
Alternatively you can use the default parameter base:
ge.species.file.0 = foo.grammar
ge.species.file.1 = bar.grammar
Grammar files are text files consisting of lines. Each line can be blank whitespace (which is ignored), a comment (which starts with a #, just like parameter files), or a grammar rule. A grammar rule has a head, followed by whitespace, followed by the string “::=”, then more whitespace, and finally a body. The head is a single angle-bracketed symbol, and the body is one or more expansions, separated by pipe (|) symbols, each expansion being either an angle-bracketed symbol or an S-expression describing a GPNode and its children.
Terminal (leaf-node) GPNodes are defined as S-expressions with no children. For example, (yo) is a terminal GPNode whose name is yo. And a GPNode can have more than one child, each child given as an angle-bracketed symbol inside the S-expression, such as (whoa … …). You may only use a single symbol in the head of a rule. However you can have multiple rules with the same head: this says the exact same thing as separating the expansions with pipe symbols in a single rule. You are free to use pipe symbols or create separate rules with the same head, or mix and match the two. You can also have plain angle-bracket symbols in the body, which simply expand according to their own rules.
The head of the first rule in the file is the entry point of the grammar and represents the root node of the GPTree.
5.3.1.1 Strong Typing
At this point we don’t see a lot of power. But Grammatical Evolution can define all sorts of typing requirements by specifying which symbols appear where in rules. For example, we could extend this grammar to handle the Strongly-Typed Genetic Programming example shown in Section 5.2.11:
7Traditionally, Grammatical Evolution does not restrict the language for which the grammar is designed: you could plausibly have the system output a C program, for example. However to maintain compatibility with ECJ’s GP package, and thus greatly simplify the complexity of the system, ECJ’s GE system only outputs in the pseudo-lisp form typical of GP. There’s no loss of generality however.
If you have a grammar like this, you’d imagine it would need to be accompanied by GPNodes that have been appropriately typed in a strongly typed context. But it’s not true. Recall that strong typing is intended to provide constraints for building trees, mutating them, and crossing them over using traditional GP methods. But since Grammatical Evolution is doing all of this based on a list of ints, GPTypes serve no function.
As a result, you should just have a single GPType even in a “strongly typed” example such as the one above. You can have more if you like, but it serves no purpose and may trip you up.
5.3.1.2 ADFs and ERCs
Each GPNode in the grammar is looked up according to its name. Thus if you have a grammar element called (sin…), you’ll need to have a GPNode whose name() method returns sin. This goes for ERCs and ADFs as well. Typically the name() of an ERC is simply ERC, the name() of an ADF is something like ADF1, and the name() of an ADFArgument is often something like ARG0. Thus you might have a grammar that looks like:
Then a second grammar file for ADF1 might have something like:
5.3.2 Translation and Evaluation
Translation is done as follows. GESpecies is handed a GEIndividual, which is little more than an arbitrary-length IntegerVectorIndividual. You should set this GEIndividual’s min-gene and max-gene values to span the full byte range, so each gene can take on any of 256 possible settings:
pop.subpop.0.species.min-gene = -128
pop.subpop.0.species.max-gene = 127
GESpecies starts working through the int array and the grammar, depth-first. Each time GESpecies comes across a point in the grammar where multiple expansions are possible, it consults the next int in the int array. For example, if there are four possible expansions of a given head, the next int (taken modulo the number of expansions, here 4) determines which expansion is used.
This continues until one of two things happens. Either the tree (or tree forest) is completed, or we run out of ints. In the first case, the tree is finished. In the second case, processing continues again at the first int (so-called wrapping). If after some number of wrapping iterations the tree is still not completed, the translation process is halted and the fitness is simply set to the worst possible Koza Standardized Fitness (Double.MAX_VALUE).
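To make this decoding loop concrete, here is a rough standalone sketch of the “consult the next int” rule with wrapping, using a plain int array rather than ECJ’s GESpecies and grammar classes (all class, field, and method names here are hypothetical):

// Hypothetical sketch of GE-style choice decoding with wrapping; not ECJ's actual GESpecies code.
public class GEDecodeSketch {
    int position = 0;  // next unread gene
    int pass = 1;      // which pass through the genome we are on

    // Choose one of numChoices expansions using the next gene, wrapping to the start
    // of the genome when the ints run out.  Returns -1 if the allowed number of passes
    // is exhausted, in which case the caller would assign the worst possible fitness.
    int chooseExpansion(int[] genome, int numChoices, int maxPasses) {
        if (position >= genome.length) {
            if (++pass > maxPasses) return -1;
            position = 0;  // wrap around
        }
        int gene = genome[position++];
        // genes may be negative (min-gene = -128), so fold them into [0, numChoices)
        return ((gene % numChoices) + numChoices) % numChoices;
    }
}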
You specify the number of wrappings using the following parameter:
pop.subpop.0.species.passes = 4
or alternatively
ge.species.passes = 4
This tells ECJ that it may pass through the genome no more than 4 times (including the original pass). At present this value must be a power of two, and should be less than MAXIMUM_PASSES, presently set to 1024. In fact you should make it much smaller than this, perhaps 8 or 16, because with large genomes, if you have many passes, the recursion stack depth of the tree generator can be easily exceeded, which will throw an error internally and eventually result in (once again) the fitness being set to Double.MAX_VALUE.
This scheme allows ECJ to do both of the common GE approaches to handling over-size genomes: either killing them immediately or wrapping around some N number of times. I prefer the first approach, and so by default have passes set to 1.
At any rate, once a tree or tree forest is completed, we record the position in the int array where we had stopped. If there are more trees yet to produce for the GPIndividual (for ADFs for example), we build the next one starting at that position, and so on, until the GPIndividual is completed. Then we send the GPIndividual to the GPProblem to be evaluated, and after evaluation the GEIndividual’s fitness is set to the GPIndividual’s fitness, the GEIndividual’s evaluated flag is set to the GPIndividual’s flag, and evaluation of the GEIndividual is now done.
To do the translation, GESpecies relies on certain methods that may be useful to you:
ec.gp.ge.GESpecies Methods
public int makeTree(EvolutionState state, int[] genome, GPTree tree, int position, int treeNum, int threadnum,
HashMap ERCmap)
Builds a GPTree from the genome of a GEIndividual. The tree is stored in tree. The tree number (and thus the particular grammar to be used) is defined by treeNum. Ints are read from the GEIndividual’s genome starting at position. When the tree has been generated, the first unread int position is returned. If the tree could not be built because there were not enough ints, then ec.gp.ge.GESpecies.BIG_TREE_ERROR is returned instead. If ERCmap is non-null, then as the tree is being built, any ERCs appearing in the tree are registered in the ERCmap as the key-value pair ⟨Gene Value −→ ERC⟩ where Gene Value is the gene value used (an int), and ERC is an ec.gp.ERC object which has the same value as the actual ERC object used. You should treat this map as read-only. If you don’t care about any of this, just pass in null.
public int makeTrees(EvolutionState state, int[] genome, GPTree[] trees, int threadnum, HashMap ERCmap)
Builds an entire array of GPTrees, sufficient to create an entire GPIndividual, from the genome of a GEIndividual. The trees are stored in trees. Ints are read from the GEIndividual’s genome starting at position 0. When the trees have been generated, the first unread int position is returned. If the trees could not be built because there were not enough ints, then ec.gp.ge.GESpecies.BIG_TREE_ERROR is returned instead. If ERCmap is non-null, then as the trees are being built, any ERCs appearing in the trees are registered in the ERCmap as the key-value pair ⟨Gene Value −→ ERC⟩ where Gene Value is the gene value used (an int), and ERC is an ec.gp.ERC object which has the same value as the actual ERC object used. You should treat this map as read-only. If you don’t care about any of this, just pass in null.
public int makeTrees(EvolutionState state, GEIndividual ind, GPTree[] trees, int threadnum, HashMap ERCmap)
Builds an entire array of GPTrees, sufficient to create an entire GPIndividual, from a GEIndividual, possibly performing wrapping. The trees are stored in trees. Ints are read from the GEIndividual starting at position 0. When the trees have been generated, the first unread int position is returned. If we have run out of ints and the trees are not completed, then processing continues at position 0 again. This is done passes − 1 times (passes is specified in the individual’s species). If the trees have still not been completely built, then ec.gp.ge.GESpecies.BIG_TREE_ERROR is returned instead. If ERCmap is non-null, then as the trees are being built, any ERCs appearing in the trees are registered in the ERCmap as the key-value pair ⟨Gene Value −→ ERC⟩ where Gene Value is the gene value used (an int), and ERC is an ec.gp.ERC object which has the same value as the actual ERC object used. You should treat this map as read-only. If you don’t care about any of this, just pass in null.
public int consumed(EvolutionState state, GEIndividual ind, int threadnum)
Computes and returns the number of ints that would be consumed to produce a GPIndividual from the given GEIndividual, including wrapping as discussed in the previous method. This is done by actually building an individual, then throwing it away. If the GPIndividual could not be built because there were not enough ints, then ec.gp.ge.GESpecies.BIG_TREE_ERROR is returned instead.
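As a rough illustration of how you might call these yourself, here is a sketch that hand-translates a GEIndividual into the trees of a template GPIndividual using the last makeTrees(…) variant above. The surrounding setup (state, threadnum, geInd, and a gpInd whose trees array has already been sized correctly) is assumed, and the TranslateSketch class is just for illustration:

import ec.EvolutionState;
import ec.gp.GPIndividual;
import ec.gp.ge.*;

// Sketch only: hand-translating a GEIndividual into a GPIndividual's trees.
public class TranslateSketch {
    // Assumes gpInd.trees has already been set up with the right number of GPTrees.
    static boolean translate(EvolutionState state, int threadnum,
                             GEIndividual geInd, GPIndividual gpInd) {
        GESpecies species = (GESpecies)(geInd.species);
        int result = species.makeTrees(state, geInd, gpInd.trees, threadnum, null);  // null: ignore the ERC map
        if (result == GESpecies.BIG_TREE_ERROR)
            return false;   // ran out of ints even after wrapping: caller assigns the worst possible fitness
        return true;        // gpInd's trees are now built and ready to evaluate
    }
}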
Handling ERCs When the GESpecies needs to produce an ERC from the grammar, it consults the next int in the array. If the int is 27, it looks up 27 in a special hash table stored in GESpecies. If it finds that 27 has hashed to an existing ERC, it clones this ERC and uses the clone. Else, it creates a new ERC, calls resetNode(…) on it, stores it in the hash table under 27, clones it, and uses the clone. This way all ERCs whose int is 27 have the same value.
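A rough sketch of that gene-to-ERC caching scheme, using a plain HashMap, is shown below. The ERCBankSketch class and its ercBank and ercFor names are hypothetical and are not ECJ’s actual GESpecies code; lightClone() and resetNode(…) are the ordinary GPNode/ERC methods:

import java.util.HashMap;
import ec.EvolutionState;
import ec.gp.ERC;

// Hypothetical sketch of the gene-value-to-ERC cache described above.
public class ERCBankSketch {
    final HashMap<Integer, ERC> ercBank = new HashMap<Integer, ERC>();

    // Return an ERC for this gene value, so that equal gene values always yield equal constants.
    ERC ercFor(EvolutionState state, int thread, int gene, ERC prototype) {
        ERC stored = ercBank.get(gene);
        if (stored == null) {
            stored = (ERC)(prototype.lightClone());   // make a fresh ERC
            stored.resetNode(state, thread);          // randomize its value
            ercBank.put(gene, stored);                // remember it under this gene value
        }
        return (ERC)(stored.lightClone());            // hand out a clone, not the stored original
    }
}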
Handling ADFs and Multiple Trees Recall that the GESpecies maintains a separate grammar for each tree in the GPIndividual (and thus for each separate ADF). However there’s only a single int array in the GEIndividual. This is handled straightforwardly: the array is used to build the first tree; then the unused remainder of the array is used to build the second tree; and so on. If not all trees were able to be built, evaluation of the GPIndividual is bypassed and the fitness is set to the worst possible value (Double.MAX_VALUE).
Grammatical Evolution Does Not Support GroupedProblemForm GroupedProblemForm (Section 7.1.2) evaluates several Individuals together. What if at least one of them is a GEIndividual, and the GEIndividual cannot generate a valid GPIndividual? This creates a number of hassles. For example, if we’re doing cooperative coevolution, and one Individual can’t be generated, what’s the fitness of the group? Is it fair to penalize the group thusly? Likewise if we’re doing competitive coevolution, and one of the competitors can’t be generated, what’s the resulting fitness of the other? For this reason, at present Grammatical Evolution does not support GroupedProblemForm.
5.3.3 Printing
GEIndividual prints itself (via printIndividualForHumans(…)) by first printing itself in the standard way, then translating and printing the GPTrees from the equivalent GPIndividual, then finally printing the ERC mappings used by this GEIndividual in the GESpecies ERC hash table. For example, consider the GEIndividual below:
Evaluated: T
Fitness: Standardized=1.0308193 Adjusted=0.49241206 Hits=5
-96 122 -92 -96 -50 -96 122 122 -96 122 -92 -50 -50 -50 -50 111 -50 111 111 -50 -50 111 -50 111
Equivalent GP Individual:
Tree 0:
(* (+ (exp (* x (* (+ (+ (* (+ (exp x) x)
x) x) 0.4099041340133447) 0.25448855944201476)))
x) x)
ERCs: 111 -> 0.25448855944201476 -50 -> 0.4099041340133447
In this individual, two ERCs were used, 0.25448855944201476 and 0.4099041340133447, and they were associated with gene values 111 and -50 respectively. Note that in both cases the genome has these genes appearing multiple times: but not all these elements were used as ERCs: some were used as choice points or functions in the grammar, and others (near the end of the string) were ignored.
Individuals don’t have to have any ERCs at all of course. Here is a perfect Symbolic Regression individual:
Evaluated: T
Fitness: Standardized=0.0 Adjusted=1.0 Hits=20
112 80 -126 -40 112 80 -126 -40 112 80 -126 -40 -40 21 112
Equivalent GP Individual:
Tree 0:
(+ x (* x (+ x (* x (+ x (* x x))))))
ERCs:
5.3.4 Initialization and Breeding
Since we’re doing lists, we’ll need to define the parameters for creating new lists in the first place. The default parameters in ge.params use a geometric size distribution as follows:
pop.subpop.0.species.genome-size = geometric
pop.subpop.0.species.geometric-prob = 0.85
pop.subpop.0.species.min-initial-size = 5
Change this to your heart’s content.
While we could use things like ListCrossoverPipeline, GE has two idiosyncratic list breeding operators:
• ec.vector.breed.GeneDuplicationPipeline picks two random indices in the list, copies the int sequence between them, then tacks the copy to the end of the list. Because this doesn’t rely on anything special to GE, it’s located in ec.vector.breed.
• ec.gp.ge.breed.GETruncationPipeline determines how many ints were consumed in the production of the GPTree. It then truncates the list to remove the unused ints.
The default pipeline in ge.params is a MultiBreedingPipeline which with 0.9 probability performs GETruncation followed by crossover; with 0.05 probability performs GETruncation followed by GeneDuplication; and with 0.05 probability simply does plain VectorMutation. In all cases we use Tournament Selection with a tournament size of 7.
pop.subpop.0.species.pipe = ec.breed.MultiBreedingPipeline
pop.subpop.0.species.pipe.num-sources = 3
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.ListCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.gp.ge.breed.GETruncationPipeline
pop.subpop.0.species.pipe.source.0.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = same
pop.subpop.0.species.pipe.source.0.prob = 0.9
pop.subpop.0.species.pipe.source.1 = ec.vector.breed.GeneDuplicationPipeline
pop.subpop.0.species.pipe.source.1.source.0 = ec.gp.ge.breed.GETruncationPipeline
pop.subpop.0.species.pipe.source.1.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.1.prob = 0.05
pop.subpop.0.species.pipe.source.2 = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.2.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.2.prob = 0.05
select.tournament.size = 7
You are of course welcome to change this any way you like.
The default pipeline also defines the mutation probability for VectorMutationPipeline, and includes a crossover type even though it’s unused (to quiet complaints from ECJ):
pop.subpop.0.species.mutation-prob = 0.01
# This isn’t used at all but we include it here to quiet a warning from ECJ
pop.subpop.0.species.crossover-type = one
5.3.5 Dealing with GP
Once you’ve created your grammar and set up your GEIndividual and GESpecies, you’ll still need to define all the same GPNodes, GPFunctionSets, GPNodeConstraints, GPTreeConstraints, etc. as usual. All GE is doing is giving you a new way to breed and create trees: but the tree information must still remain intact. However, certain items must be redefined because the GPSpecies is no longer in the normal parameter base (pop.subpop.0.species) but rather is subsidiary to the GESpecies, and thus at the parameter base (pop.subpop.0.species.gp-species). The simplest way to do this is to include ec/gp/koza/koza.params to define all the basic GP stuff, then create all of your GPNodes etc., and then override certain GP parameters, namely:
# We define a dummy KozaFitness here, and set the number of trees to 1.
# If you’re doing ADFs, you’ll need to add some more trees.
pop.subpop.0.species.gp-species = ec.gp.GPSpecies
pop.subpop.0.species.gp-species.fitness = ec.gp.koza.KozaFitness
pop.subpop.0.species.gp-species.ind = ec.gp.GPIndividual
pop.subpop.0.species.gp-species.ind.numtrees = 1
pop.subpop.0.species.gp-species.ind.tree.0 = ec.gp.GPTree
pop.subpop.0.species.gp-species.ind.tree.0.tc = tc0
# We also need a simple dummy breeding pipeline for GP, which will never
# be used, but if it’s not here GP will complain. We’ll just use Reproduction.
pop.subpop.0.species.gp-species.pipe = ec.breed.ReproductionPipeline
pop.subpop.0.species.gp-species.pipe.num-sources = 1
pop.subpop.0.species.gp-species.pipe.source.0 = ec.select.TournamentSelection
This is enough to convince the GP system to go along with our bizarre plans.
One Last Note There are some unusual and very rare cases where you may need to run GPIndividuals and GEIndividuals using the same problem class. To assist in this, GEProblem can recognize that GPIndividuals are being passed to it rather than GEIndividuals, in which case it simply evaluates them directly in the subsidiary GPProblem. You will receive a once-only warning if this happens. No other kinds of Individuals can be given to GEProblem: it’ll issue an error.
5.3.6 A Complete Example
We will continue the example given in Section 5.2.11. We don’t include the parameters and Java files specified so far in that example, but you’ll of course need them.
That example showed how to do a strongly-typed Individual with various functions, plus an ADF and an ERC. The ADF points to a tree that contains the same basic functions, plus another ERC and two ADFArguments. Keep in mind that strong typing just gets in our way, but in this example, it’s fairly harmless. We use all the existing parameters, but change a few. First, let’s do most of the ones we’ve discussed so far:
# Basic parameters that we redefine
eval.problem = ec.gp.ge.GEProblem
pop.subpop.0.species = ec.gp.ge.GESpecies
pop.subpop.0.species.parser = ec.gp.ge.GrammarParser
pop.subpop.0.species.gp-species = ec.gp.GPSpecies
pop.subpop.0.species.fitness = ec.gp.koza.KozaFitness
pop.subpop.0.species.ind = ec.gp.ge.GEIndividual
pop.subpop.0.species.min-gene = -128
pop.subpop.0.species.max-gene = 127
pop.subpop.0.species.mutation-prob = 0.01
pop.subpop.0.species.crossover-type = one
pop.subpop.0.species.genome-size = geometric
pop.subpop.0.species.geometric-prob = 0.85
pop.subpop.0.species.min-initial-size = 5
# The pipeline
pop.subpop.0.species.pipe = ec.breed.MultiBreedingPipeline
pop.subpop.0.species.pipe.num-sources = 3
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.ListCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.gp.ge.breed.GETruncationPipeline
pop.subpop.0.species.pipe.source.0.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = same
pop.subpop.0.species.pipe.source.0.prob = 0.9
pop.subpop.0.species.pipe.source.1 = ec.vector.breed.GeneDuplicationPipeline
pop.subpop.0.species.pipe.source.1.source.0 = ec.gp.ge.breed.GETruncationPipeline
pop.subpop.0.species.pipe.source.1.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.1.prob = 0.05
pop.subpop.0.species.pipe.source.2 = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.2.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.2.prob = 0.05
select.tournament.size = 7
# GP hacks
pop.subpop.0.species.gp-species.fitness = ec.gp.koza.KozaFitness
pop.subpop.0.species.gp-species.ind = ec.gp.GPIndividual
pop.subpop.0.species.gp-species.pipe = ec.breed.ReproductionPipeline
pop.subpop.0.species.gp-species.pipe.num-sources = 1
pop.subpop.0.species.gp-species.pipe.source.0 = ec.select.TournamentSelection
You don’t have to specify all this: just include ge.params as a parent of your parameter file.
Since we’re doing ADFs, we’ll need two trees rather than just one. ge.params by default defines a single tree. We do two here:
# More GP hacks to handle an ADF
pop.subpop.0.species.gp-species.ind.numtrees = 2
pop.subpop.0.species.gp-species.ind.tree.0 = ec.gp.GPTree
pop.subpop.0.species.gp-species.ind.tree.0.tc = tc0
pop.subpop.0.species.gp-species.ind.tree.1 = ec.gp.GPTree
pop.subpop.0.species.gp-species.ind.tree.1.tc = tc1
Next let’s hook up the Problem, which was originally called ec.app.myapp.MyProblem:
# The problem
eval.problem.problem = ec.app.myapp.MyProblem
eval.problem.problem.data = ec.app.myapp.MyData
Now for each tree (the main tree and the ADF) we need to define the grammar files:
# The grammars
ge.species.file.0 = myproblem.grammar
ge.species.file.1 = adf0.grammar
5.3.6.1 Grammar Files
The myproblem.grammar file defines our basic functions (both boolean and real-valued) and also our ADF and ERC.
# This is the grammar file for the main tree
The adf0.grammar file uses just the real-valued functions, plus an ERC and two ADFArguments:
# This is the grammar file for ADF0, which requires two arguments
5.3.7 How Parsing is Done
GESpecies parses the grammar files by running them through an ec.gp.ge.GrammarParser, a special class which converts the files into parse graphs. The parse graphs consist of two kinds of nodes: an ec.gp.ge.GrammarFunctionNode and an ec.gp.ge.GrammarRuleNode, both subclasses of the abstract class ec.gp.ge.GrammarNode. You can replace ec.gp.ge.GrammarParser with your own subclass if you like, via:
pop.subpop.0.species.parser = ec.app.MySpecialGrammarParser
The parse graph produced by a GrammarParser is rooted with a GrammarRuleNode which indicates the initial rule. For example, reconsider the grammar below.
The parse graph for this grammar is shown in Figure 5.10. Once the parse graph has been generated, the GESpecies wanders through this graph as follows:
Figure 5.10 Parse graph (orderings among children not shown). The entry point is the head of the first rule in the grammar.
• If on a GrammarRuleNode, the GESpecies selects exactly one child to traverse, using the next number in the GEIndividual array.
• If on a GrammarFunctionNode, the GESpecies traverses each and every child of the node in order. These become children to the GrammarFunctionNode’s GPNode.
GrammarFunctionNodes hold GPNode prototypes and represent them. Their children are arguments to the node in question. GrammarRuleNodes hold choice points in the rule, and their children are the choices (only one choice will be made).
5.4 Push (The ec.gp.push Package)
Push is a stack-based programming language developed by Lee Spector [22, 21]. The language is designed to allow the evolution of self-modifying programs, and this allows all manner of experimentation. ECJ has a simple, experimental implementation of Push which uses a Push interpreter called Psh.8
The Push interpreter has several stacks on which data of different types can be pushed, popped, or otherwise manipulated. Built-in Push instructions designed to manipulate certain stacks usually have the name of the stack at the beginning of the instruction name, for convenience. For example, float.+ is an instruction which adds floating-point numbers found on the float stack. The Psh interpreter has a large number of built-in instructions: but you can also create your own instructions in ECJ to do whatever you like, and add them to Psh’s vocabulary.
Push programs take the form of Lisp lists of arbitrary size and nesting. For example, here is an uninteresting Push program consisting entirely of instructions which use the float stack:
((float.swap ((float.swap float.dup))) float.dup float.* (float.+ float.-) float.dup)
Parentheses define blocks of code, but if you don’t have any code-manipulation instructions then parentheses don’t mean much. This code does the following:
8https://github.com/jonklein/Psh/ Note that ECJ uses a modified version of Psh which comes with the ECJ distribution: don’t use the GitHub version.
1. Swap the top two elements of the float stack
2. Swap the top two elements of the float stack
3. Duplicate the top item of the float stack and push it on the float stack
4. Duplicate the top item of the float stack and push it on the float stack
5. Multiply the top two items of the float stack and push the result on the float stack
6. Add the top two items of the float stack and push the result on the float stack
7. Subtract the top two items of the float stack and push the result on the float stack
8. Duplicate the top item of the float stack and push it on the float stack
Like I said, an uninteresting program.
Note that unlike in Lisp or GP tradition, the first element in each list in Psh isn’t special: it doesn’t define a function. Indeed it could just be another list. All lists do is set apart blocks of code, which might be useful in manipulation by special code-modifying instructions.
Because Push lists don’t start with a function symbol, their parse trees are different from GP program trees, and so while ECJ uses the GPNode and GPTree classes to encode Push trees, it does so in an unusual way. All the symbols in the Push program are leaf nodes in the tree. Dummy nonleaf nodes define the list and sublist structure of the tree. Furthermore, nonleaf nodes can have arbitrary arity, which is something not normally done in ECJ’s GP trees. For example, the previous Push program looks like this in tree form:
[Tree diagram: the program above rendered as a GP tree. Each parenthesized list becomes a dummy nonterminal node, labeled P, and the float instructions (float.swap, float.dup, and so on) are the leaves.]
Thus ECJ creates Push programs using its GPTree facility with only two kinds of nodes: dummy nonterminals (the class ec.gp.push.Nonterminal) and instructions (the class ec.gp.push.Terminal, which is a special ECJ ERC that handles leaf nodes in the tree). The dummy nonterminals are shown in the figure above with the symbol P.
Evaluation Push trees are evaluated by submitting them to an interpreter, in this case the Psh interpreter. But there’s a problem. Psh, like every other Push interpreter out there, has its own notions of the objects used to construct a Push tree. It’s obviously not compatible with ECJ’s GPNode, GPTree, and GPIndividual objects. As a result, ECJ evaluates its Push trees by first writing them out to strings in Lisp form. It then sets up an interpreter, gives it the string program, and lets it parse the program into its own internal structure and evaluate it. This is circuitous but doesn’t really slow things down very much as it turns out.
Some Caveats Much of the modern Push work is being done on a Clojure-based interpreter (Clojush) rather than Psh, and perhaps in the future we may move to that instead. Long story short, this is an experimental package.
Also, much of the Push literature involves individuals which modify themselves and then pass these modifications onto successive generations. At present ECJ does not do this: once an individual is entered into
the interpreter, it stays there. Any changes made in the interpreter are ignored when the fitness evaluation is complete. We may change that in the future.
5.4.1 Push and GP
ECJ’s Push package only uses two GPNode classes, ec.gp.push.Nonterminal (for dummy nonleaf nodes) and ec.gp.push.Terminal (to define all possible leaf nodes). This means that, for all intents and purposes, the GP function set, from ECJ’s perspective, is hard-coded. But of course you still have the Push “function set” to set up: these are the actual instructions that ec.gp.push.Terminal can be set to (it’s an ERC which displays Strings) and that can be submitted to the Psh interpreter. Setting these instructions up is discussed in Section 5.4.2 coming next; for now, keep in mind that the Push “function set” is not the same thing as the GP function set.
This “hard-coded” GP function set is defined as follows in the ec/gp/push/push.params file:
gp.fs.size = 1
gp.fs.0.name = f0
gp.fs.0.size = 2
gp.fs.0.func.0 = ec.gp.push.Nonterminal
gp.fs.0.func.0.nc = nc1
gp.fs.0.func.1 = ec.gp.push.Terminal
gp.fs.0.func.1.nc = nc0
Note that Nonterminal is listed as having an arity of 1 even though it has an arbitrary arity. Because ec.gp.push.Nonterminal can have a variable number of children, Push uses its own special tree builder to allow this. The class is called ec.gp.push.PushBuilder. PushBuilder uses the following algorithm:
function Build-Tree(size)
    if size = 1 then
        return a terminal (of the class ec.gp.push.Terminal)
    else
        p ← a nonterminal (of the class ec.gp.push.Nonterminal)
        while size > 0 do
            a ← a random number from 1 to size inclusive
            size ← size − a
            c ← Build-Tree(a)
            Add c as a child of p
        Randomly shuffle the order of the children of p
        return p
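To make the recursion concrete, here is a self-contained Java sketch of the same size-splitting algorithm, using a throwaway Node class instead of ECJ’s GPNode machinery. It is only an illustration of the algorithm under those assumptions, not PushBuilder’s actual implementation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BuildTreeSketch {
    static class Node {                           // stand-in for a GPNode
        String label;                             // "P" for nonterminals, an instruction name otherwise
        List<Node> children = new ArrayList<Node>();
        Node(String label) { this.label = label; }
    }

    static Random random = new Random();

    static Node buildTree(int size) {
        if (size == 1)
            return new Node("terminal");          // ec.gp.push.Terminal in ECJ
        Node p = new Node("P");                   // ec.gp.push.Nonterminal in ECJ
        while (size > 0) {
            int a = random.nextInt(size) + 1;     // a random number from 1 to size inclusive
            size -= a;
            p.children.add(buildTree(a));         // recursively build a child subtree of that size
        }
        Collections.shuffle(p.children, random);  // randomly shuffle the order of the children
        return p;
    }

    public static void main(String[] args) {
        Node root = buildTree(8);
        System.out.println("Root has " + root.children.size() + " children");
    }
}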
PushBuilder must choose a size, and this is done in the usual GPNodeBuilder way. To set up PushBuilder as the default initialization builder, and define it to pick sizes uniformly between 4 and 10 inclusive, you’d say:
gp.tc.0.init = ec.gp.push.PushBuilder
gp.tc.0.init.min-size = 4
gp.tc.0.init.max-size = 10
You can of course define the size distribution directly too, to describe a more complicated distribution:
gp.tc.0.init = ec.gp.push.PushBuilder
gp.tc.0.init.num-sizes = 11
gp.tc.0.init.size.0 = 0.0
gp.tc.0.init.size.1 = 0.0
gp.tc.0.init.size.2 = 0.0
gp.tc.0.init.size.3 = 0.0
gp.tc.0.init.size.4 = 0.5
gp.tc.0.init.size.5 = 0.25
gp.tc.0.init.size.6 = 0.125
gp.tc.0.init.size.7 = 0.0625
gp.tc.0.init.size.8 = 0.03125
gp.tc.0.init.size.9 = 0.015625
gp.tc.0.init.size.10 = 0.015625
Because ECJ’s Push package only uses two GPNode classes, you can basically use any GP crossover or mutation method you like, as long as it doesn’t rely on a fixed arity (ec.gp.push.Nonterminal has a variable number of children). None of the built-in ECJ GP breeding pipelines will cause any problem. However, if you’re using a breeding pipeline which creates random subtrees (such as ec.gp.koza.MutationPipeline) you will want to make sure it uses the ec.gp.push.PushBuilder tree builder to generate those subtrees.
5.4.2 Defining the Push Instruction Set
Though the GP function set used by ECJ’s GP facility is hard-coded for Push, you still of course have to specify the instructions which can form a Push program. This is done via the ec.gp.push.Terminal class.
Let’s say you wanted to do a symbolic regression problem and had settled on the following seven Push instructions: float.* float.+ float.% float.- float.dup float.swap float.pop. You’d set it up like this:
# The Instruction Set
push.in.size = 7
push.in.0 = float.*
push.in.1 = float.+
push.in.2 = float.%
push.in.3 = float.-
push.in.4 = float.dup
push.in.5 = float.swap
push.in.6 = float.pop
This is, in effect, the Push equivalent of a GP Function Set. Psh also supports a limited number of Ephemeral Random Constants, or ERCs. Keep in mind that these are not GP ERCs, though they operate similarly inside the Push interpreter. ECJ defines these with the names float.erc and int.erc, one for Push’s float stack and one for the integer stack. So if you’d like to add an ERC to your instruction set, you might say:
push.in.size = 8
push.in.0 = float.*
push.in.1 = float.+
push.in.2 = float.%
push.in.3 = float.-
push.in.4 = float.dup
push.in.5 = float.swap
push.in.6 = float.pop
push.in.7 = float.erc
If you use an ERC, you should also set its minimum and maximum values. ECJ defines the following defaults in the push.params file:
push.erc.float.min = -10.0
push.erc.float.max = 10.0
push.erc.int.min = -10
push.erc.int.max = 10
The following instructions are built into the Psh interpreter.
integer stack:  integer.+  integer.-  integer./  integer.%  integer.*  integer.pow  integer.log  integer.=
                integer.>  integer.<  integer.min  integer.max  integer.abs  integer.neg  integer.ln
                integer.fromfloat  integer.fromboolean  integer.rand

float stack:    float.+  float.-  float./  float.%  float.*  float.pow  float.log  float.=  float.>  float.<
                float.min  float.max  float.sin  float.cos  float.tan  float.exp  float.abs  float.neg
                float.ln  float.frominteger  float.fromboolean  float.rand

boolean stack:  boolean.=  boolean.not  boolean.and  boolean.or  boolean.xor  boolean.frominteger
                boolean.fromfloat  boolean.rand  true  false

code stack:     code.quote  code.fromboolean  code.frominteger  code.fromfloat  code.noop  code.do*times
                code.do*count  code.do*range  code.=  code.if  code.rand

exec stack:     exec.k  exec.s  exec.y  exec.noop  exec.do*times  exec.do*count  exec.do*range  exec.=
                exec.if  exec.rand

input stack:    input.index  input.inall  input.inallrev  input.stackdepth

frames:         frame.push  frame.pop
5.4.3 Creating a Push Problem
Push Problems are created by subclassing from ec.gp.push.PushProblem. Building a Push Problem is more complex than other ECJ Problems because you may have to interact with the Psh interpreter. We have provided some cover functions to enable you to keep your hands relatively clean.
One common way a Push program is evaluated (such as in Symbolic Regression) is along these lines:
1. A Push Program is created from the GPIndividual.
2. An Interpreter is created or reset.
3. Certain stacks in the interpreter are loaded with some data.
4. The program is run on the interpreter.
5. The tops of certain stacks in the interpreter are inspected to determine the return value, and thus the fitness.
6. The GPIndividual’s fitness is set to this fitness.
To create a Push Program, you’d call getProgram(...), passing in the individual in question. This will return a Program object (this is a class used in Psh).
To create an interpreter, you’d call getInterpreter(...), again passing in the individual in question. This will return an Interpreter object (another class used in Psh). You can also call resetInterpreter(...) to reset the interpreter to a pristine state so you can reuse it.
Let’s say your approach is to push a number on the float stack of the interpreter, run the program, and then examine what’s left on the float stack to determine fitness. PushProblem has utility methods for pushing, and examining the float and integer stacks. If you want to do more sophisticated stuff than this, you’re welcome to but will have to consult Psh directly.
Finally, to run the program on the interpreter, you can call executeProgram(...).
ec.gp.push.PushProblem Methods
public org.spiderland.Psh.Program getProgram(EvolutionState state, GPIndividual ind) Builds and returns a new Program object from the provided individual.
public org.spiderland.Psh.Interpreter getInterpreter(EvolutionState state, GPIndividual ind, int thread) Builds and returns a new, empty Interpreter object.
public void resetInterpreter(org.spiderland.Psh.Interpreter interpreter) Resets the interpreter’s stack so it can execute a new program.
public void executeProgram(org.spiderland.Psh.Program program, org.spiderland.Psh.Interpreter interpreter, int maxSteps) Runs the provided program on the given interpreter for up to maxSteps steps.
public void pushOntoFloatStack(org.spiderland.Psh.Interpreter interpreter, float val) Pushes the given value onto the interpreter’s float stack.
public void pushOntoIntStack(org.spiderland.Psh.Interpreter interpreter, int val) Pushes the given value onto the interpreter’s integer stack.
public boolean isFloatStackEmpty(org.spiderland.Psh.Interpreter interpreter) Returns whether the interpreter’s float stack is empty.
public boolean isIntStackEmpty(org.spiderland.Psh.Interpreter interpreter) Returns whether the interpreter’s integer stack is empty.
public float topOfFloatStack(org.spiderland.Psh.Interpreter interpreter)
Returns the top element on the interpreter’s float stack. You ought to check to see if the stack is empty first.
public int topOfIntStack(org.spiderland.Psh.Interpreter interpreter)
Returns the top element on the interpreter’s integer stack. You ought to check to see if the stack is empty first.
You will define your PushProblem in the usual ECJ fashion. Furthermore, since Push uses the GP facility, you also need to provide a GPData object, though it’ll never be used:
eval.problem = ec.app.myapp.MyPushProblem
eval.problem.data = ec.gp.GPData
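Putting the pieces together, here is a minimal sketch of such a Problem. The class name MyPushProblem matches the hypothetical parameter example above, the test cases are made up, and the fitness handling assumes KozaFitness; adapt all of these to your own application.

package ec.app.myapp;

import ec.EvolutionState;
import ec.Individual;
import ec.gp.GPIndividual;
import ec.gp.koza.KozaFitness;
import ec.gp.push.PushProblem;
import ec.simple.SimpleProblemForm;
import org.spiderland.Psh.Interpreter;
import org.spiderland.Psh.Program;

public class MyPushProblem extends PushProblem implements SimpleProblemForm {
    // Hypothetical test cases for a toy regression of f(x) = x * x
    public double[] testInputs  = { -1.0, 0.0, 1.0, 2.0 };
    public double[] testOutputs = {  1.0, 0.0, 1.0, 4.0 };

    public void evaluate(EvolutionState state, Individual ind, int subpopulation, int threadnum) {
        if (ind.evaluated) return;

        // 1. Build the Push program and an interpreter from the GPIndividual
        Program program = getProgram(state, (GPIndividual) ind);
        Interpreter interpreter = getInterpreter(state, (GPIndividual) ind, threadnum);

        double errorSum = 0;
        for (int i = 0; i < testInputs.length; i++) {
            // 2. Reset the interpreter and load the float stack with the test input
            resetInterpreter(interpreter);
            pushOntoFloatStack(interpreter, (float) testInputs[i]);

            // 3. Run the program for a bounded number of steps
            executeProgram(program, interpreter, 150);

            // 4. Inspect the top of the float stack, penalizing an empty stack
            if (isFloatStackEmpty(interpreter))
                errorSum += 1000;
            else
                errorSum += Math.abs(topOfFloatStack(interpreter) - testOutputs[i]);
        }

        // 5. Set the fitness (assuming KozaFitness; use whatever Fitness your parameters specify)
        KozaFitness fitness = (KozaFitness) ind.fitness;
        fitness.setStandardizedFitness(state, (float) errorSum);
        ind.evaluated = true;
    }
}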
5.4.4 Building a Custom Instruction
It’s often the case that Psh’s built-in instructions aren’t enough for you. Perhaps you want an instruction which prints the top of the float stack. Perhaps you want an instruction which does more interesting math, or which reads from a file, or moves an ant. To do this you need to be able to create a class which you can submit as an instruction. This is what ec.gp.push.PushInstruction does for you.
PushInstruction is a org.spiderland.Psh.Instruction subclass which also implements ECJ’s ec.Prototype interface and so is properly cloneable and serializable. Since it’s a Prototype, it has a setup(...) method which you can take advantage of to get things ready, and a clone() method you might need to override. But critically it has an Execute(...) method which the interpreter uses to run the instruction. Note the unusual spelling: this is a Psh method name.
To implement a PushInstruction you’ll need to know how Psh works so you can manipulate its stacks according to the needs of the instruction. Two examples (Atan and Print) are provided in the ec/app/push application example.
ec.gp.push.PushInstruction Methods
public void Execute(org.spiderland.Psh.Interpreter interpreter) Executes the instruction in the context of the given interpreter.
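For example, here is a minimal sketch of a custom instruction that replaces the top two floats with their mean. The class name MyInstruction is hypothetical, and the sketch assumes Psh’s floatStack() accessor on the Interpreter along with its size(), pop(), and push() methods.

package ec.app.myapp;

import ec.gp.push.PushInstruction;
import org.spiderland.Psh.Interpreter;
import org.spiderland.Psh.floatStack;

public class MyInstruction extends PushInstruction {
    public void Execute(Interpreter interpreter) {
        floatStack stack = interpreter.floatStack();   // Psh's float stack
        if (stack.size() >= 2) {                       // enough arguments?
            float a = stack.pop();
            float b = stack.pop();
            stack.push((a + b) / 2.0f);                // push their mean
        }
        // otherwise act as a NOOP, as is conventional for Push instructions
    }
}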
Once you have constructed your custom instruction, you need to add it to your instruction set. To do this you give it a name and also specify the PushInstruction class which defines the instruction. The presence of this class specification informs ECJ that this isn’t a built-in Psh instruction. For example, we might add to the end of our existing instruction set like so:
push.in.size = 9
push.in.0 = float.*
push.in.1 = float.+
push.in.2 = float.%
push.in.3 = float.-
push.in.4 = float.dup
push.in.5 = float.swap
push.in.6 = float.pop
push.in.7 = float.erc
push.in.8 = my.instruction.name
push.in.8.func = ec.app.myapp.MyInstruction
5.5 Rulesets and Collections (The ec.rule Package)
Let’s get one thing out of the way right now. Though we had rulesets in mind when we developed it, the ec.rule package isn’t really for rulesets. Not only can the package be used for things other than rules, but it’s not even sets: it’s collections (or “bags” or “multisets”) of arbitrary objects.
The representation defined by this package is fairly straightforward: an ec.rule.RuleIndividual contains one or more ec.rule.RuleSets, each of which contains zero or more ec.rule.Rules. A Rule is an abstract superclass which can contain anything you want. And that’s about it.
Problem domains for which the ec.rule package is appropriate are often also good candidates for the variable-length lists found in the ec.vector package. You’ll need to think about which is a better choice for you. Also beware that of the various representation packages in ECJ, ec.rule is definitely the least used and least tested. So its facilities are somewhat cruder than the others and it’s possible you may see bugs.
Each level of the ec.rule package (individual, ruleset, rule) is a Prototype and has a Flyweight relationship with a central object special to that level (for a reminder on Flyweights, see Section 3.1.4). Specifically:
Object                    In Flyweight Relationship With
ec.rule.RuleIndividual    ec.rule.RuleSpecies
ec.rule.RuleSet           ec.rule.RuleSetConstraints
ec.rule.Rule              ec.rule.RuleConstraints
The ec.rule package follows the same approach as the ec.vector package does when it comes to breeding: two basic breeding operators are provided (ec.rule.breed.RuleCrossoverPipeline and ec.rule.breed.RuleMutationPipeline) which simply call default mutation and crossover functions in the Individuals themselves. Thus to do more sophisticated breeding you have the choice of either overriding these functions or creating new breeding pipelines which perform more detailed operations on their own.
5.5.1 RuleIndividuals and RuleSpecies
RuleIndividuals and RuleSpecies are specified in parameters in the standard way:
pop.subpop.0.species = ec.rule.RuleSpecies
pop.subpop.0.species.ind = ec.rule.RuleIndividual
A RuleIndividual is a subclass of Individual which simply consists of an array of RuleSets:
public RuleSet[] rulesets;
Each RuleSet can be a different class. You’d think that the number and class of RuleSets would be specified in the RuleSpecies (like in ec.vector). But for no good reason that’s not the case: you specify them in the parameters for the prototypical individual, along these lines:
pop.subpop.0.species.ind.num-rulesets = 2
pop.subpop.0.species.ind.ruleset.0 = ec.rule.RuleSet
pop.subpop.0.species.ind.ruleset.1 = ec.app.myapp.MyRuleset
Alternatively, you can use the RuleIndividual’s default parameter base:
rule.individual.num-rulesets = 2
rule.individual.ruleset.0 = ec.rule.RuleSet
rule.individual.ruleset.1 = ec.app.myapp.MyRuleset
Though for many applications you will probably just have a single RuleSet, and it’ll probably just be an ec.rule.RuleSet:
pop.subpop.0.species.ind.num-rulesets = 1
pop.subpop.0.species.ind.ruleset.0 = ec.rule.RuleSet
5.5.2 RuleSets and RuleSetConstraints
A RuleSet contains an arbitrary number of Rules (ec.rule.Rule), anywhere from zero on up. It’s largely your job to customize the breeding and initialization procedures appropriate to your problem to constrain the number and type of rules. The rules are defined here:
public Rule[] rules;
public int numRules;
Notice that the number of rules in the array may be less than the array size, that is, numRules ≤ rules.length. The rules themselves run from rules[0] ... rules[numRules − 1]. This is done because, like ArrayList etc., rules is variable in size and can grow and shrink. RuleSet contains a number of utility methods for manipulating the order and number of these rules:
ec.rule.RuleSet Methods
public int numRules()
Returns the number of rules in the RuleSet.
public void randomizeRulesOrder(EvolutionState state, int thread)
Randomly shuffles the order of the rules in the RuleSet.
public void addRule(Rule rule)
Adds the rule to the end of the ruleset, increasing the length of the RuleSet array as necessary.
public void addRandomRule(EvolutionState state, int thread)
Produces a new randomly-generated rule and adds it to the RuleSet. The Rule is created by cloning the prototypical Rule from the RuleSet’s RuleSetConstraints, then calling reset(...) on it.
public Rule removeRule(int index)
Removes a rule located at the given index from the RuleSet and returns it. All rules are shifted down to fill the void.
public Rule removeRandomRule(EvolutionState state, int thread)
Removes a random rule and returns it. All rules are shifted down to fill the void.
public RuleSet[] split(int[] points, RuleSet[] sets)
Breaks the RuleSet into n disjoint groups, then clones the rules in those groups and adds them to the respective sets, which must be provided in the given array. The first group of rules starts at 0 and ends below points[0]: this goes into sets[0]. Intermediate groups, which go into sets[i], start at points[i − 1] and end below points[i]. The final group, which goes into sets[points.length], starts at points[points.length − 1] and continues to the end of the rule array. If points.length = 0, then all rules simply get put into sets[0]. Note that the size of sets must be one more than the size of points. The sets are returned.
public RuleSet[] split(EvolutionState state, int thread, RuleSet[] sets)
For each rule in the RuleSet, clones the rule and adds the clone to a randomly chosen RuleSet from sets. Returns sets.
public RuleSet[] splitIntoTwo(EvolutionState state, int thread, RuleSet[] sets, double probability)
For each rule in the RuleSet, clones the rule and, with the given probability, adds the clone to sets[0], else sets[1]. Note that sets must be two in length.
public void join(RuleSet other)
Copies the rules in the other RuleSet, then adds them to the end of this RuleSet.
public void copyNoClone(RuleSet other)
Deletes all the rules in the RuleSet. Then places all the rules from the other RuleSet into this RuleSet. No cloning is done: both RuleSets now have pointers to the same rules.
Groups of RuleSets have a flyweight relationship with a RuleSetConstraints object. RuleSetConstraints is a Clique. You specify the number and class of RuleSetConstraints and assign each a unique name. Let’s say you need two RuleSetConstraints objects, both instances of RuleSetConstraints itself. You’d write this:
rule.rsc.size = 2
rule.rsc.0 = ec.rule.RuleSetConstraints
rule.rsc.0.name = rsc1
rule.rsc.1 = ec.rule.RuleSetConstraints
rule.rsc.1.name = rsc2
RuleSetConstraints specify a number of constraints which guide the initialization and mutation of RuleSets, specifically:
• A distribution for choosing the number of Rules an initial RuleSet will have. This guides how the RuleSet’s reset(...) method operates. The distribution can either be uniform with a minimum and maximum, or you can specify a histogram of possible size probabilities. We’ll do the first case for RuleSetConstraints 0 and the second case for RuleSetConstraints 1 below:
# RuleSetConstraints 0 will have between 5 and 10 rules inclusive
rule.rsc.0.reset-min-size = 5
rule.rsc.0.reset-max-size = 10
# RuleSetConstraints 1 will have 0 to 4 rules with these probabilities...
rule.rsc.1.reset-num-sizes = 5
rule.rsc.1.size.0 = 0.1
rule.rsc.1.size.1 = 0.2
rule.rsc.1.size.2 = 0.2
rule.rsc.1.size.3 = 0.3
rule.rsc.1.size.4 = 0.4
• The probabilities of adding, deleting, and rearranging rules when the RuleSet’s mutate(...) method is called, typically by the BreedingPipeline ec.rule.RuleMutationPipeline. When this method is called, the RuleSet first mutates all of its rules by calling mutate(...) on them. Then it repeatedly flips a coin of a given probability: each time the coin comes up true, one rule is deleted, stopping once the coin comes up false or the number of rules has shrunk to the minimum number of initial rules specified above. Afterwards it repeatedly flips a coin of another probability: each time the coin comes up true, one new rule is added at random, stopping once the coin comes up false or the number of rules has grown to the maximum number of initial rules. Finally, with a certain probability the rule ordering is shuffled. Here are some examples of specifying these probabilities:
rule.rsc.0.p-add = 0.1
rule.rsc.0.p-del = 0.1
rule.rsc.0.rand-order = 0.25
rule.rsc.1.p-add = 0.5
rule.rsc.1.p-del = 0.6
rule.rsc.1.rand-order = 0.0
Once you’ve specified a RuleSetConstraints, you then attach one to each RuleSet. For example:
pop.subpop.0.species.ind.ruleset.0.constraints = rsc2
pop.subpop.0.species.ind.ruleset.1.constraints = rsc1
... or alternatively use the default parameter base...
rule.individual.constraints = rsc2
Once set, you can access the constraints with the following method:
ec.rule.RuleSet Methods
public final RuleSetConstraints ruleSetConstraints(RuleInitializer initializer) Returns the RuleSet’s RuleSetConstraints
RuleSetConstraints has one method for choosing a random initial size under the constraints above:
ec.rule.RuleSetConstraints Methods
public int numRulesForReset(RuleSet ruleset, EvolutionState state, int thread)
Returns a random value from the initial (reset(...)) distribution, to use as the number of rules to initialize or reset the RuleSet.
Additionally, the various addition, deletion, and randomization probabilities can be accessed like this:

RuleSetConstraints rsc = myRuleSet.ruleSetConstraints((RuleInitializer)(state.init));
double addition = rsc.p_add;
double deletion = rsc.p_del;
double shuffling = rsc.p_randorder;
RuleSets also contain all the standard reading and writing methods, none of which you’ll need to override unless you’re making a custom RuleSet.
ec.rule.RuleSet Methods
public void printRuleSetForHumans(EvolutionState state, int log) Writes a RuleSet to a log in a fashion easy for humans to read.
public void printRuleSet(EvolutionState state, int log)
Writes a RuleSet to a log in a fashion that can be read back in via readRuleSet(...), typically by using the Code package.
public void printRuleSet(EvolutionState state, PrintWriter writer)
Writes a RuleSet to a Writer in a fashion that can be read back in via readRuleSet(...), typically by using the Code package.
public void readRuleSet(EvolutionState state, LineNumberReader reader) throws IOException
Reads a RuleSet written by printRuleSet(...) or printRuleSetToString(...), typically using the Code package.
public void writeRuleSet(EvolutionState state, DataOutput output) throws IOException Writes a RuleSet in binary fashion to the given output.
public void readRuleSet(EvolutionState state, DataInput input) throws IOException Reads a RuleSet in binary fashion from the given input.
5.5.3 Rules and RuleConstraints
Each RuleSetConstraints object also contains the prototypical Rule for RuleSets adhering to that set of constraints. RuleSets will clone this Rule to create Rules to fill themselves with. ec.rule.Rule is an abstract superclass which doesn’t do anything by itself: you’re required to subclass it to make the Rule into the kind of thing you want to collect in your RuleSet.
The prototypical Rule is specified like this:
pop.subpop.0.species.ind.ruleset.0.rule = ec.app.MyRule
pop.subpop.0.species.ind.ruleset.1.rule = ec.app.MyOtherRule
You can get the prototypical rule like this:
RuleSetConstraints rsc = myRuleSet.ruleSetConstraints((RuleInitializer)(state.init));
Rule prototype = rsc.rulePrototype;
Each Rule has a flyweight-related RuleConstraints object, which is defined similarly to RuleSetConstraints (it’s also a Clique). For example, to create a single RuleConstraints in the clique, you might say:
rule.rc.size = 1
rule.rc.0 = ec.rule.RuleConstraints
rule.rc.0.name = rc1
RuleConstraints are essentially blank: they define no special parameters or variables. You can use them however you see fit. If you don’t really care, you can just make a single RuleConstraints object as above and assign it to your prototypical rules, such as:
pop.subpop.0.species.ind.ruleset.0.rule.constraints = rc1
pop.subpop.0.species.ind.ruleset.1.rule.constraints = rc1
...or use the default parameter base:
rule.rule.constraints = rc1
A Rule is abstract, and so has certain abstract methods which must be overridden, as well as others which ought to be overridden. First the required ones:
ec.rule.Rule Methods
public abstract int hashCode()
Returns a hash code for the rule, based on value, suitable for weeding out duplicates.
public abstract int compareTo(Object other)
Returns 0 if this Rule is identical in value to other (which will also be a Rule), -1 if this Rule is “less” than the other rule in sorting order, and 1 if the Rule is “greater” than the other rule in sorting order.
public abstract void reset(EvolutionState state, int thread) Randomizes the value of the rule.
Rules are Prototypes and so implement the clone(), setup(...), and defaultBase() methods. You’ll most likely need to override the clone() and setup(...) methods as usual. Additionally, you may want to override:
ec.rule.Rule Methods
public void mutate(EvolutionState state, int thread)
Mutates the Rule in some way and with some probability. The default implementation simply calls reset(...), which is probably much too harsh.
Then there are the standard printing and reading methods. You’ll need to override at least printRuleToStringForHumans(), and probably will want to override toString(). The others you can optionally override depending on the kind of experiments you’re doing.
ec.rule.Rule Methods
public String toString()
Writes the Rule to a String, typically in a fashion that can be read back in via readRule(...). You’ll want to override this method or printRuleToString(). You probably want to use the Code package to write the rule out. You only really need to implement this method if you expect to write Individuals to files that will be read back in later.
public String printRuleToStringForHumans()
Writes the Rule to a String in a fashion easy for humans to read. The default implementation of this method simply calls toString(). You’ll probably want to override this method.
public void printRuleForHumans(EvolutionState state, int log)
Writes a Rule to a log in a fashion easy for humans to read. The default implementation of this method calls printRuleToStringForHumans(), which you should probably override instead.
public String printRuleToString()
Writes the Rule to a String, typically in a fashion that can be read back in via readRule(...). The default implementation of this method simply calls toString(). You’ll want to override this method or toString(). You probably want to use the Code package to write the rule out. You only need to implement this method if you expect to write Individuals to files that will be read back in later.
public void printRule(EvolutionState state, int log)
Writes a Rule to a log in a fashion that can be read back in via readRule(...). The default implementation of this method calls printRuleToString(), which you should probably override instead.
public void printRule(EvolutionState state, PrintWriter writer)
Writes a Rule to a Writer in a fashion that can be read back in via readRule(...). The default implementation of this method calls printRuleToString(), which you should probably override instead.
public void readRule(EvolutionState state, LineNumberReader reader) throws IOException
Reads a Rule written by printRule(...) or printRuleToString(...), typically using the Code package. The default does nothing. You only need to implement this method if you expect to read Individuals from files.
public void writeRule(EvolutionState state, DataOutput output) throws IOException
Writes a Rule in binary fashion to the given output. The default does nothing. You only need to implement this method if you expect to read and write Rules over a network (such as the distributed evaluation or island models).
public void readRule(EvolutionState state, DataInput input) throws IOException
Reads a Rule in binary fashion from the given input. The default signals an error. You only need to implement this method if you expect to read and write Rules over a network (such as the distributed evaluation or island models).
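To make this concrete, here is a minimal sketch of a Rule subclass. The class name MyRule and its single threshold value are hypothetical; a real rule would probably also override toString() or printRuleToString() using the Code package, as described above.

package ec.app.myapp;

import ec.EvolutionState;
import ec.rule.Rule;

public class MyRule extends Rule {
    public double threshold;          // the rule's sole value, purely for illustration

    // Required: randomize the rule's value
    public void reset(EvolutionState state, int thread) {
        threshold = state.random[thread].nextDouble() * 10.0;
    }

    // Optional but recommended: a gentler mutation than the default (which just calls reset(...))
    public void mutate(EvolutionState state, int thread) {
        threshold += state.random[thread].nextGaussian() * 0.1;
    }

    // Required: a value-based hash code, suitable for weeding out duplicates
    public int hashCode() {
        return Double.valueOf(threshold).hashCode();
    }

    // Required: value-based ordering among Rules
    public int compareTo(Object other) {
        MyRule o = (MyRule) other;
        if (threshold < o.threshold) return -1;
        if (threshold > o.threshold) return 1;
        return 0;
    }

    // A human-readable rendition of the rule
    public String printRuleToStringForHumans() {
        return "threshold = " + threshold;
    }
}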
5.5.4 Initialization
Basic Initialization works as follows:
1. The RuleSpecies method newIndividual(EvolutionState, int) produces a RuleIndividual by calling super.newIndividual(...) — cloning a RuleIndividual prototype — and then calling reset(...) on the resultant RuleIndividual.
2. The RuleIndividual’s reset(...) by default just calls reset(...) on each of the RuleSets.
3. A RuleSet’s reset(...) method calls numRulesForReset(...) on the RuleSetConstraints to pick a random number of rules to generate (see Section 5.5.2). It then produces an array of that size and fills it with rules cloned from the RuleSetConstraint’s prototypical Rule. Then it calls reset(...) on each of the Rules.
4. You are responsible for implementing a Rule’s reset(...) method.
You can of course intervene and modify any of these methods as you see fit.
5.5.5 Mutation
As in the case in the ec.vector package, the ec.rule.breed.RuleMutationPipeline class doesn’t mutate rules directly, but rather calls a method on them to ask them to mutate themselves. The procedure is as follows:
1. The RuleMutationPipeline calls preprocessIndividual(...) on the RuleIndividual.
2. The RuleIndividual’s preprocessIndividual(...) method calls preprocessRules(...) on each of the RuleSets.
3. The RuleSet’s preprocessRules(...) method by default does nothing: override it as you like.
4. The RuleMutationPipeline then calls mutate(...) on the RuleIndividual.
5. The RuleIndividual’s mutate(...) method by default just calls mutate(...) on each of its RuleSets.
6. The RuleSet’s mutate(...) method does several modifications to the rules in the RuleSet, in this order:
(a) All the Rules in the RuleSet have mutate(...) called on them.
(b) A coin of a certain probability (p_del) is repeatedly flipped, and each time it comes up true, a rule is deleted at random using removeRandomRule(...). The individual will not shrink smaller than its specified minimum size.
(c) A coin of a certain probability (p_add) is repeatedly flipped, and each time it comes up true, a rule is added at random using addRandomRule(...). That method clones a new Rule from the prototypical Rule, then calls reset(...) on it. The individual will not grow larger than its specified maximum size.
(d) With a certain probability (p_randorder), the order of the rules is shuffled using randomizeRulesOrder(...).
The three probabilities (p_del, p_add, and p_randorder), and the minimum and maximum rule sizes, are discussed in Section 5.5.2, and are determined by RuleSetConstraints parameters, also discussed in that Section.
7. A Rule’s mutate(...) method by default simply calls reset(...), which is probably not what you want. You’ll probably want a much more subtle mutation, if any, and so will need to override this method.
8. You are responsible for implementing a Rule’s reset(...) method.
9. Finally, the RuleMutationPipeline calls postprocessIndividual(...) on the RuleIndividual.
10. The RuleIndividual’s postprocessIndividual(...) method calls postprocessRules(...) on each of the RuleSets.
11. The RuleSet’s postprocessRules(...) method by default does nothing: override it as you like.
Often rules need to be in a carefully-constructed dance of constraints to be valid in an Individual. The intent of the preprocessIndividual(...), postprocessIndividual(...), preprocessRules(...), and postprocessRules(...) methods is to give your RuleIndividual a chance to fix RuleSets that have been broken by crossover or mutation. The default implementations of these methods don’t do much:
ec.rule.RuleSet Methods
public void preprocessRules(EvolutionState state, int thread)
A hook called prior to mutation or crossover to prepare for possible breakage of Rules due to the mutation or crossover event. The default implementation does nothing.
public void postprocessRules(EvolutionState state, int thread)
A hook called after to mutation or crossover to fix possible breakage of Rules due to the mutation or crossover event. The default implementation does nothing.
ec.rule.RuleIndividual Methods
public void preprocessIndividual(EvolutionState state, int thread) Calls preprocessRules(...) on each RuleSet in the Individual.
public void postprocessIndividual(EvolutionState state, int thread) Calls postprocessRules(...) on each RuleSet in the Individual.
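As an example of these hooks, here is a minimal sketch of a RuleSet subclass whose postprocessRules(...) strips out duplicate rules after breeding. The class name MyRuleSet is hypothetical; you would specify it as a ruleset class in the parameters shown in Section 5.5.1.

package ec.app.myapp;

import ec.EvolutionState;
import ec.rule.RuleSet;

public class MyRuleSet extends RuleSet {
    public void postprocessRules(EvolutionState state, int thread) {
        // After breeding, remove any rule that duplicates an earlier rule.
        // Scan backwards so removals don't disturb indices we have yet to visit.
        for (int i = numRules() - 1; i >= 1; i--)
            for (int j = 0; j < i; j++)
                if (rules[i].compareTo(rules[j]) == 0) {   // identical in value
                    removeRule(i);                         // drop the later duplicate
                    break;
                }
    }
}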
5.5.6 Crossover
Unlike RuleMutationPipeline, the ec.rule.breed.RuleCrossoverPipeline performs direct crossover on two RuleIn- dividuals. Here is the procedure:
1. The RuleCrossoverPipeline calls preprocessIndividual(...) on each RuleIndividual.
2. The RuleIndividual’s preprocessIndividual(...) method calls preprocessRules(...) on each of the RuleSets.
3. The RuleSet’s preprocessRules(...) method by default does nothing: override it as you like.
4. For each pair of RuleSets, one per RuleIndividual...
(a) Each RuleSet A and B is split into two pieces, A1 and A2 (and B1 and B2) by calling splitIntoTwo(...)
(b) A new RuleSet A′ is formed from the union of A1 and B1, and likewise, a new RuleSet B′ is formed
from the union of A2 and B2.
(c) If A′ and B′ do not meet the minimum and maximum size constraints (see Section 5.5.2), go to (a) and try again.
(d) Else A′ and B′ replace A and B respectively in each RuleIndividual.
5. Finally, the RuleCrossoverPipeline calls postprocessIndividual(...) on each RuleIndividual.
6. The RuleIndividual’s postprocessIndividual(...) method calls postprocessRules(...) on each of the RuleSets.
7. The RuleSet’s postprocessRules(...) method by default does nothing: override it as you like.
RuleCrossoverPipeline has a few parameters which guide its operation. First, any given Rule will migrate from one Individual’s RuleSet to the other only with a certain probability. Second, the CrossoverPipeline can be set up to return only one child (tossing the second) rather than returning two. By default it returns both children. To set both of these parameters, let’s say that the RuleCrossoverPipeline is the root pipeline for the species. We’d say:
pop.subpop.0.species.pipe = ec.rule.breed.RuleCrossoverPipeline
pop.subpop.0.species.pipe.crossover-prob = 0.1
pop.subpop.0.species.pipe.toss = true
It doesn’t make any sense to have a rule crossover probability higher than 0.5. As usual, you could use the default parameter base as well:
rule.xover.crossover-prob = 0.1
rule.xover.toss = true
Chapter 6
Parallel Processes
ECJ has various built-in methods for parallelism, and they can be used in combination with one another:
• Multiple breeding and evaluation threads, already discussed in Section 2.4.
• Distributed evaluation: sending chunks of Individuals to remote computers to be evaluated. This is typically done in a generational fashion, but a variation of this is asynchronous evolution, in which Individuals are sent to multiple remote computers in a steady-state fashion. Additionally, remote computers can (given time) engage in a little evolutionary optimization of their own on the chunks they’ve received before sending them back. This is known as opportunistic evolution.1
• Island models: multiple parallel evolutionary processes occasionally send fit individuals to one another.2
6.1 Distributed Evaluation (The ec.eval Package)
Distributed Evaluation connects one master ECJ process with some N slave ECJ processes. The master handles the evolutionary loop, but when Individuals are evaluated, they are shipped off to the remote slaves to do this task. This way evaluation can be parallelized.
Distributed Evaluation is only useful if the amount of time you save by parallelizing evaluation exceeds the amount of time lost by shipping Individuals over the network (and sending at least Fitnesses back). Generally this means that evaluation needs to take a fair bit of time per Individual: perhaps several seconds.
There are two kinds of EvolutionState objects which can work with Distributed Evaluation:
• SimpleEvolutionState, which sends entire Populations off to be evaluated in parallel.
• SteadyStateEvolutionState, which sends individuals off to be evaluated one at a time, in a fashion called asynchronous evolution (discussed later).

6.1.1 The Master
To set up Distributed Evaluation, you first need to set up the Master. This is done just like a regular evolutionary computation process: but there are some additional parameters which must be defined. First,
1ECJ’s built-in distributed evaluation is meant for clusters. However, Parabon Inc. has developed a grid-computing version, called Origin, which runs on hundreds of thousands or even millions of machines. See the ECJ main website for more information.
2ECJ’s built-in island models are meant for clusters. However, a version of ECJ was ported to run on top of the DR-EA-M system, a peer-to-peer evolutionary computation network facility developed from a grant in Europe. See the ECJ main website for more information.
Figure 6.1 Layout of the Distributed Evaluation package.
we must define the master problem, nearly always as ec.eval.MasterProblem. The presence of the master problem turns on distributed evaluation:
eval.masterproblem = ec.eval.MasterProblem
The MasterProblem is the interface that connects the distributed evaluation system to a regular evolutionary computation loop. When it is defined, ECJ replaces the Problem prototype with a MasterProblem prototype. The original Problem doesn’t go away — it’s rehung as a variable in the MasterProblem. Specifically, you can get to it like this:
Problem originalProblem = ((MasterProblem)(state.evaluator.p_problem)).problem;
When your evolutionary computation process wishes to evaluate one or more individuals, it hands them to the MasterProblem, which it thinks is the Problem for the application. But the MasterProblem doesn’t send them to a clone of your underlying Problem but rather routes them to a Slave.
Slaves register themselves over the network with an object in the Master process called an ec.eval.SlaveMonitor, which maintains one ec.eval.SlaveConnection per Slave to communicate with the remote Slave. The SlaveMonitor listens in on a specific socket port for incoming Slaves to register themselves. You’ll need to define this port (to something over 2000). Here’s what’s standard:
eval.master.port = 15000
Your MasterProblem will submit Individuals to the SlaveMonitor, which will in turn direct them to one of the SlaveConnections. SlaveConnections don’t just ship off single Individuals to Slaves — that would be far too inefficient a use of network bandwidth. Instead, they often batch them up into jobs for the Slave to perform. Here’s how you define the size of a job:
eval.masterproblem.job-size = 10
The default is 1, which is safe but maximally inefficient. The idea is to make the job large enough to pack Individuals into a network packet without wasting space. If your Individuals are large, then a large job size
won’t have any efficiency benefit. If they’re very small, a larger job size will have a huge benefit.
You also need to keep the Slaves humming along, ideally by keeping the TCP/IP streams filled with waiting jobs queued up. Increasing this number can have a significant effect on performance. Here’s a
reasonable minimum:
eval.masterproblem.max-jobs-per-slave = 3
Again, the default is 1, which is safe but inefficient.
Warning! If you set these two parameters wrong, you may wind up sending all your individuals to just a few slaves, while the others sit by idly. Let’s say that the value N is equal to the floor of the number of individuals in a subpopulation divided by the number of slaves. For example, if you have 100 individuals and 45 slaves, then N = 2. In general, you never want your job-size to be greater than N. Furthermore, if your job-size is set to some value M, you never want your max-jobs-per-slave to be greater than ⌊N/M⌋. The safe (but potentially network-inefficient) settings are always 1 and 1 respectively.
ECJ would prefer to compress the network streams to make them more efficient. But it can’t do it without the jzlib/ZLIB library, which you must install separately3 (see the ECJ main webpage or http://www.jcraft.com/jzlib). Once it’s installed, you can turn it on like this:
eval.compression = true
Another warning! If you turn on compression in your master but not your slave (or vice versa), the slave will connect but then nothing will happen. You will not get any warning about your error.
Keep Your Master Single-Threaded Multi-threaded breeding is fine. But evaluation should be kept single-threaded, that is,
evalthreads = 1
Multi-threaded evaluation will probably work fine: but I’m not absolutely positive of it, and there’s no use to multithreading when all you’re doing is shipping jobs off-site. Play it safe.
6.1.2 Slaves
Slaves are started up on separate CPUs, often in different machines from the Master. You can have as many Slaves as you like: the more the merrier. The Slave class replaces ec.Evolve to handle its own startup, so you don’t start up a Slave using the standard ec.Evolve procedure. Instead you’d type:
java ec.eval.Slave -file slave.params -p param=value ... (etc.)
The slave parameters must include all the evolutionary parameters and also all the master parameters (in
fact, you might as well say something like...)
parent.0 = master.params
Slaves set themselves up with their own nearly complete EvolutionState and Evaluator objects — enough to evaluate Individuals and also perform evolution if necessary. A slave distinguishes itself by setting a special internal parameter: eval.i-am-slave = true. You don’t need to set this parameter — it’s set programmatically by ec.eval.Slave when it’s fired up. But you should be aware of it: it’s used by ec.Evaluator to determine whether to replace the Problem with the MasterProblem (it needs to know if the process
3Sure, Java has built-in compression routines. Unfortunately they’re entirely broken for network streams: they don’t support “partial flush”, a critical item for sending stuff across networks. They’re only really useful for compressing files on disks. It’s a long-standing Java bug, and unlikely to get fixed in the foreseeable future.
is a Master or a Slave, and since your Slave probably included the Master parameters — including the eval.masterproblem parameter — it looks confusingly like a Master). You could also use it yourself to determine if your code is presently being run on a Slave or not. This is occasionally useful:
ec.util.Parameter param = new ec.util.Parameter("eval.i-am-slave");
boolean amASlave = state.parameters.getBoolean(param, null, false);
The first thing a Slave needs to know is where the Master is so it can set itself up. This is done with something like this:
eval.master.host = 129.8.2.4
Remember that you’ll also need the port (among other Master parameters!)
Next the Slave needs to know whether it should return entire Individuals or just the Fitnesses of those Individuals. Individuals are generally much bigger than Fitnesses, and if you only return Fitnesses you can cut your network traffic almost in half. The problem is that in some custom experiments your fitness evaluation procedure might modify the Individual (it depends on the nature of your experiment), and so you’d need to return it in that case. You’ll need to state whether to return entire Individuals or not:
eval.return-inds = false
Slaves can come and go at any time dynamically. If new slaves show up, the Master will immediately start taking advantage of them. If a Slave disappears, the Individuals it was responsible for will be reassigned to another Slave.
Slaves can be multithreaded: they can process multiple individuals in parallel in the background as they come in. This is done with the standard parameter:
evalthreads = 4
This will create four threads on the Slave process to evaluate Individuals. Note that it makes no sense for this parameter to exceed the Master’s eval.masterproblem.job-size parameter. Also, if you exceed the actual number of CPUs or cores allocated to your Slave process, it doesn’t make much sense either. Last, Slaves are not multithreaded when evaluating Individuals with GroupedProblemForm (such as coevolved evaluation, see Section 7.1.2).
When you fire up a Slave, it will repeatedly try to connect to a Master process until it succeeds. This means you can start Slaves before you start the Master; or you can start them after you start the Master, it doesn’t matter. Ordinarily when the Master dies (it finishes the EC process, or it bombs, or whatever) the Slaves will all terminate. But you can also set up Slaves to run in a daemon mode: when the Master terminates the Slave resets itself and waits for another Master to come online. Slaves in daemon mode also reset themselves if an error occurs on the Slave: perhaps a Java error in the user’s fitness evaluation procedure, say, or if the virtual machine runs out of memory. In the case of out-of-memory errors, the Slave will make a good-faith attempt to reset itself, but if it cannot, it will quit.
By default slaves do not run in daemon mode: they are one-shot. To run in daemon mode, say:
eval.slave.one-shot = false
The default is “true” (that is, not daemon mode).
Warning If you make your slave multithreaded, and run it in daemon mode, and an exception occurs while the slave is doing multithreaded evaluation, then the slave may reset without cleaning up the other threads: they may continue in the background.
You can give your slave a name which it and the master will use when printing out connection information. It’s not at all necessary: you don’t have to, in which case it’ll make up a name consisting of its IP address and a unique number. To assign a name, you say:
eval.slave.name = MyName
If you would like to prevent the slave from writing anything to the screen, except in the case of a fatal error or exception, you can say:
eval.slave.silent = true
The default is “false”.
6.1.3 Opportunistic Evolution
Slaves have the option of doing some evolutionary computation of their own, a procedure known as opportunistic evolution [24]. The procedure works like this:
1. The Master sends the Slave a large Job.
2. The Slave evaluates the Individuals in the Job.
3. The Slave has a maximum allotted time to evaluate the Individuals. If it has not yet exceeded this time, it treats the Job as a Population and does some evolution on the Individuals in the Population.
4. When the time is up, the Slave returns the most recent Individuals in the Population in lieu of the original Individuals. The new Individuals replace the old ones in the Master’s evolutionary process. This means that the Slave cannot just return Fitnesses, but must return whole Individuals.
This procedure is turned on with:
eval.slave.run-evolve = true
You’ll also need to specify the amount of time (in milliseconds) allotted to the Slave. Here’s how you’d set it to six seconds:
eval.slave.runtime = 6000
Last, if you’re doing opportunistic evolution, you must return whole Individuals, not just Fitnesses. After all, you could have entirely different individuals after running on the Slave. Thus you’ll need to set:
eval.return-inds = true
The procedure for evolution is entirely specified by the Slave’s parameters just as if it were specified in a regular ECJ process, including breeding and evaluation threads, etc. There’s absolutely no reason the Slave can’t have its own evolutionary algorithm that’s different from the Master’s evolutionary algorithm — just specify it differently in the Slave’s parameters. The only thing that’d be required is that the Slave and the Master have exactly the same kinds of Individuals, Fitnesses, and Species in their Subpopulations.
Note that Opportunistic Evolution won’t work with coevolution or other procedures which require GroupedProblemForm (Section 7.1.2). Additionally, although Steady-State Evolution (via Asynchronous Evolution, see Section 6.1.4) can work with Opportunistic Evolution in theory, it’d be quite odd to do so.
6.1.4 Asynchronous Evolution
ECJ’s distributed evaluation procedure works intuitively with generational methods (such as ec.simple.SimpleEvolutionState) but it also works nicely with Steady-State evolution (ec.simple.SteadyStateEvolutionState). This procedure is called asynchronous evolution. See [24] for more information.
The procedure is similar to Steady State Evolution, as discussed in Section 4.2. The Population starts initially empty. Then the algorithm starts creating randomly-generated Individuals and shipping them off to
Figure 6.2 Top-Level Loop of ECJ’s SteadyStateEvolutionState class, used for simple steady-state EC and Asynchronous Evolution algorithms. “First?” means to perform the Statistics whenever the Subpopulation in question is picking an Individual to displace for the very first time. (Each Subpopulation will do it once, but possibly at different times). A repeat of Figure 4.2.
remote Slaves to be evaluated. If a Slave is available, the algorithm will generate an Individual for it to work on. When a Slave has finished evaluating an Individual and has returned it (or its Fitness), the Individual is then placed into the Population.
At some point the Population will fill up. At this point the algorithm shifts to “steady state” mode. When a Slave returns an Individual, and there’s no space in the Population, the algorithm makes room by marking an existing Individual for death and possibly replacing it with the newcomer, just like it’s done in Steady State Evolution (see Section 4.2 for details). And when a Slave becomes available, an Individual will no longer be created at random to give to it: rather, the Individual will be bred from the existing Population.
This procedure requires some careful consideration. First, note that at the point that the algorithm shifts to “steady state” mode, there are probably a large number of Individuals being evaluated on Slaves which were not bred from the Population but were created at random. Until those Individuals have made their way into the Population, we won’t be in a true “steady state”.
Second, Steady-State Evolution assumes the production of one Individual at a time: but distributed evaluation allows more than one individual per Job. This is reconciled as follows. When Steady-State Evolution starts up, it calls prepareToEvaluate(...) on the Problem (the MasterProblem) once. Thereafter whenever an individual is sent to the Problem to be evaluated, it calls evaluate(...). Recall from Section 3.4.1 that this process does not require the Problem to immediately assign Fitness — it can bulk up Individuals for evaluation and is only required to provide a Fitness on or prior to a call to finishEvaluating(...). However, Steady-State Evolution never calls finishEvaluating(...). As a result, the distributed evaluator is free to assess Individuals in any order and any way it likes, and to take as long as it likes to assign them a Fitness. The distributed evaluator will wait for up to job-size worth of calls to evaluate(...), then pack those Individuals together in one Job and ship them out to a remote Slave for evaluation. In “steady-state” mode, when the Individuals come back, they are placed in the Population, killing and replacing other individuals already there. Depending on the selection process for marking Individuals for death, it’s entirely possible that an Individual newly placed into the Population may be immediately marked for death and replaced with another Individual from the same Job! You can get around this by setting the job-size and max-jobs-per-slave to as low as 1:
eval.masterproblem.job-size = 1
eval.masterproblem.max-jobs-per-slave = 1
...but of course this will make the network utilization poor.
Rescheduling Lost Jobs If you’re doing generational evolution, then if a slave disappears you need the SlaveMonitor to reschedule the lost jobs, or else your population won’t be fully evaluated. But that’s not necessarily the case for Asynchronous Evolution. Here, if a slave disappears along with its jobs, well, that’s life. No big loss. Indeed, this is probably the desired behavior.
By default the SlaveMonitor reschedules its lost jobs. But you can turn off this behavior, so lost jobs just disappear into the ether, with:
eval.masterproblem.reschedule-lost-jobs = false
6.1.5 The MasterProblem
The MasterProblem is where much of the magic lies in the interface between ECJ and the distributed evaluator, so it’s worth mentioning some of its details.
Checkpointing and Synchronization To start, let’s discuss how MasterProblem handles checkpointing. Evaluators have three methods which we didn’t discuss in Section 3.4:
public void initializeContacts(EvolutionState state);
public void reinitializeContacts(EvolutionState state);
public void closeContacts(EvolutionState state, int result);
These methods are meant to assist in checkpointing with remote slaves. They in turn call similar methods in the prototypical Problem:
public void initializeContacts(EvolutionState state);
public void reinitializeContacts(EvolutionState state);
public void closeContacts(EvolutionState state, int result);
result will be one of EvolutionState.R_FAILURE (the most common case) or EvolutionState.R_SUCCESS (which only happens if the Evaluator in this process or some external process found the ideal individual).
The default implementation of these Problem methods does nothing at all. But in MasterProblem these methods are used to handle reconnection of Slaves after a checkpoint recovery. The first two methods both create a new SlaveMonitor. The final method shuts down the monitor cleanly.
Asynchronous Evolution MasterProblem also has special methods used only by Steady-State Evolution (and thus Asynchronous Evolution):
ec.eval.MasterProblem Methods
public boolean canEvaluate()
Returns true if a Slave is available to take an Individual.
public boolean evaluatedIndividualAvailable()
Returns true if a Slave has a completed Individual waiting to be introduced to the Population.
public QueueIndividual getNextEvaluatedIndividual()
Blocks until an Individual is available from a Slave. Then returns it as an ec.steadystate.QueueIndividual. A QueueIndividual is a very simple class which just contains the Individual and the Subpopulation that the Individual should be introduced into. It has the following instance variables:
public Individual ind;
public int subpop;
Batching up Jobs MasterProblem implements all the methods defined in SimpleProblemForm and GroupedProblemForm (Section 7.1.2). Additionally, MasterProblem overrides common Problem methods and handles them specially:
ec.eval.MasterProblem Methods
public void prepareToEvaluate(EvolutionState state, int threadnum)
Creates a new queue of Individuals waiting to be sent out for processing.
public void finishEvaluating(EvolutionState state, int threadnum)
Sends all Individuals presently in the queue out in one or more Jobs. Then waits for all slaves to complete evaluation of all Individuals.
Most Evaluators instruct Problems to evaluate a sequence of Individuals by first calling prepareToEvalu- ate(...), then repeatedly calling evaluate(...), then finally calling finishEvaluating(...). MasterProblem takes advantage of this as follows. When prepareToEvaluate(...) is called, MasterProblem creates a list in which to
put Individuals. evaluate(...) does not actually evaluate individuals: instead, individuals are simply added to the list. When enough individuals have been added to the queue to form one Job, the Job is queued up to be submitted to the next available Slave. When finishEvaluating(...) is called, any remaining individuals in the list are bundled together in a final Job and sent to a Slave. Then finishEvaluating(...) waits until all the Jobs have been completed before it returns.
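For reference, here is a schematic sketch of that calling pattern. It is not ECJ’s actual SimpleEvaluator code, and the helper method and variable names are hypothetical, but it shows the prepare/evaluate/finish sequence that MasterProblem exploits.

import ec.EvolutionState;
import ec.Problem;
import ec.Subpopulation;
import ec.simple.SimpleProblemForm;

public class EvaluationPatternSketch {
    // Hypothetical helper showing how an Evaluator typically drives a Problem
    static void evaluateSubpopulation(EvolutionState state, Subpopulation subpop,
                                      int subpopIndex, Problem problem, int threadnum) {
        problem.prepareToEvaluate(state, threadnum);          // MasterProblem: start a fresh queue
        for (int i = 0; i < subpop.individuals.length; i++)   // each call merely queues the Individual
            ((SimpleProblemForm) problem).evaluate(state, subpop.individuals[i], subpopIndex, threadnum);
        problem.finishEvaluating(state, threadnum);           // ship remaining Jobs and block until all return
    }
}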
When the GroupedProblemForm form of evaluate(...) is called (such as from a coevolutionary evaluator), the Individuals passed to that call are treated as a single Job and shipped off immediately — that is, there is no queue. However, evaluate(...) again does not wait for results to come back. The waiting again happens when finishEvaluating(...) is called.
There are certain Evaluators which do not call prepareToEvaluate(...) or finishEvaluating(...), but instead just repeatedly call evaluate(...). This happens with processes which cannot delay the evaluation of individuals in one evaluate(...) call until after the next evaluate(...) is called. For example, certain competitive coevolutionary processes may require serial evaluations of this kind. In this case, a call to evaluate(...) will cause MasterProblem to simply ship off the Individual or Individuals to evaluate as a single Job, wait for the result, and return it. Perhaps this isn't the best use of massive parallelism!
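To make the standard sequence concrete, here is a minimal sketch of the calling pattern that MasterProblem exploits. This is a hypothetical helper method for illustration, not ECJ's own Evaluator code; it assumes ec.* and ec.simple.* are imported.

void evaluateSubpopulation(EvolutionState state, int subpopIndex, int threadnum)
    {
    Subpopulation subpop = state.population.subpops[subpopIndex];
    Problem prob = state.evaluator.p_problem;                  // here, a MasterProblem
    SimpleProblemForm form = (SimpleProblemForm) prob;
    prob.prepareToEvaluate(state, threadnum);                  // MasterProblem: start a fresh queue
    for(int i = 0; i < subpop.individuals.length; i++)
        form.evaluate(state, subpop.individuals[i], subpopIndex, threadnum);   // just queues Individuals into Jobs
    prob.finishEvaluating(state, threadnum);                   // ship remaining Jobs, block until all are done
    }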
Sending Initialization Data from the Master to the Slaves MasterProblem has one additional feature: three methods you can override such that, whenever a Slave is started up, you are given the chance to send one-time data from the Master to the Slave, perhaps to initialize some variables you need your Slave to know. Note that this facility is untested and may be somewhat fragile: we suggest that before you use this facility you consider instead placing the data in a file that each Slave can read.
Here’s how it works. You begin by declaring a special subclass of MasterProblem—let’s call it ec.app.myapp.MyMasterProblem. Then you must tell ECJ that you’re using it (on both sides, master and slave) with something like this:
eval.masterproblem = ec.app.myapp.MyMasterProblem
An instance of this MasterProblem subclass is used by the Master. When a Slave shows up, the Master will call the method sendAdditionalData(...) on that instance to give you the chance to send data down the stream to the Slave before any evaluation occurs. The Slave receives this data by calling the method receiveAdditionalData(...) on a dummy instance of this same subclass. This dummy instance will not be used for any purpose except to handle the data reception. That method must store away the transferred data, typically as an instance variable which you have created in your subclass for this purpose. Finally, each time the Slave creates a new EvolutionState (and it may do so many times), it calls transferAdditionalData(...) on that dummy instance to give it a chance to modify and set up the EvolutionState appropriately to reflect the data it received from the Master.
sendAdditionalData(...) will be called once on the Master for each Slave that shows up. receiveAdditional- Data(...) will only be called once on the Slave side. transferAdditionalData(...) may be called multiple times on the Slave side, every time a new EvolutionState is loaded and set up. It is entirely up to you to handle the protocol for sending and receiving this data: if you mess up (write more data than you read on the other side, say), it’s undefined what’ll happen. Probably something bad.
ec.eval.MasterProblem Methods
public void sendAdditionalData(EvolutionState state, DataOutputStream dataOut)
Called from the SlaveMonitor’s accept() method to optionally send additional data to the Slave via the dataOut stream. By default it does nothing.
public void receiveAdditionalData(EvolutionState state, DataInputStream dataIn)
Called on a dummy MasterProblem by the Slave. You should use this method to store away received data from the dataIn stream for later transfer to the current EvolutionState via the transferAdditionalData(...) method. You should NOT expect this MasterProblem to be used by the Slave for evolution (though it might be). By default this method does nothing, which is the usual situation. The EvolutionState is provided solely for you to be able to output warnings and errors: do not rely on it for any other purpose (including accessing the random number generator or storing any data).
public void transferAdditionalData(EvolutionState state)
Called on a dummy MasterProblem by the Slave to transfer data previously loaded via receiveAdditionalData(...) to a running EvolutionState at the beginning of evolution. This method may be called multiple times if multiple EvolutionStates are created. By default this method does nothing, which is the usual situation. Unlike in receiveAdditionalData(...), the provided EvolutionState is “live” and you can set it up however you like.
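Here is a minimal sketch of such a subclass. The scalingFactor variable and the value being sent are made up for illustration; how you fold the received data into your application is entirely up to you.

package ec.app.myapp;
import ec.*;
import ec.eval.*;
import java.io.*;

public class MyMasterProblem extends MasterProblem
    {
    public double scalingFactor;    // hypothetical one-time datum sent from Master to Slave

    public void sendAdditionalData(EvolutionState state, DataOutputStream dataOut)
        {
        try { dataOut.writeDouble(2.5); }                       // Master side: send the value
        catch (IOException e) { state.output.fatal("Couldn't send data: " + e); }
        }

    public void receiveAdditionalData(EvolutionState state, DataInputStream dataIn)
        {
        try { scalingFactor = dataIn.readDouble(); }            // Slave side: stash it away
        catch (IOException e) { state.output.fatal("Couldn't receive data: " + e); }
        }

    public void transferAdditionalData(EvolutionState state)
        {
        // Slave side: push the stored value into the live EvolutionState however
        // your application needs. Here we just report it.
        state.output.message("Received scaling factor: " + scalingFactor);
        }
    }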
6.1.6 Noisy Distributed Problems
Sometimes your fitness function is noisy: the same individual will get different fitness values depending on some degree of randomness. An easy way to deal with this is to run the same individual for some N trials inside your Problem instance and to take the median or mean or whatnot, and use that as your fitness.
This works fine in a single-CPU setting, but in a parallel setting there’s a gotcha. Let’s say you have 100 parallel machines available to you but you have a population size of 20. You’d like to do 5 trials for each individual, and trials are not cheap. If you farmed out individuals using the distributed evaluator and a Problem subclass which did 5 trials internally, only 20 machines would be used, and they’d be expected to run 5 trials each. What you want to do is run one trial on each of the machines. But because of the way MasterProblem is set up, this won’t work.
ECJ has a hack that you can use in this situation. SimpleEvaluator has the parameters:
eval.num-tests = 5
# Other options are: median, best
eval.merge = mean
This tells the SimpleEvaluator to test each individual 5 times and set its fitness value to their mean. How does it do this? By temporarily modifying the Subpopulation.
If your Subpopulation presently has 20 individuals in it, and you want to test each 5 times, SimpleEvaluator will replace this Subpopulation with a copy which has 100 individuals in it. These 100 individuals are 5 clones each of the original 20. Then SimpleEvaluator evaluates this new population (which causes the distributed evaluation system to farm them out to potentially 100 machines).
After all the fitnesses have come back, SimpleEvaluator merges the fitnesses of the 5 clones together to form a single fitness and sets the original individual’s fitness to that merged value. The legal merge settings are mean, median, and best. Finally, SimpleEvaluator restores the original Subpopulation.
This means that if you have a custom Subpopulation subclass, it needs to deal gracefully with being copied and having its individuals array extended. In general this is probably the case but you should be warned.
Also, in this situation, the number of individuals being evaluated is smaller than the number of machines available. Thus you will need to turn off ECJ features which bulk up multiple individuals per job, and multiple jobs per slave. Set these to 1:
eval.masterproblem.job-size = 1
eval.masterproblem.max-jobs-per-slave = 1
A note about merging and multiobjective optimization. Multiobjective Fitness classes can do merging, but only using mean. The median and best options will throw exceptions. When a multiobjective fitness undergoes merging, it's essentially forming the centroid of the merged fitnesses. Merging is okay for subclasses like SPEA2 or NSGA-II's fitness classes because it's done before their auxiliary fitness information is computed.
There’s no equivalent to this hack in Asynchronous Evolution: you’ll just have to ask a machine to test the individual 5 times.
6.2 Island Models (The ec.exchange Package)
In addition to Distributed Evaluation, ECJ supports Island Models: separate ECJ processes ("islands") which connect over the network and occasionally hand highly-fit Individuals to one another. This facility is handled by an Exchanger called ec.exchange.IslandExchange which ships individuals off to other islands immediately before breeding, and immediately after breeding brings in new individuals provided to it by other islands. You'll run your separate islands as ordinary processes.
Most of the issues in Island models surround the particular topology being chosen. Which islands will send Individuals to which other islands? How many at a time? How often? Are Individuals sent synchronously or asynchronously? Etc. ECJ manages all topology and connection parameters via a special ECJ process called the island model server. Each island will connect to and register itself with the server. When the islands have all connected, the server will tell them which islands need to hook up to which other islands and how. After the islands have hooked up, they’re given the go-ahead to start their evolutionary processes. If the islands are acting synchronously, each generation will wait for the server to give them the go-ahead to continue to the next generation; this go-ahead only occurs after all islands have finished the previous generation. Finally, when an island discovers an optimal individual, it will signal the server to let the other islands know (so they can shut down).
As you can see, the server really does little more than tell the islands how to connect and acts as a referee. Thus it’s actually a very lightweight process. You can run the server either as its own process like this:
java ec.exchange.IslandExchange -file server.params -p param=value ... (etc.)
...or an ECJ island can also do double-duty, serving as the server as well. All islands, whether ordinary
islands or double-duty island-server combos, are fired up in the same standard ECJ fashion:
java ec.Evolve -file island.params -p param=value -p param=value ... (etc.)
...or (as usual):
java ec.Evolve -checkpoint myCheckpointFile.gz
A double-duty island-server combo would differ from a plain island solely in the parameters it defines:
it’d need server parameters in addition to client parameters.
Mixing Island Models, Threading, and Distributed Evaluation There’s absolutely no reason you can’t create an unholy union of Island Models and Distributed Evaluation. For example, it’s perfectly reasonable to have an Island Model where each island maintains its own pool of Slaves to do distributed evaluation. It’d be a lot of parameter files though! Island Models also work perfectly fine in a multithreaded environment.
6.2.1 Islands
You set up an ECJ process as an island by defining a special Exchange object for it:
exch = ec.exchange.IslandExchange
IslandExchange maintains the island’s mailbox. Prior to breeding, the IslandExchange procedure will send some fit individuals off to mailboxes of remote islands. The procedure for selecting Individuals is defined along these lines:
exch.select = ec.select.TournamentSelection
Obviously this selection procedure may require its own parameters, such as (in this example):
exch.select.size = 2
After breeding, the IslandExchange will empty its own mailbox and introduce into the Population all of the Individuals contained therein. These Individuals will displace some of the recently-bred Individuals, which never get a chance to be selected. (Life's not fair.) The selection method for picking the Individuals to die and be displaced is defined as:
exch.select-to-die = ec.select.RandomSelection
If this parameter isn’t defined, individuals are picked at random. Again, as this is a selection operator, it may have its own parameters.
An Island needs to know where the Server is so it can register itself, and the socket port on which the server is listening, for example:
exch.server-addr = 128.2.30.4
exch.server-port = 8999
When an Island registers itself with the Server, it’ll tell it two things. First, it’ll tell the Server the island’s name, a String which uniquely identifies the island (and by which the Server looks up topology information for the island). Second, it’ll tell the Server the socket port on which it’ll receive incoming Individuals from other islands. Let’s say that you’re creating an island called StatenIsland. You might specify the following:
exch.id = StatenIsland
exch.client-port = 9002
Note that the client socket port should be (1) higher than 2000 and (2) different from other client ports, and from the server port, if they're running on the same machine or in the same process. You'll also probably want to have compressed network streams. As was the case in Distributed Evaluation, this can't be done without the jzlib/ZLIB library, which you must install separately (see the ECJ main webpage or http://www.jcraft.com/jzlib). This is because Java's compression facilities are broken. Once this library is installed, you can turn compression on like this:
exch.compression = true
Be certain to give your island a unique random number seed different from other islands! Don't set the seed to time, since it's possible that two islands will have the same seed because they were launched within one millisecond of one another. I'd hard-set the seed on a per-island basis.
seed.0 = 5921623
If your islands share the same file system, you’ll want to make sure they don’t overwrite each other’s statistics files etc. To do this, for example, StatenIsland might change its statistics file name to:
stat.file = $statenisland.stat
If you have multiple Statistics files you’ll need to change all of them; the same goes for other files being written out. Also, if you are checkpointing, and your islands might overwrite each others’ checkpoint files, you need to change the checkpoint prefix on a per-island basis. For example:
checkpoint-prefix = statenisland
... or alternatively change the directory in which checkpoint files are written on a per-island basis:
checkpoint-directory = /tmp/statenisland/
Last, you can cut down on the verbosity of the islands by setting...
exch.chatty = false
6.2.2 The Server
The Server holds all the parameters for setting up the island topology. But first we must clue your ECJ process into realizing that it is a Server in the first place. This is done with:
exch.i-am-server = true
Next we need to state how many islands are in the island model graph:
exch.num-islands = 3
As discussed in Section 6.2.1, each island has a unique name (id). Here you will state which island in your graph has which id:
exch.island.0.id = StatenIsland
exch.island.1.id = ConeyIsland
exch.island.2.id = EllisIsland
Each island has some number of connections to other islands (the islands it’ll send Individuals, or migrants, to). In this example, we’ll say that StatenIsland sends migrants to ConeyIsland, which sends migrants to EllisIsland, which sends migrants to both StatenIsland and ConeyIsland:
exch.island.0.num-mig = 1
exch.island.0.mig.0 = ConeyIsland
exch.island.1.num-mig = 1
exch.island.1.mig.0 = EllisIsland
exch.island.2.num-mig = 2
exch.island.2.mig.0 = StatenIsland
exch.island.2.mig.1 = ConeyIsland
StatenIsland and ConeyIsland send 10 migrants to each of the islands they’re connected to. But we want EllisIsland to send 50 migrants to each of the (two) islands it’s connected to:
exch.island.0.size = 10
exch.island.1.size = 10
exch.island.2.size = 50
Alternatively you can use a default parameter base of sorts:
exch.size = 10
Each island has a maximum mailbox capacity: if there is no room, further immigrants will be dropped and disappear into the ether. You should make your mailbox big enough to accept immigrants at a reasonable rate, but not so large that in theory they could entirely overwhelm your population! I suggest a mailbox three or four times the expected number of immigrants. How about 100 or 200?
exch.island.0.mailbox-capacity = 200
exch.island.1.mailbox-capacity = 200
exch.island.2.mailbox-capacity = 200
Alternatively you can use a default parameter base of sorts:
exch.mailbox-capacity = 200
Last you'll need to stipulate two additional parameters on a per-island basis: the start-generation (the generation in which the island will start sending Individuals out) and modulus (how many generations the island will wait before it sends out another batch of Individuals). These are mostly set to maximize network utilization: perhaps you may wish the islands to send out individuals at different times so as not to clog your network, for example. Here we'll tell each island to send out individuals every three generations, but to start at different initial generations so as to be somewhat staggered:
exch.island.0.mod = 3
exch.island.1.mod = 3
exch.island.2.mod = 3
exch.island.0.start = 1
exch.island.1.start = 2
exch.island.2.start = 3
Alternatively you can use a default parameter base of sorts:
exch.mod = 3
exch.start = 2
6.2.2.1 Synchronicity
Island models can be either synchronous or asynchronous. In a synchronous island model, islands wait until they all have reached the next generation before sending immigrants to one another. In the asynchronous island model, islands go at their own pace and send immigrants whenever they feel like it. This means that one evolutionary process on one computer may run much faster than another one (good, because it doesn’t waste resources waiting for the other one to catch up) but it may overwhelm the other process with multiple generations of immigrants before the other process can get around to processing them (usually bad). Generally speaking asynchronicity is preferred — and is the default setting.
If for some reason you want to turn on synchronicity, you do this:
exch.sync = true
Note that the modulo and start-generation of islands result in predictable behavior for synchronous island models; but since asynchronous islands can go at their own pace, their modulo and start-generation happen whenever they happen for each island.
Note too that because asynchronous island models go at their own pace, and are subject to the whims of the speed of the operating system and the CPU time allotted to the process, there’s no way to guarantee replicability.
6.2.3 Internal Island Models
ECJ’s Internal Island Model facility simulates islands using separate Subpopulations: each Subpopulation is an island, and occasionally highly fit Individuals migrate from Subpopulation to Subpopulation. Like any other Exchanger, the Internal Island Model facility takes Individuals from other Subpopulations immediately
before Breeding, stores them, and then introduces them into their destination Subpopulations immediately after Breeding.
There are four important things to note about this facility:
• Obviously each Subpopulation must have identical Species and Individual and Fitness prototypes.
• Internal Island Models are always synchronous.
• Because they use the Subpopulation facility, Internal Island Models are incompatible with any other ECJ procedure which relies on Subpopulations: notably coevolution.
• Because they define an Exchanger, Internal Island Models are incompatible with any other ECJ procedure which uses an Exchanger: in particular, you can’t mix Internal Island Models with regular Island Models!
Why would you use Internal Island Models? I think mostly for academic purposes: to study and simulate synchronous Island Models without having to rope together a bunch of machines. You could also use Internal Island Models to run N evolutionary processes in parallel — just set the number of immigrants to zero.
Internal Island Models depend solely on a specific Exchanger, ec.exchange.InterPopulationExchange. To build an Internal Island Model, you first define your subpopulations and their species, individuals, breeding pipelines, the whole works, using a standard generational algorithm. Then you define the exchanger:
exch = ec.exchange.InterPopulationExchange
Let’s say you have four Subpopulations acting as islands. You’ll first need to stipulate the Selection Method used to select individuals to migrate to other Subpopulations and the Selection Method used to kill Individuals to make way for incoming immigrants:
exch.subpop.0.select = ec.select.TournamentSelection
exch.subpop.1.select = ec.select.TournamentSelection
exch.subpop.2.select = ec.select.TournamentSelection
exch.subpop.3.select = ec.select.TournamentSelection
exch.subpop.0.select-to-die = ec.select.RandomSelection
exch.subpop.1.select-to-die = ec.select.RandomSelection
exch.subpop.2.select-to-die = ec.select.RandomSelection
exch.subpop.3.select-to-die = ec.select.RandomSelection
If you don’t define a selection method for death, ECJ will assume you mean to select individuals randomly. Alternatively you can use the default parameter base:
exch.select = ec.select.TournamentSelection
exch.select-to-die = ec.select.RandomSelection
Remember that these selection operators may have their own parameters. For example, we may wish to say (for some reason):
exch.subpop.0.select.size = 2
exch.subpop.1.select.size = 2
exch.subpop.2.select.size = 7
exch.subpop.3.select.size = 7
Otherwise the selection operators will rely on their default parameters. Note: recall that these default parameters are not based on the Exchanger's default parameter base. That is, they're select.tournament.size and not exch.select.size. I strongly suggest using non-default parameters.
Next you need to state the number of immigrants sent at a time; the first generation in which they’ll be sent; and the modulus (the interval, in terms of generations, between successive migrations). These are basically the same as the standard Island Model. The parameters might look like this:
exch.subpop.0.size = 5
exch.subpop.1.size = 5
exch.subpop.2.size = 5
exch.subpop.3.size = 15
exch.subpop.0.start = 1
exch.subpop.1.start = 2
exch.subpop.2.start = 3
exch.subpop.3.start = 4
exch.subpop.0.mod = 8
exch.subpop.1.mod = 8
exch.subpop.2.mod = 8
exch.subpop.3.mod = 8
It’s here where you could convert these to separate independent evolutionary processes: just set the size parameter to 0 for all subpopulations. Anyway, you can also use default parameter bases for these:
exch.size = 5
exch.start = 1
exch.mod = 8
Now we need to define the topology. For each island we’ll define the number of Subpopulations it sends migrants to, and then which ones. Imagine if Subpopulation 2 sent migrants to everyone else, but the other Subpopulations just sent migrants to Subpopulation 2. We would define it like this:
exch.subpop.0.num-dest = 1
exch.subpop.0.dest.0 = 2
exch.subpop.1.num-dest = 1
exch.subpop.1.dest.0 = 2
exch.subpop.2.num-dest = 3
exch.subpop.2.dest.0 = 0
exch.subpop.2.dest.1 = 1
exch.subpop.2.dest.2 = 3
exch.subpop.3.num-dest = 1
exch.subpop.3.dest.0 = 2
Last, Internal Island Models tend to be verbose. To make them less chatty, you can say:
exch.chatty = false
Reminder It’s laborious to write a zillion parameters if you have large numbers of subpopulations in your internal island model experiments. You can greatly simplify this process using the pop.default-subpop parameter (see Section 3.2.1).
6.2.4 The Exchanger
In Section 3.6 we talked about various basic Exchanger methods. But three were not discussed, mostly because they’re only used for Island Models (not even Internal Island Models). They are:
public void initializeContacts(EvolutionState state);
public void reinitializeContacts(EvolutionState state);
public void closeContacts(EvolutionState state, int result);
If these look similar to the methods in Section 6.1.5, it's with good reason. Their function is to set up networking connections, re-establish networking connections after restarting from a checkpoint, and shut down networking connections in a clean way. IslandExchange implements them but not InterPopulationExchange.
Additionally, prior to migrating an Individual to another island, Exchangers typically call a hook you can override to modify that Individual or replace it with some other Individual:
protected Individual process(EvolutionState state, int thread, String island,
int subpop, Individual ind);
There are various reasons you might do this: perhaps the islands have different representations for their individuals or fitness classes, for example, requiring a conversion before you send migrants off.
This method has two parameters which require explanation. The island is the id of the island that will receive the Individual, or null if there is no such island (as is the case for InterPopulationExchange). The subpop is the destination subpopulation of the Individual (particularly important in InterPopulationExchange).
To assist you in your processing, IslandExchange provides an additional method which is useful to call inside the process method:
public int getIslandIndex(EvolutionState state, String island);
This method returns the index in the parameter database of the island referred to by id, or IslandExchange.ISLAND_INDEX_LOOKUP_FAILED.
The purpose of this method is to allow islands to look up information, typically stored in a parameter database, about the islands they're about to send migrants to, so as to convert individuals in a way appropriate to those islands, perhaps. To take advantage of this method, you'd need to make sure that the server's island exchange parameters are also in each client's database. For example, let's say that your id is
“GilligansIsland”, and in the parameter database (which again the client must have) we have the following parameters in the server database file:
exch.num-islands = 8
...
exch.island.1.id = GilligansIsland
exch.island.1.num-mig = 3
exch.island.1.mig.0 = SurvivorIsland
exch.island.1.mig.1 = EllisIsland
exch.island.1.mig.2 = FantasyIsland
exch.island.1.size = 4
exch.island.1.mod = 4
exch.island.1.start = 2
exch.island.1.mailbox-capacity = 20
# this is just made up
exch.island.1.is-gp-island = true
...
(Note the last nonstandard parameter, which I made up for purposes of this example.) We begin by also including these parameters in each island's database file. We are about to send some migrants to GilligansIsland (the name passed into process(...)). To look up information about this island, we can use getIslandIndex(...) to discover that GilligansIsland is island number 1 in our parameter database. This makes it easy for us to look up other information we've stashed there to tell us how we should process Individuals headed for that island:
protected Individual process(EvolutionState state, int thread, String island, int subpop, Individual ind)
    {
    int index = getIslandIndex(state, island);
    if (index == ISLAND_INDEX_LOOKUP_FAILED)  // uh oh
        state.output.fatal("Missing island index for " + island);
    Parameter param = new Parameter("exch.island." + index + ".is-gp-island");
    if (!state.parameters.exists(param))  // uh oh
        state.output.fatal("Missing parameter for island!", param, null);
    boolean isGPIsland = state.parameters.getBoolean(param, null, false);
    if (isGPIsland)
        {
        // process the individual in some way because it's going to a "GP Island" or whatever
        }
    return ind;
    }
This method isn’t necessary (nor provided) for InterPopulationExchange since it doesn’t really do “islands” per se so much as exchanges between subpopulations. In this case, the process(...) method has told us what subpopulation we’ll be sending the individual to and we have direct access to it. For example, we might do this:
protected Individual process(EvolutionState state, int thread, String island, int subpop, Individual ind)
    {
    if (state.population.subpops[subpop].species instanceof ec.gp.GPSpecies)
        {
        // convert to a GP individual, or whatever
        }
    return ind;
    }
Chapter 7
Additional Evolutionary Algorithms
7.1 Coevolution (The ec.coevolve Package)
The coevolution package is meant to provide support for three kinds of Coevolution:
• One-Population Competitive Coevolution
• Two-Population Competitive Coevolution
• N-Population Cooperative Coevolution
Coevolution differs from ordinary evolutionary methods largely in how evaluation is handled (and of course, by the fact that there are often multiple subpopulations). Thus the classes in this package are basically Problems and Evaluators. The first form of Coevolution is provided by the ec.coevolve.CompetitiveEvaluator class. The last two are made possible by the ec.coevolve.MultiPopCoevolutionaryEvaluator class.
7.1.1 Coevolutionary Fitness
Coevolution is distinguished by its evaluation of Individuals not separately but in groups, where the Fitness of an Individual depends on its performance in the context of other Individuals (either competing with them or working with them towards a common goal). For this reason, coevolution typically involves evaluating an Individual multiple times, each time with a different set of other Individuals, and then computing the Fitness based on these multiple trials.
To assist in this procedure, the ec.Fitness class has two auxiliary variables:
public ArrayList trials = null;
public Individual[] context = null;
The first variable (trials) is available for you to maintain the results of each trial performed. Later on you will be asked to compute the final fitness of an individual, and at this point you can use this variable to do the final calculation. The variable is initially null, and after fitness is assessed it will be reset to null again. You are free to use this variable as you like (or ignore it entirely). Most commonly you'd store each trial in the trials variable as a java.lang.Double, with higher values being considered better. You're free to do something else, but if you do, be sure to read Section 7.1.5 first. Even if your trials are not java.lang.Double, they must be immutable: their internals cannot be modified, so that they can be pointer-copied rather than cloned.
The second variable (context) is available for you to maintain the context of the best trial discovered for the Individual. It's typically only useful when you're doing Cooperative Coevolution, where it's important to retain not only the performance of the Individual (his Fitness), but also which collaborating Individuals made it possible for him to achieve that Fitness. Again, this is an optional variable, though often useful. If context is
used, it’s assumed that the slots of context hold each collaborator, except for the Individual himself, whose slot is set to null. Section 7.1.4.2 discusses the issue of context in more detail.
7.1.2 Grouped Problems
Since Coevolution involves evaluating multiple Individuals at once, it will require a new kind of Problem which takes multiple Individuals at a time. This Problem Form is defined by ec.coevolve.GroupedProblemForm. Evaluation in coevolution involves multiple trials, along these lines:
1. Performance scores of Individuals are cleared.
2. Individuals are tested against each other in various matches (or with one another in various collabo- rative problems). These trials cause trial performance scores of the Individuals to accumulate. If the trials are cooperative, the best trial found for a given Individual (its context) is maintained.
3. The final Fitnesses of the Individuals are set based on the performance scores over all the trials.
It’s up to you to store the trial results and eventually form them into final Fitness values, as discussed later. It’s also up to you to maintain the best context if you find this useful, as discussed later as well (in Section 7.1.4.2). GroupedProblemForm will help you by defining three methods to do these various portions of the evaluation process. You’ll need to implement all three methods:
ec.coevolve.GroupedProblemForm Methods
public abstract void preprocessPopulation(EvolutionState state, Population pop, boolean[] prepareForAssessment, boolean countVictoriesOnly)
Called prior to the evaluation of a Population, mostly to clear the trials of Individuals. Only clear the trials for Individuals in Subpopulations for which prepareForAssessment is true. Note that although this method is not static, you should not assume that this method will be called on the same Problem as is used later for evaluate(...). Thus don’t use this method to set any instance variables in the Problem for later use. If countVictoriesOnly is true, the method being used is SingleEliminationTournament. Commonly you’ll use this method to create a brand-new trials ArrayList for every Fitness of every Individual in every Subpopulation.
public abstract void evaluate(EvolutionState state, Individual[] individuals, boolean[] updateFitness, boolean countVictoriesOnly, int[] subpops, int threadnum)
Evaluates the individuals in a single trial, setting their performance scores for that trial. Each individual will be from a certain subpopulation, specified in subpops. In some versions of coevolution, only certain individuals are supposed to have their performance scores updated (the others are acting as foils). In this case, the relevant individuals will be indicated in the updateFitness array. Typically you’ll update fitness by adding trial results to the trials ArrayList. For any individual for whom you’re updating trials, also set the Fitness value to reflect that one trial: this allows Single Elimination Tournament to compare Individuals based on this notional Fitness value. In doing so, do not set the evaluated flag for Individuals.
If you’re doing cooperative coevolution, in this method you’ll also probably want to maintain the context of the trial (the collaborating Individuals) if it’s produced the best results so far. More on that in Section 7.1.4.2.
public abstract void postprocessPopulation(EvolutionState state, Population pop, boolean[] assessFitness, boolean countVictoriesOnly)
Called after evaluation of a Population to form final Fitness scores for the individuals based on the various performance scores they accumulated during trials; and then to set their evaluated flags to true. Only assess the Fitness and set the evaluated flags for Individuals in Subpopulations for which assessFitness is true. Note that although this method is not static, you should not assume that this method will be called on the same Problem as was used earlier for evaluate(...). You’ll probably want to set the trials variable to null to let it garbage collect.
Do not assume that Individuals will have the same number of trials: in several versions of Coevolution this will not be the case.
You’re going to have to do some work to make sure that your Fitnesses are properly updated, and this depends on the kind of Coevolution you choose to do. So how do you keep track of trials? If you’re just accumulating scores or wins, then you could use SimpleFitness and just increment it with each new win. But since different Individuals may have different numbers of trials, it’s possible that you may need to keep track of the number of trials, or keep track of each trial result separately. Probably the best approach is to use the auxiliary variable trials found in ec.Fitness to store all your trials in an ArrayList. Thus you typically use GroupedProblemForm like this:
1. In preprocessPopulation(...), set the Fitness’s trials to a new ArrayList for all Individuals.
2. In evaluate(...), add to trials the results of this trial for each Individual for whom updateFitness is set. Then set the Fitness to reflect just the immediate results of the trial (particularly if countVictoriesOnly is set). This allows Single Elimination Tournament — if you’re using that — to determine who should advance to the next round. Make certain that whatever you add to trials is java.io.Serializable.
If you’re doing cooperative coevolution, determine if the new trial is superior to all the existing trials previously found in the trials array. If it is, set the context of this trial using Fitness.setContext(...).
3. In postprocessPopulation(...), set the Fitness to the final result. For example, you might set it to the average or the maximum of the various trials. Finally, set the Fitness’s trials to null to let it GC.
Notice that unlike in SimpleProblemForm (Section 3.4.1) there’s no describe(...) method. This is because to describe an Individual, you’d need to do so in the context of other Individuals. So we left it out.
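For instance, following the recipe above, a postprocessPopulation(...) which averages java.lang.Double trial values into a SimpleFitness might be sketched like this. This is only an illustration (it assumes every Individual received at least one trial), not code from ECJ's distribution.

public void postprocessPopulation(EvolutionState state, Population pop,
    boolean[] assessFitness, boolean countVictoriesOnly)
    {
    for(int i = 0; i < pop.subpops.length; i++)
        if (assessFitness[i])
            for(int j = 0; j < pop.subpops[i].individuals.length; j++)
                {
                SimpleFitness fit = (SimpleFitness)(pop.subpops[i].individuals[j].fitness);
                double sum = 0;
                for(int k = 0; k < fit.trials.size(); k++)
                    sum += ((Double)(fit.trials.get(k))).doubleValue();
                fit.setFitness(state, (float)(sum / fit.trials.size()), false);  // mean over trials
                fit.trials = null;                               // let the trials garbage collect
                pop.subpops[i].individuals[j].evaluated = true;
                }
    }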
Example Assume we’ve replaced the Fitness with our MyCoevolutionaryFitness class. Let’s create a Problem similar to the example in Section 4.1.1. In our Problem we take two Individuals, and the trial performance of an Individual is his vector values’ product minus that of his opponent. Obviously this is a stupid example since Individuals are in a total ordering Fitness-wise: so it hardly illustrates the issues in Coevolution. But it’ll suffice for the demonstration here. What we’ll do is set an Individual’s fitness to his average score over the trials performed.
package ec.app.myapp;
import ec.*;
import ec.simple.*;
import ec.vector.*;
import ec.coevolve.*;
import java.util.*;    // needed for ArrayList
public class MyCoevolutionaryProblem extends Problem implements GroupedProblemForm {
public void preprocessPopulation(EvolutionState state, Population pop,
boolean[] prepareForAssessment, boolean countVictoriesOnly) {
for(int i = 0; i < pop.subpops.length; i++)
if (prepareForAssessment[i])
for(int j = 0; j < pop.subpops[i].individuals.length; j++) {
SimpleFitness fit = (SimpleFitness)(pop.subpops[i].individuals[j].fitness);
fit.trials = new ArrayList();
}
}
public void evaluate(EvolutionState state, Individual[] ind, boolean[] updateFitness,
boolean countVictoriesOnly, int[] subpops, int threadnum) {
int[] genome1 = ((IntegerVectorIndividual)ind[0]).genome;
int[] genome2 = ((IntegerVectorIndividual)ind[1]).genome;
double product1 = 1.0;
double product2 = 1.0;
for(int x=0; x
int targetY = target % 10;
int minY = targetY - distance < 0 ? 0 : targetY - distance;
int maxY = targetY + distance > 9 ? 9 : targetY + distance;
7This is what’s done in MuCommaLambdaBreeder (Section 4.1.2) to enable the magic of ESSelection always picking a certain individual.
8I didn’t say it’d be useful: just an easy demonstration!
int x = minX + state.random[thread].nextInt(maxX - minX + 1);
int y = minY + state.random[thread].nextInt(maxY - minY + 1);
return x * 10 + y;
}
}
7.2.2 Spatial Breeding
To breed, you can use the Space implementation ECJ has already provided for you: a 1D Space. ec.spatial.Spatial1DSubpopulation can be set up to be either toroidal (the default) or non-toroidal. To use it as toroidal, you might say:
pop.subpop.0 = ec.spatial.Spatial1DSubpopulation
pop.subpop.0.toroidal = true
You'll also need a selection procedure which understands how to pick using Spaces. ECJ provides one such procedure: ec.spatial.SpatialTournamentSelection. This selection operator picks its tournament members randomly from the Subpopulation, but constrained so that they're within a certain distance of the target index. That is: it uses getIndexRandomNeighbor(...) to pick them.
SpatialTournamentSelection has three parameter options above and beyond the standard TournamentSelection parameters. Let's say that SpatialTournamentSelection is our pipeline's first source:
pop.subpop.0.species.pipe.source.0 = ec.spatial.SpatialTournamentSelection
pop.subpop.0.species.pipe.source.0.size = 2
pop.subpop.0.species.pipe.source.0.pick-worst = false
pop.subpop.0.species.pipe.source.0.neighborhood-size = 3
pop.subpop.0.species.pipe.source.0.ind-competes = false
pop.subpop.0.species.pipe.source.0.type = uniform
The size parameter should be obvious: it's TournamentSelection's tournament size. Likewise the pick-worst parameter determines whether we're picking the fittest or the least fit of the tournament. The remaining three parameters are as follows. neighborhood-size defines the distance from the target index. If ind-competes is true, then at least one member of the tournament is guaranteed to be the target Individual itself. Last, if type is uniform, then an individual is picked simply using getIndexRandomNeighbor(...). However you also have the option of doing a random walk through the space. The walk will be neighborhood-size steps long, and each step will move to a neighbor immediately bordering your current position (that is, one chosen by passing 1 as the distance to getIndexRandomNeighbor(...)). Random walks of this kind approximate a Gaussian distribution centered on the target Individual. You can choose to do a random walk rather than uniform selection with:
pop.subpop.0.species.pipe.source.0.type = random-walk
You can of course use the default parameters for this kind of stuff. Note that since the default parameter base is different, you need to specify all the standard Tournament Selection stuff under this default base, not under Tournament Selection’s own default base (select.tournament):
spatial.tournament.size = 2
spatial.tournament.pick-worst = false
spatial.tournament.neighborhood-size = 3
spatial.tournament.ind-competes = false
spatial.tournament.type = uniform
In order for SpatialTournamentSelection to work, you'll have to have the target index set. This is the job of ec.spatial.SpatialBreeder, a simple extension of SimpleBreeder. The procedure works like this. For each Individual i in the new Subpopulation to fill:
1. SpatialBreeder sets the target index to i.
2. SpatialBreeder requests one Individual from its Pipeline.
3. SpatialTournamentSelection methods in that Pipeline use this index to compute distances and pick Individuals
Warning: Note that although SpatialBreeder is enumerating over the new Subpopulation, the SpatialTournamentSelection operator is using the target index to pick Individuals from the old Subpopulation. This means that Subpopulations must not change in size from generation to generation.
SpatialBreeder is very simple: there are no parameters to set up at all.
7.2.3 Coevolutionary Spatial Evaluation
The spatial package also enables one of many9 possible approaches to doing coevolution in a spatially embedded Subpopulation. Recall from Section 7.1.4 that the ec.coevolve.MultiPopCoevolutionaryEvaluator class performs multi-population coevolution: each Individual is tested N times, each time by grouping it with one Individual from each of the other Subpopulations. These other Individuals are called collaborators. The ec.spatial.SpatialMultiPopCoevolutionaryEvaluator class extends this by allowing you to pick those collaborators in a spatial manner: their locations in their respective Subpopulations are near to the position of the target Individual in his Subpopulation:
eval = ec.spatial.SpatialMultiPopCoevolutionaryEvaluator
The way this is done is by using the SpatialTournamentSelection class to pick collaborators, based on the target Individual's index. Two kinds of collaborators can be picked with a Selection Method in MultiPopCoevolutionaryEvaluator: members of the current Population, and members of the previous generation's Population. You need to be careful with the first situation, since the members don't have their Fitnesses set, and so you can't use a Selection Method which is based on Fitness. In the examples in Section 7.1.4, we used RandomSelection. But for spatial coevolution, we can do a "random" selection which simply picks a collaborator based on the neighborhood around the target Individual. We do this by setting the size parameter in SpatialTournamentSelection — the tournament size — to 1:
eval.subpop.0.select-current = ec.spatial.SpatialTournamentSelection
# It’s important that the size be 1 —
# this causes ’random’ selection, so Fitness is not considered
eval.subpop.0.select-current.size = 1
eval.subpop.0.select-current.pick-worst = false
eval.subpop.0.select-current.neighborhood-size = 3
# It’s also important that this value be false —
# otherwise the target index will always be selected
eval.subpop.0.select-current.ind-competes = false
eval.subpop.0.select-current.type = uniform
eval.subpop.1.select-current = ec.spatial.SpatialTournamentSelection
eval.subpop.1.select-current.size = 1
eval.subpop.1.select-current.pick-worst = false
eval.subpop.1.select-current.neighborhood-size = 3
eval.subpop.1.select-current.ind-competes = false
eval.subpop.1.select-current.type = random-walk
9The other obvious approach would be a variant of ec.coevolve.CompetitiveEvaluator where competitors to test Individuals in a population are selected based on nearness to the Individual. Perhaps we might put this together one day.
You can also select collaborators from the previous generation based on spatial distance. In this case, you can have the Selection Method pick based on Fitness too (though you don’t have to if you don’t want to). For example:
eval.subpop.0.select-prev = ec.spatial.SpatialTournamentSelection
eval.subpop.0.select-prev.size = 2
eval.subpop.0.select-prev.pick-worst = false
eval.subpop.0.select-prev.neighborhood-size = 3
eval.subpop.0.select-prev.ind-competes = false
eval.subpop.0.select-prev.type = uniform
eval.subpop.1.select-prev = ec.spatial.SpatialTournamentSelection
eval.subpop.1.select-prev.size = 7
eval.subpop.1.select-prev.pick-worst = false
eval.subpop.1.select-prev.neighborhood-size = 4
eval.subpop.1.select-prev.ind-competes = true
eval.subpop.1.select-prev.type = random-walk
The remaining parameters are basically just like those in MultiPopCoevolutionaryEvaluator (see Section 7.1.4), for example:
eval.num-current = 4
eval.num-prev = 6
eval.subpop.0.num-elites = 5
Warning: Just like was the case for Spatial Breeding, Spatial Multi-Population Coevolution requires that Subpopulations not change in size from generation to generation.
7.3 Particle Swarm Optimization (The ec.pso Package)
The ec.pso package provides a basic framework for Particle Swarm Optimization (PSO).
PSO differs from most other population-based optimization methods in that the individuals never die, never are selected, and never breed. Instead they just undergo a directed mutation influenced by personal
best results, neighborhood (“informant”) best results, and global best results.
ECJ’s PSO algorithm works like this. Each individual is an instance of the class ec.pso.Particle, which is
itself a subclass of ec.vector.DoubleVectorIndividual. Because it's a DoubleVectorIndividual, you should use the associated species ec.vector.FloatVectorSpecies.
Particles hold various information:
• The particle’s (real-valued) genome (or, in PSO parlance, its “location”).
• The particle’s velocity. This is equal to the Particle’s present genome, minus the genome it had last generation.
• The best genome the particle has ever had (its personal best)
• The fitness the particle had when it held its personal best genome.
• The best genome ever discovered from among the neighbors or informants of the particle.
• The fitness assigned to the best-of-neighbors genome
• The indexes into the subpopulation array of the members of the subpopulation comprising the neighbors (or informants) of the particle.
The genome is stored in the DoubleVectorIndividual superclass. Other items are stored like this:
// my velocity
public double[] velocity ;
// the individuals in my neighborhood
public int[] neighborhood = null ;
// the best genome and fitness members of my neighborhood ever achieved
public double[] neighborhoodBestGenome = null;
public Fitness neighborhoodBestFitness = null;
// the best genome and fitness *I* personally ever achieved
public double[] personalBestGenome = null;
public Fitness personalBestFitness = null;
Notice that some values are initially null, and will be set when the particle discovers proper values for them. The velocity is at present initially all zero.
PSO has a special breeder called ec.pso.PSOBreeder. This breeder holds the global best genome ever discovered by the subpopulation, and its associated fitness:
public double[][] globalBest = null ; // one for each subpopulation
public Fitness[] globalBestFitness = null;
Note that PSOBreeder is single-threaded for now, though that may change in the future.
Reading and Writing Particles have a lot of data, and this impacts on their ability to read and write to files,
streams, etc. Some notes:
• If you are writing a Particle for human consumption (using printIndividualForHumans()), it will write out just like a DoubleVectorIndividual: no velocity information, neighborhood information, personal best information, etc.
• If you are writing a Particle meant to be read by a computer (using printIndividual() or writeIndividual()), it will write out the DoubleVectorIndividual information, followed by all the auxiliary information except for global best information, which is stored in PSOBreeder and not Particle.
Updating Each timestep, all particles simultaneously compute three vectors: the difference between the particle’s personal best and the particle’s genome (that is, a vector towards the personal best), the difference between the particle’s neighborhood best and the particle’s genome, and finally the difference between the global best and the particle’s genome. These vectors essentially act to move the particle towards the personal best, neighborhood best, and global best.
Then the particle updates its velocity as the weighted sum of its old velocity and these three vectors. The four weights are global values and are defined in PSOBreeder as:
public double velCoeff;          // coefficient for the velocity
public double personalCoeff;     // coefficient for self
public double informantCoeff;    // coefficient for informants/neighbors
public double globalCoeff;       // coefficient for global best
Finally, each particle updates its location (genome) as the sum of the old genome and the new velocity, effectively moving the particle “towards” these respective values.
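To make the update concrete, here is a per-component sketch of the computation just described. It is only a sketch of the weighted sum as stated above, not PSOBreeder's literal code, which may differ in details; here p is one Particle, subpop is its subpopulation index, and the coefficients and globalBest are the PSOBreeder fields shown earlier.

for(int i = 0; i < p.genome.length; i++)
    {
    double towardPersonal  = p.personalBestGenome[i]     - p.genome[i];
    double towardInformant = p.neighborhoodBestGenome[i] - p.genome[i];
    double towardGlobal    = globalBest[subpop][i]       - p.genome[i];

    p.velocity[i] = velCoeff * p.velocity[i]
                  + personalCoeff * towardPersonal
                  + informantCoeff * towardInformant
                  + globalCoeff * towardGlobal;

    p.genome[i] += p.velocity[i];    // move the particle to its new location
    }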
These coefficients are specified fairly straightforwardly.
breed.velocity-coefficient = 0.7
breed.personal-coefficient = 0.4
breed.informant-coefficient = 0.4
breed.global-coefficient = 0.0
Notice that the global-coefficient is set to 0 in the example above: we’ve found it’s not very good and best left out. And we’re not alone — a lot of current PSO practice does this as well.
PSO determines the neighbors or informants of a particle in one of three ways. All three require that you specify a neighborhood size (the number of informants to the individual), for example:
# We strongly suggest this be an even number if you’re doing toroidal (see below)
breed.neighborhood-size = 10
Then you can either have ECJ:
• Select informants at random without replacement for each particle at the beginning of the run (not including the particle itself).
• Select informants at random without replacement for each particle, not including the particle itself. This is done every single generation.
• Select the ⌊N/2⌋ informants immediately below the particle's position in the population, and the ⌈N/2⌉ informants immediately above the particle's position. This is toroidal (wrap-around), is done at the beginning of the run, and does not include the particle itself.
These three options are random, random-each-time, and toroidal respectively:
breed.neighborhood-style = random
#breed.neighborhood-style = random-each-time
#breed.neighborhood-style = toroidal
(Note that the 2007 PSO C standard suggests that it’s using random-each-time, but we’ve informally found random or toroidal to perform better.) To this collection of informants, you can also add the particle itself with:
breed.include-self = true
The default is false in ECJ, but in historical forms of PSO this was set to true. Finally, you’ll also need to specify the breeder and individual:
breed = ec.pso.PSOBreeder
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.ind = ec.pso.Particle
The rest of the parameters are fairly standard, though since we’re using FloatVectorSpecies, we’ll need to specify a pipeline and various mutation and crossover methods even though we don’t use them. We can pick dummy stuff for this. Here’s an example of a full working PSO parameter file:
state = ec.simple.SimpleEvolutionState
init = ec.simple.SimpleInitializer
finish = ec.simple.SimpleFinisher
exch = ec.simple.SimpleExchanger
breed = ec.pso.PSOBreeder
eval = ec.simple.SimpleEvaluator
stat = ec.simple.SimpleStatistics
stat.file = $out.stat
breedthreads = auto
evalthreads = auto
checkpoint = false
checkpoint-modulo = 1
checkpoint-prefix = ec
generations = 1000
quit-on-run-complete = true
pop = ec.Population
pop.subpops = 1
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 1000
pop.subpop.0.duplicate-retries = 2
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.ind = ec.pso.Particle
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
# You have to specify some kind of dummy pipeline even though we won’t use it
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.select.TournamentSelection
select.tournament.size = 2
# You also have to specify a few dummy mutation parameters we won’t use either
pop.subpop.0.species.mutation-prob = 0.01
pop.subpop.0.species.mutation-stdev = 0.05
pop.subpop.0.species.mutation-type = gauss
pop.subpop.0.species.crossover-type = one
# Here you specify your individual in the usual way
pop.subpop.0.species.genome-size = 100
pop.subpop.0.species.min-gene = -5.12
pop.subpop.0.species.max-gene = 5.12
# Problem. Here we’re using ECSuite’s rastrigin function
eval.problem = ec.app.ecsuite.ECSuite
eval.problem.type = rastrigin
# PSO parameters
breed.velocity-coefficient = 0.7
breed.personal-coefficient = 0.4
breed.informant-coefficient = 0.4
breed.global-coefficient = 0.0
breed.neighborhood-size = 10
breed.neighborhood-style = random
breed.include-self = false
7.4 Differential Evolution (The ec.de Package)
The ec.de package provides a Differential Evolution framework and some basic DE breeding operators. Differential Evolution is notable for its wide range of operator options: ECJ only has a few of them defined, but others are not hard to create yourself.
Differential Evolution does not have a traditional selection mechanism: instead, children are created entirely at random but then must compete with their parents for survival. Additionally, Differential Evolution breeding operators are somewhat opaque and not particularly amenable to mixing and matching. Because of this, we have eschewed the Breeding Pipeline approach for doing breeding. Instead, Differential Evolution simply has different Breeders for each of its breeding operator approaches. Differential Evolution operators are defined by creating a subclass of ec.de.DEBreeder. We have three such operators implemented, including the default in ec.de.DEBreeder itself: more on them in a moment.
7.4.1 Evaluation
We implement DE's unusual selection procedure through a custom SimpleEvaluator called ec.de.DEEvaluator. It turns out this code could have been done in ec.de.DEBreeder (making the package simpler) but then various Statistics objects wouldn't report the right values. So we made a simple Evaluator to do the job. DEEvaluator is a very simple subclass of SimpleEvaluator and works just like it, including multithreaded evaluation. You specify it as follows:
eval = ec.de.DEEvaluator
7.4.2 Breeding
Before we start, note that Differential Evolution does not use ECJ’s breeding pipelines. However, ECJ species still expect some kind of pipeline even though it won’t be used. So you might define a default pipeline for a species like this:
# This pipeline won't be used, it's just a dummy, pick some simple stuff
pop.subpop.0.species.pipe = ec.breed.ReproductionPipeline
pop.subpop.0.species.pipe.source.0 = ec.select.FirstSelection
If you're using DoubleVectorIndividual (and you probably are), you'll need some default mutation and crossover parameters too, though they again won't be used. Some of these parameters are required (but unused); others are here just to quiet warnings:
pop.subpop.0.species.mutation-prob = 1.0
pop.subpop.0.species.mutation-type = reset
pop.subpop.0.species.crossover-type = one
Okay back to breeding. DEBreeder serves both as the superclass of all DE breeding operators, and also implements the basic “classic” Differential Evolution breeding operator, known as DE/rand/1/bin. We begin by talking about the top-level elements of DEBreeder, then we discuss DEBreeder’s specific default operator, followed by other operators.
Most Differential Evolution breeding operators work as follows. For each Individual a⃗ in the Population (recall that in Differential Evolution all Individuals are DoubleVectorIndividuals, so we'll treat them as vectors), we will create a single child c⃗. After this is done, we replace each parent with its child if the child is as good or better than the parent (this last part is done in DEEvaluator).
Children are usually, but not always, created by first generating a child d⃗ through a combination, of sorts, of Individuals other than a⃗. We then perform a crossover between d⃗ and a⃗, essentially replacing parts of d⃗ with parts of a⃗, which results in the final child c⃗. Thus there are often two parts to creating children: first building d⃗, then crossing over d⃗ with a⃗.
The whole creation process is handled by a single method:
ec.de.DEBreeder Methods
public DoubleVectorIndividual createIndividual(EvolutionState state, int subpop, int index, int thread)
Creates and returns a new Individual derived from the parent Individual found in state.population.subpops[subpop].individuals[index], using other individuals chosen from state.population.subpops[subpop] as necessary.
The default implementation performs the DE/rand/1/bin operation, discussed later. If you want to create a whole new breeding operator, you’ll largely need to override this individual-creation method.
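For example, here is a minimal sketch of what such an override might look like. The class name and the operator itself (a hypothetical variant which perturbs the parent a⃗ directly rather than a random base vector) are invented for illustration, and the sketch ignores the mutation-bounded retry logic which the built-in operators perform:

import ec.EvolutionState;
import ec.Individual;
import ec.de.DEBreeder;
import ec.vector.DoubleVectorIndividual;

// Hypothetical DE operator: d_i = a_i + F * (r1_i - r2_i), then crossover with the parent a
public class TargetPerturbationDEBreeder extends DEBreeder
    {
    public DoubleVectorIndividual createIndividual(EvolutionState state, int subpop, int index, int thread)
        {
        Individual[] inds = state.population.subpops[subpop].individuals;
        DoubleVectorIndividual a = (DoubleVectorIndividual)(inds[index]);

        // pick two random individuals, distinct from each other and from the parent
        int r1, r2;
        do { r1 = state.random[thread].nextInt(inds.length); } while (r1 == index);
        do { r2 = state.random[thread].nextInt(inds.length); } while (r2 == index || r2 == r1);
        DoubleVectorIndividual v1 = (DoubleVectorIndividual)(inds[r1]);
        DoubleVectorIndividual v2 = (DoubleVectorIndividual)(inds[r2]);

        // build the child d as a perturbed copy of the parent
        DoubleVectorIndividual child = (DoubleVectorIndividual)(a.clone());
        for (int i = 0; i < child.genome.length; i++)
            child.genome[i] = a.genome[i] + F * (v1.genome[i] - v2.genome[i]);
        child.evaluated = false;

        // then cross the child over with the parent, as the default operator does
        return crossover(state, a, child, thread);
        }
    }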
Because crossover is a common operation, ECJ has broken it out into a separate method. You can override just this method, if you like, to customize how crossover is done.
ec.de.DEBreeder Methods
public DoubleVectorIndividual crossover(EvolutionState state, DoubleVectorIndividual target,
DoubleVectorIndividual child, int thread)
Crosses over child (earlier referred to as d⃗) with target (earlier referred to as ⃗a), modifying child. child is then returned.
The default implementation performs uniform crossover: for each gene, with independent probability Cr, the child’s gene will be replaced with the target’s gene. One child gene, chosen at random, is guaranteed to not be replaced.
The Cr value, which must be between 0.0 and 1.0, is set by a parameter:
breed.cr = 0.5
And it’s stored in the following DEBreeder instance variable:
public double Cr;
You can leave this parameter unspecified: as mentioned before, some DEBreeder subclasses do no crossover and so don’t need it. But if crossover is performed and the parameter is unspecified (it’s set to the value ec.de.DEBreeder.CR_UNSPECIFIED), it’ll be assumed to be 0.5 (probably not a good choice) and a warning will be issued.
The combination of various Individuals to form d⃗ in the first place is usually a mathematical vector operation involving certain variables. The most common variable, used by all the operators in this package, is a scaling factor F, which ranges from 0.0 to 1.0. We define F like this:
breed.f = 0.6
The F parameter is then stored in the following DEBreeder instance variable:
public double F;
Unlike Cr, F is required as a parameter.
Some DE breeding operators need to know the fittest Individual in the Subpopulation. Thus prior to constructing individuals, DEBreeder first computes this whether it’s used or not. The locations of the fittest Individuals, indexed by Subpopulation, are stored in the following DEBreeder variable:
public int[] bestSoFarIndex;
For example, you can get the fittest Individual in Subpopulation 0 as:
DoubleVectorIndividual bestForSubpopZero = (DoubleVectorIndividual)
(state.population.subpops[0].individuals[bestSoFarIndex[0]]);
Last but not least, DEBreeder must store away the old Population before it is overwritten by the new child Individuals. This is because in DEEvaluator the original parents in the old Population get a chance to displace the children if the children are not sufficiently fit. So we need to keep the parents around until then. The parents are stored in DEBreeder as:
public Population previousPopulation;
This value is initially null, since there are no parents in the first generation. But in successive generations it’s set to the parents.
A final note: though evaluation is multithreaded, breeding at present is not. The breedthreads parameter has no effect.
7.4.2.1 The DE/rand/1/bin Operator
The “classic” DE breeding operator, DE/rand/1/bin, is the default operator implemented by DEBreeder itself. The implementation follows that found on page 140 of the text Differential Evolution [17]. It works as follows. For each Individual a⃗ in the Subpopulation, we select three other Individuals at random. These Individuals must be different from one another and from a⃗. We’ll call them r⃗0, r⃗1, and r⃗2. We create a child d⃗ whose values are defined as:

d_i = r0_i + F × (r1_i − r2_i)

We then cross over d⃗ with a⃗, producing a final child c⃗, which is then placed in the new Subpopulation. Note the use of the F parameter.
To use this operator, you’ll need to specify DEBreeder as the breeder, and of course also set the F and Cr parameters as well. For example:
breed = ec.de.DEBreeder
breed.f = 0.6
breed.cr = 0.5
This operator can produce values which are outside the min/max gene range. You’ll need to specify whether or not you wish to force the operator to only produce values bounded to within that range. For example, to turn off bounding and allow values to be anything, you would say (on a per-Species basis):
pop.subpop.0.species.mutation-bounded = false
Otherwise you’d say:
pop.subpop.0.species.mutation-bounded = true
If you bound the operator, then it will try repeatedly with new values of r⃗0, r⃗1, and r⃗2, until it produces a valid individual d⃗.
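Pulling this subsection together, a basic DE/rand/1/bin setup might look something like the following sketch. The genome size, gene bounds, and the F and Cr values are placeholders, and the pipeline is just the unused dummy discussed at the start of Section 7.4.2:

eval = ec.de.DEEvaluator
breed = ec.de.DEBreeder
breed.f = 0.6
breed.cr = 0.5
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.ind = ec.vector.DoubleVectorIndividual
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
pop.subpop.0.species.genome-size = 20
pop.subpop.0.species.min-gene = -5.12
pop.subpop.0.species.max-gene = 5.12
pop.subpop.0.species.mutation-bounded = true
# Dummy pipeline and mutation/crossover settings (required but unused)
pop.subpop.0.species.pipe = ec.breed.ReproductionPipeline
pop.subpop.0.species.pipe.source.0 = ec.select.FirstSelection
pop.subpop.0.species.mutation-prob = 1.0
pop.subpop.0.species.mutation-type = reset
pop.subpop.0.species.crossover-type = one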
7.4.2.2 The DE/best/1/bin Operator
The class ec.de.Best1BinDEBreeder, a subclass of DEBreeder, implements the DE/best/1/bin operator, plus “random jitter”, as found on page 140 of the text Differential Evolution [17]. It works as follows. For each Individual a⃗ in the Subpopulation, we first identify the fittest Individual b⃗ in the Subpopulation, and also select two other Individuals at random. These Individuals must be different from one another, from b⃗, and from a⃗. We’ll call them r⃗1 and r⃗2. We create a child d⃗ whose values are defined as:

d_i = b_i + jitter() × (r1_i − r2_i)

We then cross over d⃗ with a⃗, producing a final child c⃗, which is then placed in the new Population. jitter() produces a different random number for each i. It is defined as:

jitter() = F + F_NOISE × (random(0, 1) − 0.5)
The random(0,1) function returns a random number between 0.0 and 1.0 inclusive. Again, note the use of the F parameter. The F_NOISE constant is typically 0.001. It’s defined by the parameter breed.f-noise and is stored in the Best1BinDEBreeder instance variable:
public double F_NOISE;
To use this breeding operator, you’ll need to specify the breeder and the various parameters:
breed = ec.de.Best1BinDEBreeder
breed.f = 0.6
breed.cr = 0.5
breed.f-noise = 0.001
This operator can produce values which are outside the min/max gene range. You’ll need to specify whether or not you wish to force the operator to only produce values bounded to within that range. For example, to turn off bounding and allow values to be anything, you would say (on a per-Species basis):
pop.subpop.0.species.mutation-bounded = false
Otherwise you’d say:
pop.subpop.0.species.mutation-bounded = true
If you bound the operator, then it will try repeatedly with new values of r⃗1 and r⃗2, until it produces a valid individual d⃗.
7.4.2.3 The DE/rand/1/either-or Operator
The class ec.de.Rand1EitherOrDEBreeder, again a subclass of DEBreeder, implements the DE/rand/1/either-or operator, as found on page 141 of the text Differential Evolution [17]. It works as follows. For each Individual a⃗ in the Subpopulation, we select three other Individuals at random. These Individuals must be different from one another and from a⃗. We’ll call them r⃗0, r⃗1, and r⃗2. Then with probability PF, we create a child d⃗ defined in the same way as the DE/rand/1/bin operator:

d_i = r0_i + F × (r1_i − r2_i)

... else we create the child like this:

d_i = r0_i + 0.5 × (F + 1) × (r1_i + r2_i − 2 × r0_i)
We do not cross over d⃗ with a⃗: we simply return d⃗ as the child.
The PF probability value, which must be between 0.0 and 1.0, is defined by the parameter breed.pf and is stored in the Rand1EitherOrDEBreeder instance variable:
public double PF;
To use this breeding operator, you’ll need to specify the breeder and the various parameters:
breed = ec.de.Rand1EitherOrDEBreeder
breed.f = 0.6
breed.pf = 0.5
Note that the Cr parameter is not used, since no crossover is performed.
This operator can produce values which are outside the min/max gene range. You’ll need to specify whether or not you wish to force the operator to only produce values bounded to within that range. For example, to turn off bounding and allow values to be anything, you would say (on a per-Species basis):
pop.subpop.0.species.mutation-bounded = false
Otherwise you’d say:
pop.subpop.0.species.mutation-bounded = true
If you bound the operator, then it will try repeatedly with new values of r⃗0, r⃗1, and r⃗2, until it produces a valid individual d⃗.
7.5 Multiobjective Optimization (The ec.multiobjective Package)
ECJ has three packages which handle multiobjective optimization: the ec.multiobjective package, and two concrete implementations (SPEA2 and NSGA-II): the ec.multiobjective.spea2 and ec.multiobjective.nsga2 packages respectively.
7.5.0.4 The MultiObjectiveFitness class
The ec.multiobjective package contains a single new kind of Fitness. Multiobjective optimization differs from other optimization algorithms in that the Fitness of an Individual is not a single value but rather consists of some N objectives, separate values describing the quality of the Individual on various aspects of the Problem. Thus the primary class overridden by multiobjective algorithms is a special version of Fitness called ec.multiobjective.MultiObjectiveFitness. Internally in this class the objective results are stored in an array of doubles:
public double[] objectives;
The number of objectives is the length of this array.
You create a MultiObjectiveFitness and define its objectives along these lines:
pop.subpop.0.species.fitness = ec.multiobjective.MultiObjectiveFitness
pop.subpop.0.species.fitness.num-objectives = 3
Though ECJ tends to assume that higher Fitness values are better, many multiobjective algorithms assume the opposite. Thus MultiObjectiveFitness has the option of doing either case. ECJ assumes lower objective values are better when you state:
pop.subpop.0.species.fitness.maximize = false
You can also state maximization (versus minimization) on a per-objective basis, for example:
pop.subpop.0.species.fitness.maximize.0 = false
pop.subpop.0.species.fitness.maximize.1 = true
Per-objective settings override global settings.
In any event, true is the default setting. Maximization settings are stored in the variable:
public boolean[] maximize;
Similarly, you can also set the minimum and maximum objective values on a per-objective basis:
pop.subpop.0.species.fitness.min.0 = 0.0
pop.subpop.0.species.fitness.max.0 = 2.0
pop.subpop.0.species.fitness.min.1 = 1.0
pop.subpop.0.species.fitness.max.1 = 3.5
pop.subpop.0.species.fitness.min.2 = -10
pop.subpop.0.species.fitness.max.2 = 0
The default minimum is 0.0 and the default maximum is 1.0. If you like you can set global minimum and maximum values instead:
pop.subpop.0.species.fitness.min = 0.0
pop.subpop.0.species.fitness.max = 2.0
Local min/max values, if set, override the global values. The resulting minimum and maximum values are stored in the following arrays:
public double[] minObjectives;
public double[] maxObjectives;
Note that these min/max arrays are shared10 among all Fitness objects cloned from the same Prototype. So you should treat them as read-only.
You can get and set objectives via functions, which have the added benefit of double-checking their min/max validity:
ec.multiobjective.MultiObjectiveFitness Methods
public int getNumObjectives()
Returns the number of objectives.
public double[] getObjectives()
Returns all the current objective values.
public double getObjective(int index)
Returns a given objective value.
public void setObjectives(EvolutionState state, double[] objectives)
Sets the objective values to the ones provided, double-checking that they are within the valid minimum and maximum ranges discussed earlier.
public double sumSquaredObjectiveDistance(MultiObjectiveFitness other)
Returns the sum squared distance in objective space between this Fitness and the other. That is, if for a given objective i, this fitness has value A_i and the other has value B_i, this function will return ∑_i (A_i − B_i)².
public double manhattanObjectiveDistance(MultiObjectiveFitness other)
Returns the Manhattan distance in objective space between this Fitness and the other. That is, if for a given objective i, this fitness has value A_i and the other has value B_i, this function will return ∑_i |A_i − B_i|.
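To make the mechanics concrete, here is a minimal sketch of a Problem which fills in a MultiObjectiveFitness during evaluation. The problem itself (the mean of the genome and its spread about that mean) and the class name are invented for illustration; it assumes the Individuals are DoubleVectorIndividuals whose genes lie in [0,1], so both objectives stay inside the default [0,1] objective bounds, and that the species fitness has been set to ec.multiobjective.MultiObjectiveFitness with two objectives as shown above.

import ec.EvolutionState;
import ec.Individual;
import ec.Problem;
import ec.multiobjective.MultiObjectiveFitness;
import ec.simple.SimpleProblemForm;
import ec.vector.DoubleVectorIndividual;

// Hypothetical two-objective problem: objective 0 is the mean gene value,
// objective 1 is the mean absolute deviation from that mean.
public class MyMultiObjectiveProblem extends Problem implements SimpleProblemForm
    {
    public void evaluate(EvolutionState state, Individual ind, int subpopulation, int threadnum)
        {
        if (ind.evaluated) return;                       // don't bother reevaluating
        double[] genome = ((DoubleVectorIndividual)ind).genome;

        double mean = 0;
        for (int i = 0; i < genome.length; i++) mean += genome[i];
        mean /= genome.length;

        double spread = 0;
        for (int i = 0; i < genome.length; i++) spread += Math.abs(genome[i] - mean);
        spread /= genome.length;

        MultiObjectiveFitness fit = (MultiObjectiveFitness)(ind.fitness);
        fit.setObjectives(state, new double[] { mean, spread });   // one value per objective
        ind.evaluated = true;
        }
    }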
Since MultiObjectiveFitness is a Prototype, of course you can define these parameters using a default parameter base. For example:
10Why doesn’t Fitness store them in the Species or something? Because Fitness, for historical and not particularly good reasons, does not have a Flyweight relationship with any object.
pop.subpop.0.species.fitness = ec.multiobjective.MultiObjectiveFitness
multi.fitness.num-objectives = 3
multi.fitness.maximize = false
# Global objectives
multi.fitness.min = 0.0
multi.fitness.max = 2.0
# Local Overrides (heck, why not?)
multi.fitness.min.1 = 1.0
multi.fitness.max.1 = 3.5
multi.fitness.min.2 = -10
multi.fitness.max.2 = 0
7.5.0.5 The MultiObjectiveStatistics class
To help output useful statistics about the Pareto Front, ECJ has an optional but strongly encouraged Statistics class, ec.multiobjective.MultiObjectiveStatistics. This class largely overrides the finalStatistics method to do the following:
• All the individuals forming the Front are printed to the statistics log file.
• If a separate “front log file” is specified, a whitespace-delimited table of the objective values of each member of the Front, one per line, is written to this log. If the log is not specified, they’re printed to the screen instead.
If the Front is 2-objective, and the file is called, say, front.stat, you can easily view the Front results by firing up GNUPLOT and entering:
plot "front.stat"
The MultiObjectiveStatistics class is used, and the Front file defined, as follows:
stat = ec.multiobjective.MultiObjectiveStatistics
stat.front = $front.stat
Keep in mind that none of this happens if the do-final parameter has been set to false. In addition to standard statistics-quieting features (see Section 3.7.3), you can also quiet the front-log writing. This is done with:
stat.silent.front = true
Various multiobjective optimization algorithms subclass from MultiObjectiveFitness to add auxiliary fitness values. We’d like those values to be included as columns in the Front summary when it is outputted to the screen by MultiObjectiveStatistics. To do this, MultiObjectiveFitness has two special methods which are overridden by subclasses:
ec.multiobjective.MultiObjectiveFitness Methods
public String[] getAuxilliaryFitnessNames()
Returns an array of names, one per auxiliary fitness element. These will appear as headers to their respective columns.
public double[] getAuxilliaryFitnessValues()
Returns an array of doubles, one per auxiliary fitness element, of the current values of those elements.
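For instance, such a subclass might look something like the following sketch; the class name and the “crowding” value are invented for illustration:

import ec.multiobjective.MultiObjectiveFitness;

// Hypothetical MultiObjectiveFitness subclass carrying one extra per-individual
// value, exposed so that MultiObjectiveStatistics prints it as an extra column.
public class CrowdingMultiObjectiveFitness extends MultiObjectiveFitness
    {
    public double crowding;      // set elsewhere, for example by a custom Evaluator

    public String[] getAuxilliaryFitnessNames()
        { return new String[] { "Crowding" }; }

    public double[] getAuxilliaryFitnessValues()
        { return new double[] { crowding }; }
    }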
7.5.1 Selecting with Multiple Objectives
ECJ’s primary multiobjective algorithms don’t use MultiObjectiveFitness directly, but rather subclass it to add per-algorithm gizmos. But if you’re not using these algorithms, you can still use MultiObjectiveFitness in a more “traditional” generational algorithm. This requires some thought: at the end of the day, ECJ needs to select based on this Fitness mechanism. But if there is more than one objective value, how is this done?
You could just sum the objectives to form a final “fitness” value of course. If your SelectionMethod uses the raw fitness() value (for example, FitProportionateSelection from Section 3.5.2) this is exactly what MultiObjectiveFitness returns.
But the most common approach is instead to use some form of Pareto domination. Pareto domination works like this. Individual A Pareto dominates Individual B if and only if A is at least as good as B in all objectives, and is superior to B in at least one objective. Notice that there are three cases where A might not Pareto-dominate B:
• B Pareto-dominates A
• A and B have exactly the same objective values for all objectives.
• A is superior to B in some objectives, but B is superior to A in other objectives.
MultiObjectiveFitness implements the betterThan and equivalentTo11 methods to return Pareto domination results: betterThan is true if the Individual Pareto-dominates the one passed in. equivalentTo returns true if neither Pareto-dominates the other (either of the last two cases above).
If your SelectionMethod relies entirely on the betterThan and equivalentTo methods (such as TournamentSelection), it’ll use Pareto domination to sort Individuals.
Various subclasses of MultiObjectiveFitness for different kinds of algorithms override the betterThan and equivalentTo methods to measure fitness differently. However you can still determine Pareto Domination with the following method:
ec.multiobjective.MultiObjectiveFitness Methods
public boolean paretoDominates(MultiObjectiveFitness other) Returns true if this fitness Pareto-dominates other.
7.5.1.1 Pareto Ranking
An alternative method, widely used in multiobjective algorithms but not implemented in MultiObjectiveFitness proper, is to perform Pareto ranking (sometimes called non-dominated sorting [23]), which works as follows. The Pareto non-dominated front (or simply Pareto front) is the set of Individuals in a Subpopulation who are dominated by no one else, including one another. All members of a Pareto front of a Subpopulation receive a ranking of 0. We then remove those members from consideration in the Subpopulation and form the Pareto front among the remaining members. These members receive a ranking of 1. We then remove those members from consideration as well and repeat. Ultimately every member of the Subpopulation receives a ranking. These rankings become fitness values, where lower rankings are preferred to higher rankings. Pareto ranking plays a major part in the NSGA-II algorithm (Section 7.5.2).
Here are methods for extracting the Pareto Front and various Pareto Front Ranks:
ec.multiobjective.MultiObjectiveFitness Methods
public static ArrayList partitionIntoParetoFront(Individual[] inds, ArrayList front, ArrayList nonFront)
Partitions inds into the front and the non-front Individuals. If front is provided, the front Individuals are placed there and returned. If front is not provided, a new ArrayList is created, the front is placed there, and it is returned. If nonFront is provided, the non-front Individuals are placed in it, else they are discarded.
11Perhaps now it might make sense why the method is called equivalentTo and not equalTo.
public static ArrayList partitionIntoRanks(Individual[] inds)
Partitions inds into Pareto Front ranks. Each rank is an ArrayList of Individuals. The ranks are then placed into an ArrayList in increasing order (worsening rank) starting with rank zero. This ArrayList is returned.
public static int[] getRankings(Individual[] inds)
For each individual, returns the Pareto Front ranking of that individual, starting at 0 (the best ranking) and increasing with worse Pareto Front ranks. Note that though this function is O(n), it has a high constant overhead because it does some boxing and hashing.
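As a small usage sketch, these static methods can be called from anywhere with access to the EvolutionState, for example from a custom Statistics subclass. The helper class below is hypothetical:

import java.util.ArrayList;
import ec.EvolutionState;
import ec.Individual;
import ec.multiobjective.MultiObjectiveFitness;

// Hypothetical helper for extracting Pareto information from subpopulation 0
public class ParetoUtil
    {
    // Returns the Pareto front, discarding the non-front Individuals
    public static ArrayList frontOfSubpopZero(EvolutionState state)
        {
        Individual[] inds = state.population.subpops[0].individuals;
        return MultiObjectiveFitness.partitionIntoParetoFront(inds, null, null);
        }

    // Returns the Pareto rank of every Individual, starting at 0 (the best rank)
    public static int[] ranksOfSubpopZero(EvolutionState state)
        {
        Individual[] inds = state.population.subpops[0].individuals;
        return MultiObjectiveFitness.getRankings(inds);
        }
    }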
7.5.1.2 Archives
The most common — and often most effective — multiobjective optimization algorithms are very strongly elitist, essentially versions of the (μ + λ) evolution strategy (Section 4.1.2). Specifically, they split the Population into two groups: the primary Population, and an archive consisting largely of the Pareto front of nondominated Individuals discovered so far. Any time a new Individual is discovered which dominates members of this front, those members are removed from the elitist archive and the Individual is introduced to it. ECJ has two Pareto archive multiobjective algorithms: NSGA-II and SPEA2. We discuss them next.
7.5.2 NSGA-II (The ec.multiobjective.nsga2 Package)
The Non-dominated Sorting Genetic Algorithm Version II (or NSGA-II) [2] is essentially a version of the (μ + μ) evolution strategy using non-dominated sorting. It maintains and updates an archive (half the population) of the current best individuals (essentially the current estimate of the Pareto Front), and breeds the remaining population from the archive.
NSGA-II requires a Breeder to maintain the archive and handle the algorithm’s custom breeding; and an Evaluator to compute the non-dominated sorting. These classes are defined by the ec.multiobjective.nsga2.NSGA2Breeder and ec.multiobjective.nsga2.NSGA2Evaluator classes respectively. The non-dominated sorting information is included in the fitness, and so NSGA-II uses a subclass of MultiObjectiveFitness called ec.multiobjective.nsga2.NSGA2MultiObjectiveFitness. To output the Pareto Front for the user, we also include a special Statistics subclass called ec.multiobjective.nsga2.NSGA2Statistics.
Simply replace MultiObjectiveFitness with this class:
pop.subpop.0.species.fitness = ec.multiobjective.nsga2.NSGA2MultiObjectiveFitness
You’ll need to set the number of objectives, and min/max objectives, etc., as usual, something like:
multi.fitness.num-objectives = 3
multi.fitness.min = 0.0
multi.fitness.max = 2.0
multi.fitness.min.1 = 1.0
multi.fitness.max.1 = 3.5
multi.fitness.min.2 = -10
multi.fitness.max.2 = 0
This class contains two special fitness measures: the rank and sparsity of the Individual. The rank is computed as the Pareto Rank of the individual. The sparsity is a measure of distance of the individual to others on the same rank. We like sparse individuals because we don’t want individuals all clustered in one area of the front. The fitness is simple: an Individual is superior to another if its rank is lower (better). If the same, an Individual is superior if its sparsity is higher. These two measures are stored in NSGA2MultiObjectiveFitness as the instance variables:
public int rank;
public double sparsity;
The NSGA2Evaluator computes and sets these values. You define it in the obvious way:
eval = ec.multiobjective.nsga2.NSGA2Evaluator
The NSGA2Breeder performs breeding using the archive. Again, defined in the obvious way:
breed = ec.multiobjective.nsga2.NSGA2Breeder
You’re responsible for setting up breeding pipelines like you see fit.
Note that although NSGA2Breeder is a subclass of SimpleBreeder, it cannot be used with elitism and will complain if you attempt to do so.
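Pulling these pieces together, a minimal NSGA-II setup might look something like the following sketch. The breeding pipeline shown is just one reasonable choice (mutation over crossover over tournament selection), and the objective count is a placeholder:

pop.subpop.0.species.fitness = ec.multiobjective.nsga2.NSGA2MultiObjectiveFitness
multi.fitness.num-objectives = 2
eval = ec.multiobjective.nsga2.NSGA2Evaluator
breed = ec.multiobjective.nsga2.NSGA2Breeder
stat = ec.multiobjective.nsga2.NSGA2Statistics
stat.front = $front.stat
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.VectorCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = same
select.tournament.size = 2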
Where to get examples The package ec.app.moosuite has NSGA-II implementations of a number of famous multiobjective test problems. Be sure to read the moosuite.params file, where you can specify that you want to use NSGA-II for the problems.
7.5.3 SPEA2 (The ec.multiobjective.spea2 Package)
The Strength Pareto Evolutionary Algorithm 2 (SPEA2) [26] splits the Subpopulation into two parts: the archive and the “regular” Subpopulation. Unlike the NSGA-II algorithm, this archive can vary in size. The size of the archive is a parameter in ec.multiobjective.spea2.SPEA2Subpopulation:
pop.subpop.0 = ec.multiobjective.spea2.SPEA2Subpopulation
pop.subpop.0.size = 100
pop.subpop.0.archive-size = 50
Each iteration SPEA2 updates the archive to include the Pareto front, plus (if there is room) additional fit individuals using a special domination-based fitness measure called strength. If there are too many Individuals to fit in the archive, the archive is trimmed by removing Individuals which are too close to one another. It then uses its special fitness measure to breed individuals from the Archive and place them into the “regular” Subpopulation.
To do this, SPEA2 needs to augment the MultiObjectiveFitness with a few additional values:
public double strength;
public double kthNNDistance;
public double fitness;
The first measure is the strength of an Individual, defined as the number of Individuals whom it dominates. The second measure (“distance”) is an inverted measure of how far the Individual is from other Individuals in the population. The final measure (the actual fitness) is the sum of the so-called SPEA2 “raw fitness” and the distance measure: higher fitness values are worse.
This version of MultiObjectiveFitness is called ec.multiobjective.spea2.SPEA2MultiObjectiveFitness, and we specify it as well, for example:
pop.subpop.0.species.fitness = ec.multiobjective.spea2.SPEA2MultiObjectiveFitness
multi.fitness.num-objectives = 2
multi.fitness.min.0 = 1.0
multi.fitness.max.0 = 3.5
multi.fitness.min.1 = -10
multi.fitness.max.1 = 0
SPEA2 also requires a special Breeder and a special Evaluator:
eval = ec.multiobjective.spea2.SPEA2Evaluator
breed = ec.multiobjective.spea2.SPEA2Breeder
When breeding, SPEA2 has a special version of TournamentSelection which only selects among archive members. We’d include it in various places instead of other selection methods, for example something like:
pop.subpop.0.species.pipe.source.0 = ec.multiobjective.spea2.SPEA2TournamentSelection
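Pulling these pieces together, a minimal SPEA2 setup might look something like the following sketch; the sizes, objective count, and choice of breeding pipeline are placeholders:

pop.subpop.0 = ec.multiobjective.spea2.SPEA2Subpopulation
pop.subpop.0.size = 100
pop.subpop.0.archive-size = 50
pop.subpop.0.species.fitness = ec.multiobjective.spea2.SPEA2MultiObjectiveFitness
multi.fitness.num-objectives = 2
eval = ec.multiobjective.spea2.SPEA2Evaluator
breed = ec.multiobjective.spea2.SPEA2Breeder
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.VectorCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.multiobjective.spea2.SPEA2TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = same
select.tournament.size = 2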
Where to get examples The package ec.app.moosuite has SPEA2 implementations of a number of famous multiobjective test problems. Be sure to read the moosuite.params file, where you can specify that you want to use SPEA2 for the problems.
7.6 Meta-Evolutionary Algorithms
A Meta-Evolutionary Algorithm (or Meta-EA) is an evolutionary algorithm used to optimize the parameters of a second evolutionary algorithm. Meta-EAs were originally called “Meta-GAs” (for obvious reasons), and are closely related to the concept of hyperheuristics. ECJ implements Meta-EAs in a surprisingly clean fashion, using a single subclass of Problem called ec.eval.MetaProblem. Interestingly, that is the only class in the Meta-EA facility! The Meta-EA package was developed through collaboration with Khaled Ahsan Talukder, a GMU graduate student.
In ECJ, a Meta-EA works like this. We create an ordinary evolutionary process which evolves individuals in the form of DoubleVectorIndividuals. These individuals’ genomes notionally contain parameter values for a second ECJ system. To test an individual, we hand it to MetaProblem, which fires up another ECJ system (in the same Java process) using those parameter values and runs it. This happens some N times, and the mean best fitness over those N runs becomes the fitness of the DoubleVectorIndividual holding those parameter values.
Note that there are at least two levels of evolution.12 First there is the evolutionary process which is optimizing the parameters. We will refer to this process as the meta-level process. Then there is the evolutionary process which is being run with these parameters. We will refer to this process as the base-level process.
Within certain constraints, you can use pretty much any EA at the meta-level or at the base level. But there are some rules. First off, the meta-level individual must be a DoubleVectorIndividual. We will treat this individual as a heterogeneous individual (see Section 5.1.1.5), which allows the Meta-EA system to use genes of different types: integers, floats, booleans, and the like. Second, the meta-level fitness must be one which can respond to setToMeanOf(…), and typically should be of the same kind of fitness as the base-level fitness. Generally this means: no multiobjective fitness at either level; and coevolutionary fitness might be a bit tricky. SimpleFitness is fine, as is KozaFitness.
Meta-EAs work very well in combination with ECJ’s distributed evaluator, which is good news because it’s the natural setting for using them! The idea here is to have the meta-level process as a master process, distributing meta-level individuals out to slaves, where each is tested by firing up a base-level ECJ process to assess it. And Meta-EAs also work nicely in multiple threads. This magic all works because ECJ is self-contained: ECJ processes do not interact with other ECJ processes even within the same thread.
7.6.1 The Two Parameter Files
Generally speaking, you’ll create two parameter files: a base-level parameter file and a meta-level parameter file. The base-level parameter file tells ECJ what the default parameters are for the base-level EA process
12You can of course have an EA which optimizes the parameters for an EA which optimizes the parameters for an EA, and so on. MetaProblem can handle this without much issue. But it’s a pretty rare need! So we’ll go with “two” for purposes of this discussion.
(some of these parameters will be modified to reflect the meta-level individual being tested). The meta-level parameter file defines the meta-level EA process.
The first step in constructing a Meta-EA is to get the base level working on its own without any parameter optimization. This is straightforward: it’s just a standard ECJ process.
Generally you want an ordinary EA at the base level. You’ll probably want to stay away from multiobjective optimization at the base level (how do you define “best individual of run”, and its fitness, in a multiobjective setting?). You can use coevolution in some cases, but note that the best individual of run, whose fitness is extracted, will be only the best individual of Subpopulation 0. This means that cooperative coevolution doesn’t make much sense, though 2-population competitive coevolution might make sense assuming that the primary (non-foil) subpopulation is 0.
So for example, we might have a base-level EA like this:
#### BASE-LEVEL FILE
evalthreads = 1
breedthreads = 1
seed.0 = time
checkpoint = false
checkpoint-modulo = 1
checkpoint-prefix = ec
state = ec.simple.SimpleEvolutionState
init = ec.simple.SimpleInitializer
finish = ec.simple.SimpleFinisher
exch = ec.simple.SimpleExchanger
breed = ec.simple.SimpleBreeder
eval = ec.simple.SimpleEvaluator
eval.problem = MyTestProblem
stat = ec.simple.SimpleStatistics
stat.file = $out.stat
generations = 1000
quit-on-run-complete = true
pop = ec.Population
pop.subpops = 1
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 1000
pop.subpop.0.duplicate-retries = 2
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.pipe = ec.vector.breed.VectorMutationPipeline
pop.subpop.0.species.pipe.source.0 = ec.vector.breed.VectorCrossoverPipeline
pop.subpop.0.species.pipe.source.0.source.0 = ec.select.TournamentSelection
pop.subpop.0.species.pipe.source.0.source.1 = same
pop.subpop.0.species.fitness = ec.simple.SimpleFitness
pop.subpop.0.species.ind = ec.vector.DoubleVectorIndividual
pop.subpop.0.species.mutation-bounded = true
pop.subpop.0.species.min-gene = -5.12
pop.subpop.0.species.max-gene = 5.12
pop.subpop.0.species.genome-size = 100
pop.subpop.0.species.mutation-prob = 1.0
pop.subpop.0.species.crossover-type = one
pop.subpop.0.species.mutation-type = gauss
pop.subpop.0.species.mutation-stdev = 0.01
select.tournament.size = 2
Let’s call this file base.params. It’s nothing special. Assuming you have a Problem subclass called
MyTestProblem, ECJ should run this file all by itself (independent of any meta stuff) without incident.
Next you need to create the meta-level file, where we set up evolution of a DoubleVectorIndividual. Let’s
call this file meta.params. Here’s one possibility. We’ll start with certain basic items (it won’t run yet):
#### META-LEVEL FILE
evalthreads = 1
breedthreads = 1
seed.0 = time
checkpoint = false
checkpoint-modulo = 1
checkpoint-prefix = ec
state = ec.simple.SimpleEvolutionState
init = ec.simple.SimpleInitializer
finish = ec.simple.SimpleFinisher
exch = ec.simple.SimpleExchanger
breed = ec.simple.SimpleBreeder
eval = ec.simple.SimpleEvaluator
stat = ec.simple.SimpleStatistics
generations = 50
quit-on-run-complete = true
pop = ec.Population
pop.subpops = 1
pop.subpop.0 = ec.Subpopulation
pop.subpop.0.size = 50
pop.subpop.0.duplicate-retries = 2
pop.subpop.0.species = ec.vector.FloatVectorSpecies
pop.subpop.0.species.ind = ec.vector.DoubleVectorIndividual

# Stuff special to meta-evolution
eval.problem = ec.eval.MetaProblem
eval.problem.file = base.params
eval.problem.set-random = true
eval.problem.reevaluate = true
eval.problem.runs = 1
stat.file = $meta.stat
Notice that we have set apart six parameters (so far) special to meta-evolution:
• The Problem must be an ec.eval.MetaProblem.
• MetaProblem has a file parameter which points to the base.params file (wherever it is). This is how the MetaProblem identifies what file to use to set up base-level evolutionary processes.
• The reevaluate parameter, typically set to true. This tells MetaProblem to reevaluate individuals even if their evaluated flag has been set. This is because meta-level evolution of individuals is stochastic — it involves firing up an ECJ process underneath — and so every time an individual is evaluated its fitness is likely to be different.
• The set-random parameter, typically set to true. This tells MetaProblem to override the random number seeds of the base ECJ processes it constructs and instead seed them with random numbers.13
Alternatively, you could tell the base process to seed itself using the wall clock time. This is slightly less safe as it runs the (very small) risk of having multiple parallel base processes having the same seed. To do this second option, instead of setting set-random to true, you’d say:
### In the Base Parameter File
seed.0 = time
I don’t suggest it though.
• The runs parameter is set to the number of times you want to run a base-level ECJ process to test a certain meta-level individual. Because base-level processes are stochastic, the fitness at the meta-level is potentially very noisy. If you set this to 1, then only a single test is done: the meta-level individual’s fitness is set to the best fitness of run of the base process which was run using its parameters. Otherwise multiple tests are done and the fitness of the meta-level individual is set to the mean best fitness of run of all the multiple runs performed using those parameters. Based on our experiments [11], we suggest using a single test.
• The stat.file file is set to $meta.stat, not $out.stat as usual. This is because the base-level process has $out.stat as its statistics file, and even if the base-level process is instructed not to write anything out to it, it’d probably be best if the two ECJ processes not write to the same file! So we renamed it to another filename.
13Seeding a random number generator using random numbers from another number generator can be perilous, particularly for random number generators with simple internal seeds. But it’s probably fine in this case, because the way MersenneTwister uses its seed is to build an array of 624 random numbers from that seed using a Knuth generator; we then pulse the MersenneTwister 624 × 2 + 1 times to prime it, at which point these numbers have been thoroughly converted and cleaned out.
7.6.2 Defining the Parameters
Next you’ll need to specify the parameters whose values we will evolve at the meta-level. In ECJ’s Meta-EA facility, every parameter value is stored as a double in a heterogeneous (Section 5.1.1.5) DoubleVectorIndividual. However there are a variety of ways these doubles are interpreted:
• Type: float As a double (of course) between some min value and max value inclusive.
• Type: integer As an integer between some min value and max value inclusive.
• Type: boolean As a boolean (either the string true or false), represented in the genome by the values 0 and 1.
• (no type name) As one of M strings, represented in the genome by the values 0 … M − 1.
A great many, though not all, ECJ parameters can be represented by one of the above options. To represent a parameter, you’ll need to specify its gene form for the heterogeneous DoubleVectorIndividual, and you’ll also need to specify the parameter name and other information about how the parameter is to be interpreted.
Let’s say that you have chosen to evolve the following parameters:
• pop.subpop.0.species.mutation-prob A double-valued number ranging from 0.0 to 1.0 inclusive.
• pop.subpop.0.species.mutation-type One of the following strings: reset gauss polynomial
• pop.subpop.0.species.mutation-stdev A double-valued number ranging from (let’s say) 0.0 to 1.0 inclusive.
• pop.subpop.0.species.mutation-distribution-index An integer ranging from 0 to 10 inclusive.
• pop.subpop.0.species.alternative-polynomial-version A boolean value, that is, one of the strings true or false
Let’s start by setting up some default mutation and crossover information for our meta-level individuals:
### In the Meta Parameter File
pop.subpop.0.species.min-gene = 0.0
pop.subpop.0.species.max-gene = 1.0
pop.subpop.0.species.mutation-prob = 0.25
pop.subpop.0.species.mutation-type = gauss
pop.subpop.0.species.mutation-stdev = 0.1
pop.subpop.0.species.mutation-bounded = true
pop.subpop.0.species.out-of-bounds-retries = 100
pop.subpop.0.species.crossover-type = one
Now we need to define information about each of the parameters. Note that in each case we define the name of the parameter, and the type of the parameter. We also define some mutation features; different parameter types should be mutated in the appropriate way.
We start with the number of parameters and the genome size (which will be the same thing, namely 5):
### In the Meta Parameter File
pop.subpop.0.species.genome-size = 5
eval.problem.num-params = 5
Now the first parameter. It’s a double, so its type is float (for “floating-point type”). We’ll use the default mutation settings (gaussian, 0.25 probability, 0.1 standard deviation, bounded to 0…1) for this gene, so we don’t specify anything special.
### In the Meta Parameter File
eval.problem.param.0 = pop.subpop.0.species.mutation-prob
eval.problem.param.0.type = float
Next the mutation-type parameter is one of 3 possible strings, represented in the genome as the integers 0 … 2. For this kind of parameter we don’t declare a type, but instead declare the number of values it can take on and what those values (as strings) are. Additionally, since it’s represented as an integer internally, we define the maximum gene to be 2 (the minimum gene was already declared in the defaults as 0). And we use integer-reset mutation, which is the appropriate mutation for genes of this type.
### In the Meta Parameter File
eval.problem.param.1 = pop.subpop.0.species.mutation-type
eval.problem.param.1.num-vals = 3
eval.problem.param.1.val.0 = reset
eval.problem.param.1.val.1 = gauss
eval.problem.param.1.val.2 = polynomial
pop.subpop.0.species.max-gene.1 = 2
pop.subpop.0.species.mutation-type.1 = integer-reset
The standard deviation parameter is also floating-point, ranging from 0…1, like the first parameter was.
### In the Meta Parameter File
eval.problem.param.2 = pop.subpop.0.species.mutation-stdev
eval.problem.param.2.type = float
The mutation distribution index parameter is an integer which can range from 0 to 10 inclusive, hence our max gene value below (remember that the min gene value was declared in the defaults already to be 0). Unlike the mutation-type parameter, which has three unrelated strings, here the integers have an explicit ordering: 5 is closer to 6 than it is to 9. For integer values of this kind it’s probably more appropriate to use something like integer-random-walk mutation than integer-reset mutation.
### In the Meta Parameter File
eval.problem.param.3 = pop.subpop.0.species.mutation-distribution-index
eval.problem.param.3.type = integer
pop.subpop.0.species.max-gene.3 = 10
pop.subpop.0.species.mutation-type.3 = integer-random-walk
pop.subpop.0.species.random-walk-probability.3 = 0.8
Finally, the alternative polynomial version parameter is a boolean value (a string of the form true or false). Booleans will be represented internally as integers (0 or 1), and integer-reset is the most appropriate form. Since the default min and max values are already 0.0 and 1.0, there’s no need to define them.
### In the Meta Parameter File
eval.problem.param.4 = pop.subpop.0.species.alternative-polynomial-version
eval.problem.param.4.type = boolean
pop.subpop.0.species.mutation-type.4 = integer-reset
Note that integer-reset randomizes the value. This isn’t very efficient if you only have boolean values (just 0 or 1): there’s a 50% chance that integer-reset does nothing at all, right? Nope. Recall that we set pop.subpop.0.species.out-of-bounds-retries = 100. This means that if we’re presently a 0, then ECJ will try resetting the gene up to 100 times until it sets it to a 1. So in this case integer-reset is more or less bit-flip mutation.
In every case, note that we define both parameters (eval.problem.param.n…) and corresponding gene information (pop.subpop.0.species.….n). You need to make sure that the right kind of parameter types match up with the right kinds of gene types.
7.6.3 Statistics and Messages
Meta-EAs can be chatty: you’re running lots of base-level ECJ processes and they’re writing out all sorts of messages to the screen. Furthermore, these processes are probably writing statistics files that are unnecessary and significantly slow them down. Once you’ve got things debugged and working well, it’s probably best to shut the base-level ECJ processes up before doing your big run.
Let’s start with eliminating the base-level ECJ statistics file. If you’re using SimpleStatistics, it normally creates a file and starts writing it. But if you say:
### In the Base Parameter File
stat.silent = true
… this will prevent this behavior even if the file name was defined in the base-level parameters.
Next, your base-level ECJ process will write all sorts of things to the screen. The aforementioned stat.silent parameter will quiet some of them but not all of them. To completely shut the base-level ECJ process up, you can set up things so that the stdout and stderr logs of the base-level ECJ process are completely silenced. This is again defined at the base level:
### In the Base Parameter File
silent = true
In addition to the best fitnesses of N runs, used to determine the fitness of a meta-level individual, the Meta-EA system also keeps track of the best base-level individual discovered in any run each meta-level generation, and ultimately the best base-level individual ever discovered.
The best base-level individual is printed out via the MetaProblem’s describe(…) method. Ordinarily such individuals are only printed out to the statistics file at the very end of the run; but you also have the option of printing them out once every generation (which we suggest you do) with:
### In the Meta Parameter File
stat.do-per-generation-description = true
See Section 5.2.3.5 for more information on this parameter.
7.6.4 Populations Versus Generations
One very common parameter setting you may wish to evolve with a Meta-EA is how big you’d like your Population to be, where the number of generations shrinks as the Population grows so as to maintain a constant number of evaluations. Unfortunately by default ECJ treats the population size (or more properly, subpopulation sizes) and the number of generations as separate parameters.
But there’s an easy way around that. All you do is define your evolution in terms of number of evaluations rather than generations. Let’s say that you have a fixed budget of 64K (65536) evaluations. You can set various population sizes and let ECJ modify the generations appropriately like this:
### In the Base Parameter File
evaluations = 65536
### In the Meta Parameter File
eval.problem.param.3 = pop.subpop.0.size
eval.problem.param.3.num-vals = 5
eval.problem.param.3.val.0 = 16
eval.problem.param.3.val.1 = 32
eval.problem.param.3.val.2 = 64
eval.problem.param.3.val.3 = 128
eval.problem.param.3.val.4 = 256
pop.subpop.0.species.mutation-type.3 = integer-random-walk
pop.subpop.0.species.random-walk-probability.3 = 0.5
Because we’ve fixed the evaluations parameter, and are varying the pop.subpop.0.size parameter, the generations parameter is automatically inferred from the other two.
Note that we’ve set the population sizes to numbers which divide evenly into the desired number of evaluations (in this case, powers of 2). This is important so that when the number of generations is chosen, the total number of evaluations stays constant (ECJ rounds the evaluations down if it can’t divide the population size evenly into them). Also note that even though the parameters are strings, we’ve chosen to still use integer-random-walk, with a low probability, rather than reset mutation. You might ponder why.
7.6.5 Using Meta-Evolution with Distributed Evaluation
Because of its high cost, almost certainly the most common usage of a Meta-EA is in a massively parallel setting. The general idea is to farm out the base-level processes to remote slaves. To do this, we take advantage of ECJ’s master-slave evaluator. Before we go further, you should familiarize yourself with the master-slave distributed evaluator, in Section 6.1.
The meta-level evolution will take place on the master. The master will then ship meta-level individuals to slaves, which will then hand them to the MetaProblem to fire up a base-level process to test them:
### In the Meta Parameter File
eval.masterproblem = ec.eval.MasterProblem
eval.master.port = 5000
eval.masterproblem.job-size = 1
eval.masterproblem.max-jobs-per-slave = 1
eval.compression = true
eval.master.host = my.machine.ip.address
Some notes. First note that you must replace my.machine.ip.address in eval.master.host with the IP address of your master computer. Second, notice the job size and maximum number of jobs per slave are both set to 1. You almost certainly want this because the evaluation time is so long on the slaves: it’s
an entire evolutionary process. You probably don’t want to bulk up multiple jobs per slave, though you theoretically could if your evolutionary processes were very short.
Your slave’s parameter file is a new file called, let’s say, slave.params. Note that it points to meta.params in order to grab some of the master’s parameters.
### In the Slave Parameter File
# This file is run like this:
# java ec.eval.Slave -file slave.params
parent.0 = meta.params
Not complicated! However, in addition to various muzzling parameters (stat.silent, eval.problem.silent, etc) you probably had added to your meta parameter file, you might also want to add the following to your slave parameter file:
### In the Slave Parameter File
eval.slave.silent = true
This tells the slave to be completely silent when starting up and printing slave information. It’s often a good idea. Once again, you’d do this only when you’ve got everything debugged.
There is one important gotcha involved with combining meta-EAs with distributed evaluation, and that is the matter of tests. The meta-EA does some N tests of a single meta-level individual in order to assess its fitness, and each test is a base-level evolutionary run. The problem is that these tests are performed by the MetaProblem.
Here’s the issue. Let’s say you have 1000 machines at your disposal. You want to distribute an entire generation to these 1000 machines, and so you have decided to do 10 tests per meta-level individual, and to have a population of 100. So in the meta parameter file, we’ve set eval.problem.runs = 10 to turn on those tests.
Sounds great, right? Not so fast. The master-slave evaluator will distribute your 100 individuals out to 100 machines. Because the MetaProblem resides on the slave machines, each of those 100 machines will do 10 tests, while the other 900 machines stand by idle.
The problem is that ECJ’s distribution of the tests happens after the distribution out to the slaves. We need a way to break up the tests before slave distribution. The simplest way to do this is to use a different way of doing tests: SimpleEvaluator’s num-tests mechanism. We turn off tests in the MetaProblem (set it to 1) and instead turn on tests in the SimpleEvaluator. SimpleEvaluator also gives us options about how to merge those tests: we’ll stick with averaging like before:
### In the Meta Parameter File
eval.problem.runs = 1
eval.num-tests = 10
eval.merge = mean
What’s going on here exactly? When the SimpleEvaluator is told to evaluate its population, what it will do in this case is create a new population that is ten times the size of the old one. It then clones all the old population members ten times each and inserts them in the new population. Then it evaluates this new population. Finally, it gathers each of the ten clones and merges their fitnesses and reinserts the resulting fitness back into the original individual.
This hack helps us because when SimpleEvaluator has already made ten copies of each individual before it starts evaluating them. Thus each copy will get shipped off to a separate slave machine. Problem solved! If you aren’t using SimpleEvaluator at the meta-level, you’re out of luck with this approach. But it’s not hard to code a similar procedure in your own Evaluator subclass. For hints, see SimpleEvaluator’s private
expand(…) and contract(…) methods.
7.6.6 Customization
It may be the case that the four parameter types aren’t sufficient for your purposes. Never fear! You can override certain protected methods in MetaProblem to customize how parameters are mapped from the genome to parameter strings in the base-level database.
For each parameter, MetaProblem needs to store the name of the parameter and the kinds of values which the parameter may be set to. MetaProblem performs mapping as follows. During setup(…), it calls a protected method called loadDomain(…), which reads the parameters and stores the type of the values of each parameter, storing them all in an instance variable:
public Object[] domain;
Once built, this array does not change, and is shared among all instances of MetaProblem cloned from the same prototype. The array contains, for each parameter (and hence each gene in the genome), an object which indicates the parameter’s value type:
• double[0] Double floating-point values.
• int[0] Integer values.
• boolean[0] Boolean values (mapped to 0 and 1 for false and true)
• String[…] An array of strings which are valid parameter values. These are mapped to the integers 0 … String.length − 1
The first three cases simply identify the type as being a double, an int, or a boolean. The last case actually identifies all the valid strings the parameter may take on. In the case of a double or an int, we also need to know the minimum and maximum values. The minimum and maximum values for parameter number i are determined by the minGene(…) and maxGene(…) values called on the Species of the corresponding genome, passing in i.
During setup(…), the MetaProblem also loads a ParameterDatabase from the base parameter file. To evaluate a meta-level individual, it will copy this ParameterDatabase and then modify some of its parameters to reflect the settings in the meta-level individual. The modification procedure is done via the method modifyParameters(…), which is passed the individual and the database to modify.
To modify the database, modifyParameters(…) must first map this individual into the proper parameter value strings. It does this by repeatedly calling the method map(…), passing in the EvolutionState, the genome, the index of the gene of interest, and the Species (a FloatVectorSpecies). The Species is passed in to enable map(…) to call minGene(…) and maxGene(…) if necessary. map(…) will also certainly use the domain instance variable, of course. This method returns a String which holds the parameter value corresponding to the gene’s numerical value.
modifyParameters(…) also must identify the parameter names: it does this by consulting the meta-level ParameterDatabase to identify the parameter names. To do this, modifyParameters(…) must know the parameter base underlying the parameter names. This is stored in the variable:
public Parameter base;
(Forgive the overriding of the terms “base” and “parameter” here). For example, the name of base parameter 1 is:
base.push(P_PARAM).push(“1”)
All of these are opportunities for you to customize the parameter loading facility. For example, if you need another kind of parameter type, you’d override loadDomain(…) to load new kinds of information into the domain array, and then override map(…) to map genes using this new kind of parameter to appropriate strings.
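As a small sketch of this, suppose loadDomain(…) had been overridden to install a new marker object (here, hypothetically, a float[0]) in the domain array for parameters which should be interpreted on a log scale. A matching map(…) override might then look something like this; everything in the example is invented for illustration:

import ec.EvolutionState;
import ec.eval.MetaProblem;
import ec.vector.FloatVectorSpecies;

// Hypothetical MetaProblem variant: genes whose domain entry is a float[0]
// (installed by an overridden loadDomain(...), not shown) are treated as
// base-10 exponents rather than plain values.
public class LogScaleMetaProblem extends MetaProblem
    {
    protected String map(EvolutionState state, double[] genome, FloatVectorSpecies species, int index)
        {
        if (domain[index] instanceof float[])             // our hypothetical log-scale marker
            return "" + Math.pow(10.0, genome[index]);    // interpret the gene as an exponent
        return super.map(state, genome, species, index);  // fall back on the standard types
        }
    }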
Finally, modifyParameters(…) gives you nearly full control over the mapping process. But note that if you override modifyParameters(…), you’ll also need to override describe(…) in a similar fashion. See the source code for MetaProblem to work through this, including certain locks. describe(…) normally prints out the parameters and values corresponding to the given Individual, then also prints out the best underlying (base-level) individual discovered, which is stored in the array…
public Individual bestUnderlyingIndividual[]; // per subpopulation
One last method you might be interested in overriding: combine(…). This method is responsible for taking multiple fitness results and combining them into a final fitness for a meta-level individual. The default form calls fitness.setToMeanOf(…) to combine them using their average. But you can change this if you like.
ec.eval.MetaProblem Methods
protected void loadDomain(EvolutionState state, Parameter base) Constructs the domain array from the given parameter database.
public void modifyParameters(EvolutionState state, ParameterDatabase database, int run, Individual metaIndividual) Given a ParameterDatabase to modify, and an Individual, extracts the parameters from the individual and sets them in the database. The Individual is normally a DoubleVectorIndividual.
public void describe(EvolutionState state, Individual ind, int subpopulation, int threadnum, int log)
Given an Individual, extracts the parameters from the individual and prints them and their values to the log. Then prints the best “underlying individual” so far (the fittest base-level Individual discovered). The Individual is normally a DoubleVectorIndividual.
protected String map(EvolutionState state, double[] genome, FloatVectorSpecies species, int index)
Given a genome and gene index, maps the gene into a String value corresponding to a base-level parameter and returns it.
public void combine(EvolutionState state, Fitness[] runs, Fitness finalFitness)
Combines multiple fitness values into a single final fitness, typically by setting it to their mean.
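For example, here is a sketch of a combine(…) override which takes the single best run rather than the mean; the class name is invented for illustration:

import ec.EvolutionState;
import ec.Fitness;
import ec.eval.MetaProblem;

// Hypothetical variant: rather than averaging the fitnesses of the N base-level
// runs, use the single best run as the meta-level individual's fitness.
public class BestOfRunsMetaProblem extends MetaProblem
    {
    public void combine(EvolutionState state, Fitness[] runs, Fitness finalFitness)
        {
        Fitness best = runs[0];
        for (int i = 1; i < runs.length; i++)
            if (runs[i].betterThan(best))
                best = runs[i];
        // the "mean" of a single fitness is simply a copy of it
        finalFitness.setToMeanOf(state, new Fitness[] { best });
        }
    }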
7.7 Resets (The ec.evolve Package)
The ec.evolve package presently contains a single class, ec.evolve.RandomRestarts. This is a subclass of Statistics which performs random or timed restarts. RandomRestarts reinitializes your entire Population either once every fixed N generations or once every M generations, where M is a random integer between 1 and N inclusive. The value of M is randomized each reset.
To use RandomRestarts, include it as part of your Statistics chain. For example, if you have a single Statistics object at present, you could say:
stat.num-children = 1
stat.child.0 = ec.evolve.RandomRestarts
Next you need to specify N. Here we set it to 20.
stat.child.0.restart-upper-bound = 20
You’ll also need to state whether we reset exactly every N generations or in some random number of generations between 1 and N inclusive (this random value changes every reset). The options are fixed and random. Here we set the reset type to fixed:
stat.child.0.restart-type = fixed
Last, you’ll need to state the generation in which resetting begins. By default, this is generation 1 (after all, in generation 0 either the population was randomly generated to start with, or you specifically had loaded it from a file). But you can set it to any value >= 0. To set it to generation 4, you’d say:
stat.child.0.start = 4
Resetting occurs just before evaluation. Thus a new generation may be bred, then entirely thrown away and replaced with a reset population.
One common use for resets is to randomize the population every single generation so as to do random search. You can do it like this:
stat.child.0.restart-upper-bound = 1
stat.child.0.restart-type = fixed
# This is the default anyway so it’s not necessary to state this:
stat.child.0.start = 1
Random Restarts is due to James O’Beirne, a student at GMU.
Bibliography
[1] Kumar Chellapilla. A preliminary investigation into evolving modular programs without subtree crossover. In John R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 23–31, University of Wisconsin, Madison, Wisconsin, USA, 22-25 July 1998. Morgan Kaufmann.
[2] Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In Marc Schoenauer, Kalyanmoy Deb, Günther Rudolph, Xin Yao, Evelyne Lutton, Juan Julian Merelo, and Hans-Paul Schwefel, editors, Parallel Problem Solving from Nature (PPSN VI), pages 849–858. Springer, 2000.
[3] John R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[4] John R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, 1994.
[5] W. B. Langdon. Size fair and homologous tree genetic programming crossovers. In Wolfgang Banzhaf, Jason Daida, Agoston E. Eiben, Max H. Garzon, Vasant Honavar, Mark Jakiela, and Robert E. Smith, editors, Proceedings of the Genetic and Evolutionary Computation Conference, volume 2, pages 1092–1097, Orlando, Florida, USA, 13-17 July 1999. Morgan Kaufmann.
[6] Sean Luke. Genetic programming produced competitive soccer softbot teams for robocup97. In John R. Koza, Wolfgang Banzhaf, Kumar Chellapilla, Kalyanmoy Deb, Marco Dorigo, David B. Fogel, Max H. Garzon, David E. Goldberg, Hitoshi Iba, and Rick Riolo, editors, Genetic Programming 1998: Proceedings of the Third Annual Conference, pages 214–222, University of Wisconsin, Madison, Wisconsin, USA, 1998. Morgan Kaufmann.
[7] Sean Luke. Essentials of Metaheuristics. 2009. Available at http://cs.gmu.edu/∼sean/book/metaheuristics/.
[8] SeanLuke,CladioCioffi-Revilla,LiviuPanait,KeithSullivan,andGabrielBalan.MASON:Amultiagent simulation environment. Simulation, 81(7):517–527, July 2005.
[9] Sean Luke and Liviu Panait. A survey and comparison of tree generation algorithms. In Lee Spector, Erik D. Goodman, Annie Wu, W. B. Langdon, Hans-Michael Voigt, Mitsuo Gen, Sandip Sen, Marco Dorigo, Shahram Pezeshk, Max H. Garzon, and Edmund Burke, editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 81–88, San Francisco, California, USA, 7–11 July 2001. Morgan Kaufmann.
[10] Sean Luke and Liviu Panait. A comparison of bloat control methods for genetic programming. Evolu- tionary Computation, 14(3):309–344, Fall 2006.
[11] Sean Luke and A. K. M. Khaled Ahsan Talukder. Is the meta-ea a viable optimization method? In Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation (GECCO 2013), 2013.
271
[12] Makato Matsumoto and Takuji Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Transactions on Modeling and Computer Simulation, 8(1):3–30, 1998.
[13] Una-MayO’Reilly.AnAnalysisofGeneticProgramming.PhDthesis,CarletonUniversity,Ottawa-Carleton Institute for Computer Science, Ottawa, Ontario, Canada, 22 September 1995.
[14] Liviu Panait. A comparison of two competitive fitness functions. In GECCO 2002: Proceedings of the Genetic and Evolutionary Computation Conference, pages 503–511. Morgan Kaufmann Publishers, 2002.
[15] Ricardo Poli. A simple but theoretically-motivated method to control bloat in genetic programming. In Genetic Programming, Proceedings of EuroGP’2003, pages 204–217. Springer, 14-16 April 2003.
[16] Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee. A Field Guide to Genetic Programming. Available in print from lulu.com, 2008.
[17] Kenneth Price, Rainer Storn, and Journi Lampinen. Differential Evolution: A Practical Approach to Global Optimization. Springer, 2005.
[18] Bill Punch and Douglas Zongker. lil-gp 1.1. A genetic programming system. Available at http://garage.cse.msu.edu/software/lil-gp/, 1998.
[19] Conor Ryan, J. J. Collins, and Michael O’Neill. Grammatical evolution: Evolving programs for an arbitrary language. In EuroGP 1998, pages 83–96, 1998.
[20] Lee Spector. Simultaneous evolution of programs and their control structures. In Peter J. Angeline and K. E. Kinnear, Jr., editors, Advances in Genetic Programming 2, chapter 7, pages 137–154. MIT Press, 1996.
[21] Lee Spector, Jon Klein, and Martin Keijzer. The Push3 execution stack and the evolution of control. In Proceedings of the Genetic and Evolutionary Conference (GECCO 2005), pages 1689–1696. Springer, 2005.
[22] Lee Spector and Alan Robinson. Genetic programming and autoconstructive evolution with the push programming language. Genetic Programming and Evolvable Machines, 3(1):7–40, 2002.
[23] N. Srinivas and Kalyanmoy Deb. Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation, 2:221–248, 1994.
[24] Keith Sullivan, Sean Luke, Curt Larock, Sean Cier, and Steven Armentrout. Opportunistic evolution: efficient evolutionary computation on large-scale computational grids. In GECCO ’08: Proceedings of the 2008 GECCO conference companion on Genetic and evolutionary computation, pages 2227–2232, New York, NY, USA, 2008. ACM.
[25] Seth Tisue and Uri Wilensky. Netlogo: A simple environment for modeling complexity. In International Conference on Complex Systems, pages 16–21, 2004.
[26] Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving the strength pareto evolutionary algorithm for multiobjective optimization. In K. Giannakoglou, D. Tshalis, J. Periaux, K. Papailiou, and T. Fogarty, editors, Evolutionary Methods for Design, Optimization, and Control, pages 19–26, 2002.
272
Index
(+ … …), 177
(if …), 175
(if test then else), 175
(matrix-multiply … …), 177
(sin … ), 177
(sin …), 175
-50, 188
../c/bar.params, 25
../z.params, 16
.class, 13, 20, 25
.dot, 164, 165
.gz, 84
/a/b/Foo.class, 25
/a/c/bar.params, 25
/tmp/, 31
/tmp/population.in, 64
/tmp/subpopulation.in, 64
$meta.stat, 258
$out.stat, 258
0.1, 83
0.25448855944201476, 188
0.4099041340133447, 188
2.34, 159
3.14, 159
111, 188
a.params, 15
accept(), 217
addRandomRule(…), 206
adf0.grammar, 192
ADF1, 186
ALL MESSAGE LOGS, 27
allValid(…), 80
app/ant/ant.params, 20
ARG0, 186
argposition, 133, 145
assessFitness[], 234
Atan, 199
auto, 42
b.params, 15
bar, 15
base, 21
base(), 52
base.params, 257
best, 218
betterThan, 252
betterThan(), 139
blah, 185
boolean, 111, 118
boolean[0], 263
breed.f-noise, 248
breed.pf, 248
Breeder, 5, 6
BreedingPipeline, 6
BreedingPipeline.DYNAMIC SOURCES, 76, 77
breedthreads, 247
buildOutput(), 42
byte, 111
byte, short, int, long, 113, 119
ByteVectorIndividual, 55
c.params, 15
CHANGES, 11
checkConstraints(), 141, 159
checkConstraints(…), 158, 159, 170
children[], 133
childtypes, 134
Class.getResource(…), 13
cleanup(…), 41, 42
clone(), 59, 61, 67, 71, 138, 139, 178, 198, 204
clone(…), 128
Cloneable, 47
CloneNotSupportedException, 51
Code, 30
Code.decode(decodeReturn), 30
combine(…), 264
CompetitiveEvaluator, 5
constraints, 133
cont, 235
context, 227, 228, 235
context[index], 235
contextIsBetterThan(…), 237
contract(…), 262
copyTo(…), 138, 139, 178
cos, 132
countVictoriesOnly, 229, 232
crossover(…), 112
crossover-probability, 115
crossover-type, 117
CrossoverPipeline, 6
curmudgeon, 31
d.params, 15
decode(…), 168
DecodeReturn.getFloat(), 29
defaultBase(), 52, 59, 61, 67, 71, 128, 204
defaultCrossover(…), 116
describe(), 46
describe(…), 46, 67, 85, 86, 99, 140, 229, 260, 264
distanceTo(…), 59, 166
do-depth, 144
do-final, 251
do-size, 88, 89, 143, 144
do-subpops, 143
do-time, 88, 89, 143, 144
docs, 11, 12
domain, 263, 264
double, 111
Double.MAX VALUE, 186, 188
Double.MAX Value, 187
Double.POSITIVE INFINITY, 59, 68
double[0], 263
double[], 125
DoubleVectorIndivdual, 123
DoubleVectorIndividual, 255
ec, 11, 13, 31
ec.parsimony.BucketTournamentSelection, 181
ec.parsimony.DoubleTournamentSelection, 182
ec.parsimony.LexicographicTournamentSelection, 181
ec.parsimony.ProportionalTournamentSelection, 182
ec.parsimony.TarpeianStatistics, 183
ec.4.gz, 32
ec.generation.gz, 31
ec.app.ant.Ant, 16
ec.app.moosuite, 254, 255
ec.app.myapp.MyMasterProblem, 217
ec.app.myapp.MyProblem, 191
ec.app.myapp.X,
ec.app.myapp.Mul,
ec.app.myapp.Sin, 136
ec.breed, 79
ec.breed.BufferedBreedingPipeline, 79, 108
ec.breed.CheckingBreedingPipeline, 80
ec.breed.ForceBreedingPipeline, 80
ec.breed.GenerationSwitchPipeline, 81
ec.breed.InitializationPipeline, 79
ec.breed.MultiBreedingPipeline, 79, 115
ec.breed.ReproductionPipeline, 79
ec.breed.UniquePipeline, 81
ec.breed.vector, 189
ec.Breeder, 51, 68, 99
ec.BreedingPipeline, 71, 76
ec.BreedingSource, 71
ec.Clique, 51
ec.coevolve, 227
ec.coevolve.CompetitiveEvaluator, 227, 231, 232, 240
ec.coevolve.GroupedProblemForm, 45, 228, 234, 236
ec.coevolve.MultiPopCoevolutionaryEvaluator, 227, 232, 234
ec.coevolve.MultiPopCoevolutionEvaluator, 240
ec.de, 245
ec.de.Best1BinDEBreeder, 247
ec.de.DEBreeder, 245
ec.de.DEEvaluator, 245 ec.de.Rand1EitherOrDEBreeder, 248 ec.DEBreeder.CR UNSPECIFIED, 246 ec.DefaultsForm, 99
ec.es, 97, 101
ec.es.ESDefaults, 102
ec.es.ESSelection, 101 ec.es.MuCommaLambdaBreeder, 101 ec.es.MuPlusLambdaBreeder, 101
ec.eval, 209
ec.eval.Job, 236
ec.eval.MasterProblem, 210 ec.eval.MetaProblem, 255, 257
ec.eval.Slave, 211
ec.eval.SlaveConnection, 210 ec.eval.SlaveMonitor, 210
ec.Evaluator, 51, 65, 99, 211
ec.EvolutionState, 14, 31, 35, 40, 41, 49, 51, 99 ec.EvolutionState.R FAILURE, 91 ec.EvolutionState.R SUCCESS, 91
ec.Evolve, 13, 14, 20, 40–42, 49, 211
ec.evolve, 264
ec.evolve.RandomRestarts, 264
ec.exchange, 219 ec.exchange.InterPopulationExchange, 223 ec.exchange.IslandExchange, 219
ec.Exchanger, 51, 84, 99
ec.Finalizer, 51
ec.Finisher, 46, 99
ec.Fitness, 99, 139, 227, 229
ec.gp, 63, 130
ec.gp.ADF, 169, 170
ec.gp.ADFArgument, 170, 171 ec.gp.ADFContext, 172
ec.app.myapp.Y, ec.app.myapp.Sub,
ec.gp.ADFStack, 138, 140, 172
ec.gp.breed, 148 ec.gp.breed.InternalCrossoverPipeline, 150 ec.gp.breed.MutateAllNodesPipeline, 152 ec.gp.breed.MutateDemotePipeline, 151 ec.gp.breed.MutateERCPipeline, 153 ec.gp.breed.MutateOneNodePipeline, 152 ec.gp.breed.MutatePromotePipeline, 151 ec.gp.breed.MutateSwapPipeline, 152 ec.gp.breed.RehangPipeline, 152 ec.gp.breed.SizeFairCrossoverPipeline, 154 ec.gp.build, 145
ec.gp.build.PTC1, 146 ec.gp.build.PTC2, 147 ec.gp.build.PTCFunctionSet, 147 ec.gp.build.PTCFunctionSetForm, 146 ec.gp.build.RandomBranch, 148 ec.gp.build.RandTree, 148 ec.gp.build.Uniform, 148
ec.gp.ERC, 167, 187
ec.gp.ge, 183 ec.gp.ge.breed.GETruncationPipeline, 189 ec.gp.ge.GEIndividual, 183 ec.gp.ge.GEProblem, 184
ec.gp.ge.GESpecies, 184 ec.gp.ge.GESpecies.BIG TREE ERROR, 187, 188 ec.gp.ge.GrammarFunctionNode, 192 ec.gp.ge.GrammarNode, 192 ec.gp.ge.GrammarParser, 184, 192 ec.gp.ge.GrammarRuleNode, 192 ec.gp.GPAtomicType, 132, 180
ec.gp.GPData, 138
ec.gp.GPFunctionSet, 132
ec.gp.GPIndividual, 132
ec.gp.GPInitializer, 144
ec.gp.GPNode, 132
ec.gp.GPNodeBuilder, 145 ec.gp.GPNodeBuilder.NOSIZEGIVEN, 145 ec.gp.GPNodeConstraints, 132 ec.gp.GPNodeGatherer, 160 ec.gp.GPNodeParent, 133 ec.gp.GPNodeSelector, 148
ec.gp.GPSetType, 132, 180
ec.gp.GPSpecies, 132
ec.gp.GPTree, 132, 133, 176 ec.gp.GPTreeConstraints, 132
ec.gp.GPType, 132, 175, 180
ec.gp.koza, 145
ec.gp.koza.Crossover, 148 ec.gp.koza.CrossoverPipeline, 149 ec.gp.koza.FullBuilder, 145 ec.gp.koza.GrowBuilder, 145
ec.gp.koza.HalfBuilder, 146
ec.gp.koza.KozaNodeSelector, 148
ec.gp.koza.KozaShortStatistics, 90, 234
ec.gp.koza.Mutation, 148
ec.gp.koza.MutationPipeline, 150, 196
ec.gp.push, 193
ec.gp.push.Nonterminal, 194–196
ec.gp.push.PushBuilder, 195, 196
ec.gp.push.PushInstruction, 198
ec.gp.push.PushProblem, 197
ec.gp.push.Terminal, 194–196
ec.Individual, 51, 52, 57
ec.Initializer, 45, 51, 99
ec.multiobjective, 249 ec.multiobjective.MultiObjectiveFitness, 249 ec.multiobjective.MultiObjectiveStatistics, 251 ec.multiobjective.nsga2, 249, 253 ec.multiobjective.nsga2.NSGA2Breeder, 70 ec.multiobjective.nsga2.NSGA2Evaluator, 253 ec.multiobjective.spea2, 249, 254 ec.multiobjective.spea2.SPEA2Breeder, 70 ec.multiobjective.spea2.SPEA2MultiObjectiveFitness,
254 ec.multiobjective.spea2.SPEA2Subpopulation, 254 ec.mutiobjective.nsga2.NSGA2Breeder, 253 ec.mutiobjective.nsga2.NSGA2MultiObjectiveFitness,
253 ec.mutiobjective.nsga2.NSGA2Statistics, 253 ec.Parameter, 20
ec.parsimony, 59, 181
ec.Population, 18, 50, 53
ec.Problem, 67
ec.Prototype, 51, 198
ec.pso, 241
ec.pso.Particle, 241
ec.pso.PSOBreeder, 242
ec.rule, 63, 199 ec.rule.breed.RuleCrossoverPipeline, 199, 207 ec.rule.breed.RuleMutationPipeline, 199, 205 ec.rule.Rule, 199, 200, 203 ec.rule.RuleConstraints, 199 ec.rule.RuleIndividual, 199 ec.rule.RuleMutationPipeline, 202 ec.rule.RuleSet, 199
ec.rule.Ruleset, 200 ec.rule.RuleSetConstraints, 199 ec.rule.RuleSpecies, 199
ec.ruleRuleSet, 199
ec.select, 73
ec.select.BestSelection, 74, 75, 102 ec.select.BoltzmanSelection, 74 ec.select.FirstSelection, 73
ec.select.FitProportionateSelection, 72, 74, 139
ec.select.GreedyOverselection, 74
ec.select.MultiSelection, 76
ec.select.RandomSelection, 74
ec.select.SigmaScalingSelection, 74
ec.select.SUSSelection, 74
ec.select.TournamentSelection, 75
ec.SelectionMethod, 71
ec.Setup, 51
ec.simple, 52, 97, 99, 101
ec.simple.Finisher, 63
ec.simple.SimpleBreeder, 69, 99, 234
ec.simple.SimpleDefaults, 52, 99
ec.simple.SimpleEvaluator, 65, 66, 99
ec.simple.SimpleEvolutionState, 40, 91, 99, 213
ec.simple.SimpleExchanger, 84, 99
ec.simple.SimpleFinisher, 99
ec.simple.SimpleFitness, 52, 55, 61, 99
ec.simple.SimpleInitializer, 63, 99
ec.simple.SimpleProblemForm, 67, 99
ec.simple.SimpleShortStatistics, 84, 87, 90, 99
ec.simple.SimpleStatistics, 84, 90, 99, 234, 236
ec.simple.SteadyStateEvaluator, 66
ec.simple.SteadyStateEvolutionState, 213
ec.Singleton, 51
ec.spatial, 232, 237
ec.spatial.Space, 237 ec.spatial.Spatial1DSubpopulation, 239 ec.spatial.SpatialBreeder, 70, 239 ec.spatial.SpatialMultiPopCoevolutionaryEvaluator,
240 ec.spatial.SpatialTournamentSelection, 239 ec.Species, 52, 56, 64
ec.Statistics, 51, 84, 99 ec.steady.SteadyStateDefaults, 107 ec.steadystate, 91, 103 ec.steadystate.QueueIndividual, 216 ec.steadystate.SteadyStateBreeder, 70, 106 ec.steadystate.SteadyStateBSourceForm, 106 ec.steadystate.SteadyStateEvaluator, 106 ec.steadystate.SteadyStateEvolutionState, 105 ec.steadystate.SteadyStateExchangerForm, 107 ec.steadystate.SteadyStateStatisticsForm, 84, 108 ec.subpop, 19
ec.subpop.species, 19
ec.Subpopulation, 19, 53
ec.util.Code, 28, 58, 60, 62
ec.util.DataPipe, 47, 237
ec.util.DecodeReturn, 29 ec.util.MersenneTwisterFast, 35, 36 ec.util.Output, 26, 27 ec.util.ParamClassLoadException, 21
ec.util.ParameterDatabase, 14, 20, 21 ec.util.RandomChoice, 37 ec.util.RandomChoiceChooser, 38 ec.util.RandomChoiceChooserD, 38 ec.util.ThreadPool, 39
ec.util.ThreadPool.Worker, 39
ec.vector, 83, 111, 125, 199, 200, 205 ec.vector.BitVectorIndividual, 111
ec.vector.breed, 112, 126 ec.vector.breed.GeneDuplicationPipeline, 127, 189 ec.vector.breed.ListCrossoverPipeline, 126 ec.vector.breed.MultipleVectorCrossoverPipeline,
113, 115, 117 ec.vector.breed.VectorCrossoverPipeline, 112, 114 ec.vector.breed.VectorMutationPipeline, 112, 117 ec.vector.ByteVectorIndividual, 111 ec.vector.DoubleVectorIndividual, 111, 241 ec.vector.FloatVectorSpecies, 111 ec.vector.FloatVectorspecies, 241
ec.vector.Gene, 111, 113, 128 ec.vector.GeneVectorIndividual, 111, 118, 128 ec.vector.GeneVectorSpecies, 111, 128 ec.vector.IntegerVectorIndidual, 67 ec.vector.IntegerVectorIndividual, 100, 111 ec.vector.IntegerVectorSpecies, 111 ec.vector.LongVectorIndividual, 111 ec.vector.ShortVectorIndividual, 111 ec.vector.VectorCrossoverPipeline, 82 ec.vector.VectorIndividual, 111 ec.vector.VectorMutationPipeline, 82 ec.vector.VectorSpecies, 111, 123
ec/…, 13
ec/app, 11, 12
ec/app/ant/Ant.class, 16 ec/app/ant/ant.params, 16 ec/app/lawnmower, 23
ec/app/push, 199
ec/display, 11
ec/Evolve.class, 20
ec/gp/koza/koza.params, 155, 158, 190 ec/gp/push/push.params, 195
ecj, 11, 13
ecj.jar, 13
ecj.tar.gz, 11
ecj.zip, 11
ecsuite.params, 92
encode(), 167, 168
equals(…), 59
equalTo, 252
equivalentTo, 252
equivalentTo(), 139
ERC, 186
ERC1[3.14159], 167
ERC2[921], 167
ERC[3.14159], 167
es.params, 103
eval(…), 137, 138, 141
eval.i-am-slave = true, 211
eval.master.host, 261
eval.masterproblem, 212
eval.masterproblem.job-size, 212
eval.problem.n…., 260
eval.problem.runs = 10, 262
eval.problem.silent, 262
evalthreads, 70, 232, 234
evaluate(), 43
evaluate(…), 45, 67, 99, 140, 215–217, 228, 229, 232, 235
evaluated, 68, 183, 187, 257 evaluatedState, 48
evaluations, 97, 105, 261
Evaluator, 5, 6, 8, 85 EvolutionState, 5, 39 EvolutionState.numEvaluations, 97 EvolutionState.numGenerations, 97 EvolutionState.R FAILURE, 216 EvolutionState.R NOTDONE, 95 EvolutionState.R SUCCESS, 216 EvolutionState.UNDEFINED, 49 Evolve, 5
exch.select.size, 223
Exchanger, 5
Execute(…), 198
executeProgram(…), 198 exitIfErrors(), 28
expand(…), 262
expectedChildren(), 141, 159, 168, 171
f0, 135
false, 66, 70, 72, 81, 125, 251, 258, 260
file, 257
finalStatistics, 251
finalStatistics(…), 91
Finisher, 5
finishEvaluating(…), 67, 215–217 finishProducing(…), 72, 77
Fitness, 5, 8, 62
fitness(), 61, 139, 252
Fitness.clone(), 230
Fitness.cloneTrials(), 230 Fitness.contextIsBetterThan(Fitness other), 236 Fitness.merge(…), 236
Fitness.setContext(…), 229 fitness.setToMeanOf(…), 264
fitnessToString(), 62
fixed, 264
float, 111, 193, 259
float, double, 113, 120
float.* float.+ float.% float.- float.dup float.swap float.pop, 196
float.+, 193
float.erc, 196
FloatVectorIndividual, 123 FloatVectorSpecies, 123
Foo, 25
foo, 15
foo.threadnumber.out, 43
front.stat, 251
functionset, 134
ga.params, 101
gauss, 120, 258
ge.params, 189, 191
Gene, 123, 126
generate-max, 79, 81
Generational, 91
generations, 261
GeneVectorIndividual, 111, 123, 124, 126 GeneVectorSpecies, 123
genome, 116 genotypeToStringForHumans(), 60 getFile(…), 21 getIndexRandomNeighbor, 238, 239 getIndexRandomNeighbor(…), 239 getInterpreter(…), 197 getIslandIndex(…), 225 getProgram(…), 197 getResource(…), 21
gp.fs, 135
gp.fs.2.info = ec.gp.GPFuncInfo, 24 gp.tc, 135
gp.type, 135
GPNodeParent, 145
GPTree.NO TREENUM, 162 GPType, 158
Group, 53
growp, 146
hashCode(), 59 hits, 139
id, 225
if-food-ahead, 132 ind-competes, 239 index.html, 12 Individual, 5, 6, 8 Individual.merge, 237
Individual.merge(…), 236 Individuals, 5 individuals, 69
inds, 72
inds[start] … inds[start+n-1], 73, 78
inds[start], inds[start+1], …, 72
init, 134
initialize(…), 42
initialize(parameters, randomSeedOffset, output), 42 Initializer, 5
initialPopulation(…), 63 input, 140
int, 111
int.erc, 196
int[0], 263
integer-reset, 260
IntegerVectorIndividual, 55
IntegerVectorSpecies, 55
intermediate, 115
InterPopulationExchange, 226
Interpreter, 197
interrupt(), 40
isIdealFitness(), 61, 139
island, 225
IslandExchange.ISLAND INDEX LOOKUP FAILED,
225 iterator(GPNode.NODESEARCH ALL), 160
jar, 25
java.io.Serializable, 33, 51, 229 java.lang.Cloneable, 51, 52 java.lang.Double, 62, 227, 236, 237 java.lang.Random, 10 java.lang.Runnable, 39 java.util.logging, 10 java.util.Properties, 10, 47 java.util.Random, 34–36, 44 job-size, 211, 215
job.3.out.stat, 40
job.jobnumber., 40
jobs.5.out.stat.gz, 84
jobs.5.out.stat, 84
jobs.n., 84
koza.params, 155 KozaFitness, 62
LICENSE, 11
likelihood, 78, 115
line, 115
loadDomain(…), 263 long, 111 LongVectorIndividual, 55
main, 92
main(…), 43
Makefile, 11
map(…), 263 Math.random(…), 44
max, 72, 78 max-jobs-per-slave, 211, 215 maxGene(), 112 maxGene(…), 263 MAXIMUM PASSES, 187 maxsize, 149, 150
mean, 218
median, 218
merge(…), 236
MersenneTwister, 8 MersenneTwisterFast, 8
min, 72, 78
minChildProduction(), 77 minGene(), 112
minGene(…), 263 modifyParameters(…), 263, 264 modulus, 143
moosuite.params, 254, 255
Mul, 157
MultiBreedingPipeline, 6 MultiobjectiveFitness, 63 MultiPopCoevolutionaryEvaluator, 5 must-clone, 79
mutate(…), 112, 202, 205, 206 mutateERC(), 153 mutateERC(…), 168 mutation-prob, 103 my.machine.ip.address, 261 MyData, 158
MyProblem, 158 myproblem.grammar, 192 MyTestProblem, 257
n, 78
name, 134, 170, 171
name(), 167, 170, 171, 186
nc0, 136, 169
nc1, 136
nc2, 136
neighborhood-size, 239
new, 5, 10 newFitness.setContext(oldFitness.getContext(context)),
235 newIndividual(…), 64
newIndividual(EvolutionState, int), 205 newpop.subpops[subpopulation].species, 72 nextInt(), 36, 43
nil, 135, 136
NO LOGS, 27 nodeEquals(…), 159 nodeEquivalentTo(…), 159 nodeHashCode(), 167
ns, 150, 151
ns.0, 150, 151
ns.1, 150, 151
null, 21, 42, 225, 227, 235, 247
num-buckets, 182
num-inds, 80, 81
num-tests, 262 numEvaluations, 49 numGenerations, 49 numRules ≤ rules.length, 200 numRulesForReset(…), 205 numSources(), 77
one-point, 115 one-point-nonempty, 115 org.spiderland.Psh.Instruction, 198 other, 237 other.fitness.merge(fitness), 237 other.writeIndividual(…), 237 out-of-bounds-retries, 120, 121 out.stat, 40, 84
Output, 8, 47
p, 95
p add, 206
p del, 206
p randorder, 206
ParameterDatabase, 5, 8
parent, 133
parent.0 = ../../gp/koza/koza.params, 24 passes, 187
pick-worst, 75, 239
pickNode(…), 148, 149
pieces, 126
points, 201
points.length = 0, 201
points[0], 201
points[i+1], 201
points[i], 201
points[points.length − 1], 201 polynomial, 120, 258
pop, 18
pop.default-subpop, 69, 224, 234 pop.subpop.0, 21
pop.subpop.0.size, 20, 261 pop.subpop.0.species, 19, 190 pop.subpop.0.species…n, 260
pop.subpop.0.species.alternative-polynomial- version, 258
pop.subpop.0.species.gp-species, 190 pop.subpop.0.species.mutation-distribution-index,
258
pop.subpop.0.species.mutation-prob, 258 pop.subpop.0.species.mutation-stdev, 258 pop.subpop.0.species.mutation-type, 258 pop.subpop.0.species.out-of-bounds-retries = 100,
260 pop.subpop.0.species.pipe.source.1.source.1.scaled-
fitness-floor, 83 pop.subpop.1, 54
pop.subpop.1.species, 19 pop.subpop.2, 54
pop.subpops, 18
populate(…), 64
Population, 5 possiblyRestoreFromCheckpoint(…), 42 postCheckpointStatistics(…), 34 postprocessIndividual(…), 206, 207 postprocessPopulation, 232, 234 postprocessPopulation(…), 229, 233 postProcessRules(…), 206 postprocessRules(…), 206, 207 preBreedingExchangePopulation(…), 84 preCheckpointStatistics(…), 33 preparePipeline(…), 77 prepareToEvaluate(…), 67, 215–217 prepareToProduce(…), 71, 72, 77 preprocessIndividual(…), 205–207 preprocessPopulation(…), 229 preprocessRules(…), 205–207
Print, 199
print-unaccessed-params, 23 printFitness(…), 62 printFitnessForHumans(…), 62, 235 printIndividual() or writeIndividual(), 242 printIndividual(…, 60
printIndividual(…), 60 printIndividualForHumans, 57 printIndividualForHumans(), 242 printIndividualForHumans(…), 60, 85, 188 printPopulation(…), 58, 65 printPopulationForHumans(…), 57, 58 printRule(…), 205
printRuleSet(…), 203 printRuleSetToString(…), 203 printRuleToString(), 204, 205 printRuleToString(…), 205 printRuleToStringForHumans(), 204 printSubpopulationForHumans(…), 57
printTree(…), 163 probabilityOfSelection, 134 Problem, 6, 8, 43, 46 problem, 67
process, 225
process(…), 225, 226 produce(), 79
produce(…), 73, 77, 78 produces, 72
produces(…), 77
Program, 197 push.params, 196
random, 35, 243, 264
random-each-time, 243
random-walk, 119 random-walk-probability, 119 randomizeRulesOrder(…), 206 readFitness(…, DataInput), 62 readFitness(…, LineNumberReader), 62 readIndividual(…), 57, 60, 237 readIndividual(…, DataInput), 60 readIndividual(…, LineNumberReader), 60 README, 11
readNode(…), 161, 168
readPopulation(…, DataInput), 58 readPopulation(…, LineNumberReader), 58, 64 readSubpopulation(…, LineNumberReader), 64 readTree(…, DataInput), 163
readTree(…, LineNumberReader), 163 receiveAdditionalData(…), 217, 218
reevaluate, 257
reinitializeContacts(…), 34 removeRandomRule(…), 206
requestedSize, 145
reset, 119, 120, 258
reset(), 148, 188
reset(…), 128, 201, 202, 204–206 resetFromCheckpoint(…), 34 resetInterpreter(…), 197
resetNode(…), 168
restart(…), 34
restoreFromCheckpoint(…), 34
result, 91, 216
rules, 200
rules[0] … rules[numRules − 1], 200
run(…), 34, 41
runs, 258
same, 76, 149 sbx, 116 seed, 20
select.boltzman, 74
select.greedy, 75
select.multiselect, 76 select.sigma-scaling, 74 select.tournament, 75, 239 select.tournament.size, 223 SelectionMethod, 6 SelectionMethod.INDS PRODUCED, 72 sendAdditionalData(…), 217
set, 145
set-random, 257, 258
setCheckpoint(…), 33
sets, 201
sets[0], 201
sets[1], 201
sets[i], 201
sets[points.length], 201
setToMeanOf(…), 62, 63, 255
setup(), 44
setup(…), 10, 18, 40, 45, 51, 59, 61, 67, 71, 76–78, 84,
128, 140, 198, 204, 263 setupPopulation(…), 63
short, 111
ShortVectorIndividual, 55
silent, 90
sim.util.distribution, 37
simple.fitness, 52
SimpleEvaluator, 5 SimpleEvolutionState, 4, 98 SimpleFitness, 62
SimpleStatistics, 48
Sin, 157
sin, 186
size, 18, 19, 21, 224, 239, 240
size(), 59, 181
sources.length, 76
Species, 5, 6
splitIntoTwo(…), 207
start, 11
startFresh(), 40, 45
startFresh(…), 40 startFromCheckpoint(), 45 startFromCheckpoint(…), 34
stat.file, 258
stat.silent, 260, 262
state, 41, 54, 133
state.evalthreads, 43
state.evolve(), 92, 95
state.finish(…), 92
state.parameters, 51
state.population, 66, 95 state.population.subpops[subpop], 246
state.population.subpops[subpop].individuals[index], 246
state.population[subpopulation].f prototype, 72 state.R NOTDONE, 92
state.random[thread], 44
state.run(…), 92
state.startFresh(), 92 StatenIsland, 220 Statistics, 5
stderr, 47
stdout, 47
SteadyStateEvaluator, 5 SteadyStateEvolutionState, 104, 214 String[…], 263
Sub, 157
subpop, 225
subpops, 18
Subpopulation, 5, 6
subpopulation, 71, 72
super(), 170
super(…), 91, 158 super.newIndividual(…), 205 super.setup(…), 51
T ERROR, 30
tc0, 135
tcsh, 41
test(…), 160
thread, 44
ThreadLocal, 39
time, 220
toroidal, 243
toss, 149
toss=true, 149
toString, 159
toString(), 25, 60, 62, 129, 141, 161, 168, 204, 205 toString()., 159
toStringForHumans(), 161, 168 transferAdditionalData(…), 217 trees[], 162
treetype, 134
trials, 227, 229, 230, 236
true, 27, 66, 70, 72, 249, 257, 258, 260 two-point, 115, 126 two-point-nonempty, 115
type, 145, 239 typicalIndsProduced(), 72, 77
UNDEFINED, 49 uniform, 115, 239 updateFitness, 229, 232
val, 178
writeFitness(…), 62 writeIndividual(…), 60 writeNode(…), 168 writePopulation(…), 58 writeState(…), 37 writeTree(…), 163
X, 157 Y, 157
z.params, 16 zeroChildren[], 134