
gpbib.cs.ucl.ac.uk/gecco2005/docs/p1417.pdf

Adaptive Isolation Model using Data Clustering for Multimodal Function Optimization

Shin Ando
Division of Intelligent Systems Engineering, Faculty of Engineering, Yokohama National University
79-5 Tokiwadai, Hodogaya, Yokohama, Kanagawa, Japan
[email protected]

Jun Sakuma
Interdisciplinary School of Science and Engineering, Tokyo Institute of Technology
4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa, Japan
[email protected]

Shigenobu Kobayashi
Interdisciplinary School of Science and Engineering, Tokyo Institute of Technology
4259 Nagatsutacho, Midori-ku, Yokohama, Kanagawa, Japan
[email protected]

ABSTRACT
In this paper, we propose a GA model for multimodal optimization called the Adaptive Isolation Model (AIM). It uses a data clustering algorithm to detect clusters in the GA population, which identify the attractors in the fitness landscape. The subpopulations that make up the clusters are then isolated and optimized independently, while the regions of the isolated subpopulations in the original landscape are suppressed. The isolation increases comprehensiveness, i.e., the probability of finding weaker attractors, as well as the overall efficiency of the multimodal search. An advantage of AIM is that it does not require the distance between optima as a presumed parameter, since it is estimated from the variance/covariance matrix of the subpopulation.

Further, AIM's behavior and efficiency are equivalent to those of a basic GA in a unimodal landscape, in terms of the number of evaluations. It can therefore be applied recursively to all subpopulations until each converges to a suboptimum. This makes AIM suitable for locally multimodal landscapes, which have closely located attractors that are difficult to distinguish in the initial run.

The performance of AIM is evaluated on several benchmark problems and compared to iterated hill-climbing methods.

Categories and Subject Descriptors
Computing Methodologies [Artificial Intelligence]: Problem Solving, Control Methods, and Search: Heuristic Methods

General Terms
Algorithms

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
GECCO'05, June 25-29, 2005, Washington, DC, USA.
Copyright 2005 ACM 1-59593-010-8/05/0006 ...$5.00.

Keywords
Genetic Algorithm, Numerical Optimization, Multimodal Optimization, Clustering

1. INTRODUCTION
Multimodal optimization algorithms address many practical needs in real-world problems. In many instances, preparing multiple solutions, preferably with varying characteristics, is the better general strategy. The objective of multimodal optimization is to search multiple optima/suboptima comprehensively, or to a certain extent. In nature, the process of evolution has produced great diversity in populations and their functions. In the field of evolutionary computation, many researchers have attempted to reproduce and utilize such mechanisms in the framework of evolutionary algorithms, which have been successfully applied to many benchmark problems.

In this paper, we describe a GA framework for multimodal optimization called the Adaptive Isolation Model (AIM), which uses a clustering algorithm to identify the regions of different attractors and then searches each attractor independently.

With proper settings, it is reasonable to assume that the GA population will form a 'cluster' on each attracting region of the landscape, with higher density near the center and sparser density at the margins. Such clusters can be detected with a suitable clustering algorithm.

In AIM, the subpopulations in the detected clusters are isolated and optimized in a later process. Meanwhile, the fitness function of the original GA is modified to suppress the regions of the isolated subpopulations, so that the effect of each isolated attractor is also removed from the remaining population.

In multimodal optimization, searching each attractor independently of the rest is beneficial in terms of both efficiency and comprehensiveness. Efficiency improves because crossovers only involve parents from the same attractor, thereby reducing the number of offspring sampled outside the attractors. With regard to comprehensiveness, the probability of finding suboptima in weaker attractors increases when strong attractors are isolated. This is due to the combination of excluding parents in strong attractors from crossover and preventing children in strong attractors from replacing parents in weak attractors.


Many multimodal optimization algorithms have attempted to avoid repeatedly finding or converging to the same optima by applying a penalty function to a region within a predefined range of known optima. However, this approach introduces the range of the penalty function as a critical problem-dependent parameter, which makes such algorithms impractical for unknown problems. One advantage of AIM is that it does not require such a distance/range parameter, since the range is estimated from that of the subpopulation.

In AIM, isolation occurs when more than one cluster is detected within the population. This implementation allows AIM to behave as a unimodal GA in a unimodal landscape. In contrast, many multimodal optimization algorithms are two-phase algorithms, in which a fast-converging algorithm follows the multimodal search algorithm. This is due to the inefficiency of multimodal GAs in unimodal landscapes compared to standard GAs.

Meanwhile, many studies have reported difficulties in locally multimodal landscapes, where 'clusters' or groups of suboptima are located close together in a relatively small region. In such landscapes, it is difficult to distinguish the suboptima within a cluster. [8] proposed reapplying the multimodal optimizer to the subpopulation to distinguish such suboptima. In practice, therefore, users cannot switch to a 'convergence mode' without somehow confirming the unimodality of the local landscape.

With regard to the above, another advantage of AIM comes from its efficiency in unimodal landscapes: AIM can be applied recursively to its isolated subpopulations without analyzing whether the local landscape is unimodal or multimodal.

In Sec. 2, we give an overview of GAs for multimodal optimization. In Sec. 3, we describe the implementation of AIM. In Sec. 4, we evaluate AIM's performance on several benchmark problems. Secs. 5 and 6 give a discussion and our conclusions.

2. RELATED WORK
In a multimodal landscape, there are multiple optima/suboptima and their attractors, i.e., the convex region surrounding each. When a GA is applied to such a landscape, after a certain number of generations its population will be distributed among the different attractors. In Fig. 1, a GA population is distributed among four attractors in the fitness landscape.

Figure 1: A Snapshot of GA Population on Multimodal Landscape

The attractors can have different characteristics, which can largely affect the GA's behavior [4, 5]. Ikeda et al. [4] showed a type of deception caused by the presence of heterogeneous attractors. Such landscapes are known to exist in job-shop scheduling [4] and the Fletcher-Powell function [14]. They identified three properties of strong attractors, which can attract a larger share of the population: a) size, b) better fitness in initial sampling, and c) search efficiency. Ikeda et al. used the term 'UV-structure' for a landscape in which suboptima have significantly stronger attractors than the optima, and claimed that such landscapes cause deception.

They proposed the Innately Split Model (ISM) [4], in which multiple GA populations are initialized on small subdivisions of the landscape. The division helps remove the effect of strong attractors from the others and 'localize' the search process.

An implication of [4] is that strong attractors are detrimental to search in the rest of the landscape. Assuming random selection of parents, strong attractors are more likely to supply parents for crossover since they contain larger numbers of individuals, thereby slowing the search in other attractors. Further, with slower search speed, the individuals in weaker attractors are more likely to be replaced by offspring from stronger attractors.

To that end, it is beneficial to 'localize' the a) selection and b) reproduction procedures, to prevent a strong attractor from taking individuals away from other attractors. In addition, localizing reproduction, i.e., crossing over parents from the same attractor, reduces the number of offspring generated outside the attractors, which improves the efficiency of the search. [14] referred to a) as survival selection and b) as reproductive selection.

Conventional approaches to multimodal optimization can be classified into two categories according to the type of localization used. The first group of methods, including Crowding [2] and its variations Deterministic [9] and Probabilistic Crowding [10], as well as the Species Conserving GA [8], modifies the selection procedure. These methods try to apply selection to parent-child pairs that are neighbors, or of the same 'species', in terms of Euclidean distance, so that parents are replaced by individuals from the same attractor, thereby localizing the selection process.

The second group of methods uses multi-populational implementations, e.g., the Multi-National GA [15], ISM [4], and CBN [13]. These approaches intend to assign one population to each attractor. In effect, parents from the same attractor are crossed over, thus localizing the reproductive selection. ANS [14], which uses a nearest-neighbor approach to select the parents for the genetic operator, can also be placed in this latter group.

AIM also belongs to the latter group, since it isolates a subpopulation on each attractor. However, unlike ISM, which divides the landscape at predefined thresholds, AIM detects the attractors heuristically using the GA and a clustering algorithm.

3. ADAPTIVE ISOLATION MODEL
In a unimodal landscape, the GA population converges to a single optimum after iterations of suitable genetic operators and selection. The population forms a cluster over the attractor of the optimum, densely populated near the center and sparse at the margin. In a multimodal landscape, the population is distributed among several attractors, forming a cluster on each. In both cases, we can assume that the clusters represent the local convex regions around the optima. The main procedure of AIM is to detect such


clusters and isolate them into independent subpopulations. The detection of clusters is done by repeatedly applying a clustering algorithm to the GA population. The isolation consists of a) recording the subpopulation that composes the cluster and removing it from the original population, and b) modifying the fitness function to suppress the region of the attractor. The isolated subpopulation is optimized independently. The range of suppression is calculated from the variance/covariance of the cluster.

CBN [13] also uses a density-based clustering algorithm to split and merge multiple populations, but it does not isolate subpopulations.

The following is a detailed description of the AIM procedure.

1. Apply the GA step, i.e., the crossover operator (UNDX) and selection (MGG), k times.

2. Run the clustering algorithm on the GA population.

3. If the clustering algorithm detects n_c (≥ 2) components, isolate the subpopulations that belong to components C_1, ..., C_{n_c−1}. Then supplement individuals to maintain the size of the original population.

4. Adjust the fitness function to suppress the regions of the isolated components C_i. The clusters are identified as Gaussian components as described in Section 3.1, and the region within 2σ of each mean is suppressed. Individuals within that region are given the fitness value f_worst, the value of the worst individual in the run.

5. Repeat steps 1 through 4 until a termination criterion is met. AIM is terminated after: a) the fitness value reaches the target value, b) a certain number of evaluations has been performed, or c) a certain proportion of the population is trapped in suppressed regions.

6. After termination, if isolated subpopulations exist, AIM is applied to each. In other words, AIM is applied recursively to the subpopulations until each subpopulation is distributed over only one attractor.
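The numbered steps above can be sketched as a control loop. The following is a minimal one-dimensional sketch, not the authors' implementation: `fitness`, `cluster`, and `optimize` are hypothetical callables supplied by the user (the paper uses UNDX+MGG and LMAP-EM), and the 2σ suppression of step 4 is approximated by rejecting points within an interval around each isolated cluster's mean.

```python
import random

def aim(pop, fitness, cluster, optimize, max_iter=50):
    """Sketch of the AIM loop (steps 1-5); returns the remaining
    population and the isolated subpopulations of step 3."""
    size = len(pop)
    isolated = []       # isolated subpopulations (step 3)
    suppressed = []     # (mean, 2*sigma) regions removed from the landscape (step 4)

    def effective_fitness(x):
        # individuals inside a suppressed region get the worst possible value
        if any(abs(x - m) <= r for m, r in suppressed):
            return float("-inf")
        return fitness(x)

    for _ in range(max_iter):                    # step 5: iterate until termination
        pop = optimize(pop, effective_fitness)   # step 1: k GA steps
        comps = cluster(pop)                     # step 2: detect clusters
        if len(comps) >= 2:                      # step 3: isolate all but one component
            for comp in comps[:-1]:
                isolated.append(comp)
                mean = sum(comp) / len(comp)
                sigma = (sum((x - mean) ** 2 for x in comp) / len(comp)) ** 0.5
                suppressed.append((mean, 2 * sigma))   # step 4: suppress the 2-sigma region
            pop = list(comps[-1])
            pop += [random.uniform(-6.0, 6.0) for _ in range(size - len(pop))]

    # step 6 (recursion into each isolated subpopulation) is left to the caller
    return pop, isolated
```

In the paper, step 6 would call `aim` again on each isolated subpopulation; in this sketch the isolated clusters are simply returned for inspection.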

Fig. 3 and Fig. 4 show the flow chart and the pseudocode of the AIM algorithm.

In a unimodal landscape, the clustering algorithm detects only one cluster as the AIM population converges to a single optimum; AIM therefore behaves as UNDX+MGG in a unimodal landscape. We will show in Section 4 that AIM has a convergence speed equivalent to UNDX+MGG in unimodal landscapes, so the computational cost of applying AIM to subpopulations is not prohibitive.

Fig. 2 shows snapshots of the population when AIM is applied to the Rastrigin function. In each snapshot, members of the population detected as clusters are represented by '∆', while the others are represented by '+'. The regions of the Gaussian components, denoted by circles, are suppressed in the original fitness landscape. In this example, the periodically placed optima induce the clustering algorithm to group individuals on several attractors together.

3.1 Clustering Algorithm
There are several requirements for clustering algorithms used in AIM.

[Flow chart: Optimize Population → (every 100 replacements) Clustering → #Components? (≤ δ: continue optimizing; > δ: Isolate Subpopulation → Adjust Fitness Function → Optimize Subpopulation)]

Figure 3: Flow chart of AIM

function AIM(pop):
    gen = 0
    while gen < maxgen and pop ≠ ∅ do
        for count = 1 to 100 do
            pop = mggstep(pop)
        done
        (k, subpop_1, ..., subpop_k) = CLS(pop)
        if k > 1 then
            pop = pop − (subpop_1 + ... + subpop_{k−1})
            obj = obj × restrict(G_1) × ... × restrict(G_{k−1})
        gen = gen + 1
    done
    if isolated subpopulations overlap, merge them
    for each isolated subpop_i: AIM(subpop_i)
    return

function CLS(pop = {x_1, ..., x_m}):
    return the number of components k, the subpopulations subpop_1, ..., subpop_k,
    and their Gaussian components G_1, ..., G_k

function restrict(x, G):
    d = mahalanobis(G, x)
    if d < threshold then return infeasible   // within the suppressed 2σ region of G
    else return feasible

Figure 4: Pseudo code of AIM


Figure 2: Detection and Isolation in AIM. The example of AIM's behavior on a two-dimensional, rotated, and scaled Rastrigin function is shown. In the left figure (a), two components c0 and c1 are found and c0 is isolated. The individuals indicated by ∆ belong to c0, and + indicates the rest of the population. The circle indicates the 2σ threshold line; the fitness value of the region within the circle will be suppressed. After several GA steps (b), another component c2 is found and c1 is isolated. After more iterations (c), components c0-c6, indicated by circles, are isolated.

1. It does not require the number of clusters as a parameter.

2. It is able to identify clusters in noisy data.

3. It is computationally stable.

4. It is scale-invariant (preferably).

Scale-invariance ensures that the clustering result is not affected by linear transformations of the data/domain, which is an important feature for algorithms applied to real-world problems. In many real-world problems, variables may have different units of measurement and may differ in magnitude by orders of thousands. This is referred to as an ill-scaled domain [5]. Additionally, strong correlation among variables also induces the ill-scaled property in attractors. Ill-scaled domains can be problematic for real-coded GAs [5] and for optimization algorithms in general.

We used a clustering algorithm called Localized MAP-EM (LMAP-EM) [6], which is an EM algorithm based on MAP learning. While LMAP-EM is model-free, we used Gaussian distributions to model components for mathematical convenience.

The data is modeled as a mixture of normal distributions as follows:

P_j(\vec{x} \mid \vec{\mu}_j) = \frac{1}{m} \exp\left( -\frac{(\vec{x} - \vec{\mu}_j)^T (\vec{x} - \vec{\mu}_j)}{2\sigma^2} \right) \quad (1)

P(\vec{x} \mid \vec{\mu}) = \sum_{j=1}^{m} P_j(\vec{x} \mid \vec{\mu}_j) \quad (2)

where m is the number of components and \Theta_j = (\vec{\mu}_j, \sigma) represents the parameters of each component.

Assume the latent variables

Z = \{ z_{ij} \mid i = 1, \dots, N,\; j = 1, \dots, m \} \quad (3)

where N is the data size, z_{ij} = 1 if data \vec{x}_i is generated by the j-th component P_j(\vec{x} \mid \Theta_j), and z_{ij} = 0 otherwise.

In LMAP-EM, hard assignment is assumed, i.e., each data point is assigned to exactly one component. More formally, the following equalities are assumed for p(z_{ij} \mid x_i), the probability that data x_i was generated by the j-th component:

p(z_{ij} \mid x_i) \in \{0, 1\}, \qquad \sum_{j=1}^{m} p(z_{ij} \mid x_i) = 1 \quad (4)

Under this assumption, the log likelihood of the model is expressed as the sum of the likelihoods of the components:

L(\Theta) = \sum_{j=1}^{m} \left\{ \log P_j\left(\Theta_j \mid pa(\Theta_j)\right) + \sum_{i=1}^{N} z_{ij} \log P_j\left(\vec{x}_i \mid \Theta_j\right) \right\} \quad (5)

where pa(\Theta_j) is the prior distribution of the parameters.

In the E-step of LMAP-EM, new latent variables Z' are generated which increase the localized likelihood (6):

\sum_{i=1}^{N} z'_{ij} \log P_j(X, Z \mid \Theta_j) \;\ge\; \sum_{i=1}^{N} z_{ij} \log P_j(X, Z \mid \Theta_j) \quad (6)

In the M-step, the component parameter is updated with equation (7). This is a simplified version of the M-step; in [6], the localized likelihood is increased by the steepest-descent method.

\vec{\mu} = \sum_{i=1}^{N} z_{ij} \vec{x}_i \Big/ \sum_{i=1}^{N} z_{ij} \quad (7)

After the localized likelihood has converged, we remove the extracted subset and apply the same procedure to the remaining data. This is repeated until all data are removed.

Since each component is estimated locally and sequentially, the number of components is not required by the algorithm. Furthermore, the use of Gaussian components with full parameters ensures that the clustering result is consistent under linear transformations of the domain. The algorithm is computationally stable, as it does not require matrix inversion, thereby meeting requirements 1, 3, and 4 above.
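The sequential extract-and-remove idea can be sketched as follows. This is not the authors' LMAP-EM; it is a simplified one-dimensional hard-assignment EM (cf. Eqs. (4)-(7)) with spherical components, and the names `extract_component`, `sequential_clustering`, and `min_size` are hypothetical.

```python
import math

def extract_component(data, seed_idx, init_radius=1.0, iters=20):
    """One localized hard-EM pass: fit a single 1-D Gaussian around a seed
    point, hard-assigning nearby data to it (E-step), then re-estimating the
    mean as in Eq. (7) (M-step)."""
    mu = data[seed_idx]
    sigma = init_radius
    for _ in range(iters):
        # E-step (hard assignment): points within 2*sigma belong to the component
        member = [x for x in data if abs(x - mu) <= 2 * sigma]
        if len(member) < 2:
            break
        # M-step, Eq. (7): the mean is the average of the assigned points
        mu = sum(member) / len(member)
        sigma = max(1e-6, math.sqrt(sum((x - mu) ** 2 for x in member) / len(member)))
    member = [x for x in data if abs(x - mu) <= 2 * sigma]
    rest = [x for x in data if abs(x - mu) > 2 * sigma]
    return (mu, sigma, member), rest

def sequential_clustering(data, min_size=3):
    """Repeatedly extract one component and remove its members until all
    data are consumed -- so the number of clusters is never required
    up front, matching requirement 1 above."""
    comps, rest = [], list(data)
    while len(rest) >= min_size:
        comp, remaining = extract_component(rest, 0)
        if len(comp[2]) < min_size or len(remaining) == len(rest):
            break  # nothing more to extract; leave the rest as noise
        comps.append(comp)
        rest = remaining
    return comps, rest
```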

3.2 Customizing LMAP-EM
The following describes the implementation of LMAP-EM customized for use with AIM.

With LMAP-EM, we obtain a mixture model of one uniform distribution and several Gaussian distributions by labeling


each data cluster as a Gaussian distribution or as part of the uniform distribution. We determine the type of distribution for each cluster by comparing Akaike's Information Criterion (AIC) of two candidate mixture models. AIC is a statistical index for selecting the probabilistic model with better predictive power, calculated as follows:

AIC = −2 × (log likelihood of the estimated model) + 2 × (number of estimated parameters)

We prepare one model that includes the data as a new Gaussian component, and another model that includes the data in the uniform distribution. We select the mixture model with the smaller AIC.
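The AIC comparison can be sketched for the one-dimensional case as follows. The helper names are hypothetical, and the uniform model is assumed to have fixed, problem-given bounds (contributing no estimated parameters), which is a simplifying assumption of this sketch.

```python
import math

def aic_gaussian(points):
    """AIC of a 1-D Gaussian fit: two estimated parameters (mean, sigma)."""
    n = len(points)
    mu = sum(points) / n
    var = max(1e-12, sum((x - mu) ** 2 for x in points) / n)
    loglik = sum(-0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
                 for x in points)
    return -2 * loglik + 2 * 2

def aic_uniform(points, lo, hi):
    """AIC of a uniform distribution over the known domain [lo, hi]:
    zero estimated parameters under this sketch's assumption."""
    loglik = len(points) * -math.log(hi - lo)
    return -2 * loglik + 2 * 0

def is_gaussian_cluster(points, lo, hi):
    """Accept the cluster as a Gaussian component iff it yields the smaller AIC."""
    return aic_gaussian(points) < aic_uniform(points, lo, hi)
```

A tight cluster is accepted as a Gaussian component, while points spread evenly over the domain are better explained by the uniform model.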

The initial component size L_z is a required parameter of this algorithm. This parameter is determined statistically from the number of data points needed to sufficiently distinguish a uniform distribution from a normal distribution.

3.3 UNDX and MGG
In this section, we describe the genetic operators and replacement scheme used in AIM. Unimodal Normal Distribution Crossover (UNDX) [11, 3] produces offspring from a normally distributed probability function defined by the parents. A variety of real-coded crossover operators have been derived from UNDX and successfully applied to many benchmark problems [11, 7, 3].

UNDX produces offspring based on the following Gaussian distribution (8):

\vec{y} = \vec{g} + \sum_{i=1}^{\mu-1} w_i \vec{d}^{(i)} + \sum_{i=\mu}^{n} v_i D \vec{e}^{(i)} \quad (8)

where

\vec{g} = \frac{1}{\mu - 1} \sum_{i=1}^{\mu-1} \vec{x}^{(i)}

is the center of mass of the parents, and

\vec{d}^{(i)} = \vec{x}^{(i)} - \vec{g}, \quad i = 1, \dots, \mu-1

\vec{e}^{(i)} = \vec{d}^{(i)} / |\vec{d}^{(i)}|

D = \left| \vec{x}^{(\mu)} - \vec{g} - \sum_{i=1}^{\mu-1} \kappa\, \vec{e}^{(i)} \right|

The normally distributed variables are

w_i \sim N(0, \sigma_\zeta^2), \qquad v_i \sim N(0, \sigma_\eta^2)

whose parameters \sigma_\zeta and \sigma_\eta are suggested in [11] as

\sigma_\eta = 0.35 / \sqrt{n - \mu - 2}, \qquad \sigma_\zeta = 1 / \sqrt{\mu - 2}

The Minimal Generation Gap (MGG) [12] is a replacement scheme used to maintain diversity in the GA population. In MGG, a large number of offspring is generated from one crossover operation. Then two individuals, one by elitism and one by roulette wheel, are selected from all the offspring and parents and returned to the population. The following is the procedure for one cycle of MGG.

1. From population P, select m parents at random.

2. Generate λ offspring by iterating the crossover of the same parents.

3. Select one elitist and one roulette-selected individual from the parents and offspring, and return them to the population.
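One MGG cycle can be sketched as follows. The crossover is passed in as a callable (UNDX in the paper); the exact roulette wheel is not specified in the text, so a rank-based wheel is assumed here, and only two of the m parent slots receive the survivors, which is a simplification of this sketch.

```python
import random

def mgg_step(pop, fitness, crossover, m=2, lam=10):
    """One MGG cycle: pick m parents at random, generate lam offspring
    from them, then return two survivors (elite + roulette) to the
    population."""
    idx = random.sample(range(len(pop)), m)            # 1. m random parents
    parents = [pop[i] for i in idx]
    family = parents + [crossover(parents) for _ in range(lam)]  # 2. lam offspring
    family.sort(key=fitness, reverse=True)
    elite = family[0]                                  # 3. elitist survivor
    # rank-based roulette over the remaining family members (an assumption;
    # it avoids the negative-fitness issues of a raw fitness-proportional wheel)
    weights = [len(family) - rank for rank in range(1, len(family))]
    other = random.choices(family[1:], weights=weights, k=1)[0]
    new_pop = list(pop)
    new_pop[idx[0]], new_pop[idx[1]] = elite, other    # replace two parent slots
    return new_pop
```

Because the elite is chosen from parents and offspring together, the best fitness in the population never decreases across cycles.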

4. EXPERIMENT
In this section, the performance of the proposed method is tested on unimodal and multimodal benchmarks and an artificial deceptive function.

4.1 Standard Benchmark Functions
First, we studied the proposed method's behavior on three standard benchmark functions: the Sphere function F1 (9); Rosenbrock's function F2 (10), which has a nonlinear dependency between variables; and Rastrigin's function F3 (11). F3 has a big-valley structure and 4^n local optima within the given range.

F_1(\vec{x}) = \sum_{i=1}^{n} x_i^2 \qquad [-5.12 \le x_i \le 5.12] \quad (9)

F_2(\vec{x}) = \sum_{i=2}^{n} \left\{ 100\left(x_1 - x_i^2\right)^2 + (x_i - 1)^2 \right\} \qquad [-10 \le x_i \le 10] \quad (10)

F_3(\vec{x}) = 10n + \sum_{i=1}^{n} \left( x_i^2 - 10\cos(2\pi x_i) \right) \qquad [-1.5 \le x_i \le 2.5] \quad (11)

The following parameters were used in the experiment: dimension D = 5, population size M = 400, number of parents m = 6, offspring size λ = 20, and number of iterations 20. For AIM, the minimum component size was L_Z = 35, and the AIM interval was k = L_Z. Each run was terminated when the best fitness value reached 1.0 × 10^{-10}.

Result
For F1 and F2, AIM detected one component throughout all runs, so its behavior and efficiency were equivalent to those of basic UNDX+MGG in terms of evaluations. The convergence curves of AIM and UNDX+MGG are shown in Figures 5 and 6. The vertical and horizontal axes denote the best obtained fitness and the number of evaluations.

In the optimization of the unimodal functions F1 and F2, the AIM population quickly converges to form a very large cluster consisting of more than half of the population. Figure 7 shows the size of the largest cluster detected in the primary population while optimizing F1, F2, and F3. In the multimodal landscape F3, the primary (unisolated) population quickly forms a large cluster after all but one attractor has been isolated.

To this end, we suggest using the size of the largest cluster as a signal for AIM to switch from multimodal to unimodal search, to improve convergence speed. This is a practical modification, since it is difficult to perform multimodal search in the presence of a very large cluster.

On F3, AIM found 1024 suboptima after being applied to 2123 subpopulations and evaluating N_e = 8.69 × 10^{12} individuals on average, using UNDX+MGG with a small population (M = 100) for the unimodal search. The unimodal


Figure 5: Optimization of the Sphere function by UNDX+MGG and AIM. Best fitness vs. number of evaluations.

Figure 6: Optimization of the Rosenbrock function by UNDX+MGG and AIM. Best obtained fitness vs. number of evaluations.

Figure 7: Size of the largest cluster in the optimization of F1, F2, and F3.

AIM     IHC(σ=0.01)   IHC(σ=0.05)   IHC(σ=0.1)
1002    1022          212.2         24.2

Table 1: Number of suboptima found by AIM and IHC within 8.69 × 10^{12} evaluations.

search was terminated when the best individual was within 1/10^8 of the suboptimal value.

We compared the results on the Rastrigin function with iterated hill-climbing (IHC). IHC is a primitive method but can be more effective than a GA on some problems. An HC process is started from a random point; after one HC process converges, the next HC is started from a new random initial point, as long as the total number of evaluations is less than N_e. Here we simulate HC with a (1+1)-ES with a fixed mutation step σ_mut.

The comparison of the average number of suboptima found in 5 runs by IHC and AIM is shown in Table 1. It shows that the step size is a critical parameter for the performance of hill-climbing.
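The IHC baseline can be sketched as follows. `hill_climb` is a (1+1)-ES with a fixed mutation step σ as described above; the duplicate filter in `iterated_hill_climbing` (rounding coordinates) is a toy assumption of this sketch, not the paper's criterion for distinguishing suboptima.

```python
import random

def hill_climb(f, x0, sigma, max_evals=1000):
    """(1+1)-ES with a fixed mutation step sigma: mutate every coordinate
    with Gaussian noise and keep the child only if it is no worse
    (minimization)."""
    x, fx, evals = list(x0), f(x0), 1
    while evals < max_evals:
        y = [xi + random.gauss(0.0, sigma) for xi in x]
        fy = f(y)
        evals += 1
        if fy <= fx:
            x, fx = y, fy
    return x, fx

def iterated_hill_climbing(f, dim, bounds, sigma, restarts=10, evals_per_run=500):
    """IHC: restart hill climbing from random points within the bounds and
    collect the distinct local optima found (rounded-coordinate filter)."""
    found = {}
    for _ in range(restarts):
        x0 = [random.uniform(*bounds) for _ in range(dim)]
        x, fx = hill_climb(f, x0, sigma, evals_per_run)
        found[tuple(round(xi, 1) for xi in x)] = fx
    return found
```

As Table 1 indicates, the fixed step σ is the critical parameter: too large a step jumps over the basins of nearby suboptima, while too small a step converges slowly.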

4.2 Locally Multimodal Function
Shubert's function (12) is often used as a benchmark for multimodal optimization [8].

F_4(\vec{x}) = \prod_{i=1}^{n} \sum_{j=1}^{5} j \cos\left( (j+1) x_i + j \right) \quad (12)
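Equation (12) in code, as a sketch (the product index is taken as i = 1..n, i.e., one cosine-sum factor per dimension):

```python
import math

def shubert(x):
    """Shubert's function F4, Eq. (12): a product over dimensions of
    five-term cosine sums; highly multimodal with many closely spaced
    suboptima."""
    result = 1.0
    for xi in x:
        result *= sum(j * math.cos((j + 1) * xi + j) for j in range(1, 6))
    return result
```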

We applied AIM to the two-dimensional Shubert function. It has over 1000 symmetrically distributed suboptima within the range (−10 ≤ x_i ≤ 10). There are 3^n pairs of optima, each pair located in close proximity to 4^n suboptima. The contour of F4 is drawn with a dark line in Fig. 8.

We used AIM parameters equivalent to those of the previous experiment. AIM was terminated when 75% of the population was trapped in suppressed regions, or when one cluster included 75% of the population. In the latter case, the M = 40 best individuals of the largest cluster were used as the initial population for UNDX+MGG.

The following describes the typical behavior of AIM applied to F4. In a typical primary run of AIM, 30 clusters were detected before 75% of the population was trapped, as shown in Figure 8.

The secondary runs, i.e., applications of AIM to the isolated subpopulations, can detect clusters at higher resolution. This is due to the increase in the overall density of the GA population. The clusters detected in a typical secondary run of AIM are shown in Figure 9.

In the typical run, AIM found 1231 suboptima after 3.39 × 10^{11} evaluations. The subpopulations given unimodal search are shown in Figure 10. In five runs, AIM found 1220 suboptima in 5.58 × 10^{12} evaluations on average.

5. DISCUSSION
The behavior of IHC shows the critical effect of the distance parameter in multimodal optimization. Conventionally, Sequential Niching [1] also used a penalty function to suppress regions around the known optima. However, it used predefined radii for the penalty function, which is a problem-dependent parameter. Since no study, to our knowledge, has proposed a general method for estimating such parameters, the primary advantage of the proposed method comes from excluding this parameter from the algorithm. In our case, we use the variance/covariance of the subpopulation to estimate the range of an attractor. In doing so, we are exploiting




Figure 8: Components found in a typical primary run of AIM


Figure 9: Components found in a typical secondary run of AIM


Figure 10: Subpopulations of AIM to which unimodal search was applied

the GA’s property as a population-based search, by extracting information about the fitness landscape from the subpopulation.
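Estimating an attractor's extent from a subpopulation's variance/covariance can be sketched as follows; the Mahalanobis-distance membership test and the cutoff constant are our assumptions for illustration, not necessarily the paper's exact rule:

```python
import numpy as np

def attractor_region(subpop, threshold=9.21):
    """Build a membership test for the region spanned by a subpopulation
    (rows = individuals, columns = variables). A point counts as inside
    the suppressed region if its squared Mahalanobis distance from the
    subpopulation mean, under the sample covariance, is below
    `threshold` (a chi-squared-style cutoff; value is an assumption)."""
    mean = subpop.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(subpop, rowvar=False))

    def inside(x):
        d = np.asarray(x, dtype=float) - mean
        return float(d @ inv_cov @ d) <= threshold

    return inside
```

The key point is that the region's shape and size come from the subpopulation itself, so no distance parameter has to be supplied in advance.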

Further, since many conventional works proposed using a fast-converging algorithm after multimodal search, the timing of the switch to unimodal optimization was another crucial parameter. AIM excludes this parameter as well, since it has the same convergence speed as a basic GA in a unimodal landscape and can be applied recursively to subpopulations. However, efficiency can be improved by using the largest cluster size to signal AIM to switch to unimodal search mode.

Another advantage of AIM is its reduced space complexity. While conventional multimodal optimization algorithms use a large population size for highly multimodal landscapes [8], this algorithm uses a relatively small population in memory to find a large number of optima. This also contributes to improved convergence speed. This advantage comes from isolating multiple optima as clusters and recursively applying the multimodal optimization algorithm.

The parameters used in AIM are standard GA parameters, except for the population size. While the analysis of AIM's behavior with different population sizes remains future work, we suggest an empirically determined value of 10 × Lz. The suggested values for the minimal component size Lz are shown in Table 2.

        D ≤ 5   D ≤ 8   D ≤ 10
  Lz     35      40      50

Table 2: Suggested values of Lz for domains of dimension D ≤ 10
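Table 2 and the 10 × Lz rule of thumb amount to a small lookup; a sketch (function name ours):

```python
def suggested_params(dim):
    """Return (Lz, population size) for a dim-dimensional domain,
    following Table 2 and the suggested population size of 10 * Lz.
    The table only covers D <= 10."""
    if dim <= 5:
        lz = 35
    elif dim <= 8:
        lz = 40
    elif dim <= 10:
        lz = 50
    else:
        raise ValueError("Table 2 only covers D <= 10")
    return lz, 10 * lz
```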

6. CONCLUSION

We proposed a framework combining a clustering algorithm with GA for multimodal optimization. The method aims to isolate the attractors in the landscape, benefiting multimodal search in terms of efficiency and robustness against deception.

AIM is capable of comprehensive multimodal search by



isolating and suppressing strong attractors. It can also search a unimodal landscape at an evaluation cost equivalent to the basic GA.

The main contributions of AIM are a) elimination of the distance parameter, b) recursive applicability, and c) reduced time complexity for multimodal search.

We also note that AIM is scale-invariant, i.e., its result is invariant under linear transformations of the domain. This is an important property for real-world problems, where the variables may have different physical dimensions and significantly different scales.
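This invariance follows from estimating attractor ranges with the subpopulation's variance/covariance: Mahalanobis-type distances are unchanged by invertible linear transformations of the variables. A small numerical check of that underlying property (an illustration, not AIM itself):

```python
import numpy as np

def mahalanobis_sq(sample, x):
    """Squared Mahalanobis distance of x from a sample's mean under
    the sample covariance -- the kind of range estimate derived from
    a subpopulation."""
    mean = sample.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(sample, rowvar=False))
    d = x - mean
    return float(d @ inv_cov @ d)

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 2))
x = np.array([0.3, -0.7])

# Rescale the two variables by wildly different factors.
A = np.array([[1000.0, 0.0], [0.0, 0.001]])
d_orig = mahalanobis_sq(pts, x)
d_scaled = mahalanobis_sq(pts @ A.T, A @ x)
# d_orig and d_scaled agree up to floating-point error.
```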

In future work, we plan to apply this method to problems with deception and higher dimensionality. We are also implementing the clustering algorithm to run online, to reduce the computational cost of the clustering procedure.

7. REFERENCES

[1] D. Beasley, D. R. Bull, and R. R. Martin. A sequential niche technique for multimodal function optimization. Evolutionary Computation, 1(2):101–125, 1993.

[2] K. A. De Jong. An Analysis of the Behavior of a Class of Genetic Adaptive Systems. Ph.D. thesis, University of Michigan, 1975.

[3] K. Deb, A. Anand, and D. Joshi. A computationally efficient evolutionary algorithm for real-parameter optimization. Evolutionary Computation, 10, 2002.

[4] K. Ikeda and S. Kobayashi. GA based on the UV-structure hypothesis and its application to JSP. Lecture Notes in Computer Science, 1281:273–282, March 2000.

[5] J. Sakuma and S. Kobayashi. EDX. In Proceedings of GECCO 2002, Late Breaking Papers, 2002.

[6] J. Sakuma and S. Kobayashi. Localized MAP-EM on interdependent prior distributions (in Japanese). In Proceedings of the Workshop on Information-Based Induction Sciences, 2003.

[7] S. Kimura, I. Ono, H. Kita, and S. Kobayashi. (In Japanese). Transactions of the Society of Instrument and Control Engineers, 36:1162–1171, 2000.

[8] J. Li, M. Balazs, G. Parks, and P. Clarkson. A genetic algorithm using species conservation for multimodal function optimization. Evolutionary Computation, 10(3):207–234, 2002.

[9] S. W. Mahfoud. Crowding and preselection revisited. In R. Manner and B. Manderick, editors, Parallel Problem Solving from Nature 2, pages 27–36, Amsterdam, North-Holland, 1992.

[10] O. Mengshoel and D. Goldberg. Probabilistic crowding: Deterministic crowding with probabilistic replacement. pages 409–416. Morgan Kaufmann, 1999.

[11] I. Ono and S. Kobayashi. A real-coded genetic algorithm for function optimization using unimodal normal distribution crossover. In Proceedings of the 7th International Conference on Genetic Algorithms, pages 246–253, 1997.

[12] H. Satoh, M. Yamamura, and S. Kobayashi. Minimal generation gap model for GAs considering both exploration and exploitation. In Proceedings of IIZUKA: Methodologies for the Conception, Design, and Application of Intelligent Systems, pages 494–497, Singapore, 1996. World Scientific.

[13] F. Streichert, G. Stein, H. Ulmer, and A. Zell. A clustering based niching EA for multimodal search spaces. In Proceedings of Evolution Artificielle (LNCS 2935), pages 293–304. Springer-Verlag, 2003.

[14] K. Takahashi. A real-coded genetic algorithm using distance dependent alternation model for complex function optimization. pages 219–227, March 2000.

[15] R. K. Ursem. Multinational GAs: Multimodal optimization techniques in dynamic environments. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 19–26. Morgan Kaufmann, 2000.
