
Infection, Genetics and Evolution 4 (2004) 345–351

Clonal reproduction and linkage disequilibrium in diploids: a simulation study

Thierry de Meeûs a,∗, François Balloux b

a Génétique et Evolution des Maladies Infectieuses, Equipe Evolution des Systèmes Symbiotiques, UMR 2724 CNRS-IRD, BP 64501, 911 Av. Agropolis, 34394 Montpellier Cedex 5, France
b Department of Genetics, Downing Street, University of Cambridge, Cambridge CB2 3EH, UK

Received 4 February 2004; received in revised form 30 April 2004; accepted 3 May 2004
Available online 20 July 2004

Abstract

Estimating the rate of clonal reproduction in natural populations of diploid organisms is recognised as problematic, and even the detection of strictly clonal populations is often controversial. One well-acknowledged signature of clonal reproduction is the generation of non-random associations between loci. Linkage disequilibrium (LD) is thus often used for estimating the amount of clonal reproduction. Here we use computer simulations to explore, over a comprehensive parameter range, the effect of the rate of clonal reproduction on LD estimates obtained from different estimators. None of the LD estimators studied is able on its own to measure accurately the proportion of clonal (or sexual) reproduction, owing to strong bias, incoherent behaviour or huge variances. The joint use of several statistics is thus recommended for estimating rates of clonal reproduction in natural populations. We hope that our work will provide useful tools for the study of clonal diploids, many of which, as is the case for medically important parasites, can only be studied with molecular markers.
© 2004 Elsevier B.V. All rights reserved.

Keywords: Clonality; Parthenogenesis; Population genetics; Linkage disequilibrium; Diploids; Heterozygosity

1. Introduction

Sexual reproduction is dominant in eukaryotic organisms (Charlesworth, 1989; West et al., 1999), although many organisms are known to reproduce mainly or strictly clonally. The estimation of the rate of clonal reproduction is not just an academic topic (Tibayrenc, 1997). Many diploid organisms believed to reproduce mainly or strictly clonally are of major medical, veterinary and economic importance, including pathogenic fungi such as Candida albicans or parasitic protozoa such as Trypanosoma or Leishmania. A better understanding of the reproductive system of such organisms might be crucial for planning successful long-term drug administration or vaccination programs (Milgroom, 1996; Taylor et al., 1999; Tibayrenc, 1999).

Various approaches have been used for the estimation of genetic recombination, including population genetics approaches (e.g. Mulvey et al., 1991; Boerlin et al., 1996; Arnavielhe et al., 2000), phylogenetic approaches (e.g. Burt et al., 1996; Geiser et al., 1998; Maynard-Smith and Smith, 1998) and even functional genomics approaches (Tzung et al., 2001). However, in the near absence of theoretical models providing clear expectations under controlled conditions, estimating the rate of clonal reproduction in natural populations appears problematic (e.g. Anderson and Kohn, 1998), and even documenting strict clonal reproduction is often controversial (e.g. Tibayrenc, 1997; Vigalys et al., 1997).

∗ Corresponding author. Tel.: +33-467-416310; fax: +33-467-416299. E-mail address: [email protected] (T. de Meeûs).

Despite the paucity of available theoretical models, asexual reproduction leads to three straightforward predictions. First, in ancient asexual lineages the two alleles within an individual are expected to be highly divergent, as they accumulate different mutations (Birky, 1996); this phenomenon, termed the “Meselson effect”, has been documented empirically in bdelloid rotifers (Mark Welch and Meselson, 2000, 2001). Even in more recent asexual lineages, a strong excess of heterozygotes is expected, which translates into strongly negative Fis estimates (Balloux et al., 2003; Bengtsson, 2003). This excess, however, decays rapidly with increasing rates of sexual recombination, with extreme variance over loci for very low rates (Balloux et al., 2003). A second clear prediction is that limited sexual recombination will lead to a high occurrence of identical genotypes (Burt et al., 1996). Finally, clonal reproduction is expected to generate non-random associations between loci (Tibayrenc et al., 1991; Taylor et al., 1999), as clonal reproduction mimics complete physical linkage over the entire genome. Association among loci can be assessed through the study of linkage disequilibrium (LD) between pairs of loci and/or through multilocus association measures (Taylor et al., 1999). The sensitivity of various linkage measures to allele frequencies and physical linkage has already been investigated (e.g. Hedrick, 1987). It is, however, still unknown how the various proposed estimators of LD correlate with the rate of clonal reproduction, and how sensitive these estimators might be to other population parameters.

1567-1348/$ – see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2004.05.002

In a previous paper we addressed, both analytically and through simulations, the expectations of single-locus population genetic quantities such as effective sizes and F-statistics under variable rates of clonal reproduction (Balloux et al., 2003). Here we extend our investigations to the effect of variable rates of clonal reproduction on the frequencies of repeated multilocus genotypes and on linkage disequilibrium. However, as multilocus population genetics rapidly becomes intractable, we limit ourselves to simulations. The behaviour of various measures of association between loci is explored in an idealised framework of population structure (Wright’s 1951 island model) in order to answer the question: can linkage disequilibrium accurately measure the rate of clonality? Neither the occurrence of repeated identical genotypes nor any of the linkage disequilibrium estimators investigated is able to estimate accurately the amount of sexual reproduction on its own. However, the joint use of several measures appears potentially useful.

2. Materials and methods

2.1. Basic assumptions of the model

We consider a subdivided monoecious population of diploid individuals, which reproduce clonally with probability c, sexual reproduction occurring with the complementary probability (1 − c). Sexual reproduction in the model follows random union of gametes, so that self-fertilisation is allowed at a rate of 1/N, with N being the number of individuals within a sub-population. In our model, individuals, rather than gametes, migrate following an island model (Wright, 1951) at a rate m, implying that a migrant has an equal probability of reaching any of the sub-populations. We further assume stable census sizes and population structure, and no selection. The life cycle involves non-overlapping generations and juvenile migration. The precise sequence goes as follows:

1. Adult reproduction and subsequent death
2. Juvenile dispersal
3. Random regulation of juveniles, the survivors reaching adulthood

Drift, migration, mutation and clonality are thus the only forces that drive the evolution of the genetic composition of this population.
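The three-step life cycle above can be sketched as a tiny individual-based simulation. This is a minimal illustration written for this text, not Easypop’s code: the function names, the `fecundity` parameter (juveniles produced per adult before regulation), the assumption of unlinked loci and the mutation scheme (a mutated allele copy takes a state drawn uniformly among `n_alleles`) are all hypothetical choices.

```python
import random

def mutate(ind, mu, n_alleles, rng):
    # Each allele copy mutates with probability mu to a state drawn
    # uniformly among n_alleles (assumed mutation scheme).
    return tuple(
        tuple(rng.randrange(n_alleles) if rng.random() < mu else a for a in locus)
        for locus in ind
    )

def offspring(deme, c, rng):
    if rng.random() < c:
        # Clonal reproduction: an exact copy of one randomly chosen parent.
        return rng.choice(deme)
    # Sexual reproduction by random union of gametes: the two parents are drawn
    # independently, so selfing occurs by chance with probability 1/N.
    p1, p2 = rng.choice(deme), rng.choice(deme)
    # Unlinked loci (assumption): each parent transmits one random allele per locus.
    return tuple((rng.choice(l1), rng.choice(l2)) for l1, l2 in zip(p1, p2))

def next_generation(demes, c, m, mu, n_alleles, fecundity, rng):
    pools = [[] for _ in demes]
    for i, deme in enumerate(demes):
        # 1. Adult reproduction; adults then die (non-overlapping generations).
        for _ in range(fecundity * len(deme)):
            juv = mutate(offspring(deme, c, rng), mu, n_alleles, rng)
            # 2. Juvenile dispersal: with probability m the juvenile lands in a
            #    sub-population drawn at random among all of them (island model).
            dest = rng.randrange(len(demes)) if rng.random() < m else i
            pools[dest].append(juv)
    # 3. Random regulation of juveniles back to each deme's fixed census size
    #    (sampling with replacement only if too few juveniles arrived).
    return [rng.sample(pool, len(deme)) if len(pool) >= len(deme)
            else rng.choices(pool, k=len(deme))
            for pool, deme in zip(pools, demes)]

# Usage: 5 demes of N = 20 individuals, 3 loci, 99 allelic states, c = 0.9.
rng = random.Random(1)
K = 99
demes = [[tuple((rng.randrange(K), rng.randrange(K)) for _ in range(3))
          for _ in range(20)] for _ in range(5)]
for _ in range(50):
    demes = next_generation(demes, c=0.9, m=0.01, mu=1e-5,
                            n_alleles=K, fecundity=2, rng=rng)
```

Individuals are immutable tuples, so a clone can safely share the parent's object.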

2.2. Simulations

We used an individual-based simulation approach, as implemented in the software Easypop (version 1.7.4) (Balloux, 2001), to generate all population genetics data sets. For all simulations, we used 20 loci with a mutation rate of $10^{-5}$. Mutations had an equal probability of generating any of the 99 possible allelic states. This number of allelic states is high enough to keep low the probability of obtaining indistinguishable alleles through different mutational events (homoplasy), yet small enough to keep the simulations reasonably fast. At the start of each simulation, genetic diversity was set to the maximum possible value: at the first generation, all 99 possible alleles were randomly assigned to all individuals of all demes. The simulation was then run for 10,000 generations, at which point all statistics measured in Easypop (Fis, Fst, Hs, Ht and the number of alleles) had reached equilibrium. Wright’s (1965) Fis and Fst are, respectively, the heterozygote deficit measured within demes and a measure of population differentiation between demes. The exact definitions of gene diversities within demes (Hs) and overall (Ht) can be found in Nei (1987). For all cases, we simulated 50 sub-populations, each comprising a fixed number of either 50, 100 or 1000 individuals. Four migration rates (0, 0.001, 0.01 and 0.1) and seven rates of sexual reproduction (0, 0.001, 0.01, 0.05, 0.1, 0.2 and 1) were simulated for each case of population subdivision, giving 3 × 4 × 7 = 84 parameter sets. For each parameter set, 20 replicates were run (i.e. 1680 simulations). Some additional simulations were also run with a sex rate of 50%. The simulations presented here have been carefully checked against their analytical expectations whenever available (Balloux et al., 2003).
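To make the single-locus quantities concrete, the sketch below computes Ho, Hs and Ht for one locus and derives Fis = 1 − Ho/Hs and Fst = 1 − Hs/Ht. It uses plain ratio forms of Nei’s (1987) gene diversities without sample-size corrections; this is an assumption, and Easypop’s own estimators may differ. The function names and data layout are hypothetical.

```python
from collections import Counter

def allele_freqs(alleles):
    counts = Counter(alleles)
    total = sum(counts.values())
    return {a: k / total for a, k in counts.items()}

def f_statistics(demes):
    """demes: list of sub-populations, each a list of (allele, allele) genotypes
    at a single locus. Hs must be > 0 (the locus must be polymorphic)."""
    freqs = [allele_freqs([a for g in d for a in g]) for d in demes]
    # Ho: observed heterozygosity, averaged over demes.
    ho = sum(sum(a != b for a, b in d) / len(d) for d in demes) / len(demes)
    # Hs: expected heterozygosity within demes, averaged over demes.
    hs = sum(1 - sum(p * p for p in f.values()) for f in freqs) / len(demes)
    # Ht: expected heterozygosity from allele frequencies averaged over demes.
    pooled = set().union(*freqs)
    p_bar = {a: sum(f.get(a, 0.0) for f in freqs) / len(freqs) for a in pooled}
    ht = 1 - sum(p * p for p in p_bar.values())
    return {"Ho": ho, "Hs": hs, "Ht": ht, "Fis": 1 - ho / hs, "Fst": 1 - hs / ht}

# Usage: two demes of fixed heterozygotes, as expected under strict clonality.
stats = f_statistics([[(1, 2)] * 10, [(3, 4)] * 10])
```

Here every individual is heterozygous while each deme carries two alleles at equal frequency, so Fis = −1 (complete heterozygote excess) and Fst is about 0.333.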

2.3. Parameters analysed

Linkage disequilibria between pairs of loci were estimated with different parameters. The correlation coefficient rGGD of Garnier-Géré and Dillmann (1992) is a modified version of Weir’s (1979) coefficient of correlation between two loci. Ohta’s (1982) $D^2_{is}$, $D'^2_{is}$, $D^2_{st}$, $D'^2_{st}$ and $D^2_{it}$ are the within-sub-population, between-sub-population and total components of linkage disequilibrium in a subdivided population. The subscript i stands for ‘individuals’, s for ‘sub-populations’ and t for ‘total population’. Thus, $D^2_{is}$ is the variance component of linkage disequilibrium measured within individuals within sub-populations, while $D^2_{it}$ is the same measure made within all individuals irrespective of the sub-population they come from. Ohta’s D statistics also have the useful property of discriminating between different causes of the observed LD. According to Ohta (1982), $D^2_{is} < D^2_{st}$ and $D'^2_{is} > D'^2_{st}$ hold when migration is limited, whereas $D^2_{is} > D^2_{st}$ and $D'^2_{is} < D'^2_{st}$ hold when epistatic selection is causing LD. Black and Krafsur (1985) gave an additional condition for the case where epistatic selection occurs only in a few sub-populations, in which case one should observe $D^2_{is} > D^2_{st}$ and $D'^2_{is} > D'^2_{st}$. All these LD estimators were computed with Genetix 4.04 (Laboratoire Génome, Populations, Interactions, CNRS UMR 5000, Université de Montpellier II, Montpellier, France), which incorporates the Linkdos program (Garnier-Géré and Dillmann, 1992).
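The decision rules of Ohta (1982) and Black and Krafsur (1985) stated above can be written as a small classifier. This is only a transcription of the inequalities, assuming the four components have already been estimated (e.g. with Genetix); the function name and return labels are hypothetical, and ties fall through to an unclassified case.

```python
def interpret_ohta(d2_is, d2_st, d2p_is, d2p_st):
    """Map Ohta's (1982) components onto the causes of LD discussed above.

    d2_is, d2_st stand for D^2_is and D^2_st; d2p_is, d2p_st for D'^2_is and D'^2_st.
    """
    if d2_is < d2_st and d2p_is > d2p_st:
        return "restricted migration"                # Ohta (1982)
    if d2_is > d2_st and d2p_is < d2p_st:
        return "epistatic selection in all demes"    # Ohta (1982)
    if d2_is > d2_st and d2p_is > d2p_st:
        return "epistatic selection in a few demes"  # Black and Krafsur (1985)
    # Ties, and D^2_is < D^2_st together with D'^2_is < D'^2_st, match no rule.
    return "unclassified"

# Usage with made-up component values:
print(interpret_ohta(0.01, 0.05, 0.03, 0.02))  # restricted migration
```

The rules encode only the inequalities; nothing in them accounts for clonal reproduction.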

Multilocus genotypic diversity and multilocus linkage disequilibrium were computed with Multilocus 1.2 (Agapow and Burt, 2001). The genotypic diversity is defined as the probability that two individuals taken at random have different genotypes. The formula for genotypic diversity is:

$$G = \frac{n}{n-1}\left(1 - \sum_{i} p_i^2\right) \qquad (1)$$

where $p_i$ is the frequency of the ith genotype and n the sample size. This statistic varies between 0 (all genotypes are identical) and 1 (all genotypes are different).
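Equation (1) translates directly into code. In this sketch (function name hypothetical), each multilocus genotype must be hashable, e.g. a tuple of per-locus allele pairs; the usage checks that G equals the fraction of differing pairs among all pairs of individuals, matching the definition given above.

```python
from collections import Counter
from itertools import combinations

def genotypic_diversity(genotypes):
    """Equation (1): G = n/(n-1) * (1 - sum_i p_i^2), where p_i is the frequency
    of the ith distinct multilocus genotype and n >= 2 is the sample size."""
    n = len(genotypes)
    sum_p2 = sum((k / n) ** 2 for k in Counter(genotypes).values())
    return n / (n - 1) * (1 - sum_p2)

# Usage: two clones, each sampled twice.
sample = [((1, 2), (3, 3)), ((1, 2), (3, 3)), ((1, 1), (3, 4)), ((1, 1), (3, 4))]
g = genotypic_diversity(sample)
# Probability that two individuals drawn without replacement differ.
pairs = list(combinations(sample, 2))
frac_diff = sum(a != b for a, b in pairs) / len(pairs)
```

Both `g` and `frac_diff` equal 2/3 here: four of the six possible pairs differ.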

Multilocus linkage disequilibrium was estimated as $I_A$, the index of association (Brown et al., 1980; Maynard-Smith et al., 1993; Haubold et al., 1998), and as $r_D$, which removes the dependency of the measure on the number of loci (Agapow and Burt, 2001). Consider first a single locus j that has been analysed in n haploid isolates. Let $n_i$ be the number of isolates that carry the ith allele. The probability that two haploid isolates carry different alleles at the locus is:

$$h_j = 1 - \frac{\sum_i n_i(n_i - 1)}{n(n-1)} = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right) \qquad (2)$$

This is also the mean distance (either 0 or 1) over all n(n − 1)/2 possible pairs of isolates, and the variance of those distances will be $\mathrm{var}_j = h_j(1 - h_j)$. Generalising to m loci, let D be the distance between two isolates over all loci (i.e. the number of loci at which they differ). Then the average distance over all pairwise comparisons will be $\bar{D} = \sum_j h_j$, and the variance of the distances will be:

$$V_D = \sum_j \mathrm{var}_j + 2\sum_j \sum_{k>j} \mathrm{cov}_{j,k} \qquad (3)$$

where $\mathrm{cov}_{j,k}$ is the covariance between the distance at locus j and the distance at locus k, and the double summation is over all m(m − 1)/2 possible pairs of loci. If there are no associations between loci, all these covariances are expected to be zero. Thus:

$$I_A = \frac{V_D}{\sum_j \mathrm{var}_j} - 1 \qquad (4)$$

will be equal to zero if there is no linkage disequilibrium; this is the index of association. In practice, the program Multilocus 1.2 calculates $I_A$ using (4), where:

$$V_D = \frac{\sum_p D_p^2 - \left(\sum_p D_p\right)^2 \big/ \left(n(n-1)/2\right)}{n(n-1)/2} \qquad (5)$$

and where $D_p$ is the total distance, over all loci, between the two isolates of pair p, and:

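$$\mathrm{var}_j = \frac{\sum_p d_{p,j}^2 - \left(\sum_p d_{p,j}\right)^2 \big/ \left(n(n-1)/2\right)}{n(n-1)/2} \qquad (6)$$

where $d_{p,j}$ is the distance (0 or 1) at locus j between the two isolates of pair p, the sums running over all n(n − 1)/2 pairs. Substituting (3) into (4) yields: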

$$I_A = \frac{2\sum_j \sum_{k>j} \mathrm{cov}_{j,k}}{\sum_j \mathrm{var}_j} \qquad (7)$$

$I_A$ is thus a generalised measure of linkage disequilibrium and has an expected value of zero if there is no association between loci (Maynard-Smith et al., 1993).

In the presence of linkage disequilibrium, $I_A$ depends on the number of loci included in the analysis, making comparisons among studies difficult (Brown et al., 1980; Maynard-Smith et al., 1993). To avoid this problem, Agapow and Burt (2001) consider a slightly modified statistic, $r_D$, which removes the dependency on the number of loci:

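$$r_D = \frac{\sum_j \sum_{k>j} \mathrm{cov}_{j,k}}{\sum_j \sum_{k>j} \sqrt{\mathrm{var}_j\,\mathrm{var}_k}} = \frac{V_D - \sum_j \mathrm{var}_j}{2\sum_j \sum_{k>j} \sqrt{\mathrm{var}_j\,\mathrm{var}_k}} \qquad (8)$$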

$r_D$ has a form similar to a correlation coefficient (hence the symbol r, with the subscript D referring to distances) and has a maximum value of 1. Diploids are handled analogously to haploids, noting that the distance between two individuals at a particular locus may then take the values 0, 1 or 2.
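The computation in equations (4) to (8) can be sketched as follows. The per-pair distances, the variance over the n(n − 1)/2 pairs (equations (5) and (6)) and the two indices follow the text; the diploid distance (2 minus the number of alleles shared by two genotypes) is our assumption about how 0, 1 or 2 differences are counted and may not match Multilocus 1.2 exactly. Function names are hypothetical, and every locus must be polymorphic in the sample so that no $\mathrm{var}_j$ is zero.

```python
from itertools import combinations
from math import sqrt

def locus_distance(g1, g2):
    if isinstance(g1, tuple):
        # Diploid (assumed rule): 2 minus the number of alleles the genotypes share.
        rest, shared = list(g2), 0
        for a in g1:
            if a in rest:
                rest.remove(a)
                shared += 1
        return 2 - shared
    # Haploid: 0 if the alleles are identical, 1 otherwise.
    return int(g1 != g2)

def ia_rd(samples):
    """samples: list of individuals, each a sequence of m per-locus genotypes.
    Returns (I_A, r_D) following equations (4) to (8)."""
    m = len(samples[0])
    pairs = list(combinations(samples, 2))
    npairs = len(pairs)  # n(n-1)/2
    # d[p][j]: distance at locus j for pair p; D_p is the row sum.
    d = [[locus_distance(x[j], y[j]) for j in range(m)] for x, y in pairs]

    def var(values):
        # Equations (5) and (6): variance over the n(n-1)/2 pairwise distances.
        return (sum(v * v for v in values) - sum(values) ** 2 / npairs) / npairs

    var_j = [var([row[j] for row in d]) for j in range(m)]
    v_d = var([sum(row) for row in d])
    sum_var = sum(var_j)
    i_a = v_d / sum_var - 1                                   # equation (4)
    denom = sum(sqrt(var_j[j] * var_j[k]) for j, k in combinations(range(m), 2))
    r_d = (v_d - sum_var) / (2 * denom)                       # equation (8)
    return i_a, r_d

# Usage: haploid isolates, two loci in complete association vs. no association.
linked = [(0, 0), (0, 0), (1, 1), (1, 1)]
unlinked = [(0, 0), (0, 1), (1, 0), (1, 1)]
print(ia_rd(linked))    # both indices are 1 (up to rounding)
print(ia_rd(unlinked))  # both indices are -0.5 (up to rounding)
```

The second example shows that, with small samples, both indices can fall below zero even though the loci are simply unassociated.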

The above parameters were estimated from a sub-sample of 20 demes comprising 50 diploid individuals each, with the exception of G, $I_A$ and $r_D$, which were estimated with Multilocus 1.2 (Agapow and Burt, 2001). Handling multiple populations for the 1680 simulations turned out to be impossible with this software, so we limited ourselves to 50 individuals from a single, randomly chosen sub-population. The measures given are thus estimated from samples, and so mimic what could be obtained empirically from wild clonal populations.

3. Results

3.1. Linkage disequilibrium between locus pairs

With the exception of simulations with no migration, where huge variances were observed, the correlation coefficient rGGD of Garnier-Géré and Dillmann (1992) progressively decreases with recombination (Fig. 1). The variances are large, especially for low levels of sex, but still allow small differences in clonal rates to be discriminated within the same pattern of population structure (same N and m). However, the measure is strongly dependent on population size and migration rate. Its mean stays markedly above 0 in panmictic sub-populations (Fig. 2). The bias lies around 0.14 but tends to increase with decreasing deme size and migration rate (Fig. 2), migration rates of m = 0 being a particular case. Although this bias can be partly accounted for by the sampling variance inherent to finite populations, it remains even for infinitely large populations, as this correlation coefficient cannot take negative values.

[Fig. 1: four panels of rGGD (0 to 1) against sex rate (0, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1): (a) m = 0.01, N = 100; (b) m = 0.01, N = 50; (c) m = 0.1, N = 100; (d) m = 0.1, N = 50.]

Fig. 1. Correlation coefficient between pairs of loci rGGD plotted as a function of the rate of sexual reproduction for different migration rates (m) and deme sizes (N). Confidence intervals of means are corrected for sample size (20 replicates).
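Why a coefficient that cannot be negative is biased upwards can be illustrated with a small simulation. As a stand-in for rGGD (whose exact formula is not reproduced here), this sketch uses the absolute value of Pearson’s correlation between allele-presence indicators at two independent loci; its mean is positive even though the true association is zero. It illustrates only the sampling component of the bias, which shrinks as the sample grows. All names and settings are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_abs_r(n, reps, rng):
    """Mean |r| between allele indicators at two independent loci, samples of n."""
    total, done = 0.0, 0
    while done < reps:
        x = [rng.random() < 0.5 for _ in range(n)]
        y = [rng.random() < 0.5 for _ in range(n)]
        if len(set(x)) < 2 or len(set(y)) < 2:
            continue  # skip monomorphic samples, for which r is undefined
        total += abs(pearson(x, y))
        done += 1
    return total / done

rng = random.Random(42)
bias_50 = mean_abs_r(50, 1000, rng)
bias_500 = mean_abs_r(500, 1000, rng)
```

With these settings the mean |r| should be roughly 0.1 for samples of 50 and roughly three times smaller for 500, since the sampling spread of r scales approximately as 1/√n.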

Ohta’s (1982) components of linkage disequilibrium show some puzzling behaviours. $D^2_{is}$ performs well (decreasing with recombination) only when demes are large (N = 1000) and exchange some migrants (m > 0); otherwise the profiles are clearly aberrant, with a maximum at very low rates of recombination (i.e. 0.001) and then a drop at a sex rate of 100%. $D^2_{st}$ behaves similarly to Fst and thus basically represents a measure of differentiation between demes (Fig. 3). $D'^2_{is}$ is strongly correlated with $D^2_{st}$ (Pearson’s correlation R = 0.9996). $D'^2_{st}$ displays a behaviour similar to $D^2_{is}$ despite a poor correlation between the two statistics ($R^2$ = 0.011). Interpreting these patterns according to Ohta (1982) and Black and Krafsur (1985), we find that among the 1680 simulations, 1295 (with Nm around 1.6) are consistent with LD generated through restricted migration, 52 are consistent with epistatic selection in all demes (51 cases with a sex rate of 0 and one with a sex rate of 0.001, with Nm around 30–50), and 333 (40 cases with panmixia, with Nm around 40) fit a model of epistatic selection in a few demes.

[Fig. 2: mean rGGD (0.05 to 0.35) at sex rate = 1 for each combination of m (0, 0.001, 0.01, 0.1) and N (1000, 100, 50), and for all simulations pooled (“All”).]

Fig. 2. Bias of rGGD given for different migration rates (m) and deme sizes (N) for simulations of strictly sexual populations (i.e. c = 0), for which no linkage should be expected.

[Fig. 3: $D^2_{st}$ (0 to 1) plotted against Fst (0 to 1); fitted curve $D^2_{st} = 0.7956\,F_{st}^2 + 0.1868\,F_{st} + 0.008$, $R^2 = 0.9965$.]

Fig. 3. Relationship between Fst and $D^2_{st}$ over all simulations.

[Fig. 4: four panels of genotypic diversity (0.2 to 1) against sex rate (0, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1): (a) m = 0.01, N = 100; (b) m = 0.01, N = 50; (c) m = 0.1, N = 100; (d) m = 0.1, N = 50.]

Fig. 4. Multilocus genotypic diversity plotted as a function of sex rate for different migration rates (m) and deme sizes (N). Confidence intervals of means are corrected for sample size (20 replicates).

3.2. Multilocus linkage disequilibrium

The multilocus genotypic diversity is strictly unbiased at a sex rate of 100% but, as the sex rate decreases, it becomes strongly dependent on migration rate and population size and displays huge variances (Fig. 4). This statistic rarely reaches 0 in 100% clonal populations.

$I_A$ behaves very similarly to $D^2_{is}$, with which it is strongly correlated (R = 0.82), and thus displays the same problems as $D^2_{is}$. Fig. 5 shows an example of such problematic behaviour. The standardised version of $I_A$, $r_D$, displays the same properties as rGGD, except that it is unbiased in panmictic populations (i.e. 0 lies within the 95% confidence interval of the mean for panmictic sub-populations), but it becomes dependent on the structure of the population (i.e. on N and m) as the sex rate decreases. Moreover, $r_D$ generally displays a higher variance than rGGD for low rates of sex, and lower variances for high rates of sex (e.g. Fig. 6). The variance is such that values mimicking panmixia (i.e. ≤ 0) can be observed in strictly clonal populations.

[Fig. 5: $I_A$ (−0.1 to 1.1) against sex rate (0, 0.001, 0.01, 0.05, 0.1, 0.2, 1) for m = 0, N = 1000.]

Fig. 5. The index of association $I_A$ plotted as a function of the rate of sexual reproduction for simulations with population size N = 1000 and no migration (m = 0). Confidence intervals of means are corrected for sample size (20 replicates).

4. Discussion

None of the measures of linkage disequilibrium we investigated, whether measured between pairs of loci or on a multilocus basis, performed well enough to allow accurate inferences on the rate of sexual reproduction. Some estimators (Ohta’s $D^2$, $I_A$) even seem to measure unrelated quantities (migration rate, sub-population size and/or their product). For $D^2_{is}$, $D'^2_{st}$ and $I_A$, this probably stems from the fact that these parameters were initially designed for gametes (or haploids). With diploids, heterozygosity introduces a problem that appears difficult to handle, especially because high rates of asexual reproduction generate a high heterozygote excess (Balloux et al., 2003; Bengtsson, 2003). Additional simulations were computed for haploids (data not shown): Ohta’s $D^2_{is}$ and $D'^2_{st}$ still display similar problems, but the behaviour of $I_A$ becomes comparable to that of $r_D$. The signature of epistatic selection in Ohta’s D is mimicked by strict clonal reproduction, and the criterion proposed by Black and Krafsur (1985) for the detection of epistatic selection in a sub-sample of demes is not supported by our simulations, even in strictly sexual populations.

[Fig. 6: four panels against sex rate (0, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1), y axis 0 to 1: (a) $r_D$, m = 0.1, N = 50; (b) rGGD for one population, m = 0.1, N = 50; (c) $r_D$, m = 0.01, N = 100; (d) rGGD for one population, m = 0.01, N = 100.]

Fig. 6. The correlation coefficients $r_D$ (panels a and c) and rGGD (panels b and d) given for increasing rates of sexual reproduction. For comparison purposes, rGGD was computed on the first 50 individuals of the first sub-population, as for $r_D$ (see Section 2). Confidence intervals of means are corrected for sample size (20 replicates).

Garnier-Géré and Dillmann’s (1992) correlation coefficient performed best for comparisons between data sets sharing the same population structure, but it is highly biased (depending on sample and/or effective population sizes) and does not allow comparisons between populations with different population structures. Its multilocus equivalent, $r_D$, is unbiased at around 100% sex but suffers from increased variance and bias when sexual recombination is rare. In general, both $r_D$ and rGGD display huge variances, in particular in clonal populations. This implies that the statistical power of these estimators will be limited in purely clonal populations, in particular for the comparison of different data sets. Recombination rates in haploid organisms could be easier to infer; however, the huge variances observed in the strictly clonal simulations are not particularly encouraging, and the dependency on population size and migration pattern (or Nm) will not be overcome.

The relatively poor performance of the statistics investigated in the present paper (in the context of estimating rates of clonal reproduction) leaves us, as the best option, with the joint use of the LD estimators (G, rGGD and $r_D$) together with a single-locus statistic. As clonal reproduction generates massive heterozygote excess (Balloux et al., 2003; Bengtsson, 2003), one obvious choice is the F-statistic Fis, a measure of the deviation from random mating within sub-populations. Fixed heterozygosity at nearly all loci will provide strong evidence for strictly clonal populations structured in rather small demes. Additionally, very low levels of sex may be revealed by a high variance of Fis among loci, with some loci displaying extreme heterozygote deficits and others extreme excesses (Balloux et al., 2003). Large variance among loci in mutation rates and/or in the number of possible allelic states inflates the variance of Fis as well, but to a much lower extent (data not shown). The accuracy of inferences based on Fis will, however, strongly depend on the a priori sampling scheme. The sampling strategy should ensure that there is no hidden genetic structuring within the units defined as sub-populations (no Wahlund effect), so that each predefined sub-population behaves as a single reproductive unit (a deme). Pooling differentiated reproductive units within a single sub-population will artificially inflate Fis estimates. Therefore, without a good understanding of the genetic subdivisions within the population, interpreting this parameter alone in terms of sex rates could easily lead to misleading conclusions. For many organisms, it is difficult to know a priori where the boundaries between demes lie. One potential solution is to pool samples cumulatively to estimate the size of the reproductive unit, as suggested by Goudet et al. (1994). As long as samples from the same deme are pooled together, no significant change in Fis is expected. However, when a sample from a different reproductive unit is included, a significant increase in Fis is expected.
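The Wahlund effect behind this cumulative-pooling approach can be shown with a short calculation. The sketch assumes a biallelic locus, equal-sized demes each in Hardy–Weinberg proportions and no sampling error (all assumptions made for illustration; the function name is hypothetical). Pooling two demes with allele frequencies 0.9 and 0.1 raises Fis from 0 to 0.64, whereas pooling identical demes leaves it unchanged.

```python
def pooled_fis(deme_freqs):
    """Fis of a pooled sample of equal-sized demes, each in Hardy-Weinberg
    proportions at a biallelic locus with allele frequency p."""
    # Pooling does not change the observed heterozygote frequency: the mean of 2p(1-p).
    ho = sum(2 * p * (1 - p) for p in deme_freqs) / len(deme_freqs)
    # The pooled sample is judged against Hardy-Weinberg at the pooled frequency.
    p_bar = sum(deme_freqs) / len(deme_freqs)
    he = 2 * p_bar * (1 - p_bar)
    return 1 - ho / he

print(round(pooled_fis([0.9, 0.1]), 6))  # 0.64: two distinct demes pooled
print(round(pooled_fis([0.3, 0.3]), 6))  # 0.0: samples from the same deme pooled
```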

The simultaneous use of Fis (mean and variance) and of the LD estimators (G, rGGD and $r_D$) thus seems to represent the best way to evaluate the amount of clonal reproduction to date, even if it is far from ideal. Strongly negative Fis with a low variance among loci, together with strong linkage between loci, is expected in strictly clonal populations. A high variance in Fis with strong linkage between loci is expected under predominantly clonal modes of reproduction with rare events of sex (less than 5%). The existence of linkage disequilibrium between physically unlinked loci could represent intermediate situations, and the absence of any signal is obviously expected in populations with frequent genetic recombination. A next step would be to analyse the power of the different statistics that may be used to infer linkage disequilibrium; this remains to be done. We hope such studies will help to provide useful inference tools for the study of clonal diploids, particularly those that can only be studied by molecular approaches, such as many pathogenic agents (e.g. Candida, Trypanosoma and Leishmania).
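The qualitative patterns listed above can be organised as a simple decision sketch. Everything quantitative in it is an assumption: the cut-offs `strongly_negative` and `high_var` are arbitrary illustrative values, not thresholds derived from the simulations, and `ld_detected` stands for the outcome of whatever LD test (G, rGGD or $r_D$) the user has applied. The function name is hypothetical; it is a reading aid, not a validated classifier.

```python
from statistics import mean, pvariance

def fis_profile(fis_per_locus, ld_detected, strongly_negative=-0.5, high_var=0.1):
    """Summarise per-locus Fis and map it onto the qualitative patterns above.

    strongly_negative and high_var are hypothetical cut-offs for illustration only.
    """
    m, v = mean(fis_per_locus), pvariance(fis_per_locus)
    if ld_detected and m <= strongly_negative and v < high_var:
        pattern = "consistent with strict clonality"
    elif ld_detected and v >= high_var:
        pattern = "consistent with predominant clonality and rare sex"
    elif ld_detected:
        pattern = "intermediate situation"
    else:
        pattern = "consistent with frequent recombination"
    return m, v, pattern

# Usage: fixed heterozygosity at all loci plus significant LD.
print(fis_profile([-0.95, -1.0, -0.9], ld_detected=True)[2])
```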

Acknowledgements

We thank Jerome Goudet, Francois Renaud, Franck Prugnolle, Yannis Michalakis and Michel Tibayrenc for very inspiring conversations, Bengt O. Bengtsson for a critical reading of the manuscript, and Louis Ski for his strong support. Thierry de Meeûs is supported by the CNRS.

References

Agapow, P.M., Burt, A., 2001. Indices of multilocus linkage dis-equilibrium. Mol. Ecol. Notes 1, 101–102.

Anderson, J.B., Kohn, L.M., 1998. Genotyping, gene genealogies andgenomics bring fungal population genetics above ground. Trends Ecol.Evol. 13, 444–449.

Arnavielhe, S., De Meeus, T., Blancart, A., Mallie, M., Renaud, F.,Bastide, J.M., 2000. Multicentric study ofCandida albicansisolatesfrom non-neutropenic patients: population structure and mode ofreproduction. Mycoses 43, 109–117.

Balloux, F., 2001. Easypop (Version 1.7): a computer program forpopulation genetics simulations. J. Hered. 92, 301–302.

Balloux, F., Lehmann, L., De Meeûs, T., 2003. The population genetics of clonal and partially clonal diploids. Genetics 164, 1635–1644.

Bengtsson, B.O., 2003. Genetic variation in organisms with sexual and asexual reproduction. J. Evol. Biol. 16, 189–199.

Birky Jr., C.W., 1996. Heterozygosity, heteromorphy, and phylogenetic trees in asexual eukaryotes. Genetics 144, 427–437.

Black IV, W.C., Krafsur, E.S., 1985. A FORTRAN program for the calculation and analysis of two-locus linkage disequilibrium coefficients. Theor. Appl. Genet. 70, 491–496.

Boerlin, P., Boerlin-Petzold, F., Goudet, J., Pagani, J.L., Chave, J.P., Bille, J., 1996. Typing Candida albicans oral isolates from human immunodeficiency virus-infected patients by multilocus enzyme electrophoresis and DNA fingerprinting. J. Clin. Microbiol. 34, 1235–1248.

Brown, A.H.D., Feldman, M.W., Nevo, E., 1980. Multilocus structure of natural populations of Hordeum spontaneum. Genetics 96, 523–536.

Burt, A., Carter, D.A., Koenig, G.L., White, T.J., Taylor, J.W., 1996. Molecular markers reveal cryptic sex in the human pathogen Coccidioides immitis. Proc. Natl. Acad. Sci. U.S.A. 93, 770–773.

Charlesworth, B., 1989. The evolution of sex and recombination. Trends Ecol. Evol. 4, 264–267.

Garnier-Géré, P., Dillmann, C., 1992. A computer program for testing pairwise linkage disequilibria in subdivided populations. J. Hered. 83, 239.

Geiser, D.M., Pitt, J.I., Taylor, J.W., 1998. Cryptic speciation and recombination in the aflatoxin-producing fungus Aspergillus flavus. Proc. Natl. Acad. Sci. U.S.A. 95, 388–393.

Goudet, J., De Meeûs, T., Day, A.J., Gliddon, C.J., 1994. The different levels of population structuring of the dogwhelk, Nucella lapillus, along the south Devon coast. In: Beaumont, A.R. (Ed.), Genetics and Evolution of Aquatic Organisms. Chapman & Hall, London, pp. 81–95.

Haubold, B., Travisano, M., Rainey, P.B., Hudson, R.R., 1998. Detecting linkage disequilibrium in bacterial populations. Genetics 150, 1341–1348.

Hedrick, P.W., 1987. Gametic disequilibrium measures: proceed with caution. Genetics 117, 331–341.

Mark Welch, M.D., Meselson, M.S., 2000. Evidence for the evolution of bdelloid rotifers without sexual reproduction or genetic exchange. Science 288, 1211–1215.

Mark Welch, M.D., Meselson, M.S., 2001. Rates of nucleotide substitution in sexual and anciently asexual rotifers. Proc. Natl. Acad. Sci. U.S.A. 98, 6720–6724.

Maynard-Smith, J., Smith, N.H., 1998. Detecting recombination from gene trees. Mol. Biol. Evol. 15, 590–599.

Maynard-Smith, J., Smith, N.H., O’Rourke, M., Spratt, B.G., 1993. How clonal are bacteria? Proc. Natl. Acad. Sci. U.S.A. 90, 4384–4388.

Milgroom, M.G., 1996. Recombination and the multilocus structure of fungal populations. Annu. Rev. Phytopathol. 34, 457–477.

Mulvey, M., Aho, J.M., Leberg, P.L., Smith, M.H., 1991. Comparative population genetic structure of a parasite (Fascioloides magna) and its definitive host. Evolution 45, 1628–1640.

Nei, M., 1987. Molecular Evolutionary Genetics. Columbia University Press, New York.

Ohta, T., 1982. Linkage disequilibrium due to random genetic drift in finite subdivided populations. Proc. Natl. Acad. Sci. U.S.A. 79, 1940–1944.

Taylor, J.W., Geiser, D.M., Burt, A., Koufopanou, V., 1999. The evolutionary biology and population genetics underlying fungal strain typing. Clin. Microbiol. Rev. 12, 126–146.

Tibayrenc, M., 1997. Are Candida albicans natural populations subdivided? Trends Microbiol. 5, 253–254.

Tibayrenc, M., 1999. Toward an integrated genetic epidemiology of parasitic protozoa and other pathogens. Annu. Rev. Genet. 33, 449–477.

Tibayrenc, M., Kjellberg, F., Arnaud, J., Oury, B., Breniere, F., Darde, M.L., Ayala, F.J., 1991. Are eukaryotic microorganisms clonal or sexual? A population genetics vantage. Proc. Natl. Acad. Sci. U.S.A. 88, 5129–5133.

Tzung, K.W., Williams, R.M., Scherer, S., Federspiel, N., Jones, T., Hansen, N., Bivolarevic, V., Huizar, L., Surzycki, R., Tamse, R., Davis, R.W., Agabian, N., 2001. Genomic evidence for a complete sexual cycle in Candida albicans. Proc. Natl. Acad. Sci. U.S.A. 98, 3249–3253.

Vigalys, R., Gräser, Y., Presber, W., 1997. Response from Vigalys et al. Trends Microbiol. 5, 254–256.

Weir, B.S., 1979. Inferences about linkage disequilibrium. Biometrics 35, 235–254.

West, S.A., Lively, C.M., Read, A.F., 1999. A pluralist approach to sex and recombination. J. Evol. Biol. 12, 1003–1012.

Wright, S., 1951. The genetical structure of populations. Ann. Eugenics 15, 323–354.

Wright, S., 1965. The interpretation of population structure by F-statistics with special regard to systems of mating. Evolution 19, 395–420.