
Gene similarity networks provide tools for understanding eukaryote origins and evolution

David Alvarez-Ponce (a,1,2), Philippe Lopez (b), Eric Bapteste (b), and James O. McInerney (a,3,4)

(a) Department of Biology, National University of Ireland Maynooth, Maynooth, Ireland; and (b) Centre National de la Recherche Scientifique, Unité Mixte de Recherche 7138, Systématique, Adaptation, Evolution, Université Pierre et Marie Curie, Paris, France

AUTHOR SUMMARY

The evolutionary relationships between the three domains of life (Archaebacteria, Eubacteria, and Eukaryotes) remain highly debated (1, 2). Although phylogenetic trees are valuable tools for assessing these relationships, they also have a number of shortcomings, particularly when it comes to studying deep evolutionary events. Here, we use a gene similarity network, whose nodes and edges represent genes and sequence similarities, respectively, to study the origin and early evolution of Eukaryotes. This type of analysis allows us to draw evolutionary conclusions from a previously untapped source of information: the topology of gene similarity graphs and subgraphs. Our network exhibits multiple signatures characteristic of the chimerical nature of Eukaryotes, consistent with the merger of an archaebacterium and a eubacterium, each of which contributed different functions and cell compartments to the eukaryotic cell. Unexpectedly, we reveal that the number of eubacterium-derived eukaryotic genes is highly variable, resulting in certain eukaryotic genomes containing more archaebacterium-derived genes than eubacterium-derived genes.

Addressing questions about ancient evolution, such as the emergence and early evolution of Eukaryotes, which likely arose ∼2 billion years ago, is a challenging task. Molecular sequences probably contain valuable signatures of early Eukaryote history, and the task for evolutionary biologists is to uncover this information. Phylogenetic trees have often been used to represent the relationships between the three domains of life (3). These trees, however, suffer from a series of shortcomings, particularly when they are used to study the relationships among organisms spanning the three domains of life. Tree reconstruction requires the definition of gene families and the alignment of all sequences that show sufficient similarity to be amenable to further comparisons. Therefore, phylogenetic analyses are restricted to relatively conserved homologs, and the most divergent gene forms cannot readily be used. This constraint may hamper the resolution of questions regarding ancient evolution, for which distant and ancient homologs, in addition to relatively conserved homologs, may be very informative.

We designed a protocol that increases the amount of ancient evolutionary information amenable to analysis. We constructed a network that displayed the similarities among all proteins encoded in the genomes of 52 Archaebacteria, 52 Eubacteria, and 14 representatives of all the main eukaryotic lineages, as well as their mobile genetic elements. The resulting network, which encompasses more than 445,000 sequences connected by ∼8 million edges, provides a previously untapped source of information, namely, tens of thousands of subgraphs showing both close and distant homology relationships between these sequences. The topology of a number of these subgraphs is consistent with a chimerical origin of Eukaryotes resulting from the fusion of an archaebacterium and a eubacterium. These gene families contain two groups of eukaryotic genes: one group

[Figure P1 graphic. (A) Schematic gene family: eubacterial genes and archaebacterial genes (diverged ~4 billion years ago), linked to eukaryotic genes of eubacterial ancestry and eukaryotic genes of archaebacterial ancestry, respectively. (B) Archaebacterial:eubacterial genes ratio (y axis) versus number of genes (x axis, 0 to 30,000) for the 14 eukaryotes, numbered 1 (B. natans) to 14 (A. thaliana) in the order listed in the caption.]

Fig. P1. (A) Significant similarities between eukaryotic (green), archaebacterial (blue), and eubacterial (red) sequences belonging to the same gene family. (B) Relationship between genome size and the archaebacterial-to-eubacterial gene ratio for 14 eukaryotic genomes: Bigelowiella natans, Hemiselmis andersenii, Encephalitozoon intestinalis, Plasmodium knowlesi, Saccharomyces cerevisiae, Giardia lamblia, Entamoeba histolytica, Chlorella variabilis, Naegleria gruberi, Phytophthora infestans, Trypanosoma cruzi, Homo sapiens, Tetrahymena thermophila, and Arabidopsis thaliana. In A, nodes in the network represent sequences, whereas links represent significant similarities. The connected component exhibits a remarkable Eukaryote-Archaebacteria-Eubacteria-Eukaryote structure, with some eukaryotic genes exhibiting similarity to eubacterial homologs and other eukaryotic genes exhibiting similarity to archaebacterial homologs. This topology is in agreement with a chimerical origin of Eukaryotes. Notice that a tree encompassing all the sequences in the gene family cannot be easily constructed, because not all sequences are significantly similar to each other at the specified thresholds.
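The graph logic behind Fig. P1A can be illustrated with a small Python sketch. It is not the authors' pipeline: the summary does not say how similarities were scored, so the hit format (query, subject, E-value, percent identity), the thresholds MAX_EVALUE and MIN_IDENTITY, the domain labels, and every function name below are hypothetical. The sketch keeps only hits that pass both thresholds as edges, extracts connected components by breadth-first search, and checks whether the eukaryotic members of a component split into an archaebacteria-linked group and a eubacteria-linked group. It also tests whether the component is a clique, since a family in which not every pair is significantly similar cannot be handled by all-against-all comparison.

```python
from collections import defaultdict, deque

# Hypothetical significance thresholds; the summary only mentions "specified thresholds".
MAX_EVALUE = 1e-5
MIN_IDENTITY = 30.0


def build_network(hits, max_evalue=MAX_EVALUE, min_identity=MIN_IDENTITY):
    """Undirected adjacency map from (query, subject, evalue, identity) hits.

    Only hits passing both thresholds become edges; self-hits are ignored.
    """
    adj = defaultdict(set)
    for query, subject, evalue, identity in hits:
        if query != subject and evalue <= max_evalue and identity >= min_identity:
            adj[query].add(subject)
            adj[subject].add(query)
    return adj


def connected_components(adj):
    """Breadth-first search; returns a list of node sets, one per component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return components


def is_clique(component, adj):
    """True only if every pair of nodes in the component is directly linked."""
    return all(len(adj[node] & component) == len(component) - 1 for node in component)


def chimeric_split(component, adj, domain):
    """Eukaryotic nodes with archaebacterial neighbours vs. those with eubacterial neighbours."""
    euk = [n for n in component if domain[n] == "Euk"]
    arc_linked = {n for n in euk if any(domain[m] == "Arc" for m in adj[n])}
    bac_linked = {n for n in euk if any(domain[m] == "Bac" for m in adj[n])}
    return arc_linked, bac_linked


# Invented toy input: one family with a Euk-Arc-Bac-Euk path, plus an unrelated pair.
domain = {"euk1": "Euk", "euk2": "Euk", "arc1": "Arc", "arc2": "Arc",
          "bac1": "Bac", "bac2": "Bac", "bac3": "Bac"}
hits = [
    ("euk1", "arc1", 1e-30, 45.0),
    ("arc1", "arc2", 1e-50, 60.0),
    ("arc2", "bac1", 1e-8, 32.0),
    ("bac1", "euk2", 1e-40, 50.0),
    ("euk1", "euk2", 0.5, 18.0),  # fails thresholds: no direct edge between the eukaryotes
    ("bac2", "bac3", 1e-20, 40.0),
]
adj = build_network(hits)
components = connected_components(adj)
family = components[0]
print(sorted(family), is_clique(family, adj), chimeric_split(family, adj, domain))
```

On this toy input the first component holds all five family members but is not a clique, and euk1 and euk2 fall into opposite groups even though they share a component, mirroring the topology described in the caption.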

Author contributions: D.A.-P., P.L., E.B., and J.O.M. designed research; D.A.-P. and P.L. performed research; D.A.-P. and P.L. contributed new reagents/analytic tools; D.A.-P., P.L., E.B., and J.O.M. analyzed data; and D.A.-P., P.L., E.B., and J.O.M. wrote the paper.

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Freely available online through the PNAS open access option.

Data deposition: The survey sequence data have been deposited in the Dryad database, http://datadryad.org (doi no. 10.5061/dryad.qr81p).

1 Present address: Smurfit Institute of Genetics, Trinity College, University of Dublin, Dublin 2, Ireland.

2 Present address: Integrative and Systems Biology Laboratory, Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-Universidad Politécnica de Valencia, 46022 Valencia, Spain.

3 Present address: Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA 02115.

4 To whom correspondence should be addressed. E-mail: [email protected].

See full research article on page E1594 of www.pnas.org.

Cite this Author Summary as: PNAS 10.1073/pnas.1211371110.

6624–6625 | PNAS | April 23, 2013 | vol. 110 | no. 17 www.pnas.org/cgi/doi/10.1073/pnas.1211371110


that is connected to archaebacterial genes and another group that is connected to eubacterial genes (Fig. P1A). Despite being homologous, these two groups do not exhibit significant sequence similarity to each other at the specified thresholds. These networks allow for the simultaneous comparison of distant homologs in Eukaryotes, Archaebacteria, and Eubacteria, a task that could not have been achieved using phylogenetic trees, because distant eukaryotic homologs could not have been aligned.

Statistical analysis of the network reveals that eukaryotic genes tend to be highly linked to either archaebacterial or eubacterial genes. Far fewer eukaryotic genes exhibit a similar number of archaebacterial and eubacterial homologs. Therefore, eukaryotic genomes contain two groups of clearly distinguishable genes: one group with strong archaebacterial affinities (which probably descended from the archaebacterial ancestor) and another group with strong eubacterial affinities (of likely eubacterial ancestry). Proteins with archaebacterial affinities tend to perform informational functions and to localize to the cytosol and the nucleus, whereas those with eubacterial affinities tend to be operational and to localize to the mitochondrion, cell wall, vacuole, and peroxisome, thereby indicating that both endosymbiotic partners contributed different parts and cellular compartments to the eukaryotic cell.

Surprisingly, our analyses demonstrate that, beyond the primordial genetic chimerism of Eukaryotes, the subsequent evolutionary fate of genes from these distinct prokaryotic sources differed in each eukaryotic lineage. Remarkably, genes of eubacterial ancestry appear to be more evolvable (i.e., increasing in number in some lineages and decreasing in others) than those of archaebacterial ancestry, whose number remained relatively constant in all lineages. As a result, we uncovered two types of eukaryotic genomes: Of the 14 eukaryotes studied, 9 possess more genes with eubacterial than archaebacterial affinities (in agreement with previous observations in humans and yeast; refs. 4, 5), whereas 5 present a predominance of genes with archaebacterial affinities. Moreover, the archaebacterial-to-eubacterial gene ratio negatively correlates with genome size, which might be the result of genes of eubacterial ancestry being preferentially lost during genome reductions (Fig. P1B).

Our results suggest that a functional perspective on evolution is required, in addition to the traditional genealogical perspective, to understand how genes from different evolutionary sources (e.g., eubacterial lineages, archaebacterial lineages) are stabilized in a third mosaic lineage over evolutionary times (e.g., Eukaryotes). Network-based analyses may increasingly become components of the effort to embrace such an expanded framework. The protocol and type of data that we introduce in this study will also play a major role in addressing many questions regarding deep evolutionary relationships, which continue to pose serious challenges in evolutionary studies.
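The summary does not state how a gene's archaebacterial or eubacterial affinity was scored, nor which correlation test links the gene ratio to genome size. The Python sketch below is therefore only one possible reading, under explicit assumptions: each eukaryotic gene is labeled by comparing its counts of archaebacterial and eubacterial neighbours against a hypothetical `margin`, a genome's archaebacterial-to-eubacterial ratio is the count of "Arc" genes divided by the count of "Bac" genes (balanced genes are set aside), and Spearman's rank correlation is swapped in as a plausible test of the ratio versus genome size. The function names and all numbers are invented for illustration.

```python
from collections import Counter


def affinity(neighbour_domains, margin=2.0):
    """Classify one eukaryotic gene from the domains ("Arc"/"Bac") of its prokaryotic neighbours.

    `margin` is a hypothetical cutoff: the gene is called "Arc" ("Bac") only if it has
    at least `margin` times as many archaebacterial (eubacterial) links as the other kind.
    """
    counts = Counter(neighbour_domains)
    arc, bac = counts["Arc"], counts["Bac"]
    if arc == 0 and bac == 0:
        return "none"
    if arc >= margin * bac:
        return "Arc"
    if bac >= margin * arc:
        return "Bac"
    return "ambiguous"


def arc_to_bac_ratio(genome):
    """Archaebacterial-to-eubacterial gene ratio for a genome given as gene -> neighbour domains."""
    labels = Counter(affinity(domains) for domains in genome.values())
    if labels["Bac"] == 0:
        return float("inf")
    return labels["Arc"] / labels["Bac"]


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Invented toy genome: each gene maps to the domains of its prokaryotic neighbours.
genome = {
    "g1": ["Arc", "Arc", "Arc"],
    "g2": ["Bac"],
    "g3": ["Bac", "Bac", "Arc"],
    "g4": ["Arc", "Bac"],  # balanced links: left ambiguous
    "g5": ["Arc"],
    "g6": ["Arc", "Arc"],
}
print(arc_to_bac_ratio(genome))

# Invented (number of genes, Arc:Bac ratio) pairs for five genomes; NOT the paper's data.
toy = [(2000, 3.1), (5000, 1.8), (9000, 1.2), (20000, 0.6), (27000, 0.4)]
rho = spearman([size for size, _ in toy], [ratio for _, ratio in toy])
print(rho)
```

In this invented example the genome ratio is 1.5, and rho is -1 up to floating-point rounding because the toy ratios fall strictly as gene number rises; it says nothing about the 14 real genomes. The sketch computes no p-value; `scipy.stats.spearmanr` returns both the coefficient and a p-value if a significance test is needed.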

1. Gribaldo S, Poole AM, Daubin V, Forterre P, Brochier-Armanet C (2010) The origin of eukaryotes and their relationship with the Archaea: Are we at a phylogenomic impasse? Nat Rev Microbiol 8(10):743–752.

2. Martin W, et al. (2007) The evolution of eukaryotes. Science 316(5824):542–543, author reply 542–543.

3. Gouy M, Li WH (1989) Phylogenetic analysis based on rRNA sequences supports the archaebacterial rather than the eocyte tree. Nature 339(6220):145–147.

4. Rivera MC, Jain R, Moore JE, Lake JA (1998) Genomic evidence for two functionally distinct gene classes. Proc Natl Acad Sci USA 95(11):6239–6244.

5. Alvarez-Ponce D, McInerney JO (2011) The human genome retains relics of its prokaryotic ancestry: Human genes of archaebacterial and eubacterial origin exhibit remarkable differences. Genome Biol Evol 3:782–790.
