Aix-Marseille Université
Faculté de Médecine de Marseille
Ecole Doctorale des Sciences de la Vie et de la Santé
DOCTORAL THESIS
presented and defended on 18 December 2013 by
Mano Joseph MATHEW
for the degree of Doctor of Aix-Marseille Université
Specialty: Human Pathology and Infectious Diseases
______________________________________________________________________________
Insight into intracellular bacterial genome repertoire
using comparative genomics
______________________________________________________________________________
Jury members:
Professor Jérôme ETIENNE Reviewer
Professor Max MAURIN Reviewer
Professor Jean-Louis MEGE President of the Jury
Professor Didier RAOULT Thesis Supervisor
Unité de Recherche sur les Maladies Infectieuses Tropicales et
Emergentes (URMITE), UM 63 CNRS 7278 IRD 198 INSERM 1095
To my Lord, precious family and friends…
Preamble
The format of this thesis follows a recommendation of the Infectious
Diseases and Microbiology specialty within the Master's programme in
Life and Health Sciences, which is part of the Ecole Doctorale des
Sciences de la Vie de Marseille. The candidate is required to follow
rules that include a thesis format used in Northern Europe, which allows
better organization than traditional theses. In addition, the
introduction and bibliography section is replaced by a review submitted
to a journal, so that the quality of the review can be evaluated
externally and the student can begin an exhaustive bibliography on the
field of the thesis as early as possible. The thesis is presented as
published, accepted or submitted articles, each accompanied by a brief
commentary giving the general sense of the work. This form of
presentation appeared better suited to the demands of international
competition and makes it possible to concentrate on work that will
benefit from international dissemination.
Professor Didier RAOULT
Abstract
Prokaryotic microorganisms are prevalent in all the
environments on Earth. Given their ecological ubiquity, it is not
surprising to find many prokaryotic species in close relationships
with members of many eukaryotic taxa, often establishing a
persistent association, which is known as symbiosis. Depending on
the fitness effects on the members of the symbiotic relationship,
associations can be referred to as parasitism, mutualism or
commensalism and, depending on the location of the symbiont with
respect to host cells, as ectosymbiosis or endosymbiosis. Genome
sequencing, especially Next Generation Sequencing (NGS), has
radically changed the face of microbiology and has helped to discern
how the diverse group of intracellular bacteria evolved to survive
and replicate in host cells. The initial purpose of my thesis is
therefore to use comparative genomics to understand genomic
variations based on coexistence, by examining data on the ancient
existence of intracellular bacteria, their host adaptation and the
differences between sympatry and allopatry. The first part of my
thesis is a review giving insight into intracellular bacterial genome
repertoire and symbionts. The goal of this review is to explore how
intracellular microbes acquire their specific lifestyle. Due to their
different evolutionary trajectories, these bacteria have different
genomic compositions. We reviewed data on the ancient existence
of intracellular bacteria, their host adaptation and the differences
between sympatry and allopatry. Furthermore, we elaborate on the
genomic repertoire to understand the phenomenon of gene loss in
intracellular bacteria. To understand the genomic repertoire and its
composition in intracellular bacteria, it is essential to understand
specialization in bacteria with respect to their niches. A comparison
of the genomic contents of bacteria with certain lifestyles revealed
the bacterial capacity to exchange genes to different extents,
depending on the ecosystem. Moreover, genomics has provided
important clues to the mechanisms driving the genome-reduction
process, the functions that are retained when a species becomes
intracellular, and the role of the host in molding the genomic
composition of intracellular bacteria. The second part of my thesis
presents the genome sequence of Diplorickettsia massiliensis
strain 20B, an obligate intracellular, Gram-negative
bacterium isolated from Ixodes ricinus ticks collected in Slovakia. In
the third part, we investigated the genome repertoire of
Diplorickettsia massiliensis in comparison with closely related
bacteria according to their niches, which revealed its allopatric
lifestyle. In this study, we compared the genomic features of
Diplorickettsia massiliensis with twenty-nine sequenced
Gammaproteobacteria species (Legionella strains, Coxiella burnetii
strains, Francisella tularensis strains and Rickettsiella grylli) using
a multi-genus pangenomic approach. This thesis work provides original
data and sheds light on intracellular bacterial diversity.
Keywords: Intracellular bacteria, Diplorickettsia massiliensis,
genome repertoire, allopatry, sympatry, pangenome,
Gammaproteobacteria
Résumé
Microorganisms are present in almost every habitat on the planet.
Given their ecological ubiquity, it is not surprising to find many
prokaryotic species in close relationships with members of many
eukaryotic taxa, often establishing a persistent association called
symbiosis. Depending on the interactions between the partners within
this symbiotic relationship, it can be considered parasitism,
mutualism or commensalism and, depending on the location of the
symbiont with respect to the host cells, ectosymbiosis or
endosymbiosis. Genome sequencing, in particular high-throughput
sequencing (NGS), has greatly improved our understanding of the
evolution of the different groups of intracellular bacteria and of
their survival within host cells. The aim of this thesis is therefore
to understand, with the help of comparative genomics, the genomic
variations linked to coexistence, by examining data on the ancient
existence of intracellular bacteria, their adaptation to their host
and the differences between sympatry and allopatry. The first part of
my thesis is a review giving an overview of the genomic repertoire of
intracellular bacteria and their symbionts. The objective of this
review is to explore the process by which intracellular bacteria
acquire their specific lifestyle. Because of their different
evolutionary paths, these bacteria have different genomic
compositions. We began by examining data on the ancient existence of
intracellular bacteria, their adaptation to their host and the
differences between sympatry and allopatry. In addition, we explored
the genomic repertoire of these bacteria to understand the phenomenon
of gene loss in intracellular bacteria. To understand the genomic
repertoire and its composition in intracellular bacteria, it is
necessary to understand the specialization of these bacteria with
respect to their niches. A comparison of the genomic content of
several bacteria with different lifestyles revealed the capacity of
bacteria to exchange genes to different degrees, depending on the
ecosystem. Moreover, genomics has provided important clues about the
mechanisms driving genome reduction, the functions that are retained
when a species becomes intracellular and the influence that the host
can have on the genomic composition of intracellular bacteria. The
second part of my thesis concerns the genome sequence of
Diplorickettsia massiliensis strain 20B, an obligate intracellular
Gram-negative bacterium isolated from Ixodes ricinus ticks from
Slovakia. In the third and final part, we explored the genome
repertoire of Diplorickettsia massiliensis by comparing it with the
genomes of phylogenetically very close bacteria from different niches.
This revealed its allopatric lifestyle. In this study, we compared the
genome features of Diplorickettsia massiliensis with twenty-nine
sequenced Gammaproteobacteria species (Legionella, Coxiella burnetii,
Francisella tularensis and Rickettsiella grylli) using the multi-genus
pangenomic approach. This thesis work provides original data and sheds
more light on the diversity of intracellular bacteria.
Keywords: Intracellular bacteria, Diplorickettsia massiliensis,
genomic repertoire, sympatry, allopatry, pangenome,
Gammaproteobacteria
Contents
Preamble
Abstract
Résumé
Contents
1 Chapter One: Introduction
2 Chapter Two: Review
2.1 Review:
Genome repertoire of intracellular bacteria and symbionts
3 Chapter Three: Genome sequencing of intracellular bacteria
3.1 Article 1:
Genome Sequence of Diplorickettsia massiliensis,
an Emerging Ixodes ricinus-Associated Human Pathogen
4 Chapter Four: Comparative genomics
4.1 Article 2:
The genomic repertoire of Diplorickettsia massiliensis
reveals its allopatric lifestyle
5 Chapter Five: Conclusions
5.1 Conclusions and perspectives
5.2 Future perspective
Bibliography
Acknowledgements
Chapter 1
Introduction
This section introduces studies on intracellular bacteria and their
interactions with different niches. In the past, microbiologists were
mainly restricted to the study of microorganisms that could be isolated
and grown on relatively simple media. This made it almost impossible to
study species that cannot survive outside their hosts and severely
limited our knowledge of the genetics of these organisms. Advances in
culture techniques and genome sequencing now allow these organisms to be
studied, and the results of these endeavours have revealed their
complete genetic code and provided powerful insights into their
intricate relationships with their hosts. Three microbial categories
have been defined based on their niches: free-living, facultative
intracellular and obligate intracellular bacteria.
The genomes of intracellular bacteria are extremely varied.
Examples of facultative intracellular bacteria, which can multiply inside
vacuoles, include Legionella pneumophila, Francisella tularensis
and Mycobacterium tuberculosis, and the obligate intracellular
bacteria include Chlamydia spp., whereas Listeria monocytogenes,
Shigella flexneri, enteroinvasive Escherichia coli and some Rickettsia spp.
are able to enter and replicate in the cytosol of mammalian cells (Zientz,
et al., 2004). Intracellular bacteria need factors to recognize, invade and
replicate within the host cells when their intracellular phase is transient.
The intracellular location may facilitate the use of host
metabolites, which support bacterial multiplication in a relatively safe
host compartment devoid of potent host defense mechanisms. Moreover,
the intracellular compartment may allow the spread of bacteria within
the host and, after escaping the host cells, the bacteria may be released
into the environment or directly transmitted to another host organism
(Finlay & Falkow, 1997; Gross, et al., 2003; Zientz, et al., 2004). Genome
sequencing, especially Next Generation Sequencing (NGS), has
radically changed the face of microbiology and has helped to discern how
the diverse group of intracellular bacteria evolved to survive and replicate
in host cells. In the first part of my thesis, we reviewed the literature to
summarize the knowledge on the ancient existence of intracellular
bacteria, their host adaptation and the differences between sympatry and
allopatry. Moreover, genomics has provided important clues to the
mechanisms driving the genome-reduction process, the functions that are
retained when a species becomes intracellular, and the role of the host in
molding the genomic composition of intracellular bacteria (Chapter 2).
Subsequently my thesis work proceeds from the observation that,
despite the recent advent of sequencing techniques, little is still known
about the interactions between intracellular bacteria and various niches.
In the second part of my thesis, we report our work on the genome
sequencing and completion of Diplorickettsia massiliensis strain 20B,
an obligate intracellular, Gram-negative bacterium isolated from
Ixodes ricinus ticks collected in Slovakia. D. massiliensis belongs to the
Gammaproteobacteria class, is non-endospore-forming, and is shaped as
small rods that are usually grouped in pairs. An initial phylogenetic
analysis based on 16S rRNA showed that Diplorickettsia massiliensis
clustered with Rickettsiella grylli. Because of its low 16S rDNA similarity
(94%) with R. grylli, it was classified in a new genus, Diplorickettsia,
within the family Coxiellaceae and the order Legionellales. D. massiliensis
strain 20B was identified in three patients with suspected tick-borne
infections who exhibited a specific seroconversion. The evidence of
infection was further confirmed by PCR assay, thus establishing its role
as a human pathogen. Therefore, we were interested in understanding the
genome repertoire of Diplorickettsia massiliensis.
Furthermore, we investigated the genome repertoire of
Diplorickettsia massiliensis in comparison with closely related bacteria
according to niche, which revealed its allopatric lifestyle. In this study,
we compared the genomic features of Diplorickettsia massiliensis with
twenty-nine sequenced Gammaproteobacteria species (Legionella strains,
Coxiella burnetii strains, Francisella tularensis strains and
Rickettsiella grylli) using a multi-genus pangenomic approach, shedding
light on intracellular bacterial diversity.
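To make the idea of a pangenomic comparison concrete, the following minimal sketch partitions gene families into core (present in every genome), accessory (present in several but not all) and unique (present in a single genome) sets. It is only an illustration under stated assumptions: the function name partition_pangenome, the gene-family identifiers and the toy genome contents are all invented here and do not reproduce the pipeline or data used in this thesis.

```python
from collections import Counter


def partition_pangenome(genomes):
    """Split gene families into core, accessory and unique sets.

    genomes: dict mapping a genome name to the set of gene-family
    identifiers found in that genome (hypothetical input format).
    """
    # Count in how many genomes each gene family occurs.
    counts = Counter(fam for families in genomes.values() for fam in set(families))
    n_genomes = len(genomes)
    core = {fam for fam, c in counts.items() if c == n_genomes}
    unique = {fam for fam, c in counts.items() if c == 1}
    accessory = set(counts) - core - unique
    return core, accessory, unique


# Toy input: family names and genome contents are invented for illustration.
toy_genomes = {
    "D_massiliensis": {"famA", "famB", "famC"},
    "R_grylli": {"famA", "famB", "famD"},
    "L_pneumophila": {"famA", "famD", "famE", "famF"},
}
core, accessory, unique = partition_pangenome(toy_genomes)
print("core:", sorted(core))
print("accessory:", sorted(accessory))
print("unique:", sorted(unique))
```

In a real analysis the gene families would first have to be defined by orthology clustering of the predicted proteins, a step that this sketch deliberately leaves out.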
Chapter 2
Review: Genome repertoire of intracellular
bacteria and symbionts
2.1 Review:
Genome repertoire of intracellular bacteria and symbionts
Mano J. Mathew1 and Didier Raoult1*
1 Unité de Recherche sur les Maladies Infectieuses et Tropicales
Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,
IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,
13005, Marseille, France.
Submitted to FEMS Microbiology Reviews
*Corresponding author. E-mail: didier.raoult@gmail.com
Keywords: Genome repertoire, intracellular, host-microbe, facultative,
obligate, genome reduction, virulence, secretion system
Abstract
The recent explosion in knowledge of the diverse group of
intracellular bacteria has helped to discern how these microbes
evolved to survive and replicate in host cells. This review highlights
the genomic repertoire of intracellular bacteria and symbionts by
examining data on the ancient existence of intracellular bacteria,
their host adaptation and the differences between sympatry and
allopatry. Moreover, genomics has provided important clues to the
mechanisms driving the genome-reduction process, the functions
that are retained when a species becomes intracellular, and the role
of the host in molding the genomic composition of intracellular
bacteria. This wealth of information will contribute to
a better understanding of the interactions between intracellular
bacteria and various niches.
Contents
Introduction
Intracellular bacteria: an ancient outlook
Sympatric and allopatric lifestyles
Genomic repertoire
– Bias in base composition
– Metabolic variations
– Ribosomal split operons
– Other observations
Loss of non-virulent genes in intracellular bacteria
Gene duplication facilitating adaptation in intracellular bacteria
Mobilome of intracellular bacteria
– General distribution of the mobilome in intracellular bacteria
– Types of mobile genetic elements
– Transposable elements
– Repeated palindromic elements (RPEs)
– Ankyrin and tetratricopeptide repeat proteins
Secretion system machinery in intracellular bacteria
Concluding remarks
Acknowledgements
References
Introduction
Understanding the genome repertoire of intracellular bacteria and
symbionts cannot be considered without first grappling with the
uncertainties and ambiguities in the meanings of the terms repertoire,
intracellular bacteria and symbionts. Additionally, none of these terms
lends itself to a straightforward explanation. This confusion must be
addressed before delving into this complex subject. The term 'genome
repertoire' connotes the entire genomic composition of an organism. The
goal of this review is to explore the microbes that reside inside cells and
how they came to acquire this specific lifestyle. Three microbe categories
have been defined based on their niches: free-living, facultative
intracellular and obligate intracellular bacteria. The genomes of
intracellular bacteria are extremely varied. Examples of facultative
intracellular bacteria, which can multiply inside vacuoles, include
Legionella pneumophila, Francisella tularensis and
Mycobacterium tuberculosis, and the obligate intracellular bacteria
include Chlamydia spp., whereas Listeria monocytogenes, Shigella
flexneri, enteroinvasive Escherichia coli and some Rickettsia spp. are able
to enter and replicate in the cytosol of mammalian cells [1]. Intracellular
bacteria need factors to recognize, invade and replicate within the host
cells when their intracellular phase is transient. The intracellular location
may facilitate the use of host metabolites, which support
bacterial multiplication in a relatively safe host compartment devoid of
potent host defense mechanisms. Moreover, the intracellular
compartment may allow the spread of bacteria within the host and,
after escaping the host cells, the bacteria may be released into the
environment or directly transmitted to another host organism [1-3].
These intracellular bacteria possess mechanisms to invade host cells
and to protect themselves within them. Legionella pneumophila induces its own
uptake and blocks lysosomal fusion; otherwise, lysosomes would degrade
the bacteria [4]. It also uses a Type IV secretion system known as Dot/Icm
to inject into the host cell the effector proteins required for bacterial
survival [5]. Meanwhile, Salmonella and Mycobacterium spp.
are highly resistant to intracellular killing by phagocytic cells [6, 7].
A comprehensive list of intracellular bacteria is shown in Table 1.
Obligate intracellular bacteria cannot multiply outside host cells, as
they lack many biosynthetic pathways; hence, they are dependent on host
cells. These bacteria are also known as obligate endosymbionts, which
multiply exclusively inside the cells of many eukaryotic organisms and
usually have no extracellular state. Compared with their free-living
relatives, obligate intracellular bacteria exhibit a set of features shared by
intracellular parasites and endosymbionts. They tend to have small
population sizes, and their genomes are usually small and show marked
AT nucleotide biases, increased rates of nucleotide substitution, random
accumulation of deleterious mutations, accelerated sequence evolution
and the loss of genes that are involved in recombination and repair
pathways [8].
The word symbiont, originating from the Greek simbios, or living
together, was first introduced by Anton de Bary in 1879 and was defined
as “the permanent association between two or more organisms of
different species, at least during a part of the life cycle” (Gil, et al., 2004).
Considering the ecological ubiquity of bacteria, it is not surprising to find
many species in close relationships with members of several eukaryotic
taxa. Depending on the fitness effects on the members of the symbiotic
relationship, the relationship can be referred to as parasitism, mutualism
or commensalism. Based on the location of the symbiont in relation to the
host cells, these relationships may be ectosymbiotic or endosymbiotic.
Rickettsia are frequently identified in close relationships with arthropod
vectors that may assist in the transmission of the organism to mammalian
hosts [9]. Between 15 and 20% of the known insects have symbiotic
relationships with bacteria, making them the most species-rich group. The
nutritional enrichment that bacteria offer to insects could be an
interesting factor in the evolutionary success of this group [10, 11].
The recent explosion in knowledge of bacterial pathogenesis has
assisted efforts to discern why certain intracellular bacteria have evolved
to survive and replicate in host cells as part of their pathogenic
mechanisms. Recent developments in genomics have introduced concepts
such as bacterial genome expansion and reduction, which have provided
insight into bacterial genome evolution [12]. The comparison of free-living
and intracellular bacteria has revealed dramatic differences in genome
size and content. In this paper, we review the genomic repertoire of
intracellular bacteria and symbionts. We begin by reviewing data on the
ancient existence of intracellular bacteria, their host adaptation and the
differences between sympatry and allopatry. Furthermore, we elaborate
on the genomic repertoire to understand the phenomenon of gene loss in
intracellular bacteria.
Intracellular bacteria: an ancient outlook
Ecologists and biologists are fascinated with the enormous diversity of
bacteria that complete their life cycles within, or closely associated with,
eukaryotic cells. Symbiosis between unicellular and multicellular
organisms has contributed considerably to the evolution of life on Earth
[13]. These interactions include a broad range of effects on hosts, from
invasive pathogenesis to obligate relationships in which the hosts depend
on infection for survival or reproduction. Bacterial associations can be
difficult to categorize, although many bacteria can be unambiguously labeled
as mutualists (bacteria that increase the fitness of the host) or as parasites
(bacteria that decrease the fitness of the host).
Intracellular bacteria are found in a wide range of niches. Due to
their different evolutionary trajectories, these bacteria have different
genomic compositions. These intracellular bacteria, unfortunately, have
no fossil record to assist scientists in determining when they acquired the
ability to survive inside other organisms. Based on an endosymbiotic
origin for mitochondria and other eukaryotic organelles [14, 15], we
infer that the intracellular lifestyle is ancient and predated the
emergence of eukaryotic organisms. Intracellular bacteria exhibit three
important properties: a) size differences compared to non-intracellular
bacteria, b) a mechanism for insertion into the host and c) survival within
the host. These initial interactions could have resulted in the survival of
symbiotic microbes. The situation in which the survival of the host occurs
at the expense of the microbe is termed predation, the situation in which
the host is harmed is called intracellular pathogenesis and the situation in
which the microbes are damaged is called incompatibility or antagonism.
Each situation is subject to selection that allows the emergence of varied
types of bacteria-host relationships. In the case of insects, the arthropod
lineage arose 385 million years (MY) ago and swiftly diversified [8]. The
early establishment of symbiotic relationships among insects and bacteria
approximately 300 MY ago and the nutritional advantage that these
bacteria offered to insects could have been key factors in the evolutionary
success of this group [8]. The mealybug beta-proteobacterial endosymbiosis
was the first stable intracellular symbiotic association identified involving
two species of bacteria [16]. Facultative or obligate intracellular bacteria
can be identified throughout the tree of life, from protists (eukaryotic
microorganisms) to multicellular plants and animals [17]. The
Rickettsiales order, which belongs to Alphaproteobacteria, comprises
obligate intracellular bacteria that are closely related to the ancestor
of mitochondria, having diverged approximately 850–1500 MY ago [9].
Rickettsiales species have well-known close relationships with varied
eukaryotic hosts, as shown by the manipulation of cellular processes such
as host reproduction [18, 19]. These relationships have led to a massive
integration of bacterial genome fragments into host cells [20, 21]. Studies
on Rickettsiales have thus improved the knowledge of intracellular
bacteria contemporaneous with mitochondrial origin, as parts of a
Rickettsiales genome were found integrated into the nucleus of one
eukaryotic host, and another genome fragment was found to be
integrated into the mitochondrial genome of another host [22]. In another
striking example of lateral genetic transfer, nearly the entire Wolbachia
genome was found to be integrated into the genome of its host [20, 23]. A
recent study on mitochondrial protein-based phylogeny suggested that
Rickettsiales and Rhizobiales may have diverged 1.5 billion years ago (BYA)
[24, 25]. Their fusion likely created the first mitochondrion approximately
1 BYA [24]. Additionally, the origin of mitochondrial genes is not limited to
the Rickettsiales, and the transfer of these genes did not happen in a
single event but rather through numerous successive events [24, 25].
These studies clearly establish that the intracellular bacterial lifestyle is
ancient and constantly co-evolving with the host [26]. Before we
understand the genomic repertoire and its composition in intracellular
bacteria, it is essential to understand specialization in bacteria with
respect to their niches.
Sympatric and allopatric lifestyles
A comparison of the genomic contents of bacteria with certain lifestyles
revealed the bacterial capacity to exchange genes to different extents,
depending on the ecosystem [27]. Allopatric speciation in bacteria is
associated with restricted opportunities to exchange genes with other
organisms, although gene duplication, mutation and deletion are more
frequently observed. A prominent example is the association of Rickettsia
prowazekii with the human louse Pediculus humanus [9]. Allopatry is
generally associated with genome reduction, especially in pathogens that
have a small genomic repertoire compared to less specialized bacteria. In
sympatry, multiple bacteria infect the same host and thus undergo
massive genetic exchange [28]. Some authors [29, 30] have identified
which bacteria participate in each intracellular lifestyle. For example,
strictly intracellular bacteria that live in narrow niches are allopatric, and
intracellular bacteria that live in amoebae are sympatric, as in the case of
Legionella sp., for which amoebae constitute a site of DNA
exchange [29, 30]. Intracellular bacteria that have sympatric relationships
within amoebas exhibit larger genomes than their relatives [30]. The
bacteria that live in a sympatric manner interact with many other bacteria
belonging to divergent phyla, allowing them to share genes at an
increased rate. The sympatric lifestyle is associated with larger genomes,
larger pan-genomes, a larger mobilome and genetic exchanges with other
bacteria. These bacteria often have more genes, ribosomal operons,
better metabolic capacities and significant resistance to antibiotics [31].
Gene recombination is found in sympatric organisms, resulting in genetic
diversity [32]. In Rickettsia felis, using a single gene phylogenetic
approach, researchers found that some genes could be linked to those of
other bacteria, namely Rickettsia bellii, Rickettsia typhi, Legionella sp. and
Francisella sp. [32]. The different sizes and functions of the genes
suggested random horizontal gene transfer in R. felis [32]. Bacteria in
sympatric environments have conserved genomes with phenotypic
plasticity and exhibit species complexity. Species complexity may have
promoted varied genomic repertoires that produced environmentally
adaptable alternative phenotypes [33]. However, like several obligate
pathogens, many of these obligate intracellular endosymbionts have an
extraordinary genome repertoire, an extremely reduced genome size and
correspondingly less coding capacity [34]. Hence, it is likely that the
mutual relationships of these bacteria with their host cells may have
promoted genome reduction. Thus, it is important to understand the
dynamics of the processes whereby new genes are acquired and old genes
are removed.
Genomic repertoire: an insight
Complete genome sequences are available for many bacteriome-
associated symbionts with shared features. The genome size, number of
genes and G + C content of intracellular bacteria, which have become
reduced during specialization to an intracellular niche, reflect a
continual selective pressure for a minimal genome [35]. One reason for
this reduction could be that an intracellular niche reduces the possibility
of gene acquisition by lateral gene transfer (LGT) [31, 36-38]. Genes may
also be lost upon adaptation to the niche [39, 40]. In free-living bacteria,
the G + C base composition is close to 50%; in obligate intracellular
bacteria, it ranges from 16.5% to 33%. The genome sizes of these bacteria
vary depending on the host adaptation stage [41].
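The G + C percentages quoted above are simple base counts. As a minimal sketch, assuming the genome is available as a plain string of nucleotides, the helper below (gc_content, a name chosen here for illustration) returns the G + C percentage while ignoring ambiguous symbols such as N. The example sequence is invented and is not taken from any of the genomes discussed.

```python
def gc_content(sequence):
    """Return the G + C percentage of a DNA sequence.

    Symbols other than A, C, G and T (for example N) are ignored,
    which is one possible convention among several.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


# Invented, strongly A + T-biased toy sequence.
toy_sequence = "ATGCATATTAAT"
print(f"G + C content: {gc_content(toy_sequence):.1f}%")
```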
Bias in base composition
The most extreme bias in base composition is the uninterrupted shift
towards an increased A + T content. This content is highest at sites that
are neutral or near neutral with respect to selection, such as silent
positions in codons and intergenic spacers. A high A + T content is favored
by mutational bias and is also commonly found in obligate pathogenic
bacteria such as Rickettsiales and Chlamydiales. The bias has an important
effect on the amino acid composition of proteins. In the Buchnera
genome, the silent sites and spacers have less than 10%
G + C content, while the overall genome has 25–30% G + C
content [42-45]. In general, the mutational bias reflects the loss of DNA
repair pathways. Supporting this trend, many repair genes are retained in
Baumannia cicadellinicola, which has 33% G + C content, whereas no
repair genes are retained in Carsonella ruddii and Sulcia muelleri, which
have 16.5% and 22% G + C content, respectively [46].
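The contrast between silent sites and the genome as a whole can be illustrated by comparing the overall G + C fraction of a coding sequence with the fraction at third codon positions, often called GC3. This is only a rough sketch: third positions are used here as a proxy for silent sites, which is an assumption and not the method of the cited studies, and the function names and the toy coding sequence are invented for illustration.

```python
def gc_fraction(bases):
    """Return the G + C fraction (0 to 1) of a string of nucleotides."""
    counted = [b for b in bases.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("no A, C, G or T bases to count")
    return sum(b in "GC" for b in counted) / len(counted)


def gc3(cds):
    """Return the G + C fraction at third codon positions.

    cds must be an in-frame coding sequence; third positions serve
    here as an approximation of silent sites.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return gc_fraction(cds[2::3])


# Invented coding sequence: ATG GCA GGT CCA TAA
toy_cds = "ATGGCAGGTCCATAA"
print(f"overall G + C: {gc_fraction(toy_cds):.2f}")
print(f"GC3: {gc3(toy_cds):.2f}")
```

A GC3 value well below the overall value, as in this toy case, is the kind of pattern expected when mutational bias towards A + T acts most strongly on sites under weak selection.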
Metabolic variations
Compared to free-living bacteria, host-dependent bacteria exhibit fewer
transcriptional regulators, as determined from a statistical comparative
analysis of 317 bacterial genomes from bacteria with different lifestyles
[47]. Genes involved in translation modification and transcription are
often among the lost genes [47]. In bacteriocyte-associated symbionts such
as Carsonella ruddii and Sulcia muelleri [44, 48], genes involved in
important processes such as translation, replication and transcription
are depleted, along with genes
required for the production of cell envelope components [36, 49, 50]. The
suggestion that host functions can replace those of the original bacterial
cell envelope can be demonstrated by the enclosure of symbionts in a
host-derived membrane within the bacteriocytes (Buchnera aphidicola,
Sulcia muelleri and Carsonella ruddii); these symbionts lose a greater
proportion of genes involved in the production of the cellular envelope
than those of the symbionts that are free in the cytosol (Wigglesworthia
glossinidia, Candidatus Blochmannia). Bacterial symbionts that live in
harmony within host mitochondria or host nuclei [51, 52], and mutualistic
bacterial symbionts dwelling within different types of bacterial symbionts
in the host cytoplasm are examples of rare close associations [16, 53]. Put
differently, the transition from free living to intracellular culture is
facilitated with the loss of large segments of DNA [8, 54]. Rickettsia spp.
have lost many genes needed for metabolic pathways, including those for
sugar, purine and amino acid metabolism [55]. Similarly, the loss of DNA
for host adaptation was observed in Candidatus Blochmannia,
which is an obligate endosymbiont of ants [56]. Conversely, gene
acquisition can be observed in L. pneumophila, a bacterium that is
closely associated with amoebae [57]. Genome reduction is also
associated with increased pathogenicity, as seen for Rickettsia conorii and
Rickettsia prowazekii [18, 47, 58, 59].
Ribosomal split operons
In a recent study on intracellular bacteria, several abnormal or split
ribosomal operons were identified. This abnormal feature occurred
independently in several groups of specialized bacteria [60]. Split
ribosomal operons are found in Rickettsiales, Helicobacter pylori and
Leptospira species, the group containing Mycoplasma and Buchnera and
recently, in Bartonella birtlesii. In the study on B. birtlesii, the authors
found that disrupted genes belonged to the translation COG and
ribosomal operon. The number of activated genes in a restricted
environment is much lower than that in a changing environment, as the
translation genes are not used extensively. If the bacteria do not use many
ribosomal operons in their current environment, they often lose them,
and restricting translation is critical for specialization, as speciation is
often correlated with ribosomal operon inactivation [47, 60]. In another
comparative genomic analysis of free-living and host-dependent bacteria,
the host-dependent bacteria exhibited fewer rRNA genes, more split rRNA
operons and fewer transcriptional regulators, characteristics that are
linked to slow growth rates [47]. The identification of function-dependent
and non-random loss of 100 orthologous genes in the analyzed
intracellular bacteria revealed that these bacteria from different phyla
underwent convergent evolution by specialization according to their niche
[47]. The ribosomal RNA (rRNA) genes are classically organized in operons
with the general structure 16S-23S-5S; transfer RNA (tRNA) genes are
typically found in the spacer between the 16S and the 23S rRNA genes
[47]. Intracellular bacteria have fewer copies of each rRNA gene than free-
living bacteria and significantly lower copy numbers of typical rRNA
operons. In obligate intracellular bacteria such as Rickettsia sp., split rRNA
operons are important evolutionary factors [61]. The co-adaptation of
host genes and the modification of ancestral bacterial genes create the
base for symbiosis [62]. A minimal genome size is typically observed in
sequenced symbiont genomes.
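The distinction between a classical 16S-23S-5S operon and a split operon can be made concrete with a small classifier over annotated gene coordinates. This is only a sketch under stated assumptions: the `RRNAGene` record, the `classify_operon` function and the 2000 bp gap threshold are hypothetical choices for illustration and are not taken from the cited studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RRNAGene:
    kind: str    # "16S", "23S" or "5S"
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def classify_operon(genes: List[RRNAGene], max_gap: int = 2000) -> str:
    """Return "canonical" for a contiguous 16S-23S-5S unit, otherwise "split".

    max_gap is an assumed upper bound (in bp) on the spacer between
    neighbouring rRNA genes of one operon.
    """
    kinds = sorted(g.kind for g in genes)
    strands = {g.strand for g in genes}
    if kinds != ["16S", "23S", "5S"] or len(strands) != 1:
        return "split"
    forward = strands == {"+"}
    # Put the genes in transcription order for their strand.
    ordered = sorted(genes, key=lambda g: g.start, reverse=not forward)
    if [g.kind for g in ordered] != ["16S", "23S", "5S"]:
        return "split"
    for upstream, downstream in zip(ordered, ordered[1:]):
        if forward:
            gap = downstream.start - upstream.end
        else:
            gap = upstream.start - downstream.end
        if gap > max_gap:
            return "split"
    return "canonical"


if __name__ == "__main__":
    linked = [RRNAGene("16S", 1000, 2500, "+"),
              RRNAGene("23S", 3000, 5900, "+"),
              RRNAGene("5S", 6000, 6120, "+")]
    separated = [RRNAGene("16S", 1000, 2500, "+"),
                 RRNAGene("23S", 500000, 502900, "+"),
                 RRNAGene("5S", 503000, 503120, "+")]
    print(classify_operon(linked), classify_operon(separated))
```

Counting the two classes per genome would give the kind of operon tallies that the comparisons between free-living and host-dependent bacteria rely on, although a real analysis would need to handle partial genes and multiple operon copies.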
Other observations
Adenine-specific DNA methylase is an enzyme that methylates specific
DNA targets, namely GANTC for alphaproteobacteria, resulting in a
reduction of the thermodynamic stability of the DNA. This alteration
changes transcriptional regulation, which is important in host-pathogen
interactions and is missing in specialized bacteria [60]. Another distinctive
attribute of obligate symbionts is the elevated expression of heat shock
proteins, which is linked to lower thermal stability [60]. In Buchnera and
other obligate intracellular symbionts, the expression of the chaperonin
GroEL is elevated [63, 64]. Based on microarray and
quantitative RT-PCR studies on available genome sequences, in the
absence of stress, other heat shock proteins also show unusually elevated
expression in these bacteria [65, 66]. It is likely that a compensatory
adaptation balances the effects of mutations genome-wide with lower
protein stability [43, 45, 67].
Loss of non-virulent genes in intracellular bacteria
After understanding the various elements involved in gene loss, it is
important to understand how gene loss must have occurred in
intracellular bacteria. Two crucial mechanisms of evolution, Lamarckian
and Darwinian, have been commonly studied [68]. The central Lamarckian
concept is that phenotypic changes result from adaptation to a niche and
can be transmitted vertically [69]. In contrast, in the present vision of
evolutionary biology and in agreement with post-Darwinian experiments,
genetic modifications produce phenotypic changes and precede the
selection of the fittest individuals in a given niche. In this situation,
genotypic changes precede phenotypic changes. Lamarckian evolution
may have been involved in bacterial speciation events associated with a
reduction in the genome size [47], a finding that contradicts the dominant
model in which speciation and fitness gains are linked to an increase in
the gene repertoire. Thus, the main course of speciation (through
adaptation to a given environment) is usually through allopatry [70] and is
related to genome size reduction through the loss of useless genes,
according to the Lamarckian model described by Moran, "use it or lose it"
[40]. In several intracellular pathogens, namely Shigella, Salmonella and
Francisella tularensis, when certain genes were inactivated or deleted, the
bacteria became pathogenic. These genes are called antivirulence genes
[71]. Gene loss is seminal to specialization. As an excellent example, 100
orthologous genes were lost in all specialized bacteria, as determined by a
comparative analysis of 317 bacterial genomes from different niches [47].
The most notable genes were associated with ribosomal operons,
translation regulation and metabolism [47]. In the study on B. birtlesii, the
identification of a deletion in one of the two rRNA operons and
disruptions in genes that are associated with translation showed the
importance of translation for specialization in a specific niche [60]. Other
interesting features of intracellular bacteria include gene duplication,
which facilitates adaptation to different environments; the mobilome,
which transports virulence genes (repeat elements that cause instability
and lead to evolution); and a secretion system, which assists in bacterial
colonization, invasion and survival within the niche.
Gene duplication facilitating adaptation in intracellular
bacteria
Gene duplication facilitates the adaptation of bacteria to changing
environments and new niches [72]. The high number of duplicated genes
in small intracellular bacterial genomes, including those of Rickettsia
species, constitutes an intriguing phenomenon. After gene duplication,
the copies undergo one of three possible processes: they may retain the
same function and produce an increased amount of the gene product;
they may accumulate deleterious mutations and become non-functional;
or under positive selection, they may acquire divergent mutations and
eventually evolve new functions and confer a selective advantage in a
new niche [73-75]. For example, the Rickettsia prowazekii and Rickettsia
conorii genomes both contain two copies of the virB4 gene that are
distantly related to each other and have evolved under different
functional constraints [18]. These copies show differences in non-
synonymous substitution frequencies, indicating different functions and
counter-selective constraints within the same genome [76]. In
sequenced Rickettsia spp., spoT paralogs (4–14 copies) were found. In
Escherichia coli, the spoT gene, like the relA gene, controls the
concentration of alarmone [(p)ppGpp, guanosine tetra- and
pentaphosphates] in response to starvation. Alarmone acts as an effector of
transcription, creating changes in cellular metabolism and (p)ppGpp-
mediated regulation, which may be involved in pathogenesis and bacterial
symbiosis [77]. All 14 spoT genes were transcribed in Rickettsia felis [78]
whereas, interestingly, the five spoT genes present in R. conorii were
differentially regulated depending on the niche. Gene families such as TLc,
ProP, AmpG and Sca have been identified in Rickettsia spp., in which
multiple copies of TLc, which exchanges ADP for host cytoplasmic ATP,
may be important for efficient host cell adaptation [78]. The multiple
copies of the proline/betaine transporter ProP seem to play an important
role in the adaptation of Rickettsia spp. to osmotic stress and to host
temperature conditions. AmpG may confer natural resistance to β-lactam
antibiotics, and Sca proteins function in host-parasite interactions and
adaptive responses to host defense systems [59]. A genome analysis of
Rickettsia spp. disclosed 17 members of the Sca family that showed
diverse patterns of expression across various species and whose N-
terminal domains were highly variable, which may have facilitated
immune evasion and persistent growth [78, 79].
Mobilome of intracellular bacteria
In recent years, much data on the distribution of mobile genetic elements
in bacterial genomes has become available [38, 80]. The genomic data
so far indicate that most bacterial genomes contain elements of viral
origin, and in some cases these elements make up to 20% of the host genome [81].
These mobile DNA elements, such as prophages, contribute more than
50% of the strain-specific DNA in many important pathogens [82-84] and
are common transporters of virulence genes in bacteria [85-87]. They
constitute the mobilome and include transposable elements, plasmids,
bacteriophages and associated genes for which horizontal movement is
critical [88, 89]. For this reason, understanding the mobilome of
intracellular bacterial genomes is necessary.
General distribution of mobilome in intracellular bacteria
Few mobile genetic elements are observed in free-living organisms
with larger genome sizes of 4–10 Mb. Facultative intracellular bacteria are
not restricted by host replication and are capable of living and
reproducing either inside or outside of host cells, as is the case for some
pathogenic bacteria. Their genome sizes of 2–7 Mb are similar to those of
some free-living organisms, and they have intermediate population sizes
[90]. The number of mobile genetic elements found in obligate
intracellular and facultative intracellular bacteria shows similar ranges, but
facultative intracellular species contain four-fold more mobile DNA
elements than obligate intracellular bacteria. This observation is
consistent with predictions that these elements are similar to those of
free-living obligate species [90]. Wolbachia pipientis is an exception; its
mobile genetic elements comprise less than 2% of its genome. This
estimate is similar to the lower end of the range of facultative intracellular
bacterial species [91]. Reductive evolution is supported by the small
genome size and deletion biases [92]. The Rickettsiales order shows
reductive evolution and also contains various families of mobile elements,
such as plasmids, transposases, and phage-related genes [32, 61].
Types of mobile genetic elements
There are three main classes of mobile genetic elements that occur in
prokaryotes: a) The first are small pieces of extrachromosomal DNA that
are either linear or circular and mostly replicate independently in the
host. These elements are called plasmids and are subject to evolution.
Lateral transfer from a donor to a recipient bacterial cell by direct contact
between the cells occurs via conjugative plasmids. b) Phage elements, as
the name suggests, are derived from phages, which are viruses of bacteria
that use the host machinery to replicate by a process in which the DNA of
the phage enters the host cell and integrates into the bacterial genome as
a prophage. These integrated prophage DNA molecules are passively
inherited until DNA excision and phage-induced lysis of the bacterial cell
takes place [93]. c) Transposable elements are DNA segments, usually
flanked by short inverted repeats, that typically encode proteins that
help move genes and, in a few cases, are
embedded in the prophage regions [94]. Genome analysis of Rickettsiae
revealed a large fraction of mobile DNA that helps the movement of DNA
within and between genomes [18]. Plasmids are considered conjugative
when they can spread autonomously from cell to cell by
conjugation. Recent genomic data and phylogenetic
analyses have established the presence of conjugative plasmids and
suggested the existence of LGT events in the Rickettsia genus [95].
Transposable elements
In Orientia tsutsugamushi, transposable elements constitute the largest
portion of mobile DNA. A similar amplification of transposable elements
was noted in other intracellular bacteria such as Wolbachia pipientis wMel
[seven types of IS elements (51 copies in total) and four types of GII
introns (17 copies)] [91], Parachlamydia sp. UWE25 [82 IS transposases
(TPases)] [96], R. felis (82 TPase) [79], and R. bellii (39 TPases) [97]. In O.
tsutsugamushi, the transposable element copy number is 10 times higher
than that of other obligate intracellular bacteria. Shigella dysenteriae contains
the highest number of insertion sequence (IS) elements among the
prokaryotes (701 copies in a 4469 kb chromosome and a 183 kb plasmid)
[98]. The number of prophage genes per genome is intermediate to those
of plasmids and transposable elements, while the proportion of plasmid
genes is notably small [90]. These intracellular bacterial genomes are
dominated by transposable elements, which can integrate into a genome
that already has a copy of the same transposable element and generally
do not require a specific site for insertion [90].
In contrast, phages are site-specific and confer immunity to multiple
infections. They also serve as vectors that carry other mobile elements,
such as transposable elements, into a host genome [90]. There is a striking
difference between the quantity of transposable elements and prophage-
related genes found in intracellular prokaryotes, as prophage genomes
comprise tens of genes, whereas a transposable element carries a single
gene (encoding a transposase or reverse transcriptase/maturase) [99].
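Copy numbers such as those quoted above are easier to compare between genomes of different sizes once they are normalized by genome length. The following minimal sketch computes element copies per megabase from the Shigella dysenteriae figures given in the text (701 IS copies across a 4469 kb chromosome and a 183 kb plasmid). The function name `elements_per_mb` is an assumption introduced for illustration, and the normalization is not claimed to be the one used in the cited work.

```python
from typing import Sequence


def elements_per_mb(copies: int, replicon_sizes_kb: Sequence[float]) -> float:
    """Mobile element copies per megabase, summed over all replicons."""
    total_kb = sum(replicon_sizes_kb)
    if total_kb <= 0:
        raise ValueError("total replicon size must be positive")
    return copies / total_kb * 1000.0


if __name__ == "__main__":
    # Figures quoted in the text for Shigella dysenteriae [98].
    density = elements_per_mb(701, [4469, 183])
    print(f"IS elements per Mb: {density:.1f}")
```

The same function could be applied to the transposase counts reported for other species, provided their genome sizes are taken from the corresponding genome publications.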
Repeated palindromic elements (RPEs)
Repeated elements are usually confined to the intergenic regions of
bacterial genomes [100]. For some of these RPEs, the variable number of
tandem repeats represents inter-individual length variability and has been
used for genotyping [100, 101]. RPEs are well studied in Rickettsia spp.
They are approximately 100–150 bp and invade both coding and
noncoding regions of the genome [102-104]. With the ability to insert
themselves within the existing protein coding frame, these RPEs often
generate new reading frames within a preexisting gene, creating an
additional peptide segment of 30–50 amino acids in the final gene
product. Repetitive DNA might be inserted with the help of plasmids.
Repeats are important, as they have roles in genomic instability and
evolution. The bacterial chromosomes that contain elevated repeat
density also show significant rates of rearrangements, leading to an
accelerated loss of gene order [105]. Transposons and other extragenic
interspersed repeats may function in gene rearrangement and duplication
[106, 107].
Ankyrin and tetratricopeptide repeat proteins
Ankyrin (Ank) and tetratricopeptide (TPR) repeat proteins have been
found in several intracellular bacteria and have roles in host-pathogen
interactions. Nearly 4% of the Rickettsia bellii and R. felis genomes
consisted of Ank and TPR proteins [108]. These proteins participate in
various functions, including chaperone activity, cell cycle regulation,
transcription, gene regulation, signal transduction and protein transport
[109-113]. TPRs establish infection and manipulate host cell trafficking
events in L. pneumophila [57, 114, 115], whereas Ank proteins found in
Anaplasma spp., Wolbachia spp. and Ehrlichia spp. are translocated into
the host cell cytoplasm and nucleus, playing dual roles in interfering with
host cell signaling by interacting with the host cytoskeleton and in altering
gene transcription by binding to host chromatin [115]. The deletion or
mutation of genes encoding for Ank proteins reduced the virulence of
Rickettsia peacockii and Rickettsia rickettsii strain Iowa compared to the R.
rickettsii strain Sheila Smith [114].
Secretion system machinery in intracellular bacteria
The interactions between intracellular bacteria and the host cells are
enabled using Type IV secretion systems (T4SSs). These systems are
required for bacterial colonization, invasion and persistence within the
niche and consist of supra-molecular transporters ancestrally related to
bacterial conjugation systems. They are complex proteins embedded in
the bacterial cell envelope, and one type has been well studied in
Rickettsia [79, 97], Bartonella [116], Wolbachia [117], L. pneumophila [5],
N. sennetsu [118], N. risticii [118] and O. tsutsugamushi [119]. The T4SSs
are not only able to transport diverse macromolecule substrates, proteins
and virulence factors but are also able to transfer DNA through bacterial
conjugation [30, 120-123]. Genes that encode T4SS (VirB/VirD4 and Trw)
components have been found in several species of Bartonella [116]. In
Bartonella rattaustraliani, pNH4 encodes a T4SS containing a complete set
of proteins responsible for conjugal transfer, i.e., TraA, TraC, TraD and
TraG/VirD4 [116]. These systems are described as essential pathogenicity
factors in several mammalian pathogens, including Bartonella henselae
and Bartonella tribocorum [116]. The main role for T4SSs is to translocate
virulence factors to hosts and to promote DNA transfer [121]. The protein
encoded by traA initiates DNA transfer for bacterial systems by relaxing
DNA at a site- and strand-specific nick [124], while TraC is necessary for
the assembly of F pilin into the mature F pilus structure [125]. The
coupling protein TraD is essential for transferring DNA by connecting the
DNA processing machinery to the Mpf transfer apparatus [126], and TraG
is critical for the translocation of substrates through the inner cell
membrane [127]. T4SSs in the Bartonella genus are typically located on
chromosomes, and only Bartonella grahamii has a T4SS on its plasmid
pBGR3 [128]. In L. pneumophila, Dot/Icm T4SS facilitates the inhibition of
phagosome-lysosome fusion and the recruitment to the rough
endoplasmic reticulum to support replication in the host cell. The
components of the dot/icm loci are classified as T4SSs due to homology
with genes of bacterial conjugation systems. In Legionella, the T4SS is
encoded by 26 dot/icm genes
arranged in two distinct regions of the chromosome, each approximately
20 kb in length. Region I contains dotDCB and dotA-icmVWX [129]. Region
II contains 18 genes, most of which are dot and icm genes [130]. The
dot/icm loci of the five L. pneumophila strains discussed above exhibit
very high nucleotide conservation, ranging from 98 to 100% among most
orthologs. The exceptions are dotA and icmX; additionally, the icmC gene
of the Corby strain is shorter than and more divergent from (84%
nucleotide identity) that of the Paris strain. Sequence comparisons of the
dot/icm genes to other known open reading frames revealed that at least
18 of the dot/icm genes show similarity to components of the bacterial
conjugative DNA transfer systems, particularly the IncI plasmids ColIb-P9
from Shigella flexneri and R64 from Salmonella enterica [130].
The bacterial genomic information suggests that T4SSs are not
limited to Legionella and related bacteria and IncI plasmids [131].
Interestingly, nearly all the T4SSs found in sequence analyses are encoded
on plasmids [132]. Notable exceptions include the Legionella, Coxiella and
Rickettsiella Dot/Icm systems. It is likely that a common ancestor of these
closely related bacteria acquired a chromosomally encoded T4SS that
played a critical role in its survival. The chromosomal acquisition of the
T4SS might be related to the adaptation of the ancestor bacterium to an
intracellular lifestyle. The genes encoding T4SSs tend to accumulate in
several conserved gene clusters; it appears that there is little pressure to
keep them at a single locus. The conserved gene clusters include (a) dotD-
dotC-dotB (traH-traI-traJ in I-type conjugation systems), (b) dotM/icmP-
dotL/icmO (trbA-trbC), and (c) dotI/icmL-dotH/icmK-dotG/icmE (traM-
traN-traO). Together with the other genes found in all T4SSs, including
dotA (traY) and dotO/icmB (traU), these conserved genes are expected to
encode core components that play fundamental roles in transport [131].
The other genes of the dot/icm system include dotH, dotI, and dotO,
which are essential for intracellular growth and evasion of the endocytic
pathway, and icmGCDJBF and icmTSRQPO, which are involved in
macrophage cell death [133]. The type IV secretion system in intracellular
bacteria is critical for survival in this intracellular niche, possibly because it
allows future specialization as a mammalian pathogen [116].
Concluding remarks
The genomic era has paved the way to major findings regarding
intracellular bacteria. Symbiosis between unicellular and multicellular
organisms has contributed considerably to the evolution of life.
Intracellular bacteria are found in a wide range of niches and from various
evolutionary trajectories, resulting in different genomic compositions.
Based on an endosymbiotic origin for mitochondria and other eukaryotic
organelles, we believe that the intracellular lifestyle is ancient and
constantly co-evolving with the host.
The comparison of bacterial genomic content and lifestyles has revealed
that the capacity to exchange genes depends on the bacterial niche.
Allopatric speciation in bacteria is linked to the restricted opportunity to
exchange genes with other organisms, whereas gene duplications,
mutations and deletions are more often observed. The sympatric lifestyle
is linked with larger genomes, larger pan-genomes, a larger mobilome and
genetic exchanges with other bacteria. It is likely that the mutual
relationships between these bacteria and their host cells may have
promoted a noticeable genome reduction. One of the reasons for genome
reduction could be that the intracellular niche reduces the opportunity for
gene acquisition by lateral gene transfer, and the other is that genes are
lost upon adaptation to the niche.
Comparative analyses of bacterial genomes from different lifestyles,
including free-living and host-dependent bacteria, show that host-
dependent bacteria exhibit fewer transcriptional regulators. A number
of abnormal or split ribosomal operons have been identified, and it
appears that this abnormal event occurred independently in several
groups of specialized bacteria. If the bacteria do not use many ribosomal
operons, they are likely to lose them, and restricting translation is critical
for specialization, as speciation is often correlated with ribosomal operon
inactivation. Comparative genomic-based analyses of free-living and host-
dependent bacteria found that host-dependent bacteria exhibited fewer
rRNA genes, more split rRNA operons and fewer transcriptional
regulators, characteristics that are linked to slow growth rates.
Lamarckian evolution may have played a role in bacterial speciation
events associated with a reduction in the genome size, an observation
that contradicts the dominant model, which assumes that speciation and
fitness gain are linked with an increase in the gene repertoire. Gene
duplication facilitates adaptation for bacteria to changing environments
and the use of new niches. Gene copies often show differences in non-
synonymous substitution frequencies, indicating different functions and
counter-selective constraints within the same genome. The number of
mobile genetic elements found in obligate intracellular bacteria and
facultative intracellular species is within a similar range, but facultative
intracellular species contain four-fold more mobile DNA elements than
obligate intracellular bacteria. This observation is consistent with
predictions that these element compositions are similar to those of free-
living obligate species. Repeated palindromic elements have important
roles in genomic instability and evolution.
Intracellular bacteria possess mechanisms to protect or to invade host
cells. The interactions between intracellular bacteria and host cells are
enabled by Type IV secretion systems (T4SSs). These systems are required
for bacterial colonization, invasion and persistence within the niche and
are supra-molecular transporters ancestrally related to bacterial
conjugation systems. The main role for T4SSs is to translocate virulence
factors to hosts and to promote DNA transfer. The T4SS facilitates the
inhibition of phagosome-lysosome fusion and facilitates the transport to
the rough endoplasmic reticulum to support replication in the host cell.
Type IV secretion systems in intracellular bacteria are critical for bacterial
survival in the intracellular niche, possibly allowing for future
specialization as a mammalian pathogen. This system is common in
intracellular bacteria and appears to have been acquired from different
origins, demonstrating that genomes have converged to adapt to a
common lifestyle.
The sequencing of additional intracellular bacterial genomes will enable
the acquisition of a more precise picture of the genetic properties
associated with the intracellular lifestyle. This effort will also contribute to
a better understanding of the interactions between intracellular bacteria
and different niches and the complex mechanisms implicated in
pathogenicity.
Acknowledgements
We would like to thank Roshan Padmanabhan for his support,
suggestions and corrections, and Ripsy Merrin Chacko for helpful remarks.
References:
1. Zientz, E., T. Dandekar, and R. Gross, Metabolic interdependence of obligate
intracellular bacteria and their insect hosts. Microbiology and molecular
biology reviews : MMBR, 2004. 68(4): p. 745-70.
2. Gross, R., J. Hacker, and W. Goebel, The Leopoldina international symposium
on parasitism, commensalism and symbiosis--common themes, different
outcome. Molecular microbiology, 2003. 47(6): p. 1749-58.
3. Finlay, B.B. and S. Falkow, Common themes in microbial pathogenicity
revisited. Microbiology and molecular biology reviews : MMBR, 1997. 61(2): p.
136-69.
4. Fernandez-Moreira, E., J.H. Helbig, and M.S. Swanson, Membrane vesicles
shed by Legionella pneumophila inhibit fusion of phagosomes with lysosomes.
Infection and immunity, 2006. 74(6): p. 3285-95.
5. D'Auria, G., et al., Legionella pneumophila pangenome reveals strain-specific
virulence factors. BMC genomics, 2010. 11: p. 181.
6. Pilsczek, F.H., A. Nicholson-Weller, and I. Ghiran, Phagocytosis of Salmonella
montevideo by human neutrophils: immune adherence increases
phagocytosis, whereas the bacterial surface determines the route of
intracellular processing. The Journal of infectious diseases, 2005. 192(2): p.
200-9.
7. Friedland, J.S., R.J. Shattock, and G.E. Griffin, Phagocytosis of Mycobacterium
tuberculosis or particulate stimuli by human monocytic cells induces
equivalent monocyte chemotactic protein-1 gene expression. Cytokine, 1993.
5(2): p. 150-6.
8. Gil, R., A. Latorre, and A. Moya, Bacterial endosymbionts of insects: insights
from comparative genomics. Environmental microbiology, 2004. 6(11): p.
1109-22.
9. Renvoise, A., et al., Intracellular Rickettsiales: Insights into manipulators of
eukaryotic cells. Trends in molecular medicine, 2011. 17(10): p. 573-83.
10. Douglas, A.E., Mycetocyte symbiosis in insects. Biological reviews of the
Cambridge Philosophical Society, 1989. 64(4): p. 409-34.
11. Moran, N.A. and P. Baumann, Bacterial endosymbionts in animals. Current
opinion in microbiology, 2000. 3(3): p. 270-5.
12. Stepkowski, T. and A.B. Legocki, Reduction of bacterial genome size and
expansion resulting from obligate intracellular lifestyle and adaptation to soil
habitat. Acta biochimica Polonica, 2001. 48(2): p. 367-81.
13. Margulis, L. and R. Fester, Symbiosis as a Source of Evolutionary Innovation:
Speciation and Morphogenesis. 1991: The MIT Press.
14. Margulis, L., Symbiosis and evolution. Scientific American, 1971. 225(2): p. 48-
57.
15. Margulis, L., The origin of plant and animal cells. American scientist, 1971.
59(2): p. 230-5.
16. von Dohlen, C.D., et al., Mealybug beta-proteobacterial endosymbionts
contain gamma-proteobacterial symbionts. Nature, 2001. 412(6845): p. 433-6.
17. Corsaro, D., et al., Intracellular life. Critical reviews in microbiology, 1999.
25(1): p. 39-79.
18. Merhej, V. and D. Raoult, Rickettsial evolution in the light of comparative
genomics. Biological reviews of the Cambridge Philosophical Society, 2011.
86(2): p. 379-405.
19. Werren, J.H., L. Baldo, and M.E. Clark, Wolbachia: master manipulators of
invertebrate biology. Nature reviews. Microbiology, 2008. 6(10): p. 741-51.
20. McNulty, S.N., et al., Endosymbiont DNA in endobacteria-free filarial
nematodes indicates ancient horizontal genetic transfer. PloS one, 2010. 5(6):
p. e11029.
21. Klasson, L., et al., Horizontal gene transfer between Wolbachia and the
mosquito Aedes aegypti. BMC genomics, 2009. 10: p. 33.
22. Koonin, E.V., The origin and early evolution of eukaryotes in the light of
phylogenomics. Genome biology, 2010. 11(5): p. 209.
23. Dunning Hotopp, J.C., et al., Widespread lateral gene transfer from
intracellular bacteria to multicellular eukaryotes. Science, 2007. 317(5845): p.
1753-6.
24. Georgiades, K. and D. Raoult, The rhizome of Reclinomonas americana, Homo
sapiens, Pediculus humanus and Saccharomyces cerevisiae mitochondria.
Biology direct, 2011. 6: p. 55.
25. Georgiades, K., et al., Phylogenomic analysis of Odyssella thessalonicensis
fortifies the common origin of Rickettsiales, Pelagibacter ubique and
Reclimonas americana mitochondrion. PloS one, 2011. 6(9): p. e24857.
26. Casadevall, A., Evolution of intracellular pathogens. Annual review of
microbiology, 2008. 62: p. 19-33.
27. Whitman, W.B., The modern concept of the procaryote. J Bacteriol, 2009.
191(7): p. 2000-5; discussion 2006-7.
28. Georgiades, K., et al., Gene gain and loss events in Rickettsia and Orientia
species. Biology direct, 2011. 6: p. 6.
29. Gimenez, G., et al., Insight into cross-talk between intra-amoebal pathogens.
BMC genomics, 2011. 12: p. 542.
30. Moliner, C., P.E. Fournier, and D. Raoult, Genome analysis of microorganisms
living in amoebae reveals a melting pot of evolution. FEMS microbiology
reviews, 2010. 34(3): p. 281-94.
31. Audic, S., et al., Genome analysis of Minibacterium massiliensis highlights the
convergent evolution of water-living bacteria. PLoS Genet, 2007. 3(8): p. e138.
32. Merhej, V., et al., The rhizome of life: the sympatric Rickettsia felis paradigm
demonstrates the random transfer of DNA sequences. Molecular biology and
evolution, 2011. 28(11): p. 3213-23.
33. Marco, D., Metagenomics and the niche concept. Theory in biosciences =
Theorie in den Biowissenschaften, 2008. 127(3): p. 241-7.
34. Wernegreen, J.J., Genome evolution in bacterial endosymbionts of insects.
Nature reviews. Genetics, 2002. 3(11): p. 850-61.
35. Mira, A., H. Ochman, and N.A. Moran, Deletional bias and the evolution of
bacterial genomes. Trends in genetics : TIG, 2001. 17(10): p. 589-96.
36. Tamas, I., et al., 50 million years of genomic stasis in endosymbiotic bacteria.
Science, 2002. 296(5577): p. 2376-9.
37. Wernegreen, J.J., For better or worse: genomic consequences of intracellular
mutualism and parasitism. Current opinion in genetics & development, 2005.
15(6): p. 572-83.
38. Moran, N.A. and G.R. Plague, Genomic changes following host restriction in
bacteria. Current opinion in genetics & development, 2004. 14(6): p. 627-33.
39. Darby, A.C., et al., Intracellular pathogens go extreme: genome evolution in
the Rickettsiales. Trends in genetics : TIG, 2007. 23(10): p. 511-20.
40. Moran, N.A., Microbial minimalism: genome reduction in bacterial pathogens.
Cell, 2002. 108(5): p. 583-6.
41. Toft, C. and S.G. Andersson, Evolutionary microbial genomics: insights into
bacterial host adaptation. Nature reviews. Genetics, 2010. 11(7): p. 465-75.
42. Degnan, P.H., A.B. Lazarus, and J.J. Wernegreen, Genome sequence of
Blochmannia pennsylvanicus indicates parallel evolutionary trends among
bacterial mutualists of insects. Genome research, 2005. 15(8): p. 1023-33.
43. Moran, N.A., Accelerated evolution and Muller's rachet in endosymbiotic
bacteria. Proceedings of the National Academy of Sciences of the United
States of America, 1996. 93(7): p. 2873-8.
44. Nakabachi, A., et al., The 160-kilobase genome of the bacterial endosymbiont
Carsonella. Science, 2006. 314(5797): p. 267.
45. van Ham, R.C., et al., Reductive genome evolution in Buchnera aphidicola.
Proceedings of the National Academy of Sciences of the United States of
America, 2003. 100(2): p. 581-6.
46. Moran, N.A., J.P. McCutcheon, and A. Nakabachi, Genomics and Evolution of
Heritable Bacterial Symbionts. Annual Review of Genetics, 2008. 42(1): p. 165-
190.
47. Merhej, V., et al., Massive comparative genomic analysis reveals convergent
evolution of specialized bacteria. Biology direct, 2009. 4: p. 13.
48. McCutcheon, J.P. and N.A. Moran, Parallel genomic evolution and metabolic
interdependence in an ancient symbiosis. Proceedings of the National
Academy of Sciences of the United States of America, 2007. 104(49): p.
19392-7.
49. Perez-Brocal, V., et al., A small microbial genome: the end of a long symbiotic
relationship? Science, 2006. 314(5797): p. 312-3.
50. Shigenobu, S., et al., Genome sequence of the endocellular bacterial symbiont
of aphids Buchnera sp. APS. Nature, 2000. 407(6800): p. 81-6.
51. Arneodo, J.D., et al., Ultrastructural detection of an unusual intranuclear
bacterium in Pentastiridius leporinus (Hemiptera: Cixiidae). Journal of
invertebrate pathology, 2008. 97(3): p. 310-3.
52. Sassera, D., et al., 'Candidatus Midichloria mitochondrii', an endosymbiont of
the tick Ixodes ricinus with a unique intramitochondrial lifestyle. International
journal of systematic and evolutionary microbiology, 2006. 56(Pt 11): p. 2535-
40.
53. Moran, N.A., et al., The players in a mutualistic symbiosis: insects, bacteria,
viruses, and virulence genes. Proceedings of the National Academy of Sciences
of the United States of America, 2005. 102(47): p. 16919-26.
54. Fraser-Liggett, C.M., Insights on biology and evolution from microbial genome
sequencing. Genome research, 2005. 15(12): p. 1603-10.
55. Renesto, P., et al., Some lessons from Rickettsia genomics. FEMS microbiology
reviews, 2005. 29(1): p. 99-117.
56. Wernegreen, J.J., A.B. Lazarus, and P.H. Degnan, Small genome of Candidatus
Blochmannia, the bacterial endosymbiont of Camponotus, implies irreversible
specialization to an intracellular lifestyle. Microbiology, 2002. 148(Pt 8): p.
2551-6.
57. Cazalet, C., et al., Evidence in the Legionella pneumophila genome for
exploitation of host cell functions and high genome plasticity. Nature genetics,
2004. 36(11): p. 1165-73.
58. Fournier, P.E., et al., Analysis of the Rickettsia africae genome reveals that
virulence acquisition in Rickettsia species may be explained by genome
reduction. BMC genomics, 2009. 10: p. 166.
59. Ogata, H., et al., Mechanisms of evolution in Rickettsia conorii and R.
prowazekii. Science, 2001. 293(5537): p. 2093-8.
60. Rolain, J.M., et al., Partial disruption of translational and posttranslational
machinery reshapes growth rates of Bartonella birtlesii. mBio, 2013. 4(2): p.
e00115-13.
61. Blanc, G., et al., Reductive genome evolution from the mother of Rickettsia.
PLoS genetics, 2007. 3(1): p. e14.
62. Moran, N.A., J.P. McCutcheon, and A. Nakabachi, Genomics and evolution of
heritable bacterial symbionts. Annual Review of Genetics, 2008. 42: p. 165-90.
63. Fares, M.A., A. Moya, and E. Barrio, GroEL and the maintenance of bacterial
endosymbiosis. Trends in genetics : TIG, 2004. 20(9): p. 413-6.
64. McCutcheon, J.P. and N.A. Moran, Extreme genome reduction in symbiotic
bacteria. Nature reviews. Microbiology, 2012. 10(1): p. 13-26.
65. Moran, N.A., H.E. Dunbar, and J.L. Wilcox, Regulation of transcription in a
reduced bacterial genome: nutrient-provisioning genes of the obligate
symbiont Buchnera aphidicola. J Bacteriol, 2005. 187(12): p. 4229-37.
66. Wilcox, J.L., et al., Consequences of reductive evolution for gene expression in
an obligate endosymbiont. Molecular microbiology, 2003. 48(6): p. 1491-500.
67. Fares, M.A., et al., Endosymbiotic bacteria: groEL buffers against deleterious
mutations. Nature, 2002. 417(6887): p. 398.
68. Koonin, E.V., Darwinian evolution in the light of genomics. Nucleic acids
research, 2009. 37(4): p. 1011-34.
69. Colson, P. and D. Raoult, Lamarckian evolution of the giant Mimivirus in
allopatric laboratory culture on amoebae. Frontiers in cellular and infection
microbiology, 2012. 2: p. 91.
70. Georgiades, K. and D. Raoult, Defining pathogenic bacterial species in the
genomic era. Frontiers in microbiology, 2010. 1: p. 151.
71. Bliven, K.A. and A.T. Maurelli, Antivirulence genes: insights into pathogen
evolution through gene loss. Infect Immun, 2012. 80(12): p. 4061-70.
72. Hooper, S.D. and O.G. Berg, On the nature of gene innovation: duplication
patterns in microbial genomes. Molecular biology and evolution, 2003. 20(6):
p. 945-54.
73. Schmitz-Esser, S., et al., ATP/ADP translocases: a common feature of obligate
intracellular amoebal symbionts related to Chlamydiae and Rickettsiae. J
Bacteriol, 2004. 186(3): p. 683-91.
74. Aravind, L., et al., Evidence for massive gene exchange between archaeal and
bacterial hyperthermophiles. Trends in genetics : TIG, 1998. 14(11): p. 442-4.
75. Walsh, J.B., How often do duplicated genes evolve new functions? Genetics,
1995. 139(1): p. 421-8.
76. Frank, A.C., H. Amiri, and S.G. Andersson, Genome deterioration: loss of
repeated sequences and accumulation of junk DNA. Genetica, 2002. 115(1): p.
1-12.
77. Braeken, L., B. Van der Bruggen, and C. Vandecasteele, Flux decline in
nanofiltration due to adsorption of dissolved organic compounds: model
prediction of time dependency. The journal of physical chemistry. B, 2006.
110(6): p. 2957-62.
78. Blanc, G., et al., Molecular evolution of rickettsia surface antigens: evidence of
positive selection. Molecular biology and evolution, 2005. 22(10): p. 2073-83.
79. Ogata, H., et al., The genome sequence of Rickettsia felis identifies the first
putative conjugative plasmid in an obligate intracellular parasite. PLoS
biology, 2005. 3(8): p. e248.
80. Dai, L., et al., Database for mobile group II introns. Nucleic acids research,
2003. 31(1): p. 424-6.
81. Casjens, S., Prophages and bacterial genomics: what have we learned so far?
Molecular microbiology, 2003. 49(2): p. 277-300.
82. Van Sluys, M.A., et al., Comparative analyses of the complete genome
sequences of Pierce's disease and citrus variegated chlorosis strains of Xylella
fastidiosa. J Bacteriol, 2003. 185(3): p. 1018-26.
83. Banks, D.J., S.B. Beres, and J.M. Musser, The fundamental contribution of
phages to GAS evolution, genome diversification and strain emergence. Trends
in microbiology, 2002. 10(11): p. 515-21.
84. Ohnishi, M., K. Kurokawa, and T. Hayashi, Diversification of Escherichia coli
genomes: are bacteriophages the major contributors? Trends in microbiology,
2001. 9(10): p. 481-5.
85. Boyd, E.F. and H. Brussow, Common themes among bacteriophage-encoded
virulence factors and diversity among the bacteriophages involved. Trends in
microbiology, 2002. 10(11): p. 521-9.
86. Boyd, E.F., B.M. Davis, and B. Hochhut, Bacteriophage-bacteriophage
interactions in the evolution of pathogenic bacteria. Trends in microbiology,
2001. 9(3): p. 137-44.
87. Miao, E.A. and S.I. Miller, Bacteriophages in the evolution of pathogen-host
interactions. Proceedings of the National Academy of Sciences of the United
States of America, 1999. 96(17): p. 9452-4.
88. Koonin, E.V. and Y.I. Wolf, Genomics of bacteria and archaea: the emerging
dynamic view of the prokaryotic world. Nucleic acids research, 2008. 36(21):
p. 6688-719.
89. Frost, L.S., et al., Mobile genetic elements: the agents of open source
evolution. Nature reviews. Microbiology, 2005. 3(9): p. 722-32.
90. Bordenstein, S.R. and W.S. Reznikoff, Mobile DNA in obligate intracellular
bacteria. Nature reviews. Microbiology, 2005. 3(9): p. 688-99.
91. Wu, M., et al., Phylogenomics of the reproductive parasite Wolbachia pipientis
wMel: a streamlined genome overrun by mobile genetic elements. PLoS
biology, 2004. 2(3): p. E69.
92. Andersson, S.G., et al., Comparative genomics of microbial pathogens and
symbionts. Bioinformatics, 2002. 18 Suppl 2: p. S17.
93. Simek, K., et al., Changes in bacterial community composition and dynamics
and viral mortality rates associated with enhanced flagellate grazing in a
mesoeutrophic reservoir. Appl Environ Microbiol, 2001. 67(6): p. 2723-33.
94. Simser, J.A., et al., A novel and naturally occurring transposon, ISRpe1 in the
Rickettsia peacockii genome disrupting the rickA gene involved in actin-based
motility. Molecular microbiology, 2005. 58(1): p. 71-9.
95. Blanc, G., et al., Lateral gene transfer between obligate intracellular bacteria:
evidence from the Rickettsia massiliae genome. Genome research, 2007.
17(11): p. 1657-64.
96. Horn, M., et al., Illuminating the evolutionary history of chlamydiae. Science,
2004. 304(5671): p. 728-30.
97. Ogata, H., et al., Genome sequence of Rickettsia bellii illuminates the role of
amoebae in gene exchanges between intracellular pathogens. PLoS Genet,
2006. 2(5): p. e76.
98. Yang, F., et al., Genome dynamics and diversity of Shigella species, the
etiologic agents of bacillary dysentery. Nucleic acids research, 2005. 33(19): p.
6445-58.
99. Labrador, M. and V.G. Corces, Transposable element-host interactions:
regulation of insertion and excision. Annu Rev Genet, 1997. 31: p. 381-404.
100. van Belkum, A., et al., Short-sequence DNA repeats in prokaryotic genomes.
Microbiology and molecular biology reviews : MMBR, 1998. 62(2): p. 275-93.
101. Fournier, P.E., et al., Use of highly variable intergenic spacer sequences for
multispacer typing of Rickettsia conorii strains. Journal of clinical
microbiology, 2004. 42(12): p. 5757-66.
102. Amiri, H., C.M. Alsmark, and S.G. Andersson, Proliferation and deterioration of
Rickettsia palindromic elements. Molecular biology and evolution, 2002.
19(8): p. 1234-43.
103. Claverie, J.M. and H. Ogata, The insertion of palindromic repeats in the
evolution of proteins. Trends in biochemical sciences, 2003. 28(2): p. 75-80.
104. Ogata, H., et al., Selfish DNA in protein-coding genes of Rickettsia. Science,
2000. 290(5490): p. 347-50.
105. Rocha, E.P., DNA repeats lead to the accelerated loss of gene order in bacteria.
Trends in genetics : TIG, 2003. 19(11): p. 600-3.
106. Baldridge, G.D., et al., Transposon insertion reveals pRM, a plasmid of
Rickettsia monacensis. Appl Environ Microbiol, 2007. 73(15): p. 4984-95.
107. Moran, J.V., R.J. DeBerardinis, and H.H. Kazazian, Jr., Exon shuffling by L1
retrotransposition. Science, 1999. 283(5407): p. 1530-4.
108. Ogata, H., et al., Rickettsia felis, from culture to genome sequencing. Annals of
the New York Academy of Sciences, 2005. 1063: p. 26-34.
109. Li, J., A. Mahajan, and M.D. Tsai, Ankyrin repeat: a unique motif mediating
protein-protein interactions. Biochemistry, 2006. 45(51): p. 15168-78.
110. Mosavi, L.K., et al., The ankyrin repeat as molecular architecture for protein
recognition. Protein science : a publication of the Protein Society, 2004. 13(6):
p. 1435-48.
111. Rubtsov, A.M. and O.D. Lopina, Ankyrins. FEBS letters, 2000. 482(1-2): p. 1-5.
112. Blatch, G.L. and M. Lassle, The tetratricopeptide repeat: a structural motif
mediating protein-protein interactions. BioEssays : news and reviews in
molecular, cellular and developmental biology, 1999. 21(11): p. 932-9.
113. Bork, P., Hundreds of ankyrin-like repeats in functionally diverse proteins:
mobile modules that cross phyla horizontally? Proteins, 1993. 17(4): p. 363-74.
114. Felsheim, R.F., T.J. Kurtti, and U.G. Munderloh, Genome sequence of the
endosymbiont Rickettsia peacockii and comparison with virulent Rickettsia
rickettsii: identification of virulence factors. PloS one, 2009. 4(12): p. e8361.
115. Caturegli, P., et al., ankA: an Ehrlichia phagocytophila group gene encoding a
cytoplasmic protein antigen with ankyrin repeats. Infection and immunity,
2000. 68(9): p. 5277-83.
116. Saisongkorh, W., et al., Evidence of transfer by conjugation of type IV secretion
system genes between Bartonella species and Rhizobium radiobacter in
amoeba. PloS one, 2010. 5(9): p. e12666.
117. Saridaki, A. and K. Bourtzis, Wolbachia: more than just a bug in insects
genitals. Current opinion in microbiology, 2010. 13(1): p. 67-72.
118. Lin, M., et al., Analysis of complete genome sequence of Neorickettsia risticii:
causative agent of Potomac horse fever. Nucleic acids research, 2009. 37(18):
p. 6076-91.
119. Cho, N.H., et al., The Orientia tsutsugamushi genome reveals massive
proliferation of conjugative type IV secretion system and host-cell interaction
genes. Proceedings of the National Academy of Sciences of the United States
of America, 2007. 104(19): p. 7981-6.
120. Burns, D.L., Type IV transporters of pathogenic bacteria. Current opinion in
microbiology, 2003. 6(1): p. 29-34.
121. Christie, P.J., Type IV secretion: intercellular transfer of macromolecules by
systems ancestrally related to conjugation machines. Molecular microbiology,
2001. 40(2): p. 294-305.
122. Christie, P.J. and J.P. Vogel, Bacterial type IV secretion: conjugation systems
adapted to deliver effector molecules to host cells. Trends in microbiology,
2000. 8(8): p. 354-60.
123. Deng, W., et al., VirE1 is a specific molecular chaperone for the exported
single-stranded-DNA-binding protein VirE2 in Agrobacterium. Molecular
microbiology, 1999. 31(6): p. 1795-807.
124. Chen, I., P.J. Christie, and D. Dubnau, The ins and outs of DNA transfer in
bacteria. Science, 2005. 310(5753): p. 1456-60.
125. Schandel, K.A., M.M. Muller, and R.E. Webster, Localization of TraC, a protein
involved in assembly of the F conjugative pilus. J Bacteriol, 1992. 174(11): p.
3800-6.
126. Beranek, A., et al., Thirty-eight C-terminal amino acids of the coupling protein
TraD of the F-like conjugative resistance plasmid R1 are required and sufficient
to confer binding to the substrate selector protein TraM. J Bacteriol, 2004.
186(20): p. 6999-7006.
127. Schroder, G. and E. Lanka, TraG-like proteins of type IV secretion systems:
functional dissection of the multiple activities of TraG (RP4) and TrwB (R388). J
Bacteriol, 2003. 185(15): p. 4371-81.
128. Berglund, E.C., et al., Run-off replication of host-adaptability genes is
associated with gene transfer agents in the genome of mouse-infecting
Bartonella grahamii. PLoS genetics, 2009. 5(7): p. e1000546.
129. Matthews, M. and C.R. Roy, Identification and subcellular localization of the
Legionella pneumophila IcmX protein: a factor essential for establishment of a
replicative organelle in eukaryotic host cells. Infection and immunity, 2000.
68(7): p. 3971-82.
130. Vogel, J.P., et al., Conjugative transfer by the virulence system of Legionella
pneumophila. Science, 1998. 279(5352): p. 873-6.
131. Nagai, H. and T. Kubori, Type IVB Secretion Systems of Legionella and Other
Gram-Negative Bacteria. Frontiers in microbiology, 2011. 2: p. 136.
132. Nora, T., et al., Molecular mimicry: an important virulence strategy employed
by Legionella pneumophila to subvert host functions. Future microbiology,
2009. 4(6): p. 691-701.
133. Andrews, H.L., J.P. Vogel, and R.R. Isberg, Identification of linked Legionella
pneumophila genes essential for intracellular growth and evasion of the
endocytic pathway. Infection and immunity, 1998. 66(3): p. 950-8.
Table S1: A selection of the main sequenced genomes of intracellular bacteria (as of October 2013), with genome size,
GC content, number of protein-coding genes, number of plasmids, year of publication and lifestyle.
Lifestyle: FI, facultative intracellular; OI, obligate intracellular.
Lifestyle Bacteria Size (Mb) GC (%) Protein-coding genes Plasmids Year Citation
gammaproteobacteria
OI Buchnera aphidicola Acyrthosiphon pisum 5p 0.64 26 555 0 2009 1
OI Buchnera aphidicola Acyrthosiphon pisum 0.64 26 564 2 2001 2
OI Buchnera aphidicola Baizongia pistaciae 0.62 25 504 1 2003 3
OI Buchnera aphidicola Cinara cedri 0.42 20 357 1 2006 4
OI Buchnera aphidicola Schizaphis graminum 0.64 25 546 0 2002 5
OI Buchnera aphidicola Acyrthosiphon pisum Tuc7 0.64 26 553 0 2009 1
OI Wigglesworthia glossinidia 0.7 22 611 1 2002 6
OI Candidatus Blochmannia floridanus 0.71 27 583 0 2003 7
OI Candidatus Blochmannia pennsylvanicus str. BPEN 0.79 29 610 0 2005 8
OI Baumannia cicadellinicola str Hc 0.69 33 595 0 2006 9
FI Sodalis glossinidius 4.17 54 2432 3 2006 10
OI Candidatus Hamiltonella defensa 5AT 2.1 40 2094 1 2009 11
FI Photorhabdus asymbiotica 5.06 42 4390 1 2009 12,13
OI Candidatus Carsonella ruddii 0.16 16 182 0 2006 14
FI Shigella flexneri 2a 2457T 4.6 50 4060 0 2003 15
FI Shigella flexneri 2a 301 4.6 50 4176 1 2002 16
FI Shigella flexneri 2a 5 str. 8401 4.57 50 4114 0 2006 17
FI Legionella pneumophila Corby 3.58 38 3204 0 2007 18
FI Legionella pneumophila Lens 3.35 38 2878 1 2004 19
FI Legionella pneumophila Paris 3.5 38 3027 1 2004 20
FI Legionella pneumophila subsp. pneumophila str. Philadelphia 1 3.4 38 2942 0 2001 19
FI Legionella pneumophila 2300/99 Alcoy 3.5 38 3190 0 2010 19
FI Legionella pneumophila subsp. pneumophila LPE509 3.5 38 3331 1 2013 21
FI Legionella pneumophila subsp. pneumophila str. Thunder Bay 3.5 38 2998 0 2013 21
FI Legionella longbeachae NSW150 4.1 37 3470 1 2010 22
OI Coxiella burnetii CbuG_Q212 2 42 1866 0 2008 23
OI Coxiella burnetii CbuK_Q154 2.1 42 1900 1 2008 23
OI Coxiella burnetii Dugway 5J108-111 2.2 42 1993 1 2007 23
OI Coxiella burnetii RSA 331 2 42 1930 1 2009 23
OI Coxiella burnetii RSA 493 2 42 1817 1 2001 23
FI Francisella tularensis subsp. holarctica FSC200 1.9 32 1438 0 2012 24
FI Francisella tularensis subsp. tularensis TI0902 1.9 32 1544 0 2012 25
FI Francisella tularensis subsp. tularensis TIGB03 2.0 32 1624 0 2012 26
FI Francisella tularensis subsp. tularensis NE061598 1.9 32 1836 0 2009 27
FI Francisella tularensis subsp. holarctica F92 1.9 32 1842 0 2012 28
FI Francisella tularensis holarctica FTNF002-00 FTA 1.9 32 1580 0 2007 29
FI Francisella tularensis holarctica LVS 1.9 32 1754 0 2006 30
FI Francisella tularensis holarctica OSU18 1.9 32 1555 0 2006 30
FI Francisella tularensis mediasiatica FSC147 1.9 32 1406 0 2008 31
FI Francisella tularensis tularensis FSC198 1.9 32 1605 0 2006 30
FI Francisella tularensis tularensis SCHU S4 1.9 32 1604 0 2004 32
FI Francisella tularensis tularensis WY96-3418 1.9 32 1634 0 2007 29
OI Candidatus Vesicomyosocius okutanii HA 1.02 31 937 0 2007 33
OI Candidatus Ruthia magnifica str. Cm 1.16 34 976 0 2006 34
betaproteobacteria
FI Burkholderia mallei ATCC 23344 5.83 68 5024 0 2004 35
FI Burkholderia mallei NCTC 10229 5.76 68 5509 0 2007 35
FI Burkholderia mallei NCTC 10247 5.85 68 5415 0 2007 35
FI Burkholderia mallei SAVP1 5.23 68 5188 0 2007 35
OI Polynucleobacter necessarius subsp. asymbioticus QLW-P1DMWA-1 2.16 44 2077 0 2007 36
OI Polynucleobacter necessarius subsp. necessarius STIR1 1.56 45 1508 0 2008 36
alphaproteobacteria
FI Bartonella bacilliformis KC583 1.45 38 1283 0 2007 37
FI Bartonella grahamii as4aup 2.34 38 1737 1 2009 38
FI Bartonella henselae str. Houston-1 1.93 38 1488 0 2004 39
FI Bartonella quintana str. Toulouse 1.58 38 1142 0 2004 39
FI Bartonella tribocorum CIP 105476 2.6 38 2069 1 2007 40
OI Candidatus Hodgkinia cicadicola str. Dsem 0.14 58 169 1 2009 41
FI Phenylobacterium zucineum (strain HLK1) 4 71 3529 1 2007 42
OI Anaplasma centrale str. Israel 1.2 49 923 0 2009 43
OI Anaplasma marginale str. Florida 1.2 49 940 0 2009 44
OI Anaplasma marginale str. St. Maries 1.2 49 948 0 2003 45
OI Anaplasma phagocytophilum 1.47 41 1264 0 2006 46
OI Ehrlichia canis str. Jake 1.3 28 925 0 2005 47
OI Ehrlichia chaffeensis str. Arkansas 1.18 30 1105 0 2006 48
OI Ehrlichia ruminantium str. Gardel 1.5 27 950 0 2005 49
OI Ehrlichia ruminantium str. Welgevonden 1.51 27 958 0 2005 50
OI Ehrlichia ruminantium str. Welgevonden 1.5 27 888 0 2003 50
OI Wolbachia pipientis wPip 1.5 34 1275 0 2008 51
OI Wolbachia pipientis wMel 1.27 35 1195 0 2002 52
OI Wolbachia pipientis wMel TRS 1.08 34 805 0 2005 53
OI Wolbachia pipientis wRi 1.45 35 1150 0 2009 54
OI Neorickettsia risticii str. Illinois 0.88 41 892 0 2009 55
OI Neorickettsia sennetsu str. Miyayama 0.86 41 932 0 2006 56
OI Rickettsia africae ESF-5 1.28 32 1030 1 2009 57
OI Rickettsia akari str. Hartford 1.23 32 1258 0 2007 58
OI Rickettsia bellii OSU 85-389 1.53 31 1475 0 2007 58
OI Rickettsia bellii RML369-C 1.52 31 1429 0 2006 59
OI Rickettsia canadensis str. McKiel 1.16 31 1090 0 2007 60
OI Rickettsia conorii str. Malish 7 1.27 32 1374 0 2001 61
OI Rickettsia felis URRWXCal2 1.49 32 1400 2 2005 62
OI Rickettsia massiliae MTU5 1.36 32 968 1 2007 63
OI Rickettsia peacockii str. Rustic 1.3 32 927 1 2009 64
OI Rickettsia prowazekii str. Madrid E 1.1 29 835 0 2001 65
OI Rickettsia rickettsii str. 'Sheila Smith' 1.26 32 1343 0 2007 66
OI Rickettsia rickettsii str. Iowa 1.27 32 1383 0 2008 67
OI Rickettsia typhi str. Wilmington 1.11 28 838 0 2004 68
OI Candidatus Rickettsia amblyommii str. GAT-30V 1.48 32 1390 3 2012 69
OI Rickettsia australis str. Cutlack 1.32 32 1261 1 2012 70
OI Rickettsia canadensis str. CA410 1.15 31 1016 0 2012 71
OI Rickettsia heilongjiangensis 054 1.28 32 1297 0 2011 72
OI Rickettsia japonica YH 1.28 32 971 0 2011 73
OI Rickettsia massiliae str. AZT80 1.28 33 1207 1 2012 74
OI Rickettsia montanensis str. OSU 85-930 1.28 33 1217 0 2012 75
OI Rickettsia parkeri str. Portsmouth 1.30 32 1318 0 2012 76
OI Rickettsia philipii str. 364D 1.29 33 1344 0 2012 77
OI Rickettsia prowazekii str. Breinl 1.11 29 920 0 2013 78
OI Rickettsia prowazekii str. BuV67-CWPP 1.11 29 843 0 2012 79
OI Rickettsia prowazekii str. Chernikova 1.11 29 845 0 2012 80
OI Rickettsia prowazekii str. Dachau 1.11 29 839 0 2012 81
OI Rickettsia prowazekii str. GvV257 1.11 29 829 0 2012 82
OI Rickettsia prowazekii str. Katsinyian 1.11 29 844 0 2012 83
OI Rickettsia prowazekii str. NMRC Madrid E 1.11 29 938 0 2013 84
OI Rickettsia prowazekii str. Rp22 1.11 29 950 0 2010 85
OI Rickettsia prowazekii str. RpGvF24 1.11 29 834 0 2012 86
OI Rickettsia rhipicephali str. 3-7-female6-CWPP 1.31 32 1266 1 2012 87
OI Rickettsia rickettsii str. Arizona 1.27 32 1343 0 2012 88
OI Rickettsia rickettsii str. Brazil 1.26 33 1332 0 2012 89
OI Rickettsia rickettsii str. Colombia 1.27 33 1350 0 2012 90
OI Rickettsia rickettsii str. Hauke 1.27 33 1340 0 2012 91
OI Rickettsia rickettsii str. Hino 1.27 33 1335 0 2012 92
OI Rickettsia rickettsii str. Hlp#2 1.27 33 1308 0 2012 93
OI Rickettsia slovaca 13-B 1.28 33 1112 0 2011 94
OI Rickettsia slovaca str. D-CWPP 1.28 33 1347 0 2012 95
OI Rickettsia typhi str. B9991CWPP 1.11 29 839 0 2012 96
OI Rickettsia typhi str. TH1527 1.11 29 838 0 2012 96
OI Orientia tsutsugamushi str. Boryong 2.13 30 1182 0 2007 97
OI Orientia tsutsugamushi str. Ikeda 2 30 1967 0 2008 98
deltaproteobacteria
OI Lawsonia intracellularis PHE/MN1-00 1.46 33 1180 3 2006 99
Bacteroidetes
OI Candidatus Sulcia muelleri GWSS 0.25 22 227 0 2007 100
OI Candidatus Sulcia muelleri SMDSEM 0.28 22 242 0 2009 101
OI Blattabacterium Bge 0.64 27 586 0 2009 102
OI Blattabacterium BPLAN 0.64 28 576 1 2009 102
OI Candidatus Amoebophilus asiaticus 5a2 1.9 35 1283 0 2008 103
Actinobacteria
OI Mycobacterium leprae TN 3.27 57 1605 0 2001 104
FI Renibacterium salmoninarum ATCC 33209 3.16 56 3507 0 2007 105
FI Tropheryma whipplei TW08/27 0.93 46 783 0 2003 106
FI Tropheryma whipplei Twist 0.93 46 808 0 2003 107
Chlamydiae
OI Chlamydophila abortus S26/3 1.14 39 932 0 2003 108
OI Chlamydophila caviae GPIC 1.17 39 998 1 2002 109
OI Chlamydophila felis Fe/C-56 1.17 39 1005 1 2006 110
OI Chlamydophila pneumoniae AR39 1.23 40 1112 0 2001 111
OI Chlamydophila pneumoniae CWL029 1.23 40 1052 0 2001 112
OI Chlamydophila pneumoniae J138 1.23 40 1069 0 2001 113
OI Chlamydophila pneumoniae TW-183 1.23 40 1113 0 2003 114
OI Chlamydia muridarum Nigg 1.07 40 904 1 2001 111
OI Chlamydia trachomatis A/HAR-13 1.04 41 911 1 2005 115
OI Chlamydia trachomatis D/UW-3/CX 1.04 41 895 0 2001 116
OI Chlamydia trachomatis 434/Bu 1.04 41 874 0 2008 117
OI Chlamydia trachomatis B/Jali20/OT Jali20 1.04 41 875 0 2009 118
OI Chlamydia trachomatis B/TZ1A828/OT 1.04 41 880 0 2009 119
OI Chlamydia trachomatis L2b/UCH-1/proctitis 1.04 41 874 0 2008 120
OI Candidatus Protochlamydia amoebophila UWE25 2.41 34 2031 0 2004 121
Firmicutes
FI Listeria monocytogenes Clip80459 CLIP80459 2.9 38 2766 0 2009 122
FI Listeria monocytogenes EGD-e 2.94 37 2846 0 2001 123
FI Listeria monocytogenes HCC23 2.98 38 2974 0 2008 124
FI Listeria monocytogenes str. 4b F2365 2.91 38 2821 0 2001 125
Tenericutes
OI Phytoplasma Onion yellows OY-M 0.85 27 750 0 2003 126
OI Phytoplasma Australiense 0.88 27 684 0 2008 127
OI Phytoplasma Aster yellows witches-broom AYWB 0.71 26 671 4 2006 128
OI Phytoplasma mali str. AT 0.6 21 479 0 2008 129
OI Mycoplasma penetrans HF-2 1.36 25 1037 0 2002 130
Chapter 3
Genome sequencing of intracellular bacteria
3.1 Article 1 :
Genome Sequence of Diplorickettsia massiliensis, an
Emerging Ixodes ricinus-Associated Human Pathogen
Mano J. Mathew1, Geetha Subramanian1, Thi-Tien Nguyen1,
Catherine Robert1, Oleg Mediannikov1, Pierre-Edouard Fournier1,
Didier Raoult1*
1 Unité de Recherche sur les Maladies Infectieuses et Tropicales
Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,
IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,
13005, Marseille, France.
Published in J. Bacteriol., June 2012, vol. 194, no. 12, p. 3287
*Corresponding author. E-mail: didier.raoult@gmail.com
Preamble to article 1
The order Legionellales, in the class Gammaproteobacteria, comprises
several pathogenic, aerobic, motile, nutritionally fastidious and
pleomorphic Gram-negative bacteria. It is divided into two families,
Legionellaceae and Coxiellaceae. Many species of Legionella cause
legionellosis. The family Coxiellaceae consists of Aquicella, Coxiella
(an intracellular bacterium that is the causative agent of Q fever)
(Beare, et al., 2009), Diplorickettsia and Rickettsiella (an
intracellular parasite of Gryllus bimaculatus) (Roux, et al., 1997,
Mediannikov, et al., 2010). Almost all bacteria isolated from Ixodes
ricinus ticks are pathogenic for humans, notably Borrelia burgdorferi,
Borrelia afzelii, Borrelia garinii, Rickettsia helvetica, Rickettsia
monacensis and Francisella tularensis (Parola & Raoult, 2001).
F. tularensis, which causes tularemia (a plague-like disease), belongs
to the order Thiotrichales (Beckstrom-Sternberg, et al., 2007).
D. massiliensis strain 20B is an obligate intracellular, Gram-negative
bacterium isolated from Ixodes ricinus ticks collected in 2006 from the
southeastern part of the Rovinka forest in Slovakia (Mediannikov, et al.,
2010). D. massiliensis belongs to the class Gammaproteobacteria, does not
form endospores, and occurs as small rods that are usually grouped in
pairs. An initial phylogenetic analysis based on 16S rRNA showed that
D. massiliensis clustered with Rickettsiella grylli (Roux, et al., 1997,
Mediannikov, et al., 2010). Because of its low 16S rDNA similarity (94%)
with R. grylli, it was classified in a new genus, Diplorickettsia, within
the family Coxiellaceae and the order Legionellales (Mediannikov, et al.,
2010). D. massiliensis strain 20B was identified in three patients with
suspected tick-borne infections who exhibited a specific seroconversion.
The infection was further confirmed by PCR assay, thus establishing the
role of D. massiliensis as a human pathogen. This article reports the
genome of D. massiliensis 20B, which contains 1,727,973 bp with a G+C
content of 38.9%. Compared with closely related gammaproteobacteria,
D. massiliensis (1.7 Mb) had a larger genome than Rickettsiella grylli
(1.4 Mb) but a smaller genome than Coxiella burnetii strain CbuK_Q154
(2.0 Mb). However, D. massiliensis had more metabolism-related genes
(501 genes) than Rickettsiella grylli (360) and Coxiella burnetii (459);
it also had more genes involved in energy production and conversion
(109 versus 75 and 84, respectively) and more genes involved in
translation, ribosomal structure and biogenesis (170 versus 134 and 135,
respectively).
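The comparison above can be made independent of genome size by dividing each quoted gene count by the corresponding genome size. The following is a minimal sketch and not part of the published analysis: it uses only the counts and sizes quoted in this preamble, the names `genomes`, `CATEGORIES`, `genes_per_mb` and `density` are illustrative, and it assumes that the Coxiella burnetii gene counts refer to the same strain (CbuK_Q154) as the quoted genome size.

```python
# Gene counts and genome sizes (in Mb) exactly as quoted in the preamble.
genomes = {
    "D. massiliensis 20B": {"size_mb": 1.7, "metabolism": 501, "energy": 109, "translation": 170},
    "R. grylli": {"size_mb": 1.4, "metabolism": 360, "energy": 75, "translation": 134},
    # Assumption: the counts belong to strain CbuK_Q154, whose size is quoted.
    "C. burnetii": {"size_mb": 2.0, "metabolism": 459, "energy": 84, "translation": 135},
}

CATEGORIES = ("metabolism", "energy", "translation")


def genes_per_mb(entry, category):
    """Number of genes in a functional category per megabase of genome."""
    return entry[category] / entry["size_mb"]


# Size-normalized gene density for every genome and category.
density = {
    name: {cat: genes_per_mb(entry, cat) for cat in CATEGORIES}
    for name, entry in genomes.items()
}

for name, values in density.items():
    print(name, {cat: round(v, 1) for cat, v in values.items()})
```

Under these assumptions, D. massiliensis has the highest gene density in all three categories, which is consistent with the observation that it carries more of these genes than C. burnetii despite its smaller genome.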
Chapter 4
Comparative genomics
4.1 Article 2 :
The genomic repertoire of Diplorickettsia massiliensis
reveals its allopatric lifestyle
Mano J. Mathew1, Laetitia Rouli1 and Didier Raoult1*
1 Unité de Recherche sur les Maladies Infectieuses et Tropicales
Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,
IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,
13005, Marseille, France.
Submitted to Biology Direct
*Corresponding author. E-mail: didier.raoult@gmail.com
Preamble to article 2
In this study, we used a pangenomic approach to identify strain-specific
genes as well as genomic differences and similarities between
Diplorickettsia massiliensis strain 20B and twenty-nine sequenced
genomes, including Legionella strains, Coxiella burnetii strains,
F. tularensis strains and R. grylli. We conducted a global pangenome
analysis with these thirty genomes as well as individual pangenome
analyses for Coxiella, Legionella and Francisella. The individual
pangenomes were built from five sequenced Coxiella burnetii reference
strains, ten sequenced L. pneumophila strains and twelve sequenced
F. tularensis strains, respectively. Another pangenome set was
constructed from the ten sequenced L. pneumophila strains and a single
L. longbeachae NSW150 strain. A single R. grylli genome and the
D. massiliensis strain 20B genome were also included in the global
pangenome set. We estimated the sizes of both the pangenomes and the core
genomes. Based on these pangenomes, we described the distribution of
functional genes and gene families across the different genomes analyzed,
and specifically characterized the D. massiliensis strain 20B genome.
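As an illustration of the quantities estimated here, a pangenome can be treated as the union of the gene families found across a set of genomes, and the core genome as their intersection. The sketch below is a toy example under that assumption. The strain names and gene-family identifiers are hypothetical, and it does not reproduce the clustering pipeline actually used in the study.

```python
def pangenome(genomes):
    """Union of the gene families found in any genome."""
    return set().union(*genomes.values())


def core_genome(genomes):
    """Gene families shared by every genome."""
    return set.intersection(*(set(g) for g in genomes.values()))


def dispensable_genome(genomes):
    """Gene families present in some genomes but not in all of them."""
    return pangenome(genomes) - core_genome(genomes)


def strain_specific(genomes, name):
    """Gene families found only in the genome called `name`."""
    others = set().union(*(g for n, g in genomes.items() if n != name))
    return set(genomes[name]) - others


# Hypothetical presence lists: strain name -> set of gene-family identifiers.
toy_genomes = {
    "strain_A": {"fam1", "fam2", "fam3", "fam4"},
    "strain_B": {"fam1", "fam2", "fam5"},
    "strain_C": {"fam1", "fam2", "fam3", "fam6"},
}

pan = pangenome(toy_genomes)
core = core_genome(toy_genomes)
dispensable = dispensable_genome(toy_genomes)

print("pangenome size:", len(pan))                              # 6
print("core genome:", sorted(core))                             # ['fam1', 'fam2']
print("dispensable genome size:", len(dispensable))             # 4
print("specific to strain_A:", sorted(strain_specific(toy_genomes, "strain_A")))  # ['fam4']
```

In practice the gene families would first have to be defined by clustering orthologous proteins. That step is outside the scope of this sketch.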
Title: The genomic repertoire of Diplorickettsia massiliensis reveals its
allopatric lifestyle
Running title: The genomic repertoire of Diplorickettsia massiliensis
reveals its allopatric lifestyle
Mano J. Mathew1, Laetitia Rouli1 and Didier Raoult1*
1 Unité de Recherche sur les Maladies Infectieuses et Tropicales
Emergentes: URMITE, Aix Marseille Université, UMR CNRS 7278,
IRD 198, INSERM 1095, Faculté de Médecine, 27 Bd Jean Moulin,
13005, Marseille, France.
Submitted to Biology Direct
*Corresponding author. E-mail: didier.raoult@gmail.com
Abstract
Background
Diplorickettsia massiliensis strain 20B is an obligate intracellular,
Gram-negative bacterium isolated from Ixodes ricinus ticks collected in
Slovakia. In this study, we compared the genomic features of
D. massiliensis strain 20B with twenty-nine sequenced gammaproteobacterial
genomes (Legionella strains, Coxiella burnetii strains, Francisella
tularensis strains and Rickettsiella grylli) using a multi-genus
pangenomic approach.
Results
Using phylogenomic analysis, we found that D. massiliensis shares 635
genes with Rickettsiella grylli and clusters with Coxiella burnetii. We
identified 908 genes (61.56%) in common with Gammaproteobacteria
that constitute the core genome of D. massiliensis and 518 genes
(35.12%) that represent the dispensable genome. We also identified a link
between total gene content and different bacterial lifestyles. We
observed that fewer genes and a lower G+C content correlated with a
smaller genome size and helped the bacteria to adapt to the host.
Because of the reduced genomic repertoire, we speculate that fewer
lateral gene transfers have occurred in D. massiliensis. A pangenomic
approach allowed us to explore the different strategies by which
facultative or obligate intracellular organisms specialize to a
particular host.
Conclusion
These results significantly contribute to our understanding of genome
repertoires. This approach can be used to uncover interesting genomic
features that cannot be predicted using conventional methods.
Moreover, the variability that we identified between the L. pneumophila
strains and L. longbeachae NSW150 may warrant re-classifying them as
separate subspecies.
Keywords: Genome repertoire; pangenome; Diplorickettsia; allopatric;
comparative genomics
Background
The order Legionellales comprises several pathogenic, aerobic,
motile, nutritionally fastidious and pleomorphic gram-negative bacteria
from the class Gammaproteobacteria. It is
composed of two families: Legionellaceae and Coxiellaceae. Many species
of Legionella cause legionellosis. The family Coxiellaceae consists of
Aquicella, Coxiella (an intracellular bacterium that is the causative agent
of Q fever) [1], Diplorickettsia and Rickettsiella (an intracellular parasite of
Gryllus bimaculatus) [2, 3]. Almost all bacteria isolated from ticks (Ixodes
ricinus) are pathogenic for humans, notably Borrelia burgdorferi, Borrelia
afzelii, Borrelia garinii, Rickettsia helvetica, Rickettsia monacensis and
Francisella tularensis [4]. F. tularensis, which causes tularemia or plague-
like disease, belongs to the Thiotrichales order [5].
D. massiliensis strain 20B is an obligate intracellular, gram-negative
bacterium isolated from Ixodes ricinus ticks collected in 2006 from the
southeastern part of the Rovinka forest in Slovakia [3]. D. massiliensis
belongs to the Gammaproteobacteria class, is non-endospore-forming,
and is shaped as small rods that are usually grouped in pairs. An initial
phylogenetic analysis based on 16S rRNA showed that D. massiliensis
clustered with Rickettsiella grylli [2, 3]. Because of its low 16S rDNA
similarity (94%) with R. grylli, it was classified as a new genus
Diplorickettsia into the family Coxiellaceae and the order Legionellales [3].
D. massiliensis strain 20B was identified in three patients with suspected
tick-borne infections who exhibited a specific seroconversion. The
evidence of infection was further confirmed by PCR assay, thus
establishing its role as a human pathogen. Whole genome sequencing
was performed at a later date [6, 7].
Recent advances in next generation sequencing techniques have led
to the initiation of large-scale microbial genome projects [8]. Comparative
genomics studies use conventional non-sequence-based technologies,
such as microarrays targeting genes or non-coding regions, as well as studies of
specific pathways and whole genome sequence alignment [9]. Bacterial
strains from the same species may exhibit variations in their genetic
repertoire, with differences in both genomic structure and sequence
between strains, reflecting the extraordinary adaptability of prokaryotic
species. Thus, sequencing a single genome per species is often
insufficient for describing the genetic variability of the species. This led to
the concept of a pangenomic approach, which takes into account the
genetic makeup of a bacterial species and its genomic diversity from
genus to genus. The pangenome of a bacterial species is larger than the
total gene content of any individual strain within the species.
The pangenome is composed of three parts: the core genome
(genes shared by all of the strains), the accessory or dispensable genome
(shared by only some of the strains) and unique genes (strain-specific)
[10]. The accessory genome can reveal evidence of lateral gene transfer
events that occurred during the evolutionary history of a strain and likely
contributed to the evolutionary potential of the organism. Furthermore, a
distinction can be made between closed pangenomes and open
pangenomes. A pangenome is closed when, despite the addition of new
genomes, the gene content remains unchanged, such as in Bacillus
anthracis [9, 10]. In contrast, a pangenome is open, as in the case of
Escherichia coli [11], if the gene pool increases with the addition of new
genomes. Pangenome studies can reveal changes that are not easily
detectable using standard annotation analysis [12]. For example,
pangenome studies have facilitated the identification of strain-specific
genes in L. pneumophila. The L. pneumophila dispensable genome,
acquired by horizontal gene transfer, may act as a reservoir that could
confer evolutionary advantages over strains that lack this gene pool [13].
These microbial pathogens exhibit a striking ability to adapt to new hosts,
antibiotics, and host immune systems [14].
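The open/closed distinction described above can be illustrated with a minimal sketch. The strain names and gene-family identifiers below are invented toy data, not values from this study, and the function name `accumulate` is hypothetical. It simply tracks the pangenome size (union of gene families) and the core genome size (intersection) as genomes are added in a fixed order; a pangenome size that keeps growing suggests an open pangenome, whereas a plateau suggests a closed one.

```python
from typing import Dict, List, Set, Tuple


def accumulate(genomes: Dict[str, Set[str]], order: List[str]) -> List[Tuple[int, int]]:
    """Return (pangenome size, core genome size) after each genome is added."""
    pan: Set[str] = set()
    core: Set[str] = set()
    curve: List[Tuple[int, int]] = []
    for i, name in enumerate(order):
        families = genomes[name]
        pan |= families  # pangenome: every family seen so far
        # core genome: families present in every genome seen so far
        core = set(families) if i == 0 else core & families
        curve.append((len(pan), len(core)))
    return curve


# Hypothetical toy data: three strains and five gene families.
toy_genomes = {
    "strain_A": {"f1", "f2", "f3", "f4"},
    "strain_B": {"f1", "f2", "f3", "f5"},
    "strain_C": {"f1", "f2", "f4", "f5"},
}
curve = accumulate(toy_genomes, ["strain_A", "strain_B", "strain_C"])
for n, (pan_size, core_size) in enumerate(curve, start=1):
    print(f"{n} genome(s): pangenome={pan_size}, core={core_size}")
```

In practice the order in which genomes are added affects the curve, so such curves are usually averaged over many random orderings; that refinement is omitted here.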
In this study, we used a pangenomic approach to elucidate strain-
specific genes as well as genomic differences and similarities between
D. massiliensis strain 20B and twenty-nine sequenced species, including
Legionella strains, Coxiella burnetii strains, F. tularensis strains and
R. grylli. We conducted a global pangenome analysis with these thirty
genomes as well as individual pangenome sets belonging to Coxiella,
Legionella and Francisella. Individual pangenomes were constructed from
five sequenced C. burnetii reference strains, from ten sequenced
L. pneumophila strains and from twelve sequenced
F. tularensis strains. Another pangenome set was constructed from the ten
sequenced L. pneumophila strains and a single L. longbeachae NSW150
strain. A single R. grylli genome and the D. massiliensis strain 20B genome
were also included in the global thirty-genome pangenome set. We
estimated the sizes of both the pangenome and the core genomes. Based
on these pangenomes, we described the distribution of functional genes
and gene families across the different genomes analyzed, and specifically
characterized the D. massiliensis strain 20B genome.
Results
Comparison of genomic features
The main features of the genomes analyzed here are summarized in
Table 1. The chromosomes from the thirty genomes compared in this
study range in size from 1.6 to 4.15 Mb and have a G+C content ranging
from 37.1 to 42.6%. The R. grylli genome is smaller than the
D. massiliensis strain 20B and F. tularensis genomes (1.6, 1.7 and 1.9 Mb,
respectively), and the L. longbeachae strain NSW150 genome (4.1 Mb) is
larger than those of other L. pneumophila strains. The number of protein
coding genes per genome within the various strains and species is
relatively similar, but the gene composition is much more variable. The
distribution of proteins by length among the organisms is shown in Figure
1. We compared the genomic and proteomic repertoires based on protein
length, genome size, and G+C content and found that D. massiliensis
strain 20B fell between the two groups (the first group being C. burnetii
and F. tularensis while the second group is L. pneumophila) but closer to
the former, which has more pathogenic proteins. The coding density of
these genomes ranges from 71.29% to 90.86%. In the Legionella species,
coding regions account for more than 85% of the genome. The number of
proteins associated with D. massiliensis strain 20B is much larger than in
R. grylli, F. tularensis and C. burnetii. Figure 2 summarizes the distribution
of G+C content (%) and genome size (Mb). The Kyoto Encyclopedia of
Genes and Genomes (KEGG) characteristics of the organisms are
summarized in Figure 3.
Pangenome analysis
Figure 4 summarizes the results from the individual pangenomes of
C. burnetii, Legionella, L. pneumophila, F. tularensis as well as the set of
all thirty genomes analyzed. These genomes are characterized by a fairly
high number of hypothetical proteins, for which annotation is still
incomplete. Genes belonging to the core and dispensable genomes have
been classified according to their predicted function based on COG and
KEGG categories for the respective pangenomes (Additional file 1). The
C. burnetii pangenome is closed, as we found a finite number of gene
clusters. The L. pneumophila pangenome is open (unlimited) because the
number of pangenome clusters and core genome clusters changed
depending on how many different genomes were included in the analysis.
The F. tularensis pangenome was on the borderline between being
considered an open or closed genome (Additional file 2).
The Coxiella burnetii pangenome
The C. burnetii pangenome consists of 6,871 CDS with 1,080 core genes
(92.04 %) and 491 dispensable genes (7.15 %) (Additional file 3). A total of
56 genes were specific to the C. burnetii CbuG_Q212 (6), C. burnetii
CbuK_Q154 (6), C. burnetii Dugway 5J108-111 (34), C. burnetii RSA 331 (9)
and C. burnetii RSA 493 (1) genomes. Notably, 70 out of these
491 accessory genes (14.25%) were hypothetical proteins. Of the 1,080
genes belonging to the core genome, 956 (88.6%) were attributed to a
COG category, and 510 (47.3%) were attributed to a KEGG category. In
the case of the 491 dispensable genes, 421 (85.7%) were assigned to a
COG category, and 185 (37.6%) were assigned to a KEGG category.
Using the COG database, we identified minor differences between the
compartments in the defense mechanisms (V) and intracellular
trafficking, secretion and vesicular transport (U) categories. Using the
KEGG database, we found that C. burnetii Dugway 5J108-111 contains a
greater number of CDSs involved in environmental information
processing and metabolism than other strains. The core genome
represented 92% of the pangenome (Additional file 2), showing again the
high rate of conservation.
The Legionella pangenome
The Legionella pangenome consists of 23,736 CDSs with 1,410 core genes
(82.44 %) and a dispensable genome of 3,791 CDSs (15.97 %) (Additional
file 3). A total of 378 genes were specific to the L. pneumophila str. Lens
(14), L. pneumophila str. Paris (20), L. pneumophila 2300/99 Alcoy (7), L.
pneumophila subsp. pneumophila HL06041035 (21), L. pneumophila
subsp. pneumophila str. Lorraine (8), L. pneumophila str. Corby (6), L.
pneumophila subsp. pneumophila str. Philadelphia 1 (3), L. pneumophila
subsp. pneumophila LPE509 (1) and L. pneumophila subsp. pneumophila
str. Thunder Bay (3) genomes. Of these 378 unique genes, 295 (78.04 %)
were present in L. longbeachae strain NSW150. Of the 1,410 genes
belonging to the core genome, 1,316 (93.4%) were attributed to a COG
category and 688 (48.8%) were attributed to a KEGG category. In the case
of the 3,791 dispensable genes, 3,273 (86.3%) were attributed to a COG
category and 1,464 (38.6%) were attributed to a KEGG category.
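As a quick arithmetic check on the unique-gene counts reported in this paragraph, the short sketch below sums the strain-specific counts listed for the L. pneumophila strains and adds the 295 genes reported for L. longbeachae strain NSW150. Reading the 378 total as the sum of these two groups is an assumption, although it is consistent with the figures given; the variable names are illustrative only.

```python
# Strain-specific gene counts for the L. pneumophila strains, as listed in the text.
strain_specific = {
    "Lens": 14,
    "Paris": 20,
    "2300/99 Alcoy": 7,
    "HL06041035": 21,
    "Lorraine": 8,
    "Corby": 6,
    "Philadelphia 1": 3,
    "LPE509": 1,
    "Thunder Bay": 3,
}
longbeachae_unique = 295  # unique genes reported for L. longbeachae NSW150
total_unique = 378        # total unique genes reported for this pangenome

pneumophila_sum = sum(strain_specific.values())
combined = pneumophila_sum + longbeachae_unique
# Share of the unique genes found in L. longbeachae, in percent.
longbeachae_share = round(100 * longbeachae_unique / total_unique, 2)

print(pneumophila_sum, combined, longbeachae_share)
```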
We observed several differences in the CDSs from the Cell
wall/membrane/envelope biogenesis (M) COG category. Legionella has
a greater number of CDSs involved in membrane transport and signal
transduction (based on KEGG categories), which is associated with
environmental information processing. In particular, L. longbeachae strain
NSW150 has a greater number of genes associated with energy
production and conservation (C), signal transduction (T), and defense
mechanisms (V) and fewer genes related to cell motility (N), based on
COG categories. Significant differences were observed in the number of
CDSs associated with cellular processes, particularly flagellar assembly,
which is important for cell motility and carbohydrate and energy
metabolism.
The Legionella pneumophila pangenome
The L. pneumophila pangenome consists of 21,459 CDSs with a core
genome of 1,572 genes (90.71 %) and a dispensable genome of 1,881
CDSs (8.77 %) (Additional file 3). A total of 112 genes were specific to the
L. pneumophila str. Lens (20), L. pneumophila str. Paris (27),
L. pneumophila 2300/99 Alcoy (7), L. pneumophila subsp. pneumophila
HL06041035 (26), L. pneumophila subsp. pneumophila str. Lorraine (15),
L. pneumophila str. Corby (9), L. pneumophila subsp. pneumophila str.
Philadelphia 1 (4), L. pneumophila subsp. pneumophila LPE509 (1) and
L. pneumophila subsp. pneumophila str. Thunder Bay (3) genomes. Of the
1,572 genes belonging to the core genome, 1,465 (93.2 %) were
attributed to a COG category, and 760 (48.4 %) were attributed to a KEGG
category. In the case of the 1,881 dispensable genes, 1,524 (81%) were
attributed to a COG category, and 661 (35.14 %) were attributed to a
KEGG category.
We identified differences in the cell wall/membrane/envelope biogenesis
(M) category based on the CDSs for which a function could be identified
using the COG database. We found that a greater number of CDSs are
involved in signal transduction (the bacterial secretion system and the
two-component system, which are associated with environmental
information processing) and translation (ribosomal elements that are
associated with genetic information processing). We did not observe any
differences in the cellular processes category.
The Francisella tularensis pangenome
The F. tularensis pangenome consists of 16,596 CDSs with a core of 1,010
genes (86.05 %) and a dispensable genome of 2,297 CDSs (13.84 %)
(Additional file 3). A total of 18 genes were specific to the F. tularensis
subsp. holarctica F92 (1), F. tularensis subsp. holarctica LVS (1),
F. tularensis subsp. holarctica OSU18 (4), F. tularensis subsp. mediasiatica
FSC147 (3), F. tularensis subsp. tularensis NE061598 (4) and F. tularensis
subsp. tularensis WY96-3418 (5) genomes. Of the 1,010 genes belonging
to the core genome, 775 (76.8 %) were attributed to a COG category, and
415 (41.1 %) were attributed to a KEGG category. In the case of the 2,297
dispensable genes, 1,881 (81.8 %) were attributed to a COG category, and
886 (38.5 %) were attributed to a KEGG category.
We observed a greater number of CDSs involved in information storage and
processing (translation, ribosomal structure and biogenesis (J); and
replication, recombination and repair (L)) and metabolism (amino acid
transport and metabolism (E), carbohydrate transport and metabolism (G) and
inorganic ion transport and metabolism (P)). We found that F. tularensis
subsp. holarctica F92, F. tularensis subsp. holarctica LVS and F. tularensis
subsp. holarctica FTNF002-00 have a greater number of CDSs involved in
replication, recombination and repair (L) compared to other F. tularensis
genomes.
The Gammaproteobacteria pangenome
The Gammaproteobacteria pangenome consists of 49,833 CDS with a
core of 627 genes (47.16 %) and a dispensable genome of 25,933 genes
(52.04 %) (Additional file 4, Figure 5). The organisms that share the
greatest number of core genes are as follows: 618 out of 627 in Legionella
strains, 617 genes in L. pneumophila strains, 578 genes in F. tularensis
strains and 580 genes in C. burnetii strains. The organisms that share the
greatest number of dispensable genes are as follows: 13,640 out of
25,933 in Legionella strains (52.6 %), 12,458 in L. pneumophila strains
(48.04 %), 8,048 in F. tularensis strains (31.03 %) and 3,268 in C. burnetii
strains (12.6 %). A total of 400 genes were specific to the C. burnetii (28),
F. tularensis (9), Legionella (272), L. pneumophila (62), R. grylli (42) and D.
massiliensis strain 20B (49) genomes. Of the 627 genes belonging to the
core genome, 594 (94.8 %) were attributed to a COG category, and 342
(54.6 %) were attributed to a KEGG category. Among the 25,933
dispensable genes, 22,402 (86.4 %) were attributed to a COG category,
and 10,484 (40.4 %) were attributed to a KEGG category.
In the core genome, we observed differences in the number of CDSs
involved in metabolism (based on COG categories), namely in energy
production and conversion (C), amino acid transport and metabolism (E) and
coenzyme transport and metabolism (H). In the dispensable genome, the
greatest number of CDSs was associated with amino acid transport and
metabolism (E). A similar functional distribution was found in the set of
dispensable genes based on KEGG categories, in that a greater number of
CDSs were associated with metabolism categories but a smaller number
were associated with folding, sorting and degradation, glycan
biosynthesis and metabolism, replication and repair, and translation.
By analyzing 1,475 genes from D. massiliensis strain 20B using OrthoMCL,
we identified a core genome of 908 genes (61.56 %) and a dispensable
genome of 518 genes (35.12 %). The majority of the genes in the core
genome were associated with COG categories contributing to metabolism
(energy production and conversion (C) and coenzyme transport and
metabolism (H)) and information storage and processing (translation,
ribosomal structure and biogenesis (J); and replication, recombination
and repair (L)). Based on KEGG category assignments, a greater number of
CDSs in the core genome were associated with translation, cofactor and
vitamin metabolism, nucleotide metabolism and carbohydrate
metabolism. Genes associated with amino acid metabolism and
carbohydrate metabolism were highly represented among the
dispensable genes. Of the 49 unique genes identified, 15 encoded
hypothetical proteins. Some specific genes were identified, including
PhoPQ-activated pathogenicity-related protein, dehydrogenases, SAM-
dependent methyltransferases, galactose mutarotase and others
(Additional file 5). Based on KEGG categories, these unique genes were
associated with metabolism, environmental information processing,
genetic information processing, two-component systems and sulfur relay
systems.
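The partition of the D. massiliensis strain 20B gene set reported above can be checked with a few lines of arithmetic. The sketch assumes that the 49 unique genes account for the remainder of the 1,475 genes analyzed, which is consistent with the reported totals; the variable names are illustrative only.

```python
total_genes = 1475  # D. massiliensis strain 20B genes analyzed with OrthoMCL
partition = {"core": 908, "dispensable": 518, "unique": 49}

# The three compartments should add up to the full gene set.
partition_sum = sum(partition.values())

# Percentage of the gene set in each compartment, rounded to two decimals.
percent = {name: round(100 * count / total_genes, 2) for name, count in partition.items()}

print(partition_sum, percent)
```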
Phylogenomic analysis
A phylogenomic tree constructed based on gene content (i.e., the
presence or absence of protein-coding genes, as predicted by COG and
KEGG) showed different genome clustering than a whole genome tree
(Figure 6). In the phylogenomic tree constructed based on COG
classification, D. massiliensis strain 20B clustered with R. grylli and
clustered closely with C. burnetii strains. In contrast, in the tree
constructed based on KEGG classification, R. grylli formed a cluster with
the C. burnetii strains, and D. massiliensis strain 20B was not included in
this cluster. Based on all genes associated with cellular processes as
determined by KEGG classification, D. massiliensis strain 20B clustered
with four C. burnetii strains (C. burnetii CbuG_Q212, C. burnetii
CbuK_Q154, C. burnetii Dugway 5J108-111 and C. burnetii RSA 493).
Based on an analysis of COG categories, D. massiliensis strain 20B and
R. grylli clustered closely with C. burnetii strains, with the exception of
five COG categories. For cell cycle control, cell division, and chromosome
partitioning (D), nucleotide transport and metabolism (F), coenzyme transport
and metabolism (H), lipid transport and metabolism (I) and secondary metabolite
biosynthesis, transport and catabolism (Q), D. massiliensis strain 20B and
R. grylli clustered with the F. tularensis strains.
Discussion
Pangenomic studies were described by Tettelin et al. in 2005 [15]. These
types of studies analyze bacterial species in detail using different criteria
and can determine whether the nature of the pangenome is open or
closed. C. burnetii, an obligate intracellular bacterium [1], has a closed
pangenome with a core/pangenome ratio of 92% (Additional file 2) and a
relatively constant set of core genes [30]. Another example of a
gammaproteobacterium with a closed pangenome is Buchnera aphidicola
[16], which has a core/pangenome ratio of 98%. In this study we analyzed
the facultative intracellular bacteria L. pneumophila and F. tularensis,
which have core/pangenome ratios of 82% and 87%, respectively.
Although their ratios were very close to the threshold of 89%, both of
these bacteria can be considered to have open pangenomes, unlike the
E. coli pangenome, which is infinite [11].
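The open/closed decision described in this paragraph amounts to comparing the core/pangenome ratio with the 89% threshold. The sketch below applies that rule to the ratios quoted in the text. The function name `pangenome_status` is hypothetical, and treating a ratio exactly equal to the threshold as closed is an assumption, since the text does not specify the boundary case.

```python
CLOSED_THRESHOLD = 0.89  # threshold quoted in the text


def pangenome_status(core_fraction: float, threshold: float = CLOSED_THRESHOLD) -> str:
    """Label a pangenome 'closed' if its core/pangenome ratio reaches the threshold."""
    if not 0.0 <= core_fraction <= 1.0:
        raise ValueError("core_fraction must lie between 0 and 1")
    # Assumption: a ratio exactly at the threshold is counted as closed.
    return "closed" if core_fraction >= threshold else "open"


# Core/pangenome ratios as quoted in the Discussion.
ratios = {
    "C. burnetii": 0.92,
    "B. aphidicola": 0.98,
    "L. pneumophila": 0.82,
    "F. tularensis": 0.87,
}
status = {name: pangenome_status(ratio) for name, ratio in ratios.items()}
for name, label in status.items():
    print(f"{name}: {label}")
```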
Our results show that the combined pangenome of D. massiliensis is
composed of 23,500 (47.1%) core genes, 13,399 (57%) genes shared by
Legionella and C. burnetii, 12,114 (51.5%) genes shared by C. burnetii and
F. tularensis, and 18,363 (78.1%) genes shared by Legionella and
F. tularensis. Moreover, based on the phylogenomic trees we
constructed, we conclude that D. massiliensis is more closely related to
R. grylli than to C. burnetii. D. massiliensis and R. grylli shared 635 genes
and clustered more often with C. burnetii. These results are in agreement
with Pearson et al. [17], who showed that R. grylli is one of the closest
known neighbors of C. burnetii. We also observed differences in lifestyle
among the species analyzed in this study.
Pangenomic studies elucidate the link between gene content and
bacterial lifestyles. An allopatric lifestyle is defined by a narrow ecological
niche with restricted opportunities for acquiring DNA from other
organisms. An allopatric lifestyle can be associated with genome
reduction, especially in pathogens that have smaller genomic repertoires
compared to less specialized bacteria [18], smaller pangenomes and
smaller mobilomes. In contrast, a sympatric lifestyle is associated with
larger genomes, larger pangenomes, a larger mobilome and more
frequent genomic exchanges with other bacteria. Moliner et al. [19, 20]
described two different types of intracellular lifestyles: allopatric bacteria
that are strictly intracellular bacteria and therefore live in narrow niches,
and sympatric bacteria such as Legionella spp. that live in amoebas where
DNA exchange can take place [19, 20]. The authors noted that
intracellular bacteria living in amoebas generally have a larger genome,
whereas other intracellular pathogens suffer from massive gene loss due
to specialization. D. massiliensis, R. grylli, C. burnetii and F. tularensis have
smaller genomes and exhibit losses of function compared to Legionella
species.
Based on the comparison of G+C content and genome size and previous
work by Merhej et al., we identified three distinct lifestyles:
D. massiliensis and R. grylli are extremely allopatric, C. burnetii and
F. tularensis are allopatric and have very little interaction with other
organisms, and Legionella are sympatric, as they live in amoebas. In
addition, we compared the gene losses and gains (based on COG
functional analysis) in the genomes analyzed in this study to those
analyzed by Merhej et al. [21]. We found that the more specialized a
bacterium is, the more genes it has related to transcriptional regulation
(K), defense mechanisms (V), inorganic ions (P), amino acid metabolism
(E) and fewer genes related to translation (J). For all of these categories, we
observed a considerable difference between Legionella spp. (more
pronounced in L. longbeachae than in L. pneumophila strains) and the
other bacteria. Moreover, for each of these categories, D. massiliensis
and Rickettsiella grylli have fewer genes with an assigned COG function.
Based on KEGG classification, we also found that D. massiliensis and
Rickettsiella grylli show immense losses of genes related to amino acid
metabolism. These results are in agreement with those obtained by
Merhej et al. These results allowed us to divide the species analyzed in
this study into three categories based on lifestyle. D. massiliensis and
R. grylli are extremely allopatric species with fewer functional genes (as
classified by COG), including a high loss of amino acid metabolism genes,
and less severe loss in genes related to translation and transcription. The
intermediate allopatric bacteria, C. burnetii and F. tularensis, have more
functional genes compared to the extremely allopatric species. Sympatric
bacteria such as Legionella, especially L. longbeachae, possess the
greatest number of functional genes (as classified by COG and KEGG)
compared to the other species analyzed in this study.
A comparative genomics-based analysis of free-living and host-
dependent bacteria showed that intracellular bacteria contain fewer
rRNA genes [21]. These bacterial genomes contained more split rRNA
operons and fewer transcriptional regulators than other bacteria, which
was linked to slower growth rates that are adaptive for their ecological
niche [21]. The deletion or inactivation of certain genes renders several
intracellular pathogens such as Shigella, Salmonella, and F. tularensis
pathogenic. These genes are referred to as antivirulence genes [22]. A
recent study in B. birtlesii identified a deletion in one of the two rRNA
operons and disrupted genes associated with translation that are
important for specialization to a specific niche [23]. The number of
activated genes in a restricted environment is much lower than in a
changing environment, as genes involved in translation are not expressed
extensively [23]. If bacteria do not typically express ribosomal operons in
their respective environments, then these operons are subject to loss
[23]. Bacterial specialization involves a striking degree of gene loss,
including decreased gene numbers, changes in G+C content and
decreased numbers of both incomplete and intact ribosomal operons [21,
24, 25]. Restricting translation is critical for specialization, as speciation is
often correlated with ribosomal operon inactivation [21, 23] and gene
inactivation.
Conclusion
This study of intracellular Gammaproteobacteria has contributed to our
understanding of bacterial specialization based on the ecological niche.
The genome size and gene content of the bacteria are associated with
lifestyle. A smaller number of genes and a relatively low G+C content
were observed in the genomes analyzed here, similar to other studies of
intracellular bacteria [18]. Gene loss resulting in a smaller genome size
has been a driving force in the adaptation of these bacteria to their hosts.
Due to the reduction in the genomic repertoire, we speculate that fewer
lateral gene transfers occur in D. massiliensis compared to other
intracellular bacteria [26]. We used a multi-genus pangenomic approach
to characterize the genomic repertoire of representative strains and
compare the distribution of genes in D. massiliensis strain 20B with other
genomes. We found that the majority of the genes in D. massiliensis strain
20B were shared with other gammaproteobacteria. A pangenomic
approach facilitates the exploration of different strategies by which
facultative or obligate intracellular bacteria adapt to particular hosts and
contributes significantly to our understanding of genome repertoires. This
approach can be used to uncover unique genomic features that cannot be
predicted by conventional methods. Moreover, our results suggest that
the Legionella strains could be re-classified based on their genomic
variability.
Methods
Determination of genomic data
For the genomic comparison, we used thirty sequenced species including
five C. burnetii strains, ten L. pneumophila strains, L. longbeachae strain
NSW150 [27], twelve F. tularensis strains, Rickettsiella grylli and
D. massiliensis strain 20B from the Gammaproteobacteria class. The
information related to genome properties (genome size, coding regions,
G+C content, total number of genes, RNA-coding genes, protein-coding
genes, genes with a predicted function, genes assigned to Clusters of
Orthologous Groups of proteins (COGs), genes with peptide signals and
genes with transmembrane helices) was retrieved from NCBI
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria/) and IMG/ER [28]
(https://img.jgi.doe.gov) (Table 1). Open reading frames (ORFs) were
predicted for the draft genome using Prodigal [29] with default
parameters, but the predicted ORFs were excluded if they spanned a
sequencing gap. The predicted bacterial protein sequences were searched
against the GenBank database [30] and the Clusters of Orthologous
Groups (COG) database using BLASTP (E-value 10⁻⁵ and coverage ≥ 70%).
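The hit-filtering step can be sketched as follows. This is not the authors' script: the `Hit` record and `filter_hits` function are hypothetical, coverage is assumed to mean aligned length divided by query length, and the E-value cutoff is assumed to be inclusive. BLAST itself is not called; the hits are invented examples.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    """One BLASTP hit (hypothetical in-memory representation)."""
    query_id: str
    subject_id: str
    evalue: float
    align_len: int  # aligned length on the query, in residues
    query_len: int  # full length of the query protein, in residues


def filter_hits(hits: Iterable[Hit], max_evalue: float = 1e-5,
                min_coverage: float = 0.70) -> List[Hit]:
    """Keep hits that pass both the E-value and the query-coverage thresholds."""
    kept: List[Hit] = []
    for hit in hits:
        if hit.query_len <= 0:
            continue  # skip malformed records
        coverage = hit.align_len / hit.query_len  # assumed coverage definition
        if hit.evalue <= max_evalue and coverage >= min_coverage:
            kept.append(hit)
    return kept


# Invented example hits.
example_hits = [
    Hit("q1", "s1", 1e-30, 180, 200),  # 90% coverage, low E-value
    Hit("q1", "s2", 1e-3, 190, 200),   # E-value above the cutoff
    Hit("q2", "s3", 1e-12, 60, 200),   # only 30% coverage
]
kept = filter_hits(example_hits)
print([hit.subject_id for hit in kept])
```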
Pangenome analysis
All of the CDSs from each genome were pooled together and clustered
using OrthoMCL [32] with the following parameters: an overlap of at
least 70% and a minimum of 80% similarity. Only protein sequences
longer than 50 amino acids were considered for further analysis.
Homologous sequences were selected using the all-against-all BLASTp
algorithm with an E-value of less than 0.00001. Then, the clustering of
orthologous sequences was analyzed using the Markov Cluster algorithm,
which is based on probability and graph flow theory and allows the
simultaneous classification of global relationships in a similarity space
[32]. An inflation index of 1.5 was used to regulate cluster tightness
(granularity), and the resulting clustered ortholog groups were analyzed
further. Several Perl/Python scripts were written in our laboratory for
large-scale data handling, namely for the calculation of the core set (shared
among all strains), the dispensable set (shared by at least two strains) and
the unique set (organism-specific) of genes from the OrthoMCL results.
Functional annotation was derived using WebMGA [33] against the
Cluster of Orthologous Groups [34] and the Kyoto Encyclopedia of Genes
and Genomes [35].
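The core, dispensable and unique sets defined above can be derived from ortholog clusters along the following lines. This is a minimal sketch, not the laboratory scripts themselves: the function name `classify_clusters`, the in-memory cluster representation and the toy genome and gene names are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple


def classify_clusters(
    clusters: Dict[str, Iterable[Tuple[str, str]]],
    genomes: Set[str],
) -> Dict[str, List[str]]:
    """Assign each ortholog cluster to the core, dispensable or unique set.

    `clusters` maps a cluster id to (genome, gene) members.
    """
    result: Dict[str, List[str]] = {"core": [], "dispensable": [], "unique": []}
    for cluster_id, members in clusters.items():
        present = {genome for genome, _gene in members}
        if not present:
            continue  # ignore empty clusters
        if not present <= genomes:
            raise ValueError(f"{cluster_id} contains an unknown genome")
        if present == genomes:
            result["core"].append(cluster_id)         # shared among all strains
        elif len(present) >= 2:
            result["dispensable"].append(cluster_id)  # shared by at least two
        else:
            result["unique"].append(cluster_id)       # organism-specific
    return result


# Hypothetical toy input: three genomes, four clusters.
toy_genomes = {"G1", "G2", "G3"}
toy_clusters = {
    "c1": [("G1", "g1_001"), ("G2", "g2_001"), ("G3", "g3_001")],
    "c2": [("G1", "g1_002"), ("G2", "g2_002")],
    "c3": [("G3", "g3_002"), ("G3", "g3_003")],  # paralogs within one genome
    "c4": [("G2", "g2_003")],
}
sets = classify_clusters(toy_clusters, toy_genomes)
print(sets)
```

Here a cluster made up only of paralogs from a single genome is counted as unique, which is one possible reading of "organism-specific".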
Genome alignment and gene content-based phylogenomics
Using MAUVE [36], the backbone output file generated after global
genome alignment was used to calculate the composition of core
distribution depending on the pangenome size [37]. This
core/pangenome ratio is used to determine whether a pangenome is open or
closed. The gene content of the genomes was classified based on twenty-
five functional COG categories and was used to construct phylogenomic
trees. The gene content was converted to a matrix of discrete binary
characters ("0" and "1" for absence and presence, respectively) [38] and
used to construct the matrix for Euclidean distances between pairs of
points. The MEV (MultiExperiment Viewer) [39] was used to represent the
results visually. The G+C content and COG data were compared with
previous work performed by Merhej et al. [21].
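The conversion of gene content into a binary presence/absence matrix and the computation of pairwise Euclidean distances can be sketched as follows. The genome names and functional-category identifiers are placeholders, the function names are hypothetical, and the tree-building and visualization steps (performed with MEV in this study) are not reproduced.

```python
import math
from typing import Dict, List, Set, Tuple


def presence_matrix(gene_content: Dict[str, Set[str]]) -> Tuple[List[str], Dict[str, List[int]]]:
    """Return the sorted feature list and a 0/1 row per genome."""
    features = sorted(set().union(*gene_content.values()))
    rows = {
        genome: [1 if feature in content else 0 for feature in features]
        for genome, content in gene_content.items()
    }
    return features, rows


def euclidean(u: List[int], v: List[int]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_table(rows: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Euclidean distances between all genomes."""
    names = sorted(rows)
    return {(a, b): euclidean(rows[a], rows[b]) for a in names for b in names}


# Placeholder gene content: which functional families each genome carries.
toy_content = {
    "genome_X": {"fam1", "fam2", "fam3"},
    "genome_Y": {"fam1", "fam2"},
    "genome_Z": {"fam3", "fam4"},
}
features, rows = presence_matrix(toy_content)
dist = distance_table(rows)
print(features)
print(rows)
```

For 0/1 vectors the squared Euclidean distance equals the number of features on which two genomes differ, so genomes with similar gene content end up close together in the resulting distance matrix.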
List of abbreviations
PCR: polymerase chain reaction; KEGG: Kyoto Encyclopedia of Genes and
Genomes; COG: clusters of orthologous groups; CDS: coding sequences
Competing interests and funding
The authors declare that they have no competing interests.
Authors' contributions
DR designed the research project. MJM performed the genomic analysis.
MJM and DR analyzed the data. MJM and LR wrote the paper. DR revised
the paper. All authors read and approved the final version.
Acknowledgements
We would like to thank Roshan Padmanabhan for technical support,
suggestions, corrections and Ripsy Merrin Chacko for helpful remarks.
References
1. Beare PA, Unsworth N, Andoh M, Voth DE, Omsland A, Gilk SD, Williams KP,
Sobral BW, Kupko JJ, 3rd, Porcella SF, et al: Comparative genomics reveal
extensive transposon-mediated genomic plasticity and diversity among
potential effector proteins within the genus Coxiella. Infection and immunity
2009, 77:642-656.
2. Roux V, Bergoin M, Lamaze N, Raoult D: Reassessment of the taxonomic
position of Rickettsiella grylli. International journal of systematic bacteriology
1997, 47:1255-1257.
3. Mediannikov O, Sekeyova Z, Birg ML, Raoult D: A novel obligate intracellular
gamma-proteobacterium associated with ixodid ticks, Diplorickettsia
massiliensis, Gen. Nov., Sp. Nov. PloS one 2010, 5:e11478.
4. Parola P, Raoult D: Ticks and tickborne bacterial diseases in humans: an
emerging infectious threat. Clinical infectious diseases : an official publication
of the Infectious Diseases Society of America 2001, 32:897-928.
5. Beckstrom-Sternberg SM, Auerbach RK, Godbole S, Pearson JV, Beckstrom-
Sternberg JS, Deng Z, Munk C, Kubota K, Zhou Y, Bruce D, et al: Complete
genomic characterization of a pathogenic A.II strain of Francisella tularensis
subspecies tularensis. PloS one 2007, 2:e947.
6. Mathew MJ, Subramanian G, Nguyen TT, Robert C, Mediannikov O, Fournier
PE, Raoult D: Genome sequence of Diplorickettsia massiliensis, an emerging
Ixodes ricinus-associated human pathogen. Journal of bacteriology 2012,
194:3287.
7. Subramanian G, Mediannikov O, Angelakis E, Socolovschi C, Kaplanski G,
Martzolff L, Raoult D: Diplorickettsia massiliensis as a human pathogen.
European journal of clinical microbiology & infectious diseases : official
publication of the European Society of Clinical Microbiology 2012, 31:365-369.
8. Peterson J, Garges S, Giovanni M, McInnes P, Wang L, Schloss JA, Bonazzi V,
McEwen JE, Wetterstrand KA, Deal C, et al: The NIH Human Microbiome
Project. Genome research 2009, 19:2317-2323.
9. Hu B, Xie G, Lo CC, Starkenburg SR, Chain PS: Pathogen comparative
genomics in the next-generation sequencing era: genome alignments,
pangenomics and metagenomics. Briefings in functional genomics 2011,
10:322-333.
10. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-
genome. Current opinion in genetics & development 2005, 15:589-594.
11. Rasko DA, Rosovitz MJ, Myers GS, Mongodin EF, Fricke WF, Gajer P, Crabtree
J, Sebaihia M, Thomson NR, Chaudhuri R, et al: The pangenome structure of
Escherichia coli: comparative genomic analysis of E. coli commensal and
pathogenic isolates. Journal of bacteriology 2008, 190:6881-6893.
12. Rocha EP: Evolutionary patterns in prokaryotic genomes. Current opinion in
microbiology 2008, 11:454-460.
13. D'Auria G, Jimenez-Hernandez N, Peris-Bondia F, Moya A, Latorre A:
Legionella pneumophila pangenome reveals strain-specific virulence factors.
BMC Genomics 2010, 11:181.
14. Wren BW: Microbial genome analysis: insights into virulence, host
adaptation and evolution. Nature reviews Genetics 2000, 1:30-39.
15. Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R: The microbial pan-
genome. Curr Opin Genet Dev 2005, 15:589-594.
16. Snipen L, Almoy T, Ussery DW: Microbial comparative pan-genomics using
binomial mixture models. BMC Genomics 2009, 10:385.
17. Pearson T, Hornstra HM, Sahl JW, Schaack S, Schupp JM, Beckstrom-Sternberg
SM, O'Neill MW, Priestley RA, Champion MD, Beckstrom-Sternberg JS, et al:
When Outgroups Fail; Phylogenomics of Rooting the Emerging Pathogen,
Coxiella burnetii. Syst Biol 2013, 62:752-762.
18. Georgiades K, Merhej V, El Karkouri K, Raoult D, Pontarotti P: Gene gain and
loss events in Rickettsia and Orientia species. Biol Direct 2011, 6:6.
19. Gimenez G, Bertelli C, Moliner C, Robert C, Raoult D, Fournier PE, Greub G:
Insight into cross-talk between intra-amoebal pathogens. BMC Genomics
2011, 12:542.
20. Moliner C, Fournier PE, Raoult D: Genome analysis of microorganisms living
in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev 2010,
34:281-294.
21. Merhej V, Royer-Carenzi M, Pontarotti P, Raoult D: Massive comparative
genomic analysis reveals convergent evolution of specialized bacteria. Biol
Direct 2009, 4:13.
22. Bliven KA, Maurelli AT: Antivirulence genes: insights into pathogen evolution
through gene loss. Infection and immunity 2012, 80:4061-4070.
23. Rolain JM, Vayssier-Taussat M, Saisongkorh W, Merhej V, Gimenez G, Robert
C, Le Rhun D, Dehio C, Raoult D: Partial disruption of translational and
posttranslational machinery reshapes growth rates of Bartonella birtlesii.
MBio 2013, 4:e00115-00113.
24. Moran NA, Wernegreen JJ: Lifestyle evolution in symbiotic bacteria: insights
from genomics. Trends Ecol Evol 2000, 15:321-326.
25. Andersson JO, Andersson SG: Insights into the evolutionary process of
genome degradation. Current opinion in genetics & development 1999, 9:664-
671.
26. Audic S, Robert C, Campagna B, Parinello H, Claverie JM, Raoult D, Drancourt
M: Genome analysis of Minibacterium massiliensis highlights the convergent
evolution of water-living bacteria. PLoS Genet 2007, 3:e138.
27. Cazalet C, Gomez-Valero L, Rusniok C, Lomma M, Dervins-Ravault D, Newton
HJ, Sansom FM, Jarraud S, Zidane N, Ma L, et al: Analysis of the Legionella
longbeachae genome and transcriptome uncovers unique strategies to
cause Legionnaires' disease. PLoS Genet 2010, 6:e1000851.
28. Markowitz VM, Chen IM, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A,
Jacob B, Huang J, Williams P, et al: IMG: the Integrated Microbial Genomes
database and comparative analysis system. Nucleic acids research 2012,
40:D115-122.
29. Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ: Prodigal:
prokaryotic gene recognition and translation initiation site identification.
BMC bioinformatics 2010, 11:119.
30. Benson DA, Karsch-Mizrachi I, Clark K, Lipman DJ, Ostell J, Sayers EW:
GenBank. Nucleic acids research 2012, 40:D48-53.
31. Benson G: Tandem repeats finder: a program to analyze DNA sequences.
Nucleic acids research 1999, 27:573-580.
32. Li L, Stoeckert CJ, Jr., Roos DS: OrthoMCL: identification of ortholog groups
for eukaryotic genomes. Genome research 2003, 13:2178-2189.
33. Wu S, Zhu Z, Fu L, Niu B, Li W: WebMGA: a customizable web server for fast
metagenomic sequence analysis. BMC Genomics 2011, 12:444.
34. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov
DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al: The COG database: an
updated version includes eukaryotes. BMC bioinformatics 2003, 4:41.
35. Moriya Y, Itoh M, Okuda S, Yoshizawa AC, Kanehisa M: KAAS: an automatic
genome annotation and pathway reconstruction server. Nucleic acids
research 2007, 35:W182-185.
36. Darling AE, Mau B, Perna NT: progressiveMauve: multiple genome alignment
with gene gain, loss and rearrangement. PloS one 2010, 5:e11147.
37. Sheppard SK, Didelot X, Jolley KA, Darling AE, Pascoe B, Meric G, Kelly DJ, Cody
A, Colles FM, Strachan NJ, et al: Progressive genome-wide introgression in
agricultural Campylobacter coli. Mol Ecol 2013, 22:1051-1064.
38. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning
protein functions by comparative genome analysis: protein phylogenetic
profiles. Proceedings of the National Academy of Sciences of the United States
of America 1999, 96:4285-4288.
39. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M,
Currier T, Thiagarajan M, et al: TM4: a free, open-source system for
microarray data management and analysis. BioTechniques 2003, 34:374-378.
Figure legends
Figure 1: Protein sequence length distributions. Each organism is
represented by a different color and symbol.
Figure 2: Distribution of GC content (%) and genome size (Mb),
represented in red and blue, respectively.
Figure 3: KEGG characteristics by category for the organisms
considered in the analysis.
Figure 4: Summary of the comparison using pangenome analysis. The
results were obtained by comparing the five complete genomes of
pathogenic C. burnetii strains, eleven complete genomes of pathogenic
Legionella strains, ten complete genomes of pathogenic L. pneumophila
strains, eleven complete genomes of pathogenic Francisella tularensis
strains and all thirty complete genomes, with respect to their
ortholog/accessory gene distribution. The middle circle represents the
number of core functions and each petal corresponds to the number of
accessory functions.
Figure 5: Distribution of the accessory functions in the whole set (30
organisms). The middle circle represents the number of core functions
and each petal corresponds to the number of accessory functions.
Figure 6: Phylogenomic analysis based on COG and KEGG information, with
clustering based on the Euclidean distance method.
[Figures 1 to 6: images not reproduced]
Organism | Niche | Chrs | Plasmids | Size (Mb) | GC% | Gene | Protein | Coding Density | Accession Number | PMID
Diplorickettsia massiliensis 20B Ticks 1 - 1.73 39.3 2333 2287 79.56 AJGC00000000 22628513
Rickettsiella grylli Insect 1 - 1.58 37.8 1557 1410 90.86 AAQJ02000001.1 5753287
Coxiella burnetii Dugway 5J108-111 Ticks+Amoeba Animals 1 1 2.21 42.3 2362 2045 82.04 NC_009727.1 19047403
Coxiella burnetii CbuG_Q212 1 - 2.01 42.6 2091 1866 77.55 NC_011527.1 19047403
Coxiella burnetii CbuK_Q154 1 1 2.1 42.6 2183 1942 77.69 NC_011528.1 19047403
Coxiella burnetii RSA 331 1 1 2.05 42.7 2278 1975 78.43 NC_010117.1 19047403
Coxiella burnetii RSA 493 1 1 2.03 42.6 2095 1847 78.00 NC_002971.3 19047403
Francisella tularensis subsp. tularensis SCHU S4 1 - 1.89 32.3 1852 1604 79.17 NC_006570.2 15640799
Francisella tularensis subsp. tularensis TIGB03 1 - 1.97 32.3 1850 1624 76.76 NC_016933.1 22535949
Francisella tularensis subsp. holarctica F92 1 - 1.89 32.2 1890 1842 80.55 NC_019537.1 23405342
Francisella tularensis subsp. holarctica FSC200 1 - 1.89 32.2 1810 1438 71.29 NC_019551.1 23209222
Francisella tularensis subsp. holarctica FTNF002-00 1 - 1.89 32.2 1887 1581 76.33 NC_009749.1 19756146
Francisella tularensis subsp. holarctica LVS 1 - 1.9 32.2 2020 1754 82.40 NC_007880.1 15780452
Francisella tularensis subsp. holarctica OSU18 1 - 1.9 32.2 1932 1555 74.62 NC_008369.1 16980500
Francisella tularensis subsp. mediasiatica FSC147 1 - 1.89 32.3 1750 1406 71.94 NC_010677.1 19521508
Francisella tularensis subsp. tularensis FSC198 1 - 1.89 32.3 1852 1605 79.21 NC_008245.1 17406676
Francisella tularensis subsp. tularensis NE061598 1 - 1.89 32.3 1888 1836 82.23 NC_017453.1 20140244
Francisella tularensis subsp. tularensis TI0902 1 - 1.89 32.3 1764 1544 76.58 NC_016937.1 22535949
Francisella tularensis subsp. tularensis WY96-3418 1 - 1.9 32.3 1872 1634 80.04 NC_009257.1 17895988
Legionella pneumophila subsp. pneumophila str. Philadelphia 1 Amoeba 1 - 3.4 38.3 3003 2943 88.49 NC_002942.5 15448271
Legionella pneumophila str. Paris 1 1 3.64 38.4 3278 3166 87.19 NC_006368.1 15467720
Legionella pneumophila 2300/99 Alcoy 1 - 3.52 38.4 3243 3190 87.75 NC_014125.1 20236513
Legionella pneumophila str. Corby 1 - 3.58 38.5 3257 3204 87.06 NC_009494.2 17888731
Legionella pneumophila str. Lens 1 1 3.41 38.4 3058 2934 87.14 NC_006369.1 15467720
Legionella pneumophila subsp. pneumophila ATCC 43290 1 - 3.36 38.2 2981 2926 89.11 NC_016811.1 22374950
Legionella pneumophila subsp. pneumophila HL06041035 1 - 3.49 38.4 3184 3059 87.17 NC_018140.1 22044686
Legionella pneumophila subsp. pneumophila LPE509 1 1 3.51 38.3 3383 3331 88.66 NC_020521.1 23792742
Legionella pneumophila subsp. pneumophila str. Lorraine 1 1 3.62 38.4 3327 3221 87.48 NC_018139.1 22044686
Legionella pneumophila subsp. pneumophila str. Thunder Bay 1 - 3.46 38.2 3043 2998 88.04 NC_021350.1 23826259
Legionella longbeachae NSW150 1 1 4.15 37.1 3739 3470 84.73 NC_013861.1 20174605
Table 1- General characteristics of the organisms considered for the analysis
Additional file 1- Functional analysis. Distribution of COGs and KEGGs within the core and dispensable compartments.
Additional file 2- Summary of the pangenomes of some proteobacteria. The % column corresponds to the core/pan-genome ratio.
Species | Genomes used | Niche | Average genome size | Pangenome size (bp) | Core genome size (bp) | %
Salmonella enterica 20 Animals 4.8Mb 96520000 59960168 62
Campylobacter jejuni 14 Human, chicken 1.7Mb 23720000 18122022 76
Helicobacter pylori 10 Human 1.6Mb 16370000 12849693 78
Haemophilus influenzae 9 Human 1.8Mb 17170000 13728166 80
Legionella pneumophila 10 Amoeba 3.4Mb 34548036 28477841 82
Francisella tularensis 13 Ticks, Amoeba 1.8Mb 24690000 21468663 87
Yersinia pestis 12 Rodents 4.7Mb 55015109 48947637 89
Coxiella burnetii 5 Animals 2Mb 6690114 6150819 92
Buchnera aphidicola 8 Aphid 0.6Mb 5133548 5033068 98
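The % column can be reproduced from the two size columns. The following is a minimal sketch, assuming the ratio is the core genome size divided by the pangenome size, expressed as a percentage and rounded to the nearest integer. The function name core_pan_ratio is hypothetical; the sizes are copied from three rows of the table above.

```python
def core_pan_ratio(core_bp, pan_bp):
    """Return the core/pan-genome ratio as a whole-number percentage."""
    if pan_bp <= 0:
        raise ValueError("pangenome size must be positive")
    return round(100 * core_bp / pan_bp)

# (pangenome size, core genome size) in bp, copied from Additional file 2
sizes = {
    "Legionella pneumophila": (34548036, 28477841),
    "Coxiella burnetii": (6690114, 6150819),
    "Buchnera aphidicola": (5133548, 5033068),
}

ratios = {name: core_pan_ratio(core, pan) for name, (pan, core) in sizes.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio}%")
```

Under this reading, a ratio close to 100 indicates that nearly all of the pangenome is shared by every genome of the species, as seen for the obligate symbiont Buchnera aphidicola.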
Additional file 3- Individual pangenome summary based on OrthoMCL clustering, with the
corresponding information on core, accessory and unique genes in the organisms studied
Organisms | Proteins used by OrthoMCL | Core clusters | Accessory clusters | Core genes | Accessory genes | Unique genes | Total clusters | No group | Core (%) | Accessory (%) | Unique (%)
Pangenome Coxiella burnetii (5 genomes) 6871 1080 6431 6324 491 56 1290 1993 92 7.15 0.82
Coxiella burnetii CbuG_Q212 1359 1079 87 1264 89 6 1172 401 93 6.55 0.44
Coxiella burnetii CbuK_Q154 1394 1079 106 1279 109 6 1191 394 91.8 7.82 0.43
Coxiella burnetii Dugway 5J108-111 1414 1080 122 1256 124 34 1236 421 88.8 8.77 2.4
Coxiella burnetii RSA 331 1351 1076 66 1276 66 9 1151 313 94.5 4.89 0.67
Coxiella burnetii RSA 493 1353 1079 102 1249 103 1 1182 464 92.3 7.61 0.07
Pangenome Francisella (12 genomes) 16596 1010 280 14281 2297 18 1308 2329 86.1 13.84 0.11
Francisella tularensis subsp. holarctica F92 1557 1008 209 1315 241 1 1218 192 84.5 15.48 0.06
Francisella tularensis subsp. holarctica FSC200 1248 999 143 1104 144 0 1142 142 88.5 11.54 0
Francisella tularensis subsp. holarctica FTNF002-00 1362 998 164 1193 169 0 1115 165 87.6 12.41 0
Francisella tularensis subsp. holarctica LVS 1534 1009 224 1271 262 1 1234 197 82.9 17.08 0.07
Francisella tularensis subsp. holarctica OSU18 1299 1001 147 1147 148 4 1152 180 88.3 11.39 0.31
Francisella tularensis subsp. mediasiatica FSC147 1243 1000 130 1109 131 3 1133 150 89.2 10.54 0.24
Francisella tularensis subsp. tularensis FSC198 1369 1009 188 1181 188 0 1197 236 86.3 13.73 0
Francisella tularensis subsp. tularensis NE061598 1512 1010 227 1269 239 4 1241 215 83.9 15.81 0.26
Francisella tularensis subsp. tularensis SCHU S4 1368 1009 188 1180 188 0 1197 236 86.3 13.74 0
Francisella tularensis subsp. tularensis TI0902 1322 1007 196 1126 196 0 1203 209 85.2 14.83 0
Francisella tularensis subsp. tularensis TIGB03 1393 1007 197 1195 198 0 1204 216 85.8 14.21 0
Francisella tularensis subsp. tularensis WY96-3418 1291 1007 192 1191 193 5 1204 191 92.3 14.95 0.39
Pangenome Legionella (11 genomes) 23736 1410 570 19567 3791 378 2358 1078 82.4 15.97 1.59
L_ longbeachae NSW150 2277 1356 194 1724 258 295 1845 107 75.7 11.33 12.96
L_pneumo_ 2300_99Alcoy 2177 1404 376 1773 397 7 1787 101 81.4 18.24 0.32
L_pneumo_ pneumophila_ATCC43290 2108 1398 261 1763 333 0 1731 95 83.6 15.8 0
L_pneumo_ Pneumophila_HL 2178 1400 342 1799 358 21 1763 96 82.6 16.44 0.96
L_pneumo_ Pneumophila_Lorraine 2162 1399 339 1812 342 8 1746 100 83.8 15.82 0.37
L_pneumo_ str. Corby 2179 1401 382 1780 393 6 1789 106 81.7 18.04 0.28
L_pneumo_ subsp. pneumophila str193Philadelphia 2145 1408 337 1800 342 3 1748 97 83.9 15.94 0.14
L_pneumo_ subsp. pneumophila str576Philadelphia 2109 1404 320 1783 325 1 1725 99 84.5 15.41 0.05
L_pneumo_ Thunder Bay 2167 1407 343 1814 350 3 1753 102 83.7 16.15 0.14
L_ pneumophila str. Lens 2071 1393 319 1727 330 14 1726 89 83.4 15.93 0.68
L_ pneumophila str. Paris 2163 1398 353 1780 363 20 1772 86 82.3 16.78 0.92
Pangenome Legionella pneumophila (10 genomes) 21459 1572 346 19466 1881 112 2030 971 90.7 8.77 0.52
L_ pneumophila str. Lens 2071 1553 153 1888 163 20 1726 89 91.2 7.87 0.97
L_ pneumophila str. Paris 2163 1561 184 1942 194 27 1772 86 89.8 8.97 1.25
L_pneumo_ 2300_99Alcoy 2177 1565 215 1936 234 7 1787 101 88.9 10.75 0.32
L_pneumo_ pneumophila_ATCC43290 2108 1564 167 1937 171 0 1731 95 91.9 8.11 0
L_pneumo_ Pneumophila_HL 2178 1563 174 1962 190 26 1763 96 90.1 8.72 1.19
L_pneumo_ Pneumophila_Lorraine 2162 1562 169 1975 172 15 1746 100 91.4 7.96 0.69
L_pneumo_ str. Corby 2179 1564 216 1943 227 9 1789 106 89.2 10.42 0.41
L_pneumo_ subsp. pneumophila str193Philadelphia 2145 1570 174 1962 179 4 1748 97 91.5 8.34 0.19
L_pneumo_ subsp. pneumophila str576Philadelphia LPE 2109 1566 158 1945 163 1 1725 99 92.2 7.73 0.05
L_pneumo_ Thunder Bay 2167 1569 181 1976 188 3 1753 102 91.2 8.68 0.14
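In most per-genome rows of this table, the counts of core, accessory and unique genes sum to the number of proteins used by OrthoMCL, and the last three columns match each count's share of that total. The following is a minimal sketch of that arithmetic, assuming this reading of the columns. The function name class_percentages is hypothetical; the counts are taken from the Coxiella burnetii CbuG_Q212 row.

```python
def class_percentages(core, accessory, unique):
    """Express core, accessory and unique gene counts as percentages of their sum."""
    total = core + accessory + unique
    if total == 0:
        raise ValueError("at least one gene is required")
    return {
        "total": total,
        "core": round(100 * core / total, 2),
        "accessory": round(100 * accessory / total, 2),
        "unique": round(100 * unique / total, 2),
    }

# Coxiella burnetii CbuG_Q212: 1264 core, 89 accessory and 6 unique genes
q212 = class_percentages(core=1264, accessory=89, unique=6)
print(q212)
```

For this row the total is 1359, the number of proteins used by OrthoMCL, and the accessory and unique shares match the tabulated 6.55 and 0.44; the core share (93.01) matches the tabulated 93 after rounding.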
Additional file 4- Whole-set pangenome summary based on OrthoMCL clustering, with the corresponding
information on core, accessory and unique genes in the organisms studied
Organisms | Proteins used by OrthoMCL | Core clusters | Accessory clusters | Core genes | Accessory genes | Unique genes | Total clusters | No group | Core (%) | Accessory (%) | Unique (%)
Whole (30 genomes) 49833 627 2102 23500 25933 400 3130 5886 47.2 52.04 0.8
Coxiella burnetii (5 genomes) 6871 580 682 3575 3268 28 1290 1993 52 47.56 0.41
Coxiella burnetii CbuG_Q212 1359 572 598 721 636 2 1172 401 53.1 46.8 0.15
Coxiella burnetii CbuK_Q154 1394 574 616 725 668 1 1191 394 52 47.92 0.07
Coxiella burnetii Dugway 5J108-111 1414 577 640 693 702 19 1236 421 49 49.65 1.34
Coxiella burnetii RSA 331 1351 572 574 733 613 5 1151 313 54.3 45.37 0.37
Coxiella burnetii RSA 493 1353 575 606 703 649 1 1182 464 52 47.97 0.07
Francisella tularensis (12 genomes) 16596 578 8048 8539 8048 9 1308 2329 51.5 48.49 0.05
Francisella tularensis subsp. holarctica F92 1557 573 644 823 733 1 1218 192 52.9 47.08 0.06
Francisella tularensis subsp. holarctica FSC200 1248 570 572 638 610 0 1142 142 51.1 48.88 0
Francisella tularensis subsp. holarctica FTNF002-00 1362 565 550 640 584 0 1115 165 47 42.88 0
Francisella tularensis subsp. holarctica LVS 1534 576 657 775 758 1 1234 197 50.5 49.41 0.07
Francisella tularensis subsp. holarctica OSU18 1299 568 582 675 622 2 1152 180 52 47.88 0.15
Francisella tularensis subsp. mediasiatica FSC147 1243 565 568 629 614 0 1133 150 50.6 49.4 0
Francisella tularensis subsp. tularensis FSC198 1369 573 624 708 661 0 1197 236 51.7 48.28 0
Francisella tularensis subsp. tularensis NE061598 1512 576 662 787 722 3 1241 215 52.1 47.75 0.2
Francisella tularensis subsp. tularensis SCHU S4 1368 573 624 707 661 0 1197 236 51.7 48.32 0
Francisella tularensis subsp. tularensis TI0902 1322 572 631 651 671 0 1203 209 49.2 50.76 0
Francisella tularensis subsp. tularensis TIGB03 1393 572 632 704 689 0 1204 216 50.5 49.46 0
Francisella tularensis subsp. tularensis WY96-3418 1291 573 629 711 676 2 1204 191 55.1 52.36 0.15
Legionella (11 genomes) 23736 618 1468 9824 13640 272 2358 1078 41.4 57.47 1.15
L_ longbeachae NSW150 2277 608 1027 886 1181 210 1845 107 38.9 51.87 9.22
L_ pneumophila str. Lens 2071 610 1108 863 1200 8 1726 89 41.7 57.94 0.39
L_ pneumophila str. Paris 2163 610 1148 899 1251 13 1772 86 41.6 57.84 0.6
L_pneumo_ 2300_99Alcoy 2177 613 1168 890 1281 6 1787 101 40.9 58.84 0.28
L_pneumo_ pneumophila_ATCC43290 2108 612 1119 877 1231 0 1731 95 41.6 58.4 0
L_pneumo_ Pneumophila_HL 2178 611 1135 888 1273 17 1763 96 40.8 58.45 0.78
L_pneumo_ Pneumophila_Lorraine 2162 613 1127 944 1212 6 1746 100 43.7 56.06 0.28
L_pneumo_ str. Corby 2179 612 1171 888 1285 6 1789 106 40.8 58.97 0.28
L_pneumo_ subsp. pneumophila str193Philadelphia 2145 612 1133 898 1244 3 1748 97 41.9 58 0.14
L_pneumo_ subsp. pneumophila str576Philadelphia 2109 614 1110 896 1212 1 1725 99 42.5 57.47 0.05
L_pneumo_ Thunder Bay 2167 611 1140 895 1270 2 1753 102 41.3 58.61 0.09
Legionella pneumophila (10 genomes) 21458 617 1351 8938 12458 62 2030 971 41.7 58.06 0.29
L_ pneumophila str. Lens 2071 610 1108 863 1200 8 1726 89 41.7 57.94 0.39
L_ pneumophila str. Paris 2163 610 1148 899 1251 13 1772 86 41.6 57.84 0.6
L_pneumo_ 2300_99Alcoy 2177 613 1168 890 1281 6 1787 101 40.9 58.84 0.28
L_pneumo_ pneumophila_ATCC43290 2108 612 1119 877 1231 0 1731 95 41.6 58.4 0
L_pneumo_ Pneumophila_HL 2178 611 1135 888 1273 17 1763 96 40.8 58.45 0.78
L_pneumo_ Pneumophila_Lorraine 2162 613 1127 944 1212 6 1746 100 43.7 56.06 0.28
L_pneumo_ str. Corby 2179 612 1171 888 1285 6 1789 106 40.8 58.97 0.28
L_pneumo_ subsp. pneumophila str193Philadelphia 2145 612 1133 898 1244 3 1748 97 41.9 58 0.14
L_pneumo_ subsp. pneumophila str576Philadelphia 2109 614 1110 896 1212 1 1725 99 42.5 57.47 0.05
L_pneumo_ Thunder Bay 2167 611 1140 895 1270 2 1753 102 41.3 58.61 0.09
Rickettsiella grylli 1155 554 459 654 459 42 987 38 56.6 39.74 3.64
Diplorickettsia massiliensis 1475 546 518 908 518 49 970 48 61.6 35.12 3.32
Gene ID | Cluster ID | COG | Functional description
12043713 OG5_126962 R hypothetical protein
12042690 OG5_127396 RTKL Serine/threonine protein kinase
12043131 OG5_127837 G Galactose mutarotase and related enzymes
12042957 OG5_129515 Q Probable taurine catabolism dioxygenase
12043183 OG5_131640 O Predicted redox protein, regulator of disulfide bond formation
12043061 OG5_131654 C Ferredoxin
12042314 OG5_132174 R FOG:Ankyrin repeat
12043224 OG5_133030 P 3'-Phosphoadenosine 5'-phosphosulfate (PAPS) 3'-phosphatase
12042814 OG5_134761 R FOG:Ankyrin repeat
12043977 OG5_136591 H SAM-dependent methyltransferases
12042324 OG5_136663 S Uncharacterized conserved protein
12043610 OG5_137437 T hypothetical protein
12042285 OG5_137732 R hypothetical protein
12042283 OG5_137790 IQR Dehydrogenases with different specificities (related to short-chain alcohol dehydrogenases)
12043203 OG5_138525 T Response regulators consisting of a CheY-like receiver domain and a winged-helix DNA-binding
12041993 OG5_141810 R PhoPQ-activated pathogenicity-related protein
12043260 OG5_142772 T FOG:CheY-like receiver
12043907 OG5_146647 R Soluble lytic murein transglycosylase and related regulatory proteins
12044061 OG5_146777 RTKL hypothetical protein
12043846 OG5_150445 S Uncharacterized protein conserved in bacteria
12044192 OG5_152528 T FOG:CheY-like receiver
12043032 OG5_153661 R hypothetical protein
12043044 OG5_156771 M Predicted choline kinase involved in LPS biosynthesis
12043182 OG5_158947 J contains the PP-loop ATPase domain
12043983 OG5_158957 T FOG:CheY-like receiver
12042792 OG5_164552 M UDP-glucose pyrophosphorylase
12043504 OG5_164798 S Uncharacterized protein conserved in bacteria
12042562 OG5_165570 D hypothetical protein
12042797 OG5_166276 M hypothetical protein
12043119 OG5_166572 R hypothetical protein
12043357 OG5_167967 R DNA primase (bacterial type)
12043647 OG5_170999 R Predicted periplasmic protein
12042005 OG5_171413 R hypothetical protein
12043834 OG5_172467 R hypothetical protein
12043019 OG5_175478 R Amino acid transporters
12043613 OG5_176228 H SAM-dependent methyltransferases
12043373 OG5_178450 TK DNA-binding HTH domain-containing proteins
12042871 OG5_178715 K Predicted nucleotide-binding protein containing TIR -like domain
12042514 OG5_181916 R hypothetical protein
12042318 OG5_185753 R hypothetical protein
12042907 OG5_191435 R hypothetical protein
12042962 OG5_204787 R hypothetical protein
12042193 OG5_211174 R hypothetical protein
12042682 OG5_211971 R hypothetical protein
12044129 OG5_215038 R hypothetical protein
12043970 OG5_228892 Q hypothetical protein
12043121 OG5_229846 R hypothetical protein
12043292 OG5_244660 R Uncharacterized protein conserved in bacteria
12044183 OG5_245288 R hypothetical protein
Additional file 5- Description of the unique genes of Diplorickettsia massiliensis strain 20B
Chapter 5
Conclusions
5.1 Conclusions and perspectives
Given the endosymbiotic origin of mitochondria and other eukaryotic
organelles, we believe that the intracellular lifestyle is ancient and
has constantly co-evolved with the host. Comparative analyses of bacterial
genomes from different lifestyles, including free-living and host-
dependent bacteria, show that host-dependent bacteria exhibit fewer
transcriptional regulators. Lamarckian evolution may have played a role in
bacterial speciation events associated with a reduction in the genome
size, an observation that contradicts the dominant model, which assumes
that speciation and fitness gain are linked with an increase in the gene
repertoire. Intracellular bacteria possess mechanisms to protect or to
invade host cells. The interactions between intracellular bacteria and host
cells are enabled by Type IV secretion systems (T4SSs). These systems are
required for bacterial colonization, invasion and persistence within the
niche, and they are supramolecular transporters ancestrally related to
bacterial conjugation systems.
The study of intracellular Gammaproteobacteria has contributed to our
understanding of bacterial specialization based on the ecological niche.
The genome size and gene content of the bacteria are associated with
lifestyle. A smaller number of genes and a relatively low G+C content
were observed in the genomes analyzed here, similar to other studies of
intracellular bacteria (Georgiades, et al., 2011). Gene loss resulting in a
smaller genome size has been a driving force in the adaptation of these
bacteria to their hosts. Due to the reduction in the genomic repertoire,
we speculate that fewer lateral gene transfers occur in D. massiliensis
than in other intracellular bacteria (Audic, et al., 2007). We used a
multi-genus pangenomic approach to characterize the genomic repertoire
of representative strains and to compare the distribution of genes in
D. massiliensis strain 20B with that in other genomes. We found that the
majority of the genes in D. massiliensis strain 20B were shared with other
gammaproteobacteria. A pangenomic approach facilitates the exploration
of different strategies by which facultative or obligate intracellular
bacteria adapt to particular hosts and contributes significantly to our
understanding of genome repertoires. This approach can be used to
uncover unique genomic features that cannot be predicted by
conventional methods. Moreover, our results suggest that the Legionella
strains could be re-classified based on their genomic variability. The
sequencing of additional intracellular bacterial genomes will enable the
acquisition of a more precise picture of the genetic properties associated
with the intracellular lifestyle. This effort will also contribute to a better
understanding of the interactions between intracellular bacteria and
different niches and the complex mechanisms implicated in pathogenicity.
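The partition of a pangenome into core, accessory and unique compartments can be illustrated with a short sketch. It assumes the usual definitions (core: present in every genome compared; unique: present in exactly one; accessory: everything in between), which may differ in detail from the criteria applied to the OrthoMCL output in this work. The function name classify_clusters, the cluster identifiers and the genome names are all hypothetical.

```python
from collections import Counter

def classify_clusters(clusters, genomes):
    """Label each ortholog cluster as 'core', 'accessory' or 'unique'.

    clusters: dict mapping a cluster ID to the set of genomes in which it occurs.
    genomes:  list of all genome names included in the comparison.
    """
    all_genomes = set(genomes)
    labels = {}
    for cluster_id, members in clusters.items():
        present = set(members) & all_genomes
        if not present:
            continue  # cluster absent from every genome under study
        if present == all_genomes:
            labels[cluster_id] = "core"
        elif len(present) == 1:
            labels[cluster_id] = "unique"
        else:
            labels[cluster_id] = "accessory"
    return labels

# Toy input (illustrative only, not data from this study)
genomes = ["genome_A", "genome_B", "genome_C"]
clusters = {
    "cluster_1": {"genome_A", "genome_B", "genome_C"},
    "cluster_2": {"genome_A", "genome_B"},
    "cluster_3": {"genome_C"},
    "cluster_4": {"genome_A", "genome_B", "genome_C"},
}

labels = classify_clusters(clusters, genomes)
counts = Counter(labels.values())
print(dict(counts))
```

Because the labels depend on the set of genomes compared, the same cluster can be core within a single species and accessory in a multi-genus comparison, which is why the core shrinks as more distant genomes are added.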
5.2 Future perspectives
Current knowledge barely scratches the surface of the diversity of these
intracellular bacteria and the complex host associations. Genomic studies
have shifted from looking only at genes and protein coding sequences to
exploring the entire genome. It will be interesting to learn more about
the genomic repertoire of emerging intracellular bacterial pathogens
because of the adverse roles of these pathogens. Genomic analyses will
provide a springboard for phylogenomic profiling, pangenomics,
transcriptomics and proteomics, which will ultimately enable a better
understanding of how intracellular bacteria exploit their environment and
help to elucidate the mechanisms of pathogenicity among intracellular bacteria.
Bibliography
Amiri, H., C. M. Alsmark, et al. (2002). "Proliferation and deterioration
of Rickettsia palindromic elements." Molecular biology and
evolution 19(8): 1234-1243.
Andersson, J. O. and S. G. Andersson (1999). "Insights into the
evolutionary process of genome degradation." Curr Opin Genet
Dev 9(6): 664-671.
Andersson, S. G., C. Alsmark, et al. (2002). "Comparative genomics of
microbial pathogens and symbionts." Bioinformatics 18 Suppl 2:
S17.
Andrews, H. L., J. P. Vogel, et al. (1998). "Identification of linked
Legionella pneumophila genes essential for intracellular growth
and evasion of the endocytic pathway." Infection and immunity
66(3): 950-958.
Aravind, L., R. L. Tatusov, et al. (1998). "Evidence for massive gene
exchange between archaeal and bacterial hyperthermophiles."
Trends in genetics : TIG 14(11): 442-444.
Arneodo, J. D., A. Bressan, et al. (2008). "Ultrastructural detection of an
unusual intranuclear bacterium in Pentastiridius leporinus
(Hemiptera: Cixiidae)." Journal of invertebrate pathology 97(3):
310-313.
Audic, S., C. Robert, et al. (2007). "Genome analysis of Minibacterium
massiliensis highlights the convergent evolution of water-living
bacteria." PLoS Genet 3(8): e138.
Baldridge, G. D., N. Y. Burkhardt, et al. (2007). "Transposon insertion
reveals pRM, a plasmid of Rickettsia monacensis." Appl Environ
Microbiol 73(15): 4984-4995.
Banks, D. J., S. B. Beres, et al. (2002). "The fundamental contribution of
phages to GAS evolution, genome diversification and strain
emergence." Trends in microbiology 10(11): 515-521.
Beare, P. A., N. Unsworth, et al. (2009). "Comparative genomics reveal
extensive transposon-mediated genomic plasticity and diversity
among potential effector proteins within the genus Coxiella."
Infect Immun 77(2): 642-656.
Beckstrom-Sternberg, S. M., R. K. Auerbach, et al. (2007). "Complete
genomic characterization of a pathogenic A.II strain of Francisella
tularensis subspecies tularensis." PLoS One 2(9): e947.
Benson, D. A., I. Karsch-Mizrachi, et al. (2012). "GenBank." Nucleic Acids
Res 40(Database issue): D48-53.
Benson, G. (1999). "Tandem repeats finder: a program to analyze DNA
sequences." Nucleic Acids Res 27(2): 573-580.
Beranek, A., M. Zettl, et al. (2004). "Thirty-eight C-terminal amino acids
of the coupling protein TraD of the F-like conjugative resistance
plasmid R1 are required and sufficient to confer binding to the
substrate selector protein TraM." J Bacteriol 186(20): 6999-7006.
Berglund, E. C., A. C. Frank, et al. (2009). "Run-off replication of host-
adaptability genes is associated with gene transfer agents in the
genome of mouse-infecting Bartonella grahamii." PLoS genetics
5(7): e1000546.
Blanc, G., M. Ngwamidiba, et al. (2005). "Molecular evolution of
rickettsia surface antigens: evidence of positive selection."
Molecular biology and evolution 22(10): 2073-2083.
Blanc, G., H. Ogata, et al. (2007). "Lateral gene transfer between
obligate intracellular bacteria: evidence from the Rickettsia
massiliae genome." Genome research 17(11): 1657-1664.
Blanc, G., H. Ogata, et al. (2007). "Reductive genome evolution from the
mother of Rickettsia." PLoS genetics 3(1): e14.
Blatch, G. L. and M. Lassle (1999). "The tetratricopeptide repeat: a
structural motif mediating protein-protein interactions." BioEssays
: news and reviews in molecular, cellular and developmental
biology 21(11): 932-939.
Bliven, K. A. and A. T. Maurelli (2012). "Antivirulence genes: insights
into pathogen evolution through gene loss." Infect Immun 80(12):
4061-4070.
Bordenstein, S. R. and W. S. Reznikoff (2005). "Mobile DNA in obligate
intracellular bacteria." Nature reviews. Microbiology 3(9): 688-
699.
Bork, P. (1993). "Hundreds of ankyrin-like repeats in functionally diverse
proteins: mobile modules that cross phyla horizontally?" Proteins
17(4): 363-374.
Boyd, E. F. and H. Brussow (2002). "Common themes among
bacteriophage-encoded virulence factors and diversity among the
bacteriophages involved." Trends in microbiology 10(11): 521-529.
Boyd, E. F., B. M. Davis, et al. (2001). "Bacteriophage-bacteriophage
interactions in the evolution of pathogenic bacteria." Trends in
microbiology 9(3): 137-144.
Braeken, L., B. Van der Bruggen, et al. (2006). "Flux decline in
nanofiltration due to adsorption of dissolved organic compounds:
model prediction of time dependency." The journal of physical
chemistry. B 110(6): 2957-2962.
Burns, D. L. (2003). "Type IV transporters of pathogenic bacteria."
Current opinion in microbiology 6(1): 29-34.
Casadevall, A. (2008). "Evolution of intracellular pathogens." Annual
review of microbiology 62: 19-33.
Casjens, S. (2003). "Prophages and bacterial genomics: what have we
learned so far?" Molecular microbiology 49(2): 277-300.
Caturegli, P., K. M. Asanovich, et al. (2000). "ankA: an Ehrlichia
phagocytophila group gene encoding a cytoplasmic protein antigen
with ankyrin repeats." Infection and immunity 68(9): 5277-5283.
Cazalet, C., L. Gomez-Valero, et al. (2010). "Analysis of the Legionella
longbeachae genome and transcriptome uncovers unique
strategies to cause Legionnaires' disease." PLoS Genet 6(2):
e1000851.
Cazalet, C., C. Rusniok, et al. (2004). "Evidence in the Legionella
pneumophila genome for exploitation of host cell functions and
high genome plasticity." Nature genetics 36(11): 1165-1173.
Chen, I., P. J. Christie, et al. (2005). "The ins and outs of DNA transfer in
bacteria." Science 310(5753): 1456-1460.
Cho, N. H., H. R. Kim, et al. (2007). "The Orientia tsutsugamushi genome
reveals massive proliferation of conjugative type IV secretion
system and host-cell interaction genes." Proceedings of the
National Academy of Sciences of the United States of America
104(19): 7981-7986.
Christie, P. J. (2001). "Type IV secretion: intercellular transfer of
macromolecules by systems ancestrally related to conjugation
machines." Molecular microbiology 40(2): 294-305.
Christie, P. J. and J. P. Vogel (2000). "Bacterial type IV secretion:
conjugation systems adapted to deliver effector molecules to host
cells." Trends in microbiology 8(8): 354-360.
Claverie, J. M. and H. Ogata (2003). "The insertion of palindromic
repeats in the evolution of proteins." Trends in biochemical
sciences 28(2): 75-80.
Colson, P. and D. Raoult (2012). "Lamarckian evolution of the giant
Mimivirus in allopatric laboratory culture on amoebae." Frontiers
in cellular and infection microbiology 2: 91.
Corsaro, D., D. Venditti, et al. (1999). "Intracellular life." Critical reviews
in microbiology 25(1): 39-79.
D'Auria, G., N. Jimenez-Hernandez, et al. (2010). "Legionella
pneumophila pangenome reveals strain-specific virulence factors."
BMC genomics 11: 181.
Dai, L., N. Toor, et al. (2003). "Database for mobile group II introns."
Nucleic acids research 31(1): 424-426.
Darby, A. C., N. H. Cho, et al. (2007). "Intracellular pathogens go
extreme: genome evolution in the Rickettsiales." Trends in
genetics : TIG 23(10): 511-520.
Darling, A. E., B. Mau, et al. (2010). "progressiveMauve: multiple
genome alignment with gene gain, loss and rearrangement." PloS
one 5(6): e11147.
Degnan, P. H., A. B. Lazarus, et al. (2005). "Genome sequence of
Blochmannia pennsylvanicus indicates parallel evolutionary trends
among bacterial mutualists of insects." Genome research 15(8):
1023-1033.
Deng, W., L. Chen, et al. (1999). "VirE1 is a specific molecular chaperone
for the exported single-stranded-DNA-binding protein VirE2 in
Agrobacterium." Molecular microbiology 31(6): 1795-1807.
Douglas, A. E. (1989). "Mycetocyte symbiosis in insects." Biological
reviews of the Cambridge Philosophical Society 64(4): 409-434.
Dunning Hotopp, J. C., M. E. Clark, et al. (2007). "Widespread lateral
gene transfer from intracellular bacteria to multicellular
eukaryotes." Science 317(5845): 1753-1756.
Fares, M. A., A. Moya, et al. (2004). "GroEL and the maintenance of
bacterial endosymbiosis." Trends in genetics : TIG 20(9): 413-416.
Fares, M. A., M. X. Ruiz-Gonzalez, et al. (2002). "Endosymbiotic bacteria:
groEL buffers against deleterious mutations." Nature 417(6887):
398.
Felsheim, R. F., T. J. Kurtti, et al. (2009). "Genome sequence of the
endosymbiont Rickettsia peacockii and comparison with virulent
Rickettsia rickettsii: identification of virulence factors." PloS one
4(12): e8361.
Fernandez-Moreira, E., J. H. Helbig, et al. (2006). "Membrane vesicles
shed by Legionella pneumophila inhibit fusion of phagosomes with
lysosomes." Infection and immunity 74(6): 3285-3295.
Finlay, B. B. and S. Falkow (1997). "Common themes in microbial
pathogenicity revisited." Microbiology and molecular biology
reviews : MMBR 61(2): 136-169.
Fournier, P. E., K. El Karkouri, et al. (2009). "Analysis of the Rickettsia
africae genome reveals that virulence acquisition in Rickettsia
species may be explained by genome reduction." BMC genomics
10: 166.
Fournier, P. E., Y. Zhu, et al. (2004). "Use of highly variable intergenic
spacer sequences for multispacer typing of Rickettsia conorii
strains." Journal of clinical microbiology 42(12): 5757-5766.
Frank, A. C., H. Amiri, et al. (2002). "Genome deterioration: loss of
repeated sequences and accumulation of junk DNA." Genetica
115(1): 1-12.
Fraser-Liggett, C. M. (2005). "Insights on biology and evolution from
microbial genome sequencing." Genome research 15(12): 1603-
1610.
Friedland, J. S., R. J. Shattock, et al. (1993). "Phagocytosis of
Mycobacterium tuberculosis or particulate stimuli by human
monocytic cells induces equivalent monocyte chemotactic protein-
1 gene expression." Cytokine 5(2): 150-156.
Frost, L. S., R. Leplae, et al. (2005). "Mobile genetic elements: the agents
of open source evolution." Nature reviews. Microbiology 3(9): 722-
732.
Georgiades, K., M. A. Madoui, et al. (2011). "Phylogenomic analysis of
Odyssella thessalonicensis fortifies the common origin of
Rickettsiales, Pelagibacter ubique and Reclimonas americana
mitochondrion." PloS one 6(9): e24857.
Georgiades, K., V. Merhej, et al. (2011). "Gene gain and loss events in
Rickettsia and Orientia species." Biology direct 6: 6.
Georgiades, K. and D. Raoult (2010). "Defining pathogenic bacterial
species in the genomic era." Frontiers in microbiology 1: 151.
Georgiades, K. and D. Raoult (2011). "The rhizome of Reclinomonas
americana, Homo sapiens, Pediculus humanus and Saccharomyces
cerevisiae mitochondria." Biology direct 6: 55.
Gil, R., A. Latorre, et al. (2004). "Bacterial endosymbionts of insects:
insights from comparative genomics." Environmental microbiology
6(11): 1109-1122.
Gimenez, G., C. Bertelli, et al. (2011). "Insight into cross-talk between
intra-amoebal pathogens." BMC genomics 12: 542.
Gross, R., J. Hacker, et al. (2003). "The Leopoldina international
symposium on parasitism, commensalism and symbiosis--common
themes, different outcome." Molecular microbiology 47(6): 1749-
1758.
Hooper, S. D. and O. G. Berg (2003). "On the nature of gene innovation:
duplication patterns in microbial genomes." Molecular biology and
evolution 20(6): 945-954.
Horn, M., A. Collingro, et al. (2004). "Illuminating the evolutionary
history of chlamydiae." Science 304(5671): 728-730.
Hu, B., G. Xie, et al. (2011). "Pathogen comparative genomics in the
next-generation sequencing era: genome alignments, pangenomics
and metagenomics." Brief Funct Genomics 10(6): 322-333.
Hyatt, D., G. L. Chen, et al. (2010). "Prodigal: prokaryotic gene
recognition and translation initiation site identification." BMC
Bioinformatics 11: 119.
Klasson, L., Z. Kambris, et al. (2009). "Horizontal gene transfer between
Wolbachia and the mosquito Aedes aegypti." BMC genomics 10:
33.
Koonin, E. V. (2009). "Darwinian evolution in the light of genomics."
Nucleic acids research 37(4): 1011-1034.
Koonin, E. V. (2010). "The origin and early evolution of eukaryotes in the
light of phylogenomics." Genome biology 11(5): 209.
Koonin, E. V. and Y. I. Wolf (2008). "Genomics of bacteria and archaea:
the emerging dynamic view of the prokaryotic world." Nucleic
acids research 36(21): 6688-6719.
Labrador, M. and V. G. Corces (1997). "Transposable element-host
interactions: regulation of insertion and excision." Annu Rev Genet
31: 381-404.
Li, J., A. Mahajan, et al. (2006). "Ankyrin repeat: a unique motif
mediating protein-protein interactions." Biochemistry 45(51):
15168-15178.
Li, L., C. J. Stoeckert, Jr., et al. (2003). "OrthoMCL: identification of
ortholog groups for eukaryotic genomes." Genome Res 13(9):
2178-2189.
Lin, M., C. Zhang, et al. (2009). "Analysis of complete genome sequence
of Neorickettsia risticii: causative agent of Potomac horse fever."
Nucleic acids research 37(18): 6076-6091.
Margulis, L. and R. Fester, Eds. (1991). Symbiosis as a Source of
Evolutionary Innovation: Speciation and Morphogenesis, The MIT Press.
Marco, D. (2008). "Metagenomics and the niche concept." Theory in
biosciences = Theorie in den Biowissenschaften 127(3): 241-247.
Margulis, L. (1971). "The origin of plant and animal cells." American
scientist 59(2): 230-235.
Margulis, L. (1971). "Symbiosis and evolution." Scientific American
225(2): 48-57.
Markowitz, V. M., I. M. Chen, et al. (2012). "IMG: the Integrated
Microbial Genomes database and comparative analysis system."
Nucleic Acids Res 40(Database issue): D115-122.
Mathew, M. J., G. Subramanian, et al. (2012). "Genome sequence of
Diplorickettsia massiliensis, an emerging Ixodes ricinus-associated
human pathogen." J Bacteriol 194(12): 3287.
Matthews, M. and C. R. Roy (2000). "Identification and subcellular
localization of the Legionella pneumophila IcmX protein: a factor
essential for establishment of a replicative organelle in eukaryotic
host cells." Infection and immunity 68(7): 3971-3982.
McCutcheon, J. P. and N. A. Moran (2007). "Parallel genomic evolution
and metabolic interdependence in an ancient symbiosis."
Proceedings of the National Academy of Sciences of the United
States of America 104(49): 19392-19397.
McCutcheon, J. P. and N. A. Moran (2012). "Extreme genome reduction
in symbiotic bacteria." Nature reviews. Microbiology 10(1): 13-26.
McNulty, S. N., J. M. Foster, et al. (2010). "Endosymbiont DNA in
endobacteria-free filarial nematodes indicates ancient horizontal
genetic transfer." PloS one 5(6): e11029.
Mediannikov, O., Z. Sekeyova, et al. (2010). "A novel obligate
intracellular gamma-proteobacterium associated with ixodid ticks,
Diplorickettsia massiliensis, Gen. Nov., Sp. Nov." PLoS One 5(7):
e11478.
Medini, D., C. Donati, et al. (2005). "The microbial pan-genome." Curr
Opin Genet Dev 15(6): 589-594.
Merhej, V., C. Notredame, et al. (2011). "The rhizome of life: the
sympatric Rickettsia felis paradigm demonstrates the random
transfer of DNA sequences." Molecular biology and evolution
28(11): 3213-3223.
Merhej, V. and D. Raoult (2011). "Rickettsial evolution in the light of
comparative genomics." Biological reviews of the Cambridge
Philosophical Society 86(2): 379-405.
Merhej, V., M. Royer-Carenzi, et al. (2009). "Massive comparative
genomic analysis reveals convergent evolution of specialized
bacteria." Biology direct 4: 13.
Miao, E. A. and S. I. Miller (1999). "Bacteriophages in the evolution of
pathogen-host interactions." Proceedings of the National Academy
of Sciences of the United States of America 96(17): 9452-9454.
Mira, A., H. Ochman, et al. (2001). "Deletional bias and the evolution of
bacterial genomes." Trends in genetics : TIG 17(10): 589-596.
Moliner, C., P. E. Fournier, et al. (2010). "Genome analysis of
microorganisms living in amoebae reveals a melting pot of
evolution." FEMS microbiology reviews 34(3): 281-294.
Moran, J. V., R. J. DeBerardinis, et al. (1999). "Exon shuffling by L1
retrotransposition." Science 283(5407): 1530-1534.
Moran, N. A. (1996). "Accelerated evolution and Muller's rachet in
endosymbiotic bacteria." Proceedings of the National Academy of
Sciences of the United States of America 93(7): 2873-2878.
Moran, N. A. (2002). "Microbial minimalism: genome reduction in
bacterial pathogens." Cell 108(5): 583-586.
Moran, N. A. and P. Baumann (2000). "Bacterial endosymbionts in
animals." Current opinion in microbiology 3(3): 270-275.
Moran, N. A., P. H. Degnan, et al. (2005). "The players in a mutualistic
symbiosis: insects, bacteria, viruses, and virulence genes."
Proceedings of the National Academy of Sciences of the United
States of America 102(47): 16919-16926.
Moran, N. A., H. E. Dunbar, et al. (2005). "Regulation of transcription in
a reduced bacterial genome: nutrient-provisioning genes of the
obligate symbiont Buchnera aphidicola." J Bacteriol 187(12): 4229-
4237.
Moran, N. A., J. P. McCutcheon, et al. (2008). "Genomics and Evolution
of Heritable Bacterial Symbionts." Annual Review of Genetics
42(1): 165-190.
Moran, N. A. and G. R. Plague (2004). "Genomic changes following host
restriction in bacteria." Current opinion in genetics & development
14(6): 627-633.
Moran, N. A. and J. J. Wernegreen (2000). "Lifestyle evolution in
symbiotic bacteria: insights from genomics." Trends in ecology &
evolution 15(8): 321-326.
Moriya, Y., M. Itoh, et al. (2007). "KAAS: an automatic genome
annotation and pathway reconstruction server." Nucleic Acids Res
35(Web Server issue): W182-185.
Mosavi, L. K., T. J. Cammett, et al. (2004). "The ankyrin repeat as
molecular architecture for protein recognition." Protein science : a
publication of the Protein Society 13(6): 1435-1448.
Nagai, H. and T. Kubori (2011). "Type IVB Secretion Systems of
Legionella and Other Gram-Negative Bacteria." Frontiers in
microbiology 2: 136.
Nakabachi, A., A. Yamashita, et al. (2006). "The 160-kilobase genome of
the bacterial endosymbiont Carsonella." Science 314(5797): 267.
Nora, T., M. Lomma, et al. (2009). "Molecular mimicry: an important
virulence strategy employed by Legionella pneumophila to subvert
host functions." Future microbiology 4(6): 691-701.
Ogata, H., S. Audic, et al. (2000). "Selfish DNA in protein-coding genes of
Rickettsia." Science 290(5490): 347-350.
Ogata, H., S. Audic, et al. (2001). "Mechanisms of evolution in Rickettsia
conorii and R. prowazekii." Science 293(5537): 2093-2098.
Ogata, H., B. La Scola, et al. (2006). "Genome sequence of Rickettsia
bellii illuminates the role of amoebae in gene exchanges between
intracellular pathogens." PLoS Genet 2(5): e76.
Ogata, H., P. Renesto, et al. (2005). "The genome sequence of Rickettsia
felis identifies the first putative conjugative plasmid in an obligate
intracellular parasite." PLoS biology 3(8): e248.
Ogata, H., C. Robert, et al. (2005). "Rickettsia felis, from culture to
genome sequencing." Annals of the New York Academy of Sciences
1063: 26-34.
Ohnishi, M., K. Kurokawa, et al. (2001). "Diversification of Escherichia
coli genomes: are bacteriophages the major contributors?" Trends
in microbiology 9(10): 481-485.
Parola, P. and D. Raoult (2001). "Ticks and tickborne bacterial diseases
in humans: an emerging infectious threat." Clin Infect Dis 32(6):
897-928.
Pearson, T., H. M. Hornstra, et al. (2013). "When Outgroups Fail;
Phylogenomics of Rooting the Emerging Pathogen, Coxiella
burnetii." Systematic biology 62(5): 752-762.
Pellegrini, M., E. M. Marcotte, et al. (1999). "Assigning protein functions
by comparative genome analysis: protein phylogenetic profiles."
Proc Natl Acad Sci U S A 96(8): 4285-4288.
Perez-Brocal, V., R. Gil, et al. (2006). "A small microbial genome: the end
of a long symbiotic relationship?" Science 314(5797): 312-313.
Peterson, J., S. Garges, et al. (2009). "The NIH Human Microbiome
Project." Genome Res 19(12): 2317-2323.
Pilsczek, F. H., A. Nicholson-Weller, et al. (2005). "Phagocytosis of
Salmonella montevideo by human neutrophils: immune adherence
increases phagocytosis, whereas the bacterial surface determines
the route of intracellular processing." The Journal of infectious
diseases 192(2): 200-209.
Rasko, D. A., M. J. Rosovitz, et al. (2008). "The pangenome structure of
Escherichia coli: comparative genomic analysis of E. coli
commensal and pathogenic isolates." J Bacteriol 190(20): 6881-
6893.
Renesto, P., H. Ogata, et al. (2005). "Some lessons from Rickettsia
genomics." FEMS microbiology reviews 29(1): 99-117.
Renvoise, A., V. Merhej, et al. (2011). "Intracellular Rickettsiales:
Insights into manipulators of eukaryotic cells." Trends in molecular
medicine 17(10): 573-583.
Rocha, E. P. (2003). "DNA repeats lead to the accelerated loss of gene
order in bacteria." Trends in genetics : TIG 19(11): 600-603.
Rocha, E. P. (2008). "Evolutionary patterns in prokaryotic genomes."
Curr Opin Microbiol 11(5): 454-460.
Rolain, J. M., M. Vayssier-Taussat, et al. (2013). "Partial disruption of
translational and posttranslational machinery reshapes growth
rates of Bartonella birtlesii." mBio 4(2): e00115-00113.
Roux, V., M. Bergoin, et al. (1997). "Reassessment of the taxonomic
position of Rickettsiella grylli." Int J Syst Bacteriol 47(4): 1255-
1257.
Rubtsov, A. M. and O. D. Lopina (2000). "Ankyrins." FEBS letters 482(1-
2): 1-5.
Saeed, A. I., V. Sharov, et al. (2003). "TM4: a free, open-source system
for microarray data management and analysis." Biotechniques
34(2): 374-378.
Saisongkorh, W., C. Robert, et al. (2010). "Evidence of transfer by
conjugation of type IV secretion system genes between Bartonella
species and Rhizobium radiobacter in amoeba." PloS one 5(9):
e12666.
Saridaki, A. and K. Bourtzis (2010). "Wolbachia: more than just a bug in
insects genitals." Current opinion in microbiology 13(1): 67-72.
Sassera, D., T. Beninati, et al. (2006). "'Candidatus Midichloria
mitochondrii', an endosymbiont of the tick Ixodes ricinus with a
unique intramitochondrial lifestyle." International journal of
systematic and evolutionary microbiology 56(Pt 11): 2535-2540.
Schandel, K. A., M. M. Muller, et al. (1992). "Localization of TraC, a
protein involved in assembly of the F conjugative pilus." J Bacteriol
174(11): 3800-3806.
Schmitz-Esser, S., N. Linka, et al. (2004). "ATP/ADP translocases: a
common feature of obligate intracellular amoebal symbionts
related to Chlamydiae and Rickettsiae." J Bacteriol 186(3): 683-
691.
Schroder, G. and E. Lanka (2003). "TraG-like proteins of type IV secretion
systems: functional dissection of the multiple activities of TraG
(RP4) and TrwB (R388)." J Bacteriol 185(15): 4371-4381.
Sheppard, S. K., X. Didelot, et al. (2013). "Progressive genome-wide
introgression in agricultural Campylobacter coli." Molecular
ecology 22(4): 1051-1064.
Shigenobu, S., H. Watanabe, et al. (2000). "Genome sequence of the
endocellular bacterial symbiont of aphids Buchnera sp. APS."
Nature 407(6800): 81-86.
Simek, K., J. Pernthaler, et al. (2001). "Changes in bacterial community
composition and dynamics and viral mortality rates associated
with enhanced flagellate grazing in a mesoeutrophic reservoir."
Appl Environ Microbiol 67(6): 2723-2733.
Simser, J. A., M. S. Rahman, et al. (2005). "A novel and naturally
occurring transposon, ISRpe1 in the Rickettsia peacockii genome
disrupting the rickA gene involved in actin-based motility."
Molecular microbiology 58(1): 71-79.
Snipen, L., T. Almoy, et al. (2009). "Microbial comparative pan-genomics
using binomial mixture models." BMC genomics 10: 385.
Stepkowski, T. and A. B. Legocki (2001). "Reduction of bacterial genome
size and expansion resulting from obligate intracellular lifestyle
and adaptation to soil habitat." Acta biochimica Polonica 48(2):
367-381.
Subramanian, G., O. Mediannikov, et al. (2012). "Diplorickettsia
massiliensis as a human pathogen." Eur J Clin Microbiol Infect Dis
31(3): 365-369.
Tamas, I., L. Klasson, et al. (2002). "50 million years of genomic stasis in
endosymbiotic bacteria." Science 296(5577): 2376-2379.
Tatusov, R. L., N. D. Fedorova, et al. (2003). "The COG database: an
updated version includes eukaryotes." BMC Bioinformatics 4: 41.
Toft, C. and S. G. Andersson (2010). "Evolutionary microbial genomics:
insights into bacterial host adaptation." Nature reviews. Genetics
11(7): 465-475.
van Belkum, A., S. Scherer, et al. (1998). "Short-sequence DNA repeats
in prokaryotic genomes." Microbiology and molecular biology
reviews : MMBR 62(2): 275-293.
van Ham, R. C., J. Kamerbeek, et al. (2003). "Reductive genome
evolution in Buchnera aphidicola." Proceedings of the National
Academy of Sciences of the United States of America 100(2): 581-
586.
Van Sluys, M. A., M. C. de Oliveira, et al. (2003). "Comparative analyses
of the complete genome sequences of Pierce's disease and citrus
variegated chlorosis strains of Xylella fastidiosa." J Bacteriol
185(3): 1018-1026.
Vogel, J. P., H. L. Andrews, et al. (1998). "Conjugative transfer by the
virulence system of Legionella pneumophila." Science 279(5352):
873-876.
von Dohlen, C. D., S. Kohler, et al. (2001). "Mealybug beta-
proteobacterial endosymbionts contain gamma-proteobacterial
symbionts." Nature 412(6845): 433-436.
Walsh, J. B. (1995). "How often do duplicated genes evolve new
functions?" Genetics 139(1): 421-428.
Wernegreen, J. J. (2002). "Genome evolution in bacterial endosymbionts
of insects." Nature reviews. Genetics 3(11): 850-861.
Wernegreen, J. J. (2005). "For better or worse: genomic consequences of
intracellular mutualism and parasitism." Current opinion in
genetics & development 15(6): 572-583.
Wernegreen, J. J., A. B. Lazarus, et al. (2002). "Small genome of
Candidatus Blochmannia, the bacterial endosymbiont of
Camponotus, implies irreversible specialization to an intracellular
lifestyle." Microbiology 148(Pt 8): 2551-2556.
Werren, J. H., L. Baldo, et al. (2008). "Wolbachia: master manipulators
of invertebrate biology." Nature reviews. Microbiology 6(10): 741-
751.
Whitman, W. B. (2009). "The modern concept of the procaryote." J
Bacteriol 191(7): 2000-2005; discussion 2006-2007.
Wilcox, J. L., H. E. Dunbar, et al. (2003). "Consequences of reductive
evolution for gene expression in an obligate endosymbiont."
Molecular microbiology 48(6): 1491-1500.
Wren, B. W. (2000). "Microbial genome analysis: insights into virulence,
host adaptation and evolution." Nat Rev Genet 1(1): 30-39.
Wu, M., L. V. Sun, et al. (2004). "Phylogenomics of the reproductive
parasite Wolbachia pipientis wMel: a streamlined genome overrun
by mobile genetic elements." PLoS biology 2(3): E69.
Wu, S., Z. Zhu, et al. (2011). "WebMGA: a customizable web server for
fast metagenomic sequence analysis." BMC genomics 12: 444.
Yang, F., J. Yang, et al. (2005). "Genome dynamics and diversity of
Shigella species, the etiologic agents of bacillary dysentery."
Nucleic acids research 33(19): 6445-6458.
Zientz, E., T. Dandekar, et al. (2004). "Metabolic interdependence of
obligate intracellular bacteria and their insect hosts." Microbiology
and molecular biology reviews : MMBR 68(4): 745-770.
Acknowledgements
I thank God for giving me patience, persistence and perspiration. I
thank all the people who stood by me while I completed my thesis. I would
not have been able to complete it without the help and support of
countless people over the past three years.
I must express my sincere gratitude to my guide and director, Professor
Didier Raoult, for his constant suggestions and guidance. I would also
like to thank him for creating a scientific environment at URMITE in
which I could learn and improve my skills, and for providing the
financial help (AP-HM) that made my life easier in France.
I am indeed thankful to the core bioinformatics team for helping me
solve various technical issues. I express my hearty thanks to Ghislain,
Gregory, Fabrice and Olivier for their constant support.
I would like to thank the reviewers of my thesis, Prof. Jérôme ETIENNE
and Prof. Max MAURIN, for their scientific advice and detailed review
during the preparation of my thesis. Their sincere suggestions indeed
helped me to improve my thesis. I thank Prof. Jean-Louis MEGE for his
support and for honoring me by acting as the president of my thesis jury.
Completing my thesis would have been harder without these people. I owe
special thanks to Catherine Robert and her team, especially Thi-Tien
Nguyen, for teaching me molecular biology techniques. I remember here
their time and patience. I am thankful to Francine Simula, Valerie Filosa
and Sylvain Buffet for their administrative support and their constant
help.
In the field of genomics, when I was naïve and lost, Roshan Padmanabhan
guided me through the various skills and methods I needed. He not only
gave me feedback but, many times, also helped me to understand the
problems and to write the manuscripts. Without his guidance and
constant feedback, this PhD would not have been achievable.
My friends in France and India, especially Sagar, Vishal, Sijo and Mayur,
and others in different parts of the world were my sources of laughter,
joy, happiness and support. I am happy that my friendships with you have
extended well beyond our shared times. I owe special thanks to all of you
for keeping me determined.
I owe special and sincere thanks to my wife Ripsy, who stood by me
in both my happy and difficult times. Last but not least, I would like to
express my sincere gratitude to my mother Susamma Mathew, father M J
Mathew, brother Mithun, Mummy Daisy Chacko, Papa K V Chacko,
brother-in-law Rohind, my grandfather and my grandmother for their
unconditional love, blessing and support. I dedicate my thesis to three
important people: my mummy (who is my first teacher and my
inspiration), my wife (who is my better half) and my papa (who
encouraged me).