Master 1 "Biology, Health", Specialty: Microbiology
Internship Report

Clarisse Lemonnier

Ecological Study of Pelagic Archaea in Marine Coastal Waters
Bioinformatical approach

Internship supervisor: Loïs Maignien
Internship dates: 6 January to 28 February 2014
Internship location: Laboratoire de Microbiologie des Environnements Extrêmes (LM2E), 29280 Plouzané
Table of Contents
I. INTRODUCTION
II. MATERIALS AND METHODS
A. SAMPLING AND DNA EXTRACTION
B. SEQUENCING
C. TREATMENT OF SEQUENCES AND ANALYSIS
1. Bioinformatical tools
i. Software
ii. CAPARMOR
2. Treatment of sequences
3. Analysis of sequences
i. alpha-diversity
ii. beta-diversity
iii. Oligotyping
III. RESULTS & DISCUSSION
A. TREATMENT OF SEQUENCES
B. DOMINANT MGII, MGI AND MGIII
C. SEASONAL CORRELATION
D. OLIGOTYPING
IV. CONCLUSION
BIBLIOGRAPHY
ANNEXES

Figures
Figure 1: Phylogeny of the pelagic archaea (DeLong, 2003)
Figure 2: CAPARMOR server organization.
Figure 3: Seasonal variation of the Cenarchaeaceae, MGII and MGIII, based on the phylotype analysis.
Figure 4: Seasonal variation of OTU1 Nitrosopumilus, with the amount of NH4+, NO3- and NO2-.
Figure 5: Seasonal variation of OTU1 and OTU18
Figure 6: NMDS for all the sequences with different metadata. The distance index used is Bray.
Figure 7: Seasonal variation of the two oligotypes of Nitrosopumilus spp CCGGCCG and TCGACCG.
Figure 8: Seasonal variation of the two oligotypes of Nitrosopumilus TCGGCCG and TGCGCCG
Figure 9: NMDS with Bray index for Nitrosopumilus spp OTUs and Oligotypes.
Figure 10: NMDS with Bray index for Nitrosopumilus spp OTUs and Oligotypes.

Tables
Table 1: Comparison of Nitrosopumilus spp. OTUs and Oligotypes contents depending on the abundance of the taxa unit
Table 2: Comparison of MGII OTUs and Oligotypes contents depending on the abundance of the taxa unit.
Table 3: Comparison of MGIII OTUs and Oligotypes contents depending on the abundance of the taxa unit
I. Introduction
Pelagic Archaea
The characterization of the third domain of life, the Archaea, has been one of the most important advances in microbiology in recent decades. C. Woese and G. Fox discovered it in 1977, thanks to the onset of molecular phylogenetic analysis (Woese & Fox, 1977). These organisms appeared to be remarkably well adapted to extreme environments. This is why, for more than ten years, research focused on their metabolism in very restrictive habitats such as hydrothermal vents or hypersaline lakes (Cowan, 1992).
Only in the early 1990s, with the development of culture-independent techniques, did two publications by DeLong and by Fuhrman et al. revolutionize our view of the Archaea. Using 16S rDNA-based analyses in marine coastal waters (DeLong, 1992; Fuhrman, McCallum & Davis, 1992), they demonstrated the presence of Archaea in a vast and temperate habitat: the ocean. Since then, numerous publications have confirmed and strengthened this discovery, showing that pelagic Archaea are very well represented in the ocean, where they may make up more than 20% of the whole marine picoplankton (Smith, 2001; DeLong & Pace, 2001).
These findings carry major stakes. The oceans influence the global climate through several fundamental biogeochemical processes, in which microorganisms play a large part.
All the pelagic archaea found so far can be grouped into two main phyla: the Crenarchaeota, with Marine Group I, and the Euryarchaeota, with three unclassified groups: Marine Groups II, III and IV (Fig. 1).
Figure 1: Phylogeny of the pelagic archaea (DeLong, 2003)
The first one, Marine Group I (MGI), appears to be widespread and very abundant, and according to some scientists should be considered the most abundant microbial group in the ocean (Walker et al., 2010). Its reclassification into a new phylum, the Thaumarchaeota, was even proposed (Brochier-Armanet et al., 2008) and is now well accepted by the scientific community (Pester, Schleper & Wagner, 2011). Its abundance increases considerably through the water column, from surface waters to the bathypelagic ocean, where it can reach 40% of the whole bacterioplankton biomass (DeLong, 2003).
However, knowledge of these microorganisms remains limited. For example, although the huge abundance of MGI throughout the oceans had been documented, very little was known about its ecological role. A breakthrough occurred in the 2000s with the discovery of an amo-like gene attributed to MGI (Treusch et al., 2005). The amo gene encodes a subunit of ammonia monooxygenase. This enzyme catalyzes the first step of nitrification, i.e. the oxidation of ammonia, which ultimately leads to nitrite and nitrate. Could marine Archaea play a role in the global nitrogen cycle? Clear evidence was provided by the cultivation of a thaumarchaeon, Nitrosopumilus maritimus (Könneke et al., 2005). This archaeon was isolated from a tropical aquarium with a high ammonium content. By cultivating it, the authors showed that N. maritimus is a chemolithoautotroph and does carry the amo-like gene. Ecological studies of this genus in different places (North Sea, Antarctica, Santa Barbara Channel, Mediterranean coast of Spain) show that a bloom occurs in winter (Pitcher et al., 2011). In this season Nitrosopumilus might face less competition for ammonium, which is heavily consumed by phytoplankton in summer but not in winter. Moreover, experiments showed that N. maritimus is well adapted to low ammonium concentrations (Pitcher et al., 2011).
The other phylum found in the ocean is the Euryarchaeota, with Marine Group II (MGII), Marine Group III (MGIII) and Marine Group IV (MGIV). Unlike Marine Group I, these groups still have no cultivated close relatives and are described exclusively through their 16S rRNA. MGII is the most abundant group in marine surface waters (Hugoni et al., 2013; Brown et al., 2009), but its ecological behavior is not yet known. However, Iverson and colleagues recently reconstructed an environmental genome of MGII using metagenomics. By comparing its genes to databases on the basis of sequence homology, they hypothesized that MGII is a motile photoheterotroph that mainly focuses on protein and lipid degradation (Iverson et al., 2012). MGIII is found mainly in deep waters and, as for MGII, very little is known about this group.
Importance of culture-independent surveys
Thus an important part of our knowledge about pelagic Archaea, from their discovery to hypotheses about their ecology, comes from culture-independent surveys. It is a good illustration of how these approaches are now required for phylogenetic, evolutionary and ecological studies.
One of the most widely used techniques is the sequencing of the 16S rRNA gene. It consists of extracting and sequencing “all” the 16S rRNA genes of the Archaea (or other microorganisms) present in an environment, and then trying to identify the organisms present at the time of sampling.
Microbial diversity studies based on this method are currently undergoing a small revolution with the use of next-generation sequencers, which can sequence millions of reads in a single one-day run. These sequencers allow deep sequencing, and therefore a finer investigation of the diversity of a sample. They also permit complex analyses in which different samples are sequenced simultaneously (Pinto & Raskin, 2012).
However, these techniques have limits: first, sequencing errors cannot easily be discriminated from real sequences, and second, the analysis is complex.
Bioinformatics has become essential to handle the thousands or millions of reads produced by these sequencers, but also to perform ecological and statistical studies. These methods are evolving rapidly and are becoming a powerful means to detect sequencing errors, to approach the fine-scale diversity of an environment and to explore the uncultivated and unknown biosphere.
Scope of my work
The aim of my internship was to study the diversity and ecology of pelagic Archaea using bioinformatics tools. To do so, I used a dataset already published by Hugoni et al., which gathers all the sequences from a three-year temporal survey of sea surface water in the bay of Banyuls-sur-mer (Hugoni et al., 2013). I performed the usual α- and ß-diversity analyses to investigate which Archaea occur in this environment and to gain insight into their ecology by examining their temporal variations. I ended with the use of a new bioinformatics tool that allows a finer analysis: Oligotyping (Eren et al., 2013).
II. Materials and methods
A. Sampling and DNA extraction
According to the Hugoni et al. publication, ten liters of water were collected monthly from March 2008 to June 2011, at 3 meters depth in the bay of Banyuls-sur-mer. Forty samples were thus obtained. Sample processing started within two hours, with successive filtration steps on 3 µm and 0.22 µm pore-size Sterivex® filters. The filters were then stored at -80°C until nucleic acid extraction. The extraction was performed directly in the Sterivex® cartridges by mechanical and chemical cell lysis, using the AllPrep DNA/RNA kit. Both DNA and RNA were extracted, giving a total of 80 samples (Hugoni et al., 2013).
B. Sequencing
Before sequencing, the genes of interest must be amplified by PCR. Microbial diversity studies typically target the 16S rRNA gene. This gene is present in all prokaryotes and possesses both conserved and variable regions, which make it possible to discriminate closely related species. Moreover, numerous reference databases exist for this gene, one of the most important being that of NCBI.
Here, the V3-V5 hypervariable regions were sequenced. These regions are known to provide phylogenetic results close to those obtained with the whole 16S rRNA sequence (Pinto & Raskin, 2012).
To perform the PCR, Archaea-specific primers were used: Arch349F (GYG CAS CAG KCG MGA AW) and Arch806R (GGA CTA CVS GGG TAT CTA AT). Arch349F is a degenerate primer, which actually consists of a mixture of different primers with a few variations in their sequences. This property is useful when the largest possible number of sequences must be matched.
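To make the notion of a degenerate primer concrete, the following minimal Python sketch expands the two primer sequences quoted above into every plain A/C/G/T sequence they encode, using the standard IUPAC nucleotide codes. The function name expand_primer is hypothetical and chosen for illustration only; it is not part of Mothur or of the original study.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    primer = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]


# Primer sequences as given in the text above.
arch349f = expand_primer("GYG CAS CAG KCG MGA AW")
arch806r = expand_primer("GGA CTA CVS GGG TAT CTA AT")
print(len(arch349f), len(arch806r))  # number of distinct primers in each mix
```

Arch349F contains five two-fold degenerate positions (Y, S, K, M, W) and therefore corresponds to 32 distinct sequences; the V and S positions of Arch806R expand in the same way to 6 sequences.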
The sequences were obtained by pyrosequencing on a Roche 454 GS-FLX system, a high-throughput sequencer that offers useful advantages for microbial diversity studies. Sequencing different samples simultaneously is made possible by the addition of barcodes, short sequences specific to one sample that are ligated to the primer. The reads can then be easily assigned to their sample of origin by the different bioinformatics software, as we shall see later. This technology yields roughly 1 million sequences within 10 hours.
Different files are obtained after sequencing:
- a fasta file with all the sequences and their identifiers.
- a qual file that assigns a quality score to each base.
- a file with the primer and barcode sequences for each sample.
All these files were deposited online by Hugoni et al. at the following address:
http://datadryad.org.
C. Treatment of sequences and analysis
1. Bioinformatical tools
i. Software
Different software packages are available to process the huge amount of data produced by next-generation sequencers and to perform sequence analysis.
In this study, we mainly used Mothur (Schloss et al., 2009), a software package specially designed for microbial ecology analyses based on 16S rRNA gene sequences. It brings together, in a total of 142 commands, the tools needed to go from raw sequences to complete α- and ß-diversity analyses.
The second software, Oligotyping, is very recent and allows a finer analysis among microorganisms with closely related 16S rDNA sequences. We will describe it later.
We also used R, a widely used software environment for statistical computing and graphics.
The scripts we used to perform these analyses are given in the annex (Annex 5).
ii. CAPARMOR
These software packages manipulate large datasets, and ordinary personal computers are quite limited in processing power and RAM. We therefore obtained an account on the IFREMER server, CAPARMOR, to perform our analyses. This server comprises 2600 processors and is used for oceanographic and meteorological simulations by different laboratories such as IFREMER, LEMAR and LOP. We did not use the full capacity of this large server: we only connected to a small cluster of 32 processors called service8 (Fig. 2).
Figure 2: CAPARMOR server organization. (source: IFREMER)
2. Treatment of sequences
Quality filtering is a major step in sequence analysis. PCR and sequencing always generate errors that may affect the downstream analysis. Targeting these artifacts is a challenge, as we do not know the original sequences and cannot easily distinguish errors from rare but true reads.
With the qual file, we can remove poor-quality sequences, which are the most likely to contain false bases, insertions or deletions. In the Mothur command trim.seqs(), we specify a minimum quality score, 27, which was the one chosen by Hugoni et al. In this command, we can also specify the number of mismatches allowed in the primer and barcode sequences (one for each) and the maximum homopolymer length (i.e. the maximum number of identical consecutive bases allowed in a sequence), which is typically 8.
It is also during this step that the primers and barcodes are removed from all the sequences.
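The filtering criteria described above can be illustrated with a short Python sketch. It is not Mothur's implementation: the function and parameter names (passes_filters, min_q, max_homop, max_diffs) are hypothetical, the primer is treated as a plain non-degenerate string, and applying the score of 27 to the average quality of the read is an assumption made for this example.

```python
def mean_quality(scores):
    """Average quality score of a read (assumed use of the threshold)."""
    return sum(scores) / len(scores)


def longest_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    if not seq:
        return 0
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def passes_filters(seq, scores, barcode, primer,
                   min_q=27, max_homop=8, max_diffs=1):
    """Apply barcode, primer, quality and homopolymer checks to one read."""
    tag_len = len(barcode) + len(primer)
    if len(seq) < tag_len:
        return False
    if mismatches(seq[:len(barcode)], barcode) > max_diffs:
        return False
    if mismatches(seq[len(barcode):tag_len], primer) > max_diffs:
        return False
    if mean_quality(scores) < min_q:
        return False
    # The homopolymer check is applied after removing barcode and primer.
    return longest_homopolymer(seq[tag_len:]) <= max_homop


# Toy read: 4-base barcode, 4-base primer, then the amplicon itself.
print(passes_filters("ACGTGGCCACGTACGT", [30] * 16, "ACGT", "GGCC"))
```

In a real run these checks are applied to every read, and the barcode also determines the sample to which the read is assigned.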
Another important bias, this time due to PCR, is the presence of chimeras. Chimeras are combinations of two or more different sequences. They can be detected by comparing our sequences with a reference database using the chimera.uchime() command in Mothur, or by comparing the reads with the others within the samples (chimera.slayer() command).
For all these treatments, Mothur generates a file listing the undesired sequences, which can then be removed from the whole fasta file with the remove.seqs() command.
Another crucial step is sequence alignment, in which the regions of identity between our sequences are found. It results in the addition of gaps, represented by “-”. The alignment can be done among the sequences themselves or against a reference database, which will also make it possible to identify the sequences afterwards. Here we performed the alignment with the align.seqs() command using the greengenes reference database gg_13_5_99, which contains 202421 archaeal and bacterial sequences and was updated in May 2013 (DeSantis et al., 2006).
The last step of the sequence quality check is to bring all our sequences to the same length so that they can be compared. This is done with screen.seqs(). In this command we specify the positions of the first and last nucleotides that we want for all our sequences. We thus have to find a compromise between keeping the longest sequences, at the risk of losing numerous reads, and keeping more sequences, at the expense of phylogenetic resolution.
The nucleotide positions are determined after the alignment step, from the start and end positions of all the aligned sequences. I chose to keep the region between alignment positions 1917 and 2381. After this treatment all our sequences were 325 bp long.
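The compromise between read length and read count can be pictured with a small Python sketch of the screening idea: a read is kept only if its aligned bases cover the chosen window, and kept reads are cut to that window. This is an illustration under simplifying assumptions, not Mothur's code; the function names and the toy alignment (12 columns, window 3 to 10) are hypothetical.

```python
def first_last_base(aligned):
    """1-based alignment columns of the first and last non-gap character."""
    cols = [i for i, ch in enumerate(aligned, start=1) if ch not in "-."]
    return (cols[0], cols[-1]) if cols else (None, None)


def screen_and_trim(aligned_seqs, start, end):
    """Keep reads spanning columns start..end and cut them to that window."""
    kept = {}
    for name, aligned in aligned_seqs.items():
        first, last = first_last_base(aligned)
        if first is None or first > start or last < end:
            continue  # the read does not cover the whole window
        kept[name] = aligned[start - 1:end]
    return kept


# Toy alignment; gaps are written "-".
toy = {
    "read1": "--ACGTACGTAC",  # covers columns 3..12: kept
    "read2": "----GTACGTAC",  # starts at column 5: discarded
    "read3": "ACGTAC-TAC--",  # covers columns 1..10: kept
}
print(screen_and_trim(toy, start=3, end=10))
```

Widening the window discards more reads such as read2, while narrowing it keeps more reads but shortens the region available for comparison.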
During this preparation of our sequences for analysis, we did not manipulate all the sequences but only what are called “unique sequences”: all strictly identical sequences are grouped together and only one of them is used for the rest of the analysis. This produces two files: a fasta file with the unique sequences, and a names file listing all the sequences associated with each unique sequence. This reduces the number of sequences to manipulate and thus facilitates the analysis. In this study we thereby went from 340800 to 26755 sequences in the fasta file.
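The grouping into unique sequences can be sketched in a few lines of Python. The function name dereplicate and the in-memory data structures are hypothetical; the sketch only mirrors the idea of one representative per identical sequence plus a table of the reads it stands for.

```python
def dereplicate(reads):
    """Group identical sequences.

    reads: list of (name, sequence) pairs.
    Returns (unique_fasta, names_table): one representative per distinct
    sequence, and for each representative the names of all reads it stands for.
    """
    groups = {}
    for name, seq in reads:
        groups.setdefault(seq, []).append(name)
    unique_fasta = [(names[0], seq) for seq, names in groups.items()]
    names_table = {names[0]: names for names in groups.values()}
    return unique_fasta, names_table


reads = [("r1", "ACGT"), ("r2", "ACGA"), ("r3", "ACGT"), ("r4", "ACGT")]
unique_fasta, names_table = dereplicate(reads)
print(unique_fasta)
print(names_table)
```

Because the names table keeps track of every original read, abundances can still be recovered after the analysis has been run on the representatives only.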
3. Analysis of sequences
Sequence analyses allow us to assess the diversity of the Archaea within one sample (alpha-diversity) and also provide statistical means to compare our samples with each other (ß-diversity).
i. alpha-diversity
To study alpha-diversity we must first group our sequences, which can be done in two ways: as OTUs (Operational Taxonomic Units) or as phylotypes. The first consists of clustering our sequences at 97% similarity. This threshold is much debated and is based on empirical studies that sought to define the notion of species (Stackebrandt & Goebel, 1994). Phylotype clustering groups the sequences that received exactly the same taxonomic affiliation when compared to already known sequences from a reference database.
We can cluster the OTUs with the cluster() command in Mothur, after computing the pairwise distances between the sequences with dist.seqs().
Phylotypes are built from the greengenes database classification with the classify.seqs() command.
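As an illustration of the distance-then-cluster logic, the Python sketch below computes an uncorrected pairwise distance on aligned sequences and groups reads by single-linkage clustering with a 0.03 cutoff (97% similarity). Single linkage is chosen here only because it is short to write; it is an assumption of this sketch and not necessarily the linkage used by Mothur's cluster() command. All function names are hypothetical.

```python
from itertools import combinations


def distance(a, b):
    """Fraction of mismatches over columns where both aligned reads have a base."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def cluster_otus(aligned, cutoff=0.03):
    """Single-linkage clustering: reads closer than the cutoff share an OTU."""
    parent = {name: name for name in aligned}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(aligned, 2):
        if distance(aligned[a], aligned[b]) <= cutoff:
            parent[find(a)] = find(b)

    otus = {}
    for name in aligned:
        otus.setdefault(find(name), []).append(name)
    return sorted(otus.values())


base = "ACGT" * 25                      # toy 100-base reference read
toy = {
    "s1": base,
    "s2": "TT" + base[2:],              # 2 mismatches with s1
    "s3": base[:80] + "CGTA" * 5,       # 20 mismatches with s1
}
print(cluster_otus(toy))
```

With these toy reads, s1 and s2 differ by 2% and fall into the same OTU, whereas s3 differs by 20% and forms its own OTU.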
ii. beta-diversity
According to Gauch: "Ordination primarily endeavors to represent sample and species relationships as faithfully as possible in a low-dimensional space" (Gauch, 1982).
To study ß-diversity, we produce statistical charts that represent all our samples in a two-dimensional graph. This step of projecting multivariate data (numerous samples with numerous taxa), which have a large number of dimensions, into a few dimensions is called ordination. There are numerous ordination methods; here we used Nonmetric Multidimensional Scaling (NMDS), which places similar samples close together but preserves only the rank order of the distances between samples, not their exact values. Our samples are thus plotted according to the similarity of their populations (OTUs).
The analysis starts with the creation of a data matrix that presents the sample names in columns and the different OTUs in rows. This step is done with the make.shared() command in Mothur. We then apply different “diversity indices”, which do not give the same information. We used the Jaccard coefficient, a presence/absence index, and the Bray-Curtis index, which is weighted and takes into account the relative proportions of the OTUs in a sample. We then obtain a distance matrix with the dist.shared() command.
The NMDS plots were produced with R; the script is given in the annex (Annex 5).
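The difference between the two indices can be seen on a toy example. The Python sketch below computes the Jaccard and Bray-Curtis dissimilarities between two samples described by their OTU counts, using the usual textbook definitions; the sample values and function names are invented for illustration and are not taken from the dataset.

```python
def jaccard_distance(a, b):
    """Presence/absence dissimilarity: 1 - shared OTUs / OTUs seen in either sample."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Abundance-weighted dissimilarity: sum|a_i - b_i| / (sum a + sum b)."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical OTU counts for two samples (same OTU order in both lists).
winter = [50, 10, 0, 5]
summer = [2, 40, 8, 0]
print(jaccard_distance(winter, summer))
print(bray_curtis_distance(winter, summer))
```

Here the two samples share two of the four OTUs observed, giving a Jaccard dissimilarity of 0.5, while the Bray-Curtis value (91/115, about 0.79) is higher because the shared OTUs have very different abundances.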
iii. Oligotyping
Oligotyping is a very recent and promising tool. Its main objective is to differentiate very close sequences that may differ by only one nucleotide, a resolution that OTUs and phylotypes cannot provide. Indeed, phylotype resolution is limited by our current knowledge of the phylogeny, which is particularly poor for the pelagic Archaea: MGII, for example, is only defined at the family level. OTUs, for their part, group sequences with up to 3% variation between them. In Oligotyping, the variation between sequences is estimated by entropy analysis, in order to discriminate it from sequencing errors.
Oligotyping is performed within already defined taxa, at the genus or family level, as it searches for very fine variations between the sequences.
At the end of the analysis, we obtain different “oligotypes”, which constitute distinct taxonomic units.
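The entropy analysis mentioned above can be sketched as follows: the Shannon entropy of each alignment column is computed, the most variable columns are selected, and each read is labelled by the bases it carries at those columns. This Python sketch only illustrates the principle; it is not the Oligotyping software itself, the function names are hypothetical, and the choice of columns (a simple entropy threshold) is an assumption of the example.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the characters found in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def entropy_profile(aligned_reads):
    """Entropy of every column of a set of equal-length aligned reads."""
    return [column_entropy(col) for col in zip(*aligned_reads)]


def oligotype(read, positions):
    """Label a read by the bases it carries at the selected columns."""
    return "".join(read[p] for p in positions)


reads = ["ACGT", "ACGT", "ATGT", "ATGT"]   # toy aligned reads
profile = entropy_profile(reads)
variable = [i for i, h in enumerate(profile) if h > 0.5]  # hypothetical threshold
print(variable, Counter(oligotype(r, variable) for r in reads))
```

In this toy alignment only the second column varies, so the reads split into two oligotypes, "C" and "T", each supported by two reads. A column where a single read differs would have low entropy and be treated as likely noise.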
III. Results & Discussion
A. Treatment of sequences
The dataset initially contained 438114 sequences; 433 chimeras and 22% of the sequences were removed during the filtering step. All the aligned sequences were trimmed to 325 bp. Only 174025 sequences were kept for the analysis, because numerous bacterial sequences were found, making up 48.9% of all the data. It is known that domain-specific primers always amplify a more or less important fraction of other domains or genes (Baker, Smith & Cowan, 2003). This substantial non-specific amplification can be enhanced by degenerate primers.
Furthermore, we eliminated 11 samples that did not contain enough sequences; the minimum was set at 500 sequences.
B. Dominant MGII, MGI and MGIII
Phylotypes
A noticeable feature of the phylotype diversity in our samples was that 99.6% of the sequences fell into three main groups: MGII, the most abundant (102982 sequences), MGI with the Cenarchaeaceae family (68905 sequences) and, far behind, MGIII (1464 sequences). The other, anecdotal phylotypes found were affiliated to the recently proposed phylum Parvarchaeota (Rinke et al.) or to the class Methanomicrobia in the phylum Euryarchaeota (Annex 1).
This observation confirms that MGII is an important group in marine surface waters.
In total, nine phylotypes of more than 10 sequences were described. Four belong to MGI, in which only one family was found, subdivided into three genera: Cenarchaeum, Nitrosopumilus and an unclassified one. In terms of number of sequences, the most abundant genus was clearly Nitrosopumilus, which represents 52,755 of the 68,905 MGI sequences (Annex 1).
For MGII and MGIII, no order or genus has been described, owing to the lack of previous investigation of the phylogeny of these groups, so all our sequences were clustered into one phylotype for each group, at the family level.
Finally, the Parvarchaeota phylum was represented by two phylotypes and the Methanomicrobia by one phylotype with only 38 sequences.
OTUs
In total, our sequences were clustered into 829 OTUs. Of these, 485 contain only one sequence; they probably result from sequencing errors. The most abundant OTU contains 50321 sequences, but the number of sequences per OTU then decreases sharply, as can be seen in Annex 2. To facilitate the analysis, we arbitrarily discarded the OTUs that contain fewer than 10 16S rDNA sequences, which brought the total to 52 OTUs.
By comparing our sequences with a reference database, we can get an idea of the taxonomy of the different OTUs.
Thus 19 OTUs were associated with MGI. Of these 19 OTUs, 16 were affiliated to the genus Nitrosopumilus, including the most abundant OTU (OTU1), 2 were affiliated to Cenarchaeum and one to the unclassified genus. OTU1 grouped most of the sequences (50,321 out of a total of 68,905 sequences).
For MGII, 26 OTUs were affiliated to this group, suggesting that it is subdivided into numerous taxa in our sampling environment. These OTUs contained fairly similar numbers of sequences, which suggests a far more diverse group than MGI.
Finally, we found only 2 OTUs for MGIII. This can be explained by the fact that this group is mostly abundant in deep waters; it has not been found in all coastal water surveys (Herfort et al., 2007).
The remaining OTUs are affiliated to the Parvarchaeota phylum (3) and other anecdotal taxa.
C. Seasonal correlation
When studying the ecology of different taxa, it is interesting to find out whether their relative abundance varies in correlation with the metadata. This provides information on their ecological behavior, and also helps to investigate whether two OTUs could define two different species.
As this study is based on temporal sampling, we plotted the relative abundance of the three main groups at the family level for each month (Fig. 3). This graph clearly shows strong seasonal variations: the relative abundance of MGI is extremely low during the summer (August to October) but very high in winter (January to March). This seasonal switch in abundance between MGI and MGII has already been observed in southern North Sea coastal waters (Herfort et al., 2007).
Figure 3: Seasonal variation of the Cenarchaeaceae, MGII and MGIII, based on the phylotype analysis.
This graph shows relative abundances, and these variations are mostly driven by Nitrosopumilus. In our samples, this ammonia oxidizer blooms in winter, when it reaches more than 50% of all the sequences, whereas in summer its population is close to zero.
Looking at OTU1, we can see that its variations are strongly associated with the nitrate content (Fig. 4).
Figure 4: Seasonal variation of OTU1 Nitrosopumilus, with the amount of NH4+, NO3- and NO2-.
Nitrosopumilus spp. thus appear to be involved in nitrification reactions, and OTU1 reveals an important ecotype of this genus in sea surface waters. These results fit with those found in the literature, which describe an ammonia oxidizer.
Surprisingly, however, one OTU associated with Nitrosopumilus spp., OTU18, reaches a peak in summer instead of winter (Fig. 5). This observation leads to the hypothesis that there is another ecotype of Nitrosopumilus spp. that is either more abundant in summer or has a fairly constant population throughout the year. Indeed, because the data are relative abundances, this apparent increase might simply result from the collapse of the most abundant OTUs in this period, which would make this species easier to sample.
Figure 5: Seasonal variation of OTU1 and OTU18
MGII shows different seasonal profiles depending on the OTU. Some OTUs affiliated to this group are abundant in summer, others are abundant in winter, and the remaining ones do not show any clear seasonal pattern (Annex 3).
The MGIII OTUs all show the same seasonal variations (Annex 4). Four peaks are found, three in November and one in July.
ß-diversity
The NMDS were done with all the OTUs, for the different metadata. Each dot represents one sample. Samples are grouped according to their diversity.
Figure 6: NMDS with Bray distance for all the sequences with different metadata.
The NMDS confirms the importance of the seasons for diversity. It shows three distinct patterns, one for each season. The winter and summer patterns are well separated, while the spring pattern is more spread out and not as well defined. The other metadata that reveal distinct patterns are the water temperature and the oxygen, NO2, NO3 and chlorophyll a contents. This last parameter has a twofold influence on the pelagic Archaea: it is known to be inversely correlated with Crenarchaeota abundance and, in contrast, positively correlated with MGII abundance (Herfort et al., 2007).
D. Oligotyping
Oligotyping was performed on the Nitrosopumilus genus and on the MGII and MGIII families. To this end, specific fasta files for these three groups were extracted from the general fasta file, according to the taxonomy.
To compare the results with the OTUs, we distinguished the taxa according to their abundance. Following Hugoni et al., we distinguished the abundant and the rare biosphere. Abundant OTUs or oligotypes are those that represent more than 1% of the sequences in at least one sample. The rare biosphere groups those below 0.2%.
For Nitrosopumilus, 25 oligotypes were found, against 19 OTUs. As we can see in Table 1, this difference depends on the relative abundance of the taxonomic units. Oligotyping reveals additional diversity among both the rare and the abundant OTUs.
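The abundance classes can be expressed as a simple rule on relative abundances. The Python sketch below uses the three classes that appear as column headings in Table 1 (above 1%, between 0.2% and 1%, below 0.2%). Applying these thresholds to the maximum relative abundance reached across samples is an assumption made for this example, and the function name and the counts are hypothetical.

```python
def abundance_class(counts, totals, abundant=0.01, rare=0.002):
    """Classify a taxonomic unit from its counts in each sample.

    counts: number of sequences of the unit in each sample.
    totals: total number of sequences in each sample.
    The class is based on the highest relative abundance reached in any sample.
    """
    shares = [c / t for c, t in zip(counts, totals) if t > 0]
    max_share = max(shares, default=0.0)
    if max_share > abundant:
        return "abundant"
    if max_share >= rare:
        return "rare"
    return "extremely rare"


# Hypothetical counts in two samples of 1000 sequences each.
print(abundance_class([30, 1], [1000, 1000]))
print(abundance_class([5, 3], [1000, 1000]))
print(abundance_class([1, 0], [1000, 1000]))
```

The same rule can be applied to OTUs and to oligotypes, which makes the two approaches directly comparable class by class.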
Table 1: Comparison of Nitrosopumilus spp. OTU and oligotype counts depending on the abundance of the taxonomic unit

              Abundant (>1%)   Rare (0.2-1%)   Extremely rare (<0.2%)
OTUs          2                8               4
Oligotypes    6                7               12

If we look at these differences in more detail, we can see that the first OTU is actually split into 2 oligotypes, in unequal proportions. The first oligotype remains extremely abundant with 48,475 sequences, while the second has only 1,474. These two oligotypes show very strong co-variation (Fig. 7).
Figure 7: Seasonal variation of the two first oligotypes of Nitrosopumilus spp., CCGGCCG and TCGACCG.
We can try to identify these oligotypes with the Basic Local Alignment Search Tool (BLAST) on
the NCBI website. This tool compares our sequences with a large database that gathers
environmental and identified sequences. The first two oligotypes both matched Candidatus
Nitrosopumilus sp. PS0 at 99%. This result leads to the hypothesis that they might be two
different 16S rRNA operons of one species. Alternatively, they might be two very closely related
species with exactly the same ecological behavior, since the Nitrosopumilus representatives whose
whole genome has been sequenced carry only one locus of the 16S rRNA gene (source: NCBI).
Two other abundant oligotypes also co-vary (Fig. 8).
Figure 8: Seasonal variation of the two Nitrosopumilus oligotypes TCGGCCG and TGCGCCG.
Like OTU 18, these two oligotypes show a peak of abundance in summer (May, June, July,
August). While we could hypothesize that they are the two operons of one species, the
BLAST hits reveal two different identifications: Candidatus Nitrosopumilus sp. HCA1 (100%) for
TCGGCCG and Candidatus Giganthauma insulaporcus (100%) for TGCGCCG. Nevertheless, the
fasta file used for this oligotype analysis only contained sequences classified as Nitrosopumilus
according to the taxonomy. This shows that the reference database used is not up to date for
all described Archaea. Keeping an archaeal database complete is a challenge, since new taxa
are continually being discovered as research advances. Giganthauma insulaporcus is an unusual
archaeon first described in 2010 in the Guadeloupe archipelago (Muller et al., 2010).
Furthermore, this archaeon shares 97.7% similarity with Nitrosopumilus maritimus in its
small-subunit 16S rRNA gene, which is above the 97% similarity level (0.03 distance) used to
define OTUs here. This explains why this species was not detected with the OTU approach.
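The effect of the OTU threshold can be illustrated with a small sketch that computes the percent identity between two aligned sequences and compares it with a 97% cut-off. The cut-off is taken from the label=0.03 setting of the mothur script in Annex 5; the sequences are synthetic, and a pairwise check is a simplification of the clustering that mothur actually performs on the whole distance matrix.

```python
OTU_SIMILARITY_PCT = 97.0  # assumed from label=0.03 in the mothur script


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions between two aligned sequences of equal length.

    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def same_otu(seq_a: str, seq_b: str, threshold_pct: float = OTU_SIMILARITY_PCT) -> bool:
    """True when two sequences are similar enough to fall within one OTU."""
    return percent_identity(seq_a, seq_b) >= threshold_pct


# Synthetic 200-nt sequences differing at 5 positions (not real 16S data).
reference = "ACGT" * 50
variant_chars = list(reference)
for pos in (10, 50, 90, 130, 170):
    variant_chars[pos] = "A" if variant_chars[pos] != "A" else "C"
variant = "".join(variant_chars)

print(percent_identity(reference, variant), same_otu(reference, variant))
```

Two sequences that differ at only a few positions are therefore merged by the OTU approach, whereas oligotyping can still separate them if the differences fall on the selected high-entropy positions.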
NMDS was performed on both the OTUs and the oligotypes of Nitrosopumilus.
Figure 9: NMDS with Bray-Curtis index for Nitrosopumilus spp. OTUs and oligotypes.
The patterns obtained with OTUs and with oligotyping are the same, with slightly more tightly
grouped samples for oligotyping.
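The NMDS ordinations are computed from Bray-Curtis dissimilarities between samples (method="bray" in the R script of Annex 5). As a reminder of what this index measures, a minimal Python sketch is given below; the sample names and counts are invented, and the function names are ours.

```python
from typing import Dict, List, Sequence


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    if len(u) != len(v):
        raise ValueError("samples must describe the same taxonomic units")
    total = sum(u) + sum(v)
    if total == 0:
        raise ValueError("dissimilarity is undefined for two empty samples")
    return sum(abs(a - b) for a, b in zip(u, v)) / total


def distance_matrix(samples: Dict[str, Sequence[float]]) -> List[List[float]]:
    """Square matrix of pairwise Bray-Curtis dissimilarities, in insertion order."""
    names = list(samples)
    return [[bray_curtis(samples[a], samples[b]) for b in names] for a in names]


# Invented counts of three taxonomic units in three samples.
toy_samples = {
    "winter_1": [60, 30, 10],
    "winter_2": [55, 35, 10],
    "summer_1": [10, 20, 70],
}
for row in distance_matrix(toy_samples):
    print([round(d, 2) for d in row])
```

The index ranges from 0 (identical composition) to 1 (no shared taxonomic unit), so samples from the same season are expected to give low values and to plot close together in the ordination.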
For MGII, oligotyping reveals 41 oligotypes, compared with 26 OTUs.
Table 2: Comparison of MGII OTU and oligotype counts according to the abundance of the taxonomic unit.

              Abundant (>1%)   Rare (0.2–1%)   Extremely rare (<0.2%)
OTUs               17                5                   4
Oligotypes         18               12                  11

Surprisingly, oligotyping reveals hardly any new abundant taxonomic units (18 oligotypes against
17 OTUs). The OTUs already defined the abundant species of MGII well.
The results differ much more for the rare biosphere: 4 OTUs against 11 oligotypes in the
extremely rare class.
Figure 10: NMDS with Bray-Curtis index for MGII OTUs and oligotypes.
The pattern doesn’t change between the two methods.
And for MGIII:
Table 3: Comparison of MGIII OTU and oligotype counts according to the abundance of the taxonomic unit.

              Abundant (>1%)   Rare (0.2–1%)   Extremely rare (<0.2%)
OTUs                1                1                   -
Oligotypes          1                1                   3

The diversity of MGIII remains low with oligotyping, but the method reveals 3 extremely rare taxonomic units.
IV. Conclusion
Pelagic Archaea are well represented in marine coastal waters, with two major groups, MGI and
MGII. MGI clearly demonstrates the ecological importance of these organisms, with a clear
involvement in the nitrogen cycle carried by the genus Nitrosopumilus. Nevertheless, we found here
another abundant ecotype of Nitrosopumilus that does not seem to vary throughout the year.
MGII is the most abundant and diversified group in this environment. Unfortunately, it is also the
group that highlights how little is known about pelagic Archaea ecology and how difficult it is to
explore uncultivated taxa. Despite its dominance, little or nothing is known about these Archaea,
and it is difficult to analyze the different ecotypes.
Three different approaches were used in this study to explore the diversity of Archaea in sea
surface waters from Mediterranean coasts. The most precise information was clearly given by the
most recent one, oligotyping. Its advantages are well demonstrated by the finding of Candidatus
Giganthauma insulaporcus, which was detected by neither the OTU nor the phylotype approach. A
notable aspect of using oligotyping in this environment was that it highlighted the rare
biosphere in all the groups studied. This particular biosphere has only recently been
investigated, thanks to the deep coverage allowed by next-generation sequencers, and raises
important questions in ecology (Hugoni et al., 2013).
This internship was a rewarding experience for me, as it was an immersion into bioinformatics.
Approaching ecological behaviors and diversity in an environment through thousands of reads is a
fascinating but complicated exercise. It is important to be aware, at each step, of the
consequences of each manipulation for the results and, afterwards, for the analysis.
Bibliography

G.C. Baker, J.J. Smith & D.A. Cowan; 2003. Review and re-analysis of domain-specific 16S primers. J. Microbiol. Methods, vol. 55, no. 3, p. 541–555.
N. Bano, S. Ruffin, B. Ransom & J.T. Hollibaugh; 2004. Phylogenetic composition of Arctic Ocean archaeal assemblages and comparison with Antarctic assemblages. Appl. Environ. Microbiol., vol. 70, no. 2, p. 781–789.
C. Brochier-Armanet, B. Boussau, S. Gribaldo & P. Forterre; 2008. Mesophilic crenarchaeota: proposal for a third archaeal phylum, the Thaumarchaeota. Nat. Rev. Microbiol., vol. 6, no. 3, p. 245–252.
M.V. Brown, G.K. Philip, J.A. Bunge, M.C. Smith, A. Bissett, F.M. Lauro, J.A. Fuhrman & S.P. Donachie; 2009. Microbial community structure in the North Pacific ocean. ISME J., vol. 3, no. 12, p. 1374–1386.
D.A. Cowan; 1992. Biotechnology of the Archaea. Trends Biotechnol., vol. 10, p. 315–323.
E.F. DeLong; 2003. Oceans of archaea. ASM News-Am. Soc. Microbiol., vol. 69, no. 10, p. 503–510.
E.F. DeLong & N.R. Pace; 2001. Environmental Diversity of Bacteria and Archaea. Syst. Biol., vol. 50, no. 4, p. 470–478.
T.Z. DeSantis, P. Hugenholtz, N. Larsen, M. Rojas, E.L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu & G.L. Andersen; 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol., vol. 72, no. 7, p. 5069–5072.
A.M. Eren, L. Maignien, W.J. Sul, L.G. Murphy, S.L. Grim, H.G. Morrison & M.L. Sogin; 2013. Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data. Methods Ecol. Evol., vol. 4, no. 12, p. 1111–1119.
J.A. Fuhrman, K. McCallum & A.A. Davis; 1992. Novel major archaebacterial group from marine plankton. Nature, vol. 356, no. 6365, p. 148–149.
H.G. Gauch; 1982. Multivariate analysis in community ecology. Cambridge University Press.
L. Herfort, S. Schouten, B. Abbas, M.J. Veldhuis, M.J. Coolen, C. Wuchter, J.P. Boon, G.J. Herndl & J.S. Sinninghe Damsté; 2007. Variations in spatial and temporal distribution of Archaea in the North Sea in relation to environmental variables. FEMS Microbiol. Ecol., vol. 62, no. 3, p. 242–257.
M. Hugoni, N. Taib, D. Debroas, I. Domaizon, I.J. Dufournel, G. Bronner, I. Salter, H. Agogué, I. Mary & P.E. Galand; 2013. Structure of the rare archaeal biosphere and seasonal dynamics of active ecotypes in surface coastal waters. Proc. Natl. Acad. Sci., vol. 110, no. 15, p. 6004–6009.
V. Iverson, R.M. Morris, C.D. Frazar, C.T. Berthiaume, R.L. Morales & E.V. Armbrust; 2012. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science, vol. 335, no. 6068, p. 587–590.
M. Könneke, A.E. Bernhard, J.R. de la Torre, C.B. Walker, J.B. Waterbury & D.A. Stahl; 2005. Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature, vol. 437, no. 7058, p. 543–546.
F. Muller, T. Brissac, N. Le Bris, H. Felbeck & O. Gros; 2010. First description of giant Archaea (Thaumarchaeota) associated with putative bacterial ectosymbionts in a sulfidic marine habitat. Environ. Microbiol., vol. 12, no. 8, p. 2371–2383.
M. Pester, C. Schleper & M. Wagner; 2011. The Thaumarchaeota: an emerging view of their phylogeny and ecophysiology. Curr. Opin. Microbiol., vol. 14, no. 3, p. 300–306.
A.J. Pinto & L. Raskin; 2012. PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One, vol. 7, no. 8, e43093.
A. Pitcher, C. Wuchter, K. Siedenberg, S. Schouten & J.S. Sinninghe Damsté; 2011. Crenarchaeol tracks winter blooms of ammonia-oxidizing Thaumarchaeota in the coastal North Sea. Limnol. Oceanogr., vol. 56, no. 6, p. 2308–2318.
P.D. Schloss, S.L. Westcott, T. Ryabin, J.R. Hall, M. Hartmann, E.B. Hollister, R.A. Lesniewski, B.B. Oakley, D.H. Parks & C.J. Robinson; 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol., vol. 75, no. 23, p. 7537–7541.
D.C. Smith; 2001. Expansion of the marine Archaea. Science, vol. 293, no. 5527, p. 56–57.
E. Stackebrandt & B.M. Goebel; 1994. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int. J. Syst. Bacteriol., vol. 44, no. 4, p. 846–849.
A.H. Treusch, S. Leininger, A. Kletzin, S.C. Schuster, H.-P. Klenk & C. Schleper; 2005. Novel genes for nitrite reductase and Amo-related proteins indicate a role of uncultivated mesophilic crenarchaeota in nitrogen cycling. Environ. Microbiol., vol. 7, no. 12, p. 1985–1995.
C.B. Walker, J.R. De La Torre, M.G. Klotz, H. Urakawa, N. Pinel, D.J. Arp, C. Brochier-Armanet, P.S.G. Chain, P.P. Chan & A. Gollabgir; 2010. Nitrosopumilus maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally distributed marine crenarchaea. Proc. Natl. Acad. Sci., vol. 107, no. 19, p. 8818–8823.
C.R. Woese & G.E. Fox; 1977. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci., vol. 74, no. 11, p. 5088–5090.
ANNEXES

Annex 1: Taxonomy of the different sequences obtained with the gg_13_5_99 database
Annex 2: Number of sequences per OTU (OTU 40 contained 31 sequences)
Annex 3: MGII abundant OTUs seasonal variations
Annex 4: MGIII OTUs seasonal variations
Annex 5: Scripts

Mothur script

#Sequence treatment
#HugoniB.fna
#gg_13_5_99.fasta
#HugoniB.fasta
#gg_13_5_99.pds.tax
#HugoniB.oligos
#greengenes.align
#HugoniB.qual

summary.seqs(fasta=HugoniB.fna)
#HugoniB.summary

trim.seqs(fasta=HugoniB.fna, oligos=HugoniB.oligos, qfile=HugoniB.qual, maxambig=0, maxhomop=8, bdiffs=1, pdiffs=1, qwindowaverage=27, qwindowsize=50, processors=16)
#HugoniB.trim.fasta
#HugoniB.scrap.qual
#HugoniB.trim.qual
#HugoniB.scrap.fasta
#HugoniB.groups

summary.seqs(fasta=HugoniB.trim.fasta)
#HugoniB.trim.summary

unique.seqs(fasta=HugoniB.trim.fasta)
#HugoniB.trim.unique.fasta
#HugoniB.trim.names

summary.seqs(fasta=HugoniB.trim.unique.fasta, name=HugoniB.trim.names)

align.seqs(fasta=HugoniB.trim.unique.fasta, reference=greengenes.align, processors=16)
#HugoniB.trim.unique.align
#HugoniB.trim.unique.flip.accnos
#HugoniB.trim.unique.align.report

summary.seqs(fasta=HugoniB.trim.unique.align)
#HugoniB.trim.unique.summary

screen.seqs(fasta=HugoniB.trim.unique.align, name=HugoniB.trim.names, group=HugoniB.groups, start=1919, end=2065, minlength=200, processors=16)
#HugoniB.trim.unique.good.align
#HugoniB.good.groups
#HugoniB.trim.unique.bad.accnos
#HugoniB.trim.good.names

summary.seqs(fasta=HugoniB.trim.unique.good.align, name=HugoniB.trim.good.names)
#HugoniB.trim.unique.good.summary

filter.seqs(fasta=HugoniB.trim.unique.good.align, vertical=T, trump=., processors=16)
#HugoniB.filter
#HugoniB.trim.unique.good.filter.fasta

summary.seqs(fasta=HugoniB.trim.unique.good.filter.fasta, name=HugoniB.trim.good.names)

unique.seqs(fasta=HugoniB.trim.unique.good.filter.fasta, name=HugoniB.trim.good.names)
#HugoniB.trim.unique.good.filter.names
#HugoniB.trim.unique.good.filter.unique.fasta

summary.seqs(fasta=HugoniB.trim.unique.good.filter.unique.fasta, name=HugoniB.trim.unique.good.filter.names)
#HugoniB.trim.unique.good.filter.unique.summary

pre.cluster(fasta=HugoniB.trim.unique.good.filter.unique.fasta, name=HugoniB.trim.unique.good.filter.names, diffs=1)
#HugoniB.trim.unique.good.filter.unique.precluster.fasta
#HugoniB.trim.unique.good.filter.unique.precluster.names
#HugoniB.trim.unique.good.filter.unique.precluster.map

summary.seqs(fasta=HugoniB.trim.unique.good.filter.unique.precluster.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.names)
#HugoniB.trim.unique.good.filter.unique.precluster.summary

chimera.uchime(fasta=HugoniB.trim.unique.good.filter.unique.precluster.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.names)
#HugoniB.trim.unique.good.filter.unique.precluster.uchime.chimeras
#HugoniB.trim.unique.good.filter.unique.precluster.uchime.accnos

remove.seqs(accnos=HugoniB.trim.unique.good.filter.unique.precluster.uchime.accnos, fasta=HugoniB.trim.unique.good.filter.unique.precluster.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.names, group=HugoniB.good.groups)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.names
#HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta
#HugoniB.good.pick.groups
#HugoniB.trim.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy
#HugoniB.trim.unique.good.filter.unique.precluster.pick.rdp.wang.tax.summary

classify.seqs(fasta=HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta, template=gg_13_5_99.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.pick.names, taxonomy=gg_13_5_99.pds.tax, processors=16)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary

remove.lineage(fasta=HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.pick.names, taxonomy=HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, group=HugoniB.good.pick.groups, taxon=Bacteria)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.names
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.fasta
#HugoniB.good.pick.pick.groups

summary.seqs(fasta=current, name=current)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.summary

system(mv HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.fasta HugnoniB.final3.fasta)
system(mv HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.names HugnoniB.final3.names)
system(mv HugoniB.good.pick.pick.groups HugoniB.final3.groups)

#β-DIVERSITY ANALYSIS

#- OTUs
dist.seqs(fasta=HugnoniB.final.fasta, output=lt)
#HugnoniB.final.phylip.dist

cluster(phylip=HugnoniB.final.phylip.dist, name=HugnoniB.final.names)
#HugnoniB.final.phylip.an.sabund
#HugnoniB.final.phylip.an.list
#HugnoniB.final.phylip.an.rabund

make.shared(list=HugnoniB.final.phylip.an.list, label=0.03, group=HugoniB.final1.groups)
#HugnoniB.final.phylip.an.shared

tree.shared(calc=jclass-morisitahorn)
#HugnoniB.final.phylip.an.jclass.0.03.tre
#HugnoniB.final.phylip.an.morisitahorn.0.03.tre

dist.shared(shared=HugnoniB.final.phylip.an.shared, calc=morisitahorn-jclass)
#HugnoniB.final.phylip.an.morisitahorn.0.03.lt.dist
#HugnoniB.final.phylip.an.jclass.0.03.lt.dist

pcoa(phylip=HugnoniB.final.phylip.an.morisitahorn.0.03.lt.dist)
#HugnoniB.final.phylip.an.morisitahorn.0.03.lt.pcoa.axes
#HugnoniB.final.phylip.an.morisitahorn.0.03.lt.pcoa.loadings

pcoa(phylip=HugnoniB.final.phylip.an.jclass.0.03.lt.dist)
#HugnoniB.final.phylip.an.jclass.0.03.lt.pcoa.axes
#HugnoniB.final.phylip.an.jclass.0.03.lt.pcoa.loadings

clearcut(phylip=HugnoniB.final.phylip.dist)
#HugnoniB.final.phylip.tre

unifrac.weighted(tree=HugnoniB.final.phylip.tre, name=HugnoniB.final.names, group=HugoniB.final1.groups, distance=lt)
#HugnoniB.final.phylip.trewsummary
#HugnoniB.final.phylip.tre1.weighted.phylip.dist

pcoa(phylip=HugnoniB.final.phylip.tre1.weighted.phylip.dist)
#HugnoniB.final.phylip.tre1.weighted.phylip.pcoa.axes
#HugnoniB.final.phylip.tre1.weighted.phylip.pcoa.loadings

unifrac.unweighted(tree=HugnoniB.final.phylip.tre, name=HugnoniB.final.names, group=HugoniB.final1.groups, distance=lt)
#HugnoniB.final.phylip.uwsummary
#HugnoniB.final.phylip.tre1.unweighted.phylip.dist

pcoa(phylip=HugnoniB.final.phylip.tre1.unweighted.phylip.dist)
#HugnoniB.final.phylip.tre1.unweighted.phylip.pcoa.axes
#HugnoniB.final.phylip.tre1.unweighted.phylip.pcoa.loadings

#- Phylotypes
phylotype(taxonomy=HugoniB.final.taxonomy, name=HugnoniB.final.names)
#HugoniB.final.tx.sabund
#HugoniB.final.tx.rabund
#HugoniB.final.tx.list

make.shared(list=HugoniB.final.tx.list, group=HugoniB.final1.groups, label=1)
#HugoniB.final.tx.shared

dist.shared(shared=HugoniB.final.tx.shared, calc=morisitahorn-jclass)
#HugoniB.final.tx.morisitahorn.1.lt.dist
#HugoniB.final.tx.jclass.1.lt.dist

pcoa(phylip=HugoniB.final.tx.morisitahorn.1.lt.dist)
#HugoniB.final.tx.morisitahorn.1.lt.pcoa.axes
#HugoniB.final.tx.morisitahorn.1.lt.pcoa.loadings

pcoa(phylip=HugoniB.final.tx.jclass.1.lt.dist)
#HugoniB.final.tx.jclass.1.lt.pcoa.axes
#HugoniB.final.tx.jclass.1.lt.pcoa.loadings

Oligotyping script

# specific fasta preparation
# Add taxonomy to the defline in the mothur fasta
deunique.seqs(fasta=HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.pick.names)
#HugoniB.trim.redundant.fasta

add-tax-to-mothur-defline.py myfasta mytaxonomy
awk '/Nitrosopumilus/{print;getline;print}' HugoniB.trim.redundant.pds.wang.taxonomy.fa > Hugoni.Nitrosawk.fasta
sed "s/__/_/" Hugoni.Nitrosawk.fasta > Hugoni.NitrosopumilusF.fasta
awk 'BEGIN{FS="\t"}{print $1}' HugoniB.trim.redundant.NitrosopumilusF.fasta > HugoniB.trim.redundant.Nitrosopumilus.fasta

#Oligotyping
o-pad-with-gaps HugoniB.trim.redundant.Nitrosopumilus.fasta
entropy-analysis HugoniB.trim.redundant.Nitrosopumilus.fasta-PADDED-WITH_GAPS

#Nitrosopumilus
oligotype HugoniB.trim.redundant.Nitrosopumilus.fasta-PADDED-WITH_GAPS HugoniB.trim.redundant.Nitrosopumilus.fasta-PADDED-WITH_GAPS-ENTROPY -C 84,86,110,112,145,185,192 -a 0 -s 1 --gen-html

#MGII
oligotype HugoniB.trim.redundant.MGII.fasta-PADDED-WITH_GAPS HugoniB.trim.redundant.MGII.fasta-PADDED-WITH_GAPS-ENTROPY -C 66,68,74,84,86,88,96,97,99,108,110,112,121,124,126,127,140,147,185,239,241,243,244 -A 10 -s 1 --gen-html

#MGIII
oligotype HugoniB.trim.redundant.MGIII.fasta-PADDED-WITH_GAPS HugoniB.trim.redundant.MGIII.fasta-PADDED-WITH_GAPS-ENTROPY -c 5 -a 0.0 -s 1 --gen-html

R Script

# vegdist, metaMDS and the ordi* functions come from the vegan package
library(vegan)

shared <- read.table("Shared_total.txt", header=T)
braydist <- vegdist(shared, method="bray")
nmds <- metaMDS(braydist)
metadata <- read.table("metadata.txt", header=T)
View(metadata)

# Example for Saisons (seasons: Hivers = winter, Printemps = spring, Ete = summer)
block <- metadata$Saisons
ordiplot(nmds, display="si", type="n")
points(nmds, display="sites", select=which(block=="Hivers"), pch=21, col="black", bg="light blue")
points(nmds, display="sites", select=which(block=="Printemps"), pch=21, col="black", bg="orange")
points(nmds, display="sites", select=which(block=="Ete"), pch=21, col="black", bg="red")
ordispider(nmds, group=block, show.groups="Hivers")
ordispider(nmds, group=block, show.groups="Printemps")
ordispider(nmds, group=block, show.groups="Ete")
ordiellipse(nmds, group=block, show.groups="Hivers", kind="sd", conf=0.95, col="light blue")
ordiellipse(nmds, group=block, show.groups="Printemps", kind="sd", conf=0.95, col="orange")
ordiellipse(nmds, group=block, show.groups="Ete", kind="sd", conf=0.95, col="red")

# Note: this call starts a new plain plot of the sites, replacing the figure drawn above
ordiplot(nmds, display="si")
title("NMDS plot with Bray dist ", "Seasons")
legend("bottomright", legend=c("Winter", "Spring", "Summer"), col=c("light blue", "orange", "red"), pch=21, bty="n")
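To make the entropy-analysis step of the oligotyping script more concrete, the following Python sketch computes the Shannon entropy of each column of an alignment and lists the most variable positions, which is the principle on which the components passed to `oligotype` are chosen (Eren et al., 2013). It is not the code of the oligotyping pipeline: the function names, the base-2 logarithm and the toy alignment are our assumptions.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def column_entropy(column: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the characters observed in one alignment column."""
    n = len(column)
    counts = Counter(column)
    return sum((c / n) * log2(n / c) for c in counts.values())


def alignment_entropy(sequences: Sequence[str]) -> List[float]:
    """Entropy of every column of a set of equally long (padded) sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("all sequences must have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


def top_positions(entropies: Sequence[float], k: int) -> List[int]:
    """0-based indices of the k columns with the highest entropy, in ascending order."""
    ranked = sorted(range(len(entropies)), key=lambda i: -entropies[i])
    return sorted(ranked[:k])


# Toy alignment: columns 0 and 1 are conserved, columns 2 and 3 are variable.
toy_alignment = ["ACGT", "ACGA", "ACCT", "ACCA"]
toy_entropy = alignment_entropy(toy_alignment)
print(toy_entropy, top_positions(toy_entropy, 2))
```

Conserved columns have an entropy of zero and carry no information for separating oligotypes, whereas the highest-entropy columns are the candidates for the positions on which the reads are split.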