Master 1 "Biology, Health", Specialization in Microbiology
Internship Report
Clarisse Lemonnier
Ecological Study of Pelagic Archaea in Marine Coastal Waters: Bioinformatical approach
Internship supervisor: Loïs Maignien
Internship dates: 6 January to 28 February 2014
Internship location: Laboratoire de Microbiologie des Environnements Extrêmes (LM2E), 29280 Plouzané.



Table of Contents

I. Introduction
II. Materials and Methods
  A. Sampling and DNA extraction
  B. Sequencing
  C. Treatment of sequences and analysis
    1. Bioinformatical tools
      i. Software
      ii. CAPARMOR
    2. Treatment of sequences
    3. Analysis of sequences
      i. Alpha-diversity
      ii. Beta-diversity
      iii. Oligotyping
III. Results & Discussion
  A. Treatment of sequences
  B. Dominant MGII, MGI and MGIII
  C. Seasonal correlation
  D. Oligotyping
IV. Conclusion
Bibliography
Annexes

Figures

Figure 1: Phylogeny of the pelagic Archaea (DeLong, 2003)
Figure 2: CAPARMOR server organization
Figure 3: Seasonal variation of the Cenarchaeaceae, MGII and MGIII, based on the phylotype analysis
Figure 4: Seasonal variation of OTU1 Nitrosopumilus, with the amount of NH4+, NO3- and NO2-
Figure 5: Seasonal variation of OTU1 and OTU18
Figure 6: NMDS for all the sequences with different metadata. The distance index used is Bray.
Figure 7: Seasonal variation of the two oligotypes of Nitrosopumilus spp. CCGGCCG and TCGACCG
Figure 8: Seasonal variation of the two oligotypes of Nitrosopumilus TCGGCCG and TGCGCCG
Figure 9: NMDS with Bray index for Nitrosopumilus spp. OTUs and oligotypes
Figure 10: NMDS with Bray index for Nitrosopumilus spp. OTUs and oligotypes

Tables

Table 1: Comparison of Nitrosopumilus spp. OTU and oligotype contents depending on the abundance of the taxonomic unit
Table 2: Comparison of MGII OTU and oligotype contents depending on the abundance of the taxonomic unit
Table 3: Comparison of MGIII OTU and oligotype contents depending on the abundance of the taxonomic unit


I. Introduction

Pelagic Archaea

The characterization of the third domain of life, the Archaea, has been one of the most important advances in microbiology in recent decades. C. Woese and G. Fox discovered it in 1977, thanks to the onset of molecular phylogenetic analysis (Woese & Fox, 1977). These organisms appeared to be remarkably well adapted to extreme environments. This is why, for more than ten years, research focused on their metabolism in very restrictive habitats such as hydrothermal vents or hypersaline lakes (Cowan, 1992).

Only in the early 1990s, with the development of culture-independent techniques, did two publications by DeLong and by Fuhrman et al. revolutionize our view of the Archaea. Using 16S rDNA-based analysis of marine coastal waters (DeLong, 1992; Fuhrman, McCallum & Davis, 1992), they demonstrated the presence of Archaea in a vast, temperate habitat: the ocean. Numerous publications have since confirmed and strengthened this discovery, showing that pelagic Archaea are very well represented in the ocean, where they may make up more than 20% of the whole marine picoplankton (Smith, 2001; DeLong & Pace, 2001).

These findings carry major stakes. The oceans influence the global climate through fundamental biogeochemical processes in which microorganisms play a large part. All the pelagic Archaea found so far can be grouped into two main phyla: the Crenarchaeota, with Marine Group I, and the Euryarchaeota, with three unclassified groups: Marine Groups II, III and IV (Fig. 1).

Figure 1: Phylogeny of the pelagic Archaea (DeLong, 2003)

The first one, Marine Group I (MGI), appears to be widespread and very abundant, and is considered by some scientists to be the most abundant microbial group in the ocean (Walker et al., 2010). Its reclassification as a new phylum, the Thaumarchaeota, was suggested (Brochier-Armanet et al., 2008) and is now well accepted by the scientific community (Pester, Schleper & Wagner, 2011). Studies of its abundance through the water column show that it increases considerably from surface water to the bathypelagic ocean, where it can reach 40% of the whole bacterioplankton biomass (DeLong, 2003).

But knowledge of these microorganisms remains limited. For example, while the huge abundance of MGI across the oceans had been documented, very little was known about its ecological role. In the 2000s a breakthrough occurred with the discovery of an amo-like gene attributed to MGI (Treusch et al., 2005). The amo gene encodes a subunit of ammonia monooxygenase. This enzyme catalyzes the first step of nitrification, the oxidation of ammonia, which ultimately leads to nitrite and then nitrate. Could marine Archaea play a role in the global nitrogen cycle? Clear evidence was provided by the cultivation of a thaumarchaeon, Nitrosopumilus maritimus (Könneke et al., 2005). This archaeon was isolated from a tropical aquarium with a high amount of ammonium. By cultivating it, the authors showed that N. maritimus is a chemolithoautotroph and does carry the amo-like gene. Ecological studies of this genus at different locations (North Sea, Antarctica, Santa Barbara Channel, Mediterranean coast of Spain) show that a bloom occurs in winter (Pitcher et al., 2011). Nitrosopumilus may face less competition for ammonium in winter, whereas in summer ammonium is heavily consumed by phytoplankton. Moreover, experiments showed that N. maritimus is well adapted to low ammonium concentrations (Pitcher et al., 2011).

The other phylum found in the ocean is the Euryarchaeota, with Marine Group II (MGII), Marine Group III (MGIII) and Marine Group IV (MGIV). Unlike Marine Group I, these groups still have no cultivated close relatives and are described exclusively through their 16S rRNA. MGII is the most abundant group in marine surface water (Hugoni et al., 2013; Brown et al., 2009), but its ecological behavior is not yet known. However, Vaughn Iverson and his team recently attempted to reconstruct an environmental genome of MGII using metagenomics. By comparing its genes to databases on the basis of sequence homology, they hypothesized that MGII is a motile photoheterotroph that mainly degrades proteins and lipids (Iverson et al., 2012). MGIII is found mainly in deep waters; as with MGII, very little is known about this group.

Importance of culture-independent surveys

Thus an important part of our knowledge about pelagic Archaea, from their discovery to hypotheses about their ecology, comes from culture-independent surveys. This illustrates how these approaches have become essential for phylogenetic, evolutionary and ecological studies.


One of the most widely used techniques is 16S rRNA gene sequencing. It consists of extracting and sequencing "all" the 16S rRNA genes of the Archaea (or other microorganisms) present in an environment, in order to identify the organisms present at the time of sampling. Microbial diversity studies based on this method are currently undergoing a small revolution with the use of next-generation sequencers, which can produce millions of reads in a single one-day run. These sequencers thus allow deep sequencing, giving a finer view of the diversity of a sample, and they also permit multiplexed analysis, in which different samples are sequenced simultaneously (Pinto & Raskin, 2012).

However, these techniques have limits: first, sequencing errors that cannot easily be distinguished from real sequences, and second, the complexity of the analysis.

Bioinformatics has become essential to handle the thousands or millions of reads produced by these sequencers, and also to perform ecological and statistical studies. These techniques are evolving rapidly and are becoming powerful tools for detecting sequencing errors, approaching the finest diversity of an environment, and exploring the uncultivated and unknown biosphere.

Scope of my work

The aim of my internship was to study the diversity and ecology of pelagic Archaea using bioinformatics tools. I used as a basis a dataset already published by Hugoni et al., which gathers all the sequences from a three-year temporal survey of sea surface water in the bay of Banyuls-sur-Mer (Hugoni et al., 2013). I performed the usual α- and β-diversity analyses to investigate which Archaea occur in this environment and to gain insight into their ecology by examining their temporal variations. I ended with a new bioinformatics tool that allows a finer analysis: oligotyping (Eren et al., 2013).

II. Materials and methods

A. Sampling and DNA extraction

According to Hugoni et al., ten liters of water were collected monthly from March 2008 to June 2011, at 3 m depth in the bay of Banyuls-sur-Mer. Forty samples were thus obtained. Sample processing started within two hours, with successive filtration steps through 3 µm and 0.22 µm pore-size Sterivex® filters. The filters were then stored at -80 °C until nucleic acid extraction. Extraction was performed directly in the Sterivex® cartridges by mechanical and chemical cell lysis, using the AllPrep DNA/RNA kit. Both DNA and RNA were extracted, giving a total of 80 samples (Hugoni et al., 2013).

B. Sequencing

Before sequencing, the genes of interest must be amplified by PCR. Microbial diversity studies typically target the 16S rRNA gene. This gene is present in all prokaryotes and contains both conserved and variable regions, which make it possible to discriminate closely related species. Moreover, numerous reference databases exist for it, one of the most important being NCBI. Here, the V3-V5 hypervariable regions were sequenced. These regions are known to give phylogenetic results close to those obtained with the whole 16S rRNA sequence (Pinto & Raskin, 2012).

To perform the PCR, Archaea-specific primers were used: Arch349F (GYG CAS CAG KCG MGA AW) and Arch806R (GGA CTA CVS GGG TAT CTA AT). Arch349F is a degenerate primer, i.e. a mixture of primers whose sequences differ at a few positions. This property is useful for matching the largest possible number of target sequences.
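To make the notion of a degenerate primer concrete, the short Python sketch below expands the two primer sequences quoted above into the individual sequences they encode, using the standard IUPAC nucleotide codes. The function name expand_primer is only illustrative and is not part of the pipeline used in this study.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    primer = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]

arch349f = expand_primer("GYG CAS CAG KCG MGA AW")
arch806r = expand_primer("GGA CTA CVS GGG TAT CTA AT")
print(len(arch349f), len(arch806r))  # prints: 32 6
```

Arch349F contains five two-fold degenerate positions (Y, S, K, M, W), hence 32 variants, while Arch806R contains one three-fold (V) and one two-fold (S) position, hence 6 variants.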

The sequences were obtained by pyrosequencing on a Roche 454 GS-FLX system, a high-throughput sequencer that offers useful advantages for microbial diversity studies. Sequencing different samples simultaneously is made possible by adding barcodes, short sequences specific to one sample, attached to the primer. The reads can then be easily assigned to their sample, in particular, as we shall see later, by the various bioinformatics software packages. This technology yields roughly 1 million sequences within 10 hours.
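As an illustration of how barcodes allow reads to be assigned to samples, here is a minimal Python sketch. It is not the code used in this study (Mothur handled this step): the barcode sequences and sample names are hypothetical, and the tolerance of one mismatch is an assumption chosen for the example.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read, barcodes, max_diffs=1):
    """Return (sample, read without barcode), or (None, read) if no single barcode matches."""
    hits = [
        sample for sample, barcode in barcodes.items()
        if len(read) >= len(barcode)
        and mismatches(read[:len(barcode)], barcode) <= max_diffs
    ]
    if len(hits) != 1:  # no match, or ambiguous match
        return None, read
    barcode = barcodes[hits[0]]
    return hits[0], read[len(barcode):]

# Hypothetical barcodes and sample names, not those of the study.
barcodes = {"2008_03_DNA": "ACGAGTGC", "2008_04_DNA": "TCTCTATG"}
print(assign_sample("ACGAGTGCGCGCACCAGGCG", barcodes))  # exact barcode
print(assign_sample("ACGAGTGAGCGCACCAGGCG", barcodes))  # one barcode mismatch
print(assign_sample("GGGGGGGGGCGCACCAGGCG", barcodes))  # unknown barcode
```

Reads that match no barcode, or more than one, are left unassigned rather than guessed.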

Different files are obtained after sequencing:

- a fasta file with all the sequences and their identifiers.
- a qual file that assigns a quality score to each base.
- a file with the primer and barcode sequences for each sample.

All these files were deposited online by Hugoni et al. at the following address: http://datadryad.org.


C. Treatment of sequences and analysis

1. Bioinformatical tools

i. Software

Various software packages are available to process the huge amount of data produced by next-generation sequencers and to analyze the sequences.

In this study, we mainly used Mothur (Schloss et al., 2009), a package specially designed for microbial ecology analysis based on 16S rRNA gene sequences. It brings together tools that make it possible to go from raw sequences to complete α- and β-diversity analyses, with a total of 142 commands.

The second tool, Oligotyping, is very recent and allows a finer analysis of microorganisms with closely related 16S rDNA sequences. It is described later.

We also used R, a widely used environment for statistical computing and graphics.

The scripts used for this analysis are given in Annex 5.

ii. CAPARMOR

These programs handle large datasets, and ordinary personal computers are quite limited in processing power and RAM. We therefore obtained an account on CAPARMOR, the IFREMER server, to run our analyses. This server has 2600 processors and is used for oceanographic and meteorological simulations by different laboratories such as IFREMER, LEMAR and LOP. Obviously, we did not use the full capacity of this large server; we only connected to a small cluster of 32 processors called service8 (Fig. 2).

Figure 2: CAPARMOR server organization (source: IFREMER)


2. Treatment of sequences

Quality filtering of sequences is a major step in sequence analysis. PCR and sequencing always generate errors that may affect downstream analysis. Targeting these artifacts is a challenge, because we do not know the original sequences and cannot easily distinguish errors from rare but genuine reads.

With the qual file, we can remove poor-quality sequences, which are the most likely to contain wrong bases, insertions or deletions. In the Mothur command trim.seqs(), we specify a minimum quality score, 27, the value chosen by Hugoni et al. In this command we can also specify the number of mismatches allowed in the primer and barcode sequences (one for each) and the maximum homopolymer length (i.e. the maximum number of consecutive identical bases in a sequence), set to 8.

The primers and barcodes are also removed from all the sequences during this step.
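The two read-level criteria described above can be sketched in a few lines of Python. This is only an illustration of the logic, not Mothur's implementation; in particular, applying the threshold of 27 to the mean quality of the read is an assumption, since the text does not say whether the score is averaged over the read or over a window.

```python
def longest_homopolymer(seq):
    """Length of the longest run of identical consecutive bases."""
    longest = run = 0
    previous = None
    for base in seq:
        run = run + 1 if base == previous else 1
        longest = max(longest, run)
        previous = base
    return longest

def passes_filter(seq, quals, min_avg_quality=27, max_homopolymer=8):
    """Keep a read if its mean quality and longest homopolymer are acceptable."""
    if not seq or len(seq) != len(quals):
        return False
    mean_quality = sum(quals) / len(quals)
    return (mean_quality >= min_avg_quality
            and longest_homopolymer(seq) <= max_homopolymer)

print(passes_filter("ACGTTTTA", [30] * 8))           # good read
print(passes_filter("ACGTTTTA", [20] * 8))           # mean quality too low
print(passes_filter("A" * 9 + "CGT", [35] * 12))     # homopolymer of 9 bases
```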

Another important bias, this time due to PCR, is the presence of chimeras. Chimeras are combinations of two or more different parent sequences. They can be detected by comparing our sequences with a reference database, using the chimera.uchime() command in Mothur, or by comparing reads with one another within the samples (chimera.slayer() command).

For each of these treatments, Mothur generates a file listing the unwanted sequences, which can then be removed from the fasta file with the remove.seqs() command.

Another crucial step is sequence alignment, in which the regions of identity between our sequences are found. It results in the addition of gaps, represented by "-". The alignment can be done among the sequences themselves or against a reference database, which also allows later identification of the sequences. Here we performed the alignment with the align.seqs() command using the Greengenes reference database gg_13_5_99, updated in May 2013, which contains 202421 archaeal and bacterial sequences (DeSantis et al., 2006).

The last step in the sequence quality check is to bring all our sequences to the same length so that they can be compared. This is done with screen.seqs(). In this command we specify the positions of the first and last nucleotides wanted for all our sequences. We thus have to find a compromise between keeping the longest sequences, at the risk of losing numerous reads, and keeping more sequences, at the expense of phylogenetic resolution.

The nucleotide positions are determined after the alignment step, from the start and end positions of all the aligned sequences. I chose to keep the region between positions 1917 and 2381 of the alignment. After this treatment, all our sequences were 325 bp long.

During this preparation of the sequences, we did not handle all the sequences but only the so-called "unique sequences": all exactly identical sequences are grouped together and only one of them is used for the rest of the analysis. This produces two files: a fasta file with the unique sequences, and a names file listing all the sequences associated with each unique sequence. This reduces the number of sequences to handle and thus speeds up the analysis. In this study, the fasta file went from 340800 to 26755 sequences.
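The grouping into unique sequences described above amounts to a simple dereplication, which can be sketched as follows. The function and variable names are illustrative, and the read identifiers are made up; the two returned dictionaries play the role of the fasta file and the names file.

```python
from collections import defaultdict

def unique_seqs(reads):
    """Group identical sequences.

    reads: iterable of (read_id, sequence) pairs.
    Returns (fasta, names): fasta maps a representative id to its sequence,
    names maps the same id to the ids of all identical reads.
    """
    groups = defaultdict(list)
    for read_id, seq in reads:
        groups[seq].append(read_id)
    fasta = {ids[0]: seq for seq, ids in groups.items()}
    names = {ids[0]: ids for ids in groups.values()}
    return fasta, names

# Made-up reads for illustration.
reads = [("r1", "ACGT"), ("r2", "ACGA"), ("r3", "ACGT"), ("r4", "ACGT")]
fasta, names = unique_seqs(reads)
print(fasta)  # two unique sequences instead of four reads
print(names)  # r1 stands for r1, r3 and r4
```

Downstream steps then only process the representatives, while the names mapping keeps the abundance information needed later for OTU counts.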

3. Analysis of sequences

Sequence analysis lets us describe the diversity of the Archaea within one sample (α-diversity) and also provides statistical means to compare samples with one another (β-diversity).

i. Alpha-diversity

To analyze α-diversity we must first group our sequences, which can be done in two ways: OTUs (Operational Taxonomic Units) and phylotypes. The first consists of clustering sequences that share at least 97% similarity. This threshold is much debated and is based on empirical studies that tried to delimit the notion of species (Stackebrandt & Goebel, 1994). Phylotype clustering groups the sequences that receive exactly the same taxonomic affiliation when compared with known sequences from a reference database.

OTUs are clustered with the cluster() command in Mothur, after computing the pairwise distances between sequences with dist.seqs().

Phylotypes are built from the Greengenes database classification with the classify.seqs() command.
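To illustrate what clustering at 97% similarity means, the sketch below computes a distance between two aligned sequences and groups sequences greedily around seed sequences with a 0.03 cutoff. This is a deliberately simplified stand-in and not the algorithm used by Mothur's cluster() command; the greedy strategy, the gap handling and all names are assumptions made for the example.

```python
def aligned_distance(a, b):
    """Fraction of differing positions between two aligned sequences.

    Columns where both sequences carry a gap ('-' or '.') are ignored.
    """
    compared = differences = 0
    for x, y in zip(a, b):
        if x in "-." and y in "-.":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0

def greedy_otus(seqs, cutoff=0.03):
    """Assign each sequence to the first OTU whose seed lies within the cutoff."""
    otus = []  # list of (seed sequence, member ids)
    for seq_id, seq in seqs:
        for seed, members in otus:
            if aligned_distance(seed, seq) <= cutoff:
                members.append(seq_id)
                break
        else:
            otus.append((seq, [seq_id]))
    return [members for _, members in otus]

# Made-up 100-base sequences: 'b' differs from 'a' at 2 positions, 'c' at 4.
seed = "ACGT" * 25
close = "TT" + seed[2:]
distant = "TTTTT" + seed[5:]
print(greedy_otus([("a", seed), ("b", close), ("c", distant)]))
```

With a 3% cutoff, a sequence 2% away from the seed joins its OTU, while a sequence 4% away starts a new one.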

ii. Beta-diversity

According to Gauch, "Ordination primarily endeavors to represent sample and species relationships as faithfully as possible in a low-dimensional space" (Gauch, 1982).

To analyze β-diversity, we produce plots that represent all our samples in a two-dimensional graph. This step of projecting multivariate data (numerous samples with numerous taxa), which have a large number of dimensions, into a few dimensions is called ordination. There are numerous ordination methods; here we used Nonmetric Multidimensional Scaling (NMDS), which places similar samples close together but preserves only the rank order of the distances between samples, not their exact values. Our samples are thus plotted according to the similarity of their populations (OTUs).

The analysis starts with the creation of a data matrix with the sample names in columns and the OTUs in rows. This step is done with the make.shared() command in Mothur. We then apply different dissimilarity indices, which do not give the same information. We used the Jaccard coefficient, a presence/absence index, and the Bray-Curtis index, which is weighted and takes into account the proportion of each OTU in a sample. The resulting distance matrix is computed with the dist.shared() command.

The NMDS plots were produced with R; the script is given in Annex 5.
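The difference between the two indices can be shown on a small example. The Python sketch below implements the textbook definitions of the Jaccard and Bray-Curtis dissimilarities; the OTU counts are hypothetical and the functions are not taken from Mothur or from the R script of this study.

```python
def jaccard_dissimilarity(a, b):
    """1 - (shared OTUs / OTUs present in at least one sample); ignores abundances."""
    present_a = {i for i, n in enumerate(a) if n > 0}
    present_b = {i for i, n in enumerate(b) if n > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)

def bray_curtis_dissimilarity(a, b):
    """Sum of absolute count differences divided by the total count of both samples."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total

# Hypothetical counts of four OTUs in two samples.
winter = [50, 10, 0, 5]
summer = [5, 10, 40, 0]
print(jaccard_dissimilarity(winter, summer))      # prints: 0.5
print(bray_curtis_dissimilarity(winter, summer))  # prints: 0.75
```

The two samples share half of their OTUs (Jaccard 0.5), but because the dominant OTU differs between them, the abundance-weighted Bray-Curtis value is higher (0.75).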

iii. Oligotyping

Oligotyping is a very recent and promising tool. Its main objective is to differentiate very similar sequences that may differ by only one nucleotide, a resolution that OTUs and phylotypes cannot provide. Indeed, phylotype resolution is limited by our current knowledge of phylogeny, which is particularly poor for the pelagic Archaea: MGII, for example, is only defined at the family level. OTUs, for their part, group sequences that differ by up to 3%. In oligotyping, the variation between sequences is assessed by entropy analysis, in order to distinguish genuine variation from sequencing errors.

Oligotyping is performed within already defined taxa, at the genus or family level, since it looks for very fine variations between sequences.

At the end of the analysis, we obtain different "oligotypes", which constitute distinct taxonomic units.
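The entropy analysis mentioned above can be illustrated with Shannon entropy computed column by column on an alignment: positions where reads disagree consistently have high entropy, and the bases found at the selected positions are concatenated into an oligotype label. The sketch below is a simplified illustration and not the Oligotyping pipeline of Eren et al.; the alignment and the choice of positions are made up.

```python
from collections import Counter
from math import log2

def column_entropy(column):
    """Shannon entropy (in bits) of the characters found in one alignment column."""
    counts = Counter(column)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())

def entropy_profile(alignment):
    """Entropy of every column of a list of equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*alignment)]

def oligotypes(alignment, positions):
    """Count the labels obtained by concatenating the bases at the chosen positions."""
    return Counter("".join(seq[p] for p in positions) for seq in alignment)

# Made-up alignment: only the last column varies.
alignment = ["ACGT", "ACGA", "ACGT", "ACGA"]
profile = entropy_profile(alignment)
print(profile)  # prints: [0.0, 0.0, 0.0, 1.0]
high = [i for i, h in enumerate(profile) if h > 0.5]
print(oligotypes(alignment, high))
```

Here the single informative position splits the reads into two oligotypes, "T" and "A", each supported by two reads. Labels such as CCGGCCG in the results correspond to seven such positions.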

III. Results & Discussion

A. Treatment of sequences

The dataset initially contained 438114 sequences; 433 chimeras and 22% of the sequences were removed in the filtering step. All the aligned sequences were trimmed to 325 bp. Only 174025 sequences were kept for the analysis, because numerous bacterial sequences were found, making up 48.9% of the data. Domain-specific primers are known to always amplify a more or less substantial fraction of genes from other domains (Baker, Smith & Cowan, 2003). Such non-specific amplification can be increased by degenerate primers.

Furthermore, we eliminated 11 samples that did not contain enough sequences; the minimum was set at 500 sequences.

B. Dominant MGII, MGI and MGIII

Phylotypes

A striking aspect of phylotype diversity in our samples was that 99.6% of the sequences fell into three main groups: MGII, the most abundant (102982 sequences), MGI with the Cenarchaeaceae family (68905 sequences) and, far behind, MGIII (1464 sequences). The other, marginal phylotypes were affiliated with the recently proposed phylum Parvarchaeota (Rinke et al.) or with the class Methanomicrobia in the phylum Euryarchaeota (Annex 1).

This observation confirms that MGII is an important group in marine surface water.

In total, nine phylotypes with more than 10 sequences were described. Four belong to MGI, in which only one family was found, subdivided into three genera: Cenarchaeum, Nitrosopumilus and an unclassified one. In terms of number of sequences, the most abundant genus was clearly Nitrosopumilus, which accounts for 52755 of the 68905 MGI sequences (Annex 1).

For MGII and MGIII, no order or genus has been described, owing to the lack of previous investigation of the phylogeny of these groups; all our sequences were therefore clustered into a single phylotype for each group, at the family level.

Finally, the Parvarchaeota phylum was represented by two phylotypes, and the Methanomicrobia by one phylotype with only 38 sequences.

OTUs

In total, our sequences were clustered into 829 OTUs. Of these, 485 contain only one sequence and probably result from sequencing errors. The most abundant OTU contains 50321 sequences, but the number of sequences per OTU then decreases sharply, as shown in Annex 2. To simplify the analysis, we arbitrarily removed the OTUs containing fewer than 10 16S rDNA sequences, which reduced the total to 52 OTUs.
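The abundance filter described above can be expressed as a short Python sketch. Apart from the size of OTU1, the OTU sizes below are hypothetical, and the function is only an illustration of the rule (keep OTUs with at least 10 sequences, count singletons separately), not the Mothur or R code actually used.

```python
def filter_otus(otu_sizes, min_size=10):
    """Return the OTUs with at least min_size sequences and the number of singletons."""
    kept = {otu: n for otu, n in otu_sizes.items() if n >= min_size}
    singletons = sum(1 for n in otu_sizes.values() if n == 1)
    return kept, singletons

# OTU1 size is taken from the text; the other sizes are made up.
otu_sizes = {"OTU1": 50321, "OTU2": 120, "OTU3": 9, "OTU4": 1, "OTU5": 1}
kept, singletons = filter_otus(otu_sizes)
print(kept)        # OTU1 and OTU2 are kept
print(singletons)  # prints: 2
```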

By comparing our sequences with a reference database, we can get an idea of the taxonomy of the different OTUs.


Nineteen OTUs were associated with MGI. Of these, 16 were affiliated with the genus Nitrosopumilus, including the most abundant OTU (OTU1), 2 with Cenarchaeum and one with the unclassified genus. OTU1 contained most of the sequences (50321 of a total of 68905).

For MGII, 26 OTUs were affiliated with this group, suggesting that it is subdivided into numerous taxa in our sampling environment. These OTUs contained fairly similar numbers of sequences, suggesting a far more diverse group than MGI.

Finally, we found only 2 OTUs for MGIII. This can be explained by the fact that this group is mostly abundant in deep water. It is not found in all coastal water surveys (Herfort et al., 2007).

The remaining OTUs are affiliated with the Parvarchaeota phylum (3) and other marginal taxa.

C. Seasonal correlation

When studying the ecology of different taxa, it is interesting to find out whether their relative abundance varies in correlation with the metadata. This provides information on their ecological behavior and helps to assess whether two OTUs could correspond to two different species.

As this study is based on temporal sampling, we plotted the relative abundance of the three main groups at the family level for each month (Fig. 3). This graph clearly shows strong seasonal variation: the relative abundance of MGI is extremely low during the summer (August to October) but very high in winter (January to March). This seasonal switch in abundance between MGI and MGII was already observed in southern North Sea coastal waters (Herfort et al., 2007).

Figure 3: Seasonal variation of the Cenarchaeaceae, MGII and MGIII, based on the phylotype analysis.

This graph shows relative abundances; more precisely, these variations are mostly driven by Nitrosopumilus. In our samples, this ammonia oxidizer blooms in winter, when it reaches more than 50% of all the sequences, whereas in summer its population is close to zero.

Looking at OTU1, we can see that its variations are strongly associated with the nitrate content (Fig. 4).

Figure 4: Seasonal variation of OTU1 Nitrosopumilus, with the amount of NH4+, NO3- and NO2-.

Nitrosopumilus spp. thus appears to be involved in nitrification, and these data reveal an important ecotype of this genus in sea surface waters. These results agree with the literature, which describes it as an ammonia oxidizer.

Surprisingly, however, one OTU associated with Nitrosopumilus spp., OTU18, peaks in summer instead of winter (Fig. 5). This observation leads to the hypothesis that there is another ecotype of Nitrosopumilus spp. that either is more abundant in summer or has a fairly constant population throughout the year. In the latter case, the apparent increase could simply result from the decline of the most abundant OTUs during this period, which would make this population easier to detect in the samples.

Figure 5: Seasonal variation of OTU1 and OTU18
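The second hypothesis made for OTU18 (a constant population that only appears to increase) can be illustrated with simple arithmetic. In the sketch below, all cell numbers are hypothetical: the target population stays constant while the rest of the community shrinks.

```python
def fraction(target, others):
    """Relative abundance of a target population within a community."""
    return target / (target + others)


# Hypothetical absolute numbers: the target population does not change,
# but the dominant populations collapse between winter and summer.
winter_fraction = fraction(target=100, others=9900)  # 100 / 10000
summer_fraction = fraction(target=100, others=900)   # 100 / 1000

fold_change = summer_fraction / winter_fraction
```

With these numbers the relative abundance of the target is ten times higher in summer although its absolute abundance is unchanged, which is why relative data alone cannot separate the two hypotheses.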


MGII shows different seasonal profiles depending on the OTU. Some OTUs affiliated with this group are abundant in summer, others in winter, and the remaining ones show no clear seasonal pattern (Annex 3).

The MGIII OTUs all show the same seasonal variation (Annex 4). Four peaks are found: three in November and one in July.

β-diversity

NMDS ordinations were computed with all the OTUs and displayed for the different metadata. Each dot represents one sample; samples are positioned according to the similarity of their community composition.


Figure 6: NMDS with Bray distance for all the sequences with different metadata.
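The Bray distance underlying these ordinations compares two samples through their OTU counts. The sketch below implements the usual Bray-Curtis formula in plain Python as an illustration; we assume it is the same quantity as the one returned by the vegdist call of the R script in Annex 5. The sample names and counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples.

    a and b are lists of counts for the same OTUs, in the same order.
    The result is 0 for identical samples and 1 for samples sharing no OTU.
    """
    if len(a) != len(b):
        raise ValueError("samples must describe the same OTUs")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Hypothetical counts for three OTUs in three samples (not study data).
samples = {
    "winter": [50, 30, 20],
    "summer": [5, 80, 15],
    "spring": [30, 50, 20],
}

# Pairwise dissimilarities, i.e. the input of an ordination such as NMDS.
distances = {
    (s1, s2): bray_curtis(samples[s1], samples[s2])
    for s1, s2 in combinations(samples, 2)
}
```

NMDS then places the samples in two dimensions so that the rank order of these dissimilarities is preserved as well as possible.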


The NMDS confirms the importance of the seasons for the diversity. It shows three distinct patterns, one for each season. The winter and summer patterns are well separated, whereas the spring pattern is more spread out and less well defined. The other metadata that reveal distinct patterns are the water temperature and the oxygen, NO2, NO3 and chlorophyll a contents. This last parameter has a twofold influence on pelagic Archaea: it is known to be inversely correlated with Crenarchaeota abundance and, in contrast, positively correlated with MGII abundance (Herfort et al., 2007).

D) Oligotyping

Oligotyping was performed on the Nitrosopumilus genus and on the MGII and MGIII families. To this end, FASTA files specific to these three groups were extracted from the general FASTA file according to the taxonomy.
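The principle of oligotyping (Eren et al., 2013) is to compute the Shannon entropy of each column of the alignment and to define each oligotype by the bases found at the most variable columns. The Python sketch below illustrates this idea on a toy alignment. It is not the code of the oligotyping pipeline: the function names, the entropy cutoff and the use of log base 2 are our assumptions.

```python
from collections import Counter
from math import log2


def column_entropy(alignment, col):
    """Shannon entropy (in bits) of one column of an alignment."""
    counts = Counter(seq[col] for seq in alignment)
    n = len(alignment)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def oligotype(seq, positions):
    """Concatenate the bases found at the selected positions."""
    return "".join(seq[p] for p in positions)


# Toy alignment (illustration only): columns 1 and 4 are variable.
alignment = ["ACGTA", "ACGTA", "ATGTC", "ATGTC"]

entropies = [column_entropy(alignment, i) for i in range(len(alignment[0]))]

# Keep the columns whose entropy exceeds an arbitrary cutoff.
informative = [i for i, h in enumerate(entropies) if h > 0.5]

# Count how many reads carry each oligotype.
oligotype_counts = Counter(oligotype(seq, informative) for seq in alignment)
```

In the same way, the seven positions passed to the Nitrosopumilus oligotype command in Annex 5 yield seven-letter oligotypes such as CCGGCCG.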

To compare the results with the OTUs, we classified the taxonomic units by abundance. Following Hugoni et al. (2013), we distinguished the abundant and the rare biosphere. Abundant OTUs or oligotypes are those that represent more than 1% of the sequences in at least one sample. The rare biosphere groups those below 0.2%.
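These thresholds can be applied mechanically to the relative abundances of each taxonomic unit. The sketch below uses the three classes of Tables 1 to 3 (abundant, rare, extremely rare). Applying the lower limits to the peak abundance across samples is our assumption, and the unit names and fractions are hypothetical.

```python
ABUNDANT_MIN = 0.01  # more than 1% of the sequences of a sample
RARE_MIN = 0.002     # 0.2% of the sequences of a sample


def abundance_class(fractions_per_sample):
    """Classify a taxonomic unit from its relative abundance in each sample."""
    peak = max(fractions_per_sample)
    if peak > ABUNDANT_MIN:
        return "abundant"        # above 1% in at least one sample
    if peak >= RARE_MIN:
        return "rare"            # between 0.2% and 1% at its peak
    return "extremely rare"      # always below 0.2%


# Hypothetical relative abundances over three samples (not study data).
units = {
    "unit_A": [0.300, 0.002, 0.010],
    "unit_B": [0.004, 0.001, 0.000],
    "unit_C": [0.0005, 0.001, 0.000],
}

classes = {name: abundance_class(values) for name, values in units.items()}
```

Counting the units of each class, for OTUs on the one hand and oligotypes on the other, gives tables of the same form as Tables 1 to 3.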

For Nitrosopumilus, 25 oligotypes were found, against 19 OTUs. As shown in Table 1, this difference depends on the relative abundance of the taxonomic units. Oligotyping highlights both the rare and the abundant taxonomic units.

Table 1: Number of Nitrosopumilus spp. OTUs and oligotypes depending on the abundance of the taxonomic unit.

             Abundant (>1%)   Rare (0.2–1%)   Extremely rare (<0.2%)
OTUs         2                8               4
Oligotypes   6                7               12

Looking at those differences in more detail, we can see that the first OTU is actually split into two oligotypes of unequal size. The first oligotype remains extremely abundant with 48,475 sequences, while the second has only 1,474. Those two oligotypes co-vary very strongly (Fig. 7).

Figure 7: Seasonal variation of the first two oligotypes of Nitrosopumilus spp., CCGGCCG and TCGACCG.

We can try to identify those oligotypes with the Basic Local Alignment Search Tool (BLAST) on the NCBI website. This tool compares our sequences with a large database that gathers environmental and identified sequences. The first two oligotypes both matched Candidatus Nitrosopumilus sp. PS0 at 99%. This result leads to the hypothesis that they might be two different 16S rRNA operons of one species. Alternatively, they could be two very closely related species with exactly the same ecological behavior, since the Nitrosopumilus representatives whose whole genome has been sequenced present only one locus of the 16S rRNA gene (source: NCBI).

Two other abundant oligotypes also co-vary (Fig. 8).

Figure 8: Seasonal variation of the two Nitrosopumilus oligotypes TCGGCCG and TGCGCCG

Like OTU18, those two oligotypes present a peak of abundance in summer (May to August). While we could hypothesize that they are the two operons of one species, the BLAST hits reveal two different identifications: Candidatus Nitrosopumilus sp. HCA1 (100%) for TCGGCCG and Candidatus Giganthauma insulaporcus (100%) for TGCGCCG. Nevertheless, the FASTA file used for this oligotype analysis only contained sequences classified as Nitrosopumilus according to the taxonomy. This shows that the reference database used is not up to date for all described Archaea. Keeping an archaeal database complete is a challenge, as new taxa are continually discovered thanks to research advances. Giganthauma insulaporcus is an unusual archaeon first described in 2010 in the Guadeloupe archipelago (Muller et al., 2010). Furthermore, this archaeon shares 97.7% similarity with Nitrosopumilus maritimus in its small-subunit 16S rRNA gene. Since this value lies above the 97% similarity level used to delimit OTUs (label 0.03 in the mothur script), this explains why this species was not distinguished by the OTU approach.
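The reasoning about the OTU threshold can be sketched as follows. The code compares the percent identity of two aligned sequences with the 97% level corresponding to label 0.03 in Annex 5. It is a simplification, since mothur clusters a whole distance matrix rather than testing pairs one by one; the sequences and the function name are hypothetical.

```python
OTU_IDENTITY_THRESHOLD = 97.0  # percent; corresponds to a 0.03 distance


def percent_identity(a, b):
    """Percent identity of two aligned sequences of equal length.

    Columns where both sequences carry a gap ("-") are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared


# Hypothetical 200-base reference and a variant with 5 substitutions.
reference = "ACGT" * 50
variant = list(reference)
for position in (0, 40, 80, 120, 160):
    variant[position] = "C"  # the reference carries "A" at these positions
variant = "".join(variant)

identity = percent_identity(reference, variant)
same_otu = identity >= OTU_IDENTITY_THRESHOLD
```

Two sequences that are 97.5% identical, as here, or 97.7% identical, as reported above, remain in the same OTU, whereas a single differing base at a high-entropy position is enough to separate two oligotypes.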

NMDS was performed on both the OTUs and the oligotypes of Nitrosopumilus.

Figure 9: NMDS with Bray index for Nitrosopumilus spp. OTUs and oligotypes.

The patterns obtained with OTUs and with oligotyping are the same, with slightly more tightly grouped samples for oligotyping.

For MGII, oligotyping reveals 41 oligotypes, against 26 OTUs.

Table 2: Number of MGII OTUs and oligotypes depending on the abundance of the taxonomic unit.

             Abundant (>1%)   Rare (0.2–1%)   Extremely rare (<0.2%)
OTUs         17               5               4
Oligotypes   18               12              11

Surprisingly, oligotyping reveals almost no new abundant taxonomic units: the OTUs already defined the different abundant species of MGII well.

The results differ much more for the rarest units: 4 OTUs against 11 oligotypes.


NMDS

Figure 10: NMDS with Bray index for MGII OTUs and oligotypes.

The pattern does not change between the two methods.

And for MGIII:

Table 3: Number of MGIII OTUs and oligotypes depending on the abundance of the taxonomic unit.

             Abundant (>1%)   Rare (0.2–1%)   Extremely rare (<0.2%)
OTUs         1                1               -
Oligotypes   1                1               3

The diversity of MGIII remains poor with oligotyping, but it reveals 3 extremely rare taxonomic units.

IV. Conclusion

Pelagic Archaea are well represented in marine coastal waters, with two major groups, MGI and MGII. MGI clearly demonstrates the ecological importance of those organisms, with a clear involvement in the nitrogen cycle carried by the genus Nitrosopumilus. Nevertheless, we found here another abundant ecotype of Nitrosopumilus that does not seem to vary throughout the year.

MGII is the most abundant and diversified group in this environment. Unfortunately, it is also the group that highlights how little is known about the ecology of pelagic Archaea and how difficult it is to explore uncultivable taxa. Despite its dominance, little or nothing is known about these Archaea, and it is difficult to analyze their different ecotypes.

Three different approaches were used in this study to explore the diversity of the Archaea in sea surface waters from Mediterranean coasts. The most precise information was clearly given by the most recent one, oligotyping. Its advantages are well demonstrated by the finding of Candidatus Giganthauma insulaporcus, which was detected by neither the OTU nor the phylotype approach. A notable aspect of the use of oligotyping in this environment was that it highlighted the rare biosphere in all the groups studied. This particular biosphere has only recently been investigated, thanks to the deep coverage allowed by next-generation sequencers, and raises important questions in ecology (Hugoni et al., 2013).

This internship was a rewarding experience for me, as it was an immersion into bioinformatics. Approaching ecological behaviors and diversity in an environment through thousands of reads is a fascinating but complicated exercise. It is important to be aware, at each step, of the consequences of each manipulation on the results and, afterwards, on the analysis.

Bibliography

G.C. Baker, J.J. Smith & D.A. Cowan; 2003. Review and re-analysis of domain-specific 16S primers. J. Microbiol. Methods, vol. 55, no 3, p541–555.

N. Bano, S. Ruffin, B. Ransom & J.T. Hollibaugh; 2004. Phylogenetic composition of Arctic Ocean archaeal assemblages and comparison with Antarctic assemblages. Appl. Environ. Microbiol., vol. 70, no 2, p781–789.

C. Brochier-Armanet, B. Boussau, S. Gribaldo & P. Forterre; 2008. Mesophilic crenarchaeota: proposal for a third archaeal phylum, the Thaumarchaeota. Nat. Rev. Microbiol., vol. 6, no 3, p245–252.

M.V. Brown, G.K. Philip, J.A. Bunge, M.C. Smith, A. Bissett, F.M. Lauro, J.A. Fuhrman & S.P. Donachie; 2009. Microbial community structure in the North Pacific ocean. ISME J., vol. 3, no 12, p1374–1386.

D.A. Cowan; 1992. Biotechnology of the Archaea. Trends Biotechnol., vol. 10, p315–323.

E.F. DeLong; 2003. Oceans of archaea. ASM News-Am. Soc. Microbiol., vol. 69, no 10, p503–510.

E.F. DeLong & N.R. Pace; 2001. Environmental Diversity of Bacteria and Archaea. Syst. Biol., vol. 50, no 4, p470–478.

T.Z. DeSantis, P. Hugenholtz, N. Larsen, M. Rojas, E.L. Brodie, K. Keller, T. Huber, D. Dalevi, P. Hu & G.L. Andersen; 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol., vol. 72, no 7, p5069–5072.

A.M. Eren, L. Maignien, W.J. Sul, L.G. Murphy, S.L. Grim, H.G. Morrison & M.L. Sogin; 2013. Oligotyping: differentiating between closely related microbial taxa using 16S rRNA gene data. Methods Ecol. Evol., vol. 4, no 12, p1111–1119.

J.A. Fuhrman, K. McCallum & A.A. Davis; 1992. Novel major archaebacterial group from marine plankton. Nature, vol. 356, no 6365, p148–149.


H.G. Gauch; 1982. Multivariate analysis in community ecology. Cambridge University Press.

L. Herfort, S. Schouten, B. Abbas, M.J. Veldhuis, M.J. Coolen, C. Wuchter, J.P. Boon, G.J. Herndl & J.S. Sinninghe Damsté; 2007. Variations in spatial and temporal distribution of Archaea in the North Sea in relation to environmental variables. FEMS Microbiol. Ecol., vol. 62, no 3, p242–257.

M. Hugoni, N. Taib, D. Debroas, I. Domaizon, I.J. Dufournel, G. Bronner, I. Salter, H. Agogué, I. Mary & P.E. Galand; 2013. Structure of the rare archaeal biosphere and seasonal dynamics of active ecotypes in surface coastal waters. Proc. Natl. Acad. Sci., vol. 110, no 15, p6004–6009.

V. Iverson, R.M. Morris, C.D. Frazar, C.T. Berthiaume, R.L. Morales & E.V. Armbrust; 2012. Untangling genomes from metagenomes: revealing an uncultured class of marine Euryarchaeota. Science, vol. 335, no 6068, p587–590.

M. Könneke, A.E. Bernhard, J.R. de la Torre, C.B. Walker, J.B. Waterbury & D.A. Stahl; 2005. Isolation of an autotrophic ammonia-oxidizing marine archaeon. Nature, vol. 437, no 7058, p543–546.

F. Muller, T. Brissac, N. Le Bris, H. Felbeck & O. Gros; 2010. First description of giant Archaea (Thaumarchaeota) associated with putative bacterial ectosymbionts in a sulfidic marine habitat. Environ. Microbiol., vol. 12, no 8, p2371–2383.

M. Pester, C. Schleper & M. Wagner; 2011. The Thaumarchaeota: an emerging view of their phylogeny and ecophysiology. Curr. Opin. Microbiol., vol. 14, no 3, p300–306.

A.J. Pinto & L. Raskin; 2012. PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One, vol. 7, no 8, pe43093.

A. Pitcher, C. Wuchter, K. Siedenberg, S. Schouten & J.S. Sinninghe Damsté; 2011. Crenarchaeol tracks winter blooms of ammonia-oxidizing Thaumarchaeota in the coastal North Sea. Limnol. Oceanogr., vol. 56, no 6, p2308–2318.

P.D. Schloss, S.L. Westcott, T. Ryabin, J.R. Hall, M. Hartmann, E.B. Hollister, R.A. Lesniewski, B.B. Oakley, D.H. Parks & C.J. Robinson; 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl. Environ. Microbiol., vol. 75, no 23, p7537–7541.

D.C. Smith; 2001. Expansion of the marine Archaea. Science, vol. 293, no 5527, p56–57.

E. Stackebrandt & B.M. Goebel; 1994. Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int. J. Syst. Bacteriol., vol. 44, no 4, p846–849.

A.H. Treusch, S. Leininger, A. Kletzin, S.C. Schuster, H.-P. Klenk & C. Schleper; 2005. Novel genes for nitrite reductase and Amo-related proteins indicate a role of uncultivated mesophilic crenarchaeota in nitrogen cycling. Environ. Microbiol., vol. 7, no 12, p1985–1995.

C.B. Walker, J.R. De La Torre, M.G. Klotz, H. Urakawa, N. Pinel, D.J. Arp, C. Brochier-Armanet, P.S.G. Chain, P.P. Chan & A. Gollabgir; 2010. Nitrosopumilus maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally distributed marine crenarchaea. Proc. Natl. Acad. Sci., vol. 107, no 19, p8818–8823.

C.R. Woese & G.E. Fox; 1977. Phylogenetic structure of the prokaryotic domain: The primary kingdoms. Proc. Natl. Acad. Sci., vol. 74, no 11, p5088–5090.


ANNEXES

Annex 1: Taxonomy of the different sequences obtained with the gg_13_5_99 database


Annex 2: Number of sequences per OTU (OTU 40 contained 31 sequences)

Annex 3: Seasonal variations of the abundant MGII OTUs

Annex 4: Seasonal variations of the MGIII OTUs


Annex 5: Scripts

Mothur Script

#Sequence treatment
#HugoniB.fna
#gg_13_5_99.fasta
#HugoniB.fasta
#gg_13_5_99.pds.tax
#HugoniB.oligos
#greengenes.align
#HugoniB.qual

summary.seqs(fasta=HugoniB.fna)
#HugoniB.summary

trim.seqs(fasta=HugoniB.fna, oligos=HugoniB.oligos, qfile=HugoniB.qual, maxambig=0, maxhomop=8, bdiffs=1, pdiffs=1, qwindowaverage=27, qwindowsize=50, processors=16)
#HugoniB.trim.fasta
#HugoniB.scrap.qual
#HugoniB.trim.qual
#HugoniB.scrap.fasta
#HugoniB.groups

summary.seqs(fasta=HugoniB.trim.fasta)
#HugoniB.trim.summary

unique.seqs(fasta=HugoniB.trim.fasta)
#HugoniB.trim.unique.fasta
#HugoniB.trim.names

summary.seqs(fasta=HugoniB.trim.unique.fasta, name=HugoniB.trim.names)

align.seqs(fasta=HugoniB.trim.unique.fasta, reference=greengenes.align, processors=16)
#HugoniB.trim.unique.align
#HugoniB.trim.unique.flip.accnos
#HugoniB.trim.unique.align.report

summary.seqs(fasta=HugoniB.trim.unique.align)
#HugoniB.trim.unique.summary

screen.seqs(fasta=HugoniB.trim.unique.align, name=HugoniB.trim.names, group=HugoniB.groups, start=1919, end=2065, minlength=200, processors=16)
#HugoniB.trim.unique.good.align
#HugoniB.good.groups
#HugoniB.trim.unique.bad.accnos
#HugoniB.trim.good.names

summary.seqs(fasta=HugoniB.trim.unique.good.align, name=HugoniB.trim.good.names)
#HugoniB.trim.unique.good.summary

filter.seqs(fasta=HugoniB.trim.unique.good.align, vertical=T, trump=., processors=16)
#HugoniB.filter
#HugoniB.trim.unique.good.filter.fasta

summary.seqs(fasta=HugoniB.trim.unique.good.filter.fasta, name=HugoniB.trim.good.names)

unique.seqs(fasta=HugoniB.trim.unique.good.filter.fasta, name=HugoniB.trim.good.names)
#HugoniB.trim.unique.good.filter.names
#HugoniB.trim.unique.good.filter.unique.fasta

summary.seqs(fasta=HugoniB.trim.unique.good.filter.unique.fasta, name=HugoniB.trim.unique.good.filter.names)
#HugoniB.trim.unique.good.filter.unique.summary

pre.cluster(fasta=HugoniB.trim.unique.good.filter.unique.fasta, name=HugoniB.trim.unique.good.filter.names, diffs=1)
#HugoniB.trim.unique.good.filter.unique.precluster.fasta
#HugoniB.trim.unique.good.filter.unique.precluster.names
#HugoniB.trim.unique.good.filter.unique.precluster.map

summary.seqs(fasta=HugoniB.trim.unique.good.filter.unique.precluster.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.names)
#HugoniB.trim.unique.good.filter.unique.precluster.summary

chimera.uchime(fasta=HugoniB.trim.unique.good.filter.unique.precluster.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.names)
#HugoniB.trim.unique.good.filter.unique.precluster.uchime.chimeras
#HugoniB.trim.unique.good.filter.unique.precluster.uchime.accnos

remove.seqs(accnos=HugoniB.trim.unique.good.filter.unique.precluster.uchime.accnos, fasta=HugoniB.trim.unique.good.filter.unique.precluster.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.names, group=HugoniB.good.groups)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.names
#HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta
#HugoniB.good.pick.groups
#HugoniB.trim.unique.good.filter.unique.precluster.pick.rdp.wang.taxonomy
#HugoniB.trim.unique.good.filter.unique.precluster.pick.rdp.wang.tax.summary

classify.seqs(fasta=HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta, template=gg_13_5_99.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.pick.names, taxonomy=gg_13_5_99.pds.tax, processors=16)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.tax.summary

remove.lineage(fasta=HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.pick.names, taxonomy=HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.taxonomy, group=HugoniB.good.pick.groups, taxon=Bacteria)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pds.wang.pick.taxonomy


#HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.names
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.fasta
#HugoniB.good.pick.pick.groups

summary.seqs(fasta=current, name=current)
#HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.summary

system(mv HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.fasta HugnoniB.final3.fasta)
system(mv HugoniB.trim.unique.good.filter.unique.precluster.pick.pick.names HugnoniB.final3.names)
system(mv HugoniB.good.pick.pick.groups HugoniB.final3.groups)

#β-DIVERSITY ANALYSIS
# - OTUs

dist.seqs(fasta=HugnoniB.final.fasta, output=lt)
#HugnoniB.final.phylip.dist

cluster(phylip=HugnoniB.final.phylip.dist, name=HugnoniB.final.names)
#HugnoniB.final.phylip.an.sabund
#HugnoniB.final.phylip.an.list
#HugnoniB.final.phylip.an.rabund

make.shared(list=HugnoniB.final.phylip.an.list, label=0.03, group=HugoniB.final1.groups)
#HugnoniB.final.phylip.an.shared

tree.shared(calc=jclass-morisitahorn)
#HugnoniB.final.phylip.an.jclass.0.03.tre
#HugnoniB.final.phylip.an.morisitahorn.0.03.tre

dist.shared(shared=HugnoniB.final.phylip.an.shared, calc=morisitahorn-jclass)
#HugnoniB.final.phylip.an.morisitahorn.0.03.lt.dist
#HugnoniB.final.phylip.an.jclass.0.03.lt.dist

pcoa(phylip=HugnoniB.final.phylip.an.morisitahorn.0.03.lt.dist)
#HugnoniB.final.phylip.an.morisitahorn.0.03.lt.pcoa.axes
#HugnoniB.final.phylip.an.morisitahorn.0.03.lt.pcoa.loadings

pcoa(phylip=HugnoniB.final.phylip.an.jclass.0.03.lt.dist)
#HugnoniB.final.phylip.an.jclass.0.03.lt.pcoa.axes
#HugnoniB.final.phylip.an.jclass.0.03.lt.pcoa.loadings

clearcut(phylip=HugnoniB.final.phylip.dist)
#HugnoniB.final.phylip.tre

unifrac.weighted(tree=HugnoniB.final.phylip.tre, name=HugnoniB.final.names, group=HugoniB.final1.groups, distance=lt)
#HugnoniB.final.phylip.trewsummary
#HugnoniB.final.phylip.tre1.weighted.phylip.dist

pcoa(phylip=HugnoniB.final.phylip.tre1.weighted.phylip.dist)
#HugnoniB.final.phylip.tre1.weighted.phylip.pcoa.axes
#HugnoniB.final.phylip.tre1.weighted.phylip.pcoa.loadings

unifrac.unweighted(tree=HugnoniB.final.phylip.tre, name=HugnoniB.final.names, group=HugoniB.final1.groups, distance=lt)
#HugnoniB.final.phylip.uwsummary
#HugnoniB.final.phylip.tre1.unweighted.phylip.dist

pcoa(phylip=HugnoniB.final.phylip.tre1.unweighted.phylip.dist)
#HugnoniB.final.phylip.tre1.unweighted.phylip.pcoa.axes
#HugnoniB.final.phylip.tre1.unweighted.phylip.pcoa.loadings

# - Phylotypes

phylotype(taxonomy=HugoniB.final.taxonomy, name=HugnoniB.final.names)
#HugoniB.final.tx.sabund
#HugoniB.final.tx.rabund
#HugoniB.final.tx.list

make.shared(list=HugoniB.final.tx.list, group=HugoniB.final1.groups, label=1)
#HugoniB.final.tx.shared

dist.shared(shared=HugoniB.final.tx.shared, calc=morisitahorn-jclass)
#HugoniB.final.tx.morisitahorn.1.lt.dist
#HugoniB.final.tx.jclass.1.lt.dist

pcoa(phylip=HugoniB.final.tx.morisitahorn.1.lt.dist)
#HugoniB.final.tx.morisitahorn.1.lt.pcoa.axes
#HugoniB.final.tx.morisitahorn.1.lt.pcoa.loadings

pcoa(phylip=HugoniB.final.tx.jclass.1.lt.dist)
#HugoniB.final.tx.jclass.1.lt.pcoa.axes
#HugoniB.final.tx.jclass.1.lt.pcoa.loadings

Oligotyping script

# specific fasta preparation
# Add taxonomy to the defline in the mothur fasta

deunique.seqs(fasta=HugoniB.trim.unique.good.filter.unique.precluster.pick.fasta, name=HugoniB.trim.unique.good.filter.unique.precluster.pick.names)
#HugoniB.trim.redundant.fasta

add-tax-to-mothur-defline.py myfasta mytaxonomy


awk '/Nitrosopumilus/{print;getline;print}' HugoniB.trim.redundant.pds.wang.taxonomy.fa > Hugoni.Nitrosawk.fasta
sed "s/__/_/" Hugoni.Nitrosawk.fasta > Hugoni.NitrosopumilusF.fasta
awk 'BEGIN{FS="\t"}{print $1}' HugoniB.trim.redundant.NitrosopumilusF.fasta > HugoniB.trim.redundant.Nitrosopumilus.fasta

#Oligotyping
o-pad-with-gaps HugoniB.trim.redundant.Nitrosopumilus.fasta
entropy-analysis HugoniB.trim.redundant.Nitrosopumilus.fasta-PADDED-WITH_GAPS

#Nitrosopumilus
oligotype HugoniB.trim.redundant.Nitrosopumilus.fasta-PADDED-WITH_GAPS HugoniB.trim.redundant.Nitrosopumilus.fasta-PADDED-WITH_GAPS-ENTROPY -C 84,86,110,112,145,185,192 -a 0 -s 1 --gen-html

#MGII
oligotype HugoniB.trim.redundant.MGII.fasta-PADDED-WITH_GAPS HugoniB.trim.redundant.MGII.fasta-PADDED-WITH_GAPS-ENTROPY -C 66,68,74,84,86,88,96,97,99,108,110,112,121,124,126,127,140,147,185,239,241,243,244 -A 10 -s 1 --gen-html

#MGIII
oligotype HugoniB.trim.redundant.MGIII.fasta-PADDED-WITH_GAPS HugoniB.trim.redundant.MGIII.fasta-PADDED-WITH_GAPS-ENTROPY -c 5 -a 0.0 -s 1 --gen-html

R Script

# The vegan package provides vegdist, metaMDS and the ordi* plotting functions
library(vegan)

shared <- read.table("Shared_total.txt", header=T)
braydist <- vegdist(shared, method="bray")
nmds <- metaMDS(braydist)
metadata <- read.table("metadata.txt", header=T)
View(metadata)

# Example for Saisons
block <- metadata$Saisons
ordiplot(nmds, display="si", type="n")
points(nmds, display="sites", select=which(block=="Hivers"), pch=21, col="black", bg="light blue")
points(nmds, display="sites", select=which(block=="Printemps"), pch=21, col="black", bg="orange")
points(nmds, display="sites", select=which(block=="Ete"), pch=21, col="black", bg="red")
ordispider(nmds, group=block, show.groups="Hivers")
ordispider(nmds, group=block, show.groups="Printemps")
ordispider(nmds, group=block, show.groups="Ete")
ordiellipse(nmds, group=block, show.groups="Hivers", kind="sd", conf=0.95, col="light blue")
ordiellipse(nmds, group=block, show.groups="Printemps", kind="sd", conf=0.95, col="orange")
ordiellipse(nmds, group=block, show.groups="Ete", kind="sd", conf=0.95, col="red")
ordiplot(nmds, display="si")
title("NMDS plot with Bray dist ", "Seasons")
legend("bottomright", legend=c("Winter", "Spring", "Summer"), col=c("light blue", "orange", "red"), pch=21, bty="n")