66
1 Eukaryotic virus composition can predict the efficiency of carbon export in the 1 global ocean 2 Hiroto Kaneko 1,+ , Romain Blanc-Mathieu 1,2,+ , Hisashi Endo 1 , Samuel Chaffron 3,4 , 3 Tom O. Delmont 4,5 , Morgan Gaia 4,5 , Nicolas Henry 6 , Rodrigo Hernández-Velázquez 1 , 4 Canh Hao Nguyen 1 , Hiroshi Mamitsuka 1 , Patrick Forterre 7 , Olivier Jaillon 4,5 , 5 Colomban de Vargas 6 , Matthew B. Sullivan 8 , Curtis A. Suttle 9 , Lionel Guidi 10 and 6 Hiroyuki Ogata 1,* 7 + Equal contribution 8 * Corresponding author 9 Affiliations: 10 1: Bioinformatics Center, Institute for Chemical Research, Kyoto University, 11 Gokasho, Uji, Kyoto 611-0011, Japan 12 2: Laboratoire de Physiologie Cellulaire & Végétale, CEA, Univ. Grenoble Alpes, 13 CNRS, INRA, IRIG, Grenoble, France. 14 3: Université de Nantes, CNRS UMR 6004, LS2N, F-44000 Nantes, France. 15 4: Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France 16 5: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, 91000 17 Evry, France 18 6: Sorbonne Universités, CNRS, Laboratoire Adaptation et Diversité en Milieu Marin, 19 Station Biologique de Roscoff, 29680 Roscoff, France 20 7: Institut Pasteur, Department of Microbiology, 25 rue du Docteur Roux, 75015, 21 Paris, France 22 Manuscript . CC-BY-NC 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted October 7, 2020. ; https://doi.org/10.1101/710228 doi: bioRxiv preprint

Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

1

Eukaryotic virus composition can predict the efficiency of carbon export in the 1

global ocean 2

Hiroto Kaneko1,+, Romain Blanc-Mathieu1,2,+, Hisashi Endo1, Samuel Chaffron3,4, 3

Tom O. Delmont4,5, Morgan Gaia4,5, Nicolas Henry6, Rodrigo Hernández-Velázquez1, 4

Canh Hao Nguyen1, Hiroshi Mamitsuka1, Patrick Forterre7, Olivier Jaillon4,5, 5

Colomban de Vargas6, Matthew B. Sullivan8, Curtis A. Suttle9, Lionel Guidi10 and 6

Hiroyuki Ogata1,* 7

+ Equal contribution 8

* Corresponding author 9

Affiliations: 10

1: Bioinformatics Center, Institute for Chemical Research, Kyoto University, 11

Gokasho, Uji, Kyoto 611-0011, Japan 12

2: Laboratoire de Physiologie Cellulaire & Végétale, CEA, Univ. Grenoble Alpes, 13

CNRS, INRA, IRIG, Grenoble, France. 14

3: Université de Nantes, CNRS UMR 6004, LS2N, F-44000 Nantes, France. 15

4: Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France 16

5: Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, 91000 17

Evry, France 18

6: Sorbonne Universités, CNRS, Laboratoire Adaptation et Diversité en Milieu Marin, 19

Station Biologique de Roscoff, 29680 Roscoff, France 20

7: Institut Pasteur, Department of Microbiology, 25 rue du Docteur Roux, 75015, 21

Paris, France 22

Manuscript.CC-BY-NC 4.0 International licenseavailable under a

not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 2: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

2

8: Department of Microbiology and Department of Civil, Environmental and Geodetic 23

Engineering, Ohio State University, Columbus, OH, United States of America 24

9: Departments of Earth, Ocean & Atmospheric Sciences, Microbiology & 25

Immunology, and Botany, and the Institute for the Oceans and Fisheries, University 26

of British Columbia, Vancouver, BC, V6T 1Z4, Canada 27

10: Sorbonne Université, CNRS, Laboratoire d’Océanographie de Villefanche, LOV, 28

F-06230 Villefranche-sur-mer, France 29

30

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 3: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

3

Summary 31

The biological carbon pump, in which carbon fixed by photosynthesis is exported to 32

the deep ocean through sinking, is a major process in Earth’s carbon cycle. The 33

proportion of primary production that is exported is termed the carbon export 34

efficiency (CEE). Based on in-lab or regional scale observations, viruses were 35

previously suggested to affect the CEE (i.e., viral “shunt” and “shuttle”). In this study, 36

we tested associations between viral community composition and CEE measured at a 37

global scale. A regression model based on relative abundance of viral marker genes 38

explained 67% of the variation in CEE. Viruses with high importance in the model 39

were predicted to infect ecologically important hosts. These results are consistent with 40

the view that the viral shunt and shuttle functions at a large scale and further imply 41

that viruses likely act in this process in a way dependent on their hosts and ecosystem 42

dynamics. 43

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 4: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

4

Introduction 44

A major process in the global cycling of carbon is the oceanic biological carbon pump 45

(BCP), an organism-driven process by which atmospheric carbon (i.e., CO2) is 46

transferred and sequestered to the ocean interior and seafloor for periods ranging from 47

centuries to hundreds of millions of years. Between 15% and 20% of net primary 48

production (NPP) is exported out of the euphotic zone, with 0.3% of fixed carbon 49

reaching the seafloor annually (Zhang et al., 2018). However, there is wide variation 50

in estimates of the proportion of primary production in the surface ocean that is 51

exported to depth, ranging from 1% in the tropical Pacific to 35-45% during the North 52

Atlantic bloom (Buesseler and Boyd, 2009). As outlined below, many factors affect 53

the BCP. 54

Of planktonic organisms living in the upper layer of the ocean, diatoms 55

(Tréguer et al., 2018) and zooplankton (Turner, 2015) have been identified as 56

important contributors to the BCP in nutrient-replete oceanic regions. In the 57

oligotrophic ocean, cyanobacteria, collodarians (Lomas and Moran, 2011), diatoms 58

(Agusti et al., 2015; Karl et al., 2012; Leblanc et al., 2018), and other small (pico- to 59

nano-) plankton (Lomas and Moran, 2011) have been implicated in the BCP. 60

Sediment trap studies suggest that ballasted aggregates of plankton with biogenic 61

minerals contribute to carbon export to the deep sea (Iversen and Ploug, 2010; Klaas 62

and Archer, 2002). The BCP comprises three processes: carbon fixation, export, and 63

remineralization. As these processes are governed by complex interactions between 64

numerous members of planktonic communities (Zhang et al., 2018), the BCP is 65

expected to involve various organisms, including viruses (Zimmerman et al., 2019a). 66

Viruses have been suggested to regulate the efficiency of the BCP. Lysis of 67

host cells by viruses releases cellular material in the form of dissolved organic matter 68

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 5: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

5

(DOM), which fuels the microbial loop and enhances respiration and secondary 69

production (Gobler et al., 1997; Weitz et al., 2015). This process, coined “viral shunt 70

(Wilhelm and Suttle, 1999)”, can reduce the carbon export efficiency (CEE) because 71

it increases the retention of nutrients and carbon in the euphotic zone and prevents 72

their transfer to higher trophic levels as well as their export from the euphotic zone to 73

the deep sea (Fuhrman, 1999; Weitz et al., 2015). However, an alternative process is 74

also considered, in which viruses contribute to the vertical carbon export (Weinbauer, 75

2004). For instance, a theoretical study proposed that the CEE increases if viral lysis 76

augments the ratio of exported carbon relative to the primary production-limiting 77

nutrients (nitrogen and phosphorous) (Suttle, 2007). Laboratory experimental studies 78

reported that cells infected with viruses form larger particles (Peduzzi and Weinbauer, 79

1993; Yamada et al., 2018), can sink faster (Lawrence and Suttle, 2004), and can lead 80

to preferential grazing by heterotrophic protists (Evans and Wilson, 2008) and/or to 81

higher growth of grazers (Goode et al., 2019). This process termed “viral shuttle 82

(Sullivan et al., 2017)” is supported by several field studies that reported association 83

of viruses with sinking material. Viruses were observed in sinking material in the 84

North Atlantic Ocean (Proctor and Fuhrman, 1991) and sediment of coastal waters 85

where algal blooms occur (Lawrence et al., 2002; Tomaru et al., 2007, 2011). In 86

addition, vertical transport of bacterial viruses between photic and aphotic zones was 87

observed in the Pacific Ocean (Hurwitz et al., 2015) and in Tara Oceans virome data 88

(Brum et al., 2015). A systematic analysis of large-scale omics data from oligotrophic 89

oceanic regions revealed a strong positive association between carbon flux and 90

bacterial dsDNA viruses (i.e., cyanophages), which were previously unrecognized as 91

possible contributors to the BCP (Guidi et al., 2016). More recently, viral infection of 92

blooms of the photosynthetic eukaryote Emiliania huxleyi in the North Atlantic were 93

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 6: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

6

found to be accompanied by particle aggregation and greater downward vertical flux 94

of carbon, with the highest export during the early stage of viral infection (Laber et 95

al., 2018). These studies raise the question of the overall impact of viruses infecting 96

eukaryotes on oceanic carbon cycling and export. Given the significant contributions 97

of eukaryotic plankton to ocean biomass and net production (Hirata et al., 2011; Li, 98

1995) and their observed predominance over prokaryotes in sinking materials of 99

Sargasso Sea oligotrophic surface waters (Fawcett et al., 2011; Lomas and Moran, 100

2011), various lineages of eukaryotic viruses may be responsible for a substantial part 101

of the variation in carbon export across oceanic regions. 102

If the “viral shunt” and “shuttle” processes function at a global scale and if 103

these involve specific viruses, we expect to detect a statistical association between 104

viral community composition and CEE in a large scale omics data. To our knowledge, 105

such an association has never been investigated. Although this test per se does not 106

prove that viruses regulate CEE, we consider the association is worth being tested 107

because such an association is a necessary condition for the global model of viral 108

shunt and shuttle and, under its absence, we will have to reconsider the model. Deep 109

sequencing of planktonic community DNA and RNA, as carried out in Tara Oceans, 110

has enabled the identification of marker genes of major viral groups infecting 111

eukaryotes (Hingamp et al., 2013; Carradec et al., 2018; Culley, 2018). To examine 112

the association between viral community composition and CEE, we thus used the 113

comprehensive organismal dataset from the Tara Oceans expedition (Carradec et al., 114

2018; Sunagawa et al., 2015), as well as related measurements of carbon export 115

estimated from particle concentrations and size distributions observed in situ (Guidi et 116

al., 2016). 117

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 7: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

7

In the present study, we identified several hundred marker-gene sequences of 118

nucleocytoplasmic large DNA viruses (NCLDVs) in metagenomes of 0.2–3 m size 119

fraction. We also identified RNA and ssDNA viruses in metatranscriptomes of four 120

eukaryotic size fractions spanning 0.8 to 2,000 m. The resulting profiles of viral 121

distributions were compared with an image-based measure of carbon export efficiency 122

(CEE), which is defined as the ratio of the carbon flux at depth to the carbon flux at 123

surface. 124

Results and Discussion 125

Detection of diverse eukaryotic viruses in Tara Oceans gene catalogs 126

We used profile hidden Markov model-based homology searches to identify marker-127

gene sequences of eukaryotic viruses in two ocean gene catalogs. These catalogs were 128

previously constructed from environmental shotgun sequence data of samples 129

collected during the Tara Oceans expedition. The first catalog, the Ocean Microbial 130

Reference Gene Catalog (OM-RGC), contains 40 million non-redundant genes 131

predicted from the assemblies of Tara Oceans viral and microbial metagenomes 132

(Sunagawa et al., 2015). We searched this catalog for NCLDV DNA polymerase 133

family B (PolB) genes, as dsDNA viruses may be present in microbial metagenomes 134

because large virions (> 0.2 μm) have been retained on the filter or because viral 135

genomes actively replicating or latent within picoeukaryotic cells have been captured. 136

The second gene catalog, the Marine Atlas of Tara Oceans Unigenes (MATOU), 137

contains 116 million non-redundant genes derived from metatranscriptomes of single-138

cell microeukaryotes and small multicellular zooplankton (Carradec et al., 2018). We 139

searched this catalog for NCLDV PolB genes, RNA-dependent RNA polymerase 140

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 8: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

8

(RdRP) genes of RNA viruses, and replication-associated protein (Rep) genes of 141

ssDNA viruses, since transcripts of viruses actively infecting their hosts, as well as 142

genomes of RNA viruses, have been captured in this catalog. 143

We identified 3,874 NCLDV PolB sequences (3,486 in metagenomes and 388 144

in metatranscriptomes), 975 RNA virus RdRP sequences, and 299 ssDNA virus Rep 145

sequences (Table 1). These sequences correspond to operational taxonomic units 146

(OTUs) at a 95% identity threshold. All except 17 of the NCLDV PolBs from 147

metagenomes were assigned to the families Mimiviridae (n = 2,923), 148

Phycodnaviridae (n = 348), and Iridoviridae (n = 198) (Table 1). The larger numbers 149

of PolB sequences assigned to Mimiviridae and Phycodnaviridae compared with other 150

NCLDV families are consistent with a previous observation based on a smaller 151

dataset (Hingamp et al., 2013). The divergence between these environmental 152

sequences and reference sequences from known viral genomes was greater in 153

Mimiviridae than in Phycodnaviridae (Figure 1a, S1a and S2). Within Mimiviridae, 154

83% of the sequences were most similar to those from algae-infecting Mimivirus 155

relatives. Among the sequences classified in Phycodnaviridae, 93% were most similar 156

to those in Prasinovirus, whereas 6% were closest to Yellowstone lake phycodnavirus, 157

which is closely related to Prasinovirus. Prasinoviruses are possibly over-represented 158

in the metagenomes because the 0.2 to 3 m size fraction selects their picoeukaryotic 159

hosts. RdRP sequences were assigned mostly to the order Picornavirales (n = 325), 160

followed by the families Partitiviridae (n = 131), Narnaviridae (n = 95), 161

Tombusviridae (n = 45), and Virgaviridae (n = 33) (Table 1), with most sequences 162

being distant (30% to 40% amino acid identity) from reference viruses (Figures 1b, 163

S1b and S3). These results are consistent with previous studies on the diversity of 164

marine RNA viruses, in which RNA virus sequences were found to correspond to 165

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 9: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

9

diverse positive-polarity ssRNA and dsRNA viruses distantly related to well-166

characterized viruses (Culley, 2018). Picornavirales may be over-represented in the 167

metatranscriptomes because of the polyadenylated RNA selection. The majority (n = 168

201) of Rep sequences were annotated as Circoviridae, known to infect animals, 169

which is consistent with a previous report (Wang et al., 2018). Only eight were 170

annotated as plant ssDNA viruses (families Nanovoridae and Gemniviridae) (Table 171

1). Most of these environmental sequences are distant (40% to 50% amino acid 172

identity) from reference sequences (Figures 1c, S1c and S4). Additional 388 NCLDV 173

PolBs were detected in the metranscriptomes. The average cosmopolitanism (number 174

of samples where an OTU was observed by at least two reads) for PolBs in 175

metagenomes was 23 samples against 2.9 for metranscriptome-derived PolB 176

sequences, 5.5 for Reps, and 5.8 for RdRPs. Within metatranscriptomes, the average 177

gene-length normalized read counts for PolBs were respectively ten and three times 178

lower than those of RdRPs and Reps. Therefore, PolBs from metatranscriptomes were 179

not further used in our study. 180

Composition of eukaryotic viruses can explain the variation of carbon 181

export efficiency 182

Among the PolB, RdRP, and Rep sequences identified in the Tara Oceans gene 183

catalogs, 38%, 18%, and 11% (total = 1,523 sequences), respectively, were present in 184

at least five samples and had matching carbon export measurement data (Table 1). We 185

used the relative abundance (defined as the centered log-ratio transformed gene-length 186

normalized read count) profiles of these 1,523 marker-gene sequences at 59 sampling 187

sites in the photic zone of 39 Tara Oceans stations (Figure 2) to test for association 188

between their composition and a measure of carbon export efficiency (CEE, see 189

Transparent Methods, Figure S5). A partial least squares (PLS) regression model 190

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 10: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

10

explained 67% (coefficient of determination R2 = 67%) of the variation in CEE with a 191

Pearson correlation coefficient of 0.84 between observed and predicted values. This 192

correlation was confirmed to be statistically significant by permutation test (P < 1 193

104) (Figure 3a). 194

We also tested for their association with estimates of carbon export flux at 150 195

meters (CE150) and NPP. PLS regressions explained 54% and 64% of the variation in 196

CE150 and NPP with Pearson correlation coefficients between observed and predicted 197

values of 0.74 (permutation test, P < 1 104) and 0.80 (permutation test, P < 1 198

104), respectively (Figure S6). In these three PLS regression models, 83, 86, and 97 199

viruses were considered to be key predictors (i.e., Variable Importance in the 200

Projection [VIP] score > 2) of CEE, CE150, and NPP, respectively. PLS models for 201

NPP and CE150 shared a larger number of predictors (52 viruses) compared to the PLS 202

models for NPP and CEE (seven viruses) (two proportion Z-test, P = 4.14 1012). 203

Consistent with this observation, CE150 was correlated with NPP (Pearson’s r = 0.77; 204

parametric test, P < 1 1012). This result implies that the magnitude of export in the 205

analyzed samples was partly constrained by primary productivity. However, CEE was 206

not correlated with NPP (r = 0.16; parametric test, P = 0.2) or CE150 (r = 0.002; 207

parametric test, P = 0.99). Thus, as expected, primary productivity was not a major 208

driver for the efficiency of carbon export. 209

The 83 viruses (5% of the viruses included in our analysis) that were 210

associated with CEE with a VIP score > 2 are considered to be important predictors of 211

CEE in the PLS regression (Figure 3b, Supplemental Data 1), and these viruses are 212

hereafter referred to as VIPs (Viruses Important in the Prediction). Fifty-eight VIPs 213

had positive regression coefficient, and 25 had negative regression coefficient in the 214

prediction (Figure 3b). Most of the positively associated VIPs showed high relative 215

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 11: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

11

abundance in the Mediterranean Sea and in the Indian Ocean where CEE tends to be 216

high compared with other oceanic regions (Figure 4). Among them, 15 (red labels in 217

Figure 4) also had high relative abundance in samples from other oceanic regions, 218

showing that these viruses are associated with CEE at a global scale. In contrast, 219

negatively associated VIPs tend to have higher relative abundance in the Atlantic 220

Ocean and the Southern Pacific Ocean where CEE is comparatively lower. In the 221

following sections, we investigate potential hosts of the VIPs in order to interpret the 222

statistical association between viral community composition and CEE in the light of 223

previous observations in the literature. 224

Viruses correlated with CEE infect ecologically important hosts 225

Most of the VIPs (77 of 83) belong to Mimiviridae (n = 34 with 25 positive 226

VIPs and nine negative VIPs), Phycodnaviridae (n = 24 with 18 positive VIPs and six 227

negative VIPs), and ssRNA viruses of the order Picornavirales (n = 19 with 13 228

positive VIPs and six negative VIPs) (Figure 3b, Table S1). All the phycodnavirus 229

VIPs were most closely related to prasinoviruses infecting Mamiellales, with amino 230

acid sequence percent identities to reference sequences ranging between 35% and 231

95%. The six remaining VIPs were two NCLDVs of the family Iridoviridae 232

negatively associated with CEE, three RNA viruses (two ssRNA viruses of the family 233

Hepeviridae negatively associated with CEE and one dsRNA virus of the family 234

Partitiviridae positively associated with CEE), and one ssDNA virus of the family 235

Circoviridae positively associated with CEE. 236

Host information may help understand the relationship between these VIPs 237

and CEE. We performed genomic context analysis for PolB VIPs and phylogeny-238

guided network-based host prediction for PolB and RdRP to infer putative virus–host 239

relationships (see Transparent Methods). 240

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 12: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

12

Taxonomic analysis of genes predicted in 10 metagenome-assembled genomes 241

(MAGs) from the eukaryotic size fractions and 65 genome fragments (contigs) 242

assembled from the prokaryotic size fraction encoding VIP PolBs further confirmed 243

their identity as Mimiviridae or Phycodnaviridae (Figure S7). The size of MAGs 244

ranged between 30 kbp and 440 kbp with an average of 210 kbp (Table S2). The 245

presence of genes with high sequence similarities to cellular genes in a viral genome 246

is suggestive of a virus–host relationship (Monier et al., 2009; Yoshikawa et al., 247

2019). Two closely related Mimiviridae VIPs, PolB 000079111 (positively associated 248

with CEE) and PolB 000079078 (negatively associated with CEE), were 249

phylogenetically close to the pelagophyte virus Aureococcus anophagefferens virus 250

(AaV). One MAG (268 kbp in size) corresponding to PolB 000079111 encoded seven 251

genes showing high similarities to genes from Pelagophyceae, and another MAG (382 252

kbp in size), corresponding to PolB 000079078, encoded five genes similar to genes 253

from Pelagophyceae. All but one of these 12 genes was encoded on a genome 254

fragment containing genes annotated as viral, including five NCLDV core genes 255

(Supplemental Data 2), excluding the possibility of contamination in these MAGs. 256

Two closely related Phycodnaviridae VIPs, PolB 001064263 and 010288541, were 257

positively associated with CEE. Both of these PolBs correspond to a MAG (134 kbp 258

in size) encoding one gene likely derived from Mamiellales. The genomic fragment 259

harboring this cellular gene was found to encode 10 genes annotated as viral 260

(Supplemental Data 2). 261

We conducted a phylogeny-guided, network-based host prediction analysis for 262

Mimiviridae, Phycodnaviridae, and Picornavirales (Figures S8 and S9). Only a subset 263

of the VIPs was included in this analysis because we kept the most reliable sequences 264

to obtain a well-resolved tree topology. Within the Prasinovirus clade, which 265

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 13: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

13

contained thirteen VIPs (nine positive and four negative), seven different eukaryotic 266

orders were detected as predicted host groups for 10 nodes in the tree. Mamiellales, 267

the only known host group of prasinoviruses, was detected at eight nodes (five of 268

them had no parent-to-child relationships), whereas the other six eukaryotic orders 269

were found at only one node (or two in the case of Eutreptiales) (Figure S8). The 270

order Mamiellales includes three genera (Micromonas, Ostreococcus, and 271

Bathycoccus), which are bacterial-sized green microalgae common in coastal and 272

oceanic environments and are considered to be influential actors in oceanic systems 273

(Monier et al., 2016). Various prasinoviruses (fourteen with available genome 274

sequences) have been isolated from the three genera. 275

Within the family Mimiviridae, which contains fifteen VIPs (10 positive and 276

five negative), twelve different orders were predicted as putative host groups (Figure 277

S8). Collodaria was detected at 15 nodes (two of them had no parent-to-child 278

relationships), and Prymnesiales at six nodes (three of them had no parent-to-child 279

relationships), whereas all other orders were present at a maximum of one node each 280

with no parent-to-child relationships. The nodes enriched for Prymnesiales and 281

Collodaria fell within a monophyletic clade (marked by a red arrow in Figure S8) 282

containing four reference haptophyte viruses infecting Prymnesiales and two 283

reference haptophyte viruses infecting Phaeocystales. Therefore, the environmental 284

PolB sequences in this Mimiviridae clade (including five positive VIPs and one 285

negative VIP) are predicted to infect Prymnesiales or related haptophytes. The 286

detection of Collodaria may be the result of indirect associations that reflect a 287

symbiotic relationship with Prymnesiales, as some acantharians, evolutionarily related 288

to the Collodaria, are known to host Prymnesiales species (Mars Brisbin et al., 2018). 289

Known species of Prymnesiales and Phaeocystales have organic scales, except one 290

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 14: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

14

Prymnesiales species, Prymnesium neolepis, which bears siliceous scales (Yoshida et 291

al., 2006). Some species can form blooms and colonies. Previous studies revealed the 292

existence of diverse and abundant noncalcifying picohaptophytes in open oceans 293

(Endo et al., 2018; Liu et al., 2009). Haptophytes as a whole have been estimated to 294

contribute from 30% to 50% of the total photosynthetic standing stock across the 295

world ocean (Hirata et al., 2011; Liu et al., 2009). They constitute an important 296

mixotrophic group in oligotrophic waters (Endo et al., 2018), and mixotrophy is 297

proposed to increase vertical carbon flux by enabling the uptake of organic forms of 298

nutrients (Ward and Follows, 2016). Clear host prediction was not made for the other 299

nine Mimiviridae VIPs shown in the phylogenetic tree. Three VIPs (two positive and 300

one negative) in the tree were relatives of AaV. One negatively associated VIP was a 301

relative of Cafeteria roenbergensis virus infecting a heterotrophic protist. The five 302

remaining Mimiviridae VIPs are very distant from any known Mimiviridae. 303

Sixteen Picornavirales VIPs (eleven positive and five negative) were included 304

in the phylogeny-guided, network-based host prediction analysis (Figure 9). Nine 305

(seven positive and two negative) were grouped within Dicistroviridae (known to 306

infect insects) and may therefore infect marine arthropods such as copepods, the most 307

ubiquitous and abundant mesozooplankton groups involved in carbon export (Turner, 308

2015). Three other Picornavirales VIPs were placed within a clade containing known 309

bacillarnaviruses. Two of them (35179764 and 33049404) were positively associated 310

with CEE and had diatoms of the order Chaetocerotales as a predicted host group. The 311

third one (107558617) was negatively associated with CEE and distant from other 312

bacillarnaviruses, and had no host prediction. Diatoms have been globally observed in 313

the deep sea (Agusti et al., 2015; Leblanc et al., 2018) and identified as important 314

contributors of the biological carbon pump (Tréguer et al., 2018). One positively 315

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 15: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

15

associated VIP (32150309) was in a clade containing Aurantiochytrium single-316

stranded RNA virus (AsRNAV), infecting a marine fungoid protist thought to be an 317

important decomposer (Takao et al., 2005). The last three Picornavirales VIPs 318

(59731273, 49554577, and 36496887) had no predicted host and were too distant 319

from known Picornavirales to speculate about their putative host group. 320

Outside Picornavirales, three RNA virus VIPs (two Hepeviridae, negatively 321

associated, and one Partitiviridae, positively associated) were identified, for which no 322

reliable host inferences were made by sequence similarity. Known Hepeviridae infect 323

metazoans, and known Partitiviridae infect fungi and plants. The two Hepeviridae-324

like viruses were most closely related to viruses identified in the transcriptomes of 325

mollusks (amino acid identities of 48% for 42335229 and 43% for 77677770) (Shi et 326

al., 2016). The Partitiviridae-like VIP (35713768) was most closely related to a 327

fungal virus, Penicillium stoloniferum virus S (49% amino acid identity). 328

One ssDNA virus VIP (38177659) was positively associated with CEE. It was 329

annotated as a Circoviridae, although it groups with other environmental sequences as 330

an outgroup of known Circoviridae. This VIP was connected with copepod, mollusk, 331

and Collodaria OTUs in the co-occurrence network but no enrichment of predicted 332

host groups was detected for its clade. Circoviridae-like viruses are known to infect 333

copepods (Dunlap et al., 2013) and have been reported to associate with mollusks 334

(Dayaram et al., 2015), but none have been reported for Collodaria. 335

Overall, we could infer hosts for 36 VIPs (Tables S3 and S4). Most of the 336

predicted hosts are known to be ecologically important as primary producers 337

(Mamiellales, Prymnesiales, Pelagophyceae, and diatoms) or grazers (copepods). Of 338

these, diatoms and copepods are well known as important contributors to the BCP but 339

others (i.e., Mamiellales, Prymnesiales, Pelagophyceae) have not been recognized as 340

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 16: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

16

major contributors to the BCP. Our analysis also revealed that positive and negative 341

VIPs are not separated in either the viral or host phylogenies. 342

Viruses positively correlated with CEE tend to interact with silicified 343

organisms 344

The phylogeny-guided, network-based host prediction analysis correctly predicted 345

known virus–host relationships (for viruses infecting Mamiellales, Prymnesiales, and 346

Chaetocerotales) using our large dataset, despite the reported limitations of these co-347

occurrence network-based approaches (Coenen and Weitz, 2018). This result 348

prompted us to further exploit the species co-occurrence networks (Table S5) to 349

investigate functional differences between the eukaryotic organisms predicted to 350

interact with positive VIPs, negative VIPs, and viruses less important for prediction of 351

CEE (VIP score < 2) (non-VIPs). Positive VIPs had a greater proportion of 352

connections with silicified eukaryotes (Q = 0.001), but not with chloroplast-bearing 353

eukaryotes (Q = 0.16) nor calcifying eukaryotes (Q = 1), compared to non-VIPs 354

(Table S6). No functional differences were observed between negative VIPs and non-355

VIPs viruses (Table S6) or positive VIPs (Table S7). 356

Multifarious ways viruses affect the fate of carbon 357

Our analysis revealed that eukaryotic virus composition was able to predict CEE in 358

the global sunlit ocean and 83 out of the 1,523 viruses had a high importance in the 359

predictive model. This association is not a proof that the viruses are the cause of the 360

variation of CEE. For example, a virus may be found to be associated with CEE if its 361

host affects CEE regardless of viral infection. This would be the case especially if 362

latent/persistent viruses are widespread and abundant in phytoplankton (Goic and 363

Saleh, 2012). Organisms that preferentially grow in marine snow (Bochdansky et al., 364

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 17: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

17

2017) may bring associations between viruses infecting those organisms and CEE; 365

this could be the case for the AsRNAV-related VIP that we identified. Alternatively, 366

the observed associations between VIPs and CEE may reflect a more direct causal 367

relationship, which we attempt to explore in light of the large body of literature on the 368

mechanisms by which viruses impact the fate of carbon in the oceans. 369

Among the 83 VIPs, 58 were positively associated with CEE. Such a positive 370

association is expected from the “viral shuttle” model, which states that viral activity 371

could facilitate carbon export to the deep ocean (Fuhrman, 1999; Sullivan et al., 2017; 372

Weinbauer, 2004) because viral infection can facilitate cell sinking (Lawrence and 373

Suttle, 2004) and increase the sizes of particles (Peduzzi and Weinbauer, 1993; 374

Yamada et al., 2018); for instance, a virus may induce secretion of sticky material that 375

contributes to cell/particle aggregation, such as transparent exopolymeric particles 376

(TEP) (Nissimov et al., 2018). The data we used to estimate carbon export are based 377

on the particle size distribution and concentration, and do not convey information 378

regarding the aggregation status of particles. Therefore, we cannot directly test for a 379

relation between viruses and aggregation at the sampling sites. Nonetheless, we found 380

that CEE (i.e., CEdeep/CEsurface) increased with the change of particles size from 381

surface to deep ( = 0.42, P = 8 109) (Figure S10). This positive correlation may 382

reflect an elevated level of aggregation (either enhanced by viral activity or not) in 383

places where CEE is high, although it could be also due to the presence of large 384

organisms at depth. 385

Greater aggregate sinking along with higher particulate carbon fluxes was 386

observed in North Atlantic blooms of Emiliania huxleyi that were infected early by 387

the virus EhV, compared with late-infected blooms (Laber et al., 2018). In the same 388

bloom, viral infection stage was found to proceed with water column depth (Sheyn et 389

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 18: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

18

al., 2018). Enhanced TEP production for these same early infected calcifying 390

populations was observed over a three-day period in deck-board bottle incubations 391

(Laber et al., 2018). Laboratory observations also exist for enhanced TEP production 392

and aggregate formation during the early phase of EhV infection of a calcifying E. 393

huxleyi strain (Laber et al., 2018; Nissimov et al., 2018). These observations strongly 394

suggest that infection-induced TEP production in organisms containing dense material 395

(e.g., calcite scales for E. huxleyi) can facilitate carbon export. No EhV-like PolB 396

sequences were detected in our dataset, which was probably due to sampled areas and 397

seasons. 398

Laboratory experiments suggest that viruses closely related to positive VIPs, 399

such as prasinoviruses, have infectious properties that may drive carbon export. 400

Cultures of Micromonas pusilla infected with prasinoviruses showed increased TEP 401

production compared with non-infected cultures (Lønborg et al., 2013), although it is 402

not known if this increase leads to aggregation. The hosts of prasinoviruses have been 403

proposed to contribute to carbon export because they were observed in abyssopelagic 404

zone at sampling sites dominated by Mamiellales in their surface waters in the 405

western subtropical North Pacific (Shiozaki et al., 2019). Some prasinoviruses encode 406

glycosyltransferases (GTs) of the GT2 family. Similar to the a098r gene (GT2) in 407

Paramecium bursaria Chlorella virus 1, the expression of GT2 family members 408

during infection possibly leads to the production of a dense fibrous hyaluronan 409

network at the surface of infected cells. Such a network may trigger the aggregation 410

of host cells, facilitate viral propagation (Van Etten et al., 2017), and increase the cell 411

wall C:N ratio. We detected one GT2 in a MAG of two Phycodnaviridae-like positive 412

VIPs (000200745 and 002503270) predicted to infect Mamiellales, one in a MAG 413

corresponding to the putative pelagophyte positive VIP 000079111 related to AaV 414

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 19: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

19

and six in two MAGs (three each) corresponding to two Mimiviridae-like positive 415

VIPs (000328966 and 001175669). Phaeocystis globosa virus (PgV), closely related 416

to the positive VIP PolB 000912507 (Figure S8), has been linked with increased TEP 417

production and aggregate formation during the termination of a Phaeocystis bloom 418

(Brussaard et al., 2007). Two closely related bacillarnavirus VIPs were positively 419

associated with CEE and predicted to infect Chaetocerales. A previous study revealed 420

an increase in abundance of viruses infecting diatoms of Chaetoceros in both the 421

water columns and the sediments during the bloom of their hosts in a coastal area 422

(Tomaru et al., 2011), suggesting sinking of cells caused by viruses. Furthermore, the 423

diatom Chaetoceros tenuissimus infected with a DNA virus (CtenDNAV type II) has 424

been shown to produce higher levels of large-sized particles (50 to 400 m) compared 425

with non-infected cultures (Yamada et al., 2018). 426

The other 25 VIPs were negatively associated with CEE. This association is 427

compatible with the “viral shunt,” which increases the amount of DOC (Wilhelm and 428

Suttle, 1999) and reduces the transfer of carbon to higher trophic levels and to the 429

deep ocean (Fuhrman, 1999; Weitz et al., 2015). Increased DOC has been observed in 430

culture of Mamiellales lysed by prasinoviruses (Lønborg et al., 2013). Although this 431

culture-based observation may be difficult to extrapolate to natural conditions, where 432

the cell concentration and thus the contact rate with viruses are probably lower, 433

Mamiellales species are known to form blooms during which cell densities may be 434

comparable with cultures (Zhu et al., 2005). A field study reported that PgV, to which 435

the negative VIP PolB 000054135 is closely related (Figure S8), can be responsible 436

for up to 35% of cell lysis per day during bloom of its host (Baudoux et al., 2006), 437

which is likely accompanied by consequent DOC release. Similarly, the decline of a 438

bloom of the pelagophyte Aureococcus anophagefferens has been associated with 439

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 20: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

20

active infection by AaV (to which one negative VIP is closely related) 440

(Moniruzzaman et al., 2017). Among RNA viruses, eight were negative VIPs (six 441

Picornavirales and two Hepeviridae). The higher representation of Picornavirales in 442

the virioplankton (Culley, 2018) than within cells (Urayama et al., 2018) suggests that 443

they are predominantly lytic, although no information exists regarding the effect of 444

Picornavirales on DOC release. 445

It is likely that the “viral shunt” and “viral shuttle” simultaneously affect and 446

modulate CEE in the global ocean (Zimmerman et al., 2019a). The relative 447

importance of these two phenomena must fluctuate considerably depending on the 448

host traits, viral effects on metabolism, and environmental conditions. Reflecting this 449

complexity, viruses of a same host group could be found to be either positively or 450

negatively associated with CEE. For example, among prasinoviruses most likely 451

infecting Mamiellales, 18 were positive VIPs and six were negative VIPs. Two 452

closely related prasinoviruses (sharing 97.5% genome-wide identity) are known to 453

exhibit different ecological strategies with notably distinct molecular signatures on the 454

organic matter released upon infection of the same host (Zimmerman et al., 2019b). 455

We found that even two very closely related Mimiviridae viruses (PolBs 000079111 456

and 000079078 sharing 94% nucleotide identity over their full gene lengths) most 457

likely infecting pelagophyte algae were positively and negatively associated with 458

CEE. Furthermore, it is known that an early-infected E. huxleyi system was linked 459

with both higher aggregation “at surface” and higher remineralization “at deep” 460

compared to late-infected blooms (Laber et al., 2018). Therefore, the viral effect on 461

carbon cycle may vary also with depth. 462

Five percent of the tested viruses were associated with CEE in our study. 463

Similarly, four percent of bacterial virus populations were found to be associated with 464

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 21: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

21

the magnitude of carbon export at 150 meters (Guidi et al., 2016). These results 465

suggest that viruses affecting carbon export are rather uncommon. It is plausible that 466

such viruses affect CEE by infecting organisms that are functionally important 467

(abundant or keystone species), as we observed in host prediction. The vast majority 468

(95%) of non-VIPs may not have a significant impact on CEE, because they do not 469

strongly impact the host population, for instance, by stably coexisting with their hosts. 470

It is worth noting that experimental studies have reported cultures of algae with 471

viruses that reach a stable co-existence state after a few generations (Yau et al., 2020). 472

It is also possible that some of these non-VIPs can impact carbon export but were not 473

captured in the infection stage affecting the export process. Viruses captured in our 474

samples can represent active viruses in different infectious stages (early, mid or late) 475

for metagenomes and metatranscriptomes or at the post-lysis stage for metagenomes. 476

Potential effects of global climate changes on viral shunt/shuttle and CEE 477

Increasing evidence suggests that the biological carbon pump is highly dependent on 478

the planktonic community composition, and as discussed above, viruses represent a 479

possible key parameter that determines the efficiency of carbon export. In the photic 480

layer of the oceans, the composition of planktonic communities is strongly affected by 481

sea surface temperature (Salazar et al., 2019; Sunagawa et al., 2015), and CEE may 482

therefore be affected by ocean warming. Our result indicated that viruses infecting 483

small phytoplankton such as Mamiellales and haptophytes are likely associated with 484

CEE. Interestingly, many studies showed that high temperature and/or CO2 levels are 485

associated with an increased contribution of small sized phytoplankton to the total 486

biomass (Hare et al., 2007; Mousing et al., 2014; Sugie et al., 2020). 487

An increase in CO2 level in the surface seawater also causes a decrease in pH 488

(i.e., ocean acidification). Previous studies demonstrated that the decrease in seawater 489

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 22: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

22

pH negatively affect the growth of calcified and silicified phytoplankton cells (Doney 490

et al., 2009; Endo et al., 2016; Petrou et al., 2019). The biogenic minerals such as 491

calcium carbonate and silica act as ballasts in sinking particles (Iversen and Ploug, 492

2010). Given the statistical association that we detected between the viruses positively 493

correlated with CEE and the silicified predicted host planktons, the ocean 494

acidification may decrease the viral shuttle and thus CEE globally in the future. 495

The increased sea surface temperature will decrease the nutrient supply at the 496

surface of the oligotrophic ocean by preventing the vertical mixing. The decrease in 497

nutrient availability of surface seawaters possibly diminishes the net primary 498

production (NPP) and the magnitude of carbon export (corresponded to CE150 in our 499

study) (Riebesell et al., 2009). Consistently, a downward trend of global 500

phytoplankton abundance has been observed by satellite (Boyce et al., 2010). In such 501

a scenario of the global decrease of NPP, the efficiency of export would be an 502

important factor for a precise estimation of carbon export in the future ocean. In this 503

regard, the role of marine viruses in the carbon cycle and export should be further 504

investigated and eventually be integrated into prospective models for the climate 505

change. 506

Conclusions 507

Eukaryotic virus community composition was able to predict CEE at 59 sampling 508

sites in the photic zone of the world ocean. This statistical association was detected 509

based on a large omics dataset collected throughout the oceans and processed with 510

standardized protocols. The predictability of CEE by viral composition is consistent 511

with the hypothesis that “viral shuttle” and “shunt” are functioning at a global scale. 512

Among 83 viruses with a high importance in the prediction of CEE, 58 viruses were 513

positively and 25 negatively correlated with carbon export efficiency. Most of these 514

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 23: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

23

viruses belong to Prasinovirus, Mimiviridae, and Picornavirales and are either new to 515

science or with no known roles in carbon export efficiency. Thirty-six of these 516

“select” viruses were predicted to infect ecologically important hosts such as green 517

algae of the order Mamiellales, haptophytes, diatoms, and copepods. Positively 518

associated viruses had more predicted interactions with silicified eukaryotes than non-519

associated viruses did. Overall, these results imply that the effect of viruses on the 520

“shuttle” and “shunt” processes could be dependent on viral hosts and ecosystem 521

dynamics. 522

Limitations of the study 523

The observed statistical associations between viral compositions and examined 524

parameters (i.e., CEE, CE and NPP) do not convey the information about the direction 525

of their potential causality relationships, and they could even result from indirect 526

relationships as discussed above. Certain groups of viruses detected in samples may 527

be over- or under-represented because of the technical limitations in size 528

fractionation, DNA/RNA extraction and sequencing. 529

Resource Availability 530

Lead Contact 531

Further information and requests for resources and reagents should be directed to and 532

will be fulfilled by Lead Contact, Hiroyuki Ogata ([email protected]). 533

Materials Availability 534

This study did not generate unique reagent. 535

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 24: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

24

Data and Code Availability 536

The authors declare that the data supporting the findings of this study are available 537

within the paper and its supplemental files, as well as at the GenomeNet FTP: 538

ftp://ftp.genome.jp/pub/db/community/tara/Cpump/Supplementary_material/. 539

Our custom R script used to test for association between viruses and environmental 540

variables (CEE, CE150 and NPP) is available along with input data at the GenomeNet 541

FTP: 542

ftp://ftp.genome.jp/pub/db/community/tara/Cpump/Supplementary_material/PLSreg/. 543

The Taxon Interaction Mapper (TIM) tool developed for this study and used for virus 544

host prediction is available at https://github.com/RomainBlancMathieu/TIM. 545

Supplemental Files 546

Supplemental_Information.pdf: supplemental figures and tables, and transparent 547

methods 548

Supplemental_Data_1_2.xlsx 549

Acknowledgements 550

We thank the Tara Oceans consortium, the projects Oceanomics and France 551

Genomique (grants ANR-11-BTBR-0008 and ANR-10-INBS-09), and the people and 552

sponsors who supported the Tara Oceans Expedition (http://www.embl.de/tara-553

oceans/) for making the data accessible. This is contribution number XXX of the Tara 554

Oceans Expedition 2009–2013. Computational time was provided by the 555

SuperComputer System, Institute for Chemical Research, Kyoto University. We thank 556

Barbara Goodson, Ph.D., and Sara J. Mason, M.Sc., from Edanz Group (https://en-557

author-services.edanzgroup.com/) for editing a draft of this manuscript. This work 558

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 25: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

25

was supported by JSPS/KAKENHI (Nos. 26430184, 18H02279, and 19H05667 to 559

H.O. and Nos. 19K15895 and 19H04263 to H.E.), Scientific Research on Innovative 560

Areas from the Ministry of Education, Culture, Science, Sports and Technology 561

(MEXT) of Japan (Nos. 16H06429, 16K21723, and 16H06437 to H.O.), the 562

Collaborative Research Program of the Institute for Chemical Research, Kyoto 563

University (2019-29 to S.C.), the Future Development Funding Program of the Kyoto 564

University Research Coordination Alliance (to R.B.M.), the ICR-KU International 565

Short-term Exchange Program for Young Researchers (to S.C.), and the Research 566

Unit for Development of Global Sustainability (to H.O. and T.O.D.). 567

Author contributions 568

H.O. and R.B.M. conceived the study. R.B.M and H.K. performed most of the 569

analyses. H.E. and L.G. designed carbon export analysis. R.H.V and S.C. performed 570

network analysis. N.H. and C.d.V. analyzed eukaryotic sequences. T.O.D., M.G., P.F. 571

and O.J. analyzed viral MAGs. C.H.N. and H.M. contributed to statistical analysis. 572

M.B.S. and C.A.S. contributed to interpretations. All authors edited and approved the 573

final version of the manuscript. 574

Declaration of Interests 575

The authors declare no competing interests. 576

References 577

Agusti, S., González-Gordillo, J.I., Vaqué, D., Estrada, M., Cerezo, M.I., Salazar, G., 578 Gasol, J.M., and Duarte, C.M. (2015). Ubiquitous healthy diatoms in the deep sea 579 confirm deep carbon injection by the biological pump. Nat. Commun. 6, 7608. 580

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 26: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

26

Baudoux, A., Noordeloos, A., Veldhuis, M., and Brussaard, C. (2006). Virally 581 induced mortality of Phaeocystis globosa during two spring blooms in temperate 582 coastal waters. Aquat. Microb. Ecol. 44, 207–217. 583

Bochdansky, A.B., Clouse, M.A., and Herndl, G.J. (2017). Eukaryotic microbes, 584 principally fungi and labyrinthulomycetes, dominate biomass on bathypelagic marine 585 snow. ISME J. 11, 362–373. 586

Boyce, D.G., Lewis, M.R., and Worm, B. (2010). Global phytoplankton decline over 587 the past century. Nature 466, 591–596. 588

Brum, J.R., Ignacio-Espinoza, J.C., Roux, S., Doulcier, G., Acinas, S.G., Alberti, A., 589 Chaffron, S., Cruaud, C., Vargas, C. de, Gasol, J.M., et al. (2015). Patterns and 590 ecological drivers of ocean viral communities. Science 348, 1261498. 591

Brussaard, C.P.D., Bratbak, G., Baudoux, A.-C., and Ruardij, P. (2007). Phaeocystis 592 and its interaction with viruses. Biogeochemistry 83, 201–215. 593

Buesseler, K.O., and Boyd, P.W. (2009). Shedding light on processes that control 594 particle export and flux attenuation in the twilight zone of the open ocean. Limnol. 595 Oceanogr. 54, 1210–1232. 596

Carradec, Q., Pelletier, E., Silva, C.D., Alberti, A., Seeleuthner, Y., Blanc-Mathieu, 597 R., Lima-Mendez, G., Rocha, F., Tirichine, L., Labadie, K., et al. (2018). A global 598 ocean atlas of eukaryotic genes. Nat. Commun. 9, 373. 599

Coenen, A.R., and Weitz, J.S. (2018). Limitations of Correlation-Based Inference in 600 Complex Virus-Microbe Communities. MSystems 3, e00084-18. 601

Culley, A. (2018). New insight into the RNA aquatic virosphere via viromics. Virus 602 Res. 244, 84–89. 603

Dayaram, A., Goldstien, S., Argüello-Astorga, G.R., Zawar-Reza, P., Gomez, C., 604 Harding, J.S., and Varsani, A. (2015). Diverse small circular DNA viruses circulating 605 amongst estuarine molluscs. Infect. Genet. Evol. J. Mol. Epidemiol. Evol. Genet. 606 Infect. Dis. 31, 284–295. 607

Doney, S.C., Fabry, V.J., Feely, R.A., and Kleypas, J.A. (2009). Ocean Acidification: 608 The Other CO 2 Problem. Annu. Rev. Mar. Sci. 1, 169–192. 609

Dunlap, D.S., Ng, T.F.F., Rosario, K., Barbosa, J.G., Greco, A.M., Breitbart, M., and 610 Hewson, I. (2013). Molecular and microscopic evidence of viruses in marine 611 copepods. Proc. Natl. Acad. Sci. 110, 1375–1380. 612

Endo, H., Sugie, K., Yoshimura, T., and Suzuki, K. (2016). Response of Spring 613 Diatoms to CO2 Availability in the Western North Pacific as Determined by Next-614 Generation Sequencing. PLOS ONE 11, e0154291. 615

Endo, H., Ogata, H., and Suzuki, K. (2018). Contrasting biogeography and diversity 616 patterns between diatoms and haptophytes in the central Pacific Ocean. Sci. Rep. 8, 617 10916. 618

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 27: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

27

Evans, C., and Wilson, W.H. (2008). Preferential grazing of Oxyrrhis marina on virus 619 infected Emiliania huxleyi. Limnol. Oceanogr. 53, 2035–2040. 620

Fawcett, S.E., Lomas, M.W., Casey, J.R., Ward, B.B., and Sigman, D.M. (2011). 621 Assimilation of upwelled nitrate by small eukaryotes in the Sargasso Sea. Nat. 622 Geosci. 4, 717–722. 623

Fuhrman, J.A. (1999). Marine viruses and their biogeochemical and ecological 624 effects. Nature 399, 541–548. 625

Gobler, C.J., Hutchins, D.A., Fisher, N.S., Cosper, E.M., and Saňudo‐ Wilhelmy, 626 S.A. (1997). Release and bioavailability of C, N, P Se, and Fe following viral lysis of 627 a marine chrysophyte. Limnol. Oceanogr. 42, 1492–1504. 628

Goic, B., and Saleh, M.-C. (2012). Living with the enemy: viral persistent infections 629 from a friendly viewpoint. Curr. Opin. Microbiol. 15, 531–537. 630

Goode, A.G., Fields, D.M., Archer, S.D., and Martínez, J.M. (2019). Physiological 631 responses of Oxyrrhis marina to a diet of virally infected Emiliania huxleyi. PeerJ 7, 632 e6722. 633

Guidi, L., Chaffron, S., Bittner, L., Eveillard, D., Larhlimi, A., Roux, S., Darzi, Y., 634 Audic, S., Berline, L., Brum, J.R., et al. (2016). Plankton networks driving carbon 635 export in the oligotrophic ocean. Nature 532, 465. 636

Hare, C., Leblanc, K., DiTullio, G., Kudela, R., Zhang, Y., Lee, P., Riseman, S., and 637 Hutchins, D. (2007). Consequences of increased temperature and CO2 for 638 phytoplankton community structure in the Bering Sea. Mar. Ecol. Prog. Ser. 352, 9–639 16. 640

Hingamp, P., Grimsley, N., Acinas, S.G., Clerissi, C., Subirana, L., Poulain, J., 641 Ferrera, I., Sarmento, H., Villar, E., Lima-Mendez, G., et al. (2013). Exploring 642 nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes. ISME 643 J. 7, 1678–1695. 644

Hirata, T., Hardman-Mountford, N.J., Brewin, R.J.W., Aiken, J., Barlow, R., Suzuki, 645 K., Isada, T., Howell, E., Hashioka, T., Noguchi-Aita, M., et al. (2011). Synoptic 646 relationships between surface Chlorophyll-a and diagnostic pigments specific to 647 phytoplankton functional types. Biogeosciences 8, 311–327. 648

Hurwitz, B.L., Brum, J.R., and Sullivan, M.B. (2015). Depth-stratified functional and 649 taxonomic niche specialization in the “core” and “flexible” Pacific Ocean Virome. 650 ISME J. 9, 472–484. 651

Iversen, M.H., and Ploug, H. (2010). Ballast minerals and the sinking carbon flux in 652 the ocean: carbon-specific respiration rates and sinking velocity of marine snow 653 aggregates. Biogeosciences 7, 2613–2624. 654

Karl, D.M., Church, M.J., Dore, J.E., Letelier, R.M., and Mahaffey, C. (2012). 655 Predictable and efficient carbon sequestration in the North Pacific Ocean supported 656 by symbiotic nitrogen fixation. Proc. Natl. Acad. Sci. 109, 1842–1849. 657

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 28: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

28

Klaas, C., and Archer, D.E. (2002). Association of sinking organic matter with 658 various types of mineral ballast in the deep sea: Implications for the rain ratio. Glob. 659 Biogeochem. Cycles 16, 63-1-63–14. 660

Laber, C.P., Hunter, J.E., Carvalho, F., Collins, J.R., Hunter, E.J., Schieler, B.M., 661 Boss, E., More, K., Frada, M., Thamatrakoln, K., et al. (2018). Coccolithovirus 662 facilitation of carbon export in the North Atlantic. Nat. Microbiol. 3, 537–547. 663

Lawrence, J.E., and Suttle, C.A. (2004). Effect of viral infection on sinking rates of 664 Heterosigma akashiwo and its implications for bloom termination. Aquat. Microb. 665 Ecol. 37, 1–7. 666

Lawrence, J.E., Chan, A.M., and Suttle, C.A. (2002). Viruses causing lysis of the 667 toxic bloom-forming alga Heterosigma akashiwo (Raphidophyceae) are widespread in 668 coastal sediments of British Columbia, Canada. Limnol. Oceanogr. 47, 545–550. 669

Leblanc, K., Quéguiner, B., Diaz, F., Cornet, V., Michel-Rodriguez, M., Durrieu de 670 Madron, X., Bowler, C., Malviya, S., Thyssen, M., Grégori, G., et al. (2018). 671 Nanoplanktonic diatoms are globally overlooked but play a role in spring blooms and 672 carbon export. Nat. Commun. 9, 953. 673

Li, W. (1995). Composition of Ultraphytoplankton in the Central North-Atlantic. Mar. 674 Ecol. Prog. Ser. 122, 1–8. 675

Liu, H., Probert, I., Uitz, J., Claustre, H., Aris-Brosou, S., Frada, M., Not, F., and de 676 Vargas, C. (2009). Extreme diversity in noncalcifying haptophytes explains a major 677 pigment paradox in open oceans. Proc. Natl. Acad. Sci. U. S. A. 106, 12803–12808. 678

Lomas, M.W., and Moran, S.B. (2011). Evidence for aggregation and export of 679 cyanobacteria and nano-eukaryotes from the Sargasso Sea euphotic zone. 680 Biogeosciences 8, 203–216. 681

Lønborg, C., Middelboe, M., and Brussaard, C.P.D. (2013). Viral lysis of 682 Micromonas pusilla: impacts on dissolved organic matter production and 683 composition. Biogeochemistry 116, 231–240. 684

Mars Brisbin, M., Mesrop, L.Y., Grossmann, M.M., and Mitarai, S. (2018). Intra-host 685 Symbiont Diversity and Extended Symbiont Maintenance in Photosymbiotic 686 Acantharea (Clade F). Front. Microbiol. 9. 687

Monier, A., Pagarete, A., de Vargas, C., Allen, M.J., Read, B., Claverie, J.-M., and 688 Ogata, H. (2009). Horizontal gene transfer of an entire metabolic pathway between a 689 eukaryotic alga and its DNA virus. Genome Res. 19, 1441–1449. 690

Monier, A., Worden, A.Z., and Richards, T.A. (2016). Phylogenetic diversity and 691 biogeography of the Mamiellophyceae lineage of eukaryotic phytoplankton across the 692 oceans. Environ. Microbiol. Rep. 8, 461–469. 693

Moniruzzaman, M., Wurch, L.L., Alexander, H., Dyhrman, S.T., Gobler, C.J., and 694 Wilhelm, S.W. (2017). Virus-host relationships of marine single-celled eukaryotes 695 resolved from metatranscriptomics. Nat. Commun. 8, 16054. 696

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 29: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

29

Mousing, E., Ellegaard, M., and Richardson, K. (2014). Global patterns in 697 phytoplankton community size structure—evidence for a direct temperature effect. 698 Mar. Ecol. Prog. Ser. 497, 25–38. 699

Nissimov, J.I., Vandzura, R., Johns, C.T., Natale, F., Haramaty, L., and Bidle, K.D. 700 (2018). Dynamics of transparent exopolymer particle production and aggregation 701 during viral infection of the coccolithophore, Emiliania huxleyi. Environ. Microbiol. 702 20, 2880–2897. 703

Peduzzi, P., and Weinbauer, M.G. (1993). Effect of concentrating the virus-rich 2-704 2nm size fraction of seawater on the formation of algal flocs (marine snow). Limnol. 705 Oceanogr. 38, 1562–1565. 706

Petrou, K., Baker, K.G., Nielsen, D.A., Hancock, A.M., Schulz, K.G., and Davidson, 707 A.T. (2019). Acidification diminishes diatom silica production in the Southern Ocean. 708 Nat. Clim. Change 9, 781–786. 709

Proctor, L.M., and Fuhrman, J.A. (1991). Roles of viral infection in organic particle 710 flux. Mar. Ecol. Prog. Ser. 69, 133–142. 711

Riebesell, U., Kortzinger, A., and Oschlies, A. (2009). Sensitivities of marine carbon 712 fluxes to ocean change. Proc. Natl. Acad. Sci. 106, 20602–20609. 713

Salazar, G., Paoli, L., Alberti, A., Huerta-Cepas, J., Ruscheweyh, H.-J., Cuenca, M., 714 Field, C.M., Coelho, L.P., Cruaud, C., Engelen, S., et al. (2019). Gene Expression 715 Changes and Community Turnover Differentially Shape the Global Ocean 716 Metatranscriptome. Cell 179, 1068-1083.e21. 717

Sheyn, U., Rosenwasser, S., Lehahn, Y., Barak-Gavish, N., Rotkopf, R., Bidle, K.D., 718 Koren, I., Schatz, D., and Vardi, A. (2018). Expression profiling of host and virus 719 during a coccolithophore bloom provides insights into the role of viral infection in 720 promoting carbon export. ISME J. 1. 721

Shi, M., Lin, X.-D., Tian, J.-H., Chen, L.-J., Chen, X., Li, C.-X., Qin, X.-C., Li, J., 722 Cao, J.-P., Eden, J.-S., et al. (2016). Redefining the invertebrate RNA virosphere. 723 Nature 540, 539–543. 724

Shiozaki, T., Hirose, Y., Hamasaki, K., Kaneko, R., Ishikawa, K., and Harada, N. 725 (2019). Eukaryotic Phytoplankton Contributing to a Seasonal Bloom and Carbon 726 Export Revealed by Tracking Sequence Variants in the Western North Pacific. Front. 727 Microbiol. 10. 728

Sugie, K., Fujiwara, A., Nishino, S., Kameyama, S., and Harada, N. (2020). Impacts 729 of Temperature, CO2, and Salinity on Phytoplankton Community Composition in the 730 Western Arctic Ocean. Front. Mar. Sci. 6, 821. 731

Sullivan, M.B., Weitz, J.S., and Wilhelm, S. (2017). Viral ecology comes of age. 732 Environ. Microbiol. Rep. 9, 33–35. 733

Sunagawa, S., Coelho, L.P., Chaffron, S., Kultima, J.R., Labadie, K., Salazar, G., 734 Djahanschiri, B., Zeller, G., Mende, D.R., Alberti, A., et al. (2015). Ocean plankton. 735 Structure and function of the global ocean microbiome. Science 348, 1261359. 736

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 30: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

30

Suttle, C.A. (2007). Marine viruses--major players in the global ecosystem. Nat. Rev. 737 Microbiol. 5, 801–812. 738

Takao, Y., Nagasaki, K., Mise, K., Okuno, T., and Honda, D. (2005). Isolation and 739 characterization of a novel single-stranded RNA Virus infectious to a marine fungoid 740 protist, Schizochytrium sp. (Thraustochytriaceae, Labyrinthulea). Appl. Environ. 741 Microbiol. 71, 4516–4522. 742

Tomaru, Y., Hata, N., Masuda, T., Tsuji, M., Igata, K., Masuda, Y., Yamatogi, T., 743 Sakaguchi, M., and Nagasaki, K. (2007). Ecological dynamics of the bivalve-killing 744 dinoflagellate Heterocapsa circularisquama and its infectious viruses in different 745 locations of western Japan. Environ. Microbiol. 9, 1376–1383. 746

Tomaru, Y., Fujii, N., Oda, S., Toyoda, K., and Nagasaki, K. (2011). Dynamics of 747 diatom viruses on the western coast of Japan. Aquat. Microb. Ecol. 63, 223–230. 748

Tréguer, P., Bowler, C., Moriceau, B., Dutkiewicz, S., Gehlen, M., Aumont, O., 749 Bittner, L., Dugdale, R., Finkel, Z., Iudicone, D., et al. (2018). Influence of diatom 750 diversity on the ocean biological carbon pump. Nat. Geosci. 11, 27–37. 751

Turner, J.T. (2015). Zooplankton fecal pellets, marine snow, phytodetritus and the 752 ocean’s biological pump. Prog. Oceanogr. 130, 205–248. 753

Urayama, S., Takaki, Y., Nishi, S., Yoshida‐ Takashima, Y., Deguchi, S., Takai, K., 754 and Nunoura, T. (2018). Unveiling the RNA virosphere associated with marine 755 microorganisms. Mol. Ecol. Resour. 18, 1444–1455. 756

Van Etten, J.L., Agarkova, I., Dunigan, D.D., Tonetti, M., De Castro, C., and Duncan, 757 G.A. (2017). Chloroviruses Have a Sweet Tooth. Viruses 9. 758

Wang, H., Wu, S., Li, K., Pan, Y., Yan, S., and Wang, Y. (2018). Metagenomic 759 analysis of ssDNA viruses in surface seawater of Yangshan Deep-Water Harbor, 760 Shanghai, China. Mar. Genomics 41, 50–53. 761

Ward, B.A., and Follows, M.J. (2016). Marine mixotrophy increases trophic transfer 762 efficiency, mean organism size, and vertical carbon flux. Proc. Natl. Acad. Sci. 113, 763 2958–2963. 764

Weinbauer, M.G. (2004). Ecology of prokaryotic viruses. FEMS Microbiol. Rev. 28, 765 127–181. 766

Weitz, J.S., Stock, C.A., Wilhelm, S.W., Bourouiba, L., Coleman, M.L., Buchan, A., 767 Follows, M.J., Fuhrman, J.A., Jover, L.F., Lennon, J.T., et al. (2015). A multitrophic 768 model to quantify the effects of marine viruses on microbial food webs and ecosystem 769 processes. ISME J. 9, 1352–1364. 770

Wilhelm, S.W., and Suttle, C.A. (1999). Viruses and Nutrient Cycles in the 771 SeaViruses play critical roles in the structure and function of aquatic food webs. 772 BioScience 49, 781–788. 773

Yamada, Y., Tomaru, Y., Fukuda, H., and Nagata, T. (2018). Aggregate Formation 774 During the Viral Lysis of a Marine Diatom. Front. Mar. Sci. 5. 775

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 31: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

31

Yau, S., Krasovec, M., Benites, L.F., Rombauts, S., Groussin, M., Vancaester, E., 776 Aury, J.-M., Derelle, E., Desdevises, Y., Escande, M.-L., et al. (2020). Virus-host 777 coexistence in phytoplankton through the genomic lens. Sci. Adv. 6, eaay2587. 778

Yoshida, M., Noël, M.-H., Nakayama, T., Naganuma, T., and Inouye, I. (2006). A 779 haptophyte bearing siliceous scales: ultrastructure and phylogenetic position of 780 Hyalolithus neolepis gen. et sp. nov. (Prymnesiophyceae, Haptophyta). Protist 157, 781 213–234. 782

Yoshikawa, G., Blanc-Mathieu, R., Song, C., Kayama, Y., Mochizuki, T., Murata, K., 783 Ogata, H., and Takemura, M. (2019). Medusavirus, a novel large DNA virus 784 discovered from hot spring water. J. Virol. JVI.02130-18. 785

Zhang, C., Dang, H., Azam, F., Benner, R., Legendre, L., Passow, U., Polimene, L., 786 Robinson, C., Suttle, C.A., and Jiao, N. (2018). Evolving paradigms in biological 787 carbon cycling in the ocean. Natl. Sci. Rev. 5, 481–499. 788

Zhu, F., Massana, R., Not, F., Marie, D., and Vaulot, D. (2005). Mapping of 789 picoeucaryotes in marine ecosystems with quantitative PCR of the 18S rRNA gene. 790 FEMS Microbiol. Ecol. 52, 79–92. 791

Zimmerman, A.E., Howard-Varona, C., Needham, D.M., John, S.G., Worden, A.Z., 792 Sullivan, M.B., Waldbauer, J.R., and Coleman, M.L. (2019a). Metabolic and 793 biogeochemical consequences of viral infection in aquatic ecosystems. Nat. Rev. 794 Microbiol. 1–14. 795

Zimmerman, A.E., Bachy, C., Ma, X., Roux, S., Jang, H.B., Sullivan, M.B., 796 Waldbauer, J.R., and Worden, A.Z. (2019b). Closely related viruses of the marine 797 picoeukaryotic alga Ostreococcus lucimarinus exhibit different ecological strategies. 798 Environ. Microbiol. 21, 2148–2170. 799

800

801

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 32: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

32

Figure legends 802

Figure 1: Viruses of eukaryotic plankton identified in Tara Oceans samples are 803

distantly related to characterized viruses. Unrooted maximum likelihood 804

phylogenetic trees containing environmental (black) and reference (red) viral 805

sequences for NCLDV DNA polymerase family B (a), RNA virus RNA-dependent 806

RNA polymerase (b), and ssDNA virus replication-associated protein (c). A 807

rectangular representation of these trees with branch support values is provided in 808

Figure S2–S4. 809

810

Figure 2: Carbon export efficiency and relative marker-gene occurrence of 811

eukaryotic plankton viruses along the sampling route. a Carbon export efficiency 812

estimated at 39 Tara Oceans stations where surface and DCM layers were sampled 813

for prokaryote-enriched metagenomes and eukaryotic metatranscriptomes. b and c 814

Relative marker-gene occurrence of major groups of viruses of eukaryotic plankton 815

for NCLDVs in metagenomes (b) and for RNA and ssDNA viruses in 816

metatranscriptomes (c) at 59 sampling sites. 817

818

Figure 3: Relative abundance of eukaryotic plankton viruses associated with 819

carbon export efficiency in the global ocean. a Bivariate plot between predicted and 820

observed values in a leave-one-out cross-validation test for carbon export efficiency. 821

The PLS regression model was constructed using occurrence profiles of 1,523 822

marker-gene sequences (1,309 PolBs, 180 RdRPs and 34 Reps) derived from 823

environmental samples. r, Pearson correlation coefficient; R2, the coefficient of 824

determination between measured response values and predicted response values. R2, 825

which was calculated as 1 – SSE/SST (sum of squares due to error and total) 826

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 33: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

33

measures how successful the fit is in explaining the variance of the response values. 827

The significance of the association was assessed using a permutation test (n = 10,000) 828

(grey histogram in a). The red diagonal line shows the theoretical curve for perfect 829

prediction. b Pearson correlation coefficients between CEE and occurrence profiles of 830

83 viruses that have VIP scores > 2 (VIPs) with the first two components in the PLS 831

regression model using all samples. PLS components 1 and 2 explained 83% and 11% 832

of the variance of CEE, respectively. Fifty-eight VIPs had positive regression 833

coefficients in the model (shown with circles), and 25 had negative regression 834

coefficients (shown with triangles). 835

836

Figure 4: Biogeography of viruses associated with carbon export efficiency. The 837

upper panel shows carbon export efficiency (CEE = CEdeep/CEsurface) for 59 sampling 838

sites. The bottom panel is a map reflecting relative abundances, expressed as centered 839

log-ratio transformed, gene-length normalized read counts of viruses positively and 840

negatively associated with CEE that have VIP scores > 2 (VIPs). MS, Mediterranean 841

Sea; IO, Indian Ocean; SAO, South Atlantic Ocean; SPO, South Pacific Ocean; NPO, 842

North Pacific Ocean; NAO, North Atlantic Ocean. The bottom horizontal axis is 843

labeled with Tara Oceans station numbers, sampling depth (SRF, surface; DCM, deep 844

chlorophyll maximum), and abbreviations of biogeographic provinces. Viruses 845

labeled in red correspond to positive VIPs that tend more represented in more than 846

one biogeographic province. 847

848

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 34: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

34

Tables 849

Table 1: Taxonomic breakdown of viral marker genes 850 Viruses Identified Used in PLS

regression*

NC

LD

Vs Mimiviridae 2,923 1,148

Phycodnaviridae 348 99

Iridoviridae 198 59

Other NCLDVs ** 17 3

Total 3,486 1,309

RN

A v

iru

se

s

Picornavirales (ssRNA+) 325 80

Partitiviridae (dsRNA) 131 22

Narnaviridae (ssRNA+) 95 6

Other families 289 53

Unclassified 78 9

RNA viruses 57 10

Total 975 180

ssD

NA

viru

se

s

Circoviridae 201 22

Geminiviridae 4 0

Nanoviridae 4 0

Unclassified 39 2

ssDNA viruses 51 10

Total 299 34

All 4,760 1,523

* The marker genes had to occurred in at least five samples and harbor a Spearman correlation 851 coefficient > |0.2| with carbon export efficiency. 852 ** There was no unclassified NCLDV. 853 854

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 35: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

Tree scale: 0.1

CRESS6

CRESS4

Smacoviridae Nanoviridae CRESS5

Genomoviridae

Geminiviridae

CRESS1

BacilladnaviridaeCRESS2CRESS3

Circoviridae

�����������

Narnaviridae

MononegaviralesTymovirales

Totiviridae

Partitiviridae

Picornavirales

TombusviridaeEndornaviridae

Bunyavirales

Flaviviridae

Bromoviridae

Bunyavirales

Other familiesOther families

Figure 1

ca b

�������������

Mimivirus

Figure 1

AaV

PoV

IridoviridaeAscoviridaePithovirusMarseilleviridae

YSLPVs

PBCVs

Prasinovirus

Unclassified NCLDVs

HaVCroV

Mimiviridae

Phycodnaviridae

OLPVs

CeV

PgV, PpV, HeV-RF02, PkV-RF02, PpDNAV

Figure 1-4

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 36: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

Other NCLDVs

Other ssDNA virusesCircoviridae

Mimiviridae

a

b

c

Picornavirales Partitiviridae Narnaviridae Other RNA viruses

Phycodnaviridae Iridoviridae

Longitude

Latit

ude

Figure 2

●●●●●●●

●●●●

●●●

●●●●

●●●●●

●●●● ● ●●

● ●

●●●

●●

●●●●

●● ●

23

25 30

3638 39

41

52

64

65

66

70

72

76

78

9398

100

102

109

110

111

122

123

124

125

128

132137

138

142

145146

148149

150

151

152

−50

0

50

−100 0 100 200long

lat

CEE●

●●

0.25

0.50

0.75

1.00

1.25

●●●●●●●

●●●●

●●●

●●●●

●●●●●

●●●● ● ●●

● ●

●●●

●●

●●●●

●● ●

23

25 30

3638 39

41

52

64

65

66

70

72

76

78

9398

100

102

109

110

111

122

123

124

125

128

132137

138

142

145146

148149

150

151

152

−50

0

50

−100 0 100 200long

lat

CEE●

●●

0.25

0.50

0.75

1.00

1.25

0

50

-50

-100 1000

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 37: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

Observed carbon export efficiency

Pred

icted

car

bon

expo

rt ef

ficie

ncy

●●

●●

●●

●●

●●●

●●●

●●

●●●

●● ●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

0.0 0.5 1.0 1.5

0.0

0.5

1.0

1.5ba

R2 = 0.67

r = 0.84

P < 10-4

Mimiviridae (n=34, 25 pos/9 neg)

Phycodnaviridae (n=24, 18 pos/6 neg)

Iridoviridae (n=2, 0 pos/2 neg)

Positive (pos)Sign of regression coefficient :

Picornavirales (n=19, 13 pos/6 neg)

Hepeviridae (n=2, 0 pos/2 neg)

Partitiviridae (n=1, 1 pos/0 neg)

Circoviridae (n=1, 1 pos/0 neg)

Taxonomy :

Negative (neg)

PLS component 1

PLS

com

pone

nt 2

−1.0 −0.5 0.0 0.5 1.0−1.0

−0.5

0.0

0.5

1.0

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

CEE

Figure 3

Observed carbon export efficiency

Pre

dic

ted c

arb

on e

xport

effic

iency

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 38: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

CEE

0.00.20.40.60.81.01.2

23_D

CM

23_SRF

25_D

CM

25_SRF

30_D

CM

30_SRF

36_D

CM

36_SRF

38_D

CM

38_SRF

39_D

CM

41_D

CM

41_SRF

52_D

CM

52_SRF

64_D

CM

64_SRF

65_SRF

66_SRF

70_SRF

72_D

CM

72_SRF

76_D

CM

76_SRF

78_D

CM

78_SRF

93_SRF

98_SRF

100_DCM

100_SR

F102_DCM

102_SR

F109_DCM

109_SR

F110_DCM

110_SR

F111_DCM

111_SR

F122_SR

F123_SR

F124_SR

F125_SR

F128_DCM

128_SR

F132_SR

F137_DCM

137_SR

F138_SR

F142_DCM

142_SR

F145_SR

F146_SR

F148_SR

F149_SR

F150_DCM

150_SR

F151_DCM

151_SR

F152_SR

Frdrp_8855752rdrp_36496887rdrp_9164163rdrp_54294427rdrp_35713768

polb_013433452rdrp_8626697rdrp_9164160rdrp_32150309rdrp_32150057

polb_000172102polb_000159717polb_013294654rdrp_35179764rdrp_33049404

polb_000161220polb_004804559polb_000249217rdrp_30787766

polb_010288541polb_001175669polb_000079111polb_001527691polb_003319665polb_007771300polb_000042601polb_000200745polb_003580241rdrp_32202687

polb_007102163polb_001064263polb_015514497polb_014364115polb_000396610polb_000490625polb_000061559polb_000061999rdrp_36505302

polb_015907472polb_000230224polb_002503270polb_000194282polb_008001141polb_000503865polb_002682999polb_000030837polb_000328966Rep_38177659polb_000129518polb_000079365polb_000229407polb_004312996polb_000912507polb_000435873polb_000239928polb_002035391polb_007503502polb_000248170polb_000073352polb_000054135polb_000249074polb_000232032polb_000073545polb_000026723polb_003145223rdrp_77677770rdrp_84897402

polb_004775027polb_000240662polb_007423474polb_000844241polb_000673383polb_000079078polb_000110630rdrp_49554577rdrp_105714054polb_000236849rdrp_59731273rdrp_42335229rdrp_107558617polb_000251540rdrp_77677810

polb_000495602

polb_000172102polb_000159717polb_013294654

polb_004804559

polb_000079111

polb_007771300

polb_000200745

polb_000396610

polb_002503270polb_000194282

polb_000229407

polb_000435873polb_000239928polb_002035391

polb_000248170

MS IO SAO SPO NAONPO

MEDI

AR

AB

MONS

ISS

G

EA

FR

BE

NG

SATL

CH

IL

SP

SG

PE

OD

CH

IL

SPSG

PE

OD

NP

ST

PN

EC

CA

RB

GF

ST

NA

ST

-W

NA

ST

-E

-5−5 0 5

50

CLR

gene-length

normalized

read count:

Negative V

IPs

Positiv

e V

IPs

Figure 4

CEE

0.00.20.40.60.81.01.2

23_D

CM

23_SRF

25_D

CM

25_SRF

30_D

CM

30_SRF

36_D

CM

36_SRF

38_D

CM

38_SRF

39_D

CM

41_D

CM

41_SRF

52_D

CM

52_SRF

64_D

CM

64_SRF

65_SRF

66_SRF

70_SRF

72_D

CM

72_SRF

76_D

CM

76_SRF

78_D

CM

78_SRF

93_SRF

98_SRF

100_DCM

100_SR

F102_DCM

102_SR

F109_DCM

109_SR

F110_DCM

110_SR

F111_DCM

111_SR

F122_SR

F123_SR

F124_SR

F125_SR

F128_DCM

128_SR

F132_SR

F137_DCM

137_SR

F138_SR

F142_DCM

142_SR

F145_SR

F146_SR

F148_SR

F149_SR

F150_DCM

150_SR

F151_DCM

151_SR

F152_SR

F

rdrp_8855752rdrp_36496887rdrp_9164163rdrp_54294427rdrp_35713768

polb_013433452rdrp_8626697rdrp_9164160rdrp_32150309rdrp_32150057

polb_000172102polb_000159717polb_013294654rdrp_35179764rdrp_33049404

polb_000161220polb_004804559polb_000249217rdrp_30787766

polb_010288541polb_001175669polb_000079111polb_001527691polb_003319665polb_007771300polb_000042601polb_000200745polb_003580241rdrp_32202687

polb_007102163polb_001064263polb_015514497polb_014364115polb_000396610polb_000490625polb_000061559polb_000061999rdrp_36505302

polb_015907472polb_000230224polb_002503270polb_000194282polb_008001141polb_000503865polb_002682999polb_000030837polb_000328966Rep_38177659polb_000129518polb_000079365polb_000229407polb_004312996polb_000912507polb_000435873polb_000239928polb_002035391polb_007503502polb_000248170polb_000073352polb_000054135polb_000249074polb_000232032polb_000073545polb_000026723polb_003145223rdrp_77677770rdrp_84897402

polb_004775027polb_000240662polb_007423474polb_000844241polb_000673383polb_000079078polb_000110630rdrp_49554577rdrp_105714054polb_000236849rdrp_59731273rdrp_42335229rdrp_107558617polb_000251540rdrp_77677810

polb_000495602

CE

E

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 39: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

1

Supplemental Information 1

Supplemental Figures 2 3

4 Figure S1: Distribution of the degree (%) of amino acid identity between environmental 5 sequences and their best BLAST hits to reference sequences for nucleocytoplasmic large 6 DNA viruses (NCLDVs) (a), RNA viruses (b), and ssDNA viruses (c). 7

a b c

Supplementary Figure 1

Supplemental Figures, Tables and Tranparent Methods

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 40: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

2

8 Figure S2: Maximum likelihood phylogenetic trees for NCLDV DNA polymerase family B. 9 Environmental sequences are shown in black and references in red. Approximate Shimodaira–10 Hasegawa (SH)-like local support values greater than 0.8 are shown. Scale bar indicates one 11 change per site. 12

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.9

0.9

1.0

0.7

0.7

1.0

1.0

0.9

0.9

1.0

0.6

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.8

0.9

0.8

1.0

0.8

1.0

1.0

0.8

1.0

0.9

0.5

1.0

1.01.0

1.0

0.9

0.9

1.0

0.7

1.0

1.0

1.0

1.0

1.0

0.6

1.0

0.9

1.0

0.7

1.0

0.9

0.5

0.9

0.9

1.0

0.8

0.8

0.9

1.0

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

0.9

1.0

1.0

0.6

1.0

1.0

0.6

1.0

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.9

1.0

0.8

0.8

0.7

1.0

0.9

1.0

0.9

1.0

1.0

0.9

1.0

1.0

0.7

0.8

1.0

0.9

1.0

1.0

1.01.0

1.0

0.8

1.0

0.8

1.0

0.8

1.00.7

1.0

0.9

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.8

0.9

1.0

1.0

0.9

1.0

0.8

1.0

0.9

1.0

1.0

1.0

1.0

0.8

1.0

0.8

0.90.8

0.9

0.7

1.0

0.9

1.0

1.0

1.0

1.0

0.8

0.8

0.7

0.9

1.0

1.0

1.0

1.0

0.7

1.0

0.7

1.0

1.01.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

0.9

0.8

1.0

0.7

1.0

1.0

0.9

0.8

0.61.0

0.6

1.0

1.0

1.0

0.8

1.0

1.0

0.7

1.0

0.6

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.8

1.0

1.0

1.0

0.6

1.0

0.6

0.9

1.0

1.0

0.8

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

0.7

1.0

1.0

1.0

1.0

0.8 1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

0.7

1.0

1.0

1.0

0.9

0.8

0.6

0.6

0.8

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.00.9

1.0

1.0

1.0

0.9

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.71.0

1.0

1.0

0.8

0.9

1.0

1.0

0.9

0.8

1.0

1.0

1.0

1.0

0.7

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.9

0.9

0.9

1.0

1.0

0.8

1.0

0.8

1.01.0

0.8

1.0

0.7

1.0

1.0

0.9

0.7

1.0

1.01.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.90.7

1.0

0.6

1.0

1.0

0.9

1.0

1.0

0.7

0.9

1.0

0.9

1.0

0.8

1.0

1.0

1.0

1.0

0.8

1.0

0.9

1.0

0.9

1.0

0.7

1.0

1.0

1.0

0.5

1.01.0

1.0

0.8

1.0

1.0

1.0 1.0

1.0

1.0

1.0

0.9

0.7

0.6

1.0

1.0

1.0

1.0

0.7

1.0

1.01.0

1.01.0

0.8

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.6

0.9

0.9

1.0

1.0

1.0

1.0

1.0

0.7

1.0

0.8

1.0

1.0

0.8

1.0

1.0

1.0

1.0

0.8

1.0

1.0

1.0

1.0

0.8

0.9

1.0

0.8

0.5

1.0

1.0

1.0

0.8

0.9

1.0

0.70.8

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

0.5

0.9

1.0

1.01.0

0.8

0.9

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

0.9

1.0

0.8

0.6

1.0

1.0

0.9

0.9

0.7

1.0

0.8

1.0

0.9

1.0

1.0

0.7

0.9

1.0

0.9

1.0

1.0

0.90.8

1.0

1.0

1.0

1.0

0.5

0.6

1.0

1.0

1.0

0.9

0.7

1.0

1.0

1.0

0.7

1.0

1.0

0.6

1.0

1.0

1.0

0.8

1.0 1.0

1.0

0.9

0.9

0.8

1.0

1.0

0.9

1.0

0.7

1.0

1.0

1.0

1.0

0.7

0.9

0.9

0.6

1.0

1.0

1.0

0.7

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.90.8

0.7

1.0

0.9

0.9

1.0

0.7

0.9

1.0

0.7

1.0

1.0

1.0

0.7

1.0

0.6

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.6

1.0

1.0

0.9

0.8

1.0

0.7

1.0

0.9

1.0

0.8

1.0

1.0

0.6

0.9

0.9

1.01.0

1.0

1.0

1.0

1.0

1.0

0.5

1.0

1.0

0.5

1.0

0.9

1.0

1.01.0

1.0

1.00.9

0.9

0.8

0.8

1.0

0.8

0.5

0.9

0.7

1.0

1.0

0.8

0.9

1.0

1.0

1.0

0.5

1.0

1.0

1.0

1.0

1.0

1.0

0.7

1.0

0.7

0.7

0.8

0.9

0.8

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.8

1.0

0.9

0.8

0.9

1.0

0.7

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.8

1.00.9

1.0

1.0

0.8

0.9

0.7

1.0

1.0

1.0

0.8

1.0

1.0

1.0

0.8

1.0

0.6

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.6

1.0

1.0

1.0

1.0

1.0

1.00.8

1.0

1.0

0.9

1.0

1.0

0.9

1.0

0.9

1.0

1.0

0.6

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

0.9

0.7

1.0

1.0

1.0

0.8

1.0

0.8

0.8

1.0

1.0

1.0

0.8

1.0

0.8

0.9

1.0

1.0

0.8

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

0.5

1.0

1.0

0.5

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.5

1.01.0

1.0

1.01.0

1.0

1.0

1.0

1.0

1.0

1.0

0.8

0.9

0.9

0.6

0.6

1.0

1.0

0.6

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.8

1.0

1.0

0.7

1.0

1.0

0.9

1.0

1.0

1.0

0.8

0.9

1.0

1.0

1.0

0.5

0.9

1.0

0.9

0.7

0.8

0.9

1.0

0.6

0.5

1.0

1.01.0

1.0

1.0

0.5

0.7

1.0

1.0

0.7

0.8

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.9

1.0

1.0

0.9

0.9

1.0

0.9

0.7

1.0

0.8

0.61.0

0.7

1.0

1.0

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

0.8

1.0

0.6

1.0

1.0

0.7

0.9

0.9

1.0

1.0

0.7

1.0

1.0

0.7

1.0

0.8

0.8

1.0

0.9

1.0

0.8

1.0

1.0

0.6

1.0

1.0

1.0

0.9

0.9

0.9

0.9

0.5

Tree scale: 1 IridoviridaeAscoviridaePithovirusMarseilleviridae

Unclassified NCLDVsOthers/unclassified Phycodnaviridae

MimivirusCroV

AaV

PoV

OLPVs

CeV

PgV, PpV, HeV-RF02, PkV-RF02, PpDNAV

Prasinovirus

Phycodnaviridae

Mimiviridae

Supplementary Figure 2

Tree scale: 1

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 41: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

3

13 Figure S3: Unrooted maximum likelihood phylogenetic trees for RNA virus RNA-dependent 14 RNA polymerase. Environmental sequences are shown in black and references in red. 15 Approximate SH-like local support values greater than 0.8 are shown. Scale bar indicates one 16 change per site. 17

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.8

1.0

0.9

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

0.9

1.0

0.8

1.0

0.9

1.0

0.8

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.9

0.9

1.0

0.9

1.0

0.9

1.0

1.0

0.9

1.0

0.9

1.0

1.0

0.9

0.91.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.8

1.0

0.9

0.8

1.0

0.9

0.9

0.8

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

0.9

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

0.9

1.0

0.9 0.9

0.90.9

0.9

1.0

0.9

1.0

0.9

1.0 1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

0.9

1.0

1.0

0.81.0

0.9

1.0

0.9

0.8

1.0

0.9

0.8

0.9

0.9

1.0

1.0

1.0

0.9

0.8

0.8

1.0

1.0

0.9

1.0

1.0

0.8

0.9

0.9

1.0

0.9

0.9

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

1.01.0

0.9

1.0

0.9

1.0

1.0

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.8

0.9

0.9

1.0

1.0

1.0

0.9

1.0

0.9

0.9

1.0

1.0

1.0

0.9 0.9

0.9

1.0

0.9

0.9

1.0

0.9

0.9

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.01.0

1.0

0.9

1.0

1.0

0.9

0.9

0.8

1.0

0.9

1.0

0.8

0.9

0.8

1.0

0.9

1.0

0.9

1.0

1.0

0.9

0.8

1.0

0.8

0.9

1.0

1.0

0.9

0.9

0.8

0.9

1.0

0.8

1.0

1.0

1.0

1.0

1.0

1.0

0.80.9

1.0

1.0

0.8

1.0

1.0

0.9

0.8

0.9

0.9

1.0

0.9

0.9

1.00.8

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.9

0.8

0.9

0.9

1.0

0.9

0.9

1.0

1.0

1.00.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

0.9

0.9

0.9

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

0.9

0.8

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

0.9

0.8

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.8

0.9

0.9

1.0

1.0

1.0

1.01.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

0.8

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.01.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.00.9

0.9

1.0

1.0

1.0

1.0

0.8

0.8

1.0

0.8

1.0

1.0

1.0

0.8

1.0

1.0

1.0

1.0

0.8

0.9

1.0

1.0

1.01.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.8

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

1.0

0.9

0.8

0.9

1.0

0.8

0.9

1.0

0.9

0.9

1.0

1.0

0.9

1.0

0.9

1.0

0.9

1.0

0.8

1.0

0.9

0.9

1.0

0.9

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.8

0.9

1.0

0.9

1.0

1.0

1.0

0.9

0.9

1.0

0.90.9

1.0

1.0

1.0

1.0

0.8

1.0

0.9

0.9

0.9

1.0

1.0

1.01.0

1.0

1.0

1.0

0.9

0.9

1.0

0.9

1.0

0.9

1.0

0.8

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

1.0

1.0

0.8

1.0

0.9

1.0

0.9

1.0

1.0

0.9

1.0

1.0

0.9

0.8

0.8

1.0

0.9

0.9

0.9

1.0

0.8

1.0

0.8

0.9

1.0

1.0

1.0

0.8

1.0

0.9

0.9

1.0

1.0

1.0

1.0 1.0

0.9

0.9

1.0

1.0

0.8

0.9

0.8

1.0

1.0

0.9

0.9

0.9

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.8

1.0

1.0

1.0

0.9

0.9

1.0

0.9

1.0

1.0

0.8

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.01.0

1.0

0.8

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.8

1.0

1.0

1.0

1.0

0.9

0.8

0.9

0.8

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.9

0.8

1.0

1.0

1.0

0.9

0.8

1.0

1.0

0.9

1.0

0.9

1.0

1.0

0.9

1.0

1.0

1.0

1.0

0.9

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.8

1.0

0.9

1.0

0.9

1.0

1.0

1.0

0.9

0.9

1.0

1.0

0.9

1.0

0.9

0.9

1.0

1.0

1.0

0.8

0.9

1.0

0.9

0.9

0.9

1.0

0.9

1.0

1.0

1.0

1.0

0.9

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.8

0.8

1.0

0.8

0.8

0.8

1.0

0.8

1.0

0.9

1.0

0.8

0.9

0.9

0.9

Tree scale: 1

Supplementary Figure 3Tree scale: 1

Picornavirales

Partitiviridae

Totiviridae

Bunyavirales

Flaviviridae

Tymovirales

Bromoviridae

Other families

TombusviridaeEndornaviridae

Other families

Bunyavirales

Mononegavirales

Narnaviridae

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 42: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

4

18 Figure S4: Unrooted maximum likelihood phylogenetic trees for ssDNA virus replication-19 associated protein. Environmental sequences are shown in black and references in red. 20 Approximate SH-like local support values greater than 0.8 are shown. Scale bar indicates one 21 change per site. 22 23

0.9

1.0

1.0

0.9

0.9

1.0

0.9

1.0

1.0

1.0

1.0

0.9

0.9

0.91.0

1.01.0

1.0

0.9

0.9

1.0

0.8

1.0

1.0

0.9

1.0 1.0

0.9

0.9

1.0

1.00.9

1.0

0.9

1.0

1.0

0.9

1.0

1.0

1.0

0.9

0.9

0.9

0.9

0.8

1.0

0.9

0.9

1.0

0.9

0.8

1.0

1.0

1.0

1.0

0.9

0.8

0.9

0.9

1.0

1.0

1.0

0.9

0.9

0.9

1.0

0.9

0.9

0.8

1.0

1.0

0.9

0.8

0.8

0.9

0.8

0.9

0.9

0.9

1.0

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

0.9

0.9

0.9

0.9

1.0

0.9

1.0

0.91.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.81.0

0.9

1.0

1.0

0.9

1.0

0.9

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

0.9

0.9

1.0

0.9

1.0

1.0

0.9

0.9

1.0

0.9

0.9

0.9

1.0

0.9

1.0

1.0

1.0

0.9

0.8

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.9

0.9

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

0.90.8

0.8

1.0

0.9

1.0

1.0

0.9

0.9

1.0

1.0

0.8

0.9

1.0

1.0

0.8

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.80.9

0.9

0.9

1.0

0.9

0.9

1.0

1.0

1.0

0.9

0.9

1.0

0.8

0.9

1.0

0.8

1.0

1.0

0.9

0.9

1.0

0.9

1.0

1.0

1.0

0.9

0.9

0.8

0.9

1.0

0.9

1.0

1.0

0.9

0.9

1.0

1.0

1.0

1.0

0.9

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

1.0

0.9

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

1.0

0.90.8

1.0

1.0

1.0

1.00.8

1.0

1.0

0.9

0.9

1.0

0.9

0.8

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

0.9

0.81.0

1.0

0.9

1.0

1.0

1.0

1.0

0.9

0.8

0.8

0.8

1.0

1.0

1.0

0.8

0.9

0.8

1.0

0.9

1.0

0.9

0.9

0.9

1.0

0.9

1.0

1.0

0.9

0.9

1.0

1.0

1.0

0.8

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

0.9

1.0

1.0

0.8

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

0.9

0.9

0.9

1.0

1.0

1.0

0.9

0.9

0.9

0.9

1.0

0.9

1.0

0.9

1.0

1.0

0.9

0.9

0.9

1.0

1.0

0.9

1.0

0.9

0.9

0.8

0.9

0.9

1.0

0.9

0.9

1.0

1.0

1.0

0.8

1.0

0.9

0.9

0.9

1.0

1.0

1.0

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

1.01.00.9

1.0

0.9

0.9

1.0

1.0

0.9

1.0

0.9

1.0

1.0

1.0

1.0

1.0

1.0

0.9

0.9

1.0

1.0

0.8

1.0

0.9

0.9

1.0

0.9

1.0

1.0

0.9

1.0

1.0

1.0

0.9

1.0

1.0

0.8

0.9

1.0

0.9

1.0

0.9

0.9

0.8

1.0

0.9

0.9

0.9

1.0

0.8

1.0

0.80.9

1.0

1.0

0.9

0.8

1.0

1.0

1.0

1.0

1.0

0.9

0.9

0.9

0.9

0.9

0.9

1.0

0.9

1.0

0.8

1.0

1.0

Tree scale: 1

Smacoviridae

Nanoviridae

CRESS5

CRESS4

CRESS6

Genomoviridae

Geminiviridae

BacilladnaviridaeCRESS1

CRESS2

CRESS3

Circoviridae

Supplementary Figure 4Tree scale: 1

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 43: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

5

24 Figure S5: Variation in carbon export flux (mg m–2 d–1) across sampling depths in the water 25 column. Dots are average values, and horizontal lines represent standard deviations. 26 27

●●

●●

●●

●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●

0 5 10

1000

800

600

400

200

0

Carbon export flux (Mean +/− SD)

Dept

h(m

)

Supplementary Figure 10

Carbon export flux (mg m−2 d−1)

Dep

th (m

)

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 44: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

6

28 Figure S6: The results of PLS regressions using relative abundance profiles of viral marker-29 genes to explain the variance of carbon export efficiency (CEE) (a, d, g, j), carbon export flux 30 at 150 meters (CE150) (b, e, h, k), and net primary production (NPP) (c, f, i, l). a–c Bivariate 31 plots between predicted and observed response values in a leave-one-out cross-validation test. 32 The red diagonal line shows the theoretical curve for perfect prediction. d–f Variation in root 33 mean squared error of predictions (RMSEP) for the training set (solid black line) and cross-34 validation set (red dashed line) across the number of components. Blue dashed line shows the 35 number of components selected for the analysis. g–i Results of the permutation tests (n = 36 10,000) supporting the significance of the association between viruses and the response 37 variable. The histograms show the distribution of Pearson correlation coefficients obtained 38 from PLS models reconstructed based on the permutated response variable and red line show 39 the non-permutated response variable. j–l Pearson correlation coefficients between the 40 response variable and abundance profiles of viruses with VIP scores > 2 (VIPs) with the first 41 two components in the PLS regression model using all samples. Viruses with positive 42 regression coefficients are shown with circles, and those with negative coefficients are shown 43 with triangles. 44

Measured net primary production

Pred

icte

d ne

t prim

ary

prod

uctio

n

●●

●●

●●●●● ●●●●●●●

●●●●

●●●●

●●●●●

●●●

●●

●●●

●●● ●●●

●●

●●

●●

0 500 1000 1500

0

500

1000

1500

Measured carbon export flux

Pred

icte

d ca

rbon

exp

ort f

lux

●●

● ●● ●

●●

●●●

●● ●●●● ●●

● ●

●●

●●●

● ●

●●●●

●●●●● ●

● ●●

●●

●●●

0 5 10 15

0

5

10

15

Measured carbon export efficiency

Pred

icte

d ca

rbon

exp

ort e

ffici

ency

●●

●●

●●

●●

●●●

●●●

●●

●●●

●● ●

●●

●●●

●●●

●●

●●

●●

●●●●

●●

0.0 0.5 1.0 1.5

0.0

0.5

1.0

1.5

0 2 4 6 8 10

050

100

150

200

250

Cflux

number of components

RM

SEP

trainCV

0 2 4 6 8 10

0.0

0.5

1.0

1.5

2.0

2.5

Cflux

number of components

RM

SEP

trainCV

Carbon export fluxCarbon export efficiency NPP

0 2 4 6 8 10

0.00

0.05

0.10

0.15

0.20

0.25

0.30

0.35

Cflux

number of components

RM

SEP

trainCV

Correlation coefficient (r)

Freq

uenc

y

−1.0 −0.5 0.0 0.5 1.0

050

100

150

200

#Components = 3R2 = 0.67 r = 0.84

#Components = 2R2 = 0.54 r = 0.74

#Components = 2R2 = 0.64 r = 0.80

Correlation coefficient (r)

Freq

uenc

y

−1.0 −0.5 0.0 0.5 1.0

050

100

150

200

250

Correlation coefficient (r)

Freq

uenc

y

−1.0 −0.5 0.0 0.5 1.0

050

100

150

200

250

Observed carbon export flux (mg m−2 d−1) Measured NPP (mg m−2 d−1)

Pred

icte

d ca

rbon

exp

ort f

lux

(mg

m−2

d−1)

Pred

icte

d N

PP (m

g m

−2d−1

)

Observed carbon export efficiency

Pred

icte

d ca

rbon

exp

ort e

ffici

ency

P < 10-4 P < 10-4 P < 10-4

PLS component 1 (72% variance explained)

PLS

com

pone

nt 2

(23%

var

ianc

e ex

plai

ned)

−1.0 −0.5 0.0 0.5 1.0−1.0

−0.5

0.0

0.5

1.0

●●

●●●

●●

● ●

●●

● ●

●●

●● ●

●●●

●●

CE150

MimiviridaePhycodnaviridae

PicornaviralesOther viruses

PLS component 1 (80% variance explained)

PLS

com

pone

nt 2

(16%

var

ianc

e ex

plai

ned)

−1.0 −0.5 0.0 0.5 1.0−1.0

−0.5

0.0

0.5

1.0

●●

●●●●

● ●

● ●

●●

●●

●●

● ●

●●

●●

●●

●●

●●●

●●

●●

NPP

MimiviridaePhycodnaviridae

PicornaviralesOther viruses

PLS component 1 (83% variance explained)

PLS

com

pone

nt 2

(11%

var

ianc

e ex

plai

ned)

−1.0 −0.5 0.0 0.5 1.0−1.0

−0.5

0.0

0.5

1.0

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

CEE

MimiviridaePhycodnaviridae

PicornaviralesOther viruses

d e

Number of components

RM

SEP

RM

SEP

RM

SEP

#VIPs = 86

Number of components Number of components

a b c

g h i

j k l#VIPs = 83 #VIPs = 97

f

Supplementary Figure 5

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 45: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

7

45 Figure S7: Taxonomic composition of genes predicted in viral genome fragments encoding 46 NCLDV PolBs positively (a and c) or negatively (b and d) associated with CEE (VIP score > 47 2). a and b Metagenome-assembled genomes (MAGs) derived from samples filtered to retain 48 particles of sizes > 0.8 μm. c and d Contigs derived from samples filtered to retain particles 49 between 0.2 μm and 3 μm in size. Taxonomic annotations were performed as described in 50 Methods. 51

a

c

b

d

Supplementary Figure 6.CC-BY-NC 4.0 International licenseavailable under a

not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 46: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

8

52 Figure S8: Phylogenetic positions of NCLDV PolBs associated with CEE and network-based 53 predicted eukaryotic host groups. The unrooted maximum likelihood phylogenetic tree 54 contains environmental (labeled in red if VIP score > 2 and the regression coefficient is 55 positive, labeled in blue if negative) and reference (labeled in black) sequences of 56 Prasinovirus and Mimiviridae PolBs. The approximate SH-like local support values are 57 shown in percentages at nodes, and the scale bar indicates one change per site. Host groups 58 predicted at nodes are shown with colored circles. The red arrow points to a clade of viruses 59 predicted to infect Prymnesiales. 60

001350990

001184783

PoV (Unclassified Mimiviridae)

OtV RT-2011 (Prasinovirus)

000066500

000072372

001599716

000122879

002503270

000079365

001819659

000066804

OLPV 1 (Mesomimivirinae)

000067277

000286831

PgV (Mesomimivirinae)

MpV 1 (Prasinovirus)

000120786

003117577

000061593

000294968

004389005

000066950

001105267

PpV (Mesomimivirinae)

001250095

000194282

000230224

004057507

001121291

000075583

MpV SP1 (Prasinovirus)

000630816

003037006

000495698

000075116

000079753

000017697

000684264

000064954

003013914

000683307

000070477

000239928

000468213

000320572

000069361

000667500

000054135

000545331

000058809

000071604

002822478

000490625

000066071

000565520

003118589

000064096

000249074

000248170

000122430

002318741

000063945

000280162

OtV 1 (Prasinovirus)

000192105

000068784

000155828

000229407

001527691

000072166

003826014

000073739

000068880

000240849

000249993

000239005

000834606

Tupanvirus DO (Megavirinae)

000026821

000063931

000235944

002750835

000060035

000109761

000236849

001557307

003483755

000033614

000133961

Moumouvirus (Megavirinae)

000027805

000282052

001249129

001897164

000161220

000387513

000262952

000070861

001274286

000250875

PkV RF02 (Mesomimivirinae)

000064537

000062592

000085188

000945523

000061999

Indivirus (Unclassified Mimiviridae)

001058750

000066963

001391978

000065051

000060884

000079253

000064883

000066893

000079200

000449426

000102622

000062167

000081486

000166921

000042784

000673383

Tupanvirus SL (Megavirinae)

000847567

000299791

000356992

000231911

000079111

000058962

BsV (Unclassified Mimiviridae)

002828763

000025088

000692758

000479532

000544460

004949926

000104259

001132733

001372408

000129518

000192208

000061559

000269375

002503271

000401418

000512830

CeV (Mesomimivirinae)

000159675

000088733

000081984

003008254

000663173

001230579

000231558

002825758

000032792

000435873

000069310

000070978

000072571

000146594

000109237

003855514

000745571

000297388

000618073

002035311

000071750

001143251

000115299

001196663

000049837

000089657

000083592

000086577

001499365

000330656

000457706

001016487

000241877OLPV 2 (Mesomimivirinae)

000621764

000064307

000700607

000191910

000284102

000912507

000072765

002268728

000058552

000061528

000073383

000105816

001170932

000057148

000335275

000187401

000073441

000030837

Hokovirus (Unclassified Mimiviridae)

001009342

BpV 2 (Prasinovirus)

CroV (Unclassified Mimiviridae)

000065429

002317890

OtV 2 (Prasinovirus)

000066031

000071068

000310828

000512118

000860059

000249217

000158793

000073545

BpV 1 (Prasinovirus)

000068465

OlV 1 (Prasinovirus)

000328875

PparvumV (Mesomimivirinae)

000070058

000058812

001173520

000073352

000274868

001004628

000072094

000291604

000417185

Catovirus (Unclassified Mimiviridae)

000055116

000071171

000071229

000046074

000377714

000259426

000054815

AaV (Unclassified Mimiviridae)

001296274

000081579

000236724

000085957

000539407

001647925

000032783

000492984

Mimivirus (Megavirinae)

000067387

000231895

000287835

000110630

000051249

004447818

002413486

002156050

000240662

000079078

000057033

001148268

000081503

001041167

Megavirus (Megavirinae)

HeV RF02 (Mesomimivirinae)

004661489

000200745

002963968

001180381

000061534

000061401

000251027

000148273

100

100

100

97

100

87

99

24

13

100

93

97

100

100100

97

69

100

95

100

100

100

100

77

99

97

37

100

55

96

99

100

23

45

100

44

99

100

100

87

99

94

43

100

99

94

100

96

92

72

98

87

100

98

100

70

100

9

87

68

61

100

100

47

26

100

96

100

100

100

100

100

100

60

98

88

33100

44

100

100

99

100

100

99

69

86

97

100

53

63

99

100

100

99

43

100

93

99

100

100

96

88

85

100

100

57

99

97

99

100

99

29

88

100

100

100

100

53

72

93

99

97

31

86

86

100

100

100

40

31

99

100

98

100

100

42

99

96

96

99

100

90

71

96

100

98

100

100

100

13

94

100

98

100

13

100

48

100

100

100

95

100

99

49

66

100

90

100

99

99

100

100

85

10

75

97

92

100

100

31

25

100

90

3

97

16

84

100

38

96

96

100

75

6294

32

85

74

94

63

100

100

54

94

99

100

100

63

64

100

97

88

96

64

100

98

99

100

100

91

98

99

94

93

100

92

100

93

70

100

99

100

99

100

100

99

71

60

100

91

100

100

100

4

100

100

100

100

10

45

57

100

32

99

99

94

100

99

18

100

99

95

2

100

100

100

35

100

Tree scale: 1

Prasinovirus

Mesomimivirinae

Megavirinae

Mimiviridae

Mamiellales

PleuronematidaEutreptiales

Bicosoecida

Torodiniales

Corethrales

Beroida

ThecosomataChaetocerotales

Syndiniales

Aconoidasida

Calanoida

Bacillariales

Cyclotrichida

Collodaria

Sphaeropleales

Prymnesiales

Cryomonadida

Alcyonacea

Tree scale: 1Network-based predicted eukaryotic host groups at a given node

Supplementary Figure 7Labels:Black: reference sequencesRed: Positive VIPsBlue: Negative VIPs

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 47: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

9

61 Figure S9: Phylogenetic position of Piconavirales RdRPs associated with CEE and network-62 based predicted eukaryotic host groups. The unrooted maximum likelihood phylogenetic tree 63 contains environmental (labeled in red if VIP score > 2 and the regression coefficient is 64 positive, labeled in blue if negative) and reference (labeled in black) sequences of 65 Piconavirales RdRPs. The approximate SH-like local support values are shown in 66 percentages at nodes, and the scale bar indicates one change per site. Host groups predicted at 67 nodes are shown with colored circles. 68

76240729

BQCV (Dicistroviridae)

5309472

32202687

58838081

36496887

HoCV-1 (Dicistroviridae)

21733123

APLV3 (Unclassified Picornavirales)

56759010

33077172

51241105

29093047

32536384

15356524

24128940

29675950

29660803

59958592

4578943

58865139

5296497

35193888

57554431

98629691

478607

33428197

42456736

DCV (Dicistroviridae)

50667108

82602001

63103152

6175769

35179764

79539407

Bu-1 (Unclassified Picornavirales)

25124614

8625456

45028609

5973300

50273061

21880330

7903458

21391908

60392120

16254629

TSV (Dicistroviridae)

76136173

24747727

BRNAV G5 (Dicistroviridae)

44709998

82254166

15963946

6753846

16250369

HiPV (Dicistroviridae)

6031360

36505302

BRNAV G1 (Dicistroviridae)

Posavirus 3 (Unclassified Picornavirales)

9116058

9164163

103609885

59731273

29093457

11722969

49554577

JP-B (Unclassified Picornavirales)

76170225

52388179

58840825

8901086

89028829

5998385

75258857

10956500

INT 494

23270150

22428174

Posavirus 2 (Unclassified Picornavirales)

CsfrRNAV (Bacillarnavirus)

33069045

SINV-1 (Dicistroviridae)

37006855

RsetRNAV01 (Bacillarnavirus)

Apis dicistrovirus (Dicistroviridae)

24128145

98636599

APLV1 (Unclassified Picornavirales)

501186

8868089

43017044

7501593

44278324

26169224

16996831

24762850

2523836

42876889

16323154

32150309

Posavirus 1 (Unclassified Picornavirales)

TrV (Dicistroviridae)

9164160

21733119

32661547

42975987

29949318

501185

45022612

BV3 (Unclassified Picornavirales)

AGV3 (Dicistroviridae)

77677810

45643841

Tottori-HG1 (Unclassified Picornavirales)

79173268

AglaRNAV (Bacillarnavirus)

89802926

FsPLV (Picornaviridae)

52260257

50460461

32165274

2523835

KBV (Dicistroviridae)

1493543

109205596

97436908

Bu-3 (Unclassified Picornavirales)

BV1 (Unclassified Picornavirales)

5257731

54294427

AmV-A1 (Picornaviridae)

7873561

2523834

8626697

HaRNAV (Marnaviridae)

INT 429

60944405

ALPV (Dicistroviridae)

64666778

44709997

AnCV (Dicistroviridae)

AWando15 (Unclassified Picornavirales)

26568779

44060994

APLV4 (Unclassified Picornavirales)

56725325

RiPV-1 (Unclassified Picornavirales)

21729800

BRNAV G3 (Unclassified Picornavirales)

5015692

89802927

TPLV1 (Unclassified Picornavirales)

33056153

22980344

58838535

10541891

2954311

DpRNAV (Unclassified Picornavirales)

5252376

CtenRNAV01 (Bacillarnavirus)

21728043

100389194

29724237

100032960

53572985

84897402

PSIV (Dicistroviridae)

Goose dicistrovirus (Dicistroviridae)

Centovirus AC (Dicistroviridae)

89591997

CrPV (Dicistroviridae)

BRNAV G2 (Unclassified Picornavirales)

57468092

27815517

MCDV (Dicistroviridae)

5890355

6040574

AsRNAV (Labyrnavirus)

52479932

65266812

510209

50475018

29626693

97623802

15962113

63982103

13828801

21804986

11236314

38186194

60395683

Kilifi virus (Unclassified Picornavirales)

58837197

83943423

54635440

41784406

4578944

8855753

33049404

45370407

2584342

107558617

97531995

5971958

33056152

70963812

9853640629628862

APLV2 (Unclassified Picornavirales)

12533

5015691

12433837

JP-A (Unclassified Picornavirales)

5972655

45092716

16005444

20451590

22452106

INT 2

487149

58873145

84892392

35164655

97454952

21872260

23395631

84432702

487147

44049884

8855752

77

94

100

98

83

81

99

99100

13

91

99

100

86

83

81100

100

98

100

74

95

100

58

89

86

29

94

100

98

21

24

90

100

96

100

61

100

100

100

61

100

100

99

72

100

100

99

100

100

71

100

96

100

90

83

99

100

100

97

96

7

100100

100

98

94

43

100

99

70

100

65

100

100

46

94

100

98

71

100

96

100

90

100

49

67

97

98

85

100

100

44

96

35

99

67

100

85

60

24

91

49

100

100

92

98

100

98

88

15

24

100

85

28

99

100

88

16

100

98

99

75

94

93

77

86

0

100

20

100

100

95

100

100

90

4

100

16

69

98

74

100

100

68

9099

90

100

100

24

100

98

85

31

41

91

90

48

99

100

100

100

97

10097

16

95

60

78

89

100

100

72

99

75

23

84

100

100

82

100

97

17

33

74

100

96

85

94

99

98

26

100

100

51

97

86

100

100

100

100

68

39

92

100

97

100

55100

94

100

94

74

84

99

52

38

99

97

100

89

100

100

100

89

Tree scale: 1

Labyrnavirus

Marnaviridae

Picornaviridae

Bacillarnavirus

Dicistroviridae

IflaviridaeSecoviridae

Arthracanthida

Noctilucales

Peridiniales

Chaetocerotales

Hemiptera

Tree scale: 1 Labels:Black: reference sequencesRed: Positive VIPsBlue: Negative VIPs

Network-based predicted eukaryotic host groups at a given node

Supplementary Figure 8

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 48: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

10

69 Figure S10: Carbon export efficiency (CEE) is correlated with the change in the slope of 70 particle size distribution (PSD) that occurred from the surface to deep (below the euphotic 71 zone). Observed PSDs were fitted in the form n = adb, where n is the frequency of particles of 72 a given size, d is the particle diameter, and a and b are parameters (as described by(Guidi et 73 al., 2008)). b, the PSD slope, is a proxy for particles size. For example b = -5 indicates 74 presence of a large proportion of smaller particles, whereas b = -3 indicates a preponderance 75 of larger particles. A higher b value at deep compared to surface is suggestive of aggregation 76 or presence of larger organisms at deep compare to surface. The blue line shows the 77 regression line between CEE and the PSD slope difference between surface and deep. The 78 shade around the regression line shows the 95% confidence interval. 79 80

ρ = 0.42P = 8�10−9

Supplementary Figure 9 .CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 49: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

11

81

Supplemental Tables 82 83 Table S1: Viral lineages associated with CEE 84 Viruses VIPs Positives

VIPs Negative VIPs

NC

LDVs

Mimiviridae 34 25 9 Phycodnaviridae 24 18 6 Iridoviridae 2 0 2 Other NCLDVs * 0 0 0

Total 60 43 17

RN

A vi

ruse

s

Picornavirales (ssRNA+) 19 13 6 Partitiviridae (dsRNA) 1 1 0 Narnaviridae (ssRNA+) 0 0 0 Other families 2* 0 2 Unclassified 0 0 0 RNA viruses 0 0 0

Total 22 14 8

ssD

NA

viru

ses Circoviridae 1 1 0

Geminiviridae 0 0 0 Nanoviridae 0 0 0 Unclassified 0 0 0 ssDNA viruses 0 0 0

Total 1 1 0 All 83 58 25

* Two Hepeviridae (ssRNA+). 85 86

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 50: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

12

Table S2: Assembly statistics for NCLDV metagenome-assembled genomes and 87 corresponding VIPs 88

Metagenome-assembled genome #contigs N50 L50 Min Max Sum VIPs OTUs (OM-RGC.v1 ID)

TARA_IOS_NCLDV_Bin_127_6 14 21,642 5 8,581 35,822 267,607 PolB 000079111

TARA_IOS_NCLDV_Bin_173_3 12 12,913 3 2,807 34,517 108,412 PolB 000248170

TARA_MED_NCLDV_Bin_284_10 34 10,936 10 2,580 29,722 298,760 PolB 000328966

TARA_MED_NCLDV_Bin_284_14 43 14,837 11 2,756 27,607 439,843 PolB 001175669

TARA_IOS_NCLDV_Bin_127_4 26 5,734 10 2,560 8,505 133,765 PolB 001064263 and 010288541

TARA_AON_NCLDV_Bin_289_4 17 9,468 5 3,044 26,201 153,728 PolB 000200745 and 002503270

TARA_MED_NCLDV_Bin_341_10 5 7,800 2 2,534 7,941 30,478 PolB 002682999

TARA_PON_NCLDV_Bin_65_10 35 13,866 11 3,781 43,080 382,455 PolB 000079078

TARA_PON_NCLDV_Bin_102_1 53 4,608 18 2,606 11,485 239,832 PolB 000495602

TARA_AON_NCLDV_Bin_133_8 8 7,204 3 2,686 10,349 51,009 PolB 000240662

N50: The length of the contigs for which half of the assembly size is contained in contigs with a length greater than N50. 89 L50: Number of contigs (or scaffolds) with a size greater or equal to N50. 90 91

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 51: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

13

Table S3: Host predictions per viral and host group for 83 VIPs based on 92 phylogeny, co-occurrence analysis, and genomic context 93 Virus-host relationship Positive VIPs Negative VIPs Total NCLDV-Mamiellales 10 4 14 NCLDV-Prymnesiales 5 1 6 NCLDV-Pelagophyceae 2 1 3 NCLDV-No prediction 26 11 37 RNA virus-Copepoda 7 2 9 RNA virus-Chaetocerotales 2 0 2 RNA virus-Labyrinthulomycetes 1 0 1 RNA virus-No prediction 4 6 10 ssDNA virus-Copepoda 1 0 1

Total 58 25 83 94

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 52: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

14

Table S4: Host prediction per viral OTU for 83 VIPs based on phylogeny, co-95 occurrence analysis, and genomic context 96

97 Notes 98 *1 This virus was located in well-separated clade containing Aurantiochytrium single-stranded RNA virus (AsRNAV) which is 99 known to infect Labyrinthulomycetes. 100 *2 These viruses were grouped within Dicistroviridae (known to infect insects) and may therefore infect marine arthropods such 101 as copepods. 102 *3 This virus was connected with a copepod, mollusk and Collodaria OTUs in the co-occurrence network reconstructed for the 103 mesoplankton size. Circoviridae-like viruses are known to infect copepod. 104

Virustypes Virus OTUs

Direction ofassociationwith CEE

Classification(LCA annotation)

Clade in phylogenic tree usedfor TIM analysis

TIM-basedpredicted host MAGs ID Genome-based

predicted host Suggested host Note

polb_000026723 negative Mimiviridae NA NA NA NA NApolb_000030837 positive Mimiviridae Mimiviridae/Mesomimivirinae Prymnesiales NA NA Prymnesialespolb_000042601 positive Mimiviridae NA NA NA NA NApolb_000054135 negative Mimiviridae Mimiviridae/Mesomimivirinae Collodaria NA NA Prymnesialespolb_000061559 positive Mimiviridae Mimiviridae/Mesomimivirinae Prymnesiales NA NA Prymnesialespolb_000061999 positive Mimiviridae Mimiviridae NA NA NA NApolb_000073352 negative Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000073545 negative Mimiviridae Mimiviridae/CroV relative NA NA NA NApolb_000079078 negative Mimiviridae Mimiviridae/AaV relative NA PON_NCLDV_Bin_65_10 Pelagophyceae Pelagophyceaepolb_000079111 positive Mimiviridae Mimiviridae/AaV relative NA IOS_NCLDV_Bin_127_6 Pelagophyceae Pelagophyceaepolb_000079365 positive Mimiviridae Mimiviridae NA NA NA NApolb_000110630 negative Mimiviridae Mimiviridae NA NA NA NApolb_000129518 positive Mimiviridae Mimiviridae/Mesomimivirinae Prymnesiales NA NA Prymnesialespolb_000159717 positive Phycodnaviridae NA NA NA NA NApolb_000161220 positive Mimiviridae Mimiviridae/AaV relative NA NA NA Pelagophyceaepolb_000172102 positive Mimiviridae NA NA NA NA NApolb_000194282 positive Phycodnaviridae Phycodnaviridae/Prasinovirus Mamiellales NA NA Mamiellalespolb_000200745 positive Phycodnaviridae Phycodnaviridae/Prasinovirus Mamiellales AON_NCLDV_Bin_289_4 NA Mamiellalespolb_000229407 positive Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000230224 positive Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000232032 negative Phycodnaviridae NA NA NA NA NApolb_000236849 negative Phycodnaviridae Phycodnaviridae/Prasinovirus Mamiellales NA NA Mamiellalespolb_000239928 positive Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000240662 negative Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000248170 positive Phycodnaviridae Phycodnaviridae/Prasinovirus Mamiellales IOS_NCLDV_Bin_173_3 NA Mamiellalespolb_000249074 negative Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000249217 positive Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000251540 negative Phycodnaviridae NA NA NA NA NApolb_000328966 positive Mimiviridae NA NA NCLDV_Bin_284_10 NA NApolb_000396610 positive Mimiviridae NA NA NA NA NApolb_000435873 positive Phycodnaviridae Phycodnaviridae/Prasinovirus NA NA NA Mamiellalespolb_000490625 positive Mimiviridae Mimiviridae NA NA NA NApolb_000495602 negative Iridoviridae NA NA NCLDV_Bin_102_1 NA NApolb_000503865 positive Phycodnaviridae NA NA NA NA NApolb_000673383 negative Mimiviridae Mimiviridae NA NA NA NApolb_000844241 negative Iridoviridae NA NA NA NA NApolb_000912507 positive Mimiviridae Mimiviridae/Mesomimivirinae Collodaria NA NA Prymnesialespolb_001064263 positive Phycodnaviridae NA NA IOS_NCLDV_Bin_127_4 Mamiellales Mamiellalespolb_001175669 positive Mimiviridae NA NA MED_NCLDV_Bin_284_14 NA NApolb_001527691 positive Mimiviridae Mimiviridae/Mesomimivirinae NA NA NA Prymnesialespolb_002035391 positive Phycodnaviridae NA NA NA NA NApolb_002503270 positive Phycodnaviridae Phycodnaviridae/Prasinovirus Mamiellales AON_NCLDV_Bin_289_4 NA Mamiellalespolb_002682999 positive Mimiviridae NA NA NA NA NApolb_003145223 negative Mimiviridae NA NA NA NA NApolb_003319665 positive Mimiviridae NA NA NA NA NApolb_003580241 positive Mimiviridae NA NA NA NA NApolb_004312996 positive Phycodnaviridae NA NA NA NA NApolb_004775027 negative Mimiviridae NA NA NA NA NApolb_004804559 positive Mimiviridae NA NA NA NA NApolb_007102163 positive Mimiviridae NA NA NA NA NApolb_007423474 negative Mimiviridae NA NA NA NA NApolb_007503502 positive Mimiviridae NA NA NA NA NApolb_007771300 positive Phycodnaviridae NA NA NA NA NApolb_008001141 positive Mimiviridae NA NA NA NA NApolb_010288541 positive Phycodnaviridae NA NA IOS_NCLDV_Bin_127_4 Mamiellales NApolb_013294654 positive Phycodnaviridae NA NA NA NA NApolb_013433452 positive Mimiviridae NA NA NA NA NApolb_014364115 positive Mimiviridae NA NA NA NA NApolb_015514497 positive Mimiviridae NA NA NA NA NApolb_015907472 positive Phycodnaviridae NA NA NA NA NArdrp_105714054 negative Picornavirales NA NA NA NA NArdrp_107558617 negative Picornavirales Picornavirales/Bacillarnavirus NA NA NA NArdrp_30787766 positive Picornavirales NA NA NA NA NArdrp_32150057 positive Picornavirales NA NA NA NA NArdrp_32150309 positive Picornavirales Picornavirales/Labyrnavirus NA NA NA Labyrinthulomycetes *1rdrp_32202687 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepodardrp_33049404 positive Picornavirales Picornavirales/Bacillarnavirus Chaetocerotales NA NA Chaetocerotalesrdrp_35179764 positive Picornavirales Picornavirales/Bacillarnavirus Chaetocerotales NA NA Chaetocerotalesrdrp_35713768 positive Partitiviridae NA NA NA NA NArdrp_36496887 positive Picornavirales Picornavirales NA NA NA NArdrp_36505302 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_42335229 negative Hepeviridae NA NA NA NA NArdrp_49554577 negative Picornavirales Picornavirales NA NA NA NArdrp_54294427 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_59731273 negative Picornavirales Picornavirales NA NA NA NArdrp_77677770 negative Hepeviridae NA NA NA NA NArdrp_77677810 negative Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_84897402 negative Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_8626697 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_8855752 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_9164160 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2rdrp_9164163 positive Picornavirales Picornavirales/Dicistroviridae NA NA NA Copepoda *2

ssDNAviruses rep_38177659 positive Circoviridae NA NA NA NA Copepoda *3

NCLDVs

RNAviruses

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 53: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

15

Table S5: Statistics for the FlashWeave co-occurrence graphs 105 Viral marker gene

Planktonic size fraction* #Samples #Viral OTUs #Eukaryotic

OTUs #Edges in graph

#Virus-to-eukaryote edges

#Viruses connected to a eukaryote (%)

NCLDVs PolB Piconano 99 2269 4936 20934 3594 1735 (76)

Nano 51 1775 1872 6704 1027 721 (41)

Micro 92 2205 2524 12189 2101 1299 (59)

Meso 95 2238 2250 11624 1796 1126 (50)

RNA viruses RdRP Piconano 60 125 4484 10754 446 122 (98)

Nano 36 53 1768 2659 124 46 (87)

Micro 62 124 2407 5351 367 117 (94)

Meso 62 48 2100 4329 116 42 (88)

ssDNA viruses Rep Piconano 60 64 4484 10577 205 63 (98%)

Nano 36 1 1768 2563 2 1 (100%)

Micro 62 4 2407 5086 9 4 (100%)

Meso 62 8 2100 4242 24 8 (100%)

* Pico: 0.8 to 5 μm, Nano: 5 to 20 μm, Micro: 20 to 180 μm, Meso: 180 to 2000 μm 106 107

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 54: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

16

Table S6: Functional differences between eukaryotes found to be best connected to 108 VIPs and non-VIPs 109

Functional trait Positive VIPs (n = 50) Non-VIPs (n = 983)

P-value (Fisher’s exact

test, two sided)

Adjusted P- value (BH) (Q)

Presence Absence Presence Absence

Chloroplast 20 30 276 690 0.109 0.164

Silicification 11 39 60 920 0.000 0.001

Calcification 1 49 30 950 1.000 1.000

Functional trait Negative VIPs (n = 21) Non-VIPs (n = 983) P-value

(Fisher’s exact test, two sided)

Adjusted P- value (BH) (Q)

Presence Absence Presence Absence

Chloroplast 3 17 276 690 0.218 0.655

Silicification 0 21 60 920 0.632 0.947

Calcification 0 21 30 950 1.000 1.000

110

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 55: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

17

Table S7: Functional differences between eukaryotes found to be best connected to 111 positive and negative VIPs 112

Functional trait Positive VIPs (n = 50) Negative VIPs (n = 21) P-value

(Fisher extact test, two sided)

Adjusted P- value (BH) (Q)

Presence Absence Presence Absence

Chloroplast 20 30 3 17 0.053 0.079

Silicification 11 39 0 21 0.027 0.079

Calcification 1 49 0 21 1.000 1.000

113

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 56: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

18

Transparent Methods 114

Data context 115

We used publicly available data generated in the framework of the Tara Oceans expedition. 116

Single-copy marker-gene sequences for NCLDVs and RNA viruses were identified from two 117

gene catalogs: the Ocean Microbial Reference Gene Catalog (OM-RGC) and the Marine Atlas 118

of Tara Oceans Unigenes (MATOU). The viral marker-gene read count profiles used in our 119

study are as previously reported for prokaryotic-sized metagenomes (size fraction 0.2–3 µm) 120

(Sunagawa et al., 2015) and eukaryotic-sized metatranscriptomes (Carradec et al., 2018). 121

Eukaryotic plankton samples (the same samples were used for metatranscriptomes, 122

metagenomes and 18S rRNA V9-meta-barcodes) were filtered for categorization into the 123

following size classes: piconano (0.8–5 µm), nano (5–20 µm), micro (20–180 µm), and meso 124

(180–2,000 µm). For eukaryotic 18S rRNA V9 OTUs (de Vargas et al., 2015), we used an 125

updated version of the data that included functional trait annotations (chloroplast-bearing, 126

silicified, and calcified organisms) of V9 OTUs. Occurrence profiles are compositional 127

matrices in which gene occurrence are expressed as unnormalized (V9 meta-barcode data) or 128

gene-length normalized (shotgun data) read counts. Indirect measurements of carbon export 129

(mg m-2 d-1) in 5-m increments from the surface to a 1,000-m depth were taken from Guidi et 130

al. (Guidi et al., 2016) The original measurements were derived from the distribution of 131

particle sizes and abundances collected using an underwater vision profiler. These raw data 132

are available from PANGEA (Picheral et al., 2014). Net primary production (NPP) data were 133

extracted and averaged from 8-day composites of the vertically generalized production model 134

(VGPM) (Behrenfeld and Falkowski, 1997) for the week of sampling. Thus, in this study, the 135

comparisons between NPP and other parameters were not made at the same time point. This 136

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 57: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

19

might have affected the results of the regression analysis, especially if there were any short-137

term massive bloom events, although there was no bloom signal during the sampling period. 138

Carbon export, carbon export efficiency, and particle size distribution 139

Carbon flux profiles (mg m-2 d-1) were estimated based on particle size distributions and 140

abundances. The method used for carbon flux estimation was previously calibrated comparing 141

sediment trap measurement and data from imaging instruments (Guidi et al., 2008). Carbon 142

flux values from depths of 30 to 970 meters were divided into 20-m bins, each obtained by 143

averaging the carbon flux values from the designated 20 m in profiles gathered during 144

biological sampling within a 25-km radius over 24 h when less than 50% of data were missing 145

(Figure S5). Carbon export (CE) was defined as the carbon flux at 150 m (Guidi et al., 2016). 146

Carbon export efficiency was calculated as follows: CEE = CEdeep/CEsurface (Buesseler and 147

Boyd, 2009). To compare stations with different water column structures, we defined CEsurface 148

as the maximum CE (in a 20 m window) within the first 150 m. CEdeep is the average CE (also 149

in a 20 m window) 200 m below this maximum. The 150 m limit serves as a reference point 150

to automatize the calculation of CEsurface and CEdeep. The 150m-depth layer was selected 151

because often used as a reference depth for drifting sediment trap and because most of the 152

deep chlorophyll maximum (DCM) were shallower except at two (stations 98 (175 m) and 153

100 (180 m)). The maximum CEsurface for these two stations was above 150 m. The sampling 154

strategy of Tara Oceans was designed to study a variety of marine ecosystems and to target 155

well-defined meso- to large-scale features (based on remote-sensing data). Therefore, this 156

strategy avoided sampling water with important lateral inputs. Nevertheless, the possibility of 157

having locations with potential lateral transport cannot be excluded. 158

We obtained the particle size distribution (PSD) profiles generated by the Tara Oceans 159

expedition and computed the PSD slope at each depth for all profiles. The slope value 160

(denoted “b”) is used as the descriptor of the particle size distribution as defined in a previous 161

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 58: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

20

work (Guidi et al., 2009). For example, b = −5 indicates the presence of a large proportion of 162

smaller particles, whereas b = −3 indicates a preponderance of larger particles. We averaged 163

the slope values at each sampling site in the same way as for carbon export flux. 164

Identification of viral marker genes from ocean gene catalogs 165

Viral genes were collected from two gene catalogs: OM-RGC version 1 and MATOU. 166

Sequences in these two gene catalogs are representatives of clusters of environmental 167

sequences (clustered at 95% nucleotide identity). The OM-RGC data were taxonomically re-168

annotated, with the NCBI reference tree used to determine the last common ancestor modified 169

to reflect the current classification of NCLDVs (Carradec et al., 2018). We automatically 170

classified viral gene sequences as eukaryotic or prokaryotic according to their best BLAST 171

score against viral sequences in the Virus-Host Database (Mihara et al., 2016). DNA 172

polymerase B (PolB), RNA-dependent RNA polymerase (RdRP), and replication-associated 173

protein (Rep) genes were used as markers for NCLDVs, RNA viruses, and ssDNA viruses, 174

respectively. For PolB, reference proteins from the NCLDV orthologous gene cluster 175

NCVOG0038 (Yutin et al., 2009) were aligned using MAFFT-linsi (Katoh and Standley, 176

2013). A hidden Markov model (HMM) profile was constructed from the resulting alignment 177

using hmmbuild (Eddy, 2011). This PolB HMM profile was searched against OM-RGC amino 178

acid sequences and translated MATOU sequences annotated as NCLDVs, and sequences 179

longer than 200 amino acids that had hits with E-values < 1 ´ 10-5 were selected as putative 180

PolBs. RdRP sequences were chosen from the MATOU catalog as follows: sequences 181

assigned to Pfam profiles PF00680, PF00946, PF00972, PF00978, PF00998, PF02123, 182

PF04196, PF04197, or PF05919 and annotated as RNA viruses were retained as RdRPs. For 183

Rep, we reconstructed an HMM profile using a comprehensive set of reference sequences 184

(Kazlauskas et al., 2018) and searched this profile against the translated MATOU sequences 185

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 59: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

21

annotated as ssDNA viruses. We kept sequences that had hits with E-values < 1 ´ 10-5 and 186

removed those that contained frameshifts. 187

The procedure above identified 3,486 PolB sequences in the metagenomic samples and 188

respectively 975, 388, and 299 RdRP, PolB, and Rep sequences in the metranscriptomes. 189

Testing for associations between viruses with CEE, CE150, and NPP 190

To test for associations between occurrence of viral marker genes and CEE, CE150, and NPP, 191

we proceeded as follows. Samples with CEE values greater than one and with Z-score greater 192

than two were considered as outliers and removed (this removed the two samples from station 193

68). Only marker genes represented by at least two reads in five or more samples were 194

retained (lowering this minimal number of required samples down to three or four did not 195

improve the PLS regression model). To cope with the sparsity and composition of the data, 196

gene-length normalized read count matrices were center log-ratio transformed, separately for 197

ssDNA viruses, RNA viruses and NCLDVs. We next selected genes with Spearman 198

correlation coefficients with CEE, CE150 or NPP greater than 0.2 or smaller than −0.2 (zero 199

values were removed). To assess the association between these marker genes and CEE, we 200

used partial least square (PLS) regression analysis. The number of components selected for 201

the PLS model was chosen to minimize the root mean square error of prediction (Figure S6). 202

We assessed the strength of the association between carbon export (the response variable) and 203

viral marker genes occurrence (the explanatory variable) by correlating leave-one-out cross-204

validation predicted values with the measured carbon export values. We tested the 205

significance of the correlation by comparing the original Pearson coefficients between 206

explanatory and response variables with the distribution of Pearson coefficients obtained from 207

PLS models reconstructed based on permutated data (10,000 iterations). We estimated the 208

contribution of each gene (predictor) according to its variable importance in the projection 209

(VIP) score derived from the PLS regression model using all samples. The VIP score of a 210

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 60: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

22

predictor estimates its contribution in the PLS regression. Predictors with high VIP scores (> 211

2) were assumed to be important for the PLS prediction of the response variable. 212

Phylogenetic analysis 213

Environmental PolB sequences annotated as NCLDVs were searched against reference 214

NCLDV PolB sequences using BLAST. Environmental sequences with hits to a reference 215

sequence that had > 40% identity and an alignment length greater than 400 amino acids were 216

kept and aligned with reference sequences using MAFFT-linsi. Environmental RdRP 217

sequences annotated as were translated into six frames of amino acid sequences, and reference 218

RNA viruses RdRP sequences collected from the Virus-Host Database were searched against 219

the Conserved Domain Database (CDD) using rpsBLAST. The resulting alignment was used 220

to trim reference and environmental RdRP sequences to the conserved part corresponding to 221

the domain, CDD: 279070, before alignment with MAFFT-linsi. Rep sequences annotated as 222

ssDNA viruses were treated similarly. PolB, RdRP, and Rep multiple sequence alignments 223

were manually curated to discard poorly aligned sequences. Phylogenetic trees were 224

reconstructed using the the build function of ETE3 (Huerta-Cepas et al., 2016) of the 225

GenomeNet TREE tool (https://www.genome.jp/tools-bin/ete). Columns were automatically 226

trimmed using trimAl (Capella-Gutiérrez et al., 2009), and trees were constructed using 227

FastTree with default settings (Price et al., 2009). 228

A similar procedure was applied for the trees used in the hosts prediction analysis albeit 229

selecting sequences for the Phycodnaviridae/Mimiviridae (Figure S8) and the Picornavirales 230

(Figure S9) and removing the ones occurring in fewer than 10 samples, to reduce the size of 231

the tree. 232

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 61: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

23

Virus–eukaryote co-occurrence analysis 233

We used FlashWeave (Tackmann et al., 2019) with Julia 1.2.0 to predict virus–host 234

interactions based on their co-occurrence patterns. Read count matrices for the 3,486 PolBs, 235

975 RdRPs, 299 Reps, and 18S rRNA V9 DNA barcodes obtained from samples collected at 236

the same location were fed into FlashWeave. The 18S rRNA V9 data were filtered to retain 237

OTUs with an informative taxonomic annotation. The 18S rRNAV9 OTUs and viral marker 238

sequences were further filtered to conserve only those present in at least five samples. 239

FlashWeave networks were learned for each of the four eukaryotic size fractions with the 240

parameters ‘heterogenous’ = false and ‘sensitive’ = true, and edges receiving a weight > 0.2 241

and a Q-value < 0.01 (the default) were retained. The number of samples per size fraction 242

ranged between 51 and 99 for NCLDVs and between 36 and 62 for RNA and ssDNA viruses. 243

The number of retained OTUs per size fraction varied between 1,775 and 2,269 for NCLDVs 244

and between 48 and 125 for RNA viruses (Table S5). 245

Mapping of putative hosts onto viral phylogenies 246

In our association networks, individual viral sequences were often associated with multiple 247

18S rRNA V9 OTUs belonging to drastically different eukaryotic groups, a situation that can 248

reflect interactions among multiple organisms but also noise associated with this type of 249

analysis (Coenen and Weitz, 2018). To extract meaningful information from these networks, 250

we reasoned as follows. We assumed that evolutionarily related viruses infect evolutionarily 251

related organisms, similar to the case of phycodnaviruses (Clasen and Suttle, 2009). In the 252

interaction networks, the number of connections between viruses in a given clade and the 253

associated eukaryotic host group should accordingly be enriched compared with the number 254

of connections with non-host organisms arising by chance. Following this reasoning, we 255

assigned the most likely eukaryotic host group as follows. The tree constructed from viral 256

marker-gene sequences (PolB, RdRP or Rep) was traversed from root to tips to visit every 257

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 62: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

24

node. We counted how many connections existed between leaves of each node and the V9-258

OTUs of a given eukaryotic group (order level). We then tested whether the node was 259

enriched compared with the rest of the tree using Fischer’s exact test and applied the 260

Benjamini–Hochberg procedure to control the false discovery rate among comparisons of 261

each eukaryotic taxon (order level). To avoid the appearance of significant associations driven 262

by a few highly connected leaves, we required half of the leaves within a node to be 263

connected to a given eukaryotic group. Significant enrichment of connections between a virus 264

clade and a eukaryotic order was considered to be indicative of a possible virus–host 265

relationship. We refer to the above approach, in which taxon interactions are mapped onto a 266

phylogenetic tree of a target group using the organism’s associations predicted from a species 267

co-occurrence-based network, as TIM, for Taxon Interaction Mapper. This tool is available at 268

https://github.com/RomainBlancMathieu/TIM. This approach can be extended to interactions 269

other than virus–host relationships. 270

Assembly of NCLDV metagenome-assembled genomes (MAGs) 271

NCLDV metagenome-assembled genomes (MAGs) were assembled from Tara Oceans 272

metagenomes corresponding to size fractions > 0.8 µm. Metagenomes were first organized 273

into 11 ‘metagenomic sets’ based upon their geographic coordinates, and each set was co-274

assembled using MEGAHIT (Li et al., 2015) v.1.1.1. For each set, scaffolds longer than 2.5 275

kbp were processed within the bioinformatics platform anvi’o (Eren et al., 2015) v.6.1 276

following methodology described previously for genome-resolved metagenomics (Delmont et 277

al., 2018). Briefly, we used the automatic binning algorithm CONCOCT (Alneberg et al., 278

2014) to identify large clusters of contigs using both sequence composition and differential 279

coverage across metagenomes within the set. We then used HMMER (Eddy, 2011) v3.1b2 to 280

search for a collection of eight NCLDV gene markers (Guglielmini et al., 2019), and 281

identified NCLDV MAGs by manually binning CONCOCT clusters of interest using the 282

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 63: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

25

anvi’o interactive interface. The interface displayed hits for the eight gene markers alongside 283

coverage values across metagenomes and GC-content. Finally, NCLDV MAGs were 284

manually curated using the same interface, to minimize contamination as described previously 285

(Delmont and Eren, 2016). 286

Taxonomic composition of genes predicted in NCLDV genomes of VIPs 287

VIP’s PolB sequences were searched (using BLAST) against MAGs reconstructed from the 288

metagenomes of the eukaryotic size fraction (> 0.8 µm) and against contigs used to produce 289

OM-RGCv1. Genome fragments covering 95% of the length of PolB VIPs with > 95% 290

nucleotide identity were considered as originating from a same viral OTUs. Genes were 291

predicted and annotated taxonomically with the same procedure described above 292

(identification of viral marker genes). Genes contained in viral genome fragments and 293

annotated as cellular organisms with amino acid identities > 60% were manually inspected 294

(Supplemental Data 2). 295

Statistical test 296

All the statistical significance assessments were performed with two-sided test. 297

Supplemental References 298 Alneberg, J., Bjarnason, B.S., Bruijn, I. de, Schirmer, M., Quick, J., Ijaz, U.Z., Lahti, L., 299 Loman, N.J., Andersson, A.F., and Quince, C. (2014). Binning metagenomic contigs by 300 coverage and composition. Nat. Methods 11, 1144–1146. 301

Behrenfeld, M.J., and Falkowski, P.G. (1997). Photosynthetic rates derived from satellite-302 based chlorophyll concentration. Limnol. Oceanogr. 42, 1–20. 303

Buesseler, K.O., and Boyd, P.W. (2009). Shedding light on processes that control particle 304 export and flux attenuation in the twilight zone of the open ocean. Limnol. Oceanogr. 54, 305 1210–1232. 306

Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. (2009). trimAl: a tool for 307 automated alignment trimming in large-scale phylogenetic analyses. Bioinforma. Oxf. Engl. 308 25, 1972–1973. 309

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 64: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

26

Carradec, Q., Pelletier, E., Silva, C.D., Alberti, A., Seeleuthner, Y., Blanc-Mathieu, R., Lima-310 Mendez, G., Rocha, F., Tirichine, L., Labadie, K., et al. (2018). A global ocean atlas of 311 eukaryotic genes. Nat. Commun. 9, 373. 312

Clasen, J.L., and Suttle, C.A. (2009). Identification of freshwater Phycodnaviridae and their 313 potential phytoplankton hosts, using DNA pol sequence fragments and a genetic-distance 314 analysis. Appl. Environ. Microbiol. 75, 991–997. 315

Coenen, A.R., and Weitz, J.S. (2018). Limitations of Correlation-Based Inference in Complex 316 Virus-Microbe Communities. MSystems 3, e00084-18. 317

Delmont, T.O., and Eren, A.M. (2016). Identifying contamination with advanced visualization 318 and analysis practices: metagenomic approaches for eukaryotic genome assemblies. PeerJ 4, 319 e1839. 320

Delmont, T.O., Quince, C., Shaiber, A., Esen, Ö.C., Lee, S.T., Rappé, M.S., McLellan, S.L., 321 Lücker, S., and Eren, A.M. (2018). Nitrogen-fixing populations of Planctomycetes and 322 Proteobacteria are abundant in surface ocean metagenomes. Nat. Microbiol. 3, 804–813. 323

Eddy, S.R. (2011). Accelerated Profile HMM Searches. PLOS Comput. Biol. 7, e1002195. 324

Eren, A.M., Esen, Ö.C., Quince, C., Vineis, J.H., Morrison, H.G., Sogin, M.L., and Delmont, 325 T.O. (2015). Anvi’o: an advanced analysis and visualization platform for ‘omics data. PeerJ 3, 326 e1319. 327

Guglielmini, J., Woo, A.C., Krupovic, M., Forterre, P., and Gaia, M. (2019). Diversification 328 of giant and large eukaryotic dsDNA viruses predated the origin of modern eukaryotes. Proc. 329 Natl. Acad. Sci. 116, 19585–19592. 330

Guidi, L., Jackson, G.A., Stemmann, L., Miquel, J.C., Picheral, M., and Gorsky, G. (2008). 331 Relationship between particle size distribution and flux in the mesopelagic zone. Deep Sea 332 Res. Part Oceanogr. Res. Pap. 55, 1364–1374. 333

Guidi, L., Stemmann, L., Jackson, G.A., Ibanez, F., Claustre, H., Legendre, L., Picheral, M., 334 and Gorskya, G. (2009). Effects of phytoplankton community on production, size, and export 335 of large aggregates: A world-ocean analysis. Limnol. Oceanogr. 54, 1951–1963. 336

Guidi, L., Chaffron, S., Bittner, L., Eveillard, D., Larhlimi, A., Roux, S., Darzi, Y., Audic, S., 337 Berline, L., Brum, J.R., et al. (2016). Plankton networks driving carbon export in the 338 oligotrophic ocean. Nature 532, 465. 339

Huerta-Cepas, J., Serra, F., and Bork, P. (2016). ETE 3: Reconstruction, analysis and 340 visualization of phylogenomic data. Mol. Biol. Evol. msw046. 341

Katoh, K., and Standley, D.M. (2013). MAFFT multiple sequence alignment software version 342 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780. 343

Kazlauskas, D., Varsani, A., and Krupovic, M. (2018). Pervasive Chimerism in the 344 Replication-Associated Proteins of Uncultured Single-Stranded DNA Viruses. Viruses 10. 345

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 65: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

27

Li, D., Liu, C.-M., Luo, R., Sadakane, K., and Lam, T.-W. (2015). MEGAHIT: an ultra-fast 346 single-node solution for large and complex metagenomics assembly via succinct de Bruijn 347 graph. Bioinforma. Oxf. Engl. 31, 1674–1676. 348

Mihara, T., Nishimura, Y., Shimizu, Y., Nishiyama, H., Yoshikawa, G., Uehara, H., Hingamp, 349 P., Goto, S., and Ogata, H. (2016). Linking Virus Genomes with Host Taxonomy. Viruses 8, 350 66. 351

Picheral, M., Searson, S., Taillandier, V., Bricaud, A., Boss, E., Stemmann, L., Gorsky, G., 352 Tara Oceans Consortium, C., and Tara Oceans Expedition, P. (2014). Vertical profiles of 353 environmental parameters measured from physical, optical and imaging sensors during Tara 354 Oceans expedition 2009-2013. 355

Price, M.N., Dehal, P.S., and Arkin, A.P. (2009). FastTree: Computing Large Minimum 356 Evolution Trees with Profiles instead of a Distance Matrix. Mol. Biol. Evol. 26, 1641–1650. 357

Sunagawa, S., Coelho, L.P., Chaffron, S., Kultima, J.R., Labadie, K., Salazar, G., 358 Djahanschiri, B., Zeller, G., Mende, D.R., Alberti, A., et al. (2015). Ocean plankton. Structure 359 and function of the global ocean microbiome. Science 348, 1261359. 360

Tackmann, J., Matias Rodrigues, J.F., and von Mering, C. (2019). Rapid Inference of Direct 361 Interactions in Large-Scale Ecological Networks from Heterogeneous Microbial Sequencing 362 Data. Cell Syst. 9, 286-296.e8. 363

de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahé, F., Logares, R., Lara, E., Berney, C., 364 Le Bescot, N., Probert, I., et al. (2015). Ocean plankton. Eukaryotic plankton diversity in the 365 sunlit ocean. Science 348, 1261605. 366

Yutin, N., Wolf, Y.I., Raoult, D., and Koonin, E.V. (2009). Eukaryotic large nucleo-367 cytoplasmic DNA viruses: Clusters of orthologous genes and reconstruction of viral genome 368 evolution. Virol. J. 6, 223. 369

370

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint

Page 66: Eukaryotic virus composition can predict the efficiency of … · 1 1 Eukaryotic virus composition can predict the efficiency of carbon export in the 2 global ocean 3 Hiroto Kaneko1,+,

Supplemental Data 1,2

Click here to access/downloadSupplemental File Sets

Supplemental_Data_1_2.xlsx

.CC-BY-NC 4.0 International licenseavailable under anot certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

The copyright holder for this preprint (which wasthis version posted October 7, 2020. ; https://doi.org/10.1101/710228doi: bioRxiv preprint