Assessment criteria include paper format, content, literature cited, and article choice. In addition to the three articles used for analysis, supporting references should also be used. A minimum of one unique supporting article is required in each of the Introduction and Conclusion sections. A student example paper has been uploaded to Blackboard. This paper is well done and meets almost all of the “Meets expectations” criteria in the rubric.
Summary
Meets expectations: Sums up the strengths and weaknesses of each article; compares and contrasts the articles with one another.
Middle level: Summary lacks some details or misses a strength/weakness of one of the articles; occasionally compares and contrasts but mainly lists each article's strengths and weaknesses separately.
Lowest level: Summary lacks detail and includes either strengths or weaknesses only; fails to compare and contrast the articles with one another.

Significance
Meets expectations: Establishes the practical and theoretical significance of the body of work (Has your chosen article been cited by others? Did your articles spark other researchers' hypotheses or questions? Are there any practical applications? What are the social, political, technological, or medical implications of the research?); cites at least one other supporting reference (unique from the Introduction).
Middle level: Logic connecting the body of work to its theoretical significance is not clear; not thorough in establishing its significance; cites at least one other supporting reference (unique from the Introduction).
Lowest level: No connection to the theoretical significance of the body of work; fails to cite at least one supporting reference.

Literature cited

Format
Meets expectations: One journal format chosen and used throughout the bibliography and in-text citations.
Middle level: Some in-text citations not in the same format; 1-2 errors in the bibliography.
Lowest level: Consistency lacking for in-text citations; bibliography with 3+ formatting errors.

Subject
Meets expectations: Chosen articles were all on the same topic; the topic was specific enough that an analysis was possible.
Middle level: Topics were not consistent or were too broad/general.
Lowest level: Each article was on a separate topic, and the topics lacked reasonable similarities.

Citation
Meets expectations: Each reference was used and cited correctly within the body of the paper; three focal references were analyzed; at least 5 references used.
Middle level: References were occasionally cited incorrectly; three focal references were analyzed; 4 total references used.
Lowest level: Two or fewer references were analyzed; no supporting references used.

Quantity
Meets expectations: Minimum of 5 references: 1 unique to the Introduction, 1 unique to the Discussion, and 3 critically reviewed.
Middle level: Missing 1 unique reference.
Lowest level: Missing 2 unique references and/or 1 of the critically reviewed articles.
Host-Microbe Coevolution: Applying Evidence from Model Systems to Complex Marine Invertebrate Holobionts
Paul A. O’Brien (a, b, c), Nicole S. Webster (b, c, d), David J. Miller (e, f), David G. Bourne (a, b, c)
(a) College of Science and Engineering, James Cook University, Townsville, QLD, Australia
(b) Australian Institute of Marine Science, Townsville, QLD, Australia
(c) AIMS@JCU, Townsville, QLD, Australia
(d) Australian Centre for Ecogenomics, University of Queensland, Brisbane, QLD, Australia
(e) ARC Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, QLD, Australia
(f) Centre for Tropical Bioinformatics and Molecular Biology, James Cook University, Townsville, QLD, Australia
ABSTRACT Marine invertebrates often host diverse microbial communities, making
it difficult to identify important symbionts and to understand how these communities
are structured. This complexity has also made it challenging to assign microbial
functions and to unravel the myriad interactions among the microbiota. Here we
propose to address these issues by applying evidence from model systems of host-microbe
coevolution to complex marine invertebrate microbiomes. Coevolution is
the reciprocal adaptation of one lineage in response to another and can occur
through the interaction of a host and its beneficial symbiont. A classic indicator of
coevolution is codivergence of host and microbe, and evidence of this is found in
both corals and sponges. Metabolic collaboration between host and microbe is often
linked to codivergence and appears likely in complex holobionts, where microbial
symbionts can interact with host cells through production and degradation of
metabolic compounds. Neutral models are also useful for distinguishing selected microbes
from a background population consisting predominantly of random associates.
Enhanced understanding of the interactions between marine invertebrates and
their microbial communities is urgently required as coral reefs face unprecedented
local and global pressures and as active restoration approaches, including manipulation
of the microbiome, are proposed to improve the health and tolerance of reef
species. On the basis of a detailed review of the literature, we propose three research
criteria for examining coevolution in marine invertebrates: (i) identifying stochastic
and deterministic components of the microbiome, (ii) assessing codivergence
of host and microbe, and (iii) confirming the intimate association based on shared
metabolic function.
KEYWORDS codivergence, coevolution, marine invertebrates, microbiome, phylosymbiosis
MINIREVIEW
Host-Microbe Biology
Citation O’Brien PA, Webster NS, Miller DJ, Bourne DG. 2019. Host-microbe coevolution: applying evidence from model systems to complex marine invertebrate holobionts. mBio 10:e02241-18. https://doi.org/10.1128/mBio.02241-18.
Editor Danielle A. Garsin, University of Texas Health Science Center at Houston
Copyright © 2019 O’Brien et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Address correspondence to David G. Bourne, david.bourne@jcu.edu.au.
Published 5 February 2019
January/February 2019 Volume 10 Issue 1 e02241-18 mbio.asm.org
Coevolution theory dates back to the 19th century (box 1), and coevolution is currently defined as the reciprocal evolution of one lineage in response to
another (1). This definition encompasses a broad range of interactions, such as
predator-prey, host-symbiont, and host-parasite interactions, or interactions among the members
of a community of organisms, such as a host and its associated microbiome (1, 2). In the
case of host-microbe associations, coevolution has produced some of the most remarkable
evolutionary outcomes that have shaped life on Earth, such as the eukaryotic cell,
multicellularity, and the development of organ systems (3, 4). It is now recognized that
microbial associations with a multicellular host represent the rule rather than the
exception (4), but in such complex associations, the extent to which coevolution
operates is often unclear.
BOX 1: A BRIEF HISTORY OF COEVOLUTION
Charles Darwin once described the sudden and rapid diversification of flowering
plants as an “abominable mystery,” since it could not be explained by traditional
views of evolution alone (5). While his correspondent Gaston de Saporta speculated
that a biological interaction between flowering plants and insects might be the
cause of the phenomenon, it was not until nearly 100 years later that the concept of
coevolution was developed. In a pioneering study, Ehrlich and Raven (6) observed that
related groups of butterflies were feeding on related groups of plants and speculated
that this was due to a process for which they coined the name “coevolution.” Using
butterflies as their example, they argued that plants had evolved defenses against
herbivores, which in turn had evolved new ways to feed on plants. Decades on,
phylogenetic analyses have shown that these plants evolved in the absence of
butterflies, which colonized the diverse group of plants after their chemical defenses
were already in place (7). Nevertheless, the theory of coevolution was endorsed, and
two important points came to light. First, care must be taken when inferring
coevolution from seemingly parallel lines of evolution, and where possible, divergence
times and common ancestry should be taken into account. Second, coevolution can
occur between communities of organisms (“guild” coevolution), as observed in the
case of flowering plants, where predation and pollination by a wide variety of
insects likely influenced the diversification of angiosperms (8).
Since coevolution can occur across multiple levels of interactions, multiple theories
have also developed. The Red Queen theory is based on the concept of antagonistic
coevolution and assumes that an adaptation that increases the fitness of one species
will come at a cost to the fitness of another (9). This type of coevolution has been
most pronounced in host-parasite interactions, where the antagonistic interactions are
closely coupled (10). However, coevolutionary patterns may also arise in the case of
mutualistic symbioses, which require reciprocal adaptations to the benefit of each
partner (11). Mutualistic coevolution is associated with a number of key traits that are
discussed further in this review, such as obligate symbiosis, vertical inheritance, and
metabolic collaboration. Finally, coevolution has recently been placed in the context of
hologenome theory (12), which suggests that the holobiont can act as a unit of
selection (but not necessarily as the primary unit) since the combined genomes
influence the host phenotype on which selection may operate (13, 14). However,
hologenome theory also acknowledges that selection acts on each component of the
holobiont individually as well as in combination with other components (including the
host). Thus, the entity that is the hologenome may be formed, in part, through
coevolution of interacting holobiont compartments, in addition to neutral processes
(12).
Given the ubiquitous nature of host-microbe associations and the huge metabolic
potential that microorganisms represent, it is not surprising that evidence of host-
microbe coevolution is emerging. Model representatives of both simple and complex
associations are being used to study coevolution, allowing researchers to look for
specific traits, signals, and patterns (1, 15). A well-known model system is the pea aphid
and its endosymbiotic bacteria in the genus Buchnera. This insect has evolved specialized
cells known as bacteriocytes to host its endosymbionts, which in turn synthesize
and translocate amino acids that are missing from the pea aphid’s diet (16).
Amino acid synthesis occurs through intimate cooperation between host and symbiont,
with some pathways missing from the host and some from the symbiont, such that the
relationship is obligate to the extent that one organism cannot survive without the
other (17). Among complex systems, the human gut microbiome has been studied extensively
and has been shown to be intimately associated with human health. Gut
microbes have been shown to be linked with human behavior and development
through metabolic processes, such as microbial regulation of the essential amino acid
tryptophan (18, 19). The human microbiome contains around 150-fold more nonredundant
genes than the human genome (20), and the metabolic capacity of microbes
residing in the intestine is believed to have been a driving evolutionary force in the
host-microbe coevolution of humans (2). In these examples, as well as many others
(21–23), both host and symbiont evolved to maintain and facilitate the symbiosis.
Furthermore, phylogenies of host and symbiont in these systems are often mirrored,
indicating that host and symbiont are diverging in parallel (16, 24, 25), a phenomenon
known as codivergence (26).
In the marine environment, invertebrates can host microbial communities as simple
and stable as that of the pea aphid or as complex and dynamic as that of the human
gut (Fig. 1). The Hawaiian bobtail squid, for example, maintains an exclusive symbiosis
with a single bacterial symbiont that it hosts within a specialized light organ (27). On
the other hand, corals host enormously diverse microbial communities, comprising
thousands of species-level operational taxonomic units (OTUs), which are often influenced
by season, location, host health, and host genotype (28–31). Marine sponges also
host complex microbial communities with diversity comparable to that of corals (32)
but with associations that are generally far more stable in space and time (33).
Less-diverse microbial communities are found in the sea anemone Aiptasia, where the
number of OTUs is generally in the low hundreds (34). Due to the close taxonomic
relationship of Aiptasia with coral and its comparatively simple microbial community, it
has been proposed as a model organism for studying coral microbiology and symbiosis
(34). Some groups of marine invertebrates also include species spanning a continuum of
microbial diversity. Ascidians, for example, have been shown to host fewer than 10 (Polycarpa
aurata) or close to 500 (Didemnum sp.) microbial OTUs within their inner tunic (35).
Furthermore, species with low microbial diversity such as P. aurata can exhibit high
intraspecific variation, with as few as 8% of OTUs shared among individuals of the same
species (35). Taken together, the data from those studies highlight the vast spectrum
of associations that marine invertebrates form with microbial communities in terms of
diversity, composition, and stability (Fig. 1).
FIG 1 Spectrum of microbial diversity associated with different compartments of marine invertebrates. Microbial associations may involve a single symbiont
in a specialized organ or over 1,000 operational taxonomic units (OTUs) associated with tissues. The OTU numbers reported in the figure are the highest
recorded in the referenced study for that species. Reported levels of diversity may differ significantly within the same species across different studies.
While previous research has provided a good understanding of the composition of
marine invertebrate microbiomes, our understanding of how the microbiome interacts
with the host, and of its potential to coevolve, is far more limited. Moreover, the
tremendous volumes of host-associated microbiome sequence data generated by an
increasing number of studies require theoretical development to interpret these
relationships. Coevolved microbial symbionts are presumed to be intimately linked with
host fitness and metabolism (36); therefore, understanding these relationships in
marine invertebrates will have direct implications for health and disease processes in
these animals. Three research criteria arise for examining coevolution in marine
invertebrates: (i) identifying stochastic and deterministic components of the
microbiome, (ii) assessing codivergence of host and microbe, and (iii) confirming an
intimate association between host and microbe related to shared metabolic function
(metabolic collaboration). While each of these criteria may be fulfilled without the
involvement of coevolution (26, 37, 38), evidence that all three are met in combination
provides a strong basis for establishing coevolution patterns (Fig. 2). This review
presents these three criteria as a complementary approach to the study of
complex marine invertebrate microbiomes, drawing on examples from model systems.
Focusing on keystone coral reef invertebrates, this review also
evaluates the current evidence for each criterion. Finally, while parasites and pathogens
also contribute to host coevolution, the focus of this review is mutualistic symbionts;
thus, pathogens and parasitism are not discussed.
BOX 2: GLOSSARY
(i) Codivergence. Two organisms that speciate or diverge in parallel, as illustrated
by topological congruency of phylogenetic trees.
(ii) Coevolution. Reciprocal adaptation of one (or more) lineage(s) in response to
another (or others).
(iii) Holobiont. A host organism and its associated microbial community.
(iv) Hologenome. The collective genomes of a host and its associated microbial
community, which may act as a unit of selection or at discrete levels.
(v) Metabolic collaboration. Two or more organisms that are linked through
metabolic interactions, generally to the benefit of one another.
(vi) Metagenome. The collective microbial genes recovered from an environmental
sample, usually predominantly prokaryotic.
(vii) Metatranscriptomics. Quantification of the total microbial mRNA in a sample
as an indication of gene expression and active microbial functions.
(viii) Microbiome. The total genetic make-up of a microbial community associated
with a habitat.
(ix) Microbiota. The community of microorganisms residing in a particular habitat,
usually a host organism.
(x) Phylosymbiosis. The retention of a host phylogenetic signal within its
associated microbial community.
(xi) Virome. The total viral genetic content recovered from an environmental
sample.
UNTANGLING PATTERNS OF HOST-MICROBE COEVOLUTION IN A WEB OF
MICROBES
(i) Phylosymbiosis and neutral theory—identifying stochastic and deterministic
components of the microbiome. Host-microbe coevolution may occur to some
degree at the level of the hologenome, i.e., reciprocal evolution of the host genome
and microbiome (12). Therefore, it is necessary to understand microbial community
structure and population dynamics within the host environment. Such analyses may reveal (i)
that the microbiome associated with a host is structured through phylogenetically
related host traits and may therefore retain a host phylogenetic signal (phylosymbiosis)
and (ii) that certain microbes deviate from the expected patterns of neutral population
dynamics, i.e., stochastic births, deaths, and immigration. It is likely that phylosymbiosis
and neutral population dynamics are linked; therefore, their potential to contribute
to coevolution is discussed together.
[FIG 2 artwork: (a) host phylogeny of species A to D alongside a microbial dendrogram; (b) neutral model plot of frequency of microbes in host (0 to 1) against relative abundance of microbes in host sample (low to high), with Bacteria spp. 1 and Bacteria spp. 2 marked; (c) host phylogeny alongside the phylogeny of Bacteria spp. 1; (d) Host spp. A and Bacteria spp. 1 with the pathway homocysteine + serine (host diet and metabolism) → cystathionine via cystathionine β-synthase (symbiont enzyme) → cysteine via cystathionine γ-lyase (host enzyme).]
FIG 2 Hypothetical scenario addressing three criteria for host-microbe coevolution in species A to D. (a) Phylosymbiosis
shown through hierarchical clustering of the microbial community, resulting in a microbial dendrogram which mirrors host
phylogeny. (b) Neutral model showing the expected occurrence of microbes based on neutral population dynamics (blue
line). As the relative abundance increases, so too does the expected occurrence in host samples. The members of bacterial species
group 1 (Bacteria spp. 1) occur in more host samples than would be expected by chance given their abundance and may indicate active selection,
while the members of Bacteria spp. 2 occur in fewer host samples than expected. (c) Codivergence of the members of Bacteria spp. 1 with their hosts.
The members of Bacteria spp. 1 are found within the microbial community of each host species and appear to be actively
selected for. Their phylogeny indicates a host split at the strain level followed by diversification within each host species.
Congruence between host and microbial lineages suggests important host-microbe interactions and warrants further
investigation. (d) Metabolic collaboration between the members of Host spp. A and those of Bacteria spp. 1. Fluorescence
in-situ hybridization (FISH) confirms that the members of Bacteria spp. 1 are located within bacteriocyte cells in the tissues
of Host spp. A. Genome and transcriptome data for each species suggest that the amino acid cysteine is produced by the
activity of a metabolic pathway shared between host and microbe. In corals of the genus Acropora, for example, the host
genome lacks a complete cysteine biosynthesis pathway, which represents a potential route for host-microbe collaboration
(101). Hypothetically, the amino acids homocysteine and serine (potentially sourced from host diet and
metabolism) are combined to form cystathionine by the enzyme cystathionine β-synthase (provided by the host’s
endosymbiont). The host enzyme cystathionine γ-lyase then breaks down cystathionine to form cysteine.
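The complementation logic in panel d can be checked mechanically once gene annotations are available: a pathway step is covered by the holobiont if either partner encodes its enzyme, and a pathway that is complete only when both gene sets are pooled points to candidate metabolic collaboration. The minimal Python sketch below assumes annotations already reduced to sets of enzyme names; the `Step` class, the `complementation_report` function, and the toy gene sets are hypothetical illustrations, not data from the Acropora genome or from reference 101.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One reaction: substrates -> product, catalysed by `enzyme`."""
    substrates: tuple
    product: str
    enzyme: str


# Hypothetical encoding of the two-step route to cysteine sketched in Fig. 2d.
CYSTEINE_PATHWAY = (
    Step(("homocysteine", "serine"), "cystathionine", "cystathionine beta-synthase"),
    Step(("cystathionine",), "cysteine", "cystathionine gamma-lyase"),
)


def complementation_report(pathway, host_enzymes, symbiont_enzymes):
    """Label each step by which partner encodes its enzyme.

    Returns (report, complete): `report` maps enzyme -> label, and `complete`
    is True only if every step is encoded by the host, the symbiont, or both.
    """
    report = {}
    for step in pathway:
        in_host = step.enzyme in host_enzymes
        in_symbiont = step.enzyme in symbiont_enzymes
        if in_host and in_symbiont:
            report[step.enzyme] = "both"
        elif in_host:
            report[step.enzyme] = "host only"
        elif in_symbiont:
            report[step.enzyme] = "symbiont only"
        else:
            report[step.enzyme] = "missing"
    complete = all(label != "missing" for label in report.values())
    return report, complete


# Toy annotations mirroring the hypothetical scenario (not real genome data).
host = {"cystathionine gamma-lyase"}
symbiont = {"cystathionine beta-synthase"}

report, complete = complementation_report(CYSTEINE_PATHWAY, host, symbiont)
_, host_alone_complete = complementation_report(CYSTEINE_PATHWAY, host, set())

for step in CYSTEINE_PATHWAY:
    reaction = " + ".join(step.substrates) + " -> " + step.product
    print(f"{reaction}: {step.enzyme} ({report[step.enzyme]})")
print("complete in holobiont:", complete)               # True
print("complete in host alone:", host_alone_complete)   # False
```

Steps labelled "host only" or "symbiont only" in a pathway that is complete only for the holobiont are the candidates for collaboration; confirming them would still require localization (e.g., FISH) and expression data, as the caption describes.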
The term “phylosymbiosis” is not intended to imply coevolution (12, 38); however,
coevolution of a host and microbiome may reinforce patterns of phylosymbiosis. There
are many host traits that correlate with host phylogeny, some of which can act as
environmental filters, preventing the establishment of certain microbes in the host environment.
Thus, neutral population dynamics, with host traits acting as an ecological filter
on microbial immigration, may be sufficient to produce phylosymbiotic patterns (39, 40).
However, host traits are not static; thus, the evolution of these microbial niches may
further drive the radiation of the microbes that reside within them. In turn,
continuous colonization by a microbial community over many generations likely adds
to the selective pressure on host traits. Therefore, ecological filtering of microbes
through host traits and coevolution of a host and microbiome need not be mutually
exclusive explanations for phylosymbiosis (39). Moreover, assessing patterns of
phylosymbiosis and neutral population dynamics also allows the detection of microbes
that deviate from these patterns and may identify important microbial species that are
actively selected for (or against) by the host. In this context, neutral models can
simulate the expected occurrence of microbes given their abundance, allowing easier detection of microbes that do
not fit these patterns (41). This reasoning justifies consideration of phylosymbiosis and
microbial population dynamics in assessing coevolution in complex holobionts.
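The neutral-model comparison described above can be sketched in code. The stdlib-only Python example below assumes the widely used Sloan-type neutral community model, in which the predicted fraction of hosts carrying a taxon with mean relative abundance p is 1 − I_{1/N}(Nmp, Nm(1 − p)), where I is the regularized incomplete beta function, N is the number of reads per host sample (treated as constant), and m is the immigration probability. Whether reference 41 uses exactly this formulation is an assumption; m is also fixed here rather than fitted, and the taxon names and numbers are illustrative only. Taxa whose observed occurrence falls outside an approximate 95% Wilson interval around the prediction are flagged, mirroring Fig. 2b.

```python
import math

TINY = 1e-300  # keeps the continued fraction away from division by zero


def _guard(value):
    return value if abs(value) > TINY else TINY


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / _guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even term
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd term
        d = 1.0 / _guard(1.0 + aa * d)
        c = _guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def neutral_occurrence(p, n_reads, m):
    """Predicted fraction of hosts in which a taxon of mean relative abundance p is detected."""
    nm = n_reads * m
    pred = 1.0 - reg_inc_beta(nm * p, nm * (1.0 - p), 1.0 / n_reads)
    return min(max(pred, 0.0), 1.0)


def wilson_interval(phat, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion over n hosts."""
    denom = 1.0 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z / denom * math.sqrt(phat * (1.0 - phat) / n + z * z / (4 * n * n))
    return centre - half, centre + half


def classify_taxa(taxa, n_reads, m, n_hosts):
    """Label taxa 'above', 'below' or 'neutral' relative to the neutral prediction."""
    results = {}
    for name, (mean_abundance, observed) in taxa.items():
        predicted = neutral_occurrence(mean_abundance, n_reads, m)
        low, high = wilson_interval(predicted, n_hosts)
        if observed > high:
            label = "above"  # more widespread than neutral: candidate selected taxon
        elif observed < low:
            label = "below"  # less widespread than the neutral expectation
        else:
            label = "neutral"
        results[name] = (predicted, label)
    return results


# Illustrative input: taxon -> (mean relative abundance, fraction of hosts where detected).
taxa = {"Bacteria spp. 1": (1e-5, 1.0), "Bacteria spp. 2": (0.05, 0.2)}
results = classify_taxa(taxa, n_reads=10_000, m=0.1, n_hosts=20)
for name, (predicted, label) in results.items():
    print(f"{name}: predicted {predicted:.2f}, observed {taxa[name][1]:.2f} -> {label}")
```

In practice m would first be fitted to the occurrence-abundance relationship across all taxa (for example, by nonlinear least squares) before outliers are flagged; fixing it here keeps the sketch short.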
Patterns of phylosymbiosis are frequently detected in complex holobionts. One
particular study tested for phylosymbiosis across 24 species of terrestrial animals from
4 groups that included Peromyscus deer mice, Drosophila flies, mosquitos, and Nasonia
wasps and an additional data set of 7 hominid species (42). Since these animals (with
the exception of hominids) could be reared under controlled laboratory conditions,
environmental influences could be largely controlled, leaving the host as the main factor
influencing the microbial community. Under these conditions, phylosymbiotic patterns
were clearly observed for all five groups, with phylogenetically related taxa sharing
similar microbial communities and microbial dendrograms mirroring host phylogenies.
Similar patterns of phylosymbiosis have been observed in a growing number of
terrestrial systems, including all five gut regions in rodents (43), the skin of ungulates
(44), the distal gut in hominids (45), and roots of multiple plant phyla (46), providing
evidence that such patterns are common among host-associated microbiomes.
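The core phylosymbiosis comparison, a microbial dendrogram set against the host phylogeny, can be illustrated with a toy example. The Python sketch below assumes Bray-Curtis dissimilarity between host species' microbial profiles and average-linkage (UPGMA) clustering to build the dendrogram, then scores topological agreement with the host tree using a normalized rooted Robinson-Foulds distance (0 means identical topologies). These method choices, the abundance profiles, and the host tree are illustrative assumptions; the cited studies (42 to 46) used larger data sets, several dissimilarity metrics, and permutation-based significance tests.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(profiles):
    """Average-linkage (UPGMA) clustering of host profiles into a nested-tuple dendrogram."""
    clusters = {(name,): name for name in profiles}  # member tuple -> subtree
    dist = {
        frozenset((a, b)): bray_curtis(profiles[a[0]], profiles[b[0]])
        for a, b in combinations(clusters, 2)
    }
    while len(clusters) > 1:
        a, b = sorted(min(dist, key=dist.get))  # closest pair, in a stable order
        merged = a + b
        subtree = (clusters.pop(a), clusters.pop(b))
        # Keep distances not involving a or b; add size-weighted averages to the new cluster.
        new_dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        for c in clusters:
            new_dist[frozenset((merged, c))] = (
                len(a) * dist[frozenset((a, c))] + len(b) * dist[frozenset((b, c))]
            ) / len(merged)
        dist = new_dist
        clusters[merged] = subtree
    return next(iter(clusters.values()))


def clades(tree):
    """Tip sets of all internal nodes of a rooted nested-tuple tree."""
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def normalized_rf(tree1, tree2):
    """Rooted Robinson-Foulds distance over non-root clades, scaled to [0, 1]."""
    c1, c2 = clades(tree1), clades(tree2)
    root = frozenset().union(*c1)
    c1.discard(root)  # the root clade is shared by definition
    c2.discard(root)
    total = len(c1) + len(c2)
    return len(c1 ^ c2) / total if total else 0.0


# Hypothetical host tree for species A to D and toy OTU abundances per host species.
host_tree = (("A", "B"), ("C", "D"))
profiles = {
    "A": [30, 25, 5, 0, 1],
    "B": [28, 30, 4, 1, 0],
    "C": [2, 1, 35, 20, 10],
    "D": [0, 3, 30, 25, 12],
}

microbial_tree = upgma(profiles)
print("microbial dendrogram:", microbial_tree)
print("normalized RF distance to host tree:", normalized_rf(host_tree, microbial_tree))
```

A distance of 0 here corresponds to the mirrored topologies in Fig. 2a; as the text cautions, such agreement alone does not show active selection or cospeciation.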
In the marine environment, two major studies, one involving 236 colonies across 32
genera of scleractinian coral collected from the east and west coasts of Australia (47)
and the other involving 804 samples of 81 sponge species collected from the Atlantic
Ocean, Pacific Ocean, and Indian Ocean and the Mediterranean Sea and Red Sea (32),
have provided the most convincing examples of phylosymbiosis. Both studies found a
significant evolutionary signal of the host with respect to microbial diversity and
composition. Specifically, Mantel tests showed that closely
related corals and sponges hosted microbial communities more similar
in composition than would be expected by chance. In the case of corals, this
signal was seen in the skeleton and, to a lesser extent, in the tissue microbiome,
while the mucus microbiome was more strongly influenced by the surrounding environment
(47). However, both studies found that host species was the strongest factor in
explaining dissimilarity among microbial communities. Additional studies on both cold-water
and tropical sponges have found similar phylogenetic patterns within the
microbiome of the host species (48, 49). Together, these results suggest that host
phylogeny (or associated traits) has a significant role in structuring associated microbial
communities, although there are additional factors related to host identity (and unrelated
to phylogeny) that also likely play a major role.
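A Mantel test asks whether two distance matrices over the same hosts are correlated more strongly than expected when the labels of one matrix are randomly permuted. The stdlib-only Python sketch below assumes a host phylogenetic distance matrix and a microbial community dissimilarity matrix for five hypothetical hosts, uses the Pearson statistic, and enumerates all 120 label permutations exactly instead of sampling them. The matrices and these choices are illustrative assumptions; the cited studies (32, 47) may have used different distance measures, statistics, and permutation schemes.

```python
import math
from itertools import combinations, permutations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle of a square matrix, reindexing rows/cols by `order`."""
    return [matrix[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel_exact(d_host, d_microbe):
    """Mantel r and one-sided p-value over all permutations of the microbial labels."""
    identity = list(range(len(d_host)))
    x = upper_triangle(d_host, identity)
    r_obs = pearson(x, upper_triangle(d_microbe, identity))
    perms = list(permutations(identity))
    hits = sum(
        1 for perm in perms
        if pearson(x, upper_triangle(d_microbe, perm)) >= r_obs - 1e-12  # tolerate float ties
    )
    return r_obs, hits / len(perms)


# Hypothetical patristic distances for hosts A to E on the tree ((A, B), (C, (D, E))).
d_host = [
    [0, 2, 8, 8, 8],
    [2, 0, 8, 8, 8],
    [8, 8, 0, 4, 4],
    [8, 8, 4, 0, 2],
    [8, 8, 4, 2, 0],
]
# Hypothetical Bray-Curtis dissimilarities between the hosts' microbial communities.
d_microbe = [
    [0.00, 0.20, 0.85, 0.80, 0.90],
    [0.20, 0.00, 0.80, 0.88, 0.82],
    [0.85, 0.80, 0.00, 0.40, 0.45],
    [0.80, 0.88, 0.40, 0.00, 0.25],
    [0.90, 0.82, 0.45, 0.25, 0.00],
]

r, p = mantel_exact(d_host, d_microbe)
print(f"Mantel r = {r:.3f}, exact permutation p = {p:.4f}")
```

With these toy matrices, only the four label swaps that leave the host tree unchanged (A with B, D with E, or both) match the observed r, so p = 4/120. Real data sets have too many hosts to enumerate, which is why Mantel tests normally sample random permutations.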
Most studies to date have focused on the microbes that adhere to these patterns of
phylosymbiosis, although arguably more useful information could be gained from
the microbes that do not. Since phylosymbiosis is a pattern that shows correlations
between microbiome dissimilarity and host phylogeny, it does not indicate active
microbial selection or cospeciation (38), and the species that deviate from these
patterns would be interesting targets for studies of codivergence and metabolic
collaboration (see below). Neutral models have been applied to three species of
sponges, a jellyfish, and a sea anemone, and while neutral models have been shown to
fit well to the expectation of microbial abundance in sponges (which also show
phylosymbiosis), jellyfish and sea anemone microbiomes were found to be associated
with a higher level of nonneutrality (40). Potential reasons for this nonneutrality include
a more sophisticated immune system in cnidarians that actively selects for
certain microbial taxa, more transient microbiomes in these hosts,
or a combination of the two. In summary, neutral population dynamics filtered
through phylogenetically related host traits likely result in, or at least contribute to, the
observed patterns of phylosymbiosis. This does not necessarily mean that the pattern
is unimportant or is not contributing to coevolution at the hologenome level, and it
may be that the communities of microbes that follow these patterns are responsible for
broad ecological functions (50). On the other hand, microbes that deviate from these
patterns may be responsible for more-specific functions and are of high interest to
those trying to identify symbionts and coevolution at the microbial species or strain
level.
(ii) Codivergence—microbial phylogeny and host phylogeny are congruent.
The second criterion in assessing host-microbe coevolution is whether individual
microbial lineages and their hosts have matching phylogenies (22, 24, 51). Codivergence
implies a tightly coupled, long-term interaction between two species and can
potentially identify beneficial symbionts (or parasites) that have coevolved with the
host (26). However, it is important to note that codivergence can arise from processes
other than coevolution, such as one species adaptively tracking another (in which case
the evolution is not reciprocal) or two species responding independently to
the same speciation event or environmental stress (37). In known cases of coevolution,
phylogenies of hosts and their microbial symbionts are congruent (16, 51, 52). In
complex and uncharacterized systems, this reasoning can be reversed, using
phylogenetic congruence to flag potential symbionts. Therefore, the main value of
investigating codivergence in complex associations is to identify the specific microbes on which to focus further
attention.
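One simple way to quantify the congruence discussed above is to compare the topology of a host tree with that of a candidate symbiont tree whose tips are mapped to their host species, and then ask how often randomly reassigning hosts to symbiont tips matches the host tree at least as well. The Python sketch below assumes rooted, fully resolved trees written as nested tuples, one symbiont strain per host, and the rooted Robinson-Foulds distance as the congruence score. It is a simplified topology-based permutation test, not the distance-based (e.g., ParaFit) or event-based reconciliation methods commonly used in codivergence studies, and the trees and strain names are hypothetical.

```python
from itertools import permutations


def clades(tree):
    """Tip sets of all non-root internal nodes of a rooted nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.append(tips)
        return tips

    root = walk(tree)
    return {c for c in found if c != root}


def relabel(tree, mapping):
    """Replace each tip name using `mapping` (symbiont strain -> host species)."""
    if isinstance(tree, str):
        return mapping[tree]
    return tuple(relabel(child, mapping) for child in tree)


def rf_distance(t1, t2):
    """Rooted Robinson-Foulds distance: clades found in one tree but not the other."""
    return len(clades(t1) ^ clades(t2))


def congruence_test(host_tree, symbiont_tree, strain_to_host):
    """Observed RF distance and the exact fraction of reassignments scoring as well or better."""
    observed = rf_distance(host_tree, relabel(symbiont_tree, strain_to_host))
    strains = sorted(strain_to_host)
    hosts = [strain_to_host[s] for s in strains]
    scores = [
        rf_distance(host_tree, relabel(symbiont_tree, dict(zip(strains, perm))))
        for perm in permutations(hosts)
    ]
    p_value = sum(score <= observed for score in scores) / len(scores)
    return observed, p_value


# Hypothetical host phylogeny for species A to F and a symbiont tree, one strain per host.
host_tree = (("A", "B"), (("C", "D"), ("E", "F")))
symbiont_tree = (("s1", "s2"), (("s3", "s4"), ("s5", "s6")))
strain_to_host = {"s1": "A", "s2": "B", "s3": "C", "s4": "D", "s5": "E", "s6": "F"}

observed, p_value = congruence_test(host_tree, symbiont_tree, strain_to_host)
print(f"RF distance = {observed}, permutation p = {p_value:.3f}")
```

With these hypothetical trees, 16 of the 720 reassignments reproduce the host topology exactly, so p is about 0.022. With only four hosts arranged as ((A, B), (C, D)), the same test could never fall below 8/24, one reason codivergence analyses benefit from broad host sampling.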
Codivergence has been demonstrated in the case of Hydra viridissima, a freshwater
relative of marine cnidarians, and its photosymbiont Chlorella (53). In this system,
photosynthetically fixed carbohydrates from Chlorella are transported to its host (54),
and phylogenetic analysis of 6 strains of H. viridissima and their vertically transmitted
symbionts revealed clear congruency of host and symbiont topologies (55). In more-
complex systems, patterns of codivergence have been illustrated in the gut microbiota
of hominids (25). Analysis of fecal samples from humans, wild chimpanzees, wild
bonobos, and wild gorillas showed that four clades of bacteria from the dominant
families Bacteroidaceae and Bifidobacteriaceae codiverged with host phylogeny. Importantly,
this example illustrates one possible way of identifying codivergence in complex
holobionts where the symbionts are unknown. Since bacteria from the families
Bacteroidaceae, Bifidobacteriaceae, and Lachnospiraceae are known to dominate the gut of
hominids, multiple primer sets targeting each individual family were utilized, and
phylogenetic analyses of the families were completed independently. Furthermore,
instead of using the relatively slowly diverging 16S rRNA gene, the fast-evolving and
variable gene encoding DNA gyrase subunit B was used for bacterial phylogenetics.
Similar methods may be applied to complex marine invertebrates such as coral and
sponges, where 16S rRNA gene studies have identified prominent bacteria.
Within complex marine invertebrate holobionts, codivergence has been most clearly
demonstrated in cold-water sponges in the family Latrunculiidae. The microbiomes of
six species within this family were dominated by a single betaproteobacterial OTU, and
the phylogeny of this OTU was highly congruent with that of the host (56). Furthermore,
gene expression analysis suggested that the dominant betaproteobacteria are
active members of the microbiome rather than dormant or nonviable members;
however, whether or not this potential symbiont and its host participate in metabolic
collaboration is unknown, highlighting an example warranting further investigation.
The microbiomes of many other marine invertebrates are dominated by members of
the genus Endozoicomonas (57). A pan-genomic analysis of the genomes of seven
Endozoicomonas strains representing a broad range of hosts (corals, sponges, and sea
slugs) provided some evidence for codivergence (58). Strikingly, the two closely related
corals Stylophora pistillata and Pocillopora verrucosa hosted Endozoicomonas with
highly similar genomes. A second, large-scale study (47) found that Endozoicomonas
species within the coral tissues showed strong signals of codivergence with their hosts;
however, they were grouped into two major divisions, namely, the host-specific and
host-generalist divisions. The presence of a host-generalist clade may partly explain
why the patterns of codivergence did not hold when samples of S. pistillata and P.
verrucosa were collected across 28 reefs worldwide (59). Furthermore, the genome of
Endozoicomonas is large and appears to be adapted to a planktonic lifestyle (57).
A free-living stage in the Endozoicomonas life cycle would suggest a facultative
relationship with corals, which would limit the extent of codivergence.
Codivergence may also occur between two symbionts within the microbial com-
munity associated with a single host. An interesting example occurs in lower termites,
which live in a symbiotic relationship with flagellate protozoa that are essential for the
breakdown of lignocellulose obtained from wood particles (60). Within the hindgut,
these flagellate protozoa are associated with endosymbiotic prokaryotes, and while the
functional basis of this relationship is unclear, matching phylogenies of flagellate host
and prokaryote symbiont indicate codivergence (61). The microbiomes of many marine
invertebrates also include both eukaryotes and prokaryotes that appear to closely
interact with one another. For example, the symbiotic algae Symbiodiniaceae, which
reside in the endoderm of the coral tissue, are producers of dimethylsulfoniopropionate
(DMSP), which is thought to be metabolized by bacteria within the holobiont (62).
Symbiodiniaceae and bacteria are also linked through the nitrogen cycle, where
diazotrophs within the holobiont are postulated to fix nitrogen such that it can be used
by the endosymbiotic algae (63, 64). Furthermore, the existence of a core microbiome
associated with Symbiodiniaceae appears likely, with bacteria affiliated with Marinobacter,
Labrenzia, and Chromatiaceae present across 18 cultures of Symbiodiniaceae spanning
5 genera (65). A range of other marine invertebrates, including soft corals, sponges, and
molluscs, also host Symbiodiniaceae, and it would be valuable to investigate whether
Symbiodiniaceae show codivergence and coevolution with prokaryotes in these sys-
tems.
(iii) Metabolic collaboration—intimate association between host and microbe.
A third key feature of coevolution is that host and microbe collaborate in a way that is
mutually beneficial (15). This is often related to the metabolic function of the microbe,
with the host facilitating or complementing that function. This could be in the form of
a specialized cell or organ to host microbial symbionts (27), a shared metabolic
pathway to produce essential vitamins or amino acids (17), or microbial regulation of
certain metabolites produced by the host (19). Metabolic collaboration should be
validated where potential candidates for coevolution have been identified through
population dynamics and codivergence, as reciprocal evolution necessitates an inter-
action between the two species. A key step in demonstrating an interaction, and
therefore identifying potential reciprocal evolution, is to examine the genomes and
transcriptomes of the host and symbionts for evidence of integrated metabolism,
combined with targeted in situ visualization of metabolite passage to support the
metabolic collaboration.
Sharpshooters, a group of xylem-feeding insects, provide an elegant example of
metabolic collaboration between a host and bacterial symbionts. Sharpshooters host
two microbial symbionts, Baumannia cicadellinicola and Sulcia muelleri, in their special-
ized bacteriocyte cells (36), and both symbionts show patterns of codivergence with
their host (66). The genomes of B. cicadellinicola and S. muelleri predict the synthesis of
vitamins and essential amino acids, respectively, which are deficient in the diet of
sharpshooters (23). Furthermore, these two symbionts not only appear to complement
each other in terms of their roles in supplementing the host diet, but each symbiont
also appears dependent on the other. Circumstantial evidence suggests that similar
functional relationships may exist among marine invertebrates, and the characteriza-
tion of these should be a high priority.
Sponges provide some examples of metabolic collaboration in complex marine invertebrate
holobionts. Genome and transcriptome data from Cymbastela
concentrica and two of its bacterial symbionts (novel genomes of the Phyllobacteriaceae
and Nitrosopumilales) suggest that creatine and creatinine produced by sponge me-
tabolism are likely to be degraded to the amino acid glycine by its symbionts (67).
Furthermore, gene expression data suggest that the urea produced by creatine deg-
radation by the Phyllobacteriaceae symbiont may be transported and degraded by a
third bacterial symbiont in the genus Nitrospira (67). The potential for metabolic
collaboration also exists between the sponge Theonella swinhoei and its symbiont
belonging to “Candidatus Entotheonella.” The genome of “Ca. Entotheonella” possesses
the repertoire for production of almost all amino acids as well as rare coenzymes;
however, additional research is needed to understand if these products are used by the
host (68). While the following does not constitute metabolic collaboration, sponge
symbionts also appear to interact with their host through eukaryote-like proteins (ELPs).
For example, microbial symbionts associated with different sponges often contain
genes coding for ELPs, some of which are phylogenetically similar to those found in
sponges and appear to inhibit phagocytosis (69, 70). Furthermore, additional functional
domains associated with ELPs suggest that these proteins are transported to the outer
membrane, where they are maintained and potentially used in bacterium-host inter-
actions (71). A symbiosis maintained through host-bacterium interactions such as this
emphasizes the potential for coevolution to take place, although it does not in itself
demonstrate reciprocal evolution. Finally, characterizations based on metagenomic and
metatranscriptomic data sets require functional validation using techniques such as
stable isotope probing (SIP) (for a review, see reference 72). For example, using ¹⁴C- and
¹³C-labeled bicarbonate in combination with autoradiography and nanoscale secondary
ion mass spectrometry (nanoSIMS), symbionts of the colonial ciliate Zoothamnium
niveum were shown to fix inorganic carbon and translocate organic carbon to their host
(73). With the advent of new technology associated with SIP, future research would benefit
from validating the putative microbial functions implied by genomic research.
A core microbial community, i.e., one that has high intraspecies stability, is often the
primary focus of microbial ecologists trying to distinguish functionally important taxa
from commensals or short-term visitors (74). While a few bacterial lineages have been
shown to occur across a large number of corals and other invertebrate species (57, 75),
evidence of the existence of a defined and stable core community remains elusive.
From a taxonomic perspective, a core community may not exist; instead, a core
functional capacity may exist across diverse lineages. In marine sponges, for example,
different host species associate with different symbionts that perform equivalent
functions (95). Namely, host-specific microbes among different sponge species appear
to use different enzymes to perform the same functions in processes such as denitri-
fication and ammonium oxidation. However, functional redundancy in microbial eco-
systems may not be as common as previously thought, as rare microbial phylotypes
have been implicated in specific microbial pathways, while more-abundant phylotypes
are positively correlated with broader metabolic functions such as respiration (50). This
may have important implications for analyses of neutral population dynamics, as the
rare taxa that are present more often than expected could be responsible for key
microbial functions. The existence of a core community would have obvious implica-
tions for coevolution, as universally associated microbes are more likely to have
coevolved with their host. If present, reconstruction of phylogenetic relationships of
core taxa can illustrate whether microbes also diverge in parallel with their host, leading
to further investigations that utilize integrated genomic techniques to identify core
functional genes and pathways.
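The text defines a core community by "high intraspecies stability" but does not give a numeric criterion. A frequently used operational definition, assumed here, is prevalence: the fraction of replicate samples of one host species in which a taxon is detected. The function name, the 0.9 default cut-off, and the read-count threshold below are illustrative assumptions, not values taken from the cited studies.

```python
from typing import Dict, List, Set


def core_taxa(samples: List[Dict[str, int]], min_prevalence: float = 0.9,
              min_count: int = 1) -> Set[str]:
    """Return taxa detected in at least `min_prevalence` of the samples.

    samples: one dict per sample of a single host species, mapping taxon ID -> read count.
    min_prevalence: fraction of samples a taxon must occur in (0.9 is an arbitrary example).
    min_count: minimum read count for a taxon to be scored as present in a sample.
    """
    if not samples:
        return set()
    if not 0 < min_prevalence <= 1:
        raise ValueError("min_prevalence must be in (0, 1]")
    occurrences: Dict[str, int] = {}
    for sample in samples:
        for taxon, count in sample.items():
            if count >= min_count:
                occurrences[taxon] = occurrences.get(taxon, 0) + 1
    n = len(samples)
    # Keep taxa whose detection frequency meets the prevalence cut-off.
    return {taxon for taxon, k in occurrences.items() if k / n >= min_prevalence}
```

The result depends strongly on the chosen cut-off and on sequencing depth, so it is worth reporting how the core set changes across thresholds. Applying the same function to annotated functional genes rather than taxa would correspond to the idea of a core functional capacity discussed above.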
While research on the microbiome of marine invertebrates has focused mostly on
prokaryotes and microbial eukaryotes (box 3), there is increasing recognition of the
importance of viruses as components of the holobiont, adding to the complexity of an
already challenging system (76). Viruses are the most abundant biological entities in the
oceans (77) and are likely to play important roles in host-microbe coevolution, as
bacteria commonly acquire genes for symbiosis or pathogenicity through lateral gene
transfer from viruses (78). For example, the bacterium Hamiltonella defensa is a common
symbiont of aphids that provides defense against wasp parasitism. However, the toxin-encoding
genes required for aphid protection are present only after H. defensa has been infected
by a lysogenic lambdoid bacteriophage (79). Thus, host-symbiont coevolution may be made
possible by the initial acquisition of symbiont genes from viruses.
Furthermore, viruses structure bacterial communities through processes such
as cell lysis, thereby adding another form of selective pressure to invertebrate holo-
bionts (80). A recent study found that viral communities of corals and sponges are
specific to their host species and are distinct from the viral communities inhabiting the
surrounding seawater (81). Viruses of the order Caudovirales (tailed bacteriophages)
were found across all viromes in the study, often as the dominant member; a host-specific
virome combined with a host-specific microbiome could therefore be associated with
virus-mediated selective pressure on the microbial community. By influencing microbial
community structure in this way, viruses could have major effects on coevolution within the holobiont. The
extent to which viruses influence marine invertebrate holobionts is still unknown;
however, future research on reef holobionts would benefit from including analyses of
both the viral and prokaryotic communities.
BOX 3: SYMBIODINIACEAE—AN OBLIGATE SYMBIONT AND A COEVOLVED
PARTNER?
Dinoflagellates from the family Symbiodiniaceae (see reference 82 for revised
taxonomy) are common symbionts of many different marine invertebrates, including
cnidarians, sponges, molluscs, and protozoans (83). These photosynthetic dinofla-
gellates provide their host with fixed carbon and in return gain inorganic nutrients
and a suitable living environment, creating a remarkable symbiosis that is respon-
sible for the foundation of coral reef ecosystems (83, 84). The symbiotic lifestyle often
leads to a reduction of genome size, and, although the genomes of Symbiodiniaceae
are large by comparison with those of many other eukaryotic microbes, they are
among the smallest for dinoflagellates. The relatively small genomes typical of the
Symbiodiniaceae suggest some degree of adaptation to life inside the host (71),
despite the fact that many members of this family are known to have a free-living
stage (68, 69). An important exception to this life cycle is the dinoflagellate formerly
known as clade C15, which is vertically transmitted in coral hosts; culturing
experiments suggest that this strain is unlikely to survive outside the host
environment (85). Moreover, this symbiont appears to have lost its genomic potential
for motility, a likely adaptation to life inside a host (85).
Illustrating reciprocal adaptation of one lineage in response to another is extremely
challenging in complex symbiotic systems. While meeting the basic criteria set out in
this review does not prove coevolution, it would provide support for the idea of
coevolution in host-microbe systems where little is known about the evolutionary
origins. Applying these criteria is also likely to help differentiate obligate microbes from
transient members of the holobiont. Many factors need to be considered, including
common ancestry, the origins of the host-microbe association, and the estimated times
of divergence. The butterfly-plant example (box 1) highlights the need to rule out the
possibility that microbes colonized their host only after host diversification had taken
place. In the case of the aphid-Buchnera symbiosis, the origin of infection has been
dated at 150 to 250 million years ago (MYA), when aphids first diverged from a
common ancestor, and Buchnera form a monophyletic group that is exclusively asso-
ciated with aphids (16, 36). Within hominids, divergence times were calculated for gut
bacteria that show codivergence with their host and were found to coincide with host
evolution. Furthermore, the hominid-microbe association appears to have arisen from
a common ancestor of all African great apes.
Vertical versus horizontal microbial acquisition may also influence patterns of
evolution and should be considered within any study on host-microbe coevolution.
Generally speaking, microbes that are acquired vertically, i.e., passed from parent to
offspring, are more likely to have coevolved with their host. This is the case for many
insect endosymbionts, whose loss of a free-living stage and subsequent adaptation to the
host environment underlie many of the coevolutionary signals described above
(23, 36, 86). For example, Buchnera endosymbionts have been
passed from parent to offspring for over 100 million years and, as the endosymbiont
evolved, it lost many genes required for life outside the host (16). Such patterns may
be far more difficult to observe in microbes acquired from the environment (horizontal
transmission), because the selection pressures acting on horizontally acquired symbionts
include environmental forces that act in concert with host-imposed pressures.
Invertebrates such as cnidarians and sponges can
acquire microbial symbionts through both vertical transmission and horizontal trans-
mission (87–91), and focusing initially on vertically transmitted microbes would simplify
the search for coevolutionary signals.
Consideration of genetic markers and key traits of symbiosis could also be useful for
identifying potentially coevolved symbionts. For example, many vertically transmitted
endosymbionts have reduced genome sizes compared to their free-living relatives,
since many genes may become redundant during adaptation to the host environment
(36, 86). Some microbial symbionts are also housed in bacteriocytes or other specialized
compartments, and microbial aggregates resembling such associations have been
detected in both corals and sponges (102, 103). Microbes housed in these specialized
cells represent priority candidates in the search for coevolved relationships. Other
trends, such as lower G+C content, high isoelectric point values, and proteins that are
quickly evolving relative to those seen with free-living bacteria, are all features of insect
endosymbionts (23). Exploring these traits in more-complex systems may also have
some utility in the search for coevolved symbionts. Furthermore, observing support for
host-symbiont coevolution may require a careful choice of genetic markers, because
different genes diverge at different rates. In particular, it has been suggested that immune
genes should be targeted as they are rapidly evolving and likely to directly influence
the microbial community (92). Additionally, unresolved host and microbe genealogies
may further confuse patterns of host-microbe coevolution; thus, robust phylogenetic
trees and markers are critical to illustrate codivergence.
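Of the symbiont traits listed above, G+C content is the simplest to compute from assembled genomes. The sketch below is a minimal example, not a published pipeline. It computes the G+C fraction of a DNA sequence, assuming that ambiguous bases such as N should be excluded from the denominator, and reports how far a candidate symbiont's value lies from the mean of its free-living relatives. The function names are hypothetical.

```python
from statistics import mean
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T) in a DNA sequence."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    total = sum(counts.values())  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no unambiguous A/C/G/T bases")
    return (counts["G"] + counts["C"]) / total


def gc_shift(candidate: str, free_living_relatives: List[str]) -> float:
    """Candidate G+C minus the mean G+C of free-living relatives.

    Negative values mean the candidate is G+C-poor relative to its relatives.
    """
    return gc_content(candidate) - mean(gc_content(s) for s in free_living_relatives)
```

A negative shift is consistent with the lower G+C content typical of insect endosymbionts, but on its own it is weak evidence. It would need to be weighed together with genome size, isoelectric point profiles, and phylogenetic evidence from well-resolved markers.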
To begin investigating host-microbe coevolution in complex holobionts, it may be
useful to unify studies by investigating a number of model organisms. Marine sponges
present an ideal starting point for investigating coevolution in complex systems for a
variety of reasons. First, they may represent the earliest animal lineage to have diverged
and they host highly stable microbial communities, increasing the likelihood of discov-
ering coevolved symbionts. Second, metagenomic analyses in sponges are currently
better developed than in other marine invertebrates with complex microbiomes,
providing a solid platform with which to investigate coevolution. Third, some evidence
of coevolution already exists, with sponges exhibiting codivergence and metabolic
collaboration and some species hosting microbial cells within bacteriocytes. However,
as yet, no research has traced all the aforementioned traits to a single holobiont
species.
In this era of climate change and environmental degradation heavily impacting
marine ecosystems (93, 94), there is an urgent need to better understand the microbial
processes that underpin invertebrate health and evolution. Following the criteria set
out in this review will not only enable exploration of evidence for coevolution but also
provide a better understanding of how microbial communities are structured and
identify potentially beneficial symbionts that can be targeted using genomic techniques
to elucidate their specific roles within the holobiont.
We thank Hillary Smith (James Cook University [JCU]) for helpful contributions to
figure preparation and Nikolaos Andreakis (JCU) for helpful comments on the manu-
script outline. We also thank Pedro Frade (University of Algarve) for insightful discus-
sions on the manuscript.
P.A.O. conceived, designed, and drafted the manuscript and figures. D.G.B., N.S.W.,
and D.J.M. revised the manuscript and made substantial contributions to its design and
intellectual content. D.G.B. contributed to figure conception and preparation.
We declare that the review was conducted in the absence of any commercial or
financial relationships that could be construed as a potential conflict of interest.
1. Zaneveld J, Turnbaugh PJ, Lozupone C, Ley RE, Hamady M, Gordon JI,
Knight R. 2008. Host-bacterial coevolution and the search for new drug
targets. Curr Opin Chem Biol 12:109–114. https://doi.org/10.1016/j
.cbpa.2008.01.015.
2. Van den Abbeele P, Van de Wiele T, Verstraete W, Possemiers S. 2011.
The host selects mucosal and luminal associations of coevolved gut
microorganisms: a novel concept. FEMS Microbiol Rev 35:681–704.
https://doi.org/10.1111/j.1574-6976.2011.00270.x.
3. Archibald JM. 2015. Endosymbiosis and eukaryotic cell evolution. Curr
Biol 25:R911–R921. https://doi.org/10.1016/j.cub.2015.07.055.
4. McFall-Ngai M, Hadfield MG, Bosch TCG, Carey HV, Domazet-Lošo T,
Douglas AE, Dubilier N, Eberl G, Fukami T, Gilbert SF, Hentschel U, King
N, Kjelleberg S, Knoll AH, Kremer N, Mazmanian SK, Metcalf JL, Nealson
K, Pierce NE, Rawls JF, Reid A, Ruby EG, Rumpho M, Sanders JG, Tautz
D, Wernegreen JJ. 2013. Animals in a bacterial world, a new imperative
for the life sciences. Proc Natl Acad Sci U S A 110:3229–3236. https://
doi.org/10.1073/pnas.1218525110.
5. Friedman WE. 2009. The meaning of Darwin’s ‘abominable mystery’.
Am J Bot 96:5–21. https://doi.org/10.3732/ajb.0800150.
6. Ehrlich PR, Raven PH. 1964. Butterflies and plants: a study in coevolution.
Evolution 18:586–608. https://doi.org/10.2307/2406212.
7. Janz N, Nylin S. 1998. Butterflies and plants: a phylogenetic study.
Evolution 52:486–502. https://doi.org/10.1111/j.1558-5646.1998.tb01648.x.
8. Ryan MF, Byrne O. 1988. Plant-insect coevolution and inhibition of
acetylcholinesterase. J Chem Ecol 14:1965–1975. https://doi.org/10
.1007/BF01013489.
9. Van Valen L. 1974. Molecular evolution as predicted by natural selection.
J Mol Evol 3:89–101. https://doi.org/10.1007/BF01796554.
10. Paterson S, Vogwill T, Buckling A, Benmayor R, Spiers AJ, Thomson NR,
Quail M, Smith F, Walker D, Libberton B, Fenton A, Hall N, Brockhurst
MA. 2010. Antagonistic coevolution accelerates molecular evolution.
Nature 464:275–278. https://doi.org/10.1038/nature08798.
11. Herre EA, Knowlton N, Mueller UG, Rehner SA. 1999. The evolution of
mutualisms: exploring the paths between conflict and cooperation.
Trends Ecol Evol 14:49–53. https://doi.org/10.1016/S0169-5347(98)
01529-8.
12. Theis KR, Dheilly NM, Klassen JL, Brucker RM, Baines JF, Bosch TCG,
Cryan JF, Gilbert SF, Goodnight CJ, Lloyd EA, Sapp J, Vandenkoorn-
huyse P, Zilber-Rosenberg I, Rosenberg E, Bordenstein SR. 2016. Getting
the hologenome concept right: an eco-evolutionary framework for
hosts and their microbiomes. mSystems 1:e00028-16. https://doi.org/
10.1128/mSystems.00028-16.
13. Bordenstein SR, Theis KR. 2015. Host biology in light of the microbiome:
ten principles of holobionts and hologenomes. PLoS Biol 13:e1002226.
https://doi.org/10.1371/journal.pbio.1002226.
14. Zilber-Rosenberg I, Rosenberg E. 2008. Role of microorganisms in the
evolution of animals and plants: the hologenome theory of evolution.
FEMS Microbiol Rev 32:723–735. https://doi.org/10.1111/j.1574-6976
.2008.00123.x.
15. Wilson ACC, Duncan RP. 2015. Signatures of host/symbiont genome
coevolution in insect nutritional endosymbioses. Proc Natl Acad Sci
U S A 112:10255–10261. https://doi.org/10.1073/pnas.1423305112.
16. Baumann P, Moran NA, Baumann L. 1997. The evolution and genetics
of aphid endosymbionts. Bioscience 47:12–20. https://doi.org/10.2307/
1313002.
17. Russell CW, Bouvaine S, Newell PD, Douglas AE. 2013. Shared metabolic
pathways in a coevolved insect-bacterial symbiosis. Appl Environ
Microbiol 79:6117–6123. https://doi.org/10.1128/AEM.01543-13.
18. Collins SM, Surette M, Bercik P. 2012. The interplay between the
intestinal microbiota and the brain. Nat Rev Microbiol 10:735–742.
https://doi.org/10.1038/nrmicro2876.
19. Kennedy PJ, Cryan JF, Dinan TG, Clarke G. 2017. Kynurenine pathway
metabolism and the microbiota-gut-brain axis. Neuropharmacology
112:399–412. https://doi.org/10.1016/j.neuropharm.2016.07.002.
20. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T,
Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J,
Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM,
Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P,
Sicheritz-Ponten T, Turner K, Zhu H, Yu C, Li S, Jian M, Zhou Y, Li Y,
Zhang X, Li S, Qin N, Yang H, Wang J, Brunak S, Doré J, Guarner F,
Kristiansen K, Pedersen O, Parkhill J, Weissenbach J, et al. 2010. A
human gut microbial gene catalogue established by metagenomic
sequencing. Nature 464:59–65. https://doi.org/10.1038/nature08821.
21. Brune A, Dietrich C. 2015. The gut microbiota of termites: digesting the
diversity in the light of ecology and evolution. Annu Rev Microbiol
69:145–166. https://doi.org/10.1146/annurev-micro-092412-155715.
22. Fenn K, Blaxter M. 2004. Are filarial nematode Wolbachia obligate
mutualist symbionts? Trends Ecol Evol 19:163–166. https://doi.org/10
.1016/j.tree.2004.01.002.
23. Wu D, Daugherty SC, Van Aken SE, Pai GH, Watkins KL, Khouri H, Tallon
LJ, Zaborsky JM, Dunbar HE, Tran PL, Moran NA, Eisen JA. 2006.
Metabolic complementarity and genomics of the dual bacterial sym-
biosis of sharpshooters. PLoS Biol 4:e188. https://doi.org/10.1371/
journal.pbio.0040188.
24. Clark MA, Moran NA, Baumann P, Wernegreen JJ. 2000. Cospeciation
between bacterial endosymbionts (Buchnera) and a recent radiation
of aphids (Uroleucon) and pitfalls of testing for phylogenetic con-
gruence. Evolution 54:517–525. https://doi.org/10.1111/j.0014-3820
.2000.tb00054.x.
25. Moeller AH, Caro-Quintero A, Mjungu D, Georgiev AV, Lonsdorf EV,
Muller MN, Pusey AE, Peeters M, Hahn BH, Ochman H. 2016. Cospecia-
tion of gut microbiota with hominids. Science 353:380 –382. https://doi
.org/10.1126/science.aaf3951.
26. Moran NA. 2006. Symbiosis. Curr Biol 16:R866 –R871. https://doi.org/10
.1016/j.cub.2006.09.019.
27. McFall-Ngai M. 2008. Hawaiian bobtail squid. Curr Biol 18:R1043–R1044.
https://doi.org/10.1016/j.cub.2008.08.059.
28. Chen C, Tseng C, Chen CA, Tang S. 2011. The dynamics of microbial
partnerships in the coral Isopora palifera. ISME J 5:728–740. https://doi
.org/10.1038/ismej.2010.151.
29. Gil-Agudelo DL, Myers C, Smith GW, Kim K. 2006. Changes in the
microbial communities associated with Gorgonia ventalina during
aspergillosis infection. Dis Aquat Organ 69:89–94. https://doi.org/10
.3354/dao069089.
30. Koren O, Rosenberg E. 2006. Bacteria associated with mucus and
tissues of the coral Oculina patagonica in summer and winter. Appl
Environ Microbiol 72:5254–5259. https://doi.org/10.1128/AEM.00554
-06.
31. Littman RA, Willis BL, Pfeffer C, Bourne DG. 2009. Diversities of coral-
associated bacteria differ with location, but not species, for three
acroporid corals on the Great Barrier Reef. FEMS Microbiol Ecol 68:
152–163. https://doi.org/10.1111/j.1574-6941.2009.00666.x.
32. Thomas T, Moitinho-Silva L, Lurgi M, Björk JR, Easson C, Astudillo-García
C, Olson JB, Erwin PM, López-Legentil S, Luter H, Chaves-Fonnegra A,
Costa R, Schupp PJ, Steindler L, Erpenbeck D, Gilbert J, Knight R,
Ackermann G, Victor Lopez J, Taylor MW, Thacker RW, Montoya JM,
Hentschel U, Webster NS. 2016. Diversity, structure and convergent
evolution of the global sponge microbiome. Nat Commun 7:11870.
https://doi.org/10.1038/ncomms11870.
33. Webster NS, Thomas T. 2016. The sponge hologenome. mBio 7:e00135
-16. https://doi.org/10.1128/mBio.00135-16.
34. Röthig T, Costa RM, Simona F, Baumgarten S, Torres AF, Radhakrishnan
A, Aranda M, Voolstra CR. 2016. Distinct bacterial communities associ-
ated with the coral model Aiptasia in aposymbiotic and symbiotic
states with Symbiodinium. Front Mar Sci 3. https://doi.org/10.3389/
fmars.2016.00234.
35. Erwin PM, Pineda MC, Webster N, Turon X, López-Legentil S. 2014.
Down under the tunic: bacterial biodiversity hotspots and widespread
ammonia-oxidizing archaea in coral reef ascidians. ISME J 8:575–588.
https://doi.org/10.1038/ismej.2013.188.
36. Moran NA, Baumann P. 2000. Bacterial endosymbionts in animals.
Curr Opin Microbiol 3:270–275. https://doi.org/10.1016/S1369-5274
(00)00088-6.
37. Moran NA, Sloan DB. 2015. The hologenome concept: helpful or hol-
low? PLoS Biol 13:e1002311. https://doi.org/10.1371/journal.pbio
.1002311.
38. Douglas AE, Werren JH. 2016. Holes in the hologenome: why host-
microbe symbioses are not holobionts. mBio 7:e02099-15. https://doi
.org/10.1128/mBio.02099-15.
39. Mazel F, Davis KM, Loudon A, Kwong WK, Groussin M, Parfrey LW. 2018.
Is host filtering the main driver of phylosymbiosis across the tree of life?
mSystems 3:e00097-18. https://doi.org/10.1128/mSystems.00097-18.
40. Sieber M, Pita L, Weiland-Bräuer N, Dirksen P, Wang J, Mortzfeld B,
Franzenburg S, Schmitz RA, Baines JF, Fraune S, Hentschel U, Schulen-
burg H, Bosch TCG, Traulsen A. 2018. The neutral metaorganism.
bioRxiv https://doi.org/10.1101/367243.
41. Sloan WT, Lunn M, Woodcock S, Head IM, Nee S, Curtis TP. 2006.
Quantifying the roles of immigration and chance in shaping prokaryote
community structure. Environ Microbiol 8:732–740. https://doi.org/10
.1111/j.1462-2920.2005.00956.x.
42. Brooks AW, Kohl KD, Brucker RM, van Opstal EJ, Bordenstein SR. 2016.
Phylosymbiosis: relationships and functional effects of microbial com-
munities across host evolutionary history. PLoS Biol 14:e2000225.
https://doi.org/10.1371/journal.pbio.2000225.
43. Kohl KD, Dearing MD, Bordenstein SR. 2018. Microbial communities
exhibit host species distinguishability and phylosymbiosis along the
length of the gastrointestinal tract. Mol Ecol 27:1874–1883. https://doi
.org/10.1111/mec.14460.
44. Ross AA, Müller KM, Weese JS, Neufeld JD. 2018. Comprehensive skin
microbiome analysis reveals the uniqueness of human skin and evi-
dence for phylosymbiosis within the class Mammalia. Proc Natl Acad
Sci U S A 115:E5786–E5795. https://doi.org/10.1073/pnas.1801302115.
45. Ochman H, Worobey M, Kuo CH, Ndjango JBN, Peeters M, Hahn BH,
Hugenholtz P. 2010. Evolutionary relationships of wild hominids reca-
pitulated by gut microbial communities. PLoS Biol 8:e1000546. https://
doi.org/10.1371/journal.pbio.1000546.
46. Yeoh YK, Dennis PG, Paungfoo-Lonhienne C, Weber L, Brackin R,
Ragan MA, Schmidt S, Hugenholtz P. 2017. Evolutionary conserva-
tion of a core root microbiome across plant phyla along a tropical
soil chronosequence. Nat Commun 8:215. https://doi.org/10.1038/
s41467-017-00262-8.
47. Pollock FJ, McMinds R, Smith S, Bourne DG, Willis BL, Medina M,
Thurber RV, Zaneveld JR. 2018. Coral-associated bacteria demonstrate
phylosymbiosis and cophylogeny. Nat Commun 9:4921. https://doi
.org/10.1038/s41467-018-07275-x.
48. Schöttner S, Hoffmann F, Cárdenas P, Rapp HT, Boetius A, Ramette A.
2013. Relationships between host phylogeny, host type and bacterial
community diversity in cold-water coral reef sponges. PLoS One
8:e55505. https://doi.org/10.1371/journal.pone.0055505.
49. Easson CG, Thacker RW. 2014. Phylogenetic signal in the community
structure of host-specific microbiomes of tropical marine sponges.
Front Microbiol 5:532. https://doi.org/10.3389/fmicb.2014.00532.
50. Rivett DW, Bell T. 2018. Abundance determines the functional role of
bacterial phylotypes in complex communities. Nat Microbiol
3:767–772. https://doi.org/10.1038/s41564-018-0180-0.
51. Nishiguchi MK, Ruby EG, McFall-Ngai MJ. 1998. Competitive dominance
among strains of luminous bacteria provides an unusual form of evi-
dence for parallel evolution in sepiolid squid-vibrio symbioses. Appl
Environ Microbiol 64:3209 –3213.
52. Bandi C, Anderson TJC, Genchi C, Blaxter ML. 1998. Phylogeny of
Wolbachia in filarial nematodes. Proc Biol Sci 265:2407–2413. https://
doi.org/10.1098/rspb.1998.0591.
53. Deines P, Bosch TCG. 2016. Transitioning from microbiome composi-
tion to microbial community interactions: the potential of the metaor-
ganism hydra as an experimental model. Front Microbiol 7:1610.
https://doi.org/10.3389/fmicb.2016.01610.
54. Mews LK, Smith DC. 1980. The green hydra symbiosis. III. The biotrophic
transport of carbohydrate from alga to animal. Proc R Soc Lond B Biol
Sci 209:377– 401. https://doi.org/10.1098/rspb.1980.0101.
55. Kawaida H, Ohba K, Koutake Y, Shimizu H, Tachida H, Kobayakawa Y.
2013. Symbiosis between hydra and chlorella: molecular phylogenetic
analysis and experimental study provide insight into its origin and
evolution. Mol Phylogenet Evol 66:906 –914. https://doi.org/10.1016/j
.ympev.2012.11.018.
56. Matcher GF, Waterworth SC, Walmsley TA, Matsatsa T, Parker-Nance S,
Davies-Coleman MT, Dorrington RA. 2017. Keeping it in the family:
coevolution of latrunculid sponges and their dominant bacterial sym-
bionts. Microbiologyopen 6:e00417. https://doi.org/10.1002/mbo3.417.
57. Neave MJ, Apprill A, Ferrier-Pagès C, Voolstra CR. 2016. Diversity and
function of prevalent symbiotic marine bacteria in the genus Endozo-
icomonas. Appl Microbiol Biotechnol 100:8315– 8324. https://doi.org/
10.1007/s00253-016-7777-0.
58. Neave MJ, Michell CT, Apprill A, Voolstra CR. 2017. Endozoicomonas
genomes reveal functional adaptation and plasticity in bacterial strains
symbiotically associated with diverse marine hosts. Sci Rep 7:40579.
https://doi.org/10.1038/srep40579.
59. Neave MJ, Rachmawati R, Xun L, Michell CT, Bourne DG, Apprill A,
Voolstra CR. 2017. Differential specificity between closely related corals
and abundant Endozoicomonas endosymbionts across global scales.
ISME J 11:186 –200. https://doi.org/10.1038/ismej.2016.95.
60. Brune A. 2014. Symbiotic digestion of lignocellulose in termite guts.
Nat Rev Microbiol 12:168 –180. https://doi.org/10.1038/nrmicro3182.
61. Ikeda-Ohtsubo W, Brune A. 2009. Cospeciation of termite gut flagel-
lates and their bacterial endosymbionts: Trichonympha species and
‘Candidatus Endomicrobium trichonymphae’. Mol Ecol 18:332–342.
https://doi.org/10.1111/j.1365-294X.2008.04029.x.
62. Raina JB, Tapiolas D, Willis BL, Bourne DG. 2009. Coral-associated
bacteria and their role in the biogeochemical cycling of sulfur. Appl
Environ Microbiol 75:3492–3501. https://doi.org/10.1128/AEM.02567
-08.
63. Lema KA, Willis BL, Bourne DG. 2012. Corals form characteristic associ-
ations with symbiotic nitrogen-fixing bacteria. Appl Environ Microbiol
78:3136 –3144. https://doi.org/10.1128/AEM.07800-11.
64. Rädecker N, Pogoreutz C, Voolstra CR, Wiedenmann J, Wild C. 2015.
Nitrogen cycling in corals: the key to understanding holobiont func-
tioning? Trends Microbiol 23:490 – 497. https://doi.org/10.1016/j.tim
.2015.03.008.
65. Lawson CA, Raina JB, Kahlke T, Seymour JR, Suggett DJ. 2018.
Defining the core microbiome of the symbiotic dinoflagellate, Sym-
biodinium. Environ Microbiol Rep 10:7–11. https://doi.org/10.1111/
1758-2229.12599.
66. Takiya DM, Tran PL, Dietrich CH, Moran NA. 2006. Co-cladogenesis
spanning three phyla: leafhoppers (Insecta: Hemiptera: Cicadellidae)
and their dual bacterial symbionts. Mol Ecol 15:4175– 4191. https://doi
.org/10.1111/j.1365-294X.2006.03071.x.
67. Moitinho-Silva L, Díez-Vives C, Batani G, Esteves AIS, Jahn MT, Thomas
T. 2017. Integrated metabolism in sponge-microbe symbiosis revealed
by genome-centered metatranscriptomics. ISME J 11:1651–1666.
https://doi.org/10.1038/ismej.2017.25.
68. Lackner G, Peters EE, Helfrich EJN, Piel J. 2017. Insights into the lifestyle
of uncultured bacterial natural product factories associated with ma-
rine sponges. Proc Natl Acad Sci U S A 114:E347–E356. https://doi.org/
10.1073/pnas.1616234114.
69. Nguyen MTHD, Liu M, Thomas T. 2014. Ankyrin-repeat proteins from sponge symbionts modulate amoebal phagocytosis. Mol Ecol 23:1635–1645. https://doi.org/10.1111/mec.12384.
Minireview ®
January/February 2019 Volume 10 Issue 1 e02241-18 mbio.asm.org 13
70. Reynolds D, Thomas T. 2016. Evolution and function of eukaryotic-like proteins from sponge symbionts. Mol Ecol 25:5242–5253. https://doi.org/10.1111/mec.13812.
71. Díez-Vives C, Moitinho-Silva L, Nielsen S, Reynolds D, Thomas T. 2017. Expression of eukaryotic-like protein in the microbiome of sponges. Mol Ecol 26:1432–1451. https://doi.org/10.1111/mec.14003.
72. Berry D, Loy A. 2018. Stable-isotope probing of human and animal microbiome function. Trends Microbiol 26:999–1007. https://doi.org/10.1016/j.tim.2018.06.004.
73. Volland JM, Schintlmeister A, Zambalos H, Reipert S, Mozetič P, Espada-Hinojosa S, Turk V, Wagner M, Bright M. 2018. NanoSIMS and tissue autoradiography reveal symbiont carbon fixation and organic carbon transfer to giant ciliate host. ISME J 12:714–727. https://doi.org/10.1038/s41396-018-0069-1.
74. Hernandez-Agreda A, Gates RD, Ainsworth TD. 2017. Defining the core microbiome in corals’ microbial soup. Trends Microbiol 25:125–140. https://doi.org/10.1016/j.tim.2016.11.003.
75. Ainsworth TD, Krause L, Bridge T, Torda G, Raina JB, Zakrzewski M, Gates RD, Padilla-Gamiño JL, Spalding HL, Smith C, Woolsey ES, Bourne DG, Bongaerts P, Hoegh-Guldberg O, Leggat W. 2015. The coral core microbiome identifies rare bacterial taxa as ubiquitous endosymbionts. ISME J 9:2261–2274. https://doi.org/10.1038/ismej.2015.39.
76. Weynberg KD, Wood-Charlson EM, Suttle CA, van Oppen MJH. 2014. Generating viral metagenomes from the coral holobiont. Front Microbiol 5:206. https://doi.org/10.3389/fmicb.2014.00206.
77. Wommack KE, Colwell RR. 2000. Virioplankton: viruses in aquatic ecosystems. Microbiol Mol Biol Rev 64:69–114. https://doi.org/10.1128/MMBR.64.1.69-114.2000.
78. Ochman H, Moran NA. 2001. Genes lost and genes found: evolution of bacterial pathogenesis and symbiosis. Science 292:1096–1099. https://doi.org/10.1126/science.1058543.
79. Oliver KM, Degnan PH, Hunter MS, Moran NA. 2009. Bacteriophages encode factors required for protection in a symbiotic mutualism. Science 325:992–994. https://doi.org/10.1126/science.1174463.
80. Bettarel Y, Bouvier T, Nguyen HK, Thu PT. 2015. The versatile nature of coral-associated viruses. Environ Microbiol 17:3433–3439. https://doi.org/10.1111/1462-2920.12579.
81. Laffy PW, Wood-Charlson EM, Turaev D, Jutz S, Pascelli C, Bell SC, Peirce TE, Weynberg KD, van Oppen MJH, Rattei T, Webster NS. 25 March 2018. Reef invertebrate viromics: diversity, host specificity and functional capacity. Environ Microbiol https://doi.org/10.1111/1462-2920.14110.
82. LaJeunesse TC, Parkinson JE, Gabrielson PW, Jeong HJ, Reimer JD, Voolstra CR, Santos SR. 2018. Systematic revision of Symbiodiniaceae highlights the antiquity and diversity of coral endosymbionts. Curr Biol 28:2570–2580.e6. https://doi.org/10.1016/j.cub.2018.07.008.
83. Stat M, Carter D, Hoegh-Guldberg O. 2006. The evolutionary history of Symbiodinium and scleractinian hosts—symbiosis, diversity, and the effect of climate change. Perspect Plant Ecol Evol Syst 8:23–43. https://doi.org/10.1016/j.ppees.2006.04.001.
84. Rowan R. 1998. Diversity and Ecology of Zooxanthellae on coral reefs. J Phycol 34:407–417. https://doi.org/10.1046/j.1529-8817.1998.340407.x.
85. Krueger T, Gates RD. 2012. Cultivating endosymbionts – host environmental mimics support the survival of Symbiodinium C15 ex hospite. J Exp Mar Bio Ecol 413:169–176. https://doi.org/10.1016/j.jembe.2011.12.002.
86. Fisher RM, Henry LM, Cornwallis CK, Kiers ET, West SA. 2017. The evolution of host-symbiont dependence. Nat Commun 8:15973. https://doi.org/10.1038/ncomms15973.
87. Ceh J, van Keulen M, Bourne DG. 2013. Intergenerational transfer of specific bacteria in corals and possible implications for offspring fitness. Microb Ecol 65:227–231. https://doi.org/10.1007/s00248-012-0105-z.
88. Leite DCA, Leão P, Garrido AG, Lins U, Santos HF, Pires DO, Castro CB, van Elsas JD, Zilberberg C, Rosado AS, Peixoto RS. 2017. Broadcast spawning coral Mussismilia hispida can vertically transfer its associated bacterial core. Front Microbiol 8:176. https://doi.org/10.3389/fmicb.2017.00176.
89. Sharp KH, Distel D, Paul VJ. 2012. Diversity and dynamics of bacterial communities in early life stages of the Caribbean coral Porites astreoides. ISME J 6:790–801. https://doi.org/10.1038/ismej.2011.144.
90. Sharp KH, Ritchie KB, Schupp PJ, Ritson-Williams R, Paul VJ. 2010. Bacterial acquisition in juveniles of several broadcast spawning coral species. PLoS One 5:e10898. https://doi.org/10.1371/journal.pone.0010898.
91. Sharp KH, Eam B, Faulkner DJ, Haygood MG. 2007. Vertical transmission of diverse microbes in the tropical sponge Corticium sp. Appl Environ Microbiol 73:622–629. https://doi.org/10.1128/AEM.01493-06.
92. Brucker RM, Bordenstein SR. 2012. Speciation by symbiosis. Trends Ecol Evol 27:443–451. https://doi.org/10.1016/j.tree.2012.03.011.
93. Hughes TP, Kerry JT, Álvarez-Noriega M, Álvarez-Romero JG, Anderson KD, Baird AH, Babcock RC, Beger M, Bellwood DR, Berkelmans R, Bridge TC, Butler IR, Byrne M, Cantin NE, Comeau S, Connolly SR, Cumming GS, Dalton SJ, Diaz-Pulido G, Eakin CM, Figueira WF, Gilmour JP, Harrison HB, Heron SF, Hoey AS, Hobbs JPA, Hoogenboom MO, Kennedy EV, Kuo CY, Lough JM, Lowe RJ, Liu G, McCulloch MT, Malcolm HA, McWilliam MJ, Pandolfi JM, Pears RJ, Pratchett MS, Schoepf V, Simpson T, Skirving WJ, Sommer B, Torda G, Wachenfeld DR, Willis BL, Wilson SK. 2017. Global warming and recurrent mass bleaching of corals. Nature 543:373–377. https://doi.org/10.1038/nature21707.
94. Hughes TP, Barnes ML, Bellwood DR, Cinner JE, Cumming GS, Jackson JBC, Kleypas J, Van De Leemput IA, Lough JM, Morrison TH, Palumbi SR, Van Nes EH, Scheffer M. 2017. Coral reefs in the Anthropocene. Nature 546:82–90. https://doi.org/10.1038/nature22901.
95. Fan L, Reynolds D, Liu M, Stark M, Kjelleberg S, Webster NS, Thomas T. 2012. Functional equivalence and evolutionary convergence in complex communities of microbial sponge symbionts. Proc Natl Acad Sci U S A 109:E1878–E1887. https://doi.org/10.1073/pnas.1203287109.
96. Bourne DG, Dennis PG, Uthicke S, Soo RM, Tyson GW, Webster N. 2013. Coral reef invertebrate microbiomes correlate with the presence of photosymbionts. ISME J 7:1452–1458.
97. Li J, Chen Q, Long LJ, Dong JD, Yang J, Zhang S. 2014. Bacterial dynamics within the mucus, tissue and skeleton of the coral Porites lutea during different seasons. Sci Rep 4:1–8.
98. Hakim JA, Koo H, Kumar R, Lefkowitz EJ, Morrow CD, Powell ML, Watts SA, Bej AK. 2016. The gut microbiome of the sea urchin, Lytechinus variegatus, from its natural habitat demonstrates selective attributes of microbial taxa and predictive metabolic profiles. FEMS Microbiol Ecol 92:1–12.
99. Wessels W, Sprungala S, Watson SA, Miller DJ, Bourne DG. 2017. The microbiome of the octocoral Lobophytum pauciflorum: Minor differences between sexes and resilience to short-term stress. FEMS Microbiol Ecol 93:1–13.
100. Ngangbam AK, Baten A, Waters DLE, Whalan S, Benkendorff K. 2015. Characterization of bacterial communities associated with the Tyrian purple producing gland in a marine gastropod. PLoS One 10:1–19.
101. Shinzato C, Shoguchi E, Kawashima T, Hamada M, Hisata K, Tanaka M, Fujie M, Fujiwara M, Koyanagi R, Ikuta T, Fujiyama A, Miller DJ, Satoh N. 2011. Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476:320–323.
102. Work TM, Aeby GS. 2014. Microbial aggregates within tissues infect a diversity of corals throughout the Indo-Pacific. Mar Ecol Prog Ser 500:1–9.
103. Maldonado M. 2007. Intergenerational transmission of symbiotic bacteria in oviparous and viviparous demosponges, with emphasis on intracytoplasmically-compartmented bacterial types. J Mar Biol Assoc United Kingdom 87:1701–1713.
ESSAY
Natural experiments and long-term monitoring are critical to understand and predict marine host–microbe ecology and evolution
Matthieu Leray1‡*, Laetitia G. E. Wilkins2¤‡, Amy Apprill3, Holly M. Bik4, Friederike Clever1,5, Sean R. Connolly1, Marina E. De León1,2, J. Emmett Duffy6, Leïla Ezzat7, Sarah Gignoux-Wolfsohn8, Edward Allen Herre1, Jonathan Z. Kaye9, David I. Kline1, Jordan G. Kueneman1, Melissa K. McCormick8, W. Owen McMillan1, Aaron O’Dea1,10*, Tiago J. Pereira4, Jillian M. Petersen11, Daniel F. Petticord1, Mark E. Torchin1, Rebecca Vega Thurber12, Elin Videvall13,14, William T. Wcislo1, Benedict Yuen11, Jonathan A. Eisen2,15,16
1 Smithsonian Tropical Research Institute, Balboa, Ancon, Republic of Panama, 2 UC Davis Genome
Center, University of California, Davis, Davis, California, United States of America, 3 Marine Chemistry and
Geochemistry Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, United
States of America, 4 Department of Marine Sciences and Institute of Bioinformatics, University of Georgia,
Athens, Georgia, United States of America, 5 Department of Natural Sciences, Manchester Metropolitan
University, Manchester, United Kingdom, 6 Tennenbaum Marine Observatories Network, Smithsonian
Environmental Research Center, Edgewater, Maryland, United States of America, 7 Department of Ecology,
Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, California, United
States of America, 8 Smithsonian Environmental Research Center, Edgewater, Maryland, United States of
America, 9 Gordon and Betty Moore Foundation, Palo Alto, California, United States of America,
10 Department of Biological, Geological and Environmental Sciences, University of Bologna, Bologna, Italy,
11 Centre for Microbiology and Environmental Systems Science, University of Vienna, Vienna, Austria,
12 Department of Microbiology, Oregon State University, Corvallis, Oregon, United States of America,
13 Center for Conservation Genomics, Smithsonian Conservation Biology Institute, Washington, DC, United
States of America, 14 Department of Ecology and Evolutionary Biology, Brown University, Providence,
Rhode Island, United States of America, 15 Department of Evolution and Ecology, University of California,
Davis, Davis, California, United States of America, 16 Department of Medical Microbiology and Immunology,
University of California, Davis, Davis, California, United States of America
¤ Current address: Max Planck Institute for Marine Microbiology, Department of Symbiosis, Bremen,
Germany
‡ These authors share first authorship on this work.
* leray.upmc@gmail.com (ML); odeaa@si.edu (AO)
Abstract
Marine multicellular organisms host a diverse collection of bacteria, archaea, microbial eukaryotes, and viruses that form their microbiome. Such host-associated microbes can significantly influence the host’s physiological capacities; however, the identity and functional role(s) of key members of the microbiome (“core microbiome”) in most marine hosts coexisting in natural settings remain obscure. Also unclear is how dynamic interactions between hosts and the immense standing pool of microbial genetic variation will affect marine ecosystems’ capacity to adjust to environmental changes. Here, we argue that significantly advancing our understanding of how host-associated microbes shape marine hosts’ plastic and adaptive responses to environmental change requires (i) recognizing that individual host–microbe systems do not exist in an ecological or evolutionary vacuum and (ii) expanding the field toward long-term, multidisciplinary research on entire communities of hosts and microbes. Natural experiments, such as time-calibrated geological events associated with well-characterized environmental gradients, provide unique ecological and evolutionary contexts to address this challenge. We focus here particularly on mutualistic interactions between hosts and microbes, but note that many of the same lessons and approaches would apply to other types of interactions.
PLOS BIOLOGY
PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 August 19, 2021 1 / 18
OPEN ACCESS
Citation: Leray M, Wilkins LGE, Apprill A, Bik HM, Clever F, Connolly SR, et al. (2021) Natural experiments and long-term monitoring are critical to understand and predict marine host–microbe ecology and evolution. PLoS Biol 19(8): e3001322. https://doi.org/10.1371/journal.pbio.3001322
Published: August 19, 2021
Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Funding: Financial support for the workshop was provided by grant GBMF5603 (https://doi.org/10.37807/GBMF5603) from the Gordon and Betty Moore Foundation (W.T. Wcislo, J.A. Eisen, co-PIs), and additional funding from the Smithsonian Tropical Research Institute and the Office of the Provost of the Smithsonian Institution (W.T. Wcislo, J.P. Meganigal, and R.C. Fleischer, co-PIs). JP was supported by a WWTF VRG Grant and the ERC Starting Grant ‘EvoLucin’. LGEW has received funding from the European Union’s Framework Programme for Research and Innovation Horizon 2020 (2014-2020) under the Marie Sklodowska-Curie Grant Agreement No. 101025649. AO was supported by the Sistema Nacional de Investigadores (SENACYT, Panamá). A. Apprill was supported by NSF award OCE-1938147. D.I. Kline, M. Leray, S.R. Connolly, and M.E. Torchin were supported by a Rohr Family Foundation grant for the Rohr Reef Resilience Project, for which this is contribution #2. This is contribution #85 from the Smithsonian’s MarineGEO and Tennenbaum Marine Observatories Network. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: LTER, Long-Term Ecological Research; MarineGEO, Marine Global Earth Observatory; MBON, Marine Biodiversity Observation Network; TEP, Tropical Eastern Pacific.
Main
It is widely recognized that host-associated microbes play profound roles in the health of their marine hosts and the ecosystems they inhabit. Although some such interactions with microbes are transient, many are more persistent and can be generally described as symbioses. Symbioses come in many flavors, including parasitism, commensalism, and mutualism (see Box 1), and, in this paper, we focus in particular on the mutually beneficial (i.e., mutualistic) subset of such interactions involving marine hosts. Despite the wide recognition of the importance of such mutualisms, it remains less clear how these associations scale up to drive broader ecological and evolutionary patterns and processes. For example, the contribution of microbes to host acclimatization and adaptation (see Box 1 for definitions) is an active new field of experimental research with much potential. Studies, mostly conducted in controlled laboratory settings, have evaluated the ecological costs and benefits for hosts of associating temporarily with different microbes (e.g., corals [1–4]) or of engaging in obligate intimate relationships (e.g., bobtail squid with the bioluminescent bacterium Aliivibrio fischeri [5]).
Experimental studies are, however, intrinsically limited in several ways. They are restricted to a small number of experimentally tractable hosts and microbes and, in doing so, fail to account for the enormous complexity of interactions and variation that exist in nature between multiple hosts and their multitudes of associated microbes. Short-lived experiments (e.g., days to weeks) cannot replicate the scales of time and space involved in the potential coevolution of hosts and microbes (Box 1). Attempts to merge long-term datasets to reveal overarching patterns (e.g., [6–9]) have provided valuable insights but are constrained by the limits and biases introduced by mixing information from different contexts or methodologies [10]. These limitations obscure general principles about the roles (mutualistic or otherwise) of host-associated microbes across host individuals, species, and communities [11–13].
Here, we demonstrate the value of moving beyond taxon-centric approaches to studying host–microbe associations in their natural evolutionary and ecological context. We suggest intensifying long-term research in well-documented “natural experiments”. Such natural experiments, including well-calibrated geological events (e.g., vicariance and creation of novel habitats accurately dated using fossil and geological data) and environmental gradients where multiple hosts and associated microbes are subjected to the same range of environmental conditions, can be particularly useful (Fig 1). These phenomena provide a unique framework for comparative studies where the processes of interest occur over spatial and evolutionary time scales that are nearly impossible to capture in laboratory experiments. The value of combining experimental and long-term field studies at natural experiments has been recognized by ecologists [14–16]. We argue that similar approaches should be applied to the study of host–microbe interactions. We highlight several natural experiments that can advance our understanding of the ecological and evolutionary mechanisms shaping host–microbe interactions (with a focus on mutualistic ones) in marine communities and ecosystems.
Identifying important players
Marine organisms have evolved complex structural, behavioral, and chemical mechanisms to
regulate the presence, abundance, and activity of their microbial associates. Hosts can limit
colonization by transient opportunistic microbes that would use space and resources without
providing any benefits, and some hosts can even block pathogens entirely [17–19]. Host-spe-
cific and obligate microbial associates, often called the “core microbiome” of a host population
Box 1. Definitions of key terms
Acclimatization: The process by which an organism becomes accustomed to new envi-
ronmental conditions during its lifetime.
Adaptation: A heritable trait of an organism that increases its fitness in its surrounding
environment. In comparison to acclimatization, adaptations will be passed on to the
next generation.
Convergent evolution: Independent origins of similar features in different organisms in
response to separately experiencing similar selective pressures. Importantly, conver-
gently originated features, also known as analogous features, were not present in the
common ancestor of the taxa in question.
Genetic drift: Change in the relative frequency of genotypes due to random variation in
reproduction. Such drift is more common in small populations and leads to changes in
genotype frequencies independent of adaptive forces.
Host–microbe coevolution: During host–microbe coevolution, multicellular hosts and
their associated microbes show a concerted and heritable response to an environmental
change.
Homologous recombination: The process by which two pieces or stretches of DNA that
are very similar in their sequence physically align and exchange nucleotides.
Horizontal gene transfer: The unidirectional movement of DNA, usually only small frac-
tions of a genome, from one organism to another. Though this generally occurs more
frequently within species than between, it can also occur across vast evolutionary
distances.
Metagenomics: Studies of the genetic material of communities of organisms.
Phenotypic plasticity: Phenotypic plasticity is the ability of a specific genotype to pro-
duce more than one phenotype in response to a changing environment during an indi-
vidual’s lifetime. These phenotypic changes may include an organism’s behavior,
morphology, physiology, or other features. Phenotypic plasticity is adaptive if it increases
an individual’s survival and if the ability is passed on to the next generation.
Symbioses: Symbioses are broadly defined as intimate interactions between at least two
organisms where at least one of them benefits. We focus here specifically on mutually
beneficial interactions (aka mutualisms) between multicellular eukaryotes and their
associated microbes. These interactions may include disease resistance, predator avoid-
ance, and nutrition. These interactions will ultimately increase host survival and fitness.
PLOS BIOLOGY
PLOS Biology | https://doi.org/10.1371/journal.pbio.3001322 August 19, 2021 3 / 18
https://doi.org/10.1371/journal.pbio.3001322
Fig 1. Examples of marine natural experiments as observatories of host–microbe interactions. Regionally focused, long-term,
and taxonomically broad research programs will help fill key knowledge gaps about the nature of microbe functions and the
dynamics of host–microbe interactions in changing oceans. We highlight areas of the world’s oceans where environmental
gradients are well characterized, where the taxonomy and evolutionary history of the local host fauna and flora is already well
established, where paleoecological studies can provide important historical context, where a long-term monitoring program is
ongoing, and where there is significant research infrastructure. Long-term monitoring sites (white dots) include sites of the NSF’s
LTER Network, the Smithsonian Institution’s MarineGEO network of partners, the MBON, the AIMS, and the ASSEMBLE. (1)
NASA MODIS data; (2) Adapted from [93]; (3) Adapted from [73]; (4) Adapted from [74]; (5) Adapted from [94]; (6) Adapted
from [95]. AIMS, Australian Institute of Marine Science; ASSEMBLE, Association of European Marine Biological Laboratories;
LTER, Long-Term Ecological Research; MarineGEO, Marine Global Earth Observatory; MBON, Marine Biodiversity
Observation Network.
https://doi.org/10.1371/journal.pbio.3001322.g001
or species, are generally assumed to play more important functional roles than opportunistic
and transient taxa [20]. This core microbiome is exemplified by an obligate nutritional micro-
bial symbiosis, in which the host relies extensively on microbial partners for survival by syn-
thesis of food, often in a nutrient-limited habitat. The host may acquire these partners
horizontally (from the surrounding environment), vertically (from the parent to the offspring),
or in both ways (mixed mode) [21]. Many evolved symbioses result in codependency; for
example, the genomes of host-associated microbes have lost genes encoding pathways that
were previously essential, such as those for motility or environmental stress responses, but that
became obsolete in obligate symbiotic lifestyles [22]. In return, hosts have evolved mechanisms
to maintain their associated microbes in stable intracellular environments and to support their
nutritional needs [23]. Some of these nutritional associations are clearly identifiable because
symbionts form massive and dense populations, sometimes only consisting of a single micro-
bial species, in or on the bodies of their hosts. Examples include photosynthetic symbioses in
cnidarians [24] and chemosynthetic symbioses in invertebrate animals such as bathymodiolin
mussels, lucinid clams, Riftia tubeworms, and Astomonema nematodes [25,26]. Although
widespread, host reliance on a single or few microbes for nutrition is the exception rather
than the rule. The vast majority of animals and plants are instead associated with a diverse
assemblage of microbes where it is challenging to differentiate between members of the core
microbiome and the myriad of transient microbes and even more challenging to determine
what, if any, key functional roles such microbes play.
Several approaches have been proposed to identify key microbes or functions within com-
plex host microbiomes (reviewed in [27]). The most common practice is to identify microbial
taxa that are consistently associated with a host population or species using marker gene
sequencing, usually above some arbitrary prevalence threshold ([28]; but see [29,30] for alter-
native methods). The prevalence of a host–microbe association is typically measured without
explicit attention to co-occurring and closely related host taxa, the surrounding environment,
or adequacy of spatial and temporal sampling. This limited sampling and lack of context, often
resulting from funding constraints, leads to several major limitations. First, a microbial taxon
can be prevalent in a host population for reasons unrelated to its functional role. For example,
it may originate from the host’s food or habitat, including seawater or sediment [31]. Second,
even the core microbiome can change over time [32]. Functionally important microbes may
fluctuate in abundance throughout host ontogeny and may also vary seasonally. Essential host-
associated microbes may be overlooked if the sampling method cannot detect low abundance
reliably, resulting in false negatives, or if sampling is sporadic, missing the life stage or season
when particular microbes are essential. Third, many studies rely upon sequencing of rRNA
genes to characterize communities, yet rRNA genes are generally too conserved to distinguish
closely related taxa and reveal little directly about genomic functional potential. Clearly, under-
standing the functional roles of host-associated microbes requires analyses that go far beyond
individual marker gene profiles and instead encompass other types of information such as
whole genomes or metagenomes, transcriptomes, metabolomes, localization, biochemistry,
and more. Fourth, taxon-focused studies may miss valuable information about interactions
that could be gleaned from broader comparative analyses. Microbes that are specific to particu-
lar host genotypes, host species, or closely related groups of hosts, indicating a shared evolu-
tionary history, are likely candidates for core microbes with specialized functions (e.g., gut
fermenters associated with herbivores). These limitations could be overcome by whole-ecosystem
studies in which the long-term collection of comprehensive genomic-level datasets (e.g., ‘omic-scale
information) would transform our understanding of host–microbe interactions at all levels.
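The prevalence-threshold practice described above can be sketched in a few lines. This is a minimal illustration, not a published pipeline: it assumes each sample is reduced to the set of marker-gene taxa detected in it, and the taxon names, sample data, and 0.8 cutoff are all hypothetical. The second function illustrates the first limitation raised in the text, flagging "core" host taxa that are just as prevalent in seawater and so may reflect habitat rather than function.

```python
"""Hypothetical sketch of prevalence-threshold core-microbiome calling.

Each sample is represented as the set of marker-gene taxa detected in it.
Taxon names, sample data, and the 0.8 cutoff are illustrative assumptions.
"""
from typing import Dict, Set

Samples = Dict[str, Set[str]]


def prevalence(samples: Samples) -> Dict[str, float]:
    """Return the fraction of samples in which each taxon was detected."""
    if not samples:
        return {}
    counts: Dict[str, int] = {}
    for taxa in samples.values():
        for taxon in taxa:
            counts[taxon] = counts.get(taxon, 0) + 1
    return {taxon: n / len(samples) for taxon, n in counts.items()}


def core_taxa(samples: Samples, threshold: float = 0.8) -> Set[str]:
    """Taxa whose prevalence meets an (arbitrary) threshold."""
    return {t for t, p in prevalence(samples).items() if p >= threshold}


def environmental_candidates(host_samples: Samples, env_samples: Samples,
                             threshold: float = 0.8) -> Set[str]:
    """'Core' host taxa that are equally prevalent in seawater or sediment.

    Such taxa may be prevalent because of the host's habitat or food,
    not because of a functional role in the host.
    """
    return core_taxa(host_samples, threshold) & core_taxa(env_samples, threshold)


# Hypothetical example data (not from the source).
HOSTS: Samples = {
    "host1": {"TaxonA", "TaxonB", "TaxonC"},
    "host2": {"TaxonA", "TaxonB"},
    "host3": {"TaxonA", "TaxonB", "TaxonD"},
    "host4": {"TaxonA", "TaxonC"},
    "host5": {"TaxonA", "TaxonB"},
}
SEAWATER: Samples = {
    "water1": {"TaxonB", "TaxonE"},
    "water2": {"TaxonB"},
    "water3": {"TaxonB", "TaxonE"},
}

if __name__ == "__main__":
    print(sorted(core_taxa(HOSTS)))                           # ['TaxonA', 'TaxonB']
    print(sorted(environmental_candidates(HOSTS, SEAWATER)))  # ['TaxonB']
```

Because prevalence here is computed only from detections at a single time point, the sketch says nothing about low-abundance false negatives, temporal turnover, or function; those are exactly the gaps the text argues broader sampling must fill.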
To instigate this new approach, we recommend strategically intensifying research within a
few ocean regions. This entails collecting large scale data on host-associated microbes across
phylogenetically diverse sets of co-occurring host organisms, together with data on surround-
ing free-living microbes (i.e., in seawater and sediments) through time in areas where the sur-
rounding abiotic environment and community dynamics have been well characterized. A
regionally focused and coordinated approach will allow researchers to identify environmental sources
and hosts that serve as reservoirs of key host-associated microbial taxa and genes. Long-term
investments in research on particular communities of hosts and microbes will also help estab-
lish links between changes in core microbiome composition, environmental factors, ecosystem
function, and resilience. Public archival of genomic data and samples (available for comple-
mentary analysis using emerging technologies) collected from a few intensively studied ocean
regions will foster transformative discoveries on dynamic host–microbe relationships. Habi-
tat-forming corals, sponges, seagrasses, and mangrove trees are important focal groups, since
breakdowns in the associations between these species and their microbiomes likely dispropor-
tionately influence other taxa and ecosystem functions. However, this should not come at the
expense of research on more inconspicuous and overlooked, yet functionally important taxa
that comprise the majority of the oceans’ biological diversity (e.g., small fish that fuel marine
food webs [33] and urchins and crustaceans that feed on algae that can displace corals [34]).
Systematic biases toward studying certain taxa (vertebrates, species with large body sizes,
charismatic fauna), partly caused by the lack of coordination, have clearly affected our under-
standing of the distribution and roles of host-associated microbes. For example, a recent
microbiome comparison of several Indo-Pacific invertebrate species demonstrated that
sponges have a less specific microbiome than had been assumed for many years [35]. Expand-
ing the taxonomic breadth of host–microbe studies will be most fruitful in areas where taxo-
nomically rigorous field guides, ecological survey data, and functional trait databases are
available. Substantial progress will also occur where phylogenetic relationships are known and
local expert taxonomists can be engaged. One of many potential outcomes is the construction of
community-wide association matrices that reveal the extent of reliance between hosts
and microbial partners (specificity versus ubiquity, obligate versus facultative) and the interac-
tions that promote the stability of core microbiomes.
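A community-wide association matrix of the kind proposed above can be sketched as a binary host-by-microbe table, with each microbe's host range (the number of host species it occurs in) giving a crude position on the specificity-versus-ubiquity axis. This is a minimal sketch under stated assumptions: the host and microbe names and detection records are hypothetical, and the three-way labeling rule is an illustrative choice, not a method from the source.

```python
"""Hypothetical sketch of a community-wide host-by-microbe association matrix.

Host and microbe names and the detection records are illustrative assumptions.
"""
from typing import Dict, Iterable, List, Tuple


def association_matrix(records: Iterable[Tuple[str, str]],
                       hosts: List[str], microbes: List[str]) -> List[List[int]]:
    """Binary matrix: rows are host species, columns are microbes, 1 = detected."""
    row = {h: i for i, h in enumerate(hosts)}
    col = {m: j for j, m in enumerate(microbes)}
    matrix = [[0] * len(microbes) for _ in hosts]
    for host, microbe in records:
        matrix[row[host]][col[microbe]] = 1
    return matrix


def host_range(matrix: List[List[int]], microbes: List[str]) -> Dict[str, int]:
    """Number of host species each microbe was detected in (column sums)."""
    return {m: sum(r[j] for r in matrix) for j, m in enumerate(microbes)}


def classify(ranges: Dict[str, int], n_hosts: int) -> Dict[str, str]:
    """Label each microbe on a simple specificity-versus-ubiquity axis."""
    labels: Dict[str, str] = {}
    for microbe, r in ranges.items():
        if r == 0:
            labels[microbe] = "absent"
        elif r == 1:
            labels[microbe] = "specific"
        elif r == n_hosts:
            labels[microbe] = "ubiquitous"
        else:
            labels[microbe] = "intermediate"
    return labels


# Hypothetical example data (not from the source).
HOSTS = ["coral", "sponge", "seagrass"]
MICROBES = ["M1", "M2", "M3"]
RECORDS = [("coral", "M1"), ("sponge", "M1"), ("seagrass", "M1"),
           ("sponge", "M2"), ("coral", "M3"), ("seagrass", "M3")]

if __name__ == "__main__":
    m = association_matrix(RECORDS, HOSTS, MICROBES)
    print(classify(host_range(m, MICROBES), len(HOSTS)))
    # {'M1': 'ubiquitous', 'M2': 'specific', 'M3': 'intermediate'}
```

Detection data alone cannot separate obligate from facultative associations; that distinction would require additional measures, such as host fitness or dependency data, layered onto the same matrix.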
Role of microbes in host acclimatization and adaptation
Host-associated microbes can rapidly respond to extrinsic factors such as extreme or anoma-
lous environmental conditions (e.g., heatwaves, hypoxia), pathogens, anthropogenic distur-
bances (e.g., pollution, overfishing, aquaculture, invasive species), and acute and chronic
stressors [36,37]. They can also quickly change in response to factors intrinsic to the host (e.g.,
changes in host physiology [38]). The dynamic nature of microbes may provide a source of
ecological and evolutionary novelty to support potential host response mechanisms that aug-
ment the host’s own evolutionary potential. Host-associated microbial communities can shift
rapidly through the loss, gain, or replacement of individual members. Individual microbial
cells can make rapid physiological adjustments during their lifetime (plasticity) or within a few
generations (adaptation) [39] (Fig 2). In many microbes, relatively high rates of mutation and
exchange of genetic material among divergent lineages (through homologous recombination
and horizontal gene transfer) generate a high frequency of new genetic variants, some of
which may be better suited to novel conditions (Fig 2). These mechanisms help fuel an immense
standing pool of genetic variation that hosts can potentially draw upon. The outcomes of the
collective ecological and evolutionary response of hosts and their associated microbes to
environmental change may take the form of 4 nonmutually exclusive scenarios
[40,41]: (1) Imbalance: a temporary or permanent change of host fitness and microbial func-
tions leading to increased disease susceptibility; (2) Resistance: the microbiome continues per-
forming its functions and the host does not lose or gain fitness; (3) Acclimatization: the newly
formed microbial community in conjunction with host phenotypic plasticity enable the indi-
vidual host to adjust and maintain performance under changing environmental conditions
(Fig 2); and (4) Adaptation: in the long term, newly formed interactions between host geno-
types and associated microbes increase the fitness of the symbiosis and they become heritable
(Fig 2).
Fig 2. Conceptual representation of the role of microbes in host acclimatization and adaptation. Microbes can frequently adapt to environmental
changes more rapidly than their host because of shorter generation times and higher standing genetic variation. Changes that occur at the levels of
individual microbes and microbiomes can rapidly generate phenotypic plasticity in a broad range of host traits (i.e., one host genotype expresses multiple
phenotypes induced by microbes). Microbially induced phenotypes may promote host adaptation if they become heritable traits. Within microbiomes,
transient microbes (thin dashed circles) have limited effects on host phenotype. On the other hand, core microbes (thick dashed circles) that engage in
prolonged relationships with hosts and potentially coevolve with hosts likely alter host phenotypes and promote host adaptation. Note that the time scale at
which evolutionary changes occur varies widely between organisms, but adaptation is generally slower than acclimatization. Plain line: nonaltered
interaction; dashed line: altered interaction; colors of microbes represent different microbial taxa.
https://doi.org/10.1371/journal.pbio.3001322.g002
The role that host-associated microbes play in their host’s response to environmental
change is also influenced by their mode of transmission (Fig 3). While vertical transmission
may help ensure the intergenerational stability of mutualistic symbioses, the dependence on
symbionts with highly simplified and inflexible genomes is a risky strategy under variable or
unpredictable stressful conditions [42,43]. Vertically transmitted symbionts have fewer opportunities
to exchange genes with the vast pool of genetic diversity available in the external
environment, which could constrain the adjustment of these associations to rapidly changing
conditions. In the marine environment, the vast majority of mutualistic symbionts are
acquired horizontally from the surrounding environment or from other hosts [44]; this
includes associations where a host is entirely dependent upon a single or a few symbionts for
nutrition (e.g., tubeworms [45]; mussels [46]). Horizontal transmission has important implica-
tions for the adaptive potential of hosts [47]. The ability to acquire microbes and genes from
the surrounding environment allows hosts to access the huge evolutionary potential contained
within the larger microbial communities. Hosts with horizontally acquired microbes could
thus be better positioned to adjust and become resilient to changing environmental condi-
tions. Selection that maintains and fine-tunes the relationship could subsequently lead to adap-
tive genetic change.
Fig 3. The role of microbes in the host’s response to environmental changes is contingent upon their predominant mode of transmission.
Microbes that are present in the marine environment represent a vast pool of standing genetic variation. The majority of marine species with horizontal
(e.g., lucinid clams and snapping shrimps) or mixed mode of symbiont acquisition (e.g., sponges) interact with a large number of microbes that they
acquire during their lifetime. The ability to draw on this large evolutionary potential by switching microbes or gaining new genes potentially allows
hosts to respond rapidly to environmental changes. At the other end of the spectrum, the few marine hosts with strictly vertically transmitted symbionts
(e.g., flatworms) have less opportunity to exchange genes to rapidly adjust the symbiosis to changing conditions.
https://doi.org/10.1371/journal.pbio.3001322.g003
Several key bottlenecks currently impede our understanding of how host-associated
microbes drive the initial response as well as long-term, evolutionary adaptation to climate
change–related disturbances in hosts with diverse microbial communities. First, changes in
microbiomes that confer adverse or beneficial outcomes for the host cannot be distinguished
from natural variability without adequate measures of host phenotypes that covary with fitness.
Unlike photosymbiotic organisms that exhibit quantifiable phenotypic responses to stress
(e.g., using a bleaching index or symbiont density), the early signs of physiological stress are
difficult to observe and measure in the vast majority of marine host–microbe associations. Sec-
ond, studies are rarely designed to disentangle causes from effects. Before–after studies corre-
late seemingly altered microbial communities with perturbations or diseases, often without
establishing causality in the relationship [48,49]. Third, most research in this field has been
conducted over temporal scales that are not suited for understanding processes of acclimatiza-
tion and adaptation that may occur over months to decades [50]. Single or multistressor
laboratory experiments conducted over days to weeks are powerful means to identify environ-
mental thresholds beyond which the host–microbiome interactions become disrupted [51].
However, how experimental results can be extrapolated to understand the response of natural
systems exposed to ambient microbes and heterogeneous stressors in their natural environ-
ment remains unclear. Fourth, the response of host–microbe mutualistic symbioses to stress-
ors is partly shaped by the environmental conditions experienced during the lifetime of the
host and by previous generations, although that information is rarely considered or available.
For example, the susceptibility of corals to future environmental changes is partly contingent
upon changes in algal symbiont composition that occurred as a result of previous exposures
to temperature anomalies (i.e., symbiont shuffling in the controversial adaptive bleaching
hypothesis [52]). Therefore, the tolerance of hosts and their host-associated microbes to envi-
ronmental change is rarely interpretable without ecological context [53]. Finally, there is a
dearth of paired host and microbial genomes in public databases. The lack of population-wide
data relating traits of interest to host and microbial genomic variation at the individual level
(i.e., genome-wide association studies) limits our understanding of how genomic innovations
contribute to host acclimatization and adaptation [54].
Bolstering our understanding of the mechanisms of host–microbe evolution requires
investing resources into long-term multidisciplinary research on diverse communities of hosts
and microbes distributed across well-characterized environmental gradients. Rigorously
designed comparative population genomic studies and field experiments (e.g., reciprocal
transplants) combined with measures of host phenotypes using methods such as in situ imag-
ing [55], immunological assays [56], gene expression [57], metabolomic profiling [58], and
behavioral assays [59] will illuminate adaptive genetic variants, how they are transferred
among microbial strains across host communities, and their impacts upon host fitness.
Repeated through time, these measures will provide unique insights into how microbiome-mediated
phenotypic plasticity may allow hosts to rapidly adjust to novel environments or resources
(e.g., microbes allow some host individuals to obtain nutrients from novel
foods) through periodic (e.g., seasonal fluctuations) and transient environmental changes (e.g.,
heat waves). For foundational, long-lived, and large colonial host species, noninvasive methods
exist for repetitive sampling of tagged individuals (e.g., for corals [60]). The focus should also
expand beyond foundation species to include small, ecologically important host organisms
and those with life history strategies that make them particularly tractable for transgenera-
tional studies. This approach will only be fruitful if integrated measures of hosts and micro-
biomes are collected over multiple generations (i.e., beyond the time scale of a typical scientific
project), where physicochemical parameters are being monitored, and where the evolutionary
history of the local host fauna and flora is already well established. Targeted comparative
research can similarly leverage natural experiments that have played out over longer time
scales. Sudden discontinuities in the distribution of many closely related populations and
species have been linked to geological vicariant effects, sharp environmental gradients, or a
combination of both [61]. Organisms on opposing sides of dispersal barriers (sometimes
impassable) follow different evolutionary trajectories under the influence of local environmen-
tal conditions [62]. These systems provide unique historical contexts in which researchers can
generate testable hypotheses about the role that host-associated microbes played in the evolu-
tion of host traits. Signatures of convergent evolution, evident at the ecosystem-wide level (i.e.,
similar patterns observed across many hosts and symbionts that have been exposed to similar
selective pressures), likely reflect fundamental principles of adaptation [63].
Examples of natural experiments
Natural experiments are past events or gradients that allow researchers to explore biological
patterns and processes on spatial and temporal scales that far exceed those possible in the labo-
ratory. Natural experiments may or may not be created or altered by humans and have been
the bread and butter of natural historians, biogeographers, and evolutionary biologists for
decades. Building on this substantial body of conceptual work, we propose that natural experiments
can also illuminate the evolution and ecology of host-associated
microbes and their hosts. We present examples of natural experiments where the outcomes of
complex interactions can be observed with replication to provide insights into the processes
underlying host–microbe evolution. Our examples focus on well-characterized systems where
host evolution has already been well explored, thereby allowing “tests” that approach the rigor
of laboratory experiments. We expect that studying natural experiments like these will allow
general principles of host–microbe evolution to emerge when repeated patterns are observed
within a system or across different systems.
Biogeography
The formation of the Isthmus of Panama presents an unparalleled opportunity for exploring
the roles of biogeographic isolation and environmental change in structuring host-associated
microbes (Fig 4). In the Miocene, populations of marine organisms and their microbial symbi-
onts moved freely between the Tropical Eastern Pacific (TEP) and Caribbean in a large, unified
tropical faunal province dominated by high primary productivity and seasonal upwelling [64].
Gradually, over millions of years, this shared faunal province became severed by uplift of the
Isthmus of Panama, which finally closed approximately 2.8 Ma (million years ago) [65]. The
Caribbean became nutrient poor, causing widespread extinction and a concurrent prolifera-
tion of coral reefs and immigration of new biotas [66]. In contrast, the TEP continued to expe-
rience strong seasonal upwelling and nutrient-rich conditions. In many cases, closely related
animal hosts diverged and followed separate evolutionary trajectories, adapting to the strongly
contrasting environments on opposite sides of the Isthmus. Presumably, their associated
microbiomes did so too. Today’s Caribbean and TEP marine ecosystems of Panama and
Central America are home to hundreds of sister species that emerged through transisthmian
vicariance, representing all major taxonomic groups. Decades of research have identified phy-
logenetic relationships between hosts, as well as the behavioral, physiological, and genetic
mechanisms involved in host divergence and reproductive isolation [65]. These data place
host-associated microbes into an unrivaled ecological and evolutionary framework.
Ocean gateways that remain open today also present unique attributes suitable for natural
experiments. The narrow Strait of Bab al Mandab connects the warm and saline semi-enclosed
Red Sea with the open and more variable Arabian Sea. The Red Sea is host to many endemic
species (5% to 13% endemic across a range of taxa [67]), while the pronounced seasonal varia-
tions in the Arabian Sea have driven fine-scale local adaptations [68]. Although the Mediterra-
nean has been connected to the Atlantic through the Strait of Gibraltar since the end of the
Messinian Salinity Crisis 5.3 Ma [69], the modern Mediterranean fauna bears the more recent
imprint of Pleistocene glacial and interglacial cycles. Temperature shifts in the basin over the
last 2 to 3 million years dictated whether subtropical or higher-latitude taxa could successfully
colonize the basin from the Atlantic, as well as their subsequent basin-wide extinctions [70]. The historical
context of these ocean gateways and their impacts on gene flow have been explored in a myriad
of organisms ranging from plants to invertebrates, fish, and mammals.
Other important biogeographic regions characterized by unique environmental conditions,
long-term data collection, and good scientific infrastructure include the Great Barrier Reef
[71], the Baltic Sea [72], the Larsen B ice shelf [73], Ischia Island [74], and the French Polynesian
island of Moorea [75] (Fig 1). Extensive research networks such as the National Science
Foundation’s Long-Term Ecological Research (LTER) Network, the Smithsonian Institution’s
Marine Global Earth Observatory (MarineGEO) network of partners, and the Marine
Biodiversity Observation Network (MBON; Fig 1) are set to play a fundamental role in providing
researchers with logistical access (field labs and sites) to these marine ecosystems and rigorously
collected physicochemical and biological contextual data [via, for example, long-term
deployment of sondes (CTDs) and data loggers, standardized visual surveys, and other methods]
at a global scale (Fig 1). These crucial long-term support networks typically overlook
host-associated microbes, yet they can serve as a model going forward, or they could be leveraged
to facilitate comparative studies that map microbial variation across communities of hosts from
unique marine ecosystems, helping us elucidate how host–microbe associations adjust to changes
in their environment at multiple temporal (from seasonal to geological) and spatial scales (from
local to biogeographical; Fig 4).
Fig 4. Methodological approach to leveraging a natural experiment, the Isthmus of Panama, for the long-term study of host–microbe ecology and
evolution. Present-day organisms physically separated by the Isthmus of Panama are adapted to the distinct environmental conditions of the
productive TEP and the oligotrophic Caribbean. In the Gulf of Panama of the TEP, organisms experience some of the most drastic annual fluctuations
in temperature, pH, oxygen, salinity, and nutrients, due to intense seasonal upwelling. Conversely, the nearby Gulf of Chiriquí of the TEP experiences
weak to no upwelling due to trade winds being largely blocked by the Cordillera Central mountain range. Multidisciplinary and long-term research on
hosts and associated microbes across these environmental spatiotemporal gradients, where decades of taxonomic, ecological, and evolutionary research
can be leveraged, will help capture the dynamics of host–microbe interactions. TEP, Tropical Eastern Pacific.
https://doi.org/10.1371/journal.pbio.3001322.g004
Emergence of volcanic islands
Novel habitats such as remote island archipelagos that formed over relatively recent geological
history also offer exceptional opportunities to study evolutionary processes in marine and ter-
restrial host-associated mutualistic microbes. Initially barren, shallow coastal areas were colo-
nized by marine organisms from neighboring areas that subsequently evolved in conditions
that are often drastically different from their native environments. Three archipelagos in par-
ticular, Hawai’i and Marquesas, located at the periphery of the Indo-Pacific region, and the
Galapagos in the TEP, have provided tremendous opportunities to study evolution through
comparative phylogeography (Fig 1). All three are composed of young islands (25 to 0.75 Ma,
5.5 to 0.4 Ma, and 3.2 to 0.05 Ma, respectively; reviewed in [76]) with high proportions of
endemic species (25.0% [77], 13.7% [78], and 13.6% [79] for fishes, respectively). The shallow
coastal habitats of the islands within these archipelagos were colonized sequentially by marine
species as they formed, resulting in a “progression” pattern whereby evolutionarily older line-
ages consistently occur on older islands [80]. These regions provide a unique historical context
for understanding the evolution of host-associated microbes and their roles in driving host
ecological success when new ecological opportunities emerge.
Ongoing human-induced changes
Marine communities are changing rapidly in the face of climate change and other anthropo-
genic activities [81]. The physicochemical parameters associated with the catastrophic changes
occurring over contemporary timescales are now relatively well characterized, but the effects
on most host-associated microbes are still virtually unknown [82]. Coral bleaching is a notable
exception. As host species and their associated microbes shift in distribution, they often face
novel abiotic and biotic conditions. For example, melting of ice is opening new pathways for
the movement of animals, plants, and microbes through the Arctic, from the North Pacific to
the North Atlantic, leading to one of the largest species invasions ever observed [83]. The grad-
ual increase in salinity caused by the expansion of the Panama Canal, along with predicted
increased runoff and evaporation, will likely result in greater movement of marine species
between the tropical Western Atlantic and the TEP [84] (Fig 4). Construction of the Suez
Canal in 1869 caused an influx of saline water into the Mediterranean that was followed by the
intrusion of invasive species from the subtropical Red Sea [85]. Rats introduced to islands of
the Chagos Archipelago precipitated a decline in bird density, thereby reducing the nitrogen
input on land and in the sea with downstream effects on coral reef productivity [86]. Finally,
many tropical species are expanding their distributions with the warming climate [87]. For
example, mangrove trees take advantage of the lower frequency of freezes to colonize salt
marshes [88], which allows many invertebrate and fish species to simultaneously expand their
ranges. Additional anthropogenic pressures stem from episodic or localized disasters such as
the 2010 Deepwater Horizon oil spill in the Gulf of Mexico [89], anoxic events (Bocas del Toro
[90]), sediment runoff events (Great Barrier Reef [91]), as well as water pollution and eutrophi-
cation around large urban centers such as Jakarta, Hong Kong, and Singapore [92] (Fig 1).
These anthropogenic changes provide multiple opportunities to understand how the rapid
evolutionary potential of host-associated microbes underpins adaptive evolution in hosts.
Conclusions
Understanding what changes in host-associated microbes mean for the maintenance of marine
communities and ecosystems requires measurements that go far beyond the typical life span of
a publicly funded scientific project. The integration of microbial sampling into long-term eco-
logical monitoring programs across key geographic locations will help us identify important
core and transient host-associated microbes and provide the fundamental basis for mechanis-
tic studies. Researchers should focus on the vast majority of marine animals and plants that
are able to interchange microbial partners, genes, and functions with surrounding microbial
communities. The future of marine ecosystems around the globe may in part depend upon
the ability of marine organisms to dip into the enormous pool of microbes and harness their
remarkable genetic potential.
Acknowledgments
We thank the staff of the Smithsonian Bocas del Toro Research Station, Rachel Collin, Jennifer
McMillan, and Patricia Leiro for helping with the logistics of the #istmobiome workshop
(December 9 to 13, 2019, Bocas del Toro) during which some of these ideas were discussed.
We thank Kendall D. Clements (ORCID: 0000-0001-8512-5977), A. Murat Eren (ORCID:
0000-0001-9013-4827), Niko Leisch (ORCID: 0000-0001-7375-3749), J. Patrick Megonigal
(ORCID: 0000-0002-2018-7883), Luis C. Mejía (ORCID: 0000-0003-2135-5241), Emilia M.
Sogin (ORCID: 0000-0001-7533-3705), and Blake Ushijima (ORCID: 0000-0002-1053-5207)
for participating in the discussions. Illustrations by Natalie Renier (http://nrenier.com/),
Woods Hole Oceanographic Institution.
References
1. Chakravarti LJ, Beltran VH, van Oppen MJH. Rapid thermal adaptation in photosymbionts of reef-building corals. Glob Chang Biol. 2017; 23:4675–4688. https://doi.org/10.1111/gcb.13702 PMID: 28447372
2. van Oppen MJH, Bongaerts P, Frade P, Peplow LM, Boyd SE, Nim HT, et al. Adaptation to reef habitats through selection on the coral animal and its associated microbiome. Mol Ecol. 2018; 27:2956–2971. https://doi.org/10.1111/mec.14763 PMID: 29900626
3. Rosado PM, Leite DCA, Duarte GAS, Chaloub RM, Jospin G, Nunes da Rocha U, et al. Marine probiotics: increasing coral resistance to bleaching through microbiome manipulation. ISME J. 2019; 13:921–936. https://doi.org/10.1038/s41396-018-0323-6 PMID: 30518818
4. Voolstra CR, Ziegler M. Adapting with microbial help: microbiome flexibility facilitates rapid responses to environmental change. BioEssays. 2020; 42:2000004. https://doi.org/10.1002/bies.202000004 PMID: 32548850
5. Cohen ML, Mashanova EV, Rosen NM, Soto W. Adaptation to temperature stress by Vibrio fischeri facilitates this microbe’s symbiosis with the Hawaiian bobtail squid (Euprymna scolopes). Evolution. 2019; 73:1885–1897. https://doi.org/10.1111/evo.13819 PMID: 31397886
6. Cornejo-Granados F, Gallardo-Becerra L, Leonardo-Reza M, Ochoa-Romo JP, Ochoa-Leyva A. A meta-analysis reveals the environmental and host factors shaping the structure and function of the shrimp microbiota. PeerJ. 2018; 6:e5382. https://doi.org/10.7717/peerj.5382 PMID: 30128187
7. Huggett MJ, Apprill A. Coral microbiome database: integration of sequences reveals high diversity and relatedness of coral-associated microbes. Environ Microbiol Rep. 2019; 11:372–385. https://doi.org/10.1111/1758-2229.12686 PMID: 30094953
8. Sullam KE, Essinger SD, Lozupone CA, O’Connor MP, Rosen GL, Knight R, et al. Environmental and ecological factors that shape the gut bacterial communities of fish: a meta-analysis. Mol Ecol. 2012; 21:3363–3378. https://doi.org/10.1111/j.1365-294X.2012.05552.x PMID: 22486918
9. Thomas T, Moitinho-Silva L, Lurgi M, Björk JR, Easson C, Astudillo-García C, et al. Diversity, structure and convergent evolution of the global sponge microbiome. Nat Commun. 2016; 7:11870. https://doi.org/10.1038/ncomms11870 PMID: 27306690
10. Lozupone CA, Stombaugh J, Gonzalez A, Ackermann G, Wendel D, Vázquez-Baeza Y, et al. Meta-analyses of studies of the human microbiota. Genome Res. 2013; 23:1704–1714. https://doi.org/10.1101/gr.151803.112 PMID: 23861384
11. Antwis RE, Griffiths SM, Harrison XA, Aranega-Bou P, Arce A, Bettridge AS, et al. Fifty important research questions in microbial ecology. FEMS Microbiol Ecol. 2017; 93:fix044. https://doi.org/10.1093/femsec/fix044 PMID: 28379446
12. Cullen CM, Aneja KK, Beyhan S, Cho CE, Woloszynek S, Convertino M, et al. Emerging priorities for microbiome research. Front Microbiol. 2020; 11:136. https://doi.org/10.3389/fmicb.2020.00136 PMID: 32140140
13. Wilkins LGE, Leray M, O’Dea A, Yuen B, Peixoto RS, Pereira TJ, et al. Host-associated microbiomes drive structure and function of marine ecosystems. PLoS Biol. 2019; 17:e3000533. https://doi.org/10.1371/journal.pbio.3000533 PMID: 31710600
14. Sagarin R, Pauchard A. Observational approaches in ecology open new ground in a changing world. Front Ecol Environ. 2010; 8:379–386. https://doi.org/10.1890/090001
15. Barley SC, Meeuwig JJ. The power and the pitfalls of large-scale, unreplicated natural experiments. Ecosystems. 2017; 20:331–339. https://doi.org/10.1007/s10021-016-0028-5
16. Hewitt JE, Thrush SF, Dayton PK, Bonsdorff E. The effect of spatial and temporal heterogeneity on the design and analysis of empirical studies of scale-dependent systems. Am Nat. 2007; 169:398–408. https://doi.org/10.1086/510925 PMID: 17243075
17. Douglas AE. Housing microbial symbionts: evolutionary origins and diversification of symbiotic organs in animals. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190603. https://doi.org/10.1098/rstb.2019.0603 PMID: 32772661
18. Foster KR, Schluter J, Coyte KZ, Rakoff-Nahoum S. The evolution of the host microbiome as an ecosystem on a leash. Nature. 2017; 548:43–51. https://doi.org/10.1038/nature23292 PMID: 28770836
19. McLaren MR, Callahan BJ. Pathogen resistance may be the principal evolutionary advantage provided by the microbiome. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190592. https://doi.org/10.1098/rstb.2019.0592 PMID: 32772671
20. Shade A, Handelsman J. Beyond the Venn diagram: the hunt for a core microbiome. Environ Microbiol. 2012; 14:4–12. https://doi.org/10.1111/j.1462-2920.2011.02585.x PMID: 22004523
21. Bright M, Bulgheresi S. A complex journey: transmission of microbial symbionts. Nat Rev Microbiol. 2010; 8:218–230. https://doi.org/10.1038/nrmicro2262 PMID: 20157340
22. Moran NA, McCutcheon JP, Nakabachi A. Genomics and evolution of heritable bacterial symbionts. Annu Rev Genet. 2008; 42:165–190. https://doi.org/10.1146/annurev.genet.41.110306.130119 PMID: 18983256
23. Chomicki G, Werner GDA, West SA, Kiers ET. Compartmentalization drives the evolution of symbiotic cooperation. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190602. https://doi.org/10.1098/rstb.2019.0602 PMID: 32772665
24. van Oppen MJH, Medina M. Coral evolutionary responses to microbial symbioses. Philos Trans R Soc Lond B Biol Sci. 2020; 375:20190591. https://doi.org/10.1098/rstb.2019.0591 PMID: 32772672
25. Clavijo JM, Donath A, Serôdio J, Christa G. Polymorphic adaptations in metazoans to establish and maintain photosymbioses. Biol Rev Camb Philos Soc. 2018; 93:2006–2020. https://doi.org/10.1111/brv.12430 PMID: 29808579
26. Dubilier N, Bergin C, Lott C. Symbiotic diversity in marine animals: the art of harnessing chemosynthesis. Nat Rev Microbiol. 2008; 6:725–740. https://doi.org/10.1038/nrmicro1992 PMID: 18794911
27. Risely A. Applying the core microbiome to understand host–microbe systems. J Anim Ecol. 2020; 89:1549–1558. https://doi.org/10.1111/1365-2656.13229 PMID: 32248522
28. Astudillo-García C, Bell JJ, Webster NS, Glasl B, Jompa J, Montoya JM, et al. Evaluating the core microbiota in complex communities: a systematic investigation. Environ Microbiol. 2017; 19:1450–1462. https://doi.org/10.1111/1462-2920.13647 PMID: 28078754
29. Shade A, Stopnisek N. Abundance-occupancy distributions to prioritize plant core microbiome membership. Curr Opin Microbiol. 2019; 49:50–58. https://doi.org/10.1016/j.mib.2019.09.008 PMID: 31715441
30. Clever F, Sourisse JM, Preziosi RF, Eisen JA, Rodriguez Guerra EC, Scott JJ, et al. The gut microbiome stability of a butterflyfish is disrupted on severely degraded Caribbean coral reefs. bioRxiv. 2020. https://doi.org/10.1101/2020.09.21.306712
31. Zhang C, Derrien M, Levenez F, Brazeilles R, Ballal SA, Kim J, et al. Ecological robustness of the gut microbiota in response to ingestion of transient food-borne microbes. ISME J. 2016; 10:2235–2245. https://doi.org/10.1038/ismej.2016.13 PMID: 26953599
32. Sharp KH, Pratte ZA, Kerwin AH, Rotjan RD, Stewart FJ. Season, but not symbiont state, drives microbiome structure in the temperate coral Astrangia poculata. Microbiome. 2017; 5:120. https://doi.org/10.1186/s40168-017-0329-8 PMID: 28915923
33. Brandl SJ, Tornabene L, Goatley CHR, Casey JM, Morais RA, Côté IM, et al. Demographic dynamics of the smallest marine vertebrates fuel coral reef ecosystem functioning. Science. 2019; 364:1189–1192. https://doi.org/10.1126/science.aav3384 PMID: 31123105
34. Kuempel CD, Altieri AH. The emergent role of small-bodied herbivores in pre-empting phase shifts on degraded coral reefs. Sci Rep. 2017; 7:39670. https://doi.org/10.1038/srep39670 PMID: 28054550
35. Cleary DFR, Swierts T, Coelho FJRC, Polónia ARM, Huang YM, Ferreira MRS, et al. The sponge microbiome within the greater coral reef microbial metacommunity. Nat Commun. 2019; 10:1–12.
36. Brothers CJ, Van Der Pol WJ, Morrow CD, Hakim JA, Koo H, McClintock JB. Ocean warming alters predicted microbiome functionality in a common sea urchin. Proc R Soc B. 2018; 285:20180340. https://doi.org/10.1098/rspb.2018.0340 PMID: 29925614
37. Cavalcanti GS, Shukla P, Morris M, Ribeiro B, Foley M, Doane MP, et al. Rhodoliths holobionts in a changing ocean: host-microbes interactions mediate coralline algae resilience under ocean acidification. BMC Genomics. 2018; 19:701. https://doi.org/10.1186/s12864-018-5064-4 PMID: 30249182
38. Alverdy JC, Luo JN. The influence of host stress on the mechanism of infection: lost microbiomes, emergent pathobiomes, and the role of interkingdom signaling. Front Microbiol. 2017; 8:322. https://doi.org/10.3389/fmicb.2017.00322 PMID: 28303126
39. Brooks AN, Turkarslan S, Beer KD, Lo FY, Baliga NS. Adaptation of cells to new environments. Wiley Interdiscip Rev Syst Biol Med. 2011; 3:544–561. https://doi.org/10.1002/wsbm.136 PMID: 21197660
40. Pita L, Rix L, Slaby BM, Franke A, Hentschel U. The sponge holobiont in a changing ocean: from microbes to ecosystems. Microbiome. 2018; 6:46. https://doi.org/10.1186/s40168-018-0428-1 PMID: 29523192
41. Apprill A. The role of symbioses in the adaptation and stress responses of marine organisms. Ann Rev Mar Sci. 2020; 12:291–314. https://doi.org/10.1146/annurev-marine-010419-010641 PMID: 31283425
42. Kikuchi Y, Tada A, Musolin DL, Hari N, Hosokawa T, Fujisaki K, et al. Collapse of insect gut symbiosis under simulated climate change. mBio. 2016; 7:e01578-16. https://doi.org/10.1128/mBio.01578-16 PMID: 27703075
43. Zhang B, Leonard SP, Li Y, Moran NA. Obligate bacterial endosymbionts limit thermal tolerance of insect host species. Proc Natl Acad Sci USA. 2019; 116:24712–24718. https://doi.org/10.1073/pnas.1915307116 PMID: 31740601
44. Russell SL. Transmission mode is associated with environment type and taxa across bacteria-eukaryote symbioses: a systematic review and meta-analysis. FEMS Microbiol Lett. 2019; 366:fnz013. https://doi.org/10.1093/femsle/fnz013 PMID: 30649338
45. Nussbaumer AD, Fisher CR, Bright M. Horizontal endosymbiont transmission in hydrothermal vent tubeworms. Nature. 2006; 441:345–348. https://doi.org/10.1038/nature04793 PMID: 16710420
46. Salerno JL, Macko SA, Hallam SJ, Bright M, Won Y-J, McKiness Z, et al. Characterization of symbiont populations in life-history stages of mussels from chemosynthetic environments. Biol Bull. 2005; 208:145–155. https://doi.org/10.2307/3593123 PMID: 15837964
47. Eberhard WG. Evolution in bacterial plasmids and levels of selection. Q Rev Biol. 1990; 65:3–22. https://doi.org/10.1086/416582 PMID: 2186429
48. Hooks KB, O’Malley MA. Dysbiosis and its discontents. mBio. 2017; 8:e01492-17. https://doi.org/10.1128/mBio.01492-17 PMID: 29018121
49. Relman DA. Thinking about the microbiome as a causal factor in human health and disease: philosophical and experimental considerations. Curr Opin Microbiol. 2020; 54:119–126. https://doi.org/10.1016/j.mib.2020.01.018 PMID: 32114367
50. Bénard A, Vavre F, Kremer N. Stress & symbiosis: heads or tails? Front Ecol Evol. 2020; 8:167. https://doi.org/10.3389/fevo.2020.00167
51. Maher RL, Rice MM, McMinds R, Burkepile DE, Vega Thurber R. Multiple stressors interact primarily through antagonism to drive changes in the coral microbiome. Sci Rep. 2019; 9:6834. https://doi.org/10.1038/s41598-019-43274-8 PMID: 31048787
52. Baker AC. Reef corals bleach to survive change. Nature. 2001; 411:765–766. https://doi.org/10.1038/35081151 PMID: 11459046
53. Roach TNF, Dilworth J, H CM, Jones AD, Quinn RA, Drury C. Metabolomic signatures of coral bleaching history. Nat Ecol Evol. 2021; 1–9.
54. Awany D, Allali I, Dalvie S, Hemmings S, Mwaikono KS, Thomford NE, et al. Host and microbiome genome-wide association studies: current state and challenges. Front Genet. 2019; 9:637. https://doi.org/10.3389/fgene.2018.00637 PMID: 30723493
55. Geier B, Sogin EM, Michellod D, Janda M, Kompauer M, Spengler B, et al. Spatial metabolomics of in situ host–microbe interactions at the micrometre scale. Nat Microbiol. 2020; 5:498–510. https://doi.org/10.1038/s41564-019-0664-6 PMID: 32015496
56. Lozupone CA. Unraveling interactions between the microbiome and the host immune system to decipher mechanisms of disease. mSystems. 2018; 3:e00183-17. https://doi.org/10.1128/mSystems.00183-17 PMID: 29556546
57. Strader ME, Wong JM, Hofmann GE. Ocean acidification promotes broad transcriptomic responses in marine metazoans: a literature survey. Front Zool. 2020; 17:7. https://doi.org/10.1186/s12983-020-0350-9 PMID: 32095155
58. Galtier d’Auriac I, Quinn RA, Maughan H, Nothias L-F, Little M, Kapono CA, et al. Before platelets: the production of platelet-activating factor during growth and stress in a basal marine organism. Proc R Soc B. 2018; 285:20181307. https://doi.org/10.1098/rspb.2018.1307 PMID: 30111600
59. Vuong HE, Yano JM, Fung TC, Hsiao EY. The microbiome and host behavior. Annu Rev Neurosci. 2017; 40:21–49. https://doi.org/10.1146/annurev-neuro-072116-031347 PMID: 28301775
60. Greene A, Leggat W, Donahue MJ, Raymundo LJ, Caldwell JM, Moriarty T, et al. Complementary sampling methods for coral histology, metabolomics and microbiome. Methods Ecol Evol. 2020; 11:1012–1020. https://doi.org/10.1111/2041-210X.13431
61. Bowen BW, Gaither MR, DiBattista JD, Iacchei M, Andrews KR, Grant WS, et al. Comparative phylogeography of the ocean planet. Proc Natl Acad Sci USA. 2016; 113:7962–7969. https://doi.org/10.1073/pnas.1602404113 PMID: 27432963
62. Lessios HA. The Great American schism: divergence of marine organisms after the rise of the Central American Isthmus. Annu Rev Ecol Evol Syst. 2008; 39:63–91. https://doi.org/10.1146/annurev.ecolsys.38.091206.095815
63. Sheppard SK, Guttman DS, Fitzgerald JR. Population genomics of bacterial host adaptation. Nat Rev Genet. 2018; 19:549–565. https://doi.org/10.1038/s41576-018-0032-z PMID: 29973680
64. O’Dea A, Jackson JBC, Fortunato H, Smith JT, D’Croz L, Johnson KG, et al. Environmental change preceded Caribbean extinction by 2 million years. Proc Natl Acad Sci USA. 2007; 104:5501–5506. https://doi.org/10.1073/pnas.0610947104 PMID: 17369359
65. O’Dea A, Lessios HA, Coates AG, Eytan RI, Restrepo-Moreno SA, Cione AL, et al. Formation of the Isthmus of Panama. Sci Adv. 2016; 2:e1600883. https://doi.org/10.1126/sciadv.1600883 PMID: 27540590
66. O’Dea A, Jackson J. Environmental change drove macroevolution in cupuladriid bryozoans. Proc R Soc B. 2009; 276:3629–3634. https://doi.org/10.1098/rspb.2009.0844 PMID: 19640882
67. DiBattista JD, Roberts MB, Bouwmeester J, Bowen BW, Coker DJ, Lozano-Cortés DF, et al. A review of contemporary patterns of endemism for shallow water reef fauna in the Red Sea. J Biogeogr. 2016; 43:423–439. https://doi.org/10.1111/jbi.12649
68. DiBattista JD, Saenz-Agudelo P, Piatek MJ, Cagua EF, Bowen BW, Choat JH, et al. Population genomic response to geographic gradients by widespread and endemic fishes of the Arabian Peninsula. Ecol Evol. 2020; 10:4314–4330. https://doi.org/10.1002/ece3.6199 PMID: 32489599
69. Garcia-Castellanos D, Estrada F, Jiménez-Munt I, Gorini C, Fernàndez M, Vergés J, et al. Catastrophic flood of the Mediterranean after the Messinian salinity crisis. Nature. 2009; 462:778–781. https://doi.org/10.1038/nature08555 PMID: 20010684
70. Patarnello T, Volckaert FAMJ, Castilho R. Pillars of Hercules: is the Atlantic–Mediterranean transition a phylogeographical break? Mol Ecol. 2007; 16:4426–4444. https://doi.org/10.1111/j.1365-294X.2007.03477.x PMID: 17908222
71. De’ath G, Fabricius KE, Sweatman H, Puotinen M. The 27-year decline of coral cover on the Great Barrier Reef and its causes. Proc Natl Acad Sci USA. 2012; 109:17995–17999. https://doi.org/10.1073/pnas.1208909109 PMID: 23027961
72. Herlemann DP, Labrenz M, Jürgens K, Bertilsson S, Waniek JJ, Andersson AF. Transitions in bacterial communities along the 2000 km salinity gradient of the Baltic Sea. ISME J. 2011; 5:1571–1579. https://doi.org/10.1038/ismej.2011.41 PMID: 21472016
73. Convey P, Chown SL, Clarke A, Barnes DKA, Bokhorst S, Cummings V, et al. The spatial structure of Antarctic biodiversity. Ecol Monogr. 2014; 84:203–244. https://doi.org/10.1890/12-2216.1
74. Hall-Spencer JM, Rodolfo-Metalpa R, Martin S, Ransome E, Fine M, Turner SM, et al. Volcanic carbon dioxide vents show ecosystem effects of ocean acidification. Nature. 2008; 454:96–99. https://doi.org/10.1038/nature07051 PMID: 18536730
75. McCliment EA, Nelson CE, Carlson CA, Alldredge AL, Witting J, Amaral-Zettler LA. An all-taxon microbial inventory of the Moorea coral reef ecosystem. ISME J. 2012; 6:309–319. https://doi.org/10.1038/ismej.2011.108 PMID: 21900967
76. Neall VE, Trewick SA. The age and origin of the Pacific islands: a geological overview. Philos Trans R Soc Lond B Biol Sci. 2008; 363:3293–3308. https://doi.org/10.1098/rstb.2008.0119 PMID: 18768382
77. Randall JE. Reef and Shore Fishes of the Hawaiian Islands. Sea Grant College Program, University of Hawai‘i; 2007.
78. Delrieu-Trottin E, Williams JT, Bacchet P, Kulbicki M, Mourier J, Galzin R, et al. Shore fishes of the Marquesas Islands, an updated checklist with new records and new percentage of endemic species. Check List. 2015; 11:1758. https://doi.org/10.15560/11.5.1758
79. McCosker JE, Rosenblatt RH. The fishes of the Galápagos Archipelago: an update. Proc Calif Acad Sci. 2010; 61:167–195.
80. Shaw KL, Gillespie RG. Comparative phylogeography of oceanic archipelagos: hotspots for inferences of evolutionary process. Proc Natl Acad Sci USA. 2016; 113:7986–7993. https://doi.org/10.1073/pnas.1601078113 PMID: 27432948
81. Duarte CM, Agusti S, Barbier E, Britten GL, Castilla JC, Gattuso J-P, et al. Rebuilding marine life. Nature. 2020; 580:39–51. https://doi.org/10.1038/s41586-020-2146-7 PMID: 32238939
82. Cavicchioli R, Ripple WJ, Timmis KN, Azam F, Bakken LR, Baylis M, et al. Scientists’ warning to humanity: microorganisms and climate change. Nat Rev Microbiol. 2019; 17:569–586. https://doi.org/10.1038/s41579-019-0222-5 PMID: 31213707
83. VanWormer E, Mazet JAK, Hall A, Gill VA, Boveng PL, London JM, et al. Viral emergence in marine mammals in the North Pacific may be linked to Arctic sea ice reduction. Sci Rep. 2019; 9:15569. https://doi.org/10.1038/s41598-019-51699-4 PMID: 31700005
84. Salgado J, Vélez MI, González-Arango C, Rose NL, Yang H, Huguet C, et al. A century of limnological evolution and interactive threats in the Panama Canal: long-term assessments from a shallow basin. Sci Total Environ. 2020; 729:138444. https://doi.org/10.1016/j.scitotenv.2020.138444 PMID: 32380321
85. Albano PG, Steger J, Bošnjak M, Dunne B, Guifarro Z, Turapova E, et al. Native biodiversity collapse in the eastern Mediterranean. Proc R Soc B. 2021; 288:20202469. https://doi.org/10.1098/rspb.2020.2469 PMID: 33402072
86. Graham NAJ, Wilson SK, Carr P, Hoey AS, Jennings S, MacNeil MA. Seabirds enhance coral reef productivity and functioning in the absence of invasive rats. Nature. 2018; 559:250–253. https://doi.org/10.1038/s41586-018-0202-3 PMID: 29995864
87. Wernberg T, Bennett S, Babcock RC, de Bettignies T, Cure K, Depczynski M, et al. Climate-driven regime shift of a temperate marine ecosystem. Science. 2016; 353:169–172. https://doi.org/10.1126/science.aad8745 PMID: 27387951
88. Saintilan N, Wilson NC, Rogers K, Rajkaran A, Krauss KW. Mangrove expansion and salt marsh decline at mangrove poleward limits. Glob Chang Biol. 2014; 20:147–157. https://doi.org/10.1111/gcb.12341 PMID: 23907934
89. Beyer J, Trannum HC, Bakke T, Hodson PV, Collier TK. Environmental effects of the Deepwater Horizon oil spill: a review. Mar Pollut Bull. 2016; 110:28–51. https://doi.org/10.1016/j.marpolbul.2016.06.027 PMID: 27301686
90. Altieri AH, Harrison SB, Seemann J, Collin R, Diaz RJ, Knowlton N. Tropical dead zones and mass mortalities on coral reefs. Proc Natl Acad Sci USA. 2017; 114:3660–3665. https://doi.org/10.1073/pnas.1621517114 PMID: 28320966
91. MacNeil MA, Mellin C, Matthews S, Wolff NH, McClanahan TR, Devlin M, et al. Water quality mediates resilience on the Great Barrier Reef. Nat Ecol Evol. 2019; 3:620–627. https://doi.org/10.1038/s41559-019-0832-3 PMID: 30858590
92. Heery EC, Hoeksema BW, Browne NK, Reimer JD, Ang PO, Huang D, et al. Urban coral reefs: degradation and resilience of hard coral assemblages in coastal cities of East and Southeast Asia. Mar Pollut Bull. 2018; 135:654–681. https://doi.org/10.1016/j.marpolbul.2018.07.041 PMID: 30301085
93. Robertson DR, Christy JH, Collin R, Cooke RG, D’Croz L, Kaufmann KW, et al. The Smithsonian Tropical Research Institute: marine research, education, and conservation in Panama. Smithson Contrib Mar Sci. 2009; 73–93.
94. Berumen ML, Voolstra CR, Daffonchio D, Agusti S, Aranda M, Irigoien X, et al. The Red Sea: environmental gradients shape a natural laboratory in a nascent ocean. In: Voolstra CR, Berumen ML, editors. Coral Reefs of the Red Sea. Cham: Springer International Publishing; 2019. pp. 1–10.
95. Archana A, Thibodeau B, Geeraert N, Xu MN, Kao S-J, Baker DM. Nitrogen sources and cycling revealed by dual isotopes of nitrate in a complex urbanized environment. Water Res. 2018; 142:459–470. https://doi.org/10.1016/j.watres.2018.06.004 PMID: 29913387
Copyright of PLoS Biology is the property of Public Library of Science and its content may
not be copied or emailed to multiple sites or posted to a listserv without the copyright holder’s
express written permission. However, users may print, download, or email articles for
individual use.
Resource
Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions
Highlights
• Glycan-focused language models can be used for sequence-to-function models
• Information in glycans predicts immunogenicity, pathogenicity, and taxonomic origin
• Glycan alignments shed light on bacterial virulence
Bojar et al., 2021, Cell Host & Microbe 29, 132–144
January 13, 2021 © 2020 The Author(s). Published by Elsevier Inc.
https://doi.org/10.1016/j.chom.2020.10.004
Authors
Daniel Bojar, Rani K. Powers,
Diogo M. Camacho, James J. Collins
Correspondence
diogo.camacho@wyss.harvard.edu
(D.M.C.),
jimjc@mit.edu (J.J.C.)
In Brief
Bojar et al. present a workflow that
combines machine learning and
bioinformatics techniques to analyze the
prominent role of glycans in host-microbe
interactions. The herein developed
glycan-focused language models and
alignments allow for the prediction and
analysis of glycan immunogenicity,
association with pathogenicity, and
taxonomic classification.
c.
ll
mailto:diogo.camacho@wyss.harvard.�edu
mailto:jimjc@mit.�edu
https://doi.org/10.1016/j.chom.2020.10.004
http://crossmark.crossref.org/dialog/?doi=10.1016/j.chom.2020.10.004&domain=pdf
OPEN ACCESS
ll
Daniel Bojar,1,2 Rani K. Powers,1,2 Diogo M. Camacho,1,4,* and James J. Collins1,2,3,4,5,*
1Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
2Department of Biological Engineering and Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
3Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
4These authors contributed equally
5Lead Contact
*Correspondence: diogo.camacho@wyss.harvard.edu (D.M.C.), jimjc@mit.edu (J.J.C.)
SUMMARY
Glycans, the most diverse biopolymer, are shaped by evolutionary pressures stemming from host-microbe interactions. Here, we present machine learning and bioinformatics methods to leverage the evolutionary information present in glycans to gain insights into how pathogens and commensals interact with hosts. By using techniques from natural language processing, we develop deep-learning models for glycans that are trained on a curated dataset of 19,299 unique glycans and can be used to study and predict glycan functions. We show that these models can be used to predict glycan immunogenicity and the pathogenicity of bacterial strains, and to investigate glycan-mediated immune evasion via molecular mimicry. We also develop glycan-alignment methods and use these to analyze virulence-determining glycan motifs in the capsular polysaccharides of bacterial pathogens. These resources enable one to identify and study glycan motifs involved in immunogenicity, pathogenicity, molecular mimicry, and immune evasion, expanding our understanding of host-microbe interactions.
INTRODUCTION
In contrast to RNA and proteins, whose sequences can be elucidated from their associated DNA sequence, glycans are the only biopolymer outside the rules of the central dogma of molecular biology. Although glycans are synthesized by DNA-encoded enzymes (Lairson et al., 2008), an individual glycan sequence depends on the interplay between multiple enzymes and cellular conditions. Additionally, the expansive glycan alphabet of hundreds of different monosaccharides allows for a large number of potential oligosaccharides, built with different monosaccharides, lengths, connectivity, and branching. Glycans are present as modifications on all other biopolymers (Varki, 2017), exerting varying effects on biomolecules, including stabilization and modulation of their functionality (Dekkers et al., 2017; Solá and Griebenow, 2009). Apart from influencing the function of individual proteins, glycans are also crucial for cell-cell contact, as in the glycan-glycan interactions during the attachment of pathogenic bacteria to host cells (Day et al., 2015), and they mediate essential developmental processes such as nervous system development (Haltiwanger and Lowe, 2004). Recently, Lauc et al. hypothesized that the plethora of available glycoforms and their plasticity facilitated the evolution of complex multicellular lifeforms (Lauc et al., 2014), a hypothesis supported by the essential roles of glycans in developmental processes and cell-cell communication that also emphasizes the evolutionary information in glycans.
Because glycans make up the outermost layer of both eukaryotic and prokaryotic cells, cross-kingdom interactions will necessarily involve these molecules (Day et al., 2015). The prominent role of glycans in host-pathogen interactions (Varki, 2017) has resulted in evolutionary pressures and opportunities on both sides of the interaction: natural selection can modify host glycan receptors used by pathogens without losing their functionalities, whereas pathogens and commensals need to alter their glycans to evade the host immune system. These interactions provide a window into glycan-mediated host-microbe relationships. Glycans display great phenotypic variability: sequences can change depending on environmental conditions, such as the level of extracellular metabolites (Park et al., 2017), without the need for genetic mutations, potentially facilitating rapid responses to changes in host-microbe relationships.
Given these glycan-mediated host-microbe interactions, glycans could provide insights into the determinants of pathogenicity and commensalism; for instance, molecular mimicry of host glycans by both pathogens and commensals facilitates their immune evasion (Carlin et al., 2009; Varki and Gagneux, 2015). Additional therapeutic potential arises from the widespread use of glycans by viruses for cell adhesion and entry (Thompson et al., 2019) and by pathogenic bacteria (Poole et al., 2018).
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
[Figure 1 image. (A) Dataset overview: 19,299 unique glycans (1,027 glycoletters; 19,866 glycowords; 9,152 glycans with at least one label) and 12,674 species-specific glycans (6,969 eukaryotic, 6,119 prokaryotic, 152 viral). (B) Number of glycans per taxonomic level, from domain to species. (C) Monosaccharides paired with Fuc and their branching position (main vs. side). (D) Bonds made by NeuNAc and NeuNGc, and monosaccharides with an α2-3 bond.]
Figure 1. Using a Curated Glycan Dataset as a Resource for Glycobiology and Analyzing Host-Microbe Interactions
(A) Building curated datasets of species-specific and unique glycan sequences. Glycans stemming from proteins, lipids, small molecules, or cellular surfaces were gathered from UniCarbKB, CSDB, GlyTouCan, and the academic literature. We deposited these datasets in our database SugarBase, containing additional associated metadata, such as linkage and immunogenicity information.
(legend continued below)
In addition to previous work developing computational approaches to glycan analysis (McDonald et al., 2016; Spahn et al., 2016), identifying relevant glycan motifs and their roles in host-microbe interactions at scale would benefit from pattern-learning algorithms, such as machine learning, that can uncover statistical dependencies in biological sequences (Camacho et al., 2018). Research on other biopolymers has shown that language models, originally developed for the analysis of human languages, perform particularly well at this task (Alley et al., 2019; Almagro Armenteros et al., 2020; Strodthoff et al., 2020), because they can leverage evolutionarily conserved regularities and language-like properties in such sequences. Language models, with their memory-like features, are well suited for leveraging the patterns and implicit structure underlying biopolymers such as nucleic acids (Valeri et al., 2020) and proteins (Alley et al., 2019), because information in these sequences is order dependent and non-neighboring residues can have meaningful interactions. Applying a natural language-processing approach to biological sequences also enables learning a representation of a molecule that can be used to analyze sequence motifs and predict functional properties. These types of models are therefore a suitable starting point for the analysis of glycan sequences.
Here, we present a resource toolkit comprising machine learning and bioinformatics methods, as well as a large glycan database, to leverage the evolutionary information present in glycans for predictive purposes in the context of host-microbe interactions, e.g., by understanding pathogenicity-associated glycan motifs. This toolkit can be used either as a complete workflow for investigating host-microbe interactions, from a glycan dataset to glycan motifs identified by machine learning and further investigated by glycan alignments, or as separate modules. Underlying all of this is our language model for glycans, SweetTalk, trained on a dataset of 19,299 unique glycan sequences. With it, we demonstrate that similarities between glycans can be visualized and used to predict glycan properties such as human immunogenicity. Another part of our platform is SweetOrigins, a language-model-based classifier that predicts the taxonomic origin of glycans and that we use to obtain evolution-informed representations of glycans. To achieve this in the context of glycan-mediated host-microbe interactions, we manually curated a comprehensive dataset comprising 12,674 glycans with species annotations. These datasets were combined into a database, SugarBase, that is amenable to programmatic access and integration into deep-learning pipelines, thus providing resources for analyses involving host-microbe interactions.
In this work, we demonstrate the potential and generalizability of using SugarBase, SweetTalk, SweetOrigins, and a glycan-alignment methodology for studying glycan-mediated host-microbe interactions. We show that a language-model-based classifier trained on glycan sequences can accurately predict glycan immunogenicity and the pathogenicity of E. coli strains, revealing predictive glycan motifs. We also leverage the evolutionary information gained by SweetOrigins to analyze glycan motifs that could be used for molecular-mimicry-mediated immune evasion by commensals and pathogens. Applying our glycan-alignment methodology to the capsular polysaccharides of Staphylococcus aureus and Acinetobacter baumannii, we uncover a potential connection to the enterobacterial common antigen and hypothesize a mechanism for the increased virulence mediated by these glycan motifs. Taken together, these resources offer a powerful and generalizable platform for studying and understanding the role of glycans in host-microbe interactions.
Figure 1 (continued)
(B) Glycan species distribution in the species-specific glycan dataset. For all glycans with species information, up to the 10 most abundant classes for each taxonomic level are shown with their number of glycans.
(C and D) Analyzing the local structural context of glycoletters. We identified the most frequent monosaccharides following fucose in glycans (C), highlighting its local structural context together with its likely position in the glycan structure (main versus side branch). Additionally, we compared the binding behavior of several sialic acids (D).
RESULTS
Curating Glycan Datasets for Glycobiology and Glycan-Mediated Host-Microbe Interactions
To investigate the role of glycans in host-microbe interactions, we constructed a dataset of species-specific glycan sequences that could be used to train machine-learning models. For this, we gathered and curated glycans from GlyTouCan (Tiemeyer et al., 2017), UniCarbKB (Campbell et al., 2014), the Carbohydrate Structure Database (CSDB) (Toukach and Egorova, 2016), and targeted literature searches (see STAR Methods). To facilitate training deep-learning models on glycan sequences, we only included glycans with fully elucidated sequences, including the determination of linkages between monosaccharides. Our dataset contained 12,674 highly diverse glycans with a deposited species association (Figure 1A; Table S1) and included glycans from 1,726 species (corresponding to 39 taxonomic phyla; Figure 1B). Specifically, our dataset contained 6,969 eukaryotic, 6,119 prokaryotic, and 152 viral glycans. Because we included all species for which we could find glycans, this dataset constituted a comprehensive snapshot of currently known species-specific glycans, with glycans from numerous bacteria, facilitating the study of glycan-mediated host-microbe interactions.
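The curation rule above (keep only glycans whose linkages are fully determined, then separate the species-labeled subset) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes sequences are IUPAC-condensed strings in which an unresolved linkage position is written as "?", and the `curate` helper and the example records are hypothetical.

```python
from collections import Counter


def curate(records):
    """Keep unique, fully resolved glycans and tally them per domain.

    `records` is a list of (sequence, domain) pairs; `domain` is None when no
    species label was deposited. Assumption: a sequence counts as fully resolved
    only if it contains no "?", used here as the marker for an unknown linkage.
    """
    seen = {}  # sequence -> set of domains it was reported from
    for seq, domain in records:
        if "?" in seq:  # skip partially elucidated structures
            continue
        seen.setdefault(seq, set())
        if domain is not None:
            seen[seq].add(domain)
    unique = list(seen)  # insertion order = first occurrence
    labeled = [s for s in unique if seen[s]]
    per_domain = Counter(d for s in unique for d in seen[s])
    return unique, labeled, per_domain
```

Because a sequence reported from several domains is tallied once per domain, the per-domain counts in this sketch can add up to more than the number of labeled glycans.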
We further reasoned that including glycan sequences without a deposited species label would strengthen the language models we describe below. This approach is supported by the success of transfer learning in machine learning (Howard and Ruder, 2018), in which models are initially trained on large datasets without labels and then finetuned on smaller datasets with labels. This makes more data available to learn general patterns, such as sequence motifs, that can be leveraged to predict glycan properties. Accordingly, we curated a separate dataset in which we used the databases mentioned above to gather 19,299 unique glycan sequences, irrespective of whether species information was available (Figure 1A; STAR Methods; Table S2). To gain a comprehensive view of glycobiology, we included all glycan categories, encompassing protein-, lipid-, and small molecule-associated glycans, as well as capsular and extracellular polysaccharides.
[Figure 2 image. (A) SweetTalk: glycowords are embedded and passed through forward and reverse LSTM layers, used as a glycoletter language model and as glycoword-based classifiers. (B) t-SNE of glycoletter embeddings. (C, D) Possible versus realized glycowords, by count and in UMAP space. (E) UMAP of immunogenic versus non-immunogenic glycans, with high-mannose, rhamnose-rich, N-glycan, and O-glycan regions. (F, G) Predicted probability of immunogenicity for masked and altered glycans from Homo sapiens and Ruminococcus gnavus. Legend below.]
In our dataset, we observed 1,027 unique monosaccharides or bonds that were present in glycan sequences and comprised the smallest units of an alphabet for a glycan language. Analogous to natural language processing, we termed these entities “glycoletters” and constructed “glycowords” by considering trisaccharides (i.e., three monosaccharides and two connecting bonds, or five glycoletters), yielding 19,866 unique glycowords in our dataset. With this, we sought to incorporate local structural information into our models and enable the discovery of relevant motifs, which usually contain subsequences larger than a single monosaccharide. Even larger substructures would preclude the analysis of shorter glycans and lead to an exponential increase in the size of the resulting vocabulary. We note, however, that although we chose trisaccharides as building blocks, glycan substructures of any length can be used to build a vocabulary for our models without considerable changes.
To make these data and analysis resources readily accessible and facilitate further advances in glycobiology, we created SugarBase, a comprehensive glycan database with metadata and analytical tools based on this work (Figure S1A; Table S2; https://webapps.wyss.harvard.edu/sugarbase). SugarBase offers accessible glycan data, explorable glycan representations learned by our language models, and many of the methods developed here as tools, such as the local structural context of any glycoletter (Figure S1B) and glycan alignments, described below.
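The local structural context of a glycoletter, as used in the fucose and sialic acid analyses that follow, can be approximated by recording, for every occurrence of a monosaccharide, its own linkage and the residue it is attached to. The sketch assumes IUPAC-condensed bracket notation (e.g., `Gal(b1-4)[Fuc(a1-3)]GlcNAc`); `local_context` and `attached_to` are hypothetical helpers for illustration, not SugarBase's API.

```python
import re
from collections import Counter

# Tokens of an IUPAC-condensed glycan: branch brackets, linkages such as
# "(a1-2)", and monosaccharide names such as "GlcNAc".
TOKEN = re.compile(r"\[|\]|\([^)]*\)|[^\[\]()]+")


def attached_to(tokens, i):
    """Return the monosaccharide that the residue at tokens[i] is linked to.

    tokens[i + 1] is the residue's own linkage. Scanning rightwards, complete
    side branches "[...]" are skipped; a "]" met at depth 0 just closes the
    branch the residue itself sits in.
    """
    depth = 0
    for tok in tokens[i + 2:]:
        if tok == "[":
            depth += 1
        elif tok == "]":
            depth = max(depth - 1, 0)
        elif depth == 0 and not tok.startswith("("):
            return tok
    return None  # the residue is the reducing end


def local_context(glycans, letter):
    """Count partner monosaccharides and linkages for every occurrence of `letter`."""
    partners, linkages = Counter(), Counter()
    for seq in glycans:
        tokens = TOKEN.findall(seq)
        for i, tok in enumerate(tokens):
            if tok != letter or i + 1 >= len(tokens):
                continue  # not the target, or it is the reducing end
            linkages[tokens[i + 1].strip("()")] += 1
            partner = attached_to(tokens, i)
            if partner is not None:
                partners[partner] += 1
    return partners, linkages
```

For example, `local_context(glycans, "Fuc")` tallies which residues fucose is bonded to, and `local_context(glycans, "NeuNAc")[1]` tallies its α2-3 versus α2-6 linkages.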
Reasoning that our glycan datasets constitute broad resources for glycobiology and host-microbe interactions, we set out to investigate host glycan substructures that could be emulated by microbes for molecular mimicry. Analyzing the environment of the monosaccharide fucose as an example, we observed N-acetylglucosamine (GlcNAc) and galactose (Gal) as typical connected monosaccharides (Figure 1C), which is consistent with the fucosyltransferase substrate specificities annotated in glycosyltransferase family 10 (Lombard et al., 2014). Thus, microbial glycans containing fucose could potentially include either GlcNAc or Gal in direct proximity to maximize similarity with host glycans. This insight aids in formulating hypotheses and identifying glycan motifs relevant for molecular mimicry, as we describe below. We also differentiated binding orientation preferences for different sialic acids, a crucial monosaccharide type in host-pathogen interactions (Figure 1D; Haines-Menges et al., 2015), revealing a preference for the characteristic human monosaccharide NeuNAc to be (α2-3)-linked, relative to other sialic acids such as NeuNGc. These types of analyses can directly lead to hypotheses about glycan motifs that can be investigated by using the methods presented in this work.
Figure 2. Learning the Language of Glycans Revealed Regularities in Substructures and Can Be Used to Predict Glycan Immunogenicity
(A) Building a language model for glycobiology. We used glycowords, overlapping units consisting of three monosaccharides and two bonds, for our glycoletter-based bidirectional RNN, SweetTalk, that was trained by predicting the next glycoletter given previous glycoletters. Glycans are drawn in accordance with the symbol nomenclature for glycans (SNFG).
(B) Learned representation of glycoletters by SweetTalk. We visualized the embedding for every glycoletter by t-distributed stochastic neighbor embedding (t-SNE). Areas enriched for modified monosaccharides of one type are colored.
(C) Comparing the abundance of possible and observed glycowords. Possible glycowords were calculated from the pool of observed glycoletters and their exhaustive combination (36 bonds and 991 monosaccharides).
(D) Comparing the distribution of possible and observed glycowords. We generated 250,000 glycowords by randomly sampling from the observed pool of monosaccharides and bonds and formed their embedding by averaging their constituent glycoletter embeddings. A uniform manifold approximation and projection (UMAP) of these generated glycowords (blue) and all observed glycowords (orange) is shown.
(E) Glycan embeddings learned by the immunogenicity classifier. Embeddings for glycans from our immunogenicity dataset are shown via UMAP and colored according to whether they were immunogenic (blue) or non-immunogenic (orange).
(F) Glycoword masking to probe the immunogenicity classifier. Glycowords were progressively exchanged with padding (“masking”) from both termini (“Non-Reducing”/“Reducing”) and used as input for the trained immunogenicity classifier. Inferred immunogenicity probability indicates how crucial each region of a glycan is for prediction, with the bar representing the full-length glycan at the bottom.
(G) Glycan in silico alterations to probe the immunogenicity classifier. For 4,000 iterations, single monosaccharides or bonds were replaced with a random monosaccharide or bond. If the resulting glycowords were observed, we used them as input for the trained immunogenicity classifier. Inferred immunogenicity probability is plotted together with the altered glycan sequences, with the wild-type glycan at the bottom. In case of ambiguity, a number indicates which monosaccharide was modified. The addition of an “S” implies a sulfurylated monosaccharide, whereas “Me” implies a methylated monosaccharide.
Using Natural Language Processing to Learn the Grammar of Glycans
Next, we used our curated dataset of 19,299 glycan sequences (Table S2) to develop a deep-learning-based language model, SweetTalk. For this, we chose a bidirectional recurrent neural network (RNN; Figure 2A; Sherstinsky, 2020), because this type of model has delivered state-of-the-art results for other biopolymers, such as protein sequences (Alley et al., 2019; Almagro Armenteros et al., 2020; Strodthoff et al., 2020). Originally developed for human languages, RNNs exhibit memory-like elements by predicting the next word given the preceding words (Sherstinsky, 2020); this enables RNNs to learn complex, order-dependent interactions in proteins by viewing amino acids as letters and predicting the next amino acid given the preceding sequence (Alley et al., 2019). A trained language model has two main uses: (1) extracting a learned representation for each word and (2) finetuning the model to predict structural or functional properties of a sequence. For the former, a representation, or embedding, that characterizes a word in terms of context, usage, and meaning is constructed in the parameters of the trained model for each word in the vocabulary. This learned representation can be used to quantify the similarity of two glycan sequences or to analyze language properties, which we demonstrate with the analysis of molecular mimicry in host-microbe interactions. The latter, finetuning a general language model on a predictive task such as predicting pathogenicity, is also known as transfer learning (Howard and Ruder, 2018; Tan et al., 2018); in our case, it uses general glycan features learned by the language model to predict functional properties.
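The training objective described above, predicting the next glycoletter from the preceding ones, can be illustrated with a far simpler model. The sketch swaps the bidirectional RNN for a count-based bigram model, so it captures only the prediction objective, not learned embeddings or long-range context; the example glycowords are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(glycowords):
    """Count next-glycoletter transitions inside each glycoword (5-tuple)."""
    counts = defaultdict(Counter)
    for word in glycowords:
        for prev, nxt in zip(word, word[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return (most likely next glycoletter, its estimated probability)."""
    options = counts.get(prev)  # .get does not insert into the defaultdict
    if not options:
        return None, 0.0
    letter, n = options.most_common(1)[0]
    return letter, n / sum(options.values())
```

Training a second table on reversed glycowords (`train_bigram([w[::-1] for w in words])`) gives a crude analogue of the backward direction of a bidirectional model.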
Glycans are the only nonlinear biopolymer, with potentially multiple branches per sequence. To build a language model despite this branching, we extracted partially overlapping “glycowords” from the non-reducing end to the reducing end of glycans in the bracket notation (Figure 2A), each comprising three monosaccharides and two bonds. These glycowords represented snapshots of the structural contexts that characterize a glycan sequence. Using monosaccharides and bonds as “glycoletters,” we then trained a glycoletter-based language model, SweetTalk, to predict the next most probable glycoletter given the preceding glycoletters in the context of these glycowords (Table S3). Training on glycowords instead of directly on full sequences avoids learning spurious relationships between glycoletters that are close in the bracket notation but far apart in the actual glycan structure due to branching. We then demonstrated the necessity of accounting for the order-dependent information in glycans by training SweetTalk on scrambled glycan sequences, randomizing the order but keeping the composition of each sequence; this resulted in severely degraded model performance, emphasizing the language-like elements inherent in glycan sequences (Table S3). Analyzing the learned embeddings of glycoletters after training SweetTalk revealed similar positions in embedding space for monosaccharides and their modified counterparts (e.g., sulfurylated galactose, GalOS, and sulfurylated N-acetylgalactosamine, GalNAcOS; Figure 2B), implying similarity in their language characteristics and context. This finding is reminiscent of observations made on the popular word2vec embeddings, which also learn a representation of words in a human language by considering their neighboring words (their context) and in which semantically similar words form clusters (Mikolov et al., 2013).
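Branch-aware glycoword extraction and the scrambled control can be sketched as follows. This is one possible implementation under stated assumptions, not the authors' code: it parses IUPAC-condensed bracket notation into a tree (reading from the reducing end) and emits a glycoword for every residue-parent-grandparent chain. As a result, residues that sit next to each other in the string but are not bonded are never chained.

```python
import random
import re

TOKEN = re.compile(r"\[|\]|\([^)]*\)|[^\[\]()]+")


def parse_glycan(seq):
    """Parse IUPAC-condensed bracket notation into parallel lists.

    names[i] is a monosaccharide, parent[i] the index of the residue it is
    linked to (None for the reducing end), and bond[i] that linkage.
    """
    names, parent, bond = [], [], []
    current, pending, stack = None, None, []
    for tok in reversed(TOKEN.findall(seq)):  # read from the reducing end
        if tok == "]":
            stack.append(current)  # entering a side branch of `current`
        elif tok == "[":
            current = stack.pop()  # branch done; back to its attachment point
        elif tok.startswith("("):
            pending = tok[1:-1]  # linkage of the next residue to `current`
        else:
            names.append(tok)
            parent.append(current)
            bond.append(pending)
            current, pending = len(names) - 1, None
    return names, parent, bond


def glycowords(seq):
    """Trisaccharide glycowords (5 glycoletters) along true parent-child chains."""
    names, parent, bond = parse_glycan(seq)
    words = []
    for i, p in enumerate(parent):
        if p is None or parent[p] is None:
            continue  # needs both a parent and a grandparent
        g = parent[p]
        words.append((names[i], bond[i], names[p], bond[p], names[g]))
    return words


def scramble(letters, seed=0):
    """Control input: same glycoletter composition, random order."""
    shuffled = list(letters)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

In `Man(a1-3)[Man(a1-6)]Man(b1-4)...`, a naive sliding window over the string would pair the two sibling mannoses. The tree-based extraction does not.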
We then constructed glycoword embeddings by averaging the embeddings of their constituent glycoletters. Our first observation was that of the close to 1.2 trillion possible glycowords (given our observed glycoletters), only 19,866 distinct glycowords (~0.0000016%) were observed here (Figure 2C). Moreover, these 19,866 glycowords were not evenly distributed in the learned embedding space, as existing glycowords formed clusters compared with in silico-generated possible glycowords (Figure 2D). The sparse population of glycoword space (and, thus, glycan space) is potentially a consequence of having to evolve dedicated enzymes for constructing specific glycan substructures from a species-specific set of monosaccharide building blocks, making most combinations inaccessible.
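Both the glycoword embedding and the size of the possible glycoword space come down to short calculations. This sketch assumes that a possible glycoword is any sequence of three monosaccharides joined by two bonds, drawn freely from the 991 monosaccharides and 36 bonds given in the Figure 2C legend. Under that assumption the count is 991³ × 36², about 1.26 × 10¹², and 19,866 observed glycowords are roughly 0.0000016% of it. The embedding values are hypothetical toy vectors.

```python
from math import fsum


def glycoword_embedding(word, letter_emb):
    """Average the embedding vectors of a glycoword's glycoletters."""
    vectors = [letter_emb[letter] for letter in word]
    dim = len(vectors[0])
    return [fsum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def possible_glycowords(n_monosaccharides, n_bonds, residues=3):
    """Count chains of `residues` monosaccharides joined by residues - 1 bonds,
    assuming every position may take any observed glycoletter of its type."""
    return n_monosaccharides ** residues * n_bonds ** (residues - 1)
```

Each additional residue multiplies the space by 991 × 36 = 35,676. This is the exponential vocabulary growth the text cites as a reason not to use larger substructures.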
Predicting Glycan Immunogenicity with a Glycan-Based Language Model
Given the important role glycans play in human immunity (Kappler and Hennet, 2020; Reusch and Tejada, 2015), we curated known immunogenic glycans from the literature (Table S2) to finetune a SweetTalk-based classifier that takes glycan sequences as input and predicts their immunogenicity to humans. On an independent validation dataset, our model achieved an accuracy of ~92% (F1 score, or balanced F score: 0.915), compared with an accuracy of ~51% for a model trained on scrambled glycan sequences (Figures 2E–2G; Table S4). Alternative machine-learning models that did not treat glycan sequences as a language, such as random forest classifiers, achieved accuracies of only ~80%–88% for this task (Table S4), emphasizing the importance of order and patterns for elucidating glycan properties.
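The accuracy and F1 figures reported above follow standard definitions. The sketch computes them for a binary immunogenic (1) versus non-immunogenic (0) task, together with the Matthews correlation coefficient (MCC), which Table 1 reports for SweetOrigins in its multiclass form. The labels in the test example are made up.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and MCC for 0/1 labels (1 = immunogenic)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}
```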
Rhamnose-rich glycans (rhamnose being a monosaccharide common in bacteria but not in mammals) were unambiguously assigned to an immunogenic cluster by our RNN-based model and presented the most striking motif for glycan immunogenicity (Figure 2E). The cluster containing high-mannose glycans was more ambiguous, because it included both immature human glycans and immunogenic fungal glycans, potentially suggesting that unintentionally exposed immature human glycans are immunogenic. Indeed, the presence of immature high-mannose glycans on viral surfaces has been noted to influence immunogenicity, with many broadly neutralizing antibodies targeting the high-mannose glycans on HIV glycoproteins (Lavine et al., 2012). We also found that human mucosal O-glycans, characterized by their interactions with bacteria, were interspersed with bacterial immunogenic glycans in the embedding space, in contrast to N-linked glycans. This adds to the notion of an immunological compromise of recognizing these bacterial glycans at the expense of targeting human O-glycans with shared motifs, such as the ABH blood group antigens (Kappler and Hennet, 2020). These analyses indicate that embeddings from glycan-focused language models could be used to study characteristics of glycans on a large scale and with many potential applications, such as the exploration of glycan-immune system interactions.
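The masking probe described in the Figure 2F legend can be sketched as follows. The outermost glycowords are replaced with a padding token, one more at a time from each terminus, and the classifier's output is recorded at each step. `toy_classifier` is a hypothetical stand-in that only flags rhamnose-containing glycowords. A real analysis would call the trained immunogenicity model instead.

```python
PAD = None  # stands in for the padding token that replaces a masked glycoword


def masking_profile(words, predict):
    """Prediction as glycowords are masked from each terminus.

    `words` is a list ordered from the non-reducing to the reducing end.
    Entry k of each returned list is the output after masking k glycowords.
    """
    n = len(words)
    from_non_reducing = [predict([PAD] * k + words[k:]) for k in range(n + 1)]
    from_reducing = [predict(words[:n - k] + [PAD] * k) for k in range(n + 1)]
    return from_non_reducing, from_reducing


def toy_classifier(words):
    """Hypothetical stand-in for a trained model: rhamnose means immunogenic."""
    return 1.0 if any(w is not PAD and "Rha" in w for w in words) else 0.1
```

A sharp drop in the profile marks the glycowords the classifier relies on. In the test example, the drop happens once the rhamnose-containing glycowords at the non-reducing end are masked.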
Using Deep Learning to Provide Evolution-Informed Glycan Representations
We next hypothesized that a deep-learning model could extract the evolutionary pressures on glycans that stem from host-pathogen interactions. For this, we constructed a language-model-based classifier, SweetOrigins, to predict the taxonomic origin of a glycan (Figure 3A). In distinguishing taxonomic classes, SweetOrigins could learn species-specific features of glycans that are indicative of their evolutionary history. SweetOrigins is based on a bidirectional RNN, and we first pre-trained it as a SweetTalk model as described above. We then used the language-like properties learned in this process to finetune the model on a different task: predicting the taxonomic group of glycans. By doing this for every taxonomic level, from the species level up to the domain level, we obtained eight SweetOrigins models that share the same basic architecture and differ only in their final layers. These final layers could learn how to combine the information extracted from glycans to predict their taxonomic group. They differed in their number of output nodes, because the number of classes varied for each taxonomic level. This strategy was successful in extracting evolutionary information from glycans. SweetOrigins models classified the taxonomic group of a glycan with high accuracy at the higher taxonomic levels and well above the random baseline at every level (Table 1).
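The eight SweetOrigins models share one architecture and differ only in the width of their final layer. The sketch records that width for each taxonomic level, using the class counts from Table 1. It also checks that the "Random" baseline column of Table 1 equals uniform guessing, 1/(number of classes). The `hidden_size` argument is a hypothetical encoder width, not a value from the paper.

```python
# Number of classes per taxonomic level, as listed in Table 1.
CLASSES = {"Domain": 4, "Kingdom": 9, "Phylum": 33, "Class": 71,
           "Order": 145, "Family": 258, "Genus": 405, "Species": 581}


def random_baseline(n_classes):
    """Expected accuracy of uniform random guessing over n classes."""
    return 1 / n_classes


def head_shapes(hidden_size, classes=CLASSES):
    """Weight-matrix shape of each level-specific final layer.

    All eight models share the encoder; only the number of output nodes
    (the second dimension) changes from one taxonomic level to the next.
    """
    return {level: (hidden_size, n) for level, n in classes.items()}
```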
In contrast to other biological sequences such as DNA or proteins, the number of available glycan sequences is still limited, and their high diversity compounds this limitation. The effect is especially visible in prediction tasks with only a few glycans per class, such as the species-level SweetOrigins model. There it results in lower model performance for rare classes and less useful glycan representations for downstream analyses. As knowledge of host-microbe interactions at the species
[Figure 3 image. (A) SweetOrigins schematic: glycowords → glycoword embeddings → biLSTM layers → fully connected layer → classification result (e.g., Domain: Bacteria; Kingdom: Bacteria; Phylum: Proteobacteria; Class: Gammaproteobacteria; Order: Enterobacterales; Family: Enterobacteriaceae; Genus: Escherichia; Species: Escherichia coli). (B) The glycan GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc and alternative bracket notations of its branched portion. (C, D) t-SNE plots of E. coli glycan embeddings, labeled by strain (e.g., K-12, O157:H7, O111:B4) and by pathogenicity (yes/no/unknown).]
Figure 3. Deep-Learning-Based Classifiers Use Glycans to Predict Taxonomic Origin and Pathogenicity
(A) Exemplary schematic of SweetOrigins to predict taxonomic origin from glycans. Lists of glycowords are used as input for a SweetOrigins model to predict the
taxonomic class ranging from the domain level down to the species level.
(B) Glycan data augmentation strategy. Different bracket notations describing the same glycan can be generated by alternating double branches as well as
replacing side branches with main branches to increase model robustness.
(C) Glycans of E. coli in embedding space distinguish strains. The embedding for all 1,010 E. coli-derived glycans with strain information from the trained species-
level SweetOrigins model is plotted via t-SNE and colored for areas enriched for annotated E. coli
strains.
(D) E. coli glycans predict pathogenicity. For all E. coli-derived glycans, representations learned by a model predicting pathogenicity are plotted via t-SNE and
colored as to whether they stem from pathogenic, non-pathogenic, or unlabeled E. coli. Example strains for all cases are annotated.
ll
OPEN ACCESS Resource
138 Cell Host & Microbe 29, 132–144, January 13, 2021
Table 1. Metrics of Trained SweetOrigins Models
Taxonomic Level Classes
Baseline
Accuracy Cross-Entropy Loss Accuracy MCC
Random Max Base Aug Base Aug Base Aug
Domain 4 (4) 0.2500 0.99 0.2841 0.1906 0.9128 0.9313 0.8134 0.8693
Kingdom 9 (11) 0.1111 0.98 0.3844 0.3249 0.8733 0.8953 0.8001 0.8390
Phylum 33 (39) 0.0303 0.98 0.8685 0.7543 0.7779 0.8008 0.7018 0.7341
Class 71 (101) 0.0141 0.96 1.3283 1.1729 0.6803 0.7149 0.6218 0.6638
Order 145 (207) 0.0069 0.92 2.2498 2.1132 0.4937 0.5333 0.4602 0.5066
Family 258 (411) 0.0039 0.90 2.9834 2.7068 0.4134 0.4660 0.3873 0.4428
Genus 405 (919) 0.0025 0.86 3.6588 3.4081 0.3658 0.3849 0.3505 0.3682
Species 581 (1,726) 0.0017 0.86 4.3704 3.9550 0.3052 0.3651 0.2870 0.3496
Taxonomic groups with fewer than five unique glycans were not used for model training or validation. Number of classes indicates the number of
included taxonomic groups, whereas the full number of taxonomic groups in our dataset is given in parentheses. Models were trained with the standard
set of glycans (Base) or after data augmentation (Aug). As an accuracy baseline, a random prediction of classes was used for each model. Max in-
dicates the maximum theoretically possible accuracy given shared glycan sequences across taxonomic groups. Cross-entropy loss, accuracy,
and Matthew’s correlation coefficient (MCC) of the trained model on a separate validation set are given for each taxonomic level. For each metric
and taxonomic level, the superior value is bolded.
ll
OPEN ACCESSResource
level could offer insights, we developed methods that enable
training glycan-focused machine-learning models on small data-
sets. This goal motivated our transfer-learning approach of pre-
training a language model on all glycan sequences and then fine-
tuning the model on a smaller dataset, because this approach in
natural language processing has in some cases reduced the
necessary dataset size by a factor of 100 (Howard and Ruder,
2018). In other domains of deep learning, such as image classi-
fication, data augmentation routinely results in improved model
quality and robustness by providing the model with slightly modi-
fied versions of the data (Perez and Wang, 2017), such as
rotating images or changing their brightness. We reasoned that
the same could be achieved for biomolecules such as glycans;
we thus designed a data-augmentation method, specifically for
glycans, by conceptualizing glycans as graphs and forming a
set of isomorphic graphs comprising slightly different lists of gly-
cowords that we used as inputs for SweetOrigins (Figure 3B;
STAR Methods). Capitalizing on the ambiguity of the bracket no-
tation (Tanaka et al., 2014), we generated bracket notations that
differed in their ordering of branches but still described the same
glycan. This led to model performance improvements at every
classification level, with absolute accuracy increases of up to
6%, by effectively increasing the amount of available data. As
we envisioned, classifications with less data per class, such as
the species level, benefited most from data augmentation (Table
1), paving the way for using glycan-based deep-learning models
with smaller datasets.
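The branch-reordering idea can be sketched in plain Python. This is a minimal illustration assuming IUPAC-condensed input such as the glycan in Figure 3B; it is not the authors' implementation (which feeds glycoword lists to SweetOrigins), and `Residue`, `parse`, `variants`, and `augment` are hypothetical names. The parser treats the unbracketed chain to the left of a residue as its main child and each bracketed part as a side branch; every ordering of a residue's children then gives another valid notation of the same glycan, with the first child written as the main chain.

```python
from dataclasses import dataclass, field
from itertools import permutations, product


@dataclass
class Residue:
    name: str
    link: str = ""                                # linkage to the parent, e.g. "b1-4"; "" for the root
    children: list = field(default_factory=list)  # first child = main chain, rest = side branches


def parse(s, i=0):
    """Parse IUPAC-condensed bracket notation; return (last residue read, index reached)."""
    prev, branches = None, []
    while i < len(s) and s[i] != "]":
        if s[i] == "[":                           # a bracketed side branch
            branch, i = parse(s, i + 1)
            branches.append(branch)
            i += 1                                # skip the closing "]"
            continue
        j = i
        while j < len(s) and s[j] not in "([]":
            j += 1
        name, link = s[i:j], ""
        if j < len(s) and s[j] == "(":
            k = s.index(")", j)
            link, j = s[j + 1:k], k + 1
        # The chain read so far is the main child; pending brackets are side branches.
        children = ([prev] if prev is not None else []) + branches
        prev, branches, i = Residue(name, link, children), [], j
    return prev, i


def variants(node):
    """Yield every notation of this subtree obtained by reordering children at each residue."""
    suffix = node.name + (f"({node.link})" if node.link else "")
    if not node.children:
        yield suffix
        return
    for ordering in permutations(node.children):
        # The first child in the ordering is written as the main chain, the rest in brackets.
        for parts in product(*(list(variants(c)) for c in ordering)):
            yield parts[0] + "".join(f"[{p}]" for p in parts[1:]) + suffix


def augment(glycan):
    root, _ = parse(glycan)
    return list(dict.fromkeys(variants(root)))    # drop duplicates, keep first-seen order


glycan = "GlcNAc(b1-2)Man(a1-6)[Xyl(b1-2)][Man(a1-3)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-3)]GlcNAc"
augmented = augment(glycan)
print(len(augmented))          # 12
print(augmented[0] == glycan)  # True
```

For this glycan the sketch yields 12 notations (2 orderings at the reducing-end GlcNAc times 6 at the branching mannose), including the reorderings shown in Figure 3B. Identical sibling branches would produce duplicate strings, which `augment` removes.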
In general, our predictions were robust; in our validation dataset, for example, we accurately classified glycans from the kingdoms Animalia (91.1%) and Bacteria (97.2%), as well as glycans from the phyla Chordata (91.9%) and Firmicutes (90.4%) (Figures S2A–S2C). This demonstrates that SweetOrigins can learn glycan representations from both hosts and microbes, enabling the analyses presented below. Where misclassifications occurred, they involved closely related groups, such as viral glycans misclassified as glycans of their hosts (Figures S2A–S2C). Glycan embeddings from our trained SweetOrigins model formed clusters reminiscent of taxonomic groups (Figure S2D). We next used our trained SweetOrigins models to infer the taxonomic origin of the 10,333 glycans without a species label in our dataset (Table S2). For several randomly selected glycans, we performed literature searches to validate the predictions made by SweetOrigins (Figure S2E; Table S5), which indicated that our trained SweetOrigins models had accurately learned species- or group-specific glycan motifs.
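The validation metrics in Table 1 and per-group accuracies such as those above can be computed as sketched below. This is a plain-Python illustration with hypothetical helper names and made-up labels, not the evaluation code behind the table. Accuracy is the fraction of correct predictions; the random baseline is 1/K for K classes, which reproduces the Random column of Table 1; per-class accuracy is taken here as the fraction of glycans from a group that were assigned to it (an assumption about how the per-group percentages are defined); and the multiclass Matthews correlation coefficient uses the standard K-class generalization.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def random_baseline(n_classes):
    # Expected accuracy of guessing uniformly among n_classes
    return 1 / n_classes


def per_class_accuracy(y_true, y_pred):
    """Fraction of items of each true class that were predicted as that class (recall)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {k: hits[k] / n for k, n in totals.items()}


def multiclass_mcc(y_true, y_pred):
    """K-class Matthews correlation coefficient."""
    s = len(y_true)                                  # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # number of correct predictions
    t_k, p_k = Counter(y_true), Counter(y_pred)      # true / predicted counts per class
    classes = set(t_k) | set(p_k)
    numerator = c * s - sum(p_k[k] * t_k[k] for k in classes)
    denominator = sqrt((s * s - sum(p_k[k] ** 2 for k in classes))
                       * (s * s - sum(t_k[k] ** 2 for k in classes)))
    return numerator / denominator if denominator else 0.0


y_true = ["Animalia", "Animalia", "Bacteria", "Bacteria", "Plantae", "Plantae"]
y_pred = ["Animalia", "Bacteria", "Bacteria", "Bacteria", "Plantae", "Animalia"]
print(round(accuracy(y_true, y_pred), 4))        # 0.6667
print(per_class_accuracy(y_true, y_pred))        # {'Animalia': 0.5, 'Bacteria': 1.0, 'Plantae': 0.5}
print(round(multiclass_mcc(y_true, y_pred), 4))  # 0.5222
print(round(random_baseline(33), 4))             # 0.0303 (phylum level)
```

MCC is reported alongside accuracy because it stays informative when class sizes are very unequal, as they are at the lower taxonomic levels.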
We next used SweetOrigins models to investigate host-pathogen interactions, specifically in the context of the well-studied bacterium E. coli. Although SweetOrigins classifiers were only trained up to the species level, we hypothesized that subspecies-level information could be extracted from the rich glycan representation learned by the species-level SweetOrigins model. To test this, we gathered 1,010 glycan sequences from E. coli with strain-level annotation from CSDB and used these as inputs to our trained model, yielding learned representations that we used to differentiate serotypes. We could readily identify clusters enriched for several strains in these representations, such as serotypes O8/O9, characterized by a distinctive polymannose O-antigen (Greenfield et al., 2012), and the K-12 strain widely used in molecular biology research (Figure 3C), illustrating the diversity and characteristic features of glycans across E. coli strains.
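Identifying clusters enriched for a strain can be made concrete with a fold-enrichment score: the fraction of a cluster's glycans that come from a strain, divided by that strain's fraction in the whole set. This plain-Python sketch assumes that cluster labels are already available (for example, from clustering the t-SNE coordinates), which is an assumption rather than a step described here; the function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def strain_enrichment(clusters, strains):
    """Fold enrichment of each strain in each cluster versus its overall frequency.

    A value above 1 means the strain is over-represented in that cluster.
    """
    n = len(strains)
    overall = Counter(strains)
    per_cluster = defaultdict(Counter)
    for cluster, strain in zip(clusters, strains):
        per_cluster[cluster][strain] += 1
    enrichment = {}
    for cluster, counts in per_cluster.items():
        size = sum(counts.values())
        enrichment[cluster] = {s: (k / size) / (overall[s] / n) for s, k in counts.items()}
    return enrichment


# Toy example: a cluster id for each of eight glycans and the strain each glycan comes from
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
strains = ["K-12", "K-12", "K-12", "O6", "O8/O9", "O8/O9", "O8/O9", "O8/O9"]
print(strain_enrichment(clusters, strains))
# {0: {'K-12': 2.0, 'O6': 2.0}, 1: {'O8/O9': 2.0}}
```

Note that the single O6 glycan already scores 2.0, so in practice a minimum count or a statistical test would be needed before calling a cluster enriched.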
We next reasoned, given the prominent role of glycans in host-microbe interactions, that these glycan differences could be used to predict E. coli pathogenicity, because E. coli strains range from non-colonizing to commensal or pathogenic (Lim et al., 2010). Accordingly, we trained a deep-learning-based classifier with the same language-model architecture as SweetOrigins on glycan sequences to test whether the information in glycans can predict pathogenicity. Using a threshold of 0.5 on the predicted probability of pathogenicity, we predicted E. coli strain pathogenicity with an accuracy of ~89% on a separate validation dataset (Figure 3D; F1 score: ~0.906). This positioned E. coli strains along a continuum of predicted pathogenicity and supported the role of glycans in mediating pathogenicity. Interestingly, E. coli strains such as O111:B4, which were labeled as “unknown” in the dataset and therefore not available during model training, were predicted to be among the pathogenic strains; O111:B4 has indeed been shown to cause gastric disease (Viljanen et al., 1990). Our trained model placed the majority of E. coli glycans from strains of unknown pathogenicity between pathogenic and non-pathogenic strains, adding to the notion of a continuum of pathogenicity (Casadevall, 2017).
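Turning predicted probabilities into pathogenicity calls and scoring them against known labels can be sketched as below. This plain-Python illustration uses made-up labels and probabilities and a hypothetical function name; treating a probability exactly equal to the threshold as pathogenic is an assumption, as is leaving strains labeled “unknown” out of the evaluation, since they have no ground truth.

```python
def threshold_metrics(is_pathogenic, probabilities, threshold=0.5):
    """Accuracy and F1 for binary pathogenicity calls made by thresholding probabilities."""
    predicted = [p >= threshold for p in probabilities]
    pairs = list(zip(is_pathogenic, predicted))
    tp = sum(t and p for t, p in pairs)          # pathogenic, called pathogenic
    fp = sum(p and not t for t, p in pairs)      # non-pathogenic, called pathogenic
    fn = sum(t and not p for t, p in pairs)      # pathogenic, called non-pathogenic
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # F1 = harmonic mean of precision and recall = 2TP / (2TP + FP + FN)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return accuracy, f1


labels = [True, True, True, False, False]        # hypothetical ground truth
probs = [0.9, 0.6, 0.4, 0.2, 0.7]                # hypothetical model outputs
acc, f1 = threshold_metrics(labels, probs)
print(acc, round(f1, 4))                         # 0.6 0.6667
```

Because the classifier outputs a probability rather than a label, the same scores also place each strain along the continuum of predicted pathogenicity described above.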
Because glycans appear to be predictive of pathogenicity, we reasoned that certain glycan motifs in E. coli strains on the pathogenic end of the spectrum might provide further insight into pathogenesis. To test this, we identified glycan motifs enriched in regions of the learned representation populated predominantly by pathogenic E. coli strains (Figure 3D). Motifs in these pathogenicity-associated glycans exhibited a striking resemblance to host mucosal glycans, with an enrichment for a1-2-linked fucose and for the core 1 O-glycan structure prevalent in mucins (also known as T antigen; Gal(b1-3)GalNAc) (Figures S3A and S3B). Consistent with our local structural context analysis (Figure 1B), the majority of a1-2-linked fucose residues in pathogenic E. coli strains were linked to galactose (Figure S3C), forming part of the human blood group H antigen. Indeed, among the glycan motifs most predictive of E. coli strain pathogenicity, both the Gal(b1-3)GalNAc and Fuc(a1-2)Gal disaccharides were among the top 20 motifs (Figure S3D). In contrast, the presence of typical bacterial glycan components, such as rhamnose or L-glycero-D-manno-heptose (LDManHep), was associated with lower predicted pathogenicity (Figure S3D).
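A simple way to compare motif frequencies between pathogenic and non-pathogenic strains is to count directly linked disaccharides in IUPAC-condensed strings. This plain-Python sketch is not the model-based motif analysis behind Figure S3: it is a plain frequency count that only sees disaccharides written next to each other (a residue and its parent separated by a bracketed side branch are missed), the function names are hypothetical, and the toy glycans are invented examples, not E. coli structures from the dataset.

```python
import re
from collections import Counter

# Donor(linkage)Acceptor written directly next to each other, e.g. "Fuc(a1-2)Gal".
# The look-behind stops a match from starting inside a name such as the "NAc" of "GalNAc";
# the look-ahead allows overlapping matches ("Fuc(a1-2)Gal(b1-3)GalNAc" contains two).
DISACCHARIDE = re.compile(
    r"(?<![A-Za-z0-9-])(?=([A-Z][A-Za-z0-9-]*)\(([ab]\d-\d)\)([A-Z][A-Za-z0-9-]*))"
)


def disaccharides(glycan):
    return {f"{donor}({link}){acceptor}" for donor, link, acceptor in DISACCHARIDE.findall(glycan)}


def motif_fractions(glycans):
    """Fraction of glycans that contain each disaccharide at least once."""
    counts = Counter(m for g in glycans for m in disaccharides(g))
    return {m: k / len(glycans) for m, k in counts.items()}


pathogenic = ["Fuc(a1-2)Gal(b1-3)GalNAc", "Gal(b1-3)GalNAc", "Fuc(a1-2)Gal(b1-4)Glc"]
non_pathogenic = ["Rha(a1-3)Rha(a1-2)Glc", "Gal(b1-3)GalNAc"]
path_frac = motif_fractions(pathogenic)
non_frac = motif_fractions(non_pathogenic)
for motif in ("Fuc(a1-2)Gal", "Gal(b1-3)GalNAc"):
    print(motif, round(path_frac.get(motif, 0.0), 2), round(non_frac.get(motif, 0.0), 2))
# Fuc(a1-2)Gal 0.67 0.0
# Gal(b1-3)GalNAc 0.67 0.5
```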
Using Glycan Alignments to Study Virulence Determinants in Bacterial Pathogens
To better understand the function of glycans in host-microbe interactions, we developed a sequence-alignment method. For DNA and protein sequences, alignments exploit sequence changes caused by mutations and insertions to enable, for example, the identification of conserved motifs in protein families (Doğan and Karaçalı, 2013). To facilitate analogous analyses for glycans and capitalize on the evolutionary influence of host-pathogen interactions on glycans, we developed methods for gapped, pairwise alignment of glycan sequences based on the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970). For this, we constructed a substitution matrix (which we termed GLYSUM; Table S6), analogous to the BLOSUM matrices used in protein alignments, that uses the likelihood of substituting one monosaccharide for another to calculate alignment scores. To assess whether our glycan alignments performed as envisioned, we analyzed viral glycans, which are predominantly derived from their host organisms and should thus align to host glycans. As expected, the best-scoring alignment for each viral glycan was to a glycan from its host organism (Figures 4A and 4B), supporting the validity of our glycan-alignment method.
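The core of a gapped, pairwise Needleman-Wunsch alignment over glycoletters can be sketched as below. This plain-Python illustration does not reproduce GLYSUM: `simple_score` (+2 for identical glycoletters, -1 otherwise) and the linear gap penalty of -3 are placeholder assumptions, so its scores differ from those in Figure 4. Sequences are written as lists of monosaccharides and linkages in the linear order used in Figure 4C. Percent identity is taken here as identical positions over alignment length and coverage as the share of query positions aligned to a non-gap; these are assumed definitions, which reproduce the identity and coverage values of Figure 4C for these ungapped alignments.

```python
def simple_score(a, b):
    """Placeholder substitution score (not GLYSUM): +2 for identical glycoletters, -1 otherwise."""
    return 2 if a == b else -1


def needleman_wunsch(query, target, score, gap=-3):
    """Global alignment with a linear gap penalty; gaps are returned as None."""
    n, m = len(query), len(target)
    F = [[0] * (m + 1) for _ in range(n + 1)]  # F[i][j]: best score for query[:i] vs target[:j]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(query[i - 1], target[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves
    aligned_q, aligned_t, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(query[i - 1], target[j - 1]):
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append(None)
            i -= 1
        else:
            aligned_q.append(None)
            aligned_t.append(target[j - 1])
            j -= 1
    return F[n][m], aligned_q[::-1], aligned_t[::-1]


def identity_and_coverage(aligned_query, aligned_target):
    """Percent identity over the alignment and percent of query positions aligned to a non-gap."""
    pairs = list(zip(aligned_query, aligned_target))
    identical = sum(a is not None and a == b for a, b in pairs)
    query_len = sum(a is not None for a, _ in pairs)
    covered = sum(a is not None and b is not None for a, b in pairs)
    return 100 * identical / len(pairs), 100 * covered / query_len


def best_hits(query, database, score, top=3):
    """Rank database entries (name -> glycoletter list) by alignment score against the query."""
    scored = [(needleman_wunsch(query, seq, score)[0], name) for name, seq in database.items()]
    return sorted(scored, key=lambda hit: -hit[0])[:top]


cp5 = "ManNAcA β1-4 FucNAcOAc α1-3 D-FucNAc β1-4 ManNAcA".split()
eca = "ManNAcA β1-4 GlcNAc α1-3 D-FucNAc α1-4 ManNAcA".split()
score, aq, at = needleman_wunsch(cp5, eca, simple_score)
identity, coverage = identity_and_coverage(aq, at)
print(score, round(identity, 1), round(coverage, 1))  # 8 71.4 100.0
```

With this placeholder scoring, similar but non-identical monosaccharides earn no partial credit, so distinct targets can tie; a substitution matrix such as GLYSUM is what separates them.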
We reasoned that glycan motifs that are functionally relevant for host-pathogen interactions are likely conserved to some extent and could be analyzed with glycan alignments. As an example, we used our glycan-alignment method to align the serotype 5 capsular polysaccharide of the clinically relevant pathogen S. aureus, which is known to increase bacterial virulence (Tzianabos et al., 2001), against our dataset. Because the capsular polysaccharides of S. aureus mediate its evasion of the immune system (Weidenmaier and Lee, 2015), we hypothesized that comparing them to similar sequences might help explain their role in pathogenicity. Notably, the best alignments were obtained with the enterobacterial common antigen (ECA; Figure 4C), which is conserved in the Enterobacteriaceae family and has been shown to be important for virulence (Gilbreath et al., 2012) and outer membrane permeability (Mitchell et al., 2018). These findings are supported by experiments demonstrating that ECA deficiency in E. coli can be rescued by the expression of enzymes from serotype 5 S. aureus (Kiser and Lee, 1998). Such phenotypic complementation may suggest that this ECA-like glycan motif fulfills a role in S. aureus similar to that of the canonical ECA in E. coli.
To further probe the connection between ECA-like glycans and increased virulence, we aligned the canonical ECA motif against our dataset to compile a list of ECA-like sequences and their alignment distances; we used these distances to construct a dendrogram of the relationships between ECA-like glycan sequences (Figure 4D). Although most of the S. aureus-derived ECA-like sequences formed a separate cluster, the type 5 capsular polysaccharide was located in a different cluster together with the canonical ECA sequences. Of note, we observed an ECA-like motif in the capsular polysaccharide of A. baumannii (Figure 4D, bold), one of the most problematic hospital-acquired pathogens, in the same cluster dominated by canonical ECA sequences. The capsular polysaccharide of A. baumannii has been implicated in antibiotic resistance and virulence (Geisinger and Isberg, 2015), providing an intriguing potential link to the functions of the canonical ECA. For other pathogens, such as Haemophilus ducreyi, the expression of a gene cluster synthesizing a putative ECA-like glycan has also been linked to increased virulence (Banks et al., 2008), further suggesting a connection between this motif and virulence. Notably, the genera Staphylococcus, Acinetobacter, and Haemophilus are not part of the Enterobacteriaceae family typically associated with the ECA, highlighting the value of glycan alignments for screening thousands of glycans to identify motifs important for pathogenicity, such as the ECA-like glycans from S. aureus and A. baumannii.
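How the alignment distances were turned into a dendrogram is not detailed in this section; average-linkage (UPGMA) hierarchical clustering is one common choice and is sketched below in plain Python as an assumption, not as the authors' procedure. The distances in the usage are made-up numbers for four hypothetical sequences, and the tree is returned as nested (left, right, merge distance) tuples.

```python
from itertools import combinations


def upgma(labels, distance):
    """Average-linkage (UPGMA) clustering; returns nested tuples (left, right, merge_distance)."""
    trees = dict(enumerate(labels))         # cluster id -> subtree
    sizes = {i: 1 for i in trees}           # cluster id -> number of leaves
    d = {(i, j): distance(labels[i], labels[j]) for i, j in combinations(trees, 2)}
    next_id = len(labels)
    while len(trees) > 1:
        # Merge the closest pair of clusters (ties broken by cluster id for determinism)
        (i, j), height = min(d.items(), key=lambda kv: (kv[1], kv[0]))
        for k in trees:
            if k not in (i, j):
                d_ik = d[(min(i, k), max(i, k))]
                d_jk = d[(min(j, k), max(j, k))]
                # Size-weighted average distance from the merged cluster to cluster k
                d[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        trees[next_id] = (trees.pop(i), trees.pop(j), height)
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


DIST = {frozenset(pair): dist for pair, dist in [
    (("A", "B"), 2), (("A", "C"), 6), (("A", "D"), 10),
    (("B", "C"), 6), (("B", "D"), 10), (("C", "D"), 4)]}
tree = upgma(["A", "B", "C", "D"], lambda a, b: DIST[frozenset((a, b))])
print(tree)  # (('A', 'B', 2), ('C', 'D', 4), 8.0)
```

How alignment scores are converted into distances is a separate design choice that this sketch leaves open.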
[Figure 4 image. (A) Human immunodeficiency virus glycan aligned to a Homo sapiens glycan: alignment score 115, percent identity 100.0, percent coverage 100.0. (B) SARS-CoV-2 glycan Man α1-3 Man α1-6 Man α1-6 Man α1-3 Man β1-4 GlcNAc β1-4 GlcNAc aligned to a Homo sapiens glycan: alignment score 65, percent identity 100.0, percent coverage 100.0. (C) Staphylococcus aureus query ManNAcA β1-4 FucNAcOAc α1-3 D-FucNAc β1-4 ManNAcA aligned to Escherichia coli ManNAcA β1-4 GlcNAc α1-3 D-FucNAc α1-4 ManNAcA (score 26; identity 71.4; coverage 100.0), Escherichia coli ManNAcA β1-4 GlcNAc α1-3 FucNAc α1-4 ManNAcA (score 22; identity 57.1; coverage 100.0), and Yersinia pestis ManNAcA β1-4 GlcNAcOAc α1-3 D-FucNAc α1-4 ManNAcA (score 21; identity 71.4; coverage 100.0). (D) Dendrogram of ECA-like sequences labeled by species (including S. aureus, A. baumannii, E. coli, Y. pestis, and other bacteria), with the canonical S. aureus and enterobacterial common antigen clusters marked.]
Figure 4. Glycan Alignments Identify Pathogenicity-Associated Glycan Motifs
(A and B) Viral glycans aligned to host glycans. We aligned viral glycans to all glycans and depicted the highest-scoring alignment.
(C) Glycan alignments using serotype 5 capsular polysaccharide of S. aureus. The repeating unit of the glycan was aligned against our database, and the best three alignments are shown.
(D) ECA and ECA-like glycans. We aligned the canonical ECA sequence against our entire dataset, curated ECA-like sequences from the best 50 alignments, and constructed a dendrogram from alignment distances.
DISCUSSION
Here, we presented a set of resources (a collection of deep-learning and bioinformatics methods, together with large, curated datasets of glycan sequences) that can be used to gain insights into many facets of glycan-mediated host-microbe interactions. The aggregation of many glycan sequences in our datasets yields machine-learning models that are largely robust to individual data-entry errors in the source databases. By training a language model to understand the hidden grammar of glycan sequences, we demonstrated that the information in glycans can be used to predict a range of glycan properties, such as immunogenicity or pathogenicity. We also showed that sequences can be compared and clustered by learning a representation for each glycan via our trained models. For applications involving glycoproteins, the distribution of variant glycans on a protein (Wu et al., 2018) could be accounted for by averaging their representations, potentially weighted by their relative abundance. By developing both transfer-learning and data-augmentation methods for glycan-focused machine learning, we also addressed the pressing issue of the limited availability of glycan sequences due to experimental difficulties, enabling machine learning for many applications in glycobiology.
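The glycoprotein idea mentioned above, averaging the representations of a protein's glycan variants and optionally weighting them by relative abundance, amounts to a weighted mean of vectors. The sketch below is plain Python; the function name and the toy two-dimensional vectors are hypothetical, and real representations would come from a trained model.

```python
def pooled_representation(glycan_vectors, abundances=None):
    """Average per-glycan representation vectors; weights default to equal abundance."""
    if abundances is None:
        abundances = [1.0] * len(glycan_vectors)
    total = sum(abundances)
    dim = len(glycan_vectors[0])
    # Weighted mean of each coordinate across all glycan variants
    return [sum(w * vec[k] for w, vec in zip(abundances, glycan_vectors)) / total
            for k in range(dim)]


vectors = [[1.0, 0.0], [0.0, 1.0]]                 # two hypothetical glycan representations
print(pooled_representation(vectors))              # [0.5, 0.5]
print(pooled_representation(vectors, [3.0, 1.0]))  # [0.75, 0.25]
```

Weights need not sum to 1; dividing by their total makes relative abundances and raw intensities interchangeable.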
Our deep-learning strategies enabled us to introduce language models for glycans, while our curated datasets offer state-of-the-art coverage of glycan sequences across a multitude of organisms. In contrast to word2vec-type models (Mikolov et al., 2013), our language-model-based approach captured sequential information beyond mere co-occurrences in glycan sequences and thus achieved better predictive results than alternative machine-learning techniques. This also enabled us to analyze glycan motifs, such as those important for immunogenicity and pathogenicity, that depend on sequential information and on their relative position in glycans. Additionally, starting from a glycoletter-based model allowed for the construction of embeddings for close to 1.2 trillion glycowords, making SweetTalk easily extensible to the full diversity of glycobiology. SweetTalk can also incorporate position-specific modifications, illustrating its flexibility and potential for the analysis of information-rich glycosaminoglycans to predict, for instance, viral binding such as that required for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) cell entry (Liu et al., 2020).
Our resources can be used as a complete workflow, from a glycan dataset to motifs obtained by machine learning and further analyzed by glycan alignment, or as separate modules. The accuracy of our SweetOrigins models demonstrated that glycans can be used to distinguish closely related taxonomic groups and provided the means to leverage the evolutionary information in glycans for predictive purposes. Our observation that E. coli glycans are predictive of pathogenicity adds to the role of glycans as mediators of host-microbe relationships (Poole et al., 2018). The continuum of pathogenicity of E. coli strains suggested by our deep-learning model further supports redefining pathogenicity from a binary concept to a gradual, environmentally controlled process (Casadevall, 2017), mediated and influenced by glycans.
Both glycan alignments and glycan classification can connect glycan functions with sequence patterns, which we used to derive insight from glycan motifs by analyzing glycans that could potentially enable molecular-mimicry-mediated immune evasion by pathogenic E. coli strains. We further hypothesized that glycan-based molecular mimicry, in addition to mimicking host glycans, could extend to approximating glycans from other bacteria to increase virulence, as proposed here for the capsular polysaccharides of S. aureus and A. baumannii, which may mimic the ECA of other bacteria. Our glycan-alignment method readily generated this hypothesis of ECA mimicry by the glycans of these pathogens, and the phenomenon may be more broadly relevant in other pathogens, such as H. ducreyi, that are predicted to engage in ECA mimicry as well. In general, the resources developed here enable rapid discovery, understanding, and use of functionally relevant glycan motifs from glycan datasets, especially in the context of host-pathogen interactions. Another important feature of trained machine-learning models is the prediction of properties for newly acquired samples, such as the pathogenic potential of newly identified E. coli strains based on their glycans. As glycobiology progresses, SugarBase and our deep-learning models could be readily expanded and updated, enabling an even more comprehensive investigation of glycan-mediated host-microbe interactions. This will eventually allow for precise classification at the subspecies level using language-model-based approaches, facilitating the glycan-based study of host-microbe interactions at unprecedented resolution.
STAR★METHODS
Detailed methods are provided in the online version of this paper and include the following:
• KEY RESOURCES TABLE
• RESOURCE AVAILABILITY
  ◦ Lead Contact
  ◦ Materials Availability
  ◦ Data and Code Availability
• METHOD DETAILS
  ◦ Dataset
  ◦ Data Processing
  ◦ Analyzing Links in Glycan Sequences
  ◦ Glycan In Silico Modification
  ◦ Glycan Alignment
  ◦ Model Training
• QUANTIFICATION AND STATISTICAL ANALYSIS
SUPPLEMENTAL INFORMATION
Supplemental Information can be found online at https://doi.org/10.1016/j.chom.2020.10.004.
ACKNOWLEDGMENTS
The authors would like to thank Jacqueline Valeri and Mathieu Groussin for
helpful discussions. This work was supported by the Predictive BioAnalytics
Initiative at the Wyss Institute for Biologically Inspired Engineering.
AUTHOR CONTRIBUTIONS
D.B. conceived the method. D.B., D.M.C., and J.J.C. designed the experiments. D.B. performed the experiments and implemented the method.
R.K.P. developed the SugarBase web tool. D.M.C. and J.J.C. supervised the
work. D.B., R.K.P., D.M.C., and J.J.C. wrote and edited the manuscript.
DECLARATION OF INTERESTS
The authors declare no competing interests.
Received: June 29, 2020
Revised: September 9, 2020
Accepted: October 8, 2020
Published: October 28, 2020
REFERENCES
Alley, E.C., Khimulya, G., Biswas, S., AlQuraishi, M., and Church, G.M. (2019). Unified rational protein engineering with sequence-based deep representation learning. Nat. Methods 16, 1315–1322.
Almagro Armenteros, J.J., Johansen, A.R., Winther, O., and Nielsen, H. (2020). Language modelling for biological sequences – curated datasets and baselines. bioRxiv. https://doi.org/10.1101/2020.03.09.983585.
Banks, K.E., Fortney, K.R., Baker, B., Billings, S.D., Katz, B.P., Munson, R.S., Jr., and Spinola, S.M. (2008). The enterobacterial common antigen-like gene cluster of Haemophilus ducreyi contributes to virulence in humans. J. Infect. Dis. 197, 1531–1536.
Bardor, M., Faveeuw, C., Fitchette, A.-C., Gilbert, D., Galas, L., Trottein, F., Faye, L., and Lerouge, P. (2003). Immunoreactivity in mammals of two typical plant glyco-epitopes, core alpha(1,3)-fucose and core xylose. Glycobiology 13, 427–434.
Bashir, S., Leviatan Ben Arye, S., Reuven, E.M., Yu, H., Costa, C., Galiñanes, M., Bottio, T., Chen, X., and Padler-Karavani, V. (2019). Presentation Mode of Glycans Affect Recognition of Human Serum anti-Neu5Gc IgG Antibodies. Bioconjug. Chem. 30, 161–168.
Bovin, N., Obukhova, P., Shilova, N., Rapoport, E., Popova, I., Navakouski, M., Unverzagt, C., Vuskovic, M., and Huflejt, M. (2012). Repertoire of human natural anti-glycan immunoglobulins. Do we have auto-antibodies? Biochim. Biophys. Acta 1820, 1373–1382.
Camacho, D.M., Collins, K.M., Powers, R.K., Costello, J.C., and Collins, J.J. (2018). Next-Generation Machine Learning for Biological Networks. Cell 173, 1581–1592.
Campbell, M.P., Peterson, R., Mariethoz, J., Gasteiger, E., Akune, Y., Aoki-Kinoshita, K.F., Lisacek, F., and Packer, N.H. (2014). UniCarbKB: building a knowledge platform for glycoproteomics. Nucleic Acids Res. 42, D215–D221.
Carlin, A.F., Uchiyama, S., Chang, Y.-C., Lewis, A.L., Nizet, V., and Varki, A. (2009). Molecular mimicry of host sialylated glycans allows a bacterial pathogen to engage neutrophil Siglec-9 and dampen the innate immune response. Blood 113, 3333–3336.
Casadevall, A. (2017). The Pathogenic Potential of a Microbe. MSphere 2, e00015–e00017.
Day, C.J., Tran, E.N., Semchenko, E.A., Tram, G., Hartley-Tassell, L.E., Ng, P.S.K., King, R.M., Ulanovsky, R., McAtamney, S., Apicella, M.A., et al. (2015). Glycan:glycan interactions: High affinity biomolecular interactions that can mediate binding of pathogenic bacteria to host cells. Proc. Natl. Acad. Sci. USA 112, E7266–E7275.
Dekkers, G., Treffers, L., Plomp, R., Bentlage, A.E.H., de Boer, M., Koeleman, C.A.M., Lissenberg-Thunnissen, S.N., Visser, R., Brouwer, M., Mok, J.Y., et al. (2017). Decoding the Human Immunoglobulin G-Glycan Repertoire Reveals a Spectrum of Fc-Receptor- and Complement-Mediated-Effector Activities. Front. Immunol. 8, 877.
Doğan, T., and Karaçalı, B. (2013). Automatic identification of highly conserved family regions and relationships in genome wide datasets including remote protein sequences. PLoS One 8, e75458.
Dotan, N., Altstock, R.T., Schwarz, M., and Dukler, A. (2006). Anti-glycan antibodies as biomarkers for diagnosis and prognosis. Lupus 15, 442–450.
Geisinger, E., and Isberg, R.R. (2015). Antibiotic modulation of capsular exopolysaccharide and virulence in Acinetobacter baumannii. PLoS Pathog. 11, e1004691.
Gilbreath, J.J., Colvocoresses Dodds, J., Rick, P.D., Soloski, M.J., Merrell, D.S., and Metcalf, E.S. (2012). Enterobacterial common antigen mutants of Salmonella enterica serovar Typhimurium establish a persistent infection and provide protection against subsequent lethal challenge. Infect. Immun. 80, 441–450.
Glorot, X., and Bengio, Y. (2010). Understanding the difficulty of training deep
feedforward neural networks, in: Proceedings of the Thirteenth International
Conference on Artificial Intelligence and Statistics. Presented at the
Proceedings of the Thirteenth International Conference on Artificial
Intelligence and Statistics, pp. 249–256.
Greenfield, L.K., Richards, M.R., Li, J., Wakarchuk, W.W., Lowary, T.L., and
Whitfield, C. (2012). Biosynthesis of the polymannose lipopolysaccharide O-
antigens from Escherichia coli serotypes O8 and O9a requires a unique com-
bination of single- and multiple-active site mannosyltransferases. J. Biol.
Chem. 287, 35078–35091.
Haines-menges, B.L., Whitaker, W.B., Lubin, J.B., and Boyd, E.F. (2015). Host
Sialic Acids: A Delicacy for the Pathogen with Discerning Taste. In Metabolism
and Bacterial Pathogenesis, C. Conway, ed. (American Society of
Microbiology), pp. 321–342.
Haltiwanger, R.S., and Lowe, J.B. (2004). Role of glycosylation in develop-
ment. Annu. Rev. Biochem. 73, 491–537.
Hochreiter, S., and Schmidhuber, J. (1997). Long short-term memory. Neural
Comput. 9, 1735–1780.
Hong, Y., and Reeves, P.R. (2014). Diversity of o-antigen repeat unit structures
can account for the substantial sequence variation of wzx translocases.
J. Bacteriol. 196, 1713–1722.
Howard, J., and Ruder, S. (2018). Universal Language Model Fine-tuning for
Text Classification. arXiv.
Kappler, K., and Hennet, T. (2020). Emergence and significance of carbohy-
drate-specific antibodies. Genes Immun. 21, 224–239.
Khasbiullina, N.R., Shilova, N.V., Navakouski, M.J., Nokel, A.Yu., Blixt, O.,
Kononov, L.O., Knirel, Y.A., and Bovin, N.V. (2019). The Repertoire of
Human Antiglycan Antibodies and Its Dynamics in the First Year of Life.
Biochemistry (Mosc.) 84, 608–616.
Kiser, K.B., and Lee, J.C. (1998). Staphylococcus aureus cap5O and cap5P
genes functionally complement mutations affecting enterobacterial com-
mon-antigen biosynthesis in Escherichia coli. J. Bacteriol. 180, 403–406.
Knirel, Y.A. (2011). Structure of O-Antigens. In Bacterial Lipopolysaccharides,
Y.A. Knirel and M.A. Valvano, eds. (Springer Vienna), pp. 41–115.
Lairson, L.L., Henrissat, B., Davies, G.J., and Withers, S.G. (2008).
Glycosyltransferases: structures, functions, and mechanisms. Annu. Rev.
Biochem. 77, 521–555.
Lauc, G., Krištić, J., and Zoldoš, V. (2014). Glycans – the third revolution in evolution. Front. Genet. 5, 145.
Lavine, C.L., Lao, S., Montefiori, D.C., Haynes, B.F., Sodroski, J.G., and Yang, X.; NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI) (2012). High-mannose glycan-dependent epitopes are frequently targeted in broad neutralizing antibody responses during human immunodeficiency virus type 1 infection. J. Virol. 86, 2153–2164.
Lim, J.Y., Yoon, J., and Hovde, C.J. (2010). A brief overview of Escherichia coli O157:H7 and its plasmid O157. J. Microbiol. Biotechnol. 20, 5–14.
Liu, L., Chopra, P., Li, X., Wolfert, M.A., Tompkins, S.M., and Boons, G.-J. (2020). SARS-CoV-2 spike protein binds heparan sulfate in a length- and sequence-dependent manner. bioRxiv. 2020.05.10.087288. https://doi.org/10.1101/2020.05.10.087288.
Lombard, V., Golaconda Ramulu, H., Drula, E., Coutinho, P.M., and Henrissat, B. (2014). The carbohydrate-active enzymes database (CAZy) in 2013. Nucleic Acids Res. 42, D490–D495.
Lundberg, S.M., and Lee, S.-I. (2017). A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems, Volume 30, I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, eds. (Curran Associates, Inc), pp. 4765–4774.
McDonald, A.G., Tipton, K.F., and Davey, G.P. (2016). A Knowledge-Based System for Display and Prediction of O-Glycosylation Network Behaviour in Response to Enzyme Knockouts. PLoS Comput. Biol. 12, e1004844.
Mikolov, T., Chen, K., Corrado, G., and Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv.
Mitchell, A.M., Srikumar, T., and Silhavy, T.J. (2018). Cyclic Enterobacterial Common Antigen Maintains the Outer Membrane Permeability Barrier of Escherichia coli in a Manner Controlled by YhdP. mBio 9, e01321-18.
Needleman, S.B., and Wunsch, C.D. (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453.
Park, D., Xu, G., Barboza, M., Shah, I.M., Wong, M., Raybould, H., Mills, D.A., and Lebrilla, C.B. (2017). Enterocyte glycosylation is responsive to changes in extracellular conditions: implications for membrane functions. Glycobiology 27, 847–860.
Paschinger, K., Fabini, G., Schuster, D., Rendić, D., and Wilson, I.B.H. (2005). Definition of immunogenic carbohydrate epitopes. Acta Biochim. Pol. 52, 629–632.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. arXiv.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al. (2011). Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Perez, L., and Wang, J. (2017). The Effectiveness of Data Augmentation in Image Classification using Deep Learning. arXiv.
Pochechueva, T., Jacob, F., Fedier, A., and Heinzelmann-Schwarz, V. (2012). Tumor-associated glycans and their role in gynecological cancers: accelerating translational research by novel high-throughput approaches. Metabolites 2, 913–939.
Poole, J., Day, C.J., von Itzstein, M., Paton, J.C., and Jennings, M.P. (2018). Glycointeractions in bacterial pathogenesis. Nat. Rev. Microbiol. 16, 440–452.
Reusch, D., and Tejada, M.L. (2015). Fc glycans of therapeutic antibodies as critical quality attributes. Glycobiology 25, 1325–1334.
Samraj, A.N., Bertrand, K.A., Luben, R., Khedri, Z., Yu, H., Nguyen, D., Gregg, C.J., Diaz, S.L., Sawyer, S., Chen, X., et al. (2018). Polyclonal human antibodies against glycans bearing red meat-derived non-human sialic acid N-glycolylneuraminic acid are stable, reproducible, complex and vary between individuals: Total antibody levels are associated with colorectal cancer risk. PLoS One 13, e0197464.
Sherstinsky, A. (2020). Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) network. Phys. Nonlinear Phenom. 404, 132306.
Silipo, A., and Molinaro, A. (2010). The Diversity of the Core Oligosaccharide in Lipopolysaccharides. In Endotoxins: Structure, Function and Recognition, X. Wang and P.J. Quinn, eds. (Springer Netherlands), pp. 69–99.
Solá, R.J., and Griebenow, K. (2009). Effects of glycosylation on the stability of protein pharmaceuticals. J. Pharm. Sci. 98, 1223–1245.
Spahn, P.N., Hansen, A.H., Hansen, H.G., Arnsdorf, J., Kildegaard, H.F., and Lewis, N.E. (2016). A Markov chain model for N-linked protein glycosylation – towards a low-parameter tool for model-driven glycoengineering. Metab. Eng. 33, 52–66.
Strodthoff, N., Wagner, P., Wenzel, M., and Samek, W. (2020). UDSMProt: universal deep sequence models for protein classification. Bioinformatics 36, 2401–2409.
Tan, C., Sun, F., Kong, T., Zhang, W., Yang, C., and Liu, C. (2018). A Survey on Deep Transfer Learning. arXiv.
Tanaka, K., Aoki-Kinoshita, K.F., Kotera, M., Sawaki, H., Tsuchiya, S., Fujita, N., Shikanai, T., Kato, M., Kawano, S., Yamada, I., and Narimatsu, H. (2014). WURCS: the Web3 unique representation of carbohydrate structures. J. Chem. Inf. Model. 54, 1558–1566.
Thompson, A.J., de Vries, R.P., and Paulson, J.C. (2019). Virus recognition of glycan receptors. Curr. Opin. Virol. 34, 117–129.
Tiemeyer, M., Aoki, K., Paulson, J., Cummings, R.D., York, W.S., Karlsson, N.G., Lisacek, F., Packer, N.H., Campbell, M.P., Aoki, N.P., et al. (2017). GlyTouCan: an accessible glycan structure repository. Glycobiology 27, 915–919.
Cell Host & Microbe 29, 132–144, January 13, 2021 143
Toukach, P.V., and Egorova, K.S. (2016). Carbohydrate structure database merged from bacterial, archaeal, plant and fungal parts. Nucleic Acids Res. 44 (D1), D1229–D1236.
Tsuchiya, S., Yamada, I., and Aoki-Kinoshita, K.F. (2019). GlycanFormatConverter: a conversion tool for translating the complexities of glycans. Bioinformatics 35, 2434–2440.
Tzianabos, A.O., Wang, J.Y., and Lee, J.C. (2001). Structural rationale for the modulation of abscess formation by Staphylococcus aureus capsular polysaccharides. Proc. Natl. Acad. Sci. USA 98, 9365–9370.
Valeri, J.A., Collins, K.M., Ramesh, P., Alcantar, M.A., Lepe, B.A., Lu, T.K., and Camacho, D.M. (2020). Sequence-to-function deep learning frameworks for engineered riboregulators. Nat. Commun. 11, 5058. https://doi.org/10.1038/s41467-020-18676-2.
Varki, A. (2017). Biological roles of glycans. Glycobiology 27, 3–49.
Varki, A., and Gagneux, P. (2015). Biological Functions of Glycans. In Essentials of Glycobiology, A. Varki, R.D. Cummings, J.D. Esko, P. Stanley, G.W. Hart, M. Aebi, A.G. Darvill, T. Kinoshita, N.H. Packer, and J.H. Prestegard, et al., eds. (Cold Spring Harbor Laboratory Press).
Viljanen, M.K., Peltola, T., Junnila, S.Y., Olkkonen, L., Järvinen, H., Kuistila, M., and Huovinen, P. (1990). Outbreak of diarrhoea due to Escherichia coli O111:B4 in schoolchildren and adults: association of Vi antigen-like reactivity. Lancet 336, 831–834.
Weidenmaier, C., and Lee, J.C. (2015). Structure and Function of Surface Polysaccharides of Staphylococcus aureus. In Staphylococcus Aureus, F. Bagnoli, R. Rappuoli, and G. Grandi, eds. (Springer International Publishing), pp. 57–93.
Wu, D., Struwe, W.B., Harvey, D.J., Ferguson, M.A.J., and Robinson, C.V. (2018). N-glycan microheterogeneity regulates interactions of plasma proteins. Proc. Natl. Acad. Sci. USA 115, 8763–8768.
STAR★METHODS
KEY RESOURCES TABLE
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
| --- | --- | --- |
| Software and Algorithms | | |
| PyTorch | Paszke et al., 2019 | https://github.com/pytorch/pytorch |
| Scikit-learn | Pedregosa et al., 2011 | https://github.com/scikit-learn/scikit-learn |
| Apex | N/A | https://github.com/NVIDIA/apex |
| Python-alignment | N/A | https://github.com/eseraygun/python-alignment |
| SHAP | Lundberg and Lee, 2017 | https://github.com/slundberg/shap |
| SweetTalk | This paper | https://github.com/midas-wyss/sweettalk |
| SweetOrigins | This paper | https://github.com/midas-wyss/sweetorigins |
| SugarBase | This paper | https://webapps.wyss.harvard.edu/sugarbase |
RESOURCE AVAILABILITY
Lead Contact
Communication should be directed to the lead contact, James J. Collins (jimjc@mit.edu).
Materials Availability
This study did not generate new unique reagents.
Data and Code Availability
Data used for all analyses can be found in the supplementary tables. All code and trained models can be found at https://github.com/midas-wyss/sweettalk and https://github.com/midas-wyss/sweetorigins.
METHOD DETAILS
Dataset
To create a comprehensive glycan dataset annotated with species labels, we manually curated 12,674 glycan sequences from three sources: UniCarbKB (Campbell et al., 2014), the Carbohydrate Structure Database (CSDB) (Toukach and Egorova, 2016), and the peer-reviewed scientific literature. From UniCarbKB, we compiled all glycans that had species information, a length of at least three monosaccharides (to facilitate use with machine learning models), and a working link to PubChem from which their sequences could be retrieved. We further complemented and extended this list with glycans deposited in CSDB up to December 2019 that had a length of at least three monosaccharides. For species with more than 15 strains available on CSDB, only glycans from the first 15 strains were recorded to prevent taxonomic bias. For the model organism E. coli, all available glycan sequences were recorded to facilitate a strain-based analysis. Where possible, labels for E. coli strain pathogenicity were assigned based on the peer-reviewed academic literature. Finally, we performed additional literature searches, predominantly adding viral and archaeal glycans, which are underrepresented in the other databases. We revised and completed the taxonomic annotations of all species (species, genus, family, order, class, phylum, kingdom, domain) based on the NCBI Taxonomy Browser. In total, the dataset contained sequences from 1,726 different species spanning 39 taxonomic phyla. To the best of our knowledge, this database represents the most comprehensive and current resource of glycans and their species information to date (Table S1).
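The length and strain rules above can be expressed as a simple filter. This is a hypothetical sketch, not the authors' curation code: the record layout (dicts with `species`, `strain`, and `glycan` keys), the function names, and the reading of "the first 15 strains" as the first 15 strains met in input order are all assumptions, and branch brackets are simply ignored when counting monosaccharides.

```python
import re

# Bond tokens in IUPAC-condensed notation look like "a1-3", "b1-4" or "?1-?".
BOND = re.compile(r"^[ab?][\d?]?-[\d?]?$")


def n_monosaccharides(seq):
    """Count monosaccharide tokens, ignoring bonds and branch brackets."""
    tokens = re.split(r"[()\[\]]", seq)
    return sum(1 for tok in tokens if tok and not BOND.match(tok))


def curate(records, min_length=3, max_strains=15, uncapped=("Escherichia coli",)):
    """Keep sufficiently long glycans from at most `max_strains` strains per species.

    Assumed record layout: dicts with "species", "strain" and "glycan" keys.
    "First 15 strains" is interpreted as the first 15 strains seen in input order.
    """
    kept = []
    strains_seen = {}  # species -> strains admitted so far, in order of appearance
    for rec in records:
        if n_monosaccharides(rec["glycan"]) < min_length:
            continue  # too short for the machine learning models
        species = rec["species"]
        seen = strains_seen.setdefault(species, [])
        if rec["strain"] not in seen:
            if species not in uncapped and len(seen) >= max_strains:
                continue  # cap strains per species to limit taxonomic bias
            seen.append(rec["strain"])
        kept.append(rec)
    return kept
```

Species listed in `uncapped` (here only E. coli, mirroring the strain-based analysis) keep every strain.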
To enable transfer learning by first pre-training a language model, we also added glycan sequences that lacked species information. To this end, we extracted the Web3 Unique Representation of Carbohydrate Structures (WURCS) representation (Tanaka et al., 2014) of all glycans with at least three monosaccharides that were deposited on GlyTouCan (Tiemeyer et al., 2017) and also available on PubChem (n = 18,926), and combined them with the glycans from the databases mentioned above; this resulted in an augmented database containing 19,299 unique glycan sequences (Table S2). For all glycans, we relied on the quality control of the respective database. All glycans in WURCS representation were reformatted into the IUPAC condensed representation using the GlycanFormatConverter software (Tsuchiya et al., 2019). For the immunogenicity classifier, all GlycoEpitope (https://www.glycoepitope.jp) entries with a length of at least three monosaccharides were extracted. This list was further complemented by targeted literature searches (Bardor et al., 2003; Bashir et al., 2019; Bovin et al., 2012; Dotan et al., 2006; Hong and Reeves, 2014; Khasbiullina et al., 2019; Knirel, 2011; Paschinger et al., 2005; Pochechueva et al., 2012; Samraj et al., 2018; Silipo and Molinaro, 2010), resulting in the final set of immunogenic glycans (n = 685, Table S2). We included protein-, lipid-, and small molecule-associated glycans as well as capsular and extracellular polysaccharides in our dataset of 19,299 glycans. All of these glycans were paired with an ID in our relational database SugarBase, which links all available information (linkage type, species information, human immunogenicity, etc.) to a glycan sequence (Table S2). Additionally, we included the representations learned by our language model for all observed glycoletters (monosaccharides or bonds) as well as glycowords (trisaccharides).
Data Processing
Glycan sequences were processed by removing dangling bonds (e.g., ‘(a1-’). Analogous to word stemming in natural language processing, which unifies different inflections of the same word, we removed position-specific information of monosaccharide modifications to reduce vocabulary size. We then harmonized capitalization and, in the case of glycan repeat structures, appended the first monosaccharide to their end to capture more sequence context. Additional steps to exclude duplicated glycans included strict ordering of multiple branches of equal length by ascending connection to the main branch (e.g., a branch ending in ‘a1-2’ before a branch ending in ‘b1-4’). For branches closest to the non-reducing end, the longest branch was defined as the main chain. Because a single monosaccharide can carry multiple modifications, the observed modifications required a fixed order of precedence to avoid duplicates or mislabeling: NAc > OAc > NGc > OGc > NS > OS > NP > OP > NAm > OAm > NBut > OBut > NProp > OProp > NMe > OMe > CMe > NFo > OFo > OPPEtn > OPEtn > OEtn > A > N > SH > OPCho > OPyr > OVac > OPam > OEtg > OFer > OSin > OAep > OCoum > ODco > OLau > OSte > OOle > OBz > OCin > OAch > OMal > OMar > OOrn > rest.
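Two of the processing steps above, removing a dangling bond and ordering multiple modifications on one monosaccharide, can be sketched with the standard library. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the hierarchy is truncated to its first ten entries, and the dangling-bond rule assumes the unpaired bond sits at the end of the IUPAC-condensed string.

```python
import re

# First ten entries of the modification hierarchy above (highest priority first).
# Modifications not listed here fall into "rest" and are placed last.
MOD_HIERARCHY = ["NAc", "OAc", "NGc", "OGc", "NS", "OS", "NP", "OP", "NAm", "OAm"]
_RANK = {mod: rank for rank, mod in enumerate(MOD_HIERARCHY)}


def strip_dangling_bond(seq):
    """Drop a trailing bond without a partner, e.g. 'Gal(b1-4)GlcNAc(a1-'."""
    # "(" followed by non-parenthesis characters and a final "-" at the string end.
    return re.sub(r"\([^()]*-$", "", seq)


def order_modifications(base, mods):
    """Append modifications to a base monosaccharide in hierarchy order.

    sorted() is stable, so unlisted modifications keep their relative order at the end.
    """
    ordered = sorted(mods, key=lambda mod: _RANK.get(mod, len(_RANK)))
    return base + "".join(ordered)
```

For example, `order_modifications("Glc", ["OS", "NAc"])` yields `"GlcNAcOS"` regardless of the order in which the modifications were recorded, which is what prevents the same glycan from appearing under two spellings.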
Data processing for model training included featurization of glycan sequences into glycoletters (e.g., ‘Gal’) and glycowords (three monosaccharides connected by two bonds). Converting a glycan sequence into glycowords, from the non-reducing to the reducing end, yielded a list of partially overlapping glycowords with maximum overlap, so that two consecutive glycowords differed in only one monosaccharide and one bond. These glycowords are intended to capture representative characteristics and local structural contexts of a given glycan. The dataset comprising all glycowords (n = 113,112) was then used to train a context-specific, glycoletter-based language model. For scrambled glycan sequences, the order of glycoletters in any given glycan was randomly shuffled to maintain composition but erase patterns. All abbreviations for glycan nomenclature in this work can be found in Table S7.
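The glycoword featurization and the scrambling control described above can be sketched for linear (unbranched) glycans as follows. The sketch assumes IUPAC-condensed input with bonds in parentheses and no branch brackets; the function names are hypothetical, and glycowords are returned as tuples of five glycoletters rather than in whatever internal format SweetTalk uses.

```python
import random
import re


def to_glycoletters(seq):
    """Split a linear IUPAC-condensed glycan into alternating monosaccharides and bonds."""
    return [tok for tok in re.split(r"[()]", seq) if tok]


def to_glycowords(seq):
    """Return maximally overlapping glycowords, read from the non-reducing end.

    A glycoword spans five glycoletters (sugar, bond, sugar, bond, sugar). Stepping
    by two means consecutive glycowords differ in one monosaccharide and one bond.
    """
    letters = to_glycoletters(seq)
    return [tuple(letters[i:i + 5]) for i in range(0, len(letters) - 4, 2)]


def scramble(seq, seed=None):
    """Shuffle glycoletters: composition is preserved, sequence patterns are erased."""
    letters = to_glycoletters(seq)
    random.Random(seed).shuffle(letters)
    return letters
```

For `"Gal(b1-4)GlcNAc(b1-2)Man(a1-3)Man"` this gives two glycowords, `("Gal", "b1-4", "GlcNAc", "b1-2", "Man")` and `("GlcNAc", "b1-2", "Man", "a1-3", "Man")`, which share four of their five glycoletters; a disaccharide yields no glycowords at all.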
Analyzing Links in Glycan Sequences
To determine typical local structural contexts of monosaccharides and bonds, we quantified how frequently a given monosaccharide co-occurred with every other monosaccharide in our database of unique glycans. Additionally, we compared the relative frequencies with which a particular monosaccharide was observed in the glycan main branch versus a side branch.
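Both counts described above can be sketched with `collections.Counter`. Two assumptions are made here and are not stated in the text: "co-occurring" is read as "directly linked along the main chain", and side branches are written in square brackets without nesting. The function names are hypothetical.

```python
import re
from collections import Counter

BRANCH = re.compile(r"\[([^\[\]]*)\]")      # a bracketed (non-nested) side branch
BOND = re.compile(r"^[ab?][\d?]?-[\d?]?$")   # e.g. "a1-3", "b1-4", "?1-?"


def monosaccharides(fragment):
    """Monosaccharide tokens of a bracket-free IUPAC-condensed fragment."""
    return [tok for tok in re.split(r"[()]", fragment) if tok and not BOND.match(tok)]


def branch_counts(glycans):
    """Count each monosaccharide in main chains versus side branches."""
    main, side = Counter(), Counter()
    for seq in glycans:
        for branch in BRANCH.findall(seq):
            side.update(monosaccharides(branch))
        main.update(monosaccharides(BRANCH.sub("", seq)))
    return main, side


def linked_pairs(glycans):
    """Count directly linked monosaccharide pairs along each main chain."""
    pairs = Counter()
    for seq in glycans:
        sugars = monosaccharides(BRANCH.sub("", seq))
        pairs.update(zip(sugars, sugars[1:]))
    return pairs
```

Dividing each `main` count by the sum of its `main` and `side` counts gives the relative main-branch frequency of that monosaccharide.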
Glycan In Silico Modification
We performed in silico modification of glycans by replacing monosaccharides and/or bonds with other observed monosaccharides or bonds. We used exhaustive modification, replacing glycoletters with all possible glycoletters while only retaining modified glycans composed of previously observed glycowords. Given the extreme sparsity of observed glycan sequences compared to the theoretical number of possibilities, this restriction ensured physiological relevance.
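The retention rule above (keep a modified glycan only if every glycoword in it has been observed before) can be sketched for linear glycans. This sketch is limited to single substitutions, whereas the text allows monosaccharides and/or bonds to be replaced. It also assumes that monosaccharides and bonds alternate in the token list. All function names are hypothetical.

```python
import re


def to_glycoletters(seq):
    """Split a linear IUPAC-condensed glycan into alternating monosaccharides and bonds."""
    return [tok for tok in re.split(r"[()]", seq) if tok]


def glycowords(letters):
    """Maximally overlapping five-glycoletter windows (three sugars, two bonds)."""
    return [tuple(letters[i:i + 5]) for i in range(0, len(letters) - 4, 2)]


def to_iupac(letters):
    """Rebuild a linear IUPAC-condensed string; bonds sit at odd positions."""
    return "".join(tok if i % 2 == 0 else f"({tok})" for i, tok in enumerate(letters))


def in_silico_modify(seq, sugars, bonds, observed_words):
    """All single substitutions of `seq` whose glycowords were all observed before."""
    letters = to_glycoletters(seq)
    variants = set()
    for pos, original in enumerate(letters):
        # In a linear glycan, even positions are monosaccharides and odd ones bonds.
        alphabet = bonds if pos % 2 else sugars
        for replacement in alphabet:
            if replacement == original:
                continue
            candidate = letters[:pos] + [replacement] + letters[pos + 1:]
            words = glycowords(candidate)
            if words and all(word in observed_words for word in words):
                variants.add(to_iupac(candidate))
    return variants
```

Building `observed_words` as a set makes each membership check constant-time, which matters when the modification is applied exhaustively across a database of this size.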
Glycan Alignment
Global sequence alignment of glycans was implemented according to the Needleman-Wunsch algorithm (Needleman and Wunsch, 1970) by adapting the Python Alignment library (https://github.com/eseraygun/python-alignment). For our GLYcan SUbstitution Matrix (GLYSUM; Table S6), we generated the exhaustive list of in silico modifications resulting in glycans with observed glycowords (n = 1,238,879). All monosaccharide and/or bond substitutions observed in this way were recorded in a symmetric matrix and converted into substitution frequencies by dividing each count by the total number of retained modifications. The substitution score $S_{ij}$ for each possible substitution was then calculated with the following formula:
$$S_{ij} = \lambda \log\left(\frac{p_{ij}}{q_i \, q_j}\right)$$
Here, $p_{ij}$ denotes the substitution frequency, while $q_i$ and $q_j$ are the observed base frequencies of the respective glycoletters. The scaling factor $\lambda$ (set to four in this work) was used to arrive at suitable integer values, with all scores rounded to the nearest integer. Substitutions never observed during this procedure received a final value of −5, lower than any observed substitution score, while the diagonal values of the substitution matrix were set to 5, higher than any observed substitution score. The gap penalty for alignments in this work was set to −5 to match the minimal substitution score.
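The scoring rule and the alignment described above can be combined into a short sketch: a score from $p_{ij}$, $q_i$, $q_j$ with $\lambda = 4$, a lookup that returns 5 on the diagonal and −5 for unobserved pairs, and a textbook Needleman-Wunsch global alignment with a gap penalty of −5. Several details are assumptions. The base of the logarithm is not stated, so base 2 is assumed here, as in BLOSUM-style matrices. Python's `round` sends exact halves to the even neighbour. The matrix is assumed to be a dict keyed by glycoletter pairs. This is plain Python, not the adapted python-alignment code the authors used.

```python
import math


def substitution_score(p_ij, q_i, q_j, lam=4.0):
    """S_ij = lam * log2(p_ij / (q_i * q_j)), rounded to an integer (log base assumed)."""
    return round(lam * math.log2(p_ij / (q_i * q_j)))


def make_scorer(matrix, match=5, unobserved=-5):
    """Symmetric lookup: identical glycoletters score `match`, unseen pairs `unobserved`."""
    def score(x, y):
        if x == y:
            return match
        return matrix.get((x, y), matrix.get((y, x), unobserved))
    return score


def needleman_wunsch(a, b, score, gap=-5):
    """Global alignment of two glycoletter lists; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], out_a[::-1], out_b[::-1]
```

With a toy matrix `{("Gal", "Glc"): 2}`, aligning `["Gal", "b1-4", "GlcNAc"]` against `["Glc", "b1-4", "GlcNAc"]` scores 2 + 5 + 5 = 12 with no gaps. Setting the gap penalty equal to the unobserved score means that an unobserved substitution is never preferred over a single gap.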
Model Training
All models were trained on an NVIDIA® Tesla® K80 GPU using PyTorch (Paszke et al., 2019). For all models, architecture and hyperparameters were optimized by minimizing the respective loss function. For the language models, we used mixed precision training with the Apex library (https://github.com/nvidia/apex). For language models and classifiers, we randomly split the respective dataset into 80% for training and 20% for validation. For the species classifier, a modified stratified shuffle split was used so that, for every class, 80% of the glycans were in the training set and 20% in the validation set. Further, only classes comprising at least five glycans were used for training and testing the SweetOrigins models. For glycan sequences with isomorphic representations, we employed data augmentation by forming a generalizable subset of all possible isomorphic glycans. Specifically, we swapped the order of double branches and exhaustively exchanged the main branch with the side branches closest to the non-reducing end in the bracket notation (Figure 3B). The resulting sequence in bracket notation still described the same glycan in a slightly different way, increasing model robustness during training. Glycans were converted into lists of glycowords, brought to equal length with a padding token to facilitate model training, and used in batches of 32 glycans for training and testing.
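The per-class 80/20 split, the five-glycan minimum, and the padded batches of 32 described above can be sketched in plain Python. The exact "modified stratified shuffle split" is not specified, so the per-class shuffle below is an assumption. Padding each batch to its own longest glycan is also an assumption, because the text does not say whether padding was global or per batch. The pad token name `"<pad>"` is hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.8, min_count=5, seed=0):
    """Split indices so each class contributes ~train_frac of its members to training."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train, val = [], []
    for idxs in by_class.values():
        if len(idxs) < min_count:
            continue  # classes with fewer than five glycans are dropped
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))
        train.extend(idxs[:cut])
        val.extend(idxs[cut:])
    return train, val


def pad_batch(sequences, pad_token="<pad>"):
    """Right-pad glycoword lists to the length of the longest one in the batch."""
    longest = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_token] * (longest - len(seq)) for seq in sequences]


def batches(sequences, batch_size=32, pad_token="<pad>"):
    """Yield consecutive padded batches of `batch_size` glycans (last one may be smaller)."""
    for start in range(0, len(sequences), batch_size):
        yield pad_batch(sequences[start:start + batch_size], pad_token)
```

Splitting within each class guarantees that every retained species appears in both the training and the validation set, which a single global shuffle would not.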
SweetTalk and the SweetOrigins models for each taxonomic level consisted of a three-layered, bidirectional recurrent neural network using long short-term memory (LSTM) units (Hochreiter and Schmidhuber, 1997) with 128 nodes per layer, including an embedding layer for the glycowords. The concatenated hidden representation learned by the bidirectional LSTMs was then projected onto a final fully connected layer for prediction. The language model SweetTalk was trained to predict the next glycoletter given the preceding glycoletters in the context of glycowords, thereby learning the local structural context of glycoletters. The embedding layer for the classifiers was derived by first training a glycoletter-based language model, then extracting the learned glycoletter embeddings and calculating initial glycoword embeddings for SweetOrigins. The last, fully connected layer in all models was initialized by Xavier initialization (Glorot and Bengio, 2010), and its number of nodes was determined by the number of classes for each classifier. We used a cross-entropy loss function and the ADAM optimizer with a starting learning rate of 0.0001 (decayed with a cosine function over 100 epochs during training) and a weight decay of 0.005. Additionally, we employed an early stopping criterion after 10 epochs without improvement in validation loss for regularization.
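The learning-rate schedule and stopping rule above can be sketched without a deep learning framework. The sketch assumes that the cosine decay runs from the starting rate of 0.0001 down to zero at epoch 100, the same curve PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` traces with `T_max=100` and `eta_min=0`. It also assumes that "10 epochs without improvement" means stopping once ten consecutive epochs fail to lower the best validation loss. The class and function names are hypothetical.

```python
import math


def cosine_lr(epoch, base_lr=1e-4, total_epochs=100):
    """Learning rate after `epoch` epochs of cosine decay from base_lr toward zero."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * epoch / total_epochs))


class EarlyStopping:
    """Signal a stop after `patience` consecutive epochs without a lower validation loss."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In a training loop, `cosine_lr(epoch)` would set the optimizer's learning rate at the start of each epoch, and `EarlyStopping.step` would be called with the validation loss at its end.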
The model for predicting E. coli strain pathogenicity followed the same architecture except for using 150 nodes per layer, a binary cross-entropy loss function, and a learning rate of 0.0005. The machine learning models used for comparison were random forest classifiers and support vector machines, for which we used the scikit-learn implementations (Pedregosa et al., 2011). Feature importances were extracted using SHAP (SHapley Additive exPlanations) values (Lundberg and Lee, 2017). Hyperparameters for all methods were optimized by maximizing accuracy via 5-fold cross-validation.
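Hyperparameter tuning of the baseline models by accuracy under 5-fold cross-validation maps directly onto scikit-learn's `GridSearchCV` with `scoring="accuracy"` and `cv=5`. The sketch below is hypothetical. The parameter grids are illustrative, since the paper does not list the values it searched. The feature matrix `X` stands in for whatever glycan featurization was used, which is also not specified here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative search spaces only; the grids actually searched are not reported.
SEARCH_SPACES = {
    "random_forest": (RandomForestClassifier(random_state=0),
                      {"n_estimators": [50, 100], "max_depth": [None, 10]}),
    "svm": (SVC(), {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}),
}


def tune_baselines(X, y):
    """Return {model name: (best parameters, mean 5-fold CV accuracy)}."""
    results = {}
    for name, (estimator, grid) in SEARCH_SPACES.items():
        search = GridSearchCV(estimator, grid, scoring="accuracy", cv=5)
        search.fit(X, y)
        results[name] = (search.best_params_, search.best_score_)
    return results
```

For classifiers, an integer `cv=5` makes `GridSearchCV` use stratified folds, so each fold keeps the class proportions of `y`. A quick smoke test could use `X = np.random.default_rng(0).random((60, 8))` with binary labels derived from one column.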
QUANTIFICATION AND STATISTICAL ANALYSIS
This study did not use statistical analysis. All experimental details can be found in the STAR Methods section.
- Deep-Learning Resources for Studying Glycan-Mediated Host-Microbe Interactions
Introduction
Results
Curating Glycan Datasets for Glycobiology and Glycan-Mediated Host-Microbe Interactions
Using Natural Language Processing to Learn the Grammar of Glycans
Predicting Glycan Immunogenicity with a Glycan-Based Language Model
Using Deep Learning to Provide Evolution-Informed Glycan Representations
Using Glycan Alignments to Study Virulence Determinants in Bacterial Pathogens
Discussion
Supplemental Information
Acknowledgments
Author Contributions
Declaration of Interests
References
STAR★Methods
Key Resources Table
Resource Availability
Lead Contact
Materials Availability
Data and Code Availability
Method Details
Dataset
Data Processing
Analyzing Links in Glycan Sequences
Glycan In Silico Modification
Glycan Alignment
Model Training
Quantification and Statistical Analysis
Running Head: CRITIQUE OF OCEAN TEMPERATURES IN CORAL REEFS
CRITIQUE OF OCEAN TEMPERATURES IN CORAL REEFS
Madison McNeill
Introduction
Coral reefs are among the most diverse marine ecosystems in the world, providing a home to thousands of species of plants and animals. In the last few decades, global warming has raised sea surface temperatures, while rising atmospheric carbon dioxide has caused ocean acidification. Elevated temperatures can lead to the bleaching of coral reefs as well as the death of coral reef fishes that are unable to acclimate to them. These three papers were chosen because they illustrate the environmental impact that higher temperatures have on coral reefs and the organisms that live within them.
· Dias, M., Ferreira, A., Gouveia, R., Cereja, R., & Vinagre, C. (2018). Mortality, growth and regeneration following fragmentation of reef-forming corals under thermal stress. Journal of Sea Research, 141, 71-82. doi: 10.1016/j.seares.2018.08.008
· De’ath, G., Lough, J., & Fabricius, K. (2009). Declining Coral Calcification on the Great Barrier Reef. Science, 323(5910), 116-119. doi: 10.1126/science.1165283
· Nilsson, G., Östlund-Nilsson, S., & Munday, P. (2010). Effects of elevated temperature on coral reef fishes: Loss of hypoxia tolerance and inability to acclimate. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 156(4), 389-393. doi: 10.1016/j.cbpa.2010.03.009
Dias et al. (2018) evaluated how elevated sea surface temperatures affected the growth, mortality, and regeneration of fragments of nine Indo-Pacific coral species. De’ath et al. (2009) suggested that the ability of Great Barrier Reef corals to deposit calcium carbonate skeletons may have declined because of a decrease in the saturation state of aragonite and rising temperature stress in the region. The third paper, Nilsson et al. (2010), examined whether elevated temperature reduced the tolerance of two species of coral reef fishes to hypoxia (low oxygen). This experiment used adults of the two fish species and tested their ability to acclimate to higher temperatures. That focus set it apart from the other two studies, because Dias and De’ath studied only the corals, not the fishes living among them. Dias found that whether a fragment had been previously injured did not affect its mortality, partial mortality, or growth rate. However, coral species and water temperature both had significant effects. Although De’ath’s study did not determine the cause of the decline in calcification of Great Barrier Reef corals, it found that the decline was largely related to rising ocean temperatures, which place corals under greater thermal stress. The Nilsson paper, in contrast, showed that certain species of coral reef fishes were unable to adjust to higher ocean temperatures, a consequence of global warming.
Analysis
Introduction
When the three articles’ introductions were evaluated, both similarities and differences stood out. For example, the titles varied in appropriateness. Nilsson’s title told readers what the researchers hoped to learn, but it was long and bulky. In my opinion, it could have been shortened or rephrased to grab the audience’s attention more quickly. Even a change as simple as “Effects of elevated temperature on coral reef fishes” would have helped. Dias’s title, however, was accurate and concise: “Mortality, growth and regeneration following fragmentation” accurately described what was examined in the study. De’ath’s title also matched the contents of the paper.
In all three articles, the statement of purpose in the abstract matched the introduction. In the Dias article, the authors stressed that the effects of thermal stress on regenerating coral fragments needed to be explored quickly. De’ath’s abstract was well written, telling readers how many coral colonies were studied and what the results showed. Nilsson’s abstract stated plainly, within its first two sentences, what the study found: two species of coral reef fishes from the Great Barrier Reef fail to acclimate to higher sea surface temperatures. This purpose was stated clearly in both the abstract and the introduction.
The hypotheses of the three articles varied greatly. Dias stated that global climate change has led to rising sea surface temperatures and ocean acidification, which jeopardize coral reef survival. With this statement, the authors made clear why their study was so urgent. Nilsson followed a similar pattern, clearly stating the concern that coral reef fishes may be unable to acclimate to rising water temperatures. De’ath’s hypothesis was stated in the abstract, which suggested that increasing thermal stress may be reducing the ability of Great Barrier Reef corals to deposit calcium carbonate. Thus, all three articles stated their hypotheses. I also found that all three introductions were well organized. Each one built its background information toward the experimental design and what the researchers hoped to accomplish.
Methods
Sample selection differed sharply among the three articles. The Dias paper used nine reef-forming coral species, while De’ath studied 328 colonies from a single genus of stony coral, Porites. Nilsson studied adults of two species of coral reef fishes. The methods in Dias’s paper were easy to follow and seemed easy to repeat. De’ath’s methods were harder to follow because not all of the details appeared to be listed. The Methods section of the Nilsson article was valid and gave enough detail that another group could repeat most of the study. I say most because, although the article stated when and where the experiment was conducted, the Methods section did not give the number of adult fish of each species that were caught and analyzed. This information is crucial. A small sample size could undermine the conclusions, while a large sample size would support them more strongly. Furthermore, if one species had been sampled far more heavily than the other, the results would not represent both species equally. Both of the other articles reported their sample sizes.
While some articles had strong Methods sections, others were missing key components. The experimental designs of all three articles seemed valid. De’ath’s design suited the question being asked, though I am unsure whether the study could be repeated from the paper alone. The design made sense overall. Porites is commonly chosen for sclerochronological analyses because it is widely distributed and its skeleton has annual density bands. Porites colonies can also grow for hundreds of years, so choosing this genus for a large, long-term analysis made sense. The three growth parameters De’ath examined (skeletal density, calcification rate, and annual extension rate) are also good measures for such a long-lived genus. Dias justified every step of the experimental design, making the process easy to repeat. For instance, Dias used species with contrasting morphologies because these differ in their susceptibility to thermal stress, which lends the overall results more credibility. The corals had been held in captivity for several years, so the researchers knew their thermal history. Twenty fragments were cut from each of the nine species, and half of them served as controls. Sources of variation were reduced by cutting only one fragment from each colony. These methods appear valid, and each step is given a reason, which makes the methods logical and easy to follow. De’ath’s paper was similar in that it listed the parameters used to analyze the samples. It also noted that, because Porites colonies are so long-lived, their skeletons are known to record environmental changes. This explained why De’ath chose this coral and these particular parameters. However, the paper did not specify how the analyses were conducted.
The data for both the Dias and Nilsson experiments were collected within a two-month period. De’ath’s data, by contrast, were a composite collection spanning 1900 to 2006, containing more than sixteen thousand annual records from corals ranging from 10 to 436 years old. The broad range of years over which the specimens were collected was overwhelming, and, again, the paper did not explain how the three growth parameters were measured. Lastly, I found that Nilsson’s experimental design was carried out well, using adults of the two fish species and a range of temperatures suited to testing the hypothesis. However, omitting the number of fish of each species caught weakens the data to some degree. Overall, the Methods section of Nilsson’s paper was logical but omitted a detail pertinent to the experiment. Dias included all pertinent information, and De’ath did not explain how the chosen parameters were measured.
Results
Because each study had a different focus, the results also differed considerably. Dias et al. (2018) found that injury, whether present or absent, had no effect on the mortality or growth rate of the coral fragments studied. The researchers determined that the factors that did affect mortality and growth rate were temperature and coral species. These results were illustrated using tables and figures. Table 1 was difficult to follow because some column headings used abbreviations that were not explained in the article. However, the numbers in the table agree with the text, showing that injury did not affect the growth or mortality rate of the coral fragments. The results in De’ath’s paper were easy to follow. They showed, however, that the cause of the decline in coral calcification on the Great Barrier Reef (GBR) was still unknown. In the Results sections of the Nilsson and De’ath articles, the figures and tables matched the text without repeating the same information. They were consistent with the text, reported statistically significant P values, and made the data very easy to understand. The table in Dias’s article would have been clearer with a key defining each abbreviated header. I found no discrepancies between the percentages reported in the figures and in the text of any of the three articles.
The results of all three studies tested the researchers’ hypotheses. In Dias’s paper, these results were shown in Figure 1, which illustrated that coral mortality increased as temperature increased. As the Dias abstract notes, two coral species, Turbinaria reniformis and Galaxea fascicularis, survived the experiment. Similarly, the Nilsson results tested that study’s hypothesis, which was that after a given number of days at elevated temperatures, the coral reef fishes studied would fail to acclimate to the temperature change. Likewise, De’ath’s results tested that study’s hypothesis using 328 colonies of massive corals from 69 reefs, which broadened the scope of the results.
Discussion
I found that none of the three articles repeated the same information in both the figures and the text. From the Discussion section of the Dias paper, a reader could identify the article’s main point: coral species vary in their susceptibility to thermal stress. The corals had the lowest mortality, partial mortality, and bleaching at 26 degrees Celsius, which was also the temperature at which their growth rate peaked. Dias found that the regeneration rate of corals generally increased with temperature. The results also showed that the bleaching resistance of most of the corals analyzed was overcome at 32 degrees Celsius. Because this paper is so recent (published in 2018), I could not find other research supporting its interpretations. However, the article does indicate the direction in which the research is headed and cites similar studies.
The findings of the De’ath and Dias articles supported each other, and many other articles on coral reef ecology supported them as well. However, I found very few articles that supported the conclusions drawn by Nilsson et al. (2010) about the effects of increasing temperatures on two species of coral reef fishes, which was a weakness of the paper. In all three journal articles, I found that the data were interpreted logically.
Conclusion
Summary
The three articles, overall, had both strengths and weaknesses. All three papers were peer reviewed. Few other articles backed up Nilsson’s findings, whereas the Dias and De’ath papers supported each other. The article by Dias et al. (2018) was one of the newest papers in its field, while the De’ath and Nilsson papers were eight to nine years older. Although the tables and figures in the Dias paper were not entirely straightforward, the article itself was very easy to follow, and each of its sections was set apart. In the De’ath article, by contrast, the sections (introduction, methods, etc.) were not separated from one another. The Dias and De’ath papers had appropriate titles, while Nilsson’s title was too long. The abstracts and introductions matched in all three articles. The Dias article was well paced, leading the audience straight into the hypothesis and objectives of the analysis. Overall, Dias and colleagues took many steps to ensure accurate results, and their Methods section was explained well enough for the study to be repeated. Nilsson’s paper, by contrast, did not report its sample size. When it came to reproducing the experiment, Dias included all pertinent information, but De’ath did not explain how the chosen parameters were measured.
De’ath’s paper was concise, and its text accurately related to the figures. However, it was not as easy to follow as it would have been with proper sections and subsections. The title seemed appropriate. The statement of purpose and the introduction make clear that the primary goal of the study was to help determine what is causing the decline in corals’ ability to lay down the calcium carbonate skeletons that build coral reef ecosystems. The De’ath paper used a large sample, which made its results more inclusive than those of Dias, who used twenty fragments from each of nine coral species (180 fragments in total). Nilsson’s article had excellent figures that were easy to interpret, in contrast to Dias’s paper, which did not explain some of the abbreviations in its tables. The strengths and weaknesses of the three papers varied greatly.
Significance
Evaluating the role these articles play in the world revealed striking similarities. Nilsson’s article focused primarily on two fish species living on the Great Barrier Reef, while De’ath and Dias examined how different coral species respond to increasing surface temperatures. De’ath’s article has practical significance similar to that of the Dias paper. In both cases, major ecosystems are dying as ocean surface temperatures rise, and the researchers sought to explain why. Nilsson’s article examined whether increasing temperatures reduce the hypoxia tolerance of coral reef fishes. It has been cited sixty-nine times. The citing papers include studies of the hypoxia tolerance of coral reef fishes, studies of how temperature and hypoxia affect the respiratory performance of certain tropical fishes, and many other similar studies (Nilsson et al., 2010). De’ath et al. has been cited twenty-four times and has sparked similar research on the Great Barrier Reef over the last decade. The Dias et al. paper was published only in 2018, so few other researchers have cited it yet. This could be seen as a potential problem. However, the article was peer reviewed by experts in this particular field. The recency of the Dias article can also be seen as a strength, since its findings are among the newest in its field. The research in all three articles is significant for today’s society because coral reef bleaching has become a growing problem. If the factors causing it are not identified through further research, large hypoxic zones may develop in aquatic ecosystems.
Overall, all three articles illustrate the environmental significance of rising ocean temperatures. Human survival depends on the biodiversity of plants and animals, and many of those animals live in coral reef ecosystems.
Works Cited
De’ath, G., Lough, J., & Fabricius, K. (2009). Declining Coral Calcification on the Great Barrier Reef. Science, 323(5910), 116-119. doi: 10.1126/science.1165283
Dias, M., Ferreira, A., Gouveia, R., Cereja, R., & Vinagre, C. (2018). Mortality, growth and regeneration following fragmentation of reef-forming corals under thermal stress. Journal of Sea Research, 141, 71-82. doi: 10.1016/j.seares.2018.08.008
Nilsson, G., Östlund-Nilsson, S., & Munday, P. (2010). Effects of elevated temperature on coral reef fishes: Loss of hypoxia tolerance and inability to acclimate. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology, 156(4), 389-393. doi: 10.1016/j.cbpa.2010.03.009