Chapter 1

How Should We Think About Evolution in the Age of Genomics?

 

James A. Shapiro

 

Dept. Biochemistry and Molecular Biology, University of Chicago, Gordon Center for Integrative Science W123B, 979 E. 57th Street, Chicago, IL 60637, USA jsha@uchicago.edu

 

 

ABSTRACT

Eibi Nevo’s research highlights the complexity of evolutionary responses to ecological parameters. This important work pioneered a growing awareness of the multiple levels of biological activity and organismal interactions that contribute to evolutionary change. In large measure, our current understanding of adaptive innovation is based on the newly acquired ability to track the details of evolutionary processes through genome analysis. Genomics has unambiguously demonstrated that cell fusion, symbiosis, interspecific hybridization, genome restructuring involving mobile DNA elements, and the many forms of infectious heredity are all major contributors to the appearance of organisms with novel adaptive characteristics. In addition, genomics has confirmed interspecific hybridization as a major stimulus to the rapid emergence of new taxa among sexually reproducing organisms. The work of Eibi and many other scientists has shown that ecology can trigger and influence all these different modes of hereditary change. We must recognize that genomic analyses have provided 21st Century evolutionary scientists with such a rich variety of documented paths to inherited novelty that it has become impossible to formulate a comprehensive theory of evolutionary change. Thus, an important part of the future in evolution science will be to adapt Eibi’s wisdom by devising synthetic Evolution Canyons as complex experimental microcosms, where we can rigorously study the principles governing ecological and biological interactions in adaptive innovation. Hopefully, those interactive principles will make it possible to integrate information from genomic analysis into a coherent picture of evolution as a biological response to ecological change.

 

KEY WORDS: Cell fusion; symbiosis; interspecific hybridization; mobile DNA elements; infectious heredity; horizontal DNA transfer; virosphere; genome restructuring; ecological triggers; synthetic Evolution Canyons

 

Background: Tribute to a unique evolutionary biologist

 

This 90th Jubilee Symposium and book, New Horizons in Evolution, pay tribute to Eviatar Nevo, a prolific pioneer best known today for his work at Evolution Canyon, where two distinct ecologies sit side-by-side [1]. Exploiting this special geography, Eibi and his many colleagues have been able to observe the effects of ecological differences on real-time processes of evolutionary change of numerous species from microbes to mammals at the population [2-4], organismal [5, 6], karyotypic [7] and molecular levels [8].

 

Eibi’s research highlights the complexity of evolutionary responses to ecological parameters using the most up-to-date tools available. In particular, he has documented the impacts of ecological stresses [9] on specific processes that bring about adaptive genome change: DNA repair [10], mutation [8], chromosome rearrangements [11], amplification and movement of repetitive and mobile DNA elements [12-14].

 

Eibi’s extremely broad documentation of how real-world ecological challenges integrate with genome change operations presents us with an opportunity to reconsider the basic principles of evolutionary biology in the age of genomics. Does our contemporary knowledge of how genomes change in the course of evolution confirm traditional principles established in the 19th and 20th Centuries? Or does that body of information force us to adopt a new, more contemporary set of principles? This paper will argue in favor of the latter position, based on contemporary empirical data cited in a pair of recent reviews [15, 16] and also posted under various headings on my University of Chicago web page (online link 1; the online links are listed after the main text, before the references).

 

Basic principles of evolutionary change necessary to encompass Eibi’s work

 

Traditional evolutionary theory in the 19th and early 20th Centuries focused on accidental changes in isolated genomes. Under the principle of “Descent with Modification,” evolutionary biologists had to explain where and how hereditary changes took place. Each species was assumed to have its own genome, comprising the nuclear chromosomes. The nuclear genome was isolated from the genomes of other species by sexual incompatibility (the Bateson-Dobzhansky-Muller model [17]) and from life history events by the Weismann Barrier between the soma and the germline [18, 19]. Within these isolated genomes, hereditary changes were assumed to occur by random accidents in the course of germline reproduction. When DNA was identified as the molecular carrier of genetic information, the random accident theory was updated to unavoidable copying errors in the course of DNA replication [20, 21].

 

The traditional perspective did not allow any possibilities for ecological inputs, like those Eibi’s research has documented, into the process of hereditary variation. Today, of course, we can see that the “isolationist” view of hereditary determination was unjustifiably restrictive in several fundamental ways. The second half of the 20th Century made us aware of many biochemical processes of hereditary change, ranging from the active movement of mobile DNA elements in the genomes of maize and all other organisms (online link 2) [22] to DNA transfer, repair, rearrangement and mutator functions (online link 3) [23, 24]. Like all physiological activities (and as Eibi’s research demonstrates so well), these molecular “Natural Genetic Engineering” (NGE) processes of DNA change are subject to cellular regulation (online link 4) and operate in a manner that is sensitive to ecological inputs (online link 5). As we shall see below, genomic analysis has documented important evolutionary adaptations that result from NGE action.

 

In addition to the regulated physiological processes that alter DNA molecules, we now have overwhelming evidence that genomes are not as isolated from each other or from the environment as traditional theories assumed [16]. There are multiple forms of “infectious heredity” understood in its broadest sense (online link 6) [25]. All organisms reproduce in the presence of abundant environmental DNA as well as a staggering density of viruses, vesicles and other forms of enclosed DNA molecules. Multiple mechanisms exist for cells of different organisms to exchange DNA molecules, and there are abundant genomic data showing that adaptive horizontal DNA transfers have occurred across virtually all taxonomic boundaries. Moreover, as we shall discuss shortly, there is no question that cell fusions can combine unrelated genomes in a single hereditary lineage to form modified or new kinds of organisms, including the first mitochondrion-bearing ancestor of all eukaryotes. The reproductive boundaries between species are not as absolute as once assumed, and we shall see that mating across species boundaries is a major stimulus to evolutionary innovation in sexually reproducing organisms.

 

The Weismann Barrier separating soma from germline is also not as absolute as some exponents of traditional theory have claimed. Such a barrier cannot exist in cells that proliferate by vegetative multiplication, such as all prokaryotes and lower unicellular eukaryotes, where there is a direct hereditary connection between “somatic” and “germline” cells. Even in unicellular eukaryotes that undergo sexual differentiation and reproduction, vegetative cells are the direct precursors of spores and gametes. The same cell lineage connection exists between soma and germline in plants. Since flowers containing plant sexual organs develop out of somatic tissues, the genomic consequences of life history events can be incorporated into pollen and ovules that merge to form the next generation. Only in animals, where germline cells separate from somatic tissues early in multicellular development, is formation of a Weismann Barrier a realistic possibility. Nonetheless, we now know about processes of macromolecular transport in animals which facilitate the transfer of somatically acquired genomic information to animal sperm cells [26].

 

From the foregoing summary, we can see that discoveries based on genomic data provide us with a 21st Century picture of ecologically sensitive evolutionary processes that coincides with what Eibi’s amazingly productive research has revealed.

 

Cell fusions produced foundational evolutionary innovations

 

            The deepest evolutionary divides among living organisms are the separation of all cells into three distinct lineages: Bacteria, Archaea and Eukarya [27, 28]. It is a salutary reminder of the power of genomic analysis and of the capacity for new data to transform our understanding of fundamental evolution principles to recognize that our knowledge of Archaea as a distinct cell type only dates from analysis of ribosomal RNA sequences in 1977, less than 50 years ago [29, 30]. The same kind of early genomic evidence made it clear that the mitochondrion of the earliest known eukaryotic ancestor descended from an endosymbiotic Gram-negative Proteobacterium [31-34]. Parallel sequence analysis confirmed the evolutionary origins of light-harvesting plastid organelles in a wide range of algae, plants and other photosynthetic eukaryotes as descendants of endosymbiotic cyanobacteria (online link 8).

 

            Fossil and genomic data tell us that both Bacteria and Archaea are the most ancient cell lineages (dating from > 3.4 GYA) and that the primordial symbiogenetic event in evolution of Eukarya occurred approximately 1.6-1.8 GYA, as the Earth’s atmosphere was accumulating a significant concentration of molecular oxygen (O2), following the evolution of oxygenic photosynthesis in cyanobacteria [35-37]. The O2 concentration is important because the best genomic evidence indicates that the host cell in the primordial symbiogenesis was an anaerobic archaeal cell encoding many proteins once considered to be exclusive to eukaryotes [35, 38-41]. Since Proteobacteria are aerobic and contemporary mitochondria are the loci of oxidative energy-yielding metabolism in eukaryotic cells, it is evident that acquisition of an aerobic endosymbiont would provide a significant metabolic advantage to an anaerobic host cell in an environment with a growing atmospheric O2 concentration.

 

            During the course of transformation from an independent cell to a subcellular organelle in the proto-eukaryotic cell, the endosymbiont Proteobacterium underwent a series of major changes in genome content (online link 9). In all eukaryotes, mitochondrial DNA coding content is only a fraction of that in Proteobacteria. The largest mitochondrial genome encodes only 100 protein and RNA molecules compared to over 800 for the smallest Proteobacterium cell. A typical animal mitochondrion like ours encodes only 37 molecules [42]. DNA containing the vast majority of bacterial coding sequences required for mitochondrial maintenance and metabolism transferred to the nuclear genome of the evolving eukaryotic host cells. The resulting nuclear-encoded mitochondrial proteins are synthesized like other eukaryotic proteins and imported into the mitochondrion organelle by newly evolved protein transport systems. Since overall genome size, physical DNA structure, and coding content differ greatly between the mitochondria in the cells of various eukaryotic lineages, it is clear that mitochondrial genome evolution has involved an ongoing and complex series of taxonomically-specific DNA restructuring processes [32].

 

            Although sequence data suggest that the symbiogenetic event originating mitochondrial evolution was unique, multiple cell fusion events have transferred plastids encoding photosynthetic capabilities to diverse eukaryotic lineages (online link 10). The oldest one involved a common cyanobacterial progenitor of the different plastids in four distinct groups of organisms: green algae (Chlorophyta), red algae (Rhodophyta), blue-gray algae (Glaucophyta), and green plants (Embryophyta). Such a cyanobacterial fusion is called a “primary symbiogenesis,” and there has been a second, much more recent primary symbiogenesis of a distinct species of cyanobacteria creating a single photosynthetic amoeba, Paulinella chromatophora. Further “secondary” symbiogenetic events have occurred when a photosynthetic eukaryote, generally one of the algae, has fused with a non-photosynthetic eukaryote cell type to create a novel photosynthetic lineage [43]. If the product of a secondary photosynthetic fusion merges with another non-photosynthetic lineage, that creates a “tertiary” symbiogenesis.

 

Many of the most important photosynthetic organisms on Earth, such as diatoms, have resulted from these secondary and tertiary symbiogeneses. Clearly, photosynthetic cell fusions have occurred over a prolonged period of evolutionary time and are quite likely to be taking place now. As with mitochondria, there has been significant DNA transfer from plastids to the nuclear genome, and plastid-specific protein transport has evolved to incorporate into the plastids nuclear-encoded proteins needed for photosynthesis. Also similar to mitochondria, there are lineage-specific differences in plastid DNA content, physical DNA structures, and coding capacities. In cases of secondary and tertiary symbiogenesis, there are also significant rearrangements and losses of nuclear DNA from the eukaryotic endosymbionts.

 

In addition to these various cases of photosynthetic symbiogenesis, there are numerous other cases of cell fusions and endosymbiosis that have profound adaptive significance [44-46]. Bacteria can invade other bacteria as well as virtually all kinds of eukaryotic cells [47, 48]. A smaller number of endosymbiotic Archaea have been documented, but more cases will doubtless appear as genomic analysis comes to bear on more types of organisms from other parts of the biosphere. Eukaryotic microbes can become endosymbionts of other eukaryotes, just as they do in secondary and tertiary photosynthetic fusions [49]. The adaptive significance of these endosymbiotic relationships lies in various characteristics, such as synthesis of important nutrients (vitamins, amino acids, etc.), utilization of particular food sources (e.g., digestion of plant polymers), or protection against predators or infectious agents [45, 50]. So-called “obligate” endosymbiosis occurs when the cell fusion becomes essential for reproduction of the host organism or the endosymbiont (often due to genome reduction, similar to what occurred in mitochondria and plastids) [51, 52].

 

 

 

Microbiomes and holobionts

 

            Besides cell fusions, microbes and multicellular organisms establish important symbiotic relationships simply by growing in close proximity or by the microbes colonizing the cells or interior cavities of major organ systems, like the intestine (online link 11). Each multicellular organism has its own “microbiome,” the generic term for all the associated microorganisms [53]. The microbes interact biochemically with each other and with the multicellular host in ways that affect the overall phenotype.

 

Different organs or regions of a single host can have distinct microbiome compositions with unique phenotypic consequences. In plants, for example, the microbiomes on leaves and roots are dramatically different and play radically different roles in transport of nutrients and responses to biotic and abiotic stresses [54]. Of particular importance for all plants and animals is the role a healthy microbiome at each site plays in blocking infection by microbial pathogens.

 

The microbiome plays important roles in many adaptive phenotypes. We are becoming familiar with discussions of how the “human microbiome” (usually meaning the intestinal microbiome) affects our health and well-being, metabolism and digestion, pregnancy, immune responses, and even mood and states of mind. Microbiome species synthesize critical nutrients for the macroscopic holobiont host, ranging from amino acids in aphids [55] to a wide range of essential metabolites in primitive marine animals [56] to signaling molecules that affect functioning of our own metabolic, neural and innate immune systems [57]. In Drosophila, the intestinal microbiome affects growth factor signaling and morphogenesis [58], volatile pheromone production and social attraction [59], and neuropeptide synthesis and locomotor behavior [60].

 

Clearly, the adaptive properties of the host plus its microbiome result from expression of microbial as well as host genomes. Typically, the protein coding capacity of the total microbiome genome is far more diverse than that of the host genome [61]. In our own case, the human gut microbiome is estimated to encode from 3.3 to 9.9 million distinct proteins, or 150 to 450 times as many as the basic nucleus-encoded human proteome.
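
The fold range quoted above can be checked with a few lines of arithmetic. The sketch below assumes a host baseline of about 22,000 nucleus-encoded proteins. That figure is not stated in the text; it is simply the value implied by dividing either microbiome estimate by its quoted ratio, and the names HOST_PROTEOME and fold_excess are illustrative.

```python
# Illustrative arithmetic only. HOST_PROTEOME is an assumption inferred
# from the quoted ratios (3.3 million / 150), not a figure given in the text.
HOST_PROTEOME = 22_000


def fold_excess(microbiome_proteins: float, host_proteins: float = HOST_PROTEOME) -> float:
    """Return how many times larger the microbiome coding repertoire is than the host's."""
    return microbiome_proteins / host_proteins


low = fold_excess(3.3e6)   # lower estimate of distinct gut microbiome proteins
high = fold_excess(9.9e6)  # upper estimate
print(f"{low:.0f}-fold to {high:.0f}-fold")
```

Both ends of the quoted range correspond to the same assumed host baseline, which is why the two ratios differ by exactly the same factor of three as the two microbiome estimates.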

 

From an evolutionary point of view, the recognition of microbiome contributions to whole organism phenotypes poses a definitional challenge. What is the evolving entity? In order to deal with this question, the terms “holobiont” and “hologenome” were invented to describe the evolving entity and its genetic endowment [62-64]. A holobiont is composed of a multicellular plant or animal together with its associated microbiome, and this terminology has been widely adopted in the relevant literature. Holobiont heredity differs radically from Mendelian principles and has been described as far more similar to schemes proposed by Lamarck [65] and Darwin (“gemmules”) [66]. Frequently, microbiome components are transmitted horizontally to the oocyte (pre- or post-fertilization) or developing embryo by maternal tissues [67, 68], but horizontal transmission also occurs paternally [69, 70]. It is safe to say that our knowledge of trans-generational microbiome maintenance is very partial and requires a great deal of further research [71].

 

Because of their composite natures, holobionts can rapidly evolve complex adaptive phenotypes by acquisition or loss of microbiome constituents, outcomes not achievable simply by changes to the host genome. In Drosophila, mosquitoes and other invertebrates, acquisition of bacterial endosymbionts from the Wolbachia group affects a variety of important characteristics, such as resistance to viruses and parasites, and also frequently generates mating incompatibility between colonized and Wolbachia-free hosts [50, 72, 73]. Since mating incompatibility between two populations is often the first step of divergence into separate species, Wolbachia entry into the microbiome has been characterized as stimulating “speciation by symbiosis” [74].

 

Interspecific hybridization

 

            Cell fusions are an essential feature of sexual reproduction. Contrary to the idealized assumption of complete reproductive isolation between different species, there is abundant genomic evidence of mating between related but distinct microbial, plant and animal species, including real-time observations (online link 12). Interspecific matings are ecologically sensitive because their frequency will increase when mating population sizes decline and conspecific mates become harder to find.

 

            The consequences of interspecific mating are high levels of genome instability (such as chromosome rearrangements, activation of mobile DNA elements, whole genome duplications – online link 13) and the formation of novel species with phenotypic traits that are more than simple mixtures of characters from the two parents. This kind of hybrid speciation was long ago characterized as “Cataclysmic Evolution” by the distinguished evolutionary biologist G. Ledyard Stebbins [75]. Typically, the karyotype of the hybrid species contains a diploid number equal to the sum of the diploid chromosome numbers of the two parental species. The increase in chromosome number results from the whole genome duplications necessary for the initial hybrid to undergo successful meiosis.
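
The chromosome arithmetic in the preceding paragraph can be made concrete with a minimal sketch. It assumes the simplest case, in which the initial hybrid receives one haploid set from each parent and a single whole genome duplication then restores pairing partners for meiosis. The function name and the Brassica figures are supplied here for illustration and are not taken from the text.

```python
def hybrid_diploid_number(parent_a_2n: int, parent_b_2n: int) -> int:
    """Diploid number of a hybrid species under a simplified model:
    hybridization followed by one whole genome duplication."""
    if parent_a_2n % 2 or parent_b_2n % 2:
        raise ValueError("diploid chromosome numbers are expected to be even")
    # The initial hybrid carries one haploid set from each parent.
    initial_hybrid = parent_a_2n // 2 + parent_b_2n // 2
    # Whole genome duplication gives every chromosome a homolog to pair
    # with at meiosis, doubling the count.
    return 2 * initial_hybrid


# Outside illustration: Brassica rapa (2n = 20) and B. oleracea (2n = 18)
# are the parental species of the hybrid crop B. napus.
print(hybrid_diploid_number(20, 18))  # 38
```

Under this model the result always equals the sum of the two parental diploid numbers, which is the relationship described above.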

 

            Although hybrid speciation was long known to occur in plants and to serve as a source of useful agricultural crop species, its importance in animals was not appreciated before genomics provided evidence for many hybrid species. Of particular interest are Darwin’s finches in the Galapagos Islands, an important evolutionary model cited by Darwin [76], and freshwater cichlid fishes that have become models for rapid speciation and phenotypic diversification [77-79]. In the case of the Galapagos finches, it is worthwhile noting that interspecific hybridization has also been followed in real time by Rosemary and Peter Grant and colleagues, who have documented abrupt changes in beak morphology in hybrid birds rather than the gradual changes postulated by Darwin [80-82].

 

Protein evolution

 

            Ever since the articulation of the “one gene – one protein” hypothesis in the middle of the 20th Century [83], the formation of new protein sequences to execute novel functions has been seen as central to adaptive evolutionary change. With the identification of DNA as the genetic material and the elucidation of the coding relationships between genomic DNA and the sequence of amino acids in each protein [84, 85], protein evolution was widely assumed to occur largely by a gradual succession of single amino acid substitutions due to random mutations in the underlying DNA code. However, DNA sequencing and genomic comparisons led to a number of unexpected insights which indicated that protein evolution involves far more active cellular DNA manipulation than initially believed.

 

            Proteins as systems. The first pair of insights concerned the organization of protein molecules and of the DNA that encodes them. Most proteins consist of structurally and functionally independent “domains” joined together, often connected by short linker peptides [86, 87]. Different proteins are generally similar to each other in one or more domains but not in others. Because each domain has specific functional characteristics, the overall activity of a given protein is determined by the integration of its various domains into a functional system. This means that proteins can evolve functionally by amplifying, acquiring and rearranging their domain contents to generate novel combinations [88-94].

 

Mechanistically, amplifying and rearranging domains occurs by joining together distinct DNA coding regions, not by mutational changes altering particular amino acids. DNA joining involves many distinct biochemical processes that have to be coordinated and many proteins that have to be synthesized at the same time. Frequently, domain rearrangements involve mobile DNA NGE functions (online link 14).

 

While many domains are shared across broad phylogenetic distances, patterns of multi-domain architectures are specific to each taxon [95, 96]. Domain organization indicates that the primary object of evolutionary change is often the domain rather than the whole protein [96-100]. There are protein domain databases [101, 102], and it is common practice in contemporary comparative genomics to describe a coding region and its cognate protein product by its domain content. One major question in protein evolution involves the sources of new domain architectures [103]. Domain loss and the appearance of novel kinds of domains are genomic signatures at the emergence of major new taxonomic groups [104].
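
As a minimal sketch of what describing proteins by domain content looks like in practice, the example below represents each protein as an ordered tuple of domain labels. The proteins and labels are entirely hypothetical and do not come from the domain databases cited above. The sketch distinguishes shared domain content from shared domain architecture.

```python
# Hypothetical proteins, each described as an ordered tuple of domain labels.
ARCHITECTURES = {
    "protein_1": ("DomA", "DomB", "DomC"),
    "protein_2": ("DomB", "DomC"),          # domain loss relative to protein_1
    "protein_3": ("DomC", "DomA", "DomB"),  # same domains, rearranged order
    "protein_4": ("DomA", "DomA", "DomD"),  # amplification plus a new domain
}


def shared_domains(p: str, q: str) -> set:
    """Domains present in both proteins, ignoring order and copy number."""
    return set(ARCHITECTURES[p]) & set(ARCHITECTURES[q])


def same_architecture(p: str, q: str) -> bool:
    """True only if both proteins have identical domains in identical order."""
    return ARCHITECTURES[p] == ARCHITECTURES[q]


print(sorted(shared_domains("protein_1", "protein_3")))
print(same_architecture("protein_1", "protein_3"))
```

In this toy data set, protein_1 and protein_3 share all three domains yet differ in architecture, which illustrates how domains can be widely shared while multi-domain arrangements remain specific.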

 

Coding sequences in pieces. When DNA sequencing was applied to mammalian regions encoding well-known proteins, a surprising result emerged. The sequences for a single polypeptide chain were not continuous but consisted of a series of expressed DNA elements (“exons”) encoding segments of the chain separated by intervening DNA elements (“introns”) [105, 106]. The intron-exon coding pattern is widespread among eukaryotes. In the process of protein synthesis, introns are “spliced” from the primary transcript to form the continuously coding mRNA that is translated on the ribosomes [107]. Although some introns are self-splicing ribozymes [108], splicing generally takes place on a complex “spliceosome” organelle [109].

 

There are multiple ways that the exon-intron-exon protein coding structure contributes to protein diversity and evolution. In coding region transcripts with multiple exons and introns, splicing does not necessarily produce only a single combination of joined exons. “Alternative splicing” that creates different combinations of exons from a single pre-mRNA transcript enhances an organism’s protein repertoire [110]. Regulation of alternative splicing means that different conditions can control the expression of distinct proteins from a single coding region [111-113]. In many organisms, there is even “trans-splicing,” where exons from two different pre-mRNA transcripts can be joined together to produce hybrid proteins encoded by two different coding regions [114, 115]. Since the sequences of exon-intron boundaries are important determinants of where and when splicing events occur, one way that novel protein architectures evolve is by changes in splicing patterns rather than by alteration of amino acid coding sequences [116, 117].
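
To illustrate how alternative splicing multiplies the products of a single coding region, the sketch below enumerates the mRNA isoforms of a hypothetical five-exon transcript. It assumes the simplest model, in which designated exons can each be independently included or skipped and exon order is preserved. Real splicing regulation is considerably more varied, and the exon names are placeholders.

```python
from itertools import product


def exon_skipping_isoforms(exons, optional):
    """Enumerate isoforms when each exon in `optional` may be included or
    skipped; every other exon is always retained, and order is preserved."""
    optional = set(optional)
    # Each exon contributes either one choice (keep) or two (keep or skip).
    choices = [(True, False) if exon in optional else (True,) for exon in exons]
    isoforms = []
    for keep_flags in product(*choices):
        isoforms.append(tuple(e for e, keep in zip(exons, keep_flags) if keep))
    return isoforms


exons = ["E1", "E2", "E3", "E4", "E5"]  # hypothetical transcript
isoforms = exon_skipping_isoforms(exons, optional=["E2", "E4"])
for isoform in isoforms:
    print("-".join(isoform))
```

With k independently skippable exons this model yields 2^k isoforms, so the two optional exons here give four distinct mRNAs from one pre-mRNA transcript.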

 

By and large, there is a good (but not absolute) correspondence between exons and protein domains [118]. This means that protein evolution by domain rearrangement often involves mobilizing the corresponding exons, sometimes with associated introns, into new genomic sites. Since it is easier to mobilize a domain coding sequence that is isolated as one exon (or several exons in series), split coding regions facilitate functional protein evolution [119]. It has been documented that the requisite DNA restructuring events often involve the DNA rearrangement activities associated with mobile DNA elements (online link 14).

 

 Another major question in protein evolution concerns the origins of new domains. It turns out that novel protein coding sequences have a variety of sources in both “coding” and “non-coding” DNA sequences (online link 15). The ability of supposedly “non-coding” genetic elements to contribute to protein coding came as a surprise to many evolutionary theorists, who called such elements “selfish or junk DNA” [120-122]. In 2001, it was first established that repetitive mobile DNA elements contribute directly to many protein coding sequences [123]. Since then, repetitive mobile DNA elements have proven to be a rich source for the origination of exons encoding novel domains, often by acquiring novel splicing signals so that previously intronic segments form new exons (online link 16) [124].

 

Mobile DNA elements thus play two roles: they help mobilize exon rearrangements, as cited above, and they serve as substrates for novel domain coding sequences. Together, these roles indicate that the mobile DNA component of various genomes serves as a major facilitator of protein evolution. We will see below that mobile DNA elements play a parallel role in the evolution of regulatory and transcription networks for complex adaptations in advanced plant and animal species. These discoveries show that once poorly understood so-called “junk DNA” elements can play important roles in evolution. Recognizing the validity of that statement should serve as an object lesson about how dangerous it is to misinterpret unexpected observations (in this case, the abundance of repetitive DNA in genomes of advanced organisms) and base broad generalizations upon our ignorance rather than our understanding.

 

Horizontal DNA transfers

 

            An important and unexpected aspect of rapid evolutionary adaptation first became evident when antibiotics were widely used to combat bacterial infections following World War II. Laboratory experiments had documented mutation as a mechanism for acquiring resistance. However, the vast majority of resistant bacteria isolated in clinical settings were found instead to contain genetic elements that encoded high levels of resistance and were able to transmit that resistance to other, phylogenetically distant strains of bacteria, thereby helping to explain the rapid spread of antibiotic resistance (online link 17). These “resistance transfer factors” (R-factors) encoded various resistance mechanisms that included chemical inactivation of specific antibiotics, antibiotic removal from the host bacterium, and modification of the cellular targets of antibiotic action. Many R-factors combined coding information for multiple activities conferring resistance to several antibiotics at once [125].

 

Interbacterial transmission became known as a form of “infective heredity” [126-128] and provided a virtually instantaneous way of acquiring new adaptive traits. No extended process of developing a new character was necessary. Since the new genetic information came from another cell, infective heredity constituted a “horizontal transfer” of genetic information, quite distinct from the normal vertical transmission from ancestral cells. Over time, it became evident that many different kinds of adaptive traits in bacteria (and later in archaea) are subject to horizontal transfer, such as metabolic pathways, surface attachment structures, virulence factors, ability to establish symbiotic relationships, and synthesis of lethal compounds attacking unrelated bacteria [129-134]. Horizontal transfer became so universal a feature of bacterial genetics that some scientists proposed the concept of a shared pan-genome which individual strains of bacteria sampled freely according to the demands of the ecological niches they inhabited [135].

 

With the advent of widespread DNA sequence analysis, the protein-coding complement of many eukaryotic genomes was established. Comparisons of these coding sequences helped define the phylogenetic relationships of various species by their protein repertoires. While these relationships were largely consistent with shared ancestries and protein diversification across the generations (including the emergence of novel proteins and domains), there were also instances of protein-coding sequences that were absent from the genomes of ancestral species but highly similar to those of unrelated organisms, often from a completely different domain of life. These “misplaced” coding sequences must have been acquired by horizontal DNA transfer either directly or indirectly from their original source. Multiple cellular mechanisms exist for horizontal DNA transfer (online link 18), but the genomic data provide no indication of how any particular transfer actually occurred.

 

Horizontal DNA transfers have been documented across virtually all taxonomic boundaries (online link 19). They involve many important adaptive traits. For example, both bdelloid rotifers (a class of microscopic animals) and herbivorous nematode worms have, on multiple occasions, acquired coding sequences from bacteria and fungi that allow them to synthesize enzymes to break down otherwise indigestible plant polymers [136-139]. Since different rotifers have acquired hundreds of distinct bacterial and fungal sequences, it is clear that horizontal DNA acquisition by these tiny animals has been an ongoing process [138]. The ability to short-circuit the process of protein adaptation and immediately extend the organism’s range of food resources illustrates the kind of evolutionary advantage conferred by cellular capacities for DNA uptake and genome integration.

 

Horizontal transfers can involve DNA segments encoding whole proteins or only encoding one or more individual domains. In either case, the transfers involve the introduction of new domains into a distant lineage and set up conditions for the evolution of novel domain architectures in multiple proteins [140, 141]. In this way, horizontal transfer can initiate the evolution of new protein families and the specialized adaptive traits they support. Examples include the proliferation of plant cell wall destabilizing proteins following transfer of expansin domains from plants to bacteria and fungi [142], interacting domains and regulatory networks in oomycetes (filamentous fungus-like water molds) [143], and the “effector” proteins with eukaryotic domains that Legionella bacteria inject into target cells to commandeer host functions during infection [144-147].

 

Major adaptive changes have been found to involve horizontal DNA transfers. One such change is the emergence, from thermophilic ancestors, of new archaeal lineages capable of growth under mesophilic conditions, which was fostered by the use of bacterial sequences [148, 149]. Another recently documented case is the ability of autotrophic and osmotrophic eukaryotic lineages to assimilate environmental nitrate as their sole source of metabolic nitrogen [150].

 

Although the precise mechanism of horizontal DNA transfer is indeterminate for any particular case, the kinds of interactions that occur throughout the biosphere provide us with multiple potential paths for this kind of genomic exchange [16]. One of the most important, the ubiquity of genomic information protected inside virus capsids, is our next topic.

 

The virosphere as an evolutionary R & D sector (online link 20)

 

Viruses are the most abundant biological entities on planet Earth [151]. They are found at astonishing concentrations in the soil, in bodies of water (~10^10 per liter), and in the atmosphere (falling at ~10^9 per square meter per day) [152, 153]. In addition to viruses, there are a variety of virus-like particles (VLPs) that contain nucleic acids but not viral genomes [154]. Among the VLPs are dedicated DNA transport particles called “gene transfer agents” (GTAs) [155]. These viral and virus-like agents inject DNA or RNA only into cells that have the appropriate surface receptors, which limits their range of target cells, but some have nonetheless been documented to participate in cross-species DNA transfers. VLP donors capable of transferring genetic information to mesophilic E. coli and B. subtilis bacteria have been reported to include microbial mats of hyperthermophilic bacteria from hot springs and marine bacteria from the oceans [156, 157].

 

Viruses, and especially the bacteriophages that infect bacteria, are the most numerous reservoirs of protein-coding sequences on planet Earth. These coding sequences can be divided into two groups with quite distinct evolutionary potentials:

 

(1) The first group includes sequences encoding proteins similar to those found in cells that participate in established metabolic routines, ranging from the different steps of alternative photosynthesis systems to various forms of intracellular energy metabolism, phosphate recycling, apoptosis (programmed cell death), cell surface structures, virulence and infectivity [154, 158]. These reservoirs of established cell physiology functions can be utilized to repair damaged cells or to extend the metabolic capabilities of an organism adapting to novel ecological circumstances [159-162].

 

(2) The second group consists of uniquely viral protein-coding information that cells do not possess and which, therefore, has the potential to initiate the evolution of entirely new types of cellular proteins [163]. Over 90% of all unique coding sequences in the genomic databases occur in viral genomes [164]. Since they are unique, they do not exist in cells. When one of these unique sequences is exapted by a cell (often for a function related to its source, like anti-virus defense), a novel protein sequence and structure appears in the genome. This new motif is then available to combine with other protein domains or mutate to a new functionality and thus generate a capability that did not previously exist, either in the virosphere or in the cellular realm. This creative capability is part of what makes the virosphere an evolutionary R & D domain of life. In this vein, retroviruses have been cited as a source of new genes in vertebrates [165].

           

Virus particle capsids naturally break down over time and liberate their nucleic acid cargos into the environment. The virosphere is thus a major contributor to the free DNA molecules present in all ecologies. Many types of cells have been found to be capable of taking up DNA from their surroundings and incorporating it into their genomes (online link 6): Archaea, “competent” bacteria with dedicated DNA import complexes, yeast, red algae, and invertebrate and vertebrate sperm. Under experimental conditions, DNA “transformation” has further been extended to cultured animal cells, suggesting that similar processes may occur under natural conditions. The role of viruses as sources of environmental DNA for uptake has recently been demonstrated in experiments with so-called “super-spreader” bacteriophages that transfer plasmid DNA from E. coli bacteria to unrelated phage-resistant Streptococcus pneumoniae bacteria competent for DNA import [166].