Don't Miss These Two Articles Relevant to Recent Discussions on Junk DNA History and Chromosome Fusion
Recent debate surrounding the new book Science and Human Origins has centered largely around the subject of "junk DNA" (in particular, the history of the junk DNA paradigm and quantitatively estimating how much of our genome harbors function) and the argument for human/chimp ancestry based on the evidence for the fusion origin of human chromosome two.
These discussions bring to mind a couple of enlightening and relevant articles published here at Evolution News & Views in 2009 by evolutionary biologist Richard Sternberg. The first of these is titled "Guy Walks Into a Bar and Thinks He's a Chimpanzee: The Unbearable Lightness of Chimp-Human Genome Similarity."
Here's a teaser:
ITSs...interstitial telomeric sequences...the chromosome scars, the pieces of junk DNA he was lecturing me about earlier. As you know, telomeres are the ends of chromosomes. In many species, including chimps and humans, the DNA sequences that are found at these genomic tips are tandem repetitions of TTAGGG. That's right...TTAGGGTTAGGGTTAGGG...over and over and over again. A notable exception to this rule is the fruit fly, an organism that in this regard has provided the junk DNA notion no succor, since its telomeres have complex combinations of three different retrotransposons instead of those six-basepair units. What is important to note, though, is that telomeric sequences are essential to the cell, and it seems that hardly a week does not pass without some new role being discovered for these elements.
Sternberg further explains that,
How, precisely, are miles and miles of TTAGGG of significance? From the standpoint of chromosome architecture, the repetitive elements en masse have the propensity to form complicated topologies such as quadruplex DNA. These sequences or, rather, topographies are also bound by a host of chromatin proteins and particular RNAs to generate a unique "suborganelle" -- for the lack of better term -- at each end. As a matter of fact, the chromatin organization of telomeres can silence genes and has been linked to epigenetic modes of inheritance in yeast and fruit flies. Furthermore, different classes of transcripts emanate from telomeres and their flanking repetitive DNA regions, which are involved in various and sundry cellular and developmental operations.
The story, you see, is that in the lineage leading up (or down, I forget which) to chimps and humans, a fusion of chromosome ends occurred -- two telomeres became stuck together, the DNA was stitched together, and now we find the remnants of this event on the inside of chromosomes. And to be fair, I concede at this point that the 2q13 ITS site shared by chimps and humans can be considered a synapomorphy, a five-dollar cladistic term meaning a genetic marker that the two species share. As this is said, it is apparent that the countenance of my acquaintance lightens a bit only to darken a second later. For I follow up by saying that of all the known ITSs, and there are many in the genomes of chimps and humans, as well as mice and rats and cows..., the 2q13 ITS is the only one that can be associated with an evolutionary breakpoint or fusion. The other ITSs, I hasten to add, do not square up with chromosomal breakpoints in primates (Farré M, Ponsà M, Bosch M. 2009. "Interstitial telomeric sequences (ITSs) are not located at the exact evolutionary breakpoints in primates," Cytogenetic and Genome Research 124(2): 128-131.). In brief, to hone in on the 2q13 ITS as being typical of what we see in the human and chimp genomes seems almost like cherry-picking data. Most are not DNA scars in the way they have been portrayed.
ITSs reflect sites where TTAGGG repeats have been added to chromosomes by telomerases, that these repeats are moreover engineered -- literally synthesized by the telomerase machinery, that ITSs have a telomere-like chromatin organization and are associated with distinct sets of proteins, and that many have been linked to roles such as recombination hotspots.
The second article is titled "How the Junk DNA Hypothesis Has Changed Since 1980." Again, here's a teaser:
As someone who has studied the concept of "junk DNA" for over twenty years, I am dismayed by two statements that appear repeatedly on various blog sites discussing evolution. No, I am not referring to arguments of the form "the onion has six times more DNA than do mammals; therefore, there is no deity," that are invariably followed by terms of disparagement hurled at anyone who even marginally departs from the Darwinian perspective. Rather, my consternation stems from a half-truth and a false fact that are recycled ad nauseum by those who apparently believe that, despite all the genomic and transcriptomic data that have been obtained only in this decade -- data that have overturned a number of trenchant assumptions -- a certain hypothesis published in 1980 is outside the purview of serious questioning.
Sternberg further notes,
The half-truth is the oft-read comment that goes something like this: "No one ever asserted that junk DNA is without function...it was long suspected that these sequences have important roles in the cells." Now, to be fair, it is correct to say that models for, say, repetitive DNA-based operations in metazoan development, have been proposed since the 1960s. It is also true that the evolutionary process of exaptation -- the accidental acquisition of a function -- has been used to explain how the odd transposon here or there along a chromosome can regulate a locus. Nonspecific effects of "extra" DNA on the cell have also been suggested for around three decades, if not longer. That said, the junk DNA hypothesis that one commonly reads as being an unassailable observation, as an incontrovertible empirical conclusion, presents as a clear prediction that the vast majority of non-gene sequences are devoid of any precise specificational role in ontogeny. Allow me to explain.
Two papers appeared back to back in the journal Nature in 1980: "Selfish Genes, the Phenotype Paradigm and Genome Evolution" by W. Ford Doolittle and Carmen Sapienza and "Selfish DNA: The Ultimate Parasite" by Leslie Orgel and Francis Crick. These laid the framework for thinking about nonprotein-coding regions of chromosomes, judging from how they are cited. What these authors effectively did was advance Dawkins's 1976 selfish gene idea in such a way that all the genomic DNA evidence available up to that time could be accounted for by a plausible scenario. The thesis presented in both articles is that the only specific function of the vast bulk of "nonspecific" sequences, especially repetitive elements such as transposons, is to replicate themselves -- this is the consequence of natural selection operating within genomes, beneath the radar of the cell. These junk sequences, it was postulated, can duplicate and disperse throughout chromosomes because they have little or no effect on the phenotype, save for the occasional mutation that results from their mobility. On the positive side, the C-value paradox, the longstanding puzzle that genome sizes have no correlation with perceived organismal complexity -- a lily, for instance, can have twenty times more nuclear DNA than a mouse -- was satisfactorily explained by the hypothesis. Also, the problem of repetitive elements of which the "variety and patterns of their interspersion with unique sequence DNA make no particular phylogenetic or phenotypically functional sense" was argued to have a simple solution. Likewise, the finding in the late 1970s that protein-coding regions in eukaryotes are interrupted by nonprotein-coding "introns" could be understood...as perhaps the degenerate remains of old transposable sequences. [internal citations omitted]
It has been said that 90% of all genomic DNA (in eukaryotes) is junk. No taxon is mentioned; no reference is cited...the value is just repeated by those commenting on evo blogs. To be sure, tagging a percentage to such a claim is a lot better than simply saying that "most DNA is junk." In lieu of an actual piece of research that demonstrated support for this proclamation, let's critically examine the 90% junk figure by focusing on human genomic DNA. Only around 1.5% of our chromosomal sequences encode proteins, which entails that 98.5% of the genome is noncoding by the classical definition. If someone wanted to make the equation noncoding = junk, then lo and behold functional sequences in Homo sapiens drop far below the 10% value. But we know that this equation is not valid. A surprising finding of ENCODE and other transcriptome projects is that almost every nucleotide of human (and mouse) chromosomes is transcribed in a regulated way. Most of the RNAs produced are various nonprotein-coding transcripts that are copied from both strands in a cell type-, tissue type-, or developmental stage-specific manner. These RNAs belong to a number of different functional classes and new categories are being discovered all the time. Further, these nonprotein-coding transcriptional units extend into and arise from protein-coding segments. Many also map to the regions between protein-coding loci. The RNA map of the mammalian genome has moreover been demonstrated to be hierarchical and far from random.
I highly recommend that readers check out both of these articles and look up the literature citations. It is certainly worth the time investment.
Clearly, the "gene" definition that provided the framework for the junk DNA hypothesis is defunct, and much discussion now centers on providing an operational description. That is to say, the coding/noncoding distinction is being rethought. And if one considers functional DNA to be equivalent to transcription units that are developmentally expressed together with their regulatory regions, the fraction that can be dismissed as junk becomes startlingly small -- this is what the results of recent studies imply.
Indeed, if we accept the equation transcription units + control elements = developmentally functional DNA, then the number of loci in the human genome jumps from a paltry 20,000 to hundreds of thousands, and the percentage of non-junk DNA increases to well over 90%.
It could be argued that most of these RNA-encoding loci are really cellular "noise" due to transcription running amok, on the basis that so few are phylogenetically conserved -- after all, didn't Orgel and Crick foresee such a possibility in their definition of selfish DNA? Well, this line of argumentation doesn't hold. Another counterintuitive result of the ENCODE project and other comparative genomic analyses is that known functional sections of the mammalian genome such as protein-coding segments appear to be diverging without constraint, whereas a host of "junk" sequences are under some type of selective pressure -- including most human "noncoding" DNA stretches. The same has been repeatedly detected for the fruit fly genome, where most nonprotein-coding sequences appear to be under functional constraint -- with the species-specific differences having the statistical hallmarks of being "adaptive". Even the Y chromosome of the fruit fly, long presented as "exhibit A" in the gallery of garbage DNA, has been shown to have diverse effects on the phenotype of this insect. Such results are exactly the opposite of what Orgel and Crick and Doolittle and Sapienza predicted.
Instead of 90% of the human or fly genome being junk, it seems that 90% or more of chromosomal DNA has some kind of specific developmental function, given the available data. Indeed, the emerging picture is that the species-specific nonprotein-coding regions encode numerous RNAs that help to shape the phenotype in ways that we are only beginning to understand. This is especially true for the transposable element fraction of human chromosomes -- about 50% of our DNA -- much of which is arranged and expressed in a taxon-specific manner. Part of the reason for why a human is not a chimp is not a cow is not a whale, then, is that each species has its own set of sui generis "genes" -- genomic texts specifying unique RNAs or even proteins that are used in embryogenesis. [internal citations omitted]