RNA, the Epicenter of Genetic Information
RNA likely underpinned the emergence of life, yet it is arguably the least appreciated of all biological macromolecules. For most of the past century, RNA has been regarded principally as an intermediary between gene and protein. However, most of the human genome expresses RNAs that do not encode proteins, which raises the question: why?
The understanding of the functions of RNA and the answer to this question are bound up with the history of molecular biology. The term ‘molecular biology’ was coined by the mathematician Warren Weaver in 1938 and has become synonymous with the nature, transmission and manifestation of genetic information, and the structure of the molecules involved. The field had its roots in the discovery of DNA in 1869 and the identification of proteins and their enzymatic activity in the late 19th century, key events that gave birth to “the science of the chemistry of life”. Since then, proteins and DNA have been the primary focus of studies of cellular and developmental processes and the conceptions of ‘genes’, ‘gene expression’ and ‘gene regulation’.
While it was clear early on that chromosomes are the vehicles of inheritance, and contain DNA, RNA and proteins, for a long time it was thought that genetic information is held in the proteins; nucleic acids seemed too simple. In the 1940s, however, it was shown that DNA is the reservoir of genetic information, although it took some time for this finding to be accepted.
The connection between DNA and protein production was resolved by the convergence of genetics and biochemistry, mainly in experimentally amenable bacteria and fungi, which led to the breathtaking advances that elucidated the role of ‘messenger’ RNA (mRNA) and the ‘genetic code’ in the 1950s and 1960s. The assumption that genetic information is mostly transacted by proteins (‘one gene – one enzyme’), with RNA a transient intermediate, became entrenched, reflecting the mechanical zeitgeist of the age.
This assumption led to many subsidiary assumptions about the nature of genetic information, and the conclusion that most of the genomes of plants and animals are junk, based on theoretical considerations of mutational load and the finding that protein-coding sequences occupy only a small fraction of animal and plant genomes. The naivety of this conclusion and its superficial support by intrinsically circular assessments of the ‘neutral evolution’ of ‘non-coding’ sequences in genomes were rarely challenged.
There were other assumptions as well, including that heritable information is not transmitted from somatic cells to reproductive cells. This assertion, supported by a peculiar 1868 experiment involving amputation of mice tails, accompanied the so-called Modern Synthesis in the 1930s, which reconciled Mendelian genetics with Darwinian evolution and ruled out Lamarckian evolution, to buttress the belief that ‘mutations’ are random.
Undoubtedly the biggest surprises in the history of molecular biology were the discoveries in the 1960s and 1970s that plant and animal genomes are replete with ‘repetitive elements’ and that their genes are mosaics of fragmented protein-coding and flanking regulatory mRNA sequences (‘exons’) separated by extensive tracts of intervening sequences (‘introns’), which are subsequently removed from the primary transcripts by splicing. It was immediately and almost universally assumed that introns are evolutionary relics colonized by ‘selfish’ genetic hobos and that the excised intronic RNA is simply degraded.
Also unexpected were the discoveries in the 1980s that RNA has catalytic capacity and, at the turn of the century, that the number of protein-coding genes in humans is similar to that in nematode worms that only have ~1,000 cells. By contrast, the extent of intronic and ‘intergenic’ non-protein-coding DNA sequences was found to increase with developmental complexity, rising to ~98% in humans and other mammals.
High-throughput expression studies revealed that these ‘non-coding’ sequences are transcribed in spatially restricted patterns to produce hundreds of thousands of RNAs that do not encode proteins. Many of these RNAs were subsequently shown to have regulatory and organizational functions during differentiation and development.
Here we provide an account of the development of molecular biology from the 19th century to the present. We pay particular attention to the history of the understanding of RNA, which has been neglected. We also discuss the founder fallacies – where initial interpretations of limited data were generalized, became orthodox explanations and then articles of faith. Our central theme is that the extrapolation of bacterial genetics to complex organisms, compounded by expectational, ascertainment and interpretative biases, has led to a linked series of false dichotomies and the misunderstanding of the roles of RNA in the transmission of genetic information. The subsidiary theme is the clumsy progress of science.
This book focuses on RNA as the main player in cell and developmental biology, but also on chromatin composition and regulatory logic. While most of those educated in the pre-genomic era were taught that gene regulation is primarily carried out by proteins, this became hard to reconcile with the finding that genes encoding regulatory RNAs vastly outnumber protein-coding genes in humans, and the demonstrations of widespread sequence-specific guidance of effector proteins by RNAs. Recognition of the simplicity and logic of base-pairing for sense-antisense target recognition, and of the ability of RNA to form complex three-dimensional structures, is almost as old as the double helix itself. The existence of regulatory RNAs was hinted at during the early period of molecular biology by genetic observations in fruit flies and maize, and by the appearance of unexplained bands in biochemical fractionations, but these were treated as oddities or interpreted through the lens of transcription factors, until the genome projects revealed the full extent of RNA expression in plants and animals.
We highlight the pioneers and controversies that accompanied the many unexpected observations, with particular attention to those that challenged the prevailing consensus and were often ignored, at least at first. The book spans the early confusion about the functions of proteins and nucleic acids, the elucidation of the double helix and the ‘genetic code’, the premature relegation of RNA to intermediary between gene and protein, the strange genomes and genetics of plants and animals, and the misguided musings that underpinned the idea of junk DNA. We chronicle the spectacular advances brought by gene cloning and genome sequencing, the small and large regulatory RNA revolutions, and the slowly dawning realization of the central role of transposon-derived sequences, intrinsically disordered proteins, ‘enhancers’ and RNA-directed epigenetic processes in multicellular development, which we have tried to integrate into a new framework for understanding genetic programming.
We have cited original references where possible, to give credit to the work of others and to provide the evidence for our assertions and conclusions, especially in relation to the findings of the last two decades. We have also included extensive footnotes that add detail and can be skipped, as well as suggestions for further reading.
While the story is still unfolding, we conclude that the genomes of humans and other complex organisms are not full of junk but rather are highly compact information suites that are largely devoted to the specification of regulatory RNAs. These RNAs drive the trajectories of differentiation and development, underpin brain function and convey transgenerational memory of experience, much of it contrary to long-held conceptions of genetic programming and the dogmas of evolutionary theory.
August 31, 2022