Published in Science Now, B&K, 10 (2): 5-8, December 2001
Science, we are brought up to believe, progresses logically and dispassionately, insulated from the glare of newspaper headlines and the foibles of the media. Not any more - the story of the human genome has been told with as much marketing skill as you would expect from a salesman. As biologists, we are left perplexed by the growing consensus that there are far fewer genes in the human genome than seems possible. In this article I will outline a new approach that makes a modest number of human genes seem entirely reasonable. Indeed, it is just what one might anticipate.
Rarely has any development in biology been subject to as much exaggeration and hyperbole as the unraveling of the base-pairs in the human nucleus. In June 2000, simultaneous announcements came from the International Human Genome Sequencing Consortium and Celera Genomics in the USA. Around the world, the press reported that at last they had separately decoded the human genome. Readers could be forgiven a hint of déjà vu when, in February 2001, the same story went out once more. Close inspection showed that the first claim rested on a 'first draft'; the story from the following year was pegged to the release of a 'rough draft'. The newspapers, ever keen to show that they like to cover science, made much of each story. As is customary with science (in clear contrast to what happens in, say, political or financial reporting) there was no detailed questioning of the spokesmen.
The result was a misplaced impression in the public mind of what had been achieved. To the layman, it seemed as though we now had the blueprint for human life. In fact, what we had obtained was no genome map, but a primitive listing of most of the 3 billion base pairs in a somatic human cell. We had no final list of genes, no indication of which genes might be active or passive, and little insight into the abundance of junk DNA in the sequences (only 5 per cent of the genes are active, though we have yet to identify them). We still know little of the RNA that the genes produce or the proteins for which they code. The listing provided eighty or ninety per cent of the total base pairs, and still left no answer to the most basic question of all - how many genes are there in the human genome?
Previous estimates of the number of human genes ranged up to 150,000 and even more. The estimate from the Human Genome Sequencing Consortium holds that the human genome contains as few as 31,780 protein-coding genes. The largest reasonable claim for identifiable genes is 38,000. In comparison, we have already identified 25,498 genes in the genome of the small flowering plant Arabidopsis thaliana (thale cress). If the estimates are right, then the human genome is on a par with other forms of life, rather than being greatly numerically superior. Drosophila, whose enlarged salivary gland chromosomes were the first to be investigated, boasts 13,601 genes. We are learning that there is little correlation between structural complexity of an organism and the number of base pairs in the genome. Humans contain 200 times as many base pairs as the common yeast Saccharomyces, for instance; yet the genome of Amoeba dubia contains 200 times as many as human cells.
There is certainly no simple relationship between the number of base pairs and the total of genes in a genome. In humans, it now appears that there are on average about 84,000 base pairs per gene. The largest gene known codes for dystrophin (a muscle protein, damage to which causes muscular dystrophy). It is composed of 2.4 million base pairs. Other species have simpler genes: in Arabidopsis there are, on average, 4,500 base pairs per gene; there are twice as many in the typical gene of Drosophila. Human genes are highly fragmented. The coding sections of a chromosome, the exons, are interrupted by non-coding sections, introns. Human genes tend to have longer-than-average introns (some up to 10,000 base pairs long). The fragmented nature of the genome allows genes to be read in various combinations, and more than one-third of the human genome seems to be capable of being read in different ways at different times. As a result, we may be able to encode four or five times as many proteins as the less fragmented genes of an organism like Drosophila.
The use of this fruit fly in research dates back to the pioneering work of Thomas Hunt Morgan, who was born in Kentucky back in 1866. In 1908 he first chose Drosophila as his experimental organism. This was for several reasons. The best known is that fruit flies have giant chromosomes within the cells of the salivary glands. The chromosomes are transversely banded, and these bands can be identified with genetic loci. Less well known is that fruit flies reproduce rapidly - in a warm environment Drosophila can pass from new-laid egg to sexual maturity in twelve days. Rarely mentioned (but in practice of crucial importance) is that Drosophila is a remarkably resilient species. Researchers can forget to feed or water their flies, they can close the laboratory, go on holiday and neglect the well-being of their fly colonies; yet Drosophila survives and goes on reproducing.
In 1908, genetics was a new science. The term was coined by William Bateson of Cambridge University in May 1900. Thomas Hunt Morgan studied genetics by measuring the crossing-over of alleles on adjacent chromosomes during cell division, and was able to publish the first-ever chromosome maps of Drosophila before the First World War. Prior to the Second World War chromosome maps of organisms like Zea mays, which identified thousands of genes, were being published. Yet it took many decades before science was convinced that the visible loci were definitively associated with discrete genetic characteristics. In 1950, Edmund Sinnott's book Principles of Genetics emphasised that the idea remained to be proved.
Since the number of genes tends to increase with the evolution of a species, we must ask where the extra genes came from. Back in 1976, in a BBC interview with the late Brian Redhead, I speculated that many modern-day organisms might contain the relics of virus DNA in the genome. At the time this was an adventurous speculation, but research into reverse transcriptase has revealed a mechanism for the phenomenon to take place.
The human genome is littered with transposable elements and repeated genes, many of which date back to contact with viruses. Most have apparently been inactive for millions of years. In that sense, the present-day genome is a museum of past encounters. In part, the chromosomes look like a mass of reverse-transcribed alien DNA with occasional functioning human genes dotted about. The whole story of human development is locked away in these mysterious genetic components.
Until work on the human genome neared completion, estimates of the number of human genes varied from 80,000 to 100,000. As work on chromosomes 21 and 22 was proceeding, it became apparent that there were far fewer genes than expected, and most geneticists began to accept a lower provisional total. Not everyone is convinced, however. A revealing insight can be gained by looking into the book maintained by David Stewart, of the Cold Spring Harbor Laboratory on Long Island, New York. He is taking $1 wagers on the outcome, and the bets taken so far show how widely the private opinions of those who work in genetics differ.
The bets have to be written into a book that Dr Stewart maintains, for he cannot accept bets on the internet. His idea is that the winner takes all: if more than one person guesses the right answer the money will be equally divided. And what a range of bets he has received. By the time the number of bets was nearing 300, the guesses ranged from 27,462 to 312,278 with a mean of 67,000. So far, with 38,000 genes claimed, it seems that bets lower than this total must indicate that some scientists believe that today's figures may shrink as some gene discoveries turn out to be untenable. Among the lower recent estimates are bets from the Genoscope Sequencing Centre of Evry, France, with 31,000 and Professor Philip Green of the University of Washington in Seattle with 34,000. At the upper end of the scale (and well above the current predictions) come Double Twist Inc of Oakland, California, with 105,000, and Incyte Genomics of Palo Alto, California, and Human Genome Sciences of Rockville, Maryland, both of which have placed bets on a human gene total above 140,000.
It is intriguing and irrational that eminent specialists insist on wagering that human cells contain far more genes than currently seems to be the case. Why is this? The reason seems to be that, since humans are so much more capable of complex tasks than other forms of life, they must need a correspondingly larger genetic component. In the mind of the conventional scientist, the ratio of 25 to 140 corresponds to some notion of a six-fold increase in biological sophistication that they sense between fruit flies and people. I believe this to be fundamentally misconceived. We imagine that humans are superior because of the achievements of which we are capable. Humans can build a house, navigate, and fly through space. What can simpler forms of unicellular life do, more than sit in a pond and vegetate?
To me, a fundamental principle of life is that the behaviour of complex life-forms is a manifestation of what single cells can perform. A thecate amoeba can select grains of minerals from its watery environment, pick them up, and fit them together to produce a delicately constructed flask-like shell within which it lives. This is a task of great complexity, and the building of walls by humans has resonances of the same remarkable ability.
Microorganisms can navigate, too. Many rod-shaped bacteria which inhabit silt develop granules of magnetite, with which they sense the direction of the earth's magnetic field. Some amoebae can form a capsule when the environment dries up, within which they can travel through the atmosphere carried on the breeze, only to re-emerge when a suitable environment is encountered. An analogy with human space travel is discernible.
We are taught - indeed it is a core of modern books and TV programmes about human nature - that every aspect of human life is controlled, directly or indirectly, by the workings of the brain. For all its ubiquity, this view, I am certain, is flawed. As you read these words, granulocytes in your throat are very likely identifying a Streptococcus that you recently inhaled. They identify the newcomer as a potentially dangerous intruder, and signal to each other to congregate around the proliferating pathogens and attack them. The bacteria are ingested, digested, and - all being well - they are soon destroyed. Mechanisms like this are at work throughout our lives; every minute, some such event takes place.
The key fact is that these responses of cells lie in the purview of the cells themselves. They reach their own decisions, make their own judgments, initiate action as a team, and regulate their own motility without the intervention of the human brain. We do not control such cells consciously, subconsciously, unconsciously, or in any other fashion. Single cells have the proclivity to conduct their own affairs for the good of the whole body, but are not under its control. Countless complex mechanisms of this sort maintain our lives. The regularisation of blood flow within the capillary bed, the timing and extent of proliferation of cells within organs of high cell turnover - like the liver - and the apoptotic sequence, are all aspects of the complex choreography conducted by the cells themselves.
One can argue further that many of the innate propensities of macroscopic organisms, like ourselves, are mere resonances of single-celled life. This implies an added importance to stem-cell research. Stem cells are undifferentiated, and thus have the propensity to develop into any cell line, and so to become any of the myriad specialised cells in the adult body. They are little more than amoebae, yet have the potential to become fibrocytes or cardiac muscle, retinal rods or neurons. It is the cell (not the human body itself) that primarily regulates our affairs. And it is a single cell for which genes are required, and not an entire organism. In practical terms, there is little to choose between a free-living aquatic amoeba and a young human phagocyte. They look alike, behave similarly, and perform closely-related functions. To the tyro, a sperm cell looks much like any flagellated protozoan. The cilia that propel Paramecium are largely the same as those that line the bronchial tree of mammals. Most protozoa provide models of functioning cells that are the free-living counterparts of specialised cell populations within the human body.
It is in the slime moulds, most often Dictyostelium, that we find a model for differentiation that is amenable to in vitro observation. These organisms live for most of their active lives as independent amoebae. When the sporing phase of the life-cycle is near, key cells within the scattered community send out signals that attract the formerly free-living cells towards a common focus. The separate cells combine to form a slug-like single organism that crawls along through the woodland leaf-mould (or the petri dish) until it stops moving, and begins to extend upwards, forming a tenuous fibril, at the apex of which cells differentiate to form a sporangium and encysted spores. Here we can see the differentiation of identical cells into tissues with specific functions. There is no organism that regulates the change; the cells carry their destiny within them and reach their own decisions based upon timing and orientation within the whole. We can perceive here a clue to the differentiation of cells in our bodies.
The specialisation of implanted stem cells is not regulated by the brain, but is a response to messages from the surrounding cell populations and the discourse between them. During the early embryonic stages, the specialisation coded for in the genes is triggered by the positioning of the cells. At the blastula stage of embryonic development, the scores of cells are functionally identical. Once the neural groove is undergoing invagination, the populations are already adopting specific characteristics and are becoming predestined to perform specified functions. The cells are self-regulated, and do not rely upon pre-existing organismal control to dictate what they should do.
Here is the reason why the human genome will contain fewer genes than everyone anticipated. The conventional view has been that we need genes to make a body - I argue that we need sufficient genes only to make the cell. Once the zygote is established and set on its course, then the cells that descend from it will regulate their own affairs. Most of the essential functions of a living somatic human cell are those required by the seemingly lowly amoeba. All we will need is an extra coding system which allows the colonies that become humans to develop into the most disruptive and invasive species of them all.
On that basis, around 40,000 genes would be ample. Given that,
the single cell will take care of matters. A coordinated community
of these discrete microorganisms has just written this paper.
More remarkable, another - similar but unconnected - cell system
has just finished reading it.