In 2003, just fifty years after Watson and Crick published the first accurate structure of DNA, the completed sequence of the human genome was announced. The working draft had already been judged so momentous in 2000 that a press conference was held and the president of the United States congratulated the two teams that had raced to become the first discoverers. Ironically, a project that took 13 years of work ended in a tie. What would emerge from studying that sequence and comparing it to the genomes of other species would be some of the most spectacular confirmations of evolutionary theory that most biologists could have hoped for.
Human DNA is contained in 23 pairs of chromosomes within the cell nucleus; one pair is the sex chromosomes (the combination you inherit determines your sex) and the other 22 pairs are autosomes, or non-sex chromosomes. Mitochondrial DNA, a unique form of DNA, will not be discussed here. The full set of 23 pairs (46 chromosomes) is known as the diploid number (2n), and the haploid number (n) refers to the 23 chromosomes found in the sex cells. It was the haploid genome that was sequenced in the human genome project, yielding just over 3 billion DNA base pairs: about 3,080 million in females and 3,022 million in males. Several donors were used to assure confidentiality. Later, personal diploid genomes were sequenced for Craig Venter, James Watson, Yang Huanming, and Dan Stoicescu. To the surprise of most, our genome contains only about 21,000 protein-coding genes, and of the 3 billion base pairs these genes represent only a minuscule 1.5% of the total. The amount of information in our DNA is estimated to be about 750 megabytes, since every base pair can be coded by 2 bits and 3 billion base pairs therefore come to 6 billion bits. That would fit easily on a standard DVD.
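The storage estimate is simple arithmetic, and can be checked with a minimal sketch. It assumes a rounded figure of 3 billion base pairs and decimal megabytes (1 MB = 1,000,000 bytes); the variable names are illustrative only.

```python
# Rough storage estimate for a haploid human genome.
# Assumptions: ~3 billion base pairs (rounded), 2 bits per base
# (four possible bases: A, C, G, T), decimal megabytes (10**6 bytes).
BASE_PAIRS = 3_000_000_000
BITS_PER_BASE = 2

total_bits = BASE_PAIRS * BITS_PER_BASE   # 6 billion bits
total_bytes = total_bits // 8             # 8 bits per byte
megabytes = total_bytes / 10**6

print(f"{megabytes:.0f} MB")  # 750 MB
```

Two bits are enough because two binary digits can distinguish exactly four symbols, one for each base.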
Although pseudogenes had been identified several decades before, the human genome project has allowed them to be studied in greater detail. Pseudogenes are DNA segments that no longer code for proteins but show a high degree of similarity to functional genes, having been disabled by disruptive defects. This definition includes the inability of the pseudogene to code for its original protein but does not state that the pseudogene is necessarily functionless. Sequencing their base pairs makes it clear that these genes are molecular relics, disabled by mutations, copying errors, or duplication errors. About 22% of the human genome is composed of ancient repeats, which are basically decayed pseudogenes, and 80% of all human processed pseudogenes (see below) are primate specific. Some are non-functional due to just a single point mutation. Most pseudogenes, however, are disabled duplicates of working genes. Labs in the US, Japan, and Europe have identified about 20,000 human pseudogenes as of this writing, and many scientists suspect there may be more pseudogenes than functioning genes in the human genome. This finding fits well with what evolution would predict: like an old computer that slowly builds up fragments of code over many years, a genome accumulates disabled sequences over time. Some pseudogenes appear to be conserved (shared between species with little change over time) and thus are likely to be functional in some way, since conservation implies that purifying selection is removing changes to them.
The working gene contains several parts. Nucleotide regions that code for proteins are called exons, which are separated by non-coding sequences called introns. At the beginning of a gene is a segment called a promoter that allows the cell to recognize the starting point of the gene. After the entire gene is transcribed, splicing occurs and the introns are cut out, the exons joined, and an edited messenger RNA is produced. This mRNA then travels to a ribosome where it is translated into a protein.
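The splicing step can be pictured with a toy sketch. The transcript and exon coordinates below are made up for illustration (exons in upper case, introns in lower case), and the `splice` helper is hypothetical, not part of any library.

```python
# Toy model of splicing: cut out the introns and join the exons.
# The sequence and coordinates are invented for illustration only.
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments given as (start, end) pairs, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical transcript: three exons separated by two introns.
pre_mrna = "AUGGCC" + "guaaguuucag" + "UUUGGA" + "guacgcag" + "CACUAA"
exons = [(0, 6), (17, 23), (31, 37)]

mature_mrna = splice(pre_mrna, exons)
print(mature_mrna)  # AUGGCCUUUGGACACUAA
```

The edited message contains only the exon sequence, which is why a DNA copy made from mature mRNA lacks introns.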
The 98.5% of human DNA that does not code for proteins includes introns, non-coding RNA genes, enhancer sequences, and regulatory sequences. Besides their role in splicing, introns may also code for microRNAs that help regulate the expression of genes. The term “junk DNA” was coined by Ohno in 1972 to describe the incredible amount of DNA that did not code for proteins and that, if eliminated, did not seem to cause problems in animal models (the mouse, for example). The term is now outdated, since some of these non-coding regions have been found to influence the genome, as discussed above. Even so, nearly 90% of the genome of the puffer fish Takifugu rubripes is non-coding DNA, and it is hardly expected that functions will be found for all of it.
Types of Pseudogenes
There are three basic types of pseudogenes.
1. Processed pseudogenes - after a gene has been transcribed and spliced into mature mRNA, a portion of the mRNA can be reverse transcribed back into DNA and then inserted back into the chromosome in a process called retrotransposition. This is very common, and between 30% and 44% of the human genome consists of these repetitive elements, called retrotransposons. Since they come from mature mRNA, they usually have a poly-A tail, lack introns, and lack promoters. Thus, they are DOA or “dead on arrival” when inserted and are immediately nonfunctional. The 80 human genes that produce ribosomal proteins have given rise to about 2,000 pseudogenes, about 10% of all the known pseudogenes from just this one group.
2. Non-processed pseudogenes - these are duplicated pseudogenes. Gene duplication is also common and is thought to be a major source of new genetic material for evolution. Initially, duplicated genes are functional until they acquire mutations. The disabled copy does not harm an individual since an intact functional gene still exists, and thus these duplicated but non-functional pseudogenes can stay in the genome, slowly building up more mutations. Shared identical or similar duplicated pseudogenes are strong evidence for common ancestry.
3. Disabled (unitary) pseudogenes - different mutations may stop or disable a gene. Like duplicated pseudogenes, these genes also become non-functional, but in this case the genes are not duplicated first. Normally, a disabled gene of this kind might be removed by selection, but it can become fixed in the population due to genetic drift or population bottlenecks. An example is the GLO gene that was disabled in primates, producing the pseudogene GLOP and the inability to produce vitamin C in those species with this pseudogene.
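The distinguishing features of the three types can be summarized as a toy decision rule. This is a deliberate simplification based only on the descriptions above; the function and its parameters are hypothetical, and real pseudogene annotation relies on sequence alignment and manual curation.

```python
# Toy decision rule for the three pseudogene types described above.
# Hypothetical helper for illustration; not a real annotation method.
def classify_pseudogene(has_introns: bool,
                        has_poly_a_tail: bool,
                        functional_copy_exists: bool) -> str:
    # Copied from mature mRNA: introns already removed, poly-A tail present.
    if not has_introns and has_poly_a_tail:
        return "processed"
    # A working copy of the gene still exists elsewhere in the genome.
    if functional_copy_exists:
        return "non-processed (duplicated)"
    # The only copy of the gene was disabled in place.
    return "unitary"

print(classify_pseudogene(False, True, True))    # processed
print(classify_pseudogene(True, False, True))    # non-processed (duplicated)
print(classify_pseudogene(True, False, False))   # unitary
```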
Examples of Unitary Pseudogenes
1. Vitamin C. This vitamin, also known as ascorbic acid, can be synthesized by nearly all mammals, but it is not produced in primates, fruit bats, or guinea pigs because of a mutation in the gene for one of the enzymes needed to synthesize the nutrient. In primates, the defective gene (GLO) has been found, and the identical mutation is shared by multiple primate species. The best, and some would claim the only, reasonable explanation for different species sharing this mutation is common ancestry. It is unreasonable and illogical to propose that the same mutation would occur in the same location, producing the same pseudogene in different species, unless they shared a common ancestor. As expected from evolutionary theory, the guinea pig mutation was found to be different and thus happened independently; since guinea pigs are not closely related to primates, one would not expect the same mutation to have occurred. In addition, once this gene became defective it essentially became fossilized, and mutations accumulated over millions of years. When the mutations are compared across species, the pseudogene GLOP is more similar between humans and chimps than between humans and orangutans, a more distant relative. This is double-layered evidence not just for evolution but for macroevolution between humans and the other great apes.
(See Venema's discussion of the GLO pseudogene.)
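The kind of comparison described for GLOP can be sketched as a percent-identity calculation between aligned sequences. The 20-base fragments below are invented for illustration and are NOT real GLOP sequence; they are built so that the closer relative differs at fewer positions.

```python
# Percent identity between two pre-aligned sequences of equal length.
def percent_identity(a: str, b: str) -> float:
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)

# Made-up fragments for illustration only (not real pseudogene data).
human     = "ACGTTGCAAGCTTAGGCTAA"
chimp     = "ACGTTGCAAGCTTAGGCTAT"  # 1 difference from human
orangutan = "ACGTAGCAAGCTCAGGCTAT"  # 3 differences from human

print(percent_identity(human, chimp))      # 95.0
print(percent_identity(human, orangutan))  # 85.0
```

In real data, the expectation from common ancestry is the same pattern on a larger scale: the longer two lineages have been separated, the more independent mutations their copies of a pseudogene have accumulated.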