This article discusses: 1. Meaning of the Genetic Code 2. Properties of the Genetic Code 3. Nature of the Genetic Code 4. Universality of the Genetic Code 5. Deciphering the Genetic Code.
Meaning of the Genetic Code:
Although the concept of the gene as a unit of heredity is about a century old (the term was coined in 1909), its role in the life of a cell was first clearly formulated in the ‘one gene-one enzyme’ hypothesis proposed by Beadle and Tatum in 1941.
Elucidation of the nature of the genetic material, discovery of the structure of DNA, the advances in protein chemistry and, finally, the discovery of messenger-RNA paved the way for understanding the relation between genes and enzymes. Both DNA and m-RNA consist of only 4 different nucleotides, represented by A, T, G, C in DNA and by A, U, G, C in m-RNA.
The genetic information is, therefore, coded in a ‘language’ whose alphabet has only 4 letters. On the other hand, most proteins are made from 20 different amino acids and the sequence of these amino acids in a polypeptide chain is specific for each protein.
Any change in the amino acid sequence generally proves detrimental. It is imperative, therefore, that the specific sequence of a protein be strictly preserved. This is assured if the message of the amino acid sequence is coded in the nucleotide sequence of DNA.
In other words, there must be a unit of nucleotide sequence for each of the 20 protein amino acids. Such a unit functions as a code for a particular amino acid, and for 20 amino acids, there must be at least 20 such units or codes. From a purely mathematical consideration, it is apparent that such a unit must contain a sequence of at least 3 nucleotides, because fewer than 3 nucleotides cannot give 20 combinations: single nucleotides give only 4 units and pairs give only 4² = 16. When the 4 different nucleotides are taken 3 at a time, 4³ = 64 different combinations are possible.
These combinations are more than sufficient to code for 20 protein amino acids. Each unit combination of three successive nucleotides forms a codon for an amino acid (triplet codon). It was later proved experimentally that each of the 64 triplet codons is used either for specifying an amino acid or as a signal for stopping amino acid incorporation in the polypeptide chain.
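The counting argument above can be checked with a short sketch. The function names are hypothetical and chosen only for illustration; the only inputs are the 4 nucleotides and the 20 amino acids mentioned in the text.

```python
from itertools import product

BASES = "ACGU"       # the four m-RNA nucleotides
AMINO_ACIDS = 20     # number of protein amino acids to be encoded


def codeword_count(length: int) -> int:
    """Number of distinct words of the given length over a 4-letter alphabet."""
    return len(BASES) ** length


def minimum_codon_length(needed: int = AMINO_ACIDS) -> int:
    """Smallest word length whose word count covers every amino acid."""
    length = 1
    while codeword_count(length) < needed:
        length += 1
    return length


# Enumerate the triplets explicitly to confirm the count obtained by the formula.
TRIPLETS = ["".join(p) for p in product(BASES, repeat=3)]

for n in (1, 2, 3):
    print(n, "nucleotide(s) per unit:", codeword_count(n), "combinations")
print("minimum codon length:", minimum_codon_length())
```

Units of one or two nucleotides fall short of 20, so three is the smallest workable codon length.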
Most amino acids (except methionine and tryptophan) have more than one codon, but no single codon specifies more than one amino acid. The triplet codons specifying the protein amino acids and those for stopping amino acid incorporation (termination codons) are shown in Table 9.3.
It may be seen from the table that most amino acids have more than one codon. This is called the degeneracy of the genetic code. It may also be seen that the degree of degeneracy is not uniform and that, in most cases, it involves the third base of the triplet.
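The degree of degeneracy can be counted directly from the standard codon assignments. The sketch below is illustrative only: it builds the standard table from a compact 64-letter string of one-letter amino acid symbols ("*" marks termination), which is a convention chosen for this example and not the layout of Table 9.3.

```python
from collections import Counter

BASES = "UCAG"
# One-letter amino acid symbols for the 64 codons, ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Number of codons assigned to each amino acid (and to termination).
degeneracy = Counter(CODON_TABLE.values())

stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
single_codon = sorted(aa for aa, n in degeneracy.items() if n == 1)

for aa, n in sorted(degeneracy.items(), key=lambda item: item[1]):
    print(aa, n)
print("termination codons:", stop_codons)
print("amino acids with a single codon:", single_codon)
```

Only methionine (M) and tryptophan (W) have a single codon; the remaining amino acids have between 2 and 6.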
Properties of the Genetic Code:
The genetic code has a number of characteristic properties:
(i) Triplet nature of the code. The codons for different amino acids, as well as those for chain termination, always consist of three successive nucleotides of DNA or m-RNA. The DNA and m-RNA codons are mutually complementary. For example, the DNA triplet CGA of the template strand is complementary, base by base, to the m-RNA codon GCU (alanine).
(ii) Unambiguous nature of the code. One particular codon never codes for more than one amino acid.
(iii) Degeneracy of the genetic code. Most of the amino acids — except methionine and tryptophan — are coded by more than one codon. The number varies between 2 and 6.
(iv) The genetic code is non-overlapping and unpunctuated. This means that a sequence of nucleotides in m-RNA is read in groups of three without overlapping, and also without any gaps between successive triplets.
(v) Universality of the genetic code. This means that the same codons specify the same amino acids in all organisms, from bacteria to plants and animals, and even in viruses. Although viruses have no protein-synthesizing machinery of their own, they are able to direct protein synthesis in the host cells. Some exceptions to the universality of the genetic code were discovered later.
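Property (i) states that DNA and m-RNA codons are mutually complementary. A minimal sketch of the pairing rule follows, assuming that the DNA triplet is taken from the template strand and compared base by base with the m-RNA codon (strand polarity is ignored for simplicity). The function name is hypothetical.

```python
# Base-pairing partners between a DNA template strand and the m-RNA made from it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_template(template: str) -> str:
    """Pair each base of a DNA template triplet with its m-RNA partner, position by position."""
    return "".join(DNA_TO_RNA[base] for base in template)


# The template triplet CGA pairs with the alanine codon GCU.
print(transcribe_template("CGA"))
```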
Nature of the Genetic Code:
(i) Triplet Nature of the Genetic Code:
The genetic message coded in an m-RNA molecule is translated into the amino acid sequence of a polypeptide. During this process, sets of three m-RNA nucleotides are read successively, starting from an initiator codon, AUG, which codes for methionine, until a termination codon arrives at the ribosome. Since a termination codon does not code for any amino acid, polypeptide chain synthesis stops. The termination codons are also known as nonsense codons.
The whole sequence of m-RNA, starting from the initiator codon up to the triplet preceding the termination codon, is known as the reading frame. Since the reading frame is a continuous sequence of nucleotides, addition or deletion of a single nucleotide changes all the triplets from that point downstream.
Such an event constitutes a type of mutation known as a frame-shift mutation, which can be induced artificially by treatment with a mutagenic dye like acridine. Frame-shift mutations in the coliphage T4 provide strong evidence in support of the triplet nature of codons.
Acridine induces two types of changes in DNA. It causes either the addition or the deletion of a single nucleotide of DNA.
The results of such changes in m-RNA and the amino acid sequence of a polypeptide are shown in Fig. 9.38:
It can be seen from Fig. 9.38 that insertion (addition) of a single nucleotide, viz. C (cytidylic acid), shifts the reading by one nucleotide towards the 3′-end, causing a total derangement of the reading frame. As a result, the codons of all the amino acids from the point of insertion change. A similar effect occurs if a nucleotide, A (adenylic acid), is removed (deleted). If, due to such a shift, a termination codon appears in the frame, polypeptide chain elongation stops.
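The effect of a single insertion or deletion on the reading frame can be illustrated with a short sketch. The m-RNA sequence below is hypothetical and is not the one drawn in Fig. 9.38; the codon table is the standard one, written with one-letter amino acid symbols ("*" marks termination).

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Read successive triplets from the first base until a termination codon appears."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical message: AUG GCA UCG AAC GUU CAG UAA
WILD_TYPE = "AUGGCAUCGAACGUUCAGUAA"
INSERTION = WILD_TYPE[:3] + "C" + WILD_TYPE[3:]   # one C added after the initiator codon
DELETION = WILD_TYPE[:5] + WILD_TYPE[6:]          # the A of the second codon removed

print("wild type:", translate(WILD_TYPE))
print("insertion:", translate(INSERTION))
print("deletion: ", translate(DELETION))
```

In both mutants every codon downstream of the change is read differently, so the amino acid sequence diverges from that point onwards.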
An important piece of evidence supporting the triplet code came from the study of the rII gene of bacteriophage T4. It was observed that a single frame-shift mutation produced a defective gene product, but when two such mutations with opposite effects, i.e. one addition and one deletion, occurred close to each other, the effect on the protein was less deleterious.
This was so because the original reading frame was restored downstream (5′ → 3′) from the point of the second change. As a result, the amino acid sequence of the protein remained unaltered beyond that point.
When three changes of the same type, i.e. all insertions or all deletions, occurred close to each other, the original reading frame was likewise restored beyond the third change, and the amino acid sequence of the protein beyond it remained unchanged.
These observations strongly supported the view that the codons are triplets of nucleotides and that the reading frame is non-overlapping as well as unpunctuated. The effects of three insertions or deletions close to each other are shown in Fig. 9.39, in which a regular sequence of four hypothetical bases, A, B, C and D, has been considered for simplicity.
The insertion or deletion of three nucleotides in the DNA changed only a few amino acids in the protein, which therefore remained almost normal. In contrast, a single insertion or deletion completely deranged the reading frame, and the protein was highly defective. This experimental evidence provided, for the first time, strong support for the triplet nature of the codons.
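The logic of these experiments can be sketched with the same regular sequence of hypothetical bases A, B, C and D. The positions of the changes and the inserted base X are assumptions made for illustration and are not taken from Fig. 9.39. Each base is tagged with its original position so that a triplet counts as unaltered only if it coincides exactly with an original triplet.

```python
def wild_type(repeats=6):
    """Regular sequence ABCDABCD..., each base tagged with its original position."""
    letters = "ABCD" * repeats
    return list(zip(letters, range(len(letters))))


def insert(seq, pos, base="X"):
    """Insert one base before index pos; it has no original position (None)."""
    return seq[:pos] + [(base, None)] + seq[pos:]


def delete(seq, pos):
    """Remove the base at index pos."""
    return seq[:pos] + seq[pos + 1:]


def read_frame(seq):
    """Read non-overlapping triplets from the 5' end and flag those that match an original triplet."""
    result = []
    for i in range(0, len(seq) - 2, 3):
        triplet = seq[i:i + 3]
        letters = "".join(base for base, _ in triplet)
        positions = [p for _, p in triplet]
        unaltered = (
            None not in positions
            and positions[0] % 3 == 0
            and positions[1] == positions[0] + 1
            and positions[2] == positions[0] + 2
        )
        result.append((letters, unaltered))
    return result


def altered_triplets(seq):
    """Number of triplets that differ from the original reading frame."""
    return sum(1 for _, unaltered in read_frame(seq) if not unaltered)


WILD = wild_type()
SINGLE = insert(WILD, 4)                            # one insertion
PLUS_MINUS = delete(insert(WILD, 4), 10)            # insertion followed by a nearby deletion
TRIPLE = insert(insert(insert(WILD, 4), 8), 12)     # three insertions close together

for name, seq in [("single", SINGLE), ("plus/minus", PLUS_MINUS), ("triple", TRIPLE)]:
    print(name, [letters for letters, _ in read_frame(seq)], "altered:", altered_triplets(seq))
```

With a single insertion every downstream triplet is altered, whereas the plus/minus pair and the triple insertion alter only the few triplets lying between the first and the last change.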
(ii) Non-Overlapping Nature of the Code:
The non-overlapping nature of the genetic code means that the reading frame is read in sets of three consecutive nucleotides and that no nucleotide is shared between consecutive triplets. For example, a non-overlapping code reads the frame ABCDABCDA as ABC, DAB and CDA. Had the code been overlapping by one nucleotide, it would have been read as ABC, CDA, ABC, CDA.
In that case, a single change in the nucleotide sequence could change more than one amino acid, because the same changed nucleotide would be used in more than one codon. Experimental determination of the amino acid sequences of a normal (wild-type) protein and a mutant protein shows that a single base substitution changes only one amino acid. Thus, it is proved that the codons are non-overlapping.
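The two ways of reading the frame ABCDABCDA can be compared with a small sketch. The function names are hypothetical; a step of 3 gives the non-overlapping reading and a step of 2 gives a reading in which consecutive triplets share one nucleotide.

```python
def read_codons(seq: str, step: int) -> list:
    """Split seq into triplets whose start positions advance by `step` nucleotides."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def codons_containing(seq_len: int, position: int, step: int) -> list:
    """Indices of the triplets that include the nucleotide at `position`."""
    starts = range(0, seq_len - 2, step)
    return [n for n, start in enumerate(starts) if start <= position < start + 3]


FRAME = "ABCDABCDA"
print("non-overlapping:", read_codons(FRAME, 3))
print("overlapping by one:", read_codons(FRAME, 2))

# A change in the third nucleotide (index 2) touches one triplet or two, depending on the code.
print(codons_containing(len(FRAME), 2, 3))
print(codons_containing(len(FRAME), 2, 2))
```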
(iii) Degenerate Nature of the Code:
Another important feature of the genetic code is its degeneracy. Had the genetic code been non-degenerate, each amino acid would have been coded by a single codon. In that case, the chance that a mutation alters a protein would have been much greater than is observed in practice. Since most amino acids have more than one codon (i.e. the code is degenerate), a mutant codon may replace the original one without causing a change in the amino acid.
Thus, even if a mutation occurs, the organism may still produce a normal protein. Degeneracy, therefore, should be considered a positive attribute for the stability of the genetic make-up of an organism. It is a strength of the genetic code and not a weakness.
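The buffering effect of degeneracy can be estimated by counting how many single-base substitutions leave the amino acid unchanged. The sketch below uses the standard codon table with one-letter amino acid symbols and considers only the 61 amino acid codons; treating every substitution as equally likely is a simplifying assumption.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def silent_fraction(position: int) -> float:
    """Fraction of substitutions at a codon position (0, 1 or 2) that keep the same amino acid."""
    silent = total = 0
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid == "*":
            continue  # termination codons are left out of the count
        for base in BASES:
            if base == codon[position]:
                continue
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            silent += CODON_TABLE[mutant] == amino_acid
    return silent / total


for position in range(3):
    print("codon position", position + 1, "silent fraction:", round(silent_fraction(position), 3))
```

Under these assumptions most substitutions in the third position are silent, a few in the first position are, and none in the second position is.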
A notable feature of degeneracy is that, in most codons, the third nucleotide at the 3′-end of the triplet appears to be of less importance than the first two. For example, alanine has four codons, GCA, GCC, GCU and GCG, and threonine also has four codons, ACA, ACC, ACG and ACU. The first two nucleotides are fixed, while the third position can be filled by any one of the four nucleotides. During protein synthesis, the m-RNA codons form base-pairs with the t-RNA anticodons.
The degeneracy of the m-RNA codons assumes a special significance in this perspective. Crick proposed the wobble hypothesis to explain the relationship between codons and anticodons. According to this hypothesis, the third base of the degenerate codons can form non-standard base pairing with a base in the anticodon. Standard base pairing is between A and U, and between G and C. But anticodons often contain some unusual nucleosides, like inosine, pseudouridine etc.
Inosinic acid is a purine ribonucleotide which has great flexibility in pairing relationship. Normally, a purine pairs only with a pyrimidine, but inosinic acid can base-pair not only with pyrimidines like cytidylic acid or uridylic acid, but also with adenylic acid, which is a purine. According to the wobble hypothesis, the third base of a codon ‘wobbles’, while the first and the second bases form stable base-pairing with the anticodon bases.
The flexibility in base-pairing of the wobbling third base gives an opportunity to base-pair correctly with the anticodon, because every t-RNA having a specific anticodon sequence can select one, two or three codons, depending on the first base of the anticodon, i.e. the 5′-end base of the anticodon triplet. If the first base is an unusual one, like inosine (I), the anticodon can pair with a codon having any of the three bases, U, C and A in the third position.
This can be represented schematically as follows:
Codon (m-RNA): 5′ X-Y-(U, C or A) 3′
Anticodon (t-RNA): 3′ X′-Y′-I 5′
X and Y are any two m-RNA bases. X′ and Y′ are the complementary t-RNA bases. I is inosine. It is seen, therefore, that the first two bases of the codon and the last two bases of the anticodon form standard base pairs, A-U and G-C, represented as X-X′ and Y-Y′, while the third base of the codon (C, A or U) and the first base of the anticodon may form, if required, a non-standard base pair with inosine (I). This provides for correct sequencing of amino acids in a polypeptide chain in spite of the degeneracy of the codons.
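The pairing scheme can be expressed as a small lookup. The text above gives only the pairing of inosine with U, C and A; the remaining entries of the wobble table are Crick's wobble rules as commonly stated and are included here as an assumption. The function name is hypothetical, and anticodons are written 5′ to 3′.

```python
# Standard pairs for the first two codon positions.
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First (5') base of the anticodon -> third (3') bases of the codon it can pair with.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "UC", "I": "UCA"}


def codons_recognized(anticodon: str) -> list:
    """Codons (5'->3') that an anticodon (5'->3') can pair with under the wobble rules."""
    first, second, third = anticodon
    # Pairing is antiparallel: anticodon base 3 faces codon base 1, base 2 faces base 2.
    x = PAIR[third]
    y = PAIR[second]
    return [x + y + z for z in WOBBLE[first]]


print(codons_recognized("IGC"))   # inosine in the wobble position: three alanine codons
print(codons_recognized("GAA"))   # two codons
print(codons_recognized("CAU"))   # one codon
```

The three calls show a t-RNA selecting three, two and one codons respectively, depending on the 5′-end base of its anticodon.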
Universality of the Genetic Code:
Analysis of the sequences of nucleotides of m-RNA and of amino acids of proteins of different organisms has adduced evidence in favour of universality of the genetic code which means that the same codons stand for the same amino acids in all organisms irrespective of their taxonomic position. Although this is largely true, some exceptions have been discovered which prove that the genetic code is not absolutely universal.
The most notable exceptions are the mitochondrial codons. Mitochondria have their own DNA, which is transcribed and translated to produce proteins. The mitochondrial codons that differ from the universal codons (shown in Table 9.3) are presented in Table 9.4.
Apart from those mentioned in Table 9.4, in the mitochondria of maize (Zea mays) the codon CGG codes for tryptophan, while this codon stands for arginine in the universal code. Also, it should be noted that tryptophan is coded by UGA in the mitochondria of mammals, Drosophila and yeast. Thus, mitochondrial codons are not uniform in all organisms. Maize mitochondria use the codons AGA and AGG for arginine, like the yeast mitochondria.
In more recent times, deviations from the universal code have also been discovered in some organisms. For example, in Mycoplasma capricolum, tryptophan is coded by the codon UGA, as in mitochondria, whereas in the universal genetic code UGA is one of the termination codons. In the eukaryotic protozoan Tetrahymena, UAA codes for glutamine and not for termination. Probably, more such discrepancies will be revealed in future, challenging the concept of universality of the genetic code.
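The consequence of such reassignments can be shown by translating the same message with the universal table and with a modified one. The sketch below includes only the reassignments named in the text (UGA for tryptophan, UAA for glutamine); Table 9.4 is not reproduced, and the two m-RNA sequences are hypothetical.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
UNIVERSAL = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Codon reassignments mentioned in the text, applied on top of the universal table.
VARIANTS = {
    "mammalian mitochondria": {"UGA": "W"},
    "Mycoplasma capricolum": {"UGA": "W"},
    "Tetrahymena": {"UAA": "Q"},
}


def translate(mrna: str, overrides=None) -> str:
    """Translate from the first base until a termination codon, using optional reassignments."""
    table = {**UNIVERSAL, **(overrides or {})}
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


MESSAGE_1 = "AUGUGAGCUUAG"      # AUG UGA GCU UAG
MESSAGE_2 = "AUGGCUUAAUGGUAG"   # AUG GCU UAA UGG UAG

print(translate(MESSAGE_1), translate(MESSAGE_1, VARIANTS["mammalian mitochondria"]))
print(translate(MESSAGE_2), translate(MESSAGE_2, VARIANTS["Tetrahymena"]))
```

A codon that terminates the chain under the universal code is read through as an amino acid under the variant code, so the same message gives a longer polypeptide.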
Deciphering the Genetic Code:
After confirmation that the genetic code consists of triplets of nucleotides, the next question to be answered was: which triplets code for the twenty amino acids that are normally present in proteins? If each nucleotide or base is considered as a letter, and each codon as a three-lettered word, then the composition of the code-words has to be deciphered. This is also known as breaking the genetic code.
The first break-through was made by the experiments conducted by Nirenberg and Matthaei in 1961 using an in vitro protein synthesizing system. In a reaction mixture, they added E. coli ribosomes, t-RNA molecules and enzymes, all present in the cell-free extract of E. coli, and the 20 protein amino acids together with an artificial m-RNA.
This m-RNA was synthesized using an enzyme, polynucleotide phosphorylase, and uridine 5′-diphosphate as its substrate. The enzyme formed a polyribonucleotide consisting of a chain of only uridylic acid (poly-U) because uridine diphosphate was used as substrate. The reaction mixture containing the above-mentioned ingredients was distributed in a series of test tubes, each tube containing 19 unlabelled protein amino acids and one labelled with ¹⁴C (radioactive).
After incubation, the acid-insoluble polypeptide was isolated and analysed. It was observed that the only polypeptide synthesized was polyphenylalanine. No other amino acid was incorporated. It was concluded that, by using poly-U as messenger, only polyphenylalanine can be produced.
In other words, the triplet code for phenylalanine was deciphered as UUU. Similar experiments using artificial messengers like poly-C and poly-A revealed that the code words for proline and lysine are CCC and AAA, respectively.
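The design of the experiment, one labelled amino acid per tube, can be mimicked with a short sketch that uses the now-known standard codon table. The function names and the messenger length of 30 nucleotides are assumptions made for illustration.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
AMINO_ACIDS = sorted(set(AMINO) - {"*"})   # the 20 protein amino acids, one-letter symbols


def translate(mrna: str) -> str:
    """Translate successive triplets; an artificial messenger needs no initiator codon here."""
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def radioactive_tubes(mrna: str) -> list:
    """Labelled amino acids that would end up in the acid-insoluble polypeptide."""
    polypeptide = translate(mrna)
    return [aa for aa in AMINO_ACIDS if aa in polypeptide]


for base in "UCA":
    messenger = base * 30
    print("poly-" + base, "->", translate(messenger), radioactive_tubes(messenger))
```

Only the tube containing labelled phenylalanine (F) gives a radioactive product with poly-U, and likewise proline (P) with poly-C and lysine (K) with poly-A.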
Further progress in deciphering the codons was made by using copolymers of two nucleotides, like U and G, prepared with the help of the same enzyme, polynucleotide phosphorylase, and the substrates uridine 5′-diphosphate and guanosine 5′-diphosphate. As this enzyme incorporates the nucleotides at random into a polynucleotide, the sequence of nucleotides is also random.
However, the proportion of each nucleotide in the resulting polynucleotide can be determined by analysis. Adopting statistical methods, the frequency of probable codons in the copolymers could be calculated, although the sequence of the codons remained uncertain. Using such copolymers of U and G, polypeptides were synthesized and, from the analysis of the amino acids incorporated, the probable codons were deciphered.
A copolymer of U and G could form the following probable triplets: UUU, UUG, UGU, GUU, UGG, GUG, GGU and GGG. The following amino acids were found to be incorporated using a copolymer of U and G: phenylalanine, cysteine, valine, glycine and tryptophan. So, the codons for these five amino acids must be among the eight probable triplets. By using copolymers of other nucleotides, probable codons of other amino acids were obtained, but the precise sequences of the codons remained uncertain.
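The statistical calculation can be sketched as follows. If the nucleotides are incorporated at random and independently, the expected frequency of a triplet is the product of the proportions of its three bases. The 3:1 ratio of U to G used below is a hypothetical composition chosen for illustration, not a value from the experiments.

```python
from fractions import Fraction
from itertools import product


def triplet_frequencies(composition: dict) -> dict:
    """Expected frequency of every triplet in a random copolymer of the given base composition."""
    frequencies = {}
    for triplet in product(composition, repeat=3):
        probability = Fraction(1)
        for base in triplet:
            probability *= composition[base]
        frequencies["".join(triplet)] = probability
    return frequencies


# Hypothetical copolymer containing U and G in the proportion 3:1.
UG = {"U": Fraction(3, 4), "G": Fraction(1, 4)}
FREQUENCIES = triplet_frequencies(UG)

for triplet, frequency in sorted(FREQUENCIES.items(), key=lambda item: -item[1]):
    print(triplet, frequency)
```

Triplets with the same base composition, such as UUG, UGU and GUU, have the same expected frequency, which is why this method gives the composition of a codon but not the order of its bases.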
The most important step in understanding the code-words was made possible through a technique developed by Khorana and his associates. They were able to synthesise chemically trinucleotides of known sequence in the 5′ → 3′ direction. Such trinucleotides could direct the binding of specific amino acids, carried by their t-RNAs, to ribosomes in a reaction mixture.
This meant that trinucleotides could also act as messengers for individual amino acids. With the help of this technique, the codons for all the amino acids were deciphered within a short time and with certainty. It was found that most amino acids were coded by more than one codon and all the 64 possible codons could be accounted for.
It was revealed that three of these functioned as termination codons signaling stoppage of amino acid incorporation into the growing polypeptide chain. As these codons do not bind any amino acid, they were deciphered employing synthetic polymers of known sequence.
For example, a repeating polymer of the tetranucleotide GUAA, of known sequence, was found to yield two small peptides. These were val-ser-lys and ser-lys. As the codons for these amino acids had already been deciphered, viz. GUA for val, AGU for ser and AAG for lys, it was deduced that the remaining triplet, UAA, acted as one of the termination codons (Fig. 9.40).
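The reading of the repeating GUAA polymer can be traced with a short sketch. The polymer length of six repeats and the function name are assumptions made for illustration; one-letter symbols are used, with V for val, S for ser and K for lys.

```python
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate_from(mrna: str, start: int) -> str:
    """Translate successive triplets from position `start` until a termination codon."""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


POLYMER = "GUAA" * 6

# Twelve nucleotides (three repeats) contain every triplet the polymer can present.
UNIT_CODONS = [POLYMER[i:i + 3] for i in range(0, 12, 3)]
TERMINATORS = [codon for codon in UNIT_CODONS if CODON_TABLE[codon] == "*"]

print("triplets:", UNIT_CODONS)
print("starting at GUA:", translate_from(POLYMER, 0))
print("starting at AGU:", translate_from(POLYMER, 3))
print("terminator:", TERMINATORS)
```

Whichever triplet translation starts from, the chain ends at the same recurring triplet, which is why only short peptides are formed.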