Quick Notes on Genetic Code:- 1. Introduction to Genetic Code 2. Properties of Genetic Code 3. Exceptions 4. Decipherence.
Introduction to Genetic Code:
Living things depend on proteins for their existence, since proteins include the enzymes necessary for all chemical reactions. The structural information required to specify the synthesis of any given protein resides in the DNA molecule, which has the spatial configuration of a double helix proposed by Watson and Crick (1953).
The linear sequence of bases in DNA constitutes an alphabet (hereditary lettering of 4 bases – A, T, G, C) which ‘codes’ for another linear structure, a protein, written in another alphabet of 20 amino acids.
The actual transfer of information is, however, indirect. DNA is a ‘template’ for the formation of RNAs; messenger RNA associates with ribosomes and in turn acts as the template for protein synthesis.
All properties of protein, including its secondary and tertiary structure, are ultimately determined by chromosomal DNA, and all biological properties are in turn determined by the amino acid sequence of the proteins within an organism, through protein structure and enzyme activity.
The term ‘coding’ implies the relationship between DNA and protein. By coding, the hereditary lettering carried in the four-letter alphabet of DNA is ultimately converted into the protein language, composed of a twenty-letter alphabet of amino acids.
Co-linearity of Gene and Polypeptide:
In 1958, Crick proposed the hypothesis that DNA determines the sequence of amino acids in a polypeptide. Fundamental to this relationship is that both are linear in structure: in one case a sequence of nucleotides, in the other a sequence of amino acids.
By comparing the nucleotide sequence of a gene with the amino acid sequence of a protein, we can determine directly whether the gene and the protein are co-linear or not. A gene of 3N base pairs is required to code for a protein of N amino acids.
The co-linearity of gene and protein was originally investigated in the tryptophan synthetase gene of E. coli by Yanofsky and his co-workers by utilizing a polypeptide chain A of tryptophan synthetase enzyme. It has been observed that different mutations in the DNA sequence were present in the same order as is observed in the alterations noticed in corresponding amino acid sequence in polypeptide chain A.
The recombination distances are relatively similar to the actual distances in the protein, so in this case there is much similarity between the recombination map and the physical map.
In eukaryotic split genes, the introns are not translated into amino acids in the protein. This demonstrates that co-linearity between the base sequence of a gene and the amino acid sequence of its protein may be interrupted but is not violated.
Properties of Genetic Code:
Code is Triplet:
Research has been carried out by Ochoa, Kornberg, Nirenberg, Brenner, Crick and others to detect the coding ratio, i.e., the number of units in one system required to specify one unit in the other system. Certainly no one-to-one correspondence can be observed between nucleotides and amino acids.
If each kind of nucleotide specified a single amino acid, only proteins consisting of four amino acids could be constructed. Similarly, the correspondence of an amino acid to two nucleotides would give a larger number of possibilities but still not enough, only 4² = 16.
If a three-digit code is employed, however, a total of 4³ = 64 kinds of units or codons are established (Fig. 15.1), more than enough to encode twenty amino acids. The surplus forty-four triplets were initially thought to be nonsense codons and the remaining twenty sense codons.
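The counting argument above can be checked with a short Python sketch. The function name `code_words` is an illustrative assumption and is not part of the original text.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

def code_words(word_length):
    """List every possible code word of the given length."""
    return ["".join(p) for p in product(BASES, repeat=word_length)]

# Compare the number of possible words with the 20 amino acids to be coded.
for n in (1, 2, 3):
    total = len(code_words(n))
    print(f"word length {n}: {total} words, enough for 20 amino acids: {total >= 20}")
```

Only a word length of three or more gives at least 20 distinct words, which is the reasoning behind the triplet code.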
However, later studies have shown that several triplets can code for one amino acid. As such the number of nonsense triplets is very few. Some of the nonsense triplets might also be used as ‘punctuations’, designating the end of a chemical message.
Critical information on the nature of coding units (i.e., the code is in triplets) was gathered from studies of the mutagenic effect on polynucleotide chain (DNA).
Application of a mutagen leads to the deletion or duplication of one nucleotide pair or several adjacent pairs. Addition or deletion of one or two bases often causes a drastic effect, and the organism ultimately dies.
The addition or deletion of three bases together, on the other hand, though causing changes in the behaviour of the organism, may not necessarily induce a lethal effect, and the organism may survive in an altered, mutated form.
(i) The direct evidence supporting the triplet code concept was provided by Crick et al. (1961), based on their experiments on a virus, T4 bacteriophage (Fig. 15.2). They found that treatment with a chemical called proflavin either added or removed a base in its DNA molecule, thus damaging the virus and resulting in an altered or mutant form of the virus.
An addition followed by a deletion of base close by resulted in the restoration of the original virus. This implied that the normal sequences of bases in the DNA molecule had been restored by the second change.
A deletion or insertion completely upsets the reading frame as may be seen from the example of the base sequence GTCCAGACC. Normally the sequence will be read as GTC, CAG, ACC, …, but with the insertion of a new base T between the first and second nucleotides, it yields the sequence GTTCCAGACC … and leads to reading in the groups GTT, CCA, GAC, C …, and specifies wrong amino acids.
A similar consequence results from a deletion. Crossing between an addition and deletion will restore the correct reading frame of the sequence except in the region between them. It is easy to see that the combinations of two mutants in the form of two insertions or two deletions will still produce a misplaced reading frame.
Crick et al. (1961) found that three additions or three deletions of adjacent nucleotides resulted in the production of the normal virus, due to the restoration of the normal reading frame in the DNA.
Thus experiments demonstrating that a combination of three insertions or deletions produced a bacteriophage of perfectly normal appearance and that recombinants containing insertions or deletions in numbers not multiples of three produce only nonfunctional or wrong protein, provided strong evidence that the genetic code operates as a triplet code or that one triplet of nucleotides constitutes a codon.
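The frameshift reasoning can be sketched in Python using the GTCCAGACC example from the text. The helper names and the position chosen for the compensating deletion are illustrative assumptions.

```python
def read_codons(seq):
    """Split a sequence into consecutive triplets starting at the first base.
    A trailing group of one or two bases is kept as an incomplete codon."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]

def insert_base(seq, index, base):
    """Return seq with `base` inserted before the 0-based position `index`."""
    return seq[:index] + base + seq[index:]

def delete_base(seq, index):
    """Return seq with the base at the 0-based position `index` removed."""
    return seq[:index] + seq[index + 1:]

original = "GTCCAGACC"
plus = insert_base(original, 1, "T")   # insertion between the 1st and 2nd base
restored = delete_base(plus, 4)        # hypothetical nearby deletion

print(read_codons(original))   # normal reading frame
print(read_codons(plus))       # frame shifted from the insertion onwards
print(read_codons(restored))   # frame restored beyond the deletion
```

In this sketch only the codon lying between the insertion and the deletion stays altered; every codon after the deletion is read in the original frame again.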
(ii) The triplet nature of the code was further confirmed through the research work of Nirenberg and Leder (1964), who found that although little binding of tRNA was possible in the presence of dinucleotide messengers, it occurred preferentially with trinucleotides.
They were able to stimulate binding of different amino acids through different sequences of the same three bases, once again giving credence to the existence of a triplet code.
Code is Non-Overlapping:
In nature, there is always a tendency towards economy. As suggested by Gamow, in his ‘overlapping’ coding hypothesis, the code is in the form of triplets, but not arranged in a straight chain. It is overlapping in the regions where a particular nucleotide serves in more than one coding unit.
Gamow suggested overlapping code on the basis of two characteristics:
(a) The distance between two bases in a DNA molecule is 3.4 Å;
(b) In a protein molecule also, the distance between two adjacent amino acids is 3.4 Å.
This can be explained in cases of mono-coding as well as overlapping coding, but it is quite improbable in a straight-chain triplet coding. In the non-overlapping code six nucleotides would code for two amino acids, while in the case of an overlapping code they would code for up to four (Fig. 15.3).
In the non-overlapping code each letter is read only once, while in the overlapping code it would be read three times, each time as a part of a different word. A mutational change in one letter would affect only one word in the non-overlapping code, while it would affect three words in the overlapping code.
There is evidence for the non-overlapping nature of the genetic code.
(i) The experimental evidence of Crick et al. (1961) argued compellingly against an overlapping code and substantiated the arguments provided by earlier scientists in favour of a non-overlapping code. They started with a messenger of known triplet sequence and used this to synthesize a particular protein.
On adding a nucleotide to it, the particular protein could no longer be synthesized. The result remained unaltered even with the addition of a second nucleotide. Proper function was restored, however, on introduction of a third nucleotide.
A given nucleotide sequence ACTACTACTACT bears the codons ACT, ACT, ACT, ACT under the non-overlapping coding system. An insertion of a nucleotide G between the first C and the first T will, under such a system, change the nucleotide sequence to ACGTACTACTACT and the codon sequence to ACG, TAC, TAC, TAC, T.
The synthesis of the original protein will not take place after the addition of a nucleotide. Instead, the altered codon sequence will produce an altogether different protein. A second insertion of another nucleotide G between the first C and the first G of the previously altered nucleotide chain results in a new nucleotide sequence ACGGTACTACTACT and the corresponding codon sequence ACG, GTA, CTA, CTA, CT.
The particular protein still cannot be synthesized. A third nucleotide addition, an insertion of nucleotide G at the beginning of the nucleotide chain available after the last step, causes it to read as GACGGTACTACTACT, and the corresponding codon sequence is GAC, GGT, ACT, ACT, ACT.
The third addition has restored most of the original triplet sequence. The deletion of bases from DNA has the same effect as that of addition. The third deletion will, however, restore most of the reading frame and allow a sequence of amino acids differing only slightly from the original one. This suggests that the code is non-overlapping.
(ii) Another evidence supporting the existence of a non-overlapping code is provided by the effect of single-site mutations.
A single mutation in an overlapping coding system would invariably affect two or more adjacent amino acids in the polypeptide chain. A mutation from the first G to C in the nucleotide sequence ATGATGATG will cause a change in one codon only in the case of a non-overlapping code. The original codon sequence ATG, ATG, ATG will become ATC, ATG, ATG after the single mutation.
However, if the code were an overlapping one, the original codon sequence ATG, TGA, GAT, ATG, TGA, GAT, ATG would change into the codon sequence ATC, TCA, CAT, ATG, TGA, GAT, ATG. As a result of the single mutation, three changes take place in the codon sequence when the overlapping code is in operation.
Only one change would be expected in case of a non-overlapping code. Since only single amino acid changes have been observed in the experimental studies of single-site mutation, this evidence reinforces the existence of non-overlapping code.
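The comparison can be reproduced with a minimal Python sketch, assuming a fully overlapping code in which the reading window advances one base at a time. The function names are illustrative.

```python
def non_overlapping(seq):
    """Read complete triplets, advancing three bases at a time."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def overlapping(seq):
    """Read complete triplets, advancing one base at a time."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]

def changed_words(before, after):
    """Count positions at which two codon lists differ."""
    return sum(1 for a, b in zip(before, after) if a != b)

original = "ATGATGATG"
mutant = "ATCATGATG"  # the first G replaced by C

print(non_overlapping(mutant), changed_words(non_overlapping(original), non_overlapping(mutant)))
print(overlapping(mutant), changed_words(overlapping(original), overlapping(mutant)))
```

One substituted base alters a single word under the non-overlapping reading and three adjacent words under the overlapping reading, which is the distinction the single-site mutation studies rely on.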
(iii) Brenner (1957), on the basis of all the published data on the studies of the sequence of amino acids in proteins, concluded that there were no forbidden zones in proteins, and neighbouring amino acids were invariably coded by unrelated groups of nucleotides.
It was further established that no specific amino acid will always have the same nearest neighbours and the amino acid sequences appear to be almost completely at random. Such revelations would not have been feasible had the code been of an overlapping nature.
(iv) Yanofsky (1963) provided perhaps the most convincing evidence available that excludes any overlapping code. In his studies of both mutation and recombination through transduction technique, he found that in each protein with a different amino acid at a given position, the amino acids on either side remained unchanged.
Code is Degenerate:
Sometimes three or four triplet codons code for a particular amino acid. A genetic code in which more than one triplet (codon) codes for a single amino acid is known as a degenerate code. Out of the possible 64 codons, 61 code for amino acids.
As there are 20 amino acids, so it is obvious that more than one codon or triplet codes for one amino acid. If each amino acid is coded by a single codon, 44 codons out of 64 will be useless or nonsense codons.
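The degree of degeneracy can be tallied with a short Python sketch. The table is the standard code written in one-letter amino acid symbols in U, C, A, G codon order; this compact encoding is a common convention assumed here, not something given in the text.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons assigned to each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print("sense codons:", sum(degeneracy.values()))
print("amino acids:", len(degeneracy))
for aa, count in sorted(degeneracy.items(), key=lambda item: -item[1]):
    print(aa, count)
```

The tally shows 61 sense codons shared among 20 amino acids, with anything from one codon (methionine, tryptophan) to six codons (leucine, serine, arginine) per amino acid.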
Several lines of evidence indicate that the genetic code is degenerate.
(i) If only twenty triplets made sense and the remaining forty-four were nonsense, then mutations could occur only at very limited sites along a chromosome, representing about one-third of its length, and not throughout its entire length.
But the rate of spontaneous mutation, as well as the results of mutation induced by X-rays, has shown that nearly the entire chromosome is capable of undergoing mutation. This is possible only if the code is degenerate. However, though the degenerate nature of the code has been established, the presence of a high number of repeated sequences may make major segments of chromosomes non-mutable.
(ii) When two bases, U and C, in a 3:1 proportion are polymerized into an RNA, the possible triplets and their frequencies can be determined mathematically:
UUU = 3/4 × 3/4 × 3/4 = 27/64
UUC = 3/4 × 3/4 × 1/4 = 9/64
UCU = 3/4 × 1/4 × 3/4 = 9/64
CUU = 1/4 × 3/4 × 3/4 = 9/64
UCC = 3/4 × 1/4 × 1/4 = 3/64
CUC = 1/4 × 3/4 × 1/4 = 3/64
CCU = 1/4 × 1/4 × 3/4 = 3/64
CCC = 1/4 × 1/4 × 1/4 = 1/64
mRNA of this composition should guide the incorporation of eight amino acids but in fact only four amino acids were actually detected in the protein chain indicating the degenerate nature of the code, i.e., some of the codons in this case have directed the incorporation of the same amino acid.
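The arithmetic can be sketched in Python with exact fractions. The amino acid assignments used for the eight triplets are those of the standard code, which the passage itself does not list, so they are an assumption brought in for illustration; the function name is also hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def triplet_frequencies(base_ratio):
    """Expected frequency of every triplet in a random copolymer.
    `base_ratio` maps each base to its relative proportion."""
    total = sum(base_ratio.values())
    prob = {base: Fraction(n, total) for base, n in base_ratio.items()}
    return {
        "".join(t): prob[t[0]] * prob[t[1]] * prob[t[2]]
        for t in product(base_ratio, repeat=3)
    }

# Standard-code assignments for the eight U/C triplets (assumed, see above).
ASSIGNMENT = {
    "UUU": "Phe", "UUC": "Phe", "UCU": "Ser", "UCC": "Ser",
    "CUU": "Leu", "CUC": "Leu", "CCU": "Pro", "CCC": "Pro",
}

freqs = triplet_frequencies({"U": 3, "C": 1})
amino_acid_share = defaultdict(Fraction)
for triplet, frequency in freqs.items():
    amino_acid_share[ASSIGNMENT[triplet]] += frequency

for triplet, frequency in freqs.items():
    print(triplet, frequency, ASSIGNMENT[triplet])
print(dict(amino_acid_share))
```

Eight triplets collapse onto four amino acids, so several triplets must direct the incorporation of the same amino acid.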
(iii) According to the wobble hypothesis of Crick (1966), the first two bases of the triplet codon pair according to the set rules, i.e., A with U and G with C but the third base having much more freedom of movement than the other two, wobbles and permits more than one type of pairing at that position. Thus the wobble hypothesis explains the degeneracy of the code to some extent.
It is sometimes argued that the third base of a codon is not very important and that the specificity of a codon is determined mainly by the first two bases. It has been shown that the same tRNA can recognise more than one codon differing only at the third position. This pairing is not very stable and is allowed due to wobbling in base pairing at the third position.
Crick in 1966 proposed the wobble hypothesis to explain this phenomenon. He proposed that if U is present at the first position of the anticodon, it can pair with either A or G at the third position of the codon. Similar is the case with G in the anticodon, which can pair with either C or U of the codon (Table 15.1 A).
The wobble hypothesis visualizes that many codons are able to tolerate mutations at the third base site because of the less restrictive spatial limitations for the corresponding base in the anticodon. The third nucleotide in many codons can be substituted without damage.
The corresponding base in the anticodon would wobble and accommodate. This kind of wobbling allows economy of the number of tRNA molecules since several codons meant for same amino acid are recognized by same tRNA.
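The pairing rules can be written as a small lookup in Python. The text names only the U and G cases; the entries for C, A and inosine (I) follow Crick's published wobble rules and are included here as an assumption, as are the function and variable names. Anticodons and codons are both written 5' to 3'.

```python
# Standard pairing for the first two codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First anticodon base -> bases it can pair with at the third codon position.
WOBBLE = {
    "C": "G",
    "A": "U",
    "U": "AG",   # stated in the text
    "G": "CU",   # stated in the text
    "I": "UCA",  # inosine, from Crick's rules (assumed here)
}

def codons_read(anticodon):
    """List the codons a tRNA with this anticodon can read under the wobble rules."""
    first, second, third = anticodon
    # Codon positions 1 and 2 pair strictly with anticodon positions 3 and 2.
    prefix = WATSON_CRICK[third] + WATSON_CRICK[second]
    return [prefix + base for base in WOBBLE[first]]

print(codons_read("GAA"))  # a phenylalanine anticodon
print(codons_read("UCA"))
print(codons_read("CCA"))
```

A single tRNA with G or U in the first anticodon position reads two codons, which is how fewer than 61 tRNAs can serve all the sense codons.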
Code is Comma-less:
A comma-less code means that no punctuation marks are needed between two words. In other words, we can say that after one amino acid is coded, the second amino acid will be automatically coded by the next three letters and no letters are wasted (Fig. 15.4).
However, the code for an entire polypeptide having several amino acids is always terminated by a nonsense codon, which serves as a full stop in the coding terminology.
If the genetic code functions with commas, a specific nucleotide serves as a punctuation mark. Through experiments it has been established that poly-A (AAA) codes for lysine, poly-C (CCC) for proline, and poly-U (UUU) for phenylalanine, which implies that the commas are not made up of A, C and U.
Code is Non-Ambiguous:
Ambiguity denotes that a single codon may code for more than one amino acid. Non-ambiguous means that there is no ambiguity about a particular codon. A particular codon will always code for the same amino acid.
The genetic code is generally non-ambiguous. This can be confirmed experimentally using a specific single triplet-ribosome complex, which directs the binding of a specific tRNA. For example, the UUU triplet-ribosome complex directs the binding of phenylalanine-tRNA and the AAA triplet-ribosome complex directs the binding of lysine-tRNA.
In the similar manner, by using the triplets of known sequence, the codons for valine, cysteine, leucine and some other amino acids were determined, thus clearly establishing the non-ambiguous nature of the genetic code under natural physiological conditions.
Code is Universal:
The genetic code is universal. This means that the same codon codes for the same amino acid in all organisms, from human beings to viruses.
The universal nature of the genetic code has been demonstrated experimentally.
(i) The crucial point in the genetic code is the fitting of tRNA with specific anticodon into the codon of the mRNA.
Thus, if protein synthesis can be carried out as coded in the mRNA when the mRNA is taken from a eukaryote and the tRNA from a prokaryote, the code can be shown to be universal. If mRNA and ribosomes are taken from E. coli, and amino acids and tRNA from rat, protein synthesis can be carried out as coded in the mRNA of E. coli. This is also true the other way round.
Von Ehrenstein and Lipmann found that E. coli tRNA to which labeled amino acids were added would form haemoglobin when incubated with the mRNA and ribosomes of rabbit reticulocytes.
The precision with which this interspecific attachment occurs was shown by converting cysteine into alanine in amino acid-activated tRNACys and then observing that this alanine was now inserted into peptide positions ordinarily occupied by cysteine. In other words, the anticodon of the cysteine-tRNA of a bacterial species recognized the cysteine codon of mammalian mRNA in spite of the fact that the tRNA was carrying alanine.
(ii) The tRNAs from E. coli, Xenopus laevis and guinea pig bind to the same trinucleotides, as shown by Nirenberg et al., which indicates the universality of the code.
(iii) Studies of Merril and co-workers (1971) revealed that a bacterial enzyme, α-D-galactose-1-phosphate uridyl transferase, which catalyses the metabolism of galactose sugars, is produced in human tissue culture cells, previously unable to make it, after infection by a virus carrying the E. coli gal+ gene. This provides strong evidence in favour of the universality of the code.
(iv) The correlated nucleotide and amino acid sequences in the overlapping genes of the DNA bacteriophage φX174 and in the capsid protein coding gene of the RNA bacteriophage MS2 indicate that the genetic code is universal.
(v) Uniformity in amino acid sequence of homologous proteins, e.g., cytochrome c collected from widely divergent species like human, horse, chickens, yeast and bacteria displayed universality of the genetic code.
(vi) Finally genes from human and other organisms have been expressed in E. coli and those from bacteria and other organisms in plants. In each such case, the polypeptide produced by a gene in the new organism was identical with the one it produced in the organism of its origin.
Exceptions to Genetic Code:
Either each triplet codon demands its own tRNA with a complementary anticodon, or a single tRNA responds to both members of a codon pair or to all (or at least some) of the four members of a codon family. Often one tRNA can recognise more than one codon, i.e., the code is degenerate.
This means that the base in the first position of the anticodon must be able to partner alternative bases in the corresponding third position of the codon. In such cases there may be differences in the efficiencies of the alternative recognition reactions (as a general rule, codons that are commonly used tend to be more efficiently read).
In addition to the constructions of a set of tRNAs able to recognise all the codons, there may be multiple tRNAs that respond to the same codon. The predictions of wobble pairing accord very well with the observed abilities of almost all tRNAs. But there are exceptions in which the codons recognized by a tRNA differ from those predicted by the wobble rules.
Such effects probably result from the influence of neighbouring bases and/or the conformation of the anticodon loop in the overall tertiary structure of the tRNA. Indeed, the importance of the structure of anticodon loop is inherent in the idea of the wobble hypothesis itself.
Further support for the influence of the surrounding structure is provided by the isolation of occasional mutants in which a change in a base in some other region of the molecule alters the ability of the anticodon to recognize codons.
Another unexpected pairing reaction is presented by the ability of the bacterial initiator, fMet-tRNAfMet, to recognize both AUG and GUG. This misbehavior involves the third base of the anticodon. Although the genetic code is generally non-ambiguous, GUG codes for methionine when used as an initiator codon but for valine when present at an internal position, indicating an ambiguity.
The universality of the genetic code is striking, but some exceptions exist. They tend to affect the codons involved in initiation or termination and result from the production (or absence) of tRNAs representing certain codons. Almost all of the changes found in principal genomes affect termination codons.
In the prokaryote Mycoplasma capricolum, UGA is not used for termination but instead codes for tryptophan. In fact, it is the predominant Trp codon, and UGG is used only rarely. Two Trp-tRNA species exist, with the anticodons UCA (reads UGA and UGG) and CCA (reads only UGG).
Some ciliates (unicellular protozoa) read UAA and UAG as glutamine instead of termination signals. Tetrahymena thermophila, one of the ciliates, contains three tRNAGln species. One recognises the usual codons CAA and CAG for glutamine, one recognises both UAA and UAG (according to the wobble hypothesis), and the last recognises only UAG.
We assume that the release factor eRF has a restricted specificity, compared with that of other eukaryotes.
In another ciliate (Euplotes octacarinatus), UGA codes for cysteine. Only UAA is used as a termination codon, and UAG is not found. The change in meaning of UGA might be accomplished by a modification in the anticodon of tRNACys that allows it to read UGA along with the usual codons UGU and UGC.
The only substitution in coding for amino acids occurs in a yeast (Candida), where CUG means serine instead of leucine (and UAG is used as a sense codon).
All of these changes are sporadic, which is to say that they appear to have occurred independently in specific lines of evolution. They may be concentrated on termination codons, because these changes do not involve substitution of one amino acid for another. Thus the divergent uses of the termination codons could represent their ‘capture’ for normal coding purposes.
Exceptions to the universal genetic code also occur in the mitochondria from several species.
The earliest change was the employment of universal stop codon UGA to code for tryptophan which is common to all (non-plant) mitochondria. It is not likely that UGA coded for tryptophan in the universal code, but was changed to termination in cytoplasmic translation, because it is a stop codon in bacteria, plant mitochondria and nuclear genomes.
Departures from the universal code, all in non-plant mitochondria, are CUN (leucine) for threonine (in yeasts), AAA (lysine) for asparagine (in Platyhelminthes and echinoderms), UAA (stop) for tyrosine (in Planaria), and AGR (arginine) for serine (in several animal orders) and for stop (in vertebrates) [N = A, U, G or C; R = A or G] (Table 15.1B).
The mitochondria of plants and protozoans differ in importing and utilizing tRNAs encoded by the nuclear as well as the mitochondrial genome, whereas in animal mitochondria, all the tRNAs are encoded by the organelle.
The small number of tRNAs encoded by the mitochondrial genome highlights an important feature of the mitochondrial genetic system — the use of a slightly different genetic code, which is distinct from the universal code used by both prokaryotic and eukaryotic cells.
Some of these changes make the code simpler by replacing two codons that had different meanings with a pair that has a single meaning. Pairs treated like this include UGG and UGA (both Trp instead of one Trp and one termination) and AUG and AUA (both Met instead of one Met and the other Ile).
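The effect of such reassignments on translation can be sketched in Python by overriding a few entries of the standard table. The override sets follow the codon changes named in the text for vertebrate mitochondria and for ciliates such as Tetrahymena; the two test messages, the function names and the compact table encoding are hypothetical choices made for illustration.

```python
from itertools import product

BASES = "UCAG"
# Standard code in UCAG codon order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def variant(overrides):
    """Copy the standard table and apply the given codon reassignments."""
    table = dict(STANDARD)
    table.update(overrides)
    return table

# UGA = Trp, AUA = Met, AGA/AGG = stop, as described for vertebrate mitochondria.
VERTEBRATE_MITO = variant({"UGA": "W", "AUA": "M", "AGA": "*", "AGG": "*"})
# UAA and UAG = Gln, as described for some ciliates.
CILIATE = variant({"UAA": "Q", "UAG": "Q"})

def translate(rna, table):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = table[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

message_1 = "AUGUGAAUAUAA"  # hypothetical message: AUG UGA AUA UAA
message_2 = "AUGUAAUGGUGA"  # hypothetical message: AUG UAA UGG UGA

print(translate(message_1, STANDARD), translate(message_1, VERTEBRATE_MITO))
print(translate(message_2, STANDARD), translate(message_2, CILIATE))
```

The same message is cut short under the standard table but read through under the variant table, which is why a gene moved between such systems may not yield the same polypeptide.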
The changes are typically preceded by loss of a codon from all coding sequences in an organism or organelle, often as a result of directional mutation pressure, accompanied by loss of the tRNA that translates the codon.
The codon reappears later by conversion of another codon and emergence of a tRNA that translates the reappeared codon with a different assignment. Changes in release factors also contribute to this revised assignment. Thus the genetic code, formerly thought to be frozen, is now known to be in a state of evolution.
Decipherence of Genetic Code:
It was not possible to say which codon of the possible 64 codons should code for which of the 20 amino acids until the first clue to this problem came when M.W. Nirenberg used in vitro system for the synthesis of a polypeptide using an artificially synthesized mRNA molecule.
In 1961 Nirenberg and Matthaei characterized the first specific coding sequences, which helped in the analysis of the genetic code.
Their success on decipherence of code was dependent on two experimental systems:
(i) In vitro (cell free) protein synthesizing system,
(ii) An enzyme, polynucleotide phosphorylase which allowed the synthesis of synthetic mRNAs. These mRNAs served as templates for polypeptide synthesis in the cell free system.
The enzyme polynucleotide phosphorylase functions metabolically in bacteria to degrade RNA, but with high concentrations of ribonucleoside diphosphates the reaction can be ‘forced’ in the opposite direction to synthesize RNA.
Unlike RNA polymerase, it does not require any DNA template; each addition of a ribonucleotide is random, based on the relative concentrations of the four ribonucleoside diphosphates added to the reaction mixture. The probability of insertion of a specific ribonucleotide is proportional to the availability of that molecule relative to the other available ribonucleotides.
The cell free system for protein synthesis and the availability of synthetic mRNAs provided a means of deciphering the ribonucleotide composition of various triplets encoding specific amino acids.
Homopolymers Technique (Poly U Experiment):
In their initial experiments, Nirenberg and Matthaei synthesized RNA homopolymers, each consisting of only one type of ribonucleotide, i.e., the mRNA produced in the in vitro system is either UUUUU …, AAAAA …, CCCCC … or GGGGG … In testing each mRNA, it was easy to determine which amino acid was incorporated into the polypeptide chain.
Different amino acids were labelled by using 14C and tested separately by radioactive counting. In the synthesized RNA using only uracil, there was no other base all along the length of mRNA and the only possible triplet was UUU.
When such a poly-U (RNA) was used in the synthesis of a polypeptide (using all extracts from E. coli, and supplying all the required components of protein synthesizing machinery), only polyphenylalanine was synthesized, meaning that the only amino acid coded was phenylalanine.
It was, therefore, immediately concluded that UUU codes for the amino acid phenylalanine. Subsequently, poly A gave polylysine and poly C gave polyproline. Therefore, UUU was assigned to phenylalanine, AAA to lysine and CCC to proline. Poly G, however, did not serve as a template because it folds back on itself; for this assignment other methods had to be followed.
Heteropolymers (Random): Mixed Copolymers Technique:
The study of polynucleotides was further extended with copolymers as synthetic messengers containing two or more bases in definite proportions in the cell free system. These randomly synthesized polynucleotides directed the incorporation of amino acids into protein in a manner which indicated that a number of different code words are involved in coding for different amino acids.
In the cell free system, with these synthetic polyribonucleotides, the amino acids incorporated could be clearly correlated with the expected variations in the frequency of different triplets in the synthetic copolymers. Thus this experiment showed the way of deriving the nucleotide composition of triplets for each of the amino acids.
Nirenberg, Matthaei and Ochoa did their experiments using RNA heteropolymers. In this technique two or more different ribonucleoside diphosphates were added in combination to form the artificial message. The frequency of a particular triplet codon in the synthetic mRNA depended on the relative proportions of the ribonucleotides added to the cell free system.
The percentage of incorporation of particular amino acid in the polypeptide chain could be used for prediction against a particular triplet codon.
For example, in a system A and C are added in a ratio of 1 A: 5C. Now, the insertion of a ribonucleotide at any position along the RNA molecule during its synthesis is determined by the ratio of A:C. Therefore, there is a 1/6 possibility for an A and a 5/6 chance for a C to occupy each position.
On this basis, we can calculate the frequency of any given triplet appearing in the message. For AAA, the frequency is (1/6)³, or about 0.5%. For AAC, ACA and CAA, the frequencies are identical, (1/6)² × 5/6 or 2.3% each, and 6.9% for all three together. In the same way, each triplet of the 1A:2C class (ACC, CAC and CCA) has a frequency of 1/6 × (5/6)² or 11.6%, or 34.7% for all three together, whereas CCC accounts for (5/6)³ or 57.9% of the triplets.
Now, by examining the percentage of any given amino acid incorporated into the protein synthesized under the direction of this message, it is possible to propose a probable base composition for its codons. Because proline accounts for 69% of the incorporation, it can be deduced that proline is likely to be coded by CCC (57.9%) and also by one of the triplets of the 1A:2C variety (11.6%), i.e., 57.9 + 11.6 = 69.5%.
Histidine incorporation percentage is 14% which is probably coded by one 1A:2C category and another 1C:2A category (11.6+2.3)%. Threonine shows 12% incorporation, i.e., likely to be coded by one 1A:2C category. Asparagine and glutamine appear to be coded by one of the 1C:2A triplets and lysine appears to be coded by AAA.
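The triplet frequencies for the 1A:5C message can be recomputed with a minimal Python sketch. The function names are illustrative, and the proline and histidine sums simply repeat the composition arguments made in the text; they do not identify which triplet of a class is involved.

```python
from fractions import Fraction
from itertools import product

# Probability of each base at any position for a 1 A : 5 C mixture.
P = {"A": Fraction(1, 6), "C": Fraction(5, 6)}

def triplet_probability(triplet):
    """Probability of a given triplet when each base is added independently."""
    result = Fraction(1)
    for base in triplet:
        result *= P[base]
    return result

def percent(fraction):
    """Express an exact fraction as a percentage rounded to one decimal place."""
    return round(float(fraction) * 100, 1)

for triplet in ("".join(t) for t in product("AC", repeat=3)):
    print(triplet, percent(triplet_probability(triplet)))

# Every member of a composition class has the same probability.
one_a_two_c = triplet_probability("ACC")
two_a_one_c = triplet_probability("AAC")

proline_estimate = triplet_probability("CCC") + one_a_two_c
histidine_estimate = one_a_two_c + two_a_one_c
print("proline:", percent(proline_estimate))
print("histidine:", percent(histidine_estimate))
```

Working with exact fractions avoids the small rounding differences that appear when the individual percentages are rounded before being added.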
Using as many as all four ribonucleotides to construct this kind of random heteropolymers of synthetic mRNA, the composition of triplet code words corresponding to all 20 amino acids could be determined (Table 15.2).
Heteropolymers (Ordered): Repeating Copolymers Technique:
In the early 1960s H.G. Khorana chemically synthesized long RNA molecules consisting of short sequences repeated many times. The short sequences were di-, tri- or tetra-nucleotides, which were replicated many times and finally joined enzymatically to form long polynucleotides.
A dinucleotide repeat is translated into a polypeptide of two alternating amino acids; a trinucleotide repeat can be read as three potential triplets, depending on the point at which initiation occurs; and a tetra-nucleotide repeat creates four repeating triplets.
When these synthetic mRNAs were added to a cell free system and the amino acids incorporated were compared with the results of the composition assignments and triplet binding, specific assignments became possible.
When the repeating dinucleotide sequence is UCUCUCUC…, it produces the triplets UCU and CUC — they can incorporate leucine and serine into the polypeptide. When the repeating trinucleotide sequence is UUCUUCUUC…, the possible triplets are of three kinds: UUC, UCU and CUU depending on the initiation point and they can incorporate phenylalanine, serine and leucine.
From the above two results it can be concluded that UCU and CUC encode serine and leucine (in one order or the other), and that, of UUC and CUU, one encodes serine or leucine while the other encodes phenylalanine. Further, when the tetra-nucleotide sequence UUAC is repeated, it produces the triplets UUA, UAC, ACU and CUU.
Here the incorporated amino acids are leucine, threonine and tyrosine. Between the trinucleotide and tetra-nucleotide experiments, the only common triplet is CUU and the only common amino acid is leucine, so it can be concluded that CUU encodes leucine.
From these experiments it follows logically that UCU encodes serine, the remaining triplet UUC encodes phenylalanine, and CUC also encodes leucine (Table 15.3).
In this way, by logical interpretation, Khorana reaffirmed triplets that had already been deciphered and filled in gaps left by the other approaches (Table 15.4).
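The triplets produced by a repeating copolymer can be enumerated mechanically. The sketch below reads a repeated unit in all three reading frames and then intersects the results of the UUC and UUAC experiments, as in the reasoning above. The function name `triplets_from_repeat` and the number of copies are illustrative choices, and the amino acid sets are simply those reported in the text.

```python
def triplets_from_repeat(unit, copies=12):
    """Return every triplet that can be read from a repeated unit,
    taking all three possible reading frames into account."""
    message = unit * copies
    found = set()
    for frame in range(3):
        for i in range(frame, len(message) - 2, 3):
            found.add(message[i:i + 3])
    return found


di = triplets_from_repeat("UC")       # repeating dinucleotide
tri = triplets_from_repeat("UUC")     # repeating trinucleotide
tetra = triplets_from_repeat("UUAC")  # repeating tetranucleotide

# Amino acids reported as incorporated in each experiment (from the text).
tri_amino_acids = {"Phe", "Ser", "Leu"}
tetra_amino_acids = {"Leu", "Thr", "Tyr"}

# The triplet and the amino acid shared by both experiments must go together.
common_triplets = tri & tetra
common_amino_acids = tri_amino_acids & tetra_amino_acids

print(sorted(di), sorted(tri), sorted(tetra))
print(common_triplets, "->", common_amino_acids)
```

The intersection step mirrors the deduction that CUU encodes leucine; the remaining assignments (UCU, UUC, CUC) then follow by elimination.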
Triplet Binding Technique:
Nirenberg and Leder in 1964 found that if a synthetic tri-nucleotide of known sequence is used with ribosomes and a particular aminoacyl-tRNA, these will form a complex, provided that the codon used codes for the amino acid attached to the given aminoacyl-tRNA.
In order to work out the code for all 20 amino acids, all the possible 64 triplets had to be tried in cell free culture.
In the experiment, 20 samples of the mixture of all 20 amino acids were taken, and in each sample one amino acid was made radioactive, in such a manner that every amino acid was radioactive in one sample or another and no two samples had the same radioactive amino acid. For instance, in one set valine was labelled and the remaining 19 were unlabelled.
Similarly, in another set lysine was labelled and the remaining 19 were unlabelled. The tRNAs and ribosomes were then mixed with each of these samples, and the same codon was used for all sets. When the mixture is poured onto a nitrocellulose membrane, radioactivity is observed on the membrane only when the radioactive amino acid takes part in the formation of the complex.
Since in each sample the radioactive amino acid is known, it would be possible to detect the amino acid coded by a given codon by the presence of radioactivity on the membrane. Such a treatment was given to all 64 synthetic codons, and their respective amino acids were identified.
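The logic of the binding assay can be sketched as a small simulation. This is only an illustration under stated assumptions: the standard genetic code (written in the compact U, C, A, G order, with one-letter amino acid symbols and `*` for stop) stands in for the tRNA specificity that the real experiment had to discover, and `binding_assay` is a hypothetical name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def binding_assay(codon):
    """Return the labelled amino acids whose sample leaves radioactivity
    on the membrane when the given trinucleotide is used."""
    # 20 samples; in each one a different amino acid is radioactive.
    samples = sorted(set(AMINO_ACIDS) - {"*"})
    radioactive_filters = []
    for labelled in samples:
        # Only the aminoacyl-tRNA matching the codon is held in the
        # ribosome complex, so the membrane is radioactive only when
        # that tRNA carries the labelled amino acid.
        if CODE[codon] == labelled:
            radioactive_filters.append(labelled)
    return radioactive_filters


print(binding_assay("GUU"))  # sample with labelled valine (V)
print(binding_assay("AAA"))  # sample with labelled lysine (K)
print(binding_assay("UAA"))  # a termination codon binds no aminoacyl-tRNA
```

Repeating the call for all 64 trinucleotides corresponds to the full series of experiments described above.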
The base sequence in mRNA and the resulting amino acid sequence in protein reveal the code for each amino acid. All the 64 codons, along with their amino acids, are represented in Table 15.5.
An examination of the code table reveals the following characteristics:
i. Each codon consists of three nucleotides, i.e., the code is triplet. 61 codons represent 20 amino acids. Three codons (UAA, UAG, UGA) represent punctuation marks for termination of protein synthesis.
ii. Almost all amino acids are coded by more than one codon, except methionine and tryptophan, which have only one codon each. Phenylalanine, tyrosine, histidine, glutamine, asparagine, lysine, aspartic acid, glutamic acid and cysteine are the nine amino acids which are represented by two codons each. Three amino acids, i.e., arginine, serine and leucine, have six codons each. The table thus shows the degeneracy of the genetic code.
iii. If an amino acid has more than one codon, the first two nucleotides are identical and the third nucleotide can be either cytosine or uracil. Adenine and guanine are also similarly interchangeable at the third position. For example, UUU and UUC both code for phenylalanine, and UCU, UCC, UCA and UCG code for serine.
However, there are some exceptions to the equivalence rule of the first two nucleotides, as AGU and AGC also code for serine apart from UCU, UCC, UCA and UCG.
Similarly, the amino acid leucine is also coded by six codons, i.e., UUA, UUG, CUU, CUC, CUA and CUG.
The frequent interchange of cytosine and uracil, or of guanine and adenine, suggests that great variations can occur in the AT/GC ratio of certain organisms without causing large changes in the relative proportions of the amino acids present in them, as for almost every amino acid there is one codon that carries G or C and another that carries A or U as its third nucleotide.
Two organisms carrying the same protein sequence information in their DNA can, by selecting one or the other kind of synonymous codon, show different AT/GC ratios.
iv. The genetic code has a definite structure in the sense that the synonyms for the same amino acid are not randomly dispersed over the table but are usually found together. The only exceptions are the codons, six each for arginine, serine, and leucine, which are spread over the table.
v. Multiple codons for an amino acid show in general the similarity in first two nucleotides and it is the third nucleotide which varies.
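The counts quoted in points i to v can be checked against the code table. The sketch below assumes the standard genetic code in the compact U, C, A, G ordering, with one-letter amino acid symbols and `*` for termination. The two short messages at the end are made-up examples showing how synonymous codon choice changes the G+C content without changing the peptide.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Points i and ii: sense codons, stop codons, and codons per amino acid.
stops = sorted(c for c, aa in CODE.items() if aa == "*")
codons_per_aa = Counter(aa for aa in CODE.values() if aa != "*")
by_degeneracy = {}
for aa, n in sorted(codons_per_aa.items()):
    by_degeneracy.setdefault(n, []).append(aa)


def prefixes(amino_acid):
    """Points iii to v: first-two-base families used by one amino acid."""
    return {c[:2] for c, aa in CODE.items() if aa == amino_acid}


def translate(message):
    """Translate a message read in frame from its first base."""
    return "".join(CODE[message[i:i + 3]] for i in range(0, len(message), 3))


def gc_percent(message):
    """Percentage of G and C bases in a message."""
    return 100 * sum(base in "GC" for base in message) / len(message)


print("stop codons:", stops)
print("sense codons:", sum(codons_per_aa.values()))
print("amino acids grouped by number of codons:", by_degeneracy)
print("serine families:", prefixes("S"), "phenylalanine family:", prefixes("F"))

# Hypothetical messages: same peptide (Leu-Arg-Ser), different codon choice.
at_rich, gc_rich = "UUAAGAUCU", "CUGCGGUCC"
print(translate(at_rich), round(gc_percent(at_rich), 1))
print(translate(gc_rich), round(gc_percent(gc_rich), 1))
```

Besides the groups named in point ii, the grouping also shows isoleucine with three codons and five amino acids with four codons each, which accounts for all 61 sense codons.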
AUG is the initiation codon, i.e., the polypeptide chain starts with methionine. In prokaryotes, this amino acid is the formylated form of methionine. The initiation codon binds to fmet-tRNA having the anticodon 3′ UAC 5′, which is identical to that of met-tRNA, i.e., both met-tRNA and fmet-tRNA respond to AUG, but the signal for the starting amino acid is much more complex than the signal for all other amino acids.
According to Stent, there exist two separable species of tRNA capable of accepting methionine. The methionine of only one of these is converted into formyl methionine by the action of a special formylation enzyme. The other, or ordinary, met-tRNA incorporates methionine into the interior of the growing polypeptide chain and responds to the codon AUG only.
Formyl-met-tRNA initiates the polypeptide chain and also responds to GUG (a valine codon). When GUG is present at the initiation point, it codes for methionine, whereas in an internal position it codes for valine. The anticodon of this species of tRNA seems to be permissive with respect to the first nucleotide base of the codon and selective with respect to the second and third nucleotide bases.
UAA, UAG and UGA are the chain termination codons. They do not code for any of the amino acids but serve as stop codons. These codons have no corresponding tRNA but are read by specific proteins called release factors. These codons are also called nonsense codons.
A mutation from a sense to nonsense codon in the middle of a genetic message results in the release of immature or incomplete polypeptides which do not have any biological activity. Nonsense mutations can be induced by mutagens. UAG was formerly known as amber, UAA as ochre and UGA as opal.
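The effect of a nonsense mutation can be illustrated with a small translation routine. This is a simplified sketch: it assumes the standard genetic code, starts at the first AUG, ignores initiation at GUG and the formylation of the first methionine, and uses a made-up message. The name `translate` is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG;
# one-letter amino acid symbols, "*" marks a termination codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(message):
    """Translate from the first AUG until a termination codon is reached."""
    start = message.find("AUG")
    if start == -1:
        return ""  # no initiation codon, no polypeptide
    peptide = []
    for i in range(start, len(message) - 2, 3):
        amino_acid = CODE[message[i:i + 3]]
        if amino_acid == "*":
            break  # UAA, UAG or UGA: the chain is released here
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical message: AUG UUU UAC GGU UAA
normal = "AUGUUUUACGGUUAA"
# Point mutation C -> A in the third codon turns UAC (tyrosine) into UAA (ochre).
mutant = normal[:8] + "A" + normal[9:]

print(normal, "->", translate(normal))
print(mutant, "->", translate(mutant))
```

The mutant message is read only up to the new stop codon, so the polypeptide is released early and is shorter than the normal product.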