In this article we will discuss the determination of DNA sequences.
The sequence of the bases A, T, G, and C in genes and whole genomes can provide useful information for understanding gene structure and organisation, the encoded proteins and cell behaviour. The first complete nucleotide sequence of an entire viral genome (φX174) was determined in 1977 in the laboratory of Frederick Sanger.
In the 1970s two new methodologies were developed for sequencing DNA, one by Sanger and Coulson in Cambridge, England, the other by Maxam and Gilbert at Harvard University. The Sanger-Coulson technique is the more widely used of the two and is the one described here. The technique is also called dideoxy sequencing or the chain termination method because di-deoxynucleotide triphosphates (ddNTPs) act as chain terminators of DNA synthesis.
To start the procedure, DNA is cleaved by a restriction enzyme to obtain a fragment which is denatured to get single-stranded DNA. The preparation is divided into four samples for four separate sequencing reactions (Fig. 23.12).
Each sample tube receives DNA polymerase, the four deoxynucleoside triphosphates (dATP, dCTP, dGTP, and dTTP) and only one of the four di-deoxyribonucleoside triphosphates or ddNTPs (ddATP, ddCTP, ddGTP, or ddTTP).
A radiolabelled oligonucleotide primer for DNA synthesis is used that hybridises to the 3′ end of the single-stranded fragment and allows nucleotides to be added to the growing complementary chain. A di-deoxynucleotide lacks the 3′ hydroxyl group on its deoxyribose, which is required for the formation of a phosphodiester bond with the next nucleotide during DNA strand elongation.
Therefore, incorporation of a di-deoxynucleotide into the elongating DNA strand terminates DNA chain synthesis, resulting in a series of labelled DNA fragments of varying lengths, all of which end at the base represented by the di-deoxynucleotide in that reaction. The length of each fragment is set by the position at which its terminal di-deoxynucleotide was incorporated.
The lengths of the fragments increase by one base at a time and gel electrophoresis can separate fragments that differ by only one nucleotide in length. The fragments are detected on X-ray film by subjecting the gel to autoradiography. Since each of the labelled fragments would migrate to a specific location in the gel, the sequence of the fragment can be read directly from the positions of the bands in the gel.
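The logic of the four reactions and the gel read-out can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions: the template sequence is made up, the primer length is ignored, and every possible termination position is assumed to be represented. The function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def synthesized_strand(template_3_to_5):
    """Return the new strand (5'->3') complementary to a template written 3'->5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)

def termination_lengths(new_strand):
    """For each ddNTP reaction (lane), list the fragment lengths ending at that base."""
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes

def read_gel(lanes):
    """Read the sequence by ordering bands from the shortest to the longest fragment."""
    bands = sorted((length, base)
                   for base, lengths in lanes.items()
                   for length in lengths)
    return "".join(base for _, base in bands)

template = "TACGGATC"                      # hypothetical template, written 3'->5'
new_strand = synthesized_strand(template)  # "ATGCCTAG"
lanes = termination_lengths(new_strand)
print(lanes)            # {'A': [1, 7], 'C': [4, 5], 'G': [3, 8], 'T': [2, 6]}
print(read_gel(lanes))  # ATGCCTAG
```

Reading the bands from the shortest fragment upward across the four lanes recovers the newly synthesised strand, which is complementary to the template.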
Automated systems are available for performing large-scale DNA sequencing. In these systems the di-deoxynucleotide reactions use fluorescently labelled primers instead of radiolabelled ones. As the newly synthesised DNA fragments migrate through the gel during electrophoresis, they pass through a laser beam that excites the fluorescent label. The emitted light, which has a longer wavelength, is detected by a photomultiplier.
A computer collects and analyses the data. This type of automated DNA sequencing allows large-scale analysis of the complete genome sequences of organisms including bacteria, yeasts and Drosophila, and is also being used to obtain the complete sequence of the human genome.
Sequences determined by any sequencing method can be entered into computer databases. Several computer programs are available that allow further analysis and characterisation of the sequence.
Computer programs can be used to analyse DNA sequences for restriction-site locations, to compare a number of sequences, and to find homologous regions, transcription regulatory sequences, and more. Programs can also search DNA sequences for possible protein-coding regions by finding a chain initiation codon in frame with a stop codon.
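As a small illustration of restriction-site location, the sketch below scans a sequence for the recognition sites of three common enzymes. The test sequence is invented and the function name find_sites is hypothetical. Because these three sites are palindromic, searching one strand is sufficient here; a general tool would also have to handle non-palindromic sites and ambiguous bases.

```python
def find_sites(sequence, recognition_site):
    """Return 0-based start positions of every occurrence, overlaps included."""
    sequence = sequence.upper()
    site = recognition_site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions

# Recognition sequences of three widely used restriction enzymes.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

dna = "ttGAATTCgcatGGATCCaaGAATTCtt"   # made-up example sequence
site_map = {name: find_sites(dna, site) for name, site in ENZYMES.items()}
print(site_map)  # {'EcoRI': [2, 20], 'BamHI': [12], 'HindIII': []}
```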
Finding such an open reading frame (ORF), however, does not necessarily mean that the particular DNA sequence encodes a protein in the cell; that can be determined only by experiment.
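A minimal ORF search might look like the following sketch. It assumes the standard start codon ATG and the three standard stop codons, and it scans only the three reading frames of the given strand; a fuller program would also scan the reverse complement. The function name find_orfs, the min_codons threshold and the example sequence are all illustrative choices.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(sequence, min_codons=2):
    """Return (start, end, frame) for each ATG...stop stretch in the forward frames.

    start and end are 0-based slice indices; end includes the stop codon.
    min_codons is the minimum number of codons before the stop codon.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # resume the search after the stop
    return orfs

dna = "CCATGGCTTGGTAACC"                       # made-up example sequence
for start, end, frame in find_orfs(dna):
    print(frame, dna[start:end])               # 2 ATGGCTTGGTAA
```

As the text notes, a hit from such a scan is only a candidate: the code reports where a start codon lies in frame with a stop codon, not whether the region is expressed.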
Programs also exist that can translate a cloned DNA sequence into a theoretical amino acid sequence and make predictions about the structure and function of the protein. This is possible because the sequences of known proteins have been deposited in databases, enabling scientists to make rapid comparisons by computer.
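Conceptual translation can be sketched as a simple codon lookup. The example below builds the standard genetic code table from its conventional compact form (bases in the order T, C, A, G) and translates a made-up ORF into one-letter amino acid codes, stopping at the first stop codon. The function name translate is hypothetical, and the sketch assumes the input contains only A, C, G and T.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_sequence):
    """Translate a DNA coding sequence (5'->3') until the first stop codon."""
    coding_sequence = coding_sequence.upper()
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":          # '*' marks a stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCTTGGTAA"))  # MAW
```

The resulting string is the kind of theoretical protein sequence that can then be compared against the entries in a protein database.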