Bioinformatics provides the following tools for analysing genomic information: 1. Compiling the Accurate Sequence 2. Annotation.
Tool # 1. Compiling the Accurate Sequence:
To ensure that the nucleotide sequence of a genome is complete and error free, the genome is sequenced more than once. For example, using the shotgun method on the genome of the bacterium Pseudomonas aeruginosa, researchers sequenced the entire genome of 6.3 million nucleotides seven times to ensure that the sequence was accurate.
Even with this level of redundancy, the assembler software recognized 1604 regions that required further clarification. These regions were reanalysed and re-sequenced to improve accuracy. Finally, the accuracy of the shotgun method’s sequence was compared with the sequence of two widely separated genome regions obtained by conventional cloning.
The sequence of the 81,843 nucleotides cloned and sequenced by conventional methods (clone-by-clone) was in perfect agreement with the sequence obtained by the shotgun method. This level of care is not unusual; similar precautions are used in all genome projects.
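The idea of sequencing a region several times and flagging positions that still need clarification can be illustrated with a minimal sketch. It assumes the reads are already aligned to equal length (with "-" marking positions a read does not cover) and uses a simple majority vote. The function name, the 0.8 agreement threshold and the toy reads are illustrative assumptions, not the criteria used by the assembler software in the Pseudomonas project.

```python
from collections import Counter


def consensus_with_flags(aligned_reads, min_agreement=0.8):
    """Majority-vote consensus over pre-aligned, equal-length reads.

    Returns the consensus string and the list of positions where the
    winning base has less than `min_agreement` support (hypothetical
    threshold) or where no read covers the position.
    """
    length = len(aligned_reads[0])
    consensus = []
    flagged = []
    for pos in range(length):
        # Ignore reads that do not cover this position.
        bases = [read[pos] for read in aligned_reads if read[pos] != "-"]
        if not bases:
            consensus.append("N")
            flagged.append(pos)
            continue
        base, count = Counter(bases).most_common(1)[0]
        consensus.append(base)
        if count / len(bases) < min_agreement:
            flagged.append(pos)
    return "".join(consensus), flagged


# Seven toy reads covering the same 10-base region (made-up data).
reads = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGAACGTAC",
    "ACGTACCTAC",
    "ACGAACGTAC",
    "--GTACGTAC",
    "ACGTACGT--",
]
seq, flagged = consensus_with_flags(reads)
print(seq)      # ACGTACGTAC
print(flagged)  # [3]
```

Position 3 is flagged because only five of the seven reads agree there, so in this sketch it would be a candidate for re-sequencing.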
The Human Genome Project (HGP) sequenced the 3.2 billion base pairs of the human genome a total of 12 times. The private shotgun cloning project at the biotechnology company Celera used a strategy of sequencing from both ends of DNA fragments and covered the genome 35.6 times.
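To give a sense of scale, the redundancy figures quoted above can be turned into the approximate number of bases that had to be read. This short sketch assumes that "sequenced N times" means an average N-fold coverage, so that total bases read is roughly genome size multiplied by coverage. The function name is an illustrative choice.

```python
def total_bases_read(genome_size_bp, fold_coverage):
    """Approximate bases read, assuming average coverage = total / genome size."""
    return round(genome_size_bp * fold_coverage)


# Figures taken from the text above.
p_aeruginosa = total_bases_read(6_300_000, 7)
hgp = total_bases_read(3_200_000_000, 12)
celera = total_bases_read(3_200_000_000, 35.6)

print(f"P. aeruginosa: {p_aeruginosa:,} bases")  # 44,100,000
print(f"HGP:           {hgp:,} bases")           # 38,400,000,000
print(f"Celera:        {celera:,} bases")        # 113,920,000,000
```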
Although a draft of the human genome is finished, several other tasks are yet to be completed. These include obtaining the remaining sequence and correcting errors (proof-reading the genome), filling sequence gaps (which amounted to about 150 Mb in mid-2001) and then sequencing the 7 to 15 percent of the genome that contains heterochromatin.
Heterochromatic regions of the genome were excluded by design, as they contain long stretches of repetitive DNA sequences and were initially thought to contain no genes. However, in sequencing the genome of Drosophila, researchers discovered that heterochromatic regions do contain a small number of genes (about 50 genes in Drosophila).
As a result of this discovery, heterochromatic regions of the human genome must be sequenced to ensure that all genes are identified. Once the human genome or any other genome is sequenced, compiled and proofread, the next stage, annotation, begins.
Tool # 2. Annotation:
After a genome sequence has been obtained, organized, and checked for accuracy, the next task is to find all the genes that encode proteins. This is the first step in annotation, which is a process that identifies genes, their regulatory sequences and their function(s).
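One simple way to picture the search for protein-coding genes is to scan the sequence for open reading frames (ORFs): stretches that begin with a start codon and run, in the same reading frame, to a stop codon. The sketch below is a deliberately naive illustration that assumes the standard ATG start codon and the three standard stop codons, scans all six reading frames, and ignores introns and regulatory signals. The function names, the `min_codons` parameter and the toy sequence are assumptions for illustration; real annotation pipelines use far more sophisticated gene-prediction methods.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Naive six-frame ORF scan.

    Returns tuples (strand, frame, start, end, orf_sequence). For the
    "-" strand, coordinates refer to the reverse-complemented sequence.
    ORFs that lack a stop codon are not reported.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START_CODON:
                    start = i  # open a candidate ORF at the first ATG
                elif start is not None and codon in STOP_CODONS:
                    end = i + 3  # include the stop codon
                    if (end - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, end, s[start:end]))
                    start = None
    return orfs


# Toy sequence (made up) containing one short ORF on the forward strand.
orfs = find_orfs("CCATGAAATTTGGGTAACC")
print(orfs)  # [('+', 2, 2, 17, 'ATGAAATTTGGGTAA')]
```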
Annotation also identifies non-protein coding genes (including ribosomal RNA, transfer RNA and small nuclear RNAs), finds and characterizes the mobile genetic elements (or transposons) and repetitive sequence families that may be present in the genome.