The simplest tasks in bioinformatics concern the creation and maintenance of databases of biological information. Nucleic acid sequences (and the protein sequences derived from them) make up the majority of such databases. While the storage and organization of millions of nucleotides is far from trivial, designing a database and developing an interface through which researchers can both access existing information and submit new entries is only the beginning.
The most pressing tasks in bioinformatics involve the analysis of sequence information. This process is known as computational biology, and it involves the following:
Finding the genes in the DNA sequences of various organisms.
Developing methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.
Clustering protein sequences into families of related sequences and developing protein models.
Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.
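As a rough illustration of the first task in the list above, the sketch below scans a DNA string for open reading frames (ORFs) and translates each one into a protein sequence. It is a toy example under stated assumptions, not a real gene finder: it assumes the standard genetic code, an ATG start codon, the three forward reading frames only, and no introns. The names find_orfs, CODON_TABLE and min_codons are hypothetical and chosen for this example.

```python
# Toy ORF scan: forward strand only, standard genetic code, ATG start, no introns.
# All names here are illustrative assumptions, not part of any library.

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def find_orfs(dna, min_codons=2):
    """Return (start, end, protein) for each ATG...stop stretch in the 3 forward frames.

    start is the 0-based index of the A in ATG; end is one past the stop codon.
    min_codons is the minimum number of translated residues (stop excluded).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None   # position of the current ATG, or None if not inside an ORF
        protein = []
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            residue = CODON_TABLE.get(codon, "X")  # "X" for codons with unknown bases
            if start is None:
                if codon == "ATG":
                    start = pos
                    protein = ["M"]
            elif residue == "*":
                if len(protein) >= min_codons:
                    orfs.append((start, pos + 3, "".join(protein)))
                start = None
            else:
                protein.append(residue)
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGGCTTGGTAAGG"))  # [(2, 14, 'MAW')]
```

Real gene-finding programs must also handle the reverse strand, introns, alternative start codons and statistical signals such as codon bias, so this sketch only shows the basic idea of reading a sequence codon by codon.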
The process of evolution has produced DNA sequences that encode proteins with very specific functions. It is possible to predict the three-dimensional structure of a protein using algorithms derived from our knowledge of physics and chemistry and, most importantly, from the analysis of other proteins with similar amino acid sequences. The diagram below summarizes the process by which DNA sequences are used to model protein structure. The processes involved in this transformation are detailed in the pages that follow.