Institute of Mathematical Problems of Biology RAS



Laboratory of Macromolecular Crystallography

This is a review of work carried out in the LMC of the IMPB RAS. Information on other work in this field can be found in the original papers listed below.

Computer processing of nucleotide sequences. First steps.

(N.L.Lunina)

      In the early 1970s a new direction began to emerge in computational biology: the processing of nucleotide sequences. Data from the Heidelberg database became readily available, and numerous publications were devoted to computer methods for working with nucleotide sequences.

      In the early 1980s A.S.Kondrashov, a researcher at the Research Computing Center (RCC), proposed developing such methods on the computers available at the RCC. He formulated a list of requirements that a sequence-processing program had to meet, and at his suggestion the HEID system was developed. It supported searching the database for sequences and processing the selected sequences. In the course of this work, numerous discussions were held with researchers of the Institute of Biochemistry and Physiology of Microorganisms (IBPM), V.V.Vel'kov and V.V.Kryukov, and with researchers of the Institute of Protein Research, L.A.Voronin and A.V.Finkelstein. The HEID software package was the result of these discussions.

      Researchers from the IBPM used it both to work with sequences from the database and to process their own newly obtained sequences.

      The system was initially designed for the ES 1040 computer and was later ported to the SM-4 when that machine became available at the RCC. A description of the system was published by ONTI NCBI in 1984. The system made it possible to find statistical regularities in the distribution of nucleotides; to search for sites of interest, repeats of various types, and open reading frames; to determine the proteins encoded by a sequence; and so on.

      As statistical regularities were revealed in the distribution of nucleotides and of nucleotide pairs (including pairs of purines and pyrimidines), tables began to be compiled for nearest neighbors, that is, for combinations of two and of three nucleotides. Both observed and expected neighbor frequencies were considered, in order to find where the observed frequencies deviated most strongly from the expected ones.
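      The observed-versus-expected neighbor tables described above can be sketched as follows. This is a minimal modern Python illustration, not HEID's code (which ran on the ES 1040 and SM-4). It assumes that expected counts are computed as if neighboring positions were independent, i.e. the expected count of a combination is the number of windows times the product of single-nucleotide frequencies; the text does not state which model HEID used. The function names are hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> Counter:
    """Count overlapping k-mers (k=2: nearest-neighbor pairs, k=3: triples)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def observed_vs_expected(seq: str, k: int, alphabet: str = "ACGT") -> dict:
    """Return {kmer: (observed, expected, observed/expected)}.

    Assumption (hypothetical model): neighboring positions are independent,
    so E(x1..xk) = (number of k-mer windows) * f(x1) * ... * f(xk),
    where f is the single-nucleotide frequency in this sequence.
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        raise ValueError("sequence is shorter than k")
    mono = Counter(seq)
    windows = n - k + 1
    observed = kmer_counts(seq, k)
    table = {}
    for letters in product(alphabet, repeat=k):
        kmer = "".join(letters)
        expected = float(windows)
        for letter in letters:
            expected *= mono[letter] / n
        obs = observed[kmer]
        # Ratio is undefined when a letter never occurs (expected == 0).
        ratio = obs / expected if expected > 0 else float("nan")
        table[kmer] = (obs, expected, ratio)
    return table


def to_purine_pyrimidine(seq: str) -> str:
    """Recode A/G as R (purine) and C/T as Y (pyrimidine)."""
    return seq.upper().translate(str.maketrans("AGCT", "RRYY"))


if __name__ == "__main__":
    sample = "ATGCGCGATATTTACGCGAAATTGCGC"  # made-up sequence for illustration
    for k in (2, 3):
        table = observed_vs_expected(sample, k)
        # Rank combinations by absolute deviation from the expected count.
        ranked = sorted(table.items(),
                        key=lambda item: abs(item[1][0] - item[1][1]),
                        reverse=True)
        for kmer, (obs, exp, _) in ranked[:3]:
            print(f"k={k} {kmer}: observed {obs}, expected {exp:.2f}")
    ry = observed_vs_expected(to_purine_pyrimidine(sample), 2, alphabet="RY")
    print({kmer: (obs, round(exp, 2)) for kmer, (obs, exp, _) in ry.items()})
```

      Ranking by absolute deviation is one simple way to find "where the differences were strongest"; a ratio or a chi-square-style score would be equally reasonable choices.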

      Thanks to close collaboration with biologists, HEID offered a fairly complete and convenient set of functions. Researchers from the IBPM later noted that, although newer programs for searching sequence databanks appeared in subsequent years and offered more functions, none of them offered all the capabilities that HEID did.

      In the following years, computer processing of nucleotide sequences became one of the main areas of activity of the RCC (later renamed the Institute of Mathematical Problems of Biology).

March 24, 2003

Publications


Lunina, N.L. (1984). "The computer-based system HEID for the treatment of nucleotide sequences." Software, NCBI AN SSSR, Pushchino, Russia. (In Russian)