
Laboratory of Macromolecular Crystallography



This is a review of the work carried out in the LMC of the IMPB RAS. References to other work in this field may be found in the original papers listed below.

Ab-initio low-resolution phasing in macromolecular crystallography by maximisation of the likelihood.

(1997-2000)

      This project was carried out in collaboration with the Laboratory of Biological Structures of the Institute of Genetics and Molecular and Cellular Biology (IGBMC) (Strasbourg, France; PI Prof. A.D. Podjarny). The project aimed to find out to what extent additional information obtained from statistical modelling may be used to solve the phase problem. This new information was incorporated into the general ab initio phasing procedure developed previously to exploit the information contained in Fourier synthesis histograms [1].

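      The traditional goal of the first stage of macromolecular structure determination is to find the function ρ(x,y,z), which describes the distribution of electron density in the crystal of the object studied. This function is periodic in all three spatial directions and so may be represented as a three-dimensional Fourier series

$$\rho(x,y,z) = \frac{1}{V}\sum_{h,k,l} F(h,k,l)\,\exp[i\varphi(h,k,l)]\,\exp[-2\pi i(hx+ky+lz)] \tag{1}$$

where x, y, z are fractional coordinates in the unit cell, V is the volume of the unit cell, and the summation runs over integer triples (h,k,l).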

      The complex coefficients F(h,k,l)exp[iφ(h,k,l)] are called structure factors in crystallography, and the real values F(h,k,l) and φ(h,k,l) are called their magnitudes and phases, respectively. A conventional X-ray diffraction experiment yields the magnitudes F(h,k,l) only. The problem of recovering the phase values is known as the phase problem of X-ray crystallography. Clearly, some additional information on the object studied must be brought in to solve this problem. Once found, the phase values may be combined with the experimental magnitudes and used to calculate an approximate density distribution by means of (1). If only a finite number of structure factors is used to calculate the series (1), one says that a Fourier synthesis of finite resolution has been calculated. The resolution of the synthesis is linked to the number of structure factors used: the more terms of the series (1) are included, the finer the details that may be recognised when analysing the synthesis.
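As a minimal sketch of a finite-resolution synthesis, the Python code below evaluates the series (1) at one point from a hypothetical table of magnitudes and phases. It assumes the common convention ρ = (1/V) Σ F exp(iφ) exp[−2πi(hx+ky+lz)] with fractional coordinates. The resolution cut-off assumes a cubic cell of edge a, in which reflection (h,k,l) has spacing d = a/√(h²+k²+l²); general cells need the full cell metric. The function names are illustrative only.

```python
import cmath
import math


def fourier_synthesis(coeffs, x, y, z, volume=1.0):
    """Evaluate a finite Fourier synthesis at fractional coordinates (x, y, z).

    coeffs maps integer indices (h, k, l) to a pair (F, phi): the magnitude
    and the phase (in radians) of the structure factor.
    """
    total = 0j
    for (h, k, l), (f, phi) in coeffs.items():
        # F exp(i phi) exp(-2 pi i (hx + ky + lz))
        total += f * cmath.exp(1j * phi) * cmath.exp(-2j * math.pi * (h * x + k * y + l * z))
    # For a real density the coefficients obey Friedel's law, so the
    # imaginary part vanishes up to rounding; keep the real part.
    return total.real / volume


def truncate_to_resolution(coeffs, d_min, a):
    """Keep only reflections with spacing d >= d_min (cubic cell of edge a assumed)."""
    kept = {}
    for (h, k, l), value in coeffs.items():
        s2 = h * h + k * k + l * l
        # F(0,0,0) has no finite spacing and is always kept.
        if s2 == 0 or a / math.sqrt(s2) >= d_min:
            kept[(h, k, l)] = value
    return kept
```

For a toy one-dimensional set with F(0,0,0) = 10, F(±1,0,0) = 2 and F(±2,0,0) = 1 (all phases zero), the full synthesis at x = 0.5 equals 8, while the synthesis truncated at d_min = 6 for a = 10 drops the (±2,0,0) terms and gives 6: fewer terms, coarser detail.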

      The main idea of statistical modelling may be explained by the following example. Suppose that some region in the crystal unit cell has been specified, and we hope that this region is an approximate envelope of the molecules, i.e. that the majority of atoms are located in it. The plausibility of this hypothesis may be checked with the following statistical test. Atomic coordinates are chosen randomly in the tested region, and the structure factors corresponding to this random model are calculated. These structure factors are random variables, and one may ask how large the probability is that the calculated magnitudes coincide with the observed ones. One may expect this probability to be high if the tested region does contain most of the real atomic positions. Conversely, if the region has nothing to do with the real molecular envelope, then atoms placed in this region will hardly ever reproduce the observed magnitudes, and the probability of obtaining calculated magnitudes equal to the observed ones will be low. This probability is nothing but the statistical likelihood of the hypothesis that the set of experimental magnitudes may be regarded as the magnitudes calculated from a model generated randomly inside the tested region. If several alternative regions are suggested as possible molecular envelopes, the likelihood may be calculated for each of them. Choosing the region that gives the maximal likelihood is precisely an application of the maximum likelihood principle, which is widely used in mathematical statistics.
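In symbols, the test may be written as follows. This is a schematic formulation in our own notation, assuming N identical point atoms with unit scattering factors and leaving the precise meaning of "≈" unspecified:

$$F^{\mathrm{calc}}(h,k,l) = \sum_{j=1}^{N} \exp[2\pi i(hx_j + ky_j + lz_j)], \qquad (x_j,y_j,z_j)\ \text{drawn independently and uniformly in } G$$

$$L(G) = \Pr\bigl\{\, |F^{\mathrm{calc}}(h,k,l)| \approx F^{\mathrm{obs}}(h,k,l)\ \text{for all measured } (h,k,l) \,\bigr\}, \qquad \hat{G} = \arg\max_{G} L(G)$$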

      The exact calculation of the likelihood corresponding to the tested region is a hard mathematical problem. Nevertheless, an approximate likelihood value may be calculated relatively simply with a Monte Carlo simulation procedure. In this procedure, a large number of random models are generated for the particular tested region, and the corresponding sets of structure factor magnitudes are calculated. The likelihood may then be estimated as the fraction of models whose calculated magnitudes show a high correlation with the observed ones.
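A minimal Monte Carlo sketch of this estimate, under simplifying assumptions of our own: identical point atoms with unit scattering factors, space group P1, the region given as a predicate on fractional coordinates, and the Pearson correlation of magnitudes compared against a fixed threshold. The names (`mc_likelihood`, `threshold` and so on) are hypothetical and are not taken from the programs used in the cited papers.

```python
import cmath
import math
import random


def structure_factor_magnitudes(atoms, hkls):
    """|F(h,k,l)| for a model of identical point atoms (unit scattering factor)."""
    mags = []
    for h, k, l in hkls:
        f = sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z)) for x, y, z in atoms)
        mags.append(abs(f))
    return mags


def correlation(a, b):
    """Pearson correlation coefficient; returns 0.0 if either set is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def random_model(in_region, n_atoms, rng):
    """Place n_atoms uniformly inside the region by rejection sampling in the unit cell.

    The region must have nonzero volume, otherwise this loop never ends.
    """
    atoms = []
    while len(atoms) < n_atoms:
        p = (rng.random(), rng.random(), rng.random())
        if in_region(p):
            atoms.append(p)
    return atoms


def mc_likelihood(in_region, f_obs, hkls, n_atoms, n_trials=200, threshold=0.8, seed=0):
    """Fraction of random models whose magnitudes correlate with f_obs at >= threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        atoms = random_model(in_region, n_atoms, rng)
        if correlation(structure_factor_magnitudes(atoms, hkls), f_obs) >= threshold:
            hits += 1
    return hits / n_trials
```

Note that the magnitudes do not change when all atoms are shifted by the same vector, so in P1 such an estimate depends on the shape of the tested region but not on its position; making the position matter requires taking crystal symmetry into account, which this sketch omits.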

      First, we applied this idea of statistical testing to the selection of the best solution from several alternative regions (of arbitrary shape) within the framework of the FAM method [2,3]. The next development of the idea was an attempt to choose the molecular envelope from a set of simple spherical regions [5].

      Later, likelihood-based testing of envelopes was incorporated into the general ab initio phasing procedure [1, 4, 6, 9]. This procedure consists of several steps:

  • generation of a large number of phase sets and calculation of a figure of quality for every generated set;
  • selection of phase sets of good quality for further analysis;
  • grouping of the selected sets into clusters of close sets and averaging of the phases within each cluster.
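The three steps can be sketched as follows. This is an illustration only: the scoring function `score` is a hypothetical placeholder (in the approach described here it would be the envelope likelihood), clustering is a simple greedy threshold on the mean phase difference, and the phases within a cluster are combined as a circular mean. A real implementation would also have to bring phase sets related by permissible origin shifts or by the enantiomorph choice into a common frame before comparing them; this sketch ignores that.

```python
import cmath
import math
import random


def random_phase_sets(n_sets, n_reflections, seed=0):
    """Step 1: generate random phase sets (phases in radians)."""
    rng = random.Random(seed)
    return [[rng.uniform(-math.pi, math.pi) for _ in range(n_reflections)]
            for _ in range(n_sets)]


def select_best(phase_sets, score, n_keep):
    """Steps 1-2: score every set with a figure of quality and keep the n_keep best."""
    return sorted(phase_sets, key=score, reverse=True)[:n_keep]


def phase_distance(a, b):
    """Mean absolute phase difference, each difference wrapped to [0, pi]."""
    return sum(abs(cmath.phase(cmath.exp(1j * (u - v)))) for u, v in zip(a, b)) / len(a)


def cluster(phase_sets, max_distance):
    """Step 3a: greedy clustering; a set joins the first cluster whose seed is close enough."""
    clusters = []
    for s in phase_sets:
        for c in clusters:
            if phase_distance(c[0], s) <= max_distance:
                c.append(s)
                break
        else:
            clusters.append([s])
    return clusters


def average_phases(cluster_sets):
    """Step 3b: circular mean of the phases of every reflection in one cluster."""
    n = len(cluster_sets[0])
    return [cmath.phase(sum(cmath.exp(1j * s[i]) for s in cluster_sets)) for i in range(n)]
```

A circular mean is used instead of an arithmetic one because phases are angles: averaging 3.1 and −3.1 arithmetically would give 0, whereas the circular mean correctly gives a value near π.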

      The key point of this procedure is the choice of the selection criterion. The idea of statistical testing of potential envelopes may be turned into a selection criterion as follows:

  • the generated phase set is used (together with the observed magnitudes) to calculate the corresponding Fourier synthesis;
  • the region where the synthesis takes its highest values is taken as a trial envelope;
  • the likelihood corresponding to this region is calculated and used as the figure of quality of the phase set.
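The last two items of this criterion can be sketched as below. The sketch assumes that the synthesis has already been computed on a grid (stored as a dict from grid index to value) and that the envelope should occupy a given fraction of the cell volume, for instance an estimate of the share of the cell not occupied by solvent. `likelihood` is a placeholder for any region-scoring function, such as a Monte Carlo estimate; all names are hypothetical.

```python
def trial_envelope(density, volume_fraction):
    """Return the set of grid-point indices with the highest synthesis values.

    density: dict mapping a grid-point index to the synthesis value there.
    volume_fraction: share of the unit cell the envelope should occupy (0 < v <= 1).
    """
    n_keep = max(1, round(volume_fraction * len(density)))
    ranked = sorted(density, key=density.get, reverse=True)
    return set(ranked[:n_keep])


def phase_set_quality(density, volume_fraction, likelihood):
    """Figure of quality of a phase set: the likelihood of its trial envelope.

    likelihood is a placeholder callable that takes a region (set of grid indices).
    """
    return likelihood(trial_envelope(density, volume_fraction))
```

Fixing the envelope volume in advance keeps the trial regions of different phase sets comparable, so that their likelihoods differ because of shape rather than size.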

      In the approaches discussed above it was assumed that an atom can occupy any point of the tested region with the same probability. In other words, the choice was made among probability distributions that are uniform within some region. In a more general formulation, one may consider arbitrary probability distributions and look for the one that yields the maximal likelihood value. This problem is very difficult to solve, so a simplified goal may be considered as a first step: to find a probability distribution q(x,y,z) whose likelihood value L(q) exceeds the likelihood of the distribution that is uniform over the whole unit cell. To obtain such a distribution, it is enough to shift slightly from the uniform distribution in the direction of the gradient of the likelihood function:

$$q_\lambda(x,y,z) = \mathrm{const} + \lambda\,\nabla L(x,y,z), \qquad \lambda > 0 \tag{2}$$

where ∇L(x,y,z) denotes the gradient of L with respect to the distribution q, evaluated at the uniform distribution q = const. One has L(q_λ) > L(const) if λ is small enough. Hence the gradient of the likelihood function may serve as an approximate prior distribution of atomic coordinates that retains some information on the object studied [2,4].
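The inequality L(q_λ) > L(const) for small λ follows from a first-order Taylor expansion. A sketch, assuming that L is differentiable with respect to q, that its gradient ∇L at the uniform distribution is not identically zero, and that ∇L is taken among perturbations with zero integral over the cell (so that q_λ = const + λ∇L stays normalised, and for small λ also stays positive):

$$L(\mathrm{const} + \lambda\,\nabla L) = L(\mathrm{const}) + \lambda \int_{V} \bigl[\nabla L(x,y,z)\bigr]^2 \, dV + O(\lambda^2)$$

The first-order term is strictly positive, so L(q_λ) − L(const) > 0 for all sufficiently small λ > 0.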

      The main results of the development of this approach were summarised in the Ph.D. thesis of T. Petrova [7,8].

March 24, 2003
V. Lunin

Publications



  1. Lunin, V.Yu., Urzhumtsev, A.G. & Skovoroda, T.P. (1990). "Direct low-resolution phasing from electron-density histograms in protein crystallography". Acta Cryst. A46, 540-544.

  2. Lunin, V.Y. (1997). "The likelihood based choice of priors in statistical approaches to the phase problem". In: Direct Methods for Solving Macromolecular Structures, ed. S. Fortier, NATO ASI Series C, Vol. 507, 451-454.

  3. Lunin, V.Yu., Lunina, N.L., Petrova, T.E., Urzhumtsev, A.G. & Podjarny, A.D. (1998). "On the Ab initio solution of the Phase Problem for Macromolecules at Very Low Resolution. II. Generalized Likelihood Based Approach to Cluster Discrimination". Acta Cryst. D54, 726-734.

  4. Petrova, T.E., Lunin, V.Yu., Lunina, N.L. & Skovoroda, T.P. (1999). "Maximum Likelihood Approach to Choosing a Prior Distribution of Atomic Coordinates in Macromolecular Structures". Biophysics, 44(1), 18-22.

  5. Petrova, T.E., Lunin, V.Y. & Podjarny, A.D. (1999). "A likelihood-based search for the macromolecular position in the crystalline unit cell". Acta Cryst. A55, 739-745.

  6. Petrova, T.E., Lunin, V.Y. & Podjarny, A.D. (2000). "Ab initio low-resolution phasing in crystallography of macromolecules by maximization of likelihood". Acta Cryst. D56, 1245-1252.

  7. Petrova, T.E. (2000). "Use of the maximum likelihood principle in the solution of the phase problem in macromolecular crystallography". Summary of Ph.D. thesis, Pushchino, Russia. (In Russian)

  8. Petrova, T.E. (2000). "Use of the maximum likelihood principle in the solution of the phase problem in macromolecular crystallography". Ph.D. thesis, ITEB RAS, Pushchino, Russia. (In Russian)

  9. Lunin, V.Y., Lunina, N.L., Petrova, T.E., Skovoroda, T.P., Urzhumtsev, A.G. & Podjarny, A.D. (2000). "Low-resolution ab initio phasing: problems and advances". Acta Cryst. D56, 1223-1232.