Laboratory of Macromolecular Crystallography
This is a review of the work carried out in the LMC of the IMPB RAS. Information on other papers in this field
can be found in the original papers listed below.
Ab initio low-resolution phasing in macromolecular crystallography by maximisation of the likelihood
(1997-2000)
This project was developed in collaboration with the Laboratory of Biological
Structures of the Institute of Genetics and Molecular and Cellular Biology
(IGBMC, Strasbourg, France; PI Prof. A.D. Podjarny). The project aimed to
determine to what extent additional information based on statistical modelling can
be used to solve the phase problem. This new information was incorporated
into a general ab initio phasing procedure, developed earlier to exploit the
information contained in Fourier synthesis histograms [1].
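The traditional goal of the first stage of macromolecular structure determination
is to find the function ρ(x,y,z) describing the distribution of electron density
in a crystal of the object under study. This function is periodic in all three
spatial directions and can therefore be represented as a three-dimensional
Fourier series
$$\rho(x,y,z) = \frac{1}{V}\sum_{h}\sum_{k}\sum_{l} F(h,k,l)\,\exp[i\varphi(h,k,l)]\,\exp[-2\pi i(hx+ky+lz)] \qquad (1)$$
where V is the volume of the unit cell and x, y, z are fractional coordinates.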
The complex coefficients F(h,k,l)exp[iφ(h,k,l)] are called structure factors
in crystallography, and the real values F(h,k,l) and φ(h,k,l) are called their
magnitudes and phases, respectively. A conventional X-ray diffraction experiment
determines only the magnitudes F(h,k,l). The problem of recovering the phase
values is known as the phase problem of X-ray crystallography.
Obviously, some additional information about the object under study must be
brought in to solve this problem. Once found, the phases can be combined with the
experimental magnitudes and used to calculate an approximate electron density
distribution by means of (1). If only a finite number of structure factors is
used to calculate the series (1), one speaks of a Fourier synthesis of finite
resolution. The resolution of the synthesis is linked to the number of structure
factors used: the more terms of the series (1) are included, the finer the
details that can be recognised in the synthesis.
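To make series (1) and the notion of a finite-resolution synthesis concrete, here is a minimal sketch, assuming space group P1, point atoms with unit scattering factors, and a resolution cutoff written as h² + k² + l² ≤ s_max² (in a cubic cell of edge a this corresponds to d_min = a/s_max). The function names and the two-atom model are hypothetical. The sketch splits model structure factors into the magnitudes an experiment would give and the phases it would lose, then recombines them in the truncated series.

```python
import cmath
import math
from itertools import product


def structure_factors(atoms, s_max):
    """Complex structure factors F(h,k,l) exp[i phi(h,k,l)] of a point-atom model
    (unit scattering factors, space group P1) for h^2 + k^2 + l^2 <= s_max^2."""
    n = int(s_max)
    sf = {}
    for h, k, l in product(range(-n, n + 1), repeat=3):
        if h * h + k * k + l * l <= s_max * s_max:
            sf[(h, k, l)] = sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z))
                                for x, y, z in atoms)
    return sf


def fourier_synthesis(magnitudes, phases, point, volume=1.0):
    """Series (1) truncated to the supplied reflections, evaluated at a point
    given in fractional coordinates."""
    x, y, z = point
    total = sum(magnitudes[hkl] * cmath.exp(1j * phases[hkl])
                * cmath.exp(-2j * math.pi * (hkl[0] * x + hkl[1] * y + hkl[2] * z))
                for hkl in magnitudes)
    return total.real / volume


# Hypothetical two-atom model; s_max = 4 keeps 257 reflections (including F(0,0,0)).
atoms = [(0.1, 0.2, 0.3), (0.6, 0.5, 0.8)]
sf = structure_factors(atoms, s_max=4)
magnitudes = {hkl: abs(f) for hkl, f in sf.items()}      # measured in the experiment
phases = {hkl: cmath.phase(f) for hkl, f in sf.items()}  # lost in the experiment
rho_at_atom = fourier_synthesis(magnitudes, phases, atoms[0])
rho_elsewhere = fourier_synthesis(magnitudes, phases, (0.85, 0.85, 0.55))
```

With the true phases the synthesis peaks at the atomic positions; replacing `phases` by random values while keeping `magnitudes` generally destroys these peaks, which is the phase problem in miniature.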
The main idea of statistical modelling can be explained with the following
example. Suppose that some region of the unit cell has been specified and we hope
that it is an approximate envelope of the molecules, i.e. that most of the atoms
are located inside it. The plausibility of this hypothesis can be checked with
the following statistical test. Atomic coordinates are chosen at random inside
the tested region, and the structure factors of this random model are
calculated. These structure factors are random variables, and one may ask how
probable it is that the calculated magnitudes will be close to the observed ones.
This probability can be expected to be high if the tested region does contain
most of the real atomic positions. Conversely, if the region has nothing to do
with the real molecular envelope, atoms placed inside it will hardly ever
reproduce the observed magnitudes, and the probability of obtaining calculated
magnitudes close to the observed ones will be low. This probability is nothing
but the statistical likelihood of the hypothesis that the set of experimental
magnitudes can be regarded as the magnitudes calculated from a model generated
at random inside the tested region. If several alternative regions are proposed
as possible molecular envelopes, the likelihood can be calculated for each of
them. Choosing the region with the maximal likelihood is exactly the maximum
likelihood principle, which is widely used in mathematical statistics.
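In symbols (the notation is ours), for a trial region G, N point atoms with unit scattering factors placed independently and uniformly in G, and candidate regions G_1 to G_m, the quantity described above and the resulting choice can be written as:

$$F^{\mathrm{calc}}(h,k,l) = \sum_{j=1}^{N} \exp[2\pi i(hx_j + ky_j + lz_j)]$$

$$L(G) = \Pr\left\{\, |F^{\mathrm{calc}}(h,k,l)| \approx F^{\mathrm{obs}}(h,k,l) \text{ for all measured } (h,k,l) \;\middle|\; (x_j,y_j,z_j) \text{ uniform in } G,\ j = 1,\dots,N \,\right\}$$

$$G^{*} = \arg\max_{G \in \{G_1,\dots,G_m\}} L(G)$$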
Calculating the likelihood of a tested region exactly is a hard mathematical
problem. Nevertheless, an approximate likelihood value can be obtained relatively
simply with a Monte Carlo simulation: for a given tested region, many random
models are generated and the corresponding sets of structure factor magnitudes
are calculated. The likelihood is then estimated as the fraction of models whose
calculated magnitudes correlate highly with the observed ones.
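A minimal Monte Carlo sketch of this estimate follows. It assumes space group P1, point atoms with unit scattering factors, a known number of atoms, the 122 reflections with 0 < h² + k² + l² ≤ 9, and takes "high correlation" to mean a Pearson correlation of at least 0.5 between calculated and observed magnitudes. The hypothetical 20-atom "observed" molecule, the candidate regions and all names are illustrative; the region with the largest estimated likelihood is then selected.

```python
import cmath
import math
import random

# Low-resolution reflections (P1, Friedel mates included for simplicity).
HKLS = [(h, k, l) for h in range(-3, 4) for k in range(-3, 4) for l in range(-3, 4)
        if 0 < h * h + k * k + l * l <= 9]


def magnitudes(atoms, hkls=HKLS):
    """|F(h,k,l)| of point atoms with unit scattering factors."""
    return [abs(sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z))
                    for x, y, z in atoms)) for h, k, l in hkls]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def sphere_region(center, radius):
    """Sampler of points uniform inside a sphere (rejection sampling).
    No wrapping into the cell is needed: F is periodic in the coordinates."""
    def sample(rng):
        while True:
            d = [rng.uniform(-radius, radius) for _ in range(3)]
            if sum(c * c for c in d) <= radius * radius:
                return tuple(c + o for c, o in zip(center, d))
    return sample


def whole_cell(rng):
    """Sampler of points uniform over the whole unit cell."""
    return (rng.random(), rng.random(), rng.random())


def mc_likelihood(f_obs, region, n_atoms, n_trials=100, threshold=0.5, seed=1):
    """Share of random models drawn from `region` whose magnitudes
    correlate with f_obs at least as well as `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        model = [region(rng) for _ in range(n_atoms)]
        if correlation(magnitudes(model), f_obs) >= threshold:
            hits += 1
    return hits / n_trials


# Hypothetical "observed" data: a compact 20-atom molecule.
true_rng = random.Random(0)
molecule = [sphere_region((0.5, 0.5, 0.5), 0.15)(true_rng) for _ in range(20)]
f_obs = magnitudes(molecule)

candidates = {"compact sphere": sphere_region((0.3, 0.7, 0.2), 0.15),
              "whole cell": whole_cell}
scores = {name: mc_likelihood(f_obs, region, n_atoms=20)
          for name, region in candidates.items()}
best = max(scores, key=scores.get)
```

Because magnitudes do not change when the whole model is translated, in P1 the compact sphere scores well even though its centre differs from that of the molecule; what this toy test discriminates is the size and shape of the region. In space groups with symmetry, the position of the region relative to the symmetry elements also matters.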
We first applied this idea of statistical testing to select the best solution
from several alternative regions (of arbitrary shape) within the framework of the
FAM method [2,3]. The next development was an attempt to choose the molecular
envelope from a set of simple spherical regions [5]. Later, likelihood-based
testing of envelopes was incorporated into the general ab initio phasing
procedure [1, 4, 6, 9]. This procedure consists of several steps:
- generation of a large number of phase sets and calculation of a figure of
merit for each generated set;
- selection of the phase sets with a good figure of merit for further analysis;
- grouping of the selected sets into clusters of close sets and averaging of the
phases within each cluster.
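The three steps above can be sketched schematically as follows. The figure of merit is passed in as a function; in the usage lines at the end it is a deliberately artificial one (agreement with hidden "true" phases, hypothetical) that merely stands in for the likelihood-based criterion discussed next. The greedy clustering by mean absolute phase difference and the circular averaging of phases are our assumptions, and a real implementation would also have to reconcile phase sets that differ only by an origin shift or by the choice of enantiomorph, which this sketch ignores.

```python
import cmath
import math
import random


def phase_distance(p, q):
    """Mean absolute phase difference (radians, in [0, pi]) between two phase sets."""
    return sum(abs(cmath.phase(cmath.exp(1j * (a - b)))) for a, b in zip(p, q)) / len(p)


def circular_mean(angles):
    """Mean of angles computed from the sum of unit vectors."""
    return cmath.phase(sum(cmath.exp(1j * a) for a in angles))


def ab_initio_phasing(n_refl, figure_of_merit, n_sets=2000, n_keep=50,
                      max_distance=1.5, seed=0):
    rng = random.Random(seed)
    # Step 1: generate random phase sets and score each one.
    scored = []
    for _ in range(n_sets):
        phases = [rng.uniform(-math.pi, math.pi) for _ in range(n_refl)]
        scored.append((figure_of_merit(phases), phases))
    # Step 2: keep the sets with the best figure of merit.
    scored.sort(key=lambda item: item[0], reverse=True)
    selected = [phases for _, phases in scored[:n_keep]]
    # Step 3: greedy clustering; a set joins the first cluster whose first
    # member lies within max_distance, otherwise it starts a new cluster.
    clusters = []
    for phases in selected:
        for cluster in clusters:
            if phase_distance(phases, cluster[0]) < max_distance:
                cluster.append(phases)
                break
        else:
            clusters.append([phases])
    # Step 4: average phases reflection by reflection, largest cluster first.
    clusters.sort(key=len, reverse=True)
    return [[circular_mean(column) for column in zip(*cluster)] for cluster in clusters]


# Hypothetical toy figure of merit: agreement with hidden "true" phases.
true_phases = [0.3, -1.2, 2.5, 1.0, -2.8]


def toy_fom(phases):
    return sum(math.cos(a - t) for a, t in zip(phases, true_phases))


averaged = ab_initio_phasing(len(true_phases), toy_fom)
```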
The key point of this procedure is the choice of the selection criterion. The
idea of statistically testing potential envelopes can be turned into a selection
criterion as follows:
- the generated phase set is used, together with the observed magnitudes, to
calculate the corresponding Fourier synthesis;
- the region where the synthesis takes its highest values is taken as a trial
envelope;
- the likelihood of this region is calculated and used as the figure of merit of
the phase set.
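The criterion just described can be sketched as follows, under simplifying assumptions that are ours: space group P1, point atoms with unit scattering factors, reflections with 0 < h² + k² + l² ≤ 4, a synthesis sampled on an 8×8×8 grid, a trial envelope made of the 5% highest grid points, and a Monte Carlo likelihood estimated as the share of random models (atoms placed at random envelope grid points with a jitter inside the grid cell) whose magnitudes reach a Pearson correlation of 0.5 with the observed ones. All names, parameter values and the test molecule are hypothetical.

```python
import cmath
import math
import random
from itertools import product

# Very low-resolution reflections, space group P1 (Friedel mates included).
HKLS = [(h, k, l) for h, k, l in product(range(-2, 3), repeat=3)
        if 0 < h * h + k * k + l * l <= 4]


def structure_factors(atoms):
    """Complex structure factors of point atoms with unit scattering factors."""
    return [sum(cmath.exp(2j * math.pi * (h * x + k * y + l * z)) for x, y, z in atoms)
            for h, k, l in HKLS]


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def trial_envelope(f_obs, phases, n_grid=8, fraction=0.05):
    """Grid points where the synthesis from f_obs and the trial phases is highest."""
    def rho(p):
        # Real part of F exp(i phi) exp(-2 pi i (hx + ky + lz)), summed over reflections.
        return sum(f * math.cos(phi - 2 * math.pi * (h * p[0] + k * p[1] + l * p[2]))
                   for f, phi, (h, k, l) in zip(f_obs, phases, HKLS))
    grid = [(i / n_grid, j / n_grid, k / n_grid)
            for i, j, k in product(range(n_grid), repeat=3)]
    grid.sort(key=rho, reverse=True)
    return grid[:max(1, round(fraction * len(grid)))]


def figure_of_merit(f_obs, phases, n_atoms, n_grid=8, n_trials=50, threshold=0.5, seed=0):
    """Monte Carlo likelihood of the trial envelope defined by a phase set."""
    envelope = trial_envelope(f_obs, phases, n_grid)
    rng = random.Random(seed)
    half_step = 0.5 / n_grid
    hits = 0
    for _ in range(n_trials):
        # Random model: atoms at random envelope grid points, jittered within a grid cell.
        model = [tuple(c + rng.uniform(-half_step, half_step) for c in rng.choice(envelope))
                 for _ in range(n_atoms)]
        f_calc = [abs(f) for f in structure_factors(model)]
        if correlation(f_calc, f_obs) >= threshold:
            hits += 1
    return hits / n_trials


# Hypothetical test case: a compact 20-atom molecule centred at (0.5, 0.5, 0.5).
rng = random.Random(3)
molecule = []
while len(molecule) < 20:
    d = [rng.uniform(-0.1, 0.1) for _ in range(3)]
    if sum(c * c for c in d) <= 0.01:
        molecule.append(tuple(0.5 + c for c in d))
sf = structure_factors(molecule)
f_obs = [abs(f) for f in sf]
true_phases = [cmath.phase(f) for f in sf]
envelope = trial_envelope(f_obs, true_phases)
fom_true = figure_of_merit(f_obs, true_phases, n_atoms=20)
```

With the true phases the trial envelope is a compact blob around the molecule centre; the fraction of the cell taken as the envelope is a free parameter of this sketch and would in practice have to match the expected molecular volume.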
In the approaches discussed above it was assumed that an atom can occupy any
point of the tested region with equal probability; in other words, the choice
was made among probability distributions that are uniform within some region. In
a more general formulation, one may consider arbitrary probability distributions
of atomic coordinates and look for the one that gives the maximal likelihood
value. This problem is very hard to solve, so a simpler goal may be set as a
first step: to find a probability distribution q(x,y,z) whose likelihood L(q)
exceeds the likelihood of the distribution that is uniform over the whole unit
cell. To obtain such a distribution, it is enough to move slightly away from the
uniform distribution q_0 = const in the direction of the gradient of the
likelihood function:
$$q_\lambda(x,y,z) = q_0 + \lambda\,\nabla L(q_0)(x,y,z) \qquad (2)$$
One has L(q_λ) > L(const) if λ > 0 is small enough. Thus the gradient of the
likelihood function can serve as an approximate prior distribution of atomic
coordinates that retains some information about the studied object [2,4].
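The statement that a small step increases the likelihood follows from a first-order Taylor expansion. This is a sketch under our assumptions: L is differentiable, its gradient is taken with respect to the integral inner product over the cell and within normalised distributions (so that the gradient integrates to zero and q_λ still integrates to one), and q_0 = const:

$$q_\lambda = q_0 + \lambda\,\nabla L(q_0) \quad\Longrightarrow\quad L(q_\lambda) = L(q_0) + \lambda \int_{\text{cell}} \left[\nabla L(q_0)(x,y,z)\right]^2 dx\,dy\,dz + O(\lambda^2)$$

so L(q_λ) > L(q_0) for every sufficiently small λ > 0 unless ∇L(q_0) vanishes identically; for small λ, q_λ also stays non-negative because q_0 is strictly positive, provided the gradient is bounded.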
The main results of this work were summarised in T. Petrova's Ph.D. thesis
[7,8].
March 24, 2003, V. Lunin
Publications
[1] Lunin, V.Yu., Urzhumtsev, A.G. & Skovoroda, T.A. (1990). "Direct
low-resolution phasing from electron-density histograms in protein
crystallography". Acta Cryst. A46, 540-544.
[2] Lunin, V.Y. (1997). "The likelihood based choice of priors in statistical
approaches to the phase problem". In: Direct Methods for Solving Macromolecular
Structure, ed. S. Fortier, NATO ASI Series C, Vol. 507, 451-454.
[3] Lunin, V.Yu., Lunina, N.L., Petrova, T.E., Urzhumtsev, A.G. & Podjarny,
A.D. (1998). "On the Ab initio solution of the Phase Problem for Macromolecules
at Very Low Resolution. II. Generalized Likelihood Based Approach to Cluster
Discrimination". Acta Cryst. D54, 726-734.
[4] Petrova, T.E., Lunin, V.Yu., Lunina, N.L. & Skovoroda, T.P. (1999).
"Maximum Likelihood Approach to Choosing a Prior Distribution of Atomic
Coordinates in Macromolecular Structures". Biophysics, 44(1), 18-22.
[5] Petrova, T.E., Lunin, V.Y. & Podjarny, A.D. (1999). "A likelihood-based
search for the macromolecular position in the crystalline unit cell". Acta
Cryst. A55, 739-745.
[6] Petrova, T.E., Lunin, V.Y. & Podjarny, A.D. (2000). "Ab initio
low-resolution phasing in crystallography of macromolecules by maximization of
likelihood". Acta Cryst. D56, 1245-1252.
[7] Petrova, T.E. (2000). "Use of the maximum likelihood principle in the
solution of the phase problem in macromolecular crystallography". Abstract of
Ph.D. Thesis, Puschino, Russia. (In Russian)
[8] Petrova, T.E. (2000). "Use of the maximum likelihood principle in the
solution of the phase problem in macromolecular crystallography". Ph.D. Thesis,
ITEB RAS, Puschino, Russia. (In Russian)
[9] Lunin, V.Y., Lunina, N.L., Petrova, T.E., Skovoroda, T.P., Urzhumtsev, A.G.
& Podjarny, A.D. (2000). "Low-resolution ab initio phasing: problems and
advances". Acta Cryst. D56, 1223-1232.