Phaistos

Revision as of 10:28, 3 October 2011

3D Structure prediction

Our article on a probabilistic model of the C-alpha geometry of proteins made the cover of the September 2006 issue of PLoS Comp. Biol. More recently we also developed probabilistic models of the main chain (TorusDBN) and the side chains (BASILISK) in atomic detail. Together, these two models establish the first rigorous probabilistic model of protein structure in atomic detail and in continuous space. Such a model provides an attractive and rigorous alternative to the usual fragment and rotamer libraries.

One of the major unsolved problems in science today is the protein folding problem: given an amino acid sequence, predict the overall three-dimensional structure of the corresponding protein. It has been known since the seminal work of Christian B. Anfinsen in the early seventies that the sequence of a protein encodes its structure, but the exact details of the encoding still remain elusive. Since the protein folding problem is of enormous practical, theoretical and medical importance, and in addition forms a fascinating intellectual challenge, it is often called the holy grail of bioinformatics.

Some sampled conformations for the sequence A(15)V(15)A(15)...
A greedy collapse using the TorusDBN and radius of gyration as an energy function.
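The idea of a greedy collapse driven by the radius of gyration can be illustrated with a minimal Python sketch. It is not the code used to produce the figure: the function names are hypothetical, and `jitter_proposal` is a toy stand-in for proposals that would in practice be drawn from a model such as TorusDBN. It also ignores chain connectivity, which a real move set would preserve.

```python
import math
import random


def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid (equal masses assumed)."""
    n = len(coords)
    centroid = [sum(p[k] for p in coords) / n for k in range(3)]
    squared = sum(sum((p[k] - centroid[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(squared / n)


def jitter_proposal(coords, rng, scale=0.5):
    """Hypothetical toy move: perturb one randomly chosen point with Gaussian noise."""
    i = rng.randrange(len(coords))
    moved = tuple(x + rng.gauss(0.0, scale) for x in coords[i])
    return coords[:i] + [moved] + coords[i + 1:]


def greedy_collapse(initial, propose, steps, rng):
    """Accept a proposed conformation only if it lowers the radius of gyration."""
    current = initial
    current_rg = radius_of_gyration(current)
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_rg = radius_of_gyration(candidate)
        if candidate_rg < current_rg:  # greedy: never accept a less compact state
            current, current_rg = candidate, candidate_rg
    return current, current_rg


if __name__ == "__main__":
    rng = random.Random(0)
    # Fully extended toy chain with 3.8 spacing between consecutive points.
    extended = [(3.8 * i, 0.0, 0.0) for i in range(10)]
    collapsed, rg = greedy_collapse(extended, jitter_proposal, 200, rng)
    print(radius_of_gyration(extended), rg)
```

Because only improving moves are accepted, the radius of gyration never increases during the run; the price is that such a search can get stuck in the first compact state it finds.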

We are tackling the protein structure prediction problem from an original angle. Our group develops sophisticated probabilistic models that describe various aspects of protein structure, and uses these models in the prediction of structure from sequence. These probabilistic models are mainly based on two key ingredients: graphical models (and more precisely dynamic Bayesian networks), which are powerful machine learning methods, and directional statistics, the statistics of angles, directions and orientations.
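To see why angles need their own statistics, consider the following minimal Python sketch. It uses only the standard library; the function names are illustrative and are not part of any of our software. A single univariate von Mises distribution stands in here for the richer angular distributions that a full model of protein structure would use.

```python
import math
import random


def circular_mean(angles):
    """Mean direction of angles (radians), computed from the summed unit vectors."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def angular_difference(a, b):
    """Signed smallest difference a - b, wrapped to [-pi, pi]."""
    d = a - b
    return math.atan2(math.sin(d), math.cos(d))


def sample_von_mises(mu, kappa, n, rng):
    """Draw n angles from a von Mises distribution, wrapped to [-pi, pi]."""
    two_pi = 2.0 * math.pi
    return [
        angular_difference(rng.vonmisesvariate(mu % two_pi, kappa), 0.0)
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Two angles on either side of the +/-180 degree boundary.
    pair = [math.radians(179.0), math.radians(-179.0)]
    print(math.degrees(sum(pair) / len(pair)))   # naive arithmetic mean
    print(math.degrees(circular_mean(pair)))     # circular mean

    rng = random.Random(1)
    samples = sample_von_mises(math.radians(-60.0), 50.0, 2000, rng)
    print(math.degrees(circular_mean(samples)))
```

The arithmetic mean of 179 and -179 degrees points in exactly the wrong direction, whereas the circular mean lands on the boundary where both angles actually lie. The same wrap-around issue affects variances, densities and sampling, which is why models of dihedral angles are built on distributions defined on the circle or torus.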

It is important to note that these probabilistic models are not black box methods: they can be rigorously interpreted and used in the framework of physics, and more specifically statistical mechanics. In other words, we follow the view of Edwin T. Jaynes, who showed that statistical mechanics can be seen as a form of statistical inference based on partial information, rather than a physical theory.
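The standard textbook illustration of this view is the maximum-entropy derivation of the Boltzmann distribution, sketched below. The symbols $p_i$, $E_i$, $\beta$ and $Z$ are introduced here purely for illustration.

```latex
\begin{align}
\text{maximize}\quad & S = -\sum_i p_i \ln p_i \\
\text{subject to}\quad & \sum_i p_i = 1, \qquad \sum_i p_i E_i = \langle E \rangle \\
\Rightarrow\quad & p_i = \frac{e^{-\beta E_i}}{Z}, \qquad Z = \sum_j e^{-\beta E_j}
\end{align}
```

Here $\beta$ is the Lagrange multiplier associated with the constraint on the mean energy. The Boltzmann distribution thus appears as the least biased distribution consistent with knowing only the average energy, which is an inference from partial information.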

We are steadily working on Phaistos, a framework for all-atom Monte Carlo simulations of proteins. It incorporates the probabilistic models mentioned above, which can be used for conformational sampling, for probabilistic prediction of protein structure, or to formulate knowledge-based energy terms. In addition, Phaistos includes the OPLS and PROFASI force fields and state-of-the-art move algorithms to explore the conformational space. Finally, we are also adding support for experimental data, including small-angle X-ray scattering and NMR data (see our recent article in the Journal of Magnetic Resonance: http://www.sciencedirect.com/science/article/pii/S1090780711003090).
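For readers unfamiliar with Monte Carlo simulation, the following minimal Python sketch shows a generic Metropolis loop. It does not use or reflect the Phaistos API: all names are hypothetical, the state is a single angle, the energy is a toy function, and the proposal is assumed to be symmetric.

```python
import math
import random


def metropolis_accept(energy_old, energy_new, kT, rng):
    """Metropolis criterion for a symmetric proposal at temperature kT."""
    if energy_new <= energy_old:
        return True
    return rng.random() < math.exp(-(energy_new - energy_old) / kT)


def run_monte_carlo(state, energy, propose, steps, kT, rng):
    """Generic Metropolis loop; returns final state, its energy and the acceptance count."""
    current_energy = energy(state)
    accepted = 0
    for _ in range(steps):
        candidate = propose(state, rng)
        candidate_energy = energy(candidate)
        if metropolis_accept(current_energy, candidate_energy, kT, rng):
            state, current_energy = candidate, candidate_energy
            accepted += 1
    return state, current_energy, accepted


def toy_energy(angle):
    """Hypothetical one-dimensional energy with its minimum at angle 0."""
    return -math.cos(angle)


def toy_move(angle, rng, step=0.3):
    """Hypothetical symmetric move: add Gaussian noise to the angle."""
    return angle + rng.gauss(0.0, step)


if __name__ == "__main__":
    rng = random.Random(42)
    final_angle, final_energy, n_accepted = run_monte_carlo(
        3.0, toy_energy, toy_move, 5000, 0.05, rng
    )
    print(final_angle, final_energy, n_accepted)
```

In a real simulation the state would be a full protein conformation, the moves would change dihedral angles or local chain segments, and the energy would come from a force field or from probabilistic models; a proposal that is not symmetric additionally requires a correction factor in the acceptance probability.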

Software

  • Mocapy++, a C++ toolkit for inference and learning in dynamic Bayesian networks that supports directional statistics. Directional statistics is the statistics of angles and directions, which is especially useful for the formulation of probabilistic models of biomolecular structure. We used this toolkit to formulate and train our probabilistic models of protein structure.
  • Phaistos, our framework for Monte Carlo simulations of proteins.