
Revision as of 11:20, 26 June 2008

3D Structure prediction

Image:Plos cover ham06.png
Our article on a probabilistic model of protein structure (C-alpha only) made the cover of the September 2006 issue of PLoS Computational Biology. A second article (full backbone) was recently published in PNAS (June 2008).

One of the major unsolved problems in modern-day molecular biology is the protein folding problem: given an amino acid sequence, predict the overall three-dimensional structure of the corresponding protein. It has been known since the seminal work of Christian B. Anfinsen in the early seventies that the sequence of a protein encodes its structure, but the exact details of the encoding remain elusive. Because the protein folding problem is of enormous practical, theoretical and medical importance, and is also a fascinating intellectual challenge, it is often called the holy grail of bioinformatics.

Some sampled conformations for the sequence A(15)V(15)A(15)...
A greedy collapse using the TorusDBN and radius of gyration as an energy function.
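
The greedy collapse described in the caption above can be illustrated with a small sketch. This is not the Phaistos code: the uniform random bond direction used as a proposal is only a stand-in for sampling from TorusDBN, the chain is a simplified bead model with a fixed bond length of 3.8 Å and no steric clashes, and all function names (`random_unit_vector`, `build_chain`, `radius_of_gyration`, `greedy_collapse`) are hypothetical. The only part taken from the caption is the idea itself: propose a change, and keep it only if the radius of gyration, used as the energy, decreases.

```python
import math
import random

BOND_LENGTH = 3.8  # approximate distance between consecutive C-alpha atoms (angstroms)


def random_unit_vector(rng):
    """Draw a direction uniformly on the unit sphere.

    Assumption: this is a placeholder proposal; the real method would
    draw backbone angles from TorusDBN instead.
    """
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def build_chain(directions):
    """Place beads by walking along the bond directions from the origin."""
    coords = [(0.0, 0.0, 0.0)]
    for dx, dy, dz in directions:
        x, y, z = coords[-1]
        coords.append((x + BOND_LENGTH * dx,
                       y + BOND_LENGTH * dy,
                       z + BOND_LENGTH * dz))
    return coords


def radius_of_gyration(coords):
    """Root-mean-square distance of the beads from their centroid."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    sq = sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 + (p[2] - cz) ** 2
             for p in coords)
    return math.sqrt(sq / n)


def greedy_collapse(n_bonds, n_steps, seed=0):
    """Greedy minimisation: accept a proposal only if it lowers the energy."""
    rng = random.Random(seed)
    directions = [random_unit_vector(rng) for _ in range(n_bonds)]
    best = radius_of_gyration(build_chain(directions))
    history = [best]
    for _ in range(n_steps):
        i = rng.randrange(n_bonds)
        old = directions[i]
        directions[i] = random_unit_vector(rng)  # stand-in for a TorusDBN sample
        rg = radius_of_gyration(build_chain(directions))
        if rg < best:
            best = rg              # keep the more compact conformation
        else:
            directions[i] = old    # reject and restore the previous state
        history.append(best)
    return directions, history
```

Calling `greedy_collapse(n_bonds=44, n_steps=5000)` returns the final bond directions and the radius of gyration after each step, which can only stay the same or decrease. Because moves that increase the energy are never accepted, a greedy scheme like this can get stuck in a local minimum.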

We are tackling the protein structure prediction problem from an original angle. Our group develops sophisticated probabilistic models that describe various aspects of protein structure, and uses these models in the prediction of structure from sequence. These probabilistic models are mainly based on two key ingredients: graphical models (including dynamic Bayesian networks and factor graphs), which are powerful machine learning methods, and directional statistics, the statistics of angles, directions and orientations.
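
To see why angles need their own statistics, consider the following minimal sketch. It is only an illustration and not part of Mocapy or Phaistos: the helper names (`wrap`, `circular_mean`, `mean_resultant_length`) and the parameter values are assumptions chosen for the example. It draws angles from a von Mises distribution, a common analogue of the normal distribution on the circle, centred at π. Once the samples are wrapped to [-π, π) they sit on both sides of the ±π boundary, so the ordinary arithmetic mean lands near 0, on the opposite side of the circle, while the circular mean recovers the true direction.

```python
import math
import random


def wrap(angle):
    """Map an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def circular_mean(angles):
    """Direction of the summed unit vectors (cos a, sin a)."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def mean_resultant_length(angles):
    """Length of the averaged unit vector: 1 = concentrated, 0 = spread out."""
    n = len(angles)
    s = sum(math.sin(a) for a in angles) / n
    c = sum(math.cos(a) for a in angles) / n
    return math.hypot(s, c)


# Illustrative parameters (assumptions): mean direction pi, concentration 8.
rng = random.Random(0)
samples = [wrap(rng.vonmisesvariate(math.pi, 8.0)) for _ in range(2000)]

arithmetic = sum(samples) / len(samples)   # misleading: close to 0
circular = circular_mean(samples)          # close to +pi or -pi (the same direction)
concentration = mean_resultant_length(samples)
```

The same issue arises for the dihedral angles of a protein backbone, which is why models of local structure are naturally formulated with distributions on the circle or the torus rather than with Gaussians on the real line.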

It is important to note that these probabilistic models are not black-box methods: they can be rigorously interpreted and used in the framework of physics, and more specifically statistical mechanics. In other words, we follow the view of Edwin T. Jaynes, who showed that statistical mechanics can be seen as a form of statistical inference based on partial information, rather than as a physical theory.
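
The standard textbook illustration of Jaynes' view is the maximum entropy derivation of the Boltzmann distribution, sketched here for a discrete set of states. The notation ($p$, $E$, $\beta$, $Z$) is ours and is not taken from the articles mentioned above. If all that is known about a system is its average energy, the least committal distribution over states $x$ is the one that maximises the entropy subject to that constraint:

```latex
\begin{align}
  \text{maximise}\quad & S[p] = -\sum_x p(x)\,\ln p(x) \\
  \text{subject to}\quad & \sum_x p(x) = 1,
  \qquad \sum_x p(x)\,E(x) = \langle E \rangle .
\end{align}
Introducing Lagrange multipliers $\lambda$ and $\beta$ and setting the
derivative with respect to $p(x)$ to zero gives
\begin{align}
  -\ln p(x) - 1 - \lambda - \beta E(x) &= 0 \\
  \Longrightarrow\quad p(x) &= \frac{e^{-\beta E(x)}}{Z},
  \qquad Z = \sum_x e^{-\beta E(x)} .
\end{align}
```

The Boltzmann distribution thus appears as the result of inference from partial information (the mean energy), with $\beta$ fixed by the constraint, rather than as a postulate about the physical system.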

We are steadily working on a program for ab initio protein structure prediction based on probabilistic models, called Phaistos. Some software is already freely available; see below.


  • Mocapy, a toolkit for inference and learning in Dynamic Bayesian Networks.
  • Phaistos, currently consisting of code to do fast, probabilistic conformational sampling using TorusDBN.