# Phaistos

## Revision as of 16:28, 23 December 2011


# PHAISTOS

PHAISTOS is a Markov chain Monte Carlo framework for protein structure simulations. It contains a variety of both established and novel move types, and provides support for several force fields from the literature. In addition, an interface to the Muninn generalized ensemble package makes it easy to conduct multi-histogram based simulations, avoiding the convergence problems often associated with Metropolis-Hastings based sampling.

A unique feature of PHAISTOS is the use of probabilistic models to capture essential structural properties in proteins. These models are available both as proposal distributions (moves) and for likelihood evaluations (energies). This increases the flexibility when setting up a simulation, by allowing the user to choose how to incorporate the bias provided by these models in the simulation. For instance, similar to the use of fragment or rotamer libraries, using probabilistic models for sampling of backbone angles and sidechain angles corresponds to having an implicit energy term present in the simulation. Unlike with fragment and rotamer libraries, however, this term can be evaluated and compensated for if necessary when probabilistic models are used. PHAISTOS currently incorporates models for the CA-only representation of protein backbones (FB5HMM), full-atom backbones (TORUSDBN), full-atom sidechains (BASILISK), and single-mass sidechains (COMPAS).
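The implicit energy term described above can be illustrated with a toy Metropolis-Hastings sampler. The following is a minimal sketch and not PHAISTOS code: it assumes a single angle, a von Mises distribution standing in for a probabilistic model such as TORUSDBN, and a flat physical energy. The names `log_q`, `energy`, `run_chain`, and `mean_cos` are hypothetical. When the proposal is ignored in the acceptance test, the chain samples exp(-E) multiplied by the proposal density, so the model acts as an extra energy term. When the ratio of proposal densities is included, that bias is compensated for and the chain samples exp(-E) alone.

```python
import math
import random


def log_q(x, mu=0.0, kappa=2.0):
    """Unnormalized log-density of the von Mises proposal.

    Stand-in (assumption) for a probabilistic model of an angle.
    The normalization constant cancels in the acceptance ratio.
    """
    return kappa * math.cos(x - mu)


def energy(x):
    """Hypothetical physical energy in units of kT; flat in this toy case."""
    return 0.0


def run_chain(n_steps, compensate, rng, mu=0.0, kappa=2.0):
    """Metropolis-Hastings with an independence proposal drawn from the model.

    compensate=False: the proposal bias is left in (implicit energy term).
    compensate=True:  the proposal densities enter the acceptance ratio,
                      so the chain targets exp(-energy) only.
    """
    x = rng.vonmisesvariate(mu, kappa)
    samples = []
    for _ in range(n_steps):
        x_new = rng.vonmisesvariate(mu, kappa)
        log_ratio = -(energy(x_new) - energy(x))
        if compensate:
            # Hastings correction for an independence sampler: q(x) / q(x_new)
            log_ratio += log_q(x, mu, kappa) - log_q(x_new, mu, kappa)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = x_new
        samples.append(x)
    return samples


def mean_cos(samples, mu=0.0):
    """Average of cos(x - mu): near 0 for a uniform angle, larger when biased."""
    return sum(math.cos(s - mu) for s in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    print("bias kept:       ", mean_cos(run_chain(20000, False, rng)))
    print("bias compensated:", mean_cos(run_chain(20000, True, rng)))
```

With a flat energy, the uncompensated chain reproduces the proposal distribution itself, whereas the compensated chain should approach a uniform distribution over the angle.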

PHAISTOS also contains a highly efficient local move, CRISP, which is capable of locally resampling short stretches of the protein backbone without violating the local geometry of the chain. This move was recently demonstrated to outperform current state-of-the-art local move algorithms. In addition, it was shown that this move makes it possible to explore native ensembles of proteins with an efficiency similar to that of molecular dynamics.

Finally, PHAISTOS contains tools to conduct simulations under restraints from experimental data. In the current release, we have support for SAXS data and NMR chemical shift data, but this will be extended to other data types in future releases.

## Related Projects

- Muninn, a framework for conducting generalized ensemble simulations.
- Mocapy++, a C++ toolkit for inference and learning in dynamic Bayesian networks that supports directional statistics. Directional statistics is the statistics of angles and directions, which is especially useful for the formulation of probabilistic models of biomolecular structure. We used this toolkit to formulate and train our probabilistic models of protein structure.

## PHAISTOS-related references

- Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comput. Biol., 2(9): e131. Download pdf.
- Boomsma, W., Mardia, KV., Taylor, CC., Ferkinghoff-Borg, J., Krogh, A. and Hamelryck, T. (2008) A generative, probabilistic model of local protein structure. Proc. Natl. Acad. Sci. USA, 105, 8932-8937. Download pdf.
- Borg, M., Mardia, KV., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds University Press, Leeds, UK. Download pdf.
- Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, KE., Hamelryck, T. (2010) Beyond rotamers: A generative, probabilistic model of side chains in proteins. BMC Bioinformatics, 11:306. Download pdf.
- Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics, 11:429. Download pdf.
- Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010) Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS ONE, 5(11): e13714. Download pdf.
- Olsson, S., Boomsma, W., Frellsen, J., Bottaro, S., Harder, T., Ferkinghoff-Borg, J., Hamelryck, T. (2011) Generative probabilistic models extend the scope of inferential structure determination. J. Magn. Reson. 213(1), 182-6. Pubmed.
- Harder, T., Borg, M., Boomsma, W., Røgen, P., Hamelryck, T. (2011) Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics. Download pdf

## Acknowledgements

The development of PHAISTOS was made possible through grants from the Danish Council for Independent Research, the Danish Council for Strategic Research, the Novo Nordisk STAR Program, and Radiometer (DTU).