# Phaistos

*Figure: Our article on a probabilistic model of the C-alpha geometry of proteins made the cover of the September 2006 issue of PLoS Computational Biology.*

*Figure: A greedy collapse using the TorusDBN and radius of gyration as an energy function.*

PHAISTOS is a Markov chain Monte Carlo framework for protein structure simulations. It contains a variety of established and novel move types and supports several force fields from the literature. In addition, an interface to the Muninn generalized-ensemble package makes it easy to conduct multi-histogram-based simulations, avoiding the convergence problems often associated with standard Metropolis-Hastings sampling.
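The basic Metropolis loop that such a framework builds on, with pluggable "moves" and "energies", can be sketched as follows. This is a minimal, generic Python illustration, not PHAISTOS code; `metropolis`, `double_well` and `random_walk` are hypothetical names, and the one-dimensional double-well energy is a toy stand-in for a force field. It also shows the convergence problem mentioned above: at low temperature the chain stays in the well it starts in.

```python
import math
import random
from typing import Callable, List


def metropolis(energy: Callable[[float], float],
               move: Callable[[float, random.Random], float],
               x0: float, kT: float, n_steps: int, seed: int = 0) -> List[float]:
    """Metropolis sampling with a symmetric move; returns the visited states."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        x_new = move(x, rng)
        e_new = energy(x_new)
        # Accept with probability min(1, exp(-(E_new - E_old) / kT)).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
        samples.append(x)
    return samples


def double_well(x: float, barrier: float = 10.0) -> float:
    """Toy energy: minima at x = -1 and x = +1, barrier of height `barrier` at x = 0."""
    return barrier * (x * x - 1.0) ** 2


def random_walk(x: float, rng: random.Random, step: float = 0.5) -> float:
    """Symmetric uniform displacement, so no proposal correction is needed."""
    return x + rng.uniform(-step, step)


if __name__ == "__main__":
    for kT in (0.5, 5.0):
        xs = metropolis(double_well, random_walk, x0=-1.0, kT=kT, n_steps=50000)
        right = sum(x > 0 for x in xs) / len(xs)
        print(f"kT = {kT}: fraction of samples in the right-hand well = {right:.2f}")
```

At kT = 0.5 the barrier is 20 kT high and the chain effectively never crosses it, whereas at kT = 5.0 it moves between both wells. Generalized-ensemble methods address exactly this kind of trapping by letting the simulation visit higher-energy regions in a controlled way.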

A unique feature of PHAISTOS is its use of probabilistic models to capture essential structural properties of proteins. These models are available both as proposal distributions (moves) and for likelihood evaluations (energies). This gives the user flexibility when setting up a simulation, since they can choose how the bias provided by these models enters the simulation. For instance, as with fragment or rotamer libraries, using probabilistic models to sample backbone and sidechain angles corresponds to having an implicit energy term in the simulation. Unlike fragment and rotamer libraries, however, a probabilistic model assigns an explicit probability to every proposed conformation, so this term can be evaluated and, if necessary, compensated for. PHAISTOS currently incorporates models for the CA-only representation of protein backbones (FB5HMM), full-atom backbones (TORUSDBN), full-atom sidechains (BASILISK), and single-mass sidechains (COMPAS).
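The implicit energy term, and how it can be compensated for, can be illustrated with a one-angle toy model. In this sketch a von Mises distribution stands in for a structural model such as FB5HMM or TORUSDBN (an assumption for illustration only), and every proposal is drawn from it. Without compensation the chain samples a density proportional to exp(-E/kT)·q(θ), i.e. it carries an extra energy term -kT·log q(θ); adding the Metropolis-Hastings ratio q(old)/q(new) removes it. The function names are hypothetical and do not reflect the PHAISTOS API.

```python
import math
import random
from typing import Callable, List


def vonmises_logq(theta: float, mu: float, kappa: float) -> float:
    """Unnormalized von Mises log-density; the normalizer cancels in the acceptance ratio."""
    return kappa * math.cos(theta - mu)


def independence_sampler(energy: Callable[[float], float], mu: float, kappa: float,
                         kT: float, n_steps: int, compensate: bool,
                         seed: int = 0) -> List[float]:
    """Sample one angle, drawing every proposal from a von Mises 'structural model'."""
    rng = random.Random(seed)
    theta = rng.vonmisesvariate(mu, kappa)
    e = energy(theta)
    samples = []
    for _ in range(n_steps):
        proposal = rng.vonmisesvariate(mu, kappa)
        e_new = energy(proposal)
        log_alpha = -(e_new - e) / kT
        if compensate:
            # Hastings correction log q(old) - log q(new) cancels the proposal bias.
            log_alpha += vonmises_logq(theta, mu, kappa) - vonmises_logq(proposal, mu, kappa)
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            theta, e = proposal, e_new
        samples.append(theta)
    return samples


def mean_cos(angles: List[float]) -> float:
    return sum(math.cos(a) for a in angles) / len(angles)


if __name__ == "__main__":
    flat = lambda theta: 0.0  # no force field: the intended target is uniform on the circle
    for compensate in (False, True):
        xs = independence_sampler(flat, mu=0.0, kappa=1.0, kT=1.0,
                                  n_steps=20000, compensate=compensate)
        print(f"compensate={compensate}: <cos(theta)> = {mean_cos(xs):.3f}")
```

With a flat energy, the uncompensated chain simply reproduces the proposal model, so the average of cos θ should be close to I1(1)/I0(1) ≈ 0.45, while the compensated chain should give a value close to 0, as expected for a uniform distribution.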

PHAISTOS also contains a highly efficient local move, CRISP, which resamples short stretches of the protein backbone without violating the local geometry of the chain. This move was recently shown (Bottaro et al., 2012) to outperform current state-of-the-art local move algorithms, and to explore the native ensembles of proteins with an efficiency similar to that of molecular dynamics.
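CRISP itself is considerably more involved than can be shown here. As a much simpler illustration of the general idea of a local move (changing a short stretch of the chain while leaving the rest in place), the sketch below implements a classic crankshaft move on a CA-only trace. This is a swapped-in textbook technique, not CRISP, and all names are hypothetical: atom i is rotated about the axis through atoms i-1 and i+1 using Rodrigues' rotation formula.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(v: Vec, k: Vec, phi: float) -> Vec:
    """Rodrigues' rotation of v by angle phi about the unit vector k."""
    c, s = math.cos(phi), math.sin(phi)
    kxv, kdv = cross(k, v), dot(k, v)
    return tuple(v[i] * c + kxv[i] * s + k[i] * kdv * (1.0 - c) for i in range(3))


def crankshaft_move(chain: List[Vec], i: int, phi: float) -> List[Vec]:
    """Rotate atom i by phi about the axis through atoms i-1 and i+1; other atoms stay put."""
    if not 0 < i < len(chain) - 1:
        raise ValueError("a crankshaft move needs an interior atom")
    a, b = chain[i - 1], chain[i + 1]
    axis = sub(b, a)
    length = math.sqrt(dot(axis, axis))
    k = (axis[0] / length, axis[1] / length, axis[2] / length)
    new_chain = list(chain)
    new_chain[i] = add(a, rotate_about_axis(sub(chain[i], a), k, phi))
    return new_chain


if __name__ == "__main__":
    ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.8, 3.6, 1.0), (10.0, 7.2, 0.5)]
    moved = crankshaft_move(ca, 2, 1.0)
    for j in range(len(ca) - 1):
        # Each pseudo-bond length is identical before and after the move.
        print(j, round(math.dist(ca[j], ca[j + 1]), 6), round(math.dist(moved[j], moved[j + 1]), 6))
```

The move keeps both pseudo-bond lengths and the angle at atom i, but it generally changes the angles at atoms i-1 and i+1, so its guarantee is weaker than the one described for CRISP above.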

Finally, PHAISTOS contains tools for conducting simulations under restraints from experimental data. The current release supports SAXS data and NMR chemical shift data; support for other data types will be added in future releases.
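A common way to turn experimental data into a restraint is to compute the observable from the model and score its agreement with the measurement as an energy. The sketch below assumes a SAXS restraint built from the Debye formula with constant form factors (a simplifying assumption; realistic form factors depend on q) and a Gaussian likelihood, whose negative logarithm is, up to a constant, half the chi-square after fitting an overall scale factor. It is an illustration only, not the energy term used in PHAISTOS, and the function names are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def debye_intensity(coords: Sequence[Vec], q: float,
                    form_factors: Optional[Sequence[float]] = None) -> float:
    """Debye formula I(q) = sum_i sum_j f_i f_j sin(q r_ij) / (q r_ij)."""
    n = len(coords)
    f = list(form_factors) if form_factors is not None else [1.0] * n
    total = 0.0
    for i in range(n):
        total += f[i] * f[i]  # i == j term, where sin(x)/x -> 1
        for j in range(i + 1, n):
            x = q * math.dist(coords[i], coords[j])
            sinc = math.sin(x) / x if x > 1e-12 else 1.0
            total += 2.0 * f[i] * f[j] * sinc  # each unordered pair appears twice
    return total


def saxs_restraint_energy(i_obs: Sequence[float], sigma: Sequence[float],
                          i_calc: Sequence[float]) -> float:
    """Gaussian negative log-likelihood (chi^2 / 2, up to a constant) with a fitted scale."""
    w = [1.0 / (s * s) for s in sigma]
    # Weighted least-squares scale factor: c = sum(w*obs*calc) / sum(w*calc^2).
    c = (sum(wk * o * m for wk, o, m in zip(w, i_obs, i_calc))
         / sum(wk * m * m for wk, m in zip(w, i_calc)))
    chi2 = sum(wk * (o - c * m) ** 2 for wk, o, m in zip(w, i_obs, i_calc))
    return 0.5 * chi2


if __name__ == "__main__":
    beads = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (0.0, 3.8, 0.0)]
    qs = [0.05 * k for k in range(1, 11)]
    calc = [debye_intensity(beads, q) for q in qs]
    observed = [2.5 * v for v in calc]  # synthetic data differing only by a scale factor
    print(saxs_restraint_energy(observed, [0.1] * len(qs), calc))
```

Because the scale factor is fitted, data that differ from the computed curve only by an overall factor give an energy of essentially zero.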

## Related Projects

- Muninn, a framework for conducting generalized-ensemble simulations.
- Mocapy++, a C++ toolkit for inference and learning in dynamic Bayesian networks that supports directional statistics. Directional statistics is the statistics of angles and directions, which is especially useful for formulating probabilistic models of biomolecular structure. We used this toolkit to formulate and train our probabilistic models of protein structure.
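Why angles need their own statistics can be shown in a few lines: the arithmetic mean of two dihedral angles at +179° and -179° is 0°, the opposite direction, while the circular mean correctly gives 180°. The sketch also includes the density of the von Mises distribution, the circular analogue of the normal distribution. It is a minimal illustration with hypothetical function names and does not use or reproduce the Mocapy++ API.

```python
import math
from typing import List


def circular_mean(angles: List[float]) -> float:
    """Mean direction of angles given in radians, returned in [-pi, pi]."""
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)


def bessel_i0(x: float, terms: int = 50) -> float:
    """Modified Bessel function I0 via its power series sum_k (x/2)^(2k) / (k!)^2."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def vonmises_logpdf(theta: float, mu: float, kappa: float) -> float:
    """Log-density of the von Mises distribution with mean mu and concentration kappa."""
    return kappa * math.cos(theta - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


if __name__ == "__main__":
    phis = [math.radians(179.0), math.radians(-179.0)]
    print("arithmetic mean:", math.degrees(sum(phis) / len(phis)))
    print("circular mean:  ", math.degrees(circular_mean(phis)))
```

The power series for I0 is adequate for the moderate concentrations used here; for very large kappa a library implementation of the Bessel function would be the safer choice.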

## PHAISTOS-related references

- Hamelryck, T., Kent, J., Krogh, A. (2006) Sampling realistic protein conformations using local structural bias. PLoS Comput. Biol., 2(9): e131.
- Boomsma, W., Mardia, K.V., Taylor, C.C., Ferkinghoff-Borg, J., Krogh, A., Hamelryck, T. (2008) A generative, probabilistic model of local protein structure. Proc. Natl. Acad. Sci. USA, 105, 8932-8937.
- Borg, M., Mardia, K.V., Boomsma, W., Frellsen, J., Harder, T., Stovgaard, K., Ferkinghoff-Borg, J., Røgen, P., Hamelryck, T. A probabilistic approach to protein structure prediction: PHAISTOS in CASP9. LASR 2009 - Statistical tools for challenges in bioinformatics, pp. 65-70. Leeds University Press, Leeds, UK.
- Harder, T., Boomsma, W., Paluszewski, M., Frellsen, J., Johansson, K.E., Hamelryck, T. (2010) Beyond rotamers: a generative, probabilistic model of side chains in proteins. BMC Bioinformatics, 11:306.
- Stovgaard, K., Andreetta, C., Ferkinghoff-Borg, J., Hamelryck, T. (2010) Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics, 11:429.
- Hamelryck, T., Borg, M., Paluszewski, M., Paulsen, J., Frellsen, J., Andreetta, C., Boomsma, W., Bottaro, S., Ferkinghoff-Borg, J. (2010) Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS ONE, 5(11): e13714.
- Olsson, S., Boomsma, W., Frellsen, J., Bottaro, S., Harder, T., Ferkinghoff-Borg, J., Hamelryck, T. (2011) Generative probabilistic models extend the scope of inferential structure determination. J. Magn. Reson., 213(1), 182-186.
- Harder, T., Borg, M., Boomsma, W., Røgen, P., Hamelryck, T. (2011) Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics.
- Bottaro, S., Boomsma, W., Johansson, K.E., Andreetta, C., Hamelryck, T., Ferkinghoff-Borg, J. (2012) Subtle Monte Carlo updates in dense molecular systems. Journal of Chemical Theory and Computation, 8(2), 695-702.
- Harder, T., Borg, M., Bottaro, S., Boomsma, W., Olsson, S., Ferkinghoff-Borg, J., Hamelryck, T. (2012) An efficient null model for conformational fluctuations in proteins. Structure, 20, 1028-1039.
- Boomsma, W., Frellsen, J., Harder, T., Bottaro, S., Johansson, K.E., Tian, P., Stovgaard, K., Andreetta, C., Olsson, S., Valentin, J.B., Antonov, L.D., Christensen, A.S., Borg, M., Jensen, J.H., Lindorff-Larsen, K., Ferkinghoff-Borg, J., Hamelryck, T. (2013) PHAISTOS: A framework for Markov chain Monte Carlo simulation and inference of protein structure. Journal of Computational Chemistry, doi:10.1002/jcc.23292.

## Acknowledgements

The development of PHAISTOS was made possible through grants from the Danish Council for Independent Research, the Danish Council for Strategic Research, the Novo Nordisk STAR Program, and Radiometer (DTU).