Transmembrane Helix Prediction

TMHMM is a method for predicting transmembrane helices based on a hidden Markov model, developed by Anders Krogh and Erik Sonnhammer.

Data sets

Membrane proteins

Our set of 160 membrane proteins was split into ten parts, which were used for cross-validation.

Each entry consists of three lines:
1. The Swiss-prot identifier (fasta style)
2. The protein sequence (one long line)
3. The assignment, preceded by `#', using `i' for inside (cytoplasmic side), `M' for helix, and `o' for outside (non-cytoplasmic side).
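
The three-line layout described above can be read with a few lines of Python. This is only a sketch under stated assumptions: that the identifier line starts with `>', that blank lines may be ignored, and that the assignment has exactly one label per residue. The names parse_entries and helix_segments and the example entry are hypothetical and are not part of the TMHMM distribution.

```python
import re
from typing import Iterator, List, NamedTuple, Tuple


class Entry(NamedTuple):
    identifier: str
    sequence: str
    labels: str  # one of 'i', 'M', 'o' per residue


def parse_entries(text: str) -> Iterator[Entry]:
    """Yield entries from text laid out as identifier / sequence / assignment."""
    # Assumption: blank lines carry no meaning and can be dropped.
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected three lines per entry")
    for i in range(0, len(lines), 3):
        header, sequence, annotation = lines[i:i + 3]
        # Assumption: "fasta style" means the identifier line starts with '>'.
        if not header.startswith(">"):
            raise ValueError(f"bad identifier line: {header!r}")
        if not annotation.startswith("#"):
            raise ValueError(f"bad assignment line: {annotation!r}")
        labels = annotation[1:].strip()
        if len(labels) != len(sequence):
            raise ValueError("assignment and sequence differ in length")
        if set(labels) - set("iMo"):
            raise ValueError("assignment contains labels other than i, M, o")
        yield Entry(header[1:].strip(), sequence, labels)


def helix_segments(labels: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) positions of each run of 'M'."""
    return [(m.start() + 1, m.end()) for m in re.finditer(r"M+", labels)]


# Made-up entry for illustration only; it is not taken from the data set.
EXAMPLE = """>EXAMPLE_PROT
MKTLLVAGGALLAVLPQ
#iiiMMMMMMMMMMoooo
"""

for entry in parse_entries(EXAMPLE):
    print(entry.identifier, helix_segments(entry.labels))  # EXAMPLE_PROT [(4, 13)]
```

Checking the assignment length against the sequence length catches truncated or shifted lines early, which matters because the labels are only meaningful position by position.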

Files: the whole set, or cross-validation partitions 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
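
As an illustration of how ten partitions are typically used for cross-validation, the sketch below pairs each held-out partition with the union of the other nine. It is a generic leave-one-partition-out loop, not the authors' training code; the function name leave_one_partition_out and the placeholder protein names are hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leave_one_partition_out(
    partitions: Sequence[Sequence[T]],
) -> Iterator[Tuple[int, List[T], List[T]]]:
    """Yield (held_out_index, training_items, test_items) for each partition."""
    for held_out, test in enumerate(partitions):
        # Training data is everything outside the held-out partition.
        train = [
            item
            for k, part in enumerate(partitions)
            if k != held_out
            for item in part
        ]
        yield held_out, train, list(test)


# Placeholder data: ten partitions of two made-up names each.
partitions = [[f"protein_{k}_{j}" for j in range(2)] for k in range(10)]

for k, train, test in leave_one_partition_out(partitions):
    # No test protein may appear in the training data for its own fold.
    assert not set(train) & set(test)
    print(f"fold {k}: {len(train)} training, {len(test)} test")
```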

Proteins with known structure

This set of 645 proteins from PDB has been used as a negative set (non-membrane) to test the discriminative power of TMHMM (submitted).

Models

For the server (TMHMM 1.0) we use a model trained on the complete set of 160 proteins.

Stuff from ISMB paper

Press here to see the predictions on the 160 proteins. These are cross-validated, i.e., the model used to predict the structure of a given protein was NOT trained on the partition of the data containing that protein. The format is like the sequence format above, except that lines are split. Lines preceded by `#' are the correct annotation and those preceded by `?0' are the prediction.

Cross-validation models: The models trained on the set of 160 proteins (used for the above predictions). There is one model for each test set:
Model 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
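
To compare annotation and prediction for one protein in the split-line format, one option is to join the `#' lines and the `?0' lines separately and count matching positions. The sketch below assumes that the pieces of each kind appear in sequence order and that both cover every residue; the exact way lines are split in the prediction file is not specified here, so treat this as an assumption. The function names and the example lines are hypothetical.

```python
from typing import Iterable, List


def join_prefixed(lines: Iterable[str], prefix: str) -> str:
    """Concatenate, in order, the payload of every line starting with prefix."""
    return "".join(
        ln[len(prefix):].strip() for ln in lines if ln.startswith(prefix)
    )


def per_residue_agreement(lines: List[str]) -> float:
    """Fraction of positions where the `?0' prediction equals the `#' annotation."""
    truth = join_prefixed(lines, "#")
    predicted = join_prefixed(lines, "?0")
    if not truth or len(truth) != len(predicted):
        raise ValueError("annotation and prediction must be non-empty and equal in length")
    matches = sum(t == p for t, p in zip(truth, predicted))
    return matches / len(truth)


# Made-up entry for illustration only; it is not taken from the prediction file.
example_lines = [
    ">EXAMPLE_PROT",
    "MKTLLVAGGA",
    "#iiiMMMMMMM",
    "?0iiiiMMMMMM",
    "LLAVLPQ",
    "#MMMoooo",
    "?0MMMMooo",
]

# The helix is predicted one residue too late, so 15 of 17 positions agree.
print(round(per_residue_agreement(example_lines), 3))  # 0.882
```

Per-residue agreement is only one possible summary; it does not by itself say whether the number of helices or the overall topology was predicted correctly.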

In the ISMB paper we used one more set of 83 membrane proteins: partition 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.


Last updated July 7 2000 by Anders Krogh, krogh@cbs.dtu.dk