At most 10,000 sequences and 4,000,000 amino acids per submission; each sequence not more than 8,000 amino acids.
The sequences are kept confidential and will be deleted after processing.
This server is for prediction of transmembrane helices in proteins.
TMHMM has been rated best in an independent comparison of programs
for prediction of TM helices:
- S. Moller, M.D.R. Croning, R. Apweiler.
Evaluation of methods for the prediction of membrane spanning regions.
Bioinformatics, 17(7):646-653, July 2001.
Quote from the abstract:
`Our results show that TMHMM is currently the best performing transmembrane
TMHMM is described in
- A. Krogh, B. Larsson, G. von Heijne, and E. L. L. Sonnhammer.
Predicting transmembrane protein topology with a hidden
Markov model: Application to complete genomes.
Journal of Molecular Biology, 305(3):567-580, January 2001.
- E. L. L. Sonnhammer, G. von Heijne, and A. Krogh.
A hidden Markov model for predicting transmembrane
helices in protein sequences.
In J. Glasgow, T. Littlejohn, F. Major, R. Lathrop, D. Sankoff, and C. Sensen,
editors, Proceedings of the Sixth International Conference on
Intelligent Systems for Molecular Biology, pages 175-182, Menlo Park,
CA, 1998. AAAI Press.
The program takes proteins in FASTA format. It recognizes
the 20 amino acids plus B, Z, and X; the latter three are all treated equally as unknown.
Any other character is changed to X, so please make sure the sequences
are sensible proteins.
This is an example (one protein):
>5H2A_CRIGR you can have comments after the ID
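The input handling described above can be sketched in Python. This is a minimal illustration, not the server's actual code: the function names `read_fasta` and `sanitize` are hypothetical, and upper-casing the input before the check is an assumption, since the text does not say how lowercase letters are treated.

```python
# Sketch only: mirrors the character handling described in the text.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")
ACCEPTED = STANDARD_AA | set("BZX")  # B, Z and X are accepted but count as unknown


def read_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records = []
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                records.append((seq_id, "".join(chunks)))
            # The identifier is the first word; the rest of the line is a comment.
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            chunks = []
        elif seq_id is not None:
            chunks.append(line)
    if seq_id is not None:
        records.append((seq_id, "".join(chunks)))
    return records


def sanitize(sequence):
    """Upper-case the sequence (an assumption) and map unaccepted characters to X."""
    return "".join(c if c in ACCEPTED else "X" for c in sequence.upper())
```

Running `sanitize` on your own sequences before submission shows how many residues would be turned into X, which is a quick check that the input really is a protein.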
How to run it
Either give the name of the local file in which you have the proteins in
the top half of the window, or paste the sequence(s) into the lower part
of the window. Then press `Submit'. (It should be possible to both
give it a local file and paste sequences if you really want.)
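Before submitting, it can save time to check a batch against the submission limits stated at the top of this page (at most 10,000 sequences, 4,000,000 amino acids in total, and 8,000 amino acids per sequence). The following is a minimal sketch; the helper name `check_limits` is hypothetical and not part of the server.

```python
MAX_SEQUENCES = 10_000
MAX_TOTAL_RESIDUES = 4_000_000
MAX_SEQUENCE_LENGTH = 8_000


def check_limits(lengths):
    """Return a list of problems for a batch, given the length of each sequence."""
    problems = []
    if len(lengths) > MAX_SEQUENCES:
        problems.append(
            f"{len(lengths)} sequences exceed the limit of {MAX_SEQUENCES}"
        )
    total = sum(lengths)
    if total > MAX_TOTAL_RESIDUES:
        problems.append(
            f"{total} amino acids exceed the limit of {MAX_TOTAL_RESIDUES}"
        )
    for index, n in enumerate(lengths, start=1):
        if n > MAX_SEQUENCE_LENGTH:
            problems.append(
                f"sequence {index} has {n} amino acids; the limit is {MAX_SEQUENCE_LENGTH}"
            )
    return problems
```

An empty list means the batch fits within all three limits.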
There are two output formats: long and short.
Long output format
For the long format (default), tmhmm
gives some statistics and a list of the location of the predicted transmembrane
helices and the predicted location of the intervening loop regions.
Here is an example:
# COX2_BACSU Length: 278
# COX2_BACSU Number of predicted TMHs: 3
# COX2_BACSU Exp number of AAs in TMHs: 68.6888999999999
# COX2_BACSU Exp number, first 60 AAs: 39.8875
# COX2_BACSU Total prob of N-in:
# COX2_BACSU POSSIBLE N-term signal sequence
inside 1 6
TMhelix 7 29
outside 30 43
TMhelix 44 66
inside 67 86
TMhelix 87 109
outside 110 278
If the whole sequence is labeled as inside or outside, the prediction
is that it contains no membrane
helices. It is probably not wise to interpret it as a prediction
of location. The prediction gives the most probable location and orientation
of transmembrane helices in the sequence. It is found by an algorithm called
N-best (or 1-best in this case) that sums over all paths through the model
with the same location and direction of the helices.
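For downstream processing, the long format can be read with a few lines of Python. This sketch assumes the layout shown in the example above (statistics on lines starting with `#`, then one segment per line ending in a label, a start, and an end); the function name `parse_long_output` is hypothetical. Taking the last three fields of each segment line means extra leading columns, if any are present, are ignored.

```python
EXAMPLE = """\
# COX2_BACSU Length: 278
# COX2_BACSU Number of predicted TMHs: 3
# COX2_BACSU Exp number of AAs in TMHs: 68.6888999999999
# COX2_BACSU Exp number, first 60 AAs: 39.8875
# COX2_BACSU POSSIBLE N-term signal sequence
inside 1 6
TMhelix 7 29
outside 30 43
TMhelix 44 66
inside 67 86
TMhelix 87 109
outside 110 278
"""


def parse_long_output(text):
    """Return (stats, warnings, segments) parsed from long-format output."""
    stats, warnings, segments = {}, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            parts = line[1:].split(None, 1)  # [identifier, rest of line]
            if len(parts) < 2:
                continue
            rest = parts[1]
            if ":" in rest:
                key, value = rest.split(":", 1)
                stats[key.strip()] = value.strip()  # values are kept as strings
            else:
                warnings.append(rest.strip())  # e.g. the signal sequence warning
        else:
            fields = line.split()
            if len(fields) < 3:
                continue
            label, start, end = fields[-3:]
            segments.append((label, int(start), int(end)))
    return stats, warnings, segments


stats, warnings, segments = parse_long_output(EXAMPLE)
helices = [s for s in segments if s[0] == "TMhelix"]
```

A protein whose only segment is a single `inside` or `outside` span covering the full length has no predicted helices, in line with the note above.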
The first few lines give some statistics:
Length: the length of the protein sequence.
Number of predicted TMHs: The number of predicted transmembrane helices.
Exp number of AAs in TMHs: The expected number of amino acids in transmembrane
helices. If this number is larger than 18 it is very likely to be a transmembrane
protein (OR have a signal peptide).
Exp number, first 60 AAs: The expected number of amino acids in transmembrane
helices in the first 60 amino acids of the protein. If this number is more
than a few, you should be warned that a predicted transmembrane helix in
the N-term could be a signal peptide.
Total prob of N-in: The total probability that the N-term is on the cytoplasmic
side of the membrane.
POSSIBLE N-term signal sequence: a warning that is produced when "Exp number,
first 60 AAs" is larger than 10.
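The two numeric rules of thumb above (more than 18 expected amino acids in helices, and more than 10 of them within the first 60 residues) can be applied programmatically. This is a minimal sketch with a hypothetical function name; the wording of the returned notes is illustrative only.

```python
def interpret(exp_aa, first60):
    """Apply the two thresholds described in the text to one protein."""
    notes = []
    if exp_aa > 18:
        # Very likely a transmembrane protein, OR one with a signal peptide.
        notes.append("likely transmembrane protein (or signal peptide)")
    if first60 > 10:
        # Same condition that triggers the warning in the long output.
        notes.append("POSSIBLE N-term signal sequence")
    return notes
```

For the COX2_BACSU example (`interpret(68.6888999999999, 39.8875)`) both notes are returned, which matches the warning line in the sample output.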
Plot of probabilities
The plot shows the posterior probabilities
of inside/outside/TM helix. Here one can see possible weak TM helices that
were not predicted, and one can get an idea of the certainty of each
segment in the prediction.
At the top of the plot (between 1 and 1.2) the N-best prediction is shown.
The plot is obtained by calculating the total probability that a
residue sits in helix, inside, or outside summed over all possible
paths through the model. Sometimes it seems like the plot and the
prediction are contradictory, but that is because the plot shows probabilities
for each residue, whereas the prediction is the over-all most probable
structure. Therefore the plot should be seen as a complementary source
of information.
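To illustrate what "summed over all possible paths" means, here is the forward-backward calculation of per-residue posterior probabilities on a toy hidden Markov model. Everything in this sketch is an assumption made for illustration: the two states, the two-letter alphabet (`H` for hydrophobic, `P` for polar), and all probabilities are invented and have nothing to do with the actual TMHMM model.

```python
# Toy model, invented for illustration only (not TMHMM's states or parameters).
STATES = ("loop", "helix")
START = {"loop": 0.8, "helix": 0.2}
TRANS = {
    "loop": {"loop": 0.9, "helix": 0.1},
    "helix": {"loop": 0.1, "helix": 0.9},
}
EMIT = {
    "loop": {"H": 0.3, "P": 0.7},
    "helix": {"H": 0.8, "P": 0.2},
}


def posteriors(obs):
    """Return, for each position, P(state at that position | whole sequence)."""
    n = len(obs)
    if n == 0:
        return []
    # Forward pass: fwd[t][s] = P(obs[0..t], state at t is s)
    fwd = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for t in range(1, n):
        fwd.append({
            s: EMIT[s][obs[t]] * sum(fwd[t - 1][r] * TRANS[r][s] for r in STATES)
            for s in STATES
        })
    # Backward pass: bwd[t][s] = P(obs[t+1..] | state at t is s)
    bwd = [dict.fromkeys(STATES, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(TRANS[s][r] * EMIT[r][obs[t + 1]] * bwd[t + 1][r] for r in STATES)
            for s in STATES
        }
    total = sum(fwd[-1].values())  # probability of the whole sequence
    return [{s: fwd[t][s] * bwd[t][s] / total for s in STATES} for t in range(n)]


post = posteriors("PPPHHHHHHHHPPP")
```

Each entry of `post` sums to 1 across states, just as the three curves in the plot do at every residue. Because each position is evaluated separately, picking the most probable state per residue need not give the same answer as the single most probable overall structure. The sketch omits the scaling or log-space arithmetic that real sequences of hundreds of residues would need to avoid numerical underflow.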
Below the plot there are links to
The plot in encapsulated postscript
A script for making the plot in gnuplot.
The data for the plot.
Short output format
In the short output format one line is produced for each protein with no
graphics. Each line starts with the sequence identifier and then these
fields:
"len=": the length of the protein sequence.
"ExpAA=": The expected number of amino acids in transmembrane helices (see
above).
"First60=": The expected number of amino acids in transmembrane
helices in the first 60 amino acids of the protein (see above).
"PredHel=": The number of predicted transmembrane helices by N-best.
"Topology=": The topology predicted by N-best.
For the example above the short output would be (except that it would be
on one line):
The topology is given as the position of the transmembrane helices separated
by 'i' if the loop is on the inside or 'o' if it is on the outside. The
above example 'i7-29o44-66i87-109o' means that it starts on the inside,
has a predicted TMH at position 7 to 29, then the outside, then a TMH at position
44 to 66, and so on.
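The short format is easy to process in a script. The sketch below splits one line into its fields and expands a topology string back into the segment list of the long format. The function names are hypothetical, the fields are assumed to be separated by whitespace, and the `DEMO1` line is made up for illustration rather than real server output.

```python
import re

CONVERTERS = {"len": int, "ExpAA": float, "First60": float, "PredHel": int}


def parse_short_line(line):
    """Split one short-format line into a dict of typed fields."""
    fields = line.split()
    record = {"id": fields[0]}
    for field in fields[1:]:
        key, sep, value = field.partition("=")
        if sep:
            record[key] = CONVERTERS.get(key, str)(value)
    return record


def topology_to_segments(topology, length):
    """Expand a topology string such as 'i7-29o44-66i87-109o' into segments."""
    names = {"i": "inside", "o": "outside"}
    sides = re.findall(r"[io]", topology)
    helices = [(int(a), int(b)) for a, b in re.findall(r"(\d+)-(\d+)", topology)]
    segments, position = [], 1
    for side, (start, end) in zip(sides, helices):
        if start > position:
            # Loop preceding this helix, on the side named before it.
            segments.append((names[side], position, start - 1))
        segments.append(("TMhelix", start, end))
        position = end + 1
    if position <= length:
        # Final loop (or the whole protein when there are no helices).
        segments.append((names[sides[-1]], position, length))
    return segments


# Hypothetical line, for illustration only.
record = parse_short_line(
    "DEMO1 len=300 ExpAA=45.20 First60=0.50 PredHel=2 Topology=o50-72i100-122o"
)
segments = topology_to_segments("i7-29o44-66i87-109o", 278)
```

With the topology string and length of the COX2_BACSU example, `segments` reproduces the seven inside/TMhelix/outside lines of the long output.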
Predicted TM segments in the N-terminal region sometimes turn out to be
signal peptides.
One of the most common mistakes by the program is to reverse the direction
of proteins with one TM segment.
Do not use the program to predict whether a non-membrane protein is
cytoplasmic or not.