LipoP - 1.0
Signal peptidase I & II cleavage sites in Gram-negative bacteria
The LipoP 1.0 server produces predictions of lipoproteins and discriminates between lipoprotein signal peptides, other signal peptides and N-terminal membrane helices in Gram-negative bacteria.
Note: Although LipoP 1.0 has been trained on sequences from Gram-negative bacteria only, the following paper reports that it also performs well on sequences from Gram-positive bacteria:
Methods for the bioinformatic identification of bacterial lipoproteins encoded in the genomes of Gram-positive bacteria
O. Rahman, S. P. Cummings, D. J. Harrington and I. C. Sutcliffe
World Journal of Microbiology and Biotechnology 24(11):2377-2382 (2008)
NOTE: LipoP is outdated and is only kept online for reference. Lipoprotein signal peptides are better predicted by the current version of SignalP!
Submission
Restrictions
At most 5000 sequences and 500,000 amino acids per submission; each sequence not less than 70 and not more than 5,000 amino acids.
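A submission can be checked against these limits before upload. The following is a minimal sketch, not part of the server: the function names `parse_fasta` and `check_submission` are hypothetical, and the parser assumes plain FASTA text in which every record starts with a `>` line.

```python
MAX_SEQUENCES = 5000        # at most 5000 sequences per submission
MAX_TOTAL_RESIDUES = 500_000  # at most 500,000 amino acids per submission
MIN_LENGTH = 70             # shortest accepted sequence
MAX_LENGTH = 5000           # longest accepted sequence


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(records):
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"too many sequences: {len(records)}")
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_RESIDUES:
        problems.append(f"too many amino acids: {total}")
    for header, seq in records:
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            words = header.split()
            name = words[0] if words else "(unnamed)"
            problems.append(
                f"{name}: length {len(seq)} outside {MIN_LENGTH}-{MAX_LENGTH}"
            )
    return problems
```

Calling `check_submission(parse_fasta(text))` on the contents of a FASTA file returns one message per violated limit.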
Confidentiality
The sequences are kept confidential and will be deleted after processing.
CITATIONS
For publication of results, please cite:
Prediction of lipoprotein signal peptides in Gram-negative bacteria.
A. S. Juncker, H. Willenbrock, G. von Heijne, H. Nielsen, S. Brunak and A. Krogh.
Protein Sci. 12(8):1652-62, 2003
Instructions
Input
The program takes proteins in FASTA format. It recognizes the 20 amino acids and B, Z, and X, which are all treated equally as unknown. Any other character is changed to X, so please make sure the sequences are sensible proteins.
This is an example (one protein):
>5H2A_CRIGR you can have comments after the ID
MEILCEDNTSLSSIPNSLMQVDGDSGLYRNDFNSRDANSSDASNWTIDGENRTNLSFEGYLPPTCLSILHL
QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWP
LPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGVSMPIPVF
GLQDDSKVFKQGSCLLADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEATLCVSDLSTRAKLASFSFL
PQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNE
HVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILVNTIPALAYKSSQLQA
GQNKDSKEDAEPTDNDCSMVTLGKQQSEETCTDNINTVNEKVSCV
Only the first 70 amino acids are used for prediction.
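The input handling described above can be imitated locally to see what the predictor effectively receives. This is a rough sketch rather than the server's own code: the function names are hypothetical, and upper-casing the input and dropping whitespace are assumptions not stated on this page. Because B, Z and X are all treated as unknown, the sketch maps all three to X.

```python
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids
WINDOW = 70                             # only the first 70 residues are used


def normalize(seq):
    """Replace everything except the 20 standard residues with X.

    Assumptions: input is upper-cased and whitespace is removed first.
    """
    residues = (ch for ch in seq.upper() if not ch.isspace())
    return "".join(ch if ch in STANDARD else "X" for ch in residues)


def prediction_window(seq):
    """Return the normalized N-terminal part used for prediction."""
    return normalize(seq)[:WINDOW]
```

For example, `prediction_window` applied to the sequence above returns its first 70 residues, starting with `MEILCEDNTS`.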
How to run it
The sequences can be input in the following two ways:
- Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
- Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.
Both ways can be employed at the same time: all the specified sequences will be processed.
Select one of the three output options ("Extensive, with graphics", "Extensive, no graphics", or "Short") and click on the "Submit" button.
Output format
The output is essentially in GFF format. The default (long) output format looks like this:
# ANIA_NEIGO SpII score=29.6052 margin=11.2327 cleavage=18-19 Pos+2=G
# Cut-off=-3
ANIA_NEIGO LipoP1.0:Best SpII 1 1 29.6052
ANIA_NEIGO LipoP1.0:Margin SpII 1 1 11.2327
ANIA_NEIGO LipoP1.0:Class SpI 1 1 18.3725
ANIA_NEIGO LipoP1.0:Class CYT 1 1 -0.200913
ANIA_NEIGO LipoP1.0:Signal CleavII 18 19 29.6052 # FALAA|CGGEQ Pos+2=G
ANIA_NEIGO LipoP1.0:Signal CleavI 24 25 18.0333 # GGEQA|AQAPA
ANIA_NEIGO LipoP1.0:Signal CleavI 20 21 15.9259 # LAACG|GEQAA
ANIA_NEIGO LipoP1.0:Signal CleavI 26 27 12.0794 # EQAAQ|APAET
ANIA_NEIGO LipoP1.0:Signal CleavI 25 26 11.4077 # GEQAA|QAPAE
ANIA_NEIGO LipoP1.0:Signal CleavI 27 28 9.40252 # QAAQA|PAETP
(output truncated)
The first line, which is the only line if short
output is chosen, summarizes the best prediction. In the example the
best prediction is a lipoprotein with a cleavage site between amino acid
18 and 19 and amino acid G (glycine) in position +2 after the cleavage site.
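When many sequences are run with the short output, the summary lines can be collected with a small parser. The sketch below is based only on the single SpII example shown on this page; the name `parse_summary` is hypothetical, and treating `margin`, `cleavage` and `Pos+2` as optional fields for other classes is an assumption.

```python
import re

# Pattern derived from the one example summary line on this page.
SUMMARY_RE = re.compile(
    r"^#\s+(?P<seq_id>\S+)\s+(?P<cls>SpII|SpI|TMH|CYT)\s+score=(?P<score>\S+)"
    r"(?:\s+margin=(?P<margin>\S+))?"
    r"(?:\s+cleavage=(?P<start>\d+)-(?P<end>\d+))?"
    r"(?:\s+Pos\+2=(?P<pos2>\S))?"
)


def parse_summary(line):
    """Parse a summary line into a dict, or return None if it is not one."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "seq_id": d["seq_id"],
        "class": d["cls"],
        "score": float(d["score"]),
        "margin": float(d["margin"]) if d["margin"] else None,
        "cleavage": (int(d["start"]), int(d["end"])) if d["start"] else None,
        "pos_plus_2": d["pos2"],
    }
```

Lines that do not match, such as the cut-off line, yield `None` and can simply be skipped.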
The second line gives the cut-off used. The remaining lines contain the following columns:
- Sequence ID
- Type of prediction. Best means the highest scoring class, Margin gives the difference between the best score and the second best score, Class gives the score of other classes and Signal lines contain predicted cleavage sites.
- Feature type, see below
- Location in the sequence. For lines with a class prediction it is always 1. For cleavage sites it is the position of the last amino acid before the predicted cleavage site, i.e. the last residue of the signal peptide.
- Location as above, except that for cleavage sites it is the position of the first amino acid after the cleavage site.
- Score. For the "Margin" type it is the difference between the best and the second best class score. Otherwise the log-odds score.
- For the cleavage sites the ±5 context is shown after the #, and for lipoprotein cleavage sites the amino acid in position +2 is shown (which may determine whether the lipoprotein is attached to the inner or outer membrane, see below).
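The columns listed above map naturally onto a small record type. The following sketch parses one feature line of the long output; the names `Feature` and `parse_feature_line` are hypothetical, and it assumes that columns are separated by whitespace and that sequence IDs contain no `#` character.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    seq_id: str
    kind: str               # Best, Margin, Class or Signal
    feature: str            # SpI, SpII, TMH, CYT, CleavI or CleavII
    start: int
    end: int
    score: float
    comment: Optional[str]  # text after '#', if any


def parse_feature_line(line):
    """Parse one non-comment line of the long output into a Feature."""
    body, _, comment = line.partition("#")
    fields = body.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, got {len(fields)}: {line!r}")
    seq_id, source, feature, start, end, score = fields
    # The second column looks like 'LipoP1.0:Signal'; keep the part after ':'.
    kind = source.split(":", 1)[1] if ":" in source else source
    return Feature(
        seq_id, kind, feature, int(start), int(end), float(score),
        comment.strip() or None,
    )
```

Lines beginning with `#` (the summary and the cut-off) are not feature lines and should be filtered out before calling `parse_feature_line`.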
These four classes are predicted:
SpI: signal peptide (signal peptidase I)
SpII: lipoprotein signal peptide (signal peptidase II)
TMH: N-terminal transmembrane helix. This is generally not a very reliable prediction and should be tested. This part of the model is mainly there to avoid transmembrane helices being falsely predicted as signal peptides.
CYT: cytoplasmic. It really just means all the rest.
For technical reasons (see paper) the score for CYT is always the same.
These signals are predicted:
CleavI: cleavage site for signal peptidase I.
CleavII: cleavage site for signal peptidase II.
Plot of scores
A plot of the cleavage site scores is made in PostScript unless you have chosen the short output format or disabled the plot. For each predicted cleavage site, the score is shown. Two different colors are used for SpI and SpII. To the left, the scores of the classes scoring higher than the cut-off are shown. The PostScript is converted to an image (PNG format) and included in the HTML output (if selected).
Below the plot there are links to
- The plot in encapsulated postscript
- A script for making the plot in gnuplot.
If there are only a few predictions of cleavage sites, no plot is made.
It is shown in the paper that the margin, i.e., the difference between the best and the second best prediction, correlates well with the number of falsely predicted signal peptides.
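As a worked example of the margin, the class scores from the sample output (SpII 29.6052, SpI 18.3725, CYT -0.200913) give 29.6052 - 18.3725 = 11.2327, which is the value on the Margin line. A minimal sketch of the same calculation follows; the function name `best_and_margin` is hypothetical, and it assumes at least two class scores are available.

```python
def best_and_margin(class_scores):
    """Return (best class, best score minus second-best score).

    class_scores maps a class name to its log-odds score and must
    contain at least two entries.
    """
    ranked = sorted(class_scores.items(), key=lambda kv: kv[1], reverse=True)
    (best_cls, best), (_, second) = ranked[0], ranked[1]
    return best_cls, best - second


scores = {"SpII": 29.6052, "SpI": 18.3725, "CYT": -0.200913}
best_cls, margin = best_and_margin(scores)
print(best_cls, round(margin, 4))
```

A larger margin means the best class is better separated from the runner-up, so predictions can be ranked by it when deciding which ones to trust.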
An aspartic acid (D) in position +2 after the cleavage site of a lipoprotein means that it is attached to the inner membrane, and most other lipoproteins are attached to the outer membrane. Therefore we report the amino acid in this position for predicted lipoproteins. See e.g. Seydel et al (1999) Molecular Microbiology 34: 810-821 for more details.
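The +2 residue can also be read directly from the cleavage-site context printed after the `#` on a CleavII line. The sketch below encodes only the rule of thumb stated above; the function names are hypothetical, and the returned labels are a likely localization, not a definitive assignment.

```python
def plus2_residue(context):
    """Return the residue at position +2 from a context like 'FALAA|CGGEQ'.

    The '|' marks the cleavage site, so +1 is the first residue after it.
    """
    _, sep, after = context.partition("|")
    if not sep or len(after) < 2:
        raise ValueError(f"no +2 position in context {context!r}")
    return after[1]


def likely_membrane(residue):
    """Apply the rule of thumb: D at +2 suggests inner membrane retention."""
    if residue.upper() == "D":
        return "inner membrane"
    return "outer membrane (most likely)"
```

For the sample output, the context `FALAA|CGGEQ` gives G at position +2, in agreement with the reported `Pos+2=G`.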
The cross-validation test reported in the paper gave the results shown in the table below. The highest scoring class was predicted. For signal peptides, 309 out of 328 were correctly classified as such, whereas 2 were classified as lipoproteins, 14 as cytoplasmic and 3 as having an N-terminal transmembrane helix. Of 63 lipoproteins, 61 were classified correctly.
Correct class | Predicted class
              | SPaseI | SPaseII | Cytoplasmic | TMH | Total
SPaseI        | 309    | 2       | 14          | 3   | 328
SPaseII       | 2      | 61      | 0           | 0   | 63
Cytoplasmic   | 5      | 1       | 382         | 0   | 388
TMH           | 8      | 0       | 21          | 142 | 171
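The fraction of correctly classified sequences per class can be recomputed from the table. The sketch below simply copies the counts (rows are the correct class, columns the predicted class in the order SPaseI, SPaseII, Cytoplasmic, TMH); the function names are hypothetical, and the derived fractions are plain arithmetic on the table rather than figures quoted from the paper.

```python
CLASSES = ["SPaseI", "SPaseII", "Cytoplasmic", "TMH"]

# Rows: correct class. Columns: predicted class, in CLASSES order.
CONFUSION = {
    "SPaseI":      [309, 2, 14, 3],
    "SPaseII":     [2, 61, 0, 0],
    "Cytoplasmic": [5, 1, 382, 0],
    "TMH":         [8, 0, 21, 142],
}


def recall(cls):
    """Fraction of sequences of class cls that were predicted as cls."""
    row = CONFUSION[cls]
    return row[CLASSES.index(cls)] / sum(row)


def overall_accuracy():
    """Fraction of all sequences assigned to their correct class."""
    correct = sum(CONFUSION[c][i] for i, c in enumerate(CLASSES))
    total = sum(sum(row) for row in CONFUSION.values())
    return correct / total


for name in CLASSES:
    print(f"{name}: {recall(name):.3f}")
print(f"overall: {overall_accuracy():.3f}")
```

The row sums reproduce the Total column (328, 63, 388 and 171), which is a quick consistency check on the counts.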
References
Prediction of lipoprotein signal peptides in Gram-negative bacteria.
A. S. Juncker (1), H. Willenbrock (1), G. von Heijne (2), H. Nielsen (1), S. Brunak (1) and A. Krogh (3).
(1) Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark
(2) Department of Biochemistry, Stockholm University, S-106 91 Stockholm, Sweden
(3) Bioinformatics Centre, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark
PMID: 12876315