LipoP - 1.0

Signal peptidase I & II cleavage sites in Gram-negative bacteria

The LipoP 1.0 server predicts lipoproteins and discriminates between lipoprotein signal peptides, other signal peptides and N-terminal membrane helices in Gram-negative bacteria.

Note: Although LipoP 1.0 has been trained on sequences from Gram-negative bacteria only, the following paper reports that it also performs well on sequences from Gram-positive bacteria:

Methods for the bioinformatic identification of bacterial lipoproteins encoded in the genomes of Gram-positive bacteria
O. Rahman, S. P. Cummings, D. J. Harrington and I. C. Sutcliffe
World Journal of Microbiology and Biotechnology 24(11):2377-2382 (2008)

NOTE: LipoP is outdated and is only kept online for reference. Lipoprotein signal peptides are better predicted by the current version of SignalP!

Submission

Restrictions
At most 5000 sequences and 500,000 amino acids per submission; each sequence not less than 70 and not more than 5,000 amino acids.
Confidentiality
The sequences are kept confidential and will be deleted after processing.
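
The submission restrictions above can be checked locally before uploading. The following is a minimal sketch, not part of LipoP: the function names `read_fasta` and `check_submission` are hypothetical, and the parser assumes plain FASTA text in which the sequence ID is the first word after `>` and IDs are unique.

```python
from typing import Dict, List

# Limits taken from the "Restrictions" paragraph.
MAX_SEQUENCES = 5000
MAX_TOTAL_RESIDUES = 500_000
MIN_LENGTH = 70
MAX_LENGTH = 5000


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {sequence ID: residues} (assumes unique IDs)."""
    records: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first word after '>'; the rest is a comment.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = ""
        elif current is not None:
            records[current] += line
    return records


def check_submission(records: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceed the limit of {MAX_SEQUENCES}")
    total = sum(len(seq) for seq in records.values())
    if total > MAX_TOTAL_RESIDUES:
        problems.append(f"{total} amino acids exceed the limit of {MAX_TOTAL_RESIDUES}")
    for seq_id, seq in records.items():
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(f"{seq_id}: length {len(seq)} outside {MIN_LENGTH}-{MAX_LENGTH}")
    return problems
```

Calling `check_submission(read_fasta(open("proteins.fasta").read()))` on a local file (the file name is only an example) returns the list of violations, if any.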

CITATIONS

For publication of results, please cite:

Prediction of lipoprotein signal peptides in Gram-negative bacteria.
A. S. Juncker, H. Willenbrock, G. von Heijne, H. Nielsen, S. Brunak and A. Krogh.
Protein Sci. 12(8):1652-62, 2003

Instructions

Input

The program takes proteins in FASTA format. It recognizes the 20 amino acids and B, Z, and X, which are all treated equally as unknown. Any other character is changed to X, so please make sure the sequences are sensible proteins.

This is an example (one protein):

>5H2A_CRIGR you can have comments after the ID
MEILCEDNTSLSSIPNSLMQVDGDSGLYRNDFNSRDANSSDASNWTIDGENRTNLSFEGYLPPTCLSILHL
QEKNWSALLTAVVIILTIAGNILVIMAVSLEKKLQNATNYFLMSLAIADMLLGFLVMPVSMLTILYGYRWP
LPSKLCAVWIYLDVLFSTASIMHLCAISLDRYVAIQNPIHHSRFNSRTKAFLKIIAVWTISVGVSMPIPVF
GLQDDSKVFKQGSCLLADDNFVLIGSFVAFFIPLTIMVITYFLTIKSLQKEATLCVSDLSTRAKLASFSFL
PQSSLSSEKLFQRSIHREPGSYTGRRTMQSISNEQKACKVLGIVFFLFVVMWCPFFITNIMAVICKESCNE
HVIGALLNVFVWIGYLSSAVNPLVYTLFNKTYRSAFSRYIQCQYKENRKPLQLILVNTIPALAYKSSQLQA
GQNKDSKEDAEPTDNDCSMVTLGKQQSEETCTDNINTVNEKVSCV

Only the first 70 amino acids are used for prediction.
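
The input rules described above (unrecognized characters become X, and only the first 70 amino acids are used) can be mimicked as follows. This is a rough sketch, not the server's own code: the function name `normalize` is hypothetical, and converting to upper case and dropping whitespace are assumptions of the sketch rather than documented server behaviour.

```python
# The 20 standard amino acids plus B, Z and X, which the program recognizes.
RECOGNIZED = set("ACDEFGHIKLMNPQRSTVWY") | set("BZX")
WINDOW = 70  # only the first 70 amino acids are used for prediction


def normalize(sequence: str) -> str:
    """Replace unrecognized characters with X and keep the first 70 residues."""
    cleaned = []
    for ch in sequence.upper():      # assumption: case is ignored
        if ch.isspace():             # assumption: whitespace is not a residue
            continue
        cleaned.append(ch if ch in RECOGNIZED else "X")
    return "".join(cleaned)[:WINDOW]
```

Running a sequence through such a function before submission makes it easy to spot how many residues would be turned into X.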

How to run it

The sequences can be input in the following two ways:

Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed.

Select one of the three output options ("Extensive, with graphics", "Extensive, no graphics", or "Short") and click on the "Submit" button.

Output format

The output is essentially in GFF format. The default (long) output looks like this:

# ANIA_NEIGO SpII score=29.6052 margin=11.2327 cleavage=18-19 Pos+2=G
# Cut-off=-3
ANIA_NEIGO	LipoP1.0:Best	SpII	1	1	29.6052
ANIA_NEIGO	LipoP1.0:Margin	SpII	1	1	11.2327
ANIA_NEIGO	LipoP1.0:Class	SpI	1	1	18.3725
ANIA_NEIGO	LipoP1.0:Class	CYT	1	1	-0.200913
ANIA_NEIGO	LipoP1.0:Signal	CleavII	18	19	29.6052	# FALAA|CGGEQ Pos+2=G
ANIA_NEIGO	LipoP1.0:Signal	CleavI	24	25	18.0333	# GGEQA|AQAPA
ANIA_NEIGO	LipoP1.0:Signal	CleavI	20	21	15.9259	# LAACG|GEQAA
ANIA_NEIGO	LipoP1.0:Signal	CleavI	26	27	12.0794	# EQAAQ|APAET
ANIA_NEIGO	LipoP1.0:Signal	CleavI	25	26	11.4077	# GEQAA|QAPAE
ANIA_NEIGO	LipoP1.0:Signal	CleavI	27	28	9.40252	# QAAQA|PAETP

(output truncated)

The first line, which is the only line if short output is chosen, summarizes the best prediction. In the example, the best prediction is a lipoprotein with a cleavage site between amino acids 18 and 19 and a glycine (G) in position +2 after the cleavage site. The second line gives the cut-off used. In the remaining lines, the columns contain:

Sequence ID
Type of prediction. Best means the highest scoring class, Margin gives the difference between the best score and the second best score, Class gives the score of other classes and Signal lines contain predicted cleavage sites.
Feature type, see below
Location in the sequence. For lines with a class prediction it is always 1. For cleavage sites it is the last amino acid of the signal peptide relative to the predicted cleavage site.
Location as above, except that for cleavage sites it is the first amino acid after the cleavage site.
Score. For the "Margin" type it is the difference between the best and the second best class score. Otherwise the log-odds score.
For the cleavage sites the ±5 context is shown after the #, and for lipoprotein cleavage sites the amino acid in position +2 is shown (which may determine whether the lipoprotein is attached to the inner or outer membrane, see below).
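
The column layout described above lends itself to simple parsing. The sketch below is one possible reader for the long output; the `Prediction` class and `parse_long_output` function are hypothetical names, and the code assumes the columns are separated by whitespace and that `#` only appears at the start of header lines or before the trailing comment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Prediction:
    seq_id: str
    kind: str      # Best, Margin, Class or Signal
    feature: str   # SpI, SpII, TMH, CYT, CleavI or CleavII
    start: int
    end: int
    score: float
    note: str      # text after '#', empty if absent


def parse_long_output(text: str) -> List[Prediction]:
    """Turn the data lines of the long output into Prediction records."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the summary / cut-off lines
        data, _, note = line.partition("#")
        fields = data.split()
        if len(fields) < 6:
            continue  # not a data line in the expected layout
        seq_id, source, feature, start, end, score = fields[:6]
        kind = source.split(":", 1)[-1]  # "LipoP1.0:Best" -> "Best"
        rows.append(Prediction(seq_id, kind, feature,
                               int(start), int(end), float(score),
                               note.strip()))
    return rows
```

Filtering the result with `[r for r in rows if r.kind == "Signal"]` then yields the predicted cleavage sites with their scores and sequence context.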

These 4 classes are predicted:

SpI: signal peptide (signal peptidase I)

SpII: lipoprotein signal peptide (signal peptidase II)

TMH: N-terminal transmembrane helix. This is generally not a very reliable prediction and should be tested. This part of the model is mainly there to avoid transmembrane helices being falsely predicted as signal peptides.

CYT: cytoplasmic. It really just means all the rest.

For technical reasons (see paper) the score for CYT is always the same.

These signals are predicted:

CleavI: Cleavage sites for signal peptidase I.

CleavII: Cleavage sites for signal peptidase II.

Plot of scores

A plot of the cleavage site scores is made in PostScript unless you have chosen the short output format or disabled the plot. For each predicted cleavage site, the score is shown. Two different colors are used for SpI and SpII. To the left, the scores of the classes scoring higher than the cut-off are shown. The PostScript is converted to an image (PNG format) and included in the HTML output (if selected).

Below the plot there are links to

The plot in encapsulated postscript
A script for making the plot in gnuplot.

If there are only a few predictions of cleavage sites, no plot is made.

It is shown in the paper that the margin, i.e., the difference between the best and the second best prediction, correlates well with the number of falsely predicted signal peptides.
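
As a small worked example of the margin, the class scores from the example output (SpII 29.6052, SpI 18.3725, CYT -0.200913) can be plugged into the definition. The helper name `margin` is hypothetical; this is only an illustration of the arithmetic, not code from LipoP.

```python
from typing import Dict


def margin(class_scores: Dict[str, float]) -> float:
    """Difference between the best and the second best class score."""
    if len(class_scores) < 2:
        raise ValueError("at least two class scores are needed")
    ranked = sorted(class_scores.values(), reverse=True)
    return ranked[0] - ranked[1]


# Class scores taken from the ANIA_NEIGO example output.
scores = {"SpII": 29.6052, "SpI": 18.3725, "CYT": -0.200913}
example_margin = margin(scores)  # approximately 11.2327, as on the Margin line
```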

An aspartic acid (D) in position +2 after the cleavage site of a lipoprotein means that it is attached to the inner membrane, and most other lipoproteins are attached to the outer membrane. Therefore we report the amino acid in this position for predicted lipoproteins. See e.g. Seydel et al (1999) Molecular Microbiology 34: 810-821 for more details.
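
The +2 rule can be applied directly to the cleavage site context reported after the # in the output. The sketch below uses hypothetical function names and encodes only the rule of thumb stated above, so the result should be read as a hint rather than a localization prediction.

```python
def plus2_from_context(context: str) -> str:
    """Return the residue in position +2 from a context such as 'FALAA|CGGEQ'."""
    _before, _, after = context.partition("|")
    if len(after) < 2:
        raise ValueError("context does not extend to position +2")
    return after[1]  # after[0] is position +1, after[1] is position +2


def membrane_hint(pos_plus2: str) -> str:
    """Rule of thumb: D in +2 suggests inner membrane, anything else outer."""
    return "inner" if pos_plus2.upper() == "D" else "outer"
```

For the example output, `plus2_from_context("FALAA|CGGEQ")` gives G, in agreement with the reported `Pos+2=G`.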

The cross-validation test reported in the paper gave the results shown in the table below. The highest scoring class was predicted. For signal peptides, 309 out of 328 were correctly classified as such, whereas 2 were classified as lipoproteins, 14 as cytoplasmic and 3 as having an N-terminal transmembrane helix. Of 63 lipoproteins, 61 were classified correctly.

	Predicted class
Correct class	SPaseI	SPaseII	Cytoplasmic	TMH	Total
SPaseI	309	2	14	3	328
SPaseII	2	61	0	0	63
Cytoplasmic	5	1	382	0	388
TMH	8	0	21	142	171
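
The per-class rates implied by the table can be recomputed from its counts. The sketch below (function names are hypothetical) derives the fraction of each class that was correctly predicted and the false positive rate for a class among all other proteins. For SPaseII these come out at about 96.8% and 0.3%, which agrees with the figures quoted in the abstract.

```python
CLASSES = ["SPaseI", "SPaseII", "Cytoplasmic", "TMH"]

# Rows: correct class; columns: predicted class, in the order of CLASSES.
CONFUSION = {
    "SPaseI":      [309, 2, 14, 3],
    "SPaseII":     [2, 61, 0, 0],
    "Cytoplasmic": [5, 1, 382, 0],
    "TMH":         [8, 0, 21, 142],
}


def sensitivity(cls: str) -> float:
    """Fraction of proteins of class `cls` that were predicted as `cls`."""
    row = CONFUSION[cls]
    return row[CLASSES.index(cls)] / sum(row)


def false_positive_rate(cls: str) -> float:
    """Fraction of proteins of all other classes that were predicted as `cls`."""
    col = CLASSES.index(cls)
    others = [row for name, row in CONFUSION.items() if name != cls]
    false_positives = sum(row[col] for row in others)
    negatives = sum(sum(row) for row in others)
    return false_positives / negatives


spii_sensitivity = sensitivity("SPaseII")    # 61 / 63
spii_fpr = false_positive_rate("SPaseII")    # 3 / 887
```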

It is also shown in the paper that the prediction is more reliable the higher the margin is.

References

Prediction of lipoprotein signal peptides in Gram-negative bacteria.
A. S. Juncker¹, H. Willenbrock¹, G. von Heijne², H. Nielsen¹, S. Brunak¹ and A. Krogh³.

¹ Center for Biological Sequence Analysis, BioCentrum-DTU, The Technical University of Denmark, DK-2800 Lyngby, Denmark
² Department of Biochemistry, Stockholm University, S-106 91 Stockholm, Sweden
³ Bioinformatics Centre, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark

Abstract

A method to predict lipoprotein signal peptides in Gram-negative Eubacteria, LipoP, has been developed. The hidden Markov model (HMM) was able to distinguish between lipoproteins (SPaseII-cleaved proteins), SPaseI-cleaved proteins, cytoplasmic proteins, and transmembrane proteins. This predictor was able to predict 96.8% of the lipoproteins correctly with only 0.3% false positives in a set of SPaseI-cleaved, cytoplasmic, and transmembrane proteins. The results obtained were significantly better than those of previously developed methods. Even though Gram-positive lipoprotein signal peptides differ from Gram-negatives, the HMM was able to identify 92.9% of the lipoproteins included in a Gram-positive test set. A genome search was carried out for 12 Gram-negative genomes and one Gram-positive genome. The results for Escherichia coli K12 were compared with new experimental data, and the predictions by the HMM agree well with the experimentally verified lipoproteins. A neural network-based predictor was developed for comparison, and it gave very similar results.

PMID: 12876315

Software Downloads

Version 1.0a

Linux

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
