1. Specify the input sequences
All the input sequences must be in one-letter amino acid
code. The allowed alphabet (not case sensitive) is as follows:
A C D E F G H I K L M N P Q R S T V W Y and X (unknown)
All the alphabetic symbols not in the allowed alphabet
will be converted to X before processing. All the non-alphabetic
symbols, including white space and digits, will be ignored.
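The cleaning rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the server's actual code; the function name `sanitize_sequence` is hypothetical.

```python
# Allowed one-letter amino acid codes, plus X for "unknown".
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def sanitize_sequence(raw: str) -> str:
    """Apply the input rules described above to one raw sequence string.

    Hypothetical helper: a sketch of the documented behaviour only.
    """
    cleaned = []
    for ch in raw:
        if not ch.isalpha():
            # White space, digits and other non-alphabetic symbols are ignored.
            continue
        upper = ch.upper()  # the alphabet is not case sensitive
        # Alphabetic symbols outside the allowed alphabet become X.
        cleaned.append(upper if upper in ALLOWED else "X")
    return "".join(cleaned)
```

For example, `sanitize_sequence("mk t1L*b")` drops the space, the digit and the asterisk, upper-cases the valid residues, and turns `b` into `X`.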
The sequences can be input in the following two ways:
- Paste a single sequence (just the amino acids) or a number of sequences in
FASTA format into the upper window of the main server page.
- Select a FASTA file on your local disk, either by typing the file name into
the lower window or by browsing the disk.
Both ways can be employed at the same time: all the specified sequences will
be processed. However, one submission may contain no more than 2,000
sequences and 200,000 amino acids in total, and no sequence may be longer
than 6,000 amino acids.
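If you prepare submissions with a script, you may want to check a FASTA file against these limits before uploading it. The sketch below is a minimal local pre-check, not part of SignalP; `parse_fasta` and `check_submission` are hypothetical names, and it assumes that only alphabetic characters count towards the limits, since other symbols are ignored by the server.

```python
MAX_SEQUENCES = 2_000          # sequences per submission
MAX_TOTAL_RESIDUES = 200_000   # amino acids per submission
MAX_SEQUENCE_LENGTH = 6_000    # amino acids per sequence


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def residue_count(sequence):
    """Count alphabetic characters only (assumption: others are ignored)."""
    return sum(1 for ch in sequence if ch.isalpha())


def check_submission(records):
    """Return a list of human-readable problems; empty if within the limits."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQUENCES})")
    total = sum(residue_count(seq) for _, seq in records)
    if total > MAX_TOTAL_RESIDUES:
        problems.append(f"{total} amino acids in total (limit {MAX_TOTAL_RESIDUES})")
    for name, seq in records:
        if residue_count(seq) > MAX_SEQUENCE_LENGTH:
            problems.append(f"{name}: longer than {MAX_SEQUENCE_LENGTH} amino acids")
    return problems
```

A submission that exceeds a limit can then be split into several smaller files before it is uploaded.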
2. Customize your run
- Organism group:
It is important for performance that you choose the correct organism
group (Eukaryotes, Gram-negative bacteria or Gram-positive bacteria),
since the signal peptides of these three groups are known to differ
from each other.
Gram-positive bacteria correspond to Actinobacteria and Firmicutes in the
NCBI Taxonomy. Gram-negative bacteria are all other eubacteria, except
Tenericutes (including Mycoplasma), which seem to lack a type I signal
peptidase and therefore do not have standard signal peptides.
Unfortunately, we are unable to provide a SignalP version for Archaea,
since there are too few experimentally confirmed signal peptides from
this organism group in the UniProt database.
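When the taxonomy of each sequence is known, the grouping rules above can be written as a small lookup. The following is a sketch only: the function name, its inputs (an NCBI Taxonomy superkingdom and phylum name) and the returned labels `"euk"`, `"gram+"` and `"gram-"` are illustrative choices, not something the server requires.

```python
from typing import Optional

# Phyla named in the text above (NCBI Taxonomy names).
GRAM_POSITIVE_PHYLA = {"Actinobacteria", "Firmicutes"}
UNSUPPORTED_BACTERIAL_PHYLA = {"Tenericutes"}  # includes Mycoplasma


def organism_group(superkingdom: str, phylum: str = "") -> Optional[str]:
    """Return an illustrative group label, or None if no SignalP version applies."""
    if superkingdom == "Eukaryota":
        return "euk"
    if superkingdom == "Archaea":
        return None  # no Archaea version is provided
    if superkingdom == "Bacteria":
        if phylum in UNSUPPORTED_BACTERIAL_PHYLA:
            return None  # no standard signal peptides expected
        if phylum in GRAM_POSITIVE_PHYLA:
            return "gram+"
        return "gram-"  # all other eubacteria
    raise ValueError(f"unknown superkingdom: {superkingdom!r}")
```

Returning `None` for Archaea and Tenericutes makes it easy to filter such sequences out before submission.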
- D-cutoff values:
The default cutoff values for SignalP 4 are chosen to optimize the
performance measured as the Matthews correlation coefficient (MCC). This
results in a lower sensitivity (true positive rate) than SignalP 3.0 had.
In SignalP 4.1, we have introduced the option of setting the cutoff to a
lower value which yields the same sensitivity as SignalP 3.0. This makes
the false positive rate slightly higher, but it remains lower than that
of SignalP 3.0. Read more on the Performance page.
You can see which cutoff values are being used in the boxes marked
"D-cutoff". They will change if you change the setting for
"D-cutoff values" or "Organism group".
If you want to experiment with your own cutoff
values, select "User defined" and the boxes will go blank, ready for
you to fill in values between 0 and 1.
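To get a feeling for how a cutoff trades sensitivity against false positives, you can evaluate candidate cutoffs on sequences whose status you already know. The sketch below computes the confusion counts and the MCC for a given cutoff. It assumes that a sequence is called positive when its D-score is strictly greater than the cutoff, and the scores in the usage note are invented toy numbers, not SignalP results.

```python
import math


def confusion_counts(d_scores, has_signal_peptide, cutoff):
    """Return (tp, fp, tn, fn) for the given cutoff.

    Assumption: a sequence is predicted positive when D-score > cutoff.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff must be between 0 and 1")
    tp = fp = tn = fn = 0
    for score, truth in zip(d_scores, has_signal_peptide):
        predicted = score > cutoff
        if predicted and truth:
            tp += 1
        elif predicted and not truth:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def sensitivity(tp, fn):
    """True positive rate."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def false_positive_rate(fp, tn):
    return fp / (fp + tn) if (fp + tn) else 0.0
```

With the toy data `d_scores = [0.90, 0.70, 0.40, 0.38, 0.20, 0.10]` and `has_signal_peptide = [True, True, True, False, False, False]`, lowering the cutoff from 0.45 to 0.35 recovers the third true signal peptide but also admits one false positive.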
- Graphics output:
In the default output, SignalP embeds one plot in PNG format
per sequence, showing the C-, S-, and Y-scores for each position in
the sequence. You can choose to omit the plots (No graphics) or to add
an Encapsulated PostScript (EPS) file for each sequence. The EPS
files will be provided as links.
See the Output format page for an example and explanation of
the scores.
- Output format:
You can choose between four output formats:
  - Standard: Appropriate for most users. Shows one plot and one summary
    per sequence.
  - Short: Convenient if you submit many sequences. Shows only one line of
    output per sequence. Incompatible with graphics.
  - Long: Shows the C-, S-, and Y-scores for each position in the sequence
    in addition to the Standard output.
  - All: Shows the output scores of both neural network types (SignalP-TM
    and SignalP-noTM) for each position in the sequence. Incompatible with
    graphics.
See the Output format page for an example and explanation of
the scores.
- Method:
SignalP 4.1 contains two types of neural networks. SignalP-TM has
been trained with sequences containing transmembrane segments in the
data set, while SignalP-noTM has been trained without those
sequences. By default, SignalP 4.1 uses SignalP-TM as a preprocessor to
determine whether to use SignalP-TM or SignalP-noTM in the final
prediction: if 4 or more positions are predicted to be in a
transmembrane state, SignalP-TM is used; otherwise SignalP-noTM is used.
An exception is Gram-positive bacteria, where SignalP-TM is always
used.
If you are confident that there are no transmembrane segments in
your data, you can get slightly better performance by choosing
"Input sequences do not include TM regions", which tells SignalP
4.1 to always use SignalP-noTM.
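The selection rule described above can be summarized as a short decision function. This is a sketch of the documented logic only: the function name, its parameters and the `"gram+"` label are hypothetical, and the number of transmembrane positions is assumed to come from a prior SignalP-TM pass.

```python
def choose_network(organism_group: str,
                   tm_positions: int,
                   assume_no_tm: bool = False,
                   min_tm_positions: int = 4) -> str:
    """Return the name of the network used for the final prediction.

    tm_positions: number of positions that the SignalP-TM preprocessor
    predicts to be in a transmembrane state (assumed input).
    """
    if assume_no_tm:
        # "Input sequences do not include TM regions" was selected.
        return "SignalP-noTM"
    if organism_group == "gram+":
        # Gram-positive bacteria always use SignalP-TM.
        return "SignalP-TM"
    if tm_positions >= min_tm_positions:
        return "SignalP-TM"
    return "SignalP-noTM"
```

For instance, a eukaryotic sequence with three predicted transmembrane positions would be handed to SignalP-noTM, while one with four would stay with SignalP-TM.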
- Positional limits:
  - Minimal predicted signal peptide length: SignalP 4.0 could, in rare
    cases, erroneously predict signal peptides shorter than 10 residues.
    SignalP 4.1 eliminates these errors by imposing a lower limit on the
    cleavage site position (signal peptide length). The minimum length is
    10 by default, but you can adjust it. Signal peptides shorter than 15
    residues are very rare. If you want to disable this length restriction
    completely, enter 0 (zero).
  - N-terminal truncation of input sequence: By default, the predictor
    truncates each sequence to at most 70 residues before submitting it to
    the neural networks. If you want to predict extremely long signal
    peptides, you can try a higher value, or disable truncation completely
    by entering 0 (zero). Note: The neural networks are trained with
    sequences with a maximal length of 70, and they include the relative
    position in the sequence in their input. Therefore, general
    performance will deteriorate if you change this setting.
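The two positional limits can be illustrated as follows. This is a simplified sketch with hypothetical function names: it shows the meaning of the two settings and of the special value 0, not how the server applies them internally.

```python
def truncate_n_terminal(sequence: str, max_length: int = 70) -> str:
    """Keep at most max_length N-terminal residues; 0 disables truncation."""
    if max_length < 0:
        raise ValueError("max_length must be 0 (disabled) or positive")
    if max_length == 0:
        return sequence
    return sequence[:max_length]


def passes_min_length(signal_peptide_length: int, min_length: int = 10) -> bool:
    """True if a predicted signal peptide is long enough; 0 disables the check."""
    if min_length < 0:
        raise ValueError("min_length must be 0 (disabled) or positive")
    if min_length == 0:
        return True
    return signal_peptide_length >= min_length
```

With the defaults, a 100-residue input is cut to its first 70 residues, and a predicted signal peptide of 9 residues would be rejected.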
3. Submit the job
Click on the "Submit" button. The status of your job (either 'queued'
or 'running') will be displayed and constantly updated until it terminates and
the server output appears in the browser window.
At any time during the wait you may enter your e-mail address and simply leave
the window. Your job will continue; you will be notified by e-mail when it has
terminated. The e-mail message will contain the URL under which the results are
stored; they will remain on the server for 24 hours for you to collect them.