1. Specify the input sequences
All the input sequences must be in one-letter amino acid
code. The allowed alphabet (not case sensitive) is as follows:
A C D E F G H I K L M N P Q R S T V W Y and X (unknown)
All the alphabetic symbols not in the allowed alphabet
will be converted to X before processing. All the non-alphabetic
symbols, including white space and digits, will be ignored.
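The normalization rules above are easy to reproduce locally, for instance to preview what the server will actually see. The following is a minimal sketch, not the server's own code. The function name `clean_sequence` is hypothetical, and it assumes that "alphabetic" means the ASCII letters A-Z and a-z.

```python
import string

# One-letter amino acid codes accepted as-is, plus X for unknown residues.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")


def clean_sequence(raw: str) -> str:
    """Normalize a raw sequence string according to the input rules.

    Assumption: "alphabetic" means the ASCII letters A-Z and a-z.
    """
    cleaned = []
    for ch in raw:
        if ch not in string.ascii_letters:
            continue  # drop white space, digits and other non-letters
        up = ch.upper()  # the alphabet is not case sensitive
        cleaned.append(up if up in ALLOWED else "X")  # e.g. B, J, O, U, Z -> X
    return "".join(cleaned)


print(clean_sequence("mkv 12 lbz*"))  # MKVLXX
```

Letters such as B or Z are kept as placeholders (X) rather than dropped, so residue positions are preserved, while digits and spaces from copy-pasted sequence listings disappear.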
The sequences can be input in the following two ways:
- Paste a single sequence (just the amino acids) or a number of sequences in FASTA format into the upper window of the main server page.
- Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.
Both ways can be used at the same time, in which case all the specified sequences will be processed. However, a single submission may contain at most 5,000 sequences.
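To check a batch before uploading it, the pasted text and the uploaded file can be split into records and counted against the 5,000-sequence limit. This is a hypothetical client-side sketch: `parse_fasta` and `collect_submission` are illustrative names, and treating header-less text as one unnamed sequence is an assumption modeled on the "just the amino acids" option.

```python
MAX_SEQUENCES = 5000  # per-submission limit stated above


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (name, sequence) pairs.

    Assumption: text without any '>' header is treated as one unnamed
    sequence, matching the option of pasting just the amino acids.
    """
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if name is not None or chunks:
                records.append((name or "", "".join(chunks)))
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word is the sequence name
            chunks = []
        elif line:
            chunks.append(line)
    if name is not None or chunks:
        records.append((name or "", "".join(chunks)))
    return records


def collect_submission(pasted: str, uploaded: str = "") -> list[tuple[str, str]]:
    """Combine pasted text and an uploaded file, enforcing the sequence limit."""
    records = parse_fasta(pasted) + parse_fasta(uploaded)
    if len(records) > MAX_SEQUENCES:
        raise ValueError(
            f"{len(records)} sequences submitted; at most {MAX_SEQUENCES} allowed"
        )
    return records


print(collect_submission("MKVL\nAAA", ">seq1 demo\nMK\nVL\n>seq2\nAA"))
# [('', 'MKVLAAA'), ('seq1', 'MKVL'), ('seq2', 'AA')]
```

Counting both sources together mirrors the rule that sequences from the upper window and from the file are processed in the same submission.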
2. Customize your run
- Organism group:
Choose Plant for any organism with chloroplasts/plastids and Non-plant otherwise.
- Output format:
You can choose between two output formats:
- Shows one plot and one summary per sequence.
- Convenient if you submit many sequences: shows only one line of output per sequence and no graphics.
3. Submit the job
Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and continually updated until the job terminates and the server output appears in the browser window.
At any time during the wait you may enter your e-mail address and simply leave
the window. Your job will continue; you will be notified by e-mail when it has
terminated. The e-mail message will contain the URL under which the results are
stored; they will remain on the server for 24 hours for you to collect them.
Training and testing data set
The dataset used for training, validating, and testing TargetP 2.0 (using nested cross-validation) can be found here.
The sequences are in FASTA format, with the UniProt AC as the sequence name.
The annotations are in a tab-separated file where each line contains three fields: the UniProt AC, the type of targeting peptide, and the length of the targeting peptide.
The type is one of:
- "SP" for signal peptide,
- "MT" for mitochondrial transit peptide (mTP),
- "CH" for chloroplast transit peptide (cTP),
- "TH" for thylakoidal lumen composite transit peptide (lTP),
- "Other" for no targeting peptide (in this case, the length is given as 0).
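The annotation format described above can be loaded with Python's standard `csv` module. This sketch is illustrative rather than official tooling: the names `Annotation`, `read_annotations` and `split_peptide` are hypothetical, it assumes the file has no header line and exactly three fields per line, and `split_peptide` assumes the targeting peptide is the N-terminal part of the sequence.

```python
import csv
import io
from typing import NamedTuple

# Targeting-peptide types as listed above.
VALID_TYPES = {"SP", "MT", "CH", "TH", "Other"}


class Annotation(NamedTuple):
    accession: str  # UniProt AC, matching the FASTA sequence name
    tp_type: str    # one of VALID_TYPES
    length: int     # targeting-peptide length (0 for "Other")


def read_annotations(handle) -> dict[str, Annotation]:
    """Read the tab-separated annotation file into a dict keyed by UniProt AC.

    Assumption: no header line and exactly three fields per line.
    """
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        accession, tp_type, raw_length = row  # raises ValueError if not 3 fields
        length = int(raw_length)
        if tp_type not in VALID_TYPES:
            raise ValueError(f"{accession}: unknown type {tp_type!r}")
        if tp_type == "Other" and length != 0:
            raise ValueError(f"{accession}: 'Other' entries should have length 0")
        annotations[accession] = Annotation(accession, tp_type, length)
    return annotations


def split_peptide(sequence: str, ann: Annotation) -> tuple[str, str]:
    """Return (targeting peptide, remainder), assuming an N-terminal peptide."""
    return sequence[:ann.length], sequence[ann.length:]


# Made-up example lines; real files would be opened with open(path, newline="").
sample = "P12345\tSP\t22\nQ67890\tOther\t0\n"
for acc, ann in read_annotations(io.StringIO(sample)).items():
    print(acc, ann.tp_type, ann.length)
```

Keying the annotations by accession makes it straightforward to pair each entry with the FASTA record of the same name.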
Predictions on proteomes
Results from TargetP predictions on whole proteomes from UniProt (gzipped text files):
Detecting Sequence Signals in Targeting Peptides Using Deep Learning
José Juan Almagro Armenteros, Marco Salvatore, Ole Winther, Olof Emanuelsson, Gunnar von Heijne, Arne Elofsson, and Henrik Nielsen
Life Science Alliance 2(5), e201900429. doi:10.26508/lsa.201900429
The source code for training and running TargetP 2.0 is available under the Creative Commons CC BY-NC-SA license from