DTU Health Tech

Department of Health Technology

ChloroP - 1.1

Chloroplast transit peptides and their cleavage sites in plant proteins

The ChloroP server predicts the presence of chloroplast transit peptides (cTP) in protein sequences and the location of potential cTP cleavage sites.

NOTE: This service is superseded by TargetP which integrates predictions of chloroplast transit peptides, signal peptides and mitochondrial transit peptides.

Discontinued

This service is unfortunately discontinued.
The software may still be available for download for academic institutions at the Downloads tab.

Instructions



Submit...
A. by "cut and paste": (i) a single sequence in FASTA format or only plain sequence, or (ii) multiple sequences in FASTA format.
B. a file in FASTA format. Type in the path to the file directly or use the "Browse" button to find your file.
NOTE: ChloroP accepts only amino acid sequences.
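
The input rules above can be checked locally before submitting. The following is a minimal sketch, not the server's own parser: the function names `parse_fasta` and `is_protein`, the fallback name "Sequence", and the accepted alphabet (the 20 standard amino acids plus X) are assumptions made for illustration.

```python
from typing import Iterator, Tuple

# Assumption: 20 standard amino acids plus X for "unknown".
# The alphabet actually accepted by ChloroP is not stated in this document.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA text or from a bare sequence."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None or chunks:
                yield (name or "Sequence", "".join(chunks))
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None or chunks:
        yield (name or "Sequence", "".join(chunks))


def is_protein(seq: str) -> bool:
    """True if every character is in the assumed amino acid alphabet."""
    return len(seq) > 0 and set(seq) <= AMINO_ACIDS
```

Note that a check of this kind cannot tell a nucleotide sequence from a protein sequence, because A, C, G and T are also valid amino acid codes; it only catches characters that are not residues at all.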

Include the N-terminus
It is strongly recommended to include the N-terminus of the submitted sequence. The further from the N-terminal residue the submitted sequence starts, the more difficult and unreliable the prediction will be.

Submit preferably around 100 residues
Submit if possible about 100 N-terminal residues. The suggested length reflects the fact that the "cTP"/"no cTP" predictor was trained with input sequences of 100 residues. However, shorter sequences may also be predicted satisfactorily (it is more important that the N-terminal part is intact). The cleavage site prediction is restricted to searching for a potential cleavage site within the 100 most N-terminal residues and is in itself not influenced by sequence length.

Sequence names
Sequence names may be of any length, but only the first 20 characters will be preserved throughout the prediction and presented on the ChloroP prediction result page.
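
The length and naming recommendations can be applied to a query before submission. This is a small sketch under stated assumptions: `prepare_query` and `colliding_names` are hypothetical helper names, and the two constants simply restate the figures given in the text (100 residues, 20 name characters).

```python
from collections import Counter
from typing import Iterable, List, Tuple

MAX_RESIDUES = 100  # from the text: the cTP/no-cTP predictor was trained on 100 residues
NAME_WIDTH = 20     # from the text: only the first 20 name characters are preserved


def prepare_query(name: str, sequence: str) -> Tuple[str, str]:
    """Keep the N-terminal 100 residues and the first 20 name characters."""
    return name[:NAME_WIDTH], sequence[:MAX_RESIDUES]


def colliding_names(names: Iterable[str]) -> List[str]:
    """Return truncated names that would be shared by more than one sequence."""
    counts = Counter(n[:NAME_WIDTH] for n in names)
    return [n for n, c in counts.items() if c > 1]
```

Checking for collisions is worthwhile when submitting many sequences whose names differ only after the 20th character, since they would be indistinguishable on the result page.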

Detailed output
Check this box if you would like the neural network score to be presented for each residue as well (and not only for each sequence). A derivative of the network score, used for finding the area in which the cleavage site is searched for (the 40 residues surrounding the residue with the highest derivative score), is also presented, along with the cleavage site score (CS-score) for each residue.

Output format



On the result page, for each query sequence the name, the length, and the neural network output score on which the cTP/non-cTP assignment is based will be presented. The higher the score, the more certain is the network that this sequence contains an N-terminal chloroplast transit peptide (cTP). A potential cTP length will also be presented, along with the corresponding cleavage site score. Please note that the prediction of the transit peptide length is carried out even if its presence is not predicted. The purpose is to provide maximal information in borderline cases.

If "Detailed output" is chosen, the neural network score for each residue will also be presented. The higher this score, the more certain the network is that this residue is part of a cTP. A derivative of the network score is also presented. This score is used for finding the area in which the cleavage site is searched for, namely among the 40 residues surrounding the residue with the highest derivative score. Finally, the cleavage site score (CS-score) is presented for each residue. This score is calculated from a scoring matrix derived with MEME, an automatic motif-finding algorithm hosted at SDSC. The cleavage site score is defined so that the predicted cleavage site is directly N-terminal of the highest-scoring residue within the 40 residues. Thus, one or several CS-scores may happen to be greater than the score at the proposed cleavage site, but since they are located outside the 40 residues around the highest derivative score, the presented cTP length is still what ChloroP considers the most likely presequence length (i.e. the one corresponding to the most likely cleavage site).
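
The search procedure described above can be sketched as follows, given the per-residue derivative and CS-scores from the detailed output. This is an illustration, not ChloroP's implementation: the function name `predict_ctp_length` is hypothetical, and the exact placement of the 40-residue window (here 20 residues before the peak up to 19 after it, clipped to the sequence ends) is an assumption, since the text only says "surrounding".

```python
from typing import Sequence


def predict_ctp_length(deriv: Sequence[float], cs: Sequence[float],
                       search_limit: int = 100, window: int = 40) -> int:
    """Return a cTP length (in residues) from per-residue scores.

    deriv: derivative of the network score, one value per residue
    cs:    cleavage site score, one value per residue
    """
    if len(deriv) == 0 or len(deriv) != len(cs):
        raise ValueError("deriv and cs must be non-empty and of equal length")

    # Highest derivative is searched among the 100 N-terminal residues only.
    limit = min(search_limit, len(deriv))
    peak = max(range(limit), key=lambda i: deriv[i])

    # Assumed window: 40 residues roughly centred on the peak.
    half = window // 2
    start = max(0, peak - half)
    end = min(len(cs), peak + half)

    # Highest CS-score inside the window; ties go to the first residue.
    best = max(range(start, end), key=lambda i: cs[i])

    # The cleavage site is directly N-terminal of residue `best` (0-based),
    # so residues 0..best-1 form the presequence and its length is `best`.
    return best
```

A higher CS-score outside the window is deliberately ignored, which matches the remark that such scores do not change the reported cTP length.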

Interpretation of output

Always shown:

Name is the name of the submitted sequence, truncated if longer than 20 characters.
Length is the length of the submitted sequence, in residues.
Score is the output score from the second step network. The prediction cTP/no cTP is based solely on this score.
cTP tells whether or not this is predicted as a cTP-containing sequence; "Y" means that the sequence is predicted to contain a cTP; "-" means that it is predicted not to contain a cTP.
CS-score is the MEME scoring matrix score for the suggested cleavage site.
cTP-length is the predicted length of the presequence (Please note that the prediction of the transit peptide length is carried out and presented even if its presence is not predicted).
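
The fields listed above can be held in a small record when post-processing results. The sketch below is illustrative only: the class `ChloroPResult` and the helper `split_precursor` are hypothetical names, and no assumption is made about the textual layout of the result page, so the values must be filled in by the user.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ChloroPResult:
    name: str          # sequence name as shown on the result page
    length: int        # length of the submitted sequence
    score: float       # network output score for the whole sequence
    ctp: str           # "Y" if a cTP is predicted, "-" otherwise
    cs_score: float    # scoring matrix score at the suggested cleavage site
    ctp_length: int    # predicted presequence length (always reported)

    @property
    def has_ctp(self) -> bool:
        return self.ctp == "Y"


def split_precursor(result: ChloroPResult, sequence: str) -> Tuple[str, str]:
    """Return (transit_peptide, mature_protein).

    The cTP length is reported even when no cTP is predicted, so the
    sequence is only cut when the cTP column says "Y".
    """
    if not result.has_ctp:
        return "", sequence
    return sequence[:result.ctp_length], sequence[result.ctp_length:]
```

In borderline cases a user may still want to inspect `ctp_length` for a sequence marked "-"; the helper simply takes the conservative choice of leaving such sequences uncut.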

Shown only if "Detailed output" was chosen:

NN-score, Raw is the score for each residue from the first step network.
NN-score, Deriv. is a numerical derivative of the network score. Used for finding the amino acid stretch of 40 residues in which the cleavage site is searched.
CS-score (cleavage site score) is the MEME scoring matrix score, defined so that the predicted cleavage site is directly N-terminal of the highest scoring residue.

Datasets


cTP containing data set used for training of ChloroP networks

ChloroP was trained on a set of 150 sequences, of which 75 contained a chloroplast transit peptide (cTP). The 75 cTP-containing proteins have all been checked against the papers originally presenting them, and a few database annotation errors have been corrected. The 75 sequences have also been redundancy reduced (with regard to their annotated cTP sequence) using Hobohm algorithm 2 (Hobohm, U., et al., Protein Science 1:409-417 (1992)).

Here are the 75 sequences containing cTP:

Here are the 75 sequences not containing cTP:

References


ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Olof Emanuelsson (1), Henrik Nielsen (1,2), and Gunnar von Heijne (1). Protein Science 8:978-984, 1999.

(1) Department of Biochemistry, Stockholm University, S-106 91 Stockholm, Sweden
(2) Center for Biological Sequence Analysis, BioCentrum-DTU, Technical University of Denmark, DK-2800 Lyngby, Denmark

Abstract

We present a neural network based method (ChloroP) for identifying chloroplast transit peptides and their cleavage sites. Using cross-validation, 88% of the sequences in our homology reduced training set were correctly classified as transit peptides or nontransit peptides. This performance level is well above that of the so far only publicly available chloroplast localization predictor PSORT. Cleavage sites are predicted using a scoring matrix derived by an automatic motif-finding algorithm. Approximately 60% of the known cleavage sites in our sequence collection were predicted to within ±2 residues from the cleavage sites given in SWISS-PROT. An analysis of 715 A. thaliana sequences from SWISS-PROT suggests that the ChloroP method should be useful for the identification of putative transit peptides in genome-wide sequence data.

For errors in the published version of the article click here.

Updates


ChloroP 1.1: changes from the previous version

1. The output format has changed as follows:
  • the predicted cTP length is presented instead of the predicted cleavage site;

  • the potential cTP length is presented for all sequences, even if no presequence was predicted to be there (to provide maximum information in borderline cases);

  • the "detailed output" earlier available (and mandatory...) only when submitting a single sequence is now optional for all types of submissions.
2. Multiple sequences may now also be input by "cut-and-paste" (use FASTA format).

3. The first 20 characters of sequence names will be preserved throughout the prediction and presented on the ChloroP prediction result page (as compared to 11 in previous versions. Wow! Great improvement, isn't it?).


Earlier changes

1. Presentation of cleavage site score for every residue in a submitted sequence

Presented only if a single sequence was submitted. The cleavage site score (calculated from a scoring matrix) is defined so that the predicted cleavage site is located directly N-terminal of the highest score in a 40-residue stretch around the peak value of the network output score derivative. Thus, one or several CS-scores may happen to be greater than the score of the proposed cleavage site, but since they are all located outside the 40 residues determined by the derivative, the presented cleavage site is still what ChloroP considers the most likely site. The user may however feel free to search for other potential cleavage sites; the cleavage site score was included in the output in order to make such a search possible.

2. Length restriction on predicted transit peptides

To avoid prediction of absurdly long transit peptides, the cleavage site (or rather the highest derivative) is searched among the 100 N-terminal residues. In our set of 75 cTP-containing proteins, the longest cTP is 91 amino acids. The restriction focuses the prediction on the N-terminal part of the sequences while still allowing for smaller errors in the N-terminal part of the submitted sequences.

Software Downloads




GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
