Training and testing data set
The dataset used for training, validating, and testing TargetP 2.0 (using nested cross-validation) can be found here.
The sequences are in FASTA format with the UniProt AC as sequence name:
The annotations are in a tab-separated file where each line contains three fields: the UniProt AC, the type of targeting peptide,
and the length of the targeting peptide.
The type is one of:
- "SP" for signal peptide,
- "MT" for mitochondrial transit peptide (mTP),
- "CH" for chloroplast transit peptide (cTP),
- "TH" for thylakoidal lumen composite transit peptide (lTP),
- "Other" for no targeting peptide (in this case, the length is given as 0).
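As an illustration of the two formats described above, here is a minimal Python sketch that reads the FASTA sequences and the tab-separated annotations and cuts out the annotated targeting peptide for each entry. It assumes that each FASTA header is ">" followed by the UniProt AC, that the annotation file has no header row, and that the targeting peptide occupies the first N residues of the sequence. The function names, accessions, and sequences are invented for the example and are not taken from the actual dataset.

```python
import csv
import io

# The five type labels listed above.
VALID_TYPES = {"SP", "MT", "CH", "TH", "Other"}


def parse_annotations(handle):
    """Map UniProt AC -> (peptide type, peptide length) from a tab-separated file."""
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        accession, peptide_type, length = row
        if peptide_type not in VALID_TYPES:
            raise ValueError(f"unexpected type {peptide_type!r} for {accession}")
        annotations[accession] = (peptide_type, int(length))
    return annotations


def parse_fasta(handle):
    """Map UniProt AC -> sequence, assuming the header is '>' plus the AC."""
    chunks = {}
    accession = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            accession = line[1:].split()[0]
            chunks[accession] = []
        elif accession is not None:
            chunks[accession].append(line)
    return {acc: "".join(parts) for acc, parts in chunks.items()}


def targeting_peptides(sequences, annotations):
    """Return the N-terminal targeting peptide for each annotated sequence."""
    return {
        acc: sequences[acc][:length]
        for acc, (_ptype, length) in annotations.items()
        if acc in sequences
    }


# Toy input (made up for illustration; not real dataset entries).
fasta_text = ">X00001\nMKTLLLAVAG\nSSS\n>X00002\nMAAAKK\n"
tsv_text = "X00001\tSP\t5\nX00002\tOther\t0\n"

sequences = parse_fasta(io.StringIO(fasta_text))
annotations = parse_annotations(io.StringIO(tsv_text))
peptides = targeting_peptides(sequences, annotations)
print(peptides)
```

For the real files, the `io.StringIO` objects would be replaced by file handles opened with `open(path, newline="")`. Entries of type "Other" have length 0, so their targeting peptide comes out as an empty string.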
Predictions on proteomes
Results from TargetP predictions on whole proteomes from UniProt (gzipped text files):