Training and testing data set

The dataset used for training, validating, and testing TargetP 2.0 (using nested cross-validation) can be found here.

The sequences are in FASTA format with the UniProt AC as sequence name: Download

The annotations are in a tab-separated file in which each line contains three fields: the UniProt AC, the type of targeting peptide, and the length of the targeting peptide.
The type can be
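
As a minimal sketch of how the two files described above might be read together, the following Python snippet parses a FASTA stream into a dictionary keyed by UniProt AC and a three-column tab-separated stream into (type, length) pairs. The function names `read_fasta` and `read_annotations`, the accessions, the sequences, and the type labels `TYPE_A` and `TYPE_B` are hypothetical placeholders, not values taken from the dataset. The sketch also assumes that the annotation file has no header row and that the length counts residues from the N-terminus.

```python
import csv
import io


def read_fasta(handle):
    """Return {accession: sequence} from a FASTA stream.

    Assumes the first token after '>' is the UniProt AC.
    """
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if name is not None:
                sequences[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def read_annotations(handle):
    """Return {accession: (peptide_type, peptide_length)}.

    Assumes three tab-separated fields per line and no header row.
    The type label is kept as an opaque string.
    """
    annotations = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        accession, peptide_type, length = row
        annotations[accession] = (peptide_type, int(length))
    return annotations


# Hypothetical in-memory stand-ins for the two downloaded files.
fasta_text = ">X00001\nMKTLLV\nAAGG\n>X00002\nMSSP\n"
annot_text = "X00001\tTYPE_A\t6\nX00002\tTYPE_B\t0\n"

sequences = read_fasta(io.StringIO(fasta_text))
annotations = read_annotations(io.StringIO(annot_text))

for accession, (peptide_type, length) in annotations.items():
    # Assumption: the targeting peptide is the first `length` residues.
    peptide = sequences[accession][:length]
    print(accession, peptide_type, length, peptide)
```

With real files, the `io.StringIO` objects would be replaced by file handles opened in text mode, for example `open(path, newline="")` for the annotation file.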


Predictions on proteomes

Results from TargetP predictions on whole proteomes from UniProt (gzipped text files):