The NetGPI dataset
The datasets used to train and benchmark NetGPI-1.1 can be found here. Each dataset is provided in 2-line FASTA format: every entry is a header line followed by a single line holding the complete sequence.
The format is as follows:
>uniprot_ac|kingdom|anchoring|pos_from_end|pos_from_beginning|part_no|anchor_exp|omega_exp
amino-acid sequence
where:
- uniprot_ac is the UniProt accession number of the entry
- kingdom is the organism's kingdom
- anchoring is either GPI-anchored or non_GPI-anchored
- pos_from_end is the position within the sequence from the end, where 0 is the sentinel
- pos_from_beginning is the position within the truncated sequence from the beginning
- part_no is the partition that the protein is assigned to
- anchor_exp is 1 when the entry has experimental evidence for the GPI-anchoring signal sequence, 0 otherwise
- omega_exp is 1 when the entry has experimental evidence for the omega-site, 0 otherwise
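Because each header packs eight pipe-separated fields, a small parser is handy before doing anything else with the data. The following is a minimal Python sketch, not an official NetGPI tool. It assumes that the numeric fields (pos_from_end, pos_from_beginning, part_no) are plain integers, that anchor_exp and omega_exp are the literal strings "1" or "0", and that blank lines may be skipped. The names `NetGPIEntry`, `parse_netgpi` and `split_by_partition` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, TextIO, Tuple


@dataclass
class NetGPIEntry:
    # Hypothetical record type mirroring the eight header fields plus the sequence.
    uniprot_ac: str
    kingdom: str
    anchoring: str          # "GPI-anchored" or "non_GPI-anchored"
    pos_from_end: int       # assumed to be an integer
    pos_from_beginning: int # assumed to be an integer
    part_no: int            # assumed to be an integer partition label
    anchor_exp: bool        # True when the field is "1"
    omega_exp: bool         # True when the field is "1"
    sequence: str


def parse_netgpi(handle: TextIO) -> Iterator[NetGPIEntry]:
    """Parse a 2-line FASTA file: one header line, then one sequence line."""
    # Drop surrounding whitespace and skip blank lines (an assumption).
    lines = [line.strip() for line in handle if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected an even number of lines (header/sequence pairs)")
    for header, sequence in zip(lines[0::2], lines[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a header line, got: {header!r}")
        fields = header[1:].split("|")
        if len(fields) != 8:
            raise ValueError(f"expected 8 header fields, got {len(fields)}: {header!r}")
        ac, kingdom, anchoring, pos_end, pos_begin, part, a_exp, o_exp = fields
        yield NetGPIEntry(
            uniprot_ac=ac,
            kingdom=kingdom,
            anchoring=anchoring,
            pos_from_end=int(pos_end),
            pos_from_beginning=int(pos_begin),
            part_no=int(part),
            anchor_exp=(a_exp == "1"),
            omega_exp=(o_exp == "1"),
            sequence=sequence,
        )


def split_by_partition(
    entries: Iterable[NetGPIEntry], held_out: int
) -> Tuple[List[NetGPIEntry], List[NetGPIEntry]]:
    """Hold out one partition (by part_no) and keep the rest for training."""
    entries = list(entries)
    train = [e for e in entries if e.part_no != held_out]
    test = [e for e in entries if e.part_no == held_out]
    return train, test
```

For example, `entries = list(parse_netgpi(open("netgpi.fasta")))` (the file name is a placeholder) followed by `split_by_partition(entries, held_out=0)` yields a train/test split that respects the partition assignment given in part_no, rather than re-splitting the proteins at random.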
NetGPI dataset: download