DTU Health Tech
Department of Health Technology
Here you will find the data sets used for training and testing, as well as the T cell epitope data used for evaluation of the NetMHCIIpan-3.2 method.
The training binding data are partitioned into 5 files to be used for cross-validation. For instance, the train1 file contains the training data and the test1 file the test data for the first cross-validation partition. It is critical that this data partitioning is maintained.
The format for each of the files is
AAAGAEAGKATTEEQ 0.190842 DRB1_0101
AAAGAEAGKATTEEQ 0.006301 DRB1_0301
AAAGAEAGKATTEEQ 0.066851 DRB1_0401
AAAGAEAGKATTEEQ 0.006344 DRB1_0405
AAAGAEAGKATTEEQ 0.035130 DRB1_0701
AAAGAEAGKATTEEQ 0.006288 DRB1_0802
AAAGAEAGKATTEEQ 0.176268 DRB1_0901
AAAGAEAGKATTEEQ 0.042555 DRB1_1101
AAAGAEAGKATTEEQ 0.114855 DRB1_1302
AAAGAEAGKATTEEQ 0.006377 DRB1_1501
where the first column gives the peptide, the second column the log50k-transformed binding affinity (i.e. 1 - log50k(aff), the logarithm taken to base 50,000 and the affinity given in nM), and the last column the class II allele.
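As a minimal sketch of reading this format, the Python below parses one record and inverts the transformation to recover an affinity in nM. The names BindingRecord, parse_record and score_to_nm are illustrative and not part of NetMHCIIpan; the sketch assumes whitespace-separated columns and a base-50,000 logarithm as described above.

```python
from typing import NamedTuple


class BindingRecord(NamedTuple):
    peptide: str
    score: float  # log50k-transformed affinity: 1 - log50k(aff nM)
    allele: str


def parse_record(line: str) -> BindingRecord:
    """Split one whitespace-separated line into its three columns."""
    peptide, score, allele = line.split()
    return BindingRecord(peptide, float(score), allele)


def score_to_nm(score: float) -> float:
    """Invert score = 1 - log50k(aff) to get the affinity in nM."""
    return 50000.0 ** (1.0 - score)
```

For example, parse_record("AAAGAEAGKATTEEQ 0.190842 DRB1_0101") yields a record whose score can be passed to score_to_nm.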
When classifying the peptides into binders and non-binders, for instance for the calculation of AUC values, a threshold of 500 nM is used. This means that peptides with log50k-transformed binding affinity values greater than 0.426 are classified as binders.
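The 0.426 cutoff follows directly from the transformation: 1 - log(500)/log(50000) is approximately 0.4256. A short sketch of that arithmetic, with hypothetical helper names (log50k, is_binder) chosen here for illustration, and without any clipping of values outside the 1 nM to 50,000 nM range:

```python
import math


def log50k(aff_nm: float) -> float:
    """Transform an affinity in nM to 1 - log(aff) / log(50000)."""
    return 1.0 - math.log(aff_nm) / math.log(50000.0)


# Transformed value corresponding to the 500 nM binder threshold.
BINDER_THRESHOLD = log50k(500.0)


def is_binder(score: float, threshold: float = BINDER_THRESHOLD) -> bool:
    """A peptide is a binder if its transformed score exceeds the threshold."""
    return score > threshold
```

Stronger binders have lower nM values and therefore higher transformed scores, which is why the comparison is "greater than".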
train1 (Train data) test1 (Test data)
train2 (Train data) test2 (Test data)
train3 (Train data) test3 (Test data)
train4 (Train data) test4 (Test data)
train5 (Train data) test5 (Test data)
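To keep the published partitioning intact, one option is to pair the files by index rather than re-splitting the data. The sketch below only builds the five (train, test) path pairs; it assumes the files have been saved locally under the names listed above, without extensions, in a directory of your choosing (data_dir and fold_paths are hypothetical names).

```python
from pathlib import Path
from typing import List, Tuple


def fold_paths(data_dir: str, n_folds: int = 5) -> List[Tuple[Path, Path]]:
    """Return (train_i, test_i) path pairs for folds 1..n_folds."""
    base = Path(data_dir)
    return [(base / f"train{i}", base / f"test{i}") for i in range(1, n_folds + 1)]


if __name__ == "__main__":
    for train_path, test_path in fold_paths("data"):
        # Train on train_path and evaluate on test_path only,
        # so that the original partitioning is preserved.
        print(train_path, test_path)
```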
The format of the T cell epitope data is
>0705172A=AAHAEINEA=H2-IAb 385 gi|223299|prf||0705172A
GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRFD
KLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQC
VKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIVF
KGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMS
MLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAM
GITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRA
DHPFLFCIKHIATNAVLFFGRCVSP
where the first part of the FASTA header contains the protein ID (0705172A), the epitope (AAHAEINEA), and the MHC restriction (H2-IAb), separated by "=".
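A minimal sketch of extracting these three fields from a header line is shown below. The function name parse_fasta_header is hypothetical, and the code assumes that the first whitespace-delimited token always has the form ID=EPITOPE=RESTRICTION, as in the example above; the remaining header fields are ignored.

```python
from typing import Tuple


def parse_fasta_header(header: str) -> Tuple[str, str, str]:
    """Return (protein_id, epitope, mhc_restriction) from a FASTA header."""
    # Drop the leading '>' and keep only the first whitespace-delimited token.
    first_token = header.lstrip(">").split()[0]
    protein_id, epitope, restriction = first_token.split("=")
    return protein_id, epitope, restriction


if __name__ == "__main__":
    example = ">0705172A=AAHAEINEA=H2-IAb 385 gi|223299|prf||0705172A"
    print(parse_fasta_header(example))
```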