Here you will find the data set used for training and evaluating the NN-align method. Fourteen HLA-DR and four mouse class II alleles are included in the benchmark. Following the links below, you will be directed to a directory containing the data for each allele. Each directory contains six data files. The files c000, c001, c002, c003, and c004 contain the partitions of the data used for cross-validation. If, for instance, the file c004 is used as the evaluation set, the other four files (c000, c001, c002, and c003) are used as training data. The file all contains all data (i.e. cat c00?).
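The five-fold scheme described above can be sketched in Python as follows. This is only an illustration: the function name cross_validation_splits and the directory name DRB1_0101 are hypothetical, and only the file names c000 to c004 come from the description above.

```python
from pathlib import Path

# File names of the five cross-validation partitions, as listed above.
FOLDS = ["c000", "c001", "c002", "c003", "c004"]


def cross_validation_splits(allele_dir):
    """Yield (evaluation_file, training_files) for each of the five folds.

    allele_dir is the local directory holding the files for one allele
    (hypothetical; use whatever path the data was downloaded to).
    """
    base = Path(allele_dir)
    for held_out in FOLDS:
        # Every partition except the held-out one is used for training.
        training = [base / name for name in FOLDS if name != held_out]
        yield base / held_out, training


if __name__ == "__main__":
    for evaluation, training in cross_validation_splits("DRB1_0101"):
        print(evaluation.name, "<-", [p.name for p in training])
```

Each partition serves as the evaluation set exactly once, so no peptide is used for both training and evaluation within a fold.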
The format for each of the files (c00?, all) is
ACRVKHDSMAEPKTVY 0.227054
AKRVVRDPQGIRAWV 0.024247
AQFMWIIRKRIQLP 0.803966
ATSTKKLHKEPATLIKAIDG 0.000000
AWVAWRNRCK 0.340978
CYVSGFHPSDIEVDLL 0.047212
DGKTPRAVNACGIN 0.000000
ERAEAWRQKLHGRL 0.614743
where the first column gives the peptide sequence and the second column the log50k-transformed binding affinity, i.e. 1 - log50k(aff), with aff the binding affinity in nM and log50k the logarithm to base 50,000.
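As a minimal sketch of how such a file could be read, the Python snippet below parses one line and converts between nM affinities and the transformed values. The function names are hypothetical, and the snippet assumes the two columns are separated by whitespace.

```python
import math

LOG_BASE = 50000.0  # base of the log50k transform


def parse_line(line):
    """Split one data line into (peptide, transformed_affinity)."""
    peptide, score = line.split()
    return peptide, float(score)


def to_log50k(aff_nm):
    """Transform an affinity in nM: 1 - log50k(aff)."""
    return 1.0 - math.log(aff_nm) / math.log(LOG_BASE)


def from_log50k(score):
    """Invert the transform, returning an affinity in nM."""
    return LOG_BASE ** (1.0 - score)


if __name__ == "__main__":
    peptide, score = parse_line("AQFMWIIRKRIQLP 0.803966")
    print(peptide, score, round(from_log50k(score), 1), "nM")
```

Under this inverse, a stored value of 0.000000 corresponds to 50,000 nM and a value of 1 to 1 nM. Whether weaker affinities were clipped to zero when the files were prepared is not stated here.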
When classifying the peptides into binders and non-binders, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
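The 0.426 cutoff follows directly from applying the transform to 500 nM. The short Python sketch below reproduces that arithmetic and applies the classification rule. The names binder_cutoff and is_binder are hypothetical.

```python
import math

BINDER_THRESHOLD_NM = 500.0  # binder/non-binder threshold stated above


def binder_cutoff(threshold_nm=BINDER_THRESHOLD_NM):
    """Return the log50k-transformed value of the nM threshold."""
    return 1.0 - math.log(threshold_nm) / math.log(50000.0)


def is_binder(score, cutoff=None):
    """Classify a transformed affinity: binder if strictly above the cutoff."""
    if cutoff is None:
        cutoff = binder_cutoff()
    return score > cutoff


if __name__ == "__main__":
    print(round(binder_cutoff(), 3))
    for score in (0.803966, 0.340978):
        print(score, "binder" if is_binder(score) else "non-binder")
```

Because the transform decreases as the nM affinity increases, values above the cutoff correspond to affinities stronger than 500 nM.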
DRB1*0101 datasets
DRB1*0301 datasets
DRB1*0401 datasets
DRB1*0404 datasets
DRB1*0405 datasets
DRB1*0701 datasets
DRB1*0802 datasets
DRB1*0901 datasets
DRB1*1101 datasets
DRB1*1302 datasets
DRB1*1501 datasets
DRB3*0101 datasets
DRB4*0101 datasets
DRB5*0101 datasets
H2-IAb datasets
H2-IAd datasets
H2-IAs datasets