Here you will find the data set used for training and evaluation of the NN-align method. Fourteen HLA-DR and four mouse class II alleles are included in the benchmark. Following the links below, you will be directed to a directory containing the data for each allele. Each directory contains 6 data files. The files c000, c001, c002, c003, and c004 contain the partitions of the data used for cross-validation. If, for instance, the file c004 is used as the evaluation set, the other four files c000, c001, c002, and c003 are used as training data. The file all contains all data (i.e. cat c00?).
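The five-fold scheme described above can be sketched in Python as follows. This is only an illustration: the function name cross_validation_splits is hypothetical and not part of NN-align, and the sketch only enumerates which partition files play which role in each fold.

```python
# Names of the five partition files found in each allele directory.
PARTITIONS = ["c000", "c001", "c002", "c003", "c004"]


def cross_validation_splits(partitions=PARTITIONS):
    """Yield (training_files, evaluation_file) for every fold.

    Each partition is held out once as the evaluation set while the
    remaining partitions together form the training set.
    """
    for held_out in partitions:
        training = [name for name in partitions if name != held_out]
        yield training, held_out


for training, evaluation in cross_validation_splits():
    print("train on", " ".join(training), "| evaluate on", evaluation)
```

Reading the actual files is left out here, since the directory location depends on where the data was downloaded.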
The format for each of the files (c00?, all) is
ACRVKHDSMAEPKTVY 0.227054
AKRVVRDPQGIRAWV 0.024247
AQFMWIIRKRIQLP 0.803966
ATSTKKLHKEPATLIKAIDG 0.000000
AWVAWRNRCK 0.340978
CYVSGFHPSDIEVDLL 0.047212
DGKTPRAVNACGIN 0.000000
ERAEAWRQKLHGRL 0.614743
where the first column gives the peptide sequence, and the second column the log50k-transformed binding affinity, i.e. 1 - log(aff)/log(50000), with the affinity aff given in nM.
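As a rough illustration of the file format and the transformation, the Python sketch below parses peptide/score pairs and converts between nM affinities and log50k scores. The helper names (to_log50k, from_log50k, parse_records) are hypothetical, and the parser assumes that the fields are separated by whitespace.

```python
import math


def to_log50k(affinity_nm):
    """Transform an affinity in nM to 1 - log(aff)/log(50000)."""
    return 1.0 - math.log(affinity_nm) / math.log(50000.0)


def from_log50k(score):
    """Invert the transformation: recover the affinity in nM."""
    return 50000.0 ** (1.0 - score)


def parse_records(text):
    """Parse whitespace-separated 'peptide score' pairs.

    Assumption: every record consists of exactly two fields, a peptide
    sequence followed by its transformed affinity.
    """
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of fields")
    return [(tokens[i], float(tokens[i + 1])) for i in range(0, len(tokens), 2)]


records = parse_records("ACRVKHDSMAEPKTVY 0.227054\nAQFMWIIRKRIQLP 0.803966\n")
for peptide, score in records:
    print(peptide, score, round(from_log50k(score), 1), "nM")
```

Under this transformation an affinity of 50000 nM maps to 0 and an affinity of 1 nM maps to 1.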
When classifying the peptides into binders and non-binders, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
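The threshold can be checked with a few lines of Python. This is a minimal sketch, assuming that log50k denotes the logarithm to base 50000; the names is_binder and BINDER_THRESHOLD_SCORE are hypothetical, and the sample values are taken from the format example above.

```python
import math

BINDER_THRESHOLD_NM = 500.0
# 1 - log(500)/log(50000), which rounds to the 0.426 quoted in the text.
BINDER_THRESHOLD_SCORE = 1.0 - math.log(BINDER_THRESHOLD_NM) / math.log(50000.0)


def is_binder(score, threshold=BINDER_THRESHOLD_SCORE):
    """A peptide counts as a binder if its score exceeds the threshold."""
    return score > threshold


sample = [
    ("ACRVKHDSMAEPKTVY", 0.227054),
    ("AQFMWIIRKRIQLP", 0.803966),
    ("DGKTPRAVNACGIN", 0.000000),
    ("ERAEAWRQKLHGRL", 0.614743),
]
binders = [peptide for peptide, score in sample if is_binder(score)]
print(round(BINDER_THRESHOLD_SCORE, 3), binders)
```

Because a stronger binder has a lower nM affinity, the transformation reverses the ordering: affinities below 500 nM correspond to scores above the threshold.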