Training and Evaluation Data

NN-align. A neural network-based alignment algorithm for MHC class II peptide binding prediction.

Here you will find the data set used for training and evaluating the NN-align method. Fourteen HLA-DR and four mouse class II alleles are included in the benchmark. Following the links below, you will be directed to a directory containing the data for each allele. Each directory contains six data files. The files c000, c001, c002, c003, and c004 contain the partitions of the data used for cross-validation. If, for instance, the file c004 is used as the evaluation set, the other four files c000, c001, c002, and c003 are used as training data. The file all contains all data (i.e. cat c00?).
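The cross-validation scheme described above can be sketched in Python as follows. This is only an illustration, not the code used by NN-align: the function names and the local directory layout (one directory per allele holding the files c000 to c004) are assumptions.

```python
from pathlib import Path

# Names of the five partition files found in each allele directory.
FOLD_NAMES = ["c000", "c001", "c002", "c003", "c004"]


def read_lines(path):
    """Return the non-empty lines of one data file."""
    return [line for line in Path(path).read_text().splitlines() if line.strip()]


def cross_validation_splits(allele_dir):
    """Yield (held_out_name, training_lines, evaluation_lines) for each fold.

    Each file is used once as the evaluation set while the remaining
    four files are concatenated to form the training set.
    """
    allele_dir = Path(allele_dir)
    folds = {name: read_lines(allele_dir / name) for name in FOLD_NAMES}
    for held_out in FOLD_NAMES:
        training = [
            line
            for name in FOLD_NAMES
            if name != held_out
            for line in folds[name]
        ]
        yield held_out, training, folds[held_out]
```

For example, iterating over cross_validation_splits("DRB1_0101") (a hypothetical local directory name) gives five splits; in the last one, c004 is the evaluation set and c000 to c003 form the training set.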

The format for each of the files (c00?, all) is

ACRVKHDSMAEPKTVY 0.227054
AKRVVRDPQGIRAWV 0.024247
AQFMWIIRKRIQLP 0.803966
ATSTKKLHKEPATLIKAIDG 0.000000
AWVAWRNRCK 0.340978
CYVSGFHPSDIEVDLL 0.047212
DGKTPRAVNACGIN 0.000000
ERAEAWRQKLHGRL 0.614743

where the first column gives the peptide sequence and the second column gives the log50k-transformed binding affinity, i.e. 1 - log50k(aff), where aff is the binding affinity in nM and log50k is the logarithm with base 50,000.
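As a rough illustration of the file format and the transformation, the following Python sketch parses one data line and converts between nM affinities and transformed values. The function names are hypothetical. Clipping the transformed value to the range 0 to 1 is an assumption, suggested by the 0.000000 entries in the example above; it is not stated on this page.

```python
import math

LOG_BASE = 50000.0  # base of the log50k transformation


def to_log50k(affinity_nm):
    """Map an affinity in nM to 1 - log50k(aff).

    Assumption: values are clipped to [0, 1], so that affinities of
    50,000 nM or weaker map to 0.
    """
    score = 1.0 - math.log(affinity_nm) / math.log(LOG_BASE)
    return min(1.0, max(0.0, score))


def from_log50k(score):
    """Invert the transformation: aff = 50000 ** (1 - score), in nM."""
    return LOG_BASE ** (1.0 - score)


def parse_line(line):
    """Split one data line into (peptide, transformed affinity)."""
    peptide, value = line.split()
    return peptide, float(value)
```

Note that a clipped value of 0 can only be mapped back to 50,000 nM, so the original affinity of such peptides is not recoverable from these files.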

When classifying the peptides into binders and non-binders, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
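The threshold value can be checked with a short Python calculation: 1 - ln(500)/ln(50000) is approximately 0.4256, which rounds to 0.426. The constant and function names below are illustrative only.

```python
import math

BINDER_THRESHOLD_NM = 500.0

# 1 - log50k(500): the 500 nM threshold on the transformed scale.
BINDER_THRESHOLD_SCORE = 1.0 - math.log(BINDER_THRESHOLD_NM) / math.log(50000.0)


def is_binder(score, threshold=BINDER_THRESHOLD_SCORE):
    """Return True if a transformed affinity exceeds the binder threshold."""
    return score > threshold
```

With the example lines shown earlier, AQFMWIIRKRIQLP (0.803966) and ERAEAWRQKLHGRL (0.614743) are classified as binders, while the remaining six peptides are non-binders.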

DRB1*0101 datasets
DRB1*0301 datasets
DRB1*0401 datasets
DRB1*0404 datasets
DRB1*0405 datasets
DRB1*0701 datasets
DRB1*0802 datasets
DRB1*0901 datasets
DRB1*1101 datasets
DRB1*1302 datasets
DRB1*1501 datasets
DRB3*0101 datasets
DRB4*0101 datasets
DRB5*0101 datasets
H2-IAb datasets
H2-IAd datasets
H2-IAs datasets

References

Morten Nielsen. NN-align. A neural network-based alignment algorithm for MHC class II peptide binding prediction.