Supplementary material

This page provides the data sets used to train and test the NetMHCpan-4.0 method.


Training data

The training data are split into five partitions for cross-validation. For example, the f000_ba and f000_el files contain the binding affinity (BA) and eluted ligand (EL) training data for the first cross-validation partition, and the c000_ba and c000_el files contain the corresponding BA and EL test data. It is critical that this data partitioning is maintained.

Each file has the following format:

ARWLASTPL 0.589395 BoLA-D18.4 85.0
ASYAAAAAY 0.496594 BoLA-D18.4 232.0
GMMGGLWKY 0.439136 BoLA-D18.4 432.0
KMFHGGLRY 0.898463 BoLA-D18.4 3.0
KMLEASTIY 0.75609 BoLA-D18.4 14.0
KQLEYSWVL 0.481554 BoLA-D18.4 273.0
KQWSWFSLL 0.451477 BoLA-D18.4 378.0
MMFDAMGAL 0.935937 BoLA-D18.4 2.0
MMMSTAVAF 0.762939 BoLA-D18.4 13.0
MTFPVSLEY 0.485003 BoLA-D18.4 263.0

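where the first column gives the peptide, the second column gives the log50k-transformed binding affinity, i.e. 1 - log50k(aff) with aff in nM, for the binding affinity data or 1/0 for the eluted ligand data, and the third column gives the class I allele. The fourth column in the example above is consistent with the untransformed affinity in nM: for instance, 85.0 nM gives 1 - log(85)/log(50000) = 0.5894.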
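The column layout and the log50k transform can be sketched in Python as follows. This is a minimal illustration, not code from NetMHCpan: the names Record, parse_line, to_log50k and from_log50k are hypothetical, and the parser assumes whitespace-separated columns and ignores anything after the third column.

```python
import math
from typing import NamedTuple

MAX_AFFINITY_NM = 50000.0  # base of the log50k transform


class Record(NamedTuple):
    peptide: str
    target: float  # log50k-transformed affinity (BA) or 1/0 (EL)
    allele: str


def to_log50k(aff_nm: float) -> float:
    """Map an affinity in nM to 1 - log50k(aff)."""
    return 1.0 - math.log(aff_nm) / math.log(MAX_AFFINITY_NM)


def from_log50k(score: float) -> float:
    """Invert the transform: recover the affinity in nM."""
    return MAX_AFFINITY_NM ** (1.0 - score)


def parse_line(line: str) -> Record:
    """Parse one data line; only the first three columns are used."""
    fields = line.split()
    return Record(fields[0], float(fields[1]), fields[2])


rec = parse_line("ARWLASTPL 0.589395 BoLA-D18.4 85.0")
print(rec.peptide, rec.allele, rec.target, round(from_log50k(rec.target), 1))
```

Applying from_log50k to the second column of a BA line should reproduce the nM value up to rounding, which is a quick sanity check that a file has been read correctly.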

When BA peptides are classified into binders and non-binders, for instance to calculate AUC values, a threshold of 500 nM is used. Peptides with log50k-transformed binding affinity values greater than 0.426 are therefore classified as binders.
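The relation between the 500 nM threshold and the 0.426 cutoff, and its use when computing an AUC, can be sketched as below. This is an illustrative Python example under stated assumptions: the function names are hypothetical, the AUC is a plain rank-based (Mann-Whitney) estimate rather than the evaluation code used for NetMHCpan-4.0, and the measured and predicted values in the example are made up.

```python
import math

# 1 - log50k(500 nM), which rounds to the 0.426 cutoff quoted in the text
BINDER_THRESHOLD = 1.0 - math.log(500.0) / math.log(50000.0)


def is_binder(score: float, threshold: float = BINDER_THRESHOLD) -> bool:
    """Classify a log50k-transformed affinity as binder (True) or not."""
    return score > threshold


def auc(labels, predictions):
    """Rank-based AUC: fraction of (binder, non-binder) pairs in which the
    binder has the higher prediction; ties count as one half."""
    pos = [p for lab, p in zip(labels, predictions) if lab]
    neg = [p for lab, p in zip(labels, predictions) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one binder and one non-binder")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical measured values (second column) and model predictions
measured = [0.898463, 0.439136, 0.30, 0.10]
predicted = [0.80, 0.35, 0.40, 0.05]
labels = [is_binder(m) for m in measured]
print(round(BINDER_THRESHOLD, 3), labels, auc(labels, predicted))
```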

BA data

f000_ba (Train data) c000_ba (Test data)
f001_ba (Train data) c001_ba (Test data)
f002_ba (Train data) c002_ba (Test data)
f003_ba (Train data) c003_ba (Test data)
f004_ba (Train data) c004_ba (Test data)

EL data

f000_el (Train data) c000_el (Test data)
f001_el (Train data) c001_el (Test data)
f002_el (Train data) c002_el (Test data)
f003_el (Train data) c003_el (Test data)
f004_el (Train data) c004_el (Test data)
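To keep the partitioning intact when running cross-validation, the train/test file pairs listed above can be generated rather than typed by hand. The following Python sketch assumes the files have been downloaded into a local directory; the function name partition_files and the data_dir argument are hypothetical.

```python
from pathlib import Path


def partition_files(data_dir, kind, n_partitions=5):
    """Return (train, test) path pairs, e.g. (f000_ba, c000_ba), for each
    cross-validation partition. kind is "ba" or "el"."""
    if kind not in ("ba", "el"):
        raise ValueError("kind must be 'ba' or 'el'")
    root = Path(data_dir)
    return [
        (root / f"f{i:03d}_{kind}", root / f"c{i:03d}_{kind}")
        for i in range(n_partitions)
    ]


# data_dir "." is a placeholder for wherever the files were saved
for train_path, test_path in partition_files(".", "ba"):
    print(train_path.name, test_path.name)
```

Each model is then trained on the f file and evaluated on the matching c file of the same partition, never on a c file from another partition.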

References

NetMHCpan, a method for MHC class I binding prediction beyond humans
Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen
Immunogenetics 61.1 (2009): 1-13
PMID: 19002680