Supplementary material


Training data

NetMHCpan-4.1

Here you will find the data set used to train NetMHCpan-4.1.

NetMHCpan_train.tar.gz

Download the file and extract its contents with

cat NetMHCpan_train.tar.gz | uncompress | tar xvf -

This will create a directory called NetMHCpan_train containing 12 files: 5 partitions of binding affinity data (c00?_ba), 5 partitions of eluted ligand data (c00?_el), and the allelelist and MHC_pseudo.dat files described below. Each data file has the following format:

TEAARELGY 1 HLA-B44:03
ATDYPLIAR 1 pat-FL
RQPDSGISSI 1 pat-NS2
SVDIDSEL 1 Line.27
DGDEDLPGPPVRYY 1 HLA-A01:01
MPSNSVQLAY 1 Line.34
EEVKLIKKM 1 MAVER-1
TQREFMLSF 1 RPMI8226
VLLPKKTESHHK 1 A10
LKNPVRIFV 1 A20-A20
The columns are the peptide, the target value, and the MHC molecule or cell-line ID. Where the third column is a cell-line ID, the MHC molecules expressed by that cell line are listed in the allelelist file.
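Since every row is three whitespace-separated columns, standard command-line tools are enough to slice the partition files. The following is a minimal sketch, assuming plain whitespace-delimited text exactly as in the rows above; the helper names peptides_for and count_by_mhc are hypothetical and not part of the distribution.

```shell
# peptides_for MHC FILE... -- print "peptide target" for every row whose
# third column equals MHC (an MHC molecule or a cell-line ID).
# Hypothetical helper; assumes whitespace-separated columns as shown above.
peptides_for() {
    mhc=$1
    shift
    awk -v mhc="$mhc" '$3 == mhc { print $1, $2 }' "$@"
}

# count_by_mhc FILE... -- number of rows per MHC molecule/cell-line ID,
# sorted from most to least frequent (ties broken alphabetically).
count_by_mhc() {
    awk '{ n[$3]++ } END { for (k in n) print n[k], k }' "$@" |
        sort -k1,1nr -k2,2
}
```

For example, `peptides_for HLA-A01:01 c00?_el` would print the eluted ligand rows for HLA-A01:01 across all five EL partitions, and `count_by_mhc c00?_ba` would rank MHC molecules by their number of binding affinity rows.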

The allelelist file lists the alleles expressed in each multi-allelic (MA) cell-line data set, and MHC_pseudo.dat contains the MHC pseudo sequence of each MHC molecule.
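To use a cell-line row, its ID has to be resolved to the expressed alleles and, if needed, to their pseudo sequences. A minimal sketch, assuming (this page does not show either format) that each allelelist line is an ID followed by a comma-separated list of alleles, and that each MHC_pseudo.dat line is an allele name followed by its pseudo sequence; the function names are hypothetical.

```shell
# alleles_of ALLELELIST ID -- print the alleles listed for ID, one per line.
# ASSUMPTION: each allelelist line is "ID allele1,allele2,...".
alleles_of() {
    awk -v id="$2" '$1 == id {
        n = split($2, a, ",")
        for (i = 1; i <= n; i++) print a[i]
    }' "$1"
}

# pseudo_of PSEUDO_FILE ALLELE -- print the pseudo sequence of one molecule.
# ASSUMPTION: each MHC_pseudo.dat line is "allele pseudo_sequence".
pseudo_of() {
    awk -v allele="$2" '$1 == allele { print $2; exit }' "$1"
}

# cell_line_pseudos ALLELELIST PSEUDO_FILE ID -- pair every allele of a cell
# line with its pseudo sequence, printing NA when no sequence is found.
cell_line_pseudos() {
    alleles_of "$1" "$3" | while read -r allele; do
        pseq=$(pseudo_of "$2" "$allele")
        printf '%s %s\n' "$allele" "${pseq:-NA}"
    done
}
```

For example, `cell_line_pseudos allelelist MHC_pseudo.dat A10` would list the alleles of the A10 cell line from the example rows above, provided the assumed file formats hold.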


NetMHCIIpan-4.0

Here you will find the data set used to train NetMHCIIpan-4.0.

NetMHCIIpan_train.tar.gz

Download the file and extract its contents with

cat NetMHCIIpan_train.tar.gz | uncompress | tar xvf -

This will create a directory called NetMHCIIpan_train containing 22 files: 10 files with binding affinity data (the train_BA?.txt and test_BA?.txt partitions), 10 files with eluted ligand data (the train_EL?.txt and test_EL?.txt partitions), and the allelelist.txt and pseudosequence.2016.all.X.dat files described below. Each data file has the following format:

AAASVPAADKFKTFE 0.203668 HLA-DPA10103-DPB10201 XXXAAATFEXXX 
APEVKYTVFETALKK 0.838333 HLA-DPA10103-DPB10201 XXXAPELKKXXX 
ATFEAMYLGTCKTLT 0.325328 HLA-DPA10103-DPB10201 XXXATFTLTXXX 
AVWVDGKARTAWVDS  0.14783 HLA-DPA10103-DPB10201 XXXAVWVDSXXX 
ELYYAIYKASPTLAF 0.617078 HLA-DPA10103-DPB10201 XXXELYLAFXXX 
ENVIDVKLVDANGKL 0.173508 HLA-DPA10103-DPB10201 XXXENVGKLXXX 
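The columns are the peptide, the target value, the MHC molecule or cell-line ID, and the context: a 12-residue string made of the three residues flanking the peptide's N terminus, the first and last three residues of the peptide, and the three residues flanking its C terminus (X where no flanking residue is available). Where the third column is a cell-line ID, the MHC molecules expressed by that cell line are listed in the allelelist.txt file.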
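The context column can be split back into its flanking parts and checked against the peptide. A minimal sketch, assuming the 12-character layout described above (3 N-terminal flank, 3 + 3 peptide termini, 3 C-terminal flank), which matches every example row; split_context is a hypothetical helper name.

```shell
# split_context [FILE...] -- for each row, print the peptide, its 3-residue
# N- and C-terminal flanks, and "ok" if the middle six context residues
# equal the first and last three peptide residues ("MISMATCH" otherwise).
# ASSUMED layout of the 12-character context (4th column):
#   positions 1-3   N-terminal flank (X if unavailable)
#   positions 4-6   first three peptide residues
#   positions 7-9   last three peptide residues
#   positions 10-12 C-terminal flank (X if unavailable)
# Reads standard input when no FILE is given.
split_context() {
    awk 'NF >= 4 {
        pep = $1; ctx = $4; len = length(pep)
        nflank = substr(ctx, 1, 3)
        cflank = substr(ctx, 10, 3)
        ok = (substr(ctx, 4, 3) == substr(pep, 1, 3)) && (substr(ctx, 7, 3) == substr(pep, len - 2, 3))
        print pep, nflank, cflank, (ok ? "ok" : "MISMATCH")
    }' "$@"
}
```

For example, `split_context train_EL1.txt | grep -c MISMATCH` would count rows that break the assumed layout in one partition; a count of zero supports the assumption for that file.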

The allelelist.txt file lists the alleles expressed in each multi-allelic (MA) cell-line data set, and pseudosequence.2016.all.X.dat contains the MHC pseudo sequence of each MHC molecule.
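The train_ and test_ prefixes suggest that each partition comes as a matched pair of files, which is what a cross-validation loop would iterate over. A minimal sketch, assuming that train_BA1.txt pairs with test_BA1.txt and so on (the pairing rule is inferred from the file names, not stated on this page); list_folds is a hypothetical helper name.

```shell
# list_folds DIR -- for every train_BA?.txt / train_EL?.txt in DIR, find the
# test file with the same suffix and print the line count of each.
# ASSUMPTION: train_<X>.txt is paired with test_<X>.txt.
list_folds() {
    for train in "$1"/train_BA?.txt "$1"/train_EL?.txt; do
        [ -e "$train" ] || continue          # glob matched nothing
        fold=${train##*/train_}              # e.g. BA1.txt
        fold=${fold%.txt}                    # e.g. BA1
        test_file="$1/test_$fold.txt"
        if [ -e "$test_file" ]; then
            n_test=$(wc -l < "$test_file" | tr -d ' ')
        else
            n_test=missing
        fi
        printf '%s train=%s test=%s\n' "$fold" \
            "$(wc -l < "$train" | tr -d ' ')" "$n_test"
    done
}
```

Running `list_folds NetMHCIIpan_train` would print one line per partition, which is a quick way to confirm that every training file has its test counterpart before starting a cross-validation run.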


Evaluation data

Here you will find the data sets used to evaluate the NetMHCpan-4.1 and NetMHCIIpan-4.0 methods.


NetMHCpan-4.1

CD8 Epitope data set

CD8_epitopes.fsa
CD8_benchmark.tar.gz (Tar file with filtered CD8 epitope benchmark)
CD8_benchmark_pred.tar.gz (Tar file with prediction files for CD8 epitope benchmark)

The CD8_benchmark.tar.gz file contains a directory with 1660 files, one for each CD8 epitope included in the benchmark, with negatives that overlap the training data removed. Each filename refers to the epitope-HLA combination, and the source protein sequence can be looked up in the included CD8_mapped file. The CD8_benchmark_pred.tar.gz file contains the raw prediction files for the CD8 epitope benchmark; in these files the target value is in column 14 and the raw prediction score in column 12.
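With the target in column 14 and the score in column 12, each prediction file can be reduced to a single number describing how well the epitope ranks. One such summary, assumed here rather than taken from this page, is the fraction of negatives that score higher than the epitope (often called FRANK; 0 means the epitope ranks first). The sketch further assumes whitespace-separated rows, target 1 for the epitope and 0 for negatives, and skips rows with fewer than 14 columns; frank is a hypothetical helper name.

```shell
# frank FILE -- fraction of negative rows (column 14 == 0) whose raw score
# (column 12) is strictly higher than the epitope's score (column 14 == 1).
# ASSUMPTIONS: whitespace-separated rows; target 1 = epitope, 0 = negative;
# rows with fewer than 14 columns (e.g. headers) are ignored; if the epitope
# occurs on several rows, its best score is used. Prints NA if no epitope
# or no negatives are found.
frank() {
    awk 'NF >= 14 && $14 == 1 { s = $12 + 0; if (!found || s > epi) epi = s; found = 1 }
         NF >= 14 && $14 == 0 { score[n++] = $12 + 0 }
         END {
             if (!found || n == 0) { print "NA"; exit }
             for (i = 0; i < n; i++) if (score[i] > epi) higher++
             printf "%.4f\n", higher / n
         }' "$1"
}
```

Running frank on each extracted prediction file gives one value per epitope. The directory layout inside CD8_benchmark_pred.tar.gz is not described on this page, so the file paths to loop over are left open.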

MS Ligands

HLA-A02:02
HLA-A02:05
HLA-A02:06
HLA-A02:11
HLA-A11:01
HLA-A23:01
HLA-A25:01
HLA-A26:01
HLA-A30:01
HLA-A30:02
HLA-A32:01
HLA-A33:01
HLA-A66:01
HLA-A68:01
HLA-B07:02
HLA-B08:01
HLA-B14:02
HLA-B15:01
HLA-B15:02
HLA-B15:03
HLA-B15:17
HLA-B18:01
HLA-B35:03
HLA-B37:01
HLA-B38:01
HLA-B40:01
HLA-B40:02
HLA-B45:01
HLA-B46:01
HLA-B53:01
HLA-B58:01
HLA-C03:03
HLA-C05:01
HLA-C07:02
HLA-C08:02
HLA-C12:03

NetMHCIIpan-4.0

CD4_epitopes.fsa


References

NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data
Submitted 2020.