Supplementary material


Training data


NetMHCIIpan-4.0

Here you will find the data set used to train NetMHCIIpan-4.0.

NetMHCIIpan_train.tar.gz

Download the file and untar the content using

gunzip -c NetMHCIIpan_train.tar.gz | tar xvf -

This will create a directory called NetMHCIIpan_train containing 12 files. Ten of these (c00?_ba, c00?_el) are the data partitions, holding binding affinity (ba) and eluted ligand (el) data. Every partition file uses the same format (shown here for an el file):

AAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAAAG
AAAAAAAAAAAAA 1 Bergseng__9064_AMALA AGRAAAAAAAAG
AAAAAAAAAAAAA 1 Bergseng__9089_BOB AGRAAAAAAAAG
AAAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAAGA
AAAAAAAAAAAAAA 1 Bergseng__9064_AMALA AGRAAAAAAAGA
AAAAAAAAAAAAAA 1 Bergseng__9089_BOB AGRAAAAAAAGA
AAAAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAGAG
AAAAAAAAAAAAAAA 1 Bergseng__9064_AMALA AGRAAAAAAGAG
AAAAAAAAAAAAAAA 1 Bergseng__9089_BOB AGRAAAAAAGAG
AAAAAAAAAAAAAAAAAAAAA 1 Abelin__MAPTAC_HLA_DQB10602_DQA10102 KHPAAAAAAYYQ

where the columns are peptide, target value, MHC molecule or cell-line ID, and context. When the third column is a cell-line ID, the MHC molecules expressed in that cell line are listed in the allelelist.txt file.
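The column layout described above can be read with a few lines of Python. This is only a minimal sketch, assuming the four columns are separated by whitespace as in the excerpt; the names Record and parse_records are made up for this illustration and are not part of the data set or of NetMHCIIpan.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """One row of a partition file (hypothetical helper type)."""
    peptide: str
    target: float
    mhc: str      # MHC molecule name or cell-line ID
    context: str


def parse_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per non-empty line.

    Assumes four whitespace-separated columns, as in the excerpt above.
    """
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {number}: expected 4 columns, got {len(fields)}")
        peptide, target, mhc, context = fields
        yield Record(peptide, float(target), mhc, context)


# Two rows copied from the el excerpt above.
sample = [
    "AAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAAAG",
    "AAAAAAAAAAAAAAAAAAAAA 1 Abelin__MAPTAC_HLA_DQB10602_DQA10102 KHPAAAAAAYYQ",
]
records = list(parse_records(sample))
for record in records:
    print(record.mhc, record.target, len(record.peptide))
```

To read a real partition, pass an open file handle instead of the sample list, since parse_records accepts any iterable of lines.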

The allelelist.txt file lists the alleles expressed in each multi-allele (MA) cell-line data set, and pseudosequence.2016.all.X.dat contains the MHC pseudo sequence for each MHC molecule.
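Resolving a cell-line ID to its alleles could look like the sketch below. The layout of allelelist.txt is not specified on this page, so the code assumes one entry per line: an ID, whitespace, then a comma-separated list of alleles. Check this against the actual file before relying on it. The function names and the sample entry are invented for illustration.

```python
from typing import Dict, Iterable, List


def parse_allelelist(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each ID to its list of alleles.

    ASSUMED layout (not confirmed by this page):
    <ID> <allele1>,<allele2>,...
    """
    mapping: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete lines
        mapping[fields[0]] = fields[1].split(",")
    return mapping


def alleles_for(mhc: str, mapping: Dict[str, List[str]]) -> List[str]:
    """Return the alleles for a third-column value.

    A value with no entry in the mapping is treated as a single
    MHC molecule and returned unchanged.
    """
    return mapping.get(mhc, [mhc])


# Made-up entry, used only to exercise the code; not taken from the data set.
example_lines = ["CellLineX ALLELE_A,ALLELE_B"]
mapping = parse_allelelist(example_lines)
print(alleles_for("CellLineX", mapping))
print(alleles_for("ALLELE_C", mapping))
```

Returning the input unchanged when no entry exists keeps single-molecule rows and cell-line rows on the same code path.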