Here, you will find the data set used for training of NetMHCIIpan-4.1.
Download the file and untar the content using
cat NetMHCIIpan_train.tar.gz | uncompress | tar xvf -
This will create a directory called NetMHCIIpan_train. In this directory you will find 12 files. Ten of them (c00?_ba, c00?_el) are partitions containing binding affinity (ba) and eluted ligand (el) data. The format for each file is (here shown for an el file):
AAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAAAG
AAAAAAAAAAAAA 1 Bergseng__9064_AMALA AGRAAAAAAAAG
AAAAAAAAAAAAA 1 Bergseng__9089_BOB AGRAAAAAAAAG
AAAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAAGA
AAAAAAAAAAAAAA 1 Bergseng__9064_AMALA AGRAAAAAAAGA
AAAAAAAAAAAAAA 1 Bergseng__9089_BOB AGRAAAAAAAGA
AAAAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAGAG
AAAAAAAAAAAAAAA 1 Bergseng__9064_AMALA AGRAAAAAAGAG
AAAAAAAAAAAAAAA 1 Bergseng__9089_BOB AGRAAAAAAGAG
AAAAAAAAAAAAAAAAAAAAA 1 Abelin__MAPTAC_HLA_DQB10602_DQA10102 KHPAAAAAAYYQ

where the columns are peptide, target value, MHC_molecule/cell-line, and context. In cases where the third column is a cell-line ID, the MHC molecules expressed in the cell line are listed in the allelelist.txt file.
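The four-column layout can be read with a few lines of Python. This is a minimal sketch, not part of the distributed data set: it assumes the columns are separated by whitespace and that every line has exactly four fields. The names Record and parse_records are hypothetical helpers introduced here for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Record(NamedTuple):
    """One line of a partition file (hypothetical helper type)."""
    peptide: str
    target: float
    mhc_or_cell_line: str
    context: str


def parse_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one Record per non-empty line.

    ASSUMPTION: columns are whitespace-separated and each line has
    exactly four fields (peptide, target value, MHC/cell-line, context).
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        peptide, target, mhc, context = fields
        yield Record(peptide, float(target), mhc, context)


# Two lines copied from the el example shown above.
sample = """\
AAAAAAAAAAAAA 1 Bergseng__9037_SWEIG AGRAAAAAAAAG
AAAAAAAAAAAAAAAAAAAAA 1 Abelin__MAPTAC_HLA_DQB10602_DQA10102 KHPAAAAAAYYQ
"""
records = list(parse_records(sample.splitlines()))
```

To read a partition from disk, pass an open file handle for one of the c00?_el or c00?_ba files to parse_records instead of the sample lines.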
The allelelist.txt file lists the alleles expressed in each MA (multi-allele) cell-line data set, and pseudosequence.2016.all.X.dat contains the MHC pseudo sequences for each MHC molecule.
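Resolving a third-column value to the MHC molecules behind it could look like the sketch below. The layout of allelelist.txt is not described here, so the code assumes each line holds a cell-line ID followed by a comma-separated list of molecule names; check this against the actual file before relying on it. The function names and the example lines are hypothetical placeholders, not real entries from the data set.

```python
from typing import Dict, Iterable, List


def parse_allelelist(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each cell-line ID to its list of MHC molecule names.

    ASSUMPTION: each line looks like '<cell_line_id> <name1>,<name2>,...'.
    """
    table: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        table[fields[0]] = fields[1].split(",")
    return table


def molecules_for(third_column: str, table: Dict[str, List[str]]) -> List[str]:
    """Return the MHC molecules behind a third-column value.

    A cell-line ID expands to its listed molecules; a value that is not
    in the table is treated as a single MHC molecule name.
    """
    return table.get(third_column, [third_column])


# Placeholder content, NOT taken from the real allelelist.txt.
example_lines = [
    "CellLine_X MOLECULE_A,MOLECULE_B",
    "CellLine_Y MOLECULE_C",
]
table = parse_allelelist(example_lines)
```

Falling back to the input value keeps single-molecule rows and cell-line rows on the same code path, so downstream code can always iterate over a list of molecule names.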