Here, you will find the data set used for training of NetMHCpan-4.1.
Download the file and untar the content using
cat NetMHCpan_train.tar.gz | uncompress | tar xvf -
This will create a directory called NetMHCpan_train containing 12 files: 5 files (c00?_ba) with partitions of binding affinity data, and 5 files (c00?_el) with partitions of eluted ligand data. The format of each file is
TEAARELGY 1 HLA-B44:03
ATDYPLIAR 1 pat-FL
RQPDSGISSI 1 pat-NS2
SVDIDSEL 1 Line.27
DGDEDLPGPPVRYY 1 HLA-A01:01
MPSNSVQLAY 1 Line.34
EEVKLIKKM 1 MAVER-1
TQREFMLSF 1 RPMI8226
VLLPKKTESHHK 1 A10
LKNPVRIFV 1 A20-A20
where the columns are peptide, target value, and MHC molecule or cell-line ID. When the third column is a cell-line ID, the MHC molecules expressed in that cell line are listed in the allelelist file.
The allelelist file lists the alleles expressed in each MA (multi-allele) cell-line data set, and MHC_pseudo.dat contains the MHC pseudo sequence for each MHC molecule.
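As an illustration of the three-column format described above, the following Python sketch parses partition lines and resolves a cell-line ID to its alleles. It is a minimal sketch, not part of the NetMHCpan distribution: the names Record, parse_records, parse_allelelist and alleles_for are hypothetical, and the layout assumed for the allelelist file (a cell-line ID followed by a comma-separated allele list on each line) is an assumption that should be checked against the downloaded file.

```python
from typing import Dict, Iterable, List, NamedTuple


class Record(NamedTuple):
    peptide: str
    target: float
    mhc: str  # MHC molecule name or cell-line ID (third column)


def parse_records(lines: Iterable[str]) -> List[Record]:
    """Parse whitespace-separated lines: peptide, target value, MHC/cell-line."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        records.append(Record(fields[0], float(fields[1]), fields[2]))
    return records


def parse_allelelist(lines: Iterable[str]) -> Dict[str, List[str]]:
    """ASSUMPTION: each line is '<cell-line ID> <allele1,allele2,...>'."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            table[fields[0]] = fields[1].split(",")
    return table


def alleles_for(record: Record, allelelist: Dict[str, List[str]]) -> List[str]:
    """Return the alleles for a record; if the third column is not a known
    cell-line ID, treat it as a single MHC molecule name."""
    return allelelist.get(record.mhc, [record.mhc])


# Two rows taken from the sample above.
sample = ["TEAARELGY 1 HLA-B44:03", "SVDIDSEL 1 Line.27"]
# Made-up allelelist entry for illustration only; not the real content.
toy_allelelist = parse_allelelist(["Line.27 HLA-A01:01,HLA-B07:02"])

for rec in parse_records(sample):
    print(rec.peptide, rec.target, alleles_for(rec, toy_allelelist))
```

To process a real partition, the same functions can be given open file handles, for example parse_records(open("NetMHCpan_train/c000_ba")), assuming the file names follow the c00?_ba pattern mentioned above.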
Here, you will find the data set used for training of NetMHCIIpan-4.0.
Download the file and untar the content using
cat NetMHCIIpan_train.tar.gz | uncompress | tar xvf -
This will create a directory called NetMHCIIpan_train containing 22 files: 10 files (train_BA?.txt, test_BA?.txt) with partitions of binding affinity data, and 10 files (train_EL?.txt, test_EL?.txt) with partitions of eluted ligand data. The format of each file is
AAASVPAADKFKTFE 0.203668 HLA-DPA10103-DPB10201 XXXAAATFEXXX
APEVKYTVFETALKK 0.838333 HLA-DPA10103-DPB10201 XXXAPELKKXXX
ATFEAMYLGTCKTLT 0.325328 HLA-DPA10103-DPB10201 XXXATFTLTXXX
AVWVDGKARTAWVDS 0.14783 HLA-DPA10103-DPB10201 XXXAVWVDSXXX
ELYYAIYKASPTLAF 0.617078 HLA-DPA10103-DPB10201 XXXELYLAFXXX
ENVIDVKLVDANGKL 0.173508 HLA-DPA10103-DPB10201 XXXENVGKLXXX
where the columns are peptide, target value, MHC molecule or cell-line ID, and context. When the third column is a cell-line ID, the MHC molecules expressed in that cell line are listed in the allelelist.txt file.
The allelelist.txt file lists the alleles expressed in each MA (multi-allele) cell-line data set, and pseudosequence.2016.all.X.dat contains the MHC pseudo sequence for each MHC molecule.
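The four-column format can be read in the same way. The Python sketch below parses one line and checks a pattern visible in the sample rows above: the middle six characters of the context column equal the first three plus the last three residues of the peptide. This is an observation from those rows, not a documented specification, and the names ClassIIRecord, parse_class_ii_line and context_matches_termini are hypothetical.

```python
from typing import NamedTuple


class ClassIIRecord(NamedTuple):
    peptide: str
    target: float
    mhc: str      # MHC molecule name or cell-line ID (third column)
    context: str  # fourth column


def parse_class_ii_line(line: str) -> ClassIIRecord:
    """Parse one whitespace-separated line with exactly four columns."""
    peptide, target, mhc, context = line.split()
    return ClassIIRecord(peptide, float(target), mhc, context)


def context_matches_termini(rec: ClassIIRecord, n: int = 3) -> bool:
    """ASSUMPTION (inferred from the sample rows): the context is
    n flanking characters + first n peptide residues
    + last n peptide residues + n flanking characters."""
    middle = rec.context[n:-n]
    return middle == rec.peptide[:n] + rec.peptide[-n:]


rec = parse_class_ii_line(
    "AAASVPAADKFKTFE 0.203668 HLA-DPA10103-DPB10201 XXXAAATFEXXX"
)
print(rec.peptide, rec.target, rec.mhc, context_matches_termini(rec))
```

A check like this is mainly useful as a sanity test when loading the files; rows that fail it would indicate that the assumed layout of the context column does not hold.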
Here, you will find the data set used for evaluation of NetMHCpan-4.1 and NetMHCIIpan-4.0 methods.
CD8_epitopes.fsa
CD8_benchmark.tar.gz (Tar file with filtered CD8 epitope benchmark)
CD8_benchmark_pred.tar.gz (Tar file with prediction files for CD8 epitope benchmark)
The CD8_benchmark.tar.gz file contains a directory with 1660 files (one for each CD8 epitope included in the benchmark), with negatives that overlap the training data removed. The filenames refer to the epitope-HLA combination, and the source protein sequence can be found by looking it up in the included CD8_mapped file. The CD8_benchmark_pred.tar.gz file contains the raw prediction files for the CD8 epitope benchmark. The target value is found in column 14, and the raw prediction score in column 12.
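The column positions above can be used to pull target and score pairs out of a prediction file. The following Python sketch assumes that the files are whitespace-separated, that columns are counted from 1, and that header or separator lines can be recognised because they have fewer than 14 fields or non-numeric values in columns 12 and 14; none of these details is stated here, so they should be verified against the actual files. The function name read_target_and_score is hypothetical.

```python
from typing import Iterable, List, Tuple


def read_target_and_score(
    lines: Iterable[str], target_col: int = 14, score_col: int = 12
) -> List[Tuple[float, float]]:
    """Return (target, score) pairs using 1-based column numbers.

    ASSUMPTION: lines are whitespace-separated; lines that are too short
    or non-numeric in the two columns are treated as headers and skipped.
    """
    pairs = []
    needed = max(target_col, score_col)
    for line in lines:
        fields = line.split()
        if len(fields) < needed:
            continue
        try:
            target = float(fields[target_col - 1])
            score = float(fields[score_col - 1])
        except ValueError:
            continue  # e.g. a header row with column names
        pairs.append((target, score))
    return pairs


# Synthetic lines with 14 placeholder fields, for illustration only.
header = " ".join(f"name{i}" for i in range(1, 15))
row_pos = " ".join(["x"] * 11 + ["0.75", "x", "1"])
row_neg = " ".join(["x"] * 11 + ["0.10", "x", "0"])

pairs = read_target_and_score([header, row_pos, row_neg])
print(pairs)
```

The resulting pairs can then be passed to whatever evaluation measure is of interest; the sketch deliberately stops at extraction and does not reproduce the evaluation used for the benchmark.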
HLA-A02:02
HLA-A02:05
HLA-A02:06
HLA-A02:11
HLA-A11:01
HLA-A23:01
HLA-A25:01
HLA-A26:01
HLA-A30:01
HLA-A30:02
HLA-A32:01
HLA-A33:01
HLA-A66:01
HLA-A68:01
HLA-B07:02
HLA-B08:01
HLA-B14:02
HLA-B15:01
HLA-B15:02
HLA-B15:03
HLA-B15:17
HLA-B18:01
HLA-B35:03
HLA-B37:01
HLA-B38:01
HLA-B40:01
HLA-B40:02
HLA-B45:01
HLA-B46:01
HLA-B53:01
HLA-B58:01
HLA-C03:03
HLA-C05:01
HLA-C07:02
HLA-C08:02
HLA-C12:03
NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data
Submitted 2020.