NetMHCpan - 2.8

Pan-specific binding of peptides to MHC class I alleles of known sequence

The NetMHCpan server predicts binding of peptides to any known MHC molecule using artificial neural networks (ANNs). The method is trained on more than 150,000 quantitative binding data points covering more than 150 different MHC molecules. Predictions can be made for HLA-A, B, C, E and G alleles, as well as for non-human primate, mouse, cattle and pig alleles. Further, the user can upload full-length MHC protein sequences and have the server predict MHC-restricted peptides from any given protein of interest.

Version 2.8 has been retrained on an extended data set including 10 prevalent HLA-C and 7 prevalent BoLA MHC-I molecules.

Predictions can be made for 8-14mer peptides. Note that all non-9mer predictions are made using approximations. Most HLA molecules have a strong preference for binding 9mers.

The prediction values are given as IC50 values in nM and as %-Rank relative to a set of 200,000 random natural peptides. For alleles distant from the MHC molecules included in the training of the method, only the Rank score is provided.
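The page does not spell out how the %-Rank is computed. As a minimal sketch, assuming the rank is the percentage of background peptides whose prediction score is at least as high as the query score, it could look like this. The function name `percent_rank` and the toy background list are illustrative and are not part of NetMHCpan.

```python
from bisect import bisect_left

def percent_rank(score, background_scores):
    """Percentage of background scores that are >= score (lower = stronger).

    Assumption: this is one plausible reading of "%-Rank"; the server's
    exact definition is not given in the text.
    """
    ordered = sorted(background_scores)
    # Number of background scores strictly below the query score.
    below = bisect_left(ordered, score)
    at_or_above = len(ordered) - below
    return 100.0 * at_or_above / len(ordered)

# Toy stand-in for the 200,000 random natural peptides.
background = [0.05, 0.10, 0.20, 0.40, 0.80]
print(percent_rank(0.40, background))
```

With this definition, a peptide scoring above every background peptide gets a rank of 0 and one scoring below all of them gets 100.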

The project is a collaboration between CBS, IMMI at Copenhagen University and LIAI.

Link to table (tab-separated) describing the training data: Training data table

As of July 8th, the nomenclature for BoLA-I has been updated to follow IPD Release 1.3.

Submission

Type of input

Paste a single sequence or several sequences in FASTA format into the field below:

or submit a file in FASTA format directly from your local disk:

Peptide length (several lengths are possible):

Select species/loci

Select alleles from the list or type allele names (e.g. HLA-A01:01) separated by commas (and no spaces). Max 20 alleles per submission.

For the list of allowed allele names, see: List of MHC allele names.

or paste a single full length MHC protein sequence in FASTA format into the field below:

or submit a file containing a full length MHC protein sequence in FASTA format directly from your local disk:

Threshold for strong binder: % Rank OR IC50
Threshold for weak binder: % Rank OR IC50

Sort by affinity

Include IC50 prediction value for all alleles (default is for white-listed alleles only)

Save prediction to xls file

Restrictions:
At most 5000 sequences per submission; each sequence not more than 20,000 amino acids and not less than 8 amino acids. Max 20 MHC alleles per submission.
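These limits can be checked locally before submitting. The following is a minimal sketch that uses only the numbers stated above; the function name `check_submission` and the dict-of-sequences input shape are assumptions made for illustration.

```python
MAX_SEQUENCES = 5000   # at most 5000 sequences per submission
MIN_LENGTH = 8         # each sequence at least 8 amino acids
MAX_LENGTH = 20000     # each sequence at most 20,000 amino acids
MAX_ALLELES = 20       # at most 20 MHC alleles per submission

def check_submission(sequences, alleles):
    """Return a list of problems; an empty list means the limits are met.

    sequences: dict mapping a sequence name to its amino acid string.
    alleles:   list of allele names.
    """
    problems = []
    if len(sequences) > MAX_SEQUENCES:
        problems.append(f"{len(sequences)} sequences (max {MAX_SEQUENCES})")
    if len(alleles) > MAX_ALLELES:
        problems.append(f"{len(alleles)} alleles (max {MAX_ALLELES})")
    for name, seq in sequences.items():
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(f"{name}: length {len(seq)} is outside "
                            f"{MIN_LENGTH}-{MAX_LENGTH}")
    return problems

print(check_submission({"too_short": "ACDEFGH"}, ["HLA-A01:01"]))
```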

Confidentiality:
The sequences are kept confidential and will be deleted after processing.

CITATIONS

For publication of results, please cite:

NetMHCpan - MHC class I binding prediction beyond humans
Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen
PMID: 19002680

NetMHCpan, a method for quantitative predictions of peptide binding to any HLA-A and -B locus protein of known sequence.
Nielsen M, Lundegaard C, Blicher T, Lamberth K, Harndahl M, Justesen S, Roeder G, Peters B, Sette A, Lund O, Buus S.
PMID: 17726526

DATA RESOURCES

Data resources used to develop this server were obtained from:

IEDB database.
- Quantitative peptide binding data were obtained from the IEDB database.
IMGT/HLA database. Robinson J, Malik A, Parham P, Bodmer JG, Marsh SGE: IMGT/HLA - a sequence database for the human major histocompatibility complex. Tissue Antigens (2000), 55:280-287.
- HLA protein sequences were obtained from the IMGT/HLA database (version 3.1.0).

Usage instructions

1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

All the other symbols will be converted to X before processing.
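The conversion rule can be mirrored locally to preview what the server will see. This is a minimal sketch, not the server's own code; the function name `normalize_sequence` is illustrative, and stripping whitespace before conversion is an assumption.

```python
# The 20 standard amino acids plus X (unknown), as listed above.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWYX")

def normalize_sequence(raw):
    """Upper-case the input and replace any disallowed symbol with X.

    Assumption: whitespace (spaces, line breaks) is removed first rather
    than being converted to X.
    """
    cleaned = "".join(raw.split()).upper()
    return "".join(ch if ch in ALLOWED else "X" for ch in cleaned)

print(normalize_sequence("asqkBrp*"))  # B and * are not in the alphabet
```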

The server allows for input in either FASTA or PEPTIDE format.

Note that for PEPTIDE input, all peptides MUST be of equal length. Note also that you must tick the box "Click if input is PEPTIDE format" if the input is in peptide format.

The sequences can be input in the following two ways:

Paste a single sequence (just the amino acids) or a number of sequences in FASTA format or a list of peptides into the upper window of the main server page.
Select a FASTA or PEPTIDE file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed. However, there may be no more than 10 sequences in total in one submission. Sequences shorter than 15 or longer than 10000 amino acids will be ignored.

2. Customize your run

Select the allele(s) you want to make predictions for from the scroll-down menu (select multiple alleles using the Ctrl key), or type in the allele names separated by commas (without blank spaces).

Give the threshold values for binding predictions to be displayed.

Click the box "Sort by affinity" to have the output sorted by descending predicted binding affinity.

Click the box "Save prediction to xls file" to save the raw prediction output to an Excel file. This file will be available at the bottom of the results output file.
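When allele names are typed by hand, the comma-separated, space-free format is easy to get wrong. The following is a minimal sketch of a helper that builds the string; the name `allele_field` is hypothetical, and `HLA-B07:02` is only an illustrative second allele written in the same style as the HLA-A01:01 example on this page.

```python
def allele_field(alleles, max_alleles=20):
    """Join allele names with commas and no spaces, enforcing the 20-allele cap."""
    names = [a.strip() for a in alleles if a.strip()]
    if len(names) > max_alleles:
        raise ValueError(f"{len(names)} alleles given, max is {max_alleles}")
    for name in names:
        # A space or comma inside a name would break the field format.
        if " " in name or "," in name:
            raise ValueError(f"invalid allele name: {name!r}")
    return ",".join(names)

print(allele_field(["HLA-A01:01", " HLA-B07:02 "]))
```

The helper only checks the format of the field. Whether a name is accepted still depends on the server's list of allowed allele names.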

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format

DESCRIPTION

The prediction output consists of 8 columns.

Residue number

HLA Allele

Peptide sequence

Protein identifier

Prediction score (called 1-log50K(aff))

Affinity as IC50 value in nM (only for white-listed alleles)

%Random - %Rank of prediction score relative to a set of 200,000 random natural 9mer peptides

Binding level (SB: strong binder, WB: weak binder)
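The binding level follows from the strong and weak binder thresholds. As a minimal sketch, assuming IC50 thresholds of 50 nM and 500 nM (the values shown in the example output header) and assuming the boundaries are inclusive, which the page does not state:

```python
def bind_level(affinity_nm, strong_nm=50.0, weak_nm=500.0):
    """Return "SB", "WB", or "" for a predicted IC50 value in nM.

    Assumption: a peptide exactly at a threshold counts as a binder.
    """
    if affinity_nm <= strong_nm:
        return "SB"
    if affinity_nm <= weak_nm:
        return "WB"
    return ""

for ic50 in (30.0, 200.0, 25230.98):
    print(ic50, repr(bind_level(ic50)))
```

The same logic would apply to %Rank thresholds, with the rank value in place of the IC50 value.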

EXAMPLE OUTPUT

# NetMHCpan version 2.8
# Input is in FSA format
HLA-A0101 : Estimated prediction accuracy 0.811 (using nearest neighbor HLA-A0101)
# Threshold for Strong binding peptides 50.000
# Threshold for Weak binding peptides 500.000
-----------------------------------------------------------------------------------
  pos         HLA    peptide         Identity  1-log50k(aff)  Affinity(nM)  %Random  BindLevel
-----------------------------------------------------------------------------------
    0  HLA-A*0101  ASQKRPSQR  seq2_optional_c          0.063      25230.98    32.00
    1  HLA-A*0101  SQKRPSQRH  seq2_optional_c          0.023      38824.58    50.00
    2  HLA-A*0101  QKRPSQRHG  seq2_optional_c          0.003      48254.07    50.00
    3  HLA-A*0101  KRPSQRHGS  seq2_optional_c          0.009      45287.57    50.00
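A data row of this output can be split on whitespace into the eight columns described above. The following is a minimal sketch; the `Prediction` type and `parse_row` function are illustrative names, and the parser assumes the affinity column is always present, which may not hold for alleles that are not white-listed.

```python
from typing import NamedTuple

class Prediction(NamedTuple):
    pos: int
    allele: str
    peptide: str
    identity: str
    score: float          # 1-log50k(aff)
    affinity_nm: float    # IC50 in nM
    percent_rank: float   # %Random
    bind_level: str       # "SB", "WB", or "" when the column is empty

def parse_row(line):
    """Parse one whitespace-separated data row of the prediction output."""
    fields = line.split()
    if len(fields) not in (7, 8):
        raise ValueError(f"expected 7 or 8 fields, got {len(fields)}")
    # The BindLevel column is blank for non-binders, leaving 7 fields.
    level = fields[7] if len(fields) == 8 else ""
    return Prediction(int(fields[0]), fields[1], fields[2], fields[3],
                      float(fields[4]), float(fields[5]), float(fields[6]),
                      level)

row = parse_row("0 HLA-A*0101 ASQKRPSQR seq2_optional_c 0.063 25230.98 32.00")
print(row.peptide, row.affinity_nm, repr(row.bind_level))
```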

Article abstract

Main reference:

NetMHCpan - MHC class I binding prediction beyond humans
Hoof I¹, Peters B³, Sidney J³, Pedersen LE², Lund O¹, Buus S², Nielsen M¹
Immunogenetics. 2009 Jan;61(1):1-13.

¹Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
²Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark
³La Jolla Institute for Allergy and Immunology, San Diego, California, United States of America

Binding of peptides to major histocompatibility complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC genomic region (called HLA) is extremely polymorphic comprising several thousand alleles, each encoding a distinct MHC molecule. The potentially unique specificity of the majority of HLA alleles that have been identified to date remains uncharacterized. Likewise, only a limited number of chimpanzee and rhesus macaque MHC class I molecules have been characterized experimentally. Here, we present NetMHCpan-2.0, a method that generates quantitative predictions of the affinity of any peptide-MHC class I interaction. NetMHCpan-2.0 has been trained on the hitherto largest set of quantitative MHC binding data available, covering HLA-A and HLA-B, as well as chimpanzee, rhesus macaque, gorilla, and mouse MHC class I molecules. We show that the NetMHCpan-2.0 method can accurately predict binding to uncharacterized HLA molecules, including HLA-C and HLA-G. Moreover, NetMHCpan-2.0 is demonstrated to accurately predict peptide binding to chimpanzee and macaque MHC class I molecules. The power of NetMHCpan-2.0 to guide immunologists in interpreting cellular immune responses in large out-bred populations is demonstrated. Further, we used NetMHCpan-2.0 to predict potential binding peptides for the pig MHC class I molecule SLA-1*0401. Ninety-three percent of the predicted peptides were demonstrated to bind stronger than 500 nM. The high performance of NetMHCpan-2.0 for non-human primates documents the method's ability to provide broad allelic coverage also beyond human MHC molecules. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan.

PMID: 19002680


Supplementary material

NetMHCpan - MHC class I binding prediction beyond humans

Here you will find the data sets used for evaluation in the above paper. The data fall into five parts: a) non-human primates, b) HLA-A and HLA-B ligands, c) HLA-E ligands, d) SYFPEITHI HLA-C ligands, and e) SYFPEITHI HLA-G ligands.

a) Non-human primates. The format for data is

Allele  Peptide log50k
Mamu-A01 YPPMMCYFL 1.0
Mamu-A01 NSPLHCYTM 1.0
Mamu-A01 ITPQPVPTA 0.482918
Mamu-A01 LTPIFSDLL 0.790002
Mamu-A01 GSPTNLEFI 0.634812
Mamu-A01 DSPHYVPIL 0.682619
Mamu-A01 TLPELNLSL 0.787187
Mamu-A01 ASPRIGDQL 0.945674
Mamu-A01 FSPFKLNLI 1.0
Mamu-A01 MIPLLFILF 0.911688

where the first column gives the allele, the second column gives the peptide, and the last column gives the log50k-transformed binding affinity, i.e. 1 - log50k(aff nM).

When classifying the peptides into binders and non-binders, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
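The transform and the 0.426 cut-off can be checked numerically. This is a minimal sketch of the formula 1 - log50k(aff), where log50k is the logarithm to base 50,000; the function names are illustrative.

```python
import math

def to_log50k(affinity_nm):
    """Transform an IC50 value in nM to 1 - log50k(aff)."""
    return 1.0 - math.log(affinity_nm) / math.log(50000.0)

def from_log50k(score):
    """Invert the transform: recover the IC50 value in nM from the score."""
    return 50000.0 ** (1.0 - score)

# The 500 nM binder threshold corresponds to a score of about 0.426.
print(round(to_log50k(500.0), 3))
print(round(from_log50k(0.426)))
```

Under this transform an affinity of 50,000 nM maps to 0 and an affinity of 1 nM maps to 1, so higher scores mean stronger binding.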

b) HLA-A and HLA-B ligands. The file contains 596 HLA-A and HLA-B ligands downloaded from the SYFPEITHI database. The FASTA header for each entry has the format

>uniprot|A8KA43 227 KRFGKAYNL B2705

where the first column is the protein identifier, the second column is the location of the HLA ligand in the protein sequence, the third column is the HLA ligand, and the last column is the HLA restriction.
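A header in this format can be split into its four fields as follows. This is a minimal sketch; the function name `parse_ligand_header` and the dictionary keys are illustrative choices.

```python
def parse_ligand_header(header):
    """Split a ligand FASTA header into its four whitespace-separated fields."""
    fields = header.lstrip(">").split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {header!r}")
    protein_id, position, ligand, restriction = fields
    return {
        "protein_id": protein_id,
        "position": int(position),   # location of the ligand in the protein
        "ligand": ligand,
        "restriction": restriction,  # HLA restriction, e.g. B2705
    }

print(parse_ligand_header(">uniprot|A8KA43 227 KRFGKAYNL B2705"))
```

The text does not say whether the position is 0-based or 1-based, so the sketch keeps the number as given.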

c) HLA-E ligands. The file contains seven HLA-E ligands downloaded from the IEDB database. All ligands are from the same source protein, and the file contains all 9mer peptides from the source protein (P0A1D4), with the ligands annotated with the value 1 and all other peptides with the value 0. The format of the data is

MAAKDVKFG 0
AAKDVKFGN 0
AKDVKFGND 0
KDVKFGNDA 0
DVKFGNDAR 0
VKFGNDARV 0
KFGNDARVK 0
FGNDARVKM 0
GNDARVKML 0
NDARVKMLR 0
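A listing like this can be produced by sliding a 9-residue window over the source protein and labelling each peptide. The following is a minimal sketch; `label_kmers` is an illustrative name, and the 18-residue string is only the start of the protein as reconstructed from the overlapping peptides shown above, not the full P0A1D4 sequence.

```python
def label_kmers(protein, ligands, k=9):
    """Return (peptide, label) pairs for every k-mer; label 1 marks a ligand."""
    ligand_set = set(ligands)
    pairs = []
    for start in range(len(protein) - k + 1):
        peptide = protein[start:start + k]
        pairs.append((peptide, 1 if peptide in ligand_set else 0))
    return pairs

# Prefix reconstructed from the ten overlapping 9mers in the listing.
prefix = "MAAKDVKFGNDARVKMLR"
for peptide, label in label_kmers(prefix, ligands=[]):
    print(peptide, label)
```

The ligand list is left empty here because the seven ligand sequences are not given in the text; every peptide in this prefix therefore gets the value 0, as in the listing.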

d) HLA-C ligands. The file contains the source proteins for 77 HLA-C ligands from the SYFPEITHI database in FASTA format. The FASTA header for each entry has the format

>gnl|BL_ORD_ID|54508    244     FAPYNKPSL Cw0102

e) HLA-G (HLA-G*0101) ligands. The file contains the source proteins for 11 HLA-G ligands from the SYFPEITHI database in FASTA format. The FASTA header for each entry has the format

>sp|P49327|FAS_HUMAN    751     HVPEHAVVL

where the first column is the protein identifier, the second column is the location of the HLA ligand in the protein sequence, and the third column is the HLA ligand.

a) Non-human primates. NOTE: This data set was updated on Aug 12, 2009, so that it now corresponds to the data presented in the NetMHCpan-2.0 publication.
b) HLA-A and HLA-B ligands
c) HLA-E ligands
d) HLA-C SYFPEITHI ligands
e) HLA-G SYFPEITHI ligands

References

Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen
NetMHCpan - a method for MHC class I binding prediction beyond humans

Version history

Please click on the version number to activate the corresponding server.

2.8	The current server (online since 26 Feb 2013). New in this version: Method retrained on an extended data set including 10 prevalent HLA-C and 7 prevalent BoLA MHC-I molecules.
2.4	Online since 18 Dec 2010. New in this version: Method retrained on an extended data set including several HLA-C alleles and two BoLA alleles.
2.3	Online since 08 Sept 2010. New in this version: Method retrained on the version 2.2 data excluding data from the Mamu-A1*02601 allele due to data contamination. Also, the method has been updated to include the newest MHC allele releases from the IMGT/HLA and IPD-MHC databases (for non-human primates and pig). These updates include incorporation of the new nomenclature for HLA and Rhesus macaque (Mamu).
2.2	Online since 01 Sept 2009. New in this version: Method retrained on an extended data set covering more than 100 MHC alleles and more than 110,000 peptide/MHC interactions.
2.1	Online since 06 April 2009. New in this version: Predicted binding scores are shown as percent rank relative to a pool of 1000.000 random natural 9mer peptides. IC50 values are only shown for a set of white-listed alleles (mostly HLA-A and HLA-B alleles) where the values can be relied on.
2.0	Online since 24 June 2008. New in this version: Binding predictions for all known MHC molecules including HLA-C, non-classical HLA (HLA-E and HLA-G), non-human primates, pig and mouse. Prediction of performance accuracy. Main publication: NetMHCpan - MHC class I binding prediction beyond humans. Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen. Submitted.
1.1	Online from March 2007 until 24 June 2008. New in this version: Includes prediction of peptides of length 8-11.
1.0	Original version (online version until March 2008, 2006): Main publication: NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence. Nielsen M, et al. (2007) PLoS ONE 2(8): e796. doi:10.1371/journal.pone.0000796 View the abstract, the full text version at PLoSONE: Full text, or Full text including supplementary materials: PDF_fulltext.pdf

Software Downloads

Version 4.2cstatic

Linux

Version 4.2c

Version 4.1b

Linux
Darwin

Version 4.0a

Linux
Darwin

Version 3.0a

Linux
Darwin

Version 2.8a

Version 2.8

Version 2.4a

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
