NetMHCpan - 4.0

Pan-specific binding of peptides to MHC class I alleles of known sequence

Prediction of peptide-MHC class I binding using artificial neural networks (ANNs).

New in this version: the method is trained on naturally eluted ligands and on binding affinity data. It returns either of two properties: the likelihood of a peptide becoming a natural ligand, or the predicted binding affinity.

The NetMHCpan server predicts binding of peptides to any MHC molecule of known sequence using artificial neural networks (ANNs). The method is trained on a combination of more than 180,000 quantitative binding data and MS-derived MHC eluted ligands. The binding affinity data covers 172 MHC molecules from human (HLA-A, B, C, E), mouse (H-2), cattle (BoLA), primates (Patr, Mamu, Gogo) and swine (SLA). The MS eluted ligand data covers 55 HLA and mouse alleles. Furthermore, the user can obtain predictions for any custom MHC class I molecule by uploading a full-length MHC protein sequence.

Predictions can be made for peptides of any length.

11-03-2019: Server updated to have precalculated percentile rank values and binding motifs for all HLA alleles included in the latest IMGT HLA release

The project is a collaboration between CBS, ISIM, and LIAI.

Submission

Type of input

Paste a single sequence or several sequences in FASTA format into the field below:

or submit a file in FASTA format directly from your local disk:

Peptide length (you may select multiple lengths):

Select species/loci

Select Allele (max 20 per submission)

or type allele names (e.g. HLA-A01:01) separated by commas (and no spaces). Max 20 alleles per submission.

For the list of allowed allele names, see the List of MHC allele names.

or paste a single full length MHC protein sequence in FASTA format into the field below:

or submit a file containing a full length MHC protein sequence in FASTA format directly from your local disk:

Threshold for strong binder: % Rank
Threshold for weak binder: % Rank

Make BA predictions

Sort by prediction score

Save predictions to XLS file

Restrictions:
At most 5000 sequences per submission; each sequence not more than 20,000 amino acids and not less than 8 amino acids. Max 20 MHC alleles per submission.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.

CITATIONS

For publication of results, please cite:

NetMHCpan-4.0: Improved Peptide–MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data
Vanessa Jurtz, Sinu Paul, Massimo Andreatta, Paolo Marcatili, Bjoern Peters and Morten Nielsen
The Journal of Immunology (2017) ji1700893; DOI: 10.4049/jimmunol.1700893
Full text [PDF]
NetMHCpan-3.0: improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length data sets
Morten Nielsen and Massimo Andreatta
Genome Medicine (2016): 8:33
Full text [PDF]
NetMHCpan, a method for MHC class I binding prediction beyond humans
Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen
Immunogenetics 61.1 (2009): 1-13
PMID: 19002680 Full text

DATA RESOURCES

Data resources used to develop this server were obtained from:

IEDB database.
- Quantitative peptide binding data were obtained from the IEDB database.
IMGT/HLA database. Robinson J, Malik A, Parham P, Bodmer JG, Marsh SGE: IMGT/HLA - a sequence database for the human major histocompatibility complex. Tissue Antigens (2000), 55:280-287.
- HLA protein sequences were obtained from the IMGT/HLA database (version 3.1.0).

Instructions

1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The alphabet is as follows (case sensitive):

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

Any other symbol will be converted to X before processing.
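The conversion rule above can be sketched in a few lines. This is only an illustration of the stated behaviour, not the server's own code: the helper name `sanitize_sequence` is hypothetical, and treating lowercase letters as "other symbols" is an assumption drawn from the "case sensitive" remark.

```python
# Allowed one-letter codes as listed above, plus X for unknown residues.
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}


def sanitize_sequence(seq: str) -> str:
    """Replace every symbol outside the allowed alphabet with X.

    Hypothetical helper; lowercase letters are treated as disallowed
    symbols, which is an assumption based on "case sensitive".
    """
    return "".join(ch if ch in ALLOWED else "X" for ch in seq)


if __name__ == "__main__":
    # B, Z and * are not in the alphabet, so each becomes X.
    print(sanitize_sequence("TPQDLNTMLBZ*"))
```

Running such a check locally before submission makes it easier to spot sequences that would otherwise be silently altered.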

The server allows for input in either FASTA or PEPTIDE format.

Sequences can be submitted in either of the following two ways:

Paste a single sequence (just the amino acids) or a number of sequences in FASTA format or a list of peptides into the upper window of the main server page.
Select a FASTA or PEPTIDE file on your local disk, either by typing the file name into the lower window or by browsing the disk.

At most 5000 sequences per submission; each sequence not more than 20,000 amino acids and not less than 8 amino acids.
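The limits above can be checked locally before a submission. The following is a minimal sketch under stated assumptions: `parse_fasta` and `check_limits` are hypothetical helpers, not part of NetMHCpan, and the FASTA parsing is deliberately simple (a header line starting with ">" followed by sequence lines).

```python
MAX_SEQUENCES = 5000   # at most 5000 sequences per submission
MIN_LENGTH = 8         # each sequence not less than 8 amino acids
MAX_LENGTH = 20_000    # each sequence not more than 20,000 amino acids


def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        else:
            if not records:
                # Bare sequence without a header line.
                records.append(["unnamed", ""])
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def check_limits(records):
    """Return a list of human-readable problems; empty if all limits hold."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}"
        )
    for name, seq in records:
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(
                f"{name}: length {len(seq)} is outside {MIN_LENGTH}-{MAX_LENGTH}"
            )
    return problems


if __name__ == "__main__":
    text = ">Gag_180_209\nTPQDLNTMLNTVGGHQAAMQMLKETINEEA\n>short\nACD\n"
    for problem in check_limits(parse_fasta(text)):
        print(problem)
```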

2. Customize your run

1. Specify peptide length (only for FASTA input). By default input proteins are digested into 9-mer peptides.

2. Select species/loci from the scroll-down menu.

3. Select allele(s) from the scroll-down menu or type in the allele names separated by commas (without blank spaces). If you choose to type in the allele names, you can consult the List of MHC molecule names; use the molecule names in the first column.

4. Optionally specify thresholds for strong and weak binders. They are expressed in terms of %Rank, that is, the percentile of the predicted binding affinity compared to the distribution of affinities calculated on a set of random natural peptides. A peptide is identified as a strong binder if it is found among the top x% of predicted peptides, where x% is the specified threshold for strong binders (by default 0.5%). A peptide is identified as a weak binder if its %Rank is above the threshold for strong binders but below the specified threshold for weak binders (by default 2%).

5. Tick the box "Sort by prediction score" to have the output sorted by descending prediction score.
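Step 3 above asks for typed allele names to be separated by commas without blank spaces, and the submission form allows at most 20 alleles. A minimal sketch of assembling such a field follows; `build_allele_field` is a hypothetical helper, not part of NetMHCpan, and it does not check the names against the List of MHC molecule names.

```python
MAX_ALLELES = 20  # maximum number of alleles per submission


def build_allele_field(alleles):
    """Join allele names into a comma-separated string with no spaces.

    Hypothetical helper: it only enforces the formatting rules stated
    in the instructions, not whether each name is a known allele.
    """
    cleaned = [a.strip() for a in alleles if a.strip()]
    if not cleaned:
        raise ValueError("at least one allele name is required")
    if len(cleaned) > MAX_ALLELES:
        raise ValueError(
            f"{len(cleaned)} alleles given, at most {MAX_ALLELES} allowed"
        )
    for name in cleaned:
        if " " in name or "," in name:
            raise ValueError(f"invalid character in allele name: {name!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    print(build_allele_field(["HLA-A01:01", " HLA-A03:01 "]))
```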

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; when it terminates you will be notified by e-mail with a URL to your results. They will be stored on the server for 24 hours.

4. Output

A description of the output format can be found in the output format tab.

Output format

DESCRIPTION

The prediction output for each molecule consists of the following columns:

Pos Residue number (starting from 0)

HLA Molecule/allele name

Peptide Amino acid sequence of the potential ligand

Core The minimal 9 amino acid binding core directly in contact with the MHC

Of The starting position of the Core within the Peptide (if > 0, the method predicts an N-terminal protrusion)

Gp Position of the deletion, if any.

Gl Length of the deletion.

Ip Position of the insertions, if any.

Il Length of the insertion.

Icore Interaction core. This is the sequence of the binding core including any insertions or deletions.

Identity Protein identifier, i.e. the name of the Fasta entry.

Score The raw prediction score

Aff(nM) Predicted binding affinity in nanomolar units (if binding affinity predictions are selected).

%Rank Rank of the predicted affinity compared to a set of random natural peptides. This measure is not affected by the inherent bias of certain molecules towards higher or lower mean predicted affinities. Strong binders are defined as having %Rank < 0.5, and weak binders as having %Rank < 2. We advise selecting candidate binders based on %Rank rather than nM affinity.

BindLevel (SB: strong binder, WB: weak binder). The peptide will be identified as a strong binder if the % Rank is below the specified threshold for the strong binders, by default 0.5%. The peptide will be identified as a weak binder if the % Rank is above the threshold of the strong binders but below the specified threshold for the weak binders, by default 2%.
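The column layout and the BindLevel rule can be sketched as a small parser. This is an illustration only: `parse_row` and `bind_level` are hypothetical helpers, the column order assumes eluted ligand mode (no Aff(nM) column), and using strict "below" comparisons at the threshold boundaries follows the wording above rather than any knowledge of the server's exact behaviour.

```python
COLUMNS = ["Pos", "HLA", "Peptide", "Core", "Of", "Gp", "Gl", "Ip", "Il",
           "Icore", "Identity", "Score", "%Rank"]
INT_COLUMNS = {"Pos", "Of", "Gp", "Gl", "Ip", "Il"}
FLOAT_COLUMNS = {"Score", "%Rank"}


def parse_row(line):
    """Parse one whitespace-separated prediction row into a dict."""
    tokens = line.split()
    if len(tokens) < len(COLUMNS):
        raise ValueError(f"expected at least {len(COLUMNS)} fields")
    row = {}
    for name, token in zip(COLUMNS, tokens):
        if name in INT_COLUMNS:
            row[name] = int(token)
        elif name in FLOAT_COLUMNS:
            row[name] = float(token)
        else:
            row[name] = token
    # Binder rows carry a trailing marker such as "<= SB" or "<= WB".
    row["BindLevel"] = tokens[-1] if len(tokens) > len(COLUMNS) else ""
    return row


def bind_level(rank, strong=0.5, weak=2.0):
    """Classify a %Rank value using the default thresholds."""
    if rank < strong:
        return "SB"
    if rank < weak:
        return "WB"
    return ""


if __name__ == "__main__":
    line = ("   15  HLA-A*03:01       HQAAMQMLK  HQAAMQMLK  0  0  0  0  0"
            "    HQAAMQMLK     Gag_180_209 0.5697290  0.2857 <= SB")
    row = parse_row(line)
    print(row["Peptide"], row["%Rank"], bind_level(row["%Rank"]))
```

Recomputing the level from %Rank makes it possible to apply stricter or looser thresholds to saved output without resubmitting the job.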

NOTES

Peptide vs. iCore vs. Core

Three amino acid sequences are reported for each row of predictions:
The Peptide is the complete amino acid sequence evaluated by NetMHCpan. Peptides are the full sequences submitted as a peptide list, or the result of digestion of source proteins (Fasta submission).
The iCore is a substring of Peptide, encompassing all residues between P1 and P-omega of the MHC. For all intents and purposes, this is the minimal candidate ligand/epitope that should be considered for further validation.
The Core is always 9 amino acids long, and is a construction used for sequence alignment and identification of binding anchors.
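The relationship between the three sequences can be illustrated by rebuilding Core and iCore from a Peptide and its Of, Gp, Gl, Ip and Il values. The sketch below is an assumption, not the method's documented algorithm: the helper `derive_core_and_icore` is hypothetical, the rule was chosen only because it reproduces the rows in the example output on this page, and its behaviour for Of > 0 is untested because every example row has Of = 0.

```python
CORE_LENGTH = 9  # the Core is always 9 amino acids long


def derive_core_and_icore(peptide, of, gp, gl, ip, il):
    """Rebuild (Core, Icore) from a peptide and its alignment columns.

    Assumed rule: the Icore spans 9 + Gl - Il residues starting at Of;
    the Core removes Gl residues at position Gp (deletion) or inserts
    Il gap characters at position Ip (insertion).
    """
    icore = peptide[of:of + CORE_LENGTH + gl - il]
    core = icore
    if gl > 0:
        core = core[:gp] + core[gp + gl:]
    if il > 0:
        core = core[:ip] + "-" * il + core[ip:]
    return core, icore


if __name__ == "__main__":
    # Values taken from rows of the example output on this page.
    print(derive_core_and_icore("GHQAAMQMLK", 0, 1, 1, 0, 0))
    print(derive_core_and_icore("QAAMQMLK", 0, 0, 0, 3, 1))
    print(derive_core_and_icore("HQAAMQMLKE", 0, 0, 0, 0, 0))
```

In the last call the Peptide is a 10-mer but both Core and iCore are the 9-mer HQAAMQMLK, which is why the iCore rather than the Peptide is the candidate to carry forward for validation.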

EXAMPLE OUTPUT

Fasta input:

>Gag_180_209
TPQDLNTMLNTVGGHQAAMQMLKETINEEA

Peptide length: 8, 9, 10, 11, 12
Allele: HLA-A*0301
Toggle Sort by prediction score

will return the following predictions:


# NetMHCpan version 4.0

# Tmpdir made /usr/opt/www/webface/tmp/server/netmhcpan/59DBCCFF00005A84DAFF1311/netMHCpanVszuD8
# Input is in FSA format

# Peptide length 8,9,10,11,12

# Make Eluted ligand likelihood predictions

HLA-A03:01 : Distance to training data  0.000 (using nearest neighbor HLA-A03:01)

# Rank Threshold for Strong binding peptides   0.500
# Rank Threshold for Weak binding peptides   2.000
-----------------------------------------------------------------------------------
  Pos          HLA         Peptide       Core Of Gp Gl Ip Il        Icore        Identity     Score   %Rank  BindLevel
-----------------------------------------------------------------------------------
   15  HLA-A*03:01       HQAAMQMLK  HQAAMQMLK  0  0  0  0  0    HQAAMQMLK     Gag_180_209 0.5697290  0.2857 <= SB
   14  HLA-A*03:01      GHQAAMQMLK  GQAAMQMLK  0  1  1  0  0   GHQAAMQMLK     Gag_180_209 0.2137130  1.1582 <= WB
    7  HLA-A*03:01       TMLNTVGGH  TMLNTVGGH  0  0  0  0  0    TMLNTVGGH     Gag_180_209 0.0487720  3.0466
    8  HLA-A*03:01       MLNTVGGHQ  MLNTVGGHQ  0  0  0  0  0    MLNTVGGHQ     Gag_180_209 0.0319510  3.7842
   13  HLA-A*03:01     GGHQAAMQMLK  GQAAMQMLK  0  1  2  0  0  GGHQAAMQMLK     Gag_180_209 0.0313010  3.8215
   12  HLA-A*03:01    VGGHQAAMQMLK  VQAAMQMLK  0  1  3  0  0 VGGHQAAMQMLK     Gag_180_209 0.0166440  5.2079
   15  HLA-A*03:01      HQAAMQMLKE  HQAAMQMLK  0  0  0  0  0    HQAAMQMLK     Gag_180_209 0.0124970  5.9719
   16  HLA-A*03:01        QAAMQMLK  QAA-MQMLK  0  0  0  3  1     QAAMQMLK     Gag_180_209 0.0086270  7.1279
   21  HLA-A*03:01       MLKETINEE  MLKETINEE  0  0  0  0  0    MLKETINEE     Gag_180_209 0.0079270  7.4157
..
..
-----------------------------------------------------------------------------------

Protein Gag_180_209. Allele HLA-A*03:01. Number of high binders 1. Number of weak binders 1. Number of peptides 105

Link to Allele Frequencies in Worldwide Populations HLA-A03:01
-----------------------------------------------------------------------------------
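The "Number of peptides 105" figure in the summary line follows from digesting the 30-residue input into every overlapping peptide of lengths 8 to 12 (23 + 22 + 21 + 20 + 19 = 105). A minimal sketch of that enumeration, using a hypothetical `digest` helper that returns 0-based start offsets:

```python
def digest(sequence, lengths):
    """Enumerate all overlapping peptides of the requested lengths.

    Returns (start, peptide) tuples, where start is a 0-based offset
    into the source sequence. Hypothetical helper for illustration.
    """
    peptides = []
    for length in lengths:
        for start in range(len(sequence) - length + 1):
            peptides.append((start, sequence[start:start + length]))
    return peptides


if __name__ == "__main__":
    gag = "TPQDLNTMLNTVGGHQAAMQMLKETINEEA"
    peptides = digest(gag, [8, 9, 10, 11, 12])
    print(len(peptides))  # 105
```

A sequence of length n yields n - k + 1 peptides of length k, so the number of rows grows quickly when several lengths and alleles are combined.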

References

Main reference

NetMHCpan-4.0: Improved Peptide–MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data

Vanessa Jurtz ¹, Sinu Paul ², Massimo Andreatta ³, Paolo Marcatili ¹, Bjoern Peters ², and Morten Nielsen ¹,³

The Journal of Immunology (2017) ji1700893; DOI: 10.4049/jimmunol.1700893
Full text [PDF]

¹ Department of Bio and Health Informatics, Technical University of Denmark, DK-2800 Lyngby, Denmark
² Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, CA92037 La Jolla, USA
³ Instituto de Investigaciones Biotecnologicas, Universidad Nacional de San Martin, Buenos Aires, Argentina

Cytotoxic T cells are of central importance in the immune system’s response to disease. They recognize defective cells by binding to peptides presented on the cell surface by MHC class I molecules. Peptide binding to MHC molecules is the single most selective step in the Ag-presentation pathway. Therefore, in the quest for T cell epitopes, the prediction of peptide binding to MHC molecules has attracted widespread attention. In the past, predictors of peptide–MHC interactions have primarily been trained on binding affinity data. Recently, an increasing number of MHC-presented peptides identified by mass spectrometry have been reported containing information about peptide-processing steps in the presentation pathway and the length distribution of naturally presented peptides. In this article, we present NetMHCpan-4.0, a method trained on binding affinity and eluted ligand data leveraging the information from both data types. Large-scale benchmarking of the method demonstrates an increase in predictive performance compared with state-of-the-art methods when it comes to identification of naturally processed ligands, cancer neoantigens, and T cell epitopes.

Earlier references:

NetMHCpan-3.0: improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length data sets

Morten Nielsen ¹,² and Massimo Andreatta ¹

Genome Medicine (2016): 8:33

¹Instituto de Investigaciones Biotecnologicas, Universidad Nacional de San Martin, Buenos Aires, Argentina
²Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark

Binding of peptides to MHC class I molecules (MHC-I) is essential for antigen presentation to cytotoxic T-cells. Here, we demonstrate how a simple alignment step allowing insertions and deletions in a pan-specific MHC-I binding machine-learning model enables combining information across both multiple MHC molecules and peptide lengths. This pan-allele/pan-length algorithm significantly outperforms state-of-the-art methods, and captures differences in the length profile of binders to different MHC molecules leading to increased accuracy for ligand identification. Using this model, we demonstrate that percentile ranks in contrast to affinity-based thresholds are optimal for ligand identification due to uniform sampling of the MHC space.

Full text [PDF]

NetMHCpan - MHC class I binding prediction beyond humans
Hoof I¹, Peters B³, Sidney J³, Pedersen LE², Lund O¹, Buus S², Nielsen M¹

Immunogenetics. (2009) Jan;61(1):1-13.

¹Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
²Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark
³La Jolla Institute for Allergy and Immunology, San Diego, California, United States of America

Binding of peptides to major histocompatibility complex (MHC) molecules is the single most selective step in the recognition of pathogens by the cellular immune system. The human MHC genomic region (called HLA) is extremely polymorphic comprising several thousand alleles, each encoding a distinct MHC molecule. The potentially unique specificity of the majority of HLA alleles that have been identified to date remains uncharacterized. Likewise, only a limited number of chimpanzee and rhesus macaque MHC class I molecules have been characterized experimentally. Here, we present NetMHCpan-2.0, a method that generates quantitative predictions of the affinity of any peptide-MHC class I interaction. NetMHCpan-2.0 has been trained on the hitherto largest set of quantitative MHC binding data available, covering HLA-A and HLA-B, as well as chimpanzee, rhesus macaque, gorilla, and mouse MHC class I molecules. We show that the NetMHCpan-2.0 method can accurately predict binding to uncharacterized HLA molecules, including HLA-C and HLA-G. Moreover, NetMHCpan-2.0 is demonstrated to accurately predict peptide binding to chimpanzee and macaque MHC class I molecules. The power of NetMHCpan-2.0 to guide immunologists in interpreting cellular immune responses in large out-bred populations is demonstrated. Further, we used NetMHCpan-2.0 to predict potential binding peptides for the pig MHC class I molecule SLA-1*0401. Ninety-three percent of the predicted peptides were demonstrated to bind stronger than 500 nM. The high performance of NetMHCpan-2.0 for non-human primates documents the method's ability to provide broad allelic coverage also beyond human MHC molecules. The method is available at http://www.cbs.dtu.dk/services/NetMHCpan.

PMID: 19002680

Full text

Version history

Please click on the version number to activate the corresponding server (if available).

4.0	Online since 5 Sep 2017. New in this version: NetMHCpan is now trained on both affinity data and naturally eluted ligands. The method predicts two properties: likelihood of ligand presentation and binding affinity. Publication: NetMHCpan-4.0: Improved Peptide–MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. Vanessa Jurtz, Sinu Paul, Massimo Andreatta, Paolo Marcatili, Bjoern Peters and Morten Nielsen. The Journal of Immunology (2017) ji1700893; DOI: 10.4049/jimmunol.1700893
3.0	Online since 10 Feb 2016. New in this version: Improved algorithm implementing insertions and deletions in the alignment. Method trained on an extended data set including peptides of length 8 to 13. Length distribution of predicted binders that reflects the MHC molecules' preferences. Publication: NetMHCpan-3.0: improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length data sets. Morten Nielsen and Massimo Andreatta. Genome Medicine (2016): 8:33
2.8	Online since 26 Feb 2013. New in this version: Method retrained on an extended data set including 10 prevalent HLA-C and 7 prevalent BoLA MHC-I molecules.
2.4	Online since 18 Dec 2010. New in this version: Method retrained on an extended data set including several HLA-C alleles and two BoLA alleles.
2.3	Online since 08 Sept 2010. New in this version: Method retrained on the version 2.2 data excluding data from the Mamu-A1*02601 allele due to data contamination. Also, the method has been updated to include the newest MHC allele releases from the IMGT/HLA and IPD-MHC databases (for non-human primates and pig). These updates include incorporation of the new nomenclature for HLA and Rhesus macaque (Mamu).
2.2	Online since 01 Sept 2009. New in this version: Method retrained on an extended data set covering more than 100 MHC alleles and more than 110,000 peptide/MHC interactions.
2.1	Online since 06 April 2009. New in this version: Predicted binding scores are shown as percent rank relative to a pool of 1,000,000 random natural 9mer peptides. IC50 values are only shown for a set of white-listed alleles (mostly HLA-A and HLA-B alleles) where the values can be relied on.
2.0	Online since 24 June 2008. New in this version: Binding predictions for all known MHC molecules including HLA-C, non-classical HLA (HLA-E and HLA-G), non-human primates, pig and mouse. Prediction of performance accuracy. Publication: NetMHCpan - MHC class I binding prediction beyond humans. Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen. Immunogenetics 61.1 (2009): 1-13
1.1	Online from March 2007 until 24 June 2008. New in this version: Includes prediction for peptides of length 8-11.
1.0	Original version (online until March 2007). Publication: NetMHCpan, a Method for Quantitative Predictions of Peptide Binding to Any HLA-A and -B Locus Protein of Known Sequence. Nielsen M, et al. (2007) PLoS ONE 2(8): e796. doi:10.1371/journal.pone.0000796 View the full text version at PLoSONE: Full text, or Full text including supplementary materials: PDF_fulltext.pdf

Software Downloads

Version 4.1b

Linux
Darwin

Version 4.0a

Linux
Darwin

Version 3.0a

Linux
Darwin

Version 2.8a

Version 2.8

Version 2.4a

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
