For publication of results, please cite:
This version: NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data
Birkir Reynisson, Bruno Alvarez, Sinu Paul, Bjoern Peters and Morten Nielsen
Accepted for publication, NAR Webserver issue 2020
NetMHCpan-4.0: Improved Peptide MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data
Vanessa Jurtz, Sinu Paul, Massimo Andreatta, Paolo Marcatili, Bjoern Peters and Morten Nielsen
The Journal of Immunology (2017) ji1700893; DOI: 10.4049/jimmunol.1700893
NetMHCpan-3.0: improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length data sets
Morten Nielsen and Massimo Andreatta
Genome Medicine (2016): 8:33
NetMHCpan, a method for MHC class I binding prediction beyond humans
Ilka Hoof, Bjoern Peters, John Sidney, Lasse Eggers Pedersen, Ole Lund, Soren Buus, and Morten Nielsen
Immunogenetics 61.1 (2009): 1-13
Data resources used to develop this server were obtained from:
- IEDB database.
- Quantitative peptide binding data were obtained
from the IEDB database.
- IMGT/HLA database. Robinson J, Malik A, Parham P, Bodmer JG,
Marsh SGE: IMGT/HLA - a sequence database for the human major histocompatibility complex. Tissue Antigens (2000),
- HLA protein sequences were obtained from the IMGT/HLA database (version 3.1.0).
Would you prefer to run NetMHCpan at your own site? NetMHCpan v. 4.1
is available as a stand-alone software package, with the same
functionality as the service above. Ready-to-ship packages
exist for the most common UNIX platforms. There is a download tab
for academic users; other users are requested to contact
CBS Software Package Manager at
In this section, the user must define the input for the prediction server following these steps:
1) Specify the desired type of input data (FASTA or PEPTIDE) using the drop down menu.
2) Provide the input data by means of pasting the data into the blank field, uploading it using the "Choose File" button or by loading sample data using the "Load Data" button. All the input sequences must be in one-letter amino acid code. The alphabet is as follows (case sensitive):
A C D E F G H I K L M N P Q R S T V W Y and X (unknown)
Any other symbol will be converted to X before processing. At most 5000 sequences are allowed per submission; each sequence must be between 8 and 20,000 amino acids long.
3) If FASTA was selected as input type, the user must select the peptide length(s) the prediction server will work with. NetMHCpan-4.1 will "chop" the input FASTA sequence into overlapping peptides of the provided length(s) and predict binding for all of them. By default, input proteins are digested into 9-mer peptides. If PEPTIDE was selected as input type, this step is unnecessary and the peptide length selector does not appear in the interface.
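The input rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the server's code: the names `sanitize`, `check_limits` and `chop` are hypothetical, lowercase letters are assumed to count as "other symbols" because the alphabet is case sensitive, and positions are reported as zero-based offsets.

```python
# Allowed one-letter codes, as listed above (case sensitive), plus X for unknown.
ALPHABET = set("ACDEFGHIKLMNPQRSTVWY") | {"X"}


def sanitize(sequence):
    """Replace every symbol outside the alphabet with X."""
    return "".join(ch if ch in ALPHABET else "X" for ch in sequence)


def check_limits(sequences, max_count=5000, min_len=8, max_len=20000):
    """Raise ValueError if the submission breaks the documented size limits."""
    if len(sequences) > max_count:
        raise ValueError("too many sequences in one submission")
    for seq in sequences:
        if not min_len <= len(seq) <= max_len:
            raise ValueError("sequence length outside the allowed range")


def chop(sequence, lengths=(9,)):
    """Yield (offset, peptide) for every overlapping peptide of each length."""
    for k in lengths:
        for start in range(len(sequence) - k + 1):
            yield start, sequence[start:start + k]
```

For a 10-residue protein and the default length of 9, `chop` yields two overlapping 9-mers, one starting at offset 0 and one at offset 1.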
Here, the user must define which MHC molecule(s) the input data will be predicted against:
1) First, select the HLA/MHC supertype family.
2) After selecting the MHC family, the user can select one or more MHC molecules from the updated "Select Allele(s)" list. Alternatively, the user may type the MHC names directly in the provided blank field (separated by commas and without blank spaces); in this case, there is no need to select an MHC supertype family from the drop-down menu. Click here for a list of MHC molecule names (use the names in the first column). Please note that a maximum of 20 MHC types is allowed per submission.
3) Optionally, the user may choose to paste a full MHC protein sequence in the blank box, or directly upload it by clicking the "Choose file" button. Such sequence must be in FASTA format.
Please note that steps 2) and 3) are mutually exclusive, and are only labeled this way for explanation purposes.
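As a rough sketch of the typed-in allele field described in step 2, the helper below joins names with commas, rejects embedded blanks, and enforces the limit of 20. The function name `format_allele_list` is hypothetical, and the allele names in the usage line are placeholders; the accepted spellings are those in the server's list of MHC molecule names.

```python
def format_allele_list(alleles, max_alleles=20):
    """Return a comma-separated allele string without blank spaces."""
    cleaned = [name.strip() for name in alleles if name.strip()]
    if not cleaned:
        raise ValueError("at least one MHC name is required")
    if len(cleaned) > max_alleles:
        raise ValueError("at most %d MHC types are allowed" % max_alleles)
    for name in cleaned:
        # Blanks or commas inside a name would corrupt the separator scheme.
        if "," in name or any(ch.isspace() for ch in name):
            raise ValueError("invalid MHC name: %r" % name)
    return ",".join(cleaned)


# Placeholder names, used only to show the resulting string.
allele_field = format_allele_list([" HLA-A03:01", "HLA-B07:02 "])
```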
In this section, the user may define additional parameters to further customize the run:
1, 2) Specify thresholds for strong and weak binders. They are expressed in terms of %Rank, that is, the percentile of the predicted binding affinity compared to the distribution of affinities calculated on a set of random natural peptides. The peptide will be identified as a strong binder if it is found among the top x% predicted peptides, where x% is the specified threshold for strong binders (by default 0.5%). The peptide will be identified as a weak binder if the %Rank is above the threshold of the strong binders but below the specified threshold for the weak binders (by default 2%).
3) Specify a %Rank threshold to filter out predictions. Only sequences with a predicted %Rank value less than the specified threshold will be printed. To print all predictions, leave this value set to -99.
4) Tick this option to include Binding Affinity predictions in addition to Eluted Ligand likelihood.
5) Tick this box to have the output sorted by descending prediction score.
6) Enable this option to export the prediction output to .XLS format (readable by most spreadsheet software, such as Microsoft Excel).
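The threshold, filter and sorting options above can be illustrated as follows. This is a sketch under stated assumptions, not the server's implementation: `bind_level` and `select` are hypothetical names, predictions are represented as plain dictionaries with `score` and `rank` keys, and values exactly on a threshold are treated as not passing it (the text says "below"; the server's handling of exact ties is not described here).

```python
NO_FILTER = -99.0  # documented value that disables the %Rank filter


def bind_level(rank, strong=0.5, weak=2.0):
    """Label a %Rank value as strong binder, weak binder, or neither."""
    if rank < strong:
        return "SB"
    if rank < weak:
        return "WB"
    return ""


def select(predictions, rank_filter=NO_FILTER, sort_by_score=False):
    """Apply the optional %Rank filter and the optional descending-score sort."""
    kept = [p for p in predictions
            if rank_filter == NO_FILTER or p["rank"] < rank_filter]
    if sort_by_score:
        kept = sorted(kept, key=lambda p: p["score"], reverse=True)
    return kept


# Scores and ranks taken from the example output on this page.
rows = [
    {"score": 0.0245250, "rank": 4.089},
    {"score": 0.6594640, "rank": 0.259},
    {"score": 0.2451190, "rank": 1.084},
]
binders = select(rows, rank_filter=2.0, sort_by_score=True)
```

With the default thresholds, the three rows are labelled neither, SB and WB respectively, and only the last two survive a filter of 2.0.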
After the user has finished the "INPUT DATA", "MHC SELECTION" and "ADDITIONAL CONFIGURATION" steps, the job can be submitted. Click "Submit" to send the job to the processing server, or click "Clear fields" to clear the page and start over.
The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.
After the server has finished running the corresponding predictions, an
output page will be delivered to the user. A description of the output format is given in the output format section.
At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; when it terminates you will be notified by e-mail with a URL to your results. They will be stored on the server for 24 hours.
For the following FASTA input example:
Peptide length: 8, 9, 10, 11, 12
Sort by prediction score: On
NetMHCpan-4.1 will return the following output (showing the first 10 predicted peptides):
# NetMHCpan version 4.1
# Tmpdir made /usr/opt/www/webface/tmp/server/netmhcpan/5E4EEA2C000053D26D7D1DEF/netMHCpanYdWfgb
# Input is in FSA format
# Peptide length 8,9,10,11,12
# Make Eluted ligand likelihood predictions
HLA-A03:01 : Distance to training data 0.000 (using nearest neighbor HLA-A03:01)
# Rank Threshold for Strong binding peptides 0.500
# Rank Threshold for Weak binding peptides 2.000
Pos  HLA          Peptide       Core       Of  Gp  Gl  Ip  Il  Icore         Identity     Score_EL   Rnk_EL  BindLevel
15   HLA-A*03:01  HQAAMQMLK     HQAAMQMLK   0   0   0   0   0  HQAAMQMLK     Gag_180_209  0.6594640   0.259  <= SB
14   HLA-A*03:01  GHQAAMQMLK    GQAAMQMLK   0   1   1   0   0  GHQAAMQMLK    Gag_180_209  0.2451190   1.084  <= WB
13   HLA-A*03:01  GGHQAAMQMLK   GQAAMQMLK   0   1   2   0   0  GGHQAAMQMLK   Gag_180_209  0.1045310   1.977  <= WB
12   HLA-A*03:01  VGGHQAAMQMLK  VQAAMQMLK   0   1   3   0   0  VGGHQAAMQMLK  Gag_180_209  0.0245250   4.089
7    HLA-A*03:01  TMLNTVGGH     TMLNTVGGH   0   0   0   0   0  TMLNTVGGH     Gag_180_209  0.0194210   4.545
15   HLA-A*03:01  HQAAMQMLKE    HQAAMQMLK   0   0   0   0   0  HQAAMQMLK     Gag_180_209  0.0083750   6.588
16   HLA-A*03:01  QAAMQMLK      QAA-MQMLK   0   0   0   3   1  QAAMQMLK      Gag_180_209  0.0042090   8.777
8    HLA-A*03:01  MLNTVGGHQ     MLNTVGGHQ   0   0   0   0   0  MLNTVGGHQ     Gag_180_209  0.0029890  10.119
21   HLA-A*03:01  MLKETINEE     MLKETINEE   0   0   0   0   0  MLKETINEE     Gag_180_209  0.0015830  13.034
11   HLA-A*03:01  TVGGHQAAM     TVGGHQAAM   0   0   0   0   0  TVGGHQAAM     Gag_180_209  0.0013180  14.043
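For downstream processing, output like the listing above can be read with a small whitespace-based parser. The sketch below is an assumption-based illustration: the `Prediction` class and the `parse_row` and `parse_output` functions are hypothetical names, and only the Eluted Ligand column layout shown above is handled (enabling Binding Affinity predictions would presumably add columns that this sketch does not cover).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Prediction:
    pos: int
    hla: str
    peptide: str
    core: str
    of: int
    gp: int
    gl: int
    ip: int
    il: int
    icore: str
    identity: str
    score_el: float
    rank_el: float
    bind_level: str  # "SB", "WB", or "" when no label is printed


def parse_row(line: str) -> Optional[Prediction]:
    """Parse one data row; return None for comments, headers and other lines."""
    fields = line.split()
    # Data rows start with an integer position and carry at least 13 columns.
    if len(fields) < 13 or not fields[0].isdigit():
        return None
    # A binder label, when present, is printed as "<= SB" or "<= WB".
    level = fields[14] if len(fields) >= 15 and fields[13] == "<=" else ""
    return Prediction(
        int(fields[0]), fields[1], fields[2], fields[3],
        int(fields[4]), int(fields[5]), int(fields[6]),
        int(fields[7]), int(fields[8]),
        fields[9], fields[10], float(fields[11]), float(fields[12]), level,
    )


def parse_output(text: str) -> List[Prediction]:
    """Collect every data row found in a block of server output."""
    rows = (parse_row(line) for line in text.splitlines())
    return [row for row in rows if row is not None]
```

Because the parser splits on any run of whitespace, it works whether or not the columns are padded for alignment.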
The prediction output for each molecule consists of the following columns:
Pos: Residue number (starting from 0) of the peptide in the protein sequence.
HLA: Specified MHC molecule / Allele name.
Peptide: Amino acid sequence of the potential ligand.
Core: The minimal 9 amino acid binding core directly in contact with the MHC (i.e. excluding potential insertions).
Of: The starting position of the Core within the Peptide (if > 0, the method predicts an N-terminal protrusion).
Gp: Position of the deletion, if any.
Gl: Length of the deletion, if any.
Ip: Position of the insertion, if any.
Il: Length of the insertion, if any.
Icore: Interaction core. This is the sequence of the peptide bound and presented by the MHC.
Identity: Protein identifier, i.e. the name of the FASTA entry.
Score: The raw prediction score (column Score_EL in the example above).
%Rank: Rank of the predicted binding score compared to a set of random natural peptides (column Rnk_EL in the example above). This measure is not affected by the inherent bias of certain molecules towards higher or lower mean predicted affinities. Strong binders are defined as having %Rank < 0.5, and weak binders as having %Rank < 2. We advise selecting candidate binders based on %Rank rather than Score.
BindLevel: (SB: Strong Binder, WB: Weak Binder). The peptide will be identified as a strong binder if the %Rank is below the specified threshold for the strong binders (by default, 0.5%). The peptide will be identified as a weak binder if the %Rank is above the threshold of the strong binders but below the specified threshold for the weak binders (by default, 2%).
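To make the %Rank idea concrete, the sketch below computes a percentile rank from a background score distribution. It is an illustration built on assumptions: this page only states that scores are compared with those of random natural peptides, so the function name `percent_rank`, the definition "percentage of background peptides scoring strictly higher", and the toy background values are all hypothetical and do not describe the server's actual procedure.

```python
from bisect import bisect_right


def percent_rank(score, background_scores):
    """Percentage of background scores that are strictly higher than `score`."""
    ordered = sorted(background_scores)
    # bisect_right counts the background scores that are <= score.
    n_higher = len(ordered) - bisect_right(ordered, score)
    return 100.0 * n_higher / len(ordered)


# Toy background: ten evenly spaced scores standing in for random peptides.
background = [i / 10 for i in range(1, 11)]
rank = percent_rank(0.95, background)
```

Because the rank is computed against a per-molecule background, a molecule with generally high (or low) raw scores is put on the same scale as any other, which is why a rank threshold is comparable across molecules while a raw score threshold is not.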
Peptide vs. iCore vs. Core
Three amino acid sequences are reported for each row of predictions:
The Peptide is the complete amino acid sequence evaluated by NetMHCpan. Peptides are the
full sequences submitted as a peptide list, or the result of digestion of source proteins (FASTA submission).
The iCore is a substring of Peptide, encompassing
all residues between P1 and P-omega of the MHC. For all intents and purposes, this is the minimal candidate
ligand/epitope that should be considered for further validation.
The Core is always 9 amino acids long,
and is a construction used for sequence alignment and identification of binding anchors.
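The relationship between Peptide and Core can be checked against the example output with a short sketch. The function name `build_core` is hypothetical, and the interpretation is an assumption inferred from the example rows: Gp and Ip are read as zero-based positions, a deletion removes Gl residues, and an insertion adds Il gap characters ("-"). All example rows have Of = 0, so the handling of a non-zero offset is untested guesswork.

```python
def build_core(peptide, of=0, gp=0, gl=0, ip=0, il=0, core_len=9):
    """Rebuild the 9-residue Core from a Peptide and its alignment columns."""
    window = peptide[of:]  # skip a predicted N-terminal protrusion
    if gl > 0:
        # Deletion: drop `gl` residues starting at position `gp`.
        window = window[:gp] + window[gp + gl:]
    if il > 0:
        # Insertion: pad with `il` gap characters at position `ip`.
        window = window[:ip] + "-" * il + window[ip:]
    return window[:core_len]


# Rows from the example output: a 10-mer with a deletion, an 8-mer with an insertion.
core_from_10mer = build_core("GHQAAMQMLK", gp=1, gl=1)
core_from_8mer = build_core("QAAMQMLK", ip=3, il=1)
```

Under these assumptions the 10-mer, 11-mer and 12-mer rows of the example all reduce to a 9-residue Core by deleting 1, 2 and 3 residues after the first position, while the 8-mer is extended to 9 positions with a single gap.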
NetMHCpan-4.1 and NetMHCIIpan-4.0: Improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data
Birkir Reynisson 1*, Bruno Alvarez 2*, and Morten Nielsen1,3
Accepted for publication, NAR webserver issue 2020
Major Histocompatibility Complex (MHC) molecules are expressed on the cell
surface, where they present peptides to T cells, which gives them a key
role in the development of T cell immune responses. MHC molecules come in
two main variants: MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I
predominantly presents peptides derived from intracellular proteins,
whereas MHC-II predominantly presents peptides from extracellular
proteins. In both cases, the binding between MHC and antigenic peptides
is the most selective step in the antigen presentation pathway. Therefore,
the prediction of peptide binding to MHC is a powerful utility to predict
the possible specificity of a T cell immune response. Commonly MHC binding
prediction tools are trained on binding affinity or mass spectrometry
eluted ligands. Recent studies have however demonstrated how the
integration of both data types can boost predictive performances. Inspired
by this, we here present NetMHCpan-4.1 and NetMHCIIpan-4.0, two
web-servers created to predict binding between peptides and MHC-I and
MHC-II, respectively. Both methods exploit tailored machine learning
strategies to integrate different training data types, resulting in
state-of-the-art performance and outperforming their competitors. The
servers are available at http://www.cbs.dtu.dk/services/NetMHCpan-4.1/
NetMHCpan-4.0: Improved Peptide–MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data
Vanessa Jurtz 1, Sinu Paul 2, Massimo Andreatta 3, Paolo Marcatili 1, Bjoern Peters 2, and Morten Nielsen1,3
The Journal of Immunology (2017) ji1700893; DOI: 10.4049/jimmunol.1700893
Department of Bio and Health Informatics, Technical University of Denmark, DK-2800 Lyngby, Denmark
Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, CA92037 La Jolla, USA
Instituto de Investigaciones Biotecnologicas, Universidad Nacional de San Martin, Buenos Aires, Argentina
Cytotoxic T cells are of central importance in the immune system’s response to disease.
They recognize defective cells by binding to peptides presented on the cell surface by MHC class I molecules.
Peptide binding to MHC molecules is the single most selective step in the Ag-presentation pathway.
Therefore, in the quest for T cell epitopes, the prediction of peptide binding to MHC molecules has attracted widespread attention.
In the past, predictors of peptide–MHC interactions have primarily been trained on binding affinity data.
Recently, an increasing number of MHC-presented peptides identified by mass spectrometry have been reported containing information about
peptide-processing steps in the presentation pathway and the length distribution of naturally presented peptides.
In this article, we present NetMHCpan-4.0, a method trained on binding affinity and eluted ligand data
leveraging the information from both data types. Large-scale benchmarking of the method demonstrates an
increase in predictive performance compared with state-of-the-art methods when it comes to identification of
naturally processed ligands, cancer neoantigens, and T cell epitopes.
NetMHCpan-3.0: improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length data sets
Morten Nielsen1,2 and Massimo Andreatta1
Genome Medicine (2016): 8:33
1Instituto de Investigaciones Biotecnologicas, Universidad Nacional de San Martin, Buenos Aires, Argentina
2Center for Biological Sequence Analysis,
Technical University of Denmark,
DK-2800 Lyngby, Denmark
Binding of peptides to MHC class I molecules (MHC-I) is essential for antigen presentation to cytotoxic T-cells. Here, we demonstrate how a simple alignment step allowing insertions and deletions in a pan-specific MHC-I binding machine-learning model enables combining information across both multiple MHC molecules and peptide lengths. This pan-allele/pan-length algorithm significantly outperforms state-of-the-art methods, and captures differences in the length profile of binders to different MHC molecules leading to increased accuracy for ligand identification. Using this model, we demonstrate that percentile ranks in contrast to affinity-based thresholds are optimal for ligand identification due to uniform sampling of the MHC space.
NetMHCpan - MHC class I binding prediction beyond humans
Immunogenetics. (2009) Jan;61(1):1-13.
1Center for Biological Sequence Analysis,
Technical University of Denmark,
DK-2800 Lyngby, Denmark
2Division of Experimental Immunology,
Institute of Medical Microbiology and Immunology,
University of Copenhagen, Denmark
3La Jolla Institute for Allergy and Immunology, San Diego, California, United States of America
Binding of peptides to major histocompatibility complex (MHC) molecules
is the single most selective step in the recognition of pathogens by
the cellular immune system. The human MHC genomic region (called HLA)
is extremely polymorphic comprising several thousand alleles, each
encoding a distinct MHC molecule. The potentially unique specificity
of the majority of HLA alleles that have been identified to date
remains uncharacterized. Likewise, only a limited number of chimpanzee
and rhesus macaque MHC class I molecules have been characterized
experimentally. Here, we present NetMHCpan-2.0, a method that generates
quantitative predictions of the affinity of any peptide-MHC class I
interaction. NetMHCpan-2.0 has been trained on the hitherto largest set
of quantitative MHC binding data available, covering HLA-A and HLA-B,
as well as chimpanzee, rhesus macaque, gorilla, and mouse MHC class
I molecules. We show that the NetMHCpan-2.0 method can accurately
predict binding to uncharacterized HLA molecules, including HLA-C and
HLA-G. Moreover, NetMHCpan-2.0 is demonstrated to accurately predict
peptide binding to chimpanzee and macaque MHC class I molecules. The power
of NetMHCpan-2.0 to guide immunologists in interpreting cellular immune
responses in large out-bred populations is demonstrated. Further, we used
NetMHCpan-2.0 to predict potential binding peptides for the pig MHC class
I molecule SLA-1*0401. Ninety-three percent of the predicted peptides
were demonstrated to bind stronger than 500 nM. The high performance
of NetMHCpan-2.0 for non-human primates documents the method's ability
to provide broad allelic coverage also beyond human MHC molecules. The
method is available at http://www.cbs.dtu.dk/services/NetMHCpan.