DTU Health Tech

Department of Health Technology

NetMHCIIpan - 3.2

Pan-specific binding of peptides to MHC class II alleles of known sequence

The NetMHCIIpan 3.2 server predicts binding of peptides to MHC class II molecules. The predictions are available for the three human MHC class II isotypes HLA-DR, HLA-DP and HLA-DQ, as well as mouse molecules (H-2).

Submissions are accepted in two formats: as a list of peptides or as a protein sequence in FASTA format. A comprehensive list of MHC molecules is available for prediction. Alternatively, users can upload their MHC protein sequence of interest.

Predictions are reported as IC50 values (in nanomolar) and as %Rank. The percentile rank for a peptide is generated by comparing its score against the scores of 200,000 random natural peptides of the same length as the query peptide. For example, a rank of 1% means that the peptide's predicted affinity is among the top 1% of scores for the specified molecule.
Strong and weak binding peptides are identified based on %Rank, with customizable thresholds. You may sort the output based on predicted binding affinity and filter out non-binders.
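
As a rough illustration of the percentile rank described above, the sketch below ranks one query score against a set of background scores. The function name `percent_rank` and the toy background list are hypothetical. The server uses 200,000 random natural peptides, and how it handles ties is an assumption here.

```python
from bisect import bisect_left

def percent_rank(query_score, background_scores):
    """Percentage of background scores that are >= query_score.

    Assumes scores are on a 'higher is stronger' scale, such as the
    log-transformed affinity. Counting ties as 'equal or higher' is an
    assumption, not documented server behaviour.
    """
    ordered = sorted(background_scores)
    # Index of the first background score that is >= query_score.
    first_ge = bisect_left(ordered, query_score)
    n_ge = len(ordered) - first_ge
    return 100.0 * n_ge / len(ordered)

# Toy stand-in for the background distribution (1,000 integer scores).
background = list(range(1000))
print(percent_rank(990, background))
```

A lower %Rank therefore means a stronger predicted binder relative to random peptides for the same molecule.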

The project is a collaboration between CBS, IMMI at Copenhagen University and LIAI.

NEW: visualize sequence motifs of the molecules in the NetMHCIIpan library with the Motif viewer.

Note: if you download the stand-alone version of the tool, please obtain the required data.tar.gz file from data.Linux.tar.gz (Linux) or data.Darwin.tar.gz (Mac).

Submission



Type of input

Paste a single sequence or several sequences in FASTA format into the field below:

or submit a file in FASTA format directly from your local disk:

Peptide length  


Select species/loci


Select Allele (max. 20 per submission)


or type a list of molecules names separated by commas (no spaces)

For the list of available molecule names click here: List of MHC molecule names.

Alternatively, upload full length Alpha and Beta chain protein sequences:

Definition of binding peptides:
Threshold for strong binder (% Rank)  
Threshold for weak binder (% Rank)  

Turn on filtering options 

Print only the strongest binding core 

Sort output by affinity 

Exclude offset correction 

Graphical representation of binding registers 
for peptides with %Rank <   %

Save predictions to xls file 

Restrictions:
At most 5000 sequences per submission; each sequence not more than 20,000 amino acids and not less than 8 amino acids. Max 20 MHC alleles per submission.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

  • Improved methods for predicting peptide binding affinity to MHC class II molecules.
    Jensen KK, Andreatta M, Marcatili P, Buus S, Greenbaum JA, Yan Z, Sette A, Peters B, Nielsen M.
    Immunology. 2018 Jan 6. doi: 10.1111/imm.12889.
    PubMed: 29315598

DATA RESOURCES

Benchmark data used to develop this server were obtained from:

Usage instructions


1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The alphabet (not case-sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

Any other symbol will be converted to X before processing.
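
The conversion rule above can be sketched as follows. This is a minimal local pre-check, not the server's code: the name `sanitize` is hypothetical, and whitespace is not treated specially here.

```python
# Accepted one-letter codes, including X for unknown residues.
VALID = set("ACDEFGHIKLMNPQRSTVWYX")

def sanitize(sequence):
    """Upper-case the sequence and replace unsupported symbols with X."""
    return "".join(c if c in VALID else "X" for c in sequence.upper())

print(sanitize("acdBZ*"))
```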

The server allows for input in either FASTA or PEPTIDE format.

The sequences can be input in the following two ways:

  • Paste a single sequence (just the amino acids) or a number of sequences in FASTA format or a list of peptides into the upper window of the main server page.

  • Select a FASTA or PEPTIDE file on your local disk, either by typing the file name into the lower window or by browsing the disk.

There is a limit of 5000 sequences per submission, each no longer than 20,000 amino acids.
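
Before submitting a large file, it can help to check it locally against these limits. The sketch below is a minimal example: `read_fasta` and `check_limits` are hypothetical helper names, and the lower bound of 8 residues is taken from the Restrictions note on the submission form.

```python
MAX_SEQUENCES = 5000
MAX_LENGTH = 20000
MIN_LENGTH = 8  # lower bound as stated in the Restrictions note

def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def check_limits(records):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(
            f"{len(records)} sequences exceeds the limit of {MAX_SEQUENCES}"
        )
    for header, seq in records:
        if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
            problems.append(
                f"{header}: length {len(seq)} outside {MIN_LENGTH}-{MAX_LENGTH}"
            )
    return problems
```

For example, `check_limits(read_fasta(open("proteins.fasta").read()))` would list any records that fall outside the limits, where `proteins.fasta` is a placeholder file name.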

2. Customize your run

1. Specify peptide length (only for FASTA input). By default the server uses 15-mer peptides.

2. Select species/loci from the scroll-down menu.

3. Select allele(s) from the scroll-down menu or type in the allele names separated by commas (without blank spaces). Note: if you have chosen DP or DQ loci, you have to choose alpha and beta chains separately from the scroll-down menu. If you choose to type in the allele names, you can consult the List of MHC molecule names.

4. You can also paste a single full length MHC protein sequence in FASTA format or submit a file containing a full length MHC protein sequence in FASTA format directly from your local disk. For the DR molecules paste or submit only a sequence of the beta chain. For all other loci, paste or submit alpha and beta chain sequences separately.

5. Optionally specify thresholds for strong and weak binders, expressed in %Rank. The %Rank defines how the predicted affinity for a given peptide ranks compared to a set of 200,000 random natural peptides of the same length. For example, if a 15mer peptide is assigned a rank of 1%, it means that one can expect 2000 out of 200,000 random 15mers to have equal or higher affinity.

6. For large submissions, you may want to filter the results on % Rank and IC50 values. Only predictions for peptides with % Rank OR binding affinity (IC50) below the specified thresholds will be shown in the results page.

7. Optionally run the program in fast mode (recommended for large data sets). It uses a reduced ensemble of only 10 neural networks.

8. Tick the box Print only the strongest binding core to display the results only for the strongest binding core in overlapping consecutive peptides (FASTA submissions).

9. Tick the box Sort by affinity to have the output sorted by descending predicted binding affinity.

10. The server can produce a graphical representation of the peptide binding core registers. All possible binding registers are plotted with the fraction of networks in the ensemble selecting each register. The graphics can be made only for a maximum of 20 peptides (use together with the sorting option to display the graphics for the strongest predicted binders).

11. Tick the box Save predictions to xls file if you want the output to be exported in xls format.
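
For FASTA input, the peptide length option determines how each protein is cut into peptides. The sketch below assumes the protein is split into all overlapping windows of the chosen length, as suggested by the reference to "overlapping consecutive peptides" above. The function name `overlapping_peptides` is hypothetical, and the exact server behaviour may differ.

```python
def overlapping_peptides(sequence, length=15):
    """Yield (start, peptide) for every window of `length` residues.

    Start positions are 0-based. A sequence shorter than `length`
    yields nothing.
    """
    for start in range(len(sequence) - length + 1):
        yield start, sequence[start:start + length]

for start, peptide in overlapping_peptides("PVSKMRMATPLLMQALPM", length=15):
    print(start, peptide)
```

A protein of N residues gives N - length + 1 peptides under this assumption, which is worth keeping in mind when estimating output size for large submissions.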

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; when it terminates you will be notified by e-mail with a URL to your results. They will be stored on the server for 24 hours.

Output format



DESCRIPTION


The prediction output for each molecule consists of the following columns:

  • Seq Residue number (starting from 0)

  • Allele MHC molecule name

  • Peptide Amino acid sequence

  • Identity Annotation of the input sequence, if specified

  • Pos Starting position of the optimal binding core (starting from 0)

  • Core Binding core register

  • Core_Rel Reliability of the binding core, expressed as the fraction of networks in the ensemble selecting the optimal core (see the Core_Histograms for the reliability values of all cores)

  • 1-log50k(aff) Predicted binding affinity in log-scale

  • Affinity(nM) Predicted binding affinity in nanomolar IC50

  • %Rank Percentile rank of the predicted affinity compared to a set of 200,000 random natural peptides. This measure is not affected by the inherent bias of certain molecules towards higher or lower mean predicted affinities

  • Exp_Bind Annotated affinity value, shown if the input was given in PEPTIDE format with one (mainly for benchmarking purposes).

  • BindingLevel (SB: strong binder, WB: weak binder). The peptide will be identified as a strong binder if the % Rank is below the specified threshold for the strong binders. The peptide will be identified as a weak binder if the % Rank is above the threshold of the strong binders but below the specified threshold for the weak binders.

    If the "Graphical representation of the binding registers" option was selected, a list of figure links will be displayed. For the top 20 peptides, these graphs show the number of networks in the ensemble that agreed on each binding register (the Core_Rel measure for all registers). If the histogram shows alternative binding registers with comparable reliability measure (= similar height of the bars) for a predicted binder, that may suggest the presence of competitive binding cores within the peptide.
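
The BindingLevel rule described above can be written as a small function. This is a sketch only: the name `binding_level` is hypothetical, the thresholds are whatever the user specified on the form, and treating a value exactly equal to a threshold as not below it is an assumption.

```python
def binding_level(rank, strong_threshold, weak_threshold):
    """Return 'SB', 'WB' or '' for a %Rank value.

    A peptide is a strong binder if its %Rank is below the strong
    threshold, and a weak binder if it is not strong but below the
    weak threshold.
    """
    if rank < strong_threshold:
        return "SB"
    if rank < weak_threshold:
        return "WB"
    return ""

# Example thresholds chosen for illustration only.
print(binding_level(0.1, strong_threshold=0.5, weak_threshold=2.0))
```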


EXAMPLE OUTPUT

    
    
    # NetMHCIIpan version 3.1
    
    # Input is in PEPTIDE format
    
    # Threshold for Strong binding peptides (IC50)	50.000 nM
    # Threshold for Weak binding peptides (IC50)	500.000 nM
    
    # Threshold for Strong binding peptides (%Rank)	0.5%
    # Threshold for Weak binding peptides (%Rank)	2%
    
    
    
    
    # Allele: DRB1_0301
    --------------------------------------------------------------------------------------------------------------------------------------------
       Seq          Allele              Peptide    Identity  Pos      Core  Core_Rel 1-log50k(aff)  Affinity(nM)  %Rank Exp_Bind  BindingLevel
    --------------------------------------------------------------------------------------------------------------------------------------------
         0         DRB1_0301      AGFKGEQGPKGEPG    Sequence    2    FKGEQGPKG 0.810         0.080      21036.68  50.00   9.999       
         1         DRB1_0301     GELIGTLNAAKVPAD    Sequence    2    LIGTLNAAK 0.650         0.340       1268.50  32.00   9.999       
         2         DRB1_0301    PEVIPMFSALSEGATP    Sequence    5    MFSALSEGA 0.385         0.180       7161.16  50.00   9.999       
         3         DRB1_0301       PKYVKQNTLKLAT    Sequence    2    YVKQNTLKL 0.575         0.442        418.70   6.00   9.999   <=WB
         4         DRB1_0301     VGSDWRFLRGYHQYA    Sequence    0    VGSDWRFLR 0.575         0.466        322.07  10.00   9.999   <=WB
         5         DRB1_0301         XFVKQNAAALX    Sequence    2    VKQNAAALX 0.500         0.262       2939.20  15.00   9.999       
         6         DRB1_0301     AAYSDQATPLLLSPR    Sequence    1    AYSDQATPL 0.395         0.291       2152.21  50.00   9.999       
         7         DRB1_0301     PVSKMRMATPLLMQA    Sequence    4    MRMATPLLM 0.890         0.770         12.00   0.01   9.999   <=SB
         8         DRB1_0301        AYMRADAAAGGA    Sequence    2    MRADAAAGG 0.835         0.303       1887.87  15.00   9.999       
         9         DRB1_0301       PKYVKQNTLKLAT    Sequence    2    YVKQNTLKL 0.575         0.442        418.70   6.00   9.999   <=WB
        10         DRB1_0301     ENPVVHFFKNIVTPR    Sequence    6    FFKNIVTPR 0.425         0.357       1049.04  32.00   9.999       
        11         DRB1_0301      GGVYHFVKKHVHES    Sequence    2    VYHFVKKHV 0.450         0.354       1084.85  32.00   9.999       
        12         DRB1_0301 NPVVHFFKNIVTPRTPPPSQ    Sequence    5    FFKNIVTPR 0.575         0.415        562.38  50.00   9.999       
        13         DRB1_0301     VHFFKNIVTPRTPGG    Sequence    2    FFKNIVTPR 0.685         0.347       1166.24  32.00   9.999       
        14         DRB1_0301    MPLAQMLLPTAMRMKM    Sequence    5    MLLPTAMRM 0.465         0.479        279.91  15.00   9.999   <=WB
        15         DRB1_0301     KMRMATPLLMQALPM    Sequence    1    MRMATPLLM 0.910         0.712         22.67   0.10   9.999   <=SB
        16         DRB1_0301 KPVSKMRMATPLLMQALPM    Sequence    5    MRMATPLLM 0.875         0.792          9.49   0.03   9.999   <=SB
        17         DRB1_0301      XPKWVKQNTLKLAT    Sequence    4    VKQNTLKLA 0.475         0.447        397.31   8.00   9.999   <=WB
        18         DRB1_0301     PVSKMRMATPLLMQA    Sequence    4    MRMATPLLM 0.890         0.770         12.00   0.01   9.999   <=SB
        19         DRB1_0301      GSDARFLRGYHLYA    Sequence    3    ARFLRGYHL 0.380         0.325       1485.69  32.00   9.999       
        20         DRB1_0301     APPAYEKLSAEQSPP    Sequence    4    YEKLSAEQS 0.345         0.101      16682.44  50.00   9.999       
        21         DRB1_0301        VVKQNCLKLATK    Sequence    1    VKQNCLKLA 0.665         0.275       2539.02  32.00   9.999       
        22         DRB1_0301       PEVIPMFSALSEG    Sequence    2    VIPMFSALS 0.475         0.143      10603.35  50.00   9.999       
        23         DRB1_0301    WNRQLYPEWTEAQRLD    Sequence    4    LYPEWTEAQ 0.545         0.235       3925.16  50.00   9.999       
        24         DRB1_0301       SAVRLRSSVPGVR    Sequence    4    LRSSVPGVR 0.790         0.405        624.44   9.00   9.999       
        25         DRB1_0301       GVYATRSSAVRLR    Sequence    1    VYATRSSAV 0.285         0.376        859.30  15.00   9.999       
        26         DRB1_0301     ATEYRVRVNSAYQDK    Sequence    5    VRVNSAYQD 0.575         0.312       1708.18  50.00   9.999       
        27         DRB1_0301       SAVRLRSSVPGVR    Sequence    4    LRSSVPGVR 0.790         0.405        624.44   9.00   9.999       
    --------------------------------------------------------------------------------------------------------------------------------------------
    Number of strong binders: 4 Number of weak binders: 5
    --------------------------------------------------------------------------------------------------------------------------------------------
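
Output in this layout can be read back with a short parser. The sketch below assumes that columns are separated by whitespace, that the Identity field contains no spaces, and that the BindingLevel column may be empty. The function name `parse_row` and the dictionary keys are hypothetical.

```python
def parse_row(line):
    """Parse one prediction row into a dict, or return None for other lines."""
    fields = line.split()
    # Data rows have 11 fields, or 12 when a binding level is present,
    # and start with the integer Seq index.
    if len(fields) not in (11, 12) or not fields[0].isdigit():
        return None
    return {
        "seq": int(fields[0]),
        "allele": fields[1],
        "peptide": fields[2],
        "identity": fields[3],
        "pos": int(fields[4]),
        "core": fields[5],
        "core_rel": float(fields[6]),
        "log_aff": float(fields[7]),
        "affinity_nm": float(fields[8]),
        "rank": float(fields[9]),
        "exp_bind": float(fields[10]),
        # "<=WB" becomes "WB"; rows without a level get "".
        "level": fields[11].lstrip("<=") if len(fields) == 12 else "",
    }
```

Header, separator and summary lines return `None`, so a whole output file can be filtered with `[r for r in map(parse_row, lines) if r]`.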
    
    
    


    Article abstract


    Improved methods for predicting peptide binding affinity to MHC class II molecules.
    Jensen KK, Andreatta M, Marcatili P, Buus S, Greenbaum JA, Yan Z, Sette A, Peters B, Nielsen M
    Immunology. 2018 Jan 6. doi: 10.1111/imm.12889

    Major histocompatibility complex class II (MHC-II) molecules are expressed on the surface of professional antigen-presenting cells where they display peptides to T helper cells, which orchestrate the onset and outcome of many host immune responses. Understanding which peptides will be presented by the MHC-II molecule is therefore important for understanding the activation of T helper cells and can be used to identify T-cell epitopes. We here present updated versions of two MHC-II-peptide binding affinity prediction methods, NetMHCII and NetMHCIIpan. These were constructed using an extended data set of quantitative MHC-peptide binding affinity data obtained from the Immune Epitope Database covering HLA-DR, HLA-DQ, HLA-DP and H-2 mouse molecules. We show that training with this extended data set improved the performance for peptide binding predictions for both methods. Both methods are publicly available at www.cbs.dtu.dk/services/NetMHCII-2.3 and www.cbs.dtu.dk/services/NetMHCIIpan-3.2.

    PMID: 29315598


    Supplementary Material

    Here, you will find the data set used for training and testing, as well as the T cell epitope data used for evaluation of the NetMHCIIpan-3.2 method.


    Training data

    The training binding data are partitioned into 5 files to be used for cross-validation. For instance, the train1 file contains the training data and the test1 file the test data for the first cross-validation partition. It is critical that this data partitioning is maintained.

    The format for each of the files is

    AAAGAEAGKATTEEQ 0.190842        DRB1_0101
    AAAGAEAGKATTEEQ 0.006301        DRB1_0301
    AAAGAEAGKATTEEQ 0.066851        DRB1_0401
    AAAGAEAGKATTEEQ 0.006344        DRB1_0405
    AAAGAEAGKATTEEQ 0.035130        DRB1_0701
    AAAGAEAGKATTEEQ 0.006288        DRB1_0802
    AAAGAEAGKATTEEQ 0.176268        DRB1_0901
    AAAGAEAGKATTEEQ 0.042555        DRB1_1101
    AAAGAEAGKATTEEQ 0.114855        DRB1_1302
    AAAGAEAGKATTEEQ 0.006377        DRB1_1501
    

    where the first column gives the peptide, the second column the log50k transformed binding affinity (i.e. 1 - log50k( aff nM)), and the last column the class II allele.
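
The log50k transformation can be sketched as below, reading 1 - log50k(aff) as 1 - log(aff) / log(50000) with the affinity in nM. This reading reproduces the paired values in the example output (for instance 12.00 nM and 0.770). The function names are hypothetical, and no clipping to the range 0 to 1 is applied here.

```python
import math

def to_log50k(affinity_nm):
    """Transform an IC50 value in nM to the 1 - log50k scale."""
    return 1.0 - math.log(affinity_nm) / math.log(50000.0)

def from_log50k(value):
    """Inverse transform: recover the IC50 value in nM."""
    return 50000.0 ** (1.0 - value)

print(round(to_log50k(12.00), 3))
print(round(from_log50k(0.190842), 1))
```

On this scale, higher values correspond to stronger binding, and an affinity of 50,000 nM maps to 0.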

    When classifying the peptides into binders and non-binders, for instance to calculate AUC values, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
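
The 0.426 cut-off follows directly from the 500 nM threshold. The sketch below derives it and applies it to lines in the training file format shown above. The names `parse_training_line` and `is_binder` are hypothetical, and the transform is assumed to be 1 - log(aff) / log(50000).

```python
import math

# 500 nM expressed on the 1 - log50k scale (about 0.426).
BINDER_CUTOFF = 1.0 - math.log(500.0) / math.log(50000.0)

def parse_training_line(line):
    """Split a 'peptide value allele' line into typed fields."""
    peptide, value, allele = line.split()
    return peptide, float(value), allele

def is_binder(value, cutoff=BINDER_CUTOFF):
    """True if the transformed affinity is stronger than the cutoff."""
    return value > cutoff

peptide, value, allele = parse_training_line(
    "AAAGAEAGKATTEEQ 0.190842        DRB1_0101"
)
print(allele, peptide, is_binder(value))
```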

    train1 (Train data) test1 (Test data)
    train2 (Train data) test2 (Test data)
    train3 (Train data) test3 (Test data)
    train4 (Train data) test4 (Test data)
    train5 (Train data) test5 (Test data)

    T cell evaluation data


    The format is

    >0705172A=AAHAEINEA=H2-IAb 385 gi|223299|prf||0705172A
    GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRFD
    KLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQC
    VKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIVF
    KGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMS
    MLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAM
    GITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRA
    DHPFLFCIKHIATNAVLFFGRCVSP
    

    where the first part of the FASTA header contains the protein ID (0705172A), the epitope (AAHAEINEA), and the MHC restriction (H2-IAb).
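
The header layout above can be unpacked with a few string operations. This is a minimal sketch with a hypothetical function name; it assumes the first whitespace-delimited token always holds exactly three fields joined by "=".

```python
def parse_epitope_header(header):
    """Return (protein_id, epitope, restriction) from a FASTA header line."""
    # Drop the leading '>' and keep only the first whitespace-delimited token.
    first_token = header.lstrip(">").split()[0]
    protein_id, epitope, restriction = first_token.split("=")
    return protein_id, epitope, restriction

header = ">0705172A=AAHAEINEA=H2-IAb 385 gi|223299|prf||0705172A"
print(parse_epitope_header(header))
```

The extracted epitope can then be located in the protein sequence with `sequence.find(epitope)` when evaluating predictions.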



    Version history


    Please click on the version number to activate the corresponding server.

    3.2 The current server (online since January 2018). New in this version:
    • Method retrained on an extensive dataset of over 100,000 datapoints, covering 36 HLA-DR, 27 HLA-DQ, 9 HLA-DP, and 8 mouse MHC-II molecules.
    Main publication:

    • Improved methods for predicting peptide binding affinity to MHC class II molecules.
      Jensen KK, Andreatta M, Marcatili P, Buus S, Greenbaum JA, Yan Z, Sette A, Peters B, Nielsen M.
      Immunology. 2018 Jan 6. doi: 10.1111/imm.12889.
      PubMed: 29315598
    3.1 (online since December 2014). New in this version:
    • Improved binding core identification by realigning individual networks in the ensemble.
    • Introduced a reliability measure on the predicted binding core (Core_Rel column).
    • Graphical representation of the binding core register and of possible multiple cores.
    Main publication:

    • Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification
      Andreatta M, Karosiene E, Rasmussen M, Stryhn A, Buus S, and Nielsen M
      Immunogenetics (2015)
      PubMed: 26416257
    3.0 (online since June 2013). New in this version:
    • The user can make predictions for all DR, DP and DQ molecules with known protein sequence. The user can also upload full-length MHC class II alpha and beta chains and have the server predict MHC-restricted peptides from any given protein of interest.
    2.1 (online since 6 June 2011). New in this version:
    • User can upload full length MHC class II beta chain and have the server predict MHC restricted peptides from any given protein of interest.
    2.0 (online since 17 Nov 2010). New in this version:
    • New concurrent algorithm used to train the network.
    1.1 (online since 15 April 2010). New in this version:
    • %-rank measure included for each prediction value. The %-rank score gives the rank of the prediction score relative to a distribution of prediction scores from 200,000 random natural 15mer peptides.
    1.0 Original version (online version until April 15 2010):

    Main publication:

    • Quantitative predictions of peptide binding to any HLA-DR molecule of known sequence: NetMHCIIpan.
      Nielsen M, et al. (2008) PLoS Comput Biol. Jul 4;4(7):e1000107.





    GETTING HELP

    If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

    If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
