DTU Health Tech

Department of Health Technology

NetMHCIIpan - 2.1

Pan-specific binding of peptides to MHC class II alleles of known sequence

The NetMHCIIpan server predicts binding of peptides to more than 500 HLA-DR alleles using artificial neural networks (ANNs). Predictions are reported as IC50 values in nM and as %Rank relative to a set of 200,000 random natural peptides. The project is a collaboration between CBS and IMMI.

New in version 2.1: users can upload a full-length MHC class II beta chain sequence and have the server predict MHC-restricted peptides from any protein of interest.

View the version history of this server. All previous versions are available online for comparison and reference.

SUBMISSION

Type of input

Paste a single sequence or several sequences in FASTA format into the field below:

or submit a file in FASTA format directly from your local disk:

Peptide length  

Select Loci

Select Allele (max 15 per submission) or type allele names (i.e. DRBX_XXXX) separated by commas (max 15 per submission)  

For the list of allowed allele names, see the List of MHC allele names.

or paste a single full length MHC protein sequence in FASTA format into the field below:

or submit a file containing a full length MHC protein sequence in FASTA format directly from your local disk:

Threshold for strong binder: % Rank  OR IC50 
Threshold for weak binder: % Rank  OR IC50 

Use fast mode (recommended for large calculations) 
Sort by affinity 
Print only strongest binding core 
Threshold for filtering predictions (only predictions greater than the threshold are displayed. Value is in 1-log15k units)  
Save prediction to xls file 

(please be patient, the calculation might take a few minutes)

Restrictions:
At most 5000 sequences per submission; each sequence not more than 20,000 amino acids and not less than 9 amino acids.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.


Usage instructions



1. Specify the input sequences

All the input sequences must be in one-letter amino acid code. The allowed alphabet (not case sensitive) is as follows:

A C D E F G H I K L M N P Q R S T V W Y and X (unknown)

All the other symbols will be converted to X before processing.
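
The conversion rule above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the server's own code; the function name `normalize_sequence` is hypothetical, and the sketch assumes the input string contains residues only (no FASTA header and no line breaks).

```python
# Allowed one-letter codes as listed above, plus X for "unknown".
ALLOWED = set("ACDEFGHIKLMNPQRSTVWY" + "X")


def normalize_sequence(seq: str) -> str:
    """Upper-case the sequence (the alphabet is not case sensitive)
    and replace every symbol outside the allowed alphabet with X."""
    return "".join(ch if ch in ALLOWED else "X" for ch in seq.upper())


if __name__ == "__main__":
    # B, Z and * are not in the allowed alphabet, so each becomes X.
    print(normalize_sequence("acdBz*"))
```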

The server allows for input in either FASTA or PEPTIDE format.

The sequences can be input in the following two ways:

  • Paste a single sequence (just the amino acids) or a number of sequences in FASTA format or a list of peptides into the upper window of the main server page.

  • Select a FASTA or PEPTIDE file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be employed at the same time: all the specified sequences will be processed.


2. Customize your run

For FASTA input, select the peptide length. Each FASTA sequence is divided into overlapping peptides of the selected length.
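
A minimal sketch of how a sequence can be divided into overlapping peptides is given here. It assumes a step of one residue and 0-based start positions, which is consistent with the example output further down this page; the function name `overlapping_peptides` and the default length of 15 are illustrative choices, not server settings.

```python
def overlapping_peptides(sequence: str, length: int = 15) -> list[tuple[int, str]]:
    """Return (start, peptide) pairs for every window of the given length.

    Start positions are 0-based. A sequence shorter than `length`
    yields an empty list.
    """
    if length < 1:
        raise ValueError("peptide length must be at least 1")
    return [
        (start, sequence[start:start + length])
        for start in range(len(sequence) - length + 1)
    ]


if __name__ == "__main__":
    # 29 residues give 29 - 15 + 1 = 15 overlapping 15-mers.
    for start, peptide in overlapping_peptides("ASQKRPSQRHGSKYLATASTMDHARHGFL"):
        print(start, peptide)
```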

Select the allele(s) you want to make predictions for from the scroll-down menu (hold the Ctrl key to select multiple alleles), or type the allele names separated by commas (without blank spaces).
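
When allele names are typed rather than selected, the string has to be comma separated with no blanks and may hold at most 15 alleles per submission. The sketch below builds such a string. The helper name `format_allele_list` is hypothetical, and the regular expression is an assumption derived from the DRBX_XXXX pattern shown on the submission form; the authoritative set of names is the server's list of MHC allele names.

```python
import re

MAX_ALLELES = 15  # limit stated on the submission form
# Assumed name shape, e.g. DRB1_0101 (derived from the DRBX_XXXX hint).
ALLELE_PATTERN = re.compile(r"DRB\d_\d{4}")


def format_allele_list(alleles: list[str]) -> str:
    """Join allele names with commas and no blank spaces."""
    cleaned = [name.strip() for name in alleles]
    if not cleaned:
        raise ValueError("at least one allele is required")
    if len(cleaned) > MAX_ALLELES:
        raise ValueError(f"at most {MAX_ALLELES} alleles per submission")
    for name in cleaned:
        if not ALLELE_PATTERN.fullmatch(name):
            raise ValueError(f"unexpected allele name: {name!r}")
    return ",".join(cleaned)


if __name__ == "__main__":
    print(format_allele_list(["DRB1_0101", " DRB1_0401"]))
```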

Set the threshold for filtering predictions: only predictions with a score greater than the threshold are displayed.

Click the box Sort by affinity to have the output sorted by descending predicted binding affinity.

Click the box Save prediction to xls file to save the raw prediction output to an Excel file. This file will be available at the bottom of the results output.


3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and constantly updated until it terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format



DESCRIPTION

The prediction output consists of 10 columns.

  • Residue number (starting from 0)
  • HLA Allele
  • Peptide sequence
  • Protein identifier
  • Start position of binding core (starting from 0)
  • Binding core
  • Prediction score (1-log15K(aff))
  • Affinity IC50 value in nM
  • %Rank score - Percentile rank of the score relative to a set of 200,000 random natural peptides
  • Binding level, WB for weak binder, SB for strong binder



EXAMPLE OUTPUT

    
    
    # Input is in FSA format
    # Threshold for Strong binding peptides  50.000
    # Threshold for Weak binding peptides 500.000
    -----------------------------------------------------------------------------------------------------------------
      pos       HLA            peptide        Identity Pos       Core 1-log50k(aff) Affinity(nM)    %Rank  BindLevel
    -----------------------------------------------------------------------------------------------------------------
        0 DRB1*0401    ASQKRPSQRHGSKYL seq2_optional_c   6  SQRHGSKYL         0.041     10132.07    50.00
        1 DRB1*0401    SQKRPSQRHGSKYLA seq2_optional_c   5  SQRHGSKYL         0.060      8394.55    50.00
        2 DRB1*0401    QKRPSQRHGSKYLAT seq2_optional_c   4  SQRHGSKYL         0.077      7144.65    50.00
        3 DRB1*0401    KRPSQRHGSKYLATA seq2_optional_c   4  QRHGSKYLA         0.098      5824.18    50.00
        4 DRB1*0401    RPSQRHGSKYLATAS seq2_optional_c   3  QRHGSKYLA         0.121      4663.47    50.00
        5 DRB1*0401    PSQRHGSKYLATAST seq2_optional_c   6  SKYLATAST         0.167      3012.29    50.00
        6 DRB1*0401    SQRHGSKYLATASTM seq2_optional_c   6  KYLATASTM         0.324       665.84    32.00
        7 DRB1*0401    QRHGSKYLATASTMD seq2_optional_c   6  YLATASTMD         0.476       153.62     8.00 <= WB
        8 DRB1*0401    RHGSKYLATASTMDH seq2_optional_c   5  YLATASTMD         0.595        48.95     2.00 <= SB
        9 DRB1*0401    HGSKYLATASTMDHA seq2_optional_c   4  YLATASTMD         0.695        18.83     0.40 <= SB
       10 DRB1*0401    GSKYLATASTMDHAR seq2_optional_c   3  YLATASTMD         0.740        12.24     0.15 <= SB
       11 DRB1*0401    SKYLATASTMDHARH seq2_optional_c   2  YLATASTMD         0.722        14.55     0.20 <= SB
       12 DRB1*0401    KYLATASTMDHARHG seq2_optional_c   1  YLATASTMD         0.663        25.44     0.70 <= SB
       13 DRB1*0401    YLATASTMDHARHGF seq2_optional_c   0  YLATASTMD         0.480       148.99     7.00 <= WB
       14 DRB1*0401    LATASTMDHARHGFL seq2_optional_c   0  LATASTMDH         0.225      1727.75    50.00
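
For downstream processing, each data row of the output above can be split on whitespace into the ten columns described earlier. The following is a sketch under stated assumptions rather than an official parser: the `Prediction` class and `parse_row` function are hypothetical names, the protein identifier is assumed to contain no blanks, and the binding level is assumed to appear as `<= WB` or `<= SB` at the end of the row when present.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    pos: int            # residue number, starting from 0
    allele: str
    peptide: str
    identity: str       # protein identifier
    core_pos: int       # start of the binding core within the peptide, from 0
    core: str
    score: float        # log-transformed prediction score
    affinity_nm: float  # IC50 in nM
    rank: float         # %Rank
    bind_level: str     # "SB", "WB", or "" when no level is printed


def parse_row(line: str) -> Prediction:
    """Parse one whitespace-separated data row of the prediction output."""
    f = line.split()
    if len(f) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(f)}")
    level = f[10] if len(f) >= 11 and f[9] == "<=" else ""
    return Prediction(int(f[0]), f[1], f[2], f[3], int(f[4]), f[5],
                      float(f[6]), float(f[7]), float(f[8]), level)


if __name__ == "__main__":
    row = ("    8 DRB1*0401    RHGSKYLATASTMDH seq2_optional_c   5  "
           "YLATASTMD         0.595        48.95     2.00 <= SB")
    p = parse_row(row)
    # The core is the 9-residue window of the peptide starting at core_pos.
    print(p.bind_level, p.peptide[p.core_pos:p.core_pos + 9] == p.core)
```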
    
    

    Article abstracts


    Main reference:

    NetMHCIIpan-2.0 - Improved pan-specific HLA-DR predictions using a novel concurrent alignment and weight optimization training procedure
    Nielsen M (1), Lundegaard C (1), Justesen S (2), Lund O (1), and Buus S (2)
    Immunome Res. 2010 Nov 13;6(1):9.

    (1) Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
    (2) Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark

    BACKGROUND: Binding of peptides to Major Histocompatibility class II (MHC-II) molecules play a central role in governing responses of the adaptive immune system. MHC-II molecules sample peptides from the extracellular space allowing the immune system to detect the presence of foreign microbes from this compartment. Predicting which peptides bind to an MHC-II molecule is therefore of pivotal importance for understanding the immune response and its effect on host-pathogen interactions. The experimental cost associated with characterizing the binding motif of an MHC-II molecule is significant and large efforts have therefore been placed in developing accurate computer methods capable of predicting this binding event. Prediction of peptide binding to MHC-II is complicated by the open binding cleft of the MHC-II molecule, allowing binding of peptides extending out of the binding groove. Moreover, the genes encoding the MHC molecules are immensely diverse leading to a large set of different MHC molecules each potentially binding a unique set of peptides. Characterizing each MHC-II molecule using peptide-screening binding assays is hence not a viable option.

    RESULTS: Here, we present an MHC-II binding prediction algorithm aiming at dealing with these challenges. The method is a pan-specific version of the earlier published allele-specific NN-align algorithm and does not require any pre-alignment of the input data. This allows the method to benefit also from information from alleles covered by limited binding data. The method is evaluated on a large and diverse set of benchmark data, and is shown to significantly out-perform state-of-the-art MHC-II prediction methods. In particular, the method is found to boost the performance for alleles characterized by limited binding data where conventional allele-specific methods tend to achieve poor prediction accuracy.

    CONCLUSIONS: The method thus shows great potential for efficiently boosting the accuracy of MHC-II binding prediction, as accurate predictions can be obtained for novel alleles at highly reduced experimental costs. Pan-specific binding predictions can be obtained for all alleles with known protein sequence and the method can benefit by including data in the training from alleles even where only few binders are known. The method and benchmark data are available at www.cbs.dtu.dk/services/NetMHCIIpan-2.0.

    PMID: 21073747

    Full text

    Supplementary Material

    Here, you will find the data set used for training and testing, as well as the T cell epitope data used for evaluation of the NetMHCIIpan-3.2 method.


    Training data

    The training binding data are partitioned into 5 files for cross-validation. For instance, the train1 file contains the training data and the test1 file the test data for the first cross-validation partition. It is critical that this data partitioning is maintained.

    The format for each of the files is

    AAAGAEAGKATTEEQ 0.190842        DRB1_0101
    AAAGAEAGKATTEEQ 0.006301        DRB1_0301
    AAAGAEAGKATTEEQ 0.066851        DRB1_0401
    AAAGAEAGKATTEEQ 0.006344        DRB1_0405
    AAAGAEAGKATTEEQ 0.035130        DRB1_0701
    AAAGAEAGKATTEEQ 0.006288        DRB1_0802
    AAAGAEAGKATTEEQ 0.176268        DRB1_0901
    AAAGAEAGKATTEEQ 0.042555        DRB1_1101
    AAAGAEAGKATTEEQ 0.114855        DRB1_1302
    AAAGAEAGKATTEEQ 0.006377        DRB1_1501
    

    where the first column gives the peptide, the second column the log50k transformed binding affinity (i.e. 1 - log50k(aff), with aff in nM), and the last column the class II allele.

    When classifying the peptides into binders and non-binders, for instance for the calculation of AUC values, a threshold of 500 nM is used. This means that peptides with log50k transformed binding affinity values greater than 0.426 are classified as binders.
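
The relation between the 500 nM threshold and the value 0.426 follows directly from the transformation 1 - log50k(aff). A small sketch, with hypothetical function names and the base of 50,000 nM taken from the text above, makes the arithmetic explicit:

```python
import math

MAX_AFFINITY_NM = 50000.0     # base of the log50k transformation
BINDER_THRESHOLD_NM = 500.0   # binder / non-binder cut-off stated above


def affinity_to_score(aff_nm: float, base: float = MAX_AFFINITY_NM) -> float:
    """Transform an affinity in nM to 1 - log_base(aff)."""
    return 1.0 - math.log(aff_nm) / math.log(base)


def score_to_affinity(score: float, base: float = MAX_AFFINITY_NM) -> float:
    """Inverse transformation: base ** (1 - score), in nM."""
    return base ** (1.0 - score)


def is_binder(score: float) -> bool:
    """A peptide is a binder if its transformed value exceeds the 500 nM cut-off."""
    return score > affinity_to_score(BINDER_THRESHOLD_NM)


if __name__ == "__main__":
    # 1 - ln(500)/ln(50000) is approximately 0.4256, i.e. 0.426 after rounding.
    print(round(affinity_to_score(BINDER_THRESHOLD_NM), 3))
    print(is_binder(0.190842), is_binder(0.5))
```

Lower affinities in nM mean stronger binding, which is why a larger transformed value corresponds to a binder.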

    train1 (Train data) test1 (Test data)
    train2 (Train data) test2 (Test data)
    train3 (Train data) test3 (Test data)
    train4 (Train data) test4 (Test data)
    train5 (Train data) test5 (Test data)

    T cell evaluation data

    The format is

    >0705172A=AAHAEINEA=H2-IAb 385 gi|223299|prf||0705172A
    GSIGAASMEFCFDVFKELKVHHANENIFYCPIAIMSALAMVYLGAKDSTRTQINKVVRFD
    KLPGFGDSIEAQCGTSVNVHSSLRDILNQITKPNDVYSFSLASRLYAEERYPILPEYLQC
    VKELYRGGLEPINFQTAADQARELINSWVESQTNGIIRNVLQPSSVDSQTAMVLVNAIVF
    KGLWEKAFKDEDTQAMPFRVTEQESKPVQMMYQIGLFRVASMASEKMKILELPFASGTMS
    MLVLLPDEVSGLEQLESIINFEKLTEWTSSNVMEERKIKVYLPRMKMEEKYNLTSVLMAM
    GITDVFSSSANLSGISSAESLKISQAVHAAHAEINEAGREVVGSAEAGVDAASVSEEFRA
    DHPFLFCIKHIATNAVLFFGRCVSP
    

    where the first part of the FASTA header contains the protein ID (0705172A), the epitope (AAHAEINEA), and the MHC restriction (H2-IAb), separated by "=".
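
As an illustration of this header layout, the sketch below extracts the three "="-separated fields from the first whitespace-delimited token of the header. The function name `parse_tcell_header` is hypothetical, and the remaining header fields are ignored because their meaning is not described here.

```python
def parse_tcell_header(header: str) -> tuple[str, str, str]:
    """Return (protein_id, epitope, mhc_restriction) from a FASTA header line."""
    first_token = header.lstrip(">").split()[0]
    parts = first_token.split("=", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected header layout: {header!r}")
    protein_id, epitope, restriction = parts
    return protein_id, epitope, restriction


if __name__ == "__main__":
    header = ">0705172A=AAHAEINEA=H2-IAb 385 gi|223299|prf||0705172A"
    print(parse_tcell_header(header))
```

A simple sanity check after parsing is to confirm that the epitope occurs as a substring of the protein sequence that follows the header.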


    References

    Improved methods for predicting peptide binding affinity to MHC class II molecules.
    Jensen KK, Andreatta M, Marcatili P, Buus S, Greenbaum JA, Yan Z, Sette A, Peters B, Nielsen M.
    Immunology. 2018 Jan 6. doi: 10.1111/imm.12889.
    PubMed: 29315598

    Software Downloads




    GETTING HELP

    If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

    If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

    Correspondence: Technical Support: