DTU Health Tech

Department of Health Technology

NetMHCIIphosPan - 1.0

Pan-specific binding of phosphorylated peptides to MHC class II molecules of known sequence

The NetMHCIIphosPan-1.0 server predicts binding of phosphorylated peptides to HLA class II molecules using Artificial Neural Networks (ANNs). It is trained on a dataset of over 19,000 measurements of mass-spectrometry eluted ligands (EL), covering the three human MHC class II isotypes: HLA-DR, HLA-DP and HLA-DQ.

The server allows predictions for any HLA class II molecule with a known sequence, which users can input in FASTA format. Predictions can be made for phosphorylated peptides of any length.

The output of the method is a prediction score for the likelihood of a phosphorylated peptide to be naturally presented by an HLA-II receptor of choice. The output also includes a %rank score, which normalizes the prediction score by comparing to predictions of a set of random phosphopeptides. Optionally, the model also outputs binding affinity (BA) prediction and %rank scores.
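As a rough illustration of how a percentile rank of this kind can be computed, consider the sketch below. The server's exact procedure and background set are not described on this page, so the function name `percent_rank`, the handling of ties, and the toy background scores are all assumptions made for the example.

```python
from bisect import bisect_right

def percent_rank(score, background_scores):
    """Percentage of background scores that are strictly higher than `score`.

    A low value means the peptide scores better than most background peptides.
    (Assumed definition; the server's own normalization may differ in detail.)
    """
    ordered = sorted(background_scores)
    n_higher = len(ordered) - bisect_right(ordered, score)
    return 100.0 * n_higher / len(ordered)

# Toy background of 10 scores standing in for predictions on random phosphopeptides
background = [0.01, 0.02, 0.03, 0.05, 0.08, 0.10, 0.20, 0.40, 0.70, 0.90]

print(percent_rank(0.866951, background))  # only one background score is higher
print(percent_rank(0.034566, background))  # most background scores are higher
```

With a realistic background of many thousands of random peptides, the same calculation yields finer-grained values.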

It is important to note that this model was also trained on over 120,000 unmodified peptides measured in binding affinity (BA) experiments and over 145,000 unmodified MS-eluted ligands (EL) from the NetMHCIIpan-4.3 training dataset.

Nevertheless, this method is specifically tailored to make binding predictions for phosphorylated peptides (i.e. peptides with one or more phosphorylated serine, threonine, or tyrosine residues). To make predictions for unmodified peptides, please refer to the NetMHCIIpan-4.3 method.

The project is a collaboration between DTU-Bioinformatics and LIAI.

SUBMISSION

Hover the mouse cursor over the symbol for a short description of the options

INPUT TYPE:

Paste a single sequence or several sequences in FASTA format into the field below:

... or upload a file in FASTA format directly from your local disk:

... or load some sample data:


PEPTIDE LENGTH (specify variable length as a comma separated list):  


SELECT Molecules:



Select Allele(s) (max. 15 per submission)



... or type a list of molecule names separated by commas without spaces (max. 15 per submission)

For the list of available molecule names click here

Alternatively, upload full length Alpha and Beta chain protein sequences:


ADDITIONAL CONFIGURATION:

Threshold for strong binder (% Rank)  
Threshold for weak binder (% Rank)  

Allow peptide inversion for other loci than HLA-DP 

Turn on filtering options 

Print only the strongest binding core 
Sort output by prediction score 

Save predictions to xls file 

Restrictions:
At most 5000 sequences per submission; each sequence not more than 20,000 amino acids and not less than 9 amino acids. Max 15 MHC alleles per submission.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

    NetMHCIIphosPan. A tool for prediction of HLA class II presentation of phosphorylated peptides.
    Heli M. Garcia Alvarez, Saghar Kaabinejadian, Hooman Yari, William H. Hildebrand, Alessandro Sette, Bjoern Peters, Robert Parker, Nicola Ternette and Morten Nielsen.
    In preparation (2025)

PORTABLE VERSION

NetMHCIIphosPan-1.0 is available as a stand-alone software package, with the same functionality as the service above. Ready-to-ship packages exist for Linux and macOS. There is a download page for academic users; other users are requested to contact Health Tech Software Package Manager at health-software@dtu.dk.

Instructions

INPUT DATA

In this section, the user must define the input for the prediction server following these steps:

1) Specify the desired type of input data (FASTA or PEPTIDE) using the drop down menu.

2) Provide the input data by pasting it into the blank field, uploading it using the "Choose File" button, or by loading sample data using the "Load Data" button. All input sequences must be in one-letter amino acid code. The alphabet is as follows (case sensitive):

s t y A C D E F G H I K L M N P Q R S T V W Y and X (unknown)


Please note that "s" represents phosphoserine, "t" represents phosphothreonine and "y" phosphotyrosine.

Any other symbol will be converted to X before processing. A maximum of 5000 sequences are allowed per submission; each sequence must be between 9 and 20,000 amino acids long.

3) If FASTA was selected, the user must select the peptide length(s) for the prediction server. NetMHCIIphosPan-1.0 will "chop" the FASTA sequence into overlapping peptides of the selected length and predict binding for each. By default, input proteins are digested into 15-mer peptides. If PEPTIDE was selected, this step is unnecessary, and the peptide length selector will not appear.

Note that context encoding is NOT available for this method.
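The input handling described in steps 2 and 3 can be sketched as follows. This is only an illustration of the stated rules (symbols outside the alphabet become X, FASTA sequences are split into overlapping peptides), not the server's own code. The helper names `sanitize` and `chop` are hypothetical, and the 1-based start positions are an assumption chosen to match the Pos column of the example output on this page.

```python
# Accepted symbols: s, t, y are the phosphorylated residues; X is "unknown"
ALPHABET = set("styACDEFGHIKLMNPQRSTVWYX")

def sanitize(sequence):
    """Replace every symbol outside the accepted alphabet with X (case sensitive)."""
    return "".join(ch if ch in ALPHABET else "X" for ch in sequence)

def chop(sequence, length=15):
    """Return (start, peptide) pairs for all overlapping peptides of `length`.

    `start` is counted from 1 (assumed convention for this sketch).
    """
    return [(i + 1, sequence[i:i + length])
            for i in range(len(sequence) - length + 1)]

# The MAR1_HUMAN example sequence used elsewhere on this page
protein = ("MPREDAHFIYGYPKKGHGHSYTTAEEAAGIGILTVILGVLLLIGCWYCRRRNGYRALMDK"
           "SLHVGTQCALTRRCPQEGFDHRDSKVsLQEKNCEPVVPNAPPAyEKLsAEQSPPPysP")

peptides = chop(sanitize(protein))
print(len(peptides))   # number of overlapping 15-mers
print(peptides[100])   # the 15-mer starting at residue 101
```

Because the alphabet is case sensitive, a lowercase letter other than s, t or y (for example "a") is treated as an unknown symbol and converted to X.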


MHC SELECTION

In this section, the user must define which MHC molecule(s) to predict against:

1) Select MHC molecules from a list by selecting a group and choosing MHCs. MS-COVERED refers to molecules covered by the NetMHCIIphosPan-1.0 training data.

2) Alternatively, the user can type the molecule names. Both ALPHA and BETA chains must be typed (see List of MHC molecule names). Selections from step 1 populate this bar.

3) If the desired molecule is not in the list, the user can input ALPHA and BETA sequences in FASTA format. Rank score predictions are not available in this case.
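When molecule names are typed rather than selected, the field expects a comma-separated list without spaces and with at most 15 entries. A minimal sketch of building such a string is shown below. The helper `molecule_field` is hypothetical, and `DRB1_0301` is used purely as a placeholder following the same pattern as the `DRB1_0101` name in the example on this page; consult the list of available molecule names for valid entries.

```python
MAX_MOLECULES = 15  # limit stated for a single submission

def molecule_field(names):
    """Join molecule names with commas and no spaces, enforcing the 15-name limit."""
    cleaned = [name.strip() for name in names if name.strip()]
    if len(cleaned) > MAX_MOLECULES:
        raise ValueError(
            f"at most {MAX_MOLECULES} molecules per submission, got {len(cleaned)}")
    for name in cleaned:
        # A name must not contain whitespace or commas itself
        if any(ch.isspace() or ch == "," for ch in name):
            raise ValueError(f"invalid molecule name: {name!r}")
    return ",".join(cleaned)

print(molecule_field(["DRB1_0101", " DRB1_0301 "]))
```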


ADDITIONAL CONFIGURATION

In this section, additional parameters can be defined to customize the run:

1) Specify thresholds for strong and weak binders (%Rank). Peptides with a %Rank below the strong-binder threshold (default: 1%) are labelled strong binders. Peptides with a %Rank above the strong-binder threshold but below the weak-binder threshold (default: 5%) are labelled weak binders.

2) Include Binding Affinity predictions alongside Eluted Ligand likelihood.

3) Enable peptide inversion prediction for all selected MHC-II molecules (optional; default is HLA-DP only).

4) Output only peptides below a specified %Rank score (useful for large submissions).

5) Output only the strongest binding core.

6) Sort output by descending prediction score.

7) Export output to .XLS format.
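The threshold logic of option 1 can be expressed in a few lines. The function below is a sketch with a hypothetical name, `bind_level`. Treating a %Rank exactly equal to a threshold as inside the class is an assumption based on the "<= SB" marker in the example output.

```python
def bind_level(percent_rank, strong=1.0, weak=5.0):
    """Classify a peptide by its %Rank using the strong and weak thresholds.

    Returns "SB" (strong binder), "WB" (weak binder) or "" (non-binder).
    Boundary handling (<=) is an assumption for this sketch.
    """
    if percent_rank <= strong:
        return "SB"
    if percent_rank <= weak:
        return "WB"
    return ""

for rank in (0.09, 3.0, 14.72):
    print(rank, repr(bind_level(rank)))
```

Lowering the thresholds (for example `strong=0.5, weak=2.0`) makes the classification more stringent, which can help when screening large submissions.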


SUBMISSION

After completing the "INPUT DATA", "MHC SELECTION", and "ADDITIONAL CONFIGURATION" steps, the job is ready for submission. Click "Submit" to send the job to the server, or click "Clear fields" to reset the form.

Job status ('queued' or 'running') will be displayed and updated until it terminates, and the output will appear in the browser window.

After completion, an output page will be delivered. A description of the output format can be found here.

You can enter your email address at any time to receive notification when the job is complete.


Output format


EXAMPLE OUTPUT

For the following FASTA input example:

>sp|Q16655|MAR1_HUMAN Melanoma antigen recognized by T-cells 1 OS=Homo sapiens OX=9606 GN=MLANA PE=1 SV=1
MPREDAHFIYGYPKKGHGHSYTTAEEAAGIGILTVILGVLLLIGCWYCRRRNGYRALMDK
SLHVGTQCALTRRCPQEGFDHRDSKVsLQEKNCEPVVPNAPPAyEKLsAEQSPPPysP


With parameters:

Peptide length: 15
Allele: DRB1_0101
Sort by prediction score: On

NetMHCIIphosPan-1.0 will return the following output (showing the top predicted peptides):

# NetMHCIIphosPan version 1.0a

# Input is in FASTA format

# Peptide length 15

# Prediction Mode: EL

# Threshold for Strong binding peptides (%Rank)  1.00%
# Threshold for Weak binding peptides (%Rank)    5.00%

# DRB1_0101 : Distance to training data  0.000 (using nearest neighbor DRB1_0101)

# Allele: DRB1_0101
--------------------------------------------------------------------------------------------------------------------------------------------
 Pos               MHC              Peptide   Of        Core  Core_Rel Inverted        Identity      Score_EL %Rank_EL  Exp_Bind  BindLevel
--------------------------------------------------------------------------------------------------------------------------------------------
 101         DRB1_0101      PPAyEKLsAEQSPPP    3   yEKLsAEQS     1.000        0 sp_Q16655_MAR1_      0.866951     0.09        NA   <= SB
  85         DRB1_0101      KVsLQEKNCEPVVPN    3   LQEKNCEPV     0.900        0 sp_Q16655_MAR1_      0.034566    14.72        NA
  82         DRB1_0101      RDSKVsLQEKNCEPV    3   KVsLQEKNC     0.720        0 sp_Q16655_MAR1_      0.020288    19.72        NA
  76         DRB1_0101      QEGFDHRDSKVsLQE    3   FDHRDSKVs     0.960        0 sp_Q16655_MAR1_      0.019929    19.89        NA
  93         DRB1_0101      CEPVVPNAPPAyEKL    3   VVPNAPPAy     0.930        0 sp_Q16655_MAR1_      0.011383    25.98        NA
  83         DRB1_0101      DSKVsLQEKNCEPVV    3   VsLQEKNCE     0.100        0 sp_Q16655_MAR1_      0.006428    32.83        NA
  90         DRB1_0101      EKNCEPVVPNAPPAy    3   CEPVVPNAP     0.690        0 sp_Q16655_MAR1_      0.001225    54.80        NA
  97         DRB1_0101      VPNAPPAyEKLsAEQ    3   APPAyEKLs     0.310        0 sp_Q16655_MAR1_      0.000078    87.60        NA
--------------------------------------------------------------------------------------------------------------------------------------------
Number of strong binders: 1 Number of weak binders: 0
--------------------------------------------------------------------------------------------------------------------------------------------
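For downstream analysis, the text output above can be read with a small parser. The sketch below assumes the EL-only layout shown in this example (11 whitespace-separated fields followed by an optional BindLevel marker) and an Identity field without spaces. The column list and the function `parse_row` are written for this illustration and are not part of the tool. Output that includes binding affinity columns would need an extended column list.

```python
COLUMNS = ["Pos", "MHC", "Peptide", "Of", "Core", "Core_Rel", "Inverted",
           "Identity", "Score_EL", "%Rank_EL", "Exp_Bind"]

def parse_row(line):
    """Parse one data row of EL-mode output; return None for any other line."""
    parts = line.split()
    # Data rows start with an integer position; comments, rulers and headers do not
    if len(parts) < len(COLUMNS) or not parts[0].isdigit():
        return None
    row = dict(zip(COLUMNS, parts))
    for key in ("Pos", "Of", "Inverted"):
        row[key] = int(row[key])
    for key in ("Core_Rel", "Score_EL", "%Rank_EL"):
        row[key] = float(row[key])
    # Whatever follows the 11 fixed fields is the BindLevel marker, e.g. "<= SB"
    row["BindLevel"] = " ".join(parts[len(COLUMNS):]).replace("<=", "").strip()
    return row

example = [
    "# Allele: DRB1_0101",
    " 101         DRB1_0101      PPAyEKLsAEQSPPP    3   yEKLsAEQS     1.000"
    "        0 sp_Q16655_MAR1_      0.866951     0.09        NA   <= SB",
    "  85         DRB1_0101      KVsLQEKNCEPVVPN    3   LQEKNCEPV     0.900"
    "        0 sp_Q16655_MAR1_      0.034566    14.72        NA",
]
rows = [row for row in map(parse_row, example) if row is not None]
for row in rows:
    print(row["Pos"], row["Peptide"], row["%Rank_EL"], row["BindLevel"])
```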


DESCRIPTION


The prediction output for each molecule consists of the following columns:

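  • Pos Position of the first residue of the peptide within the input sequence (counted from 1 in the example above, where the peptide PPAyEKLsAEQSPPP starts at residue 101)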

  • MHC MHC molecule name

  • Peptide Amino acid sequence

  • Of Starting position offset of the optimal binding core (starting from 0)

  • Core Binding core register

  • Core_Rel Reliability of the binding core, expressed as the fraction of networks in the ensemble selecting the optimal core

  • Inverted Whether the peptide binds inverted to the given MHC molecule (1: inverted, 0: forward)

  • Identity Annotation of the input sequence, if specified

  • Score_EL Eluted ligand prediction score

  • %Rank_EL Percentile rank of eluted ligand prediction score

  • Exp_Bind Experimental binding value, reported if the input was given in PEPTIDE format with an annotated affinity value (mainly for benchmarking purposes); NA otherwise

  • Score_BA Predicted binding affinity in log-scale (printed only if binding affinity predictions were selected)

  • Affinity(nM) Predicted binding affinity in nanomolar IC50 (printed only if binding affinity predictions were selected)

  • %Rank_BA Percentile rank of the predicted affinity compared to a set of 100,000 random natural peptides. This measure is not affected by the inherent bias of certain molecules towards higher or lower mean predicted affinities (printed only if binding affinity predictions were selected)

  • BindLevel (SB: strong binder, WB: weak binder). The peptide will be identified as a strong binder if the % Rank is below the specified threshold for the strong binders. The peptide will be identified as a weak binder if the % Rank is above the threshold of the strong binders but below the specified threshold for the weak binders.
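The relation between the Peptide, Of and Core columns can be checked with a short sketch. It assumes a 9-residue core, as in the example output, and a forward (non-inverted) binding orientation. Both helper names are hypothetical.

```python
def core_from_offset(peptide, offset, core_length=9):
    """Return the binding core that starts `offset` residues into the peptide (0-based)."""
    core = peptide[offset:offset + core_length]
    if len(core) != core_length:
        raise ValueError("offset places the core outside the peptide")
    return core

def phosphosites_in_core(core):
    """List (core position, residue) for phosphorylated residues; positions count from 1."""
    return [(i + 1, ch) for i, ch in enumerate(core) if ch in "sty"]

core = core_from_offset("PPAyEKLsAEQSPPP", 3)
print(core)
print(phosphosites_in_core(core))
```

For the top-ranked peptide of the example, this reproduces the reported core and shows which core positions carry a phosphorylated residue.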

Article abstract




    NetMHCIIphosPan. A tool for prediction of HLA class II presentation of phosphorylated peptides.
    Heli M. Garcia Alvarez, Saghar Kaabinejadian, Hooman Yari, William H. Hildebrand, Alessandro Sette, Bjoern Peters, Robert Parker, Nicola Ternette and Morten Nielsen.

    In preparation (2025)

    Supplementary material


    Training data

    Here, you will find the datasets used for the training of NetMHCIIphosPan-1.0.

    NetMHCIIphosPan_train.tar.gz

    Download the file and untar the content using:

    tar -xzvf NetMHCIIphosPan_train.tar.gz
    

    This will create a directory called NetMHCIIphosPan_train containing 12 files:

    • 10 partition files (final_c00?_ba, final_c00?_el) containing the Binding Affinity (BA) and Eluted Ligand (EL) data, respectively.

    • The format for the EL files is the following:

      	LTGIKHELQANCyEEVKDR 1 Racle__3869_GA 1
      	LtGMAFRVPTANVSVVD 1 Racle__TIL3 1
      	LtHCQDINECLTLG 1 Saghar_9090_DQ 1
      	LTKIHPKAFLtTKK 1 Racle__3830NJF 1
      	LtLHKPTQVMPCRAPKVG 1 Racle__TIL3 1
      	...
      	VKKFPRFRNREL 0 PvanBalen_DP_AZP_2877 0
      	GMRLKEAGNINR 0 PvanBalen_DP_AZP_2877 0
      
      	
      where the different columns are peptide (1st), target value (2nd), MHC-molecule/cell-line (3rd) and peptide type (unmodified=0, phosphorylated=1) (4th).
      In cases where the 3rd column is a cell-line ID, the MHC molecules expressed by the corresponding cell-line are listed in the allelelist file.
      The BA files do not contain the 4th column, as they only include unmodified peptides.

    • The allelelist file contains the information about alleles expressed in each cell line dataset.
    • The MHC pseudo sequences for each MHC molecule can be found in the pseudosequence.2023.all_HLA.X.dat file.
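A minimal sketch for reading lines in the format described above is given below. It accepts both the 4-column EL layout and the 3-column BA layout. The function name `parse_training_line` and the dictionary keys are choices made for this example only, and the format of the allelelist file is not covered because it is not specified here.

```python
def parse_training_line(line):
    """Split one training-data line into its columns.

    EL lines have 4 columns: peptide, target, MHC/cell line, peptide type.
    BA lines have 3 columns and only contain unmodified peptides.
    """
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"expected 3 or 4 columns, got {len(fields)}")
    peptide, target, source = fields[:3]
    return {
        "peptide": peptide,
        "target": float(target),
        "source": source,  # MHC molecule name or cell-line ID
        "phosphorylated": len(fields) == 4 and fields[3] == "1",
        # Lowercase s, t, y mark phosphorylated residues
        "n_phosphosites": sum(ch in "sty" for ch in peptide),
    }

records = [parse_training_line(text) for text in (
    "LTGIKHELQANCyEEVKDR 1 Racle__3869_GA 1",
    "VKKFPRFRNREL 0 PvanBalen_DP_AZP_2877 0",
)]
for record in records:
    print(record)
```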



    External benchmark data

    You can download the external benchmark dataset used in the NetMHCIIphosPan-1.0 publication here:
    NetMHCIIphosPan_ext_benchmark.tar.gz

    Download and untar the file as indicated above.
    In the directory called NetMHCIIphosPan_ext_benchmark you will find 3 files.

    • The external benchmark data, in the same format as the training data described above (first 3 columns only).
    • Two allelelist files:
      • The allelelist_all file which contains information about all alleles expressed in each cell line dataset.
      • The allelelist_reduced_MixMHC2pred file which only contains information about alleles expressed in each cell line dataset that are also covered by the MixMHC2pred-1.3 tool.

    Version history


    Please click on the version number to activate the corresponding server.

    1.0 The current version (online since January 2025).

    Main publication:
    • NetMHCIIphosPan. A tool for prediction of HLA class II presentation of phosphorylated peptides.

      Heli M. Garcia Alvarez, Saghar Kaabinejadian, Hooman Yari, William H. Hildebrand, Alessandro Sette, Bjoern Peters, Robert Parker, Nicola Ternette and Morten Nielsen.

      In preparation (2025)

    This service offers no downloadable software

    See a list of available software


    GETTING HELP

    If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

    If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

    Correspondence: Technical Support: