DTU Health Tech

Department of Health Technology

NetTCR-2.2

Sequence-based prediction of peptide-TCR binding.

NetTCR-2.2 predicts the probability of binding between a T-cell receptor (TCR) and MHC-I peptides. In contrast to NetTCR-2.1, NetTCR-2.2 includes a component pre-trained on 26 different MHC-I peptides, which serves as the starting point for training further models specific to each of these peptides.

NetTCR-2.2 thus contains a peptide-specific model for each of the following peptides:
GILGFVFTL, RAKFKQLL, KLGGALQAK, AVFDRKSDAK, ELAGIGILTV, NLVPMVATV, IVTDFSVIK, LLWNGPMAV, CINGVCWTV, GLCTLVAML, SPRWYFYYL, ATDALMTGF, DATYQRTRALVR, KSKRTPMGF, YLQPRTFLL, HPVTKYIM, RFPLTFGWCF, GPRLGVRAT, CTELKLSDY, RLRAEAQVK, RLPGVLPRA, SLFNTVATLY, RPPIFIRRL, FEDLRLLSF, VLFGLGFAI and FEDLRVLSF

To further improve performance, the model predictions are scaled by similarity to known binders, which is calculated with the TCRbase tool.
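
As a rough illustration of how such scaling could work, the sketch below multiplies a model prediction by a similarity score raised to the power of the scaling factor α (default 10, see the submission options below). The function and variable names are hypothetical placeholders; the actual TCRbase similarity is computed by the NetTCR-2.2 pipeline itself.

  # Illustrative sketch only, not the server's actual code.
  # "cnn_prediction" stands for the peptide-specific model output and
  # "tcrbase_similarity" for the similarity to known binders of that peptide.

  def scaled_prediction(cnn_prediction: float, tcrbase_similarity: float, alpha: float = 10.0) -> float:
      """Scale a prediction by the similarity to known binders raised to the power alpha.

      Setting alpha to 0 makes the similarity factor 1.0, i.e. the scaling is disabled.
      """
      return cnn_prediction * tcrbase_similarity ** alpha

  # A TCR highly similar to known binders (0.95) keeps most of its score,
  # while a dissimilar TCR (0.60) is strongly down-weighted.
  print(scaled_prediction(0.80, 0.95))  # ~0.48
  print(scaled_prediction(0.80, 0.60))  # ~0.005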


Pan-specific prediction

While NetTCR-2.2 primarily attempts to use the pre-trained models, which have the best performance, pan-specific predictions can also be carried out for peptides other than those listed above.

Note, however, that performance can vary considerably and is generally poor for peptides that are not highly similar (>95% kernel similarity) to the peptides in the training data; in other words, use with caution. Predictions for these peptides are also not scaled via TCRbase.


Submit data


Paste in peptide and CDR amino acid sequences, or upload a CSV/text file containing sequences in the following order:
Peptide,CDR1α,CDR2α,CDR3α,CDR1β,CDR2β,CDR3β

One TCR per line is required, with no header line. For each TCR, the peptide and CDR sequences should be comma-separated. A binding label (named "binder" in the output) can be included as an extra column after CDR3β, but any additional columns will be given arbitrary names in the output.
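
A minimal sketch of how an input file in this format could be prepared with pandas is shown below. The column order and the no-header requirement are taken from this page; the CDR sequences in the example row are made-up placeholders, not real TCR sequences.

  # Sketch only: writes one data point in the required order
  # Peptide, CDR1a, CDR2a, CDR3a, CDR1b, CDR2b, CDR3b, optional binder label.
  import pandas as pd

  rows = [
      # The peptide is one of the 26 supported peptides; the CDR sequences are
      # illustrative placeholders and should be replaced with real TCR data.
      ["GILGFVFTL", "TSGFNG", "NVLDGL", "CAVRDSNYQLIW", "SGHAT", "FQNNGV", "CASSIRSSYEQYF", 1],
  ]

  pd.DataFrame(rows).to_csv("nettcr_input.csv", header=False, index=False)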

Only amino acid input is accepted. For detailed instructions, see Instructions tab above.

For an overview of the method and citation information, see Abstract tab.


Sequence submission

Paste the sequence(s):

or load some sample data:
or upload a local file:

Similarity scaling factor (α) 
Percentile rank threshold 

Cite

Jensen, M. F., & Nielsen, M. (2023). NetTCR 2.2—Improved TCR specificity predictions by combining pan- and peptide-specific training strategies, loss-scaling and integration of sequence similarity. bioRxiv. https://doi.org/10.1101/2023.10.12.562001

Instructions for NetTCR-2.2

Input format

  • The server only accepts amino acid sequences, in the form of a peptide and six CDR sequences, as the first seven fields of each data point. These sequences should be comma-separated and given in the following order:
    Peptide, CDR1α, CDR2α, CDR3α, CDR1β, CDR2β, CDR3β.

  • These sequences should have a maximum length of 12, 7, 7, 22, 6, 7 and 23 amino acids, respectively, and should contain only uppercase standard amino acid letters (see the validation sketch below).
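
A minimal validation sketch for one input line, assuming the limits listed above, is given here; the function name and error messages are illustrative only.

  # Checks the first seven comma-separated fields of one line against the
  # maximum lengths and the standard uppercase amino acid alphabet.
  VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")
  MAX_LENGTHS = [12, 7, 7, 22, 6, 7, 23]  # Peptide, CDR1a, CDR2a, CDR3a, CDR1b, CDR2b, CDR3b

  def validate_line(line: str) -> list:
      fields = line.strip().split(",")
      errors = []
      if len(fields) < 7:
          return [f"expected at least 7 comma-separated fields, got {len(fields)}"]
      for i, (seq, max_len) in enumerate(zip(fields[:7], MAX_LENGTHS), start=1):
          if len(seq) > max_len:
              errors.append(f"field {i} ('{seq}') exceeds the maximum length of {max_len}")
          if not set(seq) <= VALID_AA:
              errors.append(f"field {i} ('{seq}') contains non-standard or lowercase characters")
      return errors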

Submission

  1. Paste the peptide and CDR sequences into the box (A1), load an example file (A2), or load a file from your local machine (A3). The input file should be a text or .csv file with no headers for the columns.

  2. The predictions are scaled via similarity to known binders, which has been shown to improve performance. This scaling is done via the TCRbase similarity raised to the power of α (default: 10).

  3. This default scaling factor can be changed (B); setting it to 0 turns off the TCRbase scaling. If no model exists for a peptide, the TCRbase scaling is not performed.

  4. It is also possible to filter the shown output, so that only observations with a percentile rank at or below a given threshold are reported (C). By default, all observations are shown regardless of percentile rank. If another threshold is selected, observations where the percentile rank could not be determined are still shown.
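
The filtering described in point 4 can also be reproduced on a downloaded result table with a few lines of pandas. This is a sketch only: the column name "percentile_rank" is an assumption, and rows where the rank could not be determined are kept, as described above.

  import pandas as pd

  def filter_by_rank(results: pd.DataFrame, threshold: float) -> pd.DataFrame:
      # Keep rows with a percentile rank at or below the threshold, plus rows
      # where the rank is missing (could not be determined).
      rank = results["percentile_rank"]
      return results[rank.isna() | (rank <= threshold)]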

Click the Submit button (D) when all the sequences have been entered, or press Clear Fields (E) to reset everything.


Abstract

The ability to predict binding between peptides presented by the Major Histocompatibility Complex (MHC) class I molecules and T-cell receptors (TCRs) is of great interest in areas of vaccine development, cancer treatment and treatment of autoimmune diseases. However, the scarcity of paired-chain data, combined with the bias towards a few well-studied epitopes, has challenged the development of pan-specific machine-learning (ML) models with accurate predictive power towards peptides characterized by little or no TCR data. To deal with this, we here benefit from a larger paired-chain peptide-TCR dataset and explore different ML model architectures and training strategies to better deal with imbalanced data. We show that while simple changes to the architecture and training result in greatly improved performance, particularly for peptides with little available data, predictions on unseen peptides remain challenging, especially for peptides distant from the training peptides. We also demonstrate that ML models can be used to detect potential outliers, and that the removal of such outliers from training further improves the overall performance. Furthermore, we show that a model combining the properties of pan-specific and peptide-specific models achieves improved performance, and that performance can be further improved by integrating similarity-based predictions, especially when a low false positive rate is desirable. Moreover, in the context of the IMMREP benchmark, this updated modeling framework achieved state-of-the-art performance. Finally, we show that combining all these approaches results in acceptable predictive accuracy for peptides characterized by as few as 15 positive TCRs. This observation thus places great promise on rapidly expanding the peptide coverage of the current models for predicting TCR specificity. The final NetTCR 2.2 models are available at https://github.com/mnielLab/NetTCR-2.2, and as a web server at https://services.healthtech.dtu.dk/services/NetTCR-2.2/.



GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0). If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
