DTU Health Tech

Department of Health Technology

TCRbase - 1.0

Sequence similarity-based prediction of peptide-TCR binding.

TCRbase-1.0: a similarity-based model to predict TCR specificity using CDR1, CDR2 and CDR3 loops.

Submit data

Paste in CDR sequences. One TCR sequence per line is required. The CDR loops should be tab-separated.
Alternatively, load an example or upload a file from your local machine.

Only amino acid input is accepted. For detailed instructions, see the Instructions tab above.

Test sequences submission

Paste the sequence(s):

... or load some sample data:

... or upload a local file:

Training sequences submission

Upload the TCRs database (example):

Weights on the CDR loops

Instructions for TCRbase-1.0

Input format

  • The server accepts only amino acid sequences and takes newline-separated TCR sequences, one TCR per line. Each line should contain the 6 CDR loops, tab-separated. The target can optionally be included as the last column.
  • Note that CDR1 and CDR2 sequences must be present in the input file even when only CDR3s are used. If they are not available, they can be filled with dummy values, such as "XXXXX".
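
The input format described above can be checked locally before submission. The following is a minimal sketch, not the server's own validation code: the function name `parse_tcr_lines`, the accepted alphabet (the 20 standard residues plus `X` for dummy loops) and the example sequences are illustrative assumptions.

```python
# Assumed alphabet: 20 standard amino acids plus X, which is used for dummy loops.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")


def parse_tcr_lines(text):
    """Parse newline-separated TCRs: 6 tab-separated CDR loops plus an optional target.

    Returns a list of (cdrs, target) tuples, where target is None if absent.
    """
    records = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) not in (6, 7):
            raise ValueError(
                f"line {line_no}: expected 6 or 7 tab-separated fields, got {len(fields)}"
            )
        cdrs = [field.strip().upper() for field in fields[:6]]
        for cdr in cdrs:
            if not cdr or not set(cdr) <= AMINO_ACIDS:
                raise ValueError(f"line {line_no}: invalid CDR sequence {cdr!r}")
        target = fields[6].strip() if len(fields) == 7 else None
        records.append((cdrs, target))
    return records


# Illustrative record: dummy CDR1/CDR2 loops, two CDR3 loops and a target column.
example = "XXXXX\tXXXXX\tCAVRDSNYQLIW\tXXXXX\tXXXXX\tCASSIRSSYEQYF\t1\n"
records = parse_tcr_lines(example)
```

The order of the six loops within a line is not fixed by this sketch; follow the example shown in the paste box.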


  1. Paste the test CDR sequence(s) into the box; an example of the input format (for both the box and file upload) is shown in the paste box;
  2. Alternatively, load an example test input or upload a test file from your local machine. Please refer to the input format description;
  3. Upload a training set from your local machine. This file should also follow the input format;
  4. Type the desired weights to use on the CDRs.
Click the submit button once the sequences have been entered.


After the server successfully finishes the job, a Server Output page shows up.
Computational time can range from a couple of seconds to several minutes depending on the queue and the sample size.
The output contains the 6 CDR loops from the query, the prediction, the best hit in the training set (the most similar TCR from the database) and the target value (if part of the input).

Output of TCRbase-1.0

After submitting a query and a database to TCRbase-1.0, an output will be shown. Each row of the output corresponds to a query TCR.
  1. The first six columns contain the CDR sequences of the query TCR;
  2. The column "prediction" contains the TCRbase-1.0 prediction, that is, the kernel similarity score between the query TCR and its nearest neighbour in the database;
  3. If the input query contains a column "target", it is also shown in the output;
  4. The last six columns, named {chain}_db, contain the CDR loops of the hit in the database, i.e. the query TCR's nearest neighbour.
The output can also be downloaded as a .csv file.


TCRbase-1.0 is a similarity-based model used to predict TCR specificity. It is based solely on TCR similarities, under the assumption that similar TCRs recognize the same epitope. Similarity is measured with the kernel similarity [1]. This measure assigns a similarity score to a pair of sequences by comparing all their k-mers, with k = 1, ..., 30. For a fixed value of k, the BLOSUM62 score of every k-mer from the first sequence against every k-mer from the second sequence is computed. The similarity score is then the sum of these BLOSUM62 scores over all values of k.
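
The k-mer comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the TCRbase-1.0 implementation: `toy_substitution` is a hypothetical stand-in for a BLOSUM62-derived residue score, the score of a k-mer pair is assumed to be the product of its per-position residue scores, and the normalisation in `kernel_similarity` is an added assumption that maps identical sequences to 1.

```python
import math


def toy_substitution(a, b):
    # Hypothetical stand-in for a BLOSUM62-derived residue score (assumption).
    return 1.0 if a == b else 0.1


def kmer_kernel(seq1, seq2, sub=toy_substitution, max_k=30):
    """Sum the scores of all k-mer pairs between seq1 and seq2 for k = 1..max_k."""
    total = 0.0
    longest = min(max_k, len(seq1), len(seq2))
    for k in range(1, longest + 1):
        for i in range(len(seq1) - k + 1):
            for j in range(len(seq2) - k + 1):
                # Assumed k-mer pair score: product of per-position residue scores.
                score = 1.0
                for a, b in zip(seq1[i:i + k], seq2[j:j + k]):
                    score *= sub(a, b)
                total += score
    return total


def kernel_similarity(seq1, seq2, sub=toy_substitution, max_k=30):
    """Kernel score normalised so that a sequence compared with itself scores 1 (assumption)."""
    denom = math.sqrt(
        kmer_kernel(seq1, seq1, sub, max_k) * kmer_kernel(seq2, seq2, sub, max_k)
    )
    return kmer_kernel(seq1, seq2, sub, max_k) / denom if denom > 0 else 0.0


identical = kernel_similarity("CASSL", "CASSL")
close = kernel_similarity("CASSL", "CASSI")
distant = kernel_similarity("CASSL", "GGGGG")
```

With these toy scores, `identical` is 1 and `close` is larger than `distant`. Replacing `toy_substitution` with a real BLOSUM62-based function changes the values but not the structure of the computation.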

TCRbase-1.0 requires a training set (database) of TCRs and a test set (query) containing the TCRs to predict. Each TCR in the query is scored against the database using the kernel similarity score. The prediction for a given TCR in the test set is the similarity score to its nearest neighbour in the training set. For the CDR3 model, the similarity score is the average of the similarities of the alpha and beta chains. When CDR1 and CDR2 are added to the model, the overall similarity is a weighted average of the similarities of each of the 6 CDR loops (3 for the alpha chain and 3 for the beta chain). Previous studies suggest that the CDR3s should be weighted four times higher than the CDR1s and CDR2s.
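
The weighted nearest-neighbour step can be sketched as follows. The code is a hedged illustration rather than the server's implementation: `placeholder_similarity` uses `difflib.SequenceMatcher` as a simple stand-in for the kernel similarity, the function names are hypothetical, and the column order (CDR1, CDR2, CDR3 of the alpha chain followed by those of the beta chain) is assumed.

```python
from difflib import SequenceMatcher


def placeholder_similarity(a, b):
    # Stand-in for the kernel similarity (assumption); returns a value in [0, 1].
    return SequenceMatcher(None, a, b).ratio()


# Assumed column order: alpha CDR1, CDR2, CDR3, then beta CDR1, CDR2, CDR3.
# CDR3 loops are weighted four times higher than CDR1 and CDR2 loops.
DEFAULT_WEIGHTS = (1, 1, 4, 1, 1, 4)


def tcr_similarity(query, hit, weights=DEFAULT_WEIGHTS, sim=placeholder_similarity):
    """Weighted average of the per-loop similarities between two TCRs."""
    total = sum(w * sim(q, h) for w, q, h in zip(weights, query, hit))
    return total / sum(weights)


def predict(query, database, weights=DEFAULT_WEIGHTS, sim=placeholder_similarity):
    """Return (score, best_hit): the similarity to the nearest neighbour and that neighbour."""
    best_hit = max(database, key=lambda tcr: tcr_similarity(query, tcr, weights, sim))
    return tcr_similarity(query, best_hit, weights, sim), best_hit


# Illustrative database with dummy CDR1 and CDR2 loops.
database = [
    ["XXXXX", "XXXXX", "CAVRDSNYQLIW", "XXXXX", "XXXXX", "CASSIRSSYEQYF"],
    ["XXXXX", "XXXXX", "CAGGGSQGNLIF", "XXXXX", "XXXXX", "CASSLGQAYEQYF"],
]
query = ["XXXXX", "XXXXX", "CAVRDSNYQLIW", "XXXXX", "XXXXX", "CASSIRSSYEQYF"]
score, best_hit = predict(query, database)

# CDR3-only model: zero weight on CDR1 and CDR2, plain average of alpha and beta CDR3.
cdr3_score, cdr3_hit = predict(query, database, weights=(0, 0, 1, 0, 0, 1))
```

Setting the CDR1 and CDR2 weights to zero reduces the weighted average to the plain average of the alpha and beta CDR3 similarities, which matches the CDR3 model described above.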

[1] Shen, W.-J., Wong, H.-S., Xiao, Q.-W., Guo, X. & Smale, S. Towards a mathematical foundation of immunology and amino acid chains. arXiv:1205.6031 (2012).


If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: