TCRbase - 1.0
Sequence similarity-based prediction of peptide-TCR binding.
TCRbase-1.0: a similarity-based model to predict TCR specificity using CDR1, CDR2 and CDR3 loops.
Instructions for TCRbase-1.0
Input format
- The server accepts amino acid sequences only. TCRs are given one per line (newline-separated). Each line of the input file should contain the 6 CDR loops, tab-separated. The input file can also contain the target as the last column.
- Note that, even when only CDR3s are used, CDR1 and CDR2 sequences must be present in the input file. If these are not available, they can be filled with dummy values, such as "XXXXX".
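As an illustration of the format described above, the following minimal Python sketch builds and parses one input line. The column order (alpha CDR1-3 followed by beta CDR1-3), the names `A1` ... `B3`, the helper functions and the example sequences are assumptions made for this sketch; they are not specified by the server instructions.

```python
DUMMY = "XXXXX"  # placeholder suggested in the instructions for unavailable loops

# Assumed column order (not stated in the instructions): alpha CDR1-3, then beta CDR1-3.
COLUMNS = ["A1", "A2", "A3", "B1", "B2", "B3"]


def format_row(cdrs, target=None):
    """Build one tab-separated input line from a dict of available CDR loops."""
    # Missing or empty loops are replaced by the dummy value.
    fields = [cdrs.get(name) or DUMMY for name in COLUMNS]
    if target is not None:
        fields.append(str(target))  # the optional target goes in the last column
    return "\t".join(fields)


def parse_row(line):
    """Split one input line into its six CDR loops and the optional target."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (6, 7):
        raise ValueError(f"expected 6 or 7 tab-separated fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields[:6]))
    row["target"] = fields[6] if len(fields) == 7 else None
    return row


# A TCR for which only the CDR3s are known (illustrative sequences).
line = format_row({"A3": "CAVRDSNYQLIW", "B3": "CASSIRSSYEQYF"}, target=1)
print(line)
```

Writing one such line per TCR gives a file in the newline-separated, tab-separated layout described above.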
Submission
- Paste the test CDR sequence(s) into the box; an example of the input format (for both the box and file upload) is shown in the paste box;
- Alternatively, load a test example input or upload a test file from your local machine. Please refer to the input format description;
- Upload a training set from your local machine. This file should also follow the input format;
- Type the desired weights to use on the CDRs.
Output
After the server successfully finishes the job, a Server Output page shows up. Computation time can range from a couple of seconds to several minutes, depending on the queue and the sample size.
The output contains the 6 CDR loops from the query, the prediction, the best hit in the training set (the most similar TCR from the database) and the target value (if part of the input).
Output of TCRbase-1.0
After submitting a query and a database to TCRbase-1.0, an output will be shown. Each row of the output corresponds to a query TCR.
- The first six columns contain the CDR sequences of the query TCR;
- The column "prediction" contains the TCRbase-1.0 prediction, that is, the kernel similarity score between the query TCR and its nearest neighbour in the database;
- If the input query contains a column "target", this is also shown in the output;
- The last six columns, named {chain}_db, contain the CDR loops of the hit in the database, i.e. the query TCR's nearest neighbour.
Description
TCRbase-1.0 is a similarity-based model used to predict TCR specificity. It is based solely on TCR similarities, under the assumption that similar TCRs recognize the same epitope. The kernel similarity measure [1] is used. This measure assigns a similarity score to a pair of sequences by comparing all their k-mers, with k = 1,...,30. For a fixed value of k, the BLOSUM62 score of each k-mer from the first sequence against each k-mer from the second sequence is computed. The similarity score is then the sum of all these BLOSUM62 scores over all values of k.
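The k-mer comparison described above can be sketched in Python as follows. This is a minimal illustration, not the TCRbase-1.0 implementation. It makes three assumptions: the score of two k-mers is taken to be the product of their per-position residue scores; `toy_score` (1 for identical residues, 0 otherwise) stands in for a BLOSUM62-derived residue score; and `normalized_similarity` adds a normalisation that the description above does not mention. Reference [1] should be consulted for how BLOSUM62 is turned into residue-level scores.

```python
from math import prod


def toy_score(a, b):
    """Stand-in for a BLOSUM62-derived score: 1.0 for identical residues, else 0.0."""
    return 1.0 if a == b else 0.0


def kmers(seq, k):
    """Return all contiguous substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kernel_similarity(s, t, score=toy_score, max_k=30):
    """Sum the k-mer against k-mer scores for k = 1..max_k.

    Assumption: the score of two k-mers is the product of their
    per-position residue scores.
    """
    total = 0.0
    # k cannot exceed the length of the shorter sequence.
    for k in range(1, min(max_k, len(s), len(t)) + 1):
        for u in kmers(s, k):
            for v in kmers(t, k):
                total += prod(score(a, b) for a, b in zip(u, v))
    return total


def normalized_similarity(s, t, score=toy_score, max_k=30):
    """Assumed normalisation, so that a sequence compared with itself scores 1.0."""
    denom = (kernel_similarity(s, s, score, max_k)
             * kernel_similarity(t, t, score, max_k)) ** 0.5
    return kernel_similarity(s, t, score, max_k) / denom


print(kernel_similarity("AB", "AB"))      # 3.0 with the toy score
print(normalized_similarity("AB", "AC"))
```

With the toy score, "AB" against "AB" gives 3.0: two matching 1-mers (A-A, B-B) plus one matching 2-mer (AB-AB).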
TCRbase-1.0 requires a training set (database) of TCRs and a test set (query) containing the TCRs to predict. Each TCR in the query is scored against the database using the kernel similarity score. The prediction for a given TCR in the test set is then its similarity score to the nearest neighbour in the training set. For the CDR3 model, the similarity score is the average of the alpha-chain and beta-chain similarities. When CDR1 and CDR2 are added to the model, the overall similarity is a weighted average of the similarities of the 6 CDR loops (3 for the alpha chain and 3 for the beta chain). Previous studies suggest that the CDR3s should be weighted four times higher than the CDR1s and CDR2s.
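The weighted nearest-neighbour scoring described above can be sketched as follows. This is a hedged illustration only: the loop names, the helper functions and the example sequences are hypothetical, and `loop_similarity` uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the kernel similarity score, so that the block runs on its own.

```python
from difflib import SequenceMatcher

LOOPS = ["A1", "A2", "A3", "B1", "B2", "B3"]  # hypothetical loop names

# CDR3s weighted four times higher than CDR1s and CDR2s.
WEIGHTS = {"A1": 1, "A2": 1, "A3": 4, "B1": 1, "B2": 1, "B3": 4}

# CDR3-only model: plain average of the alpha and beta CDR3 similarities.
CDR3_ONLY = {"A1": 0, "A2": 0, "A3": 1, "B1": 0, "B2": 0, "B3": 1}


def loop_similarity(a, b):
    """Stand-in for the kernel similarity: difflib's ratio, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def tcr_similarity(query, other, weights=WEIGHTS):
    """Weighted average of the per-loop similarities of two TCRs."""
    total_weight = sum(weights[loop] for loop in LOOPS)
    weighted = sum(weights[loop] * loop_similarity(query[loop], other[loop])
                   for loop in LOOPS)
    return weighted / total_weight


def predict(query, database, weights=WEIGHTS):
    """Return (score, hit): the similarity to the nearest neighbour and that neighbour."""
    best = max(database, key=lambda tcr: tcr_similarity(query, tcr, weights))
    return tcr_similarity(query, best, weights), best


# Illustrative database and query (sequences chosen only for the example).
database = [
    {"A1": "TSGFNG", "A2": "NVLDGL", "A3": "CAVRDSNYQLIW",
     "B1": "SGHRS", "B2": "YFSETQ", "B3": "CASSIRSSYEQYF"},
    {"A1": "DSAIYN", "A2": "IQSSQRE", "A3": "CAVNTGNQFYF",
     "B1": "MNHEY", "B2": "SVGAGI", "B3": "CASSLGQAYEQYF"},
]
query = dict(database[0])

score, hit = predict(query, database)
print(score, hit["B3"])
```

Passing `weights=CDR3_ONLY` reduces the score to the average of the alpha and beta CDR3 similarities, which mirrors the CDR3 model described above.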
[1] Shen, W.-J., Wong, H.-S., Xiao, Q.-W., Guo, X. & Smale, S. Towards a mathematical foundation of immunology and amino acid chains. arXiv:1205.6031 (2012).