Predicting epitopes recognized by cytotoxic T cells has been a long-standing challenge within the field of immuno- and bioinformatics. While reliable predictions of peptide binding are available for most HLA class I alleles, the accuracy of models predicting the interaction between T cell receptors (TCRs) and HLA-peptide complexes remains poor, owing to a lack of training data. Recent sequencing projects have generated a considerable amount of data relating TCR sequences to their cognate HLA-peptide complex targets.

Here, we use such data to train a sequence-based predictor of the interaction between TCRs and peptides presented by HLA-A*02:01. The model is based on convolutional neural networks, which are well suited to handling the large length variations of TCRs. We show that such a sequence-based model allows for the identification of TCRs binding a given cognate peptide-HLA target out of a large pool of non-binding TCRs.
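
To illustrate why convolutional architectures cope with variable-length input, the following minimal sketch slides a set of filters along a one-hot encoded sequence and keeps only each filter's maximum activation, so the output has one value per filter regardless of sequence length. This is not the model described above: the filter weights are random and untrained, the filter count and width are arbitrary, the two example sequences are made up for illustration, and the helper names `one_hot` and `conv1d_global_max` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Encode a sequence as a list of 20-dimensional one-hot vectors."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def conv1d_global_max(encoded, filters):
    """Slide each filter along the sequence and keep its maximum activation.

    `filters` is a list of filters; each filter is a list of `width`
    weight vectors of length 20. One value is returned per filter, so
    the output size does not depend on the sequence length.
    """
    features = []
    for filt in filters:
        width = len(filt)
        best = float("-inf")
        for start in range(len(encoded) - width + 1):
            activation = sum(
                w * x
                for offset in range(width)
                for w, x in zip(filt[offset], encoded[start + offset])
            )
            best = max(best, activation)
        features.append(best)
    return features


rng = random.Random(0)
# Assumption: 4 untrained filters of width 3, chosen only for illustration.
filters = [
    [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(3)]
    for _ in range(4)
]

# Made-up sequences of different lengths (not taken from any dataset).
short_tcr = "CASSLGQYF"
long_tcr = "CASSPGTSGGRADTQYF"

short_features = conv1d_global_max(one_hot(short_tcr), filters)
long_features = conv1d_global_max(one_hot(long_tcr), filters)
```

Both feature vectors have the same length (one entry per filter), which is what lets a downstream layer compare receptors of different lengths without padding them to a common size.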