References


Main references:


Gapped sequence alignment using artificial neural networks: application to the MHC class I system
Massimo Andreatta1 and Morten Nielsen1,2

Bioinformatics, 32(4):511-7, Feb 15, 2016.

1Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martin, San Martín, Buenos Aires, Argentina
2Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark

Motivation: Many biological processes are guided by receptor interactions with linear ligands of variable length. One such receptor is the MHC class I molecule. The length preferences vary depending on the MHC allele, but are generally limited to peptides of length 8 to 11 amino acids. On this relatively simple system, we developed a sequence alignment method based on artificial neural networks that allows insertions and deletions in the alignment.
Results: We show that prediction methods based on alignments that include insertions and deletions have significantly higher performance than methods trained on peptides of single lengths. Also, we illustrate how the location of deletions can aid the interpretation of the modes of binding of the peptide-MHC, as in the case of long peptides bulging out of the MHC groove or protruding at either terminus. Finally, we demonstrate that the method can learn the length profile of different MHC molecules, and quantify the reduction of the experimental effort required to identify potential epitopes using our prediction algorithm.
Availability: The NetMHC-4.0 method for the prediction of peptide-MHC class I binding affinity using gapped sequence alignment is publicly available at: http://www.cbs.dtu.dk/services/NetMHC-4.0.
Contact: mniel@cbs.dtu.dk
Supplementary information: Supplementary data are available at Bioinformatics online.

PMID: 26515819


NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8–11
Lundegaard C1, Lamberth K2, Harndahl M2, Buus S2, Lund O1, Nielsen M1

Nucleic Acids Research, 36(suppl 2):W509-W512, 2008.

1Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
2Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark

NetMHC-3.0 is trained on a large amount of quantitative peptide data using both affinity data from the Immune Epitope Database and Analysis Resource (IEDB) and elution data from SYFPEITHI. The method generates high-accuracy predictions of major histocompatibility complex (MHC): peptide binding. The predictions are based on artificial neural networks trained on data from 55 MHC alleles (43 human and 12 non-human), and position-specific scoring matrices (PSSMs) for an additional 67 HLA alleles. As only the MHC class I prediction server is available, predictions are possible for peptides of length 8–11 for all 122 alleles. Artificial neural network predictions are given as actual IC50 values whereas PSSM predictions are given as log-odds likelihood scores. The output is optionally available as download for easy post-processing. The training method underlying the server is the best available, and has been used to predict possible MHC-binding peptides in a series of pathogen viral proteomes including SARS, Influenza and HIV, resulting in an average of 75–80% confirmed MHC binders. Here, the performance is further validated and benchmarked using a large set of newly published affinity data, non-redundant to the training set.

PMID: 18463140   (full text version available)


Reliable prediction of T-cell epitopes using neural networks with novel sequence representations.
Nielsen M1, Lundegaard C1, Worning P1, Lauemoller SL2, Lamberth K2, Buus S2, Brunak S1, Lund O1

Protein Sci., 12:1007-17, 2003.

1Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
2Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark

In this paper we describe an improved neural network method to predict T-cell class I epitopes. A novel input representation has been developed consisting of a combination of sparse encoding, Blosum encoding, and input derived from hidden Markov models. We demonstrate that the combination of several neural networks derived using different sequence-encoding schemes has a performance superior to neural networks derived using a single sequence-encoding scheme. The new method is shown to have a performance that is substantially higher than that of other methods. By use of mutual information calculations we show that peptides that bind to the HLA A*0204 complex display signal of higher order sequence correlations. Neural networks are ideally suited to integrate such higher order correlations when predicting the binding affinity. It is this feature combined with the use of several neural networks derived from different and novel sequence-encoding schemes and the ability of the neural network to be trained on data consisting of continuous binding affinities that gives the new method an improved performance. The difference in predictive performance between the neural network methods and that of the matrix-driven methods is found to be most significant for peptides that bind strongly to the HLA molecule, confirming that the signal of higher order sequence correlation is most strongly present in high-binding peptides. Finally, we use the method to predict T-cell epitopes for the genome of hepatitis C virus and discuss possible applications of the prediction method to guide the process of rational vaccine design.

PMID: 12717023 (PMCID: PMC2323871)   (full text version available)


Sensitive quantitative predictions of peptide-MHC binding by a 'Query by Committee' artificial neural network approach.
Buus S1, Lauemoller SL1, Worning P2, Kesmir C2, Frimurer T2, Corbet S3, Fomsgaard A3, Hilden J4, Holm A5, Brunak S2.

Tissue Antigens, 62:378-84, 2003.

1Division of Experimental Immunology, Institute of Medical Microbiology and Immunology, University of Copenhagen, Denmark
2Center for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark
3Department of Virology, State Serum Institute, Denmark
4Department of Biostatistics, University of Copenhagen, Denmark
5Research Center for Medical Biotechnology, Chemistry Department, Royal Veterinary and Agricultural University, Denmark

We have generated Artificial Neural Networks (ANN) capable of performing sensitive, quantitative predictions of peptide binding to the MHC class I molecule, HLA-A*0204. We have shown that such quantitative ANN are superior to conventional classification ANN that have been trained to predict binding vs non-binding peptides. Furthermore, quantitative ANN allowed a straightforward application of a 'Query by Committee' (QBC) principle whereby particularly information-rich peptides could be identified and subsequently tested experimentally. Iterative training based on QBC-selected peptides considerably increased the sensitivity without compromising the efficiency of the prediction. This suggests a general, rational and unbiased approach to the development of high quality predictions of epitopes restricted to this and other HLA molecules. Due to their quantitative nature, such predictions will cover a wide range of MHC-binding affinities of immunological interest, and they can be readily integrated with predictions of other events involved in generating immunogenic epitopes. These predictions have the capacity to perform rapid proteome-wide searches for epitopes. Finally, it is an example of an iterative feedback loop whereby advanced, computational bioinformatics optimize experimental strategy, and vice versa.

PMID: 14617044   (full text version available)