Signal peptides have long been one of the hot topics in bioinformatics.
Their importance was emphasized in 1999, when Günter Blobel received
the Nobel Prize in Physiology or Medicine "for the discovery that
proteins have intrinsic signals that govern their transport and
localization in the cell".
He pointed out the importance of defined peptide motifs for
targeting proteins to their site of function.
The press release can be read here.
For the biological background on protein localization, we refer to the
following pages:
Signal peptides
Signal anchors
Other secretory signals
Obtaining a clean and accurate data set for training and testing is a very important task in machine learning.
Bias and noise in the data set often lead to wrong predictions.
Description of data sets
Dataset extraction
Dataset cleanup
Sequence logos
Length distributions
Characteristics of signal peptides
Download the training sets
With the current growth of sequence databases and the speed of genome sequencing,
accurate prediction methods have become increasingly important.
For SignalP we have focused on neural networks as well as hidden Markov models.
Neural Networks
Hidden Markov Models
Any machine learning approach must be evaluated to test its predictive performance on sequences it has not seen before.
Performance of the current prediction method
Five-fold cross-validation
Independent test set by Menne
Signal anchor prediction
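
As a rough illustration of the five-fold cross-validation idea listed above, the following minimal sketch splits a list of sequence identifiers into five folds, so that each sequence is used for testing exactly once and for training four times. The function name `k_fold_splits` and the identifiers are hypothetical, and the round-robin split is an assumption made for simplicity. It is not the partitioning procedure used for SignalP, and it does not account for sequence similarity between folds.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 5) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    if k < 2 or k > len(items):
        raise ValueError("k must be between 2 and the number of items")
    # Round-robin assignment: fold i takes every k-th item starting at offset i.
    folds = [list(items[i::k]) for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data is everything outside the held-out fold.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical sequence identifiers, for illustration only.
sequence_ids = [f"seq{n:02d}" for n in range(10)]
splits = list(k_fold_splits(sequence_ids, k=5))

for train, test in splits:
    print(len(train), "train /", len(test), "test:", test)
```

In a real evaluation, a model would be trained on each `train` part and scored on the matching `test` part, and the five scores would then be averaged.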
The information on these pages was partly written by the initial creator of SignalP, Henrik Nielsen. It has been updated with new knowledge, but most of the biological background text originates from Henrik's work.