Promoter 2.0 predicts transcription start sites of vertebrate PolII promoters in DNA sequences.
It was developed by evolving simulated transcription factors that interact with sequences in promoter regions.
It builds on principles that are common to neural networks and genetic algorithms.
Sequence submission: paste the sequence(s) and/or upload a local file
At most 50 sequences and 1,500,000 nucleotides in total per submission.
The sequences are kept confidential and will be deleted after processing.
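The submission limits above can be checked locally before uploading. The following is a minimal sketch, assuming the input is FASTA-formatted text (the input format is not stated here); the names `parse_fasta` and `check_submission` are hypothetical helpers and not part of the server.

```python
MAX_SEQUENCES = 50           # limit stated above
MAX_NUCLEOTIDES = 1_500_000  # limit stated above


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text.

    Assumption: each record starts with a '>' header line.
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_submission(text):
    """Return (ok, n_sequences, n_nucleotides) for one submission."""
    records = parse_fasta(text)
    total = sum(len(seq) for _, seq in records)
    ok = 0 < len(records) <= MAX_SEQUENCES and total <= MAX_NUCLEOTIDES
    return ok, len(records), total
```

A submission that fails the check would need to be split into several smaller submissions.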
For publication of results, please cite:
Promoter 2.0: for the recognition of PolII promoter sequences.
Steen Knudsen, Bioinformatics 15, 356-361, 1999.
For each input sequence the name and length are first printed, followed
by a table in the form:

    Position  Score  Likelihood

where 'Position' is a position in the sequence, 'Score' is the prediction
score for a transcription start site occurring within 100 base pairs upstream
from that position and 'Likelihood' is a descriptive label associated with
that score. The scores are always positive numbers; they are labelled as
follows:

    0.5 - 0.8     Marginal prediction
    0.8 - 1.0     Medium likely prediction
    above 1.0     Highly likely prediction

Consult the performance notes for comments
on the prediction scores.
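The labelling of scores can be sketched as a small function. This is an illustration only, not the server's code: `likelihood_label` is a hypothetical name, the treatment of the shared boundary 0.8 (assigned to the higher category) is an assumption, and returning no label for scores below 0.5 is also an assumption, since no label is listed for that range.

```python
def likelihood_label(score):
    """Map a prediction score to its descriptive label.

    Assumptions: a score of exactly 0.8 is treated as 'Medium likely',
    and scores below 0.5 get no label (None).
    """
    if score > 1.0:
        return "Highly likely prediction"
    if score >= 0.8:
        return "Medium likely prediction"
    if score >= 0.5:
        return "Marginal prediction"
    return None


for s in (0.6, 0.9, 1.2):
    print(s, likelihood_label(s))
```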
The input sequence will be included in the output, preceding the predictions,
if "Full output" has been selected.
The accuracy of the software has been tested on a set of 100 vertebrate
promoters. The positions scoring 0.5-0.8 (Marginal predictions) contain
about 65% true transcription start sites within 100 base pairs upstream.
The positions scoring 0.8-1.0 (Medium likely predictions) are about 80%
true. Finally, the positions scoring above 1.0 (Highly likely predictions)
are about 95% true. On average, the software picks up about 80% of all PolII
promoters. These numbers are rough estimates based on a limited test set.
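As a rough worked example of how these estimates can be used, the sketch below sums the approximate fraction of true sites per score category to give an expected number of true transcription start sites among a set of predictions. The function names are hypothetical, the boundary handling at 0.8 is an assumption, and the result inherits the uncertainty of the rough estimates quoted above.

```python
def approx_true_fraction(score):
    """Approximate fraction of true sites for a score, per the estimates above.

    Assumption: scores below 0.5 are not counted as predictions (None).
    """
    if score > 1.0:
        return 0.95
    if score >= 0.8:
        return 0.80
    if score >= 0.5:
        return 0.65
    return None


def expected_true_sites(scores):
    """Expected number of true sites among the given prediction scores."""
    fractions = (approx_true_fraction(s) for s in scores)
    return sum(f for f in fractions if f is not None)


# One prediction in each category plus one score below 0.5:
# 0.65 + 0.80 + 0.95 = 2.40 expected true sites out of 3 predictions.
print(round(expected_true_sites([0.6, 0.9, 1.2, 0.3]), 2))
```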
For a favorable comparison of this software to other promoter prediction
Promoter 2.0: for the recognition of PolII promoter sequences. S. Knudsen.
Motivation: a new approach to the prediction of eukaryotic Pol II
promoters from DNA sequence takes advantage of a combination of elements
similar to neural networks and genetic algorithms to recognize a set of
discrete subpatterns with variable separation as one pattern, a promoter. The
neural networks use as input a small window of DNA sequence, as well as the
output of other neural networks. Through the use of genetic algorithms, the
weights in the neural networks are optimized to maximally discriminate between
promoters and non-promoters.
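The general idea of optimizing network weights with a genetic algorithm can be illustrated with a toy sketch. It is not the Promoter 2.0 implementation: it uses a single neuron instead of interconnected networks, a mutation-only genetic algorithm with elitist selection, and a hypothetical window size, toy sequences and parameter values chosen purely for illustration.

```python
import math
import random

BASES = "ACGT"
WINDOW = 4  # hypothetical window size, chosen for illustration


def one_hot(window):
    """Encode a DNA window as a flat list of 0/1 values (4 per base)."""
    return [1.0 if base == b else 0.0 for base in window for b in BASES]


def sequence_score(weights, seq):
    """Highest neuron output over all windows; the last weight is the bias."""
    best = -1.0
    for i in range(len(seq) - WINDOW + 1):
        x = one_hot(seq[i:i + WINDOW])
        activation = weights[-1] + sum(w * v for w, v in zip(weights, x))
        best = max(best, math.tanh(activation))
    return best


def fitness(weights, labelled):
    """Number of sequences classified correctly (score > 0 means promoter)."""
    return sum((sequence_score(weights, s) > 0) == is_promoter
               for s, is_promoter in labelled)


def evolve(labelled, generations=50, pop_size=20, seed=0):
    """Mutation-only genetic algorithm; the best half survives unchanged."""
    rng = random.Random(seed)
    n_weights = 4 * WINDOW + 1
    population = [[rng.gauss(0, 1) for _ in range(n_weights)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, labelled), reverse=True)
        parents = population[:pop_size // 2]
        children = [[w + rng.gauss(0, 0.3) for w in rng.choice(parents)]
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda w: fitness(w, labelled))


# Toy data: sequences containing "TATA" are labelled as promoters.
data = [("GGTATAGG", True), ("CCTATACC", True),
        ("GGGGCCCC", False), ("ACGTACGT", False)]
best = evolve(data)
print(fitness(best, data), "of", len(data), "classified correctly")
```

Because the best individuals are carried over unchanged, the best fitness in the population never decreases from one generation to the next.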
Results: after several thousand generations of optimization, the
algorithm was able to discriminate between vertebrate promoter and non-promoter
sequences in a test set with a correlation coefficient of 0.63. In addition,
all five known transcription start sites on the plus strand of the complete
Adenovirus genome were within 161 bp of 35 predicted transcription start
sites. On standardized test sets consisting of human genomic DNA, the
performance of Promoter 2.0 compares well with other software developed for the