Motivation: A new approach to predicting eukaryotic Pol II promoters from DNA
sequence combines elements similar to neural networks and genetic algorithms
to recognize a set of discrete subpatterns with variable separation as a
single pattern, a promoter. The
neural networks take as input a small window of DNA sequence, as well as the
output of other neural networks. Genetic algorithms optimize the weights in
the neural networks to maximally discriminate between promoters and
non-promoters.
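As a rough illustration of the idea above, the following Python sketch evolves the weights of two small sigmoid units, the second of which reads both a sequence window and the first unit's output. It is a toy under stated assumptions, not the Promoter 2.0 implementation: the window size, one-hot encoding, accuracy-based fitness, population settings, planted TATA-like motif, and every function name are hypothetical, and the windows are fixed rather than variably separated.

```python
import math
import random

BASES = "ACGT"
WINDOW = 6  # assumed window length, chosen only for illustration


def one_hot(window):
    """Encode a DNA window as a flat list of 0/1 values (4 per base)."""
    return [1.0 if base == b else 0.0 for base in window for b in BASES]


def unit_output(weights, inputs):
    """Sigmoid unit; the last weight acts as the bias."""
    total = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
    total = max(-50.0, min(50.0, total))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-total))


def predict(genome, seq):
    """Second unit sees its own window plus the first unit's output."""
    n = 4 * WINDOW
    first = unit_output(genome[:n + 1], one_hot(seq[:WINDOW]))
    second_inputs = one_hot(seq[WINDOW:2 * WINDOW]) + [first]
    return unit_output(genome[n + 1:], second_inputs)


def fitness(genome, data):
    """Fraction of sequences classified correctly (an assumed objective)."""
    correct = sum((predict(genome, seq) > 0.5) == label for seq, label in data)
    return correct / len(data)


def evolve(data, generations=50, pop_size=20, seed=0):
    """Simple genetic algorithm: keep the best half, breed the rest."""
    rng = random.Random(seed)
    size = 2 * 4 * WINDOW + 3  # weights and biases of both units
    population = [[rng.gauss(0, 1) for _ in range(size)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, data), reverse=True)
        history.append(fitness(population[0], data))
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            # Uniform crossover followed by small Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, 0.2)
                     for pair in zip(mum, dad)]
            children.append(child)
        population = parents + children
    best = max(population, key=lambda g: fitness(g, data))
    return best, history


def toy_data(n=40, seed=1):
    """Synthetic sequences; every other one carries a planted motif."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        seq = "".join(rng.choice(BASES) for _ in range(2 * WINDOW))
        is_promoter = i % 2 == 0
        if is_promoter:
            seq = "TATAAA" + seq[WINDOW:]  # hypothetical TATA-like signal
        data.append((seq, is_promoter))
    return data


if __name__ == "__main__":
    data = toy_data()
    best, history = evolve(data)
    print("best training accuracy per generation:", history)
```

Because the better half of each population is carried over unchanged, the best training accuracy never decreases from one generation to the next.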
Results: After several thousand generations of optimization, the
algorithm was able to discriminate between vertebrate promoter and non-promoter
sequences in a test set with a correlation coefficient of 0.63. In addition,
all five known transcription start sites on the plus strand of the complete
Adenovirus genome were within 161 bp of 35 predicted transcription start
sites. On standardized test sets consisting of human genomic DNA, the
performance of Promoter 2.0 compares well with other software developed for the
same purpose.
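The abstract does not state which correlation coefficient is reported. Assuming it is the Matthews correlation coefficient often used to score binary sequence classifiers, a minimal sketch of the calculation follows. The function name and the counts in the usage lines are hypothetical and are not taken from the reported results.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Assumption: this is the coefficient meant in the text; the source
    does not say so explicitly.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # conventional value when any marginal total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only.
print(matthews_cc(10, 10, 0, 0))  # perfect prediction gives 1.0
print(matthews_cc(5, 5, 5, 5))    # chance-level prediction gives 0.0
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right), so it remains informative when promoters are much rarer than non-promoters.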