DTU Health Tech

Department of Health Technology

Promoter 2.0

Transcription start sites in vertebrate DNA


Promoter 2.0 predicts transcription start sites of vertebrate PolII promoters in DNA sequences. It was developed by evolving simulated transcription factors that interact with sequences in promoter regions, and it builds on principles common to neural networks and genetic algorithms.



Submission


Sequence submission: paste the sequence(s) and/or upload a local file

Paste a single sequence or several sequences in FASTA format into the field below:

Submit a file in FASTA format directly from your local disk:

Full output


Restrictions
At most 50 sequences and 1,500,000 nucleotides in total per submission.
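Before submitting, a FASTA file can be checked against these limits locally. The following is a minimal Python sketch, assuming standard FASTA (header lines start with ">", sequences may span several lines). The function names parse_fasta and check_submission_limits are hypothetical, and the sketch counts every non-whitespace sequence character, which may differ from how the service itself counts nucleotides.

```python
# Limits stated in the Restrictions section.
MAX_SEQUENCES = 50
MAX_TOTAL_NUCLEOTIDES = 1_500_000


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Sequence lines that appear before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission_limits(text):
    """Hypothetical pre-flight check: return a list of problems (empty if OK)."""
    records = parse_fasta(text)
    total = sum(len(seq) for _, seq in records)
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQUENCES})")
    if total > MAX_TOTAL_NUCLEOTIDES:
        problems.append(f"{total} nucleotides (limit {MAX_TOTAL_NUCLEOTIDES})")
    return problems
```

An empty list means both limits are respected; otherwise each entry names the limit that was exceeded.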
Confidentiality
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

Promoter 2.0: for the recognition of PolII promoter sequences.
Steen Knudsen
Bioinformatics 15, 356-361, 1999.

Output format


DESCRIPTION

For each input sequence the name and length are first printed, followed by a table in the form:

Position Score Likelihood

where 'Position' is a position in the sequence, 'Score' is the prediction score for a transcription start site occurring within 100 base pairs upstream from that position and 'Likelihood' is a descriptive label associated with that score. The scores are always positive numbers; they are labelled as follows:

Score        Likelihood
below 0.5    ignored
0.5 - 0.8    Marginal prediction
0.8 - 1.0    Medium likely prediction
above 1.0    Highly likely prediction
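The labelling rule and the meaning of 'Position' can be written down as a small Python sketch. The table does not say which label a score of exactly 0.8 receives; this sketch assumes it counts as a Medium likely prediction. The window convention (the 100 bp ending at the reported position, in 1-based inclusive coordinates, clipped at the sequence start) is likewise an assumption for illustration, and the function names are hypothetical.

```python
def likelihood_label(score):
    """Map a Promoter 2.0 score to its descriptive label (None if ignored)."""
    if score < 0.5:
        return None  # "below 0.5": ignored
    if score < 0.8:
        return "Marginal prediction"
    if score <= 1.0:
        return "Medium likely prediction"  # assumption: exactly 0.8 counts here
    return "Highly likely prediction"  # "above 1.0"


def upstream_window(position, window=100):
    """Return the 1-based, inclusive (start, end) range up to `window` bp
    upstream of `position`, clipped at the sequence start (assumed convention)."""
    return max(1, position - window), position
```

For the example prediction further down this page (position 600, score 1.063), likelihood_label(1.063) gives "Highly likely prediction", matching the service's label, and upstream_window(600) gives (500, 600).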

Consult the performance notes for comments on the prediction scores.

If "Full output" has been selected, the input sequence is included in the output, preceding the predictions.


EXAMPLE OUTPUT


Promoter 2.0 Prediction Results 

INPUT SEQUENCE:

>gi_209811_gb_J01917_ADRCG Adenovirus type 2, complete genome.
CATCATCATAATATACCTTATTTTGGATTGAAGCCAATATGATAATGAGGGGGTGGAGTT
TGTGACGTGGCGCGGGGCGTGGGAACGGGGCGGGTGACGTAGTAGTGTGGCGGAAGTGTG
ATGTTGCAAGTGTGGCGGAACACATGTAAGCGCCGGATGTGGTAAAAGTGACGTTTTTGG
TGTGCGCCGGTGTATACGGGAAGTGACAATTTTCGCGCGGTTTTAGGCGGATGTTGTAGT
AAATTTGGGCGTAACCAAGTAATGTTTGGCCATTTTCGCGGGAAAACTGAATAAGAGGAA
GTGAAATCTGAATAATTCTGTGTTACTCATAGCGCGTAATATTTGTCTAGGGCCGCGGGG
ACTTTGACCGTTTACGTGGAGACTCGCCCAGGTGTTTTTCTCAGGTGTTTTCCGCGTTCC
GGGTCAAAGTTGGCGTTTTATTATTATAGTCAGCTGACGCGCAGTGTATTTATACCCGGT
GAGTTCCTCAAGAGGCCACTCTTGAGTGCCAGCGAGTAGAGTTTTCTCCTCCGAGCCGCT
CCGACACCGGGACTGAAAATGAGACATATTATCTGCCACGGAGGTGTTATTACCGAAGAA
ATGGCCGCCAGTCTTTTGGACCAGCTGATCGAAGAGGTACTGGCTGATAATCTTCCACCT
CCTAGCCATTTTGAACCACCTACCCTTCACGAACTGTATGATTTAGACGTGACGGCCCCC
GAAGATCCCAACGAGGAGGCGGTTTCGCAGATTTTTCCCGAGTCTGTAATGTTGGCGGTG
CAGGAAGGGATTGACTTATTCACTTTTCCGCCGGCGCCCGGTTCTCCGGAGCCGCCTCAC
CTTTCCCGGCAGCCCGAGCAGCCGGAGCAGAGAGCCTTGGGTCCGGTTTCTATGCCAAAC
CTTGTGCCGGAGGTGATCGATCTTACCTGCCACGAGGCTGGCTTTCCACCCAGTGACGAC
GAGGATGAAGAGGGTGAGGAGTTTGTGTTAGATTATGTGGAGCACCCCGGGCACGGTTGC
AGGTCTTGTCATTATCACCGGAGGAATACGGGGGACCCAGATATTATGTGTTCGCTTTGC
TATATGAGGACCTGTGGCATGTTTGTCTACAGTAAGTGAAAATTATGGGCAGTCGGTGAT
AGAGTGGTGGGTTTGGTGTGGTAATTTTTTTTTAATTTTTACAGTTTTGTGGTTTAAAGA


PREDICTED TRANSCRIPTION START SITES:

gi_209811_gb_J01917_ADRCG Adenovirus type 2, complete genome., 1200 nucleotides

  Position  Score  Likelihood
       600  1.063  Highly likely prediction
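To use the predictions in downstream analysis, the table can be parsed from the plain-text output. The minimal Python sketch below assumes the layout shown in the example above (a "Position  Score  Likelihood" header followed by whitespace-separated rows, one table per sequence); the function name parse_predictions is hypothetical and the parser has not been checked against every output variant of the service.

```python
import re

# One prediction row: integer position, decimal score, free-text label.
ROW = re.compile(r"^\s*(\d+)\s+(\d+(?:\.\d+)?)\s+(\S.*?)\s*$")


def parse_predictions(output_text):
    """Return (position, score, likelihood) tuples from Promoter 2.0 output."""
    predictions = []
    in_table = False
    for line in output_text.splitlines():
        if line.split() == ["Position", "Score", "Likelihood"]:
            in_table = True  # header of a prediction table
            continue
        if not in_table:
            continue
        match = ROW.match(line)
        if match:
            pos, score, label = match.groups()
            predictions.append((int(pos), float(score), label))
        elif line.strip():
            in_table = False  # any other non-blank line ends the current table
    return predictions
```

Applied to the example output, this returns a single tuple, (600, 1.063, "Highly likely prediction"). Predictions from several sequences are concatenated into one list; keep track of the sequence name lines as well if they need to be separated.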

Performance notes



The accuracy of the software was tested on a set of 100 vertebrate promoters. About 65% of the positions scoring 0.5-0.8 (Marginal predictions) have a true transcription start site within 100 base pairs upstream. For positions scoring 0.8-1.0 (Medium likely predictions) the figure is about 80%, and for positions scoring above 1.0 (Highly likely predictions) it is about 95%. On average, the software detects about 80% of all PolII promoters. These numbers are rough estimates based on a limited test set.
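The per-band rates can be turned into a rough expected count of correct predictions for a batch of results. The Python sketch below simply sums, over all reported scores, the approximate rate for each score's band (65%, 80%, 95%); it inherits all the limitations of those estimates, assumes a score of exactly 0.8 falls in the 0.8-1.0 band, and uses hypothetical function names.

```python
def band_rate(score):
    """Approximate fraction of true TSS predictions for a score's band.

    Scores below 0.5 are not reported, so they contribute nothing.
    """
    if score < 0.5:
        return 0.0
    if score < 0.8:
        return 0.65  # Marginal prediction
    if score <= 1.0:
        return 0.80  # Medium likely prediction (0.8 assumed to belong here)
    return 0.95  # Highly likely prediction


def expected_true_predictions(scores):
    """Rough expected number of correct predictions among the given scores."""
    return sum(band_rate(s) for s in scores)
```

For three predictions scoring 0.6, 0.9 and 1.063, the expected number of correct ones is about 0.65 + 0.80 + 0.95 = 2.4.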

For a comparison with other promoter prediction programs, in which this software performs favorably, see:

Eukaryotic promoter recognition.
J.W. Fickett and A.G. Hatzigeorgiou.
Genome Res. 7(9), 861-878, 1997.

References


Promoter 2.0: for the recognition of PolII promoter sequences.
S. Knudsen, Bioinformatics 15, 356-361, 1999.

Abstract

Motivation: a new approach to the prediction of eukaryotic Pol II promoters from DNA sequence takes advantage of a combination of elements similar to neural networks and genetic algorithms to recognize a set of discrete subpatterns with variable separation as one pattern, a promoter. The neural networks use as input a small window of DNA sequence, as well as the output of other neural networks. Through the use of genetic algorithms, the weights in the neural networks are optimized to maximally discriminate between promoters and non-promoters.
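The core idea, optimizing neural-network weights with a genetic algorithm instead of gradient descent, can be illustrated with a toy Python sketch. This is not Promoter 2.0's architecture: it uses a single linear unit over a one-hot encoded sliding window, takes the maximum window score as the sequence score, and evolves the weights with a mutation-only genetic algorithm (Gaussian mutation, truncation selection with elitism, no crossover). The window length, population size, mutation scale and all function names are illustrative assumptions.

```python
import random

BASES = "ACGT"
WINDOW = 4  # hypothetical window length in bases


def one_hot(window):
    """Encode a DNA window as a flat 0/1 vector (4 inputs per base)."""
    return [1.0 if b == base else 0.0 for b in window for base in BASES]


def sequence_score(weights, seq):
    """Score = max over windows of a single linear unit.

    The first 4 * WINDOW weights multiply the one-hot inputs; the last is a bias.
    """
    best = float("-inf")
    for i in range(len(seq) - WINDOW + 1):
        x = one_hot(seq[i:i + WINDOW])
        s = weights[-1] + sum(w * xi for w, xi in zip(weights, x))
        best = max(best, s)
    return best


def fitness(weights, positives, negatives):
    """Fraction of sequences classified correctly with a threshold of 0."""
    correct = sum(sequence_score(weights, s) > 0 for s in positives)
    correct += sum(sequence_score(weights, s) <= 0 for s in negatives)
    return correct / (len(positives) + len(negatives))


def evolve(positives, negatives, generations=200, pop_size=30, sigma=0.3, seed=0):
    """Evolve weight vectors by Gaussian mutation and truncation selection."""
    rng = random.Random(seed)
    n = 4 * WINDOW + 1  # one weight per one-hot input, plus a bias
    population = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, positives, negatives), reverse=True)
        parents = population[: pop_size // 3]  # best third survives unchanged
        children = [
            [w + rng.gauss(0, sigma) for w in rng.choice(parents)]
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=lambda w: fitness(w, positives, negatives))
```

Because the best third of each generation is carried over unchanged, the best fitness never decreases between generations. Crossover could be added by mixing weights from two parents, and Promoter 2.0's use of other networks' outputs as inputs is left out entirely.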

Results: after several thousand generations of optimization, the algorithm was able to discriminate between vertebrate promoter and non-promoter sequences in a test set with a correlation coefficient of 0.63. In addition, all five known transcription start sites on the plus strand of the complete Adenovirus genome were within 161 bp of 35 predicted transcription start sites. On standardized test sets consisting of human genomic DNA, the performance of Promoter 2.0 compares well with other software developed for the same purpose.
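The abstract reports a correlation coefficient of 0.63 without giving the formula. For binary promoter/non-promoter classification the Matthews correlation coefficient is the usual choice, and the Python sketch below assumes that is the measure meant; the function name is hypothetical and any counts passed to it are illustrative, not the paper's data.

```python
import math


def matthews_correlation(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to 1 (perfect prediction). Returns 0.0 when any marginal total is zero,
    where the coefficient is undefined.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Unlike plain accuracy, this coefficient stays informative when promoters and non-promoters are present in very different numbers, which is typical of genomic test sets.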





GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
