Sequence submission: paste the sequence(s) and/or upload a local file
At most 50 sequences and 1,000,000 nucleotides per submission; each sequence not more than 500,000 nucleotides.
The sequences are kept confidential and will be deleted after processing.
For publication of results, please cite:
Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis.
A.G. Pedersen and H. Nielsen, ISMB: 5, 226-233, 1997.
1. Specify the input sequences
The sequences intended for processing can be input in the following two ways:
Paste a single sequence (just the nucleotides) or a number of sequences in
FASTA format into the upper window of the main server page.
Select a FASTA file on your local disk, either by typing the file name into
the lower window or by browsing the disk.
Both ways can be employed at the same time: all the specified sequences will be
processed.
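Before submitting, it may help to check a file against the submission limits stated at the top of this page (at most 50 sequences, 1,000,000 nucleotides in total, 500,000 nucleotides per sequence). The following is a minimal sketch only, not the server's own code; the function names read_fasta and check_limits and the fallback name "Sequence" for a header-less sequence are assumptions made for illustration.

```python
# Limits as stated on this page.
MAX_SEQUENCES = 50
MAX_TOTAL_NT = 1_000_000
MAX_SEQ_NT = 500_000


def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs.

    A bare sequence without a '>' header line is accepted and given the
    placeholder name "Sequence" (an assumption, not server behaviour).
    """
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if name is not None or chunks:
                records.append((name or "Sequence", "".join(chunks)))
            name, chunks = line[1:].strip() or "Sequence", []
        else:
            chunks.append(line)
    if name is not None or chunks:
        records.append((name or "Sequence", "".join(chunks)))
    return records


def check_limits(records):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (limit {MAX_SEQUENCES})")
    total = sum(len(seq) for _, seq in records)
    if total > MAX_TOTAL_NT:
        problems.append(f"{total} nucleotides in total (limit {MAX_TOTAL_NT})")
    for name, seq in records:
        if len(seq) > MAX_SEQ_NT:
            problems.append(f"{name}: {len(seq)} nucleotides (limit {MAX_SEQ_NT})")
    return problems
```

An empty list from check_limits means the input fits within the stated limits; the server may of course apply further checks of its own.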
The allowed input alphabet is A, C, G, T, U
and X (unknown); all the other symbols will be converted to X
before processing. T and U are treated as equivalent.
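The alphabet rule above can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: it assumes that whitespace is dropped, that lower-case letters are accepted, and that U is mapped to T (the page only says the two are treated as equivalent). The function name normalize is hypothetical.

```python
import re


def normalize(seq):
    """Map a raw nucleotide string onto the alphabet A, C, G, T, X."""
    # Assumption: whitespace is ignored and case does not matter.
    seq = re.sub(r"\s+", "", seq).upper()
    # T and U are equivalent; this sketch settles on T.
    seq = seq.replace("U", "T")
    # Every remaining symbol outside the allowed alphabet becomes X.
    return re.sub(r"[^ACGTX]", "X", seq)
```

For example, ambiguity codes such as N or R and gap characters such as "-" all end up as X.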
2. Select organism type
Depending on the origin of your input sequences, click on either "Vertebrate"
or "A. Thaliana". The former is the default.
3. Submit the job
Click on the "Submit" button. The status of your job (either 'queued'
or 'running') will be displayed and constantly updated until it terminates and
the server output appears in your browser window.
At any time during the wait you may enter your e-mail address and simply leave
the window. Your job will continue; you will be notified by e-mail when it has
terminated. The e-mail message will contain the URL under which the results are
stored; they will remain on the server for 24 hours for you to collect them.
Each input sequence will be shown with the predicted translation start
site indicated, followed by a table showing the positions and the scores
of all instances of "ATG" in the sequence.
In the lines below the sequence the predicted start codon is indicated by
the letter "i" (initiation), other instances of "ATG" by the letter "N"
(non-start). The dots (".") are placeholders for all the other nucleotides.
The scores are always in [0.0, 1.0]; when greater than 0.5 they represent
a probable translation start.
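The annotation line described above could be rebuilt from a sequence and its scores roughly as follows. This is a sketch under several assumptions that the page does not confirm: the marker is placed under the A of each "ATG", positions are 1-based, the sequence is already upper-case with U written as T, and every "ATG" scoring above 0.5 is marked "i". The function name annotate is hypothetical.

```python
def annotate(seq, scores, threshold=0.5):
    """Build an annotation line for seq.

    scores maps the 1-based position of each ATG to its score.
    ATGs scoring above the threshold get "i", all other ATGs get "N",
    and every other position gets ".".
    """
    line = ["."] * len(seq)
    start = seq.find("ATG")
    while start != -1:
        score = scores.get(start + 1, 0.0)  # ATGs without a score count as non-start
        line[start] = "i" if score > threshold else "N"
        start = seq.find("ATG", start + 1)
    return "".join(line)
```

With the sequence CCATGGATGA and the scores {3: 0.821, 7: 0.034}, the first ATG is marked "i" and the second "N".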
Translation start predictions for 1 vertebrate sequence

   Pos   Score   Pred
    18   0.821   Yes
   117   0.034   -
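A table like the one above can be read into a program with a few lines of Python. This is a sketch only: it assumes whitespace-separated columns exactly as in the sample shown here, and the function name parse_predictions is hypothetical.

```python
def parse_predictions(text):
    """Extract (position, score, is_start) tuples from a prediction table.

    Lines that do not consist of three fields beginning with an integer
    (titles, the column header, blank lines) are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].isdigit():
            pos, score, pred = fields
            rows.append((int(pos), float(score), pred == "Yes"))
    return rows
```

In the sample, the "Yes" row has a score above 0.5 and the "-" row does not, matching the threshold described earlier.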
Neural network prediction of translation initiation sites
in eukaryotes: perspectives for EST and genome analysis.
ISMB: 5, 226-233, 1997.
Translation in eukaryotes does not always start at the first
AUG in an mRNA, implying that context information also plays a role.
This makes prediction of translation initiation sites a non-trivial
task, especially when analysing EST and genome data where the entire
mature mRNA sequence is not known. In this paper, we employ artificial
neural networks to predict which AUG triplet in an mRNA sequence is the
start codon. The trained networks correctly classified 88 % of
Arabidopsis and 85 % of vertebrate AUG triplets. We find that our
trained neural networks use a combination of local start codon context
and global sequence information. Furthermore, analysis of false
predictions shows that AUGs in frame with the actual start codon are
more frequently selected than out-of-frame AUGs, suggesting that our
networks use reading frame detection. A number of conflicts between
neural network predictions and database annotations are analysed in
detail, leading to identification of possible database errors.