DTU Health Tech

Department of Health Technology

NetStart - 1.0

Translation start in vertebrate and A. thaliana DNA

The NetStart server produces neural network predictions of translation start in vertebrate and Arabidopsis thaliana nucleotide sequences.

NetStart was trained on cDNA-like sequences and will therefore presumably perform better on cDNAs and ESTs. We have not tested its performance on genomic data, which may contain introns adjacent to the start codon.

Submission


Sequence submission: paste the sequence(s) and/or upload a local file

Paste a single sequence or several sequences in FASTA format into the sequence field, and/or submit a file in FASTA format directly from your local disk.

Organism: Vertebrate (default) or A. thaliana


Restrictions
At most 50 sequences and 1,000,000 nucleotides in total per submission; no single sequence may exceed 500,000 nucleotides.
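For a larger set of sequences, the limits above can be checked, and the set split into compliant batches, before submission. The following Python sketch is illustrative only; `check_limits` and `split_into_batches` are hypothetical helpers, not part of NetStart, and they assume the sequences have already been read into a name-to-sequence dictionary.

```python
# Limits as stated under "Restrictions" for NetStart 1.0.
MAX_SEQUENCES = 50
MAX_TOTAL_NT = 1_000_000
MAX_SEQ_NT = 500_000


def check_limits(records: dict[str, str]) -> list[str]:
    """Return a list of violated limits; an empty list means the batch fits."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences (max {MAX_SEQUENCES})")
    total = sum(len(seq) for seq in records.values())
    if total > MAX_TOTAL_NT:
        problems.append(f"{total} nucleotides in total (max {MAX_TOTAL_NT})")
    for name, seq in records.items():
        if len(seq) > MAX_SEQ_NT:
            problems.append(f"{name}: {len(seq)} nucleotides (max {MAX_SEQ_NT})")
    return problems


def split_into_batches(records: dict[str, str]) -> list[dict[str, str]]:
    """Greedily group sequences, in input order, into batches that respect the limits."""
    batches: list[dict[str, str]] = []
    current: dict[str, str] = {}
    current_nt = 0
    for name, seq in records.items():
        if len(seq) > MAX_SEQ_NT:
            raise ValueError(f"{name} is longer than {MAX_SEQ_NT} nucleotides")
        # Start a new batch if adding this sequence would break either batch limit.
        if current and (len(current) == MAX_SEQUENCES
                        or current_nt + len(seq) > MAX_TOTAL_NT):
            batches.append(current)
            current, current_nt = {}, 0
        current[name] = seq
        current_nt += len(seq)
    if current:
        batches.append(current)
    return batches
```

For example, 120 short sequences are split into batches of 50, 50 and 20. Because a single sequence may not exceed 500,000 nucleotides, every accepted sequence always fits into a fresh batch.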
Confidentiality
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis.
A. G. Pedersen and H. Nielsen. ISMB 5: 226-233, 1997.

Instructions


1. Specify the input sequences

The sequences intended for processing can be input in the following two ways:

  • Paste a single sequence (just the nucleotides) or a number of sequences in FASTA format into the upper window of the main server page.

  • Select a FASTA file on your local disk, either by typing the file name into the lower window or by browsing the disk.

Both ways can be used at the same time; all specified sequences will be processed.
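As an illustration of the two accepted forms (a bare sequence or FASTA records), the following Python sketch parses input text into named sequences. It is not the server's own parser: it assumes that the first word of a FASTA header is the sequence name and that a bare sequence gets a default name, and `parse_input` is a hypothetical helper.

```python
def parse_input(text: str, default_name: str = "Sequence") -> dict[str, str]:
    """Parse FASTA records, or a single bare sequence without a '>' header."""
    records: dict[str, list[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            # Assumption: the first word of the header is the sequence name.
            current = fields[0] if fields else f"{default_name}_{len(records) + 1}"
            if current in records:
                raise ValueError(f"duplicate sequence name: {current}")
            records[current] = []
        else:
            if current is None:  # bare sequence pasted without a FASTA header
                current = default_name
                records[current] = []
            records[current].append("".join(line.split()))
    return {name: "".join(parts) for name, parts in records.items()}
```

To mimic using both input routes at once, the pasted text and the file contents can be joined with a newline and parsed together, e.g. `parse_input(pasted + "\n" + file_text)`.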

The allowed input alphabet is A, C, G, T, U and X (unknown); all other symbols are converted to X before processing. T and U are treated as equivalent.
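A minimal Python sketch of this alphabet rule follows. It assumes that lowercase letters are accepted as their uppercase equivalents (the rule above does not say) and implements the T/U equivalence by mapping U to T; `normalize` is a hypothetical name.

```python
def normalize(seq: str) -> str:
    """Apply the input alphabet rule: U counts as T, unknown symbols become X."""
    out = []
    for ch in seq.upper():  # assumption: lowercase input is accepted
        if ch == "U":
            out.append("T")  # T and U are equivalent; represent both as T
        elif ch in "ACGTX":
            out.append(ch)
        else:
            out.append("X")  # any other symbol is treated as unknown
    return "".join(out)
```

For example, `normalize("augNcc")` returns `"ATGXCC"`, so an AUG written in RNA notation is still recognised as a candidate start codon.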


2. Select organism type

Depending on the origin of your input sequences, select either "Vertebrate" or "A. thaliana". "Vertebrate" is the default.


3. Submit the job

Click the "Submit" button. The status of your job (either 'queued' or 'running') is displayed and updated until the job terminates and the server output appears in your browser window.

NOTE: At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format


DESCRIPTION

Each input sequence will be shown with the predicted translation start site indicated, followed by a table showing the positions and the scores of all instances of "ATG" in the sequence.

In the lines below the sequence, the predicted start codon is marked by the letter "i" (initiation) and other instances of "ATG" by the letter "N" (non-start), each placed under the A of the codon. Dots (".") are placeholders for all other positions.
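The marker lines can be reproduced with a short Python sketch. It assumes 1-based positions, 60 characters per line as in the example output, and that each marker sits under the A of its ATG; `atg_positions` and `marker_lines` are hypothetical helpers, and which ATG is the predicted start must still come from the server.

```python
from typing import Optional


def atg_positions(seq: str) -> list[int]:
    """1-based positions of the A of every ATG (U is read as T)."""
    s = seq.upper().replace("U", "T")
    return [i + 1 for i in range(len(s) - 2) if s[i:i + 3] == "ATG"]


def marker_lines(seq: str, start: Optional[int], width: int = 60) -> list[str]:
    """Build the 'i'/'N'/'.' lines shown under the sequence."""
    marks = ["."] * len(seq)
    for pos in atg_positions(seq):
        marks[pos - 1] = "i" if pos == start else "N"
    text = "".join(marks)
    # Wrap to the same line width as the sequence display.
    return [text[k:k + width] for k in range(0, len(text), width)]
```

Applied to the 167-nucleotide sequence in the example output with `start=18`, this finds ATGs at positions 18 and 117 and yields the same three marker lines.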

Scores always lie in the interval [0.0, 1.0]; a score greater than 0.5 indicates a probable translation start.
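The position/score table can be sketched the same way. This Python example assumes, as the example output suggests, that the Pred column reads "Yes" exactly when the score exceeds 0.5; the column widths are copied from the example, and `prediction_table` is a hypothetical name.

```python
def prediction_table(scored: list[tuple[int, float]], threshold: float = 0.5) -> str:
    """Render (position, score) pairs in the layout of the example output."""
    rows = ["   Pos    Score     Pred", "-" * 24]
    for pos, score in scored:
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score out of range: {score}")
        # Assumption: "Yes" only for scores strictly greater than the threshold.
        pred = "Yes" if score > threshold else "-"
        rows.append(f"{pos:6d}    {score:.3f}     {pred}")
    return "\n".join(rows)
```

A score of exactly 0.5 is shown as "-", since only scores greater than 0.5 are described as probable starts.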


EXAMPLE OUTPUT

Translation start predictions for 1 vertebrate sequence

  Name: AT2A6.1
123456789012345678901234567890123456789012345678901234567890
CACGCGTCCGAAGCAAGATGGAGTCAAGTGATCGTTCAAGTCAAGCAAAAGCTTTCGACG
AGACAAAAACCGGCGTGAAAGGGCTTGTGGCTTCGGGAATCAAAGAGATTCCAGCCATGT
TCCATACACCTCCGGATACTCTAACAAGCCTGAAACAAACAGCACCA
.................i..........................................
........................................................N...
...............................................

   Pos    Score     Pred
------------------------
    18    0.821     Yes
   117    0.034     -
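To use the results downstream, the table rows can be extracted from saved output. The regular expression below is an assumption based on the single example shown here and may need adjusting for other outputs; `parse_prediction_table` is a hypothetical helper.

```python
import re

# One table row: position, score in [0, 1], and "Yes" or "-".
ROW = re.compile(r"^\s*(\d+)\s+([01]\.\d+)\s+(Yes|-)\s*$")


def parse_prediction_table(text: str) -> list[tuple[int, float, bool]]:
    """Extract (position, score, predicted) rows; all other lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((int(m.group(1)), float(m.group(2)), m.group(3) == "Yes"))
    return rows
```

Sequence, ruler, marker and header lines do not match the pattern and are skipped.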

References


Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis.
A. G. Pedersen and H. Nielsen. ISMB 5: 226-233, 1997.


Abstract

Translation in eukaryotes does not always start at the first AUG in an mRNA, implying that context information also plays a role. This makes prediction of translation initiation sites a non-trivial task, especially when analysing EST and genome data where the entire mature mRNA sequence is not known. In this paper, we employ artificial neural networks to predict which AUG triplet in an mRNA sequence is the start codon. The trained networks correctly classified 88 % of Arabidopsis and 85 % of vertebrate AUG triplets. We find that our trained neural networks use a combination of local start codon context and global sequence information. Furthermore, analysis of false predictions shows that AUGs in frame with the actual start codon are more frequently selected than out-of-frame AUGs, suggesting that our networks use reading frame detection. A number of conflicts between neural network predictions and database annotations are analysed in detail, leading to identification of possible database errors.
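Being in frame with the actual start codon means that the distance between the two ATG positions is a multiple of three. A small Python sketch of that check follows (`frame_labels` is a hypothetical helper); in the example output, the ATG at position 117 lies 99 nucleotides downstream of the predicted start at 18 and is therefore in frame.

```python
def frame_labels(start: int, positions: list[int]) -> dict[int, str]:
    """Label each non-start ATG as in-frame or out-of-frame relative to `start`."""
    # Python's % returns a non-negative result for a positive divisor,
    # so ATGs upstream of the start (negative distances) are handled too.
    return {
        p: "in-frame" if (p - start) % 3 == 0 else "out-of-frame"
        for p in positions
        if p != start
    }
```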


GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetStart-1.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
