NetAspGene - 1.0

Intron splice sites in Aspergillus DNA

NetAspGene predicts splice sites in DNA sequences from Aspergillus fumigatus and other Aspergillus species. It uses multiple artificial neural networks and a combined algorithm to predict both exon/intron gene structure and splice sites, automatically generates a graphical display, and provides output in the standard "GFF3" gene annotation format.

Submission

Sequence submission: paste the sequence(s) or upload a local file

Confidentiality:
The sequences and outputs are kept confidential and will be deleted after processing.

CITATIONS

NetAspGene is free to use. If used for publication, please cite:

Analysis and prediction of gene splice sites in four Aspergillus genomes.
Kai Wang, David Wayne Ussery, Søren Brunak
Fungal Genetics and Biology. Volume 46, Issue 1, (s)14-18, March 2009

Instructions

In order to use the NetAspGene server for splice site prediction in Aspergillus DNA:

Paste a FASTA sequence or select a local FASTA file by pressing the 'Choose File' button. If the file dialog filters by name, remember to set a file selection mask that matches your file (like *.fasta). A FASTA file is an ASCII file with the sequence, as shown below, and should contain only one sequence.
The sequence must use the one-letter abbreviations for the nucleotides: 'acgtuACGTU', with 'N' or 'X' for unknown.
The sequence should be more than 200 (preferably more than 250) and less than 80,000 nucleotides long. Shorter sequences are accepted, but the prediction will be suboptimal. Long sequences may cause a timeout.
Press the "Submit" button.
A web page will return the results when the prediction finishes. Response time depends on system load.
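
The input requirements above can be checked locally before submitting. The following Python sketch is only an illustration and not part of NetAspGene: the function names are hypothetical, and the checks simply restate the rules listed above (one sequence per file, the allowed nucleotide symbols, and the length limits).

```python
# Hypothetical pre-submission check; it only restates the rules listed above.
ALLOWED = set("acgtuACGTUNX")  # symbols named in the instructions (N/X = unknown)
MIN_LEN = 200                  # sequences should be longer than this
MAX_LEN = 80_000               # and shorter than this


def read_single_fasta(text):
    """Return (identifier, sequence) from FASTA text holding exactly one record."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA input must start with a '>' header line")
    if any(ln.startswith(">") for ln in lines[1:]):
        raise ValueError("only one sequence per file is expected")
    return lines[0][1:].strip(), "".join(lines[1:])


def check_sequence(seq):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    bad = sorted(set(seq) - ALLOWED)
    if bad:
        problems.append("unsupported symbols: " + "".join(bad))
    if len(seq) <= MIN_LEN:
        problems.append("sequence is 200 nt or shorter; prediction may be suboptimal")
    if len(seq) >= MAX_LEN:
        problems.append("sequence is 80,000 nt or longer; the job may time out")
    return problems
```

A typical use would be to call `read_single_fasta` on the file contents and submit only if `check_sequence` returns an empty list.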

EXAMPLE FASTA FILE

>sequenceident
TCCCTTCCATCCATTGCACGATGAGCTCTCTTCGTTTCGCTCGCTCTGCTCTCAGGGCTC
GTCCCTCTGCTCTCCGCGTTCCTCTCCAGCGCAGAGGTTACGCTGAGGCTGTGTCGGACA
AGATCAAGCTTTCTCTGGCCCTTCCTCACCAGGTAAGATCCGAGATAACTGAACGCACCC
TTTTCGTCTTAATAGGTTGGAAACTAATATGAGAACTTTGCAACAGACTATCTTCAAGTC
GGCCGACGTGTACGTGACGACCAACTCCTCCCTCGTTCACGATCGGCAATTCTGAAGATG
GCTTGAATGCGTACTGATGACCTCCCCCTACAGTGTCCAGGTCAACATCCCCGCCGAGTC
CGGAGAAATGGGTGTCCTCGCCAACCACGTTCCTTCCATTGAGCAGCTGAAGCCTGGTCT
TGTTGAGATCGTTGAGGAGAGTGGTGCCAACAAGAAGTTCTTCCGTACGTCCGGACAACC
CCGCTGAGCTTTGCGCTGCGATATCGTGGGACCACGAAGATGTCGCATTGCTTCCTATAG
CATCGCACTAACGAGTCTGCGTTCTTCAGTTTCTGGTGGTTTCGCCGTCGTTCAGCCTGA
CTCTGCTCTGAGCATCAACGCCGTGGAGGCCTACCCCCTCGAGGACTTCAGCGCCGATGT
AAGTTGTGGAAACGAAGAAAATGTCTTGATATACTTTTTTGACCCAATCTTTCAATTACA
GGCTGTCAAATCCCAGATCGCCGAGGCCCAGAAGATTGCCAATGGTAGTGGCAGTGAGCA
GGACATTGCTGAGGCTAAGATTGAGCTCGAGGTAGGTCAACCACGAACGCGCTAGCACGA
GGCTATACTCAATTGCTAATGTTGTCACAGGTTCTGGAGACCCTGCAAGCCGTCCTGAAA
TAGATACCTGATGTACATAACCACTCGCGATTGCAATTCTGAACTTGTAGAATTATAACA
ATTCTCCGGCA

Output format

The prediction output from the server consists of predictions for both the direct (+) and the complementary (-) strand. The output lists the predicted donor and acceptor sites in the submitted sequence.

Position:	The position of the splice site in your sequence, given as the first (donor) or last (acceptor) nucleotide of the intron. The numbering of the direct (+) strand proceeds from the 5' end to the 3' end. For the complement (-) strand the numbering is given in both directions.
Frame:	The predicted frame offset (1, 2 or 3) of the acceptor/donor site.
Strand:	The sequence strand (direct or complement).
Confidence:	The level of confidence for the sites (relative to the cutoff used to find nearly all true sites). Sites found by using cutoff values for highly confident sites are marked by the symbol H.
exon^intron:	Gives 20 bases of sequence around the predicted site.
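
The two numbering directions for the complement (-) strand are related by a simple coordinate conversion. The Python sketch below assumes 1-based positions and assumes that a position counted from the 5' end of the complement strand corresponds to (sequence length - position + 1) on the direct strand. The function names are illustrative and are not part of NetAspGene.

```python
# Illustrative helpers; names and conventions are assumptions, not NetAspGene code.
_COMPLEMENT = str.maketrans("ACGTUNXacgtunx", "TGCAANXtgcaanx")


def reverse_complement(seq):
    """Return the complement (-) strand read from its own 5' end to its 3' end."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_direct_coordinate(pos_on_complement, seq_length):
    """Map a 1-based position on the complement strand to the direct strand."""
    if not 1 <= pos_on_complement <= seq_length:
        raise ValueError("position lies outside the sequence")
    return seq_length - pos_on_complement + 1
```

For a sequence of length 5, position 1 on the complement strand maps to position 5 on the direct strand, and applying the same conversion again returns the original position.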

Please note that the lists contain predictions made at two detection levels: one at which around 50% of the true sites are detected with very few false positives, and another at which nearly all true sites are found, but with more false predictions as well. Sites marked (H) are highly confident and are very seldom false positives, while sites found only at the level that captures nearly all true sites are not marked. The confidence values for the predictions can be compared within each type only. This means that a confidence value not marked (H) can in some cases be larger than that of an (H)-marked site.

FORMAT OF NetAspGene GRAPHICS OUTPUT

The output from the prediction is displayed on the output page of the prediction server. The PostScript files can be retrieved directly in your browser by selecting one of the two references at the bottom of the prediction output. If your viewer is set up to handle PostScript, it will display the graphs. Otherwise you can download the compressed PostScript files directly to your computer.

The top part of the figure, designated "Coding", shows the activity of an ensemble of coding-predicting networks: values close to 0.0 indicate an intron region, while values close to 1.0 indicate an exon. In the "Donor" panel, the activity of the ensemble of donor site predicting networks is shown as impulses. An impulse with a height close to 2.0 indicates a strong donor site. A cyan impulse is a prediction that has been discarded during the refinement, and a magenta impulse is a prediction that has been changed by the rule-based system. The variable threshold, computed from the output of the coding-predicting ensemble, is used to select donor and acceptor site predictions.

FORMAT OF NetAspGene PREDICTION SCORE FILES

The predictions can be downloaded in numerical form from the output page of the prediction server. They are useful for detailed analysis of a sequence. The file starts with a line beginning with the symbol '>' followed by the name and the length of the sequence. This is followed by twelve columns, which contain the following information, listed in column order.

1. Position in the sequence, numbered from 1 to the length of the sequence.
2. Nucleotides of the sequence.
3. Neural network donor site score.
4. Neural network acceptor site score.
5. Neural network coding score.
6. Neural network frame score.
7. High sensitivity level cutoff value for donor site predictions.
8. High sensitivity level cutoff value for acceptor site predictions.
9. Confidence of the donor site prediction.
10. Confidence of the acceptor site prediction.
11. HMM acceptor site branchpoint score.
12. Branchpoint position.
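
A score file with this layout could be read with a short Python sketch such as the one below. It is not an official parser: it assumes the twelve columns are separated by whitespace with one row per nucleotide, and the column names are hypothetical labels derived from the list above. Because the numeric formats are not specified here, only the position is converted to an integer and all other fields are kept as text.

```python
# Hypothetical reader for the score file; the whitespace-separated layout
# and the column names are assumptions, not a documented specification.
COLUMNS = (
    "position", "nucleotide", "donor_score", "acceptor_score",
    "coding_score", "frame_score", "donor_cutoff", "acceptor_cutoff",
    "donor_confidence", "acceptor_confidence",
    "branchpoint_score", "branchpoint_position",
)


def parse_score_file(text):
    """Return (header, rows); each row is a dict keyed by COLUMNS."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: '>' followed by the sequence name and length.
            header = line[1:].strip()
            continue
        fields = line.split()
        if len(fields) != len(COLUMNS):
            raise ValueError("expected 12 columns, found %d" % len(fields))
        row = dict(zip(COLUMNS, fields))
        row["position"] = int(row["position"])
        rows.append(row)
    return header, rows
```

Individual columns can then be converted as needed, for example `float(row["donor_score"])`, once the actual number format in a downloaded file has been confirmed.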

References

Analysis and prediction of gene splice sites in four Aspergillus genomes.
Kai Wang, David Wayne Ussery, Søren Brunak*
Fungal Genetics and Biology. Volume 46, Issue 1, (s)14-18, March 2009

Center for Biological Sequence Analysis, Dept. of Systems Biology, Technical University of Denmark, DK-2800 Lyngby, Denmark
*Corresponding author

Abstract

Several Aspergillus fungal genomic sequences have been published, with many more in progress. Obviously, it is essential to have high-quality, consistently annotated sets of proteins from each of the genomes, in order to make meaningful comparisons. We have developed a dedicated, publicly available, splice site prediction program called NetAspGene, for the genus Aspergillus. Gene sequences from Aspergillus fumigatus, the most common mould pathogen, were used to build and test our model. Compared to many animals and plants, Aspergillus contains smaller introns; thus we have applied a larger window size on single local networks for training, to cover both donor and acceptor site information. We have applied NetAspGene to other Aspergilli, including A. nidulans, A. oryzae, and A. niger. Evaluation with independent data sets reveals that NetAspGene performs substantially better splice site prediction than other available tools. NetAspGene will be very helpful for the study of Aspergillus splice sites and especially of alternative splicing.

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: