Define the GenBank entries to be analysed by specifying GenBank accession IDs (paste in or upload) or by pasting in (or uploading) GenBank files. A combination of IDs and GenBank files is equally acceptable. Hitting "Submit query" at this point will run the server with default settings: all protein coding genes (CDSs) will be extracted with full intron/exon annotation.
The desired feature types (CDS, rRNA, etc.), naming preferences and the definition of flanking regions can be specified using the Basic options.
Note that all three "Submit query" buttons perform the same action, so it is not necessary to scroll down the page if the options are left unchanged.
The easiest way to specify GenBank information is simply to supply a list of GenBank entry IDs. The GenBank database used by the FeatureExtract server is a mirror of the GenBank flat file distribution with the addition of several eukaryotic genomes (see databases for details).
Use the "Upload file" option for large file(s). Smaller files can be pasted in. Multiple files can be concatenated.
Any file complying with the GenBank format definition can be used here, for example the chromosome files from the eukaryotic genomes mentioned above, or files with custom gene/promoter predictions.
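Concatenation works because every GenBank record ends with a line containing only `//`. The sketch below is a hypothetical helper, not part of FeatureExtract; it only illustrates how a concatenated file can be split back into individual records using that terminator, assuming well-formed records.

```python
def split_genbank_records(text):
    """Split concatenated GenBank flat-file text into individual records.

    Hypothetical helper: relies only on the '//' line that terminates
    every GenBank record; blank lines between records are skipped.
    """
    records = []
    current = []
    for line in text.splitlines():
        if not current and not line.strip():
            continue  # skip blank lines between records
        current.append(line)
        if line.strip() == "//":  # end of the current record
            records.append("\n".join(current))
            current = []
    return records


combined = "LOCUS       A\nORIGIN\n//\n\nLOCUS       B\nORIGIN\n//\n"
print(len(split_genbank_records(combined)))  # 2
```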
Select which feature type(s) to extract. A number of predefined feature types can be selected. Multiple feature types can be entered in the text field as a comma-separated list, e.g. CDS,rRNA,tRNA,repeat. The MOST keyword (see below) can be useful when extracting intergenic regions.
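A comma-separated feature type list simply acts as a filter on the feature keys in each entry. The sketch below assumes features are held as hypothetical `(feature_type, start, end)` tuples and that matching is exact and case-sensitive; the server's actual matching rules may differ.

```python
def select_features(features, type_list):
    """Keep features whose type appears in a comma-separated list.

    features: list of (feature_type, start, end) tuples (assumed layout).
    type_list: e.g. "CDS,rRNA,tRNA,repeat"; surrounding spaces are ignored,
    matching is exact and case-sensitive (an assumption).
    """
    wanted = {t.strip() for t in type_list.split(",") if t.strip()}
    return [f for f in features if f[0] in wanted]


features = [("CDS", 0, 300), ("gene", 0, 300), ("tRNA", 400, 472)]
print(select_features(features, "CDS,tRNA"))
# [('CDS', 0, 300), ('tRNA', 400, 472)]
```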
Note that some feature types are not used consistently across GenBank entries. In particular, the actual meaning of gene and mRNA varies a lot.
Intergenic regions: Selecting this option will include the intergenic regions in the set of extracted sequences. The intergenic regions are simply defined as the areas between the features specified here. Intergenic regions can be extracted with flanks.
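Defining intergenic regions as "the areas between the features" amounts to merging overlapping feature intervals and reporting the gaps. The sketch below assumes 0-based, end-exclusive coordinates and reports only gaps lying between features (not the stretches before the first or after the last feature); both choices are assumptions, not documented server behaviour.

```python
def intergenic_regions(features):
    """Return the gaps between features as (start, end) intervals.

    features: list of (start, end) tuples, 0-based, end-exclusive (assumed).
    Overlapping or touching features are merged, so a gap is reported only
    where no selected feature covers the sequence.
    """
    regions = []
    covered_to = None  # end of the merged block of features seen so far
    for start, end in sorted(features):
        if covered_to is not None and start > covered_to:
            regions.append((covered_to, start))
        covered_to = end if covered_to is None else max(covered_to, end)
    return regions


print(intergenic_regions([(100, 400), (350, 600), (900, 1200)]))
# [(600, 900)]
```

Because the gaps depend on which features are selected, choosing a broad feature list (such as the MOST keyword) yields fewer, shorter intergenic regions.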
Specify the preferred naming of each extracted entry. If the desired type of name is not available, the server falls back to the next level: 1 > 2 > 3.
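The fallback rule can be sketched as a loop over an ordered list of name sources. The qualifier names used below (`gene`, `locus_tag`, `protein_id`) are standard GenBank qualifiers chosen purely for illustration; they are an assumption and not necessarily the server's actual levels 1, 2 and 3.

```python
def choose_name(qualifiers, preference=("gene", "locus_tag", "protein_id"), default=None):
    """Return the first available name following the preference order.

    qualifiers: dict of qualifier -> value for one feature (assumed layout).
    preference: hypothetical ordering standing in for levels 1 > 2 > 3.
    """
    for key in preference:
        value = qualifiers.get(key)
        if value:  # skip missing or empty qualifiers and fall back
            return value
    return default


print(choose_name({"locus_tag": "tag_001", "protein_id": "prot_001"}))  # tag_001
```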
Define flanking regions, if any.
Note: computations concerning flanking region elements are only performed if flanking regions have been requested using this option.
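Adding flanks means extending each feature by a number of bases on either side and clipping the result to the ends of the sequence. The sketch below assumes 0-based, end-exclusive coordinates and treats "upstream" relative to the feature's strand, so on the minus strand the upstream flank lies to the right; the parameter names are hypothetical.

```python
def with_flanks(start, end, strand, seq_length, upstream, downstream):
    """Return (start, end) of a feature extended by flanks, clipped to the sequence.

    Coordinates are 0-based, end-exclusive (assumed). strand is "+" or "-";
    on the minus strand the upstream flank is added after the feature end.
    """
    if strand == "+":
        new_start, new_end = start - upstream, end + downstream
    else:
        new_start, new_end = start - downstream, end + upstream
    # Flanks cannot extend beyond the ends of the sequence
    return max(0, new_start), min(seq_length, new_end)


print(with_flanks(100, 200, "+", 1000, 50, 20))  # (50, 220)
print(with_flanks(100, 200, "-", 1000, 50, 20))  # (80, 250)
```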
Click the "Submit query" button. If processing the query takes more than a few seconds, you will be given the option of supplying your email address so that you can be notified when the job is done.
FeatureExtract has support for a number of advanced options. Typically it is not necessary to set these manually and most users can safely skip this section and proceed to submitting the query.
This option defines the cut-off value that determines whether an intervening sequence will be annotated as a frameshift or an intron. Intervening sequences shorter than the specified value are considered frameshifts; this includes negative frameshifts.
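The cut-off rule can be sketched by measuring the gap between consecutive segments of a joined feature location. The sketch assumes 0-based, end-exclusive segments in transcript order; overlapping segments give a negative gap, which falls below any positive cut-off and is therefore labelled a frameshift. The cut-off used in the example is an arbitrary illustration, not the server's default.

```python
def classify_intervening(segments, cutoff):
    """Label the sequence between consecutive segments as frameshift or intron.

    segments: list of (start, end) tuples, 0-based, end-exclusive (assumed).
    Returns a list of (label, gap_length) pairs, one per intervening sequence.
    """
    labels = []
    for (_, prev_end), (next_start, _) in zip(segments, segments[1:]):
        gap = next_start - prev_end  # negative when segments overlap
        labels.append(("frameshift" if gap < cutoff else "intron", gap))
    return labels


print(classify_intervening([(0, 120), (122, 300), (400, 500)], cutoff=20))
# [('frameshift', 2), ('intron', 100)]
```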
This option makes it possible to extend (or redefine) the built-in annotation table.
Note: for all sequences containing introns or frameshifts, the spliced sequence and annotation are added to the comment field by default.
Splice all intron containing sequences
Enabling this option will cause the server to produce spliced sequences
(and annotation) for all intron containing sequences. The full-length
sequence and annotation are then moved into the comment field.
Only output intron containing sequences
Enabling this option will suppress the output of sequences that do not
contain introns or frameshifts. This option can be used in combination
with the "Splice all..." option mentioned above as a quick way of
producing a spliced-only dataset.
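Combining the two options can be sketched as: keep only entries whose location has more than one segment, then join the segments. This is an approximation under stated assumptions: entries are hypothetical `(sequence, segments, strand)` tuples with 0-based, end-exclusive segments, minus-strand features are reverse-complemented after joining, and annotation handling is ignored.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def splice(sequence, segments, strand="+"):
    """Join the segment sequences; reverse-complement for minus-strand features."""
    spliced = "".join(sequence[start:end] for start, end in segments)
    if strand == "-":
        spliced = spliced.translate(COMPLEMENT)[::-1]
    return spliced


def spliced_only(entries):
    """Emulate 'only intron containing' plus 'splice all' on (seq, segments, strand) entries.

    An entry with more than one segment is treated as containing an
    intervening sequence (intron or frameshift); others are dropped.
    """
    return [splice(seq, segs, strand) for seq, segs, strand in entries if len(segs) > 1]


entries = [
    ("AAACCCGGGTTT", [(0, 3), (6, 9)], "+"),  # two segments: kept and spliced
    ("AAACCCGGGTTT", [(0, 12)], "+"),         # single segment: dropped
    ("AAACCCGGGTTT", [(0, 3), (6, 9)], "-"),  # minus strand: reverse complement
]
print(spliced_only(entries))  # ['AAAGGG', 'CCCTTT']
```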
This option governs which feature types to annotate in the flanking regions. The default value, the keyword MOST, is a list built to minimize the problem of feature type synonyms (e.g. 'CDS' vs. 'gene' vs. 'mRNA') while still extracting as much information as possible. The keywords are defined below:
Alternatively, a custom list of feature types can be specified as a comma-separated list.
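Annotating features in a flanking region comes down to finding the features of the selected types that overlap the flank and clipping them to it. The sketch below uses a custom type set, since the contents of the MOST keyword are not reproduced here; the tuple layout and 0-based, end-exclusive coordinates are assumptions.

```python
def features_in_flank(flank, features, wanted_types):
    """Return selected features overlapping a flank, clipped to the flank.

    flank: (start, end), 0-based, end-exclusive (assumed).
    features: list of (feature_type, start, end) tuples (assumed layout).
    wanted_types: set of feature types to annotate, e.g. {"tRNA", "rRNA"}.
    """
    flank_start, flank_end = flank
    hits = []
    for ftype, start, end in features:
        overlaps = start < flank_end and end > flank_start
        if ftype in wanted_types and overlaps:
            hits.append((ftype, max(start, flank_start), min(end, flank_end)))
    return hits


features = [("tRNA", 80, 150), ("CDS", 200, 300), ("rRNA", 10, 20)]
print(features_in_flank((0, 100), features, {"tRNA"}))  # [('tRNA', 80, 100)]
```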
This option governs how features in flanking regions are annotated.
Verbose mode: Output additional information about the contents of the GenBank files and the general progress of the extraction.
The following list of GenBank entries contains alpha globins from a wide range of organisms. This example illustrates the annotation of exon and intron regions in protein coding genes.
Instructions: Paste in the list and hit "Submit query".
AB001981 X01831 J00923 J00043 J00044 X01086 X07053 AF098919
This is an example of how to work with an uploaded GenBank file.
Instructions: Download GenBank file NC_001224 (this file contains the yeast mitochondrial chromosome, part of the yeast genome build from SGD). Upload the file to the FeatureExtract server using the "Upload file containing one or more GenBank files" option. Hit "Submit query".
CDS,rRNA,tRNA