Tables illustrating the data format: Sample output.
File containing 270 intron-containing genes from yeast:
yeast_genome.with_introns.tab [1470 kb]
View the TAB file using a text editor (e.g. UltraEdit on Windows, BBEdit on Mac or NEdit on Unix), or import the file into a spreadsheet such as Excel or a database such as MySQL or Access.
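The file can also be read from a script. The following is a minimal Python sketch, not part of the program itself, that splits each line into the four tab-separated fields of the output format (name, seq, ann, com). The names `Entry` and `read_entries` and the demo line are made up for illustration. The sketch assumes that only the comment field may contain further tab characters, which this page does not confirm.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class Entry(NamedTuple):
    """One line of the TAB file (hypothetical helper type)."""
    name: str  # sequence name
    seq: str   # DNA sequence (UPPERCASE main sequence, lowercase flanks)
    ann: str   # single-letter annotation, one letter per sequence position
    com: str   # free-text comment


def read_entries(handle: TextIO) -> Iterator[Entry]:
    """Yield one Entry per non-empty line of a tab-separated file."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        # Split on the first three tabs only, so any tabs inside the
        # comment stay in the comment (an assumption about the format).
        fields = line.split("\t", 3)
        # Pad short lines so that Entry always receives four fields.
        fields += [""] * (4 - len(fields))
        yield Entry(*fields)


# Made-up demo line, not taken from the yeast file.
demo = io.StringIO("demo_gene\tccATGGTAGGTAAtt\t..(EEDIIAEEE)..\tmade-up comment\n")
for entry in read_entries(demo):
    print(entry.name, len(entry.seq), len(entry.ann))  # demo_gene 15 15
```

For a real file, pass an open file handle instead, e.g. `with open("yeast_genome.with_introns.tab") as handle: ...`.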
The output format has one entry per line, with four tab-separated fields:

    name    seq    ann    com

name: The sequence name, as determined by the "Naming preference" option.

seq: The DNA sequence itself. UPPERCASE is used for the main sequence, lowercase for flanks (if any).

ann: Single-letter sequence annotation. The annotation describes the DNA sequence position by position: the first letter of the annotation describes the first position of the DNA sequence, and so forth. The annotation code is defined as follows:

FEATURE BLOCKS (AKA "EXON BLOCKS")
    (     First position
    E     Exon
    T     tRNA exonic region
    R     rRNA / generic RNA exonic region
    P     Promoter
    X     Unknown feature type
    )     Last position
    ?     Ambiguous first or last position
    [     First UTR region position
    3     3'UTR
    5     5'UTR
    ]     Last UTR region position

NOTICE: Custom feature blocks can be defined using the "Custom defined annotation" option.

INTRONS AND FRAMESHIFTS
    D     First intron position (donor site)
    I     Intron position
    A     Last intron position (acceptor site)
    <     Start of frameshift
    F     Frameshift
    >     End of frameshift

REGIONS WITHOUT FEATURES
    .     NULL annotation (no annotation)

ONLY IN FLANKING REGIONS
    +     Other feature defined on the SAME STRAND as the current entry
    -     Other feature defined on the OPPOSITE STRAND relative to the current entry
    #     Multiple or overlapping features
    A..Z  Feature on the SAME STRAND as the current entry
    a..z  Feature on the OPPOSITE STRAND relative to the current entry

NOTICE: The types of features annotated in the flanking regions are determined by the option "Feature types to annotate in flanking regions".

com: Comments (free text). All text, extra information etc. defined in the GenBank files is concatenated into a single comment. The following extra information is added by this program:

*) Strand ("+" or "-").
*) GenBank entry ID ("LOCUS").
*) Feature type (e.g. "CDS" or "rRNA").
*) Spliced DNA sequence. Simply the DNA sequence defined by the JOIN statement. This is provided for two reasons: 1) to overcome negative frameshifts; 2) as an easy way of extracting the sequence of the spliced product.
*) Spliced DNA annotation.
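To illustrate how the annotation field lines up with the sequence, here is a rough Python sketch that locates introns (a "D", any number of "I", then an "A") inside the main sequence and drops them. The function names and the demo entry are invented for illustration. The sketch assumes that the main sequence is one contiguous UPPERCASE run and that no custom annotation letters are in use. It does not handle frameshifts ("<", "F", ">"); for such entries the spliced sequence stored in the comment field is the safer source.

```python
import re
from typing import List, Tuple


def main_region(seq: str) -> Tuple[int, int]:
    """Return the (start, end) span of the UPPERCASE main sequence."""
    match = re.search(r"[A-Z]+", seq)
    if match is None:
        raise ValueError("no uppercase main sequence found")
    return match.span()


def intron_spans(seq: str, ann: str) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans of introns in seq."""
    if len(seq) != len(ann):
        raise ValueError("seq and ann must have the same length")
    start, end = main_region(seq)
    # Search only the main region: in the flanks, the letters A..Z mark
    # other features on the same strand rather than intron positions.
    return [(start + m.start(), start + m.end())
            for m in re.finditer(r"DI*A", ann[start:end])]


def drop_introns(seq: str, ann: str) -> str:
    """Return the main sequence with all intron positions removed."""
    start, end = main_region(seq)
    pieces = []
    pos = start
    for intron_start, intron_end in intron_spans(seq, ann):
        pieces.append(seq[pos:intron_start])
        pos = intron_end
    pieces.append(seq[pos:end])
    return "".join(pieces)


# Made-up demo entry: two flank bases on each side and one 4-base intron.
seq = "ccATGGTAGGTAAtt"
ann = "..(EEDIIAEEE).."
print(intron_spans(seq, ann))  # [(5, 9)]
print(drop_introns(seq, ann))  # ATGGTAA
```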