Output format


Tables illustrating the data format: Sample output.

File containing 270 intron-containing genes from yeast:
yeast_genome.with_introns.tab [1470 kb]

View the TAB file using a text editor (e.g. UltraEdit on Windows, BBEdit on Mac or NEdit on Unix), or import the file into a spreadsheet such as Excel or a database such as MySQL or Access.
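
The TAB file can also be read with a short script. The following is a minimal Python sketch, assuming each non-empty line holds the tab-separated fields name, seq, ann and com, and that any further tabs belong to the comment field. The names Entry and read_entries, and the sample line, are hypothetical and only serve as illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """One line of the TAB file (hypothetical helper type)."""
    name: str
    seq: str
    ann: str
    com: str


def read_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per non-empty line of tab-separated text."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Split on the first three tabs only; assumption: any later
        # tabs are part of the free-text comment field.
        fields = line.split("\t", 3)
        if len(fields) < 3:
            raise ValueError("expected at least name, seq and ann fields")
        com = fields[3] if len(fields) == 4 else ""
        yield Entry(fields[0], fields[1], fields[2], com)


# Made-up sample line, not taken from the real yeast file.
sample = ["GENE1\tacATGGCCTAAgt\t..(EEEEEEE)..\tillustrative comment\n"]
for entry in read_entries(sample):
    print(entry.name, len(entry.seq), len(entry.ann))
```

To read the real file, pass an open file handle instead of the sample list, e.g. read_entries(open("yeast_genome.with_introns.tab")).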


	The output data format uses a scheme with one
	entry per line in the following format (tab separated):
	name	seq	ann	com
	name:	The sequence name, as determined by the "Naming preference" option.
	seq:	The DNA sequence itself. UPPERCASE is used for the
		main sequence, lowercase is used for flanks (if any).
	ann:	Single letter sequence annotation. Position for position
		the annotation describes the DNA sequence: The first
		letter in the annotation describes the annotation for
		the first position in the DNA sequence and so forth.
		The annotation code is defined as follows:
		(	First position
		E	Exon
		T	tRNA exonic region
		R	rRNA / generic RNA exonic region
		P	Promoter
		X	Unknown feature type
		)	Last position
		?	Ambiguous first or last position
		[	First UTR region position
		3	3'UTR
		5	5'UTR
		]	Last UTR region position
			NOTICE: Custom feature blocks can be defined using
			the "Custom defined annotation" option.
		D	First intron position (donor site)
		I	Intron position
		A	Last intron position (acceptor site)
		<	Start of frameshift
		F	Frameshift
		>	End of frameshift
		.	NULL annotation (no annotation).
		+	Other feature defined on the SAME STRAND
			as the current entry.
		-	Other feature defined on the OPPOSITE STRAND
			relative to the current entry.
		#	Multiple or overlapping features.

		A..Z:	Feature on the SAME STRAND as the current entry.
		a..z:	Feature on the OPPOSITE STRAND relative to the current entry.
			Notice: The types of features annotated in the flanking
			regions are determined by the following option:
			"Feature types to annotate in flanking regions"
	com:	Comments (free text). All text, extra information etc.
		defined in the GenBank files is concatenated into a single line.
		The following extra information is added by this program:
		*) Strand ("+" or "-").
		*) GenBank entry ID ("LOCUS").
		*) Feature type (e.g. "CDS" or "rRNA")
		*) Spliced DNA sequence. Simply the DNA sequence defined
		   by the JOIN statement. 
		   This is provided for two reasons. 1) To overcome negative
		   frameshifts. 2) As an easy way of extracting the sequence
		   of the spliced product.
		*) Spliced DNA annotation.
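
Because the annotation describes the sequence position for position, the two strings can be walked in parallel. The following is a minimal Python sketch that cuts a sequence into exon, intron and other runs. It assumes that "(" and ")" occupy the first and last exon positions, that "D" and "A" occupy the first and last intron positions, and it lumps all other codes (UTRs, frameshifts, flanks and so on) into "other". The function names and the example strings are hypothetical.

```python
from itertools import groupby

EXON_CODES = set("(E)")    # first position, exon, last position
INTRON_CODES = set("DIA")  # donor site, intron, acceptor site


def classify(code):
    """Map one annotation letter to a coarse class."""
    if code in EXON_CODES:
        return "exon"
    if code in INTRON_CODES:
        return "intron"
    return "other"


def segments(seq, ann):
    """Return (class, subsequence) pairs for each run of equal class."""
    if len(seq) != len(ann):
        raise ValueError("seq and ann must have the same length")
    result = []
    pos = 0
    for kind, run in groupby(ann, key=classify):
        length = len(list(run))
        result.append((kind, seq[pos:pos + length]))
        pos += length
    return result


def join_exons(seq, ann):
    """Concatenate the exon runs (ignores frameshifts and other codes)."""
    return "".join(part for kind, part in segments(seq, ann) if kind == "exon")


# Made-up example: two exons separated by one intron.
seq = "ATGG" + "GTAAGTTTAG" + "GCTAA"
ann = "(EEE" + "DIIIIIIIIA" + "EEEE)"
for kind, part in segments(seq, ann):
    print(kind, part)
print(join_exons(seq, ann))
```

This sketch does not handle frameshifts, so for real entries the spliced DNA sequence supplied in the comment field is the safer source.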