DTU Health Tech
Department of Health Technology
If you need help with the bioinformatics programs, see the "Getting Help" section below the program.
The EasyGene 1.2 server produces a list of predicted genes given a sequence of prokaryotic DNA. The current version contains models for 138 different organisms. Each prediction is assigned a significance score (R-value) indicating how likely it is to be just a non-coding open reading frame rather than a real gene. All that is required of you as a user is to submit the query sequence(s) and to select the organism model to use.
The pre-calculated EasyGene 1.2 predictions for the complete genomes of the 138 organisms can be downloaded from the BINF EasyGene site at the University of Copenhagen.
Sequence submission: paste the sequence(s) or upload a local file
Restrictions
At most 10,000,000 nucleotides per submission in at most 50 sequences.
Confidentiality
The sequences are kept confidential and will be deleted after processing.
CITATIONS
For publication of results, please cite:
Large-scale prokaryotic gene prediction and comparison
to genome annotation.
P. Nielsen and A. Krogh.
Bioinformatics: 21:4322-4329, 2005.
PMID: 16249266
EasyGene - a prokaryotic gene finder that ranks ORFs by statistical
significance.
Thomas Schou Larsen and Anders Krogh.
BMC Bioinformatics: 4:21, 2003
PMID: 12783628
All the other symbols will be converted to N before processing. The sequences can be input in the following two ways:
Both ways can be employed at the same time: all the specified sequences
will be processed. However, there may be at most 50 sequences and
1,000,000 nucleotides per submission; each sequence not more than
500,000 nucleotides.
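The limits above can be checked locally before submitting. The following is a minimal Python sketch, not part of the server: the names `Limits`, `normalize` and `check_submission` are hypothetical, the default numbers are the ones quoted in the paragraph above, and the set of symbols kept unchanged (`ACGT`) is an assumption, since the allowed alphabet is not spelled out here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Defaults follow the paragraph above; change them if the server states otherwise.
    max_sequences: int = 50
    max_total_nt: int = 1_000_000
    max_per_sequence_nt: int = 500_000


def normalize(seq, keep="ACGT"):
    """Upper-case the sequence and replace every symbol not in `keep` with N.

    The `keep` alphabet is an assumption made for this sketch.
    """
    return "".join(ch if ch in keep else "N" for ch in seq.upper())


def check_submission(sequences, limits=Limits()):
    """Return a list of human-readable problems; an empty list means the limits are met."""
    problems = []
    if len(sequences) > limits.max_sequences:
        problems.append(
            f"{len(sequences)} sequences submitted, at most {limits.max_sequences} allowed"
        )
    total = sum(len(s) for s in sequences)
    if total > limits.max_total_nt:
        problems.append(
            f"{total} nucleotides in total, at most {limits.max_total_nt} allowed"
        )
    for index, seq in enumerate(sequences, start=1):
        if len(seq) > limits.max_per_sequence_nt:
            problems.append(
                f"sequence {index} has {len(seq)} nucleotides, "
                f"at most {limits.max_per_sequence_nt} allowed"
            )
    return problems
```

A typical use would be to call `normalize` on each sequence, pass the resulting list to `check_submission`, and submit only if the returned list is empty. Keeping the limits in one small object makes it easy to swap in a different ceiling.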
At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.
The example below shows the EasyGene 1.2 output for the sequence taken from the GenBank entry AB010576, containing the Bacillus subtilis ComX, ComQ and DegQ genes. All three genes are predicted as annotated in the database (shown in green), with high confidence, although an alternative translation start is preferred for comQ (shown in orange). Two additional genes not annotated in the GenBank entry are also predicted.
##gff-version 2
##source-version easygene-1.2b
##date 2007-08-15
##Type DNA
# model: BS03 Bacillus subtilis
# seqname  model  feature  start  end   score        +/-  ?  startc  odds
# ---------------------------------------------------------------------------------------------
AB010576   BS03   CDS      67     324   0.0271875    +    0  #ATG    20.1861
AB010576   BS03   CDSsub   55     324   0.031955     +    0  #ATG    20.1731
AB010576   BS03   CDS      1129   1269  0.0190622    +    0  #ATG    15.7102
AB010576   BS03   CDS      1370   2314  2.13273e-12  +    0  #ATG    74.7815
AB010576   BS03   CDSsub   1454   2314  1.92405e-12  +    0  #ATG    74.6356
AB010576   BS03   CDS      2327   2491  0.0167943    +    0  #ATG    17.2951
AB010576   BS03   CDS      300    668   1.43511      -    0  #ATG    10.6215
# ---------------------------------------------------------------------------------------------
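Output in this layout can be read with a few lines of Python. The sketch below is illustrative only: the names `Prediction`, `parse_easygene` and `filter_predictions` are hypothetical, the field names are taken from the header line of the example (the column headed `?` is called `frame` here, following the GFF version 2 convention), and the R-value cutoff is left to the caller because no default is stated in this text. A lower R-value means a more significant prediction.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    seqname: str
    model: str
    feature: str
    start: int
    end: int
    r_value: float     # the "score" column
    strand: str        # the "+/-" column
    frame: str         # the "?" column
    start_codon: str   # the "startc" column, without the leading '#'
    odds: float


def parse_easygene(text):
    """Parse EasyGene GFF-style output into a list of Prediction tuples."""
    predictions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, '##' directives and '#' comments
        fields = line.split()
        if len(fields) != 10:
            continue  # not a prediction row in the layout shown above
        predictions.append(
            Prediction(
                fields[0], fields[1], fields[2],
                int(fields[3]), int(fields[4]), float(fields[5]),
                fields[6], fields[7], fields[8].lstrip("#"), float(fields[9]),
            )
        )
    return predictions


def filter_predictions(predictions, max_r, feature="CDS"):
    """Keep rows of the given feature type whose R-value is at most max_r."""
    return [p for p in predictions if p.feature == feature and p.r_value <= max_r]


# Rows copied from the example output above.
SAMPLE = """\
##gff-version 2
# model: BS03 Bacillus subtilis
AB010576 BS03 CDS 67 324 0.0271875 + 0 #ATG 20.1861
AB010576 BS03 CDSsub 55 324 0.031955 + 0 #ATG 20.1731
AB010576 BS03 CDS 1129 1269 0.0190622 + 0 #ATG 15.7102
AB010576 BS03 CDS 1370 2314 2.13273e-12 + 0 #ATG 74.7815
AB010576 BS03 CDSsub 1454 2314 1.92405e-12 + 0 #ATG 74.6356
AB010576 BS03 CDS 2327 2491 0.0167943 + 0 #ATG 17.2951
AB010576 BS03 CDS 300 668 1.43511 - 0 #ATG 10.6215
"""
```

For example, `filter_predictions(parse_easygene(SAMPLE), max_r=0.05)` keeps the four `CDS` rows on the plus strand and drops the minus-strand row with R = 1.43511. The `CDSsub` rows are excluded by default because their meaning is not defined in this text.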
AP02 Aeropyrum pernix
ATW03 Agrobacterium tumefaciens str. C58
AA02 Aquifex aeolicus
AF02 Archaeoglobus fulgidus DSM 4304
BAA03 Bacillus anthracis str. Ames
BCE03 Bacillus cereus ATCC 10987
BH03 Bacillus halodurans
BPS01 Burkholderia pseudomallei K96243
BS03 Bacillus subtilis
BT02 Bacteroides thetaiotaomicron VPI-5482
BBA01 Bdellovibrio bacteriovorus
BL03 Bifidobacterium longum NCC2705
BBR02 Bordetella bronchiseptica
BPA02 Bordetella parapertussis
BPE02 Bordetella pertussis
BJ02 Bradyrhizobium japonicum
BM02 Brucella melitensis
BSU03 Brucella suis 1330
BAS02 Buchnera aphidicola
CJ02 Campylobacter jejuni
CF02 Candidatus Blochmannia floridanus
CC02 Caulobacter crescentus CB15
CM02 Chlamydia muridarum
CPN03 Chlamydia pneumoniae AR39
CT02 Chlamydia trachomatis
CCA02 Chlamydophila caviae GPIC
CTE02 Chlorobium tepidum TLS
CV02 Chromobacterium violaceum ATCC 12472
CA02 Clostridium acetobutylicum ATCC824
CP02 Clostridium perfringens
CTEE02 Clostridium tetani E88
CDI01 Corynebacterium diphtheriae
CEF01 Corynebacterium efficiens YS-314
CG03 Corynebacterium glutamicum ATCC 13032
CB02 Coxiella burnetii RSA 493
DR02 Deinococcus radiodurans
EF02 Enterococcus faecalis V583
ECC02 Escherichia coli CFT073
EC03 Escherichia coli K12
ECE03 Escherichia coli O157:H7 EDL933
ECO02 Escherichia coli O157:H7
FN02 Fusobacterium nucleatum subsp. nucleatum ATCC 2558...
GS01 Geobacter sulfurreducens PCA
GV01 Gloeobacter violaceus
HD02 Haemophilus ducreyi 35000HP
HI02 Haemophilus influenzae Rd
HM01 Haloarcula marismortui ATCC 43049
HS02 Halobacterium sp. NRC-1
HW01 Haloquadratum walsbyi DSM 16790
HP02 Helicobacter pylori 26695
HPJ02 Helicobacter pylori str. J99
LJ01 Lactobacillus johnsonii NCC 533
LP02 Lactobacillus plantarum WCFS1
LL02 Lactococcus lactis subsp. lactis
LIN02 Leptospira interrogans serovar lai str. 56601
LI02 Listeria innocua Clip11262
LM02 Listeria monocytogenes EGD
MLO03 Mesorhizobium loti
MET02 Methanobacterium thermoautotrophicum str. Delta H
MBU01 Methanococcoides burtonii DSM 6242
MJ02 Methanococcus jannaschii
MM01 Methanococcus maripaludis S2
MK02 Methanopyrus kandleri AV19
MTE01 Methanosaeta thermophila PT
MA02 Methanosarcina acetivorans str. C2A
MBA01 Methanosarcina barkeri str. fusaro
MM02 Methanosarcina mazei Goe1
MST01 Methanosphaera stadtmanae DSM 3091
MHU01 Methanospirillum hungatei JF-1
MAP01 Mycobacterium avium subsp. paratuberculosis str. k...
MB02 Mycobacterium bovis subsp. bovis AF2122/97
MT03 Mycobacterium tuberculosis CDC1551
MTH03 Mycobacterium tuberculosis H23Rv
NEQ01 Nanoarchaeum equitans Kin4-M
NP01 Natronomonas pharaonis DSM 2160
NMA02 Neisseria meningitidis serogroup A Z2491
NM02 Neisseria meningitidis serogroup B MC58
NE02 Nitrosomonas europaea
NO02 Nostoc sp. PCC 7120
OI02 Oceanobacillus iheyensis HTE831
OYP01 Onion yellows phytoplasma
PM02 Pasteurella multocida
PL01 Photorhabdus luminescens subsp. laumondii TTO1
PT01 Picrophilus torridus DSM 9790
PI02 Pirellula sp
PG02 Porphyromonas gingivalis W83
PMMI02 Prochlorococcus marinus str. MIT 9313
PMA02 Prochlorococcus marinus subsp marinus CCMP1375
PMM02 Prochlorococcus marinus subsp. pastoris str. CCMP1...
PA02 Pseudomonas aeruginosa PA01
PS02 Pseudomonas syringae pv. tomato str. DC3000
PAE02 Pyrobaculum aerophilum
PRI01 Pyrobaculum islandicum DSM 4184
PAB02 Pyrococcus abyssi
PF02 Pyrococcus furiosus DSM 3638
PH02 Pyrococcus horikoshii
RS02 Ralstonia solanacearum
RPA01 Rhodopseudomonas palustris CGA009
RC02 Rickettsia conorii Malish 7
RP02 Rickettsia prowazekii Madrid E
SE02 Salmonella enterica subsp. enterica serovar Typhi ...
STT02 Salmonella enterica subsp. enterica serovar Typhi ...
STY02 Salmonella typhimurium LT2
SO02 Shewanella oneidensis MR-1
SM02 Sinorhizobium meliloti 1021
SAM03 Staphylococcus aureus MU50
SA02 Staphylococcus aureus subsp aureus N315
SEA02 Staphylococcus epidermidis ATCC 12228
SAG02 Streptococcus agalactiae 2603V/R
SAN02 Streptococcus agalactiae NEM316
SMU02 Streptococcus mutans UA159
SP02 Streptococcus pneumoniae
SPY02 Streptococcus pyogenes
SAV03 Streptomyces avermitilis MA-4680
SC02 Streptomyces coelicolor A3(2)
UC01 Sulfolobus acidocaldarius DSM 639
SS02 Sulfolobus solfataricus
ST02 Sulfolobus tokodaii
SSW02 Synechococcus sp. WH 8102
SPC02 Synechocystis sp. PCC 6803
TT02 Thermoanaerobacter tengcongensis strain MB4T
TK01 Thermococcus kodakarensis KOD1
THP01 Thermofilum pendens Hrk 5
TA02 Thermoplasma acidophilum
TV02 Thermoplasma volcanium
TE02 Thermosynechococcus elongatus BP-1
TM02 Thermotoga maritima
TD01 Treponema denticola ATCC 35405
TP02 Treponema pallidum
TW02 Tropheryma whipplei Twist
VC02 Vibrio cholerae
VP02 Vibrio parahaemolyticus RIMD 2210633
VV02 Vibrio vulnificus
WB02 Wigglesworthia glossinidia endosymbiont of Glossin...
WDM01 Wolbachia endosymbiont of Drosophila melanogaster
XA02 Xanthomonas axonopodis pv. citri str. 306
XC02 Xanthomonas campestris pv. campestris str. ATCC 33...
XF02 Xylella fastidiosa
YP02 Yersinia pestis
Bioinformatics Centre, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark
PMID: 16249266
MOTIVATION: Prokaryotic genomes are sequenced and annotated at an increasing rate. The methods of annotation vary between sequencing groups. It makes genome comparison difficult and may lead to propagation of errors when questionable assignments are adapted from one genome to another. Genome comparison either on a large or small scale would be facilitated by using a single standard for annotation, which incorporates a transparency of why an open reading frame (ORF) is considered to be a gene.
RESULTS: A total of 143 prokaryotic genomes were scored with an updated version of the prokaryotic genefinder EasyGene. Comparison of the GenBank and RefSeq annotations with the EasyGene predictions reveals that in some genomes up to approximately 60% of the genes may have been annotated with a wrong start codon, especially in the GC-rich genomes. The fractional difference between annotated and predicted confirms that too many short genes are annotated in numerous organisms. Furthermore, genes might be missing in the annotation of some of the genomes. We predict 41 of 143 genomes to be over-annotated by >5%, meaning that too many ORFs are annotated as genes. We also predict that 12 of 143 genomes are under-annotated. These results are based on the difference between the number of annotated genes not found by EasyGene and the number of predicted genes that are not annotated in GenBank. We argue that the average performance of our standardized and fully automated method is slightly better than the annotation.
(1) Center for Biological Sequence Analysis BioCentrum, Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
(2) Present address: The Bioinformatics Centre, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen, Denmark
PMID: 12783628
BACKGROUND: Contrary to other areas of sequence analysis, a measure of statistical significance of a putative gene has not been devised to help in discriminating real genes from the masses of random Open Reading Frames (ORFs) in prokaryotic genomes. Therefore, many genomes have too many short ORFs annotated as genes.
RESULTS: In this paper, we present a new automated gene-finding method, EasyGene, which estimates the statistical significance of a predicted gene. The gene finder is based on a hidden Markov model (HMM) that is automatically estimated for a new genome. Using extensions of similarities in Swiss-Prot, a high quality training set of genes is automatically extracted from the genome and used to estimate the HMM. Putative genes are then scored with the HMM, and based on score and length of an ORF, the statistical significance is calculated. The measure of statistical significance for an ORF is the expected number of ORFs in one megabase of random sequence at the same significance level or better, where the random sequence has the same statistics as the genome in the sense of a third order Markov chain.
CONCLUSIONS: The result is a flexible gene finder whose overall performance matches or exceeds other methods. The entire pipeline of computer processing from the raw input of a genome or set of contigs to a list of putative genes with significance is automated, making it easy to apply EasyGene to newly sequenced organisms.
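The null model described in the abstract, random sequence with the same statistics as the genome in the sense of a third order Markov chain, can be illustrated with a short Python sketch. This is not the EasyGene implementation: the function names are hypothetical, the chain is estimated from raw counts without smoothing, and the last function simply rescales an R-value (expected ORFs per megabase of random sequence) to a sequence of a given length.

```python
import random
from collections import Counter, defaultdict


def fit_markov(seq, order=3):
    """Count, for each context of `order` symbols, which symbol follows it."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - order):
        counts[seq[i:i + order]][seq[i + order]] += 1
    return counts


def sample_markov(counts, length, order=3, rng=None):
    """Draw a random sequence whose order-`order` transition statistics follow `counts`."""
    rng = rng or random.Random()
    contexts = sorted(counts)
    out = list(rng.choice(contexts))  # start from a context seen in the data
    while len(out) < length:
        followers = counts.get("".join(out[-order:]))
        if not followers:
            # Context never seen with a successor: restart from a known context.
            out.extend(rng.choice(contexts))
            continue
        symbols = sorted(followers)
        weights = [followers[s] for s in symbols]
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out[:length])


def expected_random_orfs(r_value, sequence_length_nt):
    """Scale an R-value (expected ORFs per megabase) to a sequence of the given length."""
    return r_value * sequence_length_nt / 1_000_000
```

Under this reading of the definition, accepting every ORF with R at most 2 in a 4,000,000 nucleotide genome would be expected to let through about `expected_random_orfs(2.0, 4_000_000)` random ORFs, that is, about 8. Passing a seeded `random.Random` to `sample_markov` makes the generated sequence reproducible.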
If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).
If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
Correspondence:
Technical Support: