DTU Health Tech
Department of Health Technology
If you need help with the bioinformatics programs, see the "Getting Help" section below the program.
RevTrans takes a set of DNA sequences, virtually translates them, aligns the peptide sequences, and uses this as a scaffold for constructing the corresponding DNA multiple alignment.
New in RevTrans 2.0: Integration with Virtual Ribosome for translation and ORF finding, visualization of alignments using JalView, more alignment programs (MAFFT, T-COFFEE, Dialign 2, Dialign-T and ClustalW2), and an improved tab-based interface.
Confidentiality:
The sequences are kept confidential and will be deleted
after processing.
For publication of results, please cite:
Rasmus Wernersson and Anders Gorm Pedersen.
RevTrans - Constructing alignments of coding DNA from aligned amino acid sequences.
Nucl. Acids Res., 2003, 31(13), 3537-3539.
Paste in or upload DNA sequences and hit "Submit query". The RevTrans server will then virtually translate the DNA sequences and align the resulting peptide sequences using MAFFT with default settings (another alignment program can be selected). Finally, RevTrans constructs a multiple DNA alignment using the peptide alignment as a scaffold.
If you want more control over the alignment process, RevTrans also accepts user-provided peptide alignments. This gives you the opportunity to use your preferred alignment software and to optimize the parameters. If you need to translate your DNA sequences prior to alignment, this can be done by using the "Translate only" button (or by following the link to the "Virtual Ribosome" server if you want more fine-grained control over the translation process). The translation has full support for degenerate nucleotides, and alternative translation tables can be selected.
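The degenerate-nucleotide support can be pictured with the sketch below. It is an illustration only, not the Virtual Ribosome implementation: it assumes the standard genetic code (NCBI table 1), reading frame 1, and the convention that a degenerate codon is reported as X unless all of its expansions encode the same amino acid. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide codes mapped to the bases they may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Return the amino acid if every expansion of the codon agrees, else 'X'."""
    options = [IUPAC[base] for base in codon.upper()]
    amino_acids = {CODON_TABLE["".join(bases)] for bases in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(dna):
    """Translate reading frame 1, ignoring an incomplete trailing codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))
```

With this convention, a codon such as CTN still translates to leucine because all four expansions encode L, while NNN becomes X.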
When providing your own peptide alignment, RevTrans will accept arbitrarily large input files.
The full IUPAC degenerate DNA alphabet (not case sensitive) is supported.
Please note that gaps and unknown symbols (e.g. - and X) will be discarded before processing.
For greater control of the alignment process you have the option of also supplying a pre-computed peptide alignment. RevTrans will then use this as the scaffold for the DNA alignment. The peptide alignment must be in FASTA, MSF, or ALN (Clustal) format.
By default "-", "." and "~" will be interpreted as gap symbols.
If a peptide alignment is not supplied, the RevTrans web server will automatically construct one using the selected multiple alignment program (default: MAFFT). In all cases the alignment program will be run with default parameters.
Click on the "Submit query" button. If the processing of the query takes more than a few seconds, you will get the option of supplying your email address and being notified when the job is done.
RevTrans has support for a number of advanced options. Typically it is not necessary to set these manually and most users can safely skip this section and proceed to submitting the query.
Data format, DNA sequences:
By default the DNA file format is automatically detected. Alternatively you may specify
the format as being FASTA, MSF, or ALN (Clustal).
Data format, aligned peptide sequences:
By default the peptide alignment file format is automatically detected. Alternatively you may specify
the format as being FASTA, MSF, or ALN (Clustal).
Output format:
By default the final multiple DNA alignment will be in ALN (Clustal) format.
Alternatively you may specify FASTA or MSF.
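How the server detects file formats is not described here. As a rough illustration, the three formats can usually be told apart by their opening lines. The sketch below is an assumption-laden heuristic, not the RevTrans detection logic, and the function name guess_format is hypothetical.

```python
def guess_format(text):
    """Guess 'fasta', 'aln' or 'msf' from the file contents; None if unsure."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return None
    first = lines[0]
    # FASTA entries begin with a ">" header line.
    if first.startswith(">"):
        return "fasta"
    # Clustal (ALN) files begin with a line starting with the word CLUSTAL.
    if first.upper().startswith("CLUSTAL"):
        return "aln"
    # GCG MSF files carry a header line containing "MSF:" followed by a "//" divider.
    if any("MSF:" in line for line in lines) and "//" in lines:
        return "msf"
    return None
```

Specifying the format explicitly remains the safer choice when a file deviates from these conventions.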
Gap-In:
Here you can specify which symbol(s) denote(s) a gap in a user-provided peptide alignment.
The default should be correct for virtually all standard alignment files.
Gap-out:
Here you can specify which gap symbol to use in the output.
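The effect of the two gap options can be sketched as a simple character mapping. This is an illustration under assumptions: the default Gap-In set is taken to be "-", "." and "~" as stated earlier, the default output symbol is assumed to be "-", and normalize_gaps is a hypothetical helper name.

```python
def normalize_gaps(aligned, gap_in="-.~", gap_out="-"):
    """Rewrite every recognised input gap symbol as the chosen output symbol."""
    return "".join(gap_out if symbol in gap_in else symbol for symbol in aligned)
```

For example, an MSF-style row such as MA..LW~~ would be rewritten as MA--LW-- with the defaults.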
Match DNA and peptide sequences by:
This option gives the user control over how DNA sequences are paired with their peptide
counterparts.
Translation:
(Default) The DNA sequences are translated using the standard genetic code
(or an alternative translation table if selected below)
with full IUPAC support and compared to the peptide sequences. The DNA sequence
is paired with the first matching peptide sequence found.
Name:
DNA sequences are paired with peptide counterparts based on sequence entry names.
Entry names must be unique within files and identical across files.
If you experience trouble when using name-based matching, please make sure
that sequence names do match across files, as some alignment software
may truncate or otherwise alter sequence names.
Position:
DNA sequences are paired with peptide counterparts simply based on their order
of appearance in the files.
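The three pairing modes can be sketched as follows. This is a minimal illustration, not the RevTrans code: entries are assumed to be (name, sequence) tuples, the translation function is supplied by the caller, and a trailing stop codon is assumed to be ignored when comparing a translation to a peptide. All function names are hypothetical.

```python
def strip_gaps(aligned, gaps="-.~"):
    """Remove gap symbols from an aligned peptide sequence."""
    return "".join(symbol for symbol in aligned if symbol not in gaps)


def match_by_name(dna, peptides):
    """Pair entries whose names are identical across the two files."""
    peptide_by_name = dict(peptides)
    return [(seq, peptide_by_name[name]) for name, seq in dna]


def match_by_position(dna, peptides):
    """Pair entries by their order of appearance."""
    if len(dna) != len(peptides):
        raise ValueError("the files contain different numbers of sequences")
    return [(d[1], p[1]) for d, p in zip(dna, peptides)]


def match_by_translation(dna, peptides, translate):
    """Pair each DNA entry with the first peptide equal to its translation."""
    pairs = []
    for name, seq in dna:
        protein = translate(seq).rstrip("*")  # assumption: ignore a final stop
        for _, aligned in peptides:
            if strip_gaps(aligned).upper() == protein:
                pairs.append((seq, aligned))
                break
        else:
            raise ValueError("no peptide matches the translation of " + name)
    return pairs
```

Matching by translation tolerates renamed or reordered entries, which is one reason it is a sensible default; name and position matching are cheaper but rely on the files being consistent.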
Translation table:
Select an alternative translation table - used for "matching-by-translation" and
with the "Translate Only" functionality.
The numbering of the translation tables is the one defined by the NCBI Taxonomy Group. For a detailed description of each genetic code, please consult the NCBI web page "The Genetic Codes" (main site: Taxonomy).
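To illustrate what selecting another table changes, the sketch below builds codon tables from the compact 64-letter strings used in the NCBI listing (codons enumerated with base order T, C, A, G). Only tables 1 (standard) and 2 (vertebrate mitochondrial) are included; the helper names are hypothetical and this is not the server's code.

```python
BASES = "TCAG"

# Amino acid strings as listed by NCBI for two of the genetic codes.
NCBI_TABLES = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
}


def codon_table(table_id):
    """Return a codon -> amino acid dictionary for the given NCBI table number."""
    amino_acids = NCBI_TABLES[table_id]
    return {
        a + b + c: amino_acids[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)
    }


def differing_codons(first_id, second_id):
    """List the codons that are read differently by the two tables."""
    first, second = codon_table(first_id), codon_table(second_id)
    return sorted(codon for codon in first if first[codon] != second[codon])
```

Tables 1 and 2 differ in only four codons (AGA, AGG, ATA and TGA), but a single reassigned stop codon such as TGA is enough to make matching by translation fail if the wrong table is selected.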
Alignment method
(New in RevTrans 1.4)
RevTrans offers a selection of programs for performing the peptide alignment step:
Dialign 2.2:
Reference:
B. Morgenstern (1999).
DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment.
Bioinformatics 15, 211 - 218.
Dialign-T 0.1.3:
Reference:
Amarendran R. Subramanian, Jan Weyer-Menkhoff, Michael Kaufmann, Burkhard Morgenstern:
DIALIGN-T: An improved algorithm
for segment-based multiple sequence alignment
BMC Bioinformatics 2005, 6:66.
ClustalW 1.83:
Reference:
Thompson J. D., Higgins D. G., Gibson T. J. (1994).
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through
sequence weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673-4680.
>Sheep
ATGGCCCTGTGGACACGCCTGGTGCCCCTGCTGGCCCTGCTGGCACTCTGGGCCCCCGCC
CCGGCCCACGCCTTCGTCAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC
CTGGTGTGCGGAGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGCCGGGAGGTGGAGGGC
CCCCAGGTGGGGGCGCTGGAGCTGGCCGGAGGCCCCGGCGCGGGTGGCCTGGAGGGGCCC
CCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG
GAGAACTACTGTAACTAG
>Pig
ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCC
CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC
CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC
CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG
GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC
TACCAGCTGGAGAACTACTGCAACTAG
>Dog
ATGGCCCTCTGGATGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCG
CCCACCCGAGCCTTCGTTAACCAGCACCTGTGTGGCTCCCACCTGGTAGAGGCTCTGTAC
CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCTAAGGCCCGCAGGGAGGTGGAGGAC
CTGCAGGTGAGGGACGTGGAGCTGGCCGGGGCGCCTGGCGAGGGCGGCCTGCAGCCCCTG
GCCCTGGAGGGGGCCCTGCAGAAGCGAGGCATCGTGGAGCAGTGCTGCACCAGCATCTGC
TCCCTCTACCAGCTGGAGAATTACTGCAACTAG
>OwlMonkey
ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCCGAG
CCAGCCCCGGCCTTTGTGAACCAGCACCTGTGCGGCCCCCACCTGGTGGAAGCCCTCTAC
CTGGTGTGCGGGGAGCGAGGTTTCTTCTACGCACCCAAGACCCGCCGGGAGGCGGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGTGGGGGCTCTATCACGGGCAGCCTGCCACCCTTG
GAGGGTCCCATGCAGAAGCGTGGCGTCGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC
TACCAGCTGCAGAACTACTGCAACTAG
>Human
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC
CCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTAC
CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG
GCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGC
TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
>GreenMonkey
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC
CCGGTCCCGGCCTTTGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAAGCCCTCTAC
CTGGTGTGCGGGGAGCGAGGCTTCTTCTACACGCCCAAGACCCGCCGGGAGGCAGAGGAC
CCGCAGGTGGGGCAGGTAGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTG
GCGCTGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGTACCAGCATCTGC
TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
>Chimp
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGTGCTGCTGGCCCTCTGGGGACCTGAC
CCAGCCTCGGCCTTTGTGAACCAACACCTGTGCGGCTCCCACCTGGTGGAAGCTCTCTAC
CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG
GCCCTGGAGGGGTCCCTGCAGAAGCGTGGTATCGTGGAACAATGCTGTACCAGCATCTGC
TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
>GuineaPig
ATGGCTCTGTGGATGCATCTCCTCACCGTGCTGGCCCTGCTGGCCCTCTGGGGGCCCAAC
ACTAATCAGGCCTTTGTCAGCCGGCATCTGTGCGGCTCCAACTTAGTGGAGACATTGTAT
TCAGTGTGTCAGGATGATGGCTTCTTCTATATACCCAAGGACCGTCGGGAGCTAGAGGAC
CCACAGGTGGAGCAGACAGAACTGGGCATGGGCCTGGGGGCAGGTGGACTACAGCCCTTG
GCACTGGAGATGGCACTACAGAAGCGTGGCATTGTGGATCAGTGCTGTACTGGCACCTGC
ACACGCCACCAGCTGCAGAGCTACTGCAACTAG
>Mouse
ATGGCCCTGTTGGTGCACTTCCTACCCCTGCTGGCCCTGCTTGCCCTCTGGGAGCCCAAA
CCCACCCAGGCTTTTGTCAAACAGCATCTTTGTGGTCCCCACCTGGTAGAGGCTCTCTAC
CTGGTGTGTGGGGAGCGTGGCTTCTTCTACACACCCAAGTCCCGCCGTGAAGTGGAGGAC
CCACAAGTGGAACAACTGGAGCTGGGAGGAAGCCCCGGGGACCTTCAGACCTTGGCGTTG
GAGGTGGCCCGGCAGAAGCGTGGCATTGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC
TACCAGCTGGAGAACTACTGCAACTAA
>Chicken
ATGGCTCTCTGGATCCGATCACTGCCTCTTCTGGCTCTCCTTGTCTTTTCTGGCCCTGGA
ACCAGCTATGCAGCTGCCAACCAGCACCTCTGTGGCTCCCACTTGGTGGAGGCTCTCTAC
CTGGTGTGTGGAGAGCGTGGCTTCTTCTACTCCCCCAAAGCCCGACGGGATGTCGAGCAG
CCCCTAGTGAGCAGTCCCTTGCGTGGCGAGGCAGGAGTGCTGCCTTTCCAGCAGGAGGAA
TACGAGAAAGTCAAGCGAGGGATTGTTGAGCAATGCTGCCATAACACGTGTTCCCTCTAC
CAACTGGAGAACTACTGCAACTAG

>Sheep
MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEG
PQVGALELAGGPGAGG-----LEGPPQKRGIVEQCCAGVCSLYQLENYCN
>Pig
MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN
PQAGAVELGGGLG--GLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN
>Dog
MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVED
LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN
>OwlMonkey
MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAED
LQVGQVELGGGSITGSLPP--LEGPMQKRGVVDQCCTSICSLYQLQNYCN
>Human
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>GreenMonkey
MALWMRLLPLLALLALWGPDPVPAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>Chimp
MALWMRLLPLLVLLALWGPDPASAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>GuineaPig
MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELED
PQVEQTELGMGLGAGGLQPLALEMALQKRGIVDQCCTGTCTRHQLQSYCN
>Mouse
MALLVHFLPLLALLALWEPKPTQAFVKQHLCGPHLVEALYLVCGERGFFYTPKSRREVED
PQVEQLELGGSPG--DLQTLALEVARQKRGIVDQCCTSICSLYQLENYCN
>Chicken
MALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGERGFFYSPKARRDVEQ
PLVSS-PLRGEAG--VLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN
By default the DNA alignment will be in ALN (Clustal) format. The user may select FASTA or MSF instead. The full alignment is shown in the browser window and is also available for download via a link at the top of the page (click on the floppy icon).
In the event of errors and warnings these will be shown at the top of the result page.
New in RevTrans 1.3: the case of the individual letters in the amino acid alignments is now carried over to the DNA alignment. Dialign2 marks amino acids considered to be fully aligned with UPPERCASE and uses lowercase to mark less well aligned regions.
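The case carry-over can be pictured as below. This is a sketch under assumptions, not the RevTrans implementation: the aligned DNA is assumed to hold exactly three columns per peptide column, and carry_case is a hypothetical helper name.

```python
def carry_case(aligned_peptide, aligned_dna):
    """Copy the case of each aligned amino acid onto its codon."""
    if len(aligned_dna) != 3 * len(aligned_peptide):
        raise ValueError("expected three DNA columns per peptide column")
    codons = []
    for index, amino_acid in enumerate(aligned_peptide):
        codon = aligned_dna[3 * index:3 * index + 3]
        # Lowercase residues (less well aligned) give lowercase codons.
        codons.append(codon.lower() if amino_acid.islower() else codon.upper())
    return "".join(codons)
```

Gap columns are unaffected, since gap symbols have no case.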
Why is it problematic to align DNA sequences of protein encoding genes?
First, if you align coding DNA at the DNA level, then you are in effect
ignoring your prior knowledge of the structure of the genetic code. Second,
you are also ignoring the known evolutionary tendency of amino acids to
be substituted with other amino acids that have similar physico-chemical
properties. An example should make this clear:
Codon-aligned:        DNA-aligned:
 M   L   L   I   G
ATG CTG TTA ATA GGG   ATGCT-GTTAATAGGG
ATG CTC GTT AAT GGG   ATGCTCGTTAAT-GGG
 M   L   V   N   G
In the context of the genetic code, it makes perfect sense to align CTG and CTC which both encode the amino acid leucine. However, from a "DNA point of view" it makes more sense to insert a gap so the terminal G in this codon aligns with the first G in the next codon. It is also acceptable to align the codons TTA (encoding leucine) and GTT (encoding valine) since the encoded amino acids have similar properties (they are both hydrophobic).
Note: these observations also hold true for database searches. Always use a translated version of your coding sequence to search for similar genes!
Contact
Rasmus Wernersson: raz@cbs.dtu.dk
Anders Gorm Pedersen: gorm@cbs.dtu.dk
The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62.
Taken together with the generally higher rate of synonymous mutations over non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans.
RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic analysis. RevTrans also accepts user-provided protein alignments for greater control of the alignment process.
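Step (iii) can be sketched as threading the original codons onto one row of the peptide alignment. The code below is a minimal illustration under assumptions, not the RevTrans source: the DNA is assumed to be ungapped and in frame, any DNA beyond the last aligned residue (such as a final stop codon) is simply left out, and thread_dna is a hypothetical name.

```python
def thread_dna(aligned_peptide, dna, gap_in="-.~", gap_out="-"):
    """Build one row of the DNA alignment from one row of the peptide alignment."""
    aligned = []
    position = 0  # index of the next unused codon in the DNA sequence
    for symbol in aligned_peptide:
        if symbol in gap_in:
            # A peptide gap becomes a gap of three columns, i.e. a whole codon.
            aligned.append(gap_out * 3)
        else:
            codon = dna[position:position + 3]
            if len(codon) < 3:
                raise ValueError("DNA is too short for the peptide alignment")
            aligned.append(codon)
            position += 3
    return "".join(aligned)
```

Because every peptide column expands to exactly three DNA columns, the rows of the result have equal length whenever the peptide rows do, and gaps can only occur in groups of three.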
If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).
If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
Correspondence:
Technical Support: