RevTrans 2.0 - DTU Health Tech - Bioinformatics Tools

Submission

Paste in DNA sequences

Upload file containing DNA sequences

View example DNA sequences

Optional: Paste in peptide alignment

Optional: Upload peptide alignment

View example peptide alignment

Valid formats: FASTA, MSF and ALN (Clustal). Any gaps will be removed from the DNA sequences.

Instructions: Paste in or upload DNA sequences and hit "Submit query". The RevTrans server will then virtually translate the sequences and align the resulting peptide sequences using MAFFT with default settings. Finally, RevTrans constructs a multiple DNA alignment using the peptide alignment as a scaffold. By default, translation uses the Standard Genetic Code (alternative translation tables can be selected) and supports the full IUPAC alphabet of degenerate nucleotides.
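The translation step can be sketched in Python as follows. This is a minimal illustration, not RevTrans's own code: it assumes that a degenerate codon is translated by expanding every IUPAC letter into the bases it stands for and returning the amino acid only when all expansions agree (otherwise X). The 64-letter table strings use NCBI's compact notation for table 1 (Standard) and table 2 (Vertebrate Mitochondrial); the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide codes and the concrete bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# 64-letter amino acid strings in NCBI order (bases T, C, A, G; first base varies slowest).
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"   # table 1
VERT_MITO_CODE = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"  # table 2


def codon_table(aas=STANDARD_CODE):
    """Build a codon -> amino acid dict from a 64-letter NCBI-style string."""
    codons = ("".join(c) for c in product("TCAG", repeat=3))
    return dict(zip(codons, aas))


def translate_codon(codon, table):
    """Translate one codon that may contain IUPAC ambiguity codes.

    Every concrete codon the letters could stand for is translated; if they
    all agree, that amino acid is returned, otherwise 'X' (unresolvable).
    """
    choices = [IUPAC[base] for base in codon.upper()]
    residues = {table["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(dna, aas=STANDARD_CODE):
    """Translate DNA codon by codon, ignoring a trailing partial codon."""
    table = codon_table(aas)
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3], table) for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate("ATGYTRMGRRAYTAA"))             # MLRX*  (YTR and MGR resolve, RAY does not)
    print(translate("ATATGA", aas=VERT_MITO_CODE))  # MW     (ATA and TGA under table 2)
```

With this approach, a codon such as YTR still translates to leucine because all four possible codons (CTA, CTG, TTA, TTG) encode leucine, whereas RAY could be either asparagine or aspartate and becomes X.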

If you want more control over the alignment process, RevTrans also accepts user-provided peptide alignments. This lets you use your preferred alignment software and optimize its parameters. If you need to translate your DNA sequences prior to alignment, use the "Translate only" button above.

Please read the CBS access policies for information about limitations on the daily number of submissions.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.

CITATIONS

For publication of results, please cite:

Rasmus Wernersson and Anders Gorm Pedersen.
RevTrans - Constructing alignments of coding DNA from aligned amino acid sequences.
Nucl. Acids Res., 2003, 31(13), 3537-3539.

Usage instructions

Quick start

Paste in or upload DNA sequences and hit "Submit query". The RevTrans server will then virtually translate the DNA sequences and align the resulting peptide sequences using MAFFT with default settings (other alignment programs can be selected). Finally, RevTrans constructs a multiple DNA alignment using the peptide alignment as a scaffold.
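The final "reverse translation" step can be illustrated with a short Python sketch. It assumes the DNA and peptide entries have already been paired by name and that the peptide rows use the default gap symbols. `thread_dna` and `reverse_translate` are hypothetical names, and dropping leftover nucleotides (such as a stop codon missing from the peptide) is a choice made for this sketch, not documented RevTrans behavior.

```python
GAP_IN = "-.~"  # symbols treated as gaps in the peptide alignment
GAP_OUT = "-"   # symbol written into the DNA alignment


def thread_dna(dna, aligned_pep, gap_in=GAP_IN, gap_out=GAP_OUT):
    """Lay the codons of `dna` under the residues of one aligned peptide row.

    Each residue consumes the next codon and each peptide gap becomes three
    DNA gap symbols, so every gap in the result spans a whole codon.
    """
    columns = []
    pos = 0
    for residue in aligned_pep:
        if residue in gap_in:
            columns.append(gap_out * 3)
            continue
        codon = dna[pos:pos + 3]
        if len(codon) < 3:
            raise ValueError("peptide row is longer than the DNA translation")
        columns.append(codon)
        pos += 3
    # Leftover nucleotides (e.g. a stop codon the aligned peptide omits)
    # are not emitted, so all rows keep the same length.
    return "".join(columns)


def reverse_translate(dna_seqs, pep_alignment):
    """Build a DNA alignment; both dicts are keyed by the same entry names."""
    return {name: thread_dna(dna_seqs[name], row) for name, row in pep_alignment.items()}


if __name__ == "__main__":
    dna = {"a": "ATGCTGTTAATAGGG", "b": "ATGCTCGGGTAG"}
    peptides = {"a": "MLLIG", "b": "ML--G"}
    for name, row in reverse_translate(dna, peptides).items():
        print(name, row)  # a ATGCTGTTAATAGGG / b ATGCTC------GGG
```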

If you want more control over the alignment process, RevTrans also accepts user-provided peptide alignments. This lets you use your preferred alignment software and optimize its parameters. If you need to translate your DNA sequences prior to alignment, use the "Translate only" button (or follow the link to the "Virtual Ribosome" server if you want more fine-grained control over the translation process). The translation fully supports degenerate nucleotides, and alternative translation tables can be selected.

When providing your own peptide alignment, RevTrans will accept arbitrarily large input files.

Detailed instructions

Supply DNA sequences

The DNA sequences must either be pasted into the webpage or uploaded via the "Choose file" button. The input must be in FASTA, MSF, or ALN (Clustal) format.
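For FASTA input, a minimal reader might look like the Python sketch below (MSF and ALN parsing is not covered). Taking the entry name as the header text up to the first whitespace is an assumption of this sketch, as is the hypothetical `read_fasta` name. Entry order is preserved, which matters for position-based matching.

```python
def read_fasta(text):
    """Parse FASTA text into an ordered {name: sequence} dict.

    The entry name is the header text up to the first whitespace. Duplicate
    names are rejected because name-based matching needs unique names.
    """
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            if name in records:
                raise ValueError(f"duplicate entry name: {name!r}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


if __name__ == "__main__":
    print(read_fasta(">Sheep\nATGGCC\nCTG\n>Pig some description\nATG\n"))
```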

The full IUPAC degenerate DNA alphabet (not case-sensitive) is supported:

A C G T R Y M K W S B D H V N

Please note that gaps and unknown symbols (e.g. "-" and "X") will be discarded before processing.
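The clean-up described above amounts to a filter like this hypothetical `clean_dna` helper (a sketch; RevTrans may implement it differently):

```python
IUPAC_DNA = frozenset("ACGTRYMKWSBDHVN")  # the alphabet listed above


def clean_dna(seq):
    """Uppercase `seq` and drop anything outside the IUPAC DNA alphabet.

    This removes gap symbols such as '-', unknown symbols such as 'X',
    and any whitespace.
    """
    return "".join(ch for ch in seq.upper() if ch in IUPAC_DNA)


if __name__ == "__main__":
    print(clean_dna("atg-cXtg ~n"))  # ATGCTGN
```

Because characters are deleted rather than replaced, an X that stood for an unknown nucleotide shifts the reading frame of everything downstream, so input containing such symbols is worth checking before submission.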

Optional: Supply aligned peptide sequences

For greater control of the alignment process you have the option of also supplying a pre-computed peptide alignment. RevTrans will then use this as the scaffold for the DNA alignment. The peptide alignment must be in FASTA, MSF, or ALN (Clustal) format.

By default "-", "." and "~" will be interpreted as gap symbols.
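Normalizing the accepted gap symbols, and recovering the ungapped peptide for comparison with a DNA translation, can be sketched as follows. `normalize_gaps` and `ungap` are hypothetical helpers using the default gap set listed above.

```python
GAP_IN = "-.~"  # default gap symbols accepted in peptide alignments


def normalize_gaps(row, gap_in=GAP_IN, gap_out="-"):
    """Rewrite every accepted gap symbol in an aligned row as `gap_out`."""
    return "".join(gap_out if ch in gap_in else ch for ch in row)


def ungap(row, gap_in=GAP_IN):
    """Strip all gap symbols, recovering the raw peptide sequence."""
    return "".join(ch for ch in row if ch not in gap_in)


if __name__ == "__main__":
    row = "PLVSS-PLRGEAG..VLP~"
    print(normalize_gaps(row))  # PLVSS-PLRGEAG--VLP-
    print(ungap(row))           # PLVSSPLRGEAGVLP
```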

If a peptide alignment is not supplied, the RevTrans web server will automatically construct one using the selected multiple alignment program (default: MAFFT). In all cases the alignment program will be run with default parameters.

Submit query

Click on the "Submit query" button. If processing the query takes more than a few seconds, you will get the option of supplying your email address to be notified when the job is done.

Advanced options

RevTrans has support for a number of advanced options. Typically it is not necessary to set these manually and most users can safely skip this section and proceed to submitting the query.

Data format, DNA sequences:
By default the DNA file format is automatically detected. Alternatively you may specify the format as being FASTA, MSF, or ALN (Clustal).
Data format, aligned peptide sequences:
By default the peptide alignment file format is automatically detected. Alternatively you may specify the format as being FASTA, MSF, or ALN (Clustal).
Output format:
By default the final multiple DNA alignment will be in ALN (Clustal) format. Alternatively you may specify FASTA or MSF.
Gap-In:
Here you can specify which symbol(s) denote(s) a gap in a user-provided peptide alignment. The default should be correct for virtually all standard alignment files.
Gap-out:
Here you can specify which gap symbol to use in the output.
Match DNA and peptide sequences by:
This option controls how DNA sequences are paired with their peptide counterparts.
- Translation:
  (Default) The DNA sequences are translated using the standard genetic code (or an alternative translation table if selected below) with full IUPAC support and compared to the peptide sequences. The DNA sequence is paired with the first matching peptide sequence found.
- Name:
  DNA sequences are paired with peptide counterparts based on sequence entry names. Entry names must be unique within files and identical across files. If you experience trouble when using name-based matching, please make sure that sequence names match across files, as some alignment software may truncate or otherwise alter sequence names.
- Position:
  DNA sequences are paired with peptide counterparts simply based on their order of appearance in the files.
Translation table:
Select an alternative translation table, used for matching by translation and by the "Translate only" function.
The translation tables are numbered as defined by the NCBI Taxonomy Group. For a detailed description of each genetic code, please consult the NCBI web page "The Genetic Codes".
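The three matching modes can be sketched in Python as below. The function names are hypothetical, translation here uses only the Standard Genetic Code without degenerate-nucleotide support, and removing a terminal stop (*) before comparing is an assumption made because the example peptide alignment on this page omits stop codons. Upper-casing the peptide lets lowercase regions (as produced by Dialign) still match.

```python
from itertools import product

# Standard Genetic Code (NCBI table 1); codons enumerated in TCAG order.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), STANDARD_CODE))
GAPS = "-.~"


def translate(dna):
    """Plain codon-by-codon translation; codons not in the table become 'X'."""
    return "".join(CODONS.get(dna[i:i + 3].upper(), "X") for i in range(0, len(dna) - 2, 3))


def ungap(row):
    return "".join(ch for ch in row if ch not in GAPS)


def match_sequences(dna, peptides, mode="translation"):
    """Pair DNA entries with peptide rows; both args are ordered {name: seq} dicts.

    Returns a list of (dna_name, peptide_name) tuples.
    """
    if mode == "position":
        if len(dna) != len(peptides):
            raise ValueError("the files contain different numbers of entries")
        return list(zip(dna, peptides))
    if mode == "name":
        missing = [name for name in dna if name not in peptides]
        if missing:
            raise KeyError(f"no peptide entries named {missing}")
        return [(name, name) for name in dna]
    if mode == "translation":
        pairs = []
        for dna_name, seq in dna.items():
            protein = translate(seq).rstrip("*")  # assumption: ignore a terminal stop
            # Take the first peptide whose ungapped, upper-cased sequence matches.
            hit = next((p for p, row in peptides.items() if ungap(row).upper() == protein), None)
            if hit is None:
                raise ValueError(f"no peptide matches the translation of {dna_name}")
            pairs.append((dna_name, hit))
        return pairs
    raise ValueError(f"unknown matching mode: {mode!r}")


if __name__ == "__main__":
    dna = {"d1": "ATGCTGTAA", "d2": "ATGGGG"}
    peptides = {"p1": "MG-", "p2": "ml-"}
    print(match_sequences(dna, peptides))              # [('d1', 'p2'), ('d2', 'p1')]
    print(match_sequences(dna, peptides, "position"))  # [('d1', 'p1'), ('d2', 'p2')]
```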

Alignment method
(New in RevTrans 1.4) RevTrans offers a selection of programs for performing the peptide alignment step:

Dialign 2.2:

Reference:
B. Morgenstern (1999).
DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment.
Bioinformatics 15, 211-218.
Dialign-T 0.1.3:

Reference:
Amarendran R. Subramanian, Jan Weyer-Menkhoff, Michael Kaufmann, Burkhard Morgenstern:
DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment
BMC Bioinformatics 2005, 6:66.
ClustalW 1.83:

Reference:
Thompson J. D., Higgins D. G., Gibson T. J. (1994).
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
Nucleic Acids Res. 22:4673-4680.

Example data

Sample DNA dataset

The following is a set of unaligned insulin genes from a range of organisms.

>Sheep
ATGGCCCTGTGGACACGCCTGGTGCCCCTGCTGGCCCTGCTGGCACTCTGGGCCCCCGCC
CCGGCCCACGCCTTCGTCAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC
CTGGTGTGCGGAGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGCCGGGAGGTGGAGGGC
CCCCAGGTGGGGGCGCTGGAGCTGGCCGGAGGCCCCGGCGCGGGTGGCCTGGAGGGGCCC
CCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG
GAGAACTACTGTAACTAG
>Pig
ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCC
CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC
CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC
CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG
GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC
TACCAGCTGGAGAACTACTGCAACTAG
>Dog
ATGGCCCTCTGGATGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCG
CCCACCCGAGCCTTCGTTAACCAGCACCTGTGTGGCTCCCACCTGGTAGAGGCTCTGTAC
CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCTAAGGCCCGCAGGGAGGTGGAGGAC
CTGCAGGTGAGGGACGTGGAGCTGGCCGGGGCGCCTGGCGAGGGCGGCCTGCAGCCCCTG
GCCCTGGAGGGGGCCCTGCAGAAGCGAGGCATCGTGGAGCAGTGCTGCACCAGCATCTGC
TCCCTCTACCAGCTGGAGAATTACTGCAACTAG
>OwlMonkey
ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCCGAG
CCAGCCCCGGCCTTTGTGAACCAGCACCTGTGCGGCCCCCACCTGGTGGAAGCCCTCTAC
CTGGTGTGCGGGGAGCGAGGTTTCTTCTACGCACCCAAGACCCGCCGGGAGGCGGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGTGGGGGCTCTATCACGGGCAGCCTGCCACCCTTG
GAGGGTCCCATGCAGAAGCGTGGCGTCGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC
TACCAGCTGCAGAACTACTGCAACTAG
>Human
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC
CCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTAC
CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG
GCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGC
TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
>GreenMonkey
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC
CCGGTCCCGGCCTTTGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAAGCCCTCTAC
CTGGTGTGCGGGGAGCGAGGCTTCTTCTACACGCCCAAGACCCGCCGGGAGGCAGAGGAC
CCGCAGGTGGGGCAGGTAGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTG
GCGCTGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGTACCAGCATCTGC
TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
>Chimp
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGTGCTGCTGGCCCTCTGGGGACCTGAC
CCAGCCTCGGCCTTTGTGAACCAACACCTGTGCGGCTCCCACCTGGTGGAAGCTCTCTAC
CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG
GCCCTGGAGGGGTCCCTGCAGAAGCGTGGTATCGTGGAACAATGCTGTACCAGCATCTGC
TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
>GuineaPig
ATGGCTCTGTGGATGCATCTCCTCACCGTGCTGGCCCTGCTGGCCCTCTGGGGGCCCAAC
ACTAATCAGGCCTTTGTCAGCCGGCATCTGTGCGGCTCCAACTTAGTGGAGACATTGTAT
TCAGTGTGTCAGGATGATGGCTTCTTCTATATACCCAAGGACCGTCGGGAGCTAGAGGAC
CCACAGGTGGAGCAGACAGAACTGGGCATGGGCCTGGGGGCAGGTGGACTACAGCCCTTG
GCACTGGAGATGGCACTACAGAAGCGTGGCATTGTGGATCAGTGCTGTACTGGCACCTGC
ACACGCCACCAGCTGCAGAGCTACTGCAACTAG
>Mouse
ATGGCCCTGTTGGTGCACTTCCTACCCCTGCTGGCCCTGCTTGCCCTCTGGGAGCCCAAA
CCCACCCAGGCTTTTGTCAAACAGCATCTTTGTGGTCCCCACCTGGTAGAGGCTCTCTAC
CTGGTGTGTGGGGAGCGTGGCTTCTTCTACACACCCAAGTCCCGCCGTGAAGTGGAGGAC
CCACAAGTGGAACAACTGGAGCTGGGAGGAAGCCCCGGGGACCTTCAGACCTTGGCGTTG
GAGGTGGCCCGGCAGAAGCGTGGCATTGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC
TACCAGCTGGAGAACTACTGCAACTAA
>Chicken
ATGGCTCTCTGGATCCGATCACTGCCTCTTCTGGCTCTCCTTGTCTTTTCTGGCCCTGGA
ACCAGCTATGCAGCTGCCAACCAGCACCTCTGTGGCTCCCACTTGGTGGAGGCTCTCTAC
CTGGTGTGTGGAGAGCGTGGCTTCTTCTACTCCCCCAAAGCCCGACGGGATGTCGAGCAG
CCCCTAGTGAGCAGTCCCTTGCGTGGCGAGGCAGGAGTGCTGCCTTTCCAGCAGGAGGAA
TACGAGAAAGTCAAGCGAGGGATTGTTGAGCAATGCTGCCATAACACGTGTTCCCTCTAC
CAACTGGAGAACTACTGCAACTAG

Sample peptide dataset

The following are the insulin genes from the dataset above, translated using the standard genetic code and aligned using MAFFT.

>Sheep
MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEG
PQVGALELAGGPGAGG-----LEGPPQKRGIVEQCCAGVCSLYQLENYCN
>Pig
MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN
PQAGAVELGGGLG--GLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN
>Dog
MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVED
LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN
>OwlMonkey
MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAED
LQVGQVELGGGSITGSLPP--LEGPMQKRGVVDQCCTSICSLYQLQNYCN
>Human
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>GreenMonkey
MALWMRLLPLLALLALWGPDPVPAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>Chimp
MALWMRLLPLLVLLALWGPDPASAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>GuineaPig
MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELED
PQVEQTELGMGLGAGGLQPLALEMALQKRGIVDQCCTGTCTRHQLQSYCN
>Mouse
MALLVHFLPLLALLALWEPKPTQAFVKQHLCGPHLVEALYLVCGERGFFYTPKSRREVED
PQVEQLELGGSPG--DLQTLALEVARQKRGIVDQCCTSICSLYQLENYCN
>Chicken
MALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGERGFFYSPKARRDVEQ
PLVSS-PLRGEAG--VLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN

Output format

DESCRIPTION

By default the DNA alignment will be in ALN (Clustal) format. The user may select FASTA or MSF instead. The full alignment is shown in the browser window and is also available for download via a link at the top of the page (click on the floppy icon).
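A simplified writer for the default ALN output might look like this. It is a sketch under stated assumptions: the header is just "CLUSTAL", blocks are 60 columns wide, and the conservation line that real Clustal files carry under each block is omitted. `write_clustal` is a hypothetical name.

```python
def write_clustal(alignment, width=60):
    """Render {name: aligned row} as simplified Clustal (ALN) text.

    Rows are written in blocks of `width` columns, names left-justified;
    the conservation line of real Clustal output is not produced.
    """
    rows = list(alignment.items())
    lengths = {len(seq) for _, seq in rows}
    if len(lengths) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    total = lengths.pop()
    pad = max(len(name) for name, _ in rows) + 4
    lines = ["CLUSTAL", ""]
    for start in range(0, total, width):
        lines.append("")  # blank line before each block
        for name, seq in rows:
            lines.append(name.ljust(pad) + seq[start:start + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(write_clustal({"Sheep": "ATGGCCCTG", "Pig": "ATG---CTG"}, width=6))
```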

Any errors or warnings will be shown at the top of the result page.

New in RevTrans 1.3: the case of the individual letters in the amino acid alignments is now carried over to the DNA alignment. Dialign2 marks amino acids considered to be fully aligned with UPPERCASE and uses lowercase to mark less well aligned regions.
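Carrying the case over can be expressed as a small post-processing step on one threaded row. This hypothetical `apply_case` helper assumes the DNA row holds exactly three characters per peptide column, as in a reverse-translated alignment; it is a sketch, not RevTrans's own code.

```python
def apply_case(aligned_pep, dna_row):
    """Copy the case of each aligned residue onto its three DNA columns.

    Lowercase residues (e.g. Dialign's less well aligned regions) give
    lowercase codons; uppercase residues and gaps give uppercase columns.
    """
    if len(dna_row) != 3 * len(aligned_pep):
        raise ValueError("DNA row must be exactly three times the peptide length")
    out = []
    for i, residue in enumerate(aligned_pep):
        codon = dna_row[3 * i:3 * i + 3]
        out.append(codon.lower() if residue.islower() else codon.upper())
    return "".join(out)


if __name__ == "__main__":
    print(apply_case("MalW-", "ATGGCCCTGTGG---"))  # ATGgccctgTGG---
```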

RevTrans - Background

It is always preferable to align coding DNA in translated form

Why is it problematic to align DNA sequences of protein-coding genes at the DNA level? First, if you align coding DNA at the DNA level, then you are in effect ignoring your prior knowledge of the structure of the genetic code. Second, you are also ignoring the known evolutionary tendency of amino acids to be substituted with other amino acids that have similar physico-chemical properties. An example should make this clear:

               Codon-aligned:                 DNA-aligned:

              M   L   L   I   G

             ATG CTG TTA ATA GGG             ATGCT-GTTAATAGGG
             ATG CTC GTT AAT GGG             ATGCTCGTTAAT-GGG

              M   L   V   N   G

In the context of the genetic code, it makes perfect sense to align CTG and CTC which both encode the amino acid leucine. However, from a "DNA point of view" it makes more sense to insert a gap so the terminal G in this codon aligns with the first G in the next codon. It is also acceptable to align the codons TTA (encoding leucine) and GTT (encoding valine) since the encoded amino acids have similar properties (they are both hydrophobic).
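The trade-off in this example can be checked numerically. The Python sketch below translates the codon-aligned rows and counts identical and gapped columns in both alignments; a plain identity count is only a rough stand-in for a nucleotide aligner's scoring, which also depends on its gap penalties.

```python
from itertools import product

# Standard Genetic Code (NCBI table 1); codons enumerated in TCAG order.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = dict(zip(("".join(c) for c in product("TCAG", repeat=3)), STANDARD_CODE))


def translate(dna):
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def identities(a, b):
    """Columns where both rows carry the same nucleotide (gap columns excluded)."""
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def gap_columns(a, b):
    """Columns where at least one row has a gap."""
    return sum(1 for x, y in zip(a, b) if "-" in (x, y))


codon_aligned = ("ATGCTGTTAATAGGG", "ATGCTCGTTAATGGG")
dna_aligned = ("ATGCT-GTTAATAGGG", "ATGCTCGTTAAT-GGG")

if __name__ == "__main__":
    print([translate(row) for row in codon_aligned])                 # ['MLLIG', 'MLVNG']
    print(identities(*codon_aligned), gap_columns(*codon_aligned))  # 10 0
    print(identities(*dna_aligned), gap_columns(*dna_aligned))      # 14 2
```

The DNA-level alignment gains four identities at the cost of two single-nucleotide gaps, which is why a nucleotide aligner can favor it even though both gaps cut through codons.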

Note: these observations also hold true for database searches. Always use a translated version of your coding sequence to search for similar genes!

Article abstract

REFERENCE

Rasmus Wernersson and Anders Gorm Pedersen.
RevTrans - Constructing alignments of coding DNA from aligned amino acid sequences.
Nucl. Acids Res., 2003, 31(13), 3537-3539.

Contact
Rasmus Wernersson: raz@cbs.dtu.dk - Anders Gorm Pedersen: gorm@cbs.dtu.dk

ABSTRACT

The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62.

Taken together with the generally higher rate of synonymous mutations over non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans.

RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic analysis. RevTrans also accepts user-provided protein alignments for greater control of the alignment process.
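The codon-level property described here can be checked with a short Python sketch; the helper names are hypothetical. A DNA alignment qualifies if all rows have the same length, that length is a multiple of three, and every run of gaps starts on a codon boundary and spans whole codons.

```python
import re


def gaps_are_codon_aligned(row, gap="-"):
    """True if every run of `gap` in an aligned DNA row covers whole codons.

    A run must start on a codon boundary (column index divisible by 3) and
    have a length that is a multiple of 3.
    """
    for run in re.finditer(re.escape(gap) + "+", row):
        if run.start() % 3 or len(run.group()) % 3:
            return False
    return True


def is_codon_alignment(rows, gap="-"):
    """Check equal row lengths, whole-codon length, and codon-aligned gaps."""
    lengths = {len(r) for r in rows}
    return (len(lengths) == 1 and lengths.pop() % 3 == 0
            and all(gaps_are_codon_aligned(r, gap) for r in rows))


if __name__ == "__main__":
    print(is_codon_alignment(["ATGCTGTTAATAGGG", "ATGCTC------GGG"]))    # True
    print(is_codon_alignment(["ATGCT-GTTAATAGGG", "ATGCTCGTTAAT-GGG"]))  # False
```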

RevTrans - 2.0

Multiple alignment of coding DNA using protein level information