DTU Health Tech

Department of Health Technology

RevTrans - 2.0

Multiple alignment of coding DNA using protein level information


RevTrans takes a set of DNA sequences, virtually translates them, aligns the peptide sequences, and uses this as a scaffold for constructing the corresponding DNA multiple alignment.

New in RevTrans 2.0: Integration with Virtual Ribosome for translation and ORF finding, visualization of alignments using JalView, and more alignment programs: MAFFT, T-COFFEE, Dialign 2, Dialign-T and ClustalW2. Improved tab-based interface.

Submission


Paste in DNA sequences

Upload file containing DNA sequences


View example DNA sequences

Optional: Paste in peptide alignment

Optional: Upload peptide alignment


View example peptide alignment
Valid formats: FASTA, MSF and ALN (Clustal) - any gaps will be removed from DNA sequences

Alignment method

Translation table:


Translate from position 1
ORF finder: require ATG start codon.
ORF finder: require any start codon (ATG, TTG, CTG in the standard code).
ORF finder: do not require a start codon.

The translation is performed using the VirtualRibosome software. For more fine-grained control of the translation process (including advanced settings for the ORF finder), please follow this link to the VirtualRibosome server.

Gap-In

Characters defined as gaps in the input files.


Leave blank to use default settings: ".-~".

 

Match DNA and peptide sequences by:

Translation
Name
Position

Gap-Out

Character to use as gap indicator in the final DNA alignment.


Leave blank to use default settings: "-".

RevTrans can generate several levels of verbose reports

This is especially useful for debugging issues where the corresponding DNA/peptide sequences are not matched correctly.

Level 0: Error messages only (Default)
Level 1: Info about files/input, number of sequences read etc.
Level 2: As level 1 + Print detailed info about all the sequence names
Level 3: As level 2 + Do a sanity check on the degapped length of the sequences. Warn if the sizes do not match.
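
As a rough illustration of the level 3 sanity check, the following Python sketch compares the degapped length of a DNA sequence with three times the degapped length of its peptide. The function names, the warning text, and the choice to tolerate one extra trailing codon (for example a stop codon) are assumptions made for this example; they are not taken from the RevTrans source.

```python
def degap(seq, gaps=".-~"):
    """Remove gap characters (default set taken from the Gap-In option)."""
    return "".join(ch for ch in seq if ch not in gaps)


def length_warning(name, dna, peptide, gaps=".-~"):
    """Return a warning string if DNA and peptide lengths disagree, else None."""
    dna_len = len(degap(dna, gaps))
    pep_len = len(degap(peptide, gaps))
    # Assumption: accept an exact 3:1 ratio, or one extra codon for a trailing stop.
    if dna_len in (3 * pep_len, 3 * pep_len + 3):
        return None
    return (f"Warning: {name}: {dna_len} nt does not correspond to "
            f"{pep_len} residues (expected {3 * pep_len} nt)")


# Hypothetical usage: a 9 nt sequence with a stop codon against a gapped 2-residue peptide.
print(length_warning("Sheep", "ATGGCCTAG", "M-A"))
print(length_warning("Truncated", "ATGGC", "MA"))
```

A mismatch reported this way usually points to a DNA/peptide pair that was matched incorrectly, or to a peptide that was edited after translation.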

Instructions: Paste in or upload DNA sequences and hit "Submit query". The RevTrans server will then virtually translate the sequences and align the resulting peptide sequences using MAFFT with default settings. Finally RevTrans constructs a multiple DNA alignment using the peptide alignment as a scaffold. The translation process is by default done using the Standard Genetic Code (alternative translation tables can be selected) and has support for the full IUPAC alphabet of degenerate nucleotides.

If you want more control over the alignment process RevTrans also accepts user provided peptide alignments. This will give you the opportunity to use your preferred alignment software and to optimize the parameters. If you need to translate your DNA sequences prior to alignment this can be done by using the "Translate only" button above.

Please read the CBS access policies for information about limitations on the daily number of submissions.

Confidentiality:
The sequences are kept confidential and will be deleted after processing.


CITATIONS

For publication of results, please cite:

Rasmus Wernersson and Anders Gorm Pedersen.
RevTrans - Constructing alignments of coding DNA from aligned amino acid sequences.
Nucl. Acids Res., 2003, 31(13), 3537-3539.

Usage instructions


Quick start

Paste in or upload DNA sequences and hit "Submit query". The RevTrans server will then virtually translate the DNA sequences and align the resulting peptide sequences using MAFFT with default settings (other alignment programs can be selected). Finally, RevTrans constructs a multiple DNA alignment using the peptide alignment as a scaffold.

If you want more control over the alignment process, RevTrans also accepts user-provided peptide alignments. This will give you the opportunity to use your preferred alignment software and to optimize the parameters. If you need to translate your DNA sequences prior to alignment, this can be done by using the "Translate only" button (or by following the link to the "Virtual Ribosome" server if you want more fine-grained control over the translation process). The translation has full support for degenerate nucleotides, and alternative translation tables can be selected.

When you provide your own peptide alignment, RevTrans will accept arbitrarily large input files.


Detailed instructions

Supply DNA sequences

The DNA sequences must either be pasted into the webpage or uploaded via the "Choose file" button. The input must be in FASTA, MSF, or ALN (Clustal) format.

The full IUPAC degenerate DNA alphabet (not case sensitive) is supported:

A C G T R Y M K W S B D H V N

Please note that gaps and unknown symbols (e.g. - and X) will be discarded before processing.
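
The input handling described above can be pictured with a small Python sketch: read FASTA records, then keep only symbols from the IUPAC alphabet listed above. The function names are hypothetical and the parser is deliberately minimal (FASTA only); it is an illustration, not the server's actual code.

```python
IUPAC_DNA = set("ACGTRYMKWSBDHVN")


def parse_fasta(text):
    """Return a list of (name, sequence) pairs from FASTA-formatted text."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def clean_dna(seq):
    """Drop gaps and unknown symbols such as '-' and 'X'; keep IUPAC letters in any case."""
    return "".join(ch for ch in seq if ch.upper() in IUPAC_DNA)


# Hypothetical input with a gap and an unknown symbol.
for name, seq in parse_fasta(">demo\nATG-ccX\nNNTAG\n"):
    print(name, clean_dna(seq))
```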


Optional: Supply aligned peptide sequences

For greater control of the alignment process you have the option of also supplying a pre-computed peptide alignment. RevTrans will then use this as the scaffold for the DNA alignment. The peptide alignment must be in FASTA, MSF, or ALN (Clustal) format.

By default "-", "." and "~" will be interpreted as gap symbols.

If a peptide alignment is not supplied, the RevTrans web-server will automatically construct one using the selected multiple alignment program (default: MAFFT). In all cases the alignment program will be run with default parameters.


Submit query

Click on the "Submit query" button. If the processing of the query takes more than a few seconds, you will get the option of supplying your email address to be notified when the job is done.


Advanced options

RevTrans has support for a number of advanced options. Typically it is not necessary to set these manually and most users can safely skip this section and proceed to submitting the query.

  • Data format, DNA sequences:
    By default the DNA file format is automatically detected. Alternatively you may specify the format as being FASTA, MSF, or ALN (Clustal).

  • Data format, aligned peptide sequences:
    By default the peptide alignment file format is automatically detected. Alternatively you may specify the format as being FASTA, MSF, or ALN (Clustal).

  • Output format:
    By default the final multiple DNA alignment will be in ALN (Clustal) format. Alternatively you may specify FASTA or MSF.

  • Gap-In:
    Here you can specify which symbol(s) denote(s) a gap in a user-provided peptide alignment. The default should be correct for virtually all standard alignment files.

  • Gap-out:
    Here you can specify which gap symbol to use in the output.

  • Match DNA and peptide sequences by:
    This option gives the user control over how DNA sequences are paired with their peptide counterparts.

    • Translation:
      (Default) The DNA sequences are translated using the standard genetic code (or an alternative translation table if selected below) with full IUPAC support and compared to the peptide sequences. The DNA sequence is paired with the first matching peptide sequence found.

    • Name:
      DNA sequences are paired with peptide counterparts based on sequence entry names. Entry names must be unique within files and identical across files. If you experience trouble when using name-based matching, please make sure that sequence names do match across files, as some alignment software may truncate or otherwise alter sequence names.

    • Position:
      DNA sequences are paired with peptide counterparts simply based on their order of appearance in the files.

  • Translation table:
    Select an alternative translation table - used for "matching-by-translation" and with the "Translate Only" functionality.

    The numbering of the translation tables is the one defined by the NCBI Taxonomy Group. For a detailed description of each genetic code, please consult the NCBI web page "The Genetic Codes" (main site: Taxonomy).

  • Alignment method:
    (New in RevTrans 1.4) RevTrans offers a selection of programs for performing the peptide alignment step:

    • Dialign 2.2:

      Reference:
      B. Morgenstern (1999).
      DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment.
      Bioinformatics 15, 211 - 218.

    • Dialign-T 0.1.3:

      Reference:
      Amarendran R. Subramanian, Jan Weyer-Menkhoff, Michael Kaufmann, Burkhard Morgenstern:
      DIALIGN-T: An improved algorithm for segment-based multiple sequence alignment
      BMC Bioinformatics 2005, 6:66.

    • ClustalW 1.83:

      Reference:
      Thompson J. D., Higgins D. G., Gibson T. J. (1994).
      CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice.
      Nucleic Acids Res. 22:4673-4680.
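
The three pairing modes listed under "Match DNA and peptide sequences by" can be sketched in Python as follows. This is a simplified illustration under stated assumptions: only the standard genetic code is included, a degenerate codon is translated to 'X' unless every expansion encodes the same amino acid, a trailing stop codon is ignored when comparing, and all function names are hypothetical. The actual RevTrans implementation may differ in each of these respects.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC degenerate nucleotide symbols and the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "W": "AT", "S": "CG",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon; degenerate codons give 'X' unless unambiguous."""
    try:
        options = [IUPAC[base] for base in codon.upper()]
    except KeyError:
        return "X"
    amino_acids = {STANDARD_CODE["".join(c)] for c in product(*options)}
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(dna):
    """Translate from position 1, ignoring an incomplete final codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(translate_codon(dna[i:i + 3]) for i in range(0, usable, 3))


def degap(seq, gaps=".-~"):
    return "".join(ch for ch in seq if ch not in gaps)


def pair_sequences(dna_records, pep_records, mode="translation"):
    """Return (dna_name, peptide_name) pairs for the chosen matching mode."""
    if mode == "position":
        return [(d[0], p[0]) for d, p in zip(dna_records, pep_records)]
    if mode == "name":
        pep_names = {name for name, _ in pep_records}
        return [(name, name) for name, _ in dna_records if name in pep_names]
    if mode == "translation":
        pairs = []
        for dna_name, dna in dna_records:
            protein = translate(dna).rstrip("*")  # assumption: ignore trailing stop
            for pep_name, aligned in pep_records:
                if degap(aligned).upper() == protein:
                    pairs.append((dna_name, pep_name))
                    break  # the first matching peptide wins
        return pairs
    raise ValueError(f"unknown mode: {mode}")


dna = [("s1", "ATGCTNTAG"), ("s2", "ATGGGG")]
pep = [("p_gly", "M-G"), ("p_leu", "ML")]
print(pair_sequences(dna, pep, "translation"))
print(pair_sequences(dna, pep, "position"))
```

In this toy input the translation mode pairs each DNA sequence with the correct peptide even though the files are in different order and use different names, which is why it is a sensible default.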

    Example data

    Sample DNA dataset

    The following is a set of unaligned Alpha-globin genes from a range of organisms.
    >Sheep
    ATGGCCCTGTGGACACGCCTGGTGCCCCTGCTGGCCCTGCTGGCACTCTGGGCCCCCGCC
    CCGGCCCACGCCTTCGTCAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC
    CTGGTGTGCGGAGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGCCGGGAGGTGGAGGGC
    CCCCAGGTGGGGGCGCTGGAGCTGGCCGGAGGCCCCGGCGCGGGTGGCCTGGAGGGGCCC
    CCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG
    GAGAACTACTGTAACTAG
    >Pig
    ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCC
    CCGGCCCAGGCCTTCGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAGGCGCTGTAC
    CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC
    CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTGGGCGGCCTGCAGGCCCTGGCGCTG
    GAGGGGCCCCCGCAGAAGCGTGGCATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTC
    TACCAGCTGGAGAACTACTGCAACTAG
    >Dog
    ATGGCCCTCTGGATGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCCCTCTGGGCGCCCGCG
    CCCACCCGAGCCTTCGTTAACCAGCACCTGTGTGGCTCCCACCTGGTAGAGGCTCTGTAC
    CTGGTGTGCGGGGAGCGCGGCTTCTTCTACACGCCTAAGGCCCGCAGGGAGGTGGAGGAC
    CTGCAGGTGAGGGACGTGGAGCTGGCCGGGGCGCCTGGCGAGGGCGGCCTGCAGCCCCTG
    GCCCTGGAGGGGGCCCTGCAGAAGCGAGGCATCGTGGAGCAGTGCTGCACCAGCATCTGC
    TCCCTCTACCAGCTGGAGAATTACTGCAACTAG
    >OwlMonkey
    ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCCGAG
    CCAGCCCCGGCCTTTGTGAACCAGCACCTGTGCGGCCCCCACCTGGTGGAAGCCCTCTAC
    CTGGTGTGCGGGGAGCGAGGTTTCTTCTACGCACCCAAGACCCGCCGGGAGGCGGAGGAC
    CTGCAGGTGGGGCAGGTGGAGCTGGGTGGGGGCTCTATCACGGGCAGCCTGCCACCCTTG
    GAGGGTCCCATGCAGAAGCGTGGCGTCGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC
    TACCAGCTGCAGAACTACTGCAACTAG
    >Human
    ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC
    CCAGCCGCAGCCTTTGTGAACCAACACCTGTGCGGCTCACACCTGGTGGAAGCTCTCTAC
    CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
    CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG
    GCCCTGGAGGGGTCCCTGCAGAAGCGTGGCATTGTGGAACAATGCTGTACCAGCATCTGC
    TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
    >GreenMonkey
    ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCCCTCTGGGGACCTGAC
    CCGGTCCCGGCCTTTGTGAACCAGCACCTGTGCGGCTCCCACCTGGTGGAAGCCCTCTAC
    CTGGTGTGCGGGGAGCGAGGCTTCTTCTACACGCCCAAGACCCGCCGGGAGGCAGAGGAC
    CCGCAGGTGGGGCAGGTAGAGCTGGGCGGGGGCCCTGGCGCAGGCAGCCTGCAGCCCTTG
    GCGCTGGAGGGGTCCCTGCAGAAGCGCGGCATCGTGGAGCAGTGCTGTACCAGCATCTGC
    TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
    >Chimp
    ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGTGCTGCTGGCCCTCTGGGGACCTGAC
    CCAGCCTCGGCCTTTGTGAACCAACACCTGTGCGGCTCCCACCTGGTGGAAGCTCTCTAC
    CTAGTGTGCGGGGAACGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
    CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGCAGCCTGCAGCCCTTG
    GCCCTGGAGGGGTCCCTGCAGAAGCGTGGTATCGTGGAACAATGCTGTACCAGCATCTGC
    TCCCTCTACCAGCTGGAGAACTACTGCAACTAG
    >GuineaPig
    ATGGCTCTGTGGATGCATCTCCTCACCGTGCTGGCCCTGCTGGCCCTCTGGGGGCCCAAC
    ACTAATCAGGCCTTTGTCAGCCGGCATCTGTGCGGCTCCAACTTAGTGGAGACATTGTAT
    TCAGTGTGTCAGGATGATGGCTTCTTCTATATACCCAAGGACCGTCGGGAGCTAGAGGAC
    CCACAGGTGGAGCAGACAGAACTGGGCATGGGCCTGGGGGCAGGTGGACTACAGCCCTTG
    GCACTGGAGATGGCACTACAGAAGCGTGGCATTGTGGATCAGTGCTGTACTGGCACCTGC
    ACACGCCACCAGCTGCAGAGCTACTGCAACTAG
    >Mouse
    ATGGCCCTGTTGGTGCACTTCCTACCCCTGCTGGCCCTGCTTGCCCTCTGGGAGCCCAAA
    CCCACCCAGGCTTTTGTCAAACAGCATCTTTGTGGTCCCCACCTGGTAGAGGCTCTCTAC
    CTGGTGTGTGGGGAGCGTGGCTTCTTCTACACACCCAAGTCCCGCCGTGAAGTGGAGGAC
    CCACAAGTGGAACAACTGGAGCTGGGAGGAAGCCCCGGGGACCTTCAGACCTTGGCGTTG
    GAGGTGGCCCGGCAGAAGCGTGGCATTGTGGATCAGTGCTGCACCAGCATCTGCTCCCTC
    TACCAGCTGGAGAACTACTGCAACTAA
    >Chicken
    ATGGCTCTCTGGATCCGATCACTGCCTCTTCTGGCTCTCCTTGTCTTTTCTGGCCCTGGA
    ACCAGCTATGCAGCTGCCAACCAGCACCTCTGTGGCTCCCACTTGGTGGAGGCTCTCTAC
    CTGGTGTGTGGAGAGCGTGGCTTCTTCTACTCCCCCAAAGCCCGACGGGATGTCGAGCAG
    CCCCTAGTGAGCAGTCCCTTGCGTGGCGAGGCAGGAGTGCTGCCTTTCCAGCAGGAGGAA
    TACGAGAAAGTCAAGCGAGGGATTGTTGAGCAATGCTGCCATAACACGTGTTCCCTCTAC
    CAACTGGAGAACTACTGCAACTAG
    

    Sample peptide dataset

    The following is the set of Alpha-globin genes from the dataset above, translated using the standard genetic code and aligned using MAFFT.
    >Sheep
    MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEG
    PQVGALELAGGPGAGG-----LEGPPQKRGIVEQCCAGVCSLYQLENYCN
    >Pig
    MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREAEN
    PQAGAVELGGGLG--GLQALALEGPPQKRGIVEQCCTSICSLYQLENYCN
    >Dog
    MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVED
    LQVRDVELAGAPGEGGLQPLALEGALQKRGIVEQCCTSICSLYQLENYCN
    >OwlMonkey
    MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGERGFFYAPKTRREAED
    LQVGQVELGGGSITGSLPP--LEGPMQKRGVVDQCCTSICSLYQLQNYCN
    >Human
    MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
    LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    >GreenMonkey
    MALWMRLLPLLALLALWGPDPVPAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
    PQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    >Chimp
    MALWMRLLPLLVLLALWGPDPASAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAED
    LQVGQVELGGGPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
    >GuineaPig
    MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQDDGFFYIPKDRRELED
    PQVEQTELGMGLGAGGLQPLALEMALQKRGIVDQCCTGTCTRHQLQSYCN
    >Mouse
    MALLVHFLPLLALLALWEPKPTQAFVKQHLCGPHLVEALYLVCGERGFFYTPKSRREVED
    PQVEQLELGGSPG--DLQTLALEVARQKRGIVDQCCTSICSLYQLENYCN
    >Chicken
    MALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGERGFFYSPKARRDVEQ
    PLVSS-PLRGEAG--VLPFQQEEYEKVKRGIVEQCCHNTCSLYQLENYCN
    

Output format


DESCRIPTION

By default the DNA alignment will be in ALN (Clustal) format. The user may select FASTA or MSF instead. The full alignment is shown in the browser window and is also available for download via a link at the top of the page (click on the floppy icon).

In the event of errors and warnings these will be shown at the top of the result page.

New in RevTrans 1.3: the case of the individual letters in the amino acid alignments is now carried over to the DNA alignment. Dialign2 marks amino acids considered to be fully aligned with UPPERCASE and uses lowercase to mark less well aligned regions.
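
The case carry-over can be illustrated with a short Python sketch: each aligned peptide column corresponds to three DNA columns, and the codon takes the case of its residue. The function name is hypothetical, and the sketch assumes the DNA row has already been aligned to exactly three columns per peptide column.

```python
def apply_peptide_case(aligned_peptide, aligned_dna):
    """Copy the upper/lower case of each residue onto its codon."""
    if len(aligned_dna) != 3 * len(aligned_peptide):
        raise ValueError("DNA row must be three times as long as the peptide row")
    out = []
    for i, residue in enumerate(aligned_peptide):
        codon = aligned_dna[3 * i:3 * i + 3]
        # Gap symbols are unaffected by upper()/lower().
        out.append(codon.lower() if residue.islower() else codon.upper())
    return "".join(out)


# Hypothetical Dialign-style row: lowercase marks less well aligned residues.
print(apply_peptide_case("MlV-g", "ATGCTCGTT---GGG"))
```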

RevTrans - Background

It is always preferable to align coding DNA in translated form


Why is it problematic to align DNA sequences of protein encoding genes? First, if you align coding DNA at the DNA level, then you are in effect ignoring your prior knowledge of the structure of the genetic code. Second, you are also ignoring the known evolutionary tendency of amino acids to be substituted with other amino acids that have similar physico-chemical properties. An example should make this clear:

               Codon-aligned:                 DNA-aligned:

                M   L   L   I   G
               ATG CTG TTA ATA GGG            ATGCT-GTTAATAGGG
               ATG CTC GTT AAT GGG            ATGCTCGTTAAT-GGG
                M   L   V   N   G

In the context of the genetic code, it makes perfect sense to align CTG and CTC which both encode the amino acid leucine. However, from a "DNA point of view" it makes more sense to insert a gap so the terminal G in this codon aligns with the first G in the next codon. It is also acceptable to align the codons TTA (encoding leucine) and GTT (encoding valine) since the encoded amino acids have similar properties (they are both hydrophobic).
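
To make the "DNA point of view" concrete, the following Python sketch counts identical columns in the two alignments of the example. The helper name is hypothetical and plain identity counting is used only for illustration; real alignment programs use more elaborate scoring.

```python
def identities(row_a, row_b, gaps="-"):
    """Count columns where both rows carry the same non-gap symbol."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have equal length")
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a not in gaps)


codon_aligned = ("ATGCTGTTAATAGGG", "ATGCTCGTTAATGGG")
dna_aligned = ("ATGCT-GTTAATAGGG", "ATGCTCGTTAAT-GGG")

print(identities(*codon_aligned))  # 10 identical columns
print(identities(*dna_aligned))    # 14 identical columns, at the cost of two gaps
```

The DNA-level alignment scores more identities, but it does so by breaking the reading frame: the gaps are not multiples of three, so the codons no longer line up.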

Note: these observations also hold true for database searches. Always use a translated version of your coding sequence to search for similar genes!

Article abstract


REFERENCE

Rasmus Wernersson and Anders Gorm Pedersen.
RevTrans - Constructing alignments of coding DNA from aligned amino acid sequences.
Nucl. Acids Res., 2003, 31(13), 3537-3539.

Contact
Rasmus Wernersson: raz@cbs.dtu.dk - Anders Gorm Pedersen: gorm@cbs.dtu.dk

ABSTRACT

The simple fact that proteins are built from 20 amino acids while DNA only contains four different bases, means that the 'signal-to-noise ratio' in protein sequence alignments is much better than in alignments of DNA. Besides this information-theoretical advantage, protein alignments also benefit from the information that is implicit in empirical substitution matrices such as BLOSUM-62.

Taken together with the generally higher rate of synonymous mutations over non-synonymous ones, this means that the phylogenetic signal disappears much more rapidly from DNA sequences than from the encoded proteins. It is therefore preferable to align coding DNA at the amino acid level and it is for this purpose we have constructed the program RevTrans.

RevTrans constructs a multiple DNA alignment by: (i) translating the DNA; (ii) aligning the resulting peptide sequences; and (iii) building a multiple DNA alignment by 'reverse translation' of the aligned protein sequences. In the resulting DNA alignment, gaps occur in groups of three corresponding to entire codons, and analogous codon positions are therefore always lined up. These features are useful when constructing multiple DNA alignments for phylogenetic analysis. RevTrans also accepts user-provided protein alignments for greater control of the alignment process.
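
Step (iii) can be sketched in a few lines of Python: walk along one aligned peptide row, emit the next codon for every residue, and emit three gap symbols for every peptide gap. This is a minimal sketch under assumptions, not the published implementation: the function name is hypothetical, the DNA is assumed to be ungapped and in frame from position 1, and any codons left over after the last residue (such as a stop codon) are simply dropped.

```python
def thread_dna(aligned_peptide, dna, gap_in=".-~", gap_out="-"):
    """Build one aligned DNA row from an aligned peptide row and its coding DNA."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    row, next_codon = [], 0
    for residue in aligned_peptide:
        if residue in gap_in:
            row.append(gap_out * 3)  # one peptide gap becomes a whole-codon gap
        else:
            if next_codon >= len(codons):
                raise ValueError("peptide is longer than the translated DNA")
            row.append(codons[next_codon])
            next_codon += 1
    return "".join(row)


# Hypothetical two-sequence peptide alignment and the matching coding DNA.
print(thread_dna("ML-G", "ATGCTGGGG"))
print(thread_dna("MLVG", "ATGCTCGTTGGG"))
```

Because every peptide column maps to exactly three DNA columns, gaps in the output always come in groups of three and analogous codon positions stay lined up.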



GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Correspondence: Technical Support: