TreeHugger - 1.0

Constructing NJ or UPGMA trees from DNA and protein alignments.


Sequence submission

Paste an alignment or upload an alignment file. TreeHugger accepts both DNA and protein alignments, and recognizes the following alignment formats: Fasta, Clustal, Nexus, Phylip, Stockholm, tab, raw.

Note:

The following characters are illegal in sequence names, and will be automatically replaced by underscores if present: ,:;()[]
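
The replacement rule above can be sketched in a few lines of Python. This is only an illustration, not TreeHugger's own code: the function name sanitize_name is hypothetical, and the sketch assumes each illegal character is replaced by exactly one underscore (runs are not collapsed).

```python
import re

# The characters listed above as illegal in sequence names: , : ; ( ) [ ]
ILLEGAL_CHARS = re.compile(r"[,:;()\[\]]")


def sanitize_name(name: str) -> str:
    """Replace every illegal character with an underscore (hypothetical helper)."""
    return ILLEGAL_CHARS.sub("_", name)


print(sanitize_name("Human(INS):v1"))  # Human_INS__v1
```

These characters all have structural meaning in the Newick tree format, so leaving them in a name could make the output tree ambiguous to parse.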

Algorithm

TreeHugger can reconstruct phylogenetic trees using two different distance-based algorithms:

Neighbor joining (default): NJ creates unrooted trees that are not constrained to be ultrametric. NJ trees therefore typically reflect the distances between sequences in the original alignment more accurately.
UPGMA (Unweighted Pair Group Method with Arithmetic Mean): creates rooted, ultrametric trees, in which all leaves are the same distance from the root.
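
To make the "rooted, ultrametric" property concrete, here is a minimal textbook-style UPGMA sketch in Python. It is not TreeHugger's implementation: the function name upgma, the input layout (a dict of pairwise distances keyed by name pairs), and the four-decimal branch-length formatting are all assumptions made for illustration.

```python
def upgma(names, dist):
    """Cluster `names` by UPGMA and return a Newick string.

    `dist` maps a pair of names, e.g. ("A", "B"), to their distance.
    Clusters are identified by their own Newick sub-string.
    """
    size = {n: 1 for n in names}        # number of leaves in each cluster
    height = {n: 0.0 for n in names}    # distance from cluster node down to its leaves
    d = {frozenset(pair): value for pair, value in dist.items()}

    while len(size) > 1:
        pair = min(d, key=d.get)        # closest pair of clusters
        a, b = sorted(pair)
        h = d[pair] / 2                 # new node sits halfway between the two clusters
        merged = f"({a}:{h - height[a]:.4f},{b}:{h - height[b]:.4f})"

        # Distance from the merged cluster to every other cluster:
        # the mean of the two old distances, weighted by cluster size.
        new_d = {}
        for c in size:
            if c in pair:
                continue
            d_ac = d[frozenset((a, c))]
            d_bc = d[frozenset((b, c))]
            new_d[frozenset((merged, c))] = (
                size[a] * d_ac + size[b] * d_bc
            ) / (size[a] + size[b])

        # Drop all entries involving the two clusters that were merged.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)
        size[merged] = size.pop(a) + size.pop(b)
        height[merged] = h
        del height[a], height[b]

    (root,) = size
    return root + ";"


example = {("A", "B"): 2.0, ("A", "C"): 4.0, ("B", "C"): 4.0}
print(upgma(["A", "B", "C"], example))
# ((A:1.0000,B:1.0000):1.0000,C:2.0000);
```

In the printed tree every leaf is 2.0 from the root, which is the ultrametric constraint described above. NJ drops that constraint, so its branch lengths can differ between lineages.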

Note: Since NJ trees are unrooted, you have to actively place the root before interpreting the tree when using this algorithm. This can be done using one of the rooting options in TreeHugger ("minimum variance", "midpoint", or "outgroup"), or in postprocessing (for instance, FigTree allows you to select any branch in the tree and root on it).

Handling of alignment gaps

When computing the distance matrix, TreeHugger uses the so-called p-distance as a measure of the difference between each pair of sequences. The p-distance is calculated by counting the number of positions in the sequences where the two sequences have different residues, and dividing by the total number of positions in the alignment. The resulting value is a decimal between 0 and 1, where a value of 0 indicates that the two sequences are identical, and a value of 1 indicates that they are completely different.
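
As a small illustration of the calculation just described, the following Python sketch computes the p-distance for two aligned sequences of equal length. The function name p_distance is chosen here for illustration, and the sketch does no special gap handling: every character, including "-", is compared as is.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of alignment positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


# One mismatch (last position) in eight positions.
print(p_distance("ACGTACGT", "ACGTACGA"))  # 0.125
```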

TreeHugger can deal with gaps in three different ways when computing the distance matrix:

Complete deletion (default): remove all gap-containing columns from the multiple alignment before computing the p-distances. Note: for gappy alignments this option may remove a large part of the alignment!
Pairwise deletion: when computing the p-distance between a pair of sequences, ignore positions with a gap in either sequence. This means that different positions in the alignment may be used when computing distances for different pairs of sequences.
Count gaps: include gaps when computing p-distances between pairs of sequences, i.e., treat the gap character "-" as an extra residue type.
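
The three options can be compared on a toy alignment with the Python sketch below. It is an illustration under stated assumptions rather than TreeHugger's code: the function names and the toy data are made up, and for pairwise deletion the sketch divides by the number of positions actually compared, which is a common convention but is not confirmed by the text above.

```python
GAP = "-"


def complete_deletion(alignment):
    """Remove every column that contains a gap in at least one sequence."""
    names = list(alignment)
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if GAP not in col]
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


def p_distance_gaps(seq_a, seq_b, mode="count"):
    """p-distance with mode "pairwise" (skip gapped positions) or "count" (gap = residue)."""
    pairs = list(zip(seq_a, seq_b))
    if mode == "pairwise":
        # Assumption: the denominator is the number of positions left after skipping gaps.
        pairs = [(a, b) for a, b in pairs if a != GAP and b != GAP]
    if not pairs:
        raise ValueError("no positions left to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Made-up toy alignment, for illustration only.
alignment = {"A": "ACGT-A", "B": "AC-TTA", "C": "ACGTTC"}

trimmed = complete_deletion(alignment)
print(trimmed)                                                          # columns 3 and 5 removed
print(p_distance_gaps(trimmed["A"], trimmed["C"]))                      # complete deletion: 1 of 4
print(p_distance_gaps(alignment["A"], alignment["C"], mode="pairwise"))  # 1 of 5
print(p_distance_gaps(alignment["A"], alignment["C"], mode="count"))     # 2 of 6
```

The same pair of sequences gets three different distances, which is why the choice of gap handling can change the resulting tree.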

Pairwise deletion has the advantage, compared to complete deletion, that more sequence information is retained. However, since different regions of an alignment typically evolve at different rates (some sites are more conserved than others), distances can be skewed by this procedure. Counting gaps retains all information, but may lead to artefactually high distances when there are multiple gap characters in a row (each gap character is counted as one change, even if the entire gap was created by a single evolutionary event, thus overcounting the differences).

Rooting trees

TreeHugger can root trees in three different ways:

Minimum variance rooting (default): The root is placed such that the variance of the root-to-tip distances is minimal.
Midpoint rooting: The root is placed halfway between the two most distant leaves.
Outgroup rooting: The root is placed between the outgroup and the ingroup. You specify an outgroup by entering one or more names (one name per line) in the text box below the option button. Note: The listed names have to form a monophyletic group on the resulting tree (there must be a clade with only the outgroup members).
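
Midpoint rooting, the simplest of the three to state, can be sketched as follows in Python. This is an illustrative sketch, not TreeHugger's code: the function names, the edge-list representation of the unrooted tree, and the toy branch lengths are assumptions, and the two-pass search for the most distant leaves assumes non-negative branch lengths.

```python
from collections import defaultdict


def distances_from(adj, start):
    """Path length and parent pointer for every node, measured from `start`."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for neighbor, length in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                parent[neighbor] = node
                stack.append(neighbor)
    return dist, parent


def midpoint_root_position(edges):
    """Return (u, v, offset): the root lies on edge u-v, `offset` away from u."""
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    leaves = [node for node in adj if len(adj[node]) == 1]

    # Two passes find the two most distant leaves (valid for non-negative lengths).
    dist, _ = distances_from(adj, leaves[0])
    end_a = max(leaves, key=dist.get)
    dist, parent = distances_from(adj, end_a)
    end_b = max(leaves, key=dist.get)

    # Walk from end_b back toward end_a until the halfway point is crossed.
    half = dist[end_b] / 2
    node = end_b
    while dist[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


# Made-up unrooted tree: leaves A, B, C, D; internal nodes X and Y.
edges = [("X", "A", 1.0), ("X", "B", 2.0), ("X", "Y", 1.0),
         ("Y", "C", 4.0), ("Y", "D", 1.0)]
print(midpoint_root_position(edges))  # ('C', 'Y', 3.5)
```

Here B and C are the most distant leaves (7.0 apart), so the root lands on the branch leading to C, 3.5 from C and 0.5 from Y. Minimum variance rooting instead evaluates candidate root positions by the variance of the resulting root-to-tip distances.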

Note: UPGMA automatically creates rooted trees, so the rooting options are disabled if this algorithm has been chosen.

Output

The phylogenetic tree is output in Newick format in a text box. There is also a link for downloading the tree as a text file. The Newick-format tree can be viewed using a tree viewer such as FigTree.
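
For readers unfamiliar with Newick, the short Python sketch below pulls the leaf names out of a simple Newick string. It is a rough illustration only: the function name leaf_names and the example tree (including its branch lengths) are made up, and the regular expression handles only plain, unquoted labels in trees with at least two leaves.

```python
import re


def leaf_names(newick: str) -> list[str]:
    """Return leaf labels from a simple Newick string (no quoted names)."""
    # A leaf label directly follows "(" or "," and runs up to the next
    # structural character; internal nodes follow ")" and are skipped.
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


# Made-up example tree, not real TreeHugger output.
tree = "((Human:0.01,Chimp:0.01):0.05,Mouse:0.06);"
print(leaf_names(tree))  # ['Human', 'Chimp', 'Mouse']
```

For anything beyond a quick check, a dedicated tree library or viewer is a safer choice than a regular expression.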

Example Data

Alignment of DNA sequences in FASTA format

>Sheep
ATGGCCCTGTGGACACGCCTGGTGCCCCTGCTGGCCCTGCTGGCA
CTCTGGGCCCCCGCCCCGGCCCACGCCTTCGTCAACCAGCACCTG
TGCGGCTCCCACCTGGTGGAGGCGCTGTACCTGGTGTGCGGAGAG
CGCGGCTTCTTCTACACGCCCAAGGCCCGCCGGGAGGTGGAGGGC
CCCCAGGTGGGGGCGCTGGAGCTGGCCGGAGGCCCC------GGC
---------GCGGGTGGCCTGGAGGGGCCCCCGCAGAAGCGTGGC
ATCGTGGAGCAGTGCTGCGCCGGCGTCTGCTCTCTCTACCAGCTG
GAGAACTACTGTAAC
>OwlMonkey
ATGGCCCTGTGGATGCACCTCCTGCCCCTGCTGGCGCTGCTGGCC
CTCTGGGGACCCGAGCCAGCCCCGGCCTTTGTGAACCAGCACCTG
TGCGGCCCCCACCTGGTGGAAGCCCTCTACCTGGTGTGCGGGGAG
CGAGGTTTCTTCTACGCACCCAAGACCCGCCGGGAGGCGGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGTGGGGGCTCTATCACGGGC
AGCCTGCCACCC------TTGGAGGGTCCCATGCAGAAGCGTGGC
GTCGTGGATCAGTGCTGCACCAGCATCTGCTCCCTCTACCAGCTG
CAGAACTACTGCAAC
>Chimp
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGTGCTGCTGGCC
CTCTGGGGACCTGACCCAGCCTCGGCCTTTGTGAACCAACACCTG
TGCGGCTCCCACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAA
CGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGC
AGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGT
ATCGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTG
GAGAACTACTGCAAC
>Dog
ATGGCCCTCTGGATGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCC
CTCTGGGCGCCCGCGCCCACCCGAGCCTTCGTTAACCAGCACCTG
TGTGGCTCCCACCTGGTAGAGGCTCTGTACCTGGTGTGCGGGGAG
CGCGGCTTCTTCTACACGCCTAAGGCCCGCAGGGAGGTGGAGGAC
CTGCAGGTGAGGGACGTGGAGCTGGCCGGGGCGCCTGGCGAGGGC
GGCCTGCAGCCCCTGGCCCTGGAGGGGGCCCTGCAGAAGCGAGGC
ATCGTGGAGCAGTGCTGCACCAGCATCTGCTCCCTCTACCAGCTG
GAGAATTACTGCAAC
>Pig
ATGGCCCTGTGGACGCGCCTCCTGCCCCTGCTGGCCCTGCTGGCC
CTCTGGGCGCCCGCCCCGGCCCAGGCCTTCGTGAACCAGCACCTG
TGCGGCTCCCACCTGGTGGAGGCGCTGTACCTGGTGTGCGGGGAG
CGCGGCTTCTTCTACACGCCCAAGGCCCGTCGGGAGGCGGAGAAC
CCTCAGGCAGGTGCCGTGGAGCTGGGCGGAGGCCTG------GGC
GGCCTGCAGGCCCTGGCGCTGGAGGGGCCCCCGCAGAAGCGTGGC
ATCGTGGAGCAGTGCTGCACCAGCATCTGTTCCCTCTACCAGCTG
GAGAACTACTGCAAC
>GuineaPig
ATGGCTCTGTGGATGCATCTCCTCACCGTGCTGGCCCTGCTGGCC
CTCTGGGGGCCCAACACTAATCAGGCCTTTGTCAGCCGGCATCTG
TGCGGCTCCAACTTAGTGGAGACATTGTATTCAGTGTGTCAGGAT
GATGGCTTCTTCTATATACCCAAGGACCGTCGGGAGCTAGAGGAC
CCACAGGTGGAGCAGACAGAACTGGGCATGGGCCTGGGGGCAGGT
GGACTACAGCCCTTGGCACTGGAGATGGCACTACAGAAGCGTGGC
ATTGTGGATCAGTGCTGTACTGGCACCTGCACACGCCACCAGCTG
CAGAGCTACTGCAAC
>GreenMonkey
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCC
CTCTGGGGACCTGACCCGGTCCCGGCCTTTGTGAACCAGCACCTG
TGCGGCTCCCACCTGGTGGAAGCCCTCTACCTGGTGTGCGGGGAG
CGAGGCTTCTTCTACACGCCCAAGACCCGCCGGGAGGCAGAGGAC
CCGCAGGTGGGGCAGGTAGAGCTGGGCGGGGGCCCTGGCGCAGGC
AGCCTGCAGCCCTTGGCGCTGGAGGGGTCCCTGCAGAAGCGCGGC
ATCGTGGAGCAGTGCTGTACCAGCATCTGCTCCCTCTACCAGCTG
GAGAACTACTGCAAC
>Human
ATGGCCCTGTGGATGCGCCTCCTGCCCCTGCTGGCGCTGCTGGCC
CTCTGGGGACCTGACCCAGCCGCAGCCTTTGTGAACCAACACCTG
TGCGGCTCACACCTGGTGGAAGCTCTCTACCTAGTGTGCGGGGAA
CGAGGCTTCTTCTACACACCCAAGACCCGCCGGGAGGCAGAGGAC
CTGCAGGTGGGGCAGGTGGAGCTGGGCGGGGGCCCTGGTGCAGGC
AGCCTGCAGCCCTTGGCCCTGGAGGGGTCCCTGCAGAAGCGTGGC
ATTGTGGAACAATGCTGTACCAGCATCTGCTCCCTCTACCAGCTG
GAGAACTACTGCAAC
>Chicken
ATGGCTCTCTGGATCCGATCACTGCCTCTTCTGGCTCTCCTTGTC
TTTTCTGGCCCTGGAACCAGCTATGCAGCTGCCAACCAGCACCTC
TGTGGCTCCCACTTGGTGGAGGCTCTCTACCTGGTGTGTGGAGAG
CGTGGCTTCTTCTACTCCCCCAAAGCCCGACGGGATGTCGAGCAG
CCCCTAGTGAGCAGTCCCTTGCGTGGCGAGGCA---------GGA
GTGCTGCCTTTCCAGCAGGAGGAATACGAGAAAGTCAAGCGAGGG
ATTGTTGAGCAATGCTGCCATAACACGTGTTCCCTCTACCAACTG
GAGAACTACTGCAAC
>Mouse
ATGGCCCTGTTGGTGCACTTCCTACCCCTGCTGGCCCTGCTTGCC
CTCTGGGAGCCCAAACCCACCCAGGCTTTTGTCAAACAGCATCTT
TGTGGTCCCCACCTGGTAGAGGCTCTCTACCTGGTGTGTGGGGAG
CGTGGCTTCTTCTACACACCCAAGTCCCGCCGTGAAGTGGAGGAC
CCACAAGTGGAACAACTGGAGCTGGGAGGAAGCCCC------GGG
GACCTTCAGACCTTGGCGTTGGAGGTGGCCCGGCAGAAGCGTGGC
ATTGTGGATCAGTGCTGCACCAGCATCTGCTCCCTCTACCAGCTG
GAGAACTACTGCAAC

Alignment of protein sequences in CLUSTAL format

CLUSTAL W (1.82) multiple sequence alignment


Sheep            MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGE
OwlMonkey        MALWMHLLPLLALLALWGPEPAPAFVNQHLCGPHLVEALYLVCGE
Chimp            MALWMRLLPLLVLLALWGPDPASAFVNQHLCGSHLVEALYLVCGE
Dog              MALWMRLLPLLALLALWAPAPTRAFVNQHLCGSHLVEALYLVCGE
Pig              MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGE
GuineaPig        MALWMHLLTVLALLALWGPNTNQAFVSRHLCGSNLVETLYSVCQD
GreenMonkey      MALWMRLLPLLALLALWGPDPVPAFVNQHLCGSHLVEALYLVCGE
Human            MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGE
Chicken          MALWIRSLPLLALLVFSGPGTSYAAANQHLCGSHLVEALYLVCGE
Mouse            MALLVHFLPLLALLALWEPKPTQAFVKQHLCGPHLVEALYLVCGE
                 ***  : :.:*.**.:  * .  * ..:****.:***:** ** :

Sheep            RGFFYTPKARREVEGPQVGALELAGGP--G---AGGLEGPPQKRG
OwlMonkey        RGFFYAPKTRREAEDLQVGQVELGGGSITGSLPP--LEGPMQKRG
Chimp            RGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRG
Dog              RGFFYTPKARREVEDLQVRDVELAGAPGEGGLQPLALEGALQKRG
Pig              RGFFYTPKARREAENPQAGAVELGGGL--GGLQALALEGPPQKRG
GuineaPig        DGFFYIPKDRRELEDPQVEQTELGMGLGAGGLQPLALEMALQKRG
GreenMonkey      RGFFYTPKTRREAEDPQVGQVELGGGPGAGSLQPLALEGSLQKRG
Human            RGFFYTPKTRREAEDLQVGQVELGGGPGAGSLQPLALEGSLQKRG
Chicken          RGFFYSPKARRDVEQPLVSSPLRGEA---GVLPFQQEEYEKVKRG
Mouse            RGFFYTPKSRREVEDPQVEQLELGGSP--GDLQTLALEVARQKRG
                  **** ** **: *   .     . .   *       *    ***

Sheep            IVEQCCAGVCSLYQLENYCN
OwlMonkey        VVDQCCTSICSLYQLQNYCN
Chimp            IVEQCCTSICSLYQLENYCN
Dog              IVEQCCTSICSLYQLENYCN
Pig              IVEQCCTSICSLYQLENYCN
GuineaPig        IVDQCCTGTCTRHQLQSYCN
GreenMonkey      IVEQCCTSICSLYQLENYCN
Human            IVEQCCTSICSLYQLENYCN
Chicken          IVEQCCHNTCSLYQLENYCN
Mouse            IVDQCCTSICSLYQLENYCN
                 :*:*** . *: :**:.***

Rasmus Wernersson & Anders Gorm Pedersen
TreeHugger - your friendly local neighborhood tree builder.

GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0) and the options you have selected. If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
