Constructing NJ or UPGMA trees from DNA and protein alignments.
Paste an alignment or upload an alignment file.
TreeHugger accepts both DNA and protein alignments,
and recognizes the following alignment formats: Fasta, Clustal, Nexus, Phylip, Stockholm, tab, raw.
The following characters are illegal in sequence names, and will be automatically replaced by underscores if present:
TreeHugger can reconstruct phylogenetic trees using two different distance-based algorithms:
Neighbor joining (default): NJ creates trees that are unrooted and that are not constrained to be ultrametric. NJ trees therefore typically reflect the distances between sequences in the original alignment more accurately than UPGMA trees do.
UPGMA (Unweighted Pair Group Method with Arithmetic Mean): creates rooted, ultrametric trees, where all leaves are the same distance from the root.
Note: Since NJ trees are unrooted, you have to actively place the root before interpreting the tree when using this algorithm. This can be done using one of the rooting options in TreeHugger ("minimum variance", "midpoint", or "outgroup"), or in postprocessing (for instance, FigTree allows you to select any branch in the tree and root on it).
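To illustrate why UPGMA trees are rooted and ultrametric, here is a minimal Python sketch of the textbook UPGMA procedure. It is not TreeHugger's own code: the function name upgma and the input layout (a dict keyed by frozenset pairs of sequence names) are assumptions made for this example. Each merge places the new internal node at half the distance between the two clusters being joined, which is what forces all leaves to end up equally far from the root.

```python
from itertools import combinations


def upgma(dist):
    """Build a UPGMA tree and return it as a Newick string.

    dist maps frozenset({name1, name2}) -> distance (hypothetical layout).
    """
    names = sorted({name for pair in dist for name in pair})
    # cluster key -> (Newick text, number of leaves, height above the leaves)
    clusters = {name: (name, 1, 0.0) for name in names}
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        height = d[frozenset((a, b))] / 2
        txt_a, n_a, h_a = clusters.pop(a)
        txt_b, n_b, h_b = clusters.pop(b)
        new = f"({txt_a}:{height - h_a:g},{txt_b}:{height - h_b:g})"
        # Distance to every other cluster: average weighted by cluster size.
        for c in clusters:
            d[frozenset((new, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)
        clusters[new] = (new, n_a + n_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


example = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 4.0,
}
print(upgma(example))  # (C:2,(A:1,B:1):1);
```

In the printed tree every leaf is 2 units from the root (C directly, A and B via 1 + 1), which is the ultrametric property described above.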
Handling of alignment gaps
When computing the distance matrix, TreeHugger uses the so-called p-distance as a measure of the difference between each pair of sequences. The p-distance is calculated by counting the number of positions in the sequences where the two sequences have different residues, and dividing by the total number of positions in the alignment. The resulting value is a decimal between 0 and 1, where a value of 0 indicates that the two sequences are identical, and a value of 1 indicates that they are completely different.
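As a small illustration of the definition above, the following Python sketch computes the p-distance for two aligned, gap-free sequences. The function name p_distance is an assumption for this example and does not refer to TreeHugger's internals.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return differences / len(seq_a)


print(p_distance("ACGT", "ACGA"))  # 0.25 (1 difference in 4 positions)
```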
TreeHugger can deal with gaps in three different ways when computing the distance matrix:
Complete deletion (default): remove all gap-containing columns from the multiple alignment before computing the p-distances. Note: for gappy alignments this option may remove a large part of the alignment!
Pairwise deletion: when computing the p-distance between a pair of sequences, ignore positions with gaps in either sequence. This means that different positions in the alignment may be used when computing distances for different pairs of sequences.
Count gaps: include gaps when computing p-distances between pairs of sequences, i.e., treat the gap character "-" as an extra residue type.
Pairwise deletion has the advantage, compared to complete deletion, that more sequence information is retained. However, since different regions of an alignment typically evolve at different rates (some sites are more conserved than others), distances can be skewed by this procedure. Counting gaps retains all information, but may lead to artefactually high distances when there are multiple gap characters in a row: each gap character is counted as one change, even if the entire gap was created by a single evolutionary event, so the differences are overcounted.
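The three gap-handling options can be compared with a short Python sketch. This is an illustration under stated assumptions, not TreeHugger's implementation: the function name distance_matrix, the mode names "complete", "pairwise" and "count", and the choice to treat two aligned gaps as identical in "count" mode are all made up for this example.

```python
from itertools import combinations

GAP = "-"


def distance_matrix(alignment, gap_mode="complete"):
    """Return {(name1, name2): p-distance} for an alignment dict name -> sequence."""
    if gap_mode not in ("complete", "pairwise", "count"):
        raise ValueError(f"unknown gap mode: {gap_mode}")
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must have the same length")
    if gap_mode == "complete":
        # Drop every column in which any sequence has a gap.
        keep = [i for i in range(len(seqs[0])) if all(s[i] != GAP for s in seqs)]
        seqs = ["".join(s[i] for i in keep) for s in seqs]
    dist = {}
    for (n1, s1), (n2, s2) in combinations(zip(names, seqs), 2):
        pairs = list(zip(s1, s2))
        if gap_mode == "pairwise":
            # Drop only the positions where one of these two sequences has a gap.
            pairs = [(a, b) for a, b in pairs if a != GAP and b != GAP]
        if not pairs:
            raise ValueError(f"no comparable positions for {n1} and {n2}")
        # In "count" mode the gap character is compared like any other residue.
        dist[(n1, n2)] = sum(a != b for a, b in pairs) / len(pairs)
    return dist


alignment = {"A": "ACGT-A", "B": "AC-TTA", "C": "ATGTTA"}
complete = distance_matrix(alignment, "complete")  # 4 columns survive
pairwise = distance_matrix(alignment, "pairwise")  # 4 or 5 positions per pair
counted = distance_matrix(alignment, "count")      # all 6 positions used
```

For this toy alignment the distance between A and C is 1/4 under complete deletion, 1/5 under pairwise deletion and 2/6 when gaps are counted, which shows how the choice of option changes both the number of positions compared and the resulting distance.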
TreeHugger can root trees in three different ways:
Minimum variance rooting (default): The root is placed such that the variance of the root-to-tip distances is minimal.
Midpoint rooting: the root is placed halfway between the two most distant leaves.
Outgroup rooting: the root is placed between the outgroup and the ingroup. You specify an outgroup by adding one or more names in the textbox below the option button. Note: The listed names have to form a monophyletic group on the resulting tree (there must be a clade with only the outgroup members).
Note: UPGMA automatically creates rooted trees, so the rooting options are disabled if this algorithm has been chosen.
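Midpoint rooting can be sketched in a few lines of Python. The example below is an illustration only and assumes a tree stored as a nested dict (node -> neighbor -> branch length) with non-negative branch lengths; the function name midpoint_root and this data layout are hypothetical and not part of TreeHugger. It finds the two most distant leaves with two sweeps over the tree and then walks back along the path between them until it reaches the halfway point.

```python
def midpoint_root(tree):
    """Locate the midpoint root of an unrooted tree.

    Returns (u, v, offset): the root lies on branch u-v, offset away from u.
    """
    leaves = [node for node, nbrs in tree.items() if len(nbrs) == 1]

    def sweep(start):
        # Distances and parents for every node, walking outwards from start.
        dist, parent, stack = {start: 0.0}, {start: None}, [start]
        while stack:
            node = stack.pop()
            for nbr, length in tree[node].items():
                if nbr not in dist:
                    dist[nbr] = dist[node] + length
                    parent[nbr] = node
                    stack.append(nbr)
        return max(leaves, key=dist.get), dist, parent

    leaf_a, _, _ = sweep(leaves[0])        # one end of the longest path
    leaf_b, dist, parent = sweep(leaf_a)   # the other end
    half = dist[leaf_b] / 2
    node = leaf_b
    # Walk from leaf_b towards leaf_a until the next step would pass halfway.
    while dist[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


tree = {
    "A": {"X": 1.0}, "B": {"X": 3.0}, "C": {"Y": 1.0}, "D": {"Y": 4.0},
    "X": {"A": 1.0, "B": 3.0, "Y": 2.0},
    "Y": {"C": 1.0, "D": 4.0, "X": 2.0},
}
print(midpoint_root(tree))  # ('Y', 'X', 0.5)
```

Here B and D are the most distant leaves (9 units apart), so the root is placed on the branch between Y and X, 0.5 units from Y, which leaves both B and D 4.5 units from the root.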
The phylogenetic tree is output in Newick format in a text box.
There is also a link for downloading the tree as a text file. The Newick-format tree can be viewed using a tree viewer such as e.g.,