HaploCart - 1.0

Human mtDNA haplogroup classification using a pangenome reference graph


Background

HaploCart performs maximum likelihood estimation to predict the mitochondrial haplogroup for reads originating from a modern human sample. The program can also optionally provide confidence estimates for the phylogenetic placement of the sample. HaploCart maps reads in a pangenomic context, using a VG variation graph as the underlying data structure.

(A) A variation graph with four embedded haplogroups. Each haplogroup sequence can be reconstructed by walking the appropriate nodes of the graph. Suppose we observe three DNA reads (top left). Read 1 is derived unambiguously from the purple haplogroup. Read 2 is equally likely to have come from the purple or the red haplogroup. Read 3 is equally likely to have come from any of the four embedded haplogroups.

(B) Based on observation of the reads (R) we compute the posterior probability P(h_k | R) for each embedded haplogroup h_k. In this case the haplogroup which maximizes this quantity is the purple one, which becomes the haplogroup assignment for the sample.

(C) HaploCart optionally reports the proportion of posterior mass that falls on the assigned haplogroup (purple). It then moves up the tree one level at a time, as far as the mt-MRCA, and at each level reports the proportion of posterior mass that falls on all haplogroups within the corresponding clade.
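The assignment step in panel (B) can be illustrated with a small Python sketch. This is not HaploCart's implementation: the per-read likelihood values, the uniform prior, and the placeholder names "hap3" and "hap4" for the two haplogroups the caption does not name are all assumptions chosen to mirror the three reads in panel (A).

```python
import math

# Four embedded haplogroups. "purple" and "red" come from the figure caption;
# "hap3" and "hap4" are placeholder names for the other two (assumption).
HAPLOGROUPS = ["purple", "red", "hap3", "hap4"]

# Hypothetical likelihoods P(read | haplogroup), invented for illustration:
# read 1 fits purple only, read 2 fits purple or red equally,
# read 3 fits all four haplogroups equally.
READ_LIKELIHOODS = [
    {"purple": 0.97, "red": 0.01, "hap3": 0.01, "hap4": 0.01},
    {"purple": 0.49, "red": 0.49, "hap3": 0.01, "hap4": 0.01},
    {"purple": 0.25, "red": 0.25, "hap3": 0.25, "hap4": 0.25},
]


def posterior(read_likelihoods, haplogroups):
    """Return P(h_k | R) for each haplogroup, assuming independent reads
    and a uniform prior over haplogroups (both are assumptions)."""
    log_prior = -math.log(len(haplogroups))
    # Sum log-likelihoods over reads to avoid underflow with many reads.
    log_joint = {
        h: log_prior + sum(math.log(read[h]) for read in read_likelihoods)
        for h in haplogroups
    }
    # Normalise with the log-sum-exp trick.
    peak = max(log_joint.values())
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in log_joint.values()))
    return {h: math.exp(v - log_norm) for h, v in log_joint.items()}


post = posterior(READ_LIKELIHOODS, HAPLOGROUPS)
assigned = max(post, key=post.get)  # haplogroup with the highest posterior
print(assigned, round(post[assigned], 3))
```

With these toy numbers the purple haplogroup receives almost all of the posterior mass, which matches the assignment described in panel (B).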

The HaploCart-1.0 server classifies mtDNA haplogroups from one or more input sequences in FASTA, FASTQ, or GAM format.

Instructions

Default usage

Select an input file in FASTA, FASTQ, or GAM format by clicking the "Choose file" button (the top arrow). Then click the green "Submit" button (the bottom arrow). Your job is then submitted to the server.

When the job is done, the assigned haplogroup will appear in bold.

Report clade-level posterior probabilities

If you wish to see posterior probabilities for the clade-level phylogenetic placement of the sample, click the button "Compute posterior probabilities" at the bottom of the web page.
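As a rough illustration of what clade-level posterior probabilities mean, the Python sketch below sums posterior mass over every haplogroup inside each clade on the path from the assigned haplogroup up to the mt-MRCA. The toy tree, the clade names "clade_A" and "clade_B", the haplogroup names "hap3" and "hap4", and the posterior values are all hypothetical; the server's real tree and output format are not reproduced here.

```python
# Hypothetical toy tree, given as child -> parent. The root is the mt-MRCA.
PARENT = {
    "purple": "clade_A",
    "red": "clade_A",
    "hap3": "clade_B",
    "hap4": "clade_B",
    "clade_A": "mt-MRCA",
    "clade_B": "mt-MRCA",
    "mt-MRCA": None,
}

# Hypothetical posterior probabilities P(h_k | R), invented for illustration.
POSTERIOR = {"purple": 0.90, "red": 0.08, "hap3": 0.01, "hap4": 0.01}


def path_to_root(node, parent):
    """Return the node followed by all of its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def clade_mass(clade, posterior, parent):
    """Sum the posterior of every haplogroup that lies within the clade."""
    return sum(
        prob
        for haplogroup, prob in posterior.items()
        if clade in path_to_root(haplogroup, parent)
    )


def clade_report(assigned, posterior, parent):
    """List (clade, posterior mass) from the assigned haplogroup up to the root."""
    return [
        (clade, clade_mass(clade, posterior, parent))
        for clade in path_to_root(assigned, parent)
    ]


report = clade_report("purple", POSTERIOR, PARENT)
for clade, mass in report:
    print(f"{clade}\t{mass:.2f}")
```

The mass can only grow or stay the same on the way up the tree, and it reaches 1 at the mt-MRCA because every haplogroup falls under the root.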

Please cite:

Rubin, J. D., Vogel, N. A., Gopalakrishnan, S., Sackett, P. W., & Renaud, G. (2023). HaploCart: Human mtDNA haplogroup classification using a pangenomic reference graph. PLOS Computational Biology, 19(6), e1011148.

Abstract

Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome, and are thereby susceptible to reference bias. To mitigate this issue, we present HaploCart, an mtDNA haplogroup classifier written in C++ which uses VG's pangenomic reference graph framework together with principles of Bayesian inference to confidently infer mtDNA haplogroups from NGS data. We demonstrate a highly significant improvement in the ability to infer mtDNA haplogroups from modern human samples at low depths of coverage, while providing a reliable measure of confidence in the resultant prediction. HaploCart is available both as a command-line tool and through a user-friendly web interface. The program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a TSV file with the haplogroup assignment. Optionally, an additional TSV file is provided with confidence estimates for each clade subtending the assigned haplogroup, up the tree to the mt-MRCA.

Getting help

If you need help with technical issues (e.g. errors or missing results), contact Technical Support. Please include the name and version of the service (e.g. HaploCart-1.0) and the options you selected. If the error occurs after the job has started running, please also include the job ID (the long code shown while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Technical Support: