Background
HaploCart uses maximum likelihood estimation to predict the mitochondrial haplogroup of a modern human sample from its sequencing reads.
The program can also optionally estimate the confidence of the sample's phylogenetic placement.
HaploCart maps reads in a pangenomic context, using a VG variation graph as its underlying data structure.
(A) A variation graph with four embedded haplogroups. Each haplogroup sequence can be reconstructed by walking its path through the nodes of the graph.
Suppose we observe three DNA reads (top left). Read 1 unambiguously derives from the purple haplogroup.
Read 2 is equally likely to have come from the purple or the red haplogroup, and Read 3 is equally likely to have come from any of the four embedded haplogroups.
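The toy sketch below illustrates the idea in panel (A): haplogroups are stored as paths (ordered walks over node IDs), a haplogroup sequence is the concatenation of the node labels along its path, and a read aligned as a walk is compatible with every path that contains that walk contiguously. The node sequences, path layouts and read walks are all hypothetical, chosen only to reproduce the read/haplogroup relationships described above; this is not VG's data model, file format or API, and real mapping also has to handle mismatches and partial node overlaps.

```python
# Hypothetical toy variation graph: node ID -> sequence label.
NODES = {1: "ACG", 2: "T", 3: "C", 4: "GA", 5: "GT", 6: "TTA"}

# Each embedded haplogroup is a path: an ordered walk over node IDs (hypothetical).
PATHS = {
    "purple": [1, 2, 4, 6],
    "red": [1, 3, 4, 6],
    "green": [1, 3, 5, 6],
    "blue": [1, 2, 5, 6],
}

# Three hypothetical reads, each already aligned as a walk through the graph.
READS = {"read1": [2, 4], "read2": [4, 6], "read3": [6]}


def path_sequence(path: list[int]) -> str:
    """Reconstruct a haplogroup sequence by concatenating node labels along its walk."""
    return "".join(NODES[node] for node in path)


def contains_walk(path: list[int], walk: list[int]) -> bool:
    """True if `walk` occurs as a contiguous sub-walk of `path`."""
    k = len(walk)
    return any(path[i:i + k] == walk for i in range(len(path) - k + 1))


def compatible_haplogroups(walk: list[int]) -> list[str]:
    """Names of all embedded paths a read aligned along `walk` could derive from."""
    return [name for name, path in PATHS.items() if contains_walk(path, walk)]


if __name__ == "__main__":
    for name, path in PATHS.items():
        print(name, path_sequence(path))
    for read, walk in READS.items():
        print(read, compatible_haplogroups(walk))
```

With these toy paths, read1 is compatible only with purple, read2 with purple and red, and read3 with all four haplogroups, matching the scenario in panel (A).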
(B) From the observed reads R, we compute the posterior probability P(h_k | R) for each embedded haplogroup h_k.
Here the purple haplogroup maximizes this quantity, so it becomes the haplogroup assignment for the sample.
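A minimal sketch of panel (B), assuming independent reads, a uniform prior over haplogroups (so that maximizing the posterior coincides with maximizing the likelihood), and a hypothetical per-read model in which a read has probability 1 - eps under a compatible haplogroup and eps otherwise. The compatibility table mirrors the three reads described above; the error model and the parameter `eps` are illustrative assumptions, not HaploCart's actual likelihood. Normalization is done in log space (log-sum-exp) so the computation does not underflow when there are many reads.

```python
import math

# Hypothetical compatibility table mirroring panel (A):
# read -> set of haplogroups whose embedded path the read is consistent with.
COMPATIBLE = {
    "read1": {"purple"},
    "read2": {"purple", "red"},
    "read3": {"purple", "red", "green", "blue"},
}
HAPLOGROUPS = ["purple", "red", "green", "blue"]


def log_likelihood(hap: str, eps: float) -> float:
    """log P(R | h) under the toy model: reads are independent, and each read
    has probability 1 - eps if compatible with `hap`, eps otherwise."""
    return sum(
        math.log(1.0 - eps) if hap in haps else math.log(eps)
        for haps in COMPATIBLE.values()
    )


def posterior(eps: float = 0.01) -> dict[str, float]:
    """P(h_k | R) for every haplogroup via Bayes' rule with a uniform prior
    (the constant prior cancels when normalizing)."""
    logs = {h: log_likelihood(h, eps) for h in HAPLOGROUPS}
    # Log-sum-exp: subtract the maximum before exponentiating to avoid underflow.
    m = max(logs.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in logs.values()))
    return {h: math.exp(v - log_z) for h, v in logs.items()}


if __name__ == "__main__":
    post = posterior()
    best = max(post, key=post.get)
    print(best, round(post[best], 4))
```

Under this toy model, purple is the only haplogroup compatible with all three reads, so it receives almost all of the posterior mass and is chosen as the assignment.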
(C) HaploCart optionally reports the proportion of posterior mass that falls on the assigned haplogroup (purple).
It then moves up the phylogenetic tree one level at a time, as far as the mt-MRCA, reporting at each level the proportion of posterior mass on all haplogroups within the corresponding clade.
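The clade-level confidence in panel (C) can be sketched as follows: starting from the assigned haplogroup, walk up the tree to the root and, at each step, sum the posterior mass of every haplogroup that lies within the current clade. The tree topology, the clade names (`clade_X`, `clade_Y`) and the posterior values are hypothetical, and the tab-separated printout is only illustrative, not HaploCart's actual TSV output format.

```python
# Hypothetical phylogeny: child -> parent, rooted at the mt-MRCA.
PARENT = {
    "purple": "clade_X",
    "red": "clade_X",
    "green": "clade_Y",
    "blue": "clade_Y",
    "clade_X": "mt-MRCA",
    "clade_Y": "mt-MRCA",
}

# Hypothetical posterior mass P(h_k | R) for each haplogroup.
POSTERIOR = {"purple": 0.90, "red": 0.06, "green": 0.03, "blue": 0.01}


def ancestors(node: str) -> list[str]:
    """The node itself followed by each of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def in_clade(hap: str, clade: str) -> bool:
    """True if `hap` is `clade` itself or one of its descendants."""
    return clade in ancestors(hap)


def clade_confidence(assigned: str) -> list[tuple[str, float]]:
    """For the assigned haplogroup and each enclosing clade up to the root,
    the total posterior mass on haplogroups within that clade."""
    return [
        (clade, sum(p for hap, p in POSTERIOR.items() if in_clade(hap, clade)))
        for clade in ancestors(assigned)
    ]


if __name__ == "__main__":
    for clade, mass in clade_confidence("purple"):
        print(f"{clade}\t{mass:.2f}")
```

Because each clade contains every clade below it on the path, the reported mass can only stay the same or grow on the way up, reaching the full posterior mass at the mt-MRCA.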
Please cite:
Rubin, J., Vogel, N., Gopalakrishnan, S., Sackett, P., & Renaud, G. (2022). HaploCart: Human mtDNA Haplogroup Classification Using a Pangenomic Reference Graph. bioRxiv.
Abstract
Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome and are therefore susceptible to reference bias.
To mitigate this issue, we present HaploCart, an mtDNA haplogroup classifier written in C++ which uses VG's pangenomic reference graph framework together
with principles of Bayesian inference to confidently infer mtDNA haplogroups from NGS data.
We demonstrate a highly significant improvement in the ability to infer mtDNA haplogroups from modern human samples at low depths of coverage,
while providing a reliable measure of confidence in the resultant prediction. HaploCart is available both as a command-line tool and through a user-friendly
web interface. The program accepts consensus FASTA, FASTQ, or GAM files as input and outputs a TSV file with the haplogroup assignment.
Optionally, an additional TSV file is provided with confidence estimates for each clade subtending the assigned haplogroup, up the tree to the mt-MRCA.