DTU Health Tech

Department of Health Technology


HaploCart - 1.0

Human mtDNA haplogroup classification using a pangenome reference graph

Submit data

Upload your file in FASTA, FASTQ or GAM format


FASTQ input may be single-end, paired-end, or interleaved. FASTA files may contain multiple samples. Either format may be compressed. The file size limit is 50 megabytes (MB).
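Before uploading, it can be useful to confirm locally that a file is within the size limit and looks like FASTA or FASTQ. The following is a minimal sketch, not part of HaploCart: the function names are hypothetical, "50 megabytes" is assumed to mean 50 * 10^6 bytes, and the format guess relies only on the first character of the first non-empty line (">" for FASTA, "@" for FASTQ). It does not try to recognise GAM files, which are reported as "unknown".

```python
import gzip
import os

# Assumption: the server's "50 megabytes" is read here as 50 * 10^6 bytes.
MAX_BYTES = 50 * 1000 * 1000


def sniff_format(path):
    """Guess FASTA or FASTQ from the first non-empty line (gzip-aware)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    # Files starting with the gzip magic bytes are read through gzip.
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(path, "rb") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(b">"):
                return "FASTA"
            if line.startswith(b"@"):
                return "FASTQ"
            return "unknown"
    return "unknown"


def check_upload(path):
    """Return the guessed format and whether the file fits the size limit."""
    size = os.path.getsize(path)
    return {"format": sniff_format(path), "size_ok": size <= MAX_BYTES}
```

For example, `check_upload("sample.fa.gz")` returns a dictionary with the guessed format and a boolean for the size check; the file name here is only illustrative.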


Upload file:



Upload second file (ONLY for paired-end FASTQ):


Background error probability (only for FASTA input): 

Compute posterior probabilities:  Yes   No


Background

HaploCart performs maximum likelihood estimation to predict the mitochondrial haplogroup for reads originating from a modern human sample. The program also optionally provides confidence estimation of the phylogenetic placement of the sample. HaploCart maps reads in a pangenomic context using the VG graph as an underlying data structure.



(A) A variation graph with four embedded haplogroups. Each haplogroup sequence can be reconstructed by walking the appropriate nodes of the graph. Suppose we observe three DNA reads (top left). Read 1 is derived unambiguously from the purple haplogroup. Read 2 is equally likely to have come from the purple or the red haplogroup. Read 3 is equally likely to have come from any of the four embedded haplogroups.

(B) Based on observation of the reads (R) we compute the posterior probability P(h_k | R) for each embedded haplogroup h_k. In this case the haplogroup which maximizes this quantity is the purple one, which becomes the haplogroup assignment for the sample.

(C) HaploCart optionally reports the proportion of posterior mass that falls on the assigned haplogroup (purple). It then moves up the tree one level at a time, as far as the mt-MRCA (the mitochondrial most recent common ancestor), and at each level reports the proportion of posterior mass summed over all haplogroups within that clade.
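The reasoning in panels (A) to (C) can be illustrated with a toy calculation. The sketch below is not HaploCart's implementation: the per-read likelihood values, the tree layout, and all names (`READ_LIKELIHOODS`, `PARENT`, `clade_X`, and so on) are hypothetical and chosen only to mirror the figure. It assumes that reads are independent given the haplogroup and that all haplogroups have equal prior probability, so the posterior is the normalised product of the per-read likelihoods.

```python
from math import exp, log

# Hypothetical per-read likelihoods P(r_i | h_k), mirroring the figure:
# read 1 points to purple, read 2 to purple or red, read 3 is uninformative.
READ_LIKELIHOODS = [
    {"purple": 0.97, "red": 0.01, "blue": 0.01, "green": 0.01},
    {"purple": 0.49, "red": 0.49, "blue": 0.01, "green": 0.01},
    {"purple": 0.25, "red": 0.25, "blue": 0.25, "green": 0.25},
]

# Hypothetical toy tree as a child -> parent map; "mt-MRCA" is the root.
PARENT = {
    "purple": "clade_X", "red": "clade_X",
    "blue": "clade_Y", "green": "clade_Y",
    "clade_X": "mt-MRCA", "clade_Y": "mt-MRCA",
}


def posterior(read_likelihoods):
    """Posterior P(h_k | R) under a uniform prior and independent reads."""
    haplogroups = list(read_likelihoods[0])
    # Sum logs rather than multiplying, to avoid underflow with many reads.
    log_lik = {h: sum(log(r[h]) for r in read_likelihoods) for h in haplogroups}
    top = max(log_lik.values())
    weights = {h: exp(v - top) for h, v in log_lik.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def lineage(node, parent):
    """The node followed by each of its ancestors, ending at the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def clade_support(post, parent, assigned):
    """Posterior mass inside each clade from the assignment up to the root."""
    support = []
    for clade in lineage(assigned, parent):
        mass = sum(p for h, p in post.items() if clade in lineage(h, parent))
        support.append((clade, mass))
    return support


post = posterior(READ_LIKELIHOODS)
best = max(post, key=post.get)  # haplogroup with the highest posterior
support = clade_support(post, PARENT, best)
for clade, mass in support:
    print(f"{clade}\t{mass:.4f}")
```

With these toy numbers the assignment is the purple haplogroup, and the reported mass can only grow on the way up the tree, reaching 1 at the root, because each clade contains every haplogroup of the clade below it.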

The HaploCart-1.0 server classifies mtDNA haplogroups from one or more input sequences in FASTA, FASTQ, or GAM format.


Instructions


Default usage

Select an input file in FASTA, FASTQ, or GAM format by clicking the "Choose file" button (the top arrow). Then click the green "Submit" button (the bottom arrow) to submit your job to the server.

When the job is done, the assigned haplogroup will appear in bold.

Report clade-level posterior probabilities

If you wish to see posterior probabilities for the clade-level phylogenetic placement of the sample, select "Yes" for the "Compute posterior probabilities" option before submitting.

Please cite:

Rubin, J., Vogel, N., Gopalakrishnan, S., Sackett, P., & Renaud, G. (2022). HaploCart: Human mtDNA Haplogroup Classification Using a Pangenomic Reference Graph. bioRxiv.

Abstract

Current mitochondrial DNA (mtDNA) haplogroup classification tools map reads to a single reference genome, and are thereby susceptible to reference bias. To mitigate this issue, we present HaploCart, an mtDNA haplogroup classifier written in C++ which uses VG's pangenomic reference graph framework together with principles of Bayesian inference to confidently infer mtDNA haplogroups from NGS data. We demonstrate a highly significant improvement in the ability to infer mtDNA haplogroups from modern human samples at low depths of coverage, while providing a reliable measure of confidence in the resultant prediction. HaploCart is available both as a command-line tool and through a user-friendly web interface. The program accepts as input consensus FASTA, FASTQ, or GAM files, and outputs a TSV file with the haplogroup assignment. Optionally, an additional TSV file is provided with confidence estimates for each clade subtending the assigned haplogroup, up the tree to the mt-MRCA.


GETTING HELP

If you need help with technical issues (e.g. errors or missing results), contact Technical Support. Please include the name and version of the service (e.g. HaploCart-1.0). If the error occurs after the job has started running, please also include the JOB ID (the long code shown while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.

Technical Support: