DTU Health Tech

Department of Health Technology


TumorTracer - 1.1

Tissue of origin of tumors from genomics data

TumorTracer was trained to infer the tissue of origin of a tumor, based on somatic mutation and (when available) copy number data.

TumorTracer 1.1 was developed using the COSMIC Whole Genomes database, version 68.

TumorTracer 1.1 differs from version 1.0 only by the user input format; the underlying prediction algorithm is the same.

Currently, TumorTracer covers the following primary sites:
    breast, endometrium, kidney, large intestine, liver, lung, ovary, pancreas, prostate and skin.

When copy number aberration data is available, TumorTracer 1.1 predicts more accurately, but it then covers only the following primary sites:
    breast, endometrium, kidney, large intestine, lung and ovary.

Submission


Sample ID (optional): No spaces allowed

MUTATIONS

Submit VCF file:

Genome coordinates:
   GRCh37 (default)
   GRCh38

COPY NUMBER ABERRATIONS (optional)

Submit SCNA file:

Define cutoffs:
    Cutoff for deletions
    Cutoff for amplifications


Restrictions:
At most 20,000 entries in each of the input files.


CITATIONS

Marquard AM, Birkbak NJ, Thomas CE, Favero F, Krzystanek M, Lefebvre C, Ferté C, Jamal-Hanjani M, Wilson GA, Shafi S, Swanton C, Andre C, Szallasi Z, Eklund AC.
TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen.
BMC Medical Genomics 2015 Oct 1; 8:58.

Instructions


1. Upload file with mutations

  • VCF file:
    Upload a VCF file with somatic mutations in the tumor, including point mutations and indels.
    Genomic positions are assumed to be in hg19/GRCh37 coordinates unless another genome version is selected.
    See example.

  • Genome version:
    Select the genomic coordinates used in your input files. Default is hg19/GRCh37.
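
Because each input file may hold at most 20,000 entries, it can help to count the entries in a VCF file before uploading it. The following is a minimal sketch, assuming that an "entry" is a non-empty line that is not a `#` header line; the server may count entries differently. The function names and the example variants are hypothetical.

```python
import io

MAX_ENTRIES = 20_000  # limit stated under "Restrictions"


def count_vcf_entries(handle):
    """Count data lines in a VCF stream, skipping '#' header lines and blank lines."""
    return sum(1 for line in handle if line.strip() and not line.startswith("#"))


def within_limit(handle, max_entries=MAX_ENTRIES):
    """Return True if the stream holds no more than max_entries data lines."""
    return count_vcf_entries(handle) <= max_entries


# Tiny in-memory example (hypothetical variants, for illustration only).
example_vcf = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t12345\t.\tC\tT\t.\tPASS\t.\n"
    "7\t67890\t.\tG\tGA\t.\tPASS\t.\n"
)
n_entries = count_vcf_entries(example_vcf)
print(n_entries)
```

For a file on disk, the same functions can be called on an opened file object, for example `with open(path) as fh: within_limit(fh)`.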

2. Upload file with copy number aberrations (optional)

If copy number data is available, the full version of TumorTracer can be used, which gives more accurate predictions.
  • SEG file:
    Upload a segmentation file with somatic CNAs. The last column should contain a numeric representation of the copy number.
    Genomic positions are assumed to be in hg19/GRCh37 coordinates unless another genome version is selected.
    See example.

  • Define cutoffs:
    Define the cutoff values that are applied to the numeric values in the segmentation file to identify deleted and amplified regions. The cutoffs are exclusive: a segment whose value equals a cutoff is not counted as a deletion or an amplification.
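
The effect of exclusive cutoffs can be illustrated with a short sketch. It assumes a tab-separated segmentation file whose last column is numeric, that deletions lie strictly below the deletion cutoff, and that amplifications lie strictly above the amplification cutoff. The function name, the column layout, and the cutoff values of -0.3 and 0.3 are hypothetical and are not the server's defaults.

```python
import csv
import io


def classify_segments(handle, del_cutoff, amp_cutoff):
    """Label each segment by its last-column value using exclusive cutoffs."""
    labels = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        try:
            value = float(row[-1])
        except ValueError:
            continue  # assumption: a non-numeric last column marks a header line
        if value < del_cutoff:      # strictly below: deletion
            labels.append("deletion")
        elif value > amp_cutoff:    # strictly above: amplification
            labels.append("amplification")
        else:                       # values equal to a cutoff fall here
            labels.append("neutral")
    return labels


# Hypothetical segments, for illustration only.
example_seg = io.StringIO(
    "ID\tchrom\tloc.start\tloc.end\tnum.mark\tseg.mean\n"
    "S1\t1\t1000\t5000\t40\t-0.8\n"
    "S1\t1\t5001\t9000\t35\t-0.3\n"
    "S1\t2\t1000\t7000\t60\t0.05\n"
    "S1\t3\t1000\t4000\t25\t0.3\n"
    "S1\t8\t2000\t9000\t80\t1.2\n"
)
labels = classify_segments(example_seg, del_cutoff=-0.3, amp_cutoff=0.3)
print(labels)
```

In this example the segments with values -0.3 and 0.3 are labeled neutral, because a value equal to a cutoff is excluded.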

3. Submit the job

Click on the "Submit" button. The status of your job (either 'queued' or 'running') will be displayed and updated continuously until the job terminates and the server output appears in the browser window.

At any time during the wait you may enter your e-mail address and simply leave the window. Your job will continue; you will be notified by e-mail when it has terminated. The e-mail message will contain the URL under which the results are stored; they will remain on the server for 24 hours for you to collect them.

Output format


DESCRIPTION

The top panel shows the output of TumorTracer when applied to the current tumor. The lower panels are based only on test data.
  • Top panel: The classification score given to the current tumor for each tissue is shown. The confidence score is defined as the difference between the top two classification scores.
  • Lower left: The distribution of confidence scores among tumors in the test data (obtained during cross-validation). Only the tumors classified to the same primary site as the current tumor are shown.
    The confidence score of the current tumor is marked with a black triangle.
  • Lower right: The actual primary sites of the test-set tumors that were classified to the same primary site as the current tumor and received a similar confidence score.
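
The confidence score described for the top panel can be computed directly from the per-tissue classification scores. The sketch below follows that definition (the top score minus the second-highest score); the function name and the score values are hypothetical and do not come from TumorTracer output.

```python
def confidence_score(scores):
    """Return (predicted_site, confidence), where confidence is the gap
    between the highest and second-highest classification scores."""
    if len(scores) < 2:
        raise ValueError("need classification scores for at least two sites")
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (top_site, top_score), (_, second_score) = ranked[0], ranked[1]
    return top_site, top_score - second_score


# Hypothetical classification scores, for illustration only.
example_scores = {"breast": 0.62, "lung": 0.21, "ovary": 0.10, "kidney": 0.07}
site, confidence = confidence_score(example_scores)
print(site, round(confidence, 2))
```

A large gap means that one tissue clearly dominates; a gap near zero means that the two leading tissues are almost equally likely.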


References


Marquard AM, Birkbak NJ, Thomas CE, Favero F, Krzystanek M, Lefebvre C, Ferté C, Jamal-Hanjani M, Wilson GA, Shafi S, Swanton C, Andre C, Szallasi Z, Eklund AC.
TumorTracer: a method to identify the tissue of origin from the somatic mutations of a tumor specimen.
BMC Medical Genomics 2015 Oct 1; 8:58.

Abstract

Background. A substantial proportion of cancer cases present with a metastatic tumor and require further testing to determine the primary site; many of these are never fully diagnosed and remain cancer of unknown primary origin (CUP). It has been previously demonstrated that the somatic point mutations detected in a tumor can be used to identify its site of origin with limited accuracy. We hypothesized that higher accuracy could be achieved by a classification algorithm based on the following feature sets: 1) the number of nonsynonymous point mutations in a set of 232 specific cancer-associated genes, 2) frequencies of the 96 classes of single-nucleotide substitution determined by the flanking bases, and 3) copy number profiles, if available.

Methods. We used publicly available somatic mutation data from the COSMIC database to train random forest classifiers to distinguish among those tissues of origin for which sufficient data was available. We selected feature sets using cross-validation and then derived two final classifiers (with or without copy number profiles) using 80 % of the available tumors. We evaluated the accuracy using the remaining 20 %. For further validation, we assessed accuracy of the without-copy-number classifier on three independent data sets: 1669 newly available public tumors of various types, a cohort of 91 breast metastases, and a set of 24 specimens from 9 lung cancer patients subjected to multiregion sequencing.

Results. The cross-validation accuracy was highest when all three types of information were used. On the left-out COSMIC data not used for training, we achieved a classification accuracy of 85 % across 6 primary sites (with copy numbers), and 69 % across 10 primary sites (without copy numbers). Importantly, a derived confidence score could distinguish tumors that could be identified with 95 % accuracy (32 %/75 % of tumors with/without copy numbers) from those that were less certain. Accuracy in the independent data sets was 46 %, 53 % and 89 % respectively, similar to the accuracy expected from the training data.

Conclusions. Identification of primary site from point mutation and/or copy number data may be accurate enough to aid clinical diagnosis of cancers of unknown primary origin.
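
The 96 classes of single-nucleotide substitution mentioned in the Background can be enumerated as follows. This is a minimal sketch using the common convention of reporting each substitution from the pyrimidine strand (6 substitution types, each with 4 x 4 possible flanking bases); the labeling scheme and function names are assumptions, not the exact encoding used by TumorTracer.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]  # pyrimidine reference
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# 6 substitution types x 16 flanking-base combinations = 96 classes.
classes = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTIONS
    for five, three in product(BASES, repeat=2)
]


def classify(context, alt):
    """Map a mutation to its class label.

    context: three reference bases centered on the mutated position.
    alt: the substituted base.
    """
    if context[1] in "AG":  # purine reference: report from the opposite strand
        context = context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def class_frequencies(mutations):
    """Return the fraction of mutations falling into each of the 96 classes."""
    counts = Counter(classify(context, alt) for context, alt in mutations)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0) for cls in classes}


# Hypothetical mutations as (trinucleotide context, alternate base) pairs.
example_mutations = [("TCA", "T"), ("TGA", "A"), ("ACG", "A")]
frequencies = class_frequencies(example_mutations)
print(len(classes), frequencies["T[C>T]A"])
```

Here `("TGA", "A")` is a G>A change on the reported strand, which corresponds to C>T in the context TCA on the opposite strand, so it falls into the same class as `("TCA", "T")`.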



GETTING HELP

If you need help regarding technical issues (e.g. errors or missing results) contact Technical Support. Please include the name of the service and version (e.g. NetPhos-4.0). If the error occurs after the job has started running, please include the JOB ID (the long code that you see while the job is running).

If you have scientific questions (e.g. how the method works or how to interpret results), contact Correspondence.
