Tumor mutation analysis

The Tumor Mutation Analysis (TMA) report builder filters, annotates, and summarizes variant data from tumor samples or from paired tumor and normal samples. TMA adds annotations and applies user-defined filters to each variant. Variant annotation includes Variant Effect Predictor (VEP) impact and score, COSMIC annotation, relevant FDA-approved drugs, and other publicly available data.

../_images/tumorMutationAnalysis.png

Tumor Mutation Analysis (TMA) in Sequence Miner

Example use case

The user has matched tumor and normal sample pairs and wants to identify all likely somatic variants and to determine which of them are actionable and occur in known cancer genes.

Description of the algorithm

The processing pipeline uses the variant calls produced by each variant-calling algorithm (the output includes columns for Mutect2 and VarScan2). TMA applies user-defined filters to these lists of variants and adds several annotations.

Interpreting the output

The output can be displayed in Table view or Record view. The Table view of the output contains more than 90 columns. In general, these columns fall into three major categories:

  • Basic information (Chrom, Pos, Reference, Call, PN)

  • Annotations generated by the variant calling algorithms

  • Annotations from public data sources including COSMIC and GO annotations, actionable mutations and pathways, and clinical annotations

The Record view displays the variants in a list format and summarizes the most relevant information about the variant and key annotations:

  • Row 1: Chromosome position, gene, links to databases, transcript ID

  • Row 2: Nucleotide change, amino acid change and position, VEP maximum consequence, VEP maximum impact, maximum allele frequency, conservation score (maximum from SIFT and PolyPhen-2)

  • Row 3: Callers for the variant, reads/depth in the tumor and germline samples

  • Row 4: COSMIC annotations

../_images/tmaRecordView.png

Record view

To toggle from Table view to Record view, click the list icon.

Column descriptions

Basic columns and descriptions

Group

Column

Description

Basic

Call

Sequence (variant) called, based on the reference sequence at the designated position

Chrom

Variant chromosome location

PN

Patient number (identifier)

Pos

Variant start bp position

Reference

Nucleotide from the reference build at the base pair position in the Pos column

COMM columns and descriptions

Group

Column

Description

COMM

CLINICAL_SIGNIFICANCE

The clinical significance (e.g., pathogenic, benign, unknown significance, drug-response, risk factor, etc.) of the variant as annotated (commented) by users; if the same variant has several comments, this cell will contain a set of values

MODE_OF_INHERITANCE

The user-annotated (commented) mode of inheritance of the variant; if the same variant has been commented several times, this cell will contain a set of values

TEXT

The description (comment) component for the user annotation of the variant

COSMIC columns and descriptions

Group

Column

Description

COSMIC

count_lociInGene

Count of COSMIC variants in this gene

count_primarysitesInGene

Number of primary site annotations in this gene

count_vars

Count of occurrences of variants in COSMIC

count_varsInGene

Count of occurrences of variants in this gene in COSMIC

HGVSp

Variant HGVSp notation using the preferred transcript; present only if the gene has a preferred transcript and the variant is found in this transcript

IDs

All COSMIC IDs for the variant

primarysites

List of primary site annotations for this variant

primarysitesInGene

List of primary site annotations in this gene

pscount

Number of primary site annotations for this variant

Drug columns and descriptions

Group

Column

Description

Drug

Gene

Cancer drugs targeting this gene

Gene_ApprovalStatus

Approval status of the cancer drug

Mechanism_of_Action

Mechanism of action of the cancer drug

Pathway

Cancer drugs targeting a pathway of which this gene is a member

Pathway_ApprovalStatus

Approval status of the cancer drug

SourcePathway

The pathway on which the cancer drug acts

Gene columns and descriptions

Group

Column

Description

Gene

ActionableDrugClasses

Drug classes associated with the gene encoding the variant

ActionableMutations

Clinically actionable mutations within genes

ActionablePathways

Clinically actionable pathways associated with genes

ActionableTumors

Clinically actionable tumors associated with genes

Aliases

List of gene aliases that correspond to the GENE_symbol

listcount

A count of the number of Gene_lists

lists

Gene panels or Gene lists

Paralogs

The paralogs of a given gene

symbol

Based on HGNC when it exists, otherwise it is the Ensembl internal alias

Germ columns and descriptions

Group

Column

Description

Germ

depth

The number of reads used in evaluating the corresponding call

depth_hq

The number of high-quality reads used in evaluating the corresponding call

reads

The number of reads containing the variant call

reads_hq

The number of high-quality reads containing the variant call

GO columns and descriptions

Group

Column

Description

GO

Descriptions

Gene Ontology category descriptions

IDs

Gene Ontology identifiers

KNOWN columns and descriptions

Group

Column

Description

KNOWN

dbSource

Variants known in databases like HGMD, OMIM and ClinVar

Gene_diseases

Diseases known to be associated with the gene as annotated by HGMD, ClinVar, and OMIM

lis_dbSourceMaxClinImpact

A comma-delimited list of the KNOWN_dbsource and the known clinical impact of the gene (pathogenic/nonpathogenic, etc.)

lis_disease

A comma-delimited list of associated KNOWN_diseases

MaxClinImpact

The known clinical impact of the gene (pathogenic/nonpathogenic, etc.)

var_diseases

Diseases known to be associated with the variant as annotated by HGMD, ClinVar, and OMIM

Mutect2 columns and descriptions

Group

Column

Description

Mutect2

failure_reasons

The reason MuTect judges a variant to be low confidence (REJECT)

FORMAT

Mutect2 Genotype Format Fields

INFO

Mutect2 Genotype Format Info

judgement

MuTect judgement of site as somatic or not (KEEP or REJECT (not enough evidence or artifact))

NORMAL

Genotype data from the normal sample

Normal_LOD

Log odds ratio (LOD) score for the normal sample

TUMOR

Genotype data from the tumor sample

Tumor_LOD

Log odds ratio (LOD) score for the tumor sample

OMIM columns and descriptions

Group

Column

Description

OMIM

Descriptions

OMIM disease descriptions for the gene

IDs

The OMIM ID of the gene

Tumor columns and descriptions

Group

Column

Description

Tumor

High-quality reads are those with MAPQ ≥ 20 (MAPQ = -10*log10 (probability mapping position is incorrect))

depth_all

The number of reads used in evaluating the corresponding call

depth_hq

The number of high-quality reads used in evaluating the corresponding call

reads_all

The number of reads containing the variant call

reads_hq

The number of high-quality reads containing the variant call
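
The MAPQ definition given in the Tumor table note can be illustrated with a short Python sketch. The function names are hypothetical and are not part of TMA or Sequence Miner; only the formula and the threshold of 20 come from the text above.

```python
import math


def mapq_from_error_probability(p_wrong: float) -> float:
    """Phred-scaled mapping quality: MAPQ = -10 * log10(p_wrong).

    p_wrong is the probability that the mapping position is incorrect.
    """
    if not 0.0 < p_wrong <= 1.0:
        raise ValueError("p_wrong must be in the interval (0, 1]")
    return -10.0 * math.log10(p_wrong)


def is_high_quality(mapq: float, threshold: float = 20.0) -> bool:
    """A read counts as high quality when its MAPQ is at least the threshold."""
    return mapq >= threshold


# A 1% chance of a wrong mapping position sits exactly at the threshold.
print(is_high_quality(mapq_from_error_probability(0.01)))
# A 10% chance of a wrong mapping position falls below it.
print(is_high_quality(mapq_from_error_probability(0.1)))
```

In other words, a read is treated as high quality when the probability that it is mapped to the wrong position is at most 1 in 100.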

varscan2 columns and descriptions

Group

Column

Description

varscan2

FILTER

Judgement of the variant

FORMAT

VarScan2 Genotype Format Fields

HighConf

VarScan2 further classifies somatic mutations as high-confidence or low-confidence

INFO

VarScan2 Genotype Format Info

NORMAL

Genotype data from the normal sample

somatic_p_value

Variant p-value for Somatic/LOH events

somatic_p_score

Score based on the p-value

somatic_status

Somatic status call (Germline, Somatic, LOH, or Unknown)

TUMOR

Genotype data from the tumor sample

variant_p_value

Variant p-value for germline events

Variant columns and descriptions

Group

Column

Description

Variant

Caller

Variant caller used to judge the variant

Caller_type

Variant caller type: unpaired or paired

PASS_count

Number of variant callers that approve or “pass” the variant

Present_count

Number of variant callers that judge the variant
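
As an illustration of how PASS_count and Present_count relate to each other, here is a minimal Python sketch. It assumes that each caller's judgement of a variant is available as a Boolean (passed or rejected) and that callers that did not report the variant are simply absent; the function name and the input layout are hypothetical and do not describe how TMA computes these columns internally.

```python
from typing import Dict, Tuple


def caller_counts(judgements: Dict[str, bool]) -> Tuple[int, int]:
    """Return (pass_count, present_count) for one variant.

    judgements maps a caller name to True if that caller passed the
    variant and False if it rejected it. Callers that did not judge the
    variant are not in the mapping (an assumption of this sketch).
    """
    present_count = len(judgements)
    pass_count = sum(1 for passed in judgements.values() if passed)
    return pass_count, present_count


# Two callers judged the variant; only one of them passed it.
pass_count, present_count = caller_counts({"Mutect2": True, "VarScan2": False})
print(pass_count, present_count)
```

Under these assumptions, PASS_count can never exceed Present_count.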

Venn A columns and descriptions

Group

Column

Description

Venn A

In“Variant caller”

A Boolean value indicating whether the variant was rejected (0) or passed (1) by the variant caller

Venn B columns and descriptions

Group

Column

Description

Venn B

inGenelist

A Boolean value indicating whether the variant is absent (0) or present (1) in a cancer gene list

WithPrimeSite

Variant occurs in a tumor primary site as reported by COSMIC (1) or not (0)

VEP columns and descriptions

Group

Column

Description

VEP

Amino_Acids

The amino acid change given as reference AA/variant AA; if the variation does not affect the protein-coding sequence, then “.”

Biotype

Biological class of transcript or regulatory feature

CDS_position

Base pair position of the variant in the coding sequence; a value is given for each transcript

HGVSp

Variant notation with the transcript and amino acid change; there is an entry for each transcript

HGVSp_simple

Variant notation with the amino acid change; there is an entry for each transcript

Max_Af

Maximum reported allele frequency across the population surveys from 1000GP3, EVS, ExAC, Kyoto, GONL, and Iceland (Variant View-Frequencies)

max_consequence

Variant classes (high, moderate, low, and/or lowest impact on the gene product)

Max_Impact

Classification of the level of severity of the transcript consequence type assigned by VEP

Max_Score

Maximum score for the variant as observed in dbNSFP [Score=max ((1-Sift_score), Polyphen2_HDIV_score, Polyphen2_HVAR_score)]

Protein_Position

Position of the amino acid in the protein sequence (only if the variant falls within a coding sequence); a value is given for each corresponding transcript specified in the CDS position field

Refgene

The accession number from NCBI of the affected transcripts

Transcript_count

Number of different transcripts in which the variant is found
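
The Max_Score formula in the VEP table, Score = max((1 - Sift_score), Polyphen2_HDIV_score, Polyphen2_HVAR_score), can be written out as a short Python sketch. The function name is hypothetical, and the treatment of missing scores (skipping them, and returning None when no score is available) is an assumption of this sketch rather than documented TMA behavior.

```python
from typing import Optional


def max_score(
    sift_score: Optional[float],
    polyphen2_hdiv_score: Optional[float],
    polyphen2_hvar_score: Optional[float],
) -> Optional[float]:
    """Combine dbNSFP scores as max(1 - SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR).

    The SIFT score is inverted so that, like the PolyPhen-2 scores, a
    larger value points towards a more damaging variant.
    """
    candidates = []
    if sift_score is not None:
        candidates.append(1.0 - sift_score)
    for score in (polyphen2_hdiv_score, polyphen2_hvar_score):
        if score is not None:
            candidates.append(score)
    # Assumption: no score at all yields None instead of a default value.
    return max(candidates) if candidates else None


# The inverted SIFT score (1 - 0.05) is the largest of the three values here.
print(max_score(0.05, 0.90, 0.80))
```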

Other columns

Group

Column

Description

Other columns

Annotation_Gene

Cancer annotation for this gene

CallRatio

Proportion of reads containing the variant call; expected to be approximately 0.5 for heterozygous calls and close to 1 for homozygous calls

candidate_gene

Candidate genes entered by the user

Category

Denotes if variant is present in an oncogene or tumor suppressor

dbNSFP_Interpro_domain

Protein domain annotation (InterPro) from the dbNSFP database

DIAG_ACMGcat

Categorization of the sequence variants according to the ACMG scheme

ReadDiff

Difference in call ratio between tumor and normal samples

Entrez-ID

Entrez gene nomenclature

ResistancexSensitivity_Variants

Drug resistance or sensitivity information related to the variant

SubjectId

Patient ID
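
CallRatio and ReadDiff from the table above can be illustrated with a minimal Python sketch. The function names, the handling of zero depth, and the sign convention for ReadDiff (tumor minus normal) are assumptions made for this example; the table only states that ReadDiff is the difference in call ratio between the tumor and normal samples.

```python
def call_ratio(reads: int, depth: int) -> float:
    """Proportion of reads that contain the variant call."""
    if depth <= 0:
        # Assumption: a position without coverage gets a ratio of 0.
        return 0.0
    return reads / depth


def read_diff(
    tumor_reads: int, tumor_depth: int, normal_reads: int, normal_depth: int
) -> float:
    """Difference in call ratio, assumed here to be tumor minus normal."""
    return call_ratio(tumor_reads, tumor_depth) - call_ratio(normal_reads, normal_depth)


# 12 of 24 tumor reads carry the variant; none of the 30 normal reads do.
print(call_ratio(12, 24))
print(read_diff(12, 24, 0, 30))
```

With this sign convention, a variant that is well supported in the tumor sample but absent from the normal sample has a ReadDiff close to its tumor CallRatio, which is the pattern expected for a somatic variant.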

Perspective views

The Default view perspective lists the variants that match the variantFilter criteria selected in the input parameters. Additional perspectives focus on subsets of the columns in the default view.

Perspectives

Perspective

Description

Candidate genes

Somatic variants present in the user’s candidate genes report file (if selected by the user in the input parameters)

Default view

Lists the variants that match the variantFilter criteria selected in the input parameters; for example, if the user selects “Filtered variants”, this perspective lists the same number of variants as the “Filtered” perspective

Filtered

Somatic variants that meet the quality criteria (present in at least 2 high-quality reads, and the ratio of high-quality reads to depth in the tumor is greater than 0.01); high-quality reads are those with MAPQ ≥ 20 (MAPQ = -10*log10 (probability mapping position is incorrect))

Filtered cancer genes

Somatic variants that meet the quality criteria (see Filtered perspective above) and are present in cancer-associated genes (from commercial gene panels); the commercial gene panels are listed in the ref/cancer/cancer_commercial.map file

Filtered candidate genes

Somatic variants that meet the quality criteria (see Filtered perspective above) and are present in the user’s candidate genes report file (if selected by the user in the input parameters)

The user can then filter on additional annotations (e.g., COSMIC annotation, known associated gene disease, and actionable mutations/pathways) to further narrow the list of candidate genes and gene variants. These annotations can be found in the right-hand pane, which lists all of the columns, grouped by category.
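
The quality criteria of the Filtered perspective described above can be expressed as a small Python predicate. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and the text does not say which tumor depth column is used in the ratio, so the depth is passed in as a plain argument.

```python
def passes_quality_filter(
    reads_hq: int,
    depth: int,
    min_reads_hq: int = 2,
    min_ratio: float = 0.01,
) -> bool:
    """Apply the two quality criteria of the Filtered perspective.

    reads_hq: high-quality (MAPQ >= 20) tumor reads containing the variant.
    depth:    tumor depth at the position (which depth column TMA uses
              is an assumption left to the caller of this sketch).
    """
    if depth <= 0:
        # Assumption: positions without coverage never pass.
        return False
    enough_reads = reads_hq >= min_reads_hq  # at least 2 high-quality reads
    enough_ratio = (reads_hq / depth) > min_ratio  # strictly greater than 0.01
    return enough_reads and enough_ratio


print(passes_quality_filter(reads_hq=2, depth=100))   # both criteria are met
print(passes_quality_filter(reads_hq=1, depth=10))    # too few high-quality reads
print(passes_quality_filter(reads_hq=5, depth=1000))  # ratio 0.005 is too low
```

Both conditions must hold at the same time: the read count guards against calls supported by a single read, and the ratio guards against a handful of supporting reads at a very deeply covered position.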