Internal annotations

The Internal annotations report builder compiles and annotates a list of the variants, or of the genes containing variants, in the project that have been commented on in Clinical Sequence Analyzer (CSA) or Sequence Miner (SM). The annotation depends on the selected reportMode (byGene or byVariant) and on whether includePhenotypes is set to “yes” or “no”.

../_images/internalAnnotations.png

Internal Annotations module in Sequence Miner

Example use case

The user wishes to compile a list of all variants in the project that are related to cardiomyopathy and that have been annotated (commented on) by any of the users in the project. The user runs the Internal annotations report builder with reportMode set to “byVariant” and includePhenotypes set to “yes”. The user then filters the description column on “cardiomyopathy”.
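The final filtering step of this use case can also be reproduced outside the application. The following is a minimal sketch, assuming the report has been saved as tab-separated text; the export format, the helper name filter_by_description, and every value in the sample rows are hypothetical, and only the column names come from the column descriptions on this page.

```python
import csv
import io

# Made-up excerpt of a byVariant report saved as tab-separated text.
# Only the column names come from the documentation; all values are placeholders.
REPORT_TSV = "\n".join([
    "Chrom\tpos\tgene_symbol\tcode\tdescription",
    "chr1\t1000\tGENE_A\tCODE_1\tDilated cardiomyopathy",
    "chr2\t2000\tGENE_B\tCODE_2\tHearing impairment",
    "chr3\t3000\tGENE_C\tCODE_3\tHypertrophic Cardiomyopathy",
])


def filter_by_description(tsv_text, term):
    """Return the rows whose description contains term, ignoring case."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    needle = term.lower()
    return [row for row in reader if needle in row["description"].lower()]


matches = filter_by_description(REPORT_TSV, "cardiomyopathy")
for row in matches:
    print(row["Chrom"], row["pos"], row["gene_symbol"], row["description"])
```

The match is a case-insensitive substring test, so descriptions such as “Dilated cardiomyopathy” are kept along with an exact “Cardiomyopathy”.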

Description of the algorithm

Variants, or genes containing variants, that have been commented on internally (by users in the project) are retrieved and joined with the corresponding annotation tables according to the selected reportMode.
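The retrieve-and-join step can be pictured with a small sketch. This is an illustration under stated assumptions, not the report builder's actual implementation: the join keys (Chrom, pos, ref, alt for byVariant; gene_symbol for byGene), the input field clinical_significance, the inner-join behaviour, and the function names are all hypothetical.

```python
from collections import defaultdict


def key_for(record, report_mode):
    """Return the join key for a record (assumed key choice, not documented)."""
    if report_mode == "byVariant":
        return (record["Chrom"], record["pos"], record["ref"], record["alt"])
    if report_mode == "byGene":
        return record["gene_symbol"]
    raise ValueError(f"unknown reportMode: {report_mode!r}")


def build_report(comments, annotations, report_mode):
    """Group user comments per variant or gene and join them with annotations."""
    # Collect the clinical significance values of all comments sharing a key.
    significance = defaultdict(set)
    for comment in comments:
        key = key_for(comment, report_mode)
        significance[key].add(comment["clinical_significance"])

    # Index the annotation table by the same key.
    annotation_by_key = {key_for(a, report_mode): a for a in annotations}

    report = []
    for key, values in significance.items():
        if key not in annotation_by_key:
            continue  # assumption: rows without annotation are dropped
        row = dict(annotation_by_key[key])
        row["set_clinical_significance"] = sorted(values)
        report.append(row)
    return report


# Placeholder data: two comments on the same variant, one annotation row.
comments = [
    {"Chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G",
     "clinical_significance": "pathogenic"},
    {"Chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G",
     "clinical_significance": "benign"},
]
annotations = [
    {"Chrom": "chr1", "pos": 1000, "ref": "A", "alt": "G", "MAX_AF": 0.001},
]
report = build_report(comments, annotations, "byVariant")
```

Grouping before joining is what produces one row per variant or gene, with multiple comments collapsed into a set of values.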

Interpreting the output

Column headers for annotations are described in the tables below. Annotations from numerous sources are provided, including VEP, allele frequency, and ACMG category for variants, and GO, OMIM, ClinVar, and HGMD for genes.

Column descriptions

Columns in both reports (byGene and byVariant)

Column

Description

Chrom

The chromosome of the variant represented as chr1, chr2, …, chr22, chrXY, chrX, chrY, chrM

gene_symbol

Based on HGNC when it exists, otherwise it is the Ensembl internal alias

set_clinical_significance

Clinical significance (e.g., pathogenic, benign, unknown significance, drug response, or risk factor) of the variant as annotated (commented) by users; if the same variant has multiple associated comments, this cell contains a set of values

set_mode_of_inheritance

User-annotated (commented) mode of inheritance of the variant; if the same variant has multiple associated comments, this cell contains a set of values

code

HPO or OMIM taxonomy code for the designated phenotype (shown only when includePhenotypes is set to “yes”)

description

HPO or OMIM description for the designated phenotype (shown only when includePhenotypes is set to “yes”)

Columns specific to byGene reports

Column

Description

gene_start

The start base-pair position of the gene (zero based, i.e., the position of the base-pair before the first base-pair in the gene)

gene_end

Coordinate of the last nucleotide on the gene

GENE_Aliases

The aliases of the given gene

GENE_Paralogs

The paralogs of the given gene

GO_IDs

Gene Ontology identifiers

GO_Descriptions

Gene Ontology category descriptions

KNOWN_Gene_diseases

Diseases known to be associated with the gene as annotated by HGMD, ClinVar, and OMIM

KNOWN_lis_disease

Diseases known to be associated with the gene as annotated by HGMD, ClinVar, and OMIM

KNOWN_MaxClinImpact

The maximum clinical consequence of the variant (pathogenic, unknown, etc.) annotated by any one of HGMD, OMIM, or ClinVar

OMIM_IDs

OMIM ID of the gene

OMIM_Descriptions

OMIM disease descriptions for the gene

dis_PN

Number of carriers of this variant with comments
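Because gene_start is zero based while gene_end is described as the coordinate of the last nucleotide, the two columns use different conventions. The sketch below shows how they could be combined, assuming gene_end is a one-based, inclusive position; the table suggests this but does not state it, and both function names are hypothetical.

```python
def gene_length(gene_start, gene_end):
    """Number of base pairs in the gene.

    Assumes gene_start is zero based (the position just before the first
    base pair) and gene_end is the one-based position of the last base pair.
    """
    if gene_end <= gene_start:
        raise ValueError("gene_end must be greater than gene_start")
    return gene_end - gene_start


def first_base_one_based(gene_start):
    """Convert the zero-based gene_start to the one-based first base pair."""
    return gene_start + 1


# Placeholder coordinates: a gene covering one-based positions 100..200.
length = gene_length(99, 200)
first = first_base_one_based(99)
```

Under these assumptions no extra +1 is needed when computing the length, since the zero-based start already sits one position before the first base pair.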

Columns specific to byVariant reports

Column

Description

pos

The (first) base-pair position of the sequence variant, i.e., the position of the first nucleotide in the ref column

ref

Reference allele in VCF format

alt

Alternative allele in VCF format

VEP_Amino_Acids

The amino acids without and with the variant; provided only if the variant affects the protein-coding sequence, otherwise “.”

VEP_Biotype

Biological class of transcript or regulatory feature

VEP_CDS_position

Position of the base pair in the coding sequence; a value is given for each transcript

VEP_max_consequence

VEP-predicted consequence of the variant that has the greatest impact on the transcript

VEP_Max_Impact

Classification of the level of severity of the transcript consequence type assigned by VEP

VEP_Max_Score

Maximum score for the variant as observed in dbNSFP [Score=max ((1-Sift_score), Polyphen2_HDIV_score, Polyphen2_HVAR_score)]

VEP_Protein_Position

Position of the amino acid in the protein sequence (only if the variant falls within a coding sequence); a value is given for each corresponding transcript specified in the CDS position field

VEP_Refgene

The NCBI accession numbers of the affected transcripts

VEP_Transcript_count

Number of different transcripts in which the variant is found

MAX_AF

Maximum reported allele frequency (1000GP3, EVS, EXAC, Kyoto, GONL)

set_pn

IDs of the carriers of the variant

DIAG_ACMGcat

Categorization of the likelihood of pathogenicity of sequence variants according to the ACMG scheme; values range from cat1 (most severe) to cat4 (least severe). See the table below for ACMG categorization descriptions.

../_images/ACMG_categories.png
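The formula given for VEP_Max_Score, max((1-Sift_score), Polyphen2_HDIV_score, Polyphen2_HVAR_score), can be written out as a short function. This is a sketch of the stated formula only; the function name and the handling of missing scores (skipped, with None returned when no score is available) are assumptions that the column description does not cover.

```python
def vep_max_score(sift_score=None, polyphen2_hdiv_score=None,
                  polyphen2_hvar_score=None):
    """Return max(1 - SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR).

    Missing scores (None) are skipped; this is an assumption, since the
    documented formula does not say how missing values are treated.
    """
    candidates = []
    if sift_score is not None:
        # SIFT is inverted so that higher always means more damaging.
        candidates.append(1 - sift_score)
    if polyphen2_hdiv_score is not None:
        candidates.append(polyphen2_hdiv_score)
    if polyphen2_hvar_score is not None:
        candidates.append(polyphen2_hvar_score)
    return max(candidates) if candidates else None


# Placeholder scores: 1 - 0.25 = 0.75 is the largest of the three terms.
score = vep_max_score(sift_score=0.25,
                      polyphen2_hdiv_score=0.5,
                      polyphen2_hvar_score=0.5)
```

Inverting the SIFT score puts all three predictors on the same scale, where values closer to 1 indicate a more damaging prediction.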