Gene analysis

Description of the report

The Gene analysis report builder provides a summary count of the variants identified in genes for a defined set of individuals (cases) versus the count for the same variants in all other individuals in the project. The counts are generated both for each individual variant and for alternative variants within the same gene, including a count of individuals homozygous for any alternative variant in the gene.

By default, the statistics are calculated for the Ensembl reference gene set. Alternatively, the results may be restricted to a user-defined gene list.

Figure (../_images/geneAnalysis.png): Gene analysis in Sequence Miner

Example use case

For each gene in a given gene list, the user may wish to do the following:

  • Identify unique loss-of-function variants in the “case” group, which are not found in other individuals in the project

  • Find genes with a higher occurrence of truncating variants in the case group compared to all other samples in the project

For example, if several subjects in the case group are homozygous for a truncating variant, the user may wish to tally how many other subjects in the project carry this variant, are homozygous for it, or carry alternative variants in the same gene.

This report builder can also add clinical annotation to every variant in the report.

Description of the algorithm

All variants in the project are stored in a central repository on the server along with sample source and data quality information. This report builder query retrieves all variants for a given set of samples and optionally filters out low-quality, common, and low-impact variants. The filtered variants are tallied to generate a count of the number of samples (designated as “CASEs” for selected samples and “OTHERs” for all other subjects in the project) containing the same variant.

Additionally, these variants are mapped onto an Ensembl reference set of genes to tally the number of samples (“CASEs” or “OTHERs”) that share variants in the same gene. VEP predicted-functional-effect annotation is added to the variants and genes.
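The tallying described above can be illustrated with a minimal Python sketch. It is not the report builder's actual query: the input layout (tuples of sample, variant, gene, and zygosity), the function name `tally_variants`, and the example identifiers are all assumptions made for illustration, and the quality, frequency, and impact filters are omitted. Only the output column names follow the report.

```python
from collections import defaultdict


def tally_variants(calls, case_ids):
    """Tally per-variant and per-gene sample counts for CASEs and OTHERs.

    calls: iterable of (sample, variant, gene, zygosity) tuples, where
           zygosity is "het" or "hom" (a hypothetical input layout).
    case_ids: set of sample IDs designated as cases.
    """
    carriers = defaultdict(set)       # (group, variant) -> samples with the variant
    hom_carriers = defaultdict(set)   # (group, variant) -> homozygous samples
    gene_carriers = defaultdict(set)  # (group, gene) -> samples with any variant in the gene
    gene_hom = defaultdict(set)       # (group, gene) -> samples homozygous for a variant in the gene
    gene_of = {}                      # variant -> gene

    for sample, variant, gene, zygosity in calls:
        group = "CASEs" if sample in case_ids else "OTHERs"
        gene_of[variant] = gene
        carriers[group, variant].add(sample)
        gene_carriers[group, gene].add(sample)
        if zygosity == "hom":
            hom_carriers[group, variant].add(sample)
            gene_hom[group, gene].add(sample)

    rows = []
    for variant, gene in gene_of.items():
        if not carriers.get(("CASEs", variant)):
            continue  # only variants seen in at least one case are reported
        row = {"variant": variant, "GENE_SYMBOL": gene}
        for group in ("CASEs", "OTHERs"):
            row[f"{group}_var"] = len(carriers.get((group, variant), ()))
            row[f"{group}_hom"] = len(hom_carriers.get((group, variant), ()))
            row[f"{group}_var_InGene"] = len(gene_carriers.get((group, gene), ()))
            row[f"{group}_hom_InGene"] = len(gene_hom.get((group, gene), ()))
        rows.append(row)
    return rows


# Hypothetical genotype calls: s1 and s2 are cases, s3 is another subject.
example_calls = [
    ("s1", "v1", "GENE_A", "hom"),
    ("s2", "v1", "GENE_A", "het"),
    ("s2", "v2", "GENE_A", "het"),
    ("s3", "v2", "GENE_A", "hom"),
    ("s3", "v3", "GENE_B", "het"),
]
example_rows = tally_variants(example_calls, case_ids={"s1", "s2"})
```

Counting distinct samples in sets, rather than counting calls, keeps a sample that carries several variants in one gene from being counted more than once in the InGene columns.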

Interpreting the output

The resulting table tallies the counts of variants and genes with variants for the CASEs and OTHERs groups. These columns can be used, for example, to compare the number of cases homozygous for a truncating variant (using the CASEs_hom and VEP_Max_Impact columns) with the number of remaining samples in the project that are homozygous for it (using the OTHERs_hom column).

There are several columns to consider for an overview of the analysis results.

Column descriptions

Report output columns and descriptions

| Group | Column | Description |
|---|---|---|
| Basic | Call | |
| Basic | Chrom | |
| Basic | POS | |
| Basic | Reference | |
| CASEs | hom | The number of designated cases homozygous for the variant |
| CASEs | hom_InGene | The number of designated cases with a homozygous variant in the indicated gene |
| CASEs | var | The number of designated cases containing the variant (includes heterozygous and homozygous) |
| CASEs | var_InGene | The number of designated cases with a variant in the gene (includes heterozygous and homozygous) |
| OTHERs | hom | The number of all other subjects (all subjects in the project except the selected cases) homozygous for this variant |
| OTHERs | hom_InGene | The aggregated number of all other subjects (all subjects in the project except the selected cases) with a homozygous variant in the gene |
| OTHERs | var | The number of all other subjects (all subjects in the project except the selected cases) containing this variant (includes heterozygous and homozygous) |
| OTHERs | var_InGene | The aggregated number of all other subjects (all subjects in the project except the selected cases) with a variant in the gene (includes heterozygous and homozygous) |
| VEP | Amino_Acids | The amino acid with and without the variant (only provided if the variant affects the protein-coding sequence), otherwise “.” |
| VEP | CDS_position | Position of the base pair in the coding sequence; a value is given for each transcript |
| VEP | max_af | Maximum reported allele frequency across the population surveys from 1000GP3, EVS, ExAC, Kyoto, GONL |
| VEP | max_consequence | Classification of the level of severity of the transcript consequence type assigned by VEP |
| VEP | Max_Impact | VEP predicted consequence for a variant producing the greatest impact on the transcript |
| VEP | Protein_Position | Position of the amino acid in the protein sequence (only if the variant falls within a coding sequence); a value is given for each corresponding transcript specified in the CDS position field |
| VEP | Refgene | The NCBI accession numbers of the affected transcripts |
| Other columns | GENE_SYMBOL | Based on HGNC when it exists, otherwise the Ensembl internal alias |

The user may first compare the CASEs_var and OTHERs_var columns to evaluate how common a specific variant is among the cases relative to the other subjects in the project.

The user may focus on CASEs_hom and OTHERs_hom if interested in homozygous carriers, or on CASEs_var_InGene and OTHERs_var_InGene when looking for potential compound heterozygous carriers.
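As an illustration of this kind of comparison, the following Python sketch filters exported report rows for variants that are homozygous in at least one case, rarely homozygous in other subjects, and of high predicted impact. The row layout (a list of dictionaries keyed by column name), the function name, and the example values are hypothetical. The sketch also assumes that VEP_Max_Impact holds VEP's impact categories (HIGH, MODERATE, LOW, MODIFIER), which should be checked against the actual output.

```python
def homozygous_high_impact(rows, max_other_hom=0):
    """Keep rows with HIGH impact, at least one homozygous case,
    and at most max_other_hom homozygous subjects outside the case group."""
    return [
        row for row in rows
        if row["VEP_Max_Impact"] == "HIGH"   # assumed value format
        and row["CASEs_hom"] >= 1
        and row["OTHERs_hom"] <= max_other_hom
    ]


# Hypothetical report rows, reduced to the columns used by the filter.
report_rows = [
    {"GENE_SYMBOL": "GENE_A", "VEP_Max_Impact": "HIGH", "CASEs_hom": 2, "OTHERs_hom": 0},
    {"GENE_SYMBOL": "GENE_B", "VEP_Max_Impact": "HIGH", "CASEs_hom": 1, "OTHERs_hom": 3},
    {"GENE_SYMBOL": "GENE_C", "VEP_Max_Impact": "LOW", "CASEs_hom": 2, "OTHERs_hom": 0},
    {"GENE_SYMBOL": "GENE_D", "VEP_Max_Impact": "HIGH", "CASEs_hom": 0, "OTHERs_hom": 0},
]
candidates = homozygous_high_impact(report_rows)
```

Raising `max_other_hom` relaxes the filter for projects where a few unaffected homozygous carriers are acceptable.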

Perspective views

The Default view perspective lists all variants identified in the CASEs. Additional Perspectives subtabs focus on subsets of columns in the default view.

Figure (../_images/geneAnalysis_perspectives.png): Perspectives in the Gene analysis report builder

Perspectives in the Gene analysis report builder

| Perspective | Description |
|---|---|
| Default view | Lists all the variants identified in the CASEs. |
| GenedeNovo | Lists genes (by variant) in which exactly one case carries a variant and no other subject carries any variant (CASEs_var_InGene = 1 and OTHERs_var_InGene = 0). |
| GeneInAllMembers | Lists genes (by variant) in which all cases carry a variant (CASEs_var_InGene = total number of selected samples). |
| GeneInSomeMembers | Lists genes (by variant) in which at least one of the selected cases carries a variant (CASEs_var_InGene > 0). |
| GeneOnlyInMembers | Lists genes (by variant) that contain variants from the cases but from no other subject in the project (CASEs_var_InGene >= 1 and OTHERs_var_InGene = 0). |
| VardeNovo | Lists variants that are present in exactly one of the selected cases and in no other subject (CASEs_var = 1 and OTHERs_var = 0). |
| VarInAllMembers | Lists variants that are present in all the selected cases (CASEs_var = total number of selected samples). |
| VarInSomeMembers | Lists variants that are present in at least one of the cases (CASEs_var > 0). |
| VarOnlyInMembers | Lists variants that are present in the selected cases but in no other samples (CASEs_var >= 1 and OTHERs_var = 0). |
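The perspective conditions can also be read as simple predicates over a report row. The Python sketch below restates them for illustration only. The dictionary `PERSPECTIVES`, the helper `perspectives_for`, and the row layout are hypothetical and not part of the report builder; `n_cases` stands for the total number of selected samples.

```python
# Each perspective as a predicate over (row, n_cases), following the conditions listed above.
PERSPECTIVES = {
    "Default view": lambda r, n: True,
    "GenedeNovo": lambda r, n: r["CASEs_var_InGene"] == 1 and r["OTHERs_var_InGene"] == 0,
    "GeneInAllMembers": lambda r, n: r["CASEs_var_InGene"] == n,
    "GeneInSomeMembers": lambda r, n: r["CASEs_var_InGene"] > 0,
    "GeneOnlyInMembers": lambda r, n: r["CASEs_var_InGene"] >= 1 and r["OTHERs_var_InGene"] == 0,
    "VardeNovo": lambda r, n: r["CASEs_var"] == 1 and r["OTHERs_var"] == 0,
    "VarInAllMembers": lambda r, n: r["CASEs_var"] == n,
    "VarInSomeMembers": lambda r, n: r["CASEs_var"] > 0,
    "VarOnlyInMembers": lambda r, n: r["CASEs_var"] >= 1 and r["OTHERs_var"] == 0,
}


def perspectives_for(row, n_cases):
    """Return the names of all perspectives whose condition the row satisfies."""
    return [name for name, matches in PERSPECTIVES.items() if matches(row, n_cases)]


# Hypothetical row: one of two cases carries the variant, both cases carry
# some variant in the gene, and no other subject carries anything in the gene.
example_row = {
    "CASEs_var": 1, "OTHERs_var": 0,
    "CASEs_var_InGene": 2, "OTHERs_var_InGene": 0,
}
matching = perspectives_for(example_row, n_cases=2)
```

Written this way, it is easy to see that GenedeNovo is a special case of GeneOnlyInMembers, and VardeNovo a special case of VarOnlyInMembers.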