Carrier

The Carrier report builder analyzes variants in parents to identify potential disease-causing variants that are heterozygous in the parents and could be inherited by a child to produce a recessive disorder.

Figure: Carrier module in Sequence Miner

Example use case

The user wishes to screen whole exome sequencing (WES) data from expectant parents to identify all sequence variants that their offspring may inherit. The user sets several filters on the identified variants: a range of VEP consequences, a maximum allele frequency, a target region (whole genome or exons only) and sequence quality thresholds.

Description of the algorithm

The algorithm extracts all variants from the VEP whole exome file (source/anno/vep_v3-4-2/vep_single_wes.gord) that meet the user's filtering criteria for max_consequence (maximum predicted consequence) and max_af (maximum allele frequency). Known pathogenic variants in this subset are joined with annotation from the clinical_variants.gorz file. This adds the clinical database source (HGMD, ClinVar or OMIM, in the Known_dbSource column), Gene_diseases, MaxClinImpact and Var_diseases. Further annotation adds the per-base coverage at the variant and the average coverage across a 10-base window centered on the variant (5 bases on either side).
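The filtering and annotation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Sequence Miner implementation. The following are all hypothetical:

- the `Variant` record;
- the helpers `filter_variants`, `annotate_known` and `window_avg_depth`;
- treating a missing allele frequency as 0;
- the exact window boundaries (positions pos-5 through pos+4, i.e. 10 bases).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical variant record; field names loosely mirror report columns.
@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    max_consequence: str
    max_af: Optional[float]  # None if the variant is absent from the population surveys
    known: Dict[str, str] = field(default_factory=dict)

VariantKey = Tuple[str, int, str, str]

def filter_variants(variants: Iterable[Variant], consequences: Set[str],
                    max_af: float) -> List[Variant]:
    """Keep variants whose consequence was selected and whose allele frequency is at most max_af."""
    kept = []
    for v in variants:
        # Assumption: a variant missing from the surveys is treated as rare (AF = 0).
        af = v.max_af if v.max_af is not None else 0.0
        if v.max_consequence in consequences and af <= max_af:
            kept.append(v)
    return kept

def annotate_known(variants: List[Variant],
                   clinical: Dict[VariantKey, Dict[str, str]]) -> None:
    """Left join: attach clinical annotation (e.g. Known_dbSource) to variants found in the lookup."""
    for v in variants:
        v.known = clinical.get((v.chrom, v.pos, v.ref, v.alt), {})

def window_avg_depth(depth_by_pos: Dict[int, int], pos: int, half_width: int = 5) -> float:
    """Average per-base depth over positions pos-5 .. pos+4 (assumed boundaries, 10 bases).

    Positions without a depth entry count as 0.
    """
    window = range(pos - half_width, pos + half_width)
    return sum(depth_by_pos.get(p, 0) for p in window) / len(window)
```

For example, with `consequences={"missense_variant", "stop_gained"}` and `max_af=0.01`, two kinds of variant are dropped:

- a synonymous variant;
- a stop-gained variant with an allele frequency of 0.2.

A retained variant that appears in the clinical lookup picks up its annotation through the (chrom, pos, ref, alt) key.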

Interpreting the output

The results are summarized in four different perspectives:

  • Default view - Results are presented in multiple columns of annotation including zygosity; VEP-predicted consequence, impact and score; genomic coordinates; descriptions; gene ontology ID; OMIM ID; etc.

  • InBothParents - This perspective presents only the subset of identified variants that meet the applied filters and that are present in both parents.

  • RecessiveCandidates - This perspective presents all variants that meet the applied filters and for which either or both parents are heterozygous.

  • RecessiveKnownVar - Recessive known variants are those present in either parent that are reported in HGMD, ClinVar or OMIM as known Category 1 (pathogenic) variants.

Column descriptions

Report output columns and descriptions, listed by column group:

Basic
  • CHROM - Chromosome
  • POS - Genomic position of the variant
  • Reference

father
  • CallRatio - Proportion of reads supporting the variant call; expected to be close to 0.5 for heterozygous calls and close to 1 for homozygous calls
  • Depth - Number of reads used to evaluate the call
  • extCovAvgDepth10 - Coverage depth averaged across a 10-base window centered on the variant (within 5 bp on either side)
  • extCovDepth - Depth of coverage at the variant nucleotide(s)
  • FILTER - Crude quality flag based on the ratio between genotype quality (gt-quality) and depth: LowQual for low-quality calls, PASS for calls of acceptable quality
  • formatZip - VCF genotype field
  • FS - Fisher's exact test for strand bias; if the reference reads are balanced between forward and reverse strands, the alternate reads should be as well
  • genesum_het - Number of heterozygous variants in the gene
  • genesum_hom - Number of homozygous variants in the gene
  • genesum_miss - Number of variants in the gene with coverage below 8, which are therefore excluded from the analysis
  • GL_call - Statistical measure of the likelihood that the call is wrong, converted to an integer scale; the higher the number, the less likely the call is wrong
  • het - Boolean: 1 if the variant is heterozygous (CallCopies = 1), otherwise 0
  • hetORhom - "het" for heterozygous or "hom" for homozygous
  • hom - Boolean: 1 if the variant is homozygous (CallCopies = 2), otherwise 0
  • miss - Boolean: 1 if the variant has coverage below 8
  • PN - Subject ID
  • vcf_alt - Alternate (variant) nucleotide sequence as reported in the VCF
  • vcf_pos - Coordinate (position) of the reference nucleotide sequence
  • vcf_ref - Reference nucleotide sequence

Gene
  • Aliases - Aliases of the gene
  • biotype - Biological class of the gene
  • cdsEnd - End position of the coding sequence (CDS)
  • cdsStart - Start position of the coding sequence (CDS)
  • Description - Description of the gene (full gene name)
  • gene_stable_id - Ensembl stable ID for the gene
  • Paralogs - Paralogs of the gene
  • Pathways - Biological pathways in which the gene is found
  • Strand - Strand of the gene
  • Symbol - Gene symbol: the HGNC symbol when one exists, otherwise the Ensembl internal alias

GO
  • Description - Gene ontology category description
  • IDs - Gene ontology identifiers

KNOWN
  • dbsource - Database in which the variant is known: HGMD, OMIM or ClinVar
  • Gene_diseases - Diseases associated with the gene, as annotated by the source in the dbsource column
  • MaxClinImpact - Maximum clinical impact of the variant (e.g. pathogenic, unknown) annotated by any of HGMD, OMIM or ClinVar
  • var_diseases - Diseases associated with the variant, as annotated by the source in the dbsource column

lis
  • dbSourceMaxClinImpact
  • disease

mother
  • Same columns and descriptions as in the father group

OMIM
  • Descriptions - OMIM disease descriptions for the gene
  • IDs - OMIM ID of the gene

VEP
  • Amino_Acids - Reference and mutated amino acids (only provided if the variant affects the protein-coding sequence), otherwise "."
  • max_consequence - VEP-predicted consequence of the variant with the greatest impact on the transcript
  • Max_Impact - Severity level of the transcript consequence type, as assigned by VEP
  • Max_Score - Maximum score for the variant in dbNSFP: Score = max(1 - Sift_score, Polyphen2_HDIV_score, Polyphen2_HVAR_score)
  • Protein_Position - Position of the amino acid in the protein sequence (only if the variant falls within a coding sequence); one value is given for each corresponding transcript in the CDS position field

Other columns
  • Allele - Alternative allele in VCF format
  • DIAG_AMCGcat - Likelihood of pathogenicity of the variant, categorized according to the ACMG scheme; values range from cat1 (most severe) to cat4 (least severe); see ACMG categories below
  • MAX_AF - Maximum reported allele frequency across the population surveys 1000GP3, EVS, ExAC, Kyoto and GoNL
  • MaxClinImpact - Maximum clinical impact from the clinImpact column
  • Recessive - Whether a child could inherit a recessive genotype for the variant ("true" or "false"). "true" means one of two things:
      • both parents are heterozygous for the variant; or
      • the parents are heterozygous for different variants in the same gene, so a child could be compound heterozygous.
  • set_pathway - List of Ensembl pathways annotated for this gene from BIOCYC, HGMD, KEGG, Pathway Interaction Database (PID), REACTOME and WikiPathways
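To make the genotype columns and the Recessive flag concrete, the Python sketch below derives several of them. It is an illustration, not the report's code. It covers:

- het, hom, hetORhom and miss for one parent;
- a per-gene count in the style of genesum_het;
- the Max_Score formula;
- a Recessive flag.

The dictionary keys and helper names are hypothetical. The compound-heterozygosity rule is an assumption read from the Recessive description: a gene qualifies when some heterozygous paternal variant differs from some heterozygous maternal variant.

```python
from collections import defaultdict
from typing import Dict, List, Set

MIN_DEPTH = 8  # variants with coverage below 8 are flagged as missing (miss = 1)

def genotype_columns(call_copies: int, depth: int) -> Dict[str, object]:
    """Per-parent columns derived from CallCopies and coverage depth."""
    het = 1 if call_copies == 1 else 0
    hom = 1 if call_copies == 2 else 0
    return {
        "het": het,
        "hom": hom,
        "hetORhom": "het" if het else ("hom" if hom else ""),
        "miss": 1 if depth < MIN_DEPTH else 0,
    }

def max_score(sift: float, hdiv: float, hvar: float) -> float:
    """Max_Score = max(1 - Sift_score, Polyphen2_HDIV_score, Polyphen2_HVAR_score)."""
    return max(1 - sift, hdiv, hvar)

def genesum(rows: List[dict], flag: str) -> Dict[str, int]:
    """Per-gene sum of a 0/1 flag for one parent (e.g. 'het' gives genesum_het)."""
    counts: Dict[str, int] = defaultdict(int)
    for r in rows:
        counts[r["gene"]] += r[flag]
    return dict(counts)

def recessive_flags(rows: List[dict]) -> List[str]:
    """rows: one dict per variant with hypothetical keys 'gene', 'id', 'father_het', 'mother_het' (0/1)."""
    father: Dict[str, Set[str]] = defaultdict(set)
    mother: Dict[str, Set[str]] = defaultdict(set)
    for r in rows:
        if r["father_het"]:
            father[r["gene"]].add(r["id"])
        if r["mother_het"]:
            mother[r["gene"]].add(r["id"])
    flags = []
    for r in rows:
        f, m = father[r["gene"]], mother[r["gene"]]
        # Assumed compound rule: a paternal het variant differs from a maternal het variant.
        compound = any(a != b for a in f for b in m)
        both_het = r["father_het"] and r["mother_het"]
        carried = r["father_het"] or r["mother_het"]
        flags.append("true" if both_het or (compound and carried) else "false")
    return flags
```

Suppose the father carries variant a and the mother carries a different variant b in the same gene. Under this rule, both variants are marked "true". A variant carried only by the father stays "false" when the mother has no heterozygous variant in that gene.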

ACMG categories

Figure: ACMG categories

Perspective views

The Default view perspective lists all variants. The additional perspectives show subsets of the default view, each selecting the variants that meet the conditions below.

Perspectives
  • Default view - Lists all variants
  • InBothParents - Selects variants that meet three conditions:
      • (condition 1) the variant is categorized as "recessive";
      • (condition 2) the variant is heterozygous or homozygous in both the father and the mother;
      • (condition 3) the gene is related to a known disease.
    Filter expression: (Recessive = 'true' and FATHER_hetORhom != '' and MOTHER_hetORhom != '' and (len(KNOWN_Gene_diseases)>1 or len(OMIM_Descriptions)>1))
  • RecessiveCandidates - Selects variants that meet two conditions:
      • (condition 1) the variant is categorized as "recessive";
      • (condition 2) the gene is related to a known disease.
    This criterion is less stringent than InBothParents. Filter expression: (Recessive = 'true' and (len(KNOWN_Gene_diseases)>1 or len(OMIM_Descriptions)>1))
  • RecessiveKnownVar - Selects variants that meet two conditions:
      • (condition 1) the variant is categorized as "recessive";
      • (condition 2) the variant is a known pathogenic variant labeled 'cat1'.
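The perspective conditions act as row filters over the report. The Python sketch below transcribes them as predicates on a row dictionary. Two parts are assumptions:

- The column names FATHER_hetORhom, MOTHER_hetORhom, KNOWN_Gene_diseases and OMIM_Descriptions come from the filter expressions above.
- Reading the 'cat1' label from the DIAG_AMCGcat column is an assumption.
- The helper names are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

def _known_disease_gene(row: Row) -> bool:
    # Gene-disease condition transcribed as written (length > 1, not > 0).
    return len(row["KNOWN_Gene_diseases"]) > 1 or len(row["OMIM_Descriptions"]) > 1

def in_both_parents(row: Row) -> bool:
    return (row["Recessive"] == "true"
            and row["FATHER_hetORhom"] != ""
            and row["MOTHER_hetORhom"] != ""
            and _known_disease_gene(row))

def recessive_candidates(row: Row) -> bool:
    return row["Recessive"] == "true" and _known_disease_gene(row)

def recessive_known_var(row: Row) -> bool:
    # Assumption: the 'cat1' label is read from the DIAG_AMCGcat column.
    return row["Recessive"] == "true" and row["DIAG_AMCGcat"] == "cat1"

PERSPECTIVES: Dict[str, Callable[[Row], bool]] = {
    "Default view": lambda row: True,
    "InBothParents": in_both_parents,
    "RecessiveCandidates": recessive_candidates,
    "RecessiveKnownVar": recessive_known_var,
}

def perspective(rows: List[Row], name: str) -> List[Row]:
    """Return the rows shown in the named perspective."""
    return [r for r in rows if PERSPECTIVES[name](r)]
```

InBothParents only adds a condition to RecessiveCandidates, so every row it shows also appears in RecessiveCandidates.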

Drill-in reports

  • GeneVariantsInParents - For the selected variant, this drill-in report lists all variants in the same gene that are carried by the parents
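The GeneVariantsInParents drill-in amounts to a simple filter, sketched below in Python. Two parts are assumptions:

- The gene column name `GENE_Symbol` is hypothetical, modeled on the group-prefixed names (FATHER_hetORhom, KNOWN_Gene_diseases) used in the perspective expressions.
- "Carried by the parents" is read as a non-empty hetORhom call in either parent.

```python
from typing import Dict, List

Row = Dict[str, str]

def gene_variants_in_parents(rows: List[Row], selected: Row,
                             gene_col: str = "GENE_Symbol") -> List[Row]:
    """Variants in the selected variant's gene that either parent carries (het or hom)."""
    gene = selected[gene_col]
    return [r for r in rows
            if r[gene_col] == gene
            and (r["FATHER_hetORhom"] != "" or r["MOTHER_hetORhom"] != "")]
```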