Transcripts¶

The Transcripts report builder annotates a given variant to help the user evaluate its effect. The annotation includes the consequence and the maximum consequence reported for the variant, based on transcripts drawn from the source/anno/vep_3-4-2/vep_multi_wgs.gord file (in the Sequence Miner File Explorer). The vep_multi_wgs.gord file lists all reported transcripts for the following:

A single variant at a given locus
Multiple variants at the same locus
Nearby and overlapping upstream and downstream genes/pseudogenes

Figure (../_images/transcripts.png): Transcripts module in Sequence Miner¶

Example use case¶

The user has identified a variant in a case and wishes to annotate the variant with all known transcripts along with the associated variant effect consequences and the maximum observed consequence among all transcripts carrying the variant.

Description of the algorithm¶

This query annotates the input variant with structural and functional features of the sequence (e.g., protein domain information). The annotation also includes the IDs of transcripts that carry the variant or lie near it, along with the associated transcript annotation from the Variant Effect Predictor (VEP) algorithm. For nearby transcripts, the DISTANCE column lists the number of bases between the variant and the transcript.
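The relationship between the per-transcript Consequence values and the single Max_Consequence value can be illustrated with a minimal Python sketch. The severity ranking, the row layout, and the transcript IDs below are assumptions made for illustration only; the report builder's actual ranking is not specified here.

```python
# Illustrative severity ranking, most severe first. This ordering is an
# assumption for the sketch, not the ranking used by the report builder.
SEVERITY_ORDER = [
    "stop_gained",
    "frameshift_variant",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "upstream_gene_variant",
    "downstream_gene_variant",
]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


def max_consequence(rows):
    """Return the most severe consequence among the transcript rows of one variant.

    Each row is a dict with at least a "Consequence" key. Returns None when
    there are no rows.
    """
    terms = []
    for row in rows:
        # One transcript may carry several terms; this sketch assumes they
        # are joined by "&" or ",".
        for term in row["Consequence"].replace("&", ",").split(","):
            terms.append(term.strip())
    if not terms:
        return None
    # Terms missing from the ranking are treated as least severe.
    return min(terms, key=lambda t: RANK.get(t, len(RANK)))


# Hypothetical transcript rows for a single variant.
rows = [
    {"Feature": "ENST_A", "Consequence": "missense_variant"},
    {"Feature": "ENST_B", "Consequence": "upstream_gene_variant", "DISTANCE": 1200},
    {"Feature": "ENST_C", "Consequence": "intron_variant"},
]
print(max_consequence(rows))
```

In this sketch the variant is a missense change in one transcript and only upstream of or intronic in the others, so the missense consequence is the one reported as the maximum.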

Interpreting the output¶

Column descriptions¶

Report output columns and descriptions¶
Group	Column	Description
Basic	Call	The actual called sequence (variant), found by replacing a part of the reference sequence, denoted by Pos and Reference, with the sequence in the Call column
	chrom	The chromosome of the variant, represented as chr1, chr2, …, chr22, chrXY, chrX, chrY, chrM
	Pos	The (first) base pair position of the sequence variant, i.e., the position of the first nucleotide in the Reference column
	Reference	Sequence from the reference build, the first base starting at the base pair position in the Pos column
Protein	position	Relative position of the amino acid in the protein
	Size	Additional annotation using Ensembl lookup, based on protein (ENSP)
VEP	gene	Ensembl stable ID of the affected gene
	impact
Other columns	Amino_acids	Reference amino acid/substituted amino acid (in the case of a missense variant)
	Biotype	Biological class of transcript or regulatory feature
	CallRatio	Proportion of reads containing the variant call; expected to be close to 0.5 for heterozygous calls and close to 1 for homozygous calls
	cDNA_position	Relative position of the base pair in the cDNA sequence
	CDS_position	Relative position of the base pair in the coding sequence
	Codons	The alternative codons with the variant base in uppercase
	Consequence	Consequence type of this variant
	DISTANCE	Shortest distance from variant to transcript (applies to up- and downstream variants)
	Depth	The number of reads used in evaluating the corresponding call
	DOMAINS	The source and identifier of any overlapping protein domains
	ENSP	The Ensembl protein identifier of the affected transcript
	Existing_variation	rs name of SNP if it exists
	EXON	Ensembl (or Refgene) exon ID
	Feature	Ensembl stable ID of feature
	Feature_type	Type of feature, currently one of “Transcript”, “RegulatoryFeature”, or “MotifFeature”
	FILTER	Quality flag based on the ratio between gt-quality and depth, indicating whether the call is considered “LowQual” (not usable) or “PASS”; this is still a very crude quality measure
	FS	Fisher’s exact test of read strand; if the reference reads are balanced between forward and reverse strands, then the alternate reads should be as well
	formatZip	VCF genotype field
	Gene_Symbol	Based on HGNC when it exists, otherwise it is the Ensembl internal alias
	GL_Call	A statistical measure indicating the likelihood that the call is wrong; the scale has been converted to integers, and the higher the number, the less likely it is that the call is wrong
	GMAF	MAF (minor allele frequency) of the existing variant in 1000 Genomes Phase I
	HGNC	The HGNC gene identifier
	HGVSc	HGVSc coding sequence name
	HGVSp	HGVSp protein sequence name
	INTRON	The intron number (out of total number)
	Max_Consequence	The highest-impact consequence reported for this variant among all transcripts
	Refgene	NCBI accession number, obtained by looking up the feature in Ensembl 69
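As a rough illustration of how the Basic columns and CallRatio can be read, the following Python sketch rebuilds the called sequence from Pos, Reference, and Call, and makes a coarse zygosity guess from CallRatio. The function names, the example sequence, and the tolerance value are hypothetical and chosen only for this sketch.

```python
def apply_call(region_seq, region_start, pos, reference, call):
    """Replace `reference` at position `pos` with `call`.

    `region_seq` is a stretch of the reference build whose first base sits at
    genomic position `region_start` (same coordinate system as `pos`).
    """
    offset = pos - region_start
    if offset < 0 or region_seq[offset:offset + len(reference)] != reference:
        raise ValueError("Reference does not match the sequence at Pos")
    return region_seq[:offset] + call + region_seq[offset + len(reference):]


def expected_zygosity(call_ratio, tolerance=0.2):
    """Coarse guess from CallRatio; the tolerance is an assumed cutoff."""
    if abs(call_ratio - 1.0) <= tolerance:
        return "homozygous"
    if abs(call_ratio - 0.5) <= tolerance:
        return "heterozygous"
    return "unclear"


# Hypothetical 8-base reference stretch starting at position 100.
region = "ACGTACGT"
snv = apply_call(region, 100, 102, "G", "T")        # single-base substitution
deletion = apply_call(region, 100, 101, "CGT", "C")  # two bases deleted
print(snv, deletion, expected_zygosity(0.48))
```

A CallRatio far from both 0.5 and 1 falls into the "unclear" bucket here; how such calls should be handled in practice is left open.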

Perspective views¶

Perspective subtabs focus on subsets of the columns in the Default view.

Perspectives¶
Perspective	Description
Basic
Default view	Shows all columns