Gene coverage

Example use case

The Gene coverage report builder performs a sequence read coverage analysis. This analysis can be used as a quality control (QC) check for a set of samples, typically either across all genetic regions or for a specific list of genes. One or more subjects can be selected; results will be reported separately for each subject.

Description of the algorithm

Genes are defined by the Ensembl reference gene set. For the current Ensembl version, see the Version information link in CSA.

This report builder extracts coverage data for each subject (PN) from one of the following coverage files in the source/cov folder located in the project repository:

gene_cov_coding_seg.gord - for “Coding only”

gene_cov_all_seg.gord - for “Coding and UTR”

gene_cov_coding_seg.gord and gene_cov_all_seg.gord - for “Report both”

These files are used to calculate the average, maximum, and minimum sequence coverage of the reference set of genes for the selected samples.
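The relationship between the exon reference choice and the coverage files read can be sketched in Python as below. This is an illustration only: the option labels and file names come from the list above, while the function name `coverage_files_for` and the `repo_root` argument are hypothetical and do not describe the report builder's actual implementation.

```python
from pathlib import PurePosixPath

# Option labels and file names are taken from the documentation above.
# The function name and the repo_root argument are hypothetical.
COVERAGE_FILES = {
    "Coding only": ["gene_cov_coding_seg.gord"],
    "Coding and UTR": ["gene_cov_all_seg.gord"],
    "Report both": ["gene_cov_coding_seg.gord", "gene_cov_all_seg.gord"],
}


def coverage_files_for(exon_reference, repo_root):
    """Return the source/cov paths read for a given exon reference choice."""
    try:
        names = COVERAGE_FILES[exon_reference]
    except KeyError:
        raise ValueError(f"unknown exon reference option: {exon_reference!r}")
    cov_dir = PurePosixPath(repo_root) / "source" / "cov"
    return [str(cov_dir / name) for name in names]
```

For example, `coverage_files_for("Report both", "my_project")` would return both paths under `my_project/source/cov`.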

The following gene annotations may be included as an additional option:

GO descriptions and IDs

OMIM descriptions and IDs

dbSourceMaxClinicalImpact (pathogenicity of a given gene as annotated by ClinVar, HGMD, and OMIM)

diseases (reported diseases with which the gene is associated as annotated by ClinVar, HGMD, and OMIM)

Interpreting the output

Each output row lists attributes and quality statistics for one gene of interest in a single subject (PN). The exomeSize column reports the total size of the exons covering the gene, followed by the average depth across the gene for the indicated PN.

The metrics used to flag regions of low coverage are lt5, lt10, lt15, lt20, lt25, and lt30. Each metric is a value between 0 and 1 giving the fraction of the gene covered by fewer sequence reads than the named threshold. For instance, a value of “0.05” for lt10 indicates that 5% of the gene is covered by fewer than 10 sequence reads, and therefore 95% of the gene has at least 10X coverage.
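To make the definition concrete, the following Python sketch computes the lt metrics from a list of per-base read depths and derives the complementary "at least N X" fraction. The per-base input and the function names are assumptions made for illustration; the report builder itself reads precomputed values from the coverage files.

```python
THRESHOLDS = (5, 10, 15, 20, 25, 30)


def lt_metrics(per_base_depth):
    """Fraction of positions with depth strictly below each threshold."""
    if not per_base_depth:
        raise ValueError("per_base_depth must not be empty")
    n = len(per_base_depth)
    return {
        f"lt{t}": sum(1 for d in per_base_depth if d < t) / n
        for t in THRESHOLDS
    }


def fraction_at_least(lt_value):
    """Complement of an lt metric: fraction covered at or above the threshold."""
    return 1.0 - lt_value


# Hypothetical 20-base gene: one base at 8X, the remaining 19 at 40X.
depths = [8] + [40] * 19
metrics = lt_metrics(depths)
```

In this made-up example one base in twenty falls below 10X, so `metrics["lt10"]` is 0.05 and `fraction_at_least(metrics["lt10"])` is approximately 0.95, matching the interpretation given above.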

Additional columns are calculated for each gene across the set of selected samples. The avg_depth values of a given gene are compared across samples to produce the minimum (“min_avg_depth”), maximum (“max_avg_depth”), and average (“avg_avg_depth”) depth observed across the selected samples. Because they summarize the whole set, these three values are the same in every sample's row for a given gene.
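The cross-sample summary can be sketched in Python as follows. The row layout (dicts with `PN`, `gene`, and `avg_depth` keys), the function name, and the sample values are hypothetical; the sketch also assumes a single exon type per gene, and is not the report builder's own code.

```python
from collections import defaultdict


def add_cross_sample_depth(rows):
    """Attach min/max/avg of avg_depth per gene to every row.

    rows: list of dicts with hypothetical keys "PN", "gene", "avg_depth".
    """
    # Collect the per-sample average depths for each gene.
    depths_by_gene = defaultdict(list)
    for row in rows:
        depths_by_gene[row["gene"]].append(row["avg_depth"])

    # Copy each row and add the three summary values for its gene.
    out = []
    for row in rows:
        depths = depths_by_gene[row["gene"]]
        out.append({
            **row,
            "min_avg_depth": min(depths),
            "max_avg_depth": max(depths),
            "avg_avg_depth": sum(depths) / len(depths),
        })
    return out


# Made-up input: two samples for one gene, one sample for another.
rows = [
    {"PN": "PN1", "gene": "BRCA1", "avg_depth": 30.0},
    {"PN": "PN2", "gene": "BRCA1", "avg_depth": 50.0},
    {"PN": "PN1", "gene": "TP53", "avg_depth": 42.0},
]
summary = add_cross_sample_depth(rows)
```

Both BRCA1 rows in `summary` carry the same three summary values (30.0, 50.0, and 40.0), which illustrates why these columns repeat across samples for a given gene.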

Column descriptions

Report output columns and descriptions
Group	Column name	Description
Basic	Chrom	Chromosome
	PN	Subject ID
avg	avg_depth	For a set of selected samples, the average read depth for the gene across the set of samples
	depth	The average of the number of reads across the gene for the designated PN
gene	end	Gene end position
	start	Gene start position
	Symbol	HGNC gene symbol
Other columns	exomeSize	Number of bases of coding sequence or coding + UTR sequence (defined by the gene_end-gene_start)
	exontype	“Coding” (Coding only) or “all” (Coding and UTR)
	lt5	The fraction of the exome with sequence read coverage less than 5X
	lt10	The fraction of the exome with sequence read coverage less than 10X
	lt15	The fraction of the exome with sequence read coverage less than 15X
	lt20	The fraction of the exome with sequence read coverage less than 20X
	lt25	The fraction of the exome with sequence read coverage less than 25X
	lt30	The fraction of the exome with sequence read coverage less than 30X
	max_avg_depth	For a set of selected samples, the maximum average depth observed for the gene
	min_avg_depth	For a set of selected samples, the minimum average depth observed for the gene

Additional annotation columns

If the add_gene_annotations option is set to “yes”, the following annotation columns are added:

Group	Column name	Description
gene	Aliases	List of gene aliases that correspond to the GENE_symbol
	Biotype	Biological class of gene as annotated by Ensembl
	cdsEnd	Coding sequence (CDS) end position
	cdsStart	Coding sequence (CDS) start position
	Description	Description of the gene, i.e. full gene name and gene name source (e.g., HGNC)
	diseases	Diseases known to be associated with the gene as annotated by ClinVar, HGMD, and OMIM
	gene_stable_id	Ensembl stable ID for the gene
	Paralogs	The paralogs of the given gene as annotated by Ensembl
	Strand	The strand of DNA (+/-) from which the gene is transcribed
GO	Descriptions	Gene Ontology category descriptions
	IDs	Gene Ontology identifiers
lis	dbSourceMaxClinImpact	List of clinical impact database sources (ClinVar, HGMD, and OMIM)
	disease	List of the associated diseases annotated for the gene by ClinVar, HGMD, and OMIM
OMIM	Descriptions	OMIM disease descriptions for the gene
	IDs	The OMIM ID of the gene
Other columns	MaxClinImpact	Max clinical impact annotated by ClinVar, HGMD, and OMIM
	set_Pathway

Perspective views

If “Report both” is selected in the exon_reference field, the “Coding only” and “Coding and UTR” results can be viewed in separate tables by selecting the Coding or AllExome perspective, respectively.

Perspectives
Perspective	Description
AllExome	Coding and UTR (if selected)
Coding	Coding only (if selected)
Default View	Results displayed based on initial selection (“Coding only”, “Coding and UTR”, or “Report both”)
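When working with exported output outside the application, a similar split can be approximated by filtering on the exontype column. The Python sketch below assumes that the Coding perspective corresponds to exontype “Coding”, that AllExome corresponds to “all”, and that the Default View shows every row; the mapping is inferred from the column and perspective descriptions, and the function name and row layout are hypothetical.

```python
# Mapping inferred from the exontype values and perspective descriptions;
# it is an assumption, not a documented API.
PERSPECTIVE_EXONTYPE = {"Coding": "Coding", "AllExome": "all"}


def rows_for_perspective(rows, perspective):
    """Return the rows a perspective would show, under the assumptions above."""
    if perspective == "Default View":
        # Assumed: the default view keeps all rows from the initial selection.
        return list(rows)
    try:
        exontype = PERSPECTIVE_EXONTYPE[perspective]
    except KeyError:
        raise ValueError(f"unknown perspective: {perspective!r}")
    return [row for row in rows if row["exontype"] == exontype]


# Made-up rows, as they might appear after selecting "Report both".
report_rows = [
    {"PN": "PN1", "gene": "BRCA1", "exontype": "Coding"},
    {"PN": "PN1", "gene": "BRCA1", "exontype": "all"},
]
coding_rows = rows_for_perspective(report_rows, "Coding")
```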