Exon coverage¶

Example use case¶

The Exon coverage query reports sequence read coverage and additional attributes for each exon. One or more subjects can be selected, and results are reported separately for each subject. The analysis is based on a pileup from the original BAM files and is presented in terms of the following:

The average depth of coverage over each exon, on a per subject basis
The fraction of each exon with coverage below specific thresholds; specifically the fraction of each exon with coverage less than 5, less than 10, and so on up to 30
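The two metrics above can be illustrated with a minimal sketch. It assumes a per-base list of read depths for a single exon in a single subject is already available; the function name `exon_coverage_stats` and the example depths are hypothetical and are not part of the product.

```python
from typing import Dict, List

# Coverage thresholds reported by the query (lt5 ... lt30).
THRESHOLDS = (5, 10, 15, 20, 25, 30)


def exon_coverage_stats(depths: List[int]) -> Dict[str, float]:
    """Summarise per-base read depths for one exon in one subject."""
    if not depths:
        raise ValueError("exon has no bases")
    n = len(depths)
    stats = {"avg_depth": sum(depths) / n}
    for t in THRESHOLDS:
        # Fraction of bases whose depth is strictly below the threshold.
        stats[f"lt{t}"] = sum(1 for d in depths if d < t) / n
    return stats


# Hypothetical 10 bp exon, for illustration only.
example = exon_coverage_stats([3, 8, 12, 12, 18, 22, 26, 31, 40, 28])
print(example)
```

In this sketch each ltN value counts bases with depth strictly less than N, which matches the "less than" wording used above.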

Exonic regions are defined by the Ensembl or RefGene gene sets (both are obtained from the VEP and Ensembl release shown in the version.txt file in the ref directory, or via the Version Information link in CSA). The gene reference set input argument determines which of the two gene sets is used.

Approximating the coverage per exon based on segment coverage¶

When BAM files for a sample are imported into the WuXi NextCODE system, a segment coverage file ( segment_cov.gord ) is generated with the total read depth for each base. This depth is binned to the nearest interval to reduce file size and improve query speed. Because the report is computed from these binned depths rather than the exact per-base depths, its results are slight approximations.
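To see why binning makes the report approximate, consider the following sketch. The bin width of 5 and the round-to-nearest rule are assumptions made purely for illustration; the actual intervals used in segment_cov.gord are not stated here.

```python
BIN_WIDTH = 5  # hypothetical bin width, not taken from the product


def bin_depth(depth: int, width: int = BIN_WIDTH) -> int:
    """Round a read depth to the nearest multiple of `width` (halves round up)."""
    return (depth + width // 2) // width * width


# Hypothetical true per-base depths for a 4 bp region.
true_depths = [9, 9, 12, 30]
binned = [bin_depth(d) for d in true_depths]

# Fraction of bases below 10X, before and after binning.
lt10_true = sum(d < 10 for d in true_depths) / len(true_depths)
lt10_binned = sum(d < 10 for d in binned) / len(binned)
print(binned, lt10_true, lt10_binned)
```

Under these assumptions, bases with a true depth of 9 are stored as 10 and no longer count toward lt10, so threshold fractions near a bin boundary can differ from the exact values.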

Limiting the input list of subjects to maintain a reasonable output report size¶

The output report can be quite large, with a total number of rows proportional to the number of input subjects (PNs). To avoid “out of memory” issues on the local computer, the number of input subjects is limited to 20 or fewer. To analyze a larger number of subjects, run the report multiple times on batches of up to 20 subjects each.
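Splitting a longer subject list into batches can be done outside the system before each run. The sketch below is one way to do it; the helper name `batches` and the PN identifiers are hypothetical.

```python
from typing import Iterator, List, Sequence

MAX_SUBJECTS = 20  # per-run limit described above


def batches(pns: Sequence[str], size: int = MAX_SUBJECTS) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` subject IDs."""
    for start in range(0, len(pns), size):
        yield list(pns[start:start + size])


# Hypothetical list of 45 subject IDs.
subjects = [f"PN{i:03d}" for i in range(1, 46)]
groups = list(batches(subjects))
for group in groups:
    print(len(group), group[0], group[-1])
```

Each group can then be saved as its own single-column list with the header "PN" and used as the subjects input for one run of the report.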

Description of the algorithm¶

Interpreting the output¶

Each row of output lists attributes and quality statistics per exon for each gene of interest in a single subject (PN). These values include the genomic coordinates of the exon, the exon size (length of the exon in bp), average depth over the exon, Ensembl gene and transcript annotation, the biotype of the exon, and a breakdown of the proportion of the exon with different degrees of coverage. The metrics used to flag regions of low coverage are lt5, lt10, lt15, lt20, lt25, and lt30. Each metric corresponds to a coverage threshold and takes a value between 0 and 1: the fraction of the exon covered by fewer reads than that threshold. For instance, a value of “0.05” for lt10 indicates that 5% of the exon is covered by fewer than 10 sequence reads, and therefore 95% of the exon has at least 10X coverage.
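As a sketch of how these metrics might be used after export, the following assumes the report rows have been loaded as dictionaries keyed by column name. The function names, the 5% cutoff, and the exon identifiers are hypothetical.

```python
from typing import Dict, List


def fraction_at_least(row: Dict[str, str], threshold: int) -> float:
    """Fraction of the exon covered at `threshold`X or more (1 - ltN)."""
    return 1.0 - float(row[f"lt{threshold}"])


def low_coverage_exons(
    rows: List[Dict[str, str]],
    threshold: int = 10,
    max_fraction_below: float = 0.05,  # hypothetical cutoff
) -> List[Dict[str, str]]:
    """Keep rows where more than `max_fraction_below` of the exon is under `threshold`X."""
    return [r for r in rows if float(r[f"lt{threshold}"]) > max_fraction_below]


# Hypothetical report rows; exon IDs are placeholders.
rows = [
    {"pn": "PN001", "exon": "EXON_A", "lt10": "0.05"},
    {"pn": "PN001", "exon": "EXON_B", "lt10": "0.40"},
]
flagged = low_coverage_exons(rows)
print([r["exon"] for r in flagged])
print(fraction_at_least(rows[0], 10))
```

Here EXON_A sits exactly at the cutoff and is not flagged, while EXON_B, with 40% of its bases below 10X, is.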

Input parameters¶

Basic parameters¶
Input	Description	Values
genome_range	All (whole genome) / selected region (from an open Genome Browser window)	Inputs are defined by any open Genome Browser window
subjects	Subjects may be selected individually or as a list	A grid with the first column named “PN” must be open in order to select a list of subjects
gene_list	Filter to include only exons of genes in the gene list	The list should be open in a grid tab containing a single column of gene symbols with a header labeled “gene_symbol”
exon_reference	Choose whether to limit to coding exons	Select Coding only or Coding and UTR
gene_reference_set	Choose between Ensembl or RefGene reference sets	Select Ensembl or RefGene

Advanced parameters¶
Input	Description	Values
long_running_query	The query runs as a long-running job if set to “Yes”	Yes or No
running_time	Time in hours until the long-running job is cancelled	1, 2, 4, 8 hours
ref_path	Path to the reference data directory	Default: ref

Column descriptions¶

Output column descriptions¶
Group	Column name	Description
Basic	Chrom	Chromosome
	pn	The subject ID (identifier)
Gene	biotype	Biological class of gene as annotated by Ensembl
	stable_id	Ensembl stable ID for the gene
	symbol	Based on HGNC when it exists, otherwise it is the Ensembl internal alias
Other columns	avg_depth	The average read depth of the specific exon
	Exon_size	The number of base pairs comprising the given exon
	exon	Ensembl exon ID
	exonend	Coordinate for the last base of the exon
	exonstart	Coordinate for the first base of the exon
	lt10	The fraction of the exon with sequence read coverage less than 10X
	lt15	The fraction of the exon with sequence read coverage less than 15X
	lt20	The fraction of the exon with sequence read coverage less than 20X
	lt25	The fraction of the exon with sequence read coverage less than 25X
	lt30	The fraction of the exon with sequence read coverage less than 30X
	lt5	The fraction of the exon with sequence read coverage less than 5X
	set_Transcript_stable_id	Comma-separated list of Ensembl transcript IDs related to the specific exon
	strand	Plus/minus of the DNA strand encoding the exon sequence
	Transcript_biotype	Biological class of transcript