Risk SNPs

The Risk SNPs report builder annotates the variants from a PN or list of PNs that match a set of user-defined rsIDs from the dbSNP reference file.

Example use case

The user has a list of rsIDs for SNPs of interest (from dbSNP) and wishes to screen one or more subjects for those variants and, where a variant is present, to assess the quality of the call.

Description of the algorithm

rsIDs matching the user-designated rsID_grid are extracted from the dbSNP reference file (ref/dbsnp/dbsnp_anno.gorz). This table is expanded by creating one row per subject (PN) per rsID.
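As a rough illustration of the selection and expansion step, the Python sketch below filters a list of dbSNP rows to the requested rsIDs and then creates one row per PN per rsID. It is not the report builder's actual query; the function names, the dictionary-based rows, and the column subset are hypothetical.

```python
from itertools import product


def select_snps(dbsnp_rows, rsids):
    """Keep only the dbSNP rows whose rsID is in the user's list."""
    wanted = set(rsids)
    return [row for row in dbsnp_rows if row["rsID"] in wanted]


def expand_per_subject(snp_rows, pns):
    """Create one output row per (rsID, PN) combination."""
    return [{**snp, "PN": pn} for snp, pn in product(snp_rows, pns)]


# Made-up example rows; real rows would come from the dbSNP reference file.
dbsnp_rows = [
    {"Chrom": "chr1", "POS": 100, "rsID": "rs1"},
    {"Chrom": "chr2", "POS": 200, "rsID": "rs2"},
    {"Chrom": "chr3", "POS": 300, "rsID": "rs3"},
]
selected = select_snps(dbsnp_rows, ["rs1", "rs3"])
expanded = expand_per_subject(selected, ["PN1", "PN2"])
```

With two matching rsIDs and two PNs, the expanded table holds four rows, so every subject has a row for every SNP even before any variant calls are joined in.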

Each rsID/SNP coordinate for each subject is then joined with the corresponding coordinate and PN (row) in the source/var/wgs_varcalls.gord file.
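The join can be pictured as a keyed lookup on chromosome, position, and PN. The sketch below is a minimal Python illustration, not the report builder's implementation. The function name and row layout are hypothetical, and it assumes that a subject with no matching row in the variant calls file receives a Call of "Missing", which the source does not state explicitly.

```python
def join_calls(expanded_rows, varcall_rows):
    """Left-join per-subject SNP rows with variant calls on (Chrom, POS, PN)."""
    by_key = {(r["Chrom"], r["POS"], r["PN"]): r for r in varcall_rows}
    joined = []
    for row in expanded_rows:
        hit = by_key.get((row["Chrom"], row["POS"], row["PN"]))
        merged = dict(row)
        # Assumption: no matching variant call row means the call is "Missing".
        merged["Call"] = hit["Call"] if hit else "Missing"
        merged["Depth"] = hit["Depth"] if hit else None
        joined.append(merged)
    return joined


# Made-up example rows for two subjects at one SNP position.
expanded_rows = [
    {"Chrom": "chr1", "POS": 100, "rsID": "rs1", "PN": "PN1"},
    {"Chrom": "chr1", "POS": 100, "rsID": "rs1", "PN": "PN2"},
]
varcall_rows = [
    {"Chrom": "chr1", "POS": 100, "PN": "PN1", "Call": "A/T", "Depth": 32},
]
joined_rows = join_calls(expanded_rows, varcall_rows)
```

A left join keeps every subject and rsID combination in the output, so subjects without a call at the position still appear in the report.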

Allele calls (e.g., “A/T”) for each rsID for each PN are reported in the Alleles column if either of the following sets of conditions is true. Otherwise, the allele call is changed to “Unknown”.

Condition set 1

The variant call is listed as “Missing”
Read depth is greater than the user-designated minReadDepth
Fewer than 2 reads support the variant, OR (reads for the variant) / (reads for the segment) < 0.10

Condition set 2

The variant call is NOT listed as “Missing”
Read depth is greater than the user-designated minReadDepth
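The two condition sets can be expressed as a single decision function. The following Python sketch is illustrative only: the function and parameter names are hypothetical, it treats "read depth" as a single value passed in by the caller, and it skips the ratio test when the segment has no reads to avoid dividing by zero, which is an assumption rather than documented behavior.

```python
def report_alleles(call, alleles, read_depth, variant_reads,
                   segment_reads, min_read_depth):
    """Return the allele call to report, or "Unknown" if neither set holds."""
    deep_enough = read_depth > min_read_depth

    if call == "Missing":
        # Condition set 1: enough depth and little or no evidence for the variant.
        low_ratio = segment_reads > 0 and variant_reads / segment_reads < 0.10
        keep = deep_enough and (variant_reads < 2 or low_ratio)
    else:
        # Condition set 2: a call exists and the depth is sufficient.
        keep = deep_enough

    return alleles if keep else "Unknown"
```

For example, a "Missing" call at a locus with 30 reads and no variant reads passes condition set 1, whereas the same call with 10 of 30 reads supporting the variant fails both tests and is reported as "Unknown".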

Interpreting the output

The resulting table provides the following information:

Whether the selected sample carries the variant
Read depth of the variant
Read depth of that locus
Raw reads containing the variant

The readsWithNonRef columns display the number of reads that contain the variant allele. A value of “0” indicates that this variant is not carried by the selected subject.

The Depth columns display the total read depth of the locus in that particular sample, which is the DP value in the VCF file. A value of “unknown” indicates that this variant does not exist in that particular sample.

The values displayed in the Depth and readsWithNonRef columns come from different sources:

The readsWithNonRef column is calculated from the raw read count file, candidate_variants.gord, which holds a pile-up count of the reads in the BAM that contain the variant allele.
The Depth column is the total read depth for the given position and is derived from the VCF file DP column (source/var/wes_varcalls.gord).

The SegDepth column displays the raw read depth of the locus, calculated from the BAM pile-up and extracted from the per-base coverage file segment_cov.gord. This value can be used to determine whether there is sufficient read coverage of the locus to validate the variant call.
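One way to use these columns together when reviewing a row is to compare SegDepth against a coverage threshold and to look at the fraction of reads that carry the variant. The Python sketch below is a hypothetical helper for that kind of review, not part of the report builder; the function names and the threshold used in the example are assumptions.

```python
def has_sufficient_coverage(seg_depth, min_read_depth):
    """True if the raw locus coverage exceeds the chosen threshold."""
    return seg_depth > min_read_depth


def variant_read_fraction(reads_with_nonref, seg_depth):
    """Fraction of reads at the locus that contain the variant allele.

    Returns None when the locus has no coverage.
    """
    if seg_depth <= 0:
        return None
    return reads_with_nonref / seg_depth


# Made-up values for a single report row.
row = {"readsWithNonRef": 10, "SegDepth": 40}
covered = has_sufficient_coverage(row["SegDepth"], min_read_depth=8)
fraction = variant_read_fraction(row["readsWithNonRef"], row["SegDepth"])
```

A row with high SegDepth and a variant fraction close to zero suggests that the variant is absent, whereas low SegDepth means the locus cannot be assessed either way.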

Column descriptions

Report output columns and descriptions
Group	Column	Description
Basic	Chrom	The chromosome of the variant represented as chr1, chr2, …, chr22, chrXY, chrX, chrY, chrM
	PN	The patient number (identifier)
	POS	The (first) base pair position of the sequence variant, i.e., the position of the first nucleotide in the Reference column
Other columns	Alleles	“unknown” indicates that this rsID-related variant is not present in the selected sample because the coverage is too low or there are insufficient reads containing the variant
	Call	“missing” indicates that this rsID-related variant is not present in the selected sample
	dbSNP_Alleles	The matching variant alleles from the dbSNP reference database
	Depth	The total read depth reported in the VCF file
	readsWithNonRef	The number of reads that contain the variant, calculated from the candidate_variants.gord file (based on the BAM file)
	rsID	The rsID from the input file
	SegDepth	The total number of reads that cover the locus, calculated from the segment_cov.gord file (based on the BAM file)
	Strand	Chromosome strand, either “+” or “-” to indicate the direction in which the variant sequence is transcribed

Perspective views

Perspective subtabs focus on subsets of the columns in the Default view.

Perspectives
Perspective	Description
Basic
Default view	Shows all columns