Risk SNPs

The Risk SNPs report builder annotates the variants from a PN or list of PNs that match a set of user-defined rsIDs from the dbSNP reference file.

../_images/riskSNPs.png

Risk SNPs module in Sequence Miner

Example use case

The user has a list of rsIDs for SNPs of interest (from dbSNP) and wishes to screen the subject(s) for those variants and to determine the quality of the variant if present.

Description of the algorithm

rsIDs matching the user-designated rsID_grid are extracted from the dbSNP reference file (ref/dbsnp/dbsnp_anno.gorz). This table is expanded by creating one row per subject (PN) per rsID.

Each rsID/SNP coordinate for each subject is then joined with the corresponding coordinate and PN (row) in the source/var/wgs_varcalls.gord file.

Allele calls (e.g., “A/T”) for each rsID for each PN are reported in the Alleles column if either of the following sets of conditions is true. Otherwise, the call/call is changed to “Unknown”.

Condition set 1

  • The variant call is listed as “Missing”

  • Read depth is > the user designated minReadDepth

  • There are < 2 reads for the variant OR (reads for the variant) / (reads for the segment ) < 0.10

Condition set 2

  • The variant call is NOT listed as “Missing”

  • Read depth is > the user designated minReadDepth

Interpreting the output

The resulting table provides the following information:

  • If the selected sample carries the variant

  • Read depth of the variant

  • Read depth of that locus

  • Raw reads containing the variant

The readsWithNonRef columns display the read depth of the allele derived from the VCF - “0” indicates that this variant is not carried by the selected subject.

The Depth columns display the total read depth of the locus in that particular sample, which is the DP value in the VCF file - “unknown” indicates that this variant does not exist in that particular sample.

The values displayed in the Depth and readsWithNonRef columns are different:

  • The readsWithNonRef column is calculated from the raw read count file named candidate_variants.gord and contains a pile-up of the number of reads containing the variant allele in the BAM.

  • The Depth column is the total read depth for the given position and is derived from the VCF file DP column (source/var/wes_varcalls.gord).

The SegDepth column displays the raw read depth of the locus calculated from the BAM in the per base coverage file pileup, which is extracted from the segment_cov.gord file. This value can be used to determine if there is sufficient read coverage of the locus to validate the variant call.

Column descriptions

Report output columns and descriptions

Group

Column

Description

Basic

Chrom

The chromosome of the variant represented as chr1, chr2, …, chr22, chrXY, chrX, chrY, chrM

PN

The patient number (identifier)

POS

The (first) base pair position of the sequence variant, i.e., the position of the first nucleotide in the Reference column

Other columns

Alleles

“unknown” indicates that this rsID-related variant is not present in the selected sample because the coverage is too low or there are insufficient reads containing the variant

Call

“missing” indicates that this rsID-related variant is not present in the selected sample

dbSNP_Alleles

the matching variant alleles from the dbSNP reference database

Depth

The total read depth reported in the VCF file

readsWithNonRef

The number of reads that contain the variant, calculated based on the candidate_variants.gord file (based on BAM file)

rsID

The rsID from the input file

SegDepth

The total read depth reads that cover the locus, which is calculated based on segment_cov.gord file (based on BAM file)

Strand

Chromosome strand, either “+” or “-” to indicate the direction in which the variant sequence is transcribed

Perspective views

Perspectives subtabs focus on subsets of the columns in the Default view.

Perspectives

Perspective

Description

Basic

Default view

Shows all columns