Sharing

The Sharing query calculates the relatedness between samples by calculating the relative fraction of shared rare SNP alleles.

../_images/sharing.png

Sharing module in Sequence Miner

Example use case

This analysis tool expects germline WES/WGS data as input.

Using the report builder

Select the set of individuals to compare and whether you want the analysis to be limited to the exome. You can also specify a minimum threshold for coverage in order for variants to be considered in the analysis.

The output returns the average fraction of SNPs found in common between each pair of samples. Two samples returns one output row, three samples returns three output rows, four samples returns six output rows, etc.

Description of the algorithm

Identification of rare variants

The Sharing report builder measures sharing of rare alleles to estimate the degree of relatedness between selected samples. The query first generates a list of rare variants with minor allele frequency between 0.00001 and 0.05 according to the ref/freq_max.gorz file. The allele frequencies in this file are derived from the maximum of several population allele frequency databases, for example, gnomAD, ExAC, 1000 Genomes, and others.

Selection of rare variants from the subjects

A list of variants in the input list of subjects is identified that a) pass a quality filter and b) are present in the rare variants table created above.

Pairwise comparison between subjects, for each variant

Each variant is checked for presence in all subjects, and each variant site is checked for good coverage (defined by the input argument min_read_depth) in each subject. For each pair of subjects, a total count is calculated for a) how many times a variant is present in both subjects and b) how many times a variant site has good coverage in both subjects. These counts yeild the calculation of the PN1_SNPsUsed column.

Intrepreting the output

Relationship column

The relationship column shows the relationship between subjects you selected, either unrelated, half sibling, full sibling, or identical twins. The sharing column shows the ratio of shared rare SNPs in total rare variants in one of the samples. The sharing value is directly proportional to the degree of relatedness between the pair of subjects:

  • sharing > 0.9: relationship = IdenticalTwins

  • sharing between 0.4 - 0.6: relationship = FullSibs/ParentChild

  • sharing between 0.2 - 0.3: relationship = HalfSibs/GrandParents

  • sharing < 0.2: relationship = unrelated

  • For sharing values outside of these ranges: relationship = undetermined

Status

Because each pairwise comparison is done twice, once using one subject as the reference and again using the other subject as the reference, minimum, maximum, and average sharing values are calculated. Status (“ok” or “bad”) is based on the following test:

  • (minimum - average) / average < 0.1 → status is “ok”

  • (minimum - average) / average > 0.1 → status is “bad”

Note

minimum or maximum can be used in this test because (minimum - average) = (maximum - average)

Column descriptions

Report output columns and descriptions

Column

Description

coeff_variation

(PN1_sharing - avg_sharing)/avg_sharing

PN1_SNPsUsed

The number of rare, high-quality SNPS identified in PN1

PN1vsPN2

The samples (subjectIDs) being compared

PN2_SNPsUsed

The number of rare, high-quality SNPS identified in PN2

relationship

  • sharing > 0.9: relationship = IdenticalTwins

  • sharing between 0.4 - 0.6: relationship = FullSibs/ParentChild

  • sharing between 0.2 - 0.3: relationship = HalfSibs/GrandParents

  • sharing < 0.2: relationship = unrelated

  • for sharing values outside of these ranges: relationship = undetermined

Sharing

The average number of SNPs present in both PN1 and PN2, divided by the total number of SNPs
  • 2*[count of SNPs found in both PN1 and PN2] / [count of SNPs found in PN1 + count of SNPs found in PN2]

status

Indicator of quality based on an arbitrary threshold; if coeff_variation < 0.1, this value is “ok”, otherwise this indicator is “bad”