Mendelian analysis¶

Sequence Miner is a research tool for integrated analysis of genotype and phenotype data on patients. Clinical and other phenotype data can be imported into Sequence Miner through the metadata query language NOR (Non-Ordered Relational). Data fields can be sorted and filtered, allowing the user to quickly classify patients into categories (e.g., affected and unaffected). The NOR-defined subsets can then be used within Sequence Miner as input list(s) for genomic analysis applications called report builders that apply the GOR (Genomically Ordered Relational) query language.

Inheritance report builders can be used to perform single or multi-family analysis. These tools allow users to identify rare disease variants that can be inherited in a Mendelian fashion. Single-family analysis with trios or quads can be performed in the Mendelian analysis report builder, while multi-family analysis can be performed by using the Multi-family Mendelian analysis, De novo/CHZ for trios, and Venn variant analysis report builders.

The Mendelian analysis report builder analyzes variants in a single index case in the context of Mendelian inheritance. The analysis filters variants based on low allele frequency, functional impact of the variant, and, if parents’ data are available, mode of inheritance. Each variant can be annotated with many different attributes such as predicted pathogenicity, associated known diseases, conservation scores, population allele frequencies, etc.

The workflow for Mendelian analysis in Sequence Miner is shown below:

Figure: Mendelian analysis workflow (guides/images/mendelianAnalysis_01.png)

Running the Mendelian analysis report builder¶

Mendelian analysis can be performed in Sequence Miner by opening the Mendelian analysis report builder and entering the input parameters.

In the Sequence Miner Report Builders tab, select the Mendelian analysis report builder. You can select Inheritance in the Category filter to show only the inheritance report builders, and then click the Mendelian analysis report builder in the menu grid.
Set up the analysis by entering the input parameters in the Mendelian analysis report builder dialog. The Mendelian analysis report builder requires only an index case. However, a nuclear family is preferred because the parents' data are needed to determine the mode of inheritance.
1. Clicking the INDEXCASE field opens a new window with a list of all subjects in the project. Select one index case and click Apply. Repeat these steps for the FATHER and MOTHER fields.
2. The sex of the index case and the affected status of the parents must also be specified in the corresponding fields.
3. Additional family members can be included in the (m/f) CASEs and (m/f) CTRLs fields depending on their sex and affected status. For example, an affected female sibling of the index case can be included in fCASEs.
4. The next important parameter to define is the VEP (Variant Effect Predictor) filter. VEP is an algorithm from Ensembl that predicts the impact of a given variant on the transcript or protein. To filter variants by VEP consequence, click inside the VEP_consequence field. A pop-up window lists the VEP consequence categories with their corresponding maximum impact: HIGH, MODERATE, LOW, and LOWEST. Selecting HIGH and MODERATE is recommended because these include loss-of-function and missense mutations, respectively. To select multiple consequences, highlight the consequences, right-click and select Set Selected, and then click Apply.
5. Another important filter is maximum allele frequency. Maximum allele frequencies can be set based on the mode of inheritance of the disorder: Dominant (DmaxAf) or Recessive (RmaxAf). The Recessive maximum genotypic frequency (RmaxGf) is used for compound heterozygous variants and depends on the individual allele frequencies.
6. All remaining parameters have default settings, which can be accepted or modified. These include settings for penetrance, gene coverage, labeling of pathogenic variants, quality, etc. The analysis can also be customized by uploading a specific gene list, genomic region, or allele frequency file.
7. Once the input parameters have been entered, initiate the analysis by clicking Create Report.
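The filtering described in steps 4 and 5 can be pictured with a short Python sketch. This is only an illustration of the idea, not how Sequence Miner or its GOR queries implement it. The `Variant` class, the function names, and the threshold values are all hypothetical placeholders, not product defaults. The compound heterozygous frequency is computed here as 2 x af1 x af2, which is a simple assumption (independent haplotypes under random mating), since this guide does not describe how RmaxGf is actually derived.

```python
from dataclasses import dataclass

# Impact levels named in the guide; HIGH and MODERATE are the recommended selection.
RECOMMENDED_IMPACTS = frozenset({"HIGH", "MODERATE"})


@dataclass
class Variant:
    """Hypothetical record for one variant (not a Sequence Miner type)."""
    chrom: str
    pos: int
    max_impact: str      # one of HIGH, MODERATE, LOW, LOWEST
    allele_freq: float   # population allele frequency between 0.0 and 1.0


def passes_filters(variant, mode, impacts=RECOMMENDED_IMPACTS,
                   dmax_af=0.001, rmax_af=0.01):
    """Return True if the variant survives the impact and frequency filters.

    mode is "dominant" or "recessive". The default thresholds are
    illustrative placeholders only, not Sequence Miner defaults.
    """
    if mode not in ("dominant", "recessive"):
        raise ValueError(f"unknown mode of inheritance: {mode!r}")
    if variant.max_impact not in impacts:
        return False
    limit = dmax_af if mode == "dominant" else rmax_af
    return variant.allele_freq <= limit


def compound_het_genotype_freq(af1, af2):
    """Approximate frequency of carrying both alleles on opposite haplotypes.

    Assumption for illustration: the two haplotypes are independent,
    giving 2 * af1 * af2.
    """
    return 2.0 * af1 * af2


def passes_compound_het(af1, af2, rmax_gf=0.0001):
    """Compare the approximate genotype frequency with a placeholder cutoff."""
    return compound_het_genotype_freq(af1, af2) <= rmax_gf
```

In this sketch, a MODERATE-impact variant with an allele frequency of 0.005 passes under the recessive threshold but is rejected under the stricter dominant one, which reflects why separate cutoffs per mode of inheritance are useful.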

Interpreting the results¶

Navigating and sorting columns¶

When the analysis is complete, a new table window labeled “Mendel_1” opens in Sequence Miner. The result is a list of variants identified in the index case that meet the criteria set in the input parameters. The variants are ordered by chromosome and position. At the bottom of the table, the number of rows is displayed. In this case, every row represents a variant.

To navigate through the table columns, use the Columns panel on the right-hand side. Columns are organized in groups that can be expanded or collapsed by clicking on the arrows.
To sort columns, click on the column header. A triangle appears next to the selected column to indicate ascending order (triangle pointing up) or descending order (triangle pointing down). Click the triangle to toggle the sort order.

Filtering variants¶

Table columns can be filtered by right-clicking the column name (either in the column header or in the Columns panel). A new window opens that lists the values or categories for the selected column, together with the count for each. If the column contains numerical values, a distribution of the values is also plotted, and cutoffs can be defined by entering a custom value or a range of values. Click Count to sort the list by count in ascending or descending order, which shows how many variants fall into each value or category. After selecting a value or category, click Apply.
Variants can be prioritized according to their predicted level of pathogenicity by filtering on the ACMGcat column and selecting Cat1 to list only variants that have been reported as pathogenic in clinical databases such as ClinVar, OMIM, and HGMD. For more information about ACMG categorization, see Variant analysis in the CSA user manual.
Once the ACMG category filter is applied, information about the variant-associated diseases can be found in the Known_var_diseases column. The clinical database source of that information is listed in the Known_source column.
Depending on the clinical presentation of the disease in the nuclear family, variants can also be filtered by the expected mode of inheritance. For example, if whole genome sequencing is performed on a child with a suspected inherited disorder and the parents are unaffected, this can suggest a homozygous recessive mode of inheritance or a de novo mutation. In this case, variants can be filtered using the corresponding mode-of-inheritance columns in the Mendelian analysis output. Filtering on the HomRecess column, for instance, lists the variants that are homozygous recessive in the index case.
Once filters are applied to the output, the filtered columns appear in green at the top of the Columns panel. In addition, the query defining the filter is shown at the bottom of the table output and can be copied into a text file for future reference.
Filters can be edited, disabled, or removed at any time by right-clicking the filtered columns at the top of the Columns panel and selecting Edit Filter, Enable/Disable Filter, or Remove filter.
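To make the mode-of-inheritance reasoning above more concrete, the following Python sketch classifies a single autosomal variant from the genotypes of a trio with unaffected parents. It is a simplified illustration under stated assumptions and not the logic Sequence Miner uses to populate its output columns. Genotypes are encoded as alternate-allele counts (0, 1, or 2), the function name is hypothetical, and of the labels only HomRecess is taken from the column name mentioned in this guide; DeNovo and Other are made up for the example.

```python
def classify_trio(child, father, mother):
    """Classify one autosomal variant in a trio with unaffected parents.

    Each genotype is an alternate-allele count:
    0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
    """
    for genotype in (child, father, mother):
        if genotype not in (0, 1, 2):
            raise ValueError(f"invalid genotype: {genotype!r}")

    # Child carries two copies, one inherited from each carrier parent.
    if child == 2 and father == 1 and mother == 1:
        return "HomRecess"

    # Child carries one copy that neither parent has.
    if child == 1 and father == 0 and mother == 0:
        return "DeNovo"

    # Everything else is left unclassified in this sketch.
    return "Other"


# Hypothetical (child, father, mother) genotypes for three variants.
trio_genotypes = [(2, 1, 1), (1, 0, 0), (1, 1, 0)]
labels = [classify_trio(*genotypes) for genotypes in trio_genotypes]
```

A real analysis would also need to handle sex chromosomes, affected parents, missing genotypes, and genotype quality, all of which this sketch deliberately ignores.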

Drill-in reports¶

Variants of interest can be further annotated with drill-in reports.

To view available drill-in report annotations, highlight the rows containing the variants of interest, and then right-click and select Drill in Reports to open a drop-down list of annotations.
Select an annotation from the drop-down list. For example, to rank variants based on functional predictions, select var dbNSFP full annotations. This can be particularly useful for ranking variants of unknown significance (VUS), also categorized as Cat 3 variants.
Once a drill-in report is selected, a new window opens with the defining query at the top and the table output at the bottom. The new output includes the new annotation columns at the end of the table. The values of each column can be sorted by right-clicking the column header or using the filters in the Columns panel.

Confirming variants in the Genome Browser¶

Variants of interest can be confirmed in the aligned reads using the Genome Browser tool in Sequence Miner. The Genome Browser enables the visualization of raw data such as BAMs and VCFs.

To view the BAM files of a family, open the SubjectReports/Participants.rep file in the Sequence Miner File Explorer. This file includes the information on the studies created in CSA (see Getting started in the CSA user manual). In the file, highlight the PN and KIND columns of the family members in the study. Next, click the Show tracks in Genome Browser icon in the toolbar.
A prompt then appears to select a browser template file such as BAM (bam_tracks.gbt), VCF (var_tracks.gbt), coverage (cov_track.gbt), or a combination of all of the above (study_tracks.gbt). In this example, select study_tracks.gbt. After the Genome Browser track has loaded, return to the results of the Mendelian analysis output and highlight the row with the variant of interest. Click the Synchronize icon in the toolbar to center the browser window around the selected genomic coordinate.
In the Genome Browser, two sections are displayed. The top section includes the BAM, VCF, and coverage information; the bottom section includes several annotation tracks. The variant position is highlighted by a vertical green line that extends across the two sections. The left- and right-aligned reads are displayed in blue and orange, respectively. In the BAM files, the variants are highlighted in yellow (single nucleotide), green (deletion), or red (insertion). In the variation file, variants are displayed in brown.
The annotation tracks display information on the variant position and the surrounding areas, providing additional information about other variants in the gene. These tracks include annotations from clinical databases, VEP consequences, dbSNP, etc.
The UCSC browser is also available as an annotation track and can be selected to obtain live information from this browser, including vertebrate conservation and gene expression.