Multi-family Mendelian analysis
Sequence Miner is a research tool for integrated analysis of genotype and phenotype data from patients. Clinical and other phenotype data can be imported into Sequence Miner through the metadata query language NOR (Non-Ordered Relational). Data fields can be sorted and filtered, allowing the user to quickly classify patients into categories (e.g., affected and unaffected). The NOR-defined subsets can then be used within Sequence Miner as input lists for genomic analysis applications, called report builders, that use the GOR (Genomically Ordered Relational) query language.
Inheritance report builders can be used to perform single-family or multi-family analysis. These tools help users identify rare disease variants that can be inherited in a Mendelian fashion. Single-family analysis with trios or quads can be performed with the Mendelian analysis report builder, while multi-family analysis can be performed with the Multi-family Mendelian analysis, De novo/CHZ for trios, and Venn variant analysis report builders.
Inheritance report builders identify rare variants in a subset of affected and unaffected individuals. The analysis provides the number of homozygous, heterozygous, and compound heterozygous carriers. Variants are further classified into diagnostic attributes depending on their zygosity distribution across affected and unaffected subjects. For example, variants are classified as “Dominant” if they are present in affected subjects but absent in unaffected subjects.
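The zygosity-based classification described above can be sketched in Python. This is a simplified illustration, not Sequence Miner's implementation: it assumes genotypes are coded as alternate-allele counts (0, 1, 2), requires every affected subject to match (full penetrance), ignores compound heterozygotes, and uses a hypothetical function name, classify_variant.

```python
from typing import Dict, List

# Hypothetical genotype encoding: number of alternate alleles per subject
# (0 = no variant, 1 = heterozygous, 2 = homozygous).
Genotypes = Dict[str, int]


def classify_variant(affected: Genotypes, unaffected: Genotypes) -> List[str]:
    """Return the simplified diagnostic attributes that apply to one variant."""
    attributes = []
    # Dominant: every affected subject carries the variant and no unaffected subject does.
    if all(g >= 1 for g in affected.values()) and all(g == 0 for g in unaffected.values()):
        attributes.append("Dominant")
    # Recessive: every affected subject is homozygous and no unaffected subject is.
    if all(g == 2 for g in affected.values()) and all(g < 2 for g in unaffected.values()):
        attributes.append("Recessive")
    return attributes


affected = {"PN1": 1, "PN2": 1, "PN3": 2}
unaffected = {"PN4": 0, "PN5": 0}
print(classify_variant(affected, unaffected))  # ['Dominant']
```

A variant can satisfy both rules at once (for example, homozygous in all cases and absent in all controls), which is why the sketch returns a list of attributes rather than a single label.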
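The workflow for Multi-family Mendelian analysis in Sequence Miner has three stages: defining sets of affected and unaffected subjects, running the Multi-family Mendelian analysis report builder, and interpreting the results.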
Defining a set of affected and unaffected subjects
The first step in analyzing individuals from multiple families with the Multi-family Mendelian analysis report builder is to create input files based on the sex and affected status of the subjects. This information is stored in files located in the Sequence Miner File Explorer.
Open Sequence Miner from the CSA dashboard. In the Sequence Miner File Explorer, open the SubjectReports/All.rep file. The All.rep file contains a list of subjects (PNs) with their corresponding sex.
In the All.rep file, right-click the Gender column and select Filter on column. A new window shows the number of female and male subjects. Select female or male and click Apply. A new list of PNs matching the selected filter is generated. Highlight all of the PNs, right-click, and select Open in a new grid.
Next, add the affected status to the grid of female or male subjects. To do this, right-click the SubjectReports/Participants.rep file and select Open in Nor Viewer. This file contains the affected status of all the subjects in the project. Once the file is open in Nor Viewer mode, a query pane appears at the top of the window. By default, the query contains a top 100 command, which limits the output to the first 100 rows. Delete the top 100 command so that you can type the desired query.
Type the command inset -c PN and click the Select Virtual Relations for View icon. A new window lists the files open in the Sequence Miner browser. Select the grid with the list of females or males created earlier, and then click the Execute query icon to run the query. The result is a table containing the affected status of the selected female or male subjects.
Once the new table has been generated, right-click the Affected column and select Filter on column. A new window shows the values “yes” and “no”, corresponding to affected and unaffected, respectively. Select the desired value and click Apply. Once the filter is applied, save the results in the user_data folder. Repeat these steps for each combination of sex and affected status to create the four input lists (affected males, affected females, unaffected males, and unaffected females) used by the report builder.
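The filtering steps above can be sketched in Python to make the logic explicit. The rows are hypothetical stand-ins for All.rep and Participants.rep, the “female”/“male” values of the Gender column are assumed, and the inset helper only mimics what the inset -c PN query does in this workflow (keep rows whose PN appears in the selected grid); it is not the NOR implementation.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical stand-ins for SubjectReports/All.rep and SubjectReports/Participants.rep.
all_rep: List[Row] = [
    {"PN": "PN1", "Gender": "female"},
    {"PN": "PN2", "Gender": "male"},
    {"PN": "PN3", "Gender": "female"},
]
participants_rep: List[Row] = [
    {"PN": "PN1", "Affected": "yes"},
    {"PN": "PN2", "Affected": "no"},
    {"PN": "PN3", "Affected": "no"},
]


def filter_on_column(rows: List[Row], column: str, value: str) -> List[Row]:
    """Mimic 'Filter on column': keep rows whose column equals the chosen value."""
    return [r for r in rows if r[column] == value]


def inset(rows: List[Row], column: str, pns: List[str]) -> List[Row]:
    """Mimic 'inset -c PN' here: keep rows whose column value is in the given set."""
    keep = set(pns)
    return [r for r in rows if r[column] in keep]


def build_list(gender: str, affected: str) -> List[str]:
    """Select subjects by sex, then by affected status, and return their PNs."""
    subjects = [r["PN"] for r in filter_on_column(all_rep, "Gender", gender)]
    rows = filter_on_column(inset(participants_rep, "PN", subjects), "Affected", affected)
    return [r["PN"] for r in rows]


lists = {
    "fCASEs": build_list("female", "yes"),
    "mCASEs": build_list("male", "yes"),
    "fCTRLs": build_list("female", "no"),
    "mCTRLs": build_list("male", "no"),
}
print(lists)  # {'fCASEs': ['PN1'], 'mCASEs': [], 'fCTRLs': ['PN3'], 'mCTRLs': ['PN2']}
```

Each of the four resulting lists corresponds to one of the subject-list inputs that the report builder asks for.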
Running the Multi-family Mendelian analysis report builder
After defining the set of input files, start the analysis by opening the Multi-family Mendelian analysis report builder and entering the input parameters.
In the Sequence Miner Report Builders tab, select the Multi-family Mendelian analysis report builder. To narrow the list, select Inheritance in the Category filter to show only inheritance reports, and then click Multi-family Mendelian analysis in the menu grid.
Set up the analysis by entering the input parameters in the Multi-family Mendelian analysis report builder dialog.
First, select the lists of affected and unaffected subjects. The files must be open in the browser to be selectable.
Click the corresponding fields to enter mCASEs (list of affected males), fCASEs (list of affected females), mCTRLs (list of unaffected males), and fCTRLs (list of unaffected females). When a field is selected, a new window lists the files open in the browser. Select the corresponding list and click Commit.
The next parameter to define is Penetrance. Because the input files contain individuals from multiple families, you can relax how many cases or controls must match the expected pattern for a variant to be classified as “Dominant” or “Recessive”. For example, suppose there are six cases and six controls. To report a dominant variant only if it is present in all six cases, keep the default CaseDelta value of “0”. To report a dominant variant that is present in as few as one case, enter “5” in the CaseDelta field.
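The CaseDelta example amounts to simple arithmetic. Assuming CaseDelta is the number of cases allowed to lack the variant, a dominant variant must be carried by at least n_cases - CaseDelta cases. This reading reproduces both values in the example (0 requires all six cases, 5 requires one), but it is an assumption rather than documented behavior, and the function names are hypothetical.

```python
def min_cases_required(n_cases: int, case_delta: int) -> int:
    """Smallest number of carrier cases, assuming CaseDelta = cases allowed to lack the variant."""
    # Reject values that would require zero (or a negative number of) carrier cases.
    if not 0 <= case_delta < n_cases:
        raise ValueError("CaseDelta must be between 0 and n_cases - 1")
    return n_cases - case_delta


def passes_dominant_penetrance(carrier_cases: int, n_cases: int, case_delta: int) -> bool:
    """True if enough cases carry the variant under the given CaseDelta."""
    return carrier_cases >= min_cases_required(n_cases, case_delta)


print(min_cases_required(6, 0))  # 6
print(min_cases_required(6, 5))  # 1
print(passes_dominant_penetrance(3, 6, 0))  # False
print(passes_dominant_penetrance(3, 6, 5))  # True
```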
Another important setting is the VEP_consequence filter. The Variant Effect Predictor (VEP) is an Ensembl tool that predicts the impact of a given variant on transcripts and proteins. To filter variants by VEP consequence, click inside the VEP_consequence field. A pop-up window lists the VEP consequence categories with their corresponding maximum impact: HIGH, MODERATE, LOW, and LOWEST. Selecting HIGH and MODERATE is recommended because these include loss-of-function and missense variants, respectively. To select multiple consequences, highlight them, right-click and select Set Selected, and then click Apply.
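Selecting impact tiers works like a lookup from consequence term to tier followed by a membership test. The sketch below uses a small excerpt of VEP consequence terms with the tier names shown in the Sequence Miner pop-up (Ensembl VEP itself calls its lowest tier MODIFIER); the variant records and the filter_by_impact helper are hypothetical.

```python
from typing import Dict, List, Set

# Small excerpt of VEP consequence terms mapped to their impact tier,
# using the tier names from the Sequence Miner pop-up.
IMPACT: Dict[str, str] = {
    "stop_gained": "HIGH",
    "frameshift_variant": "HIGH",
    "missense_variant": "MODERATE",
    "synonymous_variant": "LOW",
    "intron_variant": "LOWEST",
}


def filter_by_impact(variants: List[Dict[str, str]], keep: Set[str]) -> List[Dict[str, str]]:
    """Keep variants whose consequence maps to one of the selected tiers.

    Consequences missing from IMPACT map to None and are therefore dropped.
    """
    return [v for v in variants if IMPACT.get(v["consequence"]) in keep]


variants = [
    {"id": "var1", "consequence": "missense_variant"},
    {"id": "var2", "consequence": "intron_variant"},
    {"id": "var3", "consequence": "stop_gained"},
]
selected = filter_by_impact(variants, {"HIGH", "MODERATE"})
print([v["id"] for v in selected])  # ['var1', 'var3']
```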
Another important parameter is the maximum allele frequency, which can be set according to the mode of inheritance. The recessive maximum genotypic frequency (RmaxGf) is used for compound heterozygous variants and depends on the frequencies of the individual alleles.
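The text does not state how RmaxGf is computed or applied. One plausible reading, assumed here for illustration only, is that the expected genotypic frequency of a candidate genotype is compared against the cap. Under Hardy-Weinberg equilibrium, a homozygous genotype for an allele with frequency p has frequency p², and a compound heterozygote carrying two different rare alleles with frequencies p1 and p2 in trans has a frequency of approximately 2·p1·p2 (ignoring linkage). The function names and the cap of 0.001 are hypothetical.

```python
def homozygous_freq(p: float) -> float:
    """Hardy-Weinberg frequency of the homozygous genotype for an allele with frequency p."""
    return p * p


def compound_het_freq(p1: float, p2: float) -> float:
    """Approximate frequency of carrying two different rare alleles in trans (ignores linkage)."""
    return 2 * p1 * p2


def passes_rmaxgf(genotype_freq: float, rmaxgf: float) -> bool:
    """Keep the candidate only if its expected genotypic frequency does not exceed the cap."""
    return genotype_freq <= rmaxgf


RMAXGF = 0.001  # hypothetical cap
print(passes_rmaxgf(compound_het_freq(0.01, 0.005), RMAXGF))  # True  (about 0.0001)
print(passes_rmaxgf(homozygous_freq(0.05), RMAXGF))  # False (about 0.0025)
```

Because the compound heterozygous frequency is a product of two allele frequencies, two individually common alleles can still pass a genotypic cap that neither would pass as a homozygote, which is one reason a genotype-level threshold is useful for this mode of inheritance.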
All remaining parameters have default settings that can be accepted or modified. These include settings for gene coverage, labeling of pathogenic variants, and quality, among others. The analysis can also be customized by uploading specific gene lists, genomic regions, or allele frequency files.
After defining the parameters, initiate the analysis by clicking Create Report.
Interpreting the results
Prioritizing variants
The Multi-family Mendelian analysis report builder calculates diagnostic attributes for each variant based on its distribution across cases and controls. Depending on the penetrance settings, it can identify Dominant variants, which are present in some or all cases but absent in the controls, or Homozygous Recessive variants, which are homozygous in some or all cases and only heterozygous in the controls. Only variants with sufficient coverage are included in the analysis.
For example, to filter variants by the Dominant attribute, right-click the Diag_Dom column and select Filter on Column. A new window shows the number of variants with a value of “0” or “1”. A value of “1” means that the variant is classified as Dominant. Select “1” and click Apply.
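The Diag_Dom filter is a tally followed by a row filter. The sketch below reproduces both steps on a hypothetical excerpt of the report output; only the Diag_Dom column name comes from the report, and the variant identifiers are placeholders.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical excerpt of the report output: one row per variant with its Diag_Dom flag.
rows: List[Dict[str, object]] = [
    {"variant": "var1", "Diag_Dom": 1},
    {"variant": "var2", "Diag_Dom": 0},
    {"variant": "var3", "Diag_Dom": 1},
]

# Tally variants per Diag_Dom value, like the counts shown in the filter window.
counts = Counter(row["Diag_Dom"] for row in rows)

# Keep only variants classified as Dominant (Diag_Dom == 1), like selecting "1" and clicking Apply.
dominant = [row for row in rows if row["Diag_Dom"] == 1]

print(dict(counts))
print([row["variant"] for row in dominant])  # ['var1', 'var3']
```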
Adding a calculated column
The Multi-family Mendelian analysis report builder displays variant information for each of the affected and unaffected groups. For example, for the fCASEs group it reports the number of subjects with the variant, the number of homozygous subjects, and the number of subjects with coverage at the variant position. To combine the counts for female and male subjects in the affected or unaffected group, add a calculated column to the output.
To add a column, click the Add calculated column icon in the toolbar. In the pop-up window, enter a name for the new column and define the values it should contain. For example, to add the variant counts from the affected females and males, enter the names of the corresponding output columns (fCASEs_subjWithVar + mCASEs_subjWithVar). Column names are case-sensitive.
The new column is added to the end of the table but can be moved to any position in the output. Click the column header to sort in ascending or descending order and quickly identify the variants that are most prevalent among cases.
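The calculated column and the sort can be sketched in Python on a hypothetical excerpt of the output. The fCASEs_subjWithVar column name comes from the report; mCASEs_subjWithVar is assumed to follow the same naming pattern, and the new column name CASEs_subjWithVar is hypothetical.

```python
from typing import Dict, List

# Hypothetical excerpt of the output with per-sex counts of cases carrying each variant.
rows: List[Dict[str, object]] = [
    {"variant": "var1", "fCASEs_subjWithVar": 2, "mCASEs_subjWithVar": 1},
    {"variant": "var2", "fCASEs_subjWithVar": 0, "mCASEs_subjWithVar": 1},
    {"variant": "var3", "fCASEs_subjWithVar": 3, "mCASEs_subjWithVar": 2},
]

# Calculated column: total number of cases (female + male) carrying the variant.
for row in rows:
    row["CASEs_subjWithVar"] = row["fCASEs_subjWithVar"] + row["mCASEs_subjWithVar"]

# Sort in descending order so the variants most prevalent among cases come first.
ranked = sorted(rows, key=lambda row: row["CASEs_subjWithVar"], reverse=True)
print([(row["variant"], row["CASEs_subjWithVar"]) for row in ranked])
# [('var3', 5), ('var1', 3), ('var2', 1)]
```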
Drill-in reports
Variants of interest can be further annotated with drill-in reports.
To view available drill-in report annotations, highlight the rows containing the variants of interest, and then right-click and select Drill in Reports to open a drop-down list of annotations.
Select an annotation from the drop-down list. For example, select AllWithVar annotations to view the list of PNs with a given variant.
Once a drill-in report has been selected, a new window opens with the defining query at the top and the table output at the bottom. The new output includes the annotation columns at the end of the table.
Confirming variants in the Genome Browser
Variants of interest can be confirmed in the aligned reads using the Genome Browser tool in Sequence Miner. The Genome Browser enables visualization of sequence data such as BAM and VCF files.
The list of PNs can be selected from the table generated by the AllWithVar drill-in report: highlight the subjects and click the Show tracks in Genome Browser icon in the toolbar. You will be prompted to select a browser template file such as BAM (bam_tracks.gbt), VCF (var_tracks.gbt), coverage (cov_track.gbt), or a combination of all of the above (study_tracks.gbt). In this example, select bam_tracks.gbt.
After the Genome Browser track has loaded, return to the Multi-family Mendelian analysis output and highlight the row with the variant of interest. Click the Synchronize icon in the toolbar to center the browser window on the selected genomic coordinate.
In the Genome Browser, two sections are displayed. The top section includes the BAM, VCF, and coverage information, and the bottom section includes several annotation tracks. The variant position is marked by a vertical green line that extends across both sections. The left- and right-aligned reads are displayed in blue and orange, respectively. The variant is highlighted in yellow in the BAM tracks.
The annotation tracks display information on the variant position and the surrounding region, including other variants in the gene. These include annotations from clinical databases, VEP consequences, and dbSNP, among others.