De novo/CHZ for trios¶
Sequence Miner is a research tool for integrated analysis of patient genotype and phenotype data. Clinical and other phenotype data can be imported into Sequence Miner through the metadata query language NOR (Non-Ordered Relational). Data fields can be sorted and filtered, allowing the user to quickly classify patients into categories (e.g., affected and unaffected). The NOR-defined subsets can then be used as input lists for the genomic analysis applications in Sequence Miner, called report builders, which apply the GOR (Genomically Ordered Relational) query language.
Report Builders can be used to perform single or multi-family analysis. These tools allow users to identify rare variants that can be inherited in a Mendelian fashion. Single-family analysis such as trios or quads can be performed in the Mendelian analysis report builder, while multi-family analysis can be performed using Multi-family Mendelian analysis, De novo/CHZ for trios, and Venn variant analysis report builders.
Multi-family analysis in Sequence Miner filters variants based on low allele frequency, the functional impact of the variant, and mode of inheritance.
The De novo/CHZ for trios report builder identifies de novo and compound heterozygous variants in individuals from multiple families. Users can prioritize variants that are more prevalent among cases.
The workflow for De novo/CHZ for trios analysis in Sequence Miner is shown below:
Creating a pedigree file¶
The first step in a De novo/CHZ for trios analysis is to create a pedigree file. This file requires three columns with the following names, in the following order: PN, FN, MN. These columns list the index case, father, and mother of a family, respectively, so every row in the pedigree file represents one family.
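As an illustration of the layout described above, the following Python sketch checks that a pedigree table has exactly the PN, FN, MN header and one complete trio per row. The tab delimiter, the function name read_pedigree, and the sample identifiers are assumptions made for this example; they are not taken from Sequence Miner.

```python
import csv
import io

REQUIRED_HEADER = ["PN", "FN", "MN"]  # index case, father, mother


def read_pedigree(text, delimiter="\t"):
    """Parse pedigree text and return one (PN, FN, MN) tuple per family.

    The tab delimiter is an assumption for this sketch.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows or rows[0] != REQUIRED_HEADER:
        raise ValueError(f"header must be exactly {REQUIRED_HEADER}")
    families = []
    for line_no, row in enumerate(rows[1:], start=2):
        # Every family row needs an index case, a father, and a mother.
        if len(row) != 3 or any(not cell.strip() for cell in row):
            raise ValueError(f"line {line_no}: expected three non-empty values")
        families.append(tuple(row))
    return families


# Hypothetical identifiers, for illustration only.
example = "PN\tFN\tMN\nIDX_1\tFAT_1\tMOT_1\nIDX_2\tFAT_2\tMOT_2\n"
families = read_pedigree(example)
```

A check like this can catch a misordered header (for example FN before PN) before the file is used as input.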
Open Sequence Miner from the CSA dashboard. In the Sequence Miner File Explorer, open the SubjectReports/Participants.rep file. The Participants.rep file contains four columns: PN, Study_Name, KIND, and AFFECTED. Since only the first three columns are required to create a pedigree file, you can hide the AFFECTED column by right-clicking the column header and selecting Hide Column.
Next, select the Data query icon in the toolbar on the left-hand side of the Sequence Miner window. In the query window, type the command nor (for non-ordered relational) and click the Select Virtual Relations for View icon. A dialog opens that lists the files currently open in the browser. Select the Participants.rep grid.
In the query window, type the following query:
| pivot KIND -v father, mother, index -gc STUDY_NAME
This pivots the KIND column so that the PNs of the father, mother, and index become separate columns, grouped by the Study_Name column. Run the query by clicking the Execute query icon. The output contains one family per row.
Rename the columns for the family members. The pedigree file requires the column header names PN (index), FN (father), and MN (mother). To rename the columns, type the following query:
| rename father_PN FN | rename mother_PN MN | rename index_PN PN
Finally, type the command select to list only the PN, FN, and MN columns (in that order). Once the required columns are present in the table, the table can be saved in the user_data folder and used later to select any family of interest.
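The effect of the pivot, rename, and select steps above can be sketched in plain Python. This is not how Sequence Miner executes the query; it only mimics the reshaping on an in-memory list. The function name build_pedigree and the sample rows are hypothetical, and skipping incomplete families is an assumption of this sketch rather than documented pivot behavior.

```python
from collections import defaultdict


def build_pedigree(participants):
    """Turn one-row-per-person records into one-row-per-family PN/FN/MN records.

    participants: iterable of dicts with the keys PN, Study_Name, and KIND,
    where KIND is "index", "father", or "mother".
    """
    by_family = defaultdict(dict)
    for person in participants:
        # Group by Study_Name and spread KIND values into separate slots (the pivot).
        by_family[person["Study_Name"]][person["KIND"]] = person["PN"]

    pedigree = []
    for family in sorted(by_family):
        members = by_family[family]
        # Assumption of this sketch: families missing a member are skipped.
        if all(kind in members for kind in ("index", "father", "mother")):
            # Rename to the required headers and keep them in PN, FN, MN order.
            pedigree.append(
                {"PN": members["index"], "FN": members["father"], "MN": members["mother"]}
            )
    return pedigree


# Hypothetical participants, for illustration only.
participants = [
    {"PN": "P1", "Study_Name": "FAM_A", "KIND": "index"},
    {"PN": "P2", "Study_Name": "FAM_A", "KIND": "father"},
    {"PN": "P3", "Study_Name": "FAM_A", "KIND": "mother"},
    {"PN": "P4", "Study_Name": "FAM_B", "KIND": "index"},
    {"PN": "P5", "Study_Name": "FAM_B", "KIND": "mother"},
]
pedigree = build_pedigree(participants)
```

In this example FAM_B has no father record, so only FAM_A appears in the result.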
Running the De novo/CHZ for trios report builder¶
After generating the pedigree file, start the analysis by opening the De novo/CHZ for trios report builder and entering the input parameters.
In the Sequence Miner Report Builders tab, select the De novo/CHZ for trios report builder. You can select Inheritance in the Category filter to filter only Inheritance Reports, and then click on the De novo/CHZ for trios report builder in the menu grid.
Set up the parameters of the analysis by entering the input parameters in the De novo/CHZ for trios report builder dialog.
First, select the pedigree file. In order to select the file, it must be open in the browser.
An important setting to define is the VEP_consequence filter. VEP (Variant Effect Predictor) is a tool from Ensembl that predicts the impact of a given variant on the transcript or protein. To filter variants by VEP consequence, click inside the VEP_consequence field. A pop-up window appears that lists the VEP consequence categories with their corresponding maximum impact: HIGH, MODERATE, LOW, and LOWEST. Selecting HIGH and MODERATE is recommended because these include loss-of-function and missense variants, respectively. To select multiple consequences, highlight the consequences, right-click and select Set Selected, and then click Apply.
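The recommended HIGH and MODERATE selection amounts to a simple membership filter on the maximum impact of each variant. The Python sketch below shows the idea on a small in-memory table. The field names consequence and max_impact, the function name filter_by_impact, and the impact label assigned to each example consequence are assumptions for illustration and may not match the values shown in the VEP_consequence pop-up.

```python
RECOMMENDED_IMPACTS = {"HIGH", "MODERATE"}


def filter_by_impact(variants, kept_impacts=RECOMMENDED_IMPACTS):
    """Keep only variants whose maximum impact is in kept_impacts."""
    return [v for v in variants if v["max_impact"] in kept_impacts]


# Illustrative rows; the consequence-to-impact pairing is an assumption.
variants = [
    {"consequence": "stop_gained", "max_impact": "HIGH"},
    {"consequence": "missense_variant", "max_impact": "MODERATE"},
    {"consequence": "synonymous_variant", "max_impact": "LOW"},
    {"consequence": "intron_variant", "max_impact": "LOWEST"},
]
kept = [v["consequence"] for v in filter_by_impact(variants)]
```

Widening the selection to include LOW would only require passing a larger set as kept_impacts.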
Another important parameter is the maximum allele frequency. Maximum allele frequencies can be set per mode of inheritance. The recessive maximum genotypic frequency (RmaxGf) is used for compound heterozygous variants and depends on the individual allele frequencies.
Setting up the parameters of the analysis¶
All remaining parameters have default settings which can be accepted or modified. These include settings for gene coverage, labeling of pathogenic variants, quality, etc. The analysis can also be customized by uploading specific gene list, genomic region, or allele frequency files.
After defining the parameters, initiate the analysis by clicking Create Report.
Interpreting the results¶
Prioritizing variants¶
The De novo/CHZ for trios report builder identifies de novo and compound heterozygous variants present in individuals from multiple families. Start the interpretation by prioritizing variants that are more prevalent across cases. For example, to identify de novo variants that are more prevalent among the cases of the analysis, sort by the sum_denovo column.
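Sorting by sum_denovo can also be reproduced outside the grid, for instance on an exported result table. The following Python sketch orders rows from highest to lowest count. Only the sum_denovo column name comes from the text above; the gene_symbol column, the function name prioritize, and the values are hypothetical.

```python
def prioritize(rows, column="sum_denovo"):
    """Return rows sorted so the highest value of `column` comes first."""
    return sorted(rows, key=lambda row: row[column], reverse=True)


# Hypothetical result rows, for illustration only.
rows = [
    {"gene_symbol": "GENE_A", "sum_denovo": 1},
    {"gene_symbol": "GENE_B", "sum_denovo": 4},
    {"gene_symbol": "GENE_C", "sum_denovo": 2},
]
ranked = [row["gene_symbol"] for row in prioritize(rows)]
```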
Drill-in reports¶
Variants of interest can be further annotated with drill-in reports.
To view available drill-in report annotations, highlight the rows containing the variants of interest, and then right-click and select Drill in-Reports to open a drop-down list of annotations.
Select an annotation from the drop-down list. For example, select var Clinical disease gene annotations to obtain more information about the clinical impact of the selected variant and genes.
Once a drill-in report is selected, a new window opens with the defining query at the top and the table output at the bottom. The new output includes the annotation columns at the end of the table.
Confirming variants in the Genome Browser¶
Variants of interest can be confirmed in the aligned reads using the Genome Browser tool in Sequence Miner. The Genome Browser enables the visualization of raw data such as BAMs and VCFs.
To view the BAM files of a family, open the SubjectReports/Participants.rep file from the Sequence Miner File Explorer. In the file, search for a specific family by filtering on the PN column: type the PN of the index case, and then identify their father and mother. After the filter has been applied, highlight the PN and KIND columns of the selected family members and click Apply.
Next, click the Show tracks in Genome Browser icon in the toolbar and select a browser template file such as BAM (bam_tracks.gbt), VCF (var_tracks.gbt), coverage (cov_track.gbt), or a combination of all the above (study_tracks.gbt). In this example, select study_track.gbt.
Once the Genome Browser track has loaded, return to the results of the De novo/CHZ for trios output and highlight the row with the variant of interest. Click the Synchronize icon in the toolbar to center the browser window around the selected genomic coordinate.
In the Genome Browser, two sections are displayed. The top section includes the BAM, VCF, and coverage information; the bottom section includes several annotation tracks. The variant position is highlighted by a vertical green line that extends across the two sections. The left- and right-aligned reads are displayed in blue and orange respectively. The variant is highlighted in yellow in the BAM files.
Repeat the same workflow to prioritize and confirm compound heterozygous variants. In that case, both variants in the gene must be confirmed.
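Why two variants need confirming follows from the general definition of compound heterozygosity: the index case carries two different heterozygous variants in the same gene, one inherited from each parent. The Python sketch below expresses that textbook rule for a single gene. It is a simplified illustration, not the logic used by the report builder; the genotype encoding (count of alternate alleles: 0, 1, or 2), the field names, and the function name is_compound_het are assumptions.

```python
def is_compound_het(variants_in_gene):
    """Return True if the index case has at least one heterozygous variant
    carried only by the father and at least one carried only by the mother.

    Each variant is a dict with alternate-allele counts (0, 1, or 2)
    under the keys "index", "father", and "mother".
    """
    paternal = False
    maternal = False
    for variant in variants_in_gene:
        if variant["index"] != 1:
            continue  # only heterozygous variants in the index case count
        if variant["father"] >= 1 and variant["mother"] == 0:
            paternal = True
        elif variant["mother"] >= 1 and variant["father"] == 0:
            maternal = True
    return paternal and maternal


# Hypothetical genotypes, for illustration only.
one_from_each_parent = [
    {"pos": 100, "index": 1, "father": 1, "mother": 0},
    {"pos": 250, "index": 1, "father": 0, "mother": 1},
]
both_from_father = [
    {"pos": 100, "index": 1, "father": 1, "mother": 0},
    {"pos": 250, "index": 1, "father": 1, "mother": 0},
]
result_a = is_compound_het(one_from_each_parent)
result_b = is_compound_het(both_from_father)
```

The first gene satisfies the rule and the second does not, because both of its variants come from the father.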