Sample import through the tertiary pipeline

Tertiary pipeline summary

Input files

The WuXi NextCODE system is designed to work with a small set of files derived from fully processed BAM and VCF files. Input BAMs should be aligned, sorted, and de-duplicated, and each should be accompanied by its corresponding BAI index file. Variant call files should be single-sample VCFs (multi-sample VCFs should be split into individual VCFs) and should comply with version 4.1 or later of the VCF specification.
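The input requirements above can be checked before import with a small pre-flight script. The following is a minimal sketch in Python using only the standard library; it is not part of the WuXi NextCODE system, and the function names check_vcf_header and has_bam_index are hypothetical. It assumes the VCF header lines have already been read as text, and that a BAI index sits next to its BAM as either sample.bam.bai or sample.bai.

```python
import os
import re


def check_vcf_header(header_lines):
    """Return a list of problems found in VCF header lines (empty if none)."""
    problems = []
    version = None
    sample_count = None
    for line in header_lines:
        line = line.rstrip("\n")
        if line.startswith("##fileformat="):
            match = re.fullmatch(r"##fileformat=VCFv(\d+)\.(\d+)", line)
            if match:
                version = (int(match.group(1)), int(match.group(2)))
        elif line.startswith("#CHROM"):
            columns = line.split("\t")
            # Eight fixed columns, then FORMAT, then one column per sample.
            sample_count = max(len(columns) - 9, 0)
    if version is None:
        problems.append("missing or unreadable ##fileformat line")
    elif version < (4, 1):
        problems.append("VCF version %d.%d is older than 4.1" % version)
    if sample_count is None:
        problems.append("missing #CHROM header line")
    elif sample_count != 1:
        problems.append("expected 1 sample column, found %d" % sample_count)
    return problems


def has_bam_index(bam_path):
    """True if a BAI file exists next to the BAM (assumed naming conventions)."""
    root, _ = os.path.splitext(bam_path)
    return os.path.exists(bam_path + ".bai") or os.path.exists(root + ".bai")
```

A VCF that reports problems here (for example, more than one sample column) would need to be split or regenerated before import. The check does not verify that a BAM is sorted or de-duplicated; that requires inspecting the BAM itself.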

In general, output from standard best-practice pipelines is suitable without further modification. Import pipelines for RNA-Seq, CNV calling, SV calling, and other outputs and file formats are also supported but are not described here.

guides/images/tertiaryPipelineImport.png
Steps to convert BAMs and VCFs into analysis-ready files in the WuXi NextCODE system

Process: BAM to Candidate Variants file
Description: Aligned reads are converted to an extremely permissive variant call file.
Output files and usage: The Candvars file lists all possible variants, including low-quality candidates, along with the number of reads that support each variant allele. It is used primarily to assist with calling de novo variants.

Process: BAM to Coverage files
Description: The BAM pileup is segmented into contiguous regions of similar coverage.
Output files and usage: Seg.cov files are used to simplify operations involving coverage thresholds. Additional files are prepared with calculated coverage over all genetic and exonic regions. Goodcov files are also prepared to distinguish between low coverage and reference calls across all samples in the project.

Process: VCF to Genotype file
Description: Structural variants and excessively long indels are removed to avoid low-quality SV calls.
Output files and usage: Genotype files contain the required call and quality information and are ready for use in the GOR database architecture.

Process: VCF to VEP files
Description: Ensembl Variant Effect Predictor (VEP) is run with both Ensembl and RefGene reference data.
Output files and usage: VEP files annotate variants with consequence and impact class, which are used to filter for functional variants in common queries.
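Two of the steps above, segmenting coverage and removing structural variants and long indels, can be illustrated with a short Python sketch. This is not the WuXi NextCODE implementation; the actual segmentation method and length cutoff are not described here. The function names, the tolerance of 5 reads, and the max_indel_length of 50 bases are all assumptions chosen for illustration.

```python
def segment_coverage(depths, start=1, tolerance=5):
    """Group consecutive positions into (start, end, mean_depth) segments.

    `depths` is a list of per-base read depths. A new segment begins when a
    depth differs from the running mean of the current segment by more than
    `tolerance` (a hypothetical threshold).
    """
    segments = []
    seg_start = None
    total = 0
    count = 0
    for offset, depth in enumerate(depths):
        pos = start + offset
        if count and abs(depth - total / count) > tolerance:
            # Close the current segment at the previous position.
            segments.append((seg_start, pos - 1, total / count))
            total, count = 0, 0
        if count == 0:
            seg_start = pos
        total += depth
        count += 1
    if count:
        segments.append((seg_start, start + len(depths) - 1, total / count))
    return segments


def is_sv_or_long_indel(ref, alts, info="", max_indel_length=50):
    """True if a VCF record looks like a structural variant or a long indel.

    `max_indel_length` is an assumed cutoff, not a documented value.
    """
    if any(field.startswith("SVTYPE=") for field in info.split(";")):
        return True
    for alt in alts:
        # Symbolic alleles such as <DEL> and breakend notation mark SVs.
        if alt.startswith("<") or "[" in alt or "]" in alt:
            return True
        if abs(len(alt) - len(ref)) > max_indel_length:
            return True
    return False
```

For example, segment_coverage([30, 31, 29, 30, 2, 1, 3, 30, 30], start=100) yields three segments, the middle one covering the low-depth positions 104 to 106. Storing segments rather than per-base depths is what makes coverage-threshold queries cheap: a threshold test touches one row per segment instead of one row per base.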

Output files

After the data import pipeline has run, the resulting files are optimized for flexible and rapid querying with the GOR database architecture. The original BAM and VCF files remain available for reference and for manual confirmation of variants as needed.