Sample import through the tertiary pipeline

Tertiary pipeline summary

Input files

The WuXi NextCODE system is designed to work with a small set of files derived from fully processed BAM and VCF files. Input BAMs should be aligned, sorted, and de-duplicated, and each should be accompanied by its corresponding BAI index file. Variant call files should be single-sample VCFs (multi-sample VCFs should be split into individual VCFs) and should comply with version 4.1 or later of the VCF specification.
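The input requirements above can be checked before import with a short script. The following is a minimal sketch, not part of the WuXi NextCODE tooling: the function names are hypothetical, the BAI lookup assumes the two common naming conventions (`sample.bam.bai` and `sample.bai`), and only the VCF header is inspected, not the records.

```python
import gzip
import re
from pathlib import Path


def find_bam_index(bam_path):
    """Return the path of a BAI index stored next to a BAM file, or None."""
    bam = Path(bam_path)
    # Assumed naming conventions: sample.bam.bai and sample.bai
    for candidate in (Path(str(bam) + ".bai"), bam.with_suffix(".bai")):
        if candidate.exists():
            return candidate
    return None


def check_vcf_header(vcf_path):
    """Return (version, sample_names) read from the header of a VCF file."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    version, samples = None, []
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("##fileformat="):
                match = re.match(r"##fileformat=VCFv(\d+)\.(\d+)", line)
                if match:
                    version = (int(match.group(1)), int(match.group(2)))
            elif line.startswith("#CHROM"):
                # Columns after the nine fixed fields (through FORMAT) are sample names
                samples = line.rstrip("\n").split("\t")[9:]
                break
            elif not line.startswith("#"):
                # Reached the records without finding a column header line
                break
    return version, samples


def vcf_is_importable(vcf_path):
    """True if the header declares VCF 4.1 or later and exactly one sample."""
    version, samples = check_vcf_header(vcf_path)
    return version is not None and version >= (4, 1) and len(samples) == 1
```

A file that fails `vcf_is_importable` because it lists several samples would need to be split into single-sample VCFs first; a BAM for which `find_bam_index` returns `None` would need to be indexed.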

In general, output from standard best-practices pipelines is suitable without further modification. Import pipelines for RNA-Seq, CNV calling, SV calling, and other outputs and file formats are also compatible with the system but are not described here.

guides/images/tertiaryPipelineImport.png

Steps to convert BAMs and VCFs into analysis-ready files in the WuXi NextCODE system
Process	Description	Output files and usage
BAM to Candidate Variants file	Aligned reads are converted to an extremely permissive variant call file	The Candvars file lists all possible variants, including low-quality candidates, along with the number of reads that support each variant allele. Used primarily to assist with calling de novo variants.
BAM to Coverage files	BAM pileup is segmented into contiguous regions of similar coverage	`Seg.cov` files are used to simplify operations involving coverage thresholds. Additional files are prepared with coverage calculated over all gene and exon regions. `Goodcov` files are also prepared to distinguish between low coverage and reference calls across all samples in the project.
VCF to Genotype file	Structural variants and excessively long indels are removed to avoid low-quality SV calls	Genotype files containing the required call and quality information, ready for use in the GOR database architecture.
VCF to VEP files	Ensembl Variant Effect Predictor (VEP) is used with both Ensembl and RefGene reference data	VEP files are made available to annotate variants with consequence and impact class for filtering functional variants in common queries.
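To illustrate the "VCF to Genotype file" step in the table above, the sketch below drops structural variants and long indels from VCF text lines. It is an assumption-laden illustration rather than the pipeline's actual implementation: the function names and the 50 bp cutoff (`MAX_INDEL_LENGTH`) are hypothetical, since the source does not state how "excessively long" is defined, and every symbolic ALT allele is treated as structural.

```python
MAX_INDEL_LENGTH = 50  # hypothetical cutoff; the real pipeline's threshold is not stated


def is_structural_or_long(ref, alt, info, max_len=MAX_INDEL_LENGTH):
    """Return True if a VCF record looks like an SV or an overly long indel."""
    # An SVTYPE key in the INFO column marks a structural variant
    if any(field.startswith("SVTYPE=") for field in info.split(";")):
        return True
    for allele in alt.split(","):
        # Symbolic alleles (<DEL>, <DUP>, ...) and breakends (containing [ or ])
        if allele.startswith("<") or "[" in allele or "]" in allele:
            return True
        # Indel length is the difference between REF and ALT allele lengths
        if abs(len(ref) - len(allele)) > max_len:
            return True
    return False


def filter_vcf_lines(lines, max_len=MAX_INDEL_LENGTH):
    """Yield header lines unchanged, plus records that pass the filter."""
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alt, info = fields[3], fields[4], fields[7]
        if not is_structural_or_long(ref, alt, info, max_len):
            yield line
```

Passing an open file handle to `filter_vcf_lines` and writing the yielded lines to a new file would give a VCF restricted to SNVs and short indels under these assumptions.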

Output files

After the data import pipeline completes, the resulting files are optimized for flexible and rapid querying with the GOR database architecture. The original BAM and VCF files remain available for reference and for manual confirmation of variants as needed.