Sample import through the tertiary pipeline
Tertiary pipeline summary
Input files
The WuXi NextCODE system is designed to work with a small set of files derived from fully processed BAM and VCF files. Input BAMs should be aligned, sorted, and de-duplicated, and each should have a corresponding BAI index file. Variant call files should be single-sample VCFs (split any multi-sample VCF into individual VCFs) and should comply with version 4.1 or later of the VCF specification.
In general, output from standard best-practices pipelines is suitable without further modification. Import pipelines for RNA-Seq, CNV calling, SV calling, and other outputs and file formats are also supported but are not described here.
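The input requirements above can be checked before import with a short script. The following is a minimal sketch, not part of the WuXi NextCODE tooling: the function names (`check_vcf_header`, `input_problems`, `has_bai`) are hypothetical, and it relies only on two facts from the VCF format itself, namely that the header declares its version on a `##fileformat=VCFv…` line and that sample names follow the nine fixed columns on the `#CHROM` line. The index check assumes the two common naming conventions, `sample.bam.bai` and `sample.bai`.

```python
import gzip
import os

FIXED_COLUMNS = 9  # CHROM through FORMAT precede the per-sample columns


def open_text(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def check_vcf_header(lines):
    """Return (version, sample_names) parsed from VCF header lines."""
    version = None
    samples = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##fileformat=VCFv"):
            major, minor = line.split("VCFv", 1)[1].split(".")[:2]
            version = (int(major), int(minor))
        elif line.startswith("#CHROM"):
            # Everything after the fixed columns is a sample name.
            samples = line.split("\t")[FIXED_COLUMNS:]
            break  # the #CHROM line is the last header line
    return version, samples


def input_problems(version, samples):
    """List the ways a VCF header fails the requirements described above."""
    problems = []
    if version is None or version < (4, 1):
        problems.append("VCF version must be 4.1 or later")
    if samples is None or len(samples) != 1:
        problems.append("VCF must contain exactly one sample")
    return problems


def has_bai(bam_path):
    """Check for an index named sample.bam.bai or sample.bai (assumed conventions)."""
    root, _ = os.path.splitext(bam_path)
    return os.path.exists(bam_path + ".bai") or os.path.exists(root + ".bai")
```

For a file on disk, the header can be parsed with `with open_text(path) as handle: version, samples = check_vcf_header(handle)`. The sketch inspects only the header, so it does not confirm that a BAM is sorted or de-duplicated.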
Process | Description | Output files and usage |
---|---|---|
BAM to Candidate Variants file | Aligned reads converted to an extremely permissive variant call file | The Candvars file lists all possible variants, including low-quality candidates, along with the number of reads that support each variant allele. Used primarily to assist with calling de novo variants. |
BAM to Coverage files | BAM pileup segmented into contiguous regions of similar coverage |  |
VCF to Genotype file | Structural variants and excessively long indels are removed to avoid low-quality SV calls | Genotype files with the required call and quality information, ready for use in the GOR database architecture. |
VCF to VEP files | Ensembl Variant Effect Predictor (VEP) is used with both Ensembl and RefGene reference data | VEP files annotate variants with consequence and impact class, for filtering functional variants in common queries. |
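
To illustrate what segmenting a pileup into contiguous regions of similar coverage can look like, here is a minimal sketch. It is not the pipeline's actual algorithm, which this page does not describe: the function name `segment_coverage`, the definition of "similar" as falling in the same depth bin, and the bin edges `(1, 10, 30)` are all assumptions made for illustration.

```python
from bisect import bisect_right


def segment_coverage(depths, bin_edges=(1, 10, 30), start=1):
    """Merge adjacent positions whose read depth falls in the same bin.

    depths    -- sequence of per-base read depths for one contig
    bin_edges -- hypothetical ascending thresholds; depth d falls in bin i
                 when bin_edges[i-1] <= d < bin_edges[i]
    start     -- coordinate of the first depth value (1-based by default)

    Returns a list of (first_pos, last_pos, bin_index), inclusive coordinates.
    """
    segments = []
    run_start = None
    run_bin = None
    for offset, depth in enumerate(depths):
        pos = start + offset
        current = bisect_right(bin_edges, depth)
        if current != run_bin:
            # Depth moved to a different bin: close the open run, start a new one.
            if run_bin is not None:
                segments.append((run_start, pos - 1, run_bin))
            run_start, run_bin = pos, current
    if run_bin is not None:
        segments.append((run_start, start + len(depths) - 1, run_bin))
    return segments
```

With the default edges, bin 0 means no coverage, bin 1 covers depths 1 to 9, bin 2 covers 10 to 29, and bin 3 covers 30 and above. Storing one row per run rather than one row per base is what keeps a coverage file of this kind compact.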
Output files
After the data import pipeline completes, the resulting files are optimized for flexible, rapid querying with the GOR database architecture. The original BAM and VCF files remain available for reference and for manual confirmation of variants as needed.