CompoundHetVIP: Compound Heterozygous Variant Identification Pipeline

Summary

Compound heterozygous (CH) variant identification requires distinguishing maternally from paternally derived nucleotides, a process that requires numerous computational tools. Using such tools often introduces unforeseen challenges such as installation procedures that are operating-system specific, software dependencies that must be installed, and formatting requirements for input files. To overcome these challenges, we developed Compound Heterozygous Variant Identification Pipeline (CompoundHetVIP), which uses a single Docker image to encapsulate commonly used software tools for file aggregation (BCFtools or GATK4), VCF liftover (Picard Tools), joint-genotyping (GATK4), file conversion (Plink2), phasing (SHAPEIT2, Beagle, and/or Eagle2), variant normalization (vt tools), annotation (SnpEff), relational database generation (GEMINI), and identification of CH, homozygous alternate, and de novo variants in a series of 13 steps. To begin using our tool, researchers need only install the Docker engine and download the CompoundHetVIP Docker image. The tools provided in CompoundHetVIP, subject to the limitations of the underlying software, can be applied to whole-genome, whole-exome, or targeted exome sequencing data of individual samples or trios (a child and both parents), using VCF or gVCF files as initial input. Each step of the pipeline produces an analysis-ready output file that can be further evaluated. To illustrate its use, we applied CompoundHetVIP to data from a publicly available Ashkenazim trio and identified two genes with a candidate CH variant and two genes with a candidate homozygous alternate variant after filtering based on user-set thresholds for global minor allele frequency, Combined Annotation Dependent Depletion, and Gene Damage Index. While this example uses genomic data from a healthy child, we anticipate that most researchers will use CompoundHetVIP to uncover missing heritability in human diseases and other phenotypes.
CompoundHetVIP is open-source software and can be found at https://github.com/dmiller903/CompoundHetVIP; this repository also provides detailed, step-by-step examples. Copyright: © 2021 Miller DB and Piccolo SR.
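
As an illustration of why phasing matters for CH identification, the following minimal Python sketch flags genes in which phased heterozygous variants place an alternate allele on each of the two haplotypes. It is not CompoundHetVIP's implementation (the abstract states that the pipeline relies on GEMINI for this step); the `PhasedVariant` type, the `find_compound_het_genes` function, and the gene names are hypothetical and introduced only for this example. It assumes biallelic calls written as VCF-style phased genotypes such as `0|1`, and it makes no claim about which side of the `|` is maternal or paternal.

```python
from collections import defaultdict
from typing import NamedTuple


class PhasedVariant(NamedTuple):
    """Hypothetical record for one variant call in the child."""
    gene: str
    position: int
    genotype: str  # VCF-style genotype, e.g. "0|1" (phased) or "0/1" (unphased)


def find_compound_het_genes(variants):
    """Return {gene: [variants]} for genes hit on both haplotypes."""
    by_gene = defaultdict(list)
    for variant in variants:
        alleles = variant.genotype.split("|")
        # Keep only phased heterozygous calls (one reference, one alternate).
        # Unphased ("0/1") and homozygous ("1|1") calls are skipped.
        if len(alleles) == 2 and sorted(alleles) == ["0", "1"]:
            by_gene[variant.gene].append(variant)

    compound_het = {}
    for gene, het_variants in by_gene.items():
        # Record which haplotype (index 0 or 1) carries each alternate allele.
        haplotypes_hit = {v.genotype.split("|").index("1") for v in het_variants}
        if haplotypes_hit == {0, 1}:
            compound_het[gene] = sorted(het_variants, key=lambda v: v.position)
    return compound_het


# Made-up example calls; gene names and positions are placeholders.
example_calls = [
    PhasedVariant("GENE_A", 100, "0|1"),
    PhasedVariant("GENE_A", 250, "1|0"),  # opposite haplotype: candidate CH
    PhasedVariant("GENE_B", 300, "0|1"),
    PhasedVariant("GENE_B", 400, "0|1"),  # same haplotype: not CH
    PhasedVariant("GENE_C", 500, "1|1"),  # homozygous alternate: not CH
    PhasedVariant("GENE_D", 600, "0/1"),  # unphased: cannot be assessed
]
candidates = find_compound_het_genes(example_calls)
```

In this sketch only `GENE_A` qualifies, because its two heterozygous variants sit on different haplotypes. A real analysis would additionally filter candidates on thresholds such as minor allele frequency, as the abstract describes.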

Authors Miller DB, Piccolo SR
Journal F1000Research
Publication Date 2020;9:1211
PubMed 33680433
PubMed Central PMC7905494
DOI 10.12688/f1000research.26848.2
