2023 FDA Science Forum
CANcer Variant Allele Sequencing (CANVAS): A computational pipeline for the quantitation of ultra-low frequency hotspot mutations in myeloid neoplasm-associated genes via targeted error-corrected sequencing of human peripheral blood DNA
- Authors:
- Center:
-
Contributing OfficeNational Center for Toxicological Research
Abstract
Error-corrected sequencing describes a set of sequencing protocols that focus on mitigating the effects of induced errors in calling variants and is often used for evaluating somatic mutations associated with carcinogenesis. There are several laboratory techniques that can be used which must also be matched with a data processing computational workflow that processes the specialized raw sequencing data to highly accurate consensus sequences, allowing for the observation of smaller effect sizes between a control set and an exposed or treated set of samples. The current report of CANVAS describes the preparation of uniquely barcoded DNA libraries of synthesized sequences in known amounts and previously evaluated genomic DNA (gDNA) and the computational workflow that uses mostly standard sequencing software tools to output sequences corrected for errors introduced during PCR, library preparation and sequencing. Mutant fraction samples were generated using in vitro constructed standards to represent ratios of 10-1, 10-2, 10-3, 10-4, and 10-5 for a set of known blood-based cancer driver mutations and gDNA of samples previously evaluated with ACB-PCR and ddPCR were also prepared and sequenced. The performance of CANVAS was evaluated using both the results of the mutant fraction standard samples and by comparison of the gDNA evaluation to the quantitative range of ACB-PCR and ddPCR (10-1 to 10-5). The target amplification adds molecular tags to both ends of target sequence and the CANVAS program takes advantage of these unique identifiers to provide highly accurate counts of sequences. The logic of CANVAS removes the introduction of unidentifiable nucleotides (“N”) by comparing sequences with the same UID as whole words rather than single letters, making it less noisy and using less computational time. The results show this method accurately evaluates the abundance of mutations at each position over the range of 10-1 to 10-5 as determined using the constructed mutant fraction samples and comparison to results obtained from orthogonal methods. Application to future avenues of research include iPSC-derived biologic product safety assessment, and blood-based biomarkers of risk for cancer (e.g. therapy-related or after Phase I clinical trials) and noncancer diseases of aging and inflammation.