ConsensusCruncher

Suppresses errors in next-generation sequencing data by using unique molecular identifiers

Description: Detection of cancer-associated somatic mutations has broad applications for oncology and precision medicine. However, this becomes challenging when cancer-derived DNA is in low abundance, such as in impure tissue specimens or in circulating cell-free DNA. Next-generation sequencing (NGS) is particularly prone to technical artefacts that can limit the accuracy for calling low-allele-frequency mutations. State-of-the-art methods to improve detection of low-frequency mutations often employ unique molecular identifiers (UMIs) for error suppression; however, these methods are highly inefficient as they depend on redundant sequencing to assemble consensus sequences. Here, we present a novel strategy to enhance the efficiency of UMI-based error suppression by retaining single reads (singletons) that can participate in consensus assembly. This ‘Singleton Correction’ methodology outperformed other UMI-based strategies in efficiency, leading to greater sensitivity with high specificity in a cell line dilution series. Significant benefits were seen with Singleton Correction at sequencing depths ≤16 000×. We validated the utility and generalizability of this approach in a cohort of >300 individuals whose peripheral blood DNA was subjected to hybrid capture sequencing at ∼5000× depth. Singleton Correction can be incorporated into existing UMI-based error suppression workflows to boost mutation detection accuracy, thus improving the cost-effectiveness and clinical impact of NGS.
Authors: Ting Ting Wang, Sagi Abelson, Jinfeng Zou, Tiantian Li, Zhen Zhao, John E Dick, Liran I Shlush, Trevor J Pugh, Scott V Bratman
Lab: Pugh
Version: -
Keywords: ConsensusCruncher, UMI-based error suppression, singleton, NGS, Python
Licensing: Apache License, Version 2.0
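
To make the idea in the description concrete, here is a toy Python sketch of UMI-based consensus building in which singletons are retained when they can be paired with reads from the complementary strand. It is an illustration under stated assumptions, not ConsensusCruncher's actual code or interface. The function names (`consensus`, `collapse`), the `(umi, strand, sequence)` tuple layout, the 0.7 agreement cutoff, and the rule that a singleton is kept only when a same-UMI family exists on the opposite strand are all hypothetical choices made for this example. Sequences are assumed to be equal in length and already oriented to the same reference strand.

```python
from collections import Counter, defaultdict


def consensus(seqs, cutoff=0.7):
    """Per-position majority vote; emit 'N' where no base reaches the cutoff."""
    out = []
    for column in zip(*seqs):
        base, count = Counter(column).most_common(1)[0]
        out.append(base if count / len(column) >= cutoff else "N")
    return "".join(out)


def collapse(reads):
    """Collapse reads into per-(UMI, strand) results.

    reads: iterable of (umi, strand, sequence) with strand in {'+', '-'}.
    This tuple layout is an assumption of the sketch, not a real file format.
    """
    families = defaultdict(list)
    for umi, strand, seq in reads:
        families[(umi, strand)].append(seq)

    result = {}
    for (umi, strand), seqs in families.items():
        if len(seqs) >= 2:
            # Redundant family: ordinary consensus from repeated reads.
            result[(umi, strand)] = ("consensus", consensus(seqs))
            continue
        # Singleton: look for a same-UMI family on the complementary strand.
        partner = families.get((umi, "-" if strand == "+" else "+"))
        if partner:
            partner_seq = partner[0] if len(partner) == 1 else consensus(partner)
            # Keep only bases on which both strands agree; mask the rest.
            corrected = consensus([seqs[0], partner_seq], cutoff=1.0)
            result[(umi, strand)] = ("rescued_singleton", corrected)
        else:
            result[(umi, strand)] = ("discarded_singleton", None)
    return result
```

In this sketch a singleton with no complementary-strand partner is discarded, which mirrors the inefficiency of purely redundancy-based consensus that the description refers to; the rescued singletons are the reads that would otherwise be lost.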

Citation

Wang, T. T., Abelson, S., Zou, J., Li, T., Zhao, Z., Dick, J. E., ... & Bratman, S. V. (2019). High efficiency error suppression for accurate detection of low-frequency variants. Nucleic Acids Research, 47(15), e87.