An ensemble deep learning framework to refine large deletions in linked-reads

Published in BIBM, 2021

Recommended citation: Y. Hu, S. V. Mangal, L. Zhang, X. Zhou. An ensemble deep learning framework to refine large deletions in linked-reads. The IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2021) http://oliiverhu.github.io/files/bibm2021.pdf

Abstract: The detection of structural variants (SVs) remains challenging due to inconsistencies in detected breakpoints and biological complexity of some rearrangements. Linked-reads have demonstrated their superiority in diploid genome assembly and SV detection. Recently developed tools Aquila and Aquila_stLFR use a reference sequence and linked-reads to generate a high quality diploid genome assembly, using which they then detect and phase personal genetic variations. However, they both produce a substantial proportion of false positive deletion SV calls. To take full advantage of linked-reads, an effective downstream filtering and refinement framework is needed pressingly. In this work, we propose AquilaDeepFilter to filter large deletion SVs from Aquila and Aquila_stLFR. AquilaDeepFilter relies on a deep learning ensemble approach by integrating several state-of-the-art CNN backbones. The filtering of deletion SVs is formulated as a binary classification task on image data that are generated through the extraction of multiple alignment signals, including read depth, split reads and discordant read pairs. Three linked-reads libraries sequenced from the well-studied sample NA24385 and the gold standard of GiaB benchmark were used to perform thorough experiments on our proposed method. The results demonstrated that AquilaDeepFilter could increase the precision rate of Aquila while the recall rate of Aquila decreased only slightly, and the overall F1 improved by 20%. Furthermore, AquilaDeepFilter outperformed another deep learning based method for SV filtering, DeepSVFilter. Even though we designed AquilaDeepFilter for linked-reads, the framework could also be used to improve SV detection on short reads.

Download paper here

Recommended citation: Y. Hu, S. V. Mangal, L. Zhang, X. Zhou. An ensemble deep learning framework to refine large deletions in linked-reads. The IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2021).