Publications

CNVeil enables accurate and robust tumor subclone identification and copy number estimation from single-cell DNA sequencing data.

To address these challenges, we introduce CNVeil, a robust quantitative algorithm designed to accurately reveal CNV profiles while overcoming the inherent noise and bias in scDNA-seq data. CNVeil incorporates a unique bias correction method using normal cell profiles identified by a PCA-based Gini coefficient, effectively mitigating sequencing bias.

Recommended citation: Yuan, W., Luo, C., Hu, Y., Zhang, L., Wen, Z., Liu, Y. H., ... & Zhou, X. M. (2024). CNVeil enables accurate and robust tumor subclone identification and copy number estimation from single-cell DNA sequencing data. bioRxiv, 2024-02. https://www.biorxiv.org/content/10.1101/2024.02.21.581409.abstract

MaskGraphene: Advancing joint embedding, clustering, and batch correction for spatial transcriptomics using graph-based self-supervised learning

To address this, we introduce MaskGraphene, a method for better aligning and integrating different ST slices using both self-supervised and contrastive learning. MaskGraphene learns joint embeddings that efficiently capture geometric information. It further facilitates spatially aware data integration and simultaneous identification of shared and unique cell/domain types across different slices.

Recommended citation: Hu, Yunfei, et al. "MaskGraphene: Advancing joint embedding, clustering, and batch correction for spatial transcriptomics using graph-based self-supervised learning." bioRxiv (2024): 2024-02. https://www.biorxiv.org/content/10.1101/2024.02.21.581387v1.abstract

Benchmarking clustering, alignment, and integration methods for spatial transcriptomics

Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of benchmark studies complicates the selection of methods and future method development. Here we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity.

Recommended citation: Hu, Yunfei, et al. "Benchmarking clustering, alignment, and integration methods for spatial transcriptomics." bioRxiv (2024). https://www.biorxiv.org/content/10.1101/2024.03.12.584114v1

ADEPT: Autoencoder with Differentially Expressed Genes and Imputation for a Robust Spatial Transcriptomics Clustering

To harness both spatial context and transcriptional profile in ST data, we develop a novel graph-based multi-stage framework for robust clustering, called ADEPT. To control and stabilize data quality, ADEPT relies on selection of differentially expressed genes (DEGs) and imputation of the multiple DEG-based matrices for the initial and final clustering of a graph autoencoder backbone that minimizes the variance of clustering results.

Recommended citation: Y. Hu, Y. Zhao, C. T. Schunk, Y. Ma, T. Derr*, X. M. Zhou*. ADEPT: autoencoder with differentially expressed genes and imputation for a robust spatial transcriptomics clustering. (Recomb-seq 2023). http://oliiverhu.github.io/files/recombseq2023.pdf

Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads

Motivated by current limitations in generating high-quality diploid assemblies and detecting variants, a new suite of software tools, Aquila, was developed to fully take advantage of linked-read sequencing technology. The overarching goal of Aquila is to exploit the strengths of linked-read technology including long-range connectivity and inherent phasing of variants for reference-assisted local de novo assembly at the whole-genome scale.

Recommended citation: Hu, Y., Yang, C., Zhang, L., Zhou, X. (2023). Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads. In: Peters, B.A., Drmanac, R. (eds) Haplotyping. Methods in Molecular Biology, vol 2590. http://oliiverhu.github.io/files/mimb2022.pdf

Automated filtering of genome-wide large deletions through an ensemble deep learning framework

After extending the algorithm to short-read datasets, we tested the performance of AquilaDeepFilter on all five linked-read and short-read libraries sequenced from the well-studied NA24385 sample, validated against the Genome in a Bottle benchmark. To demonstrate the filtering ability of AquilaDeepFilter, we used the SV calls from three upstream SV detection tools (Aquila, Aquila_stLFR, and Delly) as the baseline.

Recommended citation: Yunfei Hu, Sanidhya Mangal, Lu Zhang, Xin Zhou, Automated filtering of genome-wide large deletions through an ensemble deep learning framework, Methods, Volume 206, 2022, Pages 77-86. http://oliiverhu.github.io/files/methods2022.pdf

An ensemble deep learning framework to refine large deletions in linked-reads

In this work, we propose AquilaDeepFilter to filter large deletion SVs from Aquila and Aquila_stLFR. AquilaDeepFilter relies on a deep learning ensemble approach by integrating several state-of-the-art CNN backbones.

Recommended citation: Y. Hu, S. V. Mangal, L. Zhang, X. Zhou. An ensemble deep learning framework to refine large deletions in linked-reads. The IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2021) http://oliiverhu.github.io/files/bibm2021.pdf

Text mining of gene–phenotype associations reveals new phenotypic profiles of autism-associated genes

Given the abundance of autism-related literature, we were motivated to develop Autism_genepheno, a text mining pipeline that uses natural language processing methods to identify sentence-level mentions of autism-associated genes and phenotypes in the literature.

Recommended citation: Li, S., Guo, Z., Ioffe, J.B. et al. Text mining of gene–phenotype associations reveals new phenotypic profiles of autism-associated genes. Sci Rep 11, 15269 (2021). http://oliiverhu.github.io/files/sr2021.pdf