Publications

Benchmarking clustering, alignment, and integration methods for spatial transcriptomics

Published:

Spatial transcriptomics (ST) is advancing our understanding of complex tissues and organisms. However, building a robust clustering algorithm to define spatially coherent regions in a single tissue slice and aligning or integrating multiple tissue slices originating from diverse sources for essential downstream analyses remains challenging. Numerous clustering, alignment, and integration methods have been specifically designed for ST data by leveraging its spatial information. The absence of comprehensive benchmark studies complicates the selection of methods and future method development. In this study, we systematically benchmark a variety of state-of-the-art algorithms with a wide range of real and simulated datasets of varying sizes, technologies, species, and complexity. We analyze the strengths and weaknesses of each method using diverse quantitative and qualitative metrics and analyses, including eight metrics for spatial clustering accuracy and contiguity, uniform manifold approximation and projection visualization, layer-wise and spot-to-spot alignment accuracy, and 3D reconstruction, which are designed to assess method performance as well as data quality. The code used for evaluation is available on our GitHub. Additionally, we provide online notebook tutorials and documentation to facilitate the reproduction of all benchmarking results and to support the study of new methods and new datasets. Our analyses lead to comprehensive recommendations that cover multiple aspects, helping users to select optimal tools for their specific needs and guide future method development.

Recommended citation: Hu, Yunfei, et al. "Benchmarking clustering, alignment, and integration methods for spatial transcriptomics." Genome Biology 25.1 (2024): 212. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-024-03361-0

CNVeil enables accurate and robust tumor subclone identification and copy number estimation from single-cell DNA sequencing data.

Published:

To address these challenges, we introduce CNVeil, a robust quantitative algorithm designed to accurately reveal CNV profiles while overcoming the inherent noise and bias in scDNA-seq data. CNVeil incorporates a unique bias correction method using normal cell profiles identified by a PCA-based Gini coefficient, effectively mitigating sequencing bias.

Recommended citation: Yuan, W., Luo, C., Hu, Y., Zhang, L., Wen, Z., Liu, Y. H., ... & Zhou, X. M. (2024). CNVeil enables accurate and robust tumor subclone identification and copy number estimation from single-cell DNA sequencing data. bioRxiv, 2024-02. https://www.biorxiv.org/content/10.1101/2024.02.21.581409.abstract

MaskGraphene: Advancing joint embedding, clustering, and batch correction for spatial transcriptomics using graph-based self-supervised learning

Published:

To address this, we introduce a method called MaskGraphene, for the purpose of better aligning and integrating different ST slices using both self-supervised and contrastive learning. MaskGraphene learns the joint embeddings to capture the geometric information efficiently. MaskGraphene further facilitates spatial aware data integration and simultaneous identification of shared and unique cell/domain types across different slices.

Recommended citation: Hu, Yunfei, et al. "MaskGraphene: Advancing joint embedding, clustering, and batch correction for spatial transcriptomics using graph-based self-supervised learning." bioRxiv (2024): 2024-02. https://www.biorxiv.org/content/10.1101/2024.02.21.581387v1.abstract

ADEPT: Autoencoder with Differentially Expressed Genes and Imputation for a Robust Spatial Transcriptomics Clustering

Published:

To harness both spatial context and transcriptional profile in ST data, we develop a novel graph-based multi-stage framework for robust clustering, called ADEPT. To control and stabilize data quality, ADEPT relies on selection of differentially expressed genes (DEGs) and imputation of the multiple DEG-based matrices for the initial and final clustering of a graph autoencoder backbone that minimizes the variance of clustering results.

Recommended citation: Y. Hu, Y. Zhao, C. T. Schunk, Y. Ma, T. Derr*, X. M. Zhou*. ADEPT: autoencoder with differentially expressed genes and imputation for a robust spatial transcriptomics clustering. (Recomb-seq 2023). http://oliiverhu.github.io/files/recombseq2023.pdf

Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads

Published:

Motivated by current limitations in generating high-quality diploid assemblies and detecting variants, a new suite of software tools, Aquila, was developed to fully take advantage of linked-read sequencing technology. The overarching goal of Aquila is to exploit the strengths of linked-read technology including long-range connectivity and inherent phasing of variants for reference-assisted local de novo assembly at the whole-genome scale.

Recommended citation: Hu, Y., Yang, C., Zhang, L., Zhou, X. (2023). Haplotyping-Assisted Diploid Assembly and Variant Detection with Linked Reads. In: Peters, B.A., Drmanac, R. (eds) Haplotyping. Methods in Molecular Biology, vol 2590. http://oliiverhu.github.io/files/mimb2022.pdf

Automated filtering of genome-wide large deletions through an ensemble deep learning framework

Published:

After extending the algorithm to shortreads dataset, we tested the performance of AquilaDeepFilter on all five linked-reads and short-read libraries sequenced from the well-studied NA24385 sample, validated against the Genome in a Bottle benchmark. To demonstrate the filtering ability of AquilaDeepFilter, we utilized the SV calls from three upstream SV detection tools including Aquila, Aquila_stLFR and Delly as the baseline.

Recommended citation: Yunfei Hu, Sanidhya Mangal, Lu Zhang, Xin Zhou, Automated filtering of genome-wide large deletions through an ensemble deep learning framework, Methods, Volume 206, 2022, Pages 77-86. http://oliiverhu.github.io/files/methods2022.pdf

An ensemble deep learning framework to refine large deletions in linked-reads

Published:

In this work, we propose AquilaDeepFilter to filter large deletion SVs from Aquila and Aquila_stLFR. AquilaDeepFilter relies on a deep learning ensemble approach by integrating several state-of-the-art CNN backbones.

Recommended citation: Y. Hu, S. V. Mangal, L. Zhang, X. Zhou. An ensemble deep learning framework to refine large deletions in linked-reads. The IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (2021) http://oliiverhu.github.io/files/bibm2021.pdf

Text mining of gene–phenotype associations reveals new phenotypic profiles of autism-associated genes

Published:

Given the abundance of the autism-related literature, we were thus motivated to develop Autism_genepheno, a text mining pipeline to identify sentence-level mentions of autism-associated genes and phenotypes in literature through natural language processing methods.

Recommended citation: Li, S., Guo, Z., Ioffe, J.B. et al. Text mining of gene–phenotype associations reveals new phenotypic profiles of autism-associated genes. Sci Rep 11, 15269 (2021). http://oliiverhu.github.io/files/sr2021.pdf