A unified framework for correcting batch effects and integrating multi-omics data

Abstract

Multi-omics studies enable a comprehensive understanding of biological systems by integrating complementary molecular layers such as gene expression, DNA methylation, and chromatin accessibility. However, generating multi-omics data remains costly and labor-intensive, so researchers often combine publicly available datasets collected from different cohorts, laboratories, and platforms. Integrating such heterogeneous datasets introduces substantial batch effects and technical variability that can obscure true biological structure. Although numerous batch correction methods exist for single-omics data, systematic approaches for multi-omics batch effect correction remain limited. Correcting each omics layer independently risks disrupting cross-omics concordance and does not ensure that samples are aligned within a unified multi-modal space. These limitations call for coordinated, modality-aware harmonization across studies. To address this gap, we developed MoDAmix, a unified framework that leverages domain adaptation to remove technical variation while preserving shared molecular structure across omics layers. Specifically, MoDAmix aligns feature distributions across batches and modalities through adversarial learning, enforcing consistency both within and between omics types to achieve coherent cross-omics integration. MoDAmix proceeds through four stages: (1) pre-training to learn initial feature representations, (2) adversarial adaptation to reduce batch effects within each omics type, (3) multi-omics adversarial alignment to harmonize modalities in a shared latent space, and (4) semi-supervised class alignment to refine subtype separability through pseudo-labeling and centroid consistency.
Evaluations on both single-cell and bulk datasets, including mouse brain data (gene expression and chromatin accessibility) and cancer cohorts (gene expression and DNA methylation), demonstrated that MoDAmix effectively mitigates batch effects, improves clustering and classification performance, and preserves subtype structure across domains. Together, these results highlight MoDAmix as a robust framework for multi-omics batch effect correction and integration that enables reliable cross-cohort analysis in systems biology and precision medicine. MoDAmix is publicly available at https://github.com/cbi-bioinfo/MoDAmix.
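The pseudo-labeling and centroid-consistency idea behind stage (4) can be illustrated with a minimal sketch. The helpers below (`centroids`, `pseudo_label`, `centroid_consistency_loss`) are hypothetical and are not taken from the MoDAmix repository. The sketch assumes that embeddings are plain Python lists, that each unlabeled target-domain sample is pseudo-labeled by its nearest labeled source-class centroid, that confidence is a softmax over negative squared distances with an illustrative threshold of 0.8, and that the consistency term is the mean squared distance between matching source and target class centroids. In a trained model these quantities would be computed on learned latent representations and minimized as a loss; here they are only evaluated on fixed toy vectors.

```python
import math
from collections import defaultdict

Vector = list[float]


def centroids(features: list[Vector], labels: list[int]) -> dict[int, Vector]:
    """Return the mean embedding of each class label (hypothetical helper)."""
    sums: dict[int, Vector] = {}
    counts: dict[int, int] = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def pseudo_label(
    target: list[Vector], source_centroids: dict[int, Vector], threshold: float = 0.8
) -> list[tuple[int, int]]:
    """Assign each target sample to its nearest source centroid.

    Confidence is a softmax over negative squared distances (an assumption);
    only (index, label) pairs with confidence >= threshold are returned.
    """
    classes = sorted(source_centroids)
    kept = []
    for i, x in enumerate(target):
        logits = [-sq_dist(x, source_centroids[c]) for c in classes]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(classes)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept.append((i, classes[best]))
    return kept


def centroid_consistency_loss(
    source_c: dict[int, Vector], target_c: dict[int, Vector]
) -> float:
    """Mean squared distance between source and target centroids of shared classes."""
    shared = set(source_c) & set(target_c)
    if not shared:
        return 0.0
    return sum(sq_dist(source_c[c], target_c[c]) for c in shared) / len(shared)


# Toy example: two labeled source subtypes and three unlabeled target samples.
source = [[0.0, 0.0], [0.2, 0.0], [4.0, 4.0], [4.2, 4.0]]
source_labels = [0, 0, 1, 1]
target = [[1.0, 1.0], [5.0, 5.0], [2.05, 2.0]]  # the last point is ambiguous

src_c = centroids(source, source_labels)
kept = pseudo_label(target, src_c)
tgt_c = centroids([target[i] for i, _ in kept], [y for _, y in kept])
print(kept)  # the ambiguous third sample falls below the threshold
print(round(centroid_consistency_loss(src_c, tgt_c), 4))
```

Filtering by confidence keeps ambiguous samples, such as the third target point here, from pulling a target centroid toward the wrong subtype before the consistency term is applied.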

Keywords

GENE-EXPRESSION
Authors
Choi, Joung Min; Chae, Heejoon
DOI
10.1038/s41598-026-42355-9
Publication date
2026-03
Type
Article
Journal
Scientific Reports
Volume
16
Issue
1