SinWaveFusion: Learning a single image diffusion model in wavelet domain

Abstract

Although recent advancements in large-scale image generation models have substantially improved visual fidelity and reliability, current diffusion models continue to encounter significant challenges in maintaining stylistic consistency with the original images. These challenges stem primarily from the intrinsic stochastic nature of the diffusion process, leading to noticeable variability and inconsistency in edited outputs. To address these challenges, this paper proposes a novel framework termed single image wavelet diffusion (SinWaveFusion), explicitly designed to enhance the consistency and fidelity in generating images derived from a single source image while also mitigating information leakage. SinWaveFusion addresses generative artifacts by employing the multi-scale properties inherent in wavelet decomposition, which incorporates a built-in up-down scaling mechanism. This approach enables refined image manipulation while enhancing stylistic coherence. The proposed diffusion model, trained exclusively on a single source image, utilizes the hierarchical structure of wavelet subbands to effectively capture spatial and spectral information in the sampling process, minimizing reconstruction loss and ensuring high-quality, diverse outputs. Moreover, the architecture of the denoiser features a reduced receptive field, strategically preventing the model from memorizing the entire training image and thereby offering additional computational efficiency benefits. Experimental results demonstrate that SinWaveFusion achieves improved performance in both conditional and unconditional generation compared to existing generative models trained on a single image.
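The "built-in up-down scaling mechanism" mentioned in the abstract can be illustrated with a one-level 2D wavelet transform: the analysis step halves the resolution and splits the image into one coarse subband and three detail subbands, and the synthesis step doubles the resolution again without loss. The sketch below is only an illustration under assumptions, not the authors' implementation. The abstract does not say which wavelet is used, so the Haar wavelet is chosen here for simplicity, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical.

```python
def haar_dwt2(img):
    """One level of an orthonormal 2D Haar transform (illustrative sketch).

    `img` is a list of rows with even height and width. Returns four
    half-resolution subbands: one coarse band and three detail bands.
    """
    rows, cols = len(img), len(img[0])
    assert rows % 2 == 0 and cols % 2 == 0, "sketch assumes even dimensions"
    coarse, d_cols, d_rows, d_diag = [], [], [], []
    for i in range(0, rows, 2):
        r_coarse, r_cols, r_rows, r_diag = [], [], [], []
        for j in range(0, cols, 2):
            # The 2x2 block being analysed:  a b
            #                                c d
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            r_coarse.append((a + b + c + d) / 2)  # scaled local average
            r_cols.append((a - b + c - d) / 2)    # left column minus right column
            r_rows.append((a + b - c - d) / 2)    # top row minus bottom row
            r_diag.append((a - b - c + d) / 2)    # diagonal difference
        coarse.append(r_coarse)
        d_cols.append(r_cols)
        d_rows.append(r_rows)
        d_diag.append(r_diag)
    return coarse, d_cols, d_rows, d_diag


def haar_idwt2(coarse, d_cols, d_rows, d_diag):
    """Invert haar_dwt2: rebuild the full-resolution image from four subbands."""
    out = []
    for i in range(len(coarse)):
        top, bottom = [], []
        for j in range(len(coarse[0])):
            s, x, y, z = coarse[i][j], d_cols[i][j], d_rows[i][j], d_diag[i][j]
            top.extend([(s + x + y + z) / 2, (s - x + y - z) / 2])
            bottom.extend([(s + x - y - z) / 2, (s - x - y + z) / 2])
        out.append(top)
        out.append(bottom)
    return out
```

Applying `haar_dwt2` again to the coarse subband gives the next, coarser scale, which is the hierarchical subband structure the abstract refers to. Because `haar_idwt2` exactly inverts the analysis step, moving between scales adds no reconstruction error of its own in this sketch.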

Keywords

Single image generation; Denoising diffusion models; Wavelet transform
Title
SinWaveFusion: Learning a single image diffusion model in wavelet domain
Authors
Kim, Jisoo; Kang, Jiwoo; Kim, Taewan; Oh, Heeseok
DOI
10.1016/j.imavis.2025.105551
Publication date
2025-06
Type
Article
Journal
Image and Vision Computing
Volume 159