VLM-Guided Inpainting for Anomaly Detection
Abstract

Anomaly detection (AD) aims to identify regions in an image that deviate from the expected distribution of normal visual data, a task critical for applications such as industrial inspection. Recent CLIP-based approaches have enabled zero-shot anomaly detection by comparing image features with text-derived embeddings, leveraging pretrained vision-language alignment. While effective in general scenarios, these methods struggle to capture domain-specific normality and often fail to accurately localize subtle anomalies. We introduce a novel framework that integrates CLIP-guided mask inference with a diffusion-based generative inpainting module trained on normal data. To improve semantic consistency and reconstruction fidelity, we incorporate score distillation sampling (SDS) loss, which aligns the inpainted output with the distribution of normal images in the embedding space. Our method is model-agnostic and can be integrated into existing CLIP-based detectors without requiring anomaly annotations. Experiments on datasets from industrial and medical domains demonstrate consistent improvements when integrated with various backbones in both image-level and pixel-level detection tasks. Qualitative results show improved reconstruction and precise localization of fine-grained anomalies.
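As a rough illustration of the score distillation sampling (SDS) loss mentioned in the abstract, its gradient with respect to the inpainted image is commonly approximated as w(t) · (ε̂(x_t, t) − ε), where ε is the injected noise and ε̂ is the diffusion model's noise prediction at timestep t. Below is a minimal NumPy sketch; the noise predictor is a hypothetical placeholder standing in for the paper's pretrained diffusion model, and all names are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def sds_gradient(x, noise_predictor, t, alpha_bar, rng):
    """Approximate SDS gradient w(t) * (eps_hat - eps) for image x.

    noise_predictor is a stand-in for a pretrained diffusion model
    (hypothetical here); alpha_bar is the cumulative noise schedule.
    """
    eps = rng.standard_normal(x.shape)                # sampled Gaussian noise
    a = alpha_bar[t]
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps     # forward-diffused image
    eps_hat = noise_predictor(x_t, t)                 # model's noise estimate
    w = 1.0 - a                                       # a common weighting choice
    return w * (eps_hat - eps)

# Toy usage with a "perfect" predictor that recovers the injected noise;
# a real model would be a trained denoising U-Net.
rng = np.random.default_rng(0)
alpha_bar = np.linspace(0.99, 0.01, 10)
x = rng.standard_normal((4, 4))

def perfect_predictor(x_t, t):
    a = alpha_bar[t]
    return (x_t - np.sqrt(a) * x) / np.sqrt(1.0 - a)

g = sds_gradient(x, perfect_predictor, t=5, alpha_bar=alpha_bar, rng=rng)
# when the predictor exactly recovers eps, the SDS gradient vanishes
```

In the paper's setting, this gradient would push the inpainted region toward the normal-data distribution learned by the diffusion model; the toy predictor above only demonstrates the mechanics of the update.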

Keywords

Anomaly Detection; Vision-Language Models; Diffusion Models; Score Distillation Sampling
Title
VLM-Guided Inpainting for Anomaly Detection
Authors
Seo, Jungyeon; Hong, Kibeom
DOI
10.33851/JMIS.2025.12.3.87
Publication Date
2025-09
Type
Y
Journal
Journal of Multimedia Information System
Volume
12
Issue
3
Pages
87–94