Multi-Noise Representation Learning for Robust Speaker Recognition
Citations

Web of Science: 3
Scopus: 2

Abstract

Speaker recognition in noisy environments remains a challenging issue due to highly variable noise, which hinders convergence to an optimal solution. To address the information discrepancies caused by noise variability during the training process, we explore a multi-modal learning scheme by treating different types of noise as distinct modalities. We propose a multi-noise representation learning method to extract embeddings that encode discriminative characteristics for each noise type, along with integrated commonalities from various types of noise. Specifically, the multi-noise learning network is jointly trained with an embedding extractor to continuously incorporate refined features under noisy conditions into the speaker embeddings. Experiments on VoxCeleb1 demonstrate that the proposed method is effective when used in conjunction with embedding extractors, outperforming state-of-the-art methods in noisy conditions.
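The idea described in the abstract, extracting noise-specific representations for each noise type alongside a shared representation of commonalities, and fusing them into one speaker embedding, can be sketched at a very high level as follows. This is only an illustrative forward pass under assumed dimensions and noise types (the `babble`/`music`/`ambient` labels, the feature sizes, and the sum-then-normalize fusion are hypothetical, not the paper's actual architecture or training objective):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions and noise "modalities" (not from the paper)
FEAT_DIM, EMB_DIM = 40, 16
NOISE_TYPES = ["babble", "music", "ambient"]

# One projection per noise type (noise-specific branch) plus one shared
# projection intended to capture commonalities across noise types.
specific = {n: rng.standard_normal((FEAT_DIM, EMB_DIM)) for n in NOISE_TYPES}
shared = rng.standard_normal((FEAT_DIM, EMB_DIM))

def embed(x, noise_type):
    """Fuse the noise-specific and shared representations into one
    speaker embedding (here: a simple sum followed by L2 normalization)."""
    z = x @ specific[noise_type] + x @ shared
    return z / np.linalg.norm(z)

x = rng.standard_normal(FEAT_DIM)   # stand-in acoustic feature vector
e = embed(x, "babble")
print(e.shape)  # (16,)
```

In the paper itself these branches are trained jointly with the embedding extractor so that the refined noise-conditioned features are continuously incorporated into the speaker embeddings; the sketch above only shows the shape of the fusion, not the learning procedure.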

Keywords

Noise; Noise measurement; Feature extraction; Training; Data mining; Speaker recognition; Distortion; Noise robustness; Correlation; Representation learning; Noisy environment; representation learning; speaker embedding; speaker recognition
Title
Multi-Noise Representation Learning for Robust Speaker Recognition
Authors
Cho, Sunyoung; Wee, Kyungchul
DOI
10.1109/LSP.2025.3530879
Publication Date
2025-01
Type
Article
Journal
IEEE Signal Processing Letters
Volume
32
Pages
681–685