Detail View
- Cho, Sunyoung
- Wee, Kyungchul
WEB OF SCIENCE: 3
SCOPUS: 2
Abstract
Speaker recognition in noisy environments remains a challenging issue due to highly variable noise, which hinders convergence to an optimal solution. To address the information discrepancies caused by noise variability during the training process, we explore a multi-modal learning scheme by treating different types of noise as distinct modalities. We propose a multi-noise representation learning method to extract embeddings that encode discriminative characteristics for each noise type, along with integrated commonalities from various types of noise. Specifically, the multi-noise learning network is jointly trained with an embedding extractor to continuously incorporate refined features under noisy conditions into the speaker embeddings. Experiments on VoxCeleb1 demonstrate that the proposed method is effective when used in conjunction with embedding extractors, outperforming state-of-the-art methods in noisy conditions.
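The abstract's core idea, fusing a noise-type-specific representation with a representation shared across noise types into one speaker embedding, can be sketched as below. This is a minimal illustrative forward pass only: the noise categories, dimensions, linear projections, and additive fusion are all assumptions for illustration, not the paper's actual architecture or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: noise categories and sizes are illustrative assumptions.
NOISE_TYPES = ["babble", "music", "white"]
FEAT_DIM, EMB_DIM = 40, 16

# One projection per noise type encodes noise-specific characteristics;
# a single shared projection encodes commonalities across all noise types.
W_specific = {n: rng.standard_normal((FEAT_DIM, EMB_DIM)) for n in NOISE_TYPES}
W_shared = rng.standard_normal((FEAT_DIM, EMB_DIM))

def multi_noise_embedding(features: np.ndarray, noise_type: str) -> np.ndarray:
    """Fuse the noise-specific view with the shared view into one embedding."""
    specific = features @ W_specific[noise_type]  # discriminative per noise type
    shared = features @ W_shared                  # integrated commonality
    emb = specific + shared
    return emb / np.linalg.norm(emb)              # length-normalised embedding

utterance_features = rng.standard_normal(FEAT_DIM)
emb = multi_noise_embedding(utterance_features, "babble")
```

In the paper the refined noisy-condition features are learned jointly with the embedding extractor; here the random projections merely stand in for those learned components.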
Keywords
- Title
- Multi-Noise Representation Learning for Robust Speaker Recognition
- Authors
- Cho, Sunyoung; Wee, Kyungchul
- Publication Date
- 2025-01
- Type
- Article
- Volume
- 32
- Pages
- 681-685