Detail View
- Zhang, Wenzhu;
- Kim, Byung-Gyu
WEB OF SCIENCE: 0
SCOPUS: 0
Abstract
Audio production separation, the task of extracting individual sound sources from a mixture signal, has numerous applications in audio processing, audio remixing, and hearing aids. However, most existing methods only utilize spectral information and neglect spatial cues available in multi-microphone setups, limiting their performance. This paper proposes a novel audio production separation algorithm combining hyperdirectional beamforming and a long short-term memory (LSTM) network to exploit spatial and spectral information for efficient multi-speaker audio production separation. The hyperdirectional beamforming enhances target audio signals from desired directions while suppressing interference. The enhanced signals are processed by an LSTM network that predicts time-frequency masks for separating individual sources using a multi-task learning objective. Extensive experiments on simulated and real-world datasets demonstrate the superiority of the proposed algorithm over benchmark algorithms in terms of objective metrics across various acoustic conditions. Subjective listening tests with human participants further validate the proposed algorithm's improved perceptual quality and intelligibility. An ablation study highlights the importance of both the hyperdirectional beamforming and LSTM components, as well as their synergistic effect. The proposed algorithm offers a practical approach for exploiting spatial and spectral information in multi-speaker audio production separation, with potential applications in teleconferencing, hearing aids, and audio signal processing.
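The abstract's spatial front end steers a beam toward a desired direction before the LSTM estimates time-frequency masks. The paper's hyperdirectional beamformer is not specified in this record, so as a hedged illustration of the steering principle only, here is a minimal delay-and-sum beamformer for a uniform linear array; the function names (`steering_delays`, `delay_and_sum`), the 5 cm mic spacing, and the sample rate are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def steering_delays(n_mics, spacing, angle_deg, c=343.0):
    # Far-field arrival delays (seconds) at each mic of a uniform linear
    # array, relative to mic 0, for a source at angle_deg from broadside.
    positions = np.arange(n_mics) * spacing
    return positions * np.sin(np.deg2rad(angle_deg)) / c

def delay_and_sum(frames, fs, angle_deg, spacing=0.05):
    # frames: (n_mics, n_samples) multichannel frame.
    # Phase-align each channel to the steering direction in the frequency
    # domain, then average, so signals from angle_deg add coherently while
    # off-axis interference is attenuated.
    n_mics, n = frames.shape
    delays = steering_delays(n_mics, spacing, angle_deg)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.fft.rfft(frames, axis=1)
    phase = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft((spec * phase).mean(axis=0), n=n)
```

Steering the beam toward a simulated 4 kHz tone arriving from 60° recovers it nearly intact, while steering toward 0° leaves the raw inter-mic phase offsets uncorrected and the channels largely cancel; the paper's pipeline would feed such beamformed signals to the LSTM mask estimator.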
Keywords
- Title
- Machine learning-based efficient audio production separation method
- Authors
- Zhang, Wenzhu; Kim, Byung-Gyu
- Publication Date
- 2025-11
- Type
- Article; Early Access
- Volume
- 16
- Issue
- 11
- Pages
- 8603–8616