Scopus

Multi Learning on Discriminative Embedding Vector and Masking for Cocktail Party Effect

Publication year: 2026
Journal / Conference: IEEE Access
Volume: 14
Unit: ICTU
DOI / Link: https://doi.org/10.1109/ACCESS.2026.3681807

Authors

Trung-Nghia Phung; Duc-Quang Vu; Duyen Nguyen Thi

Abstract

Incorporating state-of-the-art deep learning techniques into speech processing has significantly influenced domains such as speech recognition, speech separation, audio-visual content creation, telecommunication, and hearing aid technologies. This study explores both deep learning models and learning methods for speech separation. Two distinct approaches are considered. The first involves end-to-end networks that directly estimate masks or utterances. The second employs deep clustering, a time-frequency-based voice separation framework. Deep clustering, functioning as a deep embedding approach, has demonstrated strong performance by training embedding vectors during learning and isolating them during inference. The end-to-end networks capitalize on a direct approximation of utterances or masks …
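
As an illustration of the deep clustering idea mentioned in the abstract, the sketch below shows one common formulation from the general literature: an affinity loss between embeddings and speaker labels for training, and k-means over embeddings to form binary masks at inference. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names `deep_clustering_loss` and `kmeans_masks`, the array shapes, and the choice of plain k-means are all hypothetical choices made for this example.

```python
import numpy as np


def deep_clustering_loss(V, Y):
    """Affinity loss ||V V^T - Y Y^T||_F^2 (assumed formulation, see lead-in).

    V: (N, D) embeddings, one row per time-frequency bin.
    Y: (N, C) one-hot labels marking the dominant speaker in each bin.
    The expansion below avoids building the N x N affinity matrices.
    """
    vtv = V.T @ V  # (D, D)
    vty = V.T @ Y  # (D, C)
    yty = Y.T @ Y  # (C, C)
    return float(np.sum(vtv ** 2) - 2.0 * np.sum(vty ** 2) + np.sum(yty ** 2))


def kmeans_masks(V, n_speakers, n_iter=20, seed=0):
    """Cluster embeddings with plain k-means and return (N, C) binary masks."""
    rng = np.random.default_rng(seed)
    # Initialise centres from randomly chosen embeddings (fancy indexing copies).
    centers = V[rng.choice(len(V), size=n_speakers, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every embedding to every centre.
        dist = ((V[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dist.argmin(axis=1)
        for k in range(n_speakers):
            members = V[assign == k]
            if len(members) > 0:  # leave an empty cluster's centre unchanged
                centers[k] = members.mean(axis=0)
    # One-hot encode the assignments: each bin belongs to exactly one speaker.
    return np.eye(n_speakers)[assign]
```

In this sketch each column of the returned array, reshaped to the spectrogram's time-frequency grid, would act as a binary mask for one speaker. An end-to-end network, by contrast, would predict such masks (or the utterances themselves) directly.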
