Scopus

Multi Learning on Discriminative Embedding Vector and Masking for Cocktail Party Effect

Journal / Conference: IEEE Access Unit: ICTU DOI / Link:

Authors

Corresponding author

Abstract

The incorporation of cutting-edge deep learning techniques into speech processing is regarded as groundbreaking, with significant influence on domains such as speech recognition, speech separation, audio-visual content creation, telecommunication, and hearing aid technologies. This study explores both deep learning models and learning methods for speech separation. Two distinct approaches are considered: the first involves end-to-end networks that directly estimate masks or utterances, while the second employs deep clustering, a time-frequency-based voice separation framework. Deep clustering, functioning as a deep embedding approach, has demonstrated remarkable performance by training embedding vectors during learning and isolating them during inference. The end-to-end networks capitalize on a direct approximation of utterances or masks …