Multi Learning on Discriminative Embedding Vector and Masking for Cocktail Party Effect
Tóm tắt
Nowadays, the incorporation of cutting-edge deep learning techniques into speech processing is regarded as groundbreaking, exerting a significant influence on various domains such as speech recognition, speech separation, audio-visual content creation, telecommunication, and hearing aid technologies. This study delves into the exploration of both deep learning models and learning methods for speech separation. Two distinct approaches are considered as the first involves end-to-end networks that directly estimate masks or utterances. In contrast, the second employs deep clustering, a time-frequency-based voice separation framework. Deep clustering, functioning as a deep embedding approach, has demonstrated remarkable performance by training embedding vectors during learning and isolating them during inference. The end-to-end networks capitalize on a direct approximation of utterances or masks …