Audio Source Separation from a Monaural Mixture Using Convolutional Neural Network in the Time Domain

Indexed by:Conference paper

Date of Publication:2017-01-01

Included Journals:EI, CPCI-S

Volume:10262

Page Number:388-395

Key Words:Monaural source separation; Convolutional neural network; Deep learning

Abstract:Audio source separation from a monaural mixture, termed monaural source separation, is an important and challenging problem in applications. This paper proposes a monaural source separation method that uses a convolutional neural network operating in the time domain. The proposed network, whose input and output are both time-domain signals, consists of three convolutional layers, each followed by a max-pooling layer, and two fully connected layers. Two key ideas underlie the time-domain convolutional network: first, the convolutional layers learn features automatically, so features such as spectra need not be extracted by hand; second, the phase is recovered automatically because both the input and the output are in the time domain. The proposed approach is evaluated on the TSP speech corpus for monaural source separation and achieves an SIR gain of around 4.31-7.77 dB over the deep neural network, the recurrent neural network, and nonnegative matrix factorization, while maintaining better SDR and SAR.
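The layer layout described in the abstract (three convolutional layers, each followed by max-pooling, then two fully connected layers, with a time-domain frame in and a time-domain frame out) can be illustrated with a minimal forward-pass sketch in plain Python. Everything beyond that layout is an assumption made for illustration and is not taken from the paper: the frame length, channel counts, kernel size, pooling size, hidden width, ReLU activations, random weights, the class name `TimeDomainSeparator`, and the choice to output a single source estimate per frame.

```python
import random


def conv1d(x, kernels, biases):
    """Valid 1-D convolution followed by ReLU (activation is an assumption).

    x: [in_ch][T] samples; kernels: [out_ch][in_ch][K]; biases: [out_ch].
    """
    k = len(kernels[0][0])
    t_out = len(x[0]) - k + 1
    out = []
    for w, b in zip(kernels, biases):
        row = []
        for t in range(t_out):
            s = float(b)
            for c, xc in enumerate(x):
                for j in range(k):
                    s += w[c][j] * xc[t + j]
            row.append(max(s, 0.0))
        out.append(row)
    return out


def max_pool1d(x, size):
    """Non-overlapping max-pooling along time; a trailing remainder is dropped."""
    return [[max(ch[i:i + size]) for i in range(0, len(ch) - size + 1, size)]
            for ch in x]


def dense(v, weights, biases, relu):
    """Fully connected layer: weights is [out][in], v is a flat list."""
    out = [sum(wi * vi for wi, vi in zip(w, v)) + b
           for w, b in zip(weights, biases)]
    return [max(o, 0.0) for o in out] if relu else out


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TimeDomainSeparator:
    """Hypothetical 3x(conv + max-pool) + 2xFC network; all sizes are assumed."""

    def __init__(self, frame_len=64, channels=(4, 8, 8), kernel=5, pool=2,
                 hidden=32, seed=0):
        rng = random.Random(seed)
        self.pool = pool
        self.convs = []
        in_ch, length = 1, frame_len
        for out_ch in channels:
            kernels = [rand_matrix(rng, in_ch, kernel) for _ in range(out_ch)]
            self.convs.append((kernels, [0.0] * out_ch))
            length = (length - kernel + 1) // pool  # length after conv + pool
            in_ch = out_ch
        flat = in_ch * length
        self.fc1 = (rand_matrix(rng, hidden, flat), [0.0] * hidden)
        # The output layer is linear and has frame_len units, so the network
        # emits waveform samples directly; no spectrogram or phase step.
        self.fc2 = (rand_matrix(rng, frame_len, hidden), [0.0] * frame_len)

    def forward(self, frame):
        x = [list(frame)]  # one input channel: the mixture waveform frame
        for kernels, biases in self.convs:
            x = max_pool1d(conv1d(x, kernels, biases), self.pool)
        flat = [v for ch in x for v in ch]
        h = dense(flat, *self.fc1, relu=True)
        return dense(h, *self.fc2, relu=False)


if __name__ == "__main__":
    net = TimeDomainSeparator()
    mixture_frame = [((n % 8) - 4) / 4.0 for n in range(64)]
    estimate = net.forward(mixture_frame)
    print(len(estimate))
```

With untrained random weights the output is meaningless as a separation result; the sketch only shows how a waveform frame passes through the layer types named in the abstract and comes out as a waveform frame of the same length.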
