Distributed multiple speaker tracking based on time delay estimation in microphone array network

Indexed by:Journal paper

Date of Publication:2020-12-01

Journal:IET SIGNAL PROCESSING

Volume:14

Issue:9

Page Number:591-601

ISSN No.:1751-9675

Key Words:delay estimation; reverberation; microphone arrays; speaker recognition; Kalman filters; sensor fusion; distributed microphone array network; DMA; multiple speaker scenarios; ambiguous observation; noisy environments; distributed multiple speaker tracking method; time delay estimation strategy; reliable time delays; distributed Kalman filter framework

Abstract:Multiple speaker tracking in a distributed microphone array (DMA) network is a challenging task. A critical issue in multiple speaker scenarios is to distinguish each ambiguous observation and associate it with the corresponding speaker, especially in reverberant and noisy environments. To address this problem, a distributed multiple speaker tracking method based on time delay estimation in a DMA is proposed in this study. Specifically, the time delay estimated by the generalised cross-correlation function is treated as an observation. To distinguish the observation for each speaker, the possible time delays, referred to as candidates, are extracted using a data association technique. To account for ambient influence, a time delay estimation strategy is designed to calculate the time delay for each speaker from the candidates. Finally, only the reliable time delays in the DMA are propagated throughout the whole network by a diffusion fusion algorithm and used to update the speakers' states within the distributed Kalman filter framework. The proposed approach can track multiple speakers successfully in a non-centralised manner in reverberant and noisy environments. Simulation results indicate that, compared with other methods, the proposed method achieves a smaller root mean square error for multiple speaker tracking, especially in adverse conditions.
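
The abstract treats a time delay estimated from the generalised cross-correlation (GCC) function as the observation. The following is a minimal sketch of that single step only, for one microphone pair and one source. It is not the authors' implementation: the PHAT weighting, the function name `gcc_phat_delay`, and the parameters `fs` and `max_tau` are assumptions made for illustration, since the abstract does not state which GCC weighting is used. Candidate extraction, data association, diffusion fusion and the distributed Kalman filter are not covered.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate the delay in seconds of `sig` relative to `ref`.

    Hypothetical helper: GCC with PHAT weighting (an assumed choice).
    A positive result means `sig` lags behind `ref`.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)       # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)        # generalised cross-correlation

    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically plausible lags (at least 1 sample).
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs
```

In a multiple speaker setting, one would presumably keep several of the largest peaks of `cc` as candidates rather than only the maximum, which is where the data association step described in the abstract would begin.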
