Speech Recognition and Localization


Overview

Artificial reverberation is commonly added to anechoic speech data to train more robust speech-related machine learning models. We propose a novel geometric acoustic simulation method that models real-world speech more accurately and significantly reduces the performance gap between training on synthetic data and testing on real-world data. We have observed benefits in tasks including automatic speech recognition (ASR), wake-up-word (WUW) spotting, and speech localization.
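As a rough illustration of the augmentation step described above, reverberant training data can be produced by convolving a dry signal with a room impulse response. The sketch below is a minimal example under stated assumptions: the function name add_reverb, the 16 kHz sample rate, the toy three-tap impulse response, and the sine tone standing in for anechoic speech are all hypothetical. In practice the impulse response would come from an acoustic simulator or a measurement.

```python
import numpy as np

def add_reverb(dry, rir):
    """Convolve a dry (anechoic) signal with a room impulse response."""
    # Full convolution, truncated so the output keeps the input length.
    wet = np.convolve(dry, rir)[: len(dry)]
    # Peak-normalize to avoid clipping when the result is saved as audio.
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet

fs = 16000  # sample rate in Hz (assumed)

# Hypothetical toy impulse response: direct path plus two weaker reflections.
rir = np.zeros(fs // 10)
rir[0] = 1.0      # direct sound
rir[400] = 0.5    # reflection arriving 25 ms later
rir[1200] = 0.25  # reflection arriving 75 ms later

# Stand-in for anechoic speech: a 1 s, 440 Hz tone.
t = np.arange(fs) / fs
dry = np.sin(2 * np.pi * 440 * t)

wet = add_reverb(dry, rir)
```

The reverberant signal wet has the same length as dry and can replace or supplement it in a training set. Swapping in a simulated impulse response changes only the rir argument.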


Software

  • pygsound: a Python package for impulse response generation, built on a state-of-the-art geometric sound propagation engine. The simulation is implemented in C++ and exposed to Python through pybind11.


Datasets


Publications

Project | Conference/Journal | Year
FAST-RIR: Fast neural diffuse room impulse response generator | arXiv | 2021
Scene-aware Far-field Automatic Speech Recognition | arXiv | 2021
Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning | IEEE ASRU | 2021
TS-RIR: Translated synthetic room impulse responses for speech augmentation | IEEE ASRU | 2021
IR-GAN: Room Impulse Response Generator for Speech Augmentation | Interspeech | 2021
Low-frequency Compensated Synthetic Impulse Responses for Improved Far-field Speech Recognition | ICASSP | 2020
Improving Reverberant Speech Training Using Diffuse Acoustic Simulation | ICASSP | 2020
Diffraction-Aware Sound Localization for a Non-Line-of-Sight Source | ICRA | 2019
Reflection-Aware Sound Source Localization | ICRA | 2019
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks | Interspeech | 2019
Receiver placement for speech enhancement using sound propagation optimization | Applied Acoustics | 2019