Automatic Speech Processing


We add artificial reverberation to anechoic speech data to train machine learning models for automatic speech processing that are more robust in reverberant environments. We are developing methods for automatic speech recognition, source separation and localization, binaural audio generation, and speech emotion recognition.
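At its core, the augmentation step described above convolves a dry recording with a room impulse response (RIR). The sketch below shows that operation in NumPy. The function names `reverberate` and `toy_rir` are hypothetical. The exponentially decaying noise RIR is a placeholder standing in for a simulated or measured response, and it is not the group's actual pipeline.

```python
import numpy as np


def reverberate(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve dry (anechoic) speech with a room impulse response (RIR).

    The output is trimmed to the input length so that labels aligned with
    the clean signal still line up with the reverberant one.
    """
    wet = np.convolve(speech, rir)[: len(speech)]
    # Rescale so the reverberant signal has the same peak level as the input.
    peak_in = np.max(np.abs(speech))
    peak_out = np.max(np.abs(wet))
    if peak_out > 0:
        wet = wet * (peak_in / peak_out)
    return wet


def toy_rir(fs: int = 16000, rt60: float = 0.4, seed: int = 0) -> np.ndarray:
    """Hypothetical placeholder RIR: exponentially decaying white noise.

    A real augmentation pipeline would use a simulated or measured RIR.
    """
    rng = np.random.default_rng(seed)
    n = int(rt60 * fs)
    t = np.arange(n) / fs
    # A 60 dB energy drop over rt60 seconds is a 10^-3 drop in amplitude.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


if __name__ == "__main__":
    fs = 16000
    speech = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)  # 1 s stand-in for speech
    wet = reverberate(speech, toy_rir(fs))
    print(wet.shape)
```

This sketch makes two choices that other pipelines often make differently. It trims the output to the input length, where some pipelines keep the reverberant tail. It matches peak levels, where some pipelines normalize by RMS energy instead.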


  • pygsound: a Python package for impulse response generation based on a state-of-the-art geometric sound propagation engine. The simulation is implemented in C++, with Python bindings provided through pybind11.
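The sketch below does not show pygsound's own API. It is a rough illustration of what geometric impulse response generation computes, using a simplified image-source model for an empty shoebox room. That method is different from, and much cruder than, the engine described above. In the model, each mirror image of the source contributes one delayed impulse. Each impulse is attenuated by spherical spreading and by a reflection coefficient for every wall bounce. The following are all assumptions for illustration: the names `shoebox_rir` and `_axis_images`, the single coefficient `beta` shared by all walls, and the per-axis image order.

```python
import itertools

import numpy as np


def _axis_images(src: float, length: float, order: int):
    """Image coordinates along one axis, each with its wall-reflection count."""
    out = []
    for k in range(-order, order + 1):
        out.append((2 * k * length + src, abs(2 * k)))      # even number of bounces
        out.append((2 * k * length - src, abs(2 * k - 1)))  # odd number of bounces
    return out


def shoebox_rir(room, src, mic, fs=16000, beta=0.8, order=6, c=343.0):
    """Toy image-source RIR for a rectangular (shoebox) room.

    room, src, mic: (x, y, z) in metres. beta: one hypothetical reflection
    coefficient for all six walls. order: largest image index k per axis.
    """
    images = [_axis_images(src[i], room[i], order) for i in range(3)]
    mic = np.asarray(mic, dtype=float)
    # Upper bound on any image-to-microphone distance, used to size the buffer.
    max_dist = np.linalg.norm(2 * (order + 1) * np.asarray(room, dtype=float))
    rir = np.zeros(int(np.ceil(max_dist / c * fs)) + 1)
    for (x, nx), (y, ny), (z, nz) in itertools.product(*images):
        d = np.linalg.norm(np.array([x, y, z]) - mic)
        # Spherical spreading 1/(4*pi*d) times one factor of beta per bounce.
        gain = beta ** (nx + ny + nz) / (4 * np.pi * max(d, 1e-3))
        n = int(round(d / c * fs))  # arrival time rounded to the nearest sample
        rir[n] += gain
    return rir


if __name__ == "__main__":
    h = shoebox_rir(room=(5.0, 4.0, 3.0), src=(1.0, 1.5, 1.2), mic=(3.5, 2.0, 1.5))
    print(len(h), np.flatnonzero(h)[0])  # buffer length, direct-path sample index
```

This model captures only specular reflections in an empty box with frequency-independent walls. It ignores the diffuse reflections and scene geometry that a full geometric propagation engine is designed to handle.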



Publication | Venue | Year
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models | Odyssey | 2022
FAST-RIR: Fast neural diffuse room impulse response generator | ICASSP | 2022
Binaural Audio Generation via Multi-Task Learning | SIGGRAPH Asia | 2021
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation | ICMI | 2021
Scene-aware Far-field Automatic Speech Recognition | arXiv | 2021
Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning | IEEE ASRU | 2021
TS-RIR: Translated synthetic room impulse responses for speech augmentation | IEEE ASRU | 2021
IR-GAN: Room Impulse Response Generator for Speech Augmentation | Interspeech | 2021
Low-frequency Compensated Synthetic Impulse Responses for Improved Far-field Speech Recognition | ICASSP | 2020
Improving Reverberant Speech Training Using Diffuse Acoustic Simulation | ICASSP | 2020
Diffraction-Aware Sound Localization for a Non-Line-of-Sight Source | ICRA | 2019
Reflection-Aware Sound Source Localization | ICRA | 2019
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks | Interspeech | 2019
Receiver placement for speech enhancement using sound propagation optimization | Applied Acoustics | 2019