Automatic Speech Processing


Overview

We add artificial reverberation to anechoic speech data to train machine learning models for automatic speech processing that are more robust to real acoustic environments. We are developing methods for automatic speech recognition, source separation and localization, binaural audio generation, and speech emotion recognition.
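As a rough illustration of this kind of augmentation, the sketch below convolves a dry signal with a room impulse response (RIR). It is a minimal sketch and not the pipeline used in the publications listed here: the function names synthetic_rir and add_reverb are hypothetical, and the RIR is assumed to be exponentially decaying noise, a crude stand-in for a simulated or measured response.

```python
import numpy as np


def synthetic_rir(rt60, sample_rate, rng):
    """Toy RIR (assumption): Gaussian noise with an exponential decay.

    The amplitude envelope drops by 60 dB at t = rt60 seconds.
    """
    length = int(rt60 * sample_rate)
    t = np.arange(length) / sample_rate
    # 60 dB amplitude decay: exp(-decay * rt60) = 10 ** -3
    decay = 3.0 * np.log(10.0) / rt60
    rir = rng.standard_normal(length) * np.exp(-decay * t)
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))


def add_reverb(dry, rir):
    """Convolve a dry signal with an RIR, keep the original length, peak-normalize."""
    wet = np.convolve(dry, rir)[: len(dry)]
    peak = np.max(np.abs(wet))
    return wet / peak if peak > 0 else wet


# Usage: white noise stands in for one second of anechoic speech at 16 kHz.
rng = np.random.default_rng(0)
dry = rng.standard_normal(16000)
wet = add_reverb(dry, synthetic_rir(rt60=0.5, sample_rate=16000, rng=rng))
```

In practice the toy RIR would be replaced by one produced by an acoustic simulator or measured in a real room; the convolution step stays the same.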


Software

  • pygsound: a Python package for impulse response generation, based on a state-of-the-art geometric sound propagation engine. The simulation is implemented in C++ and exposed to Python through pybind11.


Datasets


Publications

Project | Conference/Journal | Year
FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning | ICASSP | 2024
Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition | ICASSP | 2024
AdVerb: Visually Guided Audio Dereverberation | ICCV | 2023
MMER: Multimodal Multi-task Learning for Speech Emotion Recognition | Interspeech | 2023
UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation | ICASSP (SASB Workshop) | 2023
MAST: Multiscale Audio Spectrogram Transformers | ICASSP | 2023
SLICER: Learning universal audio representations using low-resource self-supervised pre-training | ICASSP | 2023
Towards Improved Room Impulse Response Estimation for Speech Recognition | ICASSP | 2023
Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models | Odyssey | 2022
FAST-RIR: Fast neural diffuse room impulse response generator | ICASSP | 2022
Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation | ICMI | 2021
Scene-aware Far-field Automatic Speech Recognition | arXiv | 2021
Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning | IEEE ASRU | 2021
TS-RIR: Translated synthetic room impulse responses for speech augmentation | IEEE ASRU | 2021
IR-GAN: Room Impulse Response Generator for Speech Augmentation | Interspeech | 2021
Low-frequency Compensated Synthetic Impulse Responses for Improved Far-field Speech Recognition | ICASSP | 2020
Improving Reverberant Speech Training Using Diffuse Acoustic Simulation | ICASSP | 2020
Diffraction-Aware Sound Localization for a Non-Line-of-Sight Source | ICRA | 2019
Reflection-Aware Sound Source Localization | ICRA | 2019
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks | Interspeech | 2019
Receiver placement for speech enhancement using sound propagation optimization | Applied Acoustics | 2019