Abstract
We propose a method for generating low-frequency compensated synthetic impulse responses that improve the performance of far-field speech recognition systems trained on artificially augmented datasets. We design linear-phase filters that adapt the simulated impulse responses to equalization distributions corresponding to real-world captured impulse responses. The filtered synthetic impulse responses are then used to augment clean speech data from the LibriSpeech dataset [1]. We evaluate our method on the real-world LibriSpeech test set. In practice, our low-frequency compensated synthetic dataset can reduce the word error rate by up to 8.8% for far-field speech recognition.
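As a rough illustration of the filtering step described in the abstract, the sketch below designs a linear-phase FIR equalization filter and applies it to a simulated impulse response. It is not the paper's implementation: the function names (`design_eq_filter`, `compensate`), the band frequencies, the gains, the tap count, and the noise-based stand-in impulse response are all hypothetical, and the sketch assumes a set of target per-band gains has already been drawn from an equalization distribution.

```python
import numpy as np
from scipy.signal import firwin2, fftconvolve


def design_eq_filter(band_freqs_hz, band_gains_db, fs=16000, numtaps=255):
    """Design a linear-phase FIR filter approximating the given band gains."""
    nyquist = fs / 2.0
    gains = 10.0 ** (np.asarray(band_gains_db, dtype=float) / 20.0)  # dB -> linear
    # firwin2 needs the frequency grid to start at 0 and end at Nyquist,
    # so the first and last band gains are extended to the edges.
    freqs = np.concatenate(([0.0], np.asarray(band_freqs_hz, dtype=float), [nyquist]))
    gains = np.concatenate(([gains[0]], gains, [gains[-1]]))
    # An odd tap count yields a symmetric (Type I) filter, i.e. linear phase.
    return firwin2(numtaps, freqs, gains, fs=fs)


def compensate(ir, taps):
    """Filter an impulse response and remove the filter's constant group delay."""
    delay = (len(taps) - 1) // 2
    filtered = fftconvolve(ir, taps, mode="full")
    return filtered[delay:delay + len(ir)]


# Hypothetical stand-in for a simulated impulse response: decaying noise.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs // 2) / fs
synthetic_ir = rng.standard_normal(t.size) * np.exp(-t / 0.05)

# Assumed target gains (not from the paper): +6 dB at 125 Hz, flat above 1 kHz.
taps = design_eq_filter([125.0, 1000.0, 4000.0], [6.0, 0.0, 0.0], fs=fs)
compensated_ir = compensate(synthetic_ir, taps)
```

Because the filter is symmetric, its delay is the same at every frequency, so trimming `(numtaps - 1) // 2` samples keeps the direct-path arrival of the compensated response aligned with the original.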
Paper
Low-frequency Compensated Synthetic Impulse Responses for Improved Far-field Speech Recognition, ICASSP 2020.
Zhenyu Tang, Hsien-Yu Meng, and Dinesh Manocha
@inproceedings{9054454,
author={Z. {Tang} and H. {Meng} and D. {Manocha}},
booktitle={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Low-Frequency Compensated Synthetic Impulse Responses For Improved Far-Field Speech Recognition},
year={2020},
volume={},
number={},
pages={6974-6978},
}