Improved hybrid streaming ASR with transformer language models

P Baquero-Arnal, J Jorge-Cano, A Giménez Pastor… - 2020 - riunet.upv.es
Streaming ASR is gaining momentum due to its wide applicability, though it remains unclear how best to approach the accuracy of state-of-the-art off-line ASR systems when the output must be delivered within a short delay of the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements obtained by replacing the LSTMs with Transformer language models. First, two key ideas for running these models fast during inference are discussed. Then, empirical results on LibriSpeech and TED-LIUM are provided, showing that Transformer language models lead to improved recognition rates on both tasks. The ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.