Speech & Audio

Regularized Entropy Information Adaptation with Temporal-Awareness Networks for Simultaneous Speech Translation

Authors

Joseph Liu, Nameer Hirschkind, Xiao Yu, Mahesh Kumar Nandwana

Venue

Interspeech 2026

Abstract

Simultaneous Speech Translation (SimulST) requires balancing high translation quality with low latency. Recent work introduced REINA, a method that trains a Read/Write policy by estimating the information gain of reading more audio. However, we find that information-based policies often lack temporal context, which biases the policy toward reading most of the audio before starting to write. We improve REINA using two distinct strategies: a supervised alignment network (REINA-SAN) and a timestep-augmented network (REINA-TAN). Our results demonstrate that both methods significantly outperform the baseline and resolve stability issues; REINA-TAN provides a slightly superior Pareto frontier for streaming efficiency, whereas REINA-SAN is more robust against 'read loops'. Applied to Whisper, both methods improve the Pareto frontier of streaming efficiency, as measured by Normalized Streaming Efficiency (NoSE) scores, by up to 7.1% over existing competitive baselines.

Related Publications

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment

Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation

Voice Toxicity Detection Using Multi-task Learning