For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2025/07/voice-interfaces-on-a-budget-building-real-time-speech-recognition-on-low-cost-hardware-a-presentation-from-useful-sensors/
Pete Warden, CEO of Useful Sensors, presents the “Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-cost Hardware” tutorial at the May 2025 Embedded Vision Summit.
In this talk, Warden presents Moonshine, a speech-to-text model that runs roughly five times faster than OpenAI’s Whisper. Leveraging this efficiency, he shows how to build a voice interface on a low-cost, resource-constrained Cortex-A SoC using open-source tools. He also covers how to run voice activity detection as a first step before speech-to-text, avoiding false positives on non-speech noise. In addition, he demonstrates how to use Python to control speech recognition and take actions based on recognized words.
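As a rough illustration of that pipeline, the sketch below gates microphone audio with a voice activity check before handing it to a speech-to-text call, then triggers an action when a keyword appears in the transcript. The energy-threshold VAD, the `transcribe_speech()` wrapper, the keyword, and the 16 kHz capture settings are assumptions made here for illustration; the talk uses Moonshine and a dedicated VAD model, whose actual APIs are documented in the project's online code.

```python
import queue

import numpy as np
import sounddevice as sd  # third-party microphone capture: pip install sounddevice

SAMPLE_RATE = 16_000      # speech models such as Moonshine typically expect 16 kHz mono audio
CHUNK_SECONDS = 0.5       # size of each audio block pulled from the microphone
ENERGY_THRESHOLD = 0.01   # stand-in VAD: RMS level above which we treat a block as speech

audio_chunks = queue.Queue()


def capture_callback(indata, frames, time_info, status):
    """Push raw microphone blocks onto a queue for the main loop to consume."""
    audio_chunks.put(indata[:, 0].copy())


def is_speech(chunk: np.ndarray) -> bool:
    """Placeholder VAD based on signal energy; a real system would use a trained VAD model."""
    return float(np.sqrt(np.mean(chunk ** 2))) > ENERGY_THRESHOLD


def transcribe_speech(audio: np.ndarray) -> str:
    """Hypothetical wrapper around the speech-to-text model (e.g. Moonshine)."""
    # Replace with an actual call into the Moonshine package from the project repository.
    return ""


def main():
    buffered = []
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
                        blocksize=int(SAMPLE_RATE * CHUNK_SECONDS),
                        callback=capture_callback):
        while True:
            chunk = audio_chunks.get()
            if is_speech(chunk):
                buffered.append(chunk)            # keep accumulating while speech continues
            elif buffered:
                utterance = np.concatenate(buffered)
                buffered = []
                text = transcribe_speech(utterance).lower()
                if "lights on" in text:           # act on recognized words
                    print("Action: turning lights on")
                elif text:
                    print(f"Heard: {text}")


if __name__ == "__main__":
    main()
```

Running speech-to-text only on VAD-approved audio keeps the expensive model idle most of the time, which is what makes the approach practical on a low-cost Cortex-A SoC.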
The Moonshine model’s compact size (as small as 26 MB) and high accuracy (<5% word error rate) make it ideal for embedded applications. All code and documentation are available online, allowing you to replicate the project. This presentation showcases the potential for voice-enabled interfaces on affordable hardware, enabling a wide range of innovative applications.