For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2025/07/voice-interfaces-on-a-budget-building-real-time-speech-recognition-on-low-cost-hardware-a-presentation-from-useful-sensors/
Pete Warden, CEO of Useful Sensors, presents the “Voice Interfaces on a Budget: Building Real-time Speech Recognition on Low-cost Hardware” tutorial at the May 2025 Embedded Vision Summit.
In this talk, Warden presents Moonshine, a speech-to-text model that runs roughly five times faster than OpenAI’s Whisper. Leveraging this efficiency, he shows how to build a voice interface on a low-cost, resource-constrained Cortex-A SoC using open-source tools. He also covers how to run voice activity detection as a first step before speech-to-text, avoiding false positives on non-speech noise. In addition, he demonstrates how to use Python to control speech recognition and take actions based on recognized words.
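As a rough illustration of that pipeline, the sketch below gates microphone audio with a voice activity check before handing it to a speech-to-text call, then triggers an action when a keyword appears in the transcript. The energy-threshold VAD, the `transcribe_speech()` wrapper, the keyword, and the 16 kHz capture settings are assumptions made here for illustration; the talk uses Moonshine and a dedicated VAD model, whose actual APIs are documented in the project's online code.

```python
import queue

import numpy as np
import sounddevice as sd  # third-party microphone capture: pip install sounddevice

SAMPLE_RATE = 16_000      # speech models such as Moonshine typically expect 16 kHz mono audio
CHUNK_SECONDS = 0.5       # size of each audio block pulled from the microphone
ENERGY_THRESHOLD = 0.01   # stand-in VAD: RMS level above which we treat a block as speech

audio_chunks = queue.Queue()


def capture_callback(indata, frames, time_info, status):
    """Push raw microphone blocks onto a queue for the main loop to consume."""
    audio_chunks.put(indata[:, 0].copy())


def is_speech(chunk: np.ndarray) -> bool:
    """Placeholder VAD based on signal energy; a real system would use a trained VAD model."""
    return float(np.sqrt(np.mean(chunk ** 2))) > ENERGY_THRESHOLD


def transcribe_speech(audio: np.ndarray) -> str:
    """Hypothetical wrapper around the speech-to-text model (e.g. Moonshine)."""
    # Replace with an actual call into the Moonshine package from the project repository.
    return ""


def main():
    buffered = []
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1, dtype="float32",
                        blocksize=int(SAMPLE_RATE * CHUNK_SECONDS),
                        callback=capture_callback):
        while True:
            chunk = audio_chunks.get()
            if is_speech(chunk):
                buffered.append(chunk)            # keep accumulating while speech continues
            elif buffered:
                utterance = np.concatenate(buffered)
                buffered = []
                text = transcribe_speech(utterance).lower()
                if "lights on" in text:           # act on recognized words
                    print("Action: turning lights on")
                elif text:
                    print(f"Heard: {text}")


if __name__ == "__main__":
    main()
```

Running speech-to-text only on VAD-approved audio keeps the expensive model idle most of the time, which is what makes the approach practical on a low-cost Cortex-A SoC.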
The Moonshine model’s compact size (as small as 26 MB) and high accuracy (<5% word error rate) make it ideal for embedded applications. All code and documentation are available online, allowing you to replicate the project. This presentation showcases the potential for voice-enabled interfaces on affordable hardware, enabling a wide range of innovative applications.