Voice Interfaces on a Budget:
Building Real-Time Speech
Recognition on Low-Cost
Hardware
Pete Warden
CEO
Useful Sensors
What Is This Talk About?
• Voice interfaces in the past:
  • Cost $$$
  • Only available to big tech companies
  • Required specialists
  • Took years to build
• Voice interfaces now:
  • Open source
  • Available to everyone
  • Usable by any software engineer
© 2025 Useful Sensors Inc 2
What You’ll Leave With
• How to build a simple voice app
• Running on low-cost hardware (Raspberry Pi)
• Without a cloud API or network connectivity
• Using open-source, freely available software and models
Does Anyone Use Voice Interfaces?
• Siri, Alexa, OK Google?
  • No, except for timers and “Baby Shark”
• However, a thought experiment:
  • You’re next to your significant other on a couch
  • Do you text them to decide what show to watch?
• So, we do like voice interfaces, but the current ones aren’t good enough
What Do You Need to Start?
• Cortex-A CPU or equivalent
  • MCUs soon, hopefully
• Open-source frameworks
• Open-weights speech recognition models
Speech Recognition Models
• OpenAI’s Whisper
  • First production-quality open-weights ASR model
  • Smallest version is 40 million parameters
  • Can run on APUs, but hard to run in real time
  • Always processes 30 seconds of audio at once, which is very wasteful for interactive use cases
• Useful Sensors’ Moonshine
  • Open weights
  • Matches Whisper’s accuracy at the tiny and base model sizes
  • Smallest version is 26 million parameters
  • Able to run well on most modern APUs
  • Flexible input window, so you only compute what you need
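To make the fixed-window point concrete, here is a rough sketch that compares how much audio each approach has to process for a short command. The 30-second window and Whisper's 16 kHz input rate come from Whisper itself. Using the same rate for the flexible-window case is an assumption, the function names are made up for illustration, and input size is only a proxy for compute, not a benchmark.

```python
import math

SAMPLE_RATE_HZ = 16_000   # Whisper's input rate; assumed here for the flexible case too
FIXED_WINDOW_S = 30.0     # a fixed-window model always sees whole 30-second windows


def fixed_window_samples(utterance_s: float) -> int:
    """Samples processed when audio is padded up to whole 30-second windows."""
    if utterance_s <= 0:
        raise ValueError("utterance length must be positive")
    windows = math.ceil(utterance_s / FIXED_WINDOW_S)
    return int(windows * FIXED_WINDOW_S * SAMPLE_RATE_HZ)


def flexible_window_samples(utterance_s: float) -> int:
    """Samples processed when the model accepts exactly the audio captured."""
    if utterance_s <= 0:
        raise ValueError("utterance length must be positive")
    return int(round(utterance_s * SAMPLE_RATE_HZ))


def wasted_fraction(utterance_s: float) -> float:
    """Share of the fixed-window input that is padding rather than speech."""
    fixed = fixed_window_samples(utterance_s)
    return 1.0 - flexible_window_samples(utterance_s) / fixed


if __name__ == "__main__":
    for seconds in (1.0, 3.0, 10.0):
        print(f"{seconds:>4.1f} s command: {wasted_fraction(seconds):.0%} padding")
```

For a three-second command, nine tenths of the fixed window is padding, which is why a flexible input window matters for interactive use.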
Moonshine Options
• Tiny version:
  • 26 million parameters
  • Word error rate of 4.51%
• Base version:
  • 52 million parameters
  • Word error rate of 3.29%
• Tutorial uses Tiny
• Many frameworks supported:
  • PyTorch
  • Keras
  • TensorFlow
  • ONNX
• We’re using ONNX
• Quantized versions available:
  • 26 MB / 56 MB file sizes
  • 1.6x faster than float
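For readers new to the metric, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. The slide does not say which test set or text normalization produced the figures above, so the lowercasing and whitespace splitting here are assumptions, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = cur[j - 1] + 1    # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("turn on the kitchen light", "turn on kitchen lights")
    print(f"WER: {wer:.0%}")
```

In the usage example, one dropped word and one substituted word against a five-word reference give a word error rate of 40%.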
Tutorial
Now I’ll show you how to run an interactive speech application using Moonshine on a Raspberry Pi 5.
Based on material from my Stanford EE292D Edge AI course: https://github.com/ee292d/labs/tree/main/lab4
You’ll need a Pi 5, some way to connect to it, and a USB microphone.
Live Coding Demo
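The slides do not reproduce the demo code, which lives in the EE292D lab linked in the tutorial slide. As a rough idea of the shape an interactive loop can take, the sketch below groups microphone frames into utterances using a simple loudness threshold and hands each finished utterance to a speech model. Everything here is hypothetical rather than taken from the lab: the threshold, the silence length, and the function names are illustrative, and `transcribe` is a stand-in callable that you would replace with a real model call.

```python
import math
from typing import Callable, Iterable, Iterator, List

Frame = List[float]  # mono samples in the range [-1.0, 1.0]

SILENCE_RMS = 0.01       # hypothetical loudness threshold
END_SILENCE_FRAMES = 5   # hypothetical: this many quiet frames end an utterance


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def utterances(frames: Iterable[Frame]) -> Iterator[Frame]:
    """Yield runs of speech, split wherever a long enough silence occurs."""
    speech: Frame = []
    pending: List[Frame] = []  # quiet frames that may turn out to be a short pause
    for frame in frames:
        if rms(frame) >= SILENCE_RMS:
            for quiet in pending:  # the pause was short, so keep it in the utterance
                speech.extend(quiet)
            pending = []
            speech.extend(frame)
        elif speech:
            pending.append(frame)
            if len(pending) >= END_SILENCE_FRAMES:
                yield speech  # trailing silence is dropped
                speech, pending = [], []
    if speech:
        yield speech


def transcribe_stream(
    frames: Iterable[Frame], transcribe: Callable[[Frame], str]
) -> Iterator[str]:
    """Run the stand-in `transcribe` callable on each detected utterance."""
    for audio in utterances(frames):
        yield transcribe(audio)
```

Because the model only sees the audio between silences, this pattern pairs naturally with a flexible input window. Real applications typically use a trained voice activity detector instead of a fixed threshold.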
Next Steps
• How can you take action based on speech?
  • Plain old string matching can work for simple uses
  • Recognizing a natural speaking style needs speech-to-intent
  • Still a research problem
  • https://github.com/AIWintermuteAI/Speech-to-Intent-Micro
• What about text to speech?
  • Speakers are a cheaper alternative to displays
  • PiperTTS is very efficient, runs on Raspberry Pis, and sounds good
  • Hyper-realistic models are emerging, but they use a lot of resources and won’t work on a Pi (yet)
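The string-matching option mentioned under the first question can be as small as the sketch below: normalize the transcript, then look for known phrases. The command names and phrases are invented for illustration and do not come from the talk. This approach breaks down as soon as people phrase requests freely, which is where speech-to-intent comes in.

```python
import re
from typing import Optional

# Hypothetical command table: action name -> phrases that trigger it.
COMMANDS = {
    "lights_on": ("turn on the light", "lights on"),
    "lights_off": ("turn off the light", "lights off"),
    "set_timer": ("set a timer", "start a timer"),
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9' ]+", " ", text.lower())
    return " ".join(text.split())


def match_command(transcript: str) -> Optional[str]:
    """Return the first command whose phrase appears in the transcript."""
    cleaned = normalize(transcript)
    for name, phrases in COMMANDS.items():
        if any(phrase in cleaned for phrase in phrases):
            return name
    return None


if __name__ == "__main__":
    print(match_command("Please turn on the lights!"))
    print(match_command("What's the weather like?"))
```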
Conclusion
• It’s never been easier to build a voice-driven product
• It’s still early days for voice, so don’t write it off just because Siri isn’t popular
• You’ve got this!
Resources
Whisper: https://openai.com/index/whisper/
Moonshine: https://github.com/usefulsensors/moonshine
PiperTTS: https://piper.ttstool.com/
Speech-to-Intent-Micro: https://github.com/AIWintermuteAI/Speech-to-Intent-Micro
EE292D Tutorial: https://github.com/ee292d/labs/tree/main/lab4
Me: pete@usefulsensors.com
