Running Accelerated CNNs on Low-Power Microcontrollers
Using Arm Ethos-U55, TensorFlow and Numpy
Kwabena W. Agyeman
President
OpenMV, LLC
What is OpenMV?
• Maker of the OpenMV Cam
  • A low-power computer vision platform
  • Integrates directly into products
  • Or can be licensed and remixed
• What we do:
  • Electrical and PCB design, manufacturing
  • High-performance firmware programming
    • Camera drivers, DMA, cache coherency, etc.
    • SIMD computer vision algorithms, etc.
• Over 100K sold and licensed
© 2025 OpenMV, LLC 2
We make it easy to build a product
Your application runs on top of what is provided:
• MicroPython
• Vector (SIMD) accelerated vision algorithms and NPU drivers
• Microcontroller support
• Camera sensors
Outline
• Market background – what’s happening with MCUs?
• Introduce the OpenMV Cam N6 and OpenMV AE3.
• Run ML workloads on microcontrollers using Numpy and TensorFlow.
• Multi-core low-power ML processing using MicroPython.
New AI microcontrollers are here
• Before:
• 600 MHz M7 CPU
• ~1.2 INT8 GOPS ML performance
• ~1 MB RAM on chip
• ~1.2 GB/s bandwidth
• ~66 MB/s FLASH access
• No MIPI CSI, ISP, NPU
Runs 224x224 YOLOv5 Nano
at 0.4 FPS @ ~0.8 W
• Now:
• 400 MHz M55 CPU
• ~204 INT8 GOPS ML performance
• ~13 MB RAM on chip
• ~3.2 GB/s bandwidth
• ~200 MB/s FLASH access
• MIPI CSI, Helium-ISP, NPU
Runs 224x224 YOLOv5 Nano
at 28 FPS @ ~0.25 W
> 200x Better
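One way to read the "> 200x" figure is as frames per second per watt, computed only from the numbers on this slide. The short calculation below is an interpretation rather than something the slide states, and the variable names are illustrative.

```python
# Figures taken from the "Before" and "Now" columns of the slide.
before_fps, before_watts = 0.4, 0.8    # 600 MHz M7 CPU, no NPU
now_fps, now_watts = 28.0, 0.25        # 400 MHz M55 CPU with NPU


def fps_per_watt(fps, watts):
    """Energy efficiency expressed as frames per second per watt."""
    return fps / watts


before_eff = fps_per_watt(before_fps, before_watts)
now_eff = fps_per_watt(now_fps, now_watts)

speedup = now_fps / before_fps            # raw frame-rate gain
efficiency_gain = now_eff / before_eff    # gain in frames per second per watt

print(f"frame rate gain: {speedup:.0f}x")
print(f"efficiency gain: {efficiency_gain:.0f}x")
```

The frame rate alone improves by roughly 70x; the gain only passes 200x once the lower power draw is taken into account.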
The market wave
• Running ~2-4 MB YOLO nano models at
30 FPS for < 1 W is now possible.
• Or ~8-10 MB YOLO small models
at 10 FPS for < 1 W.
• With deep sleep power < 1 mW
• For years of battery life in the application
• Vision AI for everything, everywhere
Introducing the OpenMV AE3
• 400 MHz SIMD CPU
• 204 GOPS NPU
• 13 MB RAM
• 32 MB FLASH
• 1 MP color global shutter
• 30 FPS, 120 FPS @ VGA
• w/ mic, ToF, accel, gyro
• USB, WiFi, BLE
• GPIO: I2C, SPI, CAN, PWM
• Full power: 60 mA @ 5 V (0.25 W)
• Deepsleep: 500 µA @ 5 V (2.5 mW)
1” x 1”
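The full-power and deep-sleep figures above can be turned into a rough battery-life estimate for a duty-cycled application. The sketch below is only an illustration: the battery capacity and the ten seconds of activity per hour are hypothetical values chosen for the example, and regulator losses and battery self-discharge are ignored.

```python
ACTIVE_W = 0.25     # full power, from the slide
SLEEP_W = 0.0025    # deep sleep (2.5 mW), from the slide


def average_power(active_seconds_per_hour):
    """Average power in watts for a device awake part of every hour."""
    duty = active_seconds_per_hour / 3600.0
    return duty * ACTIVE_W + (1.0 - duty) * SLEEP_W


def battery_life_days(battery_wh, active_seconds_per_hour):
    """Days of runtime from a battery of the given energy in watt-hours."""
    hours = battery_wh / average_power(active_seconds_per_hour)
    return hours / 24.0


# Hypothetical battery: a 3.7 V, 2000 mAh LiPo cell holds about 7.4 Wh.
battery_wh = 3.7 * 2.0
days = battery_life_days(battery_wh, active_seconds_per_hour=10)
print(f"estimated runtime: {days:.0f} days")
```

With a duty cycle this low the deep-sleep term dominates the average, which is why the sleep current matters more than the active current for battery-powered designs.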
And say hello to the OpenMV-N6
• STM32N6 MCU
• 800 MHz SIMD CPU
• 600 GOPS NPU
• 64 MB RAM @ 800 MB/s
• 32 MB FLASH @ 400 MB/s
• 1MP 120 FPS global shutter color camera
• MIPI CSI w/ ISP
• JPEG & H.264
• 10/100/1000 ethernet
• 2.4 GHz WiFi, BLE V5.2
• USB HS 480 Mb/s
• UHS-I µSD card socket (behind camera)
• Mic and user RGB LED
• IMU and user button
• 3.7 V LIPO charger
• JTAG & SWD
• Full power: 150 mA @ 5 V (0.75 W)
• Deepsleep: 1 mA @ 5 V (5 mW)
NPU Accelerated TensorFlow + NumPy Onboard =
Vector Accelerated Python Processing
There are a lot of models
The Problem
• So many vision models!
How can you quickly support one?
• Quantized models may need
tweaking too, with custom output
modifications and more!
How do you handle this?
NPU-accelerated TensorFlow Lite for Microcontrollers
OpenMV ML Framework
1. Load a model reference to execute
in place from FLASH by the NPU.
2. Create a post-processing object
which will receive the tensor
output from the model.
3. Run inference using the NPU on
image objects and post-process
them in Python with Numpy.
Accepts a list of Tensors and outputs a list
of Tensors for multi-modal inference
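The three steps above can be sketched on a desktop with plain NumPy. `StandInModel` and `TopClass` are hypothetical stand-ins written for this example and are not OpenMV's API; the model path is illustrative, and the fake `predict` below only mimics the shape of the flow (a list of tensors in, a list of tensors out, handed to a post-processing object) rather than running anything on an NPU.

```python
import numpy as np


class StandInModel:
    """Hypothetical stand-in for a model loaded by reference (step 1)."""

    def __init__(self, path, num_classes=3):
        self.path = path              # illustrative; nothing is read from disk
        self.num_classes = num_classes

    def predict(self, inputs, callback=None):
        # Accepts a list of input tensors and produces a list of output tensors.
        # Fake "inference": one score vector per input, scaled by the input mean.
        outputs = []
        for x in inputs:
            scale = float(np.mean(x))
            outputs.append(np.arange(self.num_classes, dtype=np.float32) * scale)
        return callback(self, inputs, outputs) if callback else outputs


class TopClass:
    """Post-processing object (step 2): receives the model's output tensors."""

    def __init__(self, labels):
        self.labels = labels

    def __call__(self, model, inputs, outputs):
        return [self.labels[int(np.argmax(o))] for o in outputs]


model = StandInModel("/rom/model.tflite")            # step 1
post = TopClass(["background", "person", "car"])     # step 2
image = np.ones((224, 224, 3), dtype=np.float32)     # stand-in for a camera frame
result = model.predict([image], callback=post)       # step 3
print(result)
```

Keeping the post-processing in a separate callable means a new model with a different output layout only needs a new post-processing object, not a change to the inference call.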
Post-process with Numpy on MicroPython (1/2)
ARM Helium Accelerated Numpy
1. All YOLOv5 bounding box score
outputs are thresholded at the
same time using ARM Helium
accelerated Numpy code!
2. Non-zero indices are then extracted
to produce a new array of just the
passing bounding boxes.
ARM Helium vector acceleration applied
to Numpy can be reused by all ML code.
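The two steps above can be sketched in desktop NumPy as follows. The row layout (x center, y center, width, height, objectness, then class scores) is the usual YOLOv5 output format and is an assumption here, as are the function name and the sample values; the slide's own code is not reproduced.

```python
import numpy as np


def filter_boxes(output, score_threshold):
    """Keep only the rows whose objectness score passes the threshold.

    `output` is assumed to have shape (num_boxes, 5 + num_classes) with
    columns [x_center, y_center, width, height, objectness, class scores...].
    """
    scores = output[:, 4]
    # Step 1: threshold every score at once (a vectorized comparison).
    mask = scores > score_threshold
    # Step 2: extract the indices of the passing rows and gather them.
    passing = np.nonzero(mask)[0]
    return output[passing]


output = np.array([
    [0.50, 0.50, 0.20, 0.20, 0.90, 0.1, 0.8],
    [0.10, 0.10, 0.05, 0.05, 0.05, 0.6, 0.3],
    [0.70, 0.30, 0.10, 0.40, 0.60, 0.7, 0.2],
], dtype=np.float32)

kept = filter_boxes(output, 0.5)
print(kept.shape)
```

Because the comparison runs over the whole score column in one call, the same code benefits from any vector acceleration the array library provides.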
Post-process with Numpy on MicroPython (2/2)
Finishing Up
• Numpy makes it easy to find the maximum
class score index of every bounding box row
in one line of code!
• Operations to extract the xmin, ymin, xmax,
ymax of all bounding boxes are vectorized
across all bounding box rows! As fast as C!
• Non-max suppression, which filters overlapping
bounding boxes, is implemented in Python
using Numpy too!
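A minimal desktop NumPy sketch of these three operations is shown below. It assumes the usual YOLOv5 row layout (x center, y center, width, height, objectness, then class scores) and uses a simple greedy, class-agnostic non-max suppression; the function names and sample rows are illustrative and this is not the code from the slide.

```python
import numpy as np


def decode(rows):
    """Convert center-format rows to corner boxes, scores, and class ids."""
    class_ids = np.argmax(rows[:, 5:], axis=1)   # best class per row, one line
    scores = rows[:, 4]
    half_w, half_h = rows[:, 2] / 2, rows[:, 3] / 2
    # xmin, ymin, xmax, ymax for every row at once.
    boxes = np.stack([rows[:, 0] - half_w, rows[:, 1] - half_h,
                      rows[:, 0] + half_w, rows[:, 1] + half_h], axis=1)
    return boxes, scores, class_ids


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-max suppression; returns indices of the boxes to keep."""
    order = np.argsort(scores)[::-1]             # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with all remaining boxes.
        x1 = np.maximum(boxes[best, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[best, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[best, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.maximum(0.0, x2 - x1) * np.maximum(0.0, y2 - y1)
        iou = inter / (areas[best] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]       # drop boxes that overlap too much
    return keep


rows = np.array([
    [0.50, 0.50, 0.20, 0.20, 0.90, 0.1, 0.8],
    [0.51, 0.50, 0.20, 0.20, 0.80, 0.2, 0.7],    # overlaps the first box
    [0.20, 0.20, 0.10, 0.10, 0.70, 0.9, 0.1],
], dtype=np.float32)

boxes, scores, class_ids = decode(rows)
keep = nms(boxes, scores)
print(keep, class_ids[keep])
```

Only the outer loop over surviving boxes runs in Python; the overlap test against all remaining boxes is a handful of array operations per iteration.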
Multi-core processing in MicroPython
using OpenAMP on the OpenMV AE3
Easy-to-use multi-core programming using OpenAMP
The dream
1. High-efficiency core runs AI
model on Mic/IMU samples
2. Wake up high-performance core
on detection to process images
3. Transmit any detections to the
cloud and go back to sleep
One Python script, two processors, two MicroPython VMs
What we’ve done
1. A Python function decorator specifies which
asyncio co-routines run on the low-power
core.
2. A callback running on the main core receives
messages from each asyncio co-routine.
• The low-power core can run multiple asyncio
co-routines connected to multiple callbacks.
3. Main core starts the low-power core and enters
its own main loop.
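The pattern described above can be imitated on a desktop with standard `asyncio`. Everything in this sketch is hypothetical: the `low_power` decorator, the `send` helper, and the queue standing in for the inter-core channel are made up for illustration, and both "cores" are simply tasks in one Python process rather than two MicroPython VMs.

```python
import asyncio

LOW_POWER_TASKS = []   # (coroutine function, main-core callback) pairs


def low_power(callback):
    """Hypothetical decorator: register a coroutine for the low-power core
    together with the main-core callback that receives its messages."""
    def register(coro_fn):
        LOW_POWER_TASKS.append((coro_fn, callback))
        return coro_fn
    return register


received = []


def on_audio(message):
    received.append(("audio", message))


def on_motion(message):
    received.append(("motion", message))


@low_power(on_audio)
async def audio_task(send):
    await send("keyword")


@low_power(on_motion)
async def motion_task(send):
    await send("tap")


async def run_low_power_core(queue):
    """Stand-in for the second core: run every registered coroutine."""
    async def run_one(index, coro_fn):
        async def send(message):
            await queue.put((index, message))
        await coro_fn(send)

    await asyncio.gather(*(run_one(i, fn)
                           for i, (fn, _) in enumerate(LOW_POWER_TASKS)))
    await queue.put(None)   # sentinel: no more messages in this demo


async def main():
    queue = asyncio.Queue()   # stands in for the inter-core message channel
    core = asyncio.create_task(run_low_power_core(queue))  # "start" the core
    while True:               # the main core's own loop
        item = await queue.get()
        if item is None:
            break
        index, message = item
        LOW_POWER_TASKS[index][1](message)   # dispatch to the matching callback
    await core


asyncio.run(main())
print(received)
```

Pairing each coroutine with its callback at decoration time keeps the whole application in one script, which is the property the slide emphasizes.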
A processor and NPU for audio detection
46 GOPS available for a Wake Word Detector
1. The low-power core has its own MicroPython VM,
stack, heap, 46 GOPS NPU, and mic.
2. The low-power core runs the Google MicroSpeech
model to detect a keyword like “OK Google”.
3. The low-power core sends any detected label strings
to the main core via the OpenAMP endpoint
“ept”.
Which triggers NPU image processing
204 GOPS available for an Object Detector
1. The main core loads a reference to the YOLOv5
Nano 224x224 model from ROM to execute in place.
2. The main core wakes up when the low-power core
sends a wake word.
3. If the wake word is “OK Google”, the main core takes
a picture, runs YOLOv5 on it to detect objects, and
transmits the results.
4. The main core then goes back to sleep.
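The gating logic in steps 2 to 4 can be sketched as a small function. This is a desktop illustration under stated assumptions: `handle_label` and its `capture`, `detect`, and `transmit` parameters are hypothetical placeholders supplied by the caller, and the labels are made-up sample data rather than real model output.

```python
WAKE_WORD = "ok google"


def handle_label(label, capture, detect, transmit):
    """Run the image pipeline only when the wake word arrives.

    `capture`, `detect`, and `transmit` are caller-supplied functions.
    Returns True if the pipeline ran, False if the label was ignored.
    """
    if label.strip().lower() != WAKE_WORD:
        return False              # any other label: stay asleep
    frame = capture()             # take a picture
    detections = detect(frame)    # run the object detector on it
    transmit(detections)          # send the results
    return True                   # the caller can now go back to sleep


sent = []
labels_from_low_power_core = ["silence", "unknown", "OK Google"]
ran = [handle_label(label,
                    capture=lambda: "frame-0",
                    detect=lambda frame: [("person", 0.9)],
                    transmit=sent.append)
       for label in labels_from_low_power_core]
print(ran, sent)
```

Passing the three stages in as functions keeps the wake-word check testable without a camera, a model, or a network connection.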
What will you create?
• The OpenMV AE3
• 1x 400 MHz Cortex-M55 w/ 204 GOPS NPU
• 1x 160 MHz Cortex-M55 w/ 46 GOPS NPU
• Five sensors:
• 1MP color global shutter camera
• 8x8 400 cm ToF distance sensor
• Accelerometer/gyroscope
• Microphone
• The accelerometer, gyroscope, and microphone remain accessible to
the low-power core while the main core is in lightsleep().
Resources
OpenMV Website
https://openmv.io
OpenMV N6 Product Page
https://openmv.io/collections/cameras/products/openmv-n6
OpenMV AE3 Product Page
https://openmv.io/collections/cameras/products/openmv-ae3
Visit us at Booth #909


“Running Accelerated CNNs on Low-power Microcontrollers Using Arm Ethos-U55, TensorFlow and Numpy,” a Presentation from OpenMV
