State-Space Models vs. Transformers for Ultra-Low-Power Edge AI
Tony Lewis
Chief Technology Officer
Jon Tapson
Chief Development Officer
BrainChip, Inc.
About BrainChip – Founded 2013
Design & license machine learning accelerators for ultra low-power AI
Business Model: IP Licensing
15+ years of AI architecture research & technologies
65+ data science, hardware & software engineers
Publicly traded on the Australian Stock Exchange (ASX: BRN)
Goals of This Presentation
• Analysis of computation and bandwidth in state-space models and transformers
• Establish energy- and cost-saving measures that are available ONLY to SSMs
• Efficient offline processing of context information
• Use of read-only memory (e.g., flash) to dramatically reduce power
• Conclusion
Problem Statement
• Goal:
• Achieve < 0.5 W system power
• Sub-100 ms latency for edge AI such as RAG
• Low SRAM (< 1 MB) with SoTA performance
• Why? Unlocks new cost- and power-sensitive markets
• Key challenges:
• Transformer based LLMs dominate today
• The transformer KV cache grows with context and blows up on-chip cache
• RAG uses long context length (> 1024 tokens)
• Opportunity:
• State-Space Models address the power and size issues
State-Space Model Overview (1/2)
• State-space refers to a time-domain model of coupled linear difference equations used to model physical systems.
• x is the state of the system
• u is the input; for an LLM, a real vector 1k-4k elements long
• A is a diagonal matrix. It is stable and acts as a low-pass filter with oscillations; x ← Ax on its own is a bank of decoupled filters
• Bu drives these filters; B is a mixing term
• C reads out the state
• This part is a generic State-Space Model (see the recurrence below)
[Figure: a stack of N State-Space Model layers, each followed by a non-linearity f(y)]
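In standard notation (a generic discrete-time formulation consistent with the bullets above, not BrainChip-specific), the recurrence is:

\[
x_{t+1} = A\,x_t + B\,u_t, \qquad y_t = C\,x_t
\]

with A diagonal and stable, so each component of x is an independently decaying (possibly oscillating) filter driven by the mixed input B u_t.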
State-Space Model Overview (2/2)
• Innovation in SSMs: f is a non-linearity, e.g., SiLU or ReLU
• Relation to neural networks
• B matrix is a set of input weights to individual neurons
• A gives the neurons their dynamics, similar to RNNs
• Difference from RNNs
• Because of the regular structure, this RNN can be converted to a CNN for fast training on GPUs!
• Recurrent inference is small and efficient (see the sketch below)
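For illustration only (a minimal sketch, not the TENNs implementation; the shapes and the SiLU non-linearity are assumptions), one recurrent inference step of a single SSM layer looks like this:

import numpy as np

def silu(z):
    # SiLU non-linearity: z * sigmoid(z)
    return z / (1.0 + np.exp(-z))

def ssm_layer_step(x, u, A_diag, B, C):
    # x: hidden state (n_state,); u: input vector (d_in,)
    # A_diag: diagonal of A (n_state,); B: (n_state, d_in); C: (d_out, n_state)
    x = A_diag * x + B @ u      # decoupled filters driven by the mixed input Bu
    y = silu(C @ x)             # read out the state, apply the non-linearity f
    return x, y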
Training
• Can train SSMs as convolutional networks, exploiting parallelism in GPUs
• Distillation pathway from transformers to state space model
• Distillation is a popular way of training smaller LLMs
• Start with a large LLM and use it as a teacher for the small SSM (a generic distillation loss is sketched below).
• Cross-architecture distillation from transformers to State-Space Models is being developed (Mohawk)
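As a hedged sketch of the basic distillation idea (generic soft-target logit matching; Mohawk's cross-architecture recipe adds further alignment stages that are not shown here):

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student token distributions.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)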
State-Space Model Cache Is Tiny
• Memory requirements for the BrainChip TENNs LLM, 1B parameters
• State size includes:
• States per layer: 4K
• Word size : 2 Bytes
• N Layers : 24
• Calculation for the 1B-parameter model (sizing sketch below):
• State size = 4K × 2 × 24 × … = 393 KB
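A back-of-envelope version of this calculation (a sketch; the factor of two for two 16-bit components per state element, e.g., a complex-valued or paired state, is an assumption made here to reproduce the 393 KB total, not something stated on the slide):

states_per_layer = 4 * 1024       # 4K states per layer (from the slide)
bytes_per_value = 2               # 2-byte word size (from the slide)
n_layers = 24                     # number of layers (from the slide)
components_per_state = 2          # ASSUMPTION: two components per state element

state_bytes = states_per_layer * bytes_per_value * n_layers * components_per_state
print(state_bytes)                # 393216 bytes, i.e., ~393 KB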
State-Space Models Are Markovian
• Given the current hidden state of an SSM, the future is conditionally independent of the past
• Implications for Retrieval Augmented Generation
• “Chunks” of text are retrieved for processing.
• With SSMs, preprocess the entire chunk offline and store the resulting hidden state
• Can then “seed” the state machine at query time (see the sketch below)
• Computation cost at query time is ~0 for any context length!
[Diagram: retrieve a pre-processed hidden state vs. retrieve and re-process the full text]
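A toy sketch of the idea (the ToySSM class and its step/initial_state interface are hypothetical stand-ins, not the TENNs API): process each retrieval chunk once offline, keep only the final hidden state, then seed the model with it at query time.

import numpy as np

class ToySSM:
    # Hypothetical single-layer SSM, used only to illustrate state seeding for RAG.
    def __init__(self, n_state=8, d_in=4, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.uniform(0.5, 0.99, n_state)          # stable diagonal A
        self.B = 0.1 * rng.normal(size=(n_state, d_in))
        self.C = 0.1 * rng.normal(size=(d_in, n_state))
        self.n_state = n_state

    def initial_state(self):
        return np.zeros(self.n_state)

    def step(self, state, u):
        state = self.A * state + self.B @ u
        return state, self.C @ state

def preprocess_chunk_offline(model, chunk_embeddings):
    # Offline: run the SSM over a retrieved chunk once; keep only the hidden state.
    state = model.initial_state()
    for u in chunk_embeddings:
        state, _ = model.step(state, u)
    return state                                          # store this with the chunk

def answer_query(model, cached_state, query_embeddings):
    # Online: seed the SSM with the cached state; context recompute cost is ~0.
    state, outputs = cached_state, []
    for u in query_embeddings:
        state, y = model.step(state, u)
        outputs.append(y)
    return outputs

model = ToySSM()
chunk = [np.ones(4)] * 16                                 # stand-in for chunk embeddings
seed_state = preprocess_chunk_offline(model, chunk)       # done once, offline
answers = answer_query(model, seed_state, [np.ones(4)] * 4)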
State-Space Models: Hardware Benefit
• Memory transfers can be read-only
• DDR is not needed; minimum DDR configurations draw ~2 watts and above
• Flash brings us below 0.5 watts active, with no leakage
• Compute is constant
• At 20 tokens/sec, a 1B model requires ~20 GMAC/s. At 1 pJ/MAC, compute energy for the LLM is ~20 milliwatts (arithmetic below)
• Bandwidth < 5-10 GB/s
• Time to first token < 100 ms for RAG due to caching
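The power arithmetic from the bullet above, spelled out (using the slide's 20 tokens/s, ~1 MAC per parameter per token for a 1B model, and 1 pJ/MAC):

tokens_per_s = 20
macs_per_token = 1e9              # ~1 MAC per parameter per token for a 1B model
joules_per_mac = 1e-12            # 1 pJ/MAC

compute_watts = tokens_per_s * macs_per_token * joules_per_mac
print(compute_watts * 1e3)        # 20.0 -> about 20 milliwatts for compute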
How Do Transformers Compare?
• Memory
• A 1B model such as Llama 3.2 1B must cache Key and Value terms for all layers.
• 1K tokens require an overwhelming 50 MB of cache; 50 MB > 1 MB (sizing sketch below)
• Compute: attention grows in compute and cache size as N^2.
• Memory bandwidth: must cache KV in DRAM. I/O bandwidth becomes dominated by KV reads/writes for long contexts.
• DDR means a higher minimum power floor: > 2 watts
• Computing a 1K-token input context requires trillions of MACs.
• High compute means large energy costs.
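For comparison, a generic KV-cache sizing sketch (the layer count, KV-head count, head dimension, and 16-bit precision below are assumptions for a Llama-3.2-1B-class model and may differ from those behind the ~50 MB figure above; either way the cache far exceeds a 1 MB SRAM budget and grows linearly with context):

n_layers = 16            # ASSUMED transformer depth
n_kv_heads = 8           # ASSUMED grouped-query KV heads
head_dim = 64            # ASSUMED head dimension
bytes_per_value = 2      # ASSUMED fp16/bf16 cache precision
context_tokens = 1024

# Keys and values (factor of 2) for every layer, KV head, and token.
kv_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens
print(kv_bytes / 2**20)  # tens of MiB at 1K tokens, vs. a < 1 MB SRAM target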
Transformers versus SSM
Aspect                      | Transformers                  | SSM
Research Activity           | Intense optimization efforts  | Growing interest; fewer optimization techniques
Lossy Compression           | Full context retention        | Hidden state acts as a lossy bottleneck
Computational Complexity    | O(N^2) (very poor)            | O(N)
Inference Speed             | Slower                        | Much faster
Die Area (cost)             | Very high                     | Very low
Flash Compatible (vs. DRAM) | No                            | Yes
Precompute Offline          | No                            | Yes
BrainChip TENNs 1B versus Transformer 1B
Metric                                          | TENNs 1B | Llama 3.2 1B | Comment
Perplexity (lower is better)                    | 6.3      | 13.7 (base)  | SSM shows strong possibilities for RAG applications
Teraflops, 1024 context tokens (RAG application)| 0        | 2.5          | Offline compute is a great benefit
Teraflops, additional 100 query tokens          | 0.1      | 0.25         |
MMLU                                            | 40       | 49           | Transformers excel at certain tasks
Write bandwidth, KV cache                       | 0        | 156 MB       | → Large on-chip memory or external DRAM
Read bandwidth, KV cache                        | 0        | 95 GB        | Latency reduced w/ slow mem
Summary
• State-Space Models are a viable alternative for LLMs at the extreme edge
• SSMs require only a small cache, read-only memory at low bandwidth, and low compute intensity
• Total power for a 1B design comes in under 0.5 watts for both flash access and compute
• Transformers cannot meet ultra-low-power requirements today, for the following reasons:
• Transformers require a large cache, read-write memory, and off-board DDR
• Transformers require high compute (many TOPS and many MAC units), driving up cost, power, and heat generation
What Are the Drawbacks of State-Space Models?
• Our models have better performance on metrics like perplexity than public-domain transformer models.
• Yet some tasks, like copying and in-context learning, remain difficult for SSMs.
• Transformers are the subject of intense research, with new efficiencies every day.
• The main strength and weakness of SSMs is their Markov property.
Resources
• State-Space Models: https://en.wikipedia.org/wiki/State-space_representation
• Mohawk: https://goombalab.github.io/blog/2024/distillation-part1-mohawk/
• Transformer Compute Requirements: Kaplan et al., https://arxiv.org/pdf/2001.08361
Thank You
See our demonstration in booth #716