Chris Jones
Director Product Management
BrainChip Inc.
Temporal Event Neural Networks: A More Efficient Alternative to the Transformer
BrainChip AI – At a Glance
• First to commercialize a neuromorphic IP platform and reference chip
• 15+ years of fundamental research
• 65+ data science, hardware & software engineers
• Publicly traded on the Australian Securities Exchange (ASX: BRN)
• 10 customers – Early Access, Proof of Concept, IP License
Products: IP, reference SoC, software tools, Edge Box*
Trusted by partners
*Fulfillment through VVDN Technologies
©2024 BrainChip Inc.
Key Focal Areas
• Provide a path to run complex models on the edge
• Reduce the cost of training
• Reduce the cost of inference
Temporal Event Neural Networks (TENNs)
Change the Game
Unleash unprecedented edge devices with one-dimensional streaming data:
• Up to 5,000x more energy efficient
• Up to 50x fewer parameters
• Same or better accuracy
• 10-30x lower training cost vs. GPT-2
TENNs Application Areas
1. Multi-dimensional streaming requiring spatiotemporal integration (3D):
• Video object detection – frames are correlated in time
• Action recognition – classifying an action across many frames
• Video frame prediction – path prediction & planning
2. Sequence classification and generation in time:
• Raw audio classification: keyword spotting without MFCC preprocessing
• Audio denoising: generate contextual denoising
• ASR and GenAI: compressing LLMs
3. Any other sequence classification or prediction algorithm:
• Healthcare: vital signs estimation
• Anything that can be transformed into a time-series/sequence prediction problem
Benchmarks – spatiotemporal integration: Kinetics400, KITTI; sequence classification & generation: BIDMC Vital Signs, SC10 Raw Audio, Microsoft DNS Challenge
Improve Video Object Detection

Frame-based camera comparison (vs. SimCLR + ResNet50 on the KITTI 2D dataset**; resolution 1382 x 512):

| Network | mAP (%) | Parameters (millions) | MACs/sec (billions) |
|---|---|---|---|
| Akida TENN* + CenterNet | 57.6 | 0.57 | 18 |

Equivalent precision, 50x fewer parameters, 5x fewer operations; < 20 mW for 30 FPS in 7 nm***.

Event-based camera comparison (vs. Gray RetinaNet on the Prophesee road object dataset*; resolution 1280 x 720):

| Network | mAP (%) | Parameters (millions) | MACs/sec (billions) |
|---|---|---|---|
| Akida TENN* + CenterNet | 56 | 0.57 | 94 |

30% better precision, 50x fewer parameters, 30x fewer operations.

* Gray RetinaNet is the latest state of the art in event-camera object detection
** SimCLR with a ResNet50 backbone is the benchmark in object detection (source: SimCLR review)
*** Estimates for Akida neural processing scaled from 28 nm
TENN Can Be Extended to Spatio-Temporal Data

DVS hand gesture recognition (IBM DVS128 dataset) vs. the state of the art:

| Network | Accuracy (%) | Parameters | MACs (billions)/sec | Latency* (ms) |
|---|---|---|---|---|
| TrueNorth-CNN | 96.5 | 18 M | - | 155 |
| Loihi-Slayer | 93.6 | - | - | 1450 |
| ANN-Rollouts | 97.0 | 500 k | 10.4 | 1500 |
| TA-SNN | 98.6 | - | - | 1500 |
| Akida-CNN | 95.2 | 138 k | 0.12 | 200 |
| TENN-Fast | 97.6 | 192 k | 0.429 | 105 |
| TENN | 100.0 | 192 k | 0.499 | 510 |
Enhance Raw Audio and Speech Processing
Task: Audio Denoising

Comparison of TENN versus the state of the art:

| Model | DeepFilterNet V1 | TENN | DeepFilterNet V2 | DeepFilterNet V3 |
|---|---|---|---|---|
| PESQ | 2.49 | 2.61 | 2.67 | 2.68 |
| Params (relative to TENN) | 2.98 | 1 | 3.86 | 3.56 |
| MACs (relative to TENN) | 11.7 | 1 | 12.1 | 11.5 |

Traditional denoising model approach: STFT → Conv1D/LSTM/GRU → iSTFT. The STFT/iSTFT stages can potentially consume 50%+ of total power.
TENNs model approach: the STFT/iSTFT overhead and BOM are not needed with TENNs.

• Audio denoising isolates a voice signal obscured by background noise
• The traditional approach employs a computationally intensive time-domain to frequency-domain transform and its inverse
• The TENNs approach avoids these expensive data transformations
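As a point of reference, the transform overhead the traditional pipeline carries can be sketched in a few lines of Python. This is purely illustrative (scipy-based, with the denoising network itself elided), not BrainChip code:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)                     # stand-in for a noisy waveform

# Traditional approach: waveform -> STFT -> (denoising net) -> iSTFT -> waveform
_, _, Z = signal.stft(x, nperseg=256)             # forward transform
Z_denoised = Z                                    # a Conv1D/LSTM/GRU stack would go here
_, x_rec = signal.istft(Z_denoised, nperseg=256)  # inverse transform

# With the net elided, the transform pair simply reconstructs the input.
# This round trip is exactly the overhead a raw-waveform (TENN-style)
# model avoids by operating on the time-domain signal directly.
m = min(len(x), len(x_rec))
assert np.allclose(x[:m], x_rec[:m], atol=1e-6)
```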
TENN vs. GPT-2
Single-thread CPU performance on an 11th Gen Intel i7 @ 3.00 GHz. Both models were prompted with the first 1024 words of the first Harry Potter novel.
TENN: > 2100 tokens/minute. GPT-2: < 10 tokens/minute.
Task: Sentence Generation

| | GPT2 Small | GPT2 Medium | TENN | Mamba 130M | GPT2 Large | GPT2 Full | Mamba 370M |
|---|---|---|---|---|---|---|---|
| Train size | 13 GB | 13 GB | 0.1 GB | 836 GB | 13 GB | 13 GB | 836 GB |
| Score | 9.7 | 10.2 | 10.3 | 10.4 | 10.4 | 10.8 | 10.9 |
| Params (relative to TENN) | 1.35 | 4.8 | 1 | 2.06 | 10.4 | 21.7 | 5.9 |
| Energy (relative to TENN) | 1700 | 5700 | 1 | 2.06 | 13000 | 27000 | 5.9 |
| Training time | ~768 GPU hours (21x) | ~2264 GPU hours (62.8x) | 35 GPU hours (1x) | - | - | - | - |

1. TENN trained on WikiText-103 (100M tokens)
2. GPT models trained on OpenWebText; Mamba trained on the Pile
3. TENN training time: ~1.5 days on one A100 (35 GPU hours)
4. GPT-2 Small training time: 4 days on eight A100s (768 GPU hours)
5. GPT-2 Medium training time is estimated
6. Scores reported as negative entropy: −log2(1/VocabSize) − log2(perplexity); higher is better
7. Input (context) was 1024 tokens
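Footnote 6's score follows directly from vocabulary size and perplexity; a small helper to make the formula concrete (the inputs below are hypothetical, not values from the table):

```python
import math

def negative_entropy_score(vocab_size: int, perplexity: float) -> float:
    """Score from footnote 6: -log2(1/V) - log2(ppl) = log2(V) - log2(ppl).
    log2(V) is the entropy of a uniform guess over the vocabulary, so the
    score measures how far the model improves on that baseline; higher is
    better."""
    return math.log2(vocab_size) - math.log2(perplexity)

# Hypothetical example: a 65,536-token vocabulary with perplexity 64.
print(negative_entropy_score(65536, 64))  # -> 10.0
```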
Technical Details
Learning Continuous Convolution Kernels
• The colored plane represents the continuous kernel we're trying to learn
• The red arrows represent the individual weights in a 7x7 filter
• A large number of weights requires a large amount of computation
• This results in slow training and large memory bottlenecks
Representing Convolution Kernels with Orthogonal Polynomials
• TENNs learn the continuous kernel directly through a polynomial expansion.
• The coefficients of the polynomials are learned through backpropagation.
• Training is much faster because the polynomial coefficients (weights) converge independently: since the polynomials are orthogonal to each other, the coefficients do not affect one another.

A Chebyshev polynomial basis can lead to exponential convergence for a wide range of functions, including those with singularities or discontinuities.*

*Lloyd N. Trefethen. 2019. Approximation Theory and Approximation Practice, Extended Edition. SIAM – Society for Industrial and Applied Mathematics, Philadelphia, PA, USA.
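A minimal numpy sketch of this idea (function names and sizes are illustrative, not BrainChip's implementation): the kernel is described by a handful of polynomial coefficients, and those coefficients are the quantities a training loop would update by backpropagation.

```python
import numpy as np

def chebyshev_basis(t, L):
    """Chebyshev polynomials C_0(t)..C_L(t) on [-1, 1], built with the
    three-term recurrence C_l(t) = 2t C_{l-1}(t) - C_{l-2}(t)."""
    C = np.empty((L + 1, t.size))
    C[0] = 1.0
    if L >= 1:
        C[1] = t
    for l in range(2, L + 1):
        C[l] = 2 * t * C[l - 1] - C[l - 2]
    return C

def kernel(coeffs, t):
    """Continuous kernel h(t) = sum_l a_l C_l(t): a few learnable
    coefficients replace one free weight per filter tap."""
    return coeffs @ chebyshev_basis(t, len(coeffs) - 1)

# Illustrative example: 4 coefficients describe a kernel that can be
# sampled at any resolution, here as 7 discrete taps.
a = np.array([0.5, -0.3, 0.2, 0.1])
taps = kernel(a, np.linspace(-1.0, 1.0, 7))
```

Because the basis is fixed, gradients flow only into the coefficient vector, which is what makes the parameter count so small.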
Visualizing the Computation

The kernel is expanded over polynomial basis functions $C_l$ with learned coefficients $a_l$:
$$h(t - \tau) = \sum_{l=0}^{L} a_l \, C_l(t - \tau)$$

The output is the convolution of the kernel with the input buffer $I(t)$ over the kernel support $D$, illustrated here with a four-sample window ($k = 22$ to $25$):
$$\chi(t) = (h * I)(t) = \int_{t-D}^{t} h(t - \tau)\, I(\tau)\, d\tau \approx \sum_{k=22}^{25} h(t - k)\, I(k)$$

For example, at $t = 25$:
$$\chi(25) = \sum_{k=22}^{25} h(25 - k)\, I(k) = h(3)\,I(22) + h(2)\,I(23) + h(1)\,I(24) + h(0)\,I(25)$$

The nonlinear output is $o(t) = f(\chi(t))$, where $f(\cdot)$ is a nonlinear activation function.
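The discrete step at $t = 25$ can be checked numerically; a small sketch with illustrative kernel taps and input samples (none of the values are from the slide's benchmarks):

```python
import numpy as np

# Kernel taps h(0)..h(3) and input samples I(22)..I(25); values are
# made up for illustration.
h = np.array([0.4, 0.3, 0.2, 0.1])            # h(0), h(1), h(2), h(3)
I = np.array([0.011, 0.871, 0.235, 0.678])    # I(22)..I(25)

# chi(25) = h(3)I(22) + h(2)I(23) + h(1)I(24) + h(0)I(25)
chi = sum(h[25 - k] * I[k - 22] for k in range(22, 26))

# Same computation as a dot product with the time-reversed kernel.
assert np.isclose(chi, h[::-1] @ I)

# Nonlinear output o(t) = f(chi(t)), e.g. with f = tanh.
o = np.tanh(chi)
```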
Buffer Mode vs Recurrent Mode

Recurrence: Chebyshev polynomials satisfy a recurrence relationship.
Duality: this recurrence gives the kernel a dual form, usable in buffer mode as well as recurrent mode.

Buffer (convolutional) mode
• Overview: buffer the inputs over time
• Benefits: speeds up training by reading the memory buffer in parallel; training stability is improved by orthogonality
• Drawback: higher memory usage

Recurrent mode
• Overview: update the previous state over time
• Benefits: saves memory by generating the polynomials recurrently, timestep by timestep; lower memory usage benefits inference
• Drawback: training has to be done sequentially
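The recurrence the slide names is the standard three-term Chebyshev relation; a sketch of how polynomial values can be produced on the fly rather than stored (illustrative Python, not BrainChip's kernel):

```python
import math

def chebyshev_at(t, L):
    """Values C_0(t)..C_L(t) at a single point from the recurrence
    C_l(t) = 2t C_{l-1}(t) - C_{l-2}(t). Only the two most recent
    values are needed at each step, so no basis buffer is required."""
    prev2, prev = 1.0, t
    vals = [prev2, prev][:L + 1]
    for _ in range(2, L + 1):
        prev2, prev = prev, 2 * t * prev - prev2
        vals.append(prev)
    return vals

# Sanity check against the closed form C_l(t) = cos(l * arccos(t)).
t = 0.3
for l, v in enumerate(chebyshev_at(t, 5)):
    assert abs(v - math.cos(l * math.acos(t))) < 1e-12
```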
Getting It to Market
Hardware IP to Run TENNs on the Edge
Key hardware features:
• Digital, event-based, at-memory compute
• Highly scalable
• Nodes are connected by a mesh network
• Inside each node is an event-based TENN processing unit
BrainChip's Differentiation: Akida Technology Foundations
Fundamentally different. Extremely efficient.
BrainChip Resources
TENNs paper, "Building Temporal Kernels with Orthogonal Polynomials": https://bit.ly/brainchip_tenns
TENNs white paper: https://brainchip.com/temporal-event-based-neural-networks-a-new-approach-to-temporal-processing/
Akida 2nd generation: https://brainchip.com/wp-content/uploads/2023/03/BrainChip_second_generation_Platform_Brief.pdf
BrainChip enablement platforms: https://brainchip.com/akida-enablement-platforms/
Visit us @ Booth #618
Backup Slides
Improve Efficiency Without Compromising Accuracy
Temporal Event-Based Neural Nets (TENNs):
• Simplify solutions to complex problems
• Reduce model size and footprint without loss in accuracy
• Are easy to train (CNN-like pipeline)
• Support longer-range dependencies than RNNs
TENN Has Two Modes: Buffer and Recurrent Modes
Principles:
1. Recurrence: Chebyshev and Legendre polynomials have a recurrence relationship.
2. Duality: the recurrence yields a dual form, buffer mode as well as recurrent mode.
3. Stable training: train in buffer mode.
4. Fast running: run in recurrent mode, with a small footprint.
5. Insight: TENNs and SSMs are a stack of generalized Fourier filters running in recurrent mode, with nonlinearities between layers.
TENN Has Two Modes: Buffer and Recurrent Modes

In both modes the kernel is the same polynomial expansion:
$$h(t) = \sum_{l=0}^{L} a_l \, C_l(t)$$

Buffer mode (fast parallel training):
• The entire kernel is stored in a memory buffer accessible at once, alongside a buffer for the input $I(t)$.
• The kernel convolution is computed in the conventional way, as a dot product over the two buffers:
$$\chi = (h * I)(t) = \tilde{h} \cdot I = \sum_k \tilde{h}_k I_k$$

Recurrent mode (saves memory):
• The polynomials are generated recurrently, timestep by timestep, and are not stored in memory.
• The convolution of the input over the polynomials is computed timestep by timestep and accumulated over time, as L separate convolutions:
$$\chi_l = (C_l * I)(t)$$
• The kernel convolution is then the L polynomial convolutions weighted by the polynomial coefficients and summed:
$$\chi = \sum_{l=0}^{L} a_l \, \chi_l$$
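The duality between the two modes rests on the linearity of convolution, and can be verified numerically; a short sketch (order, lengths, and values are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
L, T = 3, 8                      # polynomial order and kernel length (illustrative)
a = rng.standard_normal(L + 1)   # learned coefficients a_l
I = rng.standard_normal(64)      # input stream

# Polynomial basis sampled over the kernel support.
t = np.linspace(-1.0, 1.0, T)
C = np.empty((L + 1, T))
C[0], C[1] = 1.0, t
for l in range(2, L + 1):
    C[l] = 2 * t * C[l - 1] - C[l - 2]

# Buffer mode: materialize the whole kernel h, then convolve once.
h = a @ C
chi_buffer = np.convolve(I, h, mode="valid")

# Recurrent-mode view: one convolution per polynomial (chi_l = C_l * I),
# then weight by the coefficients and sum (chi = sum_l a_l chi_l).
chi_l = np.stack([np.convolve(I, C[l], mode="valid") for l in range(L + 1)])
chi_recurrent = a @ chi_l

# Linearity of convolution makes the two results identical.
assert np.allclose(chi_buffer, chi_recurrent)
```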