Image Tokenization for Distributed Neural Cascades
Derek Chow, Software Engineer, Google
Shang-Hung Lin, Vice President of NPU Technology, VeriSilicon
What is Tokenization?
Tokenization is the process of converting data from a sensor modality (such as images or audio) into a neural encoding.
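As an illustration of this definition for images, the minimal sketch below tokenizes an image the way ViT-style encoders commonly do: split it into non-overlapping patches, flatten each patch, and apply a linear projection so that each patch becomes one token vector. The image size, patch size, embedding width, and random projection weights are hypothetical values chosen for illustration. This is not the tokenizer used in the talk; a real encoder such as SigLIP learns its projection and follows it with transformer layers.

```python
import random
from typing import List

Image = List[List[List[float]]]  # indexed as image[y][x][channel]


def patchify(image: Image, patch: int) -> List[List[float]]:
    """Split an H x W x C image into non-overlapping patch x patch tiles and
    flatten each tile into one vector (row-major, channels last)."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            vec: List[float] = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    vec.extend(image[y][x])
            patches.append(vec)
    return patches


def tokenize(image: Image, patch: int, weights: List[List[float]]) -> List[List[float]]:
    """Linearly project each flattened patch: token = weights @ patch_vector.
    `weights` has shape [embed_dim][patch * patch * channels]."""
    return [
        [sum(w_i * v_i for w_i, v_i in zip(row, vec)) for row in weights]
        for vec in patchify(image, patch)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    height = width = 32  # hypothetical image size
    channels = 3
    patch = 8            # hypothetical patch size -> 4 x 4 = 16 patches
    embed_dim = 16       # hypothetical token width
    img = [[[rng.random() for _ in range(channels)] for _ in range(width)] for _ in range(height)]
    proj = [[rng.gauss(0.0, 0.02) for _ in range(patch * patch * channels)] for _ in range(embed_dim)]
    tokens = tokenize(img, patch, proj)
    print(len(tokens), len(tokens[0]))  # 16 16
```

The sequence of token vectors, rather than the raw pixels, is what downstream models consume.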
© 2025 VeriSilicon and Google
Examples of Tokenizers
Tokenizer is a Feature Extractor
[Figure: ResNet101 features used for classification, detection, and segmentation]
• Serves as a feature extractor for a neural network
• Enables tasks such as classification, generation, and retrieval-augmented generation (RAG)
Multimodal AI
SigLIP / Gemma
Tokenization Creates a Form of Data Compression
• The tokenizer and detokenizer act as a codec
• Saves power during transmission
• Saves storage capacity at rest
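To make the codec view concrete, the arithmetic below compares the payload of a raw image with the payload of its tokens. All sizes are illustrative assumptions, not figures from the talk: a 224×224 RGB image at 8 bits per channel, 256 tokens per image, and either discrete tokens drawn from a hypothetical 8,192-entry codebook or continuous 1,152-dimensional embeddings stored as 16-bit floats. Under these assumptions the savings depend on how compactly the tokens are represented. Codebook indices are far smaller than the pixels, while full-width floating-point embeddings can exceed the raw image size, which is one motivation for the quantization and token compression techniques discussed later in the deck.

```python
import math


def raw_image_bytes(height: int, width: int, channels: int = 3, bits: int = 8) -> int:
    """Size of an uncompressed image in bytes."""
    return height * width * channels * bits // 8


def discrete_token_bytes(num_tokens: int, codebook_size: int) -> int:
    """Size of bit-packed codebook indices, rounded up to whole bytes."""
    bits_per_token = math.ceil(math.log2(codebook_size))
    return math.ceil(num_tokens * bits_per_token / 8)


def continuous_token_bytes(num_tokens: int, dim: int, bytes_per_value: int = 2) -> int:
    """Size of a sequence of embedding vectors (2 bytes per value = 16-bit floats)."""
    return num_tokens * dim * bytes_per_value


# All parameters below are illustrative assumptions.
raw = raw_image_bytes(224, 224)           # 150,528 bytes
disc = discrete_token_bytes(256, 8192)    # 256 tokens * 13 bits = 416 bytes
cont = continuous_token_bytes(256, 1152)  # 589,824 bytes
print(f"raw={raw} discrete={disc} ({raw / disc:.0f}x smaller) continuous={cont}")
```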
Diverse Hardware Ecosystem
[Figure: device classes ordered from most to least capable; device images not recoverable]
Compute | Memory | Bandwidth
High | High | High
Medium | Medium | Medium
Medium | Low | Low
Low | Low | Low
Low | Low | Low
World’s Leading Smart Home Products
Can we combine the strengths of multiple devices for GenAI experiences?
We think yes.
Anatomy of a Neural Cascade
[Figure: tokenizers feed a gating model that answers “Yes”, “Yes”, “No”; image tokens are passed downstream only for “Yes”]
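The control flow of a neural cascade can be sketched in a few lines: each frame is tokenized once, a cheap gating model scores the tokens, and only frames that pass the gate have their tokens forwarded to the expensive downstream model. The class name, threshold, and toy stand-in models below are hypothetical placeholders. In a deployment, the tokenizer and gate would typically run on the embedded device and the downstream model on a more capable device or in the cloud.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Tokens = List[List[float]]


@dataclass
class NeuralCascade:
    """Hypothetical two-stage cascade: tokenize -> cheap gate -> expensive model."""
    tokenizer: Callable[[object], Tokens]   # e.g. runs on the embedded device
    gate: Callable[[Tokens], float]         # small model returning P("Yes")
    downstream: Callable[[Tokens], str]     # large model, e.g. a VLM in the cloud
    threshold: float = 0.5
    forwarded: int = field(default=0, init=False)  # frames sent downstream

    def process(self, frame: object) -> Optional[str]:
        tokens = self.tokenizer(frame)          # tokenize each frame once
        if self.gate(tokens) < self.threshold:  # gate says "No": stop early
            return None
        self.forwarded += 1                     # gate says "Yes": forward tokens
        return self.downstream(tokens)          # reuse the same tokens downstream

    def run(self, frames: Sequence[object]) -> List[Optional[str]]:
        return [self.process(f) for f in frames]


# Toy stand-ins: a frame is a single number, and the gate treats it as P("Yes").
cascade = NeuralCascade(
    tokenizer=lambda frame: [[float(frame)]],
    gate=lambda toks: toks[0][0],
    downstream=lambda toks: f"described {toks[0][0]:.1f}",
)
print(cascade.run([0.9, 0.7, 0.1]))  # ['described 0.9', 'described 0.7', None]
print(cascade.forwarded)             # 2
```

Because the tokens are computed once and reused, the downstream model never sees frames the gate rejects, and no frame is tokenized twice.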
Building a Large Gating Model
• We can build a gating model using a VLM
• Provide a prompt describing what you want to detect, e.g., “Is there an animal present?”
• Feed the tokenized image into the VLM
• Check the probability of emitting “Yes” or “No”
[Figure: VLM-based gating model. The prompt “Is there an animal present?” passes through a text embedder and the image through an image tokenizer; the VLM outputs P(“Yes”) and P(“No”).]
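Turning the VLM's output into a gate decision needs only its next-token logits for “Yes” and “No”. The sketch below renormalizes those two logits with a softmax restricted to the two tokens, which is one reasonable choice among several. The token ids, logit values, and threshold are hypothetical, and how logits are obtained depends on the VLM runtime in use.

```python
import math
from typing import Mapping, Tuple


def yes_no_probs(logits: Mapping[int, float], yes_id: int, no_id: int) -> Tuple[float, float]:
    """Softmax over only the "Yes" and "No" logits, so P(Yes) + P(No) = 1."""
    a, b = logits[yes_id], logits[no_id]
    m = max(a, b)  # subtract the max for numerical stability
    ea, eb = math.exp(a - m), math.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def gate(logits: Mapping[int, float], yes_id: int, no_id: int, threshold: float = 0.5) -> bool:
    """Pass the frame downstream when P("Yes") reaches the threshold."""
    p_yes, _ = yes_no_probs(logits, yes_id, no_id)
    return p_yes >= threshold


# Hypothetical vocabulary ids and next-token logits after the prompt
# "Is there an animal present?" and the image tokens.
YES_ID, NO_ID = 7, 8
logits = {7: 2.0, 8: 0.0, 9: 5.0}  # id 9 (some other token) is ignored
p_yes, p_no = yes_no_probs(logits, YES_ID, NO_ID)
print(round(p_yes, 3), round(p_no, 3), gate(logits, YES_ID, NO_ID))  # 0.881 0.119 True
```

Restricting the softmax to two tokens keeps the gate well defined even when the VLM would prefer some unrelated token, as id 9 does here.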
Distilling a Smaller Gating Model
[Figure: the teacher gating model (text embedder and image tokenizer feeding a VLM that outputs P(“Yes”), P(“No”) for the prompt “Is there an animal present?”) provides targets; gradient updates train the student gating model]
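The distillation step can be illustrated with a deliberately tiny student: a logistic-regression gate over mean-pooled image tokens, trained by stochastic gradient descent to match the teacher's soft label P(“Yes”). The loss is the cross-entropy between the teacher's and the student's Yes/No distributions; for a sigmoid output, its gradient with respect to the student's logit is simply the student probability minus the teacher probability. The pooling, the linear student, the learning rate, and the synthetic `toy_teacher` are assumptions for illustration only; a real student would be a small neural network sized for the embedded device, and the teacher would be the VLM-based gate.

```python
import math
import random
from typing import Callable, List, Sequence

Tokens = List[List[float]]


def mean_pool(tokens: Tokens) -> List[float]:
    """Average the tokens into one feature vector."""
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class StudentGate:
    """Logistic regression on pooled tokens: P("Yes") = sigmoid(w . x + b)."""

    def __init__(self, dim: int):
        self.w = [0.0] * dim
        self.b = 0.0

    def prob_yes(self, tokens: Tokens) -> float:
        x = mean_pool(tokens)
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def distill_step(self, tokens: Tokens, teacher_p_yes: float, lr: float) -> float:
        """One SGD step on soft-label cross-entropy; returns the loss before the step."""
        x = mean_pool(tokens)
        p = self.prob_yes(tokens)
        eps = 1e-12
        loss = -(teacher_p_yes * math.log(p + eps)
                 + (1.0 - teacher_p_yes) * math.log(1.0 - p + eps))
        grad = p - teacher_p_yes  # d(loss)/d(logit) for sigmoid + cross-entropy
        self.w = [wi - lr * grad * xi for wi, xi in zip(self.w, x)]
        self.b -= lr * grad
        return loss


def distill(student: StudentGate, dataset: Sequence[Tokens],
            teacher: Callable[[Tokens], float], epochs: int = 200, lr: float = 0.5) -> None:
    soft_labels = [teacher(t) for t in dataset]  # query the expensive teacher once per example
    for _ in range(epochs):
        for tokens, p_teacher in zip(dataset, soft_labels):
            student.distill_step(tokens, p_teacher, lr)


def toy_teacher(tokens: Tokens) -> float:
    """Hypothetical stand-in for the VLM gate: P("Yes") rises with the first pooled feature."""
    return sigmoid(6.0 * mean_pool(tokens)[0] - 3.0)


rng = random.Random(0)
data = [[[rng.random(), rng.random()] for _ in range(4)] for _ in range(64)]
student = StudentGate(dim=2)
distill(student, data, toy_teacher)
err = max(abs(student.prob_yes(t) - toy_teacher(t)) for t in data)
print(f"max |P_student - P_teacher| = {err:.4f}")
```

Caching the teacher's soft labels reflects the point of distillation: the large model is queried only during training, while the student alone runs at inference time.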
Composing Models
[Figure: an embedded device runs the image tokenizer and a distilled animal detector on the image tokens; the image tokens are also passed to a VLM prompted with “Describe what the animal is doing”, which replies “The squirrel is eating your avocado!”]
Cascades Beyond Two Devices
[Figure: a cascade spanning several devices that exchange image tokens, audio tokens, health tokens, and RAG queries]
Squeezing the Neural Cascade Frontend into Small Devices
• Knowledge distillation
• Quantization
• Sparsity, weight sharing
• Hybrid architecture
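Of the techniques listed above, quantization is the simplest to show in code. The sketch below applies symmetric per-tensor int8 quantization to a weight vector: one scale maps the largest magnitude to 127, values are rounded to integers in [-127, 127], and dequantization multiplies back by the scale, so the rounding error per value is at most half the scale. The per-tensor granularity and the example weights are assumptions; production toolchains typically also offer per-channel scales and calibration, which this sketch omits.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = round(v / scale), |q| <= 127."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate real values."""
    return [qi * scale for qi in q]


weights = [0.5, -1.27, 0.003, 1.0]  # hypothetical weights
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.4f}", f"max_err={max_err:.4f}")
```

Storing the weights as int8 instead of 32-bit floats cuts their memory footprint by 4x, at the cost of the bounded rounding error shown.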
Image Token Compression
• Reducing the number of image tokens based on the text prompt
[Figure: QueCC (ICLR 2025, arXiv:2411.03312) at compression ratios of 16x, 36x, 144x, and 576x]
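As a rough illustration of prompt-guided token reduction, the sketch below keeps only the k image tokens that score highest (by dot product) against a text-query embedding, preserving their original order. This top-k selection is a simpler stand-in chosen for clarity; it is not the QueCC method, which learns its compression. The token-count arithmetic at the end assumes an encoder that emits 576 image tokens (a 24×24 grid, as in LLaVA-1.5-style models), under which the slide's ratios correspond to 36, 16, 4, and 1 remaining tokens.

```python
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def select_tokens_by_query(tokens: Sequence[Vector], query: Vector, k: int) -> List[Vector]:
    """Keep the k tokens with the highest dot-product score against the query,
    preserving their original (spatial) order. Simplified stand-in, not QueCC."""
    if k >= len(tokens):
        return list(tokens)
    ranked = sorted(range(len(tokens)), key=lambda i: dot(tokens[i], query), reverse=True)
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


def tokens_after_compression(num_tokens: int, ratio: int) -> int:
    """Number of tokens remaining at a given compression ratio."""
    return num_tokens // ratio


# Assumes 576 image tokens before compression.
for ratio in (16, 36, 144, 576):
    print(f"{ratio}x -> {tokens_after_compression(576, ratio)} tokens")
```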
Project Open Se Cura – Edge and Cloud Collaborative Computing
Extremely low power consumption
• Always on
• Ambient computing
Realizing large models everywhere
• Responsiveness
• Privacy (local & cloud)
• Computational resources
Cloud computing
Kelvin: A RISC-V ML Accelerator for the Edge
Kelvin is a RISC-V-based ML accelerator
• Open-source design, part of Project Open Se Cura
• Provides a familiar framework for programming ML kernels to developers with SIMD/GPU experience
• Support for the RISC-V Vector and Matrix extensions is in development, targeting 256+ MACs/cycle
• Security extensions via CHERI are on our roadmap
[Figure: Kelvin block diagram with a scalar core, ML and SIMD units, and tightly coupled memory (TCM)]
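The 256+ MACs/cycle target translates into peak throughput only once a clock frequency is fixed. The helpers below use the common convention of counting one multiply-accumulate as two operations, and give a lower bound on cycles for a workload at full MAC utilization. The 1 GHz clock is purely hypothetical, since the slide does not state Kelvin's operating frequency.

```python
def peak_gops(macs_per_cycle: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in GOPS, counting one multiply-accumulate as two operations."""
    return macs_per_cycle * ops_per_mac * clock_hz / 1e9


def min_cycles(total_macs: int, macs_per_cycle: int) -> int:
    """Lower bound on cycles, assuming every MAC unit is busy every cycle."""
    return -(-total_macs // macs_per_cycle)  # ceiling division on integers


CLOCK_HZ = 1e9  # hypothetical clock; not stated on the slide
print(peak_gops(256, CLOCK_HZ))      # 512.0 (GOPS peak)
print(min_cycles(1024 * 1024, 256))  # 4096 cycles for a 1024 x 1024 matrix-vector product
```

Real kernels rarely reach full utilization, so measured throughput will sit below these bounds.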
VeriSilicon AI-Computing IP Product Lineup
[Figure: IP lineup spanning embedded devices, edge server chips, and data center server chips, covering inferencing, incremental training, and training]
• VIP Nano/PICO: sub-TOPS inferencing
• VIP9X00 (NPU IP)
• VIP9X00CC (NPU+GPGPU IP)
• CC9X00TC-MP (GPGPU+NPU IP)
High-Efficiency Inference NPU for VLMs & LLMs
Segment | Model | NPU | Compute | Bandwidth
Embedded Devices | Qwen2 1.5B | VIP9000 | 4 TOPS | 16 GB/s
AI-PC, Mobile | LLaMA2 7B | VIP9000 | 40 TOPS | 128 GB/s
Edge Server | LLaMA3 70B | VIP9400 | 160 TOPS | 512 GB/s
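A common back-of-envelope check for configurations like these is that single-stream autoregressive decoding tends to be memory-bandwidth bound: generating each token streams roughly all model weights from memory once, so tokens per second is bounded by bandwidth divided by weight size. The sketch applies that bound to the three rows above. The weight precisions (8-bit for the two smaller models, 4-bit for the 70B model) are assumptions, not figures from the slide, and the estimate ignores KV-cache traffic, activations, and batching.

```python
from typing import NamedTuple


class Config(NamedTuple):
    model: str
    params_billion: float
    bandwidth_gb_s: float
    bits_per_weight: int  # ASSUMED precision, not stated on the slide


def max_decode_tokens_per_s(params_billion: float, bits_per_weight: int,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every generated token reads all weights once."""
    weight_gb = params_billion * bits_per_weight / 8  # 1e9 params * bytes/param = GB
    return bandwidth_gb_s / weight_gb


configs = [
    Config("Qwen2 1.5B", 1.5, 16, 8),
    Config("LLaMA2 7B", 7, 128, 8),
    Config("LLaMA3 70B", 70, 512, 4),
]
for c in configs:
    rate = max_decode_tokens_per_s(c.params_billion, c.bits_per_weight, c.bandwidth_gb_s)
    print(f"{c.model}: <= {rate:.1f} tokens/s")
```

Under these assumptions, all three configurations land in a similar range of tokens per second, which suggests bandwidth scales with model size across the lineup.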
Summary and Challenges
Summary
• Tokenizers provide a framework for building multimodal LLMs
• Distillation-based training can create a gating mechanism that separates tokenizers from the LLM
• Once separated, compute can be distributed between embedded devices and the cloud
Challenges
• Technical
  • Memory and compute scaling for tokenizers and LLMs
  • Infrastructure for training distributed models
• Ecosystem
  • Changing model landscape
  • Diverse hardware landscape
  • Fostering community
Resources
• Gemma: https://ai.google.dev/gemma
• Project Open Se Cura: https://www.opensecura.googlesource.com
• VeriSilicon NPU IP: https://www.verisilicon.com/en/IPPortfolio/VivanteNPUIP
2025 Embedded Vision Summit: visit us at booth 508!

