Derek Chow, Software Engineer at Google, and Shang-Hung Lin, Vice President of NPU Technology at VeriSilicon, co-present the “Image Tokenization for Distributed Neural Cascades” tutorial at the May 2025 Embedded Vision Summit.
Multimodal LLMs promise to bring exciting new abilities to devices! As foundation models become more capable, their compute requirements grow as well. LLMs with tens of billions of parameters are now common, and model sizes are growing faster than the compute that embedded processors can provide.
In this talk, Chow and Lin introduce the concept of a "neural cascade," a scheme for dividing computation across devices. They present a recipe for constructing a neural cascade from an existing LLM and show how the resulting system coordinates edge and cloud devices to enable new experiences.