Accelerating Newer ML Models Using Qualcomm® AI Stack
Dr. Vinesh Sukumar
Sr Director – AI/ML Product
Qualcomm Technologies, Inc.
Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
On-device intelligence is paramount
Process data closest to the source, complement the cloud
• Privacy
• Reliability
• Low latency
• Efficient use of network bandwidth
Center of Gravity Moving to the Edge…
Historically
Increased Demand
Personalization
Security
Autonomy
Efficiency
© 2023 Qualcomm Technologies Inc. 2
AI Applications: Across Various Segments
Mobile CSS Compute Cloud Auto
AI Assisted Imaging
• AI 3A
• Scene-based Camera Selection
Image Understanding
• Face Detection / Tracking / Features
• Object Detection / Tracking
• Body Detection / Tracking / Pose
• Human Segmentation
Beautify / Augment / Gaming
• Scene-based Image Enhancement
Image Processing
• AI based NR or Image SR
• Scene-based Camera Selection
Audio
• Real time language
• Natural language processing (NLP)
Modem
• Sensor Fusion (Cont. awareness)
• Modem RF E2E (Tuners..)
Robotics
• Autonomous navigation
• Obstacle Avoidance
Productivity
• Background based noise cancellation on Audio (inbound and outbound)
• Segmentation/Blur/Super Resolution on Video
Data Centers
• Natural language processing
• Computer vision
• Recommendation system
Edge Compute
• Theft detection
• Face/body/license plate detection / recognition
• Image classification and segmentation
IVI
• Occupancy monitoring system (OMS)
• Driver monitoring system (DMS)
• Surround perception
• Audio Command & Control
Retail
• Visitor/Face/Gesture Recognition
• Object/People Detection and Counting
• Barcode decoding
Privacy & Security
• Automatic screen unlock and login
• Privacy alert
• Guard mode
ADAS (Up to L4)
• Highway driving assist
• Front collision warning
• Lane departure
• Traffic jam assist
• Auto lane change
• Auto lane merge
• Traffic light recognition
• Construction zones
• Urban autonomous driving
• Parking assist
• Person detection
• Perception
• Valet parking
• Driver monitoring
Transportation
• License plate recognition
• Face and facial landmark detection
• Drowsiness detection
Content Creation & Gaming
• Gaming with gesture control
• Gaming with voice commands
• Intelligent highlight videos
• Game play improvement
XR
Smart Devices
• Object/People detection
• Speaker detection
Metaverse
• Person and Object Detection
• Recommendation Engine & Chatbots
• Multilingual translation (speech-to-speech)
• Neural Super Resolution
• Content Summarization
Smart Buildings
• People Tracking
• Access Control
Performance & Efficiency
• Power and Screen optimization
Manufacturing/Logistics
• Predictive maintenance
• Energy management with Asset demand
Emerging AI Models – For the Various Markets
Emerging Deep Learning Models
• Generative networks (Image to Image Transformation)
• Time series networks (Behavior to Text Transformation)
• Transformer networks (NLP/NLU) (Sequence to Sequence transformation)
• Canvas networks (Virtual Transformation for Avatars)
Vision: Accelerate Solution Deployment
• Performance: Accelerate “out of box” operator functionality and performance
• Scalability: Ability to have programming consistency from Cloud to Edge
• Innovation: Innovation to drive product leadership (Pre-emption, DFS, Multi chaining)
• Tools: Accelerate AI solution deployment with investment in tools
STEP: 1 -> Model Optimization Using NAS
Integrated into Qualcomm® Software Stack
How:
• Search Space: Space of allowable architectures (structure, operations, connectivity)
• Search Algorithm: Sampling populations of good architecture candidates
• Evaluation Strategy: Estimate performance of sampled architecture
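The three NAS components named on this slide (search space, search algorithm, evaluation strategy) can be sketched as a toy loop. This is only an illustration under stated assumptions: the dimensions in `SEARCH_SPACE`, the plain random search, and the proxy score in `estimate_performance` are all hypothetical stand-ins, not the method integrated into the Qualcomm software stack.

```python
import random

# Search space (hypothetical): one choice per architectural dimension.
SEARCH_SPACE = {
    "depth": [2, 4, 6, 8],
    "width": [16, 32, 64, 128],
    "kernel_size": [3, 5, 7],
}


def sample_architecture(rng):
    """Search algorithm (plain random search here): draw one candidate."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def estimate_performance(arch):
    """Evaluation strategy: a made-up proxy score standing in for real
    accuracy and latency measurements on a target device."""
    capacity = arch["depth"] * arch["width"]
    proxy_accuracy = capacity / (capacity + 200.0)  # saturating gain from capacity
    proxy_latency = capacity * arch["kernel_size"] ** 2 / 50000.0
    return proxy_accuracy - 0.5 * proxy_latency


def random_search(num_samples=50, seed=0):
    """Sample candidates and keep the one with the best estimated score."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_samples):
        arch = sample_architecture(rng)
        score = estimate_performance(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


best_arch, best_score = random_search()
print(best_arch, round(best_score, 4))
```

A production search would typically replace the random sampler with an evolutionary or gradient-based strategy and the proxy score with measured accuracy and on-device latency.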
NAS Results: Observations from ML Models
Category | Model | Task | Dataset | Results
CNNs | EfficientNet-B0 | Image Classification | ImageNet | +1.0% accuracy, 33% latency reduction
CNNs | ResNet-18 | Image Classification | ImageNet | +2.2% accuracy, 31% latency reduction
CNNs | RetinaNet | 2D Object Detection | Pascal | +1.5 mAP accuracy, 11% latency reduction
CNNs | EfficientDet-D0 | 2D Object Detection | COCO | +0.8 mAP accuracy, 30% latency reduction
RNNs | CRNN | Keyword Spotting | Google Speech Commands v2 | +1.0% accuracy, similar model size
Transformers | MobileBERT | Question & Answering | SQuAD v1.1 | On-par accuracy, 12% latency reduction
STEP: 2 -> New Techniques to Quantize Models
Integrated into Qualcomm Software Stack
Automated reduction in precision of weights and activations while maintaining accuracy
Promising results show that low-precision integer inference can become widespread
Virtually the same accuracy between an FP32 and a quantized AI model through:
• Automated, data free, post-training methods
• Automated training-based mixed-precision method
[Figure: models trained at high precision, inference at lower precision]
Models trained at high precision: 32-bit floating point (example value 3452.3194)
Inference at lower precision, with an increase in performance per watt from savings in memory and compute:
• 16-bit Integer (example value 3452): up to 4X
• 8-bit Integer (example value 255): up to 16X
• 4-bit Integer (example value 15): up to 64X
1: FP32 model compared to quantized model
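To make the precision reduction on this slide concrete, here is a minimal sketch of uniform (affine, per-tensor) quantization at 16, 8 and 4 bits. The weight values and the helper names `quantize` and `dequantize` are made up for illustration, and the printed storage ratio counts bits per value only; it is not the performance-per-watt figure quoted on the slide.

```python
def quantize(values, num_bits):
    """Map floats onto unsigned integer codes in [0, 2**num_bits - 1]."""
    qmax = 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [min(qmax, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


weights = [-1.20, -0.35, 0.0, 0.41, 0.98, 3.4523194]  # made-up FP32 values

for bits in (16, 8, 4):
    codes, scale, lo = quantize(weights, bits)
    restored = dequantize(codes, scale, lo)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits:2d}-bit: largest code {max(codes)}, "
          f"max abs error {max_err:.6f}, "
          f"{32 // bits}x fewer bits per value than FP32")
```

The rounding error per value is bounded by half a quantization step, so it grows as the bit width shrinks; that is the accuracy pressure the post-training and training-based methods above are meant to absorb.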
Pushing the Limits – For Quantization & Pruning
Data-free quantization
How can we make quantization as simple as possible?
Created an automated method that addresses bias and imbalance in weight ranges:
• No training
• Data free
SOTA 8-bit results: Making 8-bit weight quantization ubiquitous
<1% accuracy drop for MobileNet V2 against FP32 model
AdaRound
Is rounding to the nearest value the best approach for quantization?
Created an automated method for finding the best rounding choice:
• No training
• Minimal unlabeled data
SOTA 4-bit weight results: Making 4-bit weight quantization ubiquitous
<2.5% accuracy drop for MobileNet V2 against FP32 model
Bayesian bits
Can we quantize layers to different bit widths based on precision sensitivity?
Created a novel method to learn mixed-precision quantization:
• Training required
• Training data required
• Jointly learns bit-width precision and pruning
SOTA mixed-precision results: Automating mixed-precision quantization and enabling the tradeoff between accuracy and kernel bit-width
<1% accuracy drop for MobileNet V2 against FP32 model, for a mixed-precision model with computational complexity equivalent to a 4-bit weight model
Highest Focus of Attention
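The AdaRound question on this slide (is rounding to the nearest value the best choice?) can be illustrated on a single toy neuron. The sketch below is a brute-force stand-in, not the AdaRound algorithm itself: it tries every round-up or round-down combination and keeps the one with the lowest output error on a few unlabeled inputs. The weights, inputs and `scale` are made-up values.

```python
import itertools
import math


def layer_output(weights, inputs):
    """Output of one linear neuron for each input vector."""
    return [sum(w * x for w, x in zip(weights, row)) for row in inputs]


def output_mse(w_ref, w_quant, inputs):
    """Mean squared error between reference and quantized neuron outputs."""
    ref = layer_output(w_ref, inputs)
    out = layer_output(w_quant, inputs)
    return sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)


scale = 0.25                          # hypothetical quantization step
weights = [0.30, -0.55, 0.10, 0.80]   # made-up FP32 weights
inputs = [                            # made-up unlabeled calibration inputs
    [1.0, 0.5, -1.0, 0.2],
    [0.3, 1.2, 0.7, -0.4],
    [-0.8, 0.1, 0.9, 1.1],
]

# Baseline: round every weight to the nearest grid point.
nearest = [round(w / scale) * scale for w in weights]

# Alternative: choose floor or ceil per weight, keep the combination
# that minimizes the error of the neuron output rather than of each weight.
candidates = [
    (math.floor(w / scale) * scale, math.ceil(w / scale) * scale)
    for w in weights
]
best = min(
    itertools.product(*candidates),
    key=lambda combo: output_mse(weights, combo, inputs),
)

print("nearest:", nearest, output_mse(weights, nearest, inputs))
print("best   :", list(best), output_mse(weights, best, inputs))
```

Because rounding to nearest is one of the enumerated combinations, the searched result is never worse on these inputs. Exhaustive search doubles in cost with each added weight, which is why a practical method needs an optimization formulation rather than enumeration.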
Moving towards W4A8 – Newer ML Models
With better PTQ and QAT techniques, more and more models will be able to use W4A8 (4-bit weights, 8-bit activations), resulting in better energy efficiency.
This is going to be a major push for AI solution deployment on the edge.
Model | FP32 accuracy | INT4 accuracy | Comments
ResNet50 | 76.1% | 75.4% | Using Post-training Quantization (PTQ)
ResNet18 | 69.8% | 69% |
EfficientNet-Lite | 75.3% | 74.3% |
Regnext | 78.3% | 77.2% |
Mobilenet-v2 | 71.7% | 71.3% | Using Quantization Aware Training (QAT)
[Figure: segmentation models, 8-bit weights vs. 4-bit weights]
Segmentation models: seeing >20% power and >40% memory footprint savings
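The gap between the FP32 and INT4 accuracies listed above is easy to check with a few lines of arithmetic. The sketch below uses only the numbers from the table; the `weight_bytes` helper and the parameter count are hypothetical, and the helper counts weight storage only, so it is not the measured memory footprint saving quoted for segmentation models.

```python
# (FP32 accuracy %, INT4 accuracy %) as listed in the table above.
results = {
    "ResNet50": (76.1, 75.4),
    "ResNet18": (69.8, 69.0),
    "EfficientNet-Lite": (75.3, 74.3),
    "Regnext": (78.3, 77.2),
    "Mobilenet-v2": (71.7, 71.3),
}

# Accuracy drop in percentage points, rounded to one decimal place.
drops = {
    model: round(fp32 - int4, 1)
    for model, (fp32, int4) in results.items()
}
for model, drop in drops.items():
    print(f"{model}: {drop} points below FP32")


def weight_bytes(num_params, bits_per_weight):
    """Bytes needed to store the weights alone at a given bit width."""
    return num_params * bits_per_weight / 8


num_params = 1_000_000  # hypothetical parameter count
for bits in (32, 8, 4):
    print(f"{bits:2d}-bit weights: {weight_bytes(num_params, bits):,.0f} bytes")
```

Every listed model stays within about one percentage point of its FP32 accuracy, while 4-bit weights need half the storage of 8-bit weights and one eighth of FP32.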
Need for FP8
– Is this Needed for ML Model Inference?
Published in the Qualcomm Technologies “FP8” White Paper
• Strong participation from many silicon vendors on driving FP8 engagements
• Various E/M (exponent/mantissa) ratios to support dynamic range for data
representation
• FP8 is an appealing potential speed-up for the costly and time-intensive training procedures in
deep learning
• Need for inference (observations):
• The hardware implementation of the FP8 format is between 50% and 180% less efficient than INT8 in terms of chip area and energy usage
• Can we convert FP8 to INT8 with good accuracy?
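The effect of the E/M (exponent/mantissa) split on dynamic range can be worked out directly. This is a sketch under an explicit assumption: it uses IEEE-754-style conventions (bias of 2^(E-1) - 1, top exponent code reserved for infinity and NaN), which some FP8 variants deliberately relax to gain range. The function name `minifloat_range` is made up for this example.

```python
def minifloat_range(exp_bits, man_bits):
    """Largest normal, smallest normal and smallest subnormal magnitude of a
    1 + exp_bits + man_bits format, assuming IEEE-754-style conventions."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = (2 ** exp_bits - 2) - bias      # top exponent code is reserved
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** max_exp
    min_normal = 2.0 ** (1 - bias)
    min_subnormal = 2.0 ** (1 - bias - man_bits)
    return max_normal, min_normal, min_subnormal


# All four splits use 1 sign bit plus 7 bits shared by exponent and mantissa.
for exp_bits, man_bits in [(5, 2), (4, 3), (3, 4), (2, 5)]:
    max_normal, min_normal, min_subnormal = minifloat_range(exp_bits, man_bits)
    print(f"E{exp_bits}M{man_bits}: max {max_normal:g}, "
          f"min normal {min_normal:g}, min subnormal {min_subnormal:g}")
```

Moving a bit from the mantissa to the exponent widens the representable range but leaves fewer distinct values between consecutive powers of two, which is the trade-off behind the different E/M ratios.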
STEP: 3 -> Performance and Scalability Support - Application Deployment
Integrated into Qualcomm Software Stack
[Diagram: Qualcomm AI Stack, from training to on-device execution/inference]
Layers shown: Training (Your-NN-framework, with models exported as .onnx or .pb), Runtime framework, Low level library, Qualcomm® AI engine
Runtimes and SDKs: Qualcomm® Neural Processing SDK, ONNXRT, PT Mobile, TF-Lite, NNAPI, TF-Lite µ
Libraries and kernels: Qualcomm® Neural Network Library, QML, eAI, NEON kernels, OpenCL kernels
Hardware: CPU, GPU, Hexagon Processor, Qualcomm® Sensing Hub
Tools: Profiler, Debugger, Visualizer, Compilers
Goals: Performance, Scalability
Qualcomm Model Studio: Accelerating ML Model Deployment
Integrated into Qualcomm Software Stack
• Workflow panel: Shows steps in a workflow including tools, artifacts and their relationships
• Graph panel: Model visualization, node information (precision, etc.)
• Metrics panel: Detailed information on selected model, nodes including performance info from execution
Recently Deployed Applications – Using Qualcomm AI Stack
• Industry’s first low power gesture control + context awareness to service recommendation – Launched on Honor
• Windows 11 features for video + audio AI – Launched on ThinkPad X13S
Conclusions
• AI applications are expanding beyond computer vision to linguistics, communication, commerce and language understanding
• As AI applications evolve, demand keeps growing for support of new DL architectures and models
• Qualcomm AI Stack expands to enable support for any developer and to drive innovation in performance, latency and QoS, among others. Focus on:
  • Advanced quantization mechanics
  • Support for newer data types
  • Neural architecture support
  • Flexible run time for performance & portability
Resources
• Qualcomm® Mobile AI: Mobile AI | On-Device AI | Qualcomm
• Qualcomm Technologies & Google NAS: Qualcomm Technologies and Google Cloud Announce Collaboration on Neural Architecture Search for the Connected Intelligent Edge | Qualcomm
• Dr. Vinesh Sukumar, Senior Director, Product Management – AI/ML, vinesuku@qti.qualcomm.com
• 2023 Embedded Vision Summit, 4:15 pm: Develop Next-Gen Camera Apps Using Snapdragon Computer Vision Technologies - Judd Heape, VP of Product Management for Camera, Computer Vision and Video Technology, Qualcomm Technologies
• Qualcomm Wireless Academy: Fundamentals of AI, available for free until October 2023
THANK YOU
