Scaling i.MX Applications
Processors with Discrete AI
Accelerators
Ali O. Ors
Global Director, AI Strategy and
Technologies
NXP Semiconductors
AI Spans from Training in the Cloud to Inference at the Edge
2
Cloud AI (Microsoft, Amazon, Google, Nvidia)
Edge AI (NXP opportunity)
Desktop Cloud Local Servers
Industrial
Automation
In-Cabin &
ADAS
Autonomous
Home
AI DEVELOPMENT
TRAINING
AI DEPLOYMENT
INFERENCE
Building & Energy
© 2025 NXP Semiconductors Inc.
NXP’s strength in processing gives us a unique opportunity to shape the deployment of AI
at the edge.
Intelligent edge systems enabled by NXP
© 2025 NXP Semiconductors Inc. 3
eIQ® Neutron NPU
Highly scalable and optimized
integrated dedicated AI
acceleration
eIQ® Toolkit
AI/ML software toolkit for model
creation, optimization, and porting
Engaging with our customers to develop system
solutions and solve challenges together
MCX MCUs
i.MX Apps
Processors and
beyond
Expansive processor
portfolio
i.MX RT Crossover
MCUs
AI co-processor NPU
eIQ® Time Series Studio
Automated ML model
creation from sensor signals
eIQ® GenAI Flow
Context aware generative AI
application development
Differentiated HW and SW
enablement
© 2025 NXP Semiconductors Inc. 4
Compute
• Linux & Android OS
• up to 4 Arm Cortex-A53 cores at 1.8GHz
• Embedded real-time
M4 CPU
• 32-bit LPDDR4/DDR4/DDR3L Memory
Intelligence
• CPU-based Neural Net
enabled via NXP eIQ
Toolkit
Platform
• Commercial &
Industrial
Temperature
Qualification
• 10-year longevity
• Secure Boot
Visualization
• 8 GFLOPs 3D GPU
• OpenGL ES 2.0,
Vulkan, OpenCL 1.2
• 2D GPU
• MIPI-DSI 1080P60
Display
Vision
• Up to 4 cameras with
MIPI-CSI virtual lanes
• 1080P video
encode/decode
• Pixel Compositor
Connectivity
• PCIe Gen. 2 x1 Lane
• 1G Ethernet
• 2x USB 2.0
• 3x SD/eMMC
Compute
• Linux & Android OS
• up to 4 Arm Cortex-A53 cores at 1.8GHz
• Embedded real-time
M7 CPU
• 32-bit LPDDR4/DDR4/DDR3L Memory
Intelligence
• Neural Net Acceleration
by embedded VSI
VIP8000 NPU, GPU, and
CPU
• Enabled via NXP eIQ
Toolkit
Platform
• Commercial &
Industrial
Temperature
Qualification
• 10-year longevity
• Secure Boot plus
Cryptographic
Accelerator
Visualization
• 16 GFLOPs 3D GPU
• OpenGL ES 3.1,
Vulkan, OpenCL 1.2
• 1.3 Gpixel/s 2D GPU
• MIPI-DSI/ LVDS/ HDMI
1080P60 Display
Vision
• Up to 4 cameras with
MIPI-CSI virtual lanes
• 375 Mpixel/s Image
Signal Processor
• 12MP @ 30fps / 8MP
@ 45fps
• 1080P video
encode/decode
Connectivity
• PCIe Gen. 3 x1 Lane
• 2x 1G Ethernet (1
w/TSN)
• USB 3.0 + 2.0
• 3x SD/eMMC
• 2x CAN-FD
Compute
• Linux & Android OS
• up to 6 Arm Cortex-A55 cores at 1.8GHz
• Embedded real-time
M7 CPU
• NXP SafeAssure Safety
Domain
• 32-bit LPDDR5
/LPDDR4X Memory
Intelligence
• Embedded NXP eIQ®
Neutron 1024S NPU
• up to 3x more AI
acceleration than 8M
Plus
• eIQ support for NPU,
GPU, & CPU
• LLM and VLM support
Platform
• Extended Industrial
Temp. Qual.
• 15-year longevity
• EdgeLock Secure
Enclave + V2X
Cryptographic
Accelerator
Visualization
• 64 GFLOPs Arm Mali
3D GPU
• OpenGL ES 3.2, Vulkan
1.2, OpenCL 3.0
• 3 Gpixel/s 2D GPU
• 4K30P MIPI-DSI + 2x
1080P30 LVDS/ triple
Display
Vision
• Up to 8 cameras with
MIPI-CSI virtual lanes
• 500 Mpixel/s ISP with
RGB-IR
• 12MP @ 45fps / 8MP
@ 60fps
• 4K60P vid. codec
• Safe 2D display
pipeline
Connectivity
• 2x PCIe Gen. 3 x1
Lanes
• 1x 10G Eth. (w/TSN)
• 2x 1G Eth. (w/TSN)
• USB 3.0 + 2.0
• 3x SD/eMMC
• 5x CAN-FD
i.MX 8M Mini
Essential HMI & Vision Platform
i.MX 8M Plus
Powerful HMI & Vision Platform with Edge AI &
Industrial Connectivity
i.MX 95 Family
Advanced HMI & Vision Platform with Safety,
Security, and Next-Gen Edge AI
5
NXP i.MX 95 Family for Automotive Edge, Industrial, & IoT
© 2025 NXP Semiconductors Inc.
Safety | Intuitive Decisions | Connect & Secure | Visualize & Act
Ditch the hypervisor and simplify
building safety capable platforms with
the first-generation on-die i.MX
functional safety framework.
Featuring NXP Safety Manager, Safety
Documentation, & NXP Professional
support to enable ISO26262 (ASIL-B) /
IEC61508 (SIL-2) computing platforms,
including 2D display pipeline.
Deliver increased accessibility and
augment complex interfaces with
Generative AI-enhanced voice
command & control with the first i.MX
applications processor to integrate the
new, efficient NXP eIQ® Neutron neural
processing unit.
Responsive HMIs for IoT, Industrial,
and Automotive applications are
easily created with NXP's partner
ecosystem, unlocked by a powerful,
modern 3D graphics processor
combined with strong, efficient
hexacore application processor
performance.
Build secure, private applications with
peace of mind based on the combined
capabilities of integrated security and
authentication acceleration, including
post-quantum cryptographic
capabilities, and lifecycle
management.
Connectivity Leadership:
UWB, Wi-Fi, NFC, RFID, & BT
Co-Developed Platforms:
PMIC, Wi-Fi, Sensors, & More
Deep Application Insights:
26,000 Customers & Growing
6
i.MX 95 Vision Processing Pipeline
© 2025 NXP Semiconductors Inc.
Up to a single 12 MP high-resolution camera: 4096x3072p30 / 3840x2160p60
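As a sanity check on the camera modes above, pixel throughput is just width x height x frame rate; a quick sketch (illustrative only):

```python
def pixel_rate_mpix(width, height, fps):
    # Megapixels per second the ISP must sustain for one stream.
    return width * height * fps / 1e6

# The two camera modes quoted above:
mode_30 = pixel_rate_mpix(4096, 3072, 30)   # ~377 Mpix/s
mode_60 = pixel_rate_mpix(3840, 2160, 60)   # ~498 Mpix/s
```

Both modes sit just under the i.MX 95's 500 Mpix/s ISP throughput quoted on the next slide.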
7
i.MX 95 and i.MX 8M Plus ISP
© 2025 NXP Semiconductors Inc. Schedule, features, and enablement subject to change without
notice. For informational purposes only.
| Specification/Feature | i.MX 8M Plus | i.MX 95 |
| Pixel Throughput | 375 Megapixels/s | 500 Megapixels/s |
| Image Resolution | 12MP @ 30fps / 8MP @ 45fps | 12MP @ 45fps / 8MP @ 60fps |
| Streaming Mode Support | Yes | Yes |
| Memory-to-Memory Support | No | Yes |
| RGB-IR Support | No | Yes (4x4 Array Pattern) |
| High Dynamic Range (HDR) Support | 12-bit | 20-bit |
| Chromatic Aberration Support | Yes | No |
| Statistics Block | Advanced | Auto White Balance (AWB) |
| Output Formats | YUV 420, YUV 422 | YUV 420, YUV 422, YUV 444, RGB 888 |
| S/W Enablement | 3rd Party | NXP Provided Toolchain |
| OS Support | Linux oriented | OS Agnostic S/W Stack |
| S/W Stack | V4L layer provided on top of a native S/W stack | Direct integration into V4L; LibCamera support (default) |
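The 12-bit vs. 20-bit HDR difference in the table can be put in familiar terms: each bit of pixel depth adds roughly 6 dB of representable dynamic range. A small sketch:

```python
import math

def dynamic_range_db(bits):
    # Each bit of pixel depth contributes ~6.02 dB (20*log10(2)) of range.
    return 20 * math.log10(2 ** bits)

sdr_12bit = dynamic_range_db(12)   # i.MX 8M Plus ISP: ~72 dB
hdr_20bit = dynamic_range_db(20)   # i.MX 95 ISP: ~120 dB
```

The jump from 12-bit to 20-bit roughly corresponds to moving from ~72 dB to ~120 dB of scene dynamic range.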
Compute Bound vs Memory Bound
Compute-bound and memory-bound describe what limits the performance of a computational task:
Compute-Bound
A task is compute-bound when its performance is limited by processing power and the number of
computations to be performed. Convolutional neural networks (CNNs) are typically compute-bound
in embedded systems.
Memory-Bound
A task is memory-bound when its performance is limited by the speed and bandwidth of the memory
system. Generative AI workloads with large multi-billion-parameter models are typically
memory-bound in embedded systems.
The size and bandwidth of the available DDR memory therefore determine time-to-first-token (TTFT)
and tokens-per-second (TPS) performance.
© 2025 NXP Semiconductors Inc. 8
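The memory-bound argument above can be made concrete with a back-of-envelope roofline estimate: during decode, every generated token streams the full set of weights from DDR once, so TPS is bounded by bandwidth divided by model footprint. The bandwidth and model figures below are hypothetical, for illustration only:

```python
def decode_tokens_per_sec(params, bytes_per_param, bandwidth_gb_s):
    # Memory-bound LLM decode: each token reads all weights from DDR once,
    # so the upper bound on token rate is bandwidth / model footprint.
    model_bytes = params * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes

# Hypothetical 8 GB/s effective DDR bandwidth, 1B-parameter model at INT4
# (0.5 byte/param) -> at most ~16 tokens/s, regardless of compute TOPS.
tps_bound = decode_tokens_per_sec(1e9, 0.5, 8.0)
```

This is why quantizing to INT4 or adding DDR bandwidth moves TPS directly, while adding raw compute does not once the workload is memory-bound.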
Generative AI and Transformer Models
Transformers and Generative AI are dominating new AI development
What is Generative AI?
• Generative AI refers to deep-learning models that take raw data and “learn” to generate probable outputs when prompted.
• Generative AI focuses on creating new content and data, while traditional AI solves specific tasks with predefined rules.
• Generative AI models are based on the “Transformer” architecture.
How are Convolutional Neural Networks (CNNs) and Transformer models different?
• Transformers require substantially more compute and have lower data or parameter parallelism.
• Transformers require a higher dynamic range of data, which makes them less edge-friendly.
• Transformers need more training data and more training GPU performance to surpass CNN results.
• Transformer models are much larger than typical CNN models.
Transformer acceleration needs substantially more resources than traditional convolutional AI models!
© 2025 NXP Semiconductors Inc. 9
10
NXP to Acquire Kinara
© 2025 NXP Semiconductors Inc.
Discrete NPUs
Two generations capable of a variety of neural networks, incl. advanced generative AI
>500k NPUs shipped to date to bellwether IoT and compute companies
Software Expertise
Enablement for CNN and generative AI applications
Quality and Reliability
Aligned with rigorous industrial quality requirements
California-based technology leader in offering flexible, energy-efficient
discrete NPUs for Industrial and IoT Edge applications.
Two Generations of AI Accelerators Optimized for
Traditional & Generative AI Workloads
Ara-1: Latency optimized for edge applications; 10x Capex/TCO improvement over GPUs;
Generative AI capable; 6 eTOPS*; up to 2GB LPDDR4.
Ara-2: Vision and multi-modal LLMs; computer vision and generative AI optimized;
5-8x performance improvement over Ara-1; up to 40 eTOPS*; up to 16GB LPDDR4.
* eTOPS: equivalent TOPS; a performance comparison is used to derive the value, as the Ara architecture is not a traditional MAC array.
© 2025 NXP Semiconductors Inc. 11
Ara-2 High Level Features
• Up to 40 eTOPS*. 6.5 W typ. power. 17 mm x 17 mm EHS-FCBGA
• Host interface (x86 or ARM) PCIe or USB
• PCIe: Up to 4-lane Gen 3/4 Endpoint. x1, x2 and x4 modes. 16 Gbps per lane
• USB: 3.2 Gen1/2. 10 Gbps. Supports USB Type-C connector. Gen2 also supported
• External DDR memory options: Up to 16 GB density
• 1-2GB for most vision use cases and 4/8/16 GB for Gen AI
• LPDDR4 or 4X
• Single 64-bit or two 32-bit memory devices
• Industrial grade qualified (-40 °C to +85 °C ambient)
© 2025 NXP Semiconductors Inc. 12
* eTOPS: equivalent TOPS; a performance comparison is used to derive the value, as the Ara architecture is not a traditional MAC array.
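The host-interface figures above imply a rough ceiling on how fast data can move to the accelerator. A sketch of the arithmetic, using the 128b/130b line encoding of PCIe Gen 3/4 (protocol and DMA overheads, which reduce this further, are ignored here):

```python
def pcie_line_rate_gbytes(lanes, gbps_per_lane, encoding_efficiency=128 / 130):
    # Raw per-lane line rate, discounted by 128b/130b encoding (Gen 3/4),
    # summed across lanes and converted from gigabits to gigabytes.
    return lanes * gbps_per_lane * encoding_efficiency / 8

# Ara-2's x4 Gen 4 endpoint at 16 Gbps/lane: ~7.9 GB/s before protocol overhead.
x4_gen4 = pcie_line_rate_gbytes(4, 16)
```

Even the USB 3.2 Gen 2 option (10 Gbps, single link) lands near 1.2 GB/s by the same arithmetic, which is why PCIe is the natural pairing for multi-stream vision inputs.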
Why ARA Discrete AI Accelerators
System level features for selecting a discrete AI accelerator:
• Performance and Efficiency: Ara devices reduce the time and energy required for a wide array of AI tasks such as deep learning, large
language models, and multi-modal generative AI models.
• Parallel Processing: Ara devices can handle multiple data streams and multiple concurrent model executions.
• Scalability: Ara accelerators can be scaled to handle larger workloads or expanded AI applications. This scalability ensures that AI
systems can grow and adapt to increasing demands without significant overhauls.
• Memory bandwidth: Ara devices support high-transfer-rate DDR, which is needed to run multi-billion-parameter generative AI models.
• Connectivity: Ara devices support up to 4 lanes of PCIe Gen 3/4 for handling high-bandwidth connections when paired with host
controllers to provide inference on more data inputs. Ara devices also support USB and Ethernet connection options for
flexibility in system design.
• Flexibility: Ara devices are programmable and flexible, allowing newer models and operators to be supported without any
hardware changes.
• SW Enablement: Ara devices are supported by an intelligent AI compiler that automatically determines the most efficient data and
compute flow for any AI graph.
© 2025 NXP Semiconductors Inc. 13
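The parallel-processing point above (multiple data streams, multiple concurrent model executions) can be sketched from the host side as a worker pool feeding the accelerator. Here `run_on_ara` is a hypothetical stand-in for an actual inference call, not a real SDK API:

```python
from concurrent.futures import ThreadPoolExecutor

def run_on_ara(stream_id, frame):
    # Hypothetical stand-in for submitting one frame to the accelerator;
    # here it just tags the frame so the flow can be demonstrated.
    return (stream_id, frame * 2)

def dispatch(frames_by_stream, max_workers=4):
    # Each camera stream is submitted independently, so the accelerator
    # can interleave several concurrent model executions.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_on_ara, sid, frame)
                   for sid, frames in frames_by_stream.items()
                   for frame in frames]
        return [fut.result() for fut in futures]
```

In a real system the pool would wrap per-stream accelerator contexts; the host-side pattern of submitting streams independently stays the same.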
14
eIQ GenAI Flow: Bringing Generative AI to the Edge
© 2025 NXP Semiconductors Inc.
Transformers
Require specific types of optimization to be small and fast enough
for edge devices.
RAG
A secure alternative to fine-tuning: customers’ private knowledge sources
are never passed into the LLM’s training data.
Library of Functional Blocks
Necessary building blocks needed to create real Edge GenAI applications.
Wake Event Engines
Optimized LLM &
MMLM Library
Input sensors
🔊 Audio*
📄 Text
️ Image
🎥 Video
…
Automatic Speech Recognition
RAG Fine Tuning
LLM/MMLM
RAG Database
Text-to-Speech
eIQ® GenAI Flow
Actions
Dashed arrow shows possible pathway using pre-defined intents (no LLM)
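The block diagram above reads as a linear dataflow: sensor input, ASR, retrieval against a private RAG database, LLM/MMLM, then text-to-speech or actions. A minimal sketch of how the stages compose; every stage function here is a hypothetical stub, not an eIQ GenAI Flow API:

```python
# Dataflow sketch of the eIQ GenAI Flow stages (all stubs are hypothetical).

def speech_to_text(audio):
    # ASR stage: in a real pipeline this would run a speech model.
    return audio["transcript"]

def retrieve_context(query, rag_db):
    # RAG stage: private knowledge is pulled in at inference time,
    # never folded into the LLM's training data.
    key = query.split()[0].lower()
    return [doc for doc in rag_db if key in doc.lower()]

def llm_answer(query, context):
    # LLM/MMLM stage: answers grounded in the retrieved documents.
    return f"Answer to '{query}' using {len(context)} retrieved document(s)"

def genai_flow(audio, rag_db):
    query = speech_to_text(audio)
    context = retrieve_context(query, rag_db)
    return llm_answer(query, context)   # a TTS stage would consume this

answer = genai_flow({"transcript": "pump maintenance schedule"},
                    ["Pump manual rev B", "HVAC guide"])
```

The dashed-arrow path in the diagram (pre-defined intents, no LLM) would simply branch after `speech_to_text` and skip the retrieval and LLM stages.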
15
Simple i.MX and Ara Decision Tree
© 2025 NXP Semiconductors Inc.
For guidance only; other i.MX 8M* and i.MX 9* applications processors can also be
selected to pair with Ara dNPU devices.
| Currently using | Application is | Wants to | Recommended path | Additional expansion |
| i.MX 8M Plus | Vision-based classification and detection use case | Extend existing product with more AI capabilities and performance | Add an Ara device on PCIe to reuse existing applications with more AI performance | |
| i.MX 8M Plus | Vision-based classification and detection use case | Design a new product with more AI performance and possibly a higher-resolution camera or more camera sensors | Select i.MX 95 for higher AI inference performance and higher camera pixel throughput | Add an Ara device to the system to extend AI applications with more AI performance |
| New design | Vision-based classification and detection use case | Design a state-of-the-art vision AI system | Select i.MX 95 as the applications processor | Add an Ara-2 device to the system to extend AI applications with more AI performance as needed |
| New design | GenAI for conversational HMI and system health monitoring | Build a solution with generative AI for better system monitoring and operator user experience | Select i.MX 95 as the applications processor; supports <4B-parameter LLMs | Add an Ara-2 device to the system to support >4B-parameter GenAI models |
| New design | GenAI multi-modal video event and scene understanding | Use GenAI models to build applications with vision, audio, and sensor signals | Select i.MX 95 and an Ara-2 device | Add additional Ara-2 devices to the system to extend |
16
Ara AI SW Enablement
© 2025 NXP Semiconductors Inc.
Creates optimal execution plan
AI Compiler automatically determines the most efficient data and compute flow for
any AI graph
Readily support new operators
Fully programmable compute engines with a neural-optimized instruction set
Efficient dataflow for any network architecture type
Software defined Tensor partitioning and routing optimized for dataflow
Extensible compiler
Converts and schedules models ranging from CNNs to complex vision
transformers and Generative AI
Support for multiple datatypes: INT8, INT4, and MSFP16
Utilizes flexible quantization methods
Choose between the Kinara integrated quantizer or TensorFlow Lite and PyTorch pre-quantized networks
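To illustrate what the quantizer choices above do at the simplest level, here is a sketch of symmetric per-tensor INT8 quantization, the most common scheme for the datatypes listed (this is a generic textbook method, not the Kinara quantizer's actual algorithm):

```python
def quantize_int8(values):
    # Symmetric per-tensor quantization: one scale maps the largest
    # |value| onto 127; every value is then rounded and clamped.
    scale = max(abs(v) for v in values) / 127
    quantized = [max(-128, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    # Recover approximate float values for accuracy checks.
    return [q * scale for q in quantized]

q, scale = quantize_int8([-1.0, 0.0, 0.5, 1.0])
```

INT4 works the same way with a range of [-8, 7], trading accuracy for the smaller memory footprint that matters for memory-bound generative models.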
Ara AI SDK combined with the i.MX BSP and eIQ AI SW suite: a solution for immediate engagement.
SW stack: Applications (Vision, Voice, Anomaly detection, Generative AI) → Runtime → Drivers
17
GenAI on the Edge: Cloud Experience on the EDGE
© 2025 NXP Semiconductors Inc.
GenAI on the Edge: Cloud experience on the EDGE
Occupational Health and Safety GenAI Example
© 2025 NXP Semiconductors Inc. 18
Occupational Health and Safety
© 2025 NXP Semiconductors Inc. 19
20
© 2025 NXP Semiconductors Inc.
Why Discrete AI Accelerators
• Leveraging discrete AI accelerators like the Ara-2 offers improvements in several key areas for Edge AI solutions:
• Performance:
• They use specialized architectures that are optimized for AI workloads and provide a path to scale
beyond the native AI performance of i.MX applications processors.
• Scalability:
• These accelerators can be scaled to meet increasing demands, ensuring that systems can grow seamlessly
without necessitating changes to the i.MX applications processor. This scalability is crucial for
accommodating expanding AI applications and workloads with faster time to market.
• Flexibility:
• They adapt to changing processing needs, such as new operators, new models like LLMs, and emerging
paradigms like Agentic AI and Physical AI, providing the versatility needed to handle diverse and
dynamic tasks.
© 2025 NXP Semiconductors Inc. 21
Resources and Links
• AI and Machine Learning at NXP Semiconductors (www.nxp.com/ai)
• eIQ® ML Software Development Environment (www.nxp.com/eiq)
• eIQ GenAI Flow Demonstrator on ACH
(https://mcuxpresso.nxp.com/appcodehub?search=dm-eiq-genai-flow-demonstrator)
• eIQ Neutron Neural Processing Unit (NPU) | NXP Semiconductors
(www.nxp.com/neutron)
• Kinara AI Accelerators (www.kinara.ai)
© 2025 NXP Semiconductors Inc. 22

More Related Content

More from Edge AI and Vision Alliance (20)

PDF
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
PDF
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
Edge AI and Vision Alliance
 
PDF
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
Edge AI and Vision Alliance
 
PDF
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
Edge AI and Vision Alliance
 
PDF
“Image Tokenization for Distributed Neural Cascades,” a Presentation from Goo...
Edge AI and Vision Alliance
 
PDF
“Key Requirements to Successfully Implement Generative AI in Edge Devices—Opt...
Edge AI and Vision Alliance
 
PDF
“Bridging the Gap: Streamlining the Process of Deploying AI onto Processors,”...
Edge AI and Vision Alliance
 
PDF
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
Edge AI and Vision Alliance
 
PDF
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
Edge AI and Vision Alliance
 
PDF
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
Edge AI and Vision Alliance
 
PDF
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
Edge AI and Vision Alliance
 
PDF
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
Edge AI and Vision Alliance
 
PDF
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
Edge AI and Vision Alliance
 
PDF
“OAAX: One Standard for AI Vision on Any Compute Platform,” a Presentation fr...
Edge AI and Vision Alliance
 
PDF
“Improved Data Sampling Techniques for Training Neural Networks,” a Presentat...
Edge AI and Vision Alliance
 
PDF
“Cost-efficient, High-quality AI for Consumer-grade Smart Home Cameras,” a Pr...
Edge AI and Vision Alliance
 
PDF
“Edge AI Optimization on Rails—Literally,” a Presentation from Wabtec
Edge AI and Vision Alliance
 
PDF
“How Large Language Models Are Impacting Computer Vision,” a Presentation fro...
Edge AI and Vision Alliance
 
PDF
“Implementing AI/Computer Vision for Corporate Security Surveillance,” a Pres...
Edge AI and Vision Alliance
 
PDF
“Continual Learning thru Sequential, Lightweight Optimization,” a Presentatio...
Edge AI and Vision Alliance
 
“MPU+: A Transformative Solution for Next-Gen AI at the Edge,” a Presentation...
Edge AI and Vision Alliance
 
“Evolving Inference Processor Software Stacks to Support LLMs,” a Presentatio...
Edge AI and Vision Alliance
 
“Efficiently Registering Depth and RGB Images,” a Presentation from eInfochips
Edge AI and Vision Alliance
 
“How to Right-size and Future-proof a Container-first Edge AI Infrastructure,...
Edge AI and Vision Alliance
 
“Image Tokenization for Distributed Neural Cascades,” a Presentation from Goo...
Edge AI and Vision Alliance
 
“Key Requirements to Successfully Implement Generative AI in Edge Devices—Opt...
Edge AI and Vision Alliance
 
“Bridging the Gap: Streamlining the Process of Deploying AI onto Processors,”...
Edge AI and Vision Alliance
 
“From Enterprise to Makers: Driving Vision AI Innovation at the Extreme Edge,...
Edge AI and Vision Alliance
 
“Addressing Evolving AI Model Challenges Through Memory and Storage,” a Prese...
Edge AI and Vision Alliance
 
“Why It’s Critical to Have an Integrated Development Methodology for Edge AI,...
Edge AI and Vision Alliance
 
“Solving Tomorrow’s AI Problems Today with Cadence’s Newest Processor,” a Pre...
Edge AI and Vision Alliance
 
“State-space Models vs. Transformers for Ultra-low-power Edge AI,” a Presenta...
Edge AI and Vision Alliance
 
“How Qualcomm Is Powering AI-driven Multimedia at the Edge,” a Presentation f...
Edge AI and Vision Alliance
 
“OAAX: One Standard for AI Vision on Any Compute Platform,” a Presentation fr...
Edge AI and Vision Alliance
 
“Improved Data Sampling Techniques for Training Neural Networks,” a Presentat...
Edge AI and Vision Alliance
 
“Cost-efficient, High-quality AI for Consumer-grade Smart Home Cameras,” a Pr...
Edge AI and Vision Alliance
 
“Edge AI Optimization on Rails—Literally,” a Presentation from Wabtec
Edge AI and Vision Alliance
 
“How Large Language Models Are Impacting Computer Vision,” a Presentation fro...
Edge AI and Vision Alliance
 
“Implementing AI/Computer Vision for Corporate Security Surveillance,” a Pres...
Edge AI and Vision Alliance
 
“Continual Learning thru Sequential, Lightweight Optimization,” a Presentatio...
Edge AI and Vision Alliance
 

Recently uploaded (20)

PDF
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
PDF
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PPTX
Mastering ODC + Okta Configuration - Chennai OSUG
HathiMaryA
 
PPT
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
PDF
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
DOCX
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
PPTX
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
PDF
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
PDF
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
PDF
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
PPTX
Digital Circuits, important subject in CS
contactparinay1
 
PDF
NASA A Researcher’s Guide to International Space Station : Physical Sciences ...
Dr. PANKAJ DHUSSA
 
PDF
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PDF
Staying Human in a Machine- Accelerated World
Catalin Jora
 
PPTX
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
PPTX
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit
 
PDF
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
AI Agents in the Cloud: The Rise of Agentic Cloud Architecture
Lilly Gracia
 
How do you fast track Agentic automation use cases discovery?
DianaGray10
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
Mastering ODC + Okta Configuration - Chennai OSUG
HathiMaryA
 
Ericsson LTE presentation SEMINAR 2010.ppt
npat3
 
Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
Python coding for beginners !! Start now!#
Rajni Bhardwaj Grover
 
From Sci-Fi to Reality: Exploring AI Evolution
Svetlana Meissner
 
Go Concurrency Real-World Patterns, Pitfalls, and Playground Battles.pdf
Emily Achieng
 
The 2025 InfraRed Report - Redpoint Ventures
Razin Mustafiz
 
Automating Feature Enrichment and Station Creation in Natural Gas Utility Net...
Safe Software
 
Digital Circuits, important subject in CS
contactparinay1
 
NASA A Researcher’s Guide to International Space Station : Physical Sciences ...
Dr. PANKAJ DHUSSA
 
🚀 Let’s Build Our First Slack Workflow! 🔧.pdf
SanjeetMishra29
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
Staying Human in a Machine- Accelerated World
Catalin Jora
 
Designing_the_Future_AI_Driven_Product_Experiences_Across_Devices.pptx
presentifyai
 
AI Penetration Testing Essentials: A Cybersecurity Guide for 2025
defencerabbit
 
Bitcoin for Millennials podcast with Bram, Power Laws of Bitcoin
Stephen Perrenod
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
Ad

“Scaling i.MX Applications Processors’ Native Edge AI with Discrete AI Accelerators,” a Presentation from NXP Semiconductors

  • 1. Scaling i.MX Applications Processors with Discrete AI Accelerators Ali O. Ors Global Director, AI Strategy and Technologies NXP Semiconductors
  • 2. AI Spans from Training in the Cloud to Inference at the Edge 2 Cloud AI (Microsoft, Amazon, Google, Nvidia) Edge AI (NXP opportunity) Desktop Cloud Local Servers Industrial Automation In-Cabin & ADAS Autonomous Home AI DEVELOPMENT TRAINING AI DEPLOYMENT INFERENCE Building & Energy © 2025 NXP Semiconductors Inc. NXP’s strength in processing gives us a unique opportunity to shape the deployment of AI at the edge.
  • 3. Intelligent edge systems enabled by NXP © 2025 NXP Semiconductors Inc. 3 eIQ® Neutron NPU Highly scalable and optimized integrated dedicated AI acceleration eIQ® Toolkit AI/ML software toolkit for model Creation, optimization and porting Engaging with our customers to develop system solutions and solve challenges together MCX MCUs i.MX Apps Processors and beyond Expansive processor portfolio i.MX RT Crossover MCUs AI co- processor NPU eIQ® Time Series Studio Automated ML model creation from sensor signals eIQ® GenAI Flow Context aware generative AI application development Differentiated HW and SW enablement
  • 4. © 2025 Your Company Name 4 Compute • Linux & Android OS • up to 4 Arm Cortex- A53 cores at 1.8GHz • Embedded real-time M4 CPU • 32-bit LPDDR4/DDR4/DDDR3 L Memory Intelligence • CPU-based Neural Net enabled via NXP eIQ Toolkit Platform • Commercial & Industrial Temperature Qualification • 10-year longevity • Secure Boot Visualization • 8 GFLOPs 3D GPU • OpenGL ES 2.0, Vulkan, OpenCL 1.2 • 2D GPU • MIPI-DSI 1080P60 Display Vision • Up to 4 cameras with MIPI-CSI virtual lanes • 1080P video encode/decode • Pixel Compositor Connectivity • PCIe Gen. 2 x1 Lane • 1G Ethernet • 2x USB 2.0 • 3x SD/eMMC Compute • Linux & Android OS • up to 4 Arm Cortex- A53 cores at 1.8GHz • Embedded real-time M7 CPU • 32-bit LPDDR4/DDR4/DDDR3 L Memory Intelligence • Neural Net Acceleration by embedded VSI VIP8000 NPU, GPU, and CPU • Enabled via NXP eIQ Toolkit Platform • Commercial & Industrial Temperature Qualification • 10-year longevity • Secure Boot plus Cryptographic Accelerator Visualization • 16 GFLOPs 3D GPU • OpenGL ES 3.1, Vulkan, OpenCL 1.2 • 1.3 Gpixel/s 2D GPU • MIPI-DSI/ LVDS/ HDMI 1080P60 Display Vision • Up to 4 cameras with MIPI-CSI virtual lanes • 375 Mpixel/s Image Signal Processor • 12MP @ 30fps / 8MP @ 45fps • 1080P video encode/decode Connectivity • PCIe Gen. 3 x1 Lane • 2x 1G Ethernet (1 w/TSN) • USB 3.0 + 2.0 • 3x SD/eMMC • 2x CAN-FD Compute • Linux & Android OS • up to 6 Arm Cortex- A55 cores at 1.8GHz • Embedded real-time M7 CPU • NXP SafeAssure Safety Domain • 32-bit LPDDR5 /LPDDR4X Memory Intelligence • Embedded NXP eIQ® Neutron 1024S NPU • up to 3x more AI acceleration than 8M Plus • eIQ support for NPU, GPU, & CPU • LLM and VLM support Platform • Extended Industrial Temp. Qual. 
• 15-year longevity • EdgeLock Secure Enclave + V2X Cryptographic Accelerator Visualization • 64 GFLOPs Arm Mali 3D GPU • OpenGL ES 3.2, Vulkan 1.2, OpenCL 3.0 • 3 Gpixel/s 2D GPU • 4K30P MIPI-DSI + 2x 1080P30 LVDS/ triple Display Vision • Up to 8 cameras with MIPI-CSI virtual lanes • 500 Mpixel/s ISP with RGB-IR • 12MP @ 45fps / 8MP @ 60fps • 4K60P vid. codec • Safe 2D display pipeline Connectivity • 2x PCIe Gen. 3 x1 Lanes • 1x 10G Eth. (w/TSN) • 2x 1G Eth. (w/TSN) • USB 3.0 + 2.0 • 3x SD/eMMC • 5x CAN-FD i.MX 8M Mini Essential HMI & Vision Platform i.MX 8M Plus Powerful HMI & Vision Platform with Edge AI & Industrial Connectivity i.MX 95 Family Advanced HMI & Vision Platform with Safety, Security, and Next-Gen Edge AI
  • 5. 5 NXP i.MX 95 Family for Automotive Edge, Industrial, & IoT © 2025 NXP Semiconductors Inc. Notes and sources Safety Intuitive Decisions Connect & Secure Visualize & Act Ditch the hypervisor and simplify building safety capable platforms with the first-generation on-die i.MX functional safety framework. Featuring NXP Safety Manager, Safety Documentation, & NXP Professional support to enable ISO26262 (ASIL-B) / IEC61508 (SIL-2) computing platforms, including 2D display pipeline. Deliver increased accessibility and augment complex interfaces with Generative AI-enhanced voice command & control with the first i.MX applications processor to integrate the new, efficient NXP eIQ® Neutron neural processing unit. Responsive HMI for IoT, Industrial, and Automotive applications are easily created with NXPs partner ecosystem, unlocked by a powerful modern 3D graphics processor combined with strong, efficient hexacore application processor performance. Build secure, private applications with peace of mind based on the combined capabilities of integrated security and authentication acceleration, including post-quantum cryptographic capabilities, and lifecycle management. Connectivity Leadership: UWB, Wi-Fi, NFC, RFID, & BT Co-Developed Platforms: PMIC, Wi-Fi, Sensors, & More Deep Application Insights: 26,000 Customers & Growing
  • 6. 6 i.MX 95 Vision Processing Pipeline © 2025 NXP Semiconductors Inc. Up to Single 12 MP high resolution camera - 4096x3072p30 / 3820x2160p60
  • 7. 7 i.MX 95 and i.MX 8M Plus ISP © 2025 NXP Semiconductors Inc. Schedule, features, and enablement subject to change without notice. For informational purposes only. Specification/Feature i.MX8M Plus i.MX 95 Pixel Throughput 375 Megapixels/Sec 500 Megapixels/Sec Image Resolution 12MP @ 30fps 8MP @ 45fps 12MP @ 45 fps 8MP @ 30 fps Streaming Mode Support Yes Yes Memory-to-Memory Support No Yes RGB-IR Support No Yes (4x4 Array Pattern) High Dynamic Range (HDR) Support 12-bit 20-bit Chromatic Aberration Support Yes No Statistics Block Advanced Auto White Balance (AWB) Output Formats YUV 420 YUV 422 YUV 420, YUV 422 YUV 444, RGB 888 S/W Enablement 3rd Party NXP Provided Toolchain OS Support Linux oriented OS Agnostic S/W Stack S/W Stack V4L Layer provided on top of a native S/W Stack Direct Integration into V4L LibCamera support (Default)
  • 8. Compute Bound vs Memory Bound Compute-bound and memory-bound are terms used to describe the limitations of a computational task based on different factors: Compute-Bound A task is considered compute-bound when its performance is limited by processing power and the number of computations that need to be performed. Convolution Neural networks, CNNs are typically compute bound in embedded systems. Memory-Bound A task is considered memory-bound when its performance is limited by the speed and bandwidth of the memory system. Generative AI workloads with large multi-billion parameter models are typically memory-bound in embedded systems. So the size and bandwidth of DDR memory available determines the time to first token (TTFT) and token per second (TPS) performance. © 2025 NXP Semiconductors Inc. 8
  • 9. Generative AI and Transformer Models Transformers and Generative AI dominating new AI development What is Generative AI? • Generative AI refers to deep-learning models that can take raw data and “learn” to generate probable outputs when prompted. • Generative AI focuses on creating new content and data, while Traditional AI solves specific tasks with predefined rules. • Generative AI Models are based on the “Transformer” architecture How are Convolutional Neural Networks (CNNs) and Transformer Models different? • Transformers require substantially more compute and have lower data or parameter parallelism. • Transformers require a higher dynamic range of data which makes them less edge friendly. • Transformers need more training data and training GPU performance to surpass CNN results. • Transformer models are much larger than typical CNN models. Transformer acceleration needs substantially more resources than more traditional convolutional AI models! © 2025 NXP Semiconductors Inc. 9
  • 10. 10 NXP to Acquire Kinara © 2025 NXP Semiconductors Inc. Discrete NPUs Two generations capable of a variety of neural networks, incl. advanced generative AI >500k NPUs shipped to date Bellwether IoT and compute companies Software Expertise Enablement for CNN and generative AI applications Quality and reliability Aligned with rigorous industrial quality requirements California-based technology leader in offering flexible, energy-efficient discrete NPUs for Industrial and IoT Edge applications.
  • 11. Two Generations of AI Accelerators Optimized for Traditional & Generative AI Workloads Ara-1 Ara-2 Ara-1 Ara-2: Vision, Multi-modal LLMs Latency Optimized for Edge Applications 10x Capex/TCO improvement over GPUs Generative AI Capable 6 eTOPs. Up to 2GB LPDDR4 Computer Vision, Generative AI optimized! 5-8X Performance improvement over Ara-1 Up to 40 eTOPs Up to 16GB LPDDR4 * eTOPS: equivalent TOPs , performance comparison used to derive value as the ARA architecture is not a traditional MAC Array © 2025 NXP Semiconductors Inc. 11
• 12. Ara-2 High-Level Features
• Up to 40 eTOPS*; 6.5 W typical power; 17 mm x 17 mm EHS-FCBGA
• Host interface (x86 or Arm): PCIe or USB
• PCIe: up to 4-lane Gen 3/4 endpoint; x1, x2 and x4 modes; 16 Gbps per lane
• USB: 3.2 Gen 1/2, 10 Gbps; supports USB Type-C connector
• External DDR memory options: up to 16 GB density
• 1-2 GB for most vision use cases; 4/8/16 GB for generative AI
• LPDDR4 or LPDDR4X
• Single 64-bit or two 32-bit memory devices
• Industrial grade qualified (-40 °C to +85 °C ambient)
© 2025 NXP Semiconductors Inc. 12
* eTOPS: equivalent TOPS; a performance comparison is used to derive the value, as the Ara architecture is not a traditional MAC array
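The split between 1-2 GB for vision and 4/8/16 GB for generative AI follows directly from weights-only memory arithmetic. A minimal sketch (illustrative arithmetic, not a vendor sizing tool):

```python
# Weights-only memory footprint of a model at a given precision.
# Ignores KV cache, activations and runtime overhead, which add more.

def weight_footprint_gb(params_billion, bits_per_weight):
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# A 7B-parameter LLM needs ~3.5 GB at INT4 and ~7 GB at INT8, so it
# only fits the larger DDR options; a ~10M-parameter vision CNN at
# INT8 is a few tens of MB and fits easily in 1-2 GB.
assert weight_footprint_gb(7, 4) == 3.5
assert weight_footprint_gb(7, 8) == 7.0
assert weight_footprint_gb(0.01, 8) < 0.1
```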
• 13. Why Ara Discrete AI Accelerators
System-level features for selecting a discrete AI accelerator:
• Performance and efficiency: Ara devices reduce the time and energy required for a wide array of AI tasks, such as deep learning, large language models and multi-modal generative AI models.
• Parallel processing: Ara devices can handle multiple data streams and multiple concurrent model executions.
• Scalability: Ara accelerators can be scaled to handle larger workloads or expanded AI applications. This scalability ensures that AI systems can grow and adapt to increasing demands without significant overhauls.
• Memory bandwidth: Ara devices support high-transfer-rate DDR, which is needed to run multi-billion-parameter generative AI models.
• Connectivity: Ara devices support up to 4 lanes of PCIe Gen 3/4 for handling high-bandwidth connections when pairing with host controllers to provide inference on more data inputs. Ara devices also support USB and Ethernet connection options for flexibility in system design.
• Flexibility: Ara devices are programmable, allowing newer models and operators to be supported without any hardware changes.
• SW enablement: Ara devices are supported by an intelligent AI compiler that automatically determines the most efficient data and compute flow for any AI graph.
© 2025 NXP Semiconductors Inc. 13
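The memory-bandwidth point above can be made concrete with a standard roofline-style bound (illustrative numbers, not Ara specifications): autoregressive decode reads every weight once per generated token, so DDR bandwidth caps token throughput no matter how many TOPS the compute engines provide.

```python
# Upper bound on LLM decode throughput set purely by memory bandwidth.
# Assumes compute is free and weight traffic dominates -- a common
# first-order model for single-stream autoregressive generation.

def max_tokens_per_sec(model_size_gb, bandwidth_gb_per_s):
    return bandwidth_gb_per_s / model_size_gb

# A 3.5 GB (7B-parameter INT4) model on a hypothetical 25 GB/s memory
# system cannot exceed ~7 tokens/s, regardless of accelerator TOPS.
rate = max_tokens_per_sec(3.5, 25.0)
assert 7.0 < rate < 7.2
```

This is why high-transfer-rate DDR matters as much as raw compute for multi-billion-parameter generative AI.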
• 14. eIQ GenAI Flow: Bringing Generative AI to the Edge
• Transformers: require specific types of optimization to be small and fast enough for edge devices.
• RAG: a secure method of fine-tuning; customers' private knowledge sources aren't passed into the LLM training data.
• Library of functional blocks: the building blocks needed to create real edge GenAI applications.
[Diagram: eIQ® GenAI Flow pipeline. Input sensors (audio*, text, image, video, ...) feed wake event engines and auto-speech-recognition; RAG fine-tuning with a RAG database feeds an optimized LLM/MMLM library and text-to-speech, leading to actions. A dashed arrow shows a possible pathway using pre-defined intents (no LLM).]
© 2025 NXP Semiconductors Inc. 14
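The RAG idea in eIQ GenAI Flow can be illustrated with a minimal sketch (this is a concept illustration only, not the eIQ GenAI Flow API): private documents stay in a local database, relevant passages are retrieved at inference time, and the context is spliced into the prompt rather than baked into model weights.

```python
# Toy RAG sketch. A real system uses embedding similarity and a vector
# database; word overlap stands in for retrieval scoring here.

def score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def retrieve(query, docs, k=1):
    # Return the k most relevant private documents for this query.
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query, docs):
    # Splice retrieved context into the prompt at inference time; the
    # knowledge source never enters the LLM's training data.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

kb = ["Forklift operators must wear a hard hat in zone A.",
      "Visitors sign in at the front desk."]
prompt = build_prompt("What must forklift operators wear?", kb)
assert "hard hat" in prompt
```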
• 15. Simple i.MX and Ara Decision Tree
For guidance only; other i.MX 8M* and i.MX 9* applications processors can also be paired with Ara DNPU devices.
• Currently using i.MX 8M Plus, vision-based classification and detection: to extend an existing product with more AI capability and performance, add an Ara device on PCIe to reuse existing applications with more AI performance.
• Currently using i.MX 8M Plus, vision-based classification and detection: for a new product with more AI performance and possibly a higher-resolution camera or more camera sensors, select i.MX 95 for higher AI inference performance and camera pixel throughput; add an Ara device to the system to extend AI applications with more AI performance.
• New design, vision-based classification and detection: for a state-of-the-art vision AI system, select i.MX 95 as the applications processor; add an Ara-2 device to the system to extend AI applications with more AI performance as needed.
• New design, GenAI for conversational HMI and system health monitoring: to build a solution with generative AI for better system monitoring and operator user experience, select i.MX 95 as the applications processor (supports <4B-parameter LLMs); add an Ara-2 device to support >4B-parameter GenAI models.
• New design, GenAI multi-modal video event and scene understanding: to build applications using GenAI models with vision, audio and sensor signals, select i.MX 95 and an Ara-2 device; add an additional Ara-2 device to the system to extend.
© 2025 NXP Semiconductors Inc. 15
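The decision tree above can be sketched as a small lookup function. The use-case names and the 4B-parameter threshold come from the slide; the function shape, argument names and return strings are an illustrative sketch, not an official selector tool.

```python
# Guidance-only sketch of the i.MX / Ara selection logic.

def recommend(use_case, currently_using=None, llm_params_b=0):
    if use_case == "vision":
        if currently_using == "i.MX 8M Plus":
            # Extend an existing product: keep the host, add Ara on PCIe.
            return "Add Ara on PCIe to reuse the existing application"
        return "i.MX 95, plus Ara-2 as needed"
    if use_case == "genai":
        # Per the slide, i.MX 95 alone covers LLMs under ~4B parameters.
        if llm_params_b < 4:
            return "i.MX 95"
        return "i.MX 95 + Ara-2"
    raise ValueError(f"unknown use case: {use_case}")

assert recommend("vision", currently_using="i.MX 8M Plus").startswith("Add Ara")
assert recommend("genai", llm_params_b=7) == "i.MX 95 + Ara-2"
```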
• 16. Ara AI SW Enablement
• Creates an optimal execution plan: the AI compiler automatically determines the most efficient data and compute flow for any AI graph.
• Readily supports new operators: fully programmable compute engines with a neural-optimized instruction set.
• Efficient dataflow for any network architecture type: software-defined tensor partitioning and routing optimized for dataflow.
• Extensible compiler: converts and schedules models ranging from CNNs to complex vision transformers and generative AI.
• Support for multiple datatypes: INT8, INT4 and MSFP16.
• Flexible quantization methods: choose between the Kinara integrated quantizer or TensorFlow Lite and PyTorch pre-quantized networks.
• The Ara AI SDK, combined with the i.MX BSP and the eIQ AI SW suite, provides a solution for immediate engagement: drivers, runtime and applications for vision, voice, anomaly detection and generative AI.
© 2025 NXP Semiconductors Inc. 16
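To make the INT8/INT4 quantization bullet concrete, here is a minimal symmetric per-tensor quantization sketch (a concept illustration only; the Kinara integrated quantizer and the TensorFlow Lite/PyTorch flows differ in detail and typically use per-channel scales and calibration):

```python
# Symmetric per-tensor INT8 quantization: map floats into [-128, 127]
# with a single scale derived from the largest-magnitude value.

def quantize_int8(values):
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.01, 1.0]
q, s = quantize_int8(weights)
approx = dequantize(q, s)
# Rounding error per weight is bounded by half a quantization step.
assert all(abs(a - b) <= s / 2 for a, b in zip(weights, approx))
```

INT4 works the same way with a 16x coarser grid, which is why lower-precision datatypes trade a little accuracy for a large cut in memory footprint and bandwidth.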
• 17. GenAI on the Edge: Cloud Experience on the Edge © 2025 NXP Semiconductors Inc. 17
• 18. GenAI on the Edge: Cloud Experience on the Edge. Occupational Health and Safety GenAI Example © 2025 NXP Semiconductors Inc. 18
  • 19. Occupational Health and Safety © 2025 NXP Semiconductors Inc. 19
  • 20. 20 © 2025 NXP Semiconductors Inc.
• 21. Why Discrete AI Accelerators
Leveraging discrete AI accelerators like the Ara-2 offers improvements in several key areas for edge AI solutions:
• Performance: they use specialized architectures optimized for AI workloads and provide a path to scale beyond the native AI performance of i.MX applications processors.
• Scalability: these accelerators can be scaled to meet increasing demands, ensuring that systems can grow seamlessly without changes to the i.MX applications processor. This scalability is crucial for accommodating expanding AI applications and workloads with faster time to market.
• Flexibility: they can adapt to changing processing needs, such as new operators, new model types like LLMs, and emerging paradigms like agentic AI and physical AI, providing the versatility needed to handle diverse and dynamic tasks.
© 2025 NXP Semiconductors Inc. 21
• 22. Resources and Links
• AI and Machine Learning at NXP Semiconductors (www.nxp.com/ai)
• eIQ® ML Software Development Environment (www.nxp.com/eiq)
• eIQ GenAI Flow Demonstrator on ACH (https://mcuxpresso.nxp.com/appcodehub?search=dm-eiq-genai-flow-demonstrator)
• eIQ Neutron Neural Processing Unit (NPU) | NXP Semiconductors (www.nxp.com/neutron)
• Kinara AI Accelerators (www.kinara.ai)
© 2025 NXP Semiconductors Inc. 22