Reducing Deep Learning Integration Costs and Maximising Compute Efficiency for Multiple AI Hardware
Jianhui Li
Principal Engineer, Intel
Deep Learning Trends
• Deep Learning Steps: Training, Inference
• Data Precision: FP32, BFloat16, INT8
• Topologies: Computer Vision (ResNet-50, SqueezeNet, MobileNet), Natural Language Processing (GNMT, BERT), Recommendation Systems (NCF, Wide & Deep), Reinforcement Learning (MiniGo)
• Frameworks: Diverse and rapidly evolving
The driving forces of AI Optimization
• Diversifying AI application: Recommendation Engine, Natural Language Processing, Computer Vision, each containing conv (conv: General Matrix Multiply)
• Hardware Acceleration for AI: CPU + DL Acceleration, GPU + DL Acceleration, Accelerators
Deep learning workload time breakdown
• Accelerating matrix multiplication alone doesn't solve the problem
• Conv and Matmul operations are less dominant beyond computer vision applications
• Low precision introduces memory-bound quantize operations
• Amdahl's law: the part of the workload that is not accelerated caps the overall speedup
• Need to have aggressive fusion
*Profiling data collected from internal performance study
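The Amdahl's law point above can be made concrete with a short calculation. This is a minimal sketch: the 60% conv/matmul share is a made-up figure for illustration, not a number from the profiling study, and `amdahl_speedup` is a helper name introduced here.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when `accelerated_fraction` of the runtime becomes `factor` times faster."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


# Hypothetical workload: 60% of the time in conv/matmul, 40% in everything else.
for factor in (2, 10, 100):
    overall = amdahl_speedup(0.6, factor)
    print(f"matmul {factor}x faster -> overall {overall:.2f}x")
```

Under this assumed split, the overall speedup can never reach 2.5x however fast the matrix multiplication becomes, which is why the remaining operations have to be fused or accelerated as well.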
Accelerating Matrix Multiplication
[Figure: two panels. "Dot product": Matrix A (M × K) multiplied by Matrix B (K × N) gives Matrix C. "Dot product with matrix operation": the same multiplication, annotated with a potential fusion function.]
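One way to read the "potential fusion function" is as an element-wise operation applied to each result element while it is still at hand, instead of in a second pass over Matrix C. The sketch below assumes that reading. It uses plain Python lists for clarity rather than speed, and `matmul_fused` and its `epilogue` parameter are names introduced here for illustration.

```python
from typing import Callable, List

Matrix = List[List[float]]


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def matmul_fused(a: Matrix, b: Matrix,
                 epilogue: Callable[[float], float] = lambda v: v) -> Matrix:
    """Compute C = epilogue(A @ B), applying the epilogue as each element is produced."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B must match")
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # dot product of row i of A and column j of B
            c[i][j] = epilogue(acc)       # fused function: no second pass over C
    return c


a = [[1.0, -2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
print(matmul_fused(a, identity, relu))
```

Applying the function inside the loop avoids writing the unfused result to memory and reading it back, which is the saving that fusion targets.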
Performance Library Integration
[Diagram: a Framework Graph (nodes 1 to 4) passes through the Pattern Matcher and Graph Rewriter to the Framework Runtime, which reaches the Performance Library through a kernel wrapper and the Function API.]
• Function API: Matmul, Activation, Norm, RNN
• Extend Function API to support Fusion: Matmul+Relu, Conv+Relu, Gelu
• The Performance Library implements DNN ops and fused ops and exposes them through function APIs
• Fused ops are dispatched to registered library functions at Framework Runtime
• The framework (FW) pattern matcher is enhanced to replace each matched subgraph with one fused op backed by library functions
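The integration flow on this slide (match a pattern, rewrite the subgraph into one fused op, dispatch it to a registered library function) can be sketched in a few lines. This is a toy model, not any framework's real pass: `Node`, `fuse_matmul_relu`, `LIBRARY`, and the `lib_*` names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Node:
    name: str
    op: str
    inputs: List[str]


def fuse_matmul_relu(graph: List[Node]) -> List[Node]:
    """Replace each matmul -> relu pair with a single 'matmul_relu' node.

    `graph` is assumed to be in topological order.
    """
    by_name = {n.name: n for n in graph}
    consumers: Dict[str, int] = {}
    for n in graph:
        for i in n.inputs:
            consumers[i] = consumers.get(i, 0) + 1

    fused_away = set()
    out: List[Node] = []
    for n in graph:
        producer = by_name.get(n.inputs[0]) if n.inputs else None
        if (n.op == "relu" and producer is not None
                and producer.op == "matmul" and consumers[producer.name] == 1):
            fused_away.add(producer.name)
            # The fused node keeps the relu's name so downstream edges stay valid.
            out.append(Node(n.name, "matmul_relu", list(producer.inputs)))
        else:
            out.append(n)
    return [n for n in out if n.name not in fused_away]


# Hypothetical registry: op type -> library function exposed through the function API.
LIBRARY: Dict[str, str] = {
    "matmul": "lib_matmul",
    "relu": "lib_relu",
    "softmax": "lib_softmax",
    "matmul_relu": "lib_matmul_relu",
}


def dispatch(graph: List[Node]) -> List[str]:
    """Runtime step: look up the registered library function for each op."""
    return [LIBRARY[n.op] for n in graph]


graph = [
    Node("mm", "matmul", ["x", "w"]),
    Node("act", "relu", ["mm"]),
    Node("out", "softmax", ["act"]),
]
fused = fuse_matmul_relu(graph)
print(dispatch(graph))
print(dispatch(fused))
```

Every new fusion needs its own matcher, its own fused op in the library, and its own registry entry, and that repetition is the integration cost the talk sets out to reduce.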
Limitation of Pattern Match
• Pattern too rigid to match the input graphs
[Figure: two different framework graph representations for Gelu, each passed as a graph to be matched to a single Gelu op.]
• Small patterns miss optimizations that span a large graph
[Figure: a chain of three conv + relu pairs, shown twice. In the first, the input and Output0, Output1, Output2 are all NHWC. In the second, the input and Output2 are NHWC while Output0 and Output1 use a blocked layout.]
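The rigidity problem can be illustrated with GELU itself, which has an exact erf form and a widely used tanh approximation. The slide does not say which decompositions the two framework graphs use, so the op sequences below are assumptions chosen for illustration, and the exact-match `matches` helper is a deliberately naive stand-in for a pattern matcher.

```python
import math
from typing import List


def gelu_erf(x: float) -> float:
    """Exact form: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation: 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# One possible primitive-op sequence for each form (assumed, not taken from the slide).
ERF_GRAPH: List[str] = ["div", "erf", "add", "mul", "mul"]
TANH_GRAPH: List[str] = ["pow", "mul", "add", "mul", "tanh", "add", "mul", "mul"]


def matches(pattern: List[str], ops: List[str]) -> bool:
    """A rigid matcher: the op sequence must equal the pattern exactly."""
    return ops == pattern


print(matches(ERF_GRAPH, ERF_GRAPH))    # the pattern it was written for
print(matches(ERF_GRAPH, TANH_GRAPH))   # same activation, different graph
print(abs(gelu_erf(1.0) - gelu_tanh(1.0)))
```

Both graphs compute nearly the same activation, yet a matcher written for one never fires on the other, so each framework representation needs its own pattern.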
oneDNN is evolving…
• Graph API allows HW backend to maximize performance
• Same integration for multiple AI HW: CPU, GPU, and accelerators
[Diagram: Today, deep learning frameworks call oneDNN through the Primitives API. In the future, they call it through the Primitives API + Graph API. Hardware targets shown: CPU + DL Acceleration, GPU + DL Acceleration, HW Accel.]
oneDNN Graph API
[Diagram: the DL framework passes its Framework Graph (nodes 1 to 4) to the oneDNN Graph API; the oneDNN Graph backend decides the partitions, which the framework uses in its graph rewrite and runs from the Framework Runtime Context.]
• add_op(): forming graph
• get_partitions(): backend decides partition
• compile(): backend compiles partition
• execute(): backend executes compiled partition
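The four calls named on this slide can be walked through with a toy model. The classes below only borrow the method names `add_op`, `get_partitions`, `compile`, and `execute`. They are hypothetical stand-ins and do not reproduce the real oneDNN Graph API signatures; the model also simplifies the graph to a chain of unary ops on a single number.

```python
from typing import Callable, Dict, List

# Ops this toy backend knows how to run (an assumption for the example).
SUPPORTED: Dict[str, Callable[[float], float]] = {
    "scale2": lambda v: 2.0 * v,
    "relu": lambda v: max(v, 0.0),
}


class ToyCompiledPartition:
    def __init__(self, fns: List[Callable[[float], float]]):
        self._fns = fns

    def execute(self, value: float) -> float:
        """Backend executes the compiled partition."""
        for fn in self._fns:
            value = fn(value)
        return value


class ToyPartition:
    def __init__(self, ops: List[str]):
        self.ops = ops

    def compile(self) -> ToyCompiledPartition:
        """Backend compiles the partition (here: resolves each op to a function)."""
        return ToyCompiledPartition([SUPPORTED[op] for op in self.ops])


class ToyGraph:
    def __init__(self) -> None:
        self._ops: List[str] = []

    def add_op(self, op: str) -> None:
        """Framework forms the graph one op at a time."""
        self._ops.append(op)

    def get_partitions(self) -> List[ToyPartition]:
        """Backend decides partitions: maximal runs of consecutive supported ops."""
        partitions: List[ToyPartition] = []
        run: List[str] = []
        for op in self._ops:
            if op in SUPPORTED:
                run.append(op)
            elif run:
                partitions.append(ToyPartition(run))
                run = []
        if run:
            partitions.append(ToyPartition(run))
        return partitions


g = ToyGraph()
for op in ["scale2", "relu", "framework_only_op", "relu"]:
    g.add_op(op)

parts = g.get_partitions()
print([p.ops for p in parts])
compiled = parts[0].compile()
print(compiled.execute(-3.0), compiled.execute(1.5))
```

The framework never names a fusion pattern in this flow. It hands over the ops and the backend chooses what to group, which is the design point that separates the Graph API from pattern matching inside the framework.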
oneDNN Graph API Usage
• Unified API for DL acceleration libraries targeting AI HWs
• Leverage oneDNN based framework integration and oneDNN implementation: CPU (Intel®, ARM), GPU (Intel®, NVIDIA GPU)
• Leverage oneDNN based framework integration and bring your own implementation based on backend API: Accelerators, other implementations (oneDNN w/ Graph backend API)
* Other names and brands may be claimed as the property of others.
[Diagram: in both cases the DL framework passes its Framework Graph (nodes 1 to 4) to the oneDNN Graph API, rewrites the graph, and runs it in the Framework Runtime Context.]
Industry Momentum
• oneDNN implementation ported to A64FX Fugaku CPU
• Optimized for the Armv8-A and SVE instruction set
• 9.3x speedup for TensorFlow ResNet-50 training and 7.8x for inference on A64FX
https://github.com/oneapi-src/oneDNN
https://blog.fltech.dev/entry/2020/11/19/fugaku-onednn-deep-dive-en
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
Call to action
• Join us on this journey
• Hardware developers – read, provide feedback, and adopt oneDNN Graph for XPU computing!
https://spec.oneapi.com/onednn-graph/latest/
https://github.com/oneapi-src/oneDNN/tree/dev-graph
• Check out www.oneAPI.com for oneAPI specification
• Software developers – try out oneAPI in the Intel DevCloud
https://software.intel.com/content/www/us/en/develop/tools/devcloud.html
Notices and Disclaimers
• Intel technologies may require enabled hardware, software or service activation.
• No product or component can be absolutely secure.
• Your costs and results may vary.
• © Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.
oneCCL
Specification
Thank You!
http://oneapi.com

Reducing Deep Learning Integration Costs and Maximizing Compute Efficiency | Software for AI Optimization Summit 2021 Technical Session
