Intel FPGAs for AI
Supercomputing 2018
Scale Your Innovation
Why FPGAs Win in Deep Learning
Enabling real-time AI in a wide range of embedded, edge, and data center applications
First to market to accelerate evolving AI workloads
▪ Precision
▪ Latency
▪ Sparsity
▪ Adversarial networks
▪ Reinforcement learning
▪ Neuromorphic computing
▪ …
Low-latency, memory-constrained workloads
▪ RNN
▪ LSTM
▪ Speech workloads
Delivering AI+ for flexible system-level functionality
▪ AI + I/O ingest
▪ AI + networking
▪ AI + security
▪ AI + pre/post-processing
▪ …
FPGAs: flexible for evolving precision

                        ResNet-34 1x Wide      ResNet-34 2x Wide      ResNet-34 3x Wide
Activation  Weight      Eq TOPS  Top-1 Acc     Eq TOPS  Top-1 Acc     Eq TOPS  Top-1 Acc
FP32        FP32        7        0.7359        NR       NR            NR       NR
8-bit       8-bit       8        0.7093        2        NR            1        NR
8-bit       Ternary     43       0.6919        11       NR            5        NR
8-bit       Binary      52       NR            13       NR            6        NR
4-bit       4-bit       18       0.7033        5        0.7453        2        NR
3-bit       3-bit       51       NR            13                     6        NR
2-bit       2-bit       85       0.6793        21       0.7332        9        NR
2-bit       Ternary     98       0.6793        25       0.7332        11       NR
1-bit       1-bit       267      0.6054        67       0.6985        30       0.7238

▪ Explore the balance between precision and accuracy
▪ 4X performance gain with the same FPGA
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark,
are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should
consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other
products. For more complete information visit http://www.intel.com/performance. Copyright © 2017, Intel Corporation
Throughput and Accuracy for various PE configurations on ResNet Topologies
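To make the weight formats named in the table concrete, here is a minimal sketch of how a float weight vector might be mapped to 8-bit, ternary, and binary values, and how much error each mapping introduces. The function names, the 0.7 × mean |w| ternary threshold, and the sample weights are illustrative assumptions; the slides do not say which quantization schemes produced the reported numbers.

```python
# Hypothetical illustration only: these helpers are not from the deck.

def quantize_uniform(weights, bits):
    """Symmetric uniform quantization to signed `bits`-bit levels, returned as floats."""
    if bits < 2:
        raise ValueError("use quantize_binary for 1-bit weights")
    levels = 2 ** (bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    peak = max(abs(w) for w in weights)
    if peak == 0:
        return [0.0 for _ in weights]
    scale = peak / levels
    return [round(w / scale) * scale for w in weights]

def quantize_ternary(weights, threshold_ratio=0.7):
    """Map weights to {-a, 0, +a}. The threshold heuristic is an assumption."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    t = threshold_ratio * mean_abs
    kept = [abs(w) for w in weights if abs(w) > t]
    a = sum(kept) / len(kept) if kept else 0.0
    return [a if w > t else -a if w < -t else 0.0 for w in weights]

def quantize_binary(weights):
    """Map weights to {-a, +a} with a = mean |w| (an assumed scaling choice)."""
    a = sum(abs(w) for w in weights) / len(weights)
    return [a if w >= 0 else -a for w in weights]

def mean_squared_error(original, quantized):
    return sum((o - q) ** 2 for o, q in zip(original, quantized)) / len(original)

if __name__ == "__main__":
    weights = [0.9, -0.4, 0.05, -0.8, 0.3, -0.02]   # made-up sample weights
    for name, q in [("8-bit", quantize_uniform(weights, 8)),
                    ("ternary", quantize_ternary(weights)),
                    ("binary", quantize_binary(weights))]:
        print(name, mean_squared_error(weights, q))
```

On this sample the reconstruction error grows as the weight format narrows, which is the same precision-versus-accuracy trade-off the table explores at network scale.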
FPGAs solve memory-bound workloads
Mozilla DeepSpeech topology implementation
▪ Intel® Stratix 10 MX can further reduce latency by directly ingesting the speech signal
*Estimates by Manjeera Design Systems. Assumption: ~4.4 TOPS of 16-bit compute (8192 MACs at 266 MHz) for Intel Stratix 10 MX

Stream length   FPGA (estimated, 16-bit)
1 s             0.003 s
10 s            0.312 s
20 s            0.624 s
40 s            1.25 s

▪ Intel Stratix 10 MX offers 512 GB/s of bandwidth via multiple integrated HBMs
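The "~4.4 TOPS" assumption in the footnote can be checked with a few lines of arithmetic. This sketch assumes the usual convention that one multiply-accumulate counts as two operations; the slides do not state the convention, and the function names are illustrative.

```python
def peak_tops(num_macs, clock_hz, ops_per_mac=2):
    """Peak throughput in tera-operations per second.

    ops_per_mac=2 (one multiply plus one add) is an assumed convention.
    """
    return num_macs * ops_per_mac * clock_hz / 1e12

# Estimated latencies from the slide, keyed by stream length in seconds.
STREAM_LATENCY_S = {1: 0.003, 10: 0.312, 20: 0.624, 40: 1.25}

def latency_fraction(stream_s, latency_s):
    """Processing latency as a fraction of the audio stream length."""
    return latency_s / stream_s

if __name__ == "__main__":
    print("peak TOPS:", peak_tops(8192, 266e6))
    for stream_s, latency_s in STREAM_LATENCY_S.items():
        print(stream_s, "s stream:", latency_fraction(stream_s, latency_s))
```

With 8192 MACs at 266 MHz this gives about 4.36 TOPS, which is consistent with the rounded figure in the footnote.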
AI + flexible I/O & networking
Per-chip performance increases when scaled
AI + I/O & networking unlocks nonlinear performance gains through pooling
2x improvement with ResNet-101
[Figure: three Intel® Xeon® Processors and three Intel® Arria® 10 FPGAs]
AI + pre/post-processing & direct I/O provide low latency
FPGAs can perform in-line, real-time acceleration on the data ingest and avoid costly data movement within the system
[Figure: Data Sources, FPGA, and Intel® Xeon® Processor; labels: Compute Latency, Lower System Latency]
AI Crash Course- Supercomputing
How Intel® FPGAs enable deep learning
[Figure: FPGA block diagram with I/O]
▪ Millions of reconfigurable logic elements & routing fabric
▪ Thousands of 20Kb memory blocks & MLABs
▪ Thousands of variable-precision digital signal processing (DSP) blocks
▪ Hundreds of configurable I/O & high-speed transceivers
▪ Programmable datapath
▪ Customized memory structure
▪ Configurable compute
Adapting to innovation
Many efforts to improve efficiency
▪ Batching
▪ Reduce bit width
▪ Sparse weights
▪ Sparse activations
▪ Weight sharing
▪ Compact network
SparseCNN [CVPR’15]
Spatially SparseCNN [CIFAR-10 winner ’14]
Pruning [NIPS’15]
TernaryConnect [ICLR’16]
BinaryConnect [NIPS’15]
DeepComp [ICLR’16]
HashedNets [ICML’15]
XNORNet
SqueezeNet
LeNet [IEEE]
AlexNet [ILSVRC’12]
VGG [ILSVRC’14]
GoogleNet [ILSVRC’14]
ResNet [ILSVRC’15]
[Figure: I × W = O matrix multiplication, shown with shared weights]
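Two of the efficiency techniques listed above, sparse weights and sparse activations, can be sketched in a few lines. The example below uses simple magnitude pruning followed by a dot product that skips zero operands. It is a hypothetical illustration of why sparsity reduces work, not the method of any of the cited papers; the function names and sample values are assumptions.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero out all but the largest-magnitude weights.

    Ties at the threshold are all kept, so slightly more than
    keep_fraction of the weights may survive.
    """
    k = max(1, round(len(weights) * keep_fraction))
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def sparse_dot(weights, activations):
    """Dot product that skips zero weights and zero activations.

    Returns (value, number of multiply-accumulates actually performed).
    """
    total, macs = 0.0, 0
    for w, a in zip(weights, activations):
        if w != 0.0 and a != 0.0:
            total += w * a
            macs += 1
    return total, macs

if __name__ == "__main__":
    weights = [0.5, -0.1, 0.05, -0.7, 0.02, 0.3]      # made-up values
    activations = [1.0, 0.0, 2.0, 1.0, 3.0, 0.0]      # e.g. after ReLU
    pruned = prune_by_magnitude(weights, keep_fraction=0.5)
    value, macs = sparse_dot(pruned, activations)
    print("pruned weights:", pruned)
    print("result:", value, "MACs:", macs, "of", len(weights))
```

A dense engine would perform one multiply-accumulate per weight regardless of zeros; hardware with a configurable datapath can, in principle, be built to skip them.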
Performance improvement over time

Model       Sept-17 (baseline)  Dec-17  Feb-18  Apr-18  Jun-18  Oct-18  Dec-18 (projected)
SqueezeNet  1x                  1.13x   1.75x   2.61x   3.89x   4.33x   4.51x
GoogleNet   1x                  1.13x   1.22x   1.46x   3.55x   4.11x   4.50x

▪ Continually adapting the custom data flow, memory hierarchy and compute enables improved performance with the same power footprint
[Figure: SqueezeNet and GoogleNet performance over time, batch = 1; y-axis: performance (img/s); x-axis: Jun-17 to Feb-19]
Intel® FPGA Deep Learning Acceleration Suite
Pre-compiled graph architecture
[Figure: DDR banks, Memory Reader/Writer, Configuration Engine, Crossbar, Conv PE Array, Feature Map Cache, PRIM blocks, CUSTOM* block]
*Deeper customization options coming soon!
Example topologies
AlexNet, GoogleNet, Tiny YOLO, SqueezeNet, VGG16, ResNet 18, ResNet 50, ResNet 101, MobileNet, ResNet SSD, SqueezeNet SSD, …*
*More topologies added with every release
OpenVINO™ toolkit for Intel FPGAs
An all-in-one solution to easily harness the benefits of FPGAs
▪ Enables developers and data scientists to take their prototype application to production
▪ Utilize API-based & direct coding to maximize performance
▪ Deeper customization capabilities coming soon
[Figure: OpenVINO™ Toolkit stack: today's Intel FPGA supported deep learning frameworks; Intel Deep Learning Deployment Toolkit (Model Optimizer, Inference Engine); Intel FPGA DL Acceleration Suite; heterogeneous CPU/FPGA deployment on Intel Xeon® Processor and Intel FPGA]
Free download: software.intel.com/openvino-toolkit
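The "heterogeneous CPU/FPGA deployment" idea can be pictured as a placement problem: each layer runs on the first device, in priority order, that supports it. The sketch below is a conceptual model only. It does not use the OpenVINO API, and the layer names, layer types, and per-device support sets are hypothetical.

```python
def assign_devices(layers, device_priority, supported):
    """Place each layer on the first device in priority order that supports its type."""
    placement = []
    for name, layer_type in layers:
        for device in device_priority:
            if layer_type in supported[device]:
                placement.append((name, device))
                break
        else:
            raise ValueError(f"no device supports layer type {layer_type!r}")
    return placement

if __name__ == "__main__":
    # Hypothetical network and hypothetical capability sets.
    layers = [("conv1", "Convolution"), ("relu1", "ReLU"), ("detect", "DetectionOutput")]
    supported = {
        "FPGA": {"Convolution", "ReLU", "Pooling"},
        "CPU": {"Convolution", "ReLU", "Pooling", "DetectionOutput"},
    }
    for name, device in assign_devices(layers, ["FPGA", "CPU"], supported):
        print(name, "->", device)
```

Listing the FPGA first and the CPU second means compute-heavy layers go to the accelerator while anything it cannot run falls back to the processor.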
Your application acceleration with FPGA-powered platforms
Software tools: OpenVINO™ toolkit
Develop NN model; deploy across Intel® CPU, GPU, VPU, FPGA; leverage common algorithms

Supported platforms for FPGA                                  Interface
Mustang F-100                                                 PCIe x8
Intel Programmable Acceleration Card with Intel Arria 10      PCIe x8
Intel® Arria® 10 Development Kit                              PCIe x8

[Figure: logos of current manufacturers*, including Intel®]
*Please contact Intel representative for complete list of ODM manufacturers. Other names and brands may be claimed as the property of others.
Use case 1: search
Solution Search
Looking for a quick path to deploy and accelerate instant reverse image searches of products for retail convenience
Solution Success
Intel® FPGAs offered real-time AI inferencing using OpenVINO™ toolkit. This enabled engineers to map neural networks to FPGA, accelerating image searches with increased throughput and lower latency, all without the need for FPGA programming experience
Real-time AI optimized for performance, power and cost
OpenVINO™ Toolkit
Accelerating workloads, enabling deep learning capabilities for smarter and faster ways to transform data for competitive edge
Intel Programmable Acceleration Card with Intel Arria® 10 FPGA
Deployment-ready PCIe-based card with versatile built-in multifunction acceleration capabilities with low-power dissipation and low-profile form factor
Acceleration stack for Intel® Xeon® CPU with FPGAs
Abstracting programming complexity and maximizing ease of use by hot-swapping accelerators and enabling application portability for Intel FPGA based acceleration solutions
Use case 2: Microsoft’s AI for Earth
Microsoft leverages the multimode capabilities of Intel FPGAs to push through the memory wall to maximize performance
Project Brainwave with Intel® Stratix® 10 gives Performance/$: only $42 of compute*
Land cover mapping for the whole US: 200M images, 20 TB, 10+ minutes
*Microsoft’s Blog
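The headline figures on this slide imply some derived rates worth working out. The sketch below assumes decimal terabytes and treats "10+ minutes" as exactly 10 minutes, so the throughput values are upper bounds; the variable names are illustrative.

```python
IMAGES = 200_000_000
DATASET_BYTES = 20 * 10**12        # assuming decimal terabytes
RUNTIME_SECONDS = 10 * 60          # "10+ minutes": a lower bound on runtime
COMPUTE_COST_USD = 42.0

# Because the real runtime is at least 10 minutes, both rates are upper bounds.
images_per_second = IMAGES / RUNTIME_SECONDS
bytes_per_second = DATASET_BYTES / RUNTIME_SECONDS
average_image_bytes = DATASET_BYTES / IMAGES
cost_per_million_images = COMPUTE_COST_USD / (IMAGES / 1_000_000)

if __name__ == "__main__":
    print("images/s (upper bound):", images_per_second)
    print("GB/s (upper bound):", bytes_per_second / 1e9)
    print("average image size (bytes):", average_image_bytes)
    print("USD per million images:", cost_per_million_images)
```

Under these assumptions the job works out to roughly a third of a million images per second at about 21 cents per million images.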
Summary
Intel FPGAs enable
Delivering AI+ for flexible system-level functionality
First to market to accelerate evolving AI workloads
▪ OpenVINO™ Toolkit is free to download and enables you to deploy on Intel FPGAs directly from TensorFlow or Caffe
▪ Intel’s FPGA architecture enables programmable datapath, custom memory structure and configurable compute
Resources
Intel FPGA Training
https://www.intel.com/content/www/us/en/programmable/support/training/overview.html
Get started quickly with:
▪ Find out more online at www.intel.com/ai and www.intel.com/fpga
▪ Intel Tech.Decoded online webinars, tool how-tos & quick tips
▪ Hands-on in-person events
Support
▪ Connect with Intel engineers & AI experts via the public Community Forum
Download
Free OpenVINO™ toolkit