CONDOR
AN AUTOMATED FRAMEWORK TO ACCELERATE
CONVOLUTIONAL NEURAL NETWORKS ON FPGA
Oracle Labs

Redwood, CA

May 30th, 2018
Niccolò Raspa, Marco Bacis, Giuseppe Natale, Marco D. Santambrogio
CONDOR: An automated framework to accelerate convolutional neural networks on FPGA
Convolutional Neural Networks
CNN on silicon
GPU
Fixed architecture
High power consumption
Adaptable
FPGA
Reconfigurable architecture
Low Power Consumption
Adaptable
ASIC
Fixed architecture
Low Power Consumption
Not adaptable
Manual Design
Extract the parameters and the weights
Write the code
Synthesis
Evaluate Design
Package IP
Iterate
Automatic Design
CONDOR
Framework Architecture
FRONTEND: Parse structure of the CNN
CORE LOGIC: Creation of HW Accelerator
BACKEND: Deployment
Create DAG computation
PROTOTXT
CAFFEMODEL
Input Data
Convolution
Pooling
Fully Connected
Convolution
Pooling
Fully Connected
Input Data: input dimension (28, 28, 1)
Convolution: input dimension (28, 28, 1), output dimension (24, 24, 20), kernel 5, padding 0, stride 1
Pooling: input dimension (24, 24, 20), output dimension (12, 12, 20), kernel 2, padding 0, stride 2
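The layer dimensions quoted on this slide are consistent with the usual output-size relation, out = (in + 2 * padding - kernel) / stride + 1. A minimal sketch that reproduces them; the function names are illustrative and not part of CONDOR, and floor rounding is assumed (Caffe rounds pooling outputs up, which gives the same result for these values):

```python
def output_dim(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


def layer_output_shape(in_shape, kernel, padding, stride, out_channels=None):
    """Return (height, width, channels) after the layer.

    Pooling keeps the channel count, so out_channels is only set for convolutions.
    """
    h, w, c = in_shape
    out_c = c if out_channels is None else out_channels
    return (output_dim(h, kernel, padding, stride),
            output_dim(w, kernel, padding, stride),
            out_c)


# Values taken from the slide: a 28x28x1 input, 20 filters of size 5, then 2x2 pooling.
conv_out = layer_output_shape((28, 28, 1), kernel=5, padding=0, stride=1, out_channels=20)
pool_out = layer_output_shape(conv_out, kernel=2, padding=0, stride=2)
print(conv_out)  # (24, 24, 20)
print(pool_out)  # (12, 12, 20)
```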
Map computation in hardware
Area
Convolution, Pooling, Fully Connected, Convolution, Pooling
Increase parallelism
Area
Integration with SDAccel
CONDOR
What if I don’t have an FPGA?
CONDOR
Features
Cloud Integration via Amazon F1 Instances
Automatic creation of a hardware accelerator for FPGA
Tune the tradeoff between performance and power consumption
Support main deep learning libraries
Roadmap
Automated Framework
Methodology for Acceleration of CNN
Integration with Caffe
Cloud Integration
M. Bacis, G. Natale, E. Del Sozzo, and M. D. Santambrogio.
“A Pipelined and Scalable Dataflow Implementation of Convolutional Neural Networks on FPGA”
In: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Giuseppe Natale, Marco Bacis and Marco Domenico Santambrogio.
“On how to design dataflow FPGA-based accelerators for Convolutional Neural Networks”
In: 2017 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
*
2017
Support the new standard ONNX
2018
Open Source Release
2019?
Extend
Methodology
XOHW
Competition
*
Architectural choices for the HW design
FRONTEND: Parse structure of the CNN
CORE LOGIC: Creation of HW Accelerator
BACKEND: Deployment
FPGAs and CNNs
Dataflow Computation
Data reusability
Distributed Architecture
Our first approach
[1] Giuseppe Natale, Marco Bacis, Marco D. Santambrogio
“On how to design dataflow FPGA-based accelerators for Convolutional Neural Networks”, ISVLSI 2017
[Diagram: a DMA block with in and w/b streams feeding CONV, POOL, CONV, and LINEAR blocks]
Bigger networks, bigger FPGAs… or not?
• Weights don’t fit on the on-chip BRAMs

• Unrolling leads to an explosion in DSP (multiplier) usage
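To give a feel for the second point, here is a rough count of the multipliers a fully unrolled convolution would need for one output pixel. This is only a sketch under stated assumptions: one multiplier per multiply-accumulate, a hypothetical helper name, and a 64-to-64-channel 3x3 layer chosen as an illustrative VGG-style shape. The first shape is the 5x5, 1-to-20-channel convolution from the DAG slide.

```python
def multipliers_full_unroll(in_channels, out_channels, kernel):
    """Multipliers needed if every MAC of one output pixel gets its own unit.

    Assumption: one hardware multiplier per multiply-accumulate.
    """
    return in_channels * out_channels * kernel * kernel


# Small layer from the DAG slide: 1 input channel, 20 filters, 5x5 kernel.
small_layer = multipliers_full_unroll(1, 20, 5)

# Illustrative VGG-style layer (assumed shape): 64 -> 64 channels, 3x3 kernel.
large_layer = multipliers_full_unroll(64, 64, 3)

print(small_layer)  # 500
print(large_layer)  # 36864
```

The second figure is more than ten times the 2880 DSPs quoted for the evaluation board later in the deck, which is why complete unrolling does not scale.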
Methodology improvements
• No complete unrolling - partial accumulations

• Generic set of “one size fits all” blocks

• Semi-dataflow architecture

• More complex data movement
Customizable data flow
[Diagram: a Datamover/Control block with in, w/b, and out ports, alongside Conv, Pooling, and ReLU blocks]
Customizable data flow
[Diagram: four copies of the Datamover/Control block, each with Conv, Pooling, and ReLU blocks and in, w/b, and out ports; a MAC unit takes weights and input and produces a result]
Dataflow Blocks
• Convolution, Pooling

• Non-uniform memory partitioning

• Streaming pattern 

• Optimal full buffering

• Concurrent accesses
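One common way to obtain a streaming pattern with full buffering is a line buffer that keeps just enough pixels to expose a complete k x k window. The sketch below is a software model under that assumption, not CONDOR's actual HLS code; the function name and the buffer-depth formula (k - 1) * width + k are illustrative choices.

```python
from collections import deque


def stream_windows(pixels, width, k):
    """Yield k x k windows from a row-major pixel stream using a line buffer."""
    depth = (k - 1) * width + k      # smallest storage that holds one full window
    buf = deque(maxlen=depth)        # the oldest pixel drops out automatically
    for n, px in enumerate(pixels):
        buf.append(px)
        row, col = divmod(n, width)
        if row >= k - 1 and col >= k - 1:
            # The buffer is full here and buf[0] is the window's top-left pixel.
            yield [[buf[r * width + c] for c in range(k)] for r in range(k)]


image = list(range(16))              # 4 x 4 image with values 0..15
windows = list(stream_windows(image, width=4, k=3))
print(len(windows))                  # 4
print(windows[0])                    # [[0, 1, 2], [4, 5, 6], [8, 9, 10]]
```

Each input pixel is read from the stream exactly once; in hardware the k x k reads of the window would happen concurrently, which is where memory partitioning comes in.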
Partial accumulations approach
• Custom level of parallelism

• Compute subset of both input/output feature maps

• Accumulation done with a FIFO and/or from DDR
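The idea of computing only a subset of input and output feature maps at a time, then accumulating, can be sketched as follows. This is a deliberately simplified model: each feature map is reduced to a single value (a 1x1 kernel on one pixel), the `acc` list stands in for the FIFO or DDR partial sums, and the names `in_par` and `out_par` are hypothetical parallelism parameters.

```python
def tiled_layer(x, w, in_par, out_par):
    """Compute out[o] = sum_i w[o][i] * x[i] in tiles of out_par x in_par."""
    n_out, n_in = len(w), len(x)
    acc = [0] * n_out                        # stands in for the FIFO / DDR partial sums
    for o0 in range(0, n_out, out_par):      # subset of output feature maps
        for i0 in range(0, n_in, in_par):    # subset of input feature maps
            for o in range(o0, min(o0 + out_par, n_out)):
                partial = sum(w[o][i] * x[i]
                              for i in range(i0, min(i0 + in_par, n_in)))
                acc[o] += partial            # add to the earlier partial results
    return acc


x = [1, 2, 3, 4, 5, 6]                                # 6 input feature maps
w = [[o + i for i in range(6)] for o in range(4)]     # 4 output feature maps
reference = [sum(w[o][i] * x[i] for i in range(6)) for o in range(4)]
result = tiled_layer(x, w, in_par=2, out_par=3)
print(result == reference)                            # True
```

Only `in_par * out_par` multipliers are busy at any time, so the level of parallelism can be chosen to fit the available DSPs instead of being dictated by the layer shape.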
Memory control and data buffering
• Memory mapped to streaming and vice versa

• Exploit the maximum transaction size and bursts
Weights Double Buffering
• Masks weight loading latency

• Avoids flushing the MAC pipeline on each iteration
[Diagram: ping and pong weight buffers]
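A software model of the ping-pong scheme, assuming weights arrive in tiles and that loading the next tile can overlap with computing on the current one. The function names and the simple cycle model are illustrative assumptions, not measurements from the accelerator.

```python
def run_tiles(weight_tiles, inputs):
    """Process weight tiles with a ping-pong pair of buffers."""
    buffers = [None, None]                 # buffers[0] = ping, buffers[1] = pong
    buffers[0] = weight_tiles[0]           # the initial load cannot be hidden
    results = []
    for t in range(len(weight_tiles)):
        active = t % 2
        if t + 1 < len(weight_tiles):
            # In hardware this load runs concurrently with the MACs below.
            buffers[1 - active] = weight_tiles[t + 1]
        results.append(sum(wt * x for wt, x in zip(buffers[active], inputs)))
    return results


def total_cycles(n_tiles, load, compute, double_buffered):
    """Rough cycle count: loads overlap with compute only when double buffered."""
    if double_buffered:
        return load + (n_tiles - 1) * max(load, compute) + compute
    return n_tiles * (load + compute)


print(run_tiles([[1, 2], [3, 4], [5, 6]], [10, 1]))          # [12, 34, 56]
print(total_cycles(4, load=10, compute=30, double_buffered=False))  # 160
print(total_cycles(4, load=10, compute=30, double_buffered=True))   # 130
```

As long as a load takes no longer than the computation it overlaps with, only the first load is visible in the total.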
Input Caching
• Reduces memory accesses

• Stores entire input for a layer

• Used for small layers (avoid lots of small transactions)
[Diagram labels: Datamover/Control, Input Cache, in]
Architecture Evaluation
Setup: Alphadata Virtex-7
• ~4 MB BRAM

• 2880 DSPs (27x15-bit multipliers)

• 1 DDR port (512 bits wide)

• 115.2 GFLOPs max (100 MHz)
Results: VGG16 Network
• 30.6 GOPs, 56 MB parameters

• 4 input, 4 output ports

• 27.2 GFLOPs estimated

• 14.4 GFLOPs reached
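The figures on this slide can be related with some back-of-the-envelope arithmetic. Two assumptions are made here that the slide does not state: a floating-point MAC costs 5 DSPs (chosen because it reproduces the quoted 115.2 GFLOPs peak), and the 30.6 GOPs of VGG16 are counted in the same units as the GFLOPs throughput.

```python
DSPS = 2880
CLOCK_HZ = 100e6
DSPS_PER_FP_MAC = 5        # assumption, not stated on the slide
FLOPS_PER_MAC = 2          # one multiply plus one add

# Peak throughput if every DSP is busy on every cycle.
peak_gflops = DSPS / DSPS_PER_FP_MAC * FLOPS_PER_MAC * CLOCK_HZ / 1e9

VGG16_GOPS = 30.6          # work per image, from the slide
ESTIMATED_GFLOPS = 27.2    # from the slide
REACHED_GFLOPS = 14.4      # from the slide

seconds_estimated = VGG16_GOPS / ESTIMATED_GFLOPS   # about 1.125 s per image
seconds_reached = VGG16_GOPS / REACHED_GFLOPS       # about 2.125 s per image
fraction_of_peak = REACHED_GFLOPS / peak_gflops     # about 0.125

print(round(peak_gflops, 1), round(seconds_reached, 3), round(fraction_of_peak, 3))
```

Under these assumptions the measured design uses about one eighth of the board's peak.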
Lessons Learned
• Floating point is dead, long live the fixed!

• Off-chip memory vs On-chip memory
Next Steps
• Possibility to use URAMs as on-chip storage (~33 MB)

• Higher number of DSPs (~2.3X)

• Efficient multiplication (8-bit fixed point -> 2 multiplications per DSP)

• Higher memory BW (4 DDR ports)
[2] Deep Learning with INT8 Optimization on Xilinx Devices
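The "2 mul/dsp" point relies on packing two 8-bit operands into one wide multiplier input so that a single multiplication yields two products. The sketch below models that idea with plain integers; the 18-bit offset is an assumption chosen so the two products cannot overlap, and the function names are illustrative. It does not model the width limits of an actual DSP slice.

```python
SHIFT = 18  # assumption: offset that keeps the two products from overlapping


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def packed_multiply(a, b, c):
    """Return (a*c, b*c) for signed 8-bit a, b, c using one wide multiplication."""
    packed = (a << SHIFT) + b          # a in the upper bits, b in the lower bits
    product = packed * c               # the single shared multiplication
    low = to_signed(product, SHIFT)    # b*c, which fits in 18 signed bits
    high = (product - low) >> SHIFT    # a*c once the low product is removed
    return high, low


print(packed_multiply(3, -5, 7))           # (21, -35)
print(packed_multiply(-128, 127, -128))    # (16384, -16256)
```

The two products share the operand `c`, so the trick applies when two weights multiply the same input (or two inputs the same weight).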
Next Steps
[Diagram labels: I/O Buffer, Weights, MAC/Window, FSM, Acc/ReLU, Pooling; ports: 512 in, 1024 out, 2-64 out]
Marco Bacis
marco.bacis@mail.polimi.it
Niccolò Raspa
niccolo.raspa@mail.polimi.it
Giuseppe Natale
giuseppe.natale@polimi.it
Marco D. Santambrogio
marco.santambrogio@polimi.it
twitter.com/CondorAtNECST
facebook.com/CondorAtNECST
AN AUTOMATED FRAMEWORK TO ACCELERATE
CONVOLUTIONAL NEURAL NETWORKS ON FPGA
