Implementing AI: Hardware Challenges
• Knowledge Transfer Network (KTN) is Innovate UK’s Network partner
• Innovate UK drive productivity and economic growth by supporting
businesses to develop and realise the potential of new ideas,
including those from the UK’s world-class research base.
• Connecting with Knowledge Transfer Network can “lead to potential
collaborations, horizon-expanding events, bespoke support and
innovation insights relevant to your needs.”
• Nigel Rix, Head of Enabling Technology: Nigel.rix@ktn-uk.org
eFutures aims to strengthen and support a network of people
working in electronic systems across the UK
• Building new links and increasing involvement with industry
• Mapping the national electronics research, to ensure the work across the UK is known and noted
• Encouraging and funding innovative multi-disciplinary/multi-university proposals
• Communicating with our network via a monthly magazine, social media and new website
• Running events that support our network and our strategy
• Piloting an academic Mentoring Scheme
• Launching a Big Ideas Challenge – more details soon
• Ideas warmly welcomed. Please get involved!
Twitter @efuturesuk
Sign up to our mailing list: efutures@qub.ac.uk
Agenda
10:15 - Professor Themis Prodromakis, Director of the Centre for Electronics Frontiers at the University of
Southampton & founder of SoneT.ai
10:35 - Iain Wallace, Rovco
10:55 - Dr Jose Nunez Yanez, Reader in adaptive and energy efficient computing, University of Bristol
11:15 - Matt Holdsworth, Lattice Semiconductor
11:35 - 12:00 Panel Q&A, hosted by Professor Roger Woods, Queen’s University Belfast & CTO Analytics Engines
BREAK
14:00 - 15:00 Workshop
15:30 - 16:30 1-to-1 Meetings via meeting Mojo
Memristive Technologies
from functional oxides to AI on a chip
Themis Prodromakis
Professor of Nanotechnology
Zepler Institute, University of Southampton
Zepler Institute for Photonics & Nanoelectronics
Outline
Modern electronics challenges & the AI era needs
Memristors:
• Technology
• Tools & Infrastructure
Application Examples – beyond memory
Conclusion
Our AI is as good as our access to data
ENGINEERING CHALLENGE: “The fundamental design of separate memory and
processing places a limit on what can be achieved.”
Can we continue scaling?
The end of Moore’s law???
Memristive Technologies
[Figures: Chua’s symmetry argument; STM image of HP’s memristor cross-bar; cross-section of a memristor’s core]
L. Chua, “Memristor-the missing circuit element,” IEEE Trans. Circuit Theory, vol. 18, 1971.
R. Williams, “How we found the missing memristor,” IEEE Spectrum, vol. 45, 2008.
Memristor (Memory-resistor)
E-Beam lithography of Sub 15 nm ultrahigh density cross-bar memory chips
Memristors fabrication
Scientific Reports, 6, 32614, 2016.
Metal-oxide memristors memory capacity up to 7-bit states per cell
Scientific Reports, 7, 17532, 2017.
[Figure: resistance (kΩ) versus time (hrs and ms) for states S1 to S47, and cumulative distribution function (%) versus resistance (kΩ)]
Memristors as analogue memory
Application Demonstrators
Examples – beyond memory
Example #1
In-silico ML implementations
Scientific Reports, 6, 18639, 2016.
Eric Kandel, Nobel Prize in Physiology or Medicine 2000
Emulating synapses with memristors
Unsupervised learning in probabilistic memristor neural network
Switching vs. resistive state relation at fixed voltage levels -> exploit to encode conditional probabilities
[Figure labels: desired switching level; approx. operating V]
Unsupervised Learning
Nature Communications, 7, 12611, 2016.
• Network shows capability of learning in unsupervised manner and handles mistakes rather well.
• Copes with cases where class centres drift over time.
Unsupervised learning in probabilistic memristor neural network
Unsupervised Learning
Nature Communications, 7, 12611, 2016.
Unsupervised learning in probabilistic memristor neural network
Unsupervised Learning
• Whilst ‘learn once’ systems have their uses, ideally one wants something more flexible
(e.g. if class centres drift over time).
Nature Communications, 7, 12611, 2016.
Example #2
Energy-efficient Bayesian Inference
Bayesian Inference
“Hardware-Level Bayesian Inference”, Neural Information Processing Systems (NIPS), 2017.
Computing directly in the probability domain
Vector-Matrix-Vector Scalar multiply
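The "Vector-Matrix-Vector Scalar multiply" named above can be read as the bilinear form s = uᵀ M v. The sketch below only illustrates that arithmetic in the probability domain; the function name, the example numbers, and the reading of u as a prior, M as a table of conditional probabilities and v as a selector are assumptions for illustration, not the mapping used in the cited paper.

```python
def vector_matrix_vector(u: list[float], m: list[list[float]], v: list[float]) -> float:
    """Return the scalar u^T M v for a len(u) x len(v) matrix m."""
    if len(m) != len(u) or any(len(row) != len(v) for row in m):
        raise ValueError("shape mismatch between u, m and v")
    return sum(u[i] * m[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


# Hypothetical example values (not from the slides):
prior = [0.3, 0.7]                 # P(a0), P(a1)
likelihood = [[0.9, 0.1],          # P(b0|a0), P(b1|a0)
              [0.2, 0.8]]          # P(b0|a1), P(b1|a1)
select_b0 = [1.0, 0.0]             # picks out the column for b0

p_b0 = vector_matrix_vector(prior, likelihood, select_b0)  # marginal P(b0)
```

With these assumed numbers the result is the marginal probability of b0, i.e. 0.3 × 0.9 + 0.7 × 0.2.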
Example #3
Empowering new design paradigms
Our world is analogue!
Our electronics is mainly digital!
Fusing Analogue and Digital Paradigms
Charge-based computing
Nature Communications, 9, 2170, 2018.
In silico classifiers
Charge-based computing
Nature Communications, 9, 2170, 2018.
Example #4
Employ device physics for sensory data compression
On-node processing of rich data with single nanoscale devices
Memristive Sensors
Nature Communications, 7, 12805, 2016.
Memristive Sensors
Spike detection & sorting with single nanoscale devices
Nature Communications, vol. 7, 12805, 2016.
RSC Faraday Discussions, 213, 511-520, 2019.
Memristive Sensors
Spike sorting with single nanoscale devices
Nature Communications, vol. 7, 12805, 2016.
RSC Faraday Discussions, 213, 511-520, 2019.
Example #5
Bio-hybrid systems: Linking Brain and Silicon Neurons
“Memristive synapses connect brain and silicon spiking neurons”, Scientific Reports, 10, 2590, 2020.
A geographically distributed bio-hybrid neural network
Internet of Neuroelectronics
What’s next?
Unique solutions that address technology gaps across
4 computational pillars
Thinking
AI on a chip
Our chipsets will equip AI systems with sensing, recognition, learning and
reasoning capabilities, paving the way towards “Thinking Machines”.
“AI on chips” will embed intelligence everywhere
What could the future look like?
A pathway to keep your data private!
Bioelectronic Medicines
Feynman: “What I cannot create, I do not understand”
Can we replace parts of our brain?
Can we extend our brain’s capacity?
Can we…???
Augmented Intelligence
What’s next?
Nature Communications, 9, 5267, 2018.
Challenges vs opportunities
180nm TSMC node:
- Custom design kit
- Primitive cells (symbol, layout, extracted, Verilog-A)
- HV infrastructure
- Memory array design
Under development:
- Shared design library (IP, analogue cells, etc.)
- Scalable on-chip instrumentation
Monolithic integration on CMOS
Top level reticle: overall size 10.9 mm x 13.8 mm
t.prodromakis@soton.ac.uk
Acknowledgments
This work was supported by:
EU-FP7 RAMP, EP/K017829/1 and EP/R024642/1,
the Royal Academy of Engineering and the Royal Society.
Iain Wallace, Rovco
Heterogeneous and adaptive computing for
energy efficient AI
Jose Nunez-Yanez
University of Bristol/Royal Society industrial fellow
Talk structure
§ The energy and performance challenge in AI.
§ Addressing this challenge with custom
hardware.
§ Optimizing energy and performance with
adaptive voltage scaling and heterogeneous
circuits.
§ Conclusions and future work.
AI is an energy guzzler
§ AI can be extremely power-hungry for both training and
inference:
§ Training is especially power-intensive, but it only needs to
be done a limited number of times and can be done at
locations with no resource constraints.
§ Inference is less complex but must run continuously,
potentially in constrained environments (e.g. mobile
computing, edge computing).
AI hardware accelerators available
§ Hardware accelerators deliver high throughput, energy
efficiency and low latency, with power profiles ranging from
watts (e.g. Google TPU, Intel NCS, Intel/Xilinx FPGAs,
Graphcore) to milliwatts with embedded processors based
on ARM/RISC-V with parallel (e.g. RISC-V GAP) or
subthreshold voltage computing (e.g. ETA/Ambiq).
§ Challenge: how to combine and deploy these different
architectures to obtain optimal operating points for energy
efficiency and performance.
Case study: FPGA + TPU
§ Hardware consists of a ZCU102 board with
2 Xilinx DPUs and 2 Google TPU units.
§ The Xilinx DPU is a soft FPGA overlay that
adapts to the DNN complexity and FPGA
resources.
§ The Google EdgeTPU is an ASIC also
based on a systolic array architecture with
similar 8-bit precision.
§ We use a single framework based on TensorFlow
for both types of device and train only
once. Then we can freeze the network and
customize it for TPU/DPU.
DPU architecture
TPU architecture
Host is a Zynq MPSoC UltraScale device (ARM + FPGA)
§ The Zynq processing platform is a system on a chip (SoC)
processor with embedded programmable logic:
processing system (PS) + programmable logic (PL).
§ The Google TPU is attached through a high-performance
USB3 interface.
[Figure: Zynq UltraScale (high performance)]
Object detection with SSD (Single Shot Detection)
§ The host ARM schedules detections on the TPUs and DPUs.
§ 1 DPU draws ~5.2 W and 1 TPU ~1.2 W.
§ 1 DPU reaches up to 80 FPS and 1 TPU 35 FPS, giving 115 FPS
combined.
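The power and throughput figures above can be turned into energy per frame, which makes the two accelerators easier to compare. This is a minimal sketch that takes the quoted approximate values at face value; the function names are illustrative, and summing power for the combined case is an assumption that ignores the host ARM and any idle power.

```python
# Figures quoted on the slide: power in watts, throughput in frames per second.
DEVICES = {
    "DPU": {"power_w": 5.2, "fps": 80.0},
    "TPU": {"power_w": 1.2, "fps": 35.0},
}


def energy_per_frame_mj(power_w: float, fps: float) -> float:
    """Energy per frame in millijoules: power (J/s) divided by frames per second."""
    return 1000.0 * power_w / fps


def combined(devices: dict) -> dict:
    """Sum power and throughput, assuming the devices run fully in parallel."""
    return {
        "power_w": sum(d["power_w"] for d in devices.values()),
        "fps": sum(d["fps"] for d in devices.values()),
    }


if __name__ == "__main__":
    for name, d in DEVICES.items():
        print(name, round(energy_per_frame_mj(**d), 1), "mJ/frame")
    both = combined(DEVICES)
    print("DPU+TPU", both["fps"], "FPS at", round(both["power_w"], 1), "W")
```

On these numbers the DPU spends about 65 mJ per frame and the TPU about 34 mJ per frame, so the DPU is faster while the TPU is more frugal per detection.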
Power Subsystem in ZCU102 board enables
voltage scaling investigation
I2C
A series of PMBus commands is required to set the output voltage.
PMBus: Open-Standard Digital Power Management
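As a rough illustration of what setting a voltage over PMBus involves, the sketch below encodes a target voltage in the PMBus linear VOUT format, where the voltage equals a 16-bit mantissa times 2 to the power of an exponent reported by VOUT_MODE. The command codes are from the PMBus specification; the VOUT_MODE value 0x14 (exponent -12) is an assumed example, and the actual command sequence and data format used by the ZCU102 regulators are not given in the slides. No I2C transfer is performed here.

```python
VOUT_MODE = 0x20      # PMBus command code: reports the output-voltage data format
VOUT_COMMAND = 0x21   # PMBus command code: sets the output voltage


def decode_exponent(vout_mode: int) -> int:
    """Extract the 5-bit two's-complement exponent from a linear-mode VOUT_MODE byte."""
    if (vout_mode >> 5) != 0:
        raise ValueError("VOUT_MODE is not in linear mode")
    n = vout_mode & 0x1F
    return n - 32 if n & 0x10 else n


def encode_vout(volts: float, exponent: int) -> int:
    """Return the unsigned 16-bit mantissa such that volts ~= mantissa * 2**exponent."""
    mantissa = round(volts / (2.0 ** exponent))
    if not 0 <= mantissa <= 0xFFFF:
        raise ValueError("voltage out of range for this exponent")
    return mantissa


def vout_command_bytes(volts: float, vout_mode: int) -> bytes:
    """Command code followed by the data word, low byte first (SMBus write-word order)."""
    word = encode_vout(volts, decode_exponent(vout_mode))
    return bytes([VOUT_COMMAND, word & 0xFF, word >> 8])


# Assumed example: regulator reports VOUT_MODE = 0x14, target rail voltage 0.85 V.
payload = vout_command_bytes(0.85, 0x14)
```

A real driver would typically read VOUT_MODE from the regulator first and may need further commands (for example to select an output rail) before writing the voltage.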
Better FPGA energy efficiency with Adaptive Voltage Scaling
§ Elongate is a tool and a set of IP blocks that
control frequency and voltage and
detect optimal operating points
using in-situ detectors.
§ Elongate instruments the
FPGA design with in-situ
timing detectors.
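The idea of using in-situ detectors to find an operating point can be sketched as a simple search: lower the voltage in small steps until the detector reports a timing discrepancy, then step back up by a guard band. This is a generic illustration, not Elongate's actual algorithm; `detector_fires` is a hypothetical callback, and the step size and guard band are assumed values.

```python
from typing import Callable


def find_min_safe_voltage(
    detector_fires: Callable[[float], bool],
    v_start: float = 0.85,
    v_floor: float = 0.55,
    step: float = 0.01,
    guard_steps: int = 1,
) -> float:
    """Step the voltage down until the timing detector fires, then back off.

    detector_fires(v) is a hypothetical callback: True when the in-situ
    detector flags a timing discrepancy at voltage v.
    """
    n_steps = round((v_start - v_floor) / step)
    last_good = v_start
    fired = False
    for i in range(1, n_steps + 1):
        v = round(v_start - i * step, 4)
        if detector_fires(v):
            fired = True
            break
        last_good = v
    if not fired:
        return last_good  # reached the floor without any detector error
    # Keep a guard band of a few steps above the last error-free point.
    return min(v_start, round(last_good + guard_steps * step, 4))


# Toy detector (assumption): timing starts to fail below 0.62 V.
v_op = find_min_safe_voltage(lambda v: v < 0.62)
```

In hardware the same loop would run at a fixed clock frequency, and could be repeated per frequency to map out voltage and frequency pairs.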
Elongate implementation flow
[Figure: Elongate implementation flow. Elements shown: High Level Synthesis (OpenCL, C++ source), .VHD/.V source, SYN, .v/.vhd netlist, Elongate (with user constraints, NTC component library and .TWR timing), MAP, PLACE & ROUTE, .NCD netlist, BITGEN, .BIT bitstream]
Example of timing detector for logic
§ Soft-macro detectors create
different paths for the slow
flip-flop (SFF) and the main
flip-flop (MFF).
§ Discrepancies between the MFF
and the SFF are detected by an
XOR gate.
§ The MFF replicates the
functionality of the original
flip-flop in the critical path.
[Figure: detector circuit. Labels: D input, data steering (Generate 0 / Generate 1, MUXF5, MUXF8), MFF with Q output, SFF, XOR, synchronizer FF, detector output]
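The MFF/SFF/XOR arrangement can be illustrated with a toy timing model: both flip-flops sample the same data, but the SFF sees it later, so it misses the clock edge first as timing margins shrink. The delays, parameter names and the capture rule below are simplifying assumptions; the data-steering logic (Generate 0 / Generate 1) in the figure is not modelled.

```python
def detector_output(data_arrival_ns: float, clock_period_ns: float,
                    sff_extra_delay_ns: float, old_bit: int, new_bit: int) -> int:
    """Toy model of an in-situ timing detector.

    A flip-flop captures new_bit if the data arrives before the clock edge,
    otherwise it still holds old_bit. The slow flip-flop (SFF) sees the data
    later than the main flip-flop (MFF), so it misses the edge first.
    """
    mff = new_bit if data_arrival_ns <= clock_period_ns else old_bit
    sff = (new_bit
           if data_arrival_ns + sff_extra_delay_ns <= clock_period_ns
           else old_bit)
    return mff ^ sff  # XOR: 1 when the two flip-flops disagree


# Assumed numbers: 4.0 ns clock period, SFF path 0.5 ns slower than the MFF path.
comfortable = detector_output(3.0, 4.0, 0.5, old_bit=0, new_bit=1)  # both capture
marginal = detector_output(3.8, 4.0, 0.5, old_bit=0, new_bit=1)     # only MFF captures
```

In this model the detector raises its flag while the MFF still captures correct data, which is what allows the voltage to be lowered safely; it also shows that a flag needs a data transition, since identical old and new bits never disagree.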
AI architecture with voltage and frequency scalability
[Figure: block diagram. ARM A53 MP with CCI (cache coherent interface) and AXI interconnect (master HPM0/HPM1, slave HP0 to HP3, 128b); FPGA cores BNN_ZU0 to BNN_ZU3 with DMA0 to DMA3 and 128b AXI master/slave ports; ELO CONTROL FREQ/PHASE block with ELO_CLK and ELO_CLK_PHASE, reset, enable and detector error signals; peripheral and PMBus interfaces over I2C to the power rail voltage regulators (VCCINT); LEDs (locked, error, debug)]
§ Only one FPGA core is instrumented with Elongate detectors.
§ All cores use the same voltage and frequency.
Elongate complexity overheads (LUTs and FFs)
[Figure: overhead (%) versus path timing (%), both axes from 0 to 14, with series FF ZY, LUTS ZY, FF ZU and LUTS ZU]
Power scalability in Zynq ultrascale
§ Voltage levels range from 0.55 V to 0.85 V
for the 16 nm Zynq UltraScale device.
§ Elastic power/performance with up to
85% power reduction or 2x performance.
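Why lowering the supply voltage saves so much power can be seen from the usual first-order CMOS model, in which dynamic power scales with V² × f. The sketch below applies that textbook model to the voltage range quoted above; it ignores static (leakage) power and is not expected to reproduce the measured 85% figure, which comes from the author's experiments.

```python
def dynamic_power_ratio(v_new: float, v_ref: float,
                        f_new_mhz: float, f_ref_mhz: float) -> float:
    """Ratio P_new / P_ref under the first-order model P = a * C * V**2 * f."""
    return (v_new / v_ref) ** 2 * (f_new_mhz / f_ref_mhz)


# Voltage scaled from 0.85 V to 0.55 V at an unchanged (assumed) clock frequency.
ratio_same_freq = dynamic_power_ratio(0.55, 0.85, 100.0, 100.0)
saving_percent = 100.0 * (1.0 - ratio_same_freq)
```

Under this model, dropping from 0.85 V to 0.55 V at constant frequency leaves roughly 42% of the dynamic power, a saving of about 58%; any accompanying reduction in clock frequency lowers power further.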
Adaptive Voltage Scaling Applied to
convolutional neural network
[Figure: accuracy (%) versus frequency (MHz) for runs at 0.85 V, 0.75 V, 0.65 V and 0.55 V. Labelled points (frequency in MHz, accuracy in %): (380.95, 78.5), (352.38, 78.4), (292.59, 78.4), (180.95, 78.3)]
Conclusions and Future work
§ FPGAs enable custom neural network circuits with
many levels of precision from 1-bit to floating point.
§ FPGA hardware instrumentation enables significantly better
energy efficiency and performance at run-time.
§ Heterogeneous hardware can work together with different
network precisions and power operating points.
§ Explore novel ways to combine heterogeneous hardware that
includes other architectures in addition to CPUs/TPUs and
FPGAs to deliver energy proportional AI.
§ More details:
§ J. Nunez-Yanez, "Energy Proportional Neural Network Inference
with Adaptive Voltage and Frequency Scaling," in IEEE
Transactions on Computers, vol. 68, no. 5, pp. 676-687, 1 May 2019.
Acknowledgement
• Thanks to the Royal Society for the
MINET industrial fellow award and to Xilinx
for the hardware/software support.
Matt Holdsworth, Lattice
Semiconductors
LATTICE RISING
2020
Delivering Milliwatt AI to the Edge with
Ultra-Low Power FPGAs
Matt Holdsworth
FAE Lattice Semiconductors
matt.holdsworth@latticesemi.com
- NASDAQ: LSCC
Rapidly Emerging Edge Computing Trend
[Figure: Edge, Networking, Cloud. Labels: IoT, communication gateway, wireless / wireline access, core network]
Driven by Latency, Privacy, and Bandwidth Limitations
Unit growth for edge devices with AI will explode, increasing at over 110% CAGR
over the next five years (Semico Research)
HARDWARE PLATFORMS
IP CORES
SOFTWARE TOOLS
REFERENCE DESIGNS / DEMOS
CNN Compact Accelerator: UPduino + Himax Shield (iCE40 UltraPlus FPGA); 1 mW, 5.5 mm², 1/8/16 bits
CNN Accelerator: Embedded Vision Development Kit (ECP5 FPGA); 1 W, 100 mm², 1/8/16 bits
CUSTOM DESIGN SERVICES
Smart Car, Smart Home, Smart City, Smart Factory
Neural Network Compiler
Neural Network Accelerators: Ultra Low Power, Small Form Factor, Customizable
Key Phrase Detection, Object Counting, Object Identification, Human Presence Detection, Face Tracking, Hand Gesture Detection
Focus Applications
Object Detection, Human Machine Interface (HMI), Object Identification
• Defect detection in smart security and embedded vision cameras
• Feature extraction enabling navigation of robots
• Key phrase detection to control smart appliances
Reference Design / Demo – Human Presence Detection
FEATURES
Sensor: CMOS image sensor
Speed: 5 frames per second
Power: 7 mW on iCE40 UltraPlus
ALWAYS ON HUMAN DETECTION IN APPLIANCE
LOW POWER HUMAN DETECTION FOR WAKE ON APPROACH FOR
LAPTOPS AND PRINTERS
Reference Design / Demo – Object Counting
FEATURES
Sensor: CMOS image sensor
Speed: 17 frames per second (lower latency)
Power: 850 mW on ECP5-85K
HUMAN DETECTION IN VIDEO SECURITY DEVICES
HUMAN COUNTING IN RETAIL CAMERA
APPLICATIONS
DEFECT DETECTION AND OPERATOR COMPLIANCE IN
SMART FACTORY CAMERAS
Defect Detected
Type: Crack
Popular sensAI Accelerator Use Cases
Stand-alone, Preprocessing, Post Processing
Hardware Platforms
Modular Platforms for Rapid Prototyping
Key features
▪ Video and Audio sensors
▪ Compact 22 x 50 mm
▪ Includes HM01B0 image sensor board
▪ Arduino Micro form factor UltraPlus board
HM01B0 UPduino Shield Board
Embedded Vision Development Kit
Key features
▪ ECP5 FPGA consuming under 1 W of power
▪ Flexible video connectivity with support for MIPI
CSI-2, eDP, HDMI, GigE Vision, USB 3.0, and more
Software Tools
Neural Network Compiler
Key Features
▪ Implement networks developed using
standard frameworks into Lattice FPGAs
without prior RTL experience
▪ Rapidly analyze, simulate, and compile
CNNs/BNNs for implementation on Lattice
sensAI IP cores
Customizable Reference Designs
[Figure: design flow spanning ML frameworks, Lattice sensAI components and Lattice FPGA design tools. Labels: training dataset, training scripts, training, trained model, NN models, NN compiler, quantized weights and instructions, NN IP, system interface, FPGA design, FPGA bitstream]
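The step from a trained model to quantized weights can be illustrated with a generic symmetric 8-bit quantizer. This is a textbook scheme shown only to make the idea concrete; it is not the algorithm used by the Lattice NN Compiler, whose internals are not described in the slides, and the function names and example weights are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to signed 8-bit integers.

    Returns the integer codes and the scale, so that weight ~= code * scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: list[int], scale: float) -> list[float]:
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Hypothetical trained weights (not from the slides).
trained = [1.27, -0.64, 0.0, 0.10]
codes, scale = quantize_int8(trained)
restored = dequantize(codes, scale)
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy cost traded for smaller memory and simpler arithmetic on the FPGA.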
ALWAYS-ON HUMAN COUNTING
Higher Performance and Lower Power with CrossLink-NX
[Figure: performance and power comparison. Labels: MCU, SoC, ECP5-45K, NX-40K; performance 1 fps, 5 fps (5x faster), 10 fps (2x faster); power 2 W, 400 mW (5x lower), 200 mW (2x lower). Block diagram of Lattice CrossLink-NX: sensors, sensor interface, neural network accelerators, SRAM (weights / activations), MCU, results]
Summary of Latest sensAI Updates
Higher accuracy, higher speed, lower power, reference designs:
▪ CrossLink-NX, the dedicated embedded vision and AI inference FPGA, provides the highest accuracy at the lowest power
▪ ECP5 FPGA extends support to MobileNet and ResNet for higher-speed processing at high accuracy
▪ iCE40 UltraPlus, the ultra-low power edge AI accelerator, now delivers higher accuracy at the lowest power
▪ New and updated demos and end-to-end reference designs: Key Phrase Detection, Human Identification, Human Presence Detection, Object Counting with MobileNet
Reference Design
▪Where to find sensAI page
• Applications -> AI/Machine Learning
Where to find Demos and Reference Designs
▪ Demos:
• Provided as bitstream and
Quickstart Guide
• Allows easy demonstration of
functionality.
▪ RDs:
• Complete solution: RTL Code,
Training Scripts, Dataset,
Complete User Guide
• Allows user to reproduce
solution and reuse in own
design framework.

More Related Content

What's hot (10)

PDF
Deep learning for medical imaging
geetachauhan
 
PPT
2017 07 03_meetup_d
Dana Brophy
 
PDF
Emc 2013 Big Data in Astronomy
Fabio Porto
 
PDF
Coupling Australia’s Researchers to the Global Innovation Economy
Larry Smarr
 
PDF
PointNet
PetteriTeikariPhD
 
PDF
Computational decision making
Boris Adryan
 
PDF
Data-intensive profile for the VAMDC
AstroAtom
 
PPTX
Ariificial brain
Vamshikrishna Goud
 
PDF
High Performance Reconfigurable Computing at NECSTLab
NECST Lab @ Politecnico di Milano
 
PPT
CI image processing mns
Meenakshi Sood
 
Deep learning for medical imaging
geetachauhan
 
2017 07 03_meetup_d
Dana Brophy
 
Emc 2013 Big Data in Astronomy
Fabio Porto
 
Coupling Australia’s Researchers to the Global Innovation Economy
Larry Smarr
 
Computational decision making
Boris Adryan
 
Data-intensive profile for the VAMDC
AstroAtom
 
Ariificial brain
Vamshikrishna Goud
 
High Performance Reconfigurable Computing at NECSTLab
NECST Lab @ Politecnico di Milano
 
CI image processing mns
Meenakshi Sood
 

Similar to Implementing AI: Hardware Challenges (20)

PDF
How HPC and large-scale data analytics are transforming experimental science
inside-BigData.com
 
PDF
Future of hpc
Putchong Uthayopas
 
PDF
Implementing AI: Hardware Challenges: Memristive Technologies: from Functiona...
KTN
 
PPTX
KIIT.pptx
coebgpi
 
PDF
Implementing AI: Running AI at the Edge
KTN
 
PPTX
Stories About Spark, HPC and Barcelona by Jordi Torres
Spark Summit
 
PDF
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
inside-BigData.com
 
PDF
Vertex Perspectives | AI Optimized Chipsets | Part IV
Vertex Holdings
 
PDF
10 Abundant-Data Computing
RCCSRENKEI
 
PDF
Expectations for optical network from the viewpoint of system software research
Ryousei Takano
 
PDF
Weebit nano presentation at Leti Memory Workshop
Amir Regev
 
PDF
Neurosynaptic chips
Jeffrey Funk
 
PDF
The Open Science Data Cloud: Empowering the Long Tail of Science
Robert Grossman
 
PPT
grid computing
elliando dias
 
PPTX
Alternative Computing
Shayshab Azad
 
PDF
Novi sad ai event 1-2018
Jovan Stojanovic
 
PDF
Modern Computing: Cloud, Distributed, & High Performance
inside-BigData.com
 
PDF
Edge AI Miramond technical seminCERN.pdf
yagab5011
 
PDF
High–Performance Computing
BRAC University Computer Club
 
PDF
100G network research at UCL
Jisc
 
How HPC and large-scale data analytics are transforming experimental science
inside-BigData.com
 
Future of hpc
Putchong Uthayopas
 
Implementing AI: Hardware Challenges: Memristive Technologies: from Functiona...
KTN
 
KIIT.pptx
coebgpi
 
Implementing AI: Running AI at the Edge
KTN
 
Stories About Spark, HPC and Barcelona by Jordi Torres
Spark Summit
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
inside-BigData.com
 
Vertex Perspectives | AI Optimized Chipsets | Part IV
Vertex Holdings
 
10 Abundant-Data Computing
RCCSRENKEI
 
Expectations for optical network from the viewpoint of system software research
Ryousei Takano
 
Weebit nano presentation at Leti Memory Workshop
Amir Regev
 
Neurosynaptic chips
Jeffrey Funk
 
The Open Science Data Cloud: Empowering the Long Tail of Science
Robert Grossman
 
grid computing
elliando dias
 
Alternative Computing
Shayshab Azad
 
Novi sad ai event 1-2018
Jovan Stojanovic
 
Modern Computing: Cloud, Distributed, & High Performance
inside-BigData.com
 
Edge AI Miramond technical seminCERN.pdf
yagab5011
 
High–Performance Computing
BRAC University Computer Club
 
100G network research at UCL
Jisc
 
Ad

More from KTN (20)

PDF
Competition Briefing - Open Digital Solutions for Net Zero Energy
KTN
 
PDF
An Introduction to Eurostars - an Opportunity for SMEs to Collaborate Interna...
KTN
 
PDF
Prospering from the Energy Revolution: Six in Sixty - Technology and Infrastr...
KTN
 
PPTX
UK Catalysis: Innovation opportunities for an enabling technology
KTN
 
PPTX
Industrial Energy Transformational Fund Phase 2 Spring 2022 - Competition Bri...
KTN
 
PDF
Horizon Europe ‘Culture, Creativity and Inclusive Society’ Consortia Building...
KTN
 
PDF
Horizon Europe ‘Culture, Creativity and Inclusive Society’ Consortia Building...
KTN
 
PPTX
Smart Networks and Services Joint Undertaking (SNS JU) Call Topics
KTN
 
PDF
Building Talent for the Future 2 – Expression of Interest Briefing
KTN
 
PDF
Connected and Autonomous Vehicles Cohort Workshop
KTN
 
PDF
Biodiversity and Food Production: The Future of the British Landscape
KTN
 
PDF
Engage with...Performance Projects
KTN
 
PDF
How to Create a Good Horizon Europe Proposal Webinar
KTN
 
PDF
Horizon Europe Tackling Diseases and Antimicrobial Resistance (AMR) Webinar a...
KTN
 
PDF
Engage with...Custom Interconnect
KTN
 
PDF
Engage with...ZF
KTN
 
PDF
Engage with...FluxSys
KTN
 
PDF
Made Smarter Innovation: Sustainable Smart Factory Competition Briefing
KTN
 
PDF
Driving the Electric Revolution – PEMD Skills Hub
KTN
 
PDF
Medicines Manufacturing Challenge EDI Survey Briefing Webinar
KTN
 
Competition Briefing - Open Digital Solutions for Net Zero Energy
KTN
 
An Introduction to Eurostars - an Opportunity for SMEs to Collaborate Interna...
KTN
 
Prospering from the Energy Revolution: Six in Sixty - Technology and Infrastr...
KTN
 
UK Catalysis: Innovation opportunities for an enabling technology
KTN
 
Industrial Energy Transformational Fund Phase 2 Spring 2022 - Competition Bri...
KTN
 
Horizon Europe ‘Culture, Creativity and Inclusive Society’ Consortia Building...
KTN
 
Horizon Europe ‘Culture, Creativity and Inclusive Society’ Consortia Building...
KTN
 
Smart Networks and Services Joint Undertaking (SNS JU) Call Topics
KTN
 
Building Talent for the Future 2 – Expression of Interest Briefing
KTN
 
Connected and Autonomous Vehicles Cohort Workshop
KTN
 
Biodiversity and Food Production: The Future of the British Landscape
KTN
 
Engage with...Performance Projects
KTN
 
How to Create a Good Horizon Europe Proposal Webinar
KTN
 
Horizon Europe Tackling Diseases and Antimicrobial Resistance (AMR) Webinar a...
KTN
 
Engage with...Custom Interconnect
KTN
 
Engage with...ZF
KTN
 
Engage with...FluxSys
KTN
 
Made Smarter Innovation: Sustainable Smart Factory Competition Briefing
KTN
 
Driving the Electric Revolution – PEMD Skills Hub
KTN
 
Medicines Manufacturing Challenge EDI Survey Briefing Webinar
KTN
 
Ad

Recently uploaded (20)

PDF
Blockchain Transactions Explained For Everyone
CIFDAQ
 
PDF
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
PPTX
Building a Production-Ready Barts Health Secure Data Environment Tooling, Acc...
Barts Health
 
PDF
Empowering Cloud Providers with Apache CloudStack and Stackbill
ShapeBlue
 
PDF
Meetup Kickoff & Welcome - Rohit Yadav, CSIUG Chairman
ShapeBlue
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
PPTX
Building and Operating a Private Cloud with CloudStack and LINBIT CloudStack ...
ShapeBlue
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
PDF
Apache CloudStack 201: Let's Design & Build an IaaS Cloud
ShapeBlue
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
PDF
TrustArc Webinar - Data Privacy Trends 2025: Mid-Year Insights & Program Stra...
TrustArc
 
PPT
Interview paper part 3, It is based on Interview Prep
SoumyadeepGhosh39
 
PDF
LLMs.txt: Easily Control How AI Crawls Your Site
Keploy
 
PDF
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
PDF
Français Patch Tuesday - Juillet
Ivanti
 
PPTX
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
PDF
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
PDF
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
PDF
Rethinking Security Operations - SOC Evolution Journey.pdf
Haris Chughtai
 
Blockchain Transactions Explained For Everyone
CIFDAQ
 
Building Resilience with Digital Twins : Lessons from Korea
SANGHEE SHIN
 
Building a Production-Ready Barts Health Secure Data Environment Tooling, Acc...
Barts Health
 
Empowering Cloud Providers with Apache CloudStack and Stackbill
ShapeBlue
 
Meetup Kickoff & Welcome - Rohit Yadav, CSIUG Chairman
ShapeBlue
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
Building and Operating a Private Cloud with CloudStack and LINBIT CloudStack ...
ShapeBlue
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
Apache CloudStack 201: Let's Design & Build an IaaS Cloud
ShapeBlue
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
TrustArc Webinar - Data Privacy Trends 2025: Mid-Year Insights & Program Stra...
TrustArc
 
Interview paper part 3, It is based on Interview Prep
SoumyadeepGhosh39
 
LLMs.txt: Easily Control How AI Crawls Your Site
Keploy
 
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
Français Patch Tuesday - Juillet
Ivanti
 
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
CIFDAQ Token Spotlight for 9th July 2025
CIFDAQ
 
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
Rethinking Security Operations - SOC Evolution Journey.pdf
Haris Chughtai
 

Implementing AI: Hardware Challenges

  • 2. • Knowledge Transfer Network (KTN) is Innovate UK’s Network partner • Innovate UK drive productivity and economic growth by supporting businesses to develop and realise the potential of new ideas, including those from the UK’s world-class research base. • Connecting with Knowledge Transfer Network can “lead to potential collaborations, horizon-expanding events, bespoke support and innovation insights relevant to your needs.” • Nigel Rix, Head of Enabling Technology: [email protected]
  • 3. eFutures aims to strengthen and support a network of people working in electronic systems across the UK • Building new links and increasing involvement with industry • Mapping the national electronics research, to ensure the work across the UK is known and noted • Encouraging and funding innovative multi-disciplinary/multi-university proposals • Communicating with our network via a monthly magazine, social media and new website • Running events that support our network and our strategy • Piloting an academic Mentoring Scheme pilot • Launching a Big Ideas Challenge – more details soon • Ideas warmly welcomed. Please get involved! Twitter @efuturesuk Sign up to our mailing list: [email protected]
  • 4. Agenda 10:15 - Professor Themis Prodromakis, Director of the Centre for Electronics Frontiers at the University of Southampton & founder of SoneT.ai 10:35 - Iain Wallace, Rovco 10:55 - Dr Jose Nunez Yanez, Reader in adaptive and energy efficient computing, University of Bristol 11.15 - Matt Holdsworth, Lattice SemiConductor 11:35 - 12:00 Panel Q&A, hosted by Professor Roger Woods, Queen’s University Belfast & CTO Analytics Engines BREAK 14:00- 15:00 Workshop 15:30 - 16:30 1-to-1 Meetings via meeting Mojo
  • 5. Memristive Technologies from functional oxides to AI on a chip Themis Prodromakis Professor of Nanotechnology Zepler Institute, University of Southampton
  • 6. Zepler Institute for Photonics & Nanoelectronics
  • 7. Outline Modern electronics challenges & the AI era needs Memristors: • Technology • Tools & Infrastructure Application Examples – beyond memory Conclusion
  • 8. Our AI is as good as our access to data ENGINEERING CHALLENGE: “The fundamental design of separate memory and processing places a limit on what can be achieved.”
  • 9. Can we continue scaling? The end of Moore’s law???
  • 11. Chua’s symmetry argument STM image of HP’s memristor cross-bar Cross-section of a memristor’s core. L. Chua, “Memristor-the missing circuit element,” IEEE Trans. Circuit Theory, vol. 18, 1971. R. Williams, “How we found the missing memristor,” IEEE spectrum, vol. 45, 2008. Memristor (Memory-resistor)
  • 12. E-Beam lithography of Sub 15 nm ultrahigh density cross-bar memory chips Memristors fabrication Scientific Reports, 6, 32614, 2016. 12
  • 13. Metal-oxide memristors memory capacity up to 7-bit states per cell 13 Scientific Reports, 7, 17532, 2017. b)a) Resistance(k) 30 50 70 90 0 5010 4020 30 Resistance(k) 80 Time (hrs) 2 3 4 50 1 60 40 30 70 50 Time (ms) S1 S47 b) c) Cumulativedistributionfunction(%) Resistance (k 30 0 20 80 60 40 100 40 50 60 Resistance(k) 6 80 Time (hrs) 2 3 4 50 871 60 40 30 S1 S47 S5 S7 70 50 S1 Memristors as analogue memory
  • 15. Example #1 In-silico ML implementations
  • 16. Scientific Reports, 6, 18639, 2016. Eric Kandel Nobel Prize in Physiology 2000 Emulating synapses with memristors
  • 17. 17 Unsupervised learning in probabilistic memristor neural network Switching vs. resistive state relation at fixed voltage levels -> Exploit to encode conditional probabilities Desired switching level Approx. operating V Unsupervised Learning Nature Communications, 7, 12611, 2016.
  • 18. 18 • Network shows capability of learning in unsupervised manner and handles mistakes rather well. • Copes with cases where class centres drift over time. Unsupervised learning in probabilistic memristor neural network Unsupervised Learning Nature Communications, 7, 12611, 2016.
  • 19. 19 Unsupervised learning in probabilistic memristor neural network Unsupervised Learning • Whilst ‘learn once’ systems have their uses, ideally one wants something more flexible (e.g. if class centres drift over time). Nature Communications, 7, 12611, 2016.
  • 21. 21 Bayesian Inference “Hardware-Level Bayesian Inference”, Neural Information Processing Systems (NIPS), 2017. Computing directly in the probability domain Vector-Matrix-Vector Scalar multiply
  • 22. Example #3 Empowering new design paradigms
  • 23. Our world is analogue! Our electronics is mainly digital!
  • 24. 24 Fusing Analogue and Digital Paradigms Charge-based computing Nature Communications, 9, 2170, 2018.
  • 25. 25 Fusing Analogue and Digital Paradigms Charge-based computing Nature Communications, 9, 2170, 2018.
  • 26. 26 In silico classifiers Charge-based computing Nature Communications, 9, 2170, 2018.
  • 27. 27 In silico classifiers Charge-based computing Nature Communications, 9, 2170, 2018.
  • 28. Example #4 Employ device physics for sensory data compression
  • 29. Memristive Sensors: on-node processing of rich data with single nanoscale devices. Nature Communications, 7, 12805, 2016.
  • 30. Memristive Sensors: spike detection & sorting with single nanoscale devices. Nature Communications, 7, 12805, 2016. RSC Faraday Discussions, 213, 511-520, 2019.
  • 31. Memristive Sensors: spike sorting with single nanoscale devices. Nature Communications, 7, 12805, 2016. RSC Faraday Discussions, 213, 511-520, 2019.
  • 32. Example #5 Bio-hybrid systems: Linking Brain and Silicon Neurons
  • 33. Internet of Neuroelectronics: a geographically distributed bio-hybrid neural network. “Memristive synapses connect brain and silicon spiking neurons”, Scientific Reports, 10, 2590, 2020.
  • 35. Unique solutions that address technology gaps across 4 computational pillars Thinking AI on a chip Our chipsets will equip AI systems with sensing, recognition, learning and reasoning capabilities, paving the way towards “Thinking Machines”. “AI on chips” will embed intelligence everywhere
  • 36. What could the future look like?
  • 37. A pathway to keep your data private!
  • 38. Bioelectronic Medicines Feynman: “What I cannot create, I do not understand” Can we replace parts of our brain? Can we extend our brain’s capacity? Can we…??? Augmented Intelligence
  • 39. What’s next? Nature Communications, 9, 5267, 2018. Challenges vs opportunities
  • 40. Monolithic integration on CMOS. 180nm TSMC node: custom design kit; primitive cells (symbol, layout, extracted, Verilog-A); HV infrastructure; memory array design. Under development: shared design library (IP, analogue cells, etc.); scalable on-chip instrumentation. Top-level reticle: overall size 10.9mm x 13.8mm.
  • 41. Acknowledgments. This work was supported by: EU-FP7 RAMP, EP/K017829/1 and EP/R024642/1, the Royal Academy of Engineering and the Royal Society.
  • 43. Heterogeneous and adaptive computing for energy efficient AI Jose Nunez-Yanez University of Bristol/Royal Society industrial fellow
  • 44. Talk structure § The energy and performance challenge in AI. § Addressing this challenge with custom hardware. § Optimizing energy and performance with adaptive voltage scaling and heterogeneous circuits. § Conclusions and future work.
  • 45. AI is an energy guzzler § AI can be extremely power-hungry for both training and inference. § Training is especially power-intensive, but it only needs to be done a limited number of times and can be done at locations with no resource constraints. § Inference is less complex but needs to run continuously, potentially in constrained environments (e.g. mobile computing, edge computing).
  • 46. AI hardware accelerators available § Hardware accelerators deliver high throughput, energy efficiency and low latency, with power profiles ranging from watts (e.g. Google TPU, Intel NCS, Intel/Xilinx FPGAs, Graphcore) to milliwatts with embedded processors based on ARM/RISC-V with parallel (e.g. RISC-V GAP) or subthreshold voltage computing (e.g. ETA/Ambiq). § Challenge: how to combine and deploy these different architectures to obtain optimal operating points for energy efficiency and performance.
  • 47. Case study: FPGA + TPU § Hardware consists of a ZCU102 board with 2 Xilinx DPUs and 2 Google TPU units. § The Xilinx DPU is a soft FPGA overlay that adapts to the DNN complexity and FPGA resources. § The Google EdgeTPU is an ASIC, also based on a systolic array architecture, with similar 8-bit precision. § We use a single framework based on TensorFlow for both types of device and train only once. Then we can freeze the network and customize it for TPU/DPU. [Figures: DPU architecture; TPU architecture]
  • 48. Host is a Zynq UltraScale MPSoC device (ARM + FPGA) § The Zynq processing platform is a system-on-chip (SoC) processor with embedded programmable logic: processing system (PS) + programmable logic (PL). § The Google TPU is attached to a high-performance USB3 interface. [Figure: Zynq UltraScale (high performance)]
  • 49. Object detection with SSD (Single Shot Detection) § The host ARM schedules detections on the TPUs and DPUs. § Power: ~5.2 W for 1 DPU and ~1.2 W for 1 TPU. § Throughput: up to 80 FPS for 1 DPU, 35 FPS for 1 TPU, and 115 FPS combined.
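The power and throughput figures on this slide can be combined into an energy-per-frame estimate, which is a convenient way to compare the two accelerators. The sketch below is only a back-of-the-envelope calculation: it assumes the quoted power is drawn while running at the quoted frame rate and that power and throughput simply add when one DPU and one TPU run together. The `Accelerator` class and `combined` helper are illustrative names, not part of any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    power_w: float   # power while running, as quoted on the slide
    fps: float       # peak frames per second, as quoted on the slide

    @property
    def energy_per_frame_mj(self) -> float:
        """Energy per processed frame in millijoules (P / FPS)."""
        return 1000.0 * self.power_w / self.fps


DPU = Accelerator("Xilinx DPU", power_w=5.2, fps=80.0)
TPU = Accelerator("Google EdgeTPU", power_w=1.2, fps=35.0)


def combined(accels):
    """Assume throughput and power are additive across devices."""
    total_fps = sum(a.fps for a in accels)
    total_power = sum(a.power_w for a in accels)
    return total_fps, total_power, 1000.0 * total_power / total_fps


if __name__ == "__main__":
    for a in (DPU, TPU):
        print(f"{a.name}: {a.energy_per_frame_mj:.1f} mJ/frame")
    fps, power, mj = combined([DPU, TPU])
    print(f"combined: {fps:.0f} FPS, {power:.1f} W, {mj:.1f} mJ/frame")
```

Under these assumptions the TPU spends roughly half the energy per frame of the DPU, while the DPU delivers more than twice the throughput, which is the kind of trade-off a host scheduler can exploit.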
  • 50. Power subsystem in the ZCU102 board enables voltage scaling investigation. A series of PMBus commands, sent over I2C, is required to set the output voltage. PMBus: open-standard digital power management.
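To make the PMBus step more concrete, the sketch below encodes a target voltage for the standard `VOUT_COMMAND` (0x21) register using the linear format whose exponent is read from `VOUT_MODE` (0x20). It only builds the bytes; no bus access is performed. Whether the ZCU102 regulators use this linear format, and the example `VOUT_MODE` value of 0x14, are assumptions here, since the slide does not list the actual command sequence.

```python
# PMBus command codes defined by the PMBus specification.
VOUT_MODE = 0x20
VOUT_COMMAND = 0x21


def exponent_from_vout_mode(vout_mode: int) -> int:
    """Extract the 5-bit two's-complement exponent from a VOUT_MODE byte."""
    if (vout_mode >> 5) & 0x7 != 0:
        raise ValueError("VOUT_MODE does not indicate linear format")
    raw = vout_mode & 0x1F
    return raw - 32 if raw & 0x10 else raw


def encode_linear16(volts: float, exponent: int) -> int:
    """Return the unsigned 16-bit mantissa so that volts ~= mantissa * 2**exponent."""
    mantissa = round(volts / 2 ** exponent)
    if not 0 <= mantissa <= 0xFFFF:
        raise ValueError("voltage out of range for this exponent")
    return mantissa


def decode_linear16(word: int, exponent: int) -> float:
    return word * 2.0 ** exponent


def vout_command_frame(volts: float, vout_mode: int) -> bytes:
    """Command code followed by the data word, low byte first (SMBus word order)."""
    word = encode_linear16(volts, exponent_from_vout_mode(vout_mode))
    return bytes([VOUT_COMMAND, word & 0xFF, word >> 8])


if __name__ == "__main__":
    mode = 0x14                      # assumed: linear format, exponent -12
    frame = vout_command_frame(0.85, mode)
    print(frame.hex(" "))
```

On real hardware these bytes would be written to the regulator's bus address, typically after selecting the rail with a `PAGE` command; that part depends on the specific regulator and is left out.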
  • 51. Better FPGA energy efficiency with Adaptive Voltage Scaling § Elongate is a tool and a set of IP blocks that control frequency and voltage and detect optimal operating points using in-situ detectors. § Elongate instruments the FPGA design with in-situ timing detectors. [Figure: Elongate implementation flow. Stages: HLS (High Level Synthesis; OpenCL, C++ source), SYN, Elongate, MAP, PLACE & ROUTE, BITGEN. Inputs and outputs: .VHD/.V source, .v/.vhd netlist, NTC component library, user constraints, .NCD netlist, .TWR timing, .BIT bitstream]
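One way to picture how in-situ detectors can be used to find an operating point is a simple search: lower the supply voltage in small steps at a fixed frequency until the detector first fires, then back off by a guard margin. The sketch below is a generic illustration of that idea and not the Elongate control algorithm; `detector_fires` is a hypothetical callback standing in for setting the rail voltage, running the workload and reading the detector.

```python
from typing import Callable


def find_lowest_safe_voltage(
    detector_fires: Callable[[float], bool],
    v_start: float = 0.85,
    v_min: float = 0.55,
    step: float = 0.01,
    guard_steps: int = 1,
) -> float:
    """Step the voltage down until the detector fires, then back off.

    Returns the lowest tested voltage, raised by `guard_steps` steps, at
    which the detector stayed silent; returns `v_min` if it never fires.
    """
    n_steps = round((v_start - v_min) / step)
    for i in range(n_steps + 1):
        v = round(v_start - i * step, 4)
        if detector_fires(v):
            if i == 0:
                raise RuntimeError("detector fires at the starting voltage")
            return round(min(v_start, v + guard_steps * step), 4)
    return v_min


if __name__ == "__main__":
    # Toy model: the detector starts firing below 0.62 V at this frequency.
    print(find_lowest_safe_voltage(lambda v: v < 0.62))
```

The voltage range in the defaults matches the 0.55 V to 0.85 V range quoted later in the talk; the step size and guard margin are arbitrary choices for the example.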
  • 52. Example of timing detector for logic § Soft-macro detectors create different paths for the slow flip-flop (SFF) and the main flip-flop (MFF). § Discrepancies between MFF and SFF are detected by an XOR gate. § MFF replicates the functionality of the original flip-flop in the critical path. [Figure labels: D input; Generate 0; Generate 1; data steering (MUXF8, MUXF5); MFF; SFF; XOR; synchronizer FF; Q output; detector output]
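The MFF/SFF/XOR arrangement can be illustrated with a very small behavioural model. In the sketch below a flip-flop captures the new data value only if the data arrives before the clock edge; the SFF sees the same data through a slightly longer path, so it starts missing the edge before the MFF does, and the XOR of the two outputs flags the shrinking timing margin. The delay values and function names are illustrative assumptions, and the model ignores setup/hold times, metastability and the data-steering logic shown on the slide.

```python
def capture(new_value: int, old_value: int, arrival_ns: float, edge_ns: float) -> int:
    """Idealised flip-flop: takes the new value only if it arrived by the clock edge."""
    return new_value if arrival_ns <= edge_ns else old_value


def timing_detector(
    new_value: int,
    old_value: int,
    path_delay_ns: float,
    period_ns: float,
    sff_extra_delay_ns: float,
) -> tuple[int, int, int]:
    """Return (MFF output, SFF output, detector flag = MFF xor SFF)."""
    mff = capture(new_value, old_value, path_delay_ns, period_ns)
    sff = capture(new_value, old_value, path_delay_ns + sff_extra_delay_ns, period_ns)
    return mff, sff, mff ^ sff


if __name__ == "__main__":
    # Clock period 3.0 ns, SFF path 0.5 ns slower than the MFF path.
    for delay in (2.0, 2.8, 3.2):
        print(delay, timing_detector(1, 0, delay, 3.0, 0.5))
```

In this model the flag is raised only in the window where the SFF has failed but the MFF is still correct; once both miss the edge the outputs agree again, which is why a controller would need to react as soon as the first flag appears.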
  • 53. AI architecture with voltage and frequency scalability § Only one FPGA core is instrumented with Elongate detectors. § All cores use the same voltage and frequency. [Block diagram labels: ARM A53 MP; CCI (cache coherent interface); Master HPM0 and HPM1 (128b); Slave HP0 to HP3 (128b); AXI interconnect; AXI master/slave (128b); DMA0 to DMA3; BNN_ZU0 to BNN_ZU3; ELO CONTROL FREQ/PHASE (ELO_CLK, ELO_CLK_PHASE, reset elo freq, reset elo phase, enable, detector error); peripheral and PMBus interfaces; I2C; voltage regulators; VCCINT power rail; LEDs (locked, error, debug)]
  • 54. Elongate complexity overheads (LUTs and FFs). [Chart: overhead (%) vs. path timing (%); series: FF ZY, LUTs ZY, FF ZU, LUTs ZU]
  • 55. Power scalability in Zynq UltraScale § Voltage levels range from 0.55 V to 0.85 V for the 16nm Zynq UltraScale device. § Elastic power/performance with up to 85% power reduction or 2x performance.
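A first-order way to reason about these numbers is the textbook dynamic-power relation, in which power scales with the square of the supply voltage and linearly with clock frequency. The sketch below applies that relation to the voltage range on the slide. It is a simplified model that ignores static (leakage) power and is not a reproduction of the measurements behind the 85% figure.

```python
def relative_dynamic_power(v: float, f_rel: float = 1.0, v_nom: float = 0.85) -> float:
    """Dynamic power relative to nominal, using P ~ V^2 * f.

    v      supply voltage in volts
    f_rel  clock frequency relative to the nominal frequency
    v_nom  nominal supply voltage in volts
    """
    return (v / v_nom) ** 2 * f_rel


def reduction_percent(v: float, f_rel: float = 1.0, v_nom: float = 0.85) -> float:
    """Power reduction versus nominal, in percent."""
    return 100.0 * (1.0 - relative_dynamic_power(v, f_rel, v_nom))


if __name__ == "__main__":
    print(f"0.55 V, same frequency: {reduction_percent(0.55):.1f}% lower")
    print(f"0.55 V, half frequency: {reduction_percent(0.55, 0.5):.1f}% lower")
```

In this model, dropping from 0.85 V to 0.55 V at an unchanged frequency cuts dynamic power by a little under 60%, so larger savings would also involve a lower clock frequency, reduced leakage, or both.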
  • 56. Adaptive Voltage Scaling applied to a convolutional neural network. [Chart: accuracy (%) vs. frequency (MHz); series: run_0.85v, run_0.75v, run_0.65v, run_0.55v; labelled points (frequency, accuracy): (380.95, 78.5), (352.38, 78.4), (292.59, 78.4), (180.95, 78.3)]
  • 57. Conclusions and future work § FPGAs enable custom neural network circuits with many levels of precision, from 1-bit to floating point. § FPGA hardware instrumentation enables significantly better energy efficiency and performance at run-time. § Heterogeneous hardware can work together with different network precisions and power operating points. § Future work: explore novel ways to combine heterogeneous hardware that includes other architectures in addition to CPUs/TPUs and FPGAs to deliver energy-proportional AI. § More details: J. Nunez-Yanez, "Energy Proportional Neural Network Inference with Adaptive Voltage and Frequency Scaling," in IEEE Transactions on Computers, vol. 68, no. 5, pp. 676-687, 1 May 2019.
  • 58. Acknowledgement • Thanks to the Royal Society for the MINET industrial fellowship award and to Xilinx for the hardware/software support.
  • 60. LATTICE RISING 2020. Delivering Milliwatt AI to the Edge with Ultra-Low Power FPGAs. Matt Holdsworth, FAE, Lattice Semiconductor
  • 61. Rapidly Emerging Edge Computing Trend (NASDAQ: LSCC). Driven by latency, privacy, and bandwidth limitations. [Diagram labels: Edge, Networking, Cloud; IoT, Communication Gateway, Wireless / Wireline Access, Core Network] Semico Research: unit growth for edge devices with AI will explode, increasing by over 110% CAGR over the next five years.
  • 62. Lattice sensAI stack: ultra low power, small form factor, customizable neural network accelerators. Hardware platforms: UPduino + Himax Shield (iCE40 UltraPlus FPGA; 1 mW, 5.5 mm², 1/8/16 bits); Embedded Vision Development Kit (ECP5 FPGA; 1 W, 100 mm², 1/8/16 bits). IP cores: CNN Compact Accelerator; CNN Accelerator. Software tools: Neural Network Compiler. Reference designs / demos: Key Phrase Detection, Object Counting, Object Identification, Human Presence Detection, Face Tracking, Hand Gesture Detection. Custom design services. Markets: Smart Car, Smart Home, Smart City, Smart Factory.
  • 63. Focus Applications: Object Detection; Human Machine Interface (HMI); Object Identification. Examples: defect detection in smart security and embedded vision cameras; feature extraction enabling navigation of robots; key phrase detection to control smart appliances.
  • 64. Reference Design / Demo: Human Presence Detection. Features: sensor: CMOS image sensor; speed: 5 frames per second; power: 7 mW on iCE40 UltraPlus. Use cases: always-on human detection in appliances; low-power human detection for wake-on-approach in laptops and printers.
  • 65. Reference Design / Demo: Object Counting. Features: sensor: CMOS image sensor; speed: 17 frames per second (lower latency); power: 850 mW on ECP5-85K. Use cases: human detection in video security devices; human counting in retail camera applications; defect detection and operator compliance in smart factory cameras. [Image overlay: Defect Detected, Type: Crack]
  • 66. Popular sensAI Accelerator Use Cases. [Diagram labels: Stand-alone; Preprocessing; Post Processing]
  • 67. Hardware Platforms: Modular Platforms for Rapid Prototyping. HM01B0 UPduino Shield Board, key features: ▪ Video and audio sensors ▪ Compact 22 x 50 mm ▪ Includes HM01B0 image sensor board ▪ Arduino Micro form factor UltraPlus board. Embedded Vision Development Kit, key features: ▪ ECP5 FPGA consuming under 1 W of power ▪ Flexible video connectivity with support for MIPI CSI-2, eDP, HDMI, GigE Vision, USB 3.0, and more
  • 68. Software Tools: Neural Network Compiler. Key features: ▪ Implement networks developed using standard frameworks into Lattice FPGAs without prior RTL experience ▪ Rapidly analyze, simulate, and compile CNNs/BNNs for implementation on Lattice sensAI IP cores
  • 69. Customizable Reference Designs. [Flow diagram labels: Training Dataset; Training Scripts; NN Models; Training; Trained Model; NN Compiler; Quantized Weights and Instructions; NN IP; System Interface; FPGA Design; FPGA Bitstream. Legend: Lattice sensAI Components; Lattice FPGA Design Tools; ML Frameworks]
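The "quantized weights" step in this flow can be illustrated with a generic symmetric 8-bit quantizer, which maps floating-point weights to small integers plus a single scale factor. This is a common textbook scheme used here purely as an example; the slides do not describe how the Lattice NN Compiler actually quantizes, and the function names below are hypothetical.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization to the range [-127, 127].

    Returns the integer weights and the scale such that w ~= q * scale.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: list[int], scale: float) -> list[float]:
    """Recover approximate floating-point weights."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    w = [0.31, -1.0, 0.07, 0.0, 0.88]
    q, s = quantize_int8(w)
    print(q, s)
    print([round(x, 3) for x in dequantize(q, s)])
```

The reconstruction error of each weight is at most half a quantization step, which is why 8-bit weights are often accurate enough for inference while needing a quarter of the storage of 32-bit floats.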
  • 70. Higher Performance and Lower Power with CrossLink-NX: always-on human counting. Performance: MCU 1 fps; ECP5-45K 5 fps (5x faster); NX-40K 10 fps (2x faster). Power: SoC 2 W; ECP5-45K 400 mW (5x lower); NX-40K 200 mW (2x lower). [Block diagram labels: Sensors; Lattice CrossLink-NX (Sensor Interface, Neural Network Accelerators, SRAM for weights / activations); Results; MCU]
  • 71. Summary of Latest sensAI Updates. Higher accuracy: CrossLink-NX, the dedicated embedded vision and AI inference FPGA, provides the highest accuracy at the lowest power. Higher speed: ECP5 FPGA extends support to MobileNet and ResNet for higher speed processing at high accuracy. Lower power: iCE40 UltraPlus, the ultra-low power edge AI accelerator, now delivers higher accuracy at the lowest power. Reference designs: new and updated demos and end-to-end reference designs (Key Phrase Detection, Human Identification, Human Presence Detection, Object Counting with MobileNet).
  • 72. Reference Design ▪ Where to find the sensAI page • Applications -> AI/Machine Learning
  • 73. Where to find Demos and Reference Designs ▪ Demos: • Provided as a bitstream and Quickstart Guide • Allow easy demonstration of functionality. ▪ RDs: • Complete solution: RTL code, training scripts, dataset, complete user guide • Allow the user to reproduce the solution and reuse it in their own design framework.
  • 74. Lattice sensAI stack: ultra low power, small form factor, customizable neural network accelerators. Hardware platforms: UPduino + Himax Shield (iCE40 UltraPlus FPGA; 1 mW, 5.5 mm², 1/8/16 bits); Embedded Vision Development Kit (ECP5 FPGA; 1 W, 100 mm², 1/8/16 bits). IP cores: CNN Compact Accelerator; CNN Accelerator. Software tools: Neural Network Compiler. Reference designs / demos: Key Phrase Detection, Object Counting, Object Identification, Human Presence Detection, Face Tracking, Hand Gesture Detection. Custom design services. Markets: Smart Car, Smart Home, Smart City, Smart Factory.