CS295: Modern Systems
What Are FPGAs
and Why Should You Care
Sang-Woo Jun
Spring, 2019
What Are FPGAs
 Field-Programmable Gate Array
 Can be configured to act like any circuit – More later!
 Can do many things, but we focus on computation acceleration
FPGAs Come In Many Forms
PCIe-Attached
CPU Integrated
In-Network
In-Storage
How Is It Different From CPU/GPUs
 GPU – The other major accelerator
 CPU/GPU hardware is fixed
o “General purpose”
o We write programs (sequences of instructions) for them
 FPGA hardware is not fixed
o “Special purpose”
o Hardware can be whatever we want
o Will our hardware require/support software? Maybe!
 Optimized hardware is very efficient
o GPU-level performance**
o ~10x better power efficiency (e.g., 300 W GPU vs 30 W FPGA)
Analogy
CPU/GPU comes with fixed circuits
FPGA gives you a big bag of components to build whatever
o Could be a CPU/GPU!
[Images: “The Z-Berry”; benryves.com “Z80 Computer”; Shadi Soundation: Homebrew 4 bit CPU; image from “Experimental Investigations on Radiation Characteristics of IC Chips”]
Fine-Grained Parallelism of Special-Purpose Circuits
 Example: calculating gravitational force
$$F = \frac{G \times m_1 \times m_2}{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
 8 instructions on a CPU → 8 cycles**
 Far fewer cycles on a special-purpose circuit

Basic operations (4 cycles when independent operations run in parallel):
A = G × m1
B = A × m2
C = x1 - x2
D = C²
E = y1 - y2
F = E²
S = D + F
Ret = B / S

Compound operations (3 cycles):
A = G × m1 × m2, B = (x1 - x2)², C = (y1 - y2)²
D = B + C
Ret = A / D

Even further compound operations (1 cycle, may slow down the clock):
Ret = (G × m1 × m2) / ((x1 - x2)² + (y1 - y2)²)
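
The 4-cycle figure for basic operations follows from the data dependencies: an operation can start as soon as its inputs are ready. A minimal sketch of that scheduling argument, assuming every basic operation takes exactly one cycle and the circuit has enough units to run all ready operations at once (the `Op`, `criticalPathCycles`, and `gravityOps` names are illustrative, not from the slides):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One basic operation. `deps` lists the indices of earlier operations whose
// results it consumes; plain inputs such as m1 or x1 need no entry.
struct Op {
    std::vector<int> deps;
};

// Number of cycles needed when each operation takes one cycle and starts as
// soon as all of its inputs are available (i.e. the dependency-chain depth).
int criticalPathCycles(const std::vector<Op>& ops) {
    std::vector<int> finish(ops.size(), 0);
    int total = 0;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        int ready = 0;
        for (int d : ops[i].deps) {
            // Operations may only depend on operations listed before them.
            assert(d >= 0 && static_cast<std::size_t>(d) < i);
            ready = std::max(ready, finish[d]);
        }
        finish[i] = ready + 1;
        total = std::max(total, finish[i]);
    }
    return total;
}

// The eight basic operations of the gravitational-force example, in order:
// 0: G*m1   1: (0)*m2   2: x1-x2   3: (2)^2
// 4: y1-y2  5: (4)^2    6: (3)+(5) 7: (1)/(6)
std::vector<Op> gravityOps() {
    std::vector<Op> ops(8);
    ops[1].deps = {0};
    ops[3].deps = {2};
    ops[5].deps = {4};
    ops[6].deps = {3, 5};
    ops[7].deps = {1, 6};
    return ops;
}
```

Calling `criticalPathCycles(gravityOps())` gives the depth of the longest dependency chain, while `gravityOps().size()` is the cycle count when the same operations run strictly one after another.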
Coarse-Grained Parallelism of Special-Purpose Circuits
 The typical unit of parallelism for general-purpose hardware is the thread (~= core)
 Special-purpose processing units can also be replicated for parallelism
o Large, complex processing units: few fit on a chip
o Small, simple processing units: many fit on a chip
 Only hardware useful for the application is generated
o Instructions? Decoding? Cache? Coherence? Included only if the application needs them
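
How many copies of a processing unit fit on a chip is bounded by whichever resource runs out first. A rough sketch of that arithmetic, assuming resources are counted only as LUTs, DSP blocks, and block RAMs and ignoring routing and timing limits (the `Resources` type and all numbers a caller passes in are hypothetical):

```cpp
#include <algorithm>
#include <climits>

// Resource budget of a chip, or resource cost of one processing unit.
struct Resources {
    long luts;
    long dsps;
    long brams;
};

// Upper bound on how many copies of `unit` fit in `chip`: the scarcest
// resource decides. A resource the unit does not use imposes no limit, so a
// unit that uses nothing at all yields LONG_MAX.
long maxReplicas(const Resources& chip, const Resources& unit) {
    long best = LONG_MAX;
    if (unit.luts > 0)  best = std::min(best, chip.luts / unit.luts);
    if (unit.dsps > 0)  best = std::min(best, chip.dsps / unit.dsps);
    if (unit.brams > 0) best = std::min(best, chip.brams / unit.brams);
    return best;
}
```

With a made-up chip of 1,000,000 LUTs, 2,000 DSP blocks, and 1,500 block RAMs, a small unit costing 2,000 LUTs, 4 DSPs, and 1 block RAM fits hundreds of times, while a large unit costing 200,000 LUTs fits only a handful of times.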
How Is It Different From ASICs
 ASIC (Application-Specific Integrated Circuit)
o Special chip purpose-built for an application
o E.g., ASIC bitcoin miner, Intel neural network accelerator
o Expensive to build, and function cannot be changed afterward
 + FPGAs can be field-programmed
o Function can be changed completely at any time
o FPGA fabric emulates custom circuits
 - Emulated circuits are not as efficient as bare-metal
o ASICs get ~10x performance (larger circuits, faster clock)
o ASICs get ~10x power efficiency
Basic FPGA Architecture
[Figure: array of “configurable logic blocks (CLBs)” joined by programmable interconnect, with I/O blocks]
 Each CLB contains:
o 6-input Look-Up Table (LUT): programmable
o FF/latch: for sequential circuit construction
 Ex) 2-LUT for “AND”
Input 1   Input 2   Output
0         0         0
0         1         0
1         0         0
1         1         1
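
A look-up table is just a stored truth table indexed by its inputs, which is why one structure can stand in for any gate. A minimal software model, assuming the table is packed into a 64-bit integer (enough for 6 inputs) and that input 1 is the high index bit; the `Lut` type and the XOR table are illustrative additions:

```cpp
#include <cassert>
#include <cstdint>

// A LUT modeled as a packed truth table: bit i of `table` is the output for
// the input combination whose binary encoding is i.
struct Lut {
    std::uint64_t table;

    // `inputs` packs the input bits; for a 2-LUT, input 1 is bit 1 and
    // input 2 is bit 0. At most 6 inputs (indices 0..63) are supported.
    bool eval(unsigned inputs) const {
        assert(inputs < 64);
        return ((table >> inputs) & 1u) != 0;
    }
};

// The AND truth table: only row (1, 1), index 3, outputs 1.
constexpr Lut kAndLut{0b1000};

// "Reprogramming" means storing a different table: rows (0, 1) and (1, 0),
// indices 1 and 2, output 1, which gives XOR.
constexpr Lut kXorLut{0b0110};
```

For example, `kAndLut.eval(0b11)` looks up row (1, 1) of the AND table, and the same `eval` code serves `kXorLut` unchanged.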
Basic FPGA Architecture – DSP Blocks
 CLBs act as gates: many are needed to implement high-level logic
 Arithmetic operations are provided as efficient ALU blocks
o “Digital Signal Processing (DSP) blocks”
o Each block provides an adder + multiplier
[Figure: “DSP block” containing a multiplier (×) feeding an adder/subtractor (+/-)]
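
A multiplier feeding an adder is exactly the multiply-accumulate step at the heart of dot products and filters. A small sketch of that operation in software, assuming 32-bit integer inputs and a 64-bit accumulator (real DSP blocks have vendor-specific operand widths, and the function names here are made up):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One multiply followed by one add: the operation a DSP block provides.
std::int64_t multiplyAdd(std::int32_t a, std::int32_t b, std::int64_t acc) {
    return static_cast<std::int64_t>(a) * b + acc;
}

// A dot product expressed as a chain of multiply-add steps. In hardware,
// each step could map onto a DSP block instead of many CLBs.
std::int64_t dotProduct(const std::vector<std::int32_t>& x,
                        const std::vector<std::int32_t>& y) {
    assert(x.size() == y.size());
    std::int64_t acc = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc = multiplyAdd(x[i], y[i], acc);
    }
    return acc;
}
```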
Basic FPGA Architecture – Block RAM
 CLBs can store data in their flip-flops
o (~1 bit/block): tiny!
 Some on-chip SRAM is provided as blocks (“Block RAM”)
o ~18/36 Kbit/block, MBs per chip
o Massively parallel access to data → multi-TB/s bandwidth
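
The multi-TB/s figure comes from adding up many small memories that can all be accessed in the same cycle. A back-of-the-envelope sketch, where every number a caller passes in (block count, ports per block, port width, clock rate) is an assumption rather than a figure from the slides:

```cpp
#include <cassert>

// Aggregate on-chip bandwidth in TB/s if every block RAM port transfers
// `bitsPerPort` bits on every clock cycle.
double aggregateBandwidthTBps(int blocks, int portsPerBlock, int bitsPerPort,
                              double clockHz) {
    assert(blocks >= 0 && portsPerBlock >= 0 && bitsPerPort >= 0 && clockHz >= 0);
    const double bitsPerSecond =
        static_cast<double>(blocks) * portsPerBlock * bitsPerPort * clockHz;
    return bitsPerSecond / 8.0 / 1e12;  // bits -> bytes -> terabytes
}
```

For instance, a hypothetical chip with 2,000 blocks, 2 ports per block, 36-bit ports, and a 300 MHz clock works out to 2000 × 2 × 36 × 300e6 bits/s, or about 5.4 TB/s.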
Basic FPGA Architecture – Hard Cores
 Some functions are provided as efficient, non-configurable “hard cores”
o Multi-core ARM processors (“Zynq” series)
o Multi-Gigabit Transceivers
o PCIe/Ethernet PHY
o Memory controllers
o …
[Figure: FPGA with hard cores labeled ARM, PCIe, Ethernet, Memory]
Example Accelerator Card Architecture
[Figure: block diagram of an accelerator card. Labels: PCIe, FPGA, DRAM (×2), 1GbE, 40GbE, FMC, General-Purpose I/O Pins, Multi-Gigabit Transceivers]
 “FPGA Mezzanine Card” (FMC) Expansion
o Network Ports, Memory, Storage, PCIe, …
Example Accelerator Card (VCU108)
Programming FPGAs
 Languages and tools overlap with ASIC/VLSI design
 FPGA development for acceleration is typically done with either
o Hardware Description Languages (HDL): Register-Transfer Level (RTL) languages
o High-Level Synthesis: Compiler translates software programming languages to RTL
 RTL models a circuit using:
o Registers (state), and
o Combinational logic (computation)
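
The register/combinational split can be mimicked in software: state lives in a struct, the logic is a pure function from current state and inputs to next state, and a clock tick commits the result. A minimal sketch using a made-up running-sum circuit (the `Registers`, `nextState`, and `clockTick` names are illustrative):

```cpp
#include <cstdint>

// Registers: the only state in the circuit, updated once per clock edge.
struct Registers {
    std::uint32_t sum = 0;    // running total of all inputs seen so far
    std::uint32_t count = 0;  // number of inputs seen so far
};

// Combinational logic: a pure function of the current register values and
// this cycle's input. It holds no state of its own.
Registers nextState(const Registers& cur, std::uint32_t input) {
    Registers next;
    next.sum = cur.sum + input;
    next.count = cur.count + 1;
    return next;
}

// One clock cycle: every register takes its newly computed value at once.
void clockTick(Registers& regs, std::uint32_t input) {
    regs = nextState(regs, input);
}
```

Because `nextState` reads only the old values, `sum` and `count` update simultaneously, as they would on a real clock edge, rather than one after the other.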
Hardware Description Language
 Software programming languages: describe a process
 Hardware description languages: describe a structure
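Hardware description (Bluespec): creates circuits; state exists on chip

// Queues, a register, and a multiplier are instantiated as hardware modules.
FIFO#(Float) input_queue <- mkFIFO;
FIFO#(Float) output_queue <- mkFIFO;
Reg#(Float) factor <- mkRegU;  // uninitialized register; written elsewhere
FloatMultIfc mult <- mkFloatMult;

// Fires whenever an input is available: feed it to the multiplier.
rule in;
   mult.enq(factor, input_queue.first);
   input_queue.deq;
endrule

// Fires whenever a product is ready: forward it to the output queue.
rule out;
   let ret <- mult.result;
   output_queue.enq(ret);
endrule

Software (C++): instructions for the CPU; state exists in memory

#include <queue>

std::queue<float> input_queue;
std::queue<float> output_queue;
float factor;  // set elsewhere before processing starts

// Polls the input queue forever and pushes each scaled value to the output.
void process() {
    while (true) {
        if (!input_queue.empty()) {
            float ret = input_queue.front() * factor;
            output_queue.push(ret);
            input_queue.pop();
        }
    }
}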
Major Hardware Description Languages
 Verilog: Most widely used in industry
o Relatively low-level language supported by everyone
 Chisel – Compiles to Verilog
o Relatively high-level language from Berkeley
o Embedded in the Scala programming language
o Prominently used in RISC-V development (Rocket core, etc.)
 Bluespec – Compiles to Verilog
o Relatively high-level language from MIT
o Supports types, interfaces, etc
o Also used in active RISC-V development (Piccolo, etc.)
High-Level Synthesis
 Compiler translates software programming languages to RTL
 High-Level Synthesis compilers from Xilinx, Altera/Intel
o Compile C/C++, annotated with #pragmas, into RTL
o Theory/history behind it is a complex can of worms we won’t go into
o Personal experience: needs to be HEAVILY annotated to get performance
o Anecdote: naïve RISC-V in Vivado HLS achieves IPC of 0.0002 [1], 0.04 after optimizations [2]
 OpenCL
o Inherently parallel language more efficiently translated to hardware
o Stable software interface
[1] http://msyksphinz.hatenablog.com/entry/2019/02/20/040000
[2] http://msyksphinz.hatenablog.com/entry/2019/02/27/040000
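
To give a feel for what "C/C++ annotated with #pragmas" looks like, here is a small sketch in the style of Vivado HLS. The `PIPELINE` directive asks the tool to start a new loop iteration every cycle; the function name, the fixed buffer size `kN`, and the choice of this particular directive are assumptions for illustration. An ordinary C++ compiler ignores the unknown pragma (possibly with a warning), so the function also runs as plain software:

```cpp
#include <cstddef>

constexpr std::size_t kN = 1024;  // hypothetical fixed buffer size

// Multiplies every element by `factor`. Under an HLS compiler the pragma
// requests a pipelined loop with an initiation interval of one cycle.
void scale(const float in[kN], float out[kN], float factor) {
    for (std::size_t i = 0; i < kN; ++i) {
#pragma HLS PIPELINE II=1
        out[i] = in[i] * factor;
    }
}
```

Without such annotations the compiler has to guess at the intended hardware structure, which is one reason unannotated code tends to perform poorly.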
FPGA Compilation Toolchain
High-level HDL code → Language compiler → Verilog/VHDL → Synthesize → Netlist → Map/Place/Route → Bitfile
 Language compiler: high-level language vendor tool
 Synthesis through bitfile generation: FPGA vendor toolchain (few open source)
 Constraint file
o “Which transceiver instance should top_transceiver_01 map to?”
o And so, so much more…
 Simulation: functional simulation, cycle-level simulation
Programming/Using an FPGA Accelerator
 Bitfile is programmed to FPGA over “JTAG” interface
o Typically used over USB cable
o Supports FPGA programming, limited debugging access, etc
 PCIe-attached FPGA accelerator card is typically used similarly to GPUs
o Program FPGA, execute software
o Software copies data to the FPGA board and notifies the FPGA
-> FPGA logic performs computations
-> Software copies data back from the FPGA
 FPGA flexibility gives immense freedom of usage patterns
o Streaming, coherent memory, …
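
The GPU-like usage pattern (copy in, notify, compute, copy out) can be sketched from the host's point of view. The class below is a pure software stand-in, not any vendor's driver API: `SimulatedAccelerator` and its methods are hypothetical, and the "FPGA logic" is simply a loop that scales each element.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a PCIe-attached accelerator card, simulated
// entirely in software to show the order of host-side steps.
class SimulatedAccelerator {
public:
    // Step 1: software copies input data to on-board memory.
    void copyToDevice(const std::vector<float>& host) { deviceIn_ = host; }

    // Step 2: software notifies the FPGA, whose logic then computes.
    // Here the "logic" multiplies every element by `factor`.
    void start(float factor) {
        deviceOut_.resize(deviceIn_.size());
        for (std::size_t i = 0; i < deviceIn_.size(); ++i) {
            deviceOut_[i] = deviceIn_[i] * factor;
        }
    }

    // Step 3: software copies the results back to host memory.
    std::vector<float> copyFromDevice() const { return deviceOut_; }

private:
    std::vector<float> deviceIn_;   // models on-board input buffer
    std::vector<float> deviceOut_;  // models on-board output buffer
};

// The full round trip as the host program would drive it.
std::vector<float> runOnAccelerator(const std::vector<float>& data, float factor) {
    SimulatedAccelerator card;
    card.copyToDevice(data);
    card.start(factor);
    return card.copyFromDevice();
}
```

Streaming or coherent-memory designs would replace the explicit copies with other mechanisms; this sketch covers only the copy-based pattern.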
Partial Reconfiguration
 Parts of the FPGA (sub-components) can be swapped out dynamically without turning off the FPGA
o Physical area is drawn on chip
 Used in Amazon F1, etc.
 Toolchain support for isolation
FPGAs In The Cloud
 Amazon EC2 F1 instances (1 to 8 FPGAs)
 Microsoft Azure, etc…
