2.4: “Zeppelin”: an SoC for Multi-chip Architectures — © 2018 IEEE International Solid-State Circuits Conference
“Zeppelin”: an SoC for
Multi-chip Architectures
Noah Beck¹, Sean White¹, Milam Paraschou², Samuel Naffziger²
¹AMD, Boxborough; ²AMD, Fort Collins
Presented at ISSCC 2018
Outline
▪ Design Goals for the System-on-a-Chip codenamed “Zeppelin”
▪ SoC Architecture
▪ Core Complex codenamed “Zen”
▪ AMD Infinity Fabric (IF)
▪ I/O Capabilities, I/O muxing
▪ Floorplan and Packaging
▪ Results
“Zeppelin” SoC Goals
Design a System-on-a-Chip Solution for scalability
across the Server market
▪ 4-die multi-chip module (MCM) for Server in new
infrastructure
▪ Same SoC suitable for High-End Desktop
– 1-die Desktop in existing AM4 infrastructure
– 2-die MCM High-End Desktop in new infrastructure
“Zeppelin” Die Functional Overview
▪ Compute
– 8 “Zen” x86 cores
– 4MB total L2 cache
– 16 MB total L3 cache
▪ Memory
– 2 channel DDR4 with ECC
– 2 DIMMs/channel and
up to 256GB/channel
▪ Integrated I/O
– Coherent and control Infinity Fabric links
– 32 lanes high-speed SERDES
– 4 USB3.1 Gen1 ports
– Server Controller Hub (SPI, LPC, UART, I2C, RTC, SMBus)
[Die block diagram: two 4-core “Zen” clusters, each with an L3; four IFOP links; IFIS/PCIe® and IFIS/PCIe/SATA interfaces; two DDR channels]
Chip Architecture
[Chip architecture diagram: the Infinity Fabric Scalable Data Fabric (SDF) plane connects two CCMs (each fronting a 4-core CCX with L3), two UMCs (DDR), and the IOMS (IO complex with PCIe, SATA, and Southbridge); the SMU attaches via the IF SCF; CAKE blocks drive the IFOP and IFIS/PCIe links]
CCX: CPU Complex
▪ 4 cores with L1/L2 caches,
plus shared L3 cache
▪ “Zen” core described in
[Singh ISSCC17]
– L1 Instruction Cache 64KB,
4-way associative
– L1 Data Cache 32KB,
8-way associative
– L2 Cache 512KB,
8-way associative
– 2 threads per core
▪ L3 cache 8MB, 16-way associative, shared by all four cores
[CCX floorplan: four cores, each with a 512KB L2 macro (L2M) and L2/L3 control logic, surrounding 1MB L3 macros (L3M)]
“Zen” Cache Hierarchy
▪ Fast private L2 cache, 12 cycles
▪ Fast shared L3 cache, 35 cycles
▪ L3 filled from L2 victims of all four cores
▪ L2 tags duplicated in L3 for probe filtering and fast cache transfer
▪ Multiple smart prefetchers
▪ 50 outstanding misses from L2 to L3 per core
▪ 96 outstanding misses from L3 to memory
[Cache datapath diagram: 32B/cycle fetch from the 64KB 4-way I-cache; 2×16B load and 1×16B store ports to the 32KB 8-way D-cache; 32B/cycle links between the L1s, the 512KB 8-way L2, and the 8MB 16-way L3]
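The duplicated-L2-tags mechanism above can be sketched as a shadow-tag probe filter: the L3 keeps a copy of each core's L2 tags and forwards a probe only to cores that may hold the line, instead of broadcasting to all four. A minimal Python model (all names are hypothetical, not AMD's implementation):

```python
class ShadowTagProbeFilter:
    """Toy model of an L3 that mirrors each core's private-L2 tags.

    A probe for a physical address is forwarded only to cores whose
    shadow tags say the line may be present in their L2.
    """

    def __init__(self, num_cores=4, line_bytes=64):
        self.line_bytes = line_bytes
        # shadow_tags[core] = set of cache-line numbers believed in that L2
        self.shadow_tags = [set() for _ in range(num_cores)]

    def _line(self, addr):
        return addr // self.line_bytes

    def on_l2_fill(self, core, addr):
        """L3 observes a fill into a core's L2 and mirrors the tag."""
        self.shadow_tags[core].add(self._line(addr))

    def on_l2_evict(self, core, addr):
        """L2 victim is sent to L3; the shadow tag is cleared."""
        self.shadow_tags[core].discard(self._line(addr))

    def probe_targets(self, addr, requester):
        """Cores that must be probed for this address (excluding requester)."""
        line = self._line(addr)
        return [c for c, tags in enumerate(self.shadow_tags)
                if c != requester and line in tags]


pf = ShadowTagProbeFilter()
pf.on_l2_fill(core=1, addr=0x1000)
pf.on_l2_fill(core=3, addr=0x1000)
print(pf.probe_targets(0x1000, requester=1))  # only core 3 needs a probe: [3]
```

The payoff is exactly the one the slide names: most probes are filtered at the L3 instead of disturbing every core's L2.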
AMD Infinity Fabric: Scalable Data Fabric
[SDF block diagram: the transport layer connects two CCMs (each fronting a CCX), two UMCs (each driving a DDR4 channel), the IOMS, and CAKE blocks that bridge to IFIS or IFOP links off-chip]
▪ Glossary
– CCM: Cache-Coherent Master
– CCX: Core Complex
– UMC: Unified Memory Controller
– IOMS: I/O Master/Slave
– CAKE: Coherent AMD SocKet Extender
– IFIS: IF Inter-Socket SerDes
– IFOP: IF On-Package SerDes
SDF Local Memory Access
[Diagram: local access path CCX → CCM → SDF transport layer → UMC → DDR4]
Latency to local memory: ~90ns
* See Endnotes for additional system configuration details
SDF Die-to-Die Memory Accesses
[Diagram: a request either crosses an IFOP via CAKE blocks to the SDF of a same-package die, or crosses an IFIS via CAKE blocks to the SDF of a die in the other socket, then reaches that die’s UMC/DDR4]
▪ Latency to other memory within socket: ~145ns
▪ Latency to memory attached to other socket (single hop): ~200ns
* See Endnotes for additional system configuration details
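The three latency classes (local ~90ns, same-package ~145ns, cross-socket ~200ns) can be summarized in a toy topology classifier, using the approximate figures from this and the previous slide; the die-numbering convention below is hypothetical:

```python
# Approximate SDF memory latencies from the slides (ns); toy model only.
LOCAL_NS, SAME_PACKAGE_NS, OTHER_SOCKET_NS = 90, 145, 200

def memory_latency_ns(req_die, home_die):
    """Classify a (requester, home) die pair in a 2-socket system with
    4 dies per socket, numbering dies 0-3 as socket 0 and 4-7 as socket 1."""
    if req_die == home_die:
        return LOCAL_NS              # served by a UMC on the same die
    if req_die // 4 == home_die // 4:
        return SAME_PACKAGE_NS       # one IFOP hop within the MCM
    return OTHER_SOCKET_NS           # one IFIS hop to the other socket

print(memory_latency_ns(0, 0))   # 90
print(memory_latency_ns(0, 3))   # 145
print(memory_latency_ns(0, 5))   # 200
```

This is the NUMA distance structure an OS scheduler sees on the 4-die parts: two on-package hop classes rather than one flat remote latency.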
2pJ/bit IFOP SerDes
▪ Low-swing, single-ended data for ~50% of the power of an equivalent differential driver
▪ Zero-power driver state during logic-0 transmit
– Transmit/receive impedance termination to ground while the driver pullup is disabled
– Also applied during link idle
▪ Data-bit-inversion encoding saves 10% average power per bit
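The slides do not spell out the encoding, but a common data-bit-inversion variant that matches a zero-power logic-0 driver is "minimize ones" DBI: if more than half of a byte's bits are 1, the complement is transmitted with an asserted DBI flag, so no more than 4 data bits are ever driven high. A sketch of that variant (an assumption, not necessarily AMD's exact scheme):

```python
def dbi_encode(byte):
    """'Minimize ones' data-bit inversion for one byte.

    With a driver that burns no power transmitting 0s, line power tracks
    the number of 1 bits. If more than 4 of the 8 bits are 1, send the
    complement and assert the DBI flag instead.
    """
    ones = bin(byte).count("1")
    if ones > 4:
        return (~byte) & 0xFF, 1   # inverted data, DBI bit set
    return byte, 0

def dbi_decode(byte, dbi):
    """Undo the inversion using the received DBI flag."""
    return (~byte) & 0xFF if dbi else byte

data, flag = dbi_encode(0xFE)   # 7 ones -> invert
print(hex(data), flag)          # 0x1 1
assert dbi_decode(data, flag) == 0xFE
```

Averaged over random data, the encoding caps the worst-case driven-high count at 4 of 8 bits plus the flag, which is where the quoted ~10% average power saving comes from.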
Hierarchical Power Management
▪ System Management Unit (SMU) uses
IF Scalable Control Fabric (SCF) plane
▪ SCF: single-lane IFIS SerDes link for
chip-to-chip or socket-to-socket
▪ SMU calculation hierarchy for voltage
level control, C-State Boost, thermal
management, electrical design current
management
– Local chip SMU fast loop
– Master chip SMU slower loop
[Diagram: the master SMU on Die 0 coordinates the local SMUs on Dies 1–3 and links to the other socket]
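The fast-local-loop / slower-master-loop split above can be sketched as two nested control loops: each die clamps itself to its current budget quickly, while the master periodically rebalances the package budget toward loaded dies. The structure and numbers below are hypothetical, not AMD firmware:

```python
class LocalSMU:
    """Per-die controller: fast loop against the budget set by the master."""

    def __init__(self, die_id, power_limit_w):
        self.die_id = die_id
        self.power_limit_w = power_limit_w   # written by the master loop
        self.measured_w = 0.0

    def fast_loop(self, measured_w):
        """Fast local decision: boost if under budget, throttle if over."""
        self.measured_w = measured_w
        return "boost" if measured_w < self.power_limit_w else "throttle"


class MasterSMU:
    """Package-level controller: slower loop over all dies' reports."""

    def __init__(self, dies, package_tdp_w):
        self.dies, self.package_tdp_w = dies, package_tdp_w

    def slow_loop(self):
        """Redistribute the package TDP in proportion to measured load."""
        total = sum(d.measured_w for d in self.dies) or 1.0
        for d in self.dies:
            d.power_limit_w = self.package_tdp_w * d.measured_w / total


dies = [LocalSMU(i, 50.0) for i in range(4)]
master = MasterSMU(dies, package_tdp_w=200.0)
actions = [d.fast_loop(w) for d, w in zip(dies, [60, 30, 30, 30])]
master.slow_loop()
print(round(dies[0].power_limit_w, 1))  # loaded die gets a larger share: 80.0
```

The point of the hierarchy is response time: the local loop reacts at per-die speed, while the master loop only needs to run often enough to track workload shifts across dies.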
IO Subsystem & Muxing
▪ 32 lanes multi-protocol I/O
– PCIe, IFIS: two 16-lane links
– PCIe link bifurcation:
max 8 devices per 16-lane link
– SATA: 8 lanes of bottom link
▪ Supports multiple market
segments
▪ Muxing support adds
<1 channel clock latency to IFIS
[Lane-muxing diagram: each 16-lane link bifurcates as x16, 2×x8, 4×x4, 8×x2, or 16×x1 for PCIe, runs as a full x16 IFIS link, or (bottom link only) maps 8 lanes to SATA]
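The bifurcation options above can be enumerated directly: a link of width w is either one device of width w or two independently bifurcated halves, and the slide's constraint caps a 16-lane link at 8 devices. A sketch of that enumeration (the recursive-halving rule is read off the mux tree; this is illustrative, not firmware):

```python
def bifurcations(width):
    """All lane-width tuples reachable by recursively halving a link."""
    configs = {(width,)}
    if width > 1:
        halves = bifurcations(width // 2)
        for a in halves:
            for b in halves:
                configs.add(a + b)   # left half + right half, each bifurcated
    return configs

# Keep only configurations within the 8-devices-per-16-lane-link limit.
valid = {c for c in bifurcations(16) if len(c) <= 8}

print((4, 4, 4, 4) in valid)        # True
print(max(len(c) for c in valid))   # 8
```

Every configuration conserves the 16 lanes; asymmetric splits such as (8, 4, 4) fall out of the same rule, since each half bifurcates independently.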
Chip Floorplanning for Package
▪ DDR placement on one die edge
▪ Chips in 4-die MCM rotated 180°,
DDR facing package left/right edges
▪ Package-top Infinity Fabric pinout
requires diagonal placement of IFIS
▪ 4th IFOP enables routing of high-speed I/O in only four package substrate layers
[Die floorplan: high-speed I/O blocks on two edges, DDR on one edge, two CCXs in the center]
DDR+IFOP Package Routing
▪ Vertical and Horizontal IFOP: 2 layers each
▪ Diagonal IFOP: 1 layer each
▪ DDR channel: 1 layer each
[Package routing diagram, Layers A and B: IFOP and DDR escape routing among the four rotated dies]
DDR+IFIS Package Routing
▪ DDR channel: 1 layer each
▪ IFIS links: 2 layers each
[Package routing diagram, Layers C and D: DDR and IFIS escape routing among the four dies]
MCM Versus Single-Chip Design
▪ 4-die MCM package: 852mm² of silicon (4 × 213mm²)
▪ Large single-chip design:
– ~10% area savings: 777mm² (near reticle size limit)
– Manufacturing/test cost: ~40% higher
– Full 32-core yield: ~17% lower
– Full 32-core cost: ~70% higher
▪ High-yielding multi-chip assembly process
– Achievable based on internal production data
– Die frequency matching using on-die frequency sensors
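The economics above can be illustrated with a toy known-good-die cost model: small dies yield better than one near-reticle die, and because each die is tested before assembly, bad dies are never packaged. The defect density and negative-binomial yield form below are assumed illustrative values, not AMD data; the slides' specific ~40%/~70% figures come from AMD's internal model:

```python
import math

D0_PER_MM2 = 0.4e-2      # assumed 0.4 defects/cm^2 (illustrative)
ALPHA = 2.0              # defect-clustering parameter (illustrative)

def die_yield(area_mm2):
    """Negative-binomial yield model: Y = (1 + A*D0/alpha)^-alpha."""
    return (1 + area_mm2 * D0_PER_MM2 / ALPHA) ** -ALPHA

def silicon_cost_per_part(area_mm2, dies_per_part):
    """Relative silicon cost per full 32-core part. Each die is tested
    before assembly (known good die), so defective dies are paid for
    through yield loss but never consume a package."""
    return dies_per_part * area_mm2 / die_yield(area_mm2)

mcm = silicon_cost_per_part(213, 4)    # 4 x 213mm^2 "Zeppelin" dies
mono = silicon_cost_per_part(777, 1)   # near-reticle monolithic die

print(f"yield(213mm^2) = {die_yield(213):.1%}, yield(777mm^2) = {die_yield(777):.1%}")
print(f"monolithic / MCM cost ratio: {mono / mcm:.2f}x")
```

The qualitative effect is robust to the assumed constants: yield falls super-linearly with die area, so four tested small dies cost less per good 32-core part than one large die, even though the MCM uses ~10% more total silicon.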
MCM Package Achievements
▪ 4094 total LGA pins
▪ 58mm × 75mm organic substrate
▪ 534 IF high-speed chip-to-chip nets
– Over 256GB/s total in-package bandwidth
▪ 1760 high-speed pins
– Over 450GB/s total off-package bandwidth
More MCM Package Achievements
▪ ~300µF of on-package cap
▪ ~300A current
▪ Up to 200W TDP
[Supply pin map]
– Core supply pins: 180A
– Uncore supply pins: 65A
– 1.2V supply pins: 30A each (two domains)
MCM Core Voltage Variation
▪ Per-core measurements shown
– ±25mV accuracy with max-power workload
▪ Per-core ring oscillators
– Calibrated for temperature and voltage
– Min/max voltage sampled at 470M samples/s
▪ Static differences compensated by per-core LDOs
▪ Dynamic differences mitigated by clock stretcher, DPM states
[Per-core voltage deviation map (mV) across Dies 0–3: measured offsets range from -25.7mV to +25.1mV]
* See Endnotes for additional system configuration details
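The calibrated ring-oscillator sensing above relies on the oscillator's frequency rising with supply voltage, so a calibrated fit maps a cycle count in a fixed window back to volts. A sketch of that idea; the calibration points and coefficients are invented for illustration:

```python
def calibrate(samples):
    """Least-squares linear fit (count -> voltage) from calibration points."""
    n = len(samples)
    sx = sum(c for c, _ in samples)
    sy = sum(v for _, v in samples)
    sxx = sum(c * c for c, _ in samples)
    sxy = sum(c * v for c, v in samples)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return slope, (sy - slope * sx) / n

# (RO count in a fixed window, known supply voltage) -- invented values
cal_points = [(900, 0.80), (1000, 0.90), (1100, 1.00)]
slope, offset = calibrate(cal_points)

def count_to_mv(count):
    """Convert an observed ring-oscillator count to an estimated voltage."""
    return 1000.0 * (slope * count + offset)

print(round(count_to_mv(1050)))   # 950
```

In practice the fit must also be compensated for temperature, as the slide notes, since RO frequency depends on both.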
Core Voltage Measurements
▪ Measured data shows excellent tracking of per-core voltage from the digital LDO with mV-accurate target voltage
▪ Power savings through per-core voltage optimization
* See Endnotes for additional system configuration details
4-Chip EPYC Package
▪ 128 lanes can be used as PCIe
– Attach six 16-lane accelerator cards
to a single socket
▪ 8 DDR4 channels
[Single-socket AMD EPYC™ system diagram: four dies providing 2×64 lanes of high-speed I/O and 2×4 DDR4 channels, attaching a NIC, 16 DIMMs, and 8 drives]
Dual 4-Chip EPYC Packages
Dual Socket AMD EPYCTM System
[Dual-socket AMD EPYC™ system diagram: two 4-die packages linked by IFIS; 128 lanes of high-speed I/O and 2×4 DDR4 channels per socket]
Single Chip AM4 Package
▪ Socket compatible with
other AMD SoCs for
desktop market
▪ 8 cores / 16 threads
▪ 2 DDR4 channels
▪ 24 PCIe Gen3 lanes
▪ Up to 95W TDP
[AMD Ryzen™ system diagram: single die with two CCXs, 24 lanes of high-speed I/O, 2 DDR4 channels]
2-Chip sTR4 Package
▪ Socket defined for “Zeppelin”
SoC and compatible with future
designs
▪ 16 cores / 32 threads
▪ 4 DDR4 channels
▪ 64 PCIe Gen3 lanes
[AMD Ryzen™ Threadripper™ system diagram: two dies, each with 32 lanes of high-speed I/O and 2 DDR4 channels]
Benchmark Results
▪ Scalable performance
from single-chip up to
8-chip 2-socket
configuration
* See Endnotes for additional system configuration details
An SoC for Multi-chip Architectures
[Summary graphic: the same “Zeppelin” die scales across Mainstream Desktop (1 die), High-End Desktop (2 dies plus 2 dummy spacer dies), and Performance Server (4 dies) packages]
Acknowledgment
▪ We would like to thank our talented AMD design teams across Austin, Bangalore, Boston, Fort Collins, Hyderabad, Markham, Santa Clara, and Shanghai, who contributed to “Zen” and “Zeppelin”
▪ Please check out our demo tonight
Endnotes
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this
publication are for identification purposes only and may be trademarks of their respective companies.
Slides 9, 10:
Latencies assume 2.4GHz CPU core frequency and 1R DDR4-2667 19-19-19 RDIMM; Memory, IFIS, IFOP latencies are dependent on DRAM
clock; Memory latencies include testing overhead (including DRAM refresh).
Slides 20, 21:
Power measurements taken from an SP3 Diesel non-DAP AMD evaluation system, with EPYC rev B1 parts, BIOS revision WDL7405N, Windows Server 2016, running a Max Power pattern at 2.5GHz core frequency.
Slide 26:
AMD Ryzen™ 7 1800X CPU scored 211, using estimated scores based on testing performed in AMD Internal Labs as of 30 March 2017. System config: Ryzen™ 7 1800X: AMD Myrtle-SM with 95W R7 1800X, 32GB DDR4-2667 RAM, Crucial CT256M550 SSD, Ubuntu 15.10, GCC v4.6 compiler suite with -O2.
AMD Ryzen™ Threadripper™ 1950X CPU scored 375, using estimated scores based on testing performed in AMD Internal Labs as of 7 September 2017. System config: Ryzen™ Threadripper™ 1950X: AMD Whitehaven-DAP with 180W TR 1950X, 64GB DDR4-2667 RAM, CT256M4SSD disk, Ubuntu 15.10, GCC v4.6 compiler suite with -O2.
AMD EPYC™ 7601 CPU scored 702 in a 1-socket system, using estimated scores based on internal AMD testing as of 6 June 2017. 1 x EPYC™ 7601 CPU in HPE Cloudline CL3150, Ubuntu 16.04, GCC v6.3 compiler suite with -O2, 256GB (8 x 32GB 2Rx4 PC4-2666) memory, 1 x 500GB SSD.
AMD EPYC™ 7601 scored 1390 in a 2-socket system, using estimated scores based on internal AMD testing as of 6 June 2017. 2 x EPYC™ 7601 CPU in Supermicro AS-1123US-TR4, Ubuntu 16.04, GCC v6.3 compiler suite with -O2, 512GB (16 x 32GB 2Rx4 PC4-2666 running at 2400) memory, 1 x 500GB SSD.

More Related Content

PDF
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 
PPTX
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
 
PDF
AMD Zen 5 Architecture Deep Dive from Tech Day
PDF
The Path to "Zen 2"
 
PPTX
AMD Chiplet Architecture for High-Performance Server and Desktop Products
 
PPTX
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
 
PDF
AMD Ryzen CPU Zen Cores Architecture
PPTX
3D V-Cache
 
AMD and the new “Zen” High Performance x86 Core at Hot Chips 28
 
Zen 2: The AMD 7nm Energy-Efficient High-Performance x86-64 Microprocessor Core
 
AMD Zen 5 Architecture Deep Dive from Tech Day
The Path to "Zen 2"
 
AMD Chiplet Architecture for High-Performance Server and Desktop Products
 
“Zen 3”: AMD 2nd Generation 7nm x86-64 Microprocessor Core
 
AMD Ryzen CPU Zen Cores Architecture
3D V-Cache
 

What's hot (20)

PDF
AMD EPYC™ Microprocessor Architecture
 
PPTX
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
PPTX
Heterogeneous Integration with 3D Packaging
 
PDF
Delivering a new level of visual performance in an SoC AMD "Raven Ridge" APU
 
PDF
Delivering the Future of High-Performance Computing
 
PDF
Shared Memory Centric Computing with CXL & OMI
PDF
01 nand flash_reliability_notes
PPTX
Broadcom PCIe & CXL Switches OCP Final.pptx
PPTX
Evaluating UCIe based multi-die SoC to meet timing and power
PDF
Chiplets in Data Centers
PPTX
All Presentations during CXL Forum at Flash Memory Summit 22
PDF
AMD: Where Gaming Begins
 
PPTX
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
PPTX
CXL Consortium Update: Advancing Coherent Connectivity
PPTX
PPTX
Slideshare - PCIe
PDF
DesignCon 2019 112-Gbps Electrical Interfaces: An OIF Update on CEI-112G
PDF
If AMD Adopted OMI in their EPYC Architecture
PDF
Verification Strategy for PCI-Express
PPTX
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
 
AMD EPYC™ Microprocessor Architecture
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
Heterogeneous Integration with 3D Packaging
 
Delivering a new level of visual performance in an SoC AMD "Raven Ridge" APU
 
Delivering the Future of High-Performance Computing
 
Shared Memory Centric Computing with CXL & OMI
01 nand flash_reliability_notes
Broadcom PCIe & CXL Switches OCP Final.pptx
Evaluating UCIe based multi-die SoC to meet timing and power
Chiplets in Data Centers
All Presentations during CXL Forum at Flash Memory Summit 22
AMD: Where Gaming Begins
 
03_03_Implementing_PCIe_ATS_in_ARM-based_SoCs_Final
CXL Consortium Update: Advancing Coherent Connectivity
Slideshare - PCIe
DesignCon 2019 112-Gbps Electrical Interfaces: An OIF Update on CEI-112G
If AMD Adopted OMI in their EPYC Architecture
Verification Strategy for PCI-Express
AMD Radeon™ RX 5700 Series 7nm Energy-Efficient High-Performance GPUs
 
Ad

Similar to ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures (20)

PDF
12 la bel_soc overview
PPT
PDF
Z14_IBM__APL_Presentation_by_Christian_Demmer.pdf
PDF
Z14_IBM__APL_by_Christian_Demmer_IBM.pdf
PDF
dokumen.tips_3d-ics-advances-in-the-industry-ectc-ieee-electronic-thursday-pm...
PDF
io and pad ring.pdf
PDF
CMOS Digital Integrated Circuits - Ch 01_Introduction
PPTX
Software hardware co-design using xilinx zynq soc
DOCX
The end of the line for single-chip processors_.docx
PDF
cxl introduction of intel compute expresser link.pdf
PDF
Much Ado about CPU
PDF
Much Ado About CPU
PDF
The_New_IBM_z15_A-technical_review_of_the_Processor_Design_New_Features_IO_Ca...
PDF
58979380-3d-ics-Seminar-Report-08 (1).pdf
PDF
EW2023-MIPI-Advantages-I3C-End-Equipment-Applications-Chaundry.pdf
PPTX
Seminario utovrm
PDF
Systems on chip (so c)
PDF
SOC Design Challenges and Practices
PDF
Pres
12 la bel_soc overview
Z14_IBM__APL_Presentation_by_Christian_Demmer.pdf
Z14_IBM__APL_by_Christian_Demmer_IBM.pdf
dokumen.tips_3d-ics-advances-in-the-industry-ectc-ieee-electronic-thursday-pm...
io and pad ring.pdf
CMOS Digital Integrated Circuits - Ch 01_Introduction
Software hardware co-design using xilinx zynq soc
The end of the line for single-chip processors_.docx
cxl introduction of intel compute expresser link.pdf
Much Ado about CPU
Much Ado About CPU
The_New_IBM_z15_A-technical_review_of_the_Processor_Design_New_Features_IO_Ca...
58979380-3d-ics-Seminar-Report-08 (1).pdf
EW2023-MIPI-Advantages-I3C-End-Equipment-Applications-Chaundry.pdf
Seminario utovrm
Systems on chip (so c)
SOC Design Challenges and Practices
Pres
Ad

More from AMD (17)

PPTX
AMD EPYC Family World Record Performance Summary Mar 2022
 
PPTX
AMD EPYC Family of Processors World Record
 
PPTX
AMD EPYC Family of Processors World Record
 
PPTX
AMD EPYC World Records
 
PPTX
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
PPTX
AMD EPYC 7002 World Records
 
PPTX
AMD EPYC 7002 World Records
 
PPTX
AMD EPYC 100 World Records and Counting
 
PPTX
AMD EPYC 7002 Launch World Records
 
PDF
7nm "Navi" GPU - A GPU Built For Performance
 
PPTX
AMD Next Horizon
 
PPTX
AMD Next Horizon
 
PDF
AMD Next Horizon
 
PDF
Race to Reality: The Next Billion-People Market Opportunity
 
PDF
GPU Compute in Medical and Print Imaging
 
PPTX
Enabling ARM® Server Technology for the Datacenter
 
PPTX
Lessons From MineCraft: Building the Right SMB Network
 
AMD EPYC Family World Record Performance Summary Mar 2022
 
AMD EPYC Family of Processors World Record
 
AMD EPYC Family of Processors World Record
 
AMD EPYC World Records
 
Hot Chips: AMD Next Gen 7nm Ryzen 4000 APU
 
AMD EPYC 7002 World Records
 
AMD EPYC 7002 World Records
 
AMD EPYC 100 World Records and Counting
 
AMD EPYC 7002 Launch World Records
 
7nm "Navi" GPU - A GPU Built For Performance
 
AMD Next Horizon
 
AMD Next Horizon
 
AMD Next Horizon
 
Race to Reality: The Next Billion-People Market Opportunity
 
GPU Compute in Medical and Print Imaging
 
Enabling ARM® Server Technology for the Datacenter
 
Lessons From MineCraft: Building the Right SMB Network
 

Recently uploaded (20)

PDF
Auditboard EB SOX Playbook 2023 edition.
PPTX
Module 1 Introduction to Web Programming .pptx
PDF
Accessing-Finance-in-Jordan-MENA 2024 2025.pdf
PPTX
MuleSoft-Compete-Deck for midddleware integrations
PDF
The influence of sentiment analysis in enhancing early warning system model f...
PDF
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
PPTX
Custom Battery Pack Design Considerations for Performance and Safety
PDF
sbt 2.0: go big (Scala Days 2025 edition)
PDF
giants, standing on the shoulders of - by Daniel Stenberg
PPTX
Configure Apache Mutual Authentication
PDF
4 layer Arch & Reference Arch of IoT.pdf
PPTX
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
PDF
Comparative analysis of machine learning models for fake news detection in so...
PDF
INTERSPEECH 2025 「Recent Advances and Future Directions in Voice Conversion」
PDF
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
PDF
Data Virtualization in Action: Scaling APIs and Apps with FME
DOCX
search engine optimization ppt fir known well about this
PPTX
Microsoft User Copilot Training Slide Deck
PDF
Advancing precision in air quality forecasting through machine learning integ...
PDF
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...
Auditboard EB SOX Playbook 2023 edition.
Module 1 Introduction to Web Programming .pptx
Accessing-Finance-in-Jordan-MENA 2024 2025.pdf
MuleSoft-Compete-Deck for midddleware integrations
The influence of sentiment analysis in enhancing early warning system model f...
Transform-Quality-Engineering-with-AI-A-60-Day-Blueprint-for-Digital-Success.pdf
Custom Battery Pack Design Considerations for Performance and Safety
sbt 2.0: go big (Scala Days 2025 edition)
giants, standing on the shoulders of - by Daniel Stenberg
Configure Apache Mutual Authentication
4 layer Arch & Reference Arch of IoT.pdf
AI-driven Assurance Across Your End-to-end Network With ThousandEyes
Comparative analysis of machine learning models for fake news detection in so...
INTERSPEECH 2025 「Recent Advances and Future Directions in Voice Conversion」
Transform-Your-Streaming-Platform-with-AI-Driven-Quality-Engineering.pdf
Data Virtualization in Action: Scaling APIs and Apps with FME
search engine optimization ppt fir known well about this
Microsoft User Copilot Training Slide Deck
Advancing precision in air quality forecasting through machine learning integ...
Produktkatalog für HOBO Datenlogger, Wetterstationen, Sensoren, Software und ...

ISSCC 2018: "Zeppelin": an SoC for Multi-chip Architectures

  • 1. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 1 of 29 “Zeppelin”: an SoC for Multi-chip Architectures Noah Beck1, Sean White1, Milam Paraschou2, Samuel Naffziger2 1AMD, Boxborough, 2AMD, Fort Collins Presented at ISSCC 2018
  • 2. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 2 of 29 Outline ▪ Design Goals for the System-on-a-Chip codenamed “Zeppelin” ▪ SoC Architecture ▪ Core Complex codenamed “Zen” ▪ AMD Infinity Fabric (IF) ▪ I/O Capabilities, I/O muxing ▪ Floorplan and Packaging ▪ Results
  • 3. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 3 of 29 “Zeppelin” SoC Goals Design a System-on-a-Chip Solution for scalability across the Server market ▪ 4-die multi-chip module (MCM) for Server in new infrastructure ▪ Same SoC suitable for High-End Desktop – 1-die Desktop in existing AM4 infrastructure – 2-die MCM High-End Desktop in new infrastructure
  • 4. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 4 of 29 “Zeppelin” Die Functional Overview ▪ Compute – 8 “Zen” x86 cores – 4MB total L2 cache – 16 MB total L3 cache ▪ Memory – 2 channel DDR4 with ECC – 2 DIMMs/channel and up to 256GB/channel ▪ Integrated I/O – Coherent and control Infinity Fabric links – 32 lanes high-speed SERDES – 4 USB3.1 Gen1 ports – Server Controller Hub (SPI, LPC, UART, I2C, RTC, SMBus) IFIS/PCIe® IFOP IFOP Zen Zen Zen Zen L3 Zen Zen Zen Zen L3 IFOP IFIS/PCIe/SATAIFOP DDR DDR
  • 5. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 5 of 29 Chip Architecture Infinity Fabric Scalable Data Fabric plane IFIS/PCIe® IFIS/PCIe/SATA IFOP IF SCF SMU CAKE CAKE CAKE PCIe Southbridge IO Complex SATA PCIe CCM CCX 4 cores + L3 CAKE CAKE IFOP CCX 4 cores + L3 CCM IFOP DDRDDRIFOP IOMS CAKE UMC UMC
  • 6. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 6 of 29 CCX: CPU Complex ▪ 4 cores with L1/L2 caches, plus shared L3 cache ▪ “Zen” core described in [Singh ISSCC17] – L1 Instruction Cache 64KB, 4-way associative – L1 Data Cache 32KB, 8-way associative – L2 Cache 512KB, 8-way associative – 2 threads per core ▪ L3 cache 8MB, 16-way associative, shared by all four cores CORE 3 CORE L3M 1MB L 3 C T L L 2 C T L L2M 512K L3M 1MB
  • 7. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 7 of 29 ▪ Fast private L2 cache, 12 cycles ▪ Fast shared L3 cache, 35 cycles ▪ L3 filled from L2 victims of all four cores ▪ L2 tags duplicated in L3 for probe filtering and fast cache transfer ▪ Multiple smart prefetchers ▪ 50 outstanding misses from L2 to L3 per core ▪ 96 outstanding misses from L3 to memory “Zen” Cache hierarchy 32B fetch 32B/ cycle CORE 0 32B/ cycle 2*16B load 8M L3 I+D Cache 16-way 32K D-Cache 8-way 64K I-Cache 4-way 512K L2 I+D Cache 8-way 1*16B store 32B/ cycle 32B/ cycle
  • 8. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 8 of 29 AMD Infinity Fabric: Scalable Data Fabric SDF Transport Layer CAKE IFIS or IFOP to off-chip IOMS IO Complex CCM CCX CCM CCX UMC DDR4 UMC DDR4 I/O Master/Slave Unified Memory Controller Coherent AMD SocKet Extender IF Inter- Socket SerDes IF On-Package SerDes Cache-Coherent Master Core Complex
  • 9. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 9 of 29 SDF Local Memory Access SDF Transport Layer CAKE IFIS or IFOP to off-chip IOMS IO Complex CCM CCX CCM CCX UMC DDR4 UMC DDR4 Latency to local memory: ~90ns * See Endnotes for additional system configuration details
  • 10. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 10 of 29 SDF Die-to-Die Memory Accesses SDF Transport Layer CAKE IFOP CCM CCX CCM CCX UMC DDR4 Latency to other memory within socket: ~145ns CAKEIFIS CAKE SDF Transport Layer Latency to memory attached to other socket (single hop): ~200ns IFOP * See Endnotes for additional system configuration details UMCDDR4 CAKE SDF Transport Layer IFIS Other socket die Same package die
  • 11. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 11 of 29 ▪ Low-swing, single-ended data for ~50% of power of an equivalent differential driver ▪ Zero power driver state during logic 0 transmit – Transmit/receive impedance termination to ground while driver pullup is disabled – Also applied during link idle ▪ Data bit inversion encoding saving 10% average power per bit 2pJ/bit IFOP SerDes
  • 12. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 12 of 29 Hierarchical Power Management ▪ System Management Unit (SMU) uses IF Scalable Control Fabric (SCF) plane ▪ SCF: single-lane IFIS SerDes link for chip-to-chip or socket-to-socket ▪ SMU calculation hierarchy for voltage level control, C-State Boost, thermal management, electrical design current management – Local chip SMU fast loop – Master chip SMU slower loop Die 2 Die 1 Die 3 Die 0 SMU SMU SMU Master SMU To other socket
  • 13. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 13 of 29 IO Subsystem & Muxing ▪ 32 lanes multi-protocol I/O – PCIe, IFIS: two 16-lane links – PCIe link bifurcation: max 8 devices per 16-lane link – SATA: 8 lanes of bottom link ▪ Supports multiple market segments ▪ Muxing support adds <1 channel clock latency to IFIS 16-lane link x16 x16 x8 x8 x4 x4 x4 x4 x2 x2 x2 x2 x2 x2 x2 x2 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x2 x2 x2 x2 x2 x2 x2 x2 x4 x4 x4 x4 16-lane link x8 x8 x16 x16 IFIS PCIe SATA x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 x1 I/O CCX CCX DDR I/O
  • 14. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 14 of 29 Chip Floorplanning for Package ▪ DDR placement on one die edge ▪ Chips in 4-die MCM rotated 180°, DDR facing package left/right edges ▪ Package-top Infinity Fabric pinout requires diagonal placement of IFIS ▪ 4th IFOP enables routing of high- speed I/O in only four package substrate layers I/O CCX CCX DDR I/O
  • 15. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 15 of 29 DDR+IFOP Package Routing ▪ Vertical and Horizontal IFOP: 2 layers each ▪ Diagonal IFOP: 1 layer each ▪ DDR channel: 1 layer each Layer A Layer B I/ODDR Die2 CCX CCX I/O DDR Die1 CCX CCX I/O I/ODDR Die3 CCX CCX I/O DDR Die0 CCX CCX I/O I/OI/O
  • 16. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 16 of 29 DDR+IFIS Package Routing ▪ DDR channel: 1 layer each ▪ IFIS links: 2 layers each Layer C Layer D I/ODDR Die2 CCX CCX I/O DDR Die1 CCX CCX I/O I/ODDR Die3 CCX CCX I/O DDR Die0 CCX CCX I/O I/OI/O
  • 17. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 17 of 29 MCM Versus Single-Chip Design ▪ 4-die MCM package: 852mm2 of silicon (4 * 213mm2) ▪ Large single-chip design: – ~10% area savings: 777mm2 (near reticle size limit) – Manufacturing/test cost: ~40% higher – Full 32-core yield: ~17% lower – Full 32-core cost: ~70% higher ▪ High-yielding multi-chip assembly process – Achievable based on internal production data – Die frequency matching using on-die frequency sensors
  • 18. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 18 of 29 MCM Package Achievements ▪ 4094 total LGA pins ▪ 58mm x 75mm organic substrate ▪ 534 IF high-speed chip-to-chip nets – Over 256GB/s total in-package bandwidth ▪ 1760 high-speed pins – Over 450GB/s total off-package bandwidth
  • 19. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 19 of 29 More MCM Package Achievements ▪ ~300µF of on-package cap ▪ ~300A current ▪ Up to 200W TDP Core supply pins, 180A Uncore supply pins, 65A 1.2Vsupplypins,30A 1.2Vsupplypins,30A
  • 20. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 20 of 29 MCM Core Voltage Variation ▪ Per-core measurements shown – +/-25mV accuracy with max power workload ▪ Per-core ring oscillators – Calibrated for temperature and voltage – Min/max voltage sampled 470M/s ▪ Static differences compensated by per-core LDOs ▪ Dynamic differences mitigated by clock stretcher, DPM states I/ODDR Die 2 I/O DDRDDRDDR +10.8 +10.8 +7.3 +14.4 +8.9 +12.5 -5.0 -20.9 I/ODDR Die 1 I/O I/O I/O DDRDDRDDR +10.5 +10.9 +10.9 -6.5+25.1 +14.1 +7.4 +14.3 I/ODDR Die 3 I/O I/O I/O DDRDDR -9.5 -3.0 -3.0 +3.8 -6.4 -13.1 -16.4 -25.7 I/ODDR Die 0 I/O I/O I/O DDR -7.3 +6.5 -6.6 -20.1-7.3 -7.3 -9.7 -0.2 * See Endnotes for additional system configuration details
  • 21. 2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 21 of 29 Core Voltage Measurements ▪ Measured data shows excellent tracking of per- core voltage from the digital LDO with mV- accurate target voltage ▪ Power savings through per-core voltage optimization * See Endnotes for additional system configuration details
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 22 of 29
4-Chip EPYC Package
▪ 128 lanes can be used as PCIe
– Attach six 16-lane accelerator cards to a single socket
▪ 8 DDR4 channels
[Figure: single-socket AMD EPYC™ system — four dies (Die0–Die3), each with two CCXs; 64 lanes of high-speed I/O and 4 DDR4 channels per side; attached NIC, 16 DIMMs, and 8 drives]
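A quick lane-budget check of the topology on this slide: six x16 accelerator cards fit within the 128 PCIe-capable lanes, leaving lanes for the NIC and drive attach shown in the figure.

```python
TOTAL_LANES = 128        # PCIe-capable lanes per single socket
cards = 6 * 16           # six 16-lane accelerator cards
remaining = TOTAL_LANES - cards
print(f"{remaining} lanes left for NIC/storage attach")
```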
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 23 of 29
Dual 4-Chip EPYC Packages
[Figure: dual-socket AMD EPYC™ system — two 4-die packages linked die-to-die across sockets; each socket provides 8 DDR4 channels (4 per side), with 128 lanes of high-speed I/O available to the system]
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 24 of 29
Single Chip AM4 Package
▪ Socket compatible with other AMD SoCs for desktop market
▪ 8 cores / 16 threads
▪ 2 DDR4 channels
▪ 24 PCIe Gen3 lanes
▪ Up to 95W TDP
[Figure: AMD Ryzen™ system — one die with two CCXs, 24 lanes of high-speed I/O, 2 DDR4 channels]
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 25 of 29
2-Chip sTR4 Package
▪ Socket defined for “Zeppelin” SoC and compatible with future designs
▪ 16 cores / 32 threads
▪ 4 DDR4 channels
▪ 64 PCIe Gen3 lanes
[Figure: AMD Ryzen™ Threadripper™ system — two dies, each with two CCXs, 32 lanes of high-speed I/O and 2 DDR4 channels per die]
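The three product packages above are all built from the same “Zeppelin” die, so cores and memory channels scale directly with die count; only the PCIe lane count is additionally constrained by the socket (the existing AM4 socket exposes 24 of the die's lanes). A small sketch of that scaling, using the per-die resources from the die overview:

```python
CORES_PER_DIE = 8   # "Zen" cores per "Zeppelin" die
DDR_PER_DIE = 2     # DDR4 channels per die

packages = {
    # name: (die count, usable PCIe Gen3 lanes per the slides)
    "AM4 (Ryzen)": (1, 24),
    "sTR4 (Threadripper)": (2, 64),
    "SP3 (EPYC, 1-socket)": (4, 128),
}

for name, (dies, lanes) in packages.items():
    print(f"{name}: {dies * CORES_PER_DIE} cores, "
          f"{dies * DDR_PER_DIE} DDR4 channels, {lanes} PCIe lanes")
```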
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 26 of 29
Benchmark Results
▪ Scalable performance from single-chip up to 8-chip 2-socket configuration
* See Endnotes for additional system configuration details
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 27 of 29
An SoC for Multi-chip Architectures
▪ Mainstream Desktop
▪ High-End Desktop
▪ Performance Server
[Figure: the “Zeppelin” die (two 4-core “Zen” CCXs with L3, DDR, IFOP, and IFIS/PCIe interfaces) scaled across the three product configurations; the 2-die High-End Desktop package includes two dummy spacer dies]
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 28 of 29
Acknowledgment
▪ We would like to thank our talented AMD design teams across Austin, Bangalore, Boston, Fort Collins, Hyderabad, Markham, Santa Clara, and Shanghai, who contributed to “Zen” and “Zeppelin”
▪ Please check out our demo tonight
2.4: “Zeppelin”: an SoC for Multi-chip Architectures© 2018 IEEE International Solid-State Circuits Conference 29 of 29
Endnotes
AMD, the AMD Arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Slides 9, 10: Latencies assume 2.4GHz CPU core frequency and 1R DDR4-2667 19-19-19 RDIMM; Memory, IFIS, and IFOP latencies are dependent on DRAM clock; memory latencies include testing overhead (including DRAM refresh).
Slides 20, 21: Power measurements taken from an SP3 Diesel non-DAP AMD evaluation system with EPYC rev B1 parts, BIOS revision WDL7405N, Windows Server 2016, running a Max Power pattern at 2.5GHz core frequency.
Slide 26: AMD Ryzen™ 7 1800X CPU scored 211, using estimated scores based on testing performed in AMD Internal Labs as of 30 March 2017. System config: AMD Myrtle-SM with 95W R7 1800X, 32GB DDR4-2667 RAM, Crucial CT256M550 SSD, Ubuntu 15.10, GCC -O2 v4.6 compiler suite.
AMD Ryzen™ Threadripper™ 1950X CPU scored 375, using estimated scores based on testing performed in AMD Internal Labs as of 7 September 2017. System config: AMD Whitehaven-DAP with 180W TR 1950X, 64GB DDR4-2667 RAM, CT256M4 SSD disk, Ubuntu 15.10, GCC -O2 v4.6 compiler suite.
AMD EPYC™ 7601 CPU scored 702 in a 1-socket system, using estimated scores based on internal AMD testing as of 6 June 2017: 1 x EPYC™ 7601 CPU in HPE Cloudline CL3150, Ubuntu 16.04, GCC -O2 v6.3 compiler suite, 256GB (8 x 32GB 2Rx4 PC4-2666) memory, 1 x 500GB SSD.
AMD EPYC™ 7601 scored 1390 in a 2-socket system, using estimated scores based on internal AMD testing as of 6 June 2017: 2 x EPYC™ 7601 CPU in Supermicro AS-1123US-TR4, Ubuntu 16.04, GCC -O2 v6.3 compiler suite, 512GB (16 x 32GB 2Rx4 PC4-2666 running at 2400) memory, 1 x 500GB SSD.