Rich Graham
February 2016, HPCAC Stanford Conference
Interconnect Your Future
© 2015 Mellanox Technologies 2
The Ever Growing Demand for Higher Performance
[Figure: performance development timeline, 2000 / 2005 / 2010 / 2015 / 2020]
Terascale, Petascale (“Roadrunner”, 1st), Exascale
SMP to Clusters, Single-Core to Many-Core, Co-Design
Co-Design: Hardware (HW), Software (SW), Application (APP)
The Interconnect is the Enabling Technology
Co-Design Architecture to Enable Exascale Performance
CPU-Centric: Limited to Main CPU Usage, Results in Performance Limitation
Co-Design: Creating Synergies, Enables Higher Performance and Scale
[Figure labels: Software; In-CPU Computing, In-Network Computing, In-Storage Computing]
The Intelligence is Moving to the Interconnect
CPU
Interconnect
Past Future
Breaking the Application Latency Wall
§ Today: Network device latencies are on the order of 100 nanoseconds
§ Challenge: Enabling the next order of magnitude improvement in application performance
§ Solution: Creating synergies between software and hardware – intelligent interconnect
Intelligent Interconnect Paves the Road to Exascale Performance
Era            Network              Communication Framework
10 years ago   ~10 microseconds     ~100 microseconds
Today          ~0.1 microsecond     ~10 microseconds
Future         ~0.05 microsecond    ~1 microsecond (Co-Design)
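The latency figures on this slide can be combined into a rough end-to-end budget. The sketch below assumes, as a simplification not stated on the slide, that application-visible latency is the sum of network and communication-framework latency; the names `ERAS`, `end_to_end_us`, and `framework_share` are illustrative only.

```python
# Approximate latencies in microseconds, as read from the slide.
ERAS = {
    "10 years ago": {"network": 10.0, "framework": 100.0},
    "today": {"network": 0.1, "framework": 10.0},
    "future": {"network": 0.05, "framework": 1.0},
}


def end_to_end_us(era):
    """Assumed model: total latency = network latency + framework latency."""
    parts = ERAS[era]
    return parts["network"] + parts["framework"]


def framework_share(era):
    """Fraction of the assumed total that is spent in the communication framework."""
    return ERAS[era]["framework"] / end_to_end_us(era)


for era in ERAS:
    print(f"{era:>12}: {end_to_end_us(era):7.2f} us total, "
          f"{framework_share(era):.1%} in the communication framework")
```

Under this assumed model the communication framework accounts for most of the total in every era, which is consistent with the slide's argument that the next gains have to come from software and hardware co-design rather than from faster network devices alone.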
Co-Design: Offloaded Technologies Target Application Characteristics
Programmability
RDMA GPUDirect Virtualization
Backward and Future Compatibility
Direct Communication
Applications (Innovations, Scalability, Performance)
Software-Defined
Network (SDN)
Co-Design Requires Intelligent Interconnect
Offloaded Technologies: Intelligent Interconnect
The Road to Exascale – Co-Design System Architecture
[Figure: Co-Design across CPU, GPU, FPGA, HCA, and Switch]
In-CPU Computing, In-GPU Computing, In-FPGA Computing, In-Network Computing
Introducing Switch-IB 2 World’s First Smart Switch
§ The world's fastest switch, with <90 nanosecond latency
§ 36 ports, 100 Gb/s per port, 7.2 Tb/s throughput, 7.02 billion messages/sec
§ Adaptive Routing, Congestion control, support for multiple topologies
World’s First Smart Switch
Built for Scalable Compute and Storage Infrastructures
10X Higher Performance with The New Switch SHArP Technology
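The headline switch numbers can be cross-checked with simple arithmetic. The sketch below assumes that the 7.2 Tb/s aggregate counts both directions of every port (the slide does not say so explicitly) and that the message rate is spread evenly across ports; the variable names are illustrative.

```python
PORTS = 36
PORT_RATE_GBPS = 100        # per port, per direction (from the slide)
DIRECTIONS = 2              # assumption: aggregate throughput counts send and receive
MESSAGES_PER_SEC = 7.02e9   # from the slide

# Aggregate switching capacity in Tb/s under the bidirectional assumption.
aggregate_tbps = PORTS * PORT_RATE_GBPS * DIRECTIONS / 1000

# Average message rate per port, assuming an even spread over all ports.
per_port_msg_rate = MESSAGES_PER_SEC / PORTS

print(f"aggregate throughput: {aggregate_tbps} Tb/s")
print(f"per-port message rate: {per_port_msg_rate / 1e6:.0f} million messages/sec")
```

With these assumptions the product of ports, port rate, and directions reproduces the 7.2 Tb/s figure quoted on the slide.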
SHArP (Scalable Hierarchical Aggregation Protocol) Technology
Delivering 10X Performance Improvement
for MPI and SHMEM/PGAS Applications
Switch-IB 2 Enables the Switch Network to
Operate as a Co-Processor
SHArP Enables Switch-IB 2 to Manage and
Execute MPI Operations in the Network
Scalable Hierarchical Aggregation Protocol
§ Reliable Scalable General Purpose Primitive, Applicable to Multiple Use-cases
•  In-network Tree based aggregation mechanism
•  Large number of groups
•  Multiple simultaneous outstanding operations
Accelerating HPC applications
§ Scalable High Performance Collective Offload
•  Barrier, Reduce, All-Reduce, Broadcast
•  Sum, Min, Max, MinLoc, MaxLoc, OR, XOR, AND
•  Integer and Floating-Point, 32 / 64 bit
§ Significantly reduce MPI collective runtime
§ Increase CPU availability and efficiency
§ Enable communication and computation overlap
Accelerating MapReduce Applications
§ Prevent the Incast Traffic Pattern
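The in-network, tree-based aggregation described on this slide can be pictured with a small host-side simulation. This is only a sketch under assumptions: the function name `tree_allreduce`, the fixed fan-in `radix`, and the list-of-values model are hypothetical and are not the SHArP protocol or any Mellanox API.

```python
import operator
from functools import reduce


def tree_allreduce(values, radix, op=operator.add):
    """Reduce `values` up a tree with fan-in `radix`, then hand the result to every rank.

    Returns (per-rank results, number of aggregation levels).
    `op` must be associative, e.g. sum, min, max, or bitwise AND/OR/XOR.
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    level = list(values)
    if not level:
        raise ValueError("need at least one value")
    levels = 0
    while len(level) > 1:
        # Each hypothetical aggregation node combines up to `radix` child contributions.
        level = [reduce(op, level[i:i + radix]) for i in range(0, len(level), radix)]
        levels += 1
    # Result distribution: every rank ends up with the same reduced value.
    return [level[0]] * len(values), levels


results, depth = tree_allreduce([1] * 64, radix=4)
print(results[0], depth)
```

The number of levels grows with the logarithm of the number of contributors, which is the property that makes a hierarchical scheme attractive at scale. For floating-point sums the grouping changes the order of additions, so results can differ in the last bits from a purely sequential sum.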
SHArP Performance Advantage – MiniFE Details
§  MiniFE is a Finite Element mini-application
•  Implements kernels that represent implicit finite-element applications
10X to 25X Performance Improvement
Allreduce MPI Collective
Number of Nodes   CPU-Based Latency (usec)   SHArP Latency (usec)   Ratio
32                41.7                       4.24                   9.9
64                49.08                      4.63                   10.6
128               57.67                      4.76                   12.1
256               67.76                      4.87                   13.9
512               79.62                      5.09                   15.6
1024              93.55                      5.58                   16.8
2048              109.92                     5.63                   19.5
4096              129.16                     5.73                   22.5
8192              151.76                     5.94                   25.5
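The Ratio column can be recomputed from the two latency columns. The sketch below simply divides CPU-based latency by SHArP latency for each row; the names `ROWS` and `speedup` are illustrative, and the data are copied from the slide.

```python
# (nodes, CPU-based latency in usec, SHArP latency in usec, ratio as printed on the slide)
ROWS = [
    (32, 41.7, 4.24, 9.9),
    (64, 49.08, 4.63, 10.6),
    (128, 57.67, 4.76, 12.1),
    (256, 67.76, 4.87, 13.9),
    (512, 79.62, 5.09, 15.6),
    (1024, 93.55, 5.58, 16.8),
    (2048, 109.92, 5.63, 19.5),
    (4096, 129.16, 5.73, 22.5),
    (8192, 151.76, 5.94, 25.5),
]


def speedup(cpu_us, sharp_us):
    """Latency ratio: how many times lower the SHArP latency is."""
    return cpu_us / sharp_us


for nodes, cpu_us, sharp_us, slide_ratio in ROWS:
    print(f"{nodes:>5} nodes: recomputed {speedup(cpu_us, sharp_us):5.2f}, slide {slide_ratio}")
```

The recomputed values agree with the printed column to within about 0.1, with small differences that are presumably due to rounding of the displayed latencies. The ratio rises with node count because the CPU-based latency grows at every doubling while the SHArP latency stays between roughly 4 and 6 usec.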
SHArP Performance – First Results (Partial Implementation)
3.5X Performance Improvement on 64 Nodes
The Intelligence is Moving to the Interconnect
Communication Frameworks (MPI, SHMEM/PGAS)
The Only Approach to Deliver 10X Performance Improvements
Applications Transport
RDMA
SR-IOV
Collectives
Peer-Direct
GPUDirect
More…
MPI / SHMEM Offloads
Q1’16
Q3’16
Introducing ConnectX-4 Lx Programmable Adapter
Scalable, Efficient, High-Performance and Flexible Solution
Security
Cloud/Virtualization
Storage
High Performance Computing
Precision Time Synchronization
Networking + FPGA
Mellanox Acceleration Engines
and FPGA Programmability
On One Adapter
InfiniBand Router – In Progress
§ Isolation between InfiniBand subnets
§ Simple connectivity between different topologies
•  Enable sharing a common storage network by multiple disconnected subnets
§ Support 2^128 nodes (unlimited system size)
SB7780
InfiniBand Router Details
§ Router implements GID to LID mapping
§ SM allocates Alias GID to HCA
§ Address resolution
•  IP based applications
-  Name to IP (standard), IP to GID using new API
•  Pure IB applications
-  Upon LID assignment change, GID DNS is updated
[Figure labels: three IB subnets, each with HCA, GID DNA Agent, SM, SRPM, SRTM; router with RTM and three RPAs; GID DNS; RMA 1]
RTM: Routing Table Manager
SRTM: Subnet Routing Table Manager
RPA: Router Port Agent
SRPM: Subnet Router Port Manager
GID DNS: IP to GID resolution
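The GID to LID mapping mentioned on this slide can be illustrated with a small lookup table. The sketch relies on two general InfiniBand facts: a GID is 128 bits (a 64-bit subnet prefix followed by a 64-bit interface identifier, written in IPv6 notation) and a LID is 16 bits. Everything else is an assumption made for illustration: the class `GidToLidTable`, its methods, and the example GID are hypothetical and do not describe the SB7780 implementation.

```python
import ipaddress

GID_BITS = 128  # size of the GID address space behind the slide's 2^128 figure


def split_gid(gid_text):
    """Split a GID in IPv6 notation into (64-bit subnet prefix, 64-bit interface ID)."""
    value = int(ipaddress.IPv6Address(gid_text))
    return value >> 64, value & ((1 << 64) - 1)


class GidToLidTable:
    """Hypothetical router-side table mapping a full GID to a subnet-local LID."""

    def __init__(self):
        self._lid_by_gid = {}

    def update(self, gid_text, lid):
        # Called when a LID is assigned or reassigned; a later call overwrites the entry.
        if not 0 < lid <= 0xFFFF:
            raise ValueError("LID must fit in 16 bits and be non-zero")
        self._lid_by_gid[split_gid(gid_text)] = lid

    def resolve(self, gid_text):
        """Return (subnet prefix, LID) for a known GID, or raise KeyError."""
        key = split_gid(gid_text)
        return key[0], self._lid_by_gid[key]


table = GidToLidTable()
table.update("fec0:0:0:1::1", 0x12)   # made-up GID and LID
prefix, lid = table.resolve("fec0:0:0:1::1")
print(hex(prefix), hex(lid))
```

In this toy model the subnet prefix would select the destination subnet and the LID would select the port inside it; how the real router and the GID DNS keep such state in sync is not specified on the slides.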
Multi-Host Socket Direct – Low Latency Socket Communication
§ Each CPU has direct network access
§ QPI avoidance for I/O improves performance
§ Enables GPU / peer direct on both sockets
§ Solution is transparent to software
[Figure labels: CPU, CPU, CPU, CPU; QPI]
Multi-Host Socket Direct Performance
50% Lower CPU Utilization
20% Lower Latency
Multi Host Evaluation Kit
Lower Application Latency, Free-up CPU
Mellanox InfiniBand Leadership Over Future Competition
Switch Latency: 20% Lower
Message Rate: 44% Higher
Power Consumption Per Switch Port: 25% Lower
Scalability / CPU efficiency: 2X Higher
Link Speed: 100 Gb/s (2014), 200 Gb/s (2017)
Gain Competitive Advantage Today
Protect Your Future
Smart Network For Smart Systems
RDMA, Acceleration Engines, Programmability
Higher Performance
Unlimited Scalability
Higher Resiliency
Proven!
Technology Roadmap – One-Generation Lead over the Competition
[Figure: roadmap timeline, 2000 / 2005 / 2010 / 2015 / 2020; Terascale, Petascale, Exascale]
Link speeds: 20G, 40G, 56G, 100G, 200G; Mellanox 400G
Mellanox Connected: 3rd on TOP500 2003, Virginia Tech (Apple); 1st, “Roadrunner”
Thank You
