Panasas High Performance Storage
Powers the First Petaflop Supercomputer
at Los Alamos National Laboratory
June 2010
Customer Success Story
Los Alamos National Laboratory
Highlights
First Petaflop Supercomputer
• #1 on the Top-500 list in 2009
• Over 3,250 Compute Nodes
• Over 156 I/O Nodes
• Over 12,000 Processor Cores
• Nearly 13,000 Cell Processors
Panasas High Performance Storage Solutions
• 100 Panasas Storage Shelves
• 2 Petabytes Capacity
• 55 GB/s Throughput
• Throughput Scales Linearly with Capacity
• Non-Stop Availability & Simple to Deploy
Abstract
Scientists want faster, more powerful high-performance supercomputers to simulate complex physical, biological, and socioeconomic systems with greater realism and predictive power. In May 2008, Los Alamos scientists doubled the processing speed of the previously fastest computer. Roadrunner, a new hybrid supercomputer, uses specialized Cell coprocessors to propel performance to petaflop speed, more than a thousand trillion calculations per second.

One of the keys to the project's success was a highly reliable storage subsystem that could provide massively parallel I/O throughput, scale linearly, and remain simple to deploy and maintain. Los Alamos National Laboratory deployed Panasas high performance storage to meet the stringent needs of the Roadrunner project. Built from commodity parts for excellent price/performance, Panasas storage scales capacity and throughput symmetrically with processing, caching, and network bandwidth.
Introduction
From its origins as a secret Manhattan Project laboratory, Los Alamos National Laboratory (LANL) has attracted world-class scientists to solve the nation's most challenging problems. As one of the U.S. Department of Energy's multi-program, multi-disciplinary research laboratories, Los Alamos thrives on having the best people doing the best science to solve problems of global importance. The Laboratory's central focus is to foster science, encourage innovation, and recruit and retain top talent while balancing short-term and long-term needs with research driven by principal investigators and aligned with missions.
In 2002, when Los Alamos scientists were planning
for their next-generation supercomputer, they looked
at the commodity market for a way to make an end run
around the speed and memory barriers looming in the
future. What they found was a joint project by Sony
Computer Entertainment, Toshiba, and IBM to develop
a specialized microprocessor that could revolutionize
computer games and consumer electronics, as well as
scientific computing.
The major application areas addressed were radiation
transport (how radiation deposits energy in and moves
through matter), neutron transport (how neutrons move
through matter), molecular dynamics (how matter
responds at the molecular level to shock waves and
other extreme conditions), fluid turbulence, and the
behavior of plasmas (ionized gases) in relation to fusion
experiments at the National Ignition Facility at Lawrence
Livermore National Laboratory.
Roadrunner Architecture
Roadrunner is a cluster of approximately 3,250 compute
nodes interconnected by an off-the-shelf parallel-
computing network. Each compute node consists of two
AMD Opteron dual-core microprocessors, with each
of the Opteron cores internally attached to one of four
enhanced Cell microprocessors. This enhanced Cell
does double-precision arithmetic faster and can access
more memory than can the original Cell in a PlayStation
3. The entire machine has almost 13,000 Cells and half as many dual-core Opterons.
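These figures are mutually consistent; a quick back-of-the-envelope tally (a Python sketch based only on the approximate numbers quoted above, not on LANL's exact configuration) shows how they relate:

    # Back-of-envelope tally of the processor counts quoted in this document.
    # Figures are approximate and taken from the text above, not from LANL data.
    compute_nodes = 3250                      # "approximately 3,250 compute nodes"
    opterons_per_node = 2                     # two dual-core AMD Opterons per node
    opteron_cores_per_node = opterons_per_node * 2
    cells_per_node = opteron_cores_per_node   # one enhanced Cell per Opteron core

    opterons = compute_nodes * opterons_per_node
    opteron_cores = compute_nodes * opteron_cores_per_node
    cells = compute_nodes * cells_per_node

    print(f"Dual-core Opterons: ~{opterons:,}")       # ~6,500 (half as many as Cells)
    print(f"Opteron cores:      ~{opteron_cores:,}")  # ~13,000
    print(f"Cell processors:    ~{cells:,}")          # ~13,000 ("almost 13,000 Cells")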
Unique I/O Challenges
LANL's breakthrough architecture was designed to achieve high performance, and the cluster would run a very large number of jobs. To make the best use of the cluster, several I/O challenges and considerations had to be addressed, including:
• Performance required to serve and write data fast enough to keep the cluster busy
• Parallel I/O for optimized performance for each
node
• Scalability needed to support a large number of
cluster nodes
• Reliability needed to keep the cluster running
• A reasonable cost, both in terms of acquisition and
management
• A storage architecture that could support future
generations of clusters
Storage Requirements
LANL wanted a shared storage architecture in which all of its compute clusters on a network could access a common pool of Panasas storage. This is in contrast to many other HPC sites that bind a storage cluster tightly to a compute cluster. LANL had been using Panasas for all of its high performance storage needs for several years, so when it deployed Roadrunner, Panasas storage was the logical choice to satisfy its demanding performance, availability, and scalability requirements.
In order to achieve the performance goals of Roadrunner, the network storage system would have to deliver high bandwidth, low latency, a high file-creation rate, and high aggregate throughput. Parallel I/O would be important, as it would enable parallel data streams to the 156 I/O nodes, which in turn provide I/O service to the compute nodes. In addition, the storage system would have to scale to support storage capacities of 10 petabytes (PB) and beyond, plus a growing number of nodes in the cluster. This eliminated NFS-based storage systems, which would not scale past a certain number of nodes.
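To see why parallel I/O matters at this scale, a rough calculation using figures quoted elsewhere in this document (55 GB/s of aggregate throughput, 156 I/O nodes, 100 storage shelves) shows how the load is spread; the per-node and per-shelf rates below are derived averages, not measured values:

    # Rough load-spreading arithmetic using figures quoted in this document.
    aggregate_gb_s = 55        # delivered aggregate throughput
    io_nodes = 156             # Roadrunner I/O nodes
    storage_shelves = 100      # Panasas storage shelves

    print(f"Per I/O node: ~{aggregate_gb_s / io_nodes:.2f} GB/s")         # ~0.35 GB/s
    print(f"Per shelf:    ~{aggregate_gb_s / storage_shelves:.2f} GB/s")  # ~0.55 GB/s

    # No single stream, NAS head, or network link has to carry 55 GB/s;
    # the aggregate is the sum of many modest, independent parallel paths.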
Availability was another key consideration. Because Roadrunner is such a high-demand resource, downtime for maintenance, expansion, and repair is extremely scarce, so the storage system would need to support automatic provisioning for easy growth without taking the system offline.
LANL designed the cluster architecture to be simple,
easy to manage, and cost effective. One aspect of
simplicity and cost was to use I/O nodes that interface
with network attached storage, lowering cost by
reducing the number of GbE connections from a few
thousand to a few hundred. The storage system used,
likewise, would have to be easy to manage, provide
a reasonable total cost of ownership, and fit into the
established architecture.
Last but not least, the storage system architecture needed to have “headroom” to provide I/O to larger future cluster configurations. Instead of supporting just a single large cluster, the storage architecture needs to be able to scale to support multiple clusters from a single, central storage pool.
Deployment of Storage from Panasas
Panasas ActiveStor™ high performance storage met
all the criteria dictated by LANL’s computer cluster
architecture. Panasas utilizes a parallel file system and provides GbE or InfiniBand connectivity between the cluster's I/O nodes and storage. In fact, the Panasas storage system is itself a cluster. It uses an object-based file system that is an integral part of the PanFS™ Storage Operating System.
PanFS divides files into large virtual data objects. These objects are distributed across Panasas storage blades (the system's units of storage), enabling dynamic distribution of data activity throughout the storage system.
Parallel data paths between compute clusters and
the storage blade modules result in high performance
data access to large files. The result is that Panasas
Scale-out NAS delivers performance that scales almost
linearly with capacity. In fact, the current implementation
supports more than 2 PB capacity while delivering a
massive 55 GB per second throughput.
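The reason throughput tracks capacity follows from the object layout: spreading a file's objects across many blades means that adding blades adds both storage and independent data paths. The sketch below is a simplified round-robin illustration of that idea, not the actual PanFS placement algorithm, and the per-blade rate is a hypothetical figure chosen only to show the trend:

    # Toy illustration of striping a file's objects across storage blades.
    # NOT the real PanFS layout algorithm; it only shows why throughput can
    # grow roughly in step with capacity as blades are added.
    def place_objects(num_objects, num_blades):
        """Assign object i to blade i % num_blades (simple round-robin)."""
        layout = {blade: [] for blade in range(num_blades)}
        for obj in range(num_objects):
            layout[obj % num_blades].append(obj)
        return layout

    per_blade_gb_s = 0.05   # hypothetical per-blade streaming rate, illustration only
    for blades in (10, 100, 1000):
        layout = place_objects(num_objects=10_000, num_blades=blades)
        print(f"{blades:5d} blades -> ~{blades * per_blade_gb_s:5.1f} GB/s aggregate, "
              f"{len(layout[0])} objects on blade 0")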
Parallel access is made possible by empowering each
of the LANL cluster I/O nodes with a small installable
file system from Panasas—the DirectFlow®
client access
software. This enables direct communication and data
transfer between the I/O nodes and the storage blades.
A simple three-step process is required to initiate direct
data transfers:
1. Requests for I/O are made to a Panasas director
blade, which controls access to data.
2. The director blade authenticates the requests, obtains the object maps of all applicable objects across the storage blades, and sends the maps to the I/O nodes.
3. With authentication and virtual maps, I/O nodes
access data on storage blade modules directly and in
parallel.
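The sketch below restates those three steps in code. It is purely conceptual: the object names and methods are invented for illustration and are not the Panasas DirectFlow client API.

    # Conceptual sketch of the three-step access pattern described above.
    # All names are invented for illustration; this is not the Panasas API.
    from concurrent.futures import ThreadPoolExecutor

    def read_file(io_node, director, path):
        # Step 1: the I/O node asks a director blade for access to the file.
        request = io_node.build_io_request(path)
        # Step 2: the director authenticates the request and returns the object
        # map (which objects live on which storage blades).
        object_map = director.authenticate_and_map(request)
        # Step 3: armed with the map, read every object directly from its
        # storage blade, in parallel; the director is out of the data path.
        with ThreadPoolExecutor() as pool:
            chunks = pool.map(lambda e: e.blade.read_object(e.object_id), object_map)
        return b"".join(chunks)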
This concurrency eliminates the bottleneck of traditional,
monolithic storage systems, which manage data in small
blocks, and delivers record-setting data throughput. The
number of data streams is limited only by the number
of storage blades and the number of I/O nodes in the
server cluster.
Performance is also a key factor in evaluating the cost
effectiveness of storage for large, expensive clusters.
It is important to keep a powerful cluster busy doing
computations and processing jobs rather than waiting
for I/O operations to complete. If a cluster costs $3.5M
and is amortized over 3 years, the cost is approximately
$3200 per day. As such, it makes sense to keep the
cluster utilized and completing jobs as fast as possible.
In order to do this, outages have to be minimized and
the cluster must be kept up and running. Therefore,
system availability is another key factor.
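The amortization arithmetic is straightforward, and extending it to an hourly figure makes the cost of idle time concrete (the per-hour number is simply the daily cost divided by 24):

    # Amortized cost of the cluster, from the figures in the paragraph above.
    cluster_cost_usd = 3_500_000
    amortization_years = 3

    per_day = cluster_cost_usd / (amortization_years * 365)
    print(f"~${per_day:,.0f} per day")        # ~$3,196, i.e. roughly $3,200 per day
    print(f"~${per_day / 24:,.0f} per hour")  # every idle hour represents ~$133
                                              # of amortized hardware value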
[Figure: Panasas Multi-Protocol Access. Linux, Unix, and Windows clients reach the ActiveStor system over DirectFlow data and metadata paths as well as NFS and CIFS, through in-band and out-of-band director blades to the storage blades.]
The Panasas ActiveStor system provided the availability
that LANL was looking for. In terms of simplicity
of administration, the Panasas architecture allows
management of all data within a single seamless
namespace. There is no NFS root; NFS is replaced by a scalable global file system. Data objects can be dynamically rebalanced across storage blades for ongoing performance optimization.
Furthermore, the object-based architecture enables
faster data reconstruction in the event of a drive failure
because storage blade modules have the
intelligence to reconstruct data objects
only, not unused sectors on a drive.
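A simplified comparison makes the benefit concrete; the drive size, utilization, and rebuild rate below are hypothetical values chosen only to show the difference between rebuilding an entire drive and reconstructing only the live objects on it:

    # Hypothetical rebuild-time comparison: whole drive vs. live objects only.
    drive_capacity_tb = 1.0
    used_fraction = 0.4               # assume 40% of the drive holds live objects
    rebuild_rate_tb_per_hour = 0.2    # assumed effective reconstruction rate

    whole_drive_h = drive_capacity_tb / rebuild_rate_tb_per_hour
    objects_only_h = drive_capacity_tb * used_fraction / rebuild_rate_tb_per_hour

    print(f"Block-level rebuild (entire drive): {whole_drive_h:.1f} h")   # 5.0 h
    print(f"Object-level rebuild (live data):   {objects_only_h:.1f} h")  # 2.0 h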
Finally, the Panasas storage architecture
is capable of supporting future
generations of more complex cluster
configurations, including the scalability
to support multiple clusters from one
central storage pool. Instead of using
one big, expensive GbE switch through
one subnet, Panasas storage can be
configured across many subnets through
smaller, less expensive network switches
that connect to the I/O nodes. This
improves reliability by providing even more paths to
serve data to the compute cluster. Furthermore, by
having a centralized pool of high-performance storage,
there is no need to copy data for different kinds of jobs.
After the computation jobs, visualization tasks can take
place with a “compute in place” approach rather than
copying the data to another storage system.
Summary
The Roadrunner project has proven to be a tremendous
asset to the Laboratory’s nuclear weapons program
simulations as well as for other scientific endeavors
like cosmology, antibiotic drug design, HIV vaccine
development, astrophysics, ocean or climate modeling,
turbulence, and many others.
The Panasas architecture is designed specifically to
support Linux clusters, scaling performance in concert
with capacity. Panasas ActiveStor scale-out NAS is
capable of meeting the needs of the world’s leading
high performance computing clusters, both now and for
future generations of cluster technology.
“Rather than having a cluster node failure at least once a week, as a comparable system with local disks would experience, the time between node failures was increased to once every 7 weeks.”
Ron Minnich, Leader of the Cluster Research Team, LANL