Computing Outside The Box (June 2009)
Ian Foster, Computation Institute, Argonne National Laboratory & University of Chicago
Abstract: The past decade has seen increasingly ambitious and successful methods for outsourcing computing. Approaches such as utility computing, on-demand computing, grid computing, software as a service, and cloud computing all seek to free computer applications from the limiting confines of a single computer. Software that thus runs "outside the box" can be more powerful (think Google, TeraGrid), dynamic (think Animoto, caBIG), and collaborative (think Facebook, myExperiment). It can also be cheaper, due to economies of scale in hardware and software. The combination of new functionality and new economics inspires new applications, reduces barriers to entry for application providers, and in general disrupts the computing ecosystem. I discuss the new applications that outside-the-box computing enables, in both business and science, and the hardware and software architectures that make these new applications possible.
 
“I’ve been doing cloud computing since before it was called grid.”
1890
1953
“Computation may someday be organized as a public utility… The computing utility could become the basis for a new and important industry.” John McCarthy (1961)
 
[Figure: connectivity (on a log scale) rising over time, with science driving the emergence of the Grid.] “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
Application Infrastructure
Layered grid architecture (“The Anatomy of the Grid,” 2001), set alongside the Internet protocol architecture (Link, Internet, Transport, Application):
Fabric (“Controlling things locally”): access to, and control of, resources
Connectivity (“Talking to things”): communication (Internet protocols) and security
Resource (“Sharing single resources”): negotiating access, controlling use
Collective (“Managing multiple resources”): ubiquitous infrastructure services
User (“Specialized services”): user- or application-specific distributed services
Application (top of the stack)
Application / Infrastructure → service-oriented infrastructure
 
www.opensciencegrid.org
Application / Infrastructure → service-oriented infrastructure
Application → service-oriented applications / Infrastructure → service-oriented infrastructure
 
As of Oct 19, 2008: 122 participants; 105 services (70 data, 35 analytical).
Microarray clustering using Taverna (Wei Tan):
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using the GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using the geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
The workflow combines its own inputs/outputs, caGrid services, “shim” services, and others.
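To make the dataflow concrete, here is a structural sketch of the same three-step pipeline in Python. This is not a working caGrid client: call_cagrid_service is a hypothetical placeholder for a real WSRF/SOAP invocation, and the operation names are illustrative, not the services' actual interfaces.

```python
# Structural sketch only: call_cagrid_service is a hypothetical stand-in for a
# real WSRF/SOAP client; operation names are illustrative.

CAARRAY = "cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub"
GENEPATTERN = "node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService"
GEWORKBENCH = "cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage"

def call_cagrid_service(endpoint, operation, payload):
    """Placeholder: a real client would build a SOAP envelope and POST it to the endpoint."""
    print(f"[stub] {operation} -> {endpoint}")
    return {"endpoint": endpoint, "operation": operation, "input": payload}

def microarray_clustering(query):
    data = call_cagrid_service(CAARRAY, "query", query)                # 1. retrieve microarray data
    normalized = call_cagrid_service(GENEPATTERN, "preprocess", data)  # 2. normalize (GenePattern)
    return call_cagrid_service(GEWORKBENCH, "cluster", normalized)     # 3. hierarchical clustering

if __name__ == "__main__":
    microarray_clustering({"experiment": "example-microarray-set"})
```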
Infrastructure Applications
[Figure, built up over three slides: “Energy” vs. “Progress of adoption,” with $$ annotations.]
[Figure: connectivity (on a log scale) rising over time; science drove the Grid, and the enterprise now drives the Cloud.] “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
 
 
US$3
[Charts, credit Werner Vogels: Animoto EC2 image usage from Day 1 to Day 8, on a scale of 0 to 4,000 instances.]
Software, platform, infrastructure:
Software: Salesforce.com, Google, Animoto, …, caBIG, TeraGrid gateways
Platform: Amazon, Google, Microsoft, …
Infrastructure: Amazon, GoGrid, Sun, Microsoft, …
 
Dynamo: Amazon’s highly available key-value store (DeCandia et al., SOSP ’07)
Simple query model; weak consistency, no isolation
Stringent SLAs (e.g., 300 ms for 99.9% of requests at a peak of 500 requests/sec)
Incremental scalability, symmetry, decentralization, heterogeneity
Technologies used in Dynamo:
Problem | Technique | Advantage
Partitioning | Consistent hashing | Incremental scalability
High availability for writes | Vector clocks with reconciliation during reads | Version size is decoupled from update rates
Handling temporary failures | Sloppy quorum and hinted handoff | High availability and durability even when some replicas are unavailable
Recovering from permanent failures | Anti-entropy using Merkle trees | Synchronizes divergent replicas in the background
Membership and failure detection | Gossip-based membership protocol and failure detection | Preserves symmetry; avoids a centralized registry of membership and node-liveness information
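To ground the first row of the table, here is a minimal consistent-hashing sketch in Python. It is not Dynamo's implementation (the hash function, virtual-node count, and replica count are simplifying assumptions), but it shows the property the table names: keys and nodes share one hash ring, each key is stored on the first N distinct nodes clockwise from its position, and adding or removing a node remaps only neighboring keys.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Map a string onto the hash ring (128-bit MD5 here; a simplifying choice)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=64, replicas=3):
        self.replicas = replicas              # N distinct nodes per key
        self.ring = []                        # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):           # virtual nodes smooth load across heterogeneous nodes
                self.ring.append((ring_hash(f"{node}#{i}"), node))
        self.ring.sort()

    def preference_list(self, key: str):
        """Return the first `replicas` distinct nodes clockwise from the key's position."""
        start = bisect.bisect(self.ring, (ring_hash(key), ""))
        chosen, i = [], 0
        while len(chosen) < self.replicas and i < len(self.ring):
            node = self.ring[(start + i) % len(self.ring)][1]
            if node not in chosen:
                chosen.append(node)
            i += 1
        return chosen

ring = ConsistentHashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.preference_list("customer:12345"))  # three distinct nodes responsible for this key
```

In Dynamo, a preference list like this is what the sloppy-quorum reads and writes operate against; the sketch stops at placement.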
Application → service-oriented applications / Infrastructure → service-oriented infrastructure
The Globus-based LIGO (gravitational-wave observatory) data grid: replicating >1 terabyte/day to 8 sites, including Birmingham, Cardiff, and AEI/Golm; >100 million replicas so far; MTBF = 1 month.
Pull “missing” files to a storage system with the Data Replication Service: given a list of required files, the service determines data location via the Replica Location Index and Local Replica Catalogs, performs data movement with the Reliable File Transfer Service and GridFTP, and completes data replication by registering the new copies in the Local Replica Catalog. (“Design and Implementation of a Data Replication Service Based on the Lightweight Data Replicator System,” Chervenak et al., 2005.)
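The control flow is simple enough to sketch. The following Python is a toy version of the pull-missing-files pattern, with stand-in callables where the real service uses the Replica Location Index, Local Replica Catalogs, RFT, and GridFTP; none of the names below are actual Globus APIs.

```python
def replicate_missing(required_files, local_catalog, locate_replica, transfer, register):
    """Pull files from `required_files` that are not yet in `local_catalog`.

    local_catalog  -- set of logical file names already stored locally
    locate_replica -- fn: logical name -> source URL (stand-in for the Replica Location Index)
    transfer       -- fn: (source URL, logical name) -> None (stand-in for RFT/GridFTP)
    register       -- fn: logical name -> None (stand-in for updating the Local Replica Catalog)
    """
    missing = [f for f in required_files if f not in local_catalog]
    for name in missing:
        source = locate_replica(name)   # data location
        transfer(source, name)          # data movement
        register(name)                  # data replication: record the new local copy
    return missing

# Toy usage with in-memory stand-ins (file and host names are made up):
catalog = {"frame-001.gwf"}
pulled = replicate_missing(
    ["frame-001.gwf", "frame-002.gwf"],
    catalog,
    locate_replica=lambda n: f"gsiftp://remote.example.org/ligo/{n}",
    transfer=lambda src, n: print(f"transferring {src}"),
    register=catalog.add,
)
print(pulled)  # ['frame-002.gwf']
```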
Specializing further: the user asks a service provider to “provide access to data D at S1, S2, S3 with performance P”; the service provider in turn asks resource providers to “provide storage with performance P1, network with P2, …,” composing building blocks such as a replica catalog and user-level multicast.
Using IaaS in biomedical informatics: [diagram: “my servers” (Chicago), handle.net, and BIRN, and the same arrangement with an IaaS provider in place of my servers.]
Clouds and supercomputers: Conventional wisdom?
                 | Loosely coupled applications | Tightly coupled applications
Clouds/clusters  | ✔                            | Too slow
Supercomputers   | Too expensive                | ✔
Ed Walker, “Benchmarking Amazon EC2 for High-Performance Scientific Computing,” ;login:, October 2008.
D. Nurmi, J. Brevik, and R. Wolski, “QBETS: Queue Bounds Estimation from Time Series,” SIGMETRICS 2007, pp. 379-380.
Clouds and supercomputers: Conventional wisdom?
                 | Loosely coupled applications | Tightly coupled applications
Clouds/clusters  | ✔                            | Good for rapid response
Supercomputers   | Too expensive                | ✔
Loosely coupled problems:
- Ensemble runs to quantify climate model uncertainty
- Identify potential drug targets by screening a database of ligand structures against target proteins
- Study economic model sensitivity to parameters
- Analyze a turbulence dataset from many perspectives
- Perform numerical optimization to determine optimal resource assignment in energy problems
- Mine collections of data from advanced light sources
- Construct databases of computed properties of chemical compounds
- Analyze data from the Large Hadron Collider
- Analyze log data from 100,000-node parallel computations
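What these problems share is that their tasks are independent, so the programming model is a task farm rather than a tightly coupled parallel code. A minimal local sketch in Python (the task function and inputs are placeholders; campaigns at the scales above use dispatchers such as Falkon or Swift, not a single machine):

```python
from concurrent.futures import ProcessPoolExecutor

def score_one(task):
    """Placeholder for one independent task, e.g. docking one ligand against a target."""
    ligand, target = task
    fake_score = sum(map(ord, ligand + target)) % 1000   # stand-in for real work
    return (ligand, target, fake_score)

def run_campaign(tasks, workers=8):
    # Independent tasks: embarrassingly parallel, no communication between workers.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(score_one, tasks, chunksize=64))

if __name__ == "__main__":
    tasks = [(f"ligand-{i}", "target-1") for i in range(1000)]
    results = run_campaign(tasks)
    best = sorted(results, key=lambda r: r[2])[:10]        # keep the best-scoring few
    print(best)
```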
“Many many tasks”: identifying potential drug targets by screening 2M+ ligands against protein target(s) (Mike Kubal, Benoit Roux, and others).
Screening workflow for one protein target (~4 million tasks, ~500,000 CPU-hours, ~50 CPU-years):
Inputs: a PDB protein description (1 protein, ~1 MB), with manually prepared DOCK6 and FRED receptor files (one per protein, defining the pocket to bind to), and ZINC 3-D structures for ~2M ligands (~6 GB).
DOCK6 and FRED docking: ~4M tasks x 60 s x 1 CPU, ~60K CPU-hrs; each selects the best ~5K ligand-receptor complexes.
Amber scoring of those ~10K complexes (1. AmberizeLigand, 2. AmberizeReceptor, 3. AmberizeComplex, 4. generate the NAB script from a template via BuildNABScript/perl, with parameters defining the flexible residues and number of MD steps, 5. RunNABScript): ~10K tasks x 20 min x 1 CPU, ~3K CPU-hrs; select the best ~500.
GCMC on the best ~500: ~500 tasks x 10 hr x 100 CPUs, ~500K CPU-hrs; then report.
 
DOCK on BG/P: ~1M tasks on 118,000 CPU cores (Ioan Raicu, Zhao Zhang, Mike Wilde)
CPU cores: 118,784; tasks: 934,803; elapsed time: 7,257 s; compute time: 21.43 CPU-years; average task time: 667 s
Relative efficiency: 99.7% (scaling from 16 to 32 racks); utilization: 99.6% sustained, 78.3% overall
Per-task I/O against GPFS: 1 script (~5 KB), 2 file reads (~10 KB), 1 file write (~10 KB); cached in RAM from GPFS on the first task per node: 1 binary (~7 MB) and static input data (~45 MB)
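As a quick consistency check using only the figures above: 21.43 CPU-years is about 6.8 x 10^8 CPU-seconds of compute, while 118,784 cores x 7,257 s is about 8.6 x 10^8 available CPU-seconds, so overall utilization comes out to roughly 78%, in line with the reported 78.3%.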
Managing 160,000 cores with Falkon: slower shared storage versus high-speed local “disk.”
Scaling POSIX to petascale: a large dataset on the global file system is staged, over the torus and tree interconnects, to a compute-node-striped intermediate file system (Chirp for multicast, MosaStore for striping) and then to local file systems holding each compute node's local datasets.
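The caching idea behind these tiers (and behind the data-diffusion results that follow) can be sketched as a toy cache-aware dispatcher: send each task to a node that already holds its input if one exists, otherwise pick any node, charge it the one fetch from slow shared storage, and remember where the data now lives. This is a simplification, not Falkon's actual scheduling policy.

```python
import random

class CacheAwareDispatcher:
    """Toy cache-aware scheduler: route tasks toward nodes that already cache their input."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.cache = {}                 # input file -> set of nodes caching it

    def dispatch(self, input_file):
        holders = self.cache.get(input_file)
        if holders:
            return random.choice(sorted(holders))           # cache hit: reuse a local copy
        node = random.choice(self.nodes)                    # cache miss: fetch from shared storage
        self.cache.setdefault(input_file, set()).add(node)  # data "diffuses" onto node-local disk
        return node

dispatcher = CacheAwareDispatcher([f"node-{i}" for i in range(4)])
for task_input in ["a.dat", "b.dat", "a.dat", "a.dat"]:
    print(task_input, "->", dispatcher.dispatch(task_input))
```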
Efficiency for 4-second tasks and varying data sizes (1 KB to 1 MB), for CIO and GPFS, at up to 32K processors.
“Sine” workload: 2M tasks, 10 MB:10 ms ratio, 100 nodes, GCC policy, 50 GB cache per node (Ioan Raicu).
Same scenario, but with dynamic resource provisioning
Data diffusion, sine-wave workload: summary
GPFS: 5.70 hrs, ~8 Gb/s, 1,138 CPU-hrs
DD+SRP (data diffusion, static resource provisioning): 1.80 hrs, ~25 Gb/s, 361 CPU-hrs
DD+DRP (data diffusion, dynamic resource provisioning): 1.86 hrs, ~24 Gb/s, 253 CPU-hrs
Clouds and supercomputers: Conventional wisdom?
                 | Loosely coupled applications | Tightly coupled applications
Clouds/clusters  | ✔                            | Good for rapid response
Supercomputers   | Excellent                    | ✔
“The computer revolution hasn’t happened yet.” Alan Kay, 1997
[Figure: connectivity (on a log scale) rising over time; science drove the Grid, the enterprise drives the Cloud, and the consumer drives whatever comes next (????).] “When the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances” (George Gilder, 2001)
The “Energy Internet”: the shape of grids to come?
Thank you! Computation Institute www.ci.uchicago.edu
